I Built a Voice Agent for My Portfolio. V1 Was Embarrassing.

The Recruiter I Couldn't Reach

Last week, LinkedIn Premium told me an ElevenLabs recruiter had visited my profile. It wouldn't tell me who. That anonymous-visitor notification is the closest LinkedIn will come to hinting that an opportunity just brushed past, with no thread to pull.

Fine. Can't DM her back. But the next voice AI recruiter who lands on my portfolio could get something more interesting than a text-based bio.

The question worth asking: what would keep a voice AI recruiter on the page for five more minutes? The answer was obvious, given who I was targeting. Voice.

If you're recruiting for a voice AI company, the most interesting thing a candidate's portfolio can do is talk to you. Not describe voice work. Not link to a demo. Actually talk. An AI clone trained on my background that a recruiter could quiz about the work and get real answers from.

The build took forty-eight hours. V1 shipped. V1 was embarrassing.

What V1 Did Wrong

The first version of the voice agent talked. That part worked. ElevenLabs cloned my voice accurately enough that my partner thought she was overhearing me on the phone from another room.

Everything else about it was wrong.

V1 interrupted. It would start responding the moment someone paused to think, which in a conversation about technical work means constantly. A recruiter would ask a complex question, take a beat to frame a follow-up, and the agent would jump in with "Great question!" and start monologuing.

V1 ignored emotional tone. Someone saying "huh, interesting..." with the rising inflection that means "I need a minute to process" got the same response as someone saying "Huh. Interesting." flat, meaning "convince me." The agent couldn't tell the difference.

V1 didn't handle silence. If the person on the other end went quiet for more than three seconds, the agent would start filling the air with backup chatter. "So, to expand on that..." It was desperate. Same way a junior SDR is desperate on a first sales call.

V1 also couldn't handle being interrupted. Say the agent was midway through explaining a project and the recruiter cut in with "wait, just the Mapbox part." V1 would finish its prepared sentence first, then respond. By then the recruiter had lost the thread. The agent was performing, not talking.

V1 was, in other words, a robot. Specifically the kind of robot people roleplay in sketch comedy. READ THE ROOM, ROBOT.

It was embarrassing the first time someone who wasn't me tested it.

The Architecture That Made V2 Work

V2 took another forty-eight hours and fixed four specific things.

Turn-taking with silence detection

V2 treats silence as part of the conversation, not an error state. The agent now waits for an explicit end-of-turn signal (natural speech cadence falling off, extended pause past a threshold, or a clear question-mark intonation) before responding. It also tolerates silence from the other side. If a recruiter goes quiet for eight seconds, the agent doesn't panic. It waits.
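
Boiled down to a sketch, the logic looks something like this. The real turn-taking lives inside the platform; every type, name, and threshold below is mine, invented to show the shape of the decision, not the values anyone actually uses:

```typescript
// Toy end-of-turn detector. Frame shape and thresholds are invented;
// the real signal comes from the platform's VAD, not hand-rolled code.
type Frame = {
  speaking: boolean;           // voice activity in this frame
  questionIntonation: boolean; // rising pitch at the end of an utterance
};

const FRAME_MS = 20;       // audio frame size
const QUESTION_MS = 400;   // a clear question releases the turn quickly
const TURN_OVER_MS = 1500; // otherwise demand a long pause

type Decision = "listen" | "respond" | "wait";

class TurnTaker {
  private silentMs = 0;
  private endedRising = false; // last utterance trailed up in pitch
  private tookTurn = false;

  onFrame(frame: Frame): Decision {
    if (frame.speaking) {
      this.silentMs = 0;
      this.tookTurn = false;
      this.endedRising = frame.questionIntonation;
      return "listen";
    }
    this.silentMs += FRAME_MS;
    if (this.tookTurn) {
      // Where V1 panicked. Silence after the agent's turn is thinking
      // time, even eight seconds of it. Wait.
      return "wait";
    }
    // A question-mark intonation ends the turn fast; anything else has
    // to out-wait the threshold, so "pausing to think" never reads as "done".
    const threshold = this.endedRising ? QUESTION_MS : TURN_OVER_MS;
    if (this.silentMs >= threshold) {
      this.tookTurn = true;
      return "respond";
    }
    return "listen";
  }
}
```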

That change alone made the agent feel more human than anything else. The turn-taking was the tell.

Tone-aware response framing

The agent now does a quick tone read on the incoming audio before deciding how to respond. Not sentiment analysis in the generic sense. Three specific modes: exploratory (rising inflection, trailing off into "interesting..."), skeptical ("wait, but..."), decisive ("OK so what about..."). Each mode gets a different response shape.

Exploratory means the agent shouldn't pitch. It should expand, offer an angle, wait. Skeptical means the agent should concede the concern specifically, not reach for a reassurance. Decisive means the agent can go direct.
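
In sketch form, the mode-to-shape mapping is a small table. The classifier below is a toy over transcript text plus one prosody flag; the real read happens on audio features, and all the names here are mine:

```typescript
type Tone = "exploratory" | "skeptical" | "decisive";

// Toy classifier: transcript cues plus one pitch flag. Illustrative
// heuristics only, not the production tone read.
function readTone(transcript: string, risingPitch: boolean): Tone {
  const t = transcript.trim().toLowerCase();
  if (/^(wait|but|hmm)/.test(t)) return "skeptical";
  if (risingPitch || t.endsWith("...")) return "exploratory";
  return "decisive";
}

// Each mode changes the shape of the reply before any content is chosen.
const responseShape: Record<Tone, string> = {
  exploratory: "Don't pitch. Offer one angle, then stop and wait.",
  skeptical: "Concede the specific concern first. No canned reassurance.",
  decisive: "Answer directly, two sentences max, then yield the turn.",
};

// The shape gets prepended to the reasoning layer's instructions each turn.
function frameResponse(transcript: string, risingPitch: boolean): string {
  return responseShape[readTone(transcript, risingPitch)];
}
```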

This is the part that matches what a good human does on a sales call. V1 treated every input as a prompt to answer. V2 treats inputs as conversational beats to match.

Bidirectional interruption handling

V2 also handles being cut off. If the recruiter interrupts mid-sentence, the agent drops what it was saying, acknowledges with a short beat ("yeah, go"), and responds to the new thread. It doesn't finish the paragraph it was building. That one fix changed the feel of the whole conversation.
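
The behavior, sketched with stand-in helpers for the TTS and reasoning calls the platform actually owns:

```typescript
// speak() and answerTo() are hypothetical stand-ins, not platform APIs.
let queuedSpeech: string[] = [];

function stopPlayback(): void {
  // Cut audio output immediately. Mid-word is fine.
}

async function answerTo(utterance: string): Promise<string> {
  return `response to: ${utterance}`; // reasoning-layer call goes here
}

async function speak(text: string): Promise<void> {
  console.log(`agent: ${text}`); // TTS call goes here
}

async function onBargeIn(interruption: string): Promise<void> {
  stopPlayback();    // V1's bug: it finished the sentence first
  queuedSpeech = []; // and never resume the old paragraph later
  await speak("yeah, go");                   // one short acknowledgment beat
  await speak(await answerTo(interruption)); // the new thread wins
}
```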

A context window with the right information in it

The context the model sees isn't a resume. It's a layered set of material: a short bio, a list of actual shipped projects with specific build details, a set of objection-handling notes ("no CS degree, yes, and here's what's shipped despite that"), and a running summary of the current conversation.

The objection-handling block was the unlock. A recruiter asks "have you worked with voice models at scale?" The honest answer is "not at scale, but here's what I've built with ElevenLabs, what it taught me, what I'd do differently next." Without the objection-handling context, V1 would either overclaim or underclaim. V2 lands in the honest middle.

That's context engineering applied to a real conversation: give the model what it needs to give honest answers, and it will.
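
Shape-wise, the layered material looks something like this. Contents abridged, field names mine:

```typescript
// Abridged layered context; entries are illustrative, not the full set.
const knowledge = {
  bio: "Short bio: who I am, what I ship, why voice.",
  projects: [
    {
      name: "Voice agent portfolio",
      details:
        "ElevenLabs agent + Claude reasoning; turn-taking, tone framing, barge-in.",
    },
  ],
  objections: [
    {
      trigger: "worked with voice models at scale?",
      honestAnswer:
        "Not at scale. Here's what I've built with ElevenLabs, what it " +
        "taught me, and what I'd do differently next.",
    },
    {
      trigger: "no CS degree",
      honestAnswer: "Correct. Here's what's shipped despite that.",
    },
  ],
  conversationSummary: "", // rewritten as the call goes on
};

// Flattened into the model's context every turn, so the honest answer
// is always cheaper to produce than an overclaim.
function buildContext(): string {
  return [
    knowledge.bio,
    ...knowledge.projects.map((p) => `${p.name}: ${p.details}`),
    ...knowledge.objections.map(
      (o) => `If asked "${o.trigger}": ${o.honestAnswer}`,
    ),
    `Conversation so far: ${knowledge.conversationSummary}`,
  ].join("\n");
}
```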

Why ElevenLabs for the Voice Clone

V2 runs on ElevenLabs' voice agent platform, with Claude as the reasoning layer and a knowledge base I built to hold the context. Speech-to-speech is native to the platform, so the listen/think/respond loop isn't something I stitched together from pieces. ElevenLabs handles the voice choreography. Claude handles the reasoning. The knowledge base is where the context lives: bio, projects, objection-handling notes, everything the model needs to sound like me with specifics.
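
The page-side wiring is small. Something like the below, using the ElevenLabs JS client; the signatures reflect the SDK as I used it and may have moved since, and the agent ID is a placeholder:

```typescript
// Browser-side embed via @elevenlabs/client. Check current docs for
// exact signatures. Everything conversational lives on the agent
// config, not in this call: cloned voice, Claude as the LLM, the
// knowledge base, turn-taking.
import { Conversation } from "@elevenlabs/client";

async function startVoiceAgent() {
  // Mic permission first, or the session has nothing to listen to.
  await navigator.mediaDevices.getUserMedia({ audio: true });
  return Conversation.startSession({ agentId: "YOUR_AGENT_ID" });
}
```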

Could have gone with OpenAI's voice agent stack, or built from scratch with Whisper plus a custom TTS pipeline. Went with ElevenLabs because the voice clone was the emotional anchor. You don't trust a conversation about someone's background if the voice sounds nothing like them. And because most of the hard conversational engineering (turn-taking, barge-in, natural cadence) was already solid on their platform. The work for V2 was on top of that foundation, not underneath it.

One unexpected delight along the way: hearing your own voice say things in a language you don't speak. Had the agent do its intro in Japanese once, for a friend, and spent twenty minutes replaying it. The voice clone is convincing enough that it feels like overhearing yourself in a parallel life. Cheap reminder that good tools reshape what feels possible.

What Shipping V1 Taught Me

A lot of AI building advice says "don't ship embarrassing things." That's the opposite of what's actually useful.

Shipping V1 was the thing that made V2 possible. V1 showed exactly what was broken by letting someone real try to talk to it. Three friends tested the agent before anyone else saw it. All three stopped mid-sentence the first time it interrupted them. All three made the same face. That face was the spec for V2.

Iterating in isolation, guessing at what was wrong, would have taken a month. Instead: V1 shipped in two days, collected three rounds of honest reactions in another day, and V2 shipped two days after that. One week from "what if" to something a recruiter could engage with.

The second lesson was about what "done" means. V1 felt done the moment it worked end-to-end. That was a lie I told myself to get something shipped. Done means a stranger can use it without my explanation. V1 failed that test instantly. V2 passed it on the third try.

The ugly middle is the thing. V1 that embarrasses you is the prerequisite for V2 that doesn't. Trying to avoid the embarrassing version skips the only feedback loop that pays off.

Where the Voice Agent Lives Now

It runs on my portfolio at teamvince.com. Anyone can trigger it from the homepage. Trained on my background, my shipped work, and a set of objection-handling notes I keep updating as real conversations happen and I notice where it lands flat.

The ElevenLabs recruiter never came back, by the way. That's fine. The tool wasn't built for her specifically. It's built for the next recruiter, the one after that, and every curious person who wants to know what a voice-first portfolio feels like before the rest of the story gets explained.

If you're hiring for voice AI work or you want to see what a conversational portfolio does differently from a static one, go talk to it. It handles skeptical questions better than most humans do on a cold first call.

One More Thing

Every build worth shipping comes with a public build log. The voice agent was no exception: embarrassing V1, explicit feedback from three humans, deliberate V2, shipped within a week.

That rhythm is what the course teaches. Every other week I'll post another one of these.