What Happens to Your Audiobook Between the Booth and Your Inbox

Apr 24

A look at the seven phases of production behind every finished hour of audio, and where you come in at the end.

My audiobook narration FAQ says approximately six hours of work for every finished hour of audio. Let’s walk through what those six hours actually mean.

Maybe you’re considering hiring me. Maybe you already have and want to understand what’s happening on my side. Either way, this is the full picture. No phase skipped and nothing hidden.

The short version: a finished audiobook is the output of a multi-pass process. I narrate. I self-proofread the audio against your manuscript and re-record the corrections. I edit and master through a custom processing stack I built with a professional audio engineer. A dedicated proofer compares the audio against your manuscript word by word for accuracy, pacing, pronunciation, and character voice consistency. Then I record final pickups, remaster, and you get the final review. Each phase exists for a specific reason, and understanding those reasons makes the whole process easier to navigate, especially when it’s time for your own review at the end.

Phase 1: Prep Read

Before I record a word, I read your full manuscript. Not skim, read. This is where I plan character voices, research pronunciations, internalize the emotional arc, and flag every word I’ll need to confirm with you before recording.

For speculative fiction, the flag list gets long: invented names, worldbuilding terminology, phonetic spellings that don’t map to standard English phonics, and incantations or spell words. All of it needs to be confirmed against your vision before I step into the booth. This is why the pronunciation guide matters so much and why authors who provide a thorough one protect their own story. A narrator guessing at a name is a narrator who introduces errors that the proofer has to catch later, stretching the production timeline. A narrator with a clear guide ideally gets it right the first time.

My book editing background sharpens this prep read. I notice continuity threads, dialogue patterns, and character voice consistency markers that shape how I’ll perform your book, even though the manuscript itself is locked at this point.

Phase 2: Recording

I record continuously and use a finger snap to mark retake points. When I catch a mistake mid-read, I snap my fingers to mark my place in the recording, start again from a clean point, and continue. (Some narrators use a dog clicker for the same purpose; I use my hands.) This way, marking a retake doesn’t pull me out of the emotional state of the scene, and I know exactly where to pick up again as cleanly as if I’d done punch-and-roll. The listener never hears the retakes. When I get to editing, the snap tells me exactly where to cut, and the corrected take is already there waiting.

Narration is cognitively expensive. I’m reading ahead, performing the current line, keeping each character’s voice consistent, managing my breath, landing the right pronunciation, and saying exactly the word on the page. Even experienced narrators produce several misreads per finished hour, because sometimes your brain auto-completes a grammatically plausible word that isn’t the one on the page. This is the cognitive reality of the work. The production pipeline exists specifically to catch and correct these.

For scale: a ten-hour audiobook might take thirty to forty hours of booth time. “Just read it out loud” is not what narration is.

Phase 3: Narrator Self-Proof

Before any files move on to technical editing, I do my own proofing pass, listening back to the recording while following the manuscript to catch as many of my own errors as I can: dropped words, substitutions, and mispronunciations I didn’t catch in the moment.

By the time the files leave my hands for the proofer, your audio has already been through one quality pass. The raw recording is messier than anyone downstream ever hears.

Here’s the honest part: I can’t catch everything in my own work. The brain fills in what it expects to hear because it knows the text. That’s not carelessness; it’s how attention works when you’re inside the material. Phase 5 exists because an outside ear doesn’t have that disadvantage.

Phase 4: Technical Editing

I worked with a professional audio engineer to build a custom processing stack: EQ, compression, and limiting, all tuned specifically to my voice and my recording environment. I run every audiobook through that stack and handle all editing and mastering myself.

The cleanup pass handles mouth noise, clicks, plosives (the popped P’s and B’s), sibilance (harsh S sounds), and breath spikes. I manually match room tone across chapters recorded on different days. I tighten the gaps between sentences without flattening the natural rhythm of the performance.

This also includes checking that every retake splice gets smoothed. When I correct a mistake mid-sentence, there’s a seam between the original take and the corrected one. It’s my job to make every seam inaudible.

Mastering is what brings audio up to retail spec: the loudness, balance, and consistency that lets your book play cleanly across headphones, car speakers, and phone speakers alike.

After the editing and mastering work is done, I do a verification pass before anything moves on to the proofer. The stack is consistent, but consistent isn’t the same as right for every passage. Sometimes a loud emotional line gets over-compressed. Sometimes the stack cuts a breath when it was intentional. My ear is the safety net for my own technical work, just as the proofer’s ear is the safety net for the manuscript-to-audio match. Two QC passes, two different jobs—both real.

Engineering is a different craft from narration. I’m a narrator, which is exactly why I invested in working with a professional audio engineer to build a stack tuned to my voice. That way, the technical side of the work serves the performance rather than fighting it.

Phase 5: Proofing

A dedicated proofer listens to the edited audio while following the manuscript, hunting for any textual errors that survived my self-proof.

What they’re catching: dropped words, substituted words (“will” when the manuscript says “while”), added words, skipped lines, mispronunciations, singular-plural mismatches, wrong verb tenses, and anything where the audio doesn’t match the text. They also catch pacing notes and character voice drift across the book.

On those pacing and character voice catches in particular: the proofer is the only person in the chain who experiences my performance the way a listener will, from start to finish, in real time, without already knowing what’s coming. They’re not reviewing your book; the book is the reference. They’re reviewing my take on your book. That includes pacing moments where my delivery didn’t serve the scene the way the surrounding chapter set it up to, or character voice choices that drifted from the established direction. Those notes don’t get treated like misreads. I take them as input, listen back to the section, and decide whether a pickup serves the listener better than the original take. Sometimes I agree and re-record. Sometimes I trust the original choice and we leave it as is.

Some proofers use AI-assisted tools (like Pozotron) that compare the waveform against the manuscript algorithmically, catching objective mismatches with high accuracy. The AI pass is then verified and supplemented by a human ear. I do not use any AI pass unless you’ve explicitly requested it. Either way, the output is a corrections log, a document listing every error with chapter, timestamp, what the manuscript says, and what I actually said.

This is the phase that protects your words. Every “the” that got dropped, every name that got slightly mispronounced, every verb tense that slipped—this is where it gets caught. It’s also the most labor-intensive quality step in the entire chain, and the one that matters most for textual accuracy.

One thing worth knowing: ACX’s own quality review checks technical specs (volume levels, noise floor, file format) but does not perform manuscript-to-audio proofing. An audiobook can pass ACX QC while containing dozens of misreads. The textual accuracy of your audiobook falls entirely on the narrator and their team. This is where mine gets defended.

Phase 6: Pickup Recording and Final Mastering

I receive the corrections log, set up the microphone, and re-record every flagged error. These are called pickups, and each one needs to match the original performance in tone, energy, pacing, and room sound, even if weeks have passed since the original session.

Voice matching is a real skill. I listen to the original audio around each correction point, get back into that exact vocal placement and emotional register, and record the corrected line. Then I splice it in seamlessly. Some pickups take thirty seconds. Some take ten minutes of attempts to get the match right. This is why “can you just fix that one word?” is never as simple as it sounds. Every correction requires rebuilding the exact performance context of the surrounding audio.

Once pickups are recorded and spliced in, I remaster the affected files for the final product: audio that meets retailer technical specifications. For ACX/Audible, that means RMS levels between -23dB and -18dB, peaks below -3dB, a noise floor below -60dB, plus specific requirements for chapter head and tail silence, file format, and naming conventions.
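To make those numbers concrete, here is a minimal Python sketch of a spec check. It assumes the RMS, peak, and noise floor values have already been measured in an audio editor and typed in by hand. The `ChapterMeasurement` container and `spec_problems` function are hypothetical names for illustration, not part of any ACX tool, and treating a value that sits exactly on a limit as passing is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the text above for ACX/Audible delivery.
RMS_MIN_DB = -23.0
RMS_MAX_DB = -18.0
PEAK_MAX_DB = -3.0
NOISE_FLOOR_MAX_DB = -60.0


@dataclass
class ChapterMeasurement:
    """Hypothetical container for levels read off an audio editor's meters."""
    name: str
    rms_db: float
    peak_db: float
    noise_floor_db: float


def spec_problems(m: ChapterMeasurement) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not (RMS_MIN_DB <= m.rms_db <= RMS_MAX_DB):
        problems.append(f"{m.name}: RMS {m.rms_db} dB is outside -23 to -18 dB")
    # Assumption: a value exactly at the limit counts as passing.
    if m.peak_db > PEAK_MAX_DB:
        problems.append(f"{m.name}: peak {m.peak_db} dB is above -3 dB")
    if m.noise_floor_db > NOISE_FLOOR_MAX_DB:
        problems.append(f"{m.name}: noise floor {m.noise_floor_db} dB is above -60 dB")
    return problems


# Made-up example values, not measurements from a real book.
chapter = ChapterMeasurement("chapter_01", rms_db=-20.5, peak_db=-3.4, noise_floor_db=-62.0)
for line in spec_problems(chapter) or [f"{chapter.name}: levels within the quoted ranges"]:
    print(line)
```

This only covers the three level requirements. Head and tail silence, file format, and naming conventions would need their own checks.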

Phase 7: Author QC (Where You Come In)

After delivery, you have a review window (10 business days per our agreement) to listen to the mastered files and flag any remaining issues.

What to listen for: mispronunciations of character names or places that only you would catch, because you invented them. Technical problems like audible clicks or gaps. Anything that pulls you out of the story as the person who knows it best.

What author QC is not: a second pass at direction. The artistic approach (tone, pacing, and characterization) was approved during the fifteen-minute sample phase (or, if you purchased the Premium Checkpoint Review Service, throughout production). Author QC is for catching genuine problems, not for rewriting performance choices you already signed off on.

The format that makes corrections fast: chapter, timestamp, text as written, what was said, and why it needs fixing. Clean, specific, one submission. My delivery guide includes a QC Corrections Log spreadsheet built for exactly this, and I strongly encourage using it.

Audiobook QC corrections log template: a spreadsheet for flagging misreads, mispronunciations, and audio issues, with columns for chapter, timestamp, manuscript text, audio text, and notes.

Here’s why specificity matters so much: by the time your book reaches you for QC, I’ve listened to it three or four times already, and the proofer has been through it word by word. We’ve lived in your book for weeks at this point. A note like “something sounds off in chapter 12” requires someone to re-listen to the entire chapter and hunt for the problem. A note like “Chapter 12, 14:55 said ‘gripped’ but manuscript says ‘grabbed’” takes thirty seconds to address. The cleaner your log, the faster your corrections get done and the less time everyone spends hunting.
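For authors who prefer to keep notes in a plain file, the log format described above can be sketched in a few lines of Python. This is an illustration only, not the spreadsheet from the delivery guide. The column names follow the template's description, while the `write_log` function, the wording of the note, and the rule that the first four columns must be filled in are assumptions.

```python
import csv
import io

# Column names taken from the template description above.
COLUMNS = ["chapter", "timestamp", "manuscript text", "audio text", "notes"]
# Assumption: a usable entry needs at least these four fields filled in.
REQUIRED = COLUMNS[:4]


def write_log(entries, out):
    """Write correction entries as CSV, rejecting notes too vague to act on."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for entry in entries:
        missing = [col for col in REQUIRED if not entry.get(col)]
        if missing:
            raise ValueError(f"entry is missing: {', '.join(missing)}")
        writer.writerow(entry)


# The specific example from the paragraph above, as one log row.
entries = [
    {
        "chapter": 12,
        "timestamp": "14:55",
        "manuscript text": "grabbed",
        "audio text": "gripped",
        "notes": "word substitution",  # hypothetical note wording
    }
]

buffer = io.StringIO()
write_log(entries, buffer)
print(buffer.getvalue())
```

A note like "something sounds off in chapter 12" would fail the check here for the same reason it is slow to act on in practice: it has no timestamp and no text to compare.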

What You Can Do to Make the Process Smoother

These come directly from what I’ve learned in production:

Finalize your manuscript before the recording starts.

Changes after the recording starts cost time and money. I’m not being rigid. Every change requires a re-record, a re-edit, and a remaster of the affected section. A manuscript that’s truly record-ready is the single biggest thing you can hand me.

Provide a thorough pronunciation guide.

I’ll need a character and pronunciation guide before recording begins. If one isn’t ready, I’ll prompt you for it before we start. But the more complete you make it before that prompt, the better protected your invented terminology is. Include every invented word, character name, place name, and unusual term, each with a phonetic spelling and, if possible, a short audio clip of you saying it. Invented terminology is the largest category of narrator errors in speculative fiction, and a good guide prevents most of them before they happen. I need to know how your world sounds, and you’re the only person who truly does.

Trust the production timeline.

When I say fourteen to seventeen weeks for a ten-hour book, it’s because that’s a realistic multi-phase production pipeline with multiple correction rounds. Rush delivery compresses every phase, reducing the number of quality checks and leading to more errors in the finished product. Quality takes the time it takes.

Understand that narrator errors are normal, not negligent.

Several misreads per finished hour is the cognitive reality of the work, not a measure of care. The production pipeline exists specifically to catch and correct these before they reach you. That’s what you’re paying for.

Why I’m Showing You All of This

My brand promise is that it’s your story and I’m the vessel bringing it to listeners’ ears. This is how I keep that promise.

The prep read exists so I know your world before I live in it. The recording approach exists so your listeners hear a performance, not a struggle. The self-proof, technical editing, proofing pass, and pickup cycle exist so that what listeners hear is exactly what you wrote. The mastering exists so your book sounds the way a retail audiobook should sound. Author QC exists so you get the final word on your own story.

My job is interpretation, performance, and the technical craft that delivers them clean. The proofer’s job is to make sure no word slips between manuscript and audio. Your job is to provide clear source material and honest, specific feedback during QC. When all three roles hold their own weight, the result is an audiobook that sounds like your book was always meant to be heard out loud.

That’s what I’m working toward, every time.

— Ashley


Ashley Duongtran https://www.toecurlingtales.com