How to Use Descript to Edit a Podcast by Editing Text

March 24, 2026 · Editorial Team · 6 min read · descript podcast-editing ai-audio

Traditional audio editing requires you to find a mistake by ear, locate it on a waveform, trim it precisely, and move on to the next one. For a 45-minute interview with 12 separate "um" moments, that process takes a while. Descript turns that around entirely: you edit the transcript, and the audio changes to match.

If you delete the phrase "you know what I mean" from the transcript, the audio containing those words disappears. If you move a paragraph in the text, the corresponding audio moves too. Once you've worked this way, going back to waveform-first editing feels primitive.
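To make the idea concrete, here is a minimal Python sketch of how a text edit can map onto audio. It assumes each transcribed word carries a start and end timestamp; the Word class and the kept_segments function are hypothetical names for illustration, not Descript's internals or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def kept_segments(words, deleted):
    """Return merged (start, end) audio ranges for the words that survive the edit.

    `deleted` is a set of word indices removed from the transcript.
    """
    segments = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        previous_kept = i > 0 and (i - 1) not in deleted
        if segments and previous_kept:
            # The previous word was kept, so extend the current range.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First word, or the first kept word after a cut: start a new range.
            segments.append((word.start, word.end))
    return segments


words = [
    Word("so", 0.0, 0.3),
    Word("you", 0.3, 0.5),
    Word("know", 0.5, 0.8),
    Word("what", 0.8, 1.0),
    Word("I", 1.0, 1.1),
    Word("mean", 1.1, 1.5),
    Word("editing", 1.5, 2.1),
    Word("matters", 2.1, 2.8),
]

# Deleting "you know what I mean" (indices 1 to 5) leaves two audio ranges.
print(kept_segments(words, {1, 2, 3, 4, 5}))  # [(0.0, 0.3), (1.5, 2.8)]
```

The point of the sketch is that the transcript acts as an index into the audio: the text you keep determines which time ranges are played back.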

Getting Your Audio Into Descript

Create a new project, then drag your audio files directly into the composition. Descript accepts MP3, WAV, M4A, and most other common formats. For a standard podcast with a host and one guest, you'll likely have either a single stereo file from a recorder or two separate mono tracks from a remote recording setup (Riverside, Zencastr, SquadCast, and similar tools all export per-track files).

Import both tracks separately if you have them. Descript handles multitrack compositions, and keeping tracks separate gives you much more control later over volume balancing and speaker-specific processing.

Transcription starts automatically after upload. Accuracy depends on audio quality and accent. For clean studio audio with North American or British English, Descript's AI transcription is usually 95% to 98% accurate out of the box. For heavy accents, technical jargon, or background noise, expect more corrections.
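To get a feel for what those percentages mean in practice, here is a rough back-of-the-envelope calculation in Python. The speaking rate of 150 words per minute is my assumption for conversational speech, not a Descript figure, and expected_corrections is a hypothetical helper name.

```python
def expected_corrections(minutes, words_per_minute, accuracy_percent):
    """Estimate how many transcript words need manual correction."""
    total_words = minutes * words_per_minute
    return total_words * (100 - accuracy_percent) / 100


# A 45-minute interview at an assumed 150 words per minute is 6,750 words.
print(expected_corrections(45, 150, 95))  # 337.5
print(expected_corrections(45, 150, 98))  # 135.0
```

Under that assumption, even a clean recording leaves somewhere between roughly 135 and 340 words to fix, which is why a read-through of the transcript is still worth the time.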

The transcription takes roughly 1 to 3 minutes for a 45-minute file, sometimes faster. While it's running, you can set up speaker labels by clicking on the "Untitled Speaker" labels and typing names. Descript will ask you to confirm voice assignments; once it learns who is who, it applies the labels automatically throughout the transcript.

Transcript-Based Editing: The Core Workflow

Read through the transcript like a document. When you find a section you want to cut, select that text and press Delete. The audio is gone. That's really it for basic cuts.

A few things to know that make this more precise:

Gap handling: When you delete a word or phrase, Descript leaves a tiny gap marker in the timeline. By default, it trims the gap when you export. You can also choose to leave gaps as natural pauses or shorten them manually.

Highlighting before deleting: Before you delete anything, hover over the selected text to see the waveform preview at the bottom. This confirms you're cutting what you think you're cutting, which helps when the transcript has a mis-transcription nearby.

Undo is solid: Descript has deep undo history. Don't be afraid to make cuts and then undo a few if the pacing feels wrong.
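For the gap handling point above, this is a minimal sketch of what shortening gaps amounts to. It assumes word timings given as (start, end) pairs in seconds; the shorten_gaps function and the 0.5-second cap are illustrative choices of mine, not Descript's implementation or its default value.

```python
def shorten_gaps(words, max_gap=0.5):
    """Shift word timings earlier so no silence between words exceeds max_gap seconds."""
    result = []
    shift = 0.0  # total time removed so far
    previous_end = None
    for start, end in words:
        if previous_end is not None:
            gap = start - previous_end
            if gap > max_gap:
                # Remove only the excess silence and keep a pause of max_gap.
                shift += gap - max_gap
        result.append((start - shift, end - shift))
        previous_end = end
    return result


timings = [(0.0, 1.0), (3.0, 4.0), (4.25, 5.0)]
print(shorten_gaps(timings))  # [(0.0, 1.0), (1.5, 2.5), (2.75, 3.5)]
```

The two-second hole left by a cut shrinks to half a second, while the short natural pause before the last word is left alone.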

For a typical hour-long interview podcast, the first editing pass usually involves:

Removing the pre-show small talk at the top
Cutting any long tangents that don't serve the episode's topic
Trimming obvious stumbles mid-sentence
Cutting the post-show wind-down at the end

The second pass is filler word removal, which Descript handles with a dedicated feature.

Removing Filler Words Automatically

Go to Actions in the toolbar and select Remove Filler Words. Descript scans the transcript for "um," "uh," "like," "you know," and similar phrases, highlights them, and presents them for batch review.

You can review each one individually (recommended for your first few projects) or batch-approve them all at once (fine once you trust the detection accuracy). The individual review mode shows you a one-second preview of the cut so you can hear whether removing that "um" creates an awkward silence or cuts too close to the next word.

One pattern I've noticed: filler words at the end of a sentence, right before a natural pause, are almost always safe to remove. Filler words in the middle of a thought, where the speaker was genuinely searching for a word, sometimes need more careful handling because the pause after them is part of the natural rhythm.

The tool catches about 80 to 90% of filler words automatically. The rest you find by reading and listening, which you'd do anyway in a thorough edit.
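If you're curious why individual review matters, a naive filler scan is easy to sketch in Python. This is a simplified illustration, not how Descript detects fillers: the FILLERS list and the find_fillers function are hypothetical, and the matching is plain text comparison with no awareness of context.

```python
# Multi-word phrases come first so "you know" is matched as one unit.
FILLERS = [("you", "know"), ("um",), ("uh",), ("like",)]


def find_fillers(tokens):
    """Return (start_index, end_index_exclusive, phrase) for each filler match."""
    lowered = [token.lower().strip(".,!?") for token in tokens]
    matches = []
    i = 0
    while i < len(lowered):
        for phrase in FILLERS:
            n = len(phrase)
            if tuple(lowered[i:i + n]) == phrase:
                matches.append((i, i + n, " ".join(phrase)))
                i += n
                break
        else:
            # No filler starts at this position, so move on to the next token.
            i += 1
    return matches


sentence = "Um, I think, you know, the like button is, uh, fine"
for start, end, phrase in find_fillers(sentence.split()):
    print(start, end, phrase)
```

In this sample the scan flags "like" in "the like button", which is a real word doing real work. A purely textual match can't tell the difference, so a quick review pass before batch-approving is worth it.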

Studio Sound: Fixing Bad Audio

Studio Sound is a one-click audio enhancement feature. You'll find it in the track settings panel on the left, or by right-clicking the track and selecting Apply Studio Sound.

What it actually does: noise reduction, de-reverb, and frequency shaping to make the recording sound like it was done in a treated room. It's not magic, but it's genuinely good for podcast audio recorded in a home office or bedroom.

Results vary by source material. I tested it on a recording made in a kitchen with some background fan noise. The fan disappeared almost completely. The voice retained its natural quality. On a recording made outdoors with wind, the result was noticeably processed and a bit unnatural. Studio Sound works best when the original problem is room noise and reverb, not wind or competing voices.

Apply Studio Sound per track, not to the mixed output. If you apply it to a mixed stereo file where the guest's audio is already baked in with yours, it processes everything together and can introduce artifacts. Separate tracks give it cleaner input to work with.

Overdub: Fixing Mistakes With Synthetic Voice

Overdub is the feature that feels like cheating, in the best possible way. After you upload at least 10 minutes of your voice (Descript recommends 30 minutes for best quality), you can generate new words in your voice and drop them into the edit.

This is useful for two specific scenarios: correcting a factual mistake ("I said March 14th but it was March 17th") and patching a stumbled sentence you couldn't re-record. You type the corrected text in the transcript, select it, and click Regenerate with Overdub. Descript replaces the original audio with the synthetic version.

Quality is very good for short corrections (one sentence or less). Longer synthetic passages can sound slightly flat next to your live recording, particularly where the original delivery had natural emotional variation. Use it for patches, not for whole sections.

You need a paid plan for Overdub. The voice training is done once, then reusable across all your projects.

Multitrack Editing

If you imported two separate tracks (host and guest), you'll see them stacked in the composition. The transcript view shows both speakers interleaved with color coding.

When you select text from one speaker and delete it, only that track's audio is affected. The other speaker's audio stays in place. This is the behavior you want when, for example, you remove someone's interruption without cutting the other person's response.

Volume balancing across tracks is done in the track panel on the left. Each track has a level slider. For a typical podcast, a good starting point is to set both voices to the same perceived loudness (listen with headphones and trust your ears over the numbers). If one guest recorded on a laptop mic and sounds thinner, boost their track 2 to 3 dB and apply Studio Sound.
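For a sense of how large a 2 to 3 dB boost is, the standard conversion from decibels to a linear amplitude multiplier is 10 raised to the power of dB divided by 20. The short Python sketch below applies that formula; db_to_gain is just an illustrative helper name.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


print(f"{db_to_gain(2):.2f}")  # 1.26
print(f"{db_to_gain(3):.2f}")  # 1.41
```

So a 2 to 3 dB boost raises the signal amplitude by roughly 26% to 41%: noticeable, but still a gentle adjustment rather than a drastic one.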

Exporting the Final Episode

When the edit is done, go to Publish or Export at the top. For podcast distribution, you want Audio Export, not video.

Export settings to pay attention to:

Format: MP3 for most podcast platforms. WAV if you're delivering to a client or doing further mastering.
Bit rate: 128 kbps is a common baseline for podcast MP3s and sounds fine for voice-only shows. 192 kbps is the safer choice for stereo, music-heavy content.
Sample rate: 44.1 kHz is standard for podcast audio.
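The bit rate choice translates directly into file size. As a quick sanity check, this Python sketch estimates the size of a constant bit rate MP3; mp3_size_mb is a hypothetical helper, and the figure ignores metadata and embedded cover art.

```python
def mp3_size_mb(bitrate_kbps, minutes):
    """Approximate size in megabytes of a constant bit rate MP3."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    total_bytes = bytes_per_second * minutes * 60
    return total_bytes / 1_000_000


print(f"{mp3_size_mb(128, 45):.1f} MB")  # 43.2 MB
print(f"{mp3_size_mb(192, 45):.1f} MB")  # 64.8 MB
```

For a 45-minute episode, moving from 128 to 192 kbps adds about 22 MB, which matters more for listeners on mobile data than for your hosting bill.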

Descript also offers a direct publish integration with several podcast hosts. It works, but I prefer the manual export so I have a local file as backup before anything goes live.

Check the final export by listening to the first and last 30 seconds, and spot-check two or three cuts in the middle. Make sure no gaps are too abrupt and the audio levels are consistent throughout.

Editing a podcast through the transcript changes the job from audio engineering to text editing. If you're comfortable reading and cutting text, you can produce a clean episode in a fraction of the time it takes with a traditional DAW. The workflow rewards careful reading more than technical skill, which is the right trade-off for most podcast producers.