Not Every Podcast Has a Transcript - Now I Don't Care


For podcasts I co-host, transcripts already exist. Riverside generates them automatically, we import those to Transistor, and I have a skill that imports them directly into my Obsidian vault.

For podcasts I just listen to, sometimes there's nothing. No transcript, no structured notes, no way to search what was said.

I wanted the same workflow for both: give a URL, get a note with a summary, key takeaways, quotes, and a searchable full transcript. So I built a tool. URL in, structured Obsidian note out.

Here's how it works.


Step 1: Check for an Existing Transcript First

Before downloading any audio, the skill checks if the episode page already has a transcript.

Transistor-hosted shows publish transcripts at /[episode]/transcript. A quick HTML fetch finds the link. If it's there, scrape it, skip the audio entirely. Free and instant.

Other patterns to check: embedded transcript blocks in the page HTML, and `<podcast:transcript>` tags in the RSS feed. More shows are publishing these than you'd expect.

Only if none of that works does the skill fall back to downloading audio and running it through Whisper, OpenAI's speech-to-text API.
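The RSS side of that check can be sketched in a few lines, assuming the feed uses the Podcasting 2.0 `<podcast:transcript>` tag (the helper name and namespace constant here are mine, not part of the skill):

```python
import xml.etree.ElementTree as ET

# Podcasting 2.0 namespace used by <podcast:transcript> tags.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"

def transcript_urls_from_rss(rss_xml: str) -> list[str]:
    """Return transcript URLs declared in the feed, if any."""
    root = ET.fromstring(rss_xml)
    tags = root.iter(f"{{{PODCAST_NS}}}transcript")
    return [t.attrib["url"] for t in tags if "url" in t.attrib]
```

If this returns anything, the audio download and the Whisper call can be skipped entirely.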


Step 2: Get the Audio URL

If there's no transcript, I need the audio file.

The cleanest source is the RSS feed. Apple Podcasts pages embed the `feedUrl` in the page's JSON data. Fetch the RSS, find the right episode's `<enclosure>` tag, and you have a direct MP3 link.

For the first show I tested, the RSS had every episode URL, season numbers, and GUIDs for state tracking. No yt-dlp needed.

yt-dlp is a command-line tool that can download audio from just about any podcast or video URL. It's the fallback for pages that don't expose RSS cleanly.


Step 3: Transcribe with OpenAI Whisper

The Whisper API has a 25MB file limit. A typical one-hour podcast episode is 50-70MB.

The fix: split the audio with ffmpeg, a command-line tool for processing audio and video files.

```python
chunk_duration = 1200  # 20 minutes
```

For a 60-minute episode, that's 3 chunks. Each goes to the Whisper API separately. The transcripts get joined. Total cost for an hour of audio: about $0.36.

The script handles the full flow: download the audio, detect file size, split if needed, transcribe each chunk, stitch the output.
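The splitting step can be sketched like this (function names are mine; the real script is longer). ffmpeg's segment muxer with `-c copy` slices the file without re-encoding, so it's fast:

```python
import math
import subprocess

chunk_duration = 1200  # 20 minutes per chunk keeps each file under the 25MB cap

def split_command(src: str, out_pattern: str) -> list[str]:
    """Build the ffmpeg command that slices audio into fixed-length chunks."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment", "-segment_time", str(chunk_duration),
        "-c", "copy",  # copy the stream; no re-encode needed
        out_pattern,
    ]

def chunk_count(duration_seconds: float) -> int:
    """How many chunks a given episode length produces."""
    return math.ceil(duration_seconds / chunk_duration)

# A 60-minute episode (3600s) splits into 3 chunks:
# subprocess.run(split_command("episode.mp3", "chunk_%03d.mp3"), check=True)
```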


Step 4: Process with Claude

The raw Whisper transcript is a wall of text. No speaker labels, no paragraph breaks, no structure.

After transcription, Claude reads it and generates:

  • Summary: 2-3 sentences on what the episode covered
  • Key takeaways: 5-8 bullet points with the main insights
  • Notable quotes: 3-5 exact quotes pulled from the transcript
  • Action items: anything actionable for the listener

Here's what a processed note looks like in practice:

```markdown
## Summary
The hosts break down how their team adopted trunk-based development
after years of long-lived feature branches, and what broke along the way...

## Key Takeaways
- Short-lived branches forced smaller PRs, which made reviews faster
- Feature flags replaced branch-based isolation for incomplete work
- The hardest part was trust, not tooling
- ...

## Notable Quotes
> "We didn't have a branching problem. We had a confidence problem.
> Nobody trusted main."
```

The full transcript goes in the note too. I can re-run Claude on the same text later with different prompts. Last week I asked it to pull every question the host asked across five episodes. No re-transcription, no extra Whisper cost. The raw text is the asset.
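The prompt driving that structure is simple to sketch. This is an illustration, not the skill's exact wording, and the actual Claude call would wrap this string in whatever client you use:

```python
SECTIONS = [
    "Summary: 2-3 sentences on what the episode covered",
    "Key takeaways: 5-8 bullet points with the main insights",
    "Notable quotes: 3-5 exact quotes pulled from the transcript",
    "Action items: anything actionable for the listener",
]

def build_prompt(transcript: str) -> str:
    """Assemble the post-processing prompt for a raw transcript."""
    wanted = "\n".join(f"- {s}" for s in SECTIONS)
    return (
        "Process this raw podcast transcript into a structured note with:\n"
        f"{wanted}\n"
        "Quote the transcript verbatim for quotes; do not paraphrase.\n\n"
        f"Transcript:\n{transcript}"
    )
```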


Step 5: Save the Note

Each episode becomes a structured Obsidian note with frontmatter including show, season, episode, date, and source URL. The show folder gets an _index.md that uses a Dataview query to auto-populate the episode list from frontmatter. No manual maintenance as new episodes come in.
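As a sketch, the frontmatter could look like this; the field names come from the list above, and the values are hypothetical:

```yaml
---
show: "Show Name"
season: 3
episode: 42
date: 2026-03-17
source: "https://example.com/episodes/42"
---
```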

```text
Personal/Podcasts/
  Show Name/
    _index.md              ← Dataview query, auto-populates
    2026-03-17 - Episode Title Here.md
    2026-03-10 - Another Episode Title.md
    ...
```
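A sketch of what that `_index.md` might contain, assuming the Dataview plugin and frontmatter fields like `episode` and `date` (the exact query is illustrative, not the skill's):

````markdown
# Show Name

```dataview
TABLE episode AS "Ep", date AS "Published"
FROM "Personal/Podcasts/Show Name"
WHERE episode
SORT date DESC
```
````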

Step 6: Auto-Sync New Episodes

For shows I want to follow week over week, I added a tracking config:

```jsonc
// Personal/Podcasts/_tracked.json
[
  {
    "show": "Show Name",
    "rss": "https://rss.example.com/feed.xml",
    "description": "One-line description of the show.",
    "sync_from": "2026-03-18"
  }
]
```

A Python script reads this file, checks each show's RSS for episodes it hasn't seen (tracked by GUID), and transcribes any new ones. The sync_from date tells it where to start. Without it, a new show would backfill its entire back catalog on the first run. Per-show state lives in _sync_state.json so shows are fully independent.
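The GUID bookkeeping is the heart of that script. A minimal sketch, with a function name of my own; the real script also applies the `sync_from` cutoff, which this omits:

```python
import json
from pathlib import Path

def new_episode_guids(feed_guids: list[str], state_path: Path, show: str) -> list[str]:
    """Return GUIDs not yet recorded for this show, then mark them seen."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    seen = set(state.get(show, []))
    fresh = [g for g in feed_guids if g not in seen]
    # Persist per-show state so each show stays fully independent.
    state[show] = sorted(seen | set(feed_guids))
    state_path.write_text(json.dumps(state, indent=2))
    return fresh
```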

A cron job runs daily at 10am ET. If any tracked show has a new episode, it gets transcribed and added to the vault automatically.

To track a new show: add an entry to _tracked.json. That's it. The cron picks it up without any other changes.

If you're building something similar: use underscore prefixes (_sync_state.json), not dotfiles. Obsidian Sync skips dotfiles by default, which means your state file won't sync across devices. Learned that the hard way.


What It Actually Cost to Build

A few hours. The transcription script is about 160 lines of Python. The sync script is about 300. The Obsidian skill that ties it together is a markdown file.

Tools used:

  • OpenAI Whisper API for transcription (~$0.006/min)
  • ffmpeg for audio splitting and duration detection
  • yt-dlp as a fallback for audio extraction
  • Python 3 for glue code (stdlib only, no pip packages)
  • Claude for post-processing the raw transcript into structured notes

The whole thing runs on the same DigitalOcean droplet that runs OpenClaw. No new infrastructure.


What I'd Add Next

Speaker diarization is the obvious gap. Whisper doesn't identify who's speaking. AssemblyAI and Deepgram both have this built in for similar pricing. For any show with consistent hosts and guests, labeled transcripts would make the notes far more useful.

I'll swap the transcription backend when I have a reason to re-transcribe.

The other thing: cross-episode analysis. Once I have a full season of transcripts, I can ask Claude to find every time the hosts discussed a specific topic across all episodes. That's a skill worth building once there's enough content in the vault.


A month ago, I couldn't search anything from the podcasts I listen to. Now every episode lands in the vault with structure, summaries, and a full transcript I can query whenever I want. The vault gets richer without extra work. Some podcasts publish transcripts. Some don't. I stopped thinking about which is which.


If you're building something similar I'd love to chat with you about it. Find me on Bluesky or X.

Part of a series

Building AI-powered Personal Infrastructure
