Transcript Playbook

Skill: transcript
SKILL.md: transcript/SKILL.md
Trigger keywords: CLI invocation only, no automatic keyword triggers

Document Sections

| Section | Purpose |
|---|---|
| When to Use | Activation criteria and exclusions |
| Prerequisites | What must be in place before invoking |
| Step-by-Step | Primary invocation path |
| Examples | Concrete invocation examples |
| Troubleshooting | Common failure modes |
| Domain Contexts | The 9 supported domain contexts with selection guidance |
| Input Formats | VTT, SRT, and plain text format handling |
| Related Resources | Cross-references to other playbooks and SKILL.md |

When to Use

Important: The transcript skill is NOT triggered by keyword detection. It does NOT activate automatically when you mention words like "transcript" or "meeting notes" in conversation. You MUST explicitly invoke it using the /transcript command or by asking Claude to run uv run jerry transcript parse. This design is intentional — the skill requires a specific input file path that cannot be inferred from keyword context alone.

Use this skill when:

  • You have a meeting recording transcript file (VTT, SRT, or plain text) that you need converted into structured notes
  • You want to extract action items, decisions, open questions, or key topics from a meeting
  • You need to process a Zoom, Teams, or other conferencing platform subtitle/caption file
  • You want to generate a navigable Markdown knowledge packet from a recorded conversation
  • You need domain-specific entity extraction from a meeting (commitments, architectural decisions, security findings, UX insights, etc.)
  • You are processing a post-mortem, standup, sprint review, or similar structured meeting where structured output is required for follow-up

Do NOT use this skill when:

  • You want to summarize a document that is not a transcript or meeting recording — use /problem-solving for general research and analysis tasks
  • You want to analyze a conversation that is already in structured note form — the skill requires a raw transcript file as input
  • You have no transcript file to provide — the Phase 1 CLI parser requires an actual file path; it cannot operate on text pasted into the conversation

Prerequisites

  • JERRY_PROJECT is set — an active Jerry project is required (H-04). If not set, the session will not proceed. Run jerry session start to establish a session.
  • uv is installed — all Python execution uses uv run (H-05). The Phase 1 CLI parser MUST be invoked via uv run jerry transcript parse. Never use python directly.
  • A transcript file is available on the local filesystem — the parser requires an absolute or resolvable path to a VTT, SRT, or plain text transcript file.
  • The jerry CLI is installed — verify with uv run jerry --help. The transcript subcommand must be available (uv run jerry transcript parse --help).
  • An output directory is specified or the default is acceptable — output goes to ./transcript-output/ by default. Specify --output-dir to control placement.
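
The prerequisites above can be checked mechanically before invoking the skill. The following is a minimal sketch, not part of the jerry CLI: `check_prerequisites` is a hypothetical helper, and it only covers the three checks that can be made without running jerry itself (the `JERRY_PROJECT` variable, `uv` on PATH, and the transcript file existing).

```python
import os
import shutil
from pathlib import Path


def check_prerequisites(transcript_path, env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration. `env` and `which` are injectable
    so the checks can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("JERRY_PROJECT"):
        problems.append("JERRY_PROJECT is not set (H-04)")
    if which("uv") is None:
        problems.append("uv is not on PATH (H-05)")
    if not Path(transcript_path).is_file():
        problems.append(f"transcript file not found: {transcript_path}")
    return problems
```

Calling `check_prerequisites("/Users/me/meetings/quarterly-review.vtt")` returns an empty list when all three conditions hold. Verifying that the transcript subcommand is available still requires running `uv run jerry transcript parse --help`.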

Step-by-Step

Primary Path: Processing a VTT transcript with the two-phase workflow

The transcript skill uses a mandatory two-phase architecture:

  • Phase 1 (CLI, deterministic): A Python parser converts the raw file into structured JSON chunks. This phase MUST use the CLI; never ask Claude to parse VTT directly. The parser is ~1,250x cheaper than LLM parsing and produces 100% accurate timestamps. Cost basis: a 1-hour VTT transcript produces ~280K tokens of structured data, which the Python parser processes at zero API token cost in under 1 second. LLM parsing of the same data requires ~280K input tokens plus ~50K output tokens at API rates, which yields the ~1,250:1 cost ratio. See SKILL.md Design Rationale for the full methodology.
  • Phase 2+ (LLM agents — semantic): Agents read the JSON chunks and produce the structured Markdown output packet.

Steps:

  1. Identify your transcript file path and desired output directory. Use an absolute path to avoid ambiguity. Example:

    /Users/me/meetings/quarterly-review.vtt
    

  2. Invoke the transcript skill with the file path and any optional flags:

    /transcript /Users/me/meetings/quarterly-review.vtt --output-dir /Users/me/output/quarterly-review/
    
    Or with a domain:
    /transcript /Users/me/meetings/quarterly-review.vtt --output-dir /Users/me/output/ --domain software-engineering
    

  3. Phase 1 executes automatically — Claude runs the CLI parser:

    uv run jerry transcript parse "/Users/me/meetings/quarterly-review.vtt" \
        --output-dir "/Users/me/output/quarterly-review/"
    
    This produces three outputs in the output directory:

       • index.json (~8KB): chunk metadata and speaker summary
       • chunks/chunk-*.json (~130KB each): transcript segments in processable chunks
       • canonical-transcript.json (~930KB): full parsed output (NEVER read into context; see the warning below)

  4. Phase 2: the ts-extractor agent runs. It reads index.json and each chunks/chunk-*.json sequentially, then writes extraction-report.json with all extracted entities (action items, decisions, questions, topics, speakers).

  5. Phase 3: the ts-formatter agent runs. It reads index.json and extraction-report.json, then writes the 8-file Markdown output packet:

       • 00-index.md: navigation hub
       • 01-summary.md: executive summary
       • 02-transcript.md: full formatted transcript
       • 03-speakers.md: speaker directory
       • 04-action-items.md: action items with citations
       • 05-decisions.md: decisions with context
       • 06-questions.md: open questions
       • 07-topics.md: topic segments

  6. Phase 4: the ts-mindmap agents run (unless --no-mindmap was specified). They generate visual summaries in 08-mindmap/mindmap.mmd (Mermaid) and/or 08-mindmap/mindmap.ascii.txt (ASCII). Mindmaps are ON by default per ADR-006.

  7. Phase 5: ps-critic runs. It validates quality against the >= 0.90 threshold (the transcript skill uses a skill-specific threshold lower than the general 0.92 SSOT; see the SKILL.md Design Rationale section for the selection rationale). If quality is below the threshold, revision is triggered automatically.

  8. Review the output packet. Open 00-index.md as the entry point for navigation across all generated files.
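
Before reviewing the packet, it can help to confirm that all eight Markdown files were written. The sketch below is illustrative only: `missing_packet_files` is a hypothetical helper, and it assumes the packet files sit directly in the output directory, which the examples in this playbook suggest but do not state outright.

```python
from pathlib import Path

# File names taken from the Phase 3 step in this playbook.
PACKET_FILES = (
    "00-index.md",
    "01-summary.md",
    "02-transcript.md",
    "03-speakers.md",
    "04-action-items.md",
    "05-decisions.md",
    "06-questions.md",
    "07-topics.md",
)


def missing_packet_files(output_dir):
    """Return the packet file names that are absent from output_dir.

    Assumption: packet files are written at the top level of output_dir.
    """
    out = Path(output_dir)
    return [name for name in PACKET_FILES if not (out / name).is_file()]
```

An empty result means the 8-file packet is present. The check says nothing about content quality, which remains the job of the ps-critic phase.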

CRITICAL WARNING — canonical-transcript.json: This file is generated by Phase 1 and can be ~930KB (~280K tokens for large meetings). Agents MUST NEVER read canonical-transcript.json into context — it will fill the context window, cause summarization, and result in data loss. Always use index.json (~8KB) and chunks/chunk-*.json (~130KB each) instead. This is a documented architectural constraint (SKILL.md Large File Handling).
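
The constraint above can be enforced in any tooling that reads Phase 1 output. This is a minimal sketch under stated assumptions: `iter_chunks` and `load_json_guarded` are hypothetical helpers, the chunk files are assumed to live under chunks/ with the chunk-*.json naming shown in this playbook, and nothing is assumed about the JSON schema inside them.

```python
import json
from pathlib import Path

FORBIDDEN_NAME = "canonical-transcript.json"


def load_json_guarded(path):
    """Load a JSON file, refusing the full canonical transcript."""
    path = Path(path)
    if path.name == FORBIDDEN_NAME:
        raise ValueError(f"refusing to read {FORBIDDEN_NAME}; use index.json and chunks/ instead")
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def iter_chunks(output_dir):
    """Yield (path, parsed JSON) for each chunk file, one at a time.

    Files are visited in lexicographic name order, which is an assumption
    made here; index.json is the authoritative source for chunk order.
    """
    for path in sorted((Path(output_dir) / "chunks").glob("chunk-*.json")):
        yield path, load_json_guarded(path)
```

Yielding one chunk at a time keeps only a single ~130KB file in memory, which mirrors the sequential reading that ts-extractor performs.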


Examples

Example 1: Processing a VTT file with default settings

User request: "/transcript /Users/me/meetings/standup-2026-02-18.vtt"

System behavior: Claude invokes the Phase 1 CLI parser:

uv run jerry transcript parse "/Users/me/meetings/standup-2026-02-18.vtt" \
    --output-dir "./transcript-output/"

The parser produces index.json, chunks/chunk-*.json, and canonical-transcript.json in ./transcript-output/. The Phase 2+ agents then run in sequence: ts-extractor, ts-formatter, and ts-mindmap. The final output is an 8-file Markdown packet plus a mindmap in ./transcript-output/, with 00-index.md as the navigation entry point.


Example 2: Processing a software engineering sprint review with domain context

User request: "/transcript /Users/me/meetings/sprint-review.vtt --output-dir /Users/me/notes/ --domain software-engineering"

System behavior: Claude invokes the Phase 1 CLI parser with the --domain flag:

uv run jerry transcript parse "/Users/me/meetings/sprint-review.vtt" \
    --output-dir "/Users/me/notes/" \
    --domain software-engineering
The software-engineering domain context loads extraction rules tuned for standups and sprint events: commitments, blockers, and risks are extracted in addition to the standard action items, decisions, and questions. The output packet in /Users/me/notes/ contains domain-enriched entities attributed to speakers, with citations linking back to timestamps in the original transcript.
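
When the Phase 1 command is assembled by a script rather than typed by hand, building it as an argument list avoids quoting problems with paths that contain spaces. The sketch below is hypothetical wrapper code, not part of jerry; it uses only the `--output-dir` and `--domain` flags that appear in the commands in this playbook.

```python
import subprocess


def build_parse_command(transcript_path, output_dir="./transcript-output/", domain=None):
    """Return the argv list for the Phase 1 CLI parser.

    Hypothetical helper: only flags shown in this playbook are emitted.
    """
    cmd = [
        "uv", "run", "jerry", "transcript", "parse",
        str(transcript_path),
        "--output-dir", str(output_dir),
    ]
    if domain is not None:
        cmd += ["--domain", domain]
    return cmd


def run_parse(transcript_path, output_dir="./transcript-output/", domain=None):
    """Run Phase 1; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(build_parse_command(transcript_path, output_dir, domain), check=True)
```

Because the list is passed to `subprocess.run` without a shell, each path reaches the CLI as a single argument regardless of embedded spaces.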


Example 3: Processing an SRT file without mindmaps

User request: "/transcript /Users/me/captions/webinar.srt --no-mindmap"

System behavior: Claude invokes the Phase 1 CLI parser. Because the input is an SRT file (not VTT), LLM-based parsing is used instead of the Python parser — SRT and plain text formats do not have the deterministic Python path that VTT uses. The --no-mindmap flag suppresses the ts-mindmap phase. The output packet contains the standard 8 files (00-index.md through 07-topics.md) but no 08-mindmap/ directory.


Example 4: Processing a user research interview

User request: "/transcript /Users/me/research/user-interview-001.vtt --domain user-experience --output-dir /Users/me/research/processed/"

System behavior: Claude runs Phase 1 then orchestrates LLM agents with the user-experience domain context, which activates verbatim quote preservation. The extracted entities include user insights, pain points, and verbatim quotes attributed to participants. The output is suitable for a UX research repository.


Troubleshooting

| Symptom | Cause | Resolution |
|---|---|---|
| uv: command not found when running Phase 1 | uv is not installed or not on PATH | Install uv: curl -LsSf https://astral.sh/uv/install.sh \| sh. Verify with uv --version. See H-05: NEVER use python directly as a workaround. |
| No such file or directory error on the transcript file | The file path provided is incorrect or the file does not exist at that location | Verify the full absolute path to the transcript file. Use tab-completion or ls to confirm. Always quote paths that contain spaces. |
| canonical-transcript.json was read and context filled up / output is incomplete or summarized | An agent mistakenly read canonical-transcript.json into context, exhausting the token budget and triggering LLM summarization | Never read canonical-transcript.json. Use index.json (entry point, ~8KB) and chunks/chunk-*.json (~130KB each). If this happened, restart the Phase 2+ pipeline from ts-extractor using the existing chunk files. |
| Phase 1 CLI exits with non-zero code / ParseError | The transcript file is malformed, has an unsupported encoding, or is corrupted | Check the error message for the specific line number. VTT files must have a valid WEBVTT header. Try opening the file in a text editor to confirm it is valid UTF-8 and has correct VTT syntax. |
| Quality review score below 0.90 (ps-critic rejects output) | Extraction quality was insufficient: missing citations, incomplete entity coverage, or formatting errors in the packet files | The ps-critic agent will identify specific defects. Address the flagged issues in the relevant packet file (e.g., 04-action-items.md). Common causes: very short chunks (too few entities), low-quality source transcript (heavy background noise, overlapping speech), or unsupported speaker labeling format. |
| Domain-specific entities are missing from the output | The --domain flag was not specified, or the wrong domain was selected | Re-run the Phase 2+ agents (ts-extractor onwards) with the correct --domain flag. The Phase 1 JSON output (chunks) can be reused; only re-run from ts-extractor. See the domain table in the Domain Contexts section below. |
| Mindmap generation fails but core packet is intact | ts-mindmap agent encountered an error (e.g., content too sparse for a meaningful mindmap) | This is expected graceful degradation per ADR-006. The 8-file packet remains valid. Use --no-mindmap on future invocations if mindmaps are not needed, or investigate the ts-mindmap error output for specific causes. |
| Agent fails mid-execution (e.g., ts-extractor or ts-formatter crashes partway through) | Token budget exhaustion, session interruption, or agent error during Phase 2+ processing | 1. Identify: Check which phase failed. Phase 1 (CLI) artifacts (index.json, chunks/) are always recoverable since they are written by the Python parser. 2. Salvage: Phase 1 output is reusable; the JSON chunks do not need to be regenerated. 3. Recover: Re-run from the failed phase only: if ts-extractor failed, re-invoke it with the existing index.json and chunks/ directory; if ts-formatter failed, re-invoke it with the existing extraction-report.json. The CLI parser does NOT need to re-run. |
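
The recovery guidance for mid-execution failures amounts to picking the earliest phase whose output is missing. The following sketch shows one way to make that decision from the files on disk. It is a heuristic under stated assumptions: `resume_point` is a hypothetical helper, extraction-report.json and 00-index.md are assumed to sit at the top level of the output directory, and file presence is treated as a proxy for phase completion, so a partially written file would be misread as complete.

```python
from pathlib import Path


def resume_point(output_dir):
    """Guess which phase to re-run, based only on which artifacts exist.

    Returns one of: "phase-1-cli", "ts-extractor", "ts-formatter", "complete".
    Assumption: artifact presence implies the producing phase finished.
    """
    out = Path(output_dir)
    has_index = (out / "index.json").is_file()
    has_chunks = any((out / "chunks").glob("chunk-*.json"))
    if not (has_index and has_chunks):
        return "phase-1-cli"
    if not (out / "extraction-report.json").is_file():
        return "ts-extractor"
    if not (out / "00-index.md").is_file():
        return "ts-formatter"
    return "complete"
```

A result of "ts-extractor" or "ts-formatter" matches the table's advice: the Phase 1 JSON is reused and the CLI parser is not re-run.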

Domain Contexts

The transcript skill supports 9 domain contexts. Select the domain that best matches your meeting type using the --domain <name> flag. If no domain is specified, general is the default.

| Domain | Use For | Key Additional Entities |
|---|---|---|
| general | Any transcript (default) | speakers, topics, questions |
| transcript | Base transcript entities, extends general | + segments, timestamps |
| meeting | Generic meetings | + action items, decisions, follow-ups |
| software-engineering | Standups, sprint planning, code reviews | + commitments, blockers, risks |
| software-architecture | ADR discussions, design sessions | + architectural decisions, alternatives, quality attributes |
| product-management | Roadmap planning, feature prioritization | + feature requests, user needs, stakeholder feedback |
| user-experience | Research interviews, usability tests | + user insights, pain points, verbatim quotes |
| cloud-engineering | Post-mortems, capacity planning | + incidents, root causes (blameless culture) |
| security-engineering | Security audits, threat modeling | + vulnerabilities, threats (STRIDE), compliance gaps |

Domain selection examples:

uv run jerry transcript parse "standup.vtt" --output-dir "./out/" --domain software-engineering
uv run jerry transcript parse "postmortem.vtt" --output-dir "./out/" --domain cloud-engineering
uv run jerry transcript parse "user-interview.vtt" --output-dir "./out/" --domain user-experience
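
A script that forwards a --domain value can validate it against the nine names in the domain table before invoking the CLI. This is an illustrative sketch: `resolve_domain` is a hypothetical helper, and how the jerry CLI itself reacts to an unknown domain is not described in this playbook.

```python
# The nine domain names listed in the Domain Contexts table.
SUPPORTED_DOMAINS = (
    "general",
    "transcript",
    "meeting",
    "software-engineering",
    "software-architecture",
    "product-management",
    "user-experience",
    "cloud-engineering",
    "security-engineering",
)


def resolve_domain(name=None):
    """Return a validated domain name, defaulting to "general" when omitted."""
    if name is None:
        return "general"
    if name not in SUPPORTED_DOMAINS:
        raise ValueError(
            f"unknown domain {name!r}; expected one of: {', '.join(SUPPORTED_DOMAINS)}"
        )
    return name
```

Failing early on a misspelled domain avoids a full pipeline run that would otherwise need to be repeated from ts-extractor.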


Input Formats

The transcript skill supports three input formats. The Phase 1 CLI parser handles each as follows:

| Format | Extension | Parsing Method | Notes |
|---|---|---|---|
| VTT (WebVTT) | .vtt | Python parser (deterministic) | ~1,250x cheaper than LLM; 100% timestamp accuracy; MUST use CLI |
| SRT (SubRip) | .srt | LLM-based parsing | Full LLM API cost (~280K+ tokens for 1-hour file); processed by ts-extractor agent directly |
| Plain text | .txt | LLM-based parsing | Full LLM API cost; meeting notes, chat logs, or any unstructured transcript text |

Command syntax for each format:

# VTT (Zoom, Teams, Google Meet captions)
uv run jerry transcript parse "meeting.vtt" --output-dir "./output/"

# SRT (subtitle files)
uv run jerry transcript parse "captions.srt" --output-dir "./output/"

# Plain text meeting notes
uv run jerry transcript parse "notes.txt" --output-dir "./output/"
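
The format table can be read as a lookup from file extension to parsing method, which is useful for estimating cost before a run. The sketch below is a hypothetical illustration of that lookup, not the CLI's actual dispatch logic; the case-insensitive extension match and the method labels are assumptions made here.

```python
from pathlib import Path

# Extension to parsing method, as listed in the Input Formats table.
# The label strings are illustrative names, not jerry identifiers.
PARSING_METHOD = {
    ".vtt": "python-parser",  # deterministic, zero API token cost
    ".srt": "llm",            # full LLM API cost
    ".txt": "llm",            # full LLM API cost
}


def parsing_method(transcript_path):
    """Return the parsing method implied by the file extension."""
    suffix = Path(transcript_path).suffix.lower()  # assumption: case-insensitive
    if suffix not in PARSING_METHOD:
        raise ValueError(f"unsupported transcript extension: {suffix or '(none)'}")
    return PARSING_METHOD[suffix]
```

For example, `parsing_method("captions.srt")` returns `"llm"`, signalling that the run will incur full LLM parsing cost.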


Related Resources

  • SKILL.md: Authoritative technical reference for the transcript skill, including full agent pipeline specifications, domain context YAML schemas, ADR design rationale, and quality threshold documentation
  • Problem-Solving Playbook: Use when you need to research, analyze, or investigate a topic (not for processing transcript files)
  • Orchestration Playbook: Use when designing multi-phase workflows; the transcript skill itself uses an internal orchestration pipeline that can serve as a reference pattern