Transcript Playbook¶
Skill: transcript
SKILL.md: transcript/SKILL.md
Trigger keywords: none (CLI invocation only — no automatic keyword triggers)
Document Sections¶
| Section | Purpose |
|---|---|
| When to Use | Activation criteria and exclusions |
| Prerequisites | What must be in place before invoking |
| Step-by-Step | Primary invocation path |
| Examples | Concrete invocation examples |
| Troubleshooting | Common failure modes |
| Domain Contexts | The 9 supported domain contexts with selection guidance |
| Input Formats | VTT, SRT, and plain text format handling |
| Related Resources | Cross-references to other playbooks and SKILL.md |
When to Use¶
Important: The transcript skill is NOT triggered by keyword detection. It does NOT activate automatically when you mention words like "transcript" or "meeting notes" in conversation. You MUST explicitly invoke it using the `/transcript` command or by asking Claude to run `uv run jerry transcript parse`. This design is intentional — the skill requires a specific input file path that cannot be inferred from keyword context alone.
Use this skill when:¶
- You have a meeting recording transcript file (VTT, SRT, or plain text) that you need converted into structured notes
- You want to extract action items, decisions, open questions, or key topics from a meeting
- You need to process a Zoom, Teams, or other conferencing platform subtitle/caption file
- You want to generate a navigable Markdown knowledge packet from a recorded conversation
- You need domain-specific entity extraction from a meeting (commitments, architectural decisions, security findings, UX insights, etc.)
- You are processing a post-mortem, standup, sprint review, or similar structured meeting where structured output is required for follow-up
Do NOT use this skill when:¶
- You want to summarize a document that is not a transcript or meeting recording — use `/problem-solving` for general research and analysis tasks
- You want to analyze a conversation that is already in structured note form — the skill requires a raw transcript file as input
- You have no transcript file to provide — the Phase 1 CLI parser requires an actual file path; it cannot operate on text pasted into the conversation
Prerequisites¶
- `JERRY_PROJECT` is set — an active Jerry project is required (H-04). If not set, the session will not proceed. Run `jerry session start` to establish a session.
- `uv` is installed — all Python execution uses `uv run` (H-05). The Phase 1 CLI parser MUST be invoked via `uv run jerry transcript parse`. Never use `python` directly.
- A transcript file is available on the local filesystem — the parser requires an absolute or resolvable path to a VTT, SRT, or plain text transcript file.
- The `jerry` CLI is installed — verify with `uv run jerry --help`. The transcript subcommand must be available (`uv run jerry transcript parse --help`).
- An output directory is specified or the default is acceptable — output goes to `./transcript-output/` by default. Specify `--output-dir` to control placement.
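The environment checks above can be scripted. The sketch below is illustrative only — the `preflight` helper is hypothetical and not part of the jerry CLI; it checks only the prerequisites this playbook names:

```python
import os
import shutil


def preflight() -> list[str]:
    """Return a list of unmet prerequisites for the transcript skill.

    Hypothetical helper for illustration; checks mirror H-04 and H-05
    from the Prerequisites list above.
    """
    problems = []
    if not os.environ.get("JERRY_PROJECT"):
        problems.append("JERRY_PROJECT is not set (H-04); run `jerry session start`")
    if shutil.which("uv") is None:
        problems.append("uv is not on PATH (H-05); install it before proceeding")
    return problems
```

An empty return value means both environment prerequisites are satisfied; the transcript file path and output directory still need to be chosen per the steps below.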
Step-by-Step¶
Primary Path: Processing a VTT transcript with the two-phase workflow¶
The transcript skill uses a mandatory two-phase architecture:
- Phase 1 (CLI — deterministic): Python parser converts the raw file into structured JSON chunks. This is ~1,250x cheaper than LLM parsing and produces 100% accurate timestamps. This phase MUST use the CLI — never ask Claude to parse VTT directly. (Cost basis: A 1-hour VTT transcript produces ~280K tokens of structured data. The Python parser processes this at zero API token cost in <1 second. LLM parsing of the same data requires ~280K input tokens + ~50K output tokens at API rates, yielding a ~1,250:1 cost ratio. See SKILL.md Design Rationale for full methodology.)
- Phase 2+ (LLM agents — semantic): Agents read the JSON chunks and produce the structured Markdown output packet.
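To make the determinism claim concrete: the core of VTT cue parsing is plain pattern matching over timestamp lines, with no model inference involved. The sketch below is illustrative only — it is not the jerry parser, and it handles only the full `hh:mm:ss.mmm` timestamp form (WebVTT also permits `mm:ss.mmm`):

```python
import re

# Matches "hh:mm:ss.mmm --> hh:mm:ss.mmm" cue timing lines.
CUE_RE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def parse_vtt(text: str) -> list[dict]:
    """Parse WEBVTT cues into start/end/text records deterministically."""
    if not text.lstrip().startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")
    cues: list[dict] = []
    current = None
    for line in text.splitlines():
        m = CUE_RE.search(line)
        if m:
            # Timing line starts a new cue.
            current = {"start": m.group(1), "end": m.group(2), "text": ""}
            cues.append(current)
        elif not line.strip():
            current = None  # blank line ends the cue block
        elif current is not None:
            # Accumulate the cue's caption text.
            current["text"] = (current["text"] + " " + line.strip()).strip()
    return cues
```

Because every timestamp is copied verbatim from the source file, accuracy is 100% by construction — the property the cost comparison above relies on.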
Steps:
1. Identify your transcript file path and desired output directory. Use an absolute path to avoid ambiguity (for example, `/Users/me/meetings/standup-2026-02-18.vtt`, as in Example 1 below).
2. Invoke the transcript skill with the file path and any optional flags:
   `/transcript /Users/me/meetings/standup-2026-02-18.vtt`
   Or with a domain:
   `/transcript /Users/me/meetings/standup-2026-02-18.vtt --domain software-engineering`
3. Phase 1 executes automatically — Claude runs the CLI parser:
   `uv run jerry transcript parse "/Users/me/meetings/standup-2026-02-18.vtt" --output-dir "./transcript-output/"`
   This produces three files in the output directory:
   - `index.json` (~8KB) — chunk metadata and speaker summary
   - `chunks/chunk-*.json` (~130KB each) — transcript segments in processable chunks
   - `canonical-transcript.json` (~930KB) — full parsed output (NEVER read into context — see warning below)
4. Phase 2 — ts-extractor agent runs — reads `index.json` and each `chunks/chunk-*.json` sequentially, then writes `extraction-report.json` with all extracted entities (action items, decisions, questions, topics, speakers).
5. Phase 3 — ts-formatter agent runs — reads `index.json` and `extraction-report.json`, then writes the 8-file Markdown output packet:
   - `00-index.md` — navigation hub
   - `01-summary.md` — executive summary
   - `02-transcript.md` — full formatted transcript
   - `03-speakers.md` — speaker directory
   - `04-action-items.md` — action items with citations
   - `05-decisions.md` — decisions with context
   - `06-questions.md` — open questions
   - `07-topics.md` — topic segments
6. Phase 4 — ts-mindmap agents run (unless `--no-mindmap` was specified) — generate visual summaries in `08-mindmap/mindmap.mmd` (Mermaid) and/or `08-mindmap/mindmap.ascii.txt` (ASCII). Mindmaps are ON by default per ADR-006.
7. Phase 5 — ps-critic runs — validates quality against the >= 0.90 threshold (the transcript skill uses a skill-specific threshold lower than the general 0.92 SSOT; see the SKILL.md Design Rationale section for the selection rationale). If quality is below threshold, revision is triggered automatically.
8. Review the output packet — open `00-index.md` as the entry point for navigation across all generated files.
CRITICAL WARNING — `canonical-transcript.json`: This file is generated by Phase 1 and can be ~930KB (~280K tokens for large meetings). Agents MUST NEVER read `canonical-transcript.json` into context — it will fill the context window, cause summarization, and result in data loss. Always use `index.json` (~8KB) and `chunks/chunk-*.json` (~130KB each) instead. This is a documented architectural constraint (SKILL.md Large File Handling).
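The safe reading pattern — small index first, then one chunk at a time — can be sketched as follows. This is an illustrative sketch, not jerry code; it treats `index.json` and the chunk files as opaque JSON and deliberately never opens `canonical-transcript.json`:

```python
import json
from pathlib import Path


def iter_chunks(output_dir: str):
    """Yield transcript chunks one at a time, in filename order.

    Loads index.json (the small entry point) for validation, then streams
    chunks/chunk-*.json individually so no single read approaches the
    size of canonical-transcript.json.
    """
    out = Path(output_dir)
    # Small (~8KB) entry point; raises if Phase 1 did not run.
    json.loads((out / "index.json").read_text())
    for path in sorted((out / "chunks").glob("chunk-*.json")):
        yield json.loads(path.read_text())
```

Processing each yielded chunk before reading the next keeps the working set bounded by one ~130KB file rather than the full ~930KB canonical output.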
Examples¶
Example 1: Processing a VTT file with default settings¶
User request: "/transcript /Users/me/meetings/standup-2026-02-18.vtt"
System behavior: Claude invokes the Phase 1 CLI parser:
uv run jerry transcript parse "/Users/me/meetings/standup-2026-02-18.vtt" \
--output-dir "./transcript-output/"
Phase 1 writes index.json, chunks/chunk-*.json, and canonical-transcript.json
in ./transcript-output/. The Phase 2+ agents (ts-extractor, ts-formatter, and
ts-mindmap) then run. The final output is an 8-file Markdown packet plus a mindmap in
./transcript-output/, with 00-index.md as the navigation entry point.
Example 2: Processing a software engineering standup with domain context¶
User request: "/transcript /Users/me/meetings/sprint-review.vtt --output-dir /Users/me/notes/ --domain software-engineering"
System behavior: Claude invokes the Phase 1 CLI parser with the --domain flag:
uv run jerry transcript parse "/Users/me/meetings/sprint-review.vtt" \
--output-dir "/Users/me/notes/" \
--domain software-engineering
The software-engineering domain context loads extraction rules tuned for standups and
sprint events: commitments, blockers, and risks are extracted in addition to the standard
action items, decisions, and questions. The output packet in /Users/me/notes/ contains
domain-enriched entities attributed to speakers, with citations linking back to timestamps
in the original transcript.
Example 3: Processing an SRT file without mindmaps¶
User request: "/transcript /Users/me/captions/webinar.srt --no-mindmap"
System behavior: Claude invokes the Phase 1 CLI parser. Because the input is an SRT
file (not VTT), LLM-based parsing is used instead of the Python parser — SRT and plain
text formats do not have the deterministic Python path that VTT uses. The --no-mindmap
flag suppresses the ts-mindmap phase. The output packet contains the standard 8 files
(00-index.md through 07-topics.md) but no 08-mindmap/ directory.
Example 4: Processing a user research interview¶
User request: "/transcript /Users/me/research/user-interview-001.vtt --domain user-experience --output-dir /Users/me/research/processed/"
System behavior: Claude runs Phase 1 then orchestrates LLM agents with the
user-experience domain context, which activates verbatim quote preservation. The
extracted entities include user insights, pain points, and verbatim quotes attributed to
participants. The output is suitable for a UX research repository.
Troubleshooting¶
| Symptom | Cause | Resolution |
|---|---|---|
| `uv: command not found` when running Phase 1 | uv is not installed or not on PATH | Install uv: `curl -LsSf https://astral.sh/uv/install.sh \| sh`. Verify with `uv --version`. See H-05 — NEVER use python directly as a workaround. |
| `No such file or directory` error on the transcript file | The file path provided is incorrect or the file does not exist at that location | Verify the full absolute path to the transcript file. Use tab-completion or `ls` to confirm. Always quote paths that contain spaces. |
| canonical-transcript.json was read and context filled up / output is incomplete or summarized | An agent mistakenly read canonical-transcript.json into context, exhausting the token budget and triggering LLM summarization | Never read canonical-transcript.json. Use index.json (entry point, ~8KB) and chunks/chunk-*.json (~130KB each). If this happened, restart the Phase 2+ pipeline from ts-extractor using the existing chunk files. |
| Phase 1 CLI exits with non-zero code / ParseError | The transcript file is malformed, has an unsupported encoding, or is corrupted | Check the error message for the specific line number. VTT files must have a valid WEBVTT header. Try opening the file in a text editor to confirm it is valid UTF-8 with correct VTT syntax. |
| Quality review score below 0.90 — ps-critic rejects output | Extraction quality was insufficient: missing citations, incomplete entity coverage, or formatting errors in the packet files | The ps-critic agent will identify specific defects. Address the flagged issues in the relevant packet file (e.g., 04-action-items.md). Common causes: very short chunks (too few entities), low-quality source transcript (heavy background noise, overlapping speech), or unsupported speaker labeling format. |
| Domain-specific entities are missing from the output | The --domain flag was not specified, or the wrong domain was selected | Re-run the Phase 2+ agents (ts-extractor onwards) with the correct --domain flag. The Phase 1 JSON output (chunks) can be reused — only re-run from ts-extractor. See the domain table in the Domain Contexts section below. |
| Mindmap generation fails but core packet is intact | ts-mindmap agent encountered an error (e.g., content too sparse for a meaningful mindmap) | This is expected graceful degradation per ADR-006. The 8-file packet remains valid. Use --no-mindmap on future invocations if mindmaps are not needed, or investigate the ts-mindmap error output for specific causes. |
| Agent fails mid-execution (e.g., ts-extractor or ts-formatter crashes partway through) | Token budget exhaustion, session interruption, or agent error during Phase 2+ processing | 1. Identify: Check which phase failed — Phase 1 (CLI) artifacts (index.json, chunks/) are always recoverable since they are written by the Python parser. 2. Salvage: Phase 1 output is reusable — the JSON chunks do not need to be regenerated. 3. Recover: Re-run from the failed phase only: if ts-extractor failed, re-invoke it with the existing index.json and chunks/ directory; if ts-formatter failed, re-invoke it with the existing extraction-report.json. The CLI parser does NOT need to re-run. |
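The identify-salvage-recover procedure in the last row amounts to checking which documented artifacts already exist. A minimal sketch of that decision, using only the filenames this playbook documents (the `resume_phase` helper is hypothetical, not part of the jerry CLI):

```python
from pathlib import Path


def resume_phase(output_dir: str) -> str:
    """Decide where to resume the pipeline after a mid-run failure.

    Illustrative sketch: inspects the documented artifacts in order of
    production and returns the earliest phase whose output is missing.
    """
    out = Path(output_dir)
    if not (out / "index.json").exists():
        return "phase-1-cli"       # re-run `uv run jerry transcript parse`
    if not (out / "extraction-report.json").exists():
        return "ts-extractor"      # Phase 1 chunks are reusable as-is
    if not (out / "00-index.md").exists():
        return "ts-formatter"      # reuse the existing extraction-report.json
    return "complete"
```

Since Phase 1 artifacts are written by the deterministic Python parser, they survive any agent failure and the CLI never needs to re-run once `index.json` and `chunks/` exist.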
Domain Contexts¶
The transcript skill supports 9 domain contexts. Select the domain that best matches your
meeting type using the --domain <name> flag. If no domain is specified, general is
the default.
| Domain | Use For | Key Additional Entities |
|---|---|---|
| general | Any transcript (default) | speakers, topics, questions |
| transcript | Base transcript entities, extends general | + segments, timestamps |
| meeting | Generic meetings | + action items, decisions, follow-ups |
| software-engineering | Standups, sprint planning, code reviews | + commitments, blockers, risks |
| software-architecture | ADR discussions, design sessions | + architectural decisions, alternatives, quality attributes |
| product-management | Roadmap planning, feature prioritization | + feature requests, user needs, stakeholder feedback |
| user-experience | Research interviews, usability tests | + user insights, pain points, verbatim quotes |
| cloud-engineering | Post-mortems, capacity planning | + incidents, root causes (blameless culture) |
| security-engineering | Security audits, threat modeling | + vulnerabilities, threats (STRIDE), compliance gaps |
Domain selection examples:
uv run jerry transcript parse "standup.vtt" --output-dir "./out/" --domain software-engineering
uv run jerry transcript parse "postmortem.vtt" --output-dir "./out/" --domain cloud-engineering
uv run jerry transcript parse "user-interview.vtt" --output-dir "./out/" --domain user-experience
Input Formats¶
The transcript skill supports three input formats. The Phase 1 CLI parser handles each as follows:
| Format | Extension | Parsing Method | Notes |
|---|---|---|---|
| VTT (WebVTT) | .vtt | Python parser (deterministic) | ~1,250x cheaper than LLM; 100% timestamp accuracy; MUST use CLI |
| SRT (SubRip) | .srt | LLM-based parsing | Full LLM API cost (~280K+ tokens for 1-hour file); processed by ts-extractor agent directly |
| Plain text | .txt | LLM-based parsing | Full LLM API cost; meeting notes, chat logs, or any unstructured transcript text |
Command syntax for each format:
# VTT (Zoom, Teams, Google Meet captions)
uv run jerry transcript parse "meeting.vtt" --output-dir "./output/"
# SRT (subtitle files)
uv run jerry transcript parse "captions.srt" --output-dir "./output/"
# Plain text meeting notes
uv run jerry transcript parse "notes.txt" --output-dir "./output/"
Related Resources¶
- SKILL.md — Authoritative technical reference for the transcript skill, including full agent pipeline specifications, domain context YAML schemas, ADR design rationale, and quality threshold documentation
- Problem-Solving Playbook — Use when you need to research, analyze, or investigate a topic (not for processing transcript files)
- Orchestration Playbook — Use when designing multi-phase workflows; the transcript skill itself uses an internal orchestration pipeline that can serve as a reference pattern