A powerful CLI tool to transcribe and diarize audio/video files using AssemblyAI. Automatically identifies speakers and generates transcripts in multiple formats.
- 🎙️ Automatic speaker diarization
- 👥 Interactive speaker identification with context
- 📝 Multiple output formats (Markdown, SRT, TXT, JSON)
- 🕒 Timestamps for each segment
- 🔑 Secure API key management
- 💾 Smart caching for faster processing
- 💻 Cross-platform support
You can use meeting-diary directly without installation using `npx` or `bunx`:

```bash
# Using npx (Node.js)
npx meeting-diary input.mp4

# Using bunx (Bun)
bunx meeting-diary input.mp4
```
If you prefer to install the tool globally:
```bash
# Using npm
npm install -g meeting-diary

# Using yarn
yarn global add meeting-diary

# Using bun
bun install -g meeting-diary
```
Then use it as:

```bash
meeting-diary input.mp4
```
This will:
- Transcribe and diarize your audio/video file
- Help you identify each speaker by showing their most significant contributions
- Generate a timestamped transcript in markdown format
```bash
meeting-diary input.mp4 -f txt  # Simple text format
meeting-diary input.mp4 -f srt  # SubRip subtitle format
meeting-diary input.mp4 -f json # JSON format with detailed metadata
meeting-diary input.mp4 -f md   # Markdown format (default)
```
The markdown format includes:
- Timestamp for each segment
- Speaker list
- Chronological transcript with speaker attribution
- Processing metadata
Example:

```markdown
# Meeting Transcript

_Processed on 2/10/2024, 3:43:26 PM_
_Duration: 5 minutes_

## Speakers

- **Hrishi**
- **Alok**

## Transcript

[0:00] **Hrishi**: Yeah, didn't have a chance yet...
[0:15] **Alok**: No engagement in terms of my Mushroom photos.
[0:18] **Hrishi**: Basically Samsung phones have the ability...
```
You can identify speakers in two ways:

1. **Interactive identification** (default):

   ```bash
   meeting-diary input.mp4
   ```

   The tool will:

   - Show you the most significant contributions from each speaker
   - Display context (what was said before and after)
   - Show previously identified speakers for context
   - Ask you to identify each speaker in turn

2. **Specify speakers up front**:

   ```bash
   meeting-diary input.mp4 -s "John Smith" "Jane Doe"
   ```
Options:

```
  -o, --output <file>     Output file (defaults to input file name with new extension)
  -f, --format <format>   Output format (json, txt, srt, md) (default: "md")
  -s, --speakers <names>  Known speaker names (skip interactive identification)
  --skip-diarization      Skip speaker diarization
  -v, --verbose           Show verbose output
  --api-key <key>         AssemblyAI API key (will prompt if not provided)
  --no-cache              Disable caching of uploads and transcripts
  --cache-dir <dir>       Directory to store cache files
  --no-interactive        Skip interactive speaker identification
  -h, --help              display help for command
```
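The `-o` default described above (output file named after the input, with the format's extension) can be sketched in shell; the file name and format below are illustrative:

```shell
# Derive the default output name: keep the input's base name,
# swap the extension for the chosen format's
input="team-sync.mp4"
format="srt"
output="${input%.*}.${format}"
echo "$output"   # team-sync.srt
```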
The tool automatically caches uploaded audio files and transcripts to avoid unnecessary re-processing. This is especially useful when:
- Experimenting with different output formats
- Re-running transcription with different speaker names
- Processing the same file multiple times
Cache files are stored in your system's temporary directory by default. You can:
- Disable caching with `--no-cache`
- Change the cache location with `--cache-dir <dir>`

Caching is enabled by default for faster processing, and cache files are automatically cleaned up by your OS's temp file management.
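For example, to keep cache files next to a project instead of in the OS temp directory (the directory name below is arbitrary; any writable path works):

```shell
# Create a project-local cache directory
CACHE_DIR="./.meeting-diary-cache"
mkdir -p "$CACHE_DIR"
echo "cache dir ready: $CACHE_DIR"

# Then point the tool at it:
#   meeting-diary input.mp4 --cache-dir "$CACHE_DIR"
```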
You'll need an AssemblyAI API key to use this tool. You can:
- Set it as an environment variable: `ASSEMBLYAI_API_KEY=your-key`
- Pass it via the command line: `--api-key your-key`
- Let the tool prompt you for it (it can be saved for future use)
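For instance, to set the key for the current shell session (the value below is a placeholder, not a real key):

```shell
# Export the key so meeting-diary and any child process can read it
export ASSEMBLYAI_API_KEY="your-key"

# Sanity check before running the tool
if [ -n "$ASSEMBLYAI_API_KEY" ]; then
  echo "API key is set"
fi
```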
```bash
# Clone the repository
git clone https://github.com/southbridgeai/meeting-diary.git
cd meeting-diary

# Install dependencies
bun install

# Build
bun run build

# Run tests
bun test

# Development mode
bun run dev
```
Apache-2.0 - see LICENSE for details.
Contributions are welcome! Please feel free to submit a Pull Request.