A flexible video generation system that makes it easy to create videos from different templates while sharing core functionality such as audio generation and subtitle synchronization. Easily extendable with your own templates and your own agents.
- Multiple video template support (Story mode, Meme coin mode)
- Automated audio generation and subtitle synchronization
- Word-to-word transcription using whisper_timestamped
- Extensible template system
- TradingView integration for meme coin data
- Docker
- Bun
- FFmpeg
- Build the transcription service:

  ```bash
  docker build -t transcribe .
  ```

- Start the transcription service:

  ```bash
  docker run -d \
    --name transcribe \
    -p 5005:5005 \
    -v $(pwd)/public:/app/video/public \
    transcribe \
    gunicorn \
    --timeout 120 \
    -w 1 \
    -b 0.0.0.0:5005 \
    --access-logfile access.log \
    --error-logfile error.log \
    --chdir /app/video \
    "transcribe:app"
  ```
- Add a `.env` file based on `.env.example`, including:
  - OpenAI API key (for generating transcripts)
  - Claude API key (for cleaning SRT files)
  - ElevenLabs API key + voice IDs (for audio generation)
  - Social Data API key (only if you want to use the meme coin template)
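A minimal `.env` sketch is shown below. The variable names here are purely illustrative — copy the exact names from `.env.example`:

```
# Illustrative placeholders only — use the real variable names from .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_IDS=...
SOCIALDATA_API_KEY=...
```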
- Run `bun install` to install dependencies
```bash
# Story template
bun run build.ts -t story

# Meme coin template
bun run build.ts -t meme -s <COIN_SYMBOL>
# Example:
bun run build.ts -t meme -s PILLZUMI
```
The system uses whisper_timestamped for precise word-to-word transcription, which is then converted to SRT format. The transcription is cleaned up using the original transcript as context to correct any model errors.
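As a rough sketch of what word-to-SRT conversion involves (the types and helper names below are illustrative, not the project's actual code), each word's start/end time in seconds is formatted as an SRT timestamp and emitted as a numbered cue:

```typescript
// Illustrative sketch: convert word-level timestamps (as produced by
// whisper_timestamped) into SRT cue blocks. Names are hypothetical.
type Word = { text: string; start: number; end: number };

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const rem = ms % 1000;
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(rem, 3)}`;
}

// Emit one numbered SRT cue per word
function wordsToSrt(words: Word[]): string {
  return words
    .map((w, i) =>
      `${i + 1}\n${toSrtTime(w.start)} --> ${toSrtTime(w.end)}\n${w.text}\n`)
    .join("\n");
}

console.log(toSrtTime(3.5)); // "00:00:03,500"
```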
Audio files are mapped to specific speakers using the transcript data. When generating SRT files, each audio segment (e.g., `redpill-0.mp3`) gets a corresponding SRT file (`redpill-0.srt`). All audio files are ultimately concatenated into a single `public/audio.mp3` file, while maintaining speaker timing information through the SRT files.
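The per-segment naming and the timing bookkeeping needed for concatenation can be sketched like this (hypothetical helper names; the real implementation may differ):

```typescript
// Hypothetical sketch: derive the SRT path for an audio segment, and
// compute the time offset each segment's cues need once all segments
// are concatenated into a single audio.mp3.
type Segment = { file: string; duration: number }; // duration in seconds

// redpill-0.mp3 -> redpill-0.srt
function srtPathFor(audioPath: string): string {
  return audioPath.replace(/\.mp3$/, ".srt");
}

// Map each segment file to the total duration of the segments before it,
// i.e. the amount its subtitle timings shift in the concatenated audio.
function concatOffsets(segments: Segment[]): Map<string, number> {
  const offsets = new Map<string, number>();
  let elapsed = 0;
  for (const seg of segments) {
    offsets.set(seg.file, elapsed);
    elapsed += seg.duration;
  }
  return offsets;
}
```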
The project includes two main templates:

- **Story Template**
  - Uses a pre-generated story transcript
  - Example data provided in the `data` folder
  - Requires an external story transcript generation system; we use Eliza for our story generation pipeline
- **Meme Coin Template**
  - End-to-end implementation
  - Fetches Solana memecoin data from TradingView
  - Generates transcripts automatically
  - Uses the same core audio and subtitle generation system
To add a new template (e.g., "shitpost"):

- Create `ShitpostComposition.tsx`
- Add the composition to `Root.tsx` (e.g., with `id="Shitpost"`)
- Add a `generateShitpostContextContent` function in `contextGenerators.ts`
- Update the `invariantContext` function with any new variables used by this template
- Update the types in `index.d.ts`
- Add a `generateShitpostTranscript` function in `transcript.ts`
- Add a switch case for the new template in `build.ts`
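The last step might look something like this in `build.ts` (a sketch with stubbed generators; the real function signatures in `transcript.ts` may differ):

```typescript
// Stubs standing in for the real generators in transcript.ts.
const generateStoryTranscript = () => "story transcript";
const generateMemeTranscript = (symbol: string) =>
  `meme transcript for ${symbol}`;
const generateShitpostTranscript = () => "shitpost transcript";

// Hypothetical shape of the template dispatch in build.ts.
function generateTranscript(template: string, symbol?: string): string {
  switch (template) {
    case "story":
      return generateStoryTranscript();
    case "meme":
      if (!symbol) throw new Error("meme template requires -s <COIN_SYMBOL>");
      return generateMemeTranscript(symbol);
    case "shitpost": // the new template from the steps above
      return generateShitpostTranscript();
    default:
      throw new Error(`Unknown template: ${template}`);
  }
}
```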
- `src/tmp/context.tsx`: Dynamic context file for all templates
- `src/Root.tsx`: Contains all template compositions
- `contextGenerators.ts`: Generates template-specific context
- `utils/`: Helper functions and utilities
- `data/`: Example story data
Video generation is slow because these templates render with a concurrency of 1; higher concurrency introduces minor visual bugs in the subtitles. If you need faster renders, change the concurrency in the scripts in `package.json`. To see the maximum concurrency your machine supports, run `bun run os.ts`.
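A minimal version of such a probe (assuming `os.ts` simply reports the logical CPU count, which bounds practical render concurrency) could be:

```typescript
// Sketch of what an os.ts concurrency probe might do. Assumption: the
// practical maximum render concurrency is the logical CPU core count.
import { cpus } from "node:os";

const maxConcurrency = cpus().length;
console.log(`Max concurrency: ${maxConcurrency}`);
```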
- At least 8GB RAM recommended
- FFmpeg installed on your system
- Node.js 18+ (for Bun compatibility)
- Disk space for video processing and docker image (at least 6GB recommended)
- **Docker Service Not Running**

  ```bash
  # Start Docker service
  sudo systemctl start docker
  ```

- **Port 5005 Already in Use**

  ```bash
  # Find and kill the process using port 5005
  lsof -i :5005
  kill -9 <PID>
  ```

- **FFmpeg Missing**

  ```bash
  # macOS
  brew install ffmpeg

  # Ubuntu
  sudo apt-get install ffmpeg
  ```
- Check Docker logs: `docker logs transcribe`
- Application logs are in `access.log` and `error.log`