RAGVenture

RAGVenture is an intelligent startup idea generator powered by Retrieval-Augmented Generation (RAG). It helps entrepreneurs generate innovative startup ideas by learning from successful companies, combining the power of large language models with real-world startup data.

Why RAGVenture?

Traditional startup ideation tools either rely on expensive API calls or generate ideas without real-world context. RAGVenture addresses both problems. It is:

Data-Driven: Learns from real startup data to ground suggestions in reality
Context-Aware: Understands patterns from successful startups
Intelligent: Uses RAG to combine LLM capabilities with precise information retrieval
Fast & Local: Runs on your machine with no API costs beyond Hugging Face, which the Environment Setup step uses for text generation
Production-Ready: Comprehensive test suite, error handling, and monitoring

System Requirements

Python 3.11 or higher
8GB RAM minimum (16GB recommended)
2GB disk space for models and data
Operating Systems:
- Linux (recommended)
- macOS
- Windows (with WSL for best performance)

Quick Start

Installation:

# Clone the repository
git clone https://github.com/valginer0/RAGVenture.git
cd RAGVenture

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On Windows:
.venv\Scripts\activate
# On Linux or macOS:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install spaCy language model for market analysis
python -m spacy download en_core_web_sm

Environment Setup:

# Required for text generation
export HUGGINGFACE_TOKEN="your-token-here"  # Get from huggingface.co

# Optional for LangChain tracing (debugging)
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY="your-langsmith-api-key"
export LANGCHAIN_PROJECT="your-project-name"
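
Before running the generator, it can help to confirm that the required variable is actually visible to Python. The following is a minimal sketch, not part of RAGVenture itself; the helper name `missing_env_vars` is hypothetical and only `HUGGINGFACE_TOKEN` comes from the setup above.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty.

    `environ` defaults to the real process environment; passing a dict
    makes the check easy to try out without touching your shell.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# HUGGINGFACE_TOKEN is the variable marked as required above.
missing = missing_env_vars(["HUGGINGFACE_TOKEN"])
print("Missing variables:", ", ".join(missing) if missing else "none")
```

If the variable is reported as missing, re-run the `export` command in the same shell session that launches Python.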

Generate Ideas:

python rag_startup_ideas.py --topic "AI" --num-ideas 1

Features & Capabilities

Core Features

Intelligent Idea Generation:
- Uses RAG to combine LLM knowledge with real startup data
- Generates contextually relevant and grounded ideas
- Provides structured output with problem, solution, and market analysis
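
To make the retrieve-then-generate idea concrete, here is a minimal sketch of the retrieval half of RAG. It is an illustration under loud assumptions: it swaps real embeddings for a simple bag-of-words cosine similarity, the sample descriptions are made up, and the function names (`retrieve`, `build_prompt`) are hypothetical rather than RAGVenture's API. The resulting prompt is what would be handed to a language model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(topic, examples):
    """Combine retrieved examples and the topic into one prompt string."""
    context = "\n".join(f"- {e}" for e in examples)
    return f"Existing startups:\n{context}\n\nPropose a new startup idea about {topic}."


# Made-up descriptions standing in for real startup data.
docs = [
    "AI assistant that drafts legal contracts",
    "Marketplace for used bicycles",
    "AI tool that summarizes medical records",
]
examples = retrieve("AI", docs, k=2)
print(build_prompt("AI", examples))
```

A production system would replace the bag-of-words scoring with embedding vectors and a vector index, but the flow (score, select top k, assemble a prompt) stays the same.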

Command-Line Interface

--topic: Target domain or area (required)
--num-ideas: Number of ideas to generate (default: 3)
--file: Custom startup data file (default: yc_startups.json)
--max-lines: Limit data processing (optional)
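
The options above map naturally onto Python's `argparse`. The sketch below mirrors the documented flags and defaults; it is an assumption about how such a parser could look, not a copy of the parser in rag_startup_ideas.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Generate startup ideas (illustrative parser)."
    )
    parser.add_argument("--topic", required=True, help="Target domain or area")
    parser.add_argument(
        "--num-ideas", type=int, default=3, help="Number of ideas to generate"
    )
    parser.add_argument(
        "--file", default="yc_startups.json", help="Custom startup data file"
    )
    parser.add_argument(
        "--max-lines", type=int, default=None, help="Limit data processing"
    )
    return parser


# Parse the same arguments used in the Quick Start command.
args = build_parser().parse_args(["--topic", "AI", "--num-ideas", "1"])
print(args.topic, args.num_ideas, args.file, args.max_lines)
# prints: AI 1 yc_startups.json None
```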

Technical Features

Smart Analysis:
- Semantic search for relevant examples
- Automatic metadata extraction
- Pattern recognition from successful startups
Performance Optimized:
- One-time embedding generation (~22s)
- Fast idea generation (~0.5s per idea)
- Efficient data processing (~0.1s load time)
Production Quality:
- 31 comprehensive unit tests
- Automated code formatting
- Extensive error handling
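
One-time embedding generation implies that the result is stored and reused. The sketch below shows one way such a cache could work, keyed by a hash of the data file so that changed data triggers a recompute. It is an assumption for illustration only: the file layout, the function names, and the `fake_embed` placeholder are hypothetical and do not describe RAGVenture's actual caching.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_path(data_file, cache_dir):
    """Derive a cache file name from the data file's contents."""
    digest = hashlib.sha256(Path(data_file).read_bytes()).hexdigest()[:16]
    return Path(cache_dir) / f"embeddings-{digest}.json"


def load_or_compute(data_file, cache_dir, compute):
    """Return (embeddings, cache_hit); compute and store them on a miss."""
    path = cache_path(data_file, cache_dir)
    if path.exists():
        return json.loads(path.read_text()), True
    embeddings = compute(Path(data_file).read_text())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(embeddings))
    return embeddings, False


def fake_embed(text):
    """Placeholder for a real embedding model: one length value per line."""
    return [[float(len(line))] for line in text.splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "startups.txt"
    data.write_text("first startup\nsecond startup\n")
    _, hit_first = load_or_compute(data, Path(tmp) / "cache", fake_embed)
    _, hit_second = load_or_compute(data, Path(tmp) / "cache", fake_embed)
    print(hit_first, hit_second)  # prints: False True
```

Hashing the file contents rather than its name means a renamed but unchanged file still hits the cache, while any edit to the data invalidates it.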

Performance

Typical processing times on a standard machine:

Initial Setup: ~22s (one-time embedding generation)
Data Loading: ~0.1s
Idea Generation: ~0.5s per idea

Docker Support

For containerized deployment, we provide both CPU and GPU support.

Prerequisites

Docker and Docker Compose
For GPU support:
- NVIDIA GPU with CUDA
- NVIDIA Container Toolkit
- nvidia-docker2

Quick Start with Docker

# CPU Version
docker-compose up app-cpu

# GPU Version (with NVIDIA support)
docker-compose up app-gpu

Development Setup

Clone and setup:

git clone https://github.com/valginer0/RAGVenture.git
cd RAGVenture
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install development dependencies:

pip install -r requirements.txt
pre-commit install  # Sets up automatic code formatting

Run tests:

pytest tests/  # Should show 31 passing tests

Data Requirements

RAGVenture works with startup data in JSON format. Two options:

Use YC Data (Recommended):
- Download from Y Combinator
- Convert CSV to JSON:
```
python -m rag_startups.data.convert_yc_data input.csv -o startups.json
```
Use Custom Data:
- Prepare JSON file with required fields
- See docs/data_format.md for schema
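
For custom data, the CSV-to-JSON step can be approximated with the standard library. This is a minimal sketch, not the project's `convert_yc_data` module; the column names `name` and `description` are placeholders, so consult docs/data_format.md for the fields RAGVenture actually expects.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Placeholder rows; real data would be read from a CSV file instead.
sample = "name,description\nAcme,Rockets for coyotes\nGlobex,Energy for everyone\n"
records = csv_to_records(sample)
print(json.dumps(records, indent=2))
```

Writing `json.dumps(records)` to a file yields a JSON array with one object per CSV row.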

Troubleshooting

Embedding Generation Time:
- First run takes ~22s to generate embeddings
- Subsequent runs use cached embeddings
- GPU can significantly speed up this process
Common Issues:
- Missing HUGGINGFACE_TOKEN: Create an access token at huggingface.co and export it as HUGGINGFACE_TOKEN
- Memory errors: Limit how much data is processed with --max-lines
- GPU errors: Ensure CUDA toolkit is properly installed

Documentation

docs/api.md: API documentation
docs/examples.md: Usage examples
docs/data_format.md: Data schema
CONTRIBUTING.md: Development guidelines

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

This project is licensed under the MIT License - see LICENSE for details.
