CASSIA (Collaborative Agent System for Single cell Interpretable Annotation) is a tool that enhances cell type annotation using multi-agent Large Language Models (LLMs).
📖 Read our paper for detailed methodology and benchmarking results.
🌐 Try CASSIA Web UI - A web interface for basic CASSIA functionality
Option 1: Install from GitHub
# Install dependencies
install.packages("devtools")
install.packages("reticulate")
# Install CASSIA
devtools::install_github("ElliotXie/CASSIA/CASSIA_R")
Option 2: Install from source
install.packages("reticulate")
install.packages("remotes")
remotes::install_url("https://github.com/ElliotXie/CASSIA/raw/main/CASSIA_source_R/CASSIA_0.1.0.tar.gz")
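After either installation route, a quick check that the packages can be found may save debugging later. The sketch below uses only base R. The package name `CASSIA` is inferred from the tarball name above, and `missing_packages()` is a hypothetical helper, not part of CASSIA.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() loads a package's namespace if it is found and
# returns FALSE (without an error) if it is not.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "CASSIA" is assumed to be the installed package name.
missing <- missing_packages(c("reticulate", "CASSIA"))
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```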
We recommend starting with OpenRouter since it provides access to most models through a single API key. While slightly more expensive and occasionally unstable, it offers greater convenience. For production use, direct access via OpenAI or Anthropic provides better stability.
Note that OpenAI and Anthropic may be unavailable in certain countries. In that case, use OpenRouter instead.
# For OpenAI
setLLMApiKey("your_openai_api_key", provider = "openai", persist = TRUE)
# For Anthropic
setLLMApiKey("your_anthropic_api_key", provider = "anthropic", persist = TRUE)
# For OpenRouter
setLLMApiKey("your_openrouter_api_key", provider = "openrouter", persist = TRUE)
CASSIA includes example marker data in two formats:
# Load example data
markers_unprocessed <- loadExampleMarkers(processed = FALSE) # Direct Seurat output
markers_processed <- loadExampleMarkers(processed = TRUE) # Processed format
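To give a feel for what "direct Seurat output" looks like, the sketch below builds a toy data frame with the columns that Seurat's `FindAllMarkers()` returns and picks the top genes per cluster by `avg_log2FC`. The values are made up, `top_genes_per_cluster()` is a hypothetical helper, and this is not necessarily how CASSIA derives its processed format.

```r
# Toy data frame mimicking the columns of Seurat's FindAllMarkers() output.
# All values are invented for illustration only.
toy_markers <- data.frame(
  p_val      = c(1e-50, 1e-40, 1e-30, 1e-45, 1e-35, 1e-20),
  avg_log2FC = c(2.5, 1.8, 3.1, 2.2, 2.9, 0.7),
  pct.1      = c(0.90, 0.80, 0.95, 0.85, 0.90, 0.60),
  pct.2      = c(0.10, 0.20, 0.05, 0.15, 0.10, 0.40),
  p_val_adj  = c(1e-46, 1e-36, 1e-26, 1e-41, 1e-31, 1e-16),
  cluster    = c("0", "0", "0", "1", "1", "1"),
  gene       = c("CD3D", "CD3E", "IL7R", "MS4A1", "CD79A", "ACTB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n genes with the highest avg_log2FC per cluster.
top_genes_per_cluster <- function(markers, n = 2) {
  by_cluster <- split(markers, markers$cluster)
  lapply(by_cluster, function(df) {
    ordered <- df[order(df$avg_log2FC, decreasing = TRUE), ]
    head(ordered$gene, n)
  })
}

top_genes <- top_genes_per_cluster(toy_markers, n = 2)
print(top_genes)
```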
runCASSIA_pipeline(
output_file_name, # Base name for output files
tissue, # Tissue type (e.g., "brain")
species, # Species (e.g., "human")
marker, # Marker data, e.g. from Seurat's FindAllMarkers()
max_workers = 4, # Number of parallel workers
annotation_model = "gpt-4o", # Model for annotation
annotation_provider = "openai", # Provider for annotation
score_model = "anthropic/claude-3.5-sonnet", # Model for scoring
score_provider = "openrouter", # Provider for scoring
annotationboost_model = "anthropic/claude-3.5-sonnet", # Model for annotation boost
annotationboost_provider = "openrouter", # Provider for annotation boost
score_threshold = 75, # Minimum acceptable score
additional_info = NULL # Optional context information
)
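One way to use the parameter list above is to collect the arguments in a named list, review them, and only then hand them to the pipeline. The sketch below assumes the function is exported from a package named `CASSIA`. `build_pipeline_args()` is a hypothetical helper, and `"my_annotation"`, `"brain"`, and `"human"` are placeholder values to replace with ones that match your data.

```r
# Hypothetical helper: gather pipeline arguments in a named list so they can be
# inspected before any API calls are made. Defaults mirror the values listed above.
build_pipeline_args <- function(output_file_name, tissue, species, marker,
                                additional_info = NULL) {
  list(
    output_file_name = output_file_name,
    tissue = tissue,
    species = species,
    marker = marker,
    max_workers = 4,
    annotation_model = "gpt-4o",
    annotation_provider = "openai",
    score_model = "anthropic/claude-3.5-sonnet",
    score_provider = "openrouter",
    annotationboost_model = "anthropic/claude-3.5-sonnet",
    annotationboost_provider = "openrouter",
    score_threshold = 75,
    additional_info = additional_info
  )
}

args <- build_pipeline_args(
  output_file_name = "my_annotation", # placeholder base name
  tissue = "brain",                   # placeholder, match your dataset
  species = "human",                  # placeholder, match your dataset
  marker = NULL                       # replace with your marker data frame
)
str(args[c("tissue", "species", "score_threshold")])

# The real call is guarded: it only runs when explicitly enabled, marker data
# has been supplied, and the (assumed) CASSIA package is installed.
run_pipeline <- FALSE
if (run_pipeline && !is.null(args$marker) &&
    requireNamespace("CASSIA", quietly = TRUE)) {
  do.call(CASSIA::runCASSIA_pipeline, args)
}
```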
OpenAI models:
- gpt-4o (recommended): Balanced performance and cost
- gpt-4o-mini: Faster, more economical option
- o1-mini: Advanced reasoning capabilities (higher cost)
Anthropic models:
- claude-3-5-sonnet-20241022: High-performance model
OpenRouter models:
- anthropic/claude-3.5-sonnet: High rate limit access to Claude
- openai/gpt-4o-2024-11-20: Alternative access to GPT-4o
- meta-llama/llama-3.2-90b-vision-instruct: Cost-effective open-source option
The pipeline generates four key files:
- Initial annotation results
- Quality scores with reasoning
- Summary report
- Annotation boost report
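To locate the generated files after a run, it may help to list everything whose name begins with the base name passed as `output_file_name`. The sketch below assumes the outputs share that prefix and live in a single directory. The file names in the demonstration are made up, and `find_pipeline_outputs()` is a hypothetical helper.

```r
# Hypothetical helper: list files in `dir` whose names start with `base_name`.
find_pipeline_outputs <- function(base_name, dir = ".") {
  files <- list.files(dir)
  files[startsWith(files, base_name)]
}

# Demonstration in a temporary directory with invented file names.
demo_dir <- tempfile("cassia_demo_")
dir.create(demo_dir)
file.create(file.path(
  demo_dir,
  c("my_annotation_a.csv", "my_annotation_b.html", "unrelated.txt")
))

outputs <- find_pipeline_outputs("my_annotation", demo_dir)
print(outputs)
```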
# Check if API key is set correctly
key <- Sys.getenv("ANTHROPIC_API_KEY")
print(key) # Should not be empty
# Reset API key if needed
setLLMApiKey("your_api_key", provider = "anthropic", persist = TRUE)
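The same check can be run for all three providers at once without echoing the keys to the console. In this sketch, `ANTHROPIC_API_KEY` is the variable name used above; `OPENAI_API_KEY` and `OPENROUTER_API_KEY` are assumed names, and `key_status()` is a hypothetical helper.

```r
# Environment variable names per provider. ANTHROPIC_API_KEY appears in the
# check above; the other two names are assumptions.
key_vars <- c(
  openai     = "OPENAI_API_KEY",
  anthropic  = "ANTHROPIC_API_KEY",
  openrouter = "OPENROUTER_API_KEY"
)

# Hypothetical helper: TRUE where a non-empty value is set.
# The key itself is never printed.
key_status <- function(vars) {
  vapply(vars, function(v) nzchar(Sys.getenv(v)), logical(1))
}

print(key_status(key_vars))
```

Reporting only `TRUE` or `FALSE` keeps keys out of console logs and saved session output.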
- Use absolute paths when necessary
- Check file permissions
- Ensure files aren't open in other programs
- Verify sufficient disk space
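Some of the checks above can be scripted with base R before a long run. The sketch below resolves an absolute path and tests whether the target directory exists and is writable. `check_output_path()` is a hypothetical helper; it does not cover disk space or files locked by other programs, which base R cannot check portably.

```r
# Hypothetical helper: report basic facts about an intended output path.
check_output_path <- function(path) {
  # Resolve the directory part, then re-attach the file name.
  dir <- normalizePath(dirname(path), winslash = "/", mustWork = FALSE)
  list(
    absolute_path = file.path(dir, basename(path)),
    dir_exists    = dir.exists(dir),
    # file.access() returns 0 when the requested access (mode 2 = write) is allowed.
    dir_writable  = unname(file.access(dir, mode = 2)) == 0,
    file_exists   = file.exists(path)
  )
}

# Demonstration with a not-yet-existing file in the session's temp directory.
status <- check_output_path(tempfile(fileext = ".csv"))
str(status)
```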
- Keep API keys secure
- Maintain sufficient API credits
- Backup data before overwriting files
- Double-check file paths and permissions
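Backing up before overwriting can be automated with a small wrapper around `file.copy()`. This is a minimal sketch: `backup_file()` is a hypothetical helper, and the timestamped `.bak` naming scheme is only one possible convention.

```r
# Hypothetical helper: copy `path` to a timestamped .bak file next to it.
# Returns the backup path, or NULL (invisibly) if there is nothing to back up.
backup_file <- function(path) {
  if (!file.exists(path)) return(invisible(NULL))
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  backup <- paste0(path, ".", stamp, ".bak")
  ok <- file.copy(path, backup, overwrite = FALSE)
  if (!ok) stop("Could not create backup: ", backup)
  backup
}

# Demonstration with a temporary file and made-up contents.
demo_file <- tempfile(fileext = ".csv")
writeLines("cluster,annotation", demo_file)
demo_backup <- backup_file(demo_file)
print(basename(demo_backup))
```

Because `overwrite = FALSE` is used, an existing backup with the same timestamp is never silently replaced.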
Note: This README covers basic CASSIA functionality. For a complete tutorial including advanced features and detailed examples, please visit: CASSIA Complete Tutorial.