Skip to content

ElliotXie/CASSIA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CASSIA

CASSIA (Collaborative Agent System for Single cell Interpretable Annotation) is a tool that enhances cell type annotation using multi-agent Large Language Models (LLMs).

📖 Read our paper for detailed methodology and benchmarking results.

📝 Example R workflow with TS large intestine data

📚 Complete Documentation

🌐 Try CASSIA Web UI - A web interface for basic CASSIA functionality

Installation

Option 1: Install from GitHub

# Install dependencies
install.packages("devtools")
install.packages("reticulate")

# Install CASSIA
devtools::install_github("ElliotXie/CASSIA/CASSIA_R")

Option 2: Install from source

install.packages("reticulate")
install.packages("remotes")
remotes::install_url("https://github.com/ElliotXie/CASSIA/raw/main/CASSIA_source_R/CASSIA_0.1.0.tar.gz")

Set Up API Keys

We recommend starting with OpenRouter as it provides access to most models with a single API key. While slightly more expensive, it offers greater convenience. Direct access via OpenAI or Anthropic provides more stability for production use.

# For OpenAI
setLLMApiKey("your_openai_api_key", provider = "openai", persist = TRUE)

# For Anthropic
setLLMApiKey("your_anthropic_api_key", provider = "anthropic", persist = TRUE)

# For OpenRouter
setLLMApiKey("your_openrouter_api_key", provider = "openrouter", persist = TRUE)

Example Data

CASSIA includes example marker data in two formats:

# Load example data
markers_unprocessed <- loadExampleMarkers(processed = FALSE)  # Direct Seurat output
markers_processed <- loadExampleMarkers(processed = TRUE)     # Processed format

Pipeline Usage

runCASSIA_pipeline(
    output_file_name,     # Base name for output files
    tissue,               # Tissue type (e.g., "brain")
    species,              # Species (e.g., "human")
    marker,               # Marker data from findallmarker
    max_workers = 4,      # Number of parallel workers
    annotation_model = "gpt-4o",                    # Model for annotation
    annotation_provider = "openai",                 # Provider for annotation
    score_model = "anthropic/claude-3.5-sonnet",    # Model for scoring
    score_provider = "openrouter",                  # Provider for scoring
    annotationboost_model="anthropic/claude-3.5-sonnet", #model for annotation boost
    annotationboost_provider="openrouter", #provider for annotation boost
    score_threshold = 75,                          # Minimum acceptable score
    additional_info = NULL                         # Optional context information
)

Supported Models

OpenAI (Most Common)

  • gpt-4o (recommended): Balanced performance and cost
  • gpt-4o-mini: Faster, more economical option
  • o1-mini: Advanced reasoning capabilities (higher cost)

Anthropic

  • claude-3-5-sonnet-20241022: High-performance model

OpenRouter

  • anthropic/claude-3.5-sonnet: High rate limit access to Claude
  • openai/gpt-4o-2024-11-20: Alternative access to GPT-4o
  • meta-llama/llama-3.2-90b-vision-instruct: Cost-effective open-source option

Output

The pipeline generates four key files:

  1. Initial annotation results
  2. Quality scores with reasoning
  3. Summary report
  4. Annotation boost report

Troubleshooting

Authentication (Error 401)

# Check if API key is set correctly
key <- Sys.getenv("ANTHROPIC_API_KEY")
print(key)  # Should not be empty

# Reset API key if needed
setLLMApiKey("your_api_key", provider = "anthropic", persist = TRUE)

File Errors

  • Use absolute paths when necessary
  • Check file permissions
  • Ensure files aren't open in other programs
  • Verify sufficient disk space

Best Practices

  • Keep API keys secure
  • Maintain sufficient API credits
  • Backup data before overwriting files
  • Double-check file paths and permissions

Note: This README covers basic CASSIA functionality. For a complete tutorial including advanced features and detailed examples, please visit: CASSIA Complete Tutorial.

About

CASSIA: A multiagent llm based single cell Annottaion framework

Resources

Stars

Watchers

Forks

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •