CASSIA (Cell type Annotation using Specialized System with Integrated AI) is a Python package for automated cell type annotation in single-cell RNA sequencing data using large language models.
- Automated cell type annotation using multiple LLM providers (OpenAI, Anthropic, OpenRouter)
- Support for batch processing of multiple clusters
- Variance analysis for annotation reliability
- Detailed HTML report generation
- Score-based annotation quality assessment
- Support for marker gene validation
- Interactive analysis with step-by-step reasoning
# Install dependencies
pip install pandas openai requests numpy anthropic
# Install CASSIA
pip install CASSIA
pip install CASSIA_rag  # optional, only needed for the RAG agent
import CASSIA
# Set your API keys
CASSIA.set_api_key("your-openai-key", provider="openai")
CASSIA.set_api_key("your-anthropic-key", provider="anthropic")
CASSIA.set_api_key("your-openrouter-key", provider="openrouter")
# Run single cluster analysis
result, conversation = CASSIA.run_cell_type_analysis_wrapper(
    model="gpt-4",
    temperature=0,
    marker_list=["CD3D", "CD4", "IL7R"],
    tissue="PBMC",
    species="human",
    provider="openai"
)
# Run batch analysis
results = CASSIA.run_cell_type_analysis_batchrun(
    marker="path/to/marker_file.csv",
    output_name="results.json",
    model="gpt-4",
    tissue="PBMC",
    species="human"
)
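The keys in the example above are written directly into the script, which makes them easy to leak through version control. A minimal sketch of reading them from the environment instead follows. The variable names `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `OPENROUTER_API_KEY` and the helper `collect_api_keys` are assumptions for illustration, not part of CASSIA.

```python
import os

# Assumed environment variable names; adjust them to match your own setup.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def collect_api_keys(environ=None):
    """Return {provider: key} for every provider whose variable is set and non-empty."""
    environ = os.environ if environ is None else environ
    return {
        provider: environ[var]
        for provider, var in ENV_VARS.items()
        if environ.get(var)
    }


keys = collect_api_keys()
print("Providers with a key configured:", sorted(keys))

# With CASSIA installed, each key could then be registered with the
# set_api_key() call shown earlier:
#     for provider, key in keys.items():
#         CASSIA.set_api_key(key, provider=provider)
```

Providers without a key are skipped, so the same script works whether one or all three providers are configured.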
CASSIA accepts marker genes in several formats:
- CSV files with cluster and marker columns
- Pandas DataFrames
- Direct lists of marker genes
- Seurat or Scanpy differential expression results
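To make the CSV option concrete, here is a minimal sketch that writes a two-column marker file using only the standard library. The header names `cluster` and `marker`, the comma-joined gene string, and the second cluster's genes are assumptions for illustration; check the CASSIA documentation for the exact layout it expects.

```python
import csv
import io

# Hypothetical clusters and marker genes, for illustration only.
markers_by_cluster = {
    "cluster_1": ["CD3D", "CD4", "IL7R"],
    "cluster_2": ["MS4A1", "CD79A", "CD19"],
}


def write_marker_csv(markers, handle):
    """Write one row per cluster: the cluster name and its comma-joined markers."""
    writer = csv.writer(handle)
    writer.writerow(["cluster", "marker"])  # assumed header names
    for cluster, genes in markers.items():
        writer.writerow([cluster, ", ".join(genes)])


# Write to an in-memory buffer so the sketch runs without touching the disk.
buffer = io.StringIO()
write_marker_csv(markers_by_cluster, buffer)
print(buffer.getvalue())
```

To produce a file on disk, pass a handle opened with `open("markers.csv", "w", newline="")` in place of the buffer.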
- run_cell_type_analysis_wrapper(): Single cluster analysis with support for multiple LLM providers
- run_cell_type_analysis_batchrun(): Batch analysis of multiple clusters
- run_analysis_n_times(): Multiple runs for consensus analysis
- process_cell_type_variance_analysis_batch(): Analyze annotation variance
- score_annotation_batch(): Score annotation quality
- generate_cell_type_analysis_report_wrapper(): Generate detailed HTML reports
- set_api_key(): Set API keys for different providers
- get_top_markers(): Extract top markers from differential expression results
- split_markers(): Process marker lists into standardized format
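The idea behind extracting top markers from differential expression results can be sketched in plain Python: keep the significant, upregulated genes and take the top n per cluster by fold change. This is not CASSIA's implementation. The function `top_markers_per_cluster`, the thresholds, and the sample rows are hypothetical, and the column names follow the output of Seurat's `FindAllMarkers`.

```python
from collections import defaultdict

# Hypothetical differential expression rows; the values are made up.
de_rows = [
    {"cluster": "0", "gene": "CD3D", "avg_log2FC": 2.1, "p_val_adj": 1e-30},
    {"cluster": "0", "gene": "IL7R", "avg_log2FC": 1.4, "p_val_adj": 1e-12},
    {"cluster": "0", "gene": "ACTB", "avg_log2FC": 0.2, "p_val_adj": 0.40},
    {"cluster": "1", "gene": "MS4A1", "avg_log2FC": 3.0, "p_val_adj": 1e-50},
    {"cluster": "1", "gene": "CD79A", "avg_log2FC": 2.5, "p_val_adj": 1e-41},
    {"cluster": "1", "gene": "LYZ", "avg_log2FC": -1.8, "p_val_adj": 1e-20},
]


def top_markers_per_cluster(rows, n=2, max_p_adj=0.05, min_log2fc=0.25):
    """Return {cluster: [gene, ...]} with up to n genes, highest fold change first."""
    grouped = defaultdict(list)
    for row in rows:
        # Keep only significant genes that are upregulated in the cluster.
        if row["p_val_adj"] < max_p_adj and row["avg_log2FC"] > min_log2fc:
            grouped[row["cluster"]].append(row)
    top = {}
    for cluster, hits in grouped.items():
        ranked = sorted(hits, key=lambda r: r["avg_log2FC"], reverse=True)
        top[cluster] = [r["gene"] for r in ranked[:n]]
    return top


print(top_markers_per_cluster(de_rows))
```

The resulting gene lists have the same shape as the `marker_list` argument used in the single cluster example.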
# Basic single cluster analysis
result, conversation = CASSIA.run_cell_type_analysis_wrapper(
    model="gpt-4",
    temperature=0,
    marker_list=["CD3D", "CD4", "IL7R", "FOXP3"],
    tissue="PBMC",
    species="human",
    provider="openai"
)
# Print the main cell type identified
print(result["main_cell_type"])
# Run batch analysis multiple times for consensus
CASSIA.run_batch_analysis_n_times(
    n=3,  # Number of iterations
    marker="markers.csv",
    output_name="batch_results",
    model="gpt-4",
    tissue="lung",
    species="human",
    max_workers=10
)
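Repeated runs are useful because LLM output can differ between calls. One simple way to summarize agreement is a majority vote per cluster, sketched below. This illustrates the general idea only and is not how CASSIA computes consensus; the function `majority_vote` and the sample annotations are hypothetical.

```python
from collections import Counter

# Hypothetical annotations from three runs; not real CASSIA output.
runs = [
    {"cluster_1": "CD4 T cell", "cluster_2": "B cell"},
    {"cluster_1": "CD4 T cell", "cluster_2": "Plasma cell"},
    {"cluster_1": "Regulatory T cell", "cluster_2": "B cell"},
]


def majority_vote(run_results):
    """Return {cluster: (most common label, fraction of runs that agree)}."""
    labels = {}
    for run in run_results:
        for cluster, label in run.items():
            labels.setdefault(cluster, []).append(label)
    consensus = {}
    for cluster, votes in labels.items():
        label, count = Counter(votes).most_common(1)[0]
        consensus[cluster] = (label, count / len(votes))
    return consensus


for cluster, (label, agreement) in majority_vote(runs).items():
    print(f"{cluster}: {label} ({agreement:.0%} of runs)")
```

A low agreement fraction flags clusters whose annotation deserves a closer look. When labels tie, `Counter.most_common` returns the one seen first.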
# Generate a detailed HTML report for a single cluster
CASSIA.generate_cell_type_analysis_report_wrapper(
    full_result_path="results.csv",
    marker="markers.csv",
    cluster_name="cluster_1",
    major_cluster_info="Human PBMC",
    output_name="analysis_report.html",
    model="gpt-4",
    provider="openai"
)
We welcome contributions! Please feel free to submit pull requests or open issues on our GitHub repository.
This project is licensed under the MIT License.
For support, please open an issue on our GitHub repository.