Next Step:
- Marker Grouping
2. Incorporating more LLM (CLAUDE)
3. Incorporating RAG
4. Build Web UI
-
Build Python package, R package
-
Incorporating subcluster and cluster funcation
7. WEB UI
This project has multiple versions with varying levels of complexity and automation:
Version 1 (Full Feature Set)
The most comprehensive version, including:
-
Onboarding Agent
-
Functional Gene Expert Agent
-
Cell Type Gene Expert Agent
-
Integration Agent
-
Validator Agent
-
Human Input Component
-
Final Formatting Agent
Version 2 (Reduced Expert Agents)
Similar to Version 1, but excludes:
-
Functional Gene Expert Agent
-
Cell Type Gene Expert Agent
Version 3 (Automated Process)
Fully automatic version, excluding:
-
Onboarding Agent
-
Human Input Agent
Version 4 (Minimal Automatic)
The most streamlined version, including only:
- Integration Agent
-
Introduction
-
Prerequisites
-
Script Overview
-
Key Components
4.1 Agent Class
4.2 Conversation Functions
4.3 Data Processing Functions
4.4 Specialized Agents
-
Main Execution Flow
-
Output
-
Customization and Extension
-
Troubleshooting
-
Conclusion
-
Introduction
This script implements a multi-agent system for analyzing single-cell data. It uses a series of AI agents to perform functional analysis, cell type identification, and integrative annotation of gene markers.
- Prerequisites
-
Python 3.x
-
Required libraries:
openai
,dotenv
,httpx
,re
,os
,json
-
OpenAI API key (set in environment variables)
- Script Overview
The script performs the following high-level steps:
-
Onboarding process to gather initial information
-
Functional analysis of gene markers
-
Cell type analysis of gene markers
-
Integrative analysis combining results from steps 2 and 3
-
Formatting and structuring the final output
-
Key Components
4.1 Agent Class
The Agent
class is the core component of the script. It represents an AI agent capable of engaging in conversations and performing specific tasks.
Key features:
-
Initialization with system prompt, model, and interaction settings
-
Conversation history management
-
Execution of AI model queries
-
Human input handling
4.2 Conversation Functions
-
two_agent_conversation_with_validation_celltype
: Manages the conversation for cell type analysis -
two_agent_conversation_with_validation_functional
: Manages the conversation for functional analysis -
integrate_and_annotate
: Handles the integrative analysis conversation -
onboarding_process
: Gathers initial information from the user
4.3 Data Processing Functions
-
extract_agent1_analysis
: Extracts analysis results from conversation history -
extract_json_from_reply
: Parses JSON data from agent responses -
list_to_comma_separated_string
: Converts list to comma-separated string -
construct_prompt
: Builds the initial prompt based on user input
4.4 Specialized Agents
The script defines several specialized agents:
-
agent1_functional
andagent2_functional
: For functional analysis -
agent1_celltype
andagent2_celltype
: For cell type analysis -
agent3
: For onboarding -
integrative_agent
: For integrative analysis -
formatting_agent
: For formatting final results
-
Main Execution Flow
-
Start onboarding process
-
Conduct functional analysis
-
Perform cell type analysis
-
Execute integrative analysis
-
Format and structure final results
-
Output
The script produces a structured JSON output containing:
-
Main cell type identified
-
Sub-cell types (if applicable)
- Customization and Extension
-
Modify agent system prompts to adjust behavior
-
Add new specialized agents for additional analyses
-
Extend the
Agent
class for more complex interactions
- Troubleshooting
-
Ensure all required libraries are installed
-
Verify that the OpenAI API key is correctly set in environment variables
-
Check for any rate limiting or API usage issues with OpenAI
- Conclusion
This script provides a flexible framework for single-cell data analysis using AI agents. It can be adapted and extended for various types of biological data analysis tasks.