This repository contains a multi-agent workflow designed to function as a virtual scholar agent. The system automates summarizing team documents, analyzing research opportunities, retrieving relevant academic papers, and generating actionable insights for research teams.
The Virtual Scholar Agent System operates on a daily, weekly, or monthly basis to streamline knowledge synthesis and research strategy for teams.
![image](https://private-user-images.githubusercontent.com/75750464/396329039-17eb03e8-7821-4cf1-b22a-849f313265bd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTAzMDksIm5iZiI6MTczOTU5MDAwOSwicGF0aCI6Ii83NTc1MDQ2NC8zOTYzMjkwMzktMTdlYjAzZTgtNzgyMS00Y2YxLWIyMmEtODQ5ZjMxMzI2NWJkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDAzMjY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQwY2IxZjY1NTZmNzExNDdkODRmYWJlYzI4MGEyYjY3ZDMyMThjYTk2NTNhMGRmYzdmYmI5ZTBlZTE1NmYwNTkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.1kqk29oJcWO522G_EeFwq6KpEZ6siuGTGO0-jlVWc8U)
- Centralized Upload Step
  - Internal team documents are uploaded to a centralized folder.
  - Documents must be well-organized and tagged with metadata (e.g., project name, date).
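One way to make the tagging requirement concrete is a small manifest built from the uploaded files. The sketch below is only an illustration: `DocMetadata` and `build_manifest` are hypothetical names, not part of this repository, and it assumes every file in one folder shares the same project name and upload date.

```python
from dataclasses import dataclass, asdict
from datetime import date
from pathlib import Path


@dataclass
class DocMetadata:
    """Hypothetical metadata record for one uploaded team document."""
    path: str
    project: str
    uploaded: str  # ISO date, e.g. "2025-01-31"


def build_manifest(folder: Path, project: str, uploaded: date) -> list[dict]:
    """Tag every regular file directly inside `folder` with a project name and date."""
    records = []
    for file in sorted(folder.iterdir()):
        if file.is_file():  # subfolders are skipped in this sketch
            records.append(asdict(DocMetadata(str(file), project, uploaded.isoformat())))
    return records
```

A manifest like this could be written next to the documents as JSON so that later steps can filter by project or date.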
- Project Context Agent
  - Summarizes internal team documents under predefined categories:
    - Research Framework & Key Concepts
    - Critical Analysis & Improvement Areas
    - Research Gaps & Extension Opportunities
    - Literature Review Strategy: Keywords for paper search
  - Highlights new information compared to the previous week’s documents.
  - Generates a set of optimized, refined keywords for paper retrieval.
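The "new information" step can be approximated without any model by comparing sentences across two document versions. The following is a deliberately simple baseline, not the agent's actual method (which this README does not specify); `split_sentences` and `new_sentences` are hypothetical helper names.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def new_sentences(previous: str, current: str) -> list[str]:
    """Return sentences of `current` that do not occur (case-insensitively) in `previous`."""
    seen = {s.lower() for s in split_sentences(previous)}
    return [s for s in split_sentences(current) if s.lower() not in seen]
```

Exact matching misses paraphrases, so a real implementation would likely compare meaning rather than surface text.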
- Summary Project Context Agent
  - Aggregates individual summaries while avoiding redundant insights.
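As a rough illustration of redundancy-aware aggregation, the sketch below merges bullet-style insights and drops near-duplicates using string similarity from the standard library. The function name `merge_insights` and the 0.85 threshold are assumptions chosen for the example, not values taken from this project.

```python
from difflib import SequenceMatcher


def merge_insights(summaries: list[list[str]], threshold: float = 0.85) -> list[str]:
    """Concatenate insights in order, skipping any that closely match one already kept."""
    merged: list[str] = []
    for summary in summaries:
        for insight in summary:
            is_duplicate = any(
                SequenceMatcher(None, insight.lower(), kept.lower()).ratio() >= threshold
                for kept in merged
            )
            if not is_duplicate:
                merged.append(insight)
    return merged
```

Character-level similarity only catches near-verbatim repeats; the same loop would work with an embedding-based similarity in place of `SequenceMatcher`.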
- Rough Paper Retrieval Agent
  - Fetches the latest academic papers (e.g., published in the past month) using the refined keywords.
  - Includes a scoring mechanism for relevance, based on metrics such as:
    - Download count or citation count (for the latest papers, possibly download count only)
    - Abstract similarity
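A relevance score of this kind could blend a popularity signal with abstract similarity. The sketch below is one possible formulation under stated assumptions: similarity is a bag-of-words cosine, popularity (downloads or citations) is log-scaled against the largest value in the batch, and the 0.7 weight is arbitrary. None of these choices are confirmed by the repository.

```python
import math
from collections import Counter


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity between whitespace-tokenized bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def relevance_score(abstract: str, query: str, popularity: int,
                    max_popularity: int, w_sim: float = 0.7) -> float:
    """Weighted blend of abstract similarity and log-scaled popularity."""
    pop = math.log1p(popularity) / math.log1p(max_popularity) if max_popularity > 0 else 0.0
    return w_sim * cosine_bow(abstract, query) + (1 - w_sim) * pop
```

For very recent papers, `popularity` could be the download count alone, since citation counts are usually still near zero.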
- Granular Paper Retrieval Agent
  - Matches paper embeddings against internal team documents for deeper context alignment.
  - Suggests tiered relevance levels:
    - Highly Relevant
    - Moderately Relevant
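The tiering step can be pictured as thresholding the best cosine similarity between a paper's embedding and the team documents' embeddings. In this sketch the embeddings are plain lists of floats produced elsewhere, and the thresholds 0.8 and 0.5 are illustrative assumptions rather than the project's settings.

```python
import math
from typing import Optional


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tier(paper_vec: list[float], doc_vecs: list[list[float]],
         high: float = 0.8, moderate: float = 0.5) -> Optional[str]:
    """Label a paper by its best match against any internal document embedding."""
    best = max((cosine(paper_vec, d) for d in doc_vecs), default=0.0)
    if best >= high:
        return "Highly Relevant"
    if best >= moderate:
        return "Moderately Relevant"
    return None  # below both thresholds: not surfaced in the report
```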
- Counseling Agent
  - Summarizes how each selected paper could help improve the team's projects.
- Final Output
  - A clear, categorized report that includes:
    - Summarized internal context
    - Highlighted research gaps and improvement areas
    - Refined keywords for literature search
    - Annotated external papers with actionable insights and tiered relevance levels
- Automated Document Analysis: Extracts and summarizes key research points from internal documents.
- Change Detection: Highlights new insights compared to previous document versions.
- Keyword Optimization: Generates refined keywords for precise literature search.
- Relevance Scoring: Assesses external papers based on citation count and content similarity.
- Contextual Paper Filtering: Embedding-based matching for deeper alignment.
- Actionable Insights: Annotates external papers with follow-up recommendations.
- Flexible Scheduling: Runs daily, weekly, or monthly depending on team needs.
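To illustrate the scheduling options, the sketch below computes the next run date for each frequency. How the project actually triggers runs is not described here, so `next_run` is a hypothetical helper; it assumes monthly runs keep the same day of month, clamped to the length of shorter months.

```python
import calendar
from datetime import date, timedelta


def next_run(last_run: date, frequency: str) -> date:
    """Return the next scheduled date for a 'daily', 'weekly' or 'monthly' run."""
    if frequency == "daily":
        return last_run + timedelta(days=1)
    if frequency == "weekly":
        return last_run + timedelta(weeks=1)
    if frequency == "monthly":
        year = last_run.year + last_run.month // 12   # roll over after December
        month = last_run.month % 12 + 1
        day = min(last_run.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)
    raise ValueError(f"unknown frequency: {frequency!r}")
```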
- Create a Virtual Environment: Run the following command to create a virtual environment named `venv`: `python3 -m venv venv`
- Activate the Virtual Environment: After creating the virtual environment, activate it using the following command:
  - On macOS and Linux: `source venv/bin/activate`
  - On Windows: `.\venv\Scripts\activate`
- Install the Required Packages: Once the virtual environment is activated, install the required packages using the `requirements.txt` file: `pip install -r requirements.txt`
- Set Up Centralized Folder: Upload team documents to the `data/team_docs/` directory for analysis.
- Configure API Keys:
  - For paper retrieval, integrate APIs such as Semantic Scholar, arXiv, or other academic databases.
  - Add your OpenAI API key to a `.env` file in the project directory. The `.env` file should include the following: `OPENAI_API_KEY=your_openai_api_key_here`
  - Ensure that the system reads this key during execution.
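To check that the key is readable, something like the following standard-library sketch can help. It is not the project's own loading code: `load_env_file` and `get_openai_key` are hypothetical helpers, and the parser only handles simple `KEY=value` lines. Projects often use the `python-dotenv` package for this instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear error is usually preferable to discovering a missing key partway through an analysis run.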
- Run the Main Application: To run the main application with a UI, use the following command: `python app.py`

  After running, access the application in your browser at: http://127.0.0.1:5000.
  - Upload your files via the UI.
  - Click on 'Start Analysis' to begin the analysis.
  - Once the analysis is complete, download the results directly from the UI.
  - Use the chatbot feature to ask questions about internal or external papers.
- Run Offline Analysis: To analyze team documents offline without the UI, run: `python scripts/analyze_team_docs.py`

  The system will:
  - Summarize internal documents.
  - Retrieve relevant papers.
  - Generate research insights and feedback.

  Results will be saved in the `output/` folder.
The system generates a structured report including:
- Summarized Internal Context
- Highlighted Research Gaps & Opportunities
- Optimized Keywords for Literature Search
- Annotated External Papers
  - Tiered relevance levels
  - Follow-up actions and collaboration recommendations
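The report layout above maps naturally onto a small data structure. The sketch below shows one hypothetical shape for it; the class names, field names, and plain-text rendering are assumptions made for illustration and may not match the files the system actually writes.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedPaper:
    title: str
    tier: str        # e.g. "Highly Relevant"
    follow_up: str   # suggested action or collaboration note


@dataclass
class Report:
    context_summary: str
    gaps: list[str]
    keywords: list[str]
    papers: list[AnnotatedPaper] = field(default_factory=list)


def render_report(report: Report) -> str:
    """Render the four report sections as plain text, one section after another."""
    lines = ["Summarized Internal Context", report.context_summary, ""]
    lines += ["Highlighted Research Gaps & Opportunities"]
    lines += [f"- {gap}" for gap in report.gaps] + [""]
    lines += ["Optimized Keywords for Literature Search", ", ".join(report.keywords), ""]
    lines.append("Annotated External Papers")
    for paper in report.papers:
        lines.append(f"- {paper.title} [{paper.tier}]: {paper.follow_up}")
    return "\n".join(lines)
```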
- Enable multimodal data ingestion to analyze various types of data, including:
  - Text documents (current implementation)
  - Images and diagrams (e.g., flowcharts, annotated graphs)
  - Audio and video content (e.g., meeting recordings, presentations)
- Generate insights that combine analysis across multiple data modalities for richer understanding.
- Scale search to multiple academic repositories and websites:
  - Add support for databases such as IEEE Xplore, Springer, PubMed, and others.
  - Implement aggregation mechanisms to prioritize search results across platforms.
- Introduce adaptive search strategies to optimize queries dynamically based on project needs.
- Enhance recommendation capabilities by:
  - Suggesting proactive collaboration opportunities with authors of relevant external papers.
  - Incorporating trend analysis to predict upcoming research directions and topics based on retrieved papers.
  - Providing visualizations of research gaps, collaboration pathways, and key insights for easier interpretation.
We welcome contributions! Please open an issue to suggest improvements or submit a pull request.
This project is licensed under the MIT License.
For questions or collaborations, reach out via email: [email protected], [email protected].
Happy researching! 🚀