Sesamo -- Virtual Scholar Agent System

This repository contains a multi-agent workflow designed to function as a virtual scholar agent. The system automates summarizing team documents, analyzing research opportunities, retrieving relevant academic papers, and generating actionable insights for research teams.

Overview

The Virtual Scholar Agent System operates on a daily, weekly, or monthly basis to streamline knowledge synthesis and research strategy for teams.

Key Workflow Steps

Centralized Upload Step
- Internal team documents are uploaded to a centralized folder.
- Documents must be well-organized and tagged with metadata (e.g., project name, date).
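As a rough illustration of the upload step, the sketch below discovers documents and their tags in a folder. The README does not say how metadata is stored, so the sidecar convention (`<document>.meta.json`), the `DocumentRecord` class, and the field names `project` and `date` are assumptions made for this example only.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DocumentRecord:
    """One uploaded team document plus its tags (hypothetical schema)."""
    path: Path
    project: str
    date: str  # ISO date string, e.g. "2024-11-02"


def load_documents(folder: Path) -> list[DocumentRecord]:
    """Collect documents that have a JSON sidecar carrying the required tags."""
    records = []
    for meta_path in sorted(folder.glob("*.meta.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        doc_path = folder / meta_path.name.removesuffix(".meta.json")
        # Skip entries whose document is absent or whose tags are incomplete.
        if not doc_path.exists() or not {"project", "date"} <= set(meta):
            continue
        records.append(DocumentRecord(doc_path, meta["project"], meta["date"]))
    return records
```

Calling `load_documents(Path("data/team_docs"))` would then return only the documents that are tagged, which makes untagged uploads easy to spot.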
Project Context Agent
- Summarizes internal team documents under predefined categories:
  - Research Framework & Key Concepts
  - Critical Analysis & Improvement Areas
  - Research Gaps & Extension Opportunities
  - Literature Review Strategy: Keywords for paper search
- Highlights new information compared to the documents from the previous run (for example, the previous week on a weekly schedule).
- Generates a set of optimized, refined keywords for paper retrieval.
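The change-highlighting idea can be sketched with a plain sentence comparison. This is a simplified stand-in, not the agent's actual method (which the README does not describe): it flags sentences in the current text that have no exact match, after lowercasing and whitespace normalization, in the previous text. The function names are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not count as new."""
    return " ".join(sentence.lower().split())


def new_sentences(previous: str, current: str) -> list[str]:
    """Return sentences in `current` with no normalized match in `previous`."""
    seen = {normalize(s) for s in split_sentences(previous)}
    return [s for s in split_sentences(current) if normalize(s) not in seen]
```

Exact matching misses paraphrases, so a real implementation would likely need a semantic comparison; the sketch only shows where such a check would sit.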
Summary Project Context Agent
- Aggregates individual summaries while avoiding redundant insights.
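One simple way to avoid redundant insights when aggregating is to drop near-duplicates by token overlap. The sketch below uses Jaccard similarity over lowercase word sets; the `0.8` threshold and all function names are assumptions for illustration, and the README does not state which technique the agent uses.

```python
def token_set(text: str) -> set[str]:
    """Lowercase word set used as a cheap fingerprint of an insight."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_insights(summaries: list[list[str]], threshold: float = 0.8) -> list[str]:
    """Concatenate per-document insights, skipping near-duplicates of kept ones."""
    kept: list[str] = []
    kept_tokens: list[set[str]] = []
    for summary in summaries:
        for insight in summary:
            tokens = token_set(insight)
            # An insight is redundant if it overlaps heavily with one already kept.
            if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
                continue
            kept.append(insight)
            kept_tokens.append(tokens)
    return kept
```

The first occurrence of each insight wins, so the order of the input summaries determines which wording survives.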
Rough Paper Retrieval Agent
- Fetches the latest academic papers (e.g., published in the past month) using the refined keywords.
- Includes a scoring mechanism for relevance, based on metrics such as:
  - Download count or citation count (for very recent papers, download count alone may be more useful, since citations take time to accumulate)
  - Abstract similarity
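The scoring mechanism above could be sketched as a weighted mix of abstract similarity and popularity. Everything specific here is an assumption: the bag-of-words cosine as the similarity measure, the log scaling of counts, the `0.7` weight, and the function names. The README names the metrics but not how they are combined.

```python
import math
from collections import Counter


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term counts of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def popularity(downloads: int, citations: int, recent: bool) -> float:
    """Log-scaled popularity squashed into [0, 1); recent papers ignore citations."""
    raw = downloads if recent else downloads + citations
    scaled = math.log1p(raw)
    return scaled / (1.0 + scaled)


def relevance_score(abstract: str, query: str, downloads: int, citations: int,
                    recent: bool, weight: float = 0.7) -> float:
    """Weighted mix of similarity and popularity (the weight is an assumed default)."""
    return weight * cosine_bow(abstract, query) + (1.0 - weight) * popularity(downloads, citations, recent)
```

Here `query` would be the refined keywords joined into one string. Log scaling keeps a handful of heavily downloaded papers from dominating the ranking.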
Granular Paper Retrieval Agent
- Matches paper embeddings against internal team documents for deeper context alignment.
- Suggests tiered relevance levels:
  - Highly Relevant
  - Moderately Relevant
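A minimal sketch of the embedding match and tier assignment, assuming embeddings are already available as plain lists of floats. The thresholds `0.75` and `0.5`, the choice of the best match across team documents, and the function names are hypothetical; the README does not specify the embedding model or the cut-offs.

```python
import math
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def relevance_tier(paper_vec: Sequence[float],
                   doc_vecs: Sequence[Sequence[float]],
                   high: float = 0.75,
                   moderate: float = 0.5) -> Optional[str]:
    """Tier a paper by its best similarity to any team document embedding."""
    best = max((cosine(paper_vec, d) for d in doc_vecs), default=0.0)
    if best >= high:
        return "Highly Relevant"
    if best >= moderate:
        return "Moderately Relevant"
    return None  # below both assumed thresholds: not surfaced in the report
```

Using the best match rather than the average means a paper closely aligned with a single team document still surfaces, even if it is unrelated to the rest.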
Counseling Agent
- Summarizes how each selected paper could help improve the team's projects.
Final Output
- A clear, categorized report that includes:
  - Summarized internal context
  - Highlighted research gaps and improvement areas
  - Refined keywords for literature search
  - Annotated external papers with actionable insights and tiered relevance levels

Features

Automated Document Analysis: Extracts and summarizes key research points from internal documents.
Change Detection: Highlights new insights compared to previous document versions.
Keyword Optimization: Generates refined keywords for precise literature search.
Relevance Scoring: Scores external papers by download or citation count and by abstract similarity.
Contextual Paper Filtering: Embedding-based matching for deeper alignment.
Actionable Insights: Annotates external papers with follow-up recommendations.
Flexible Scheduling: Runs daily, weekly, or monthly depending on team needs.

Workflow Diagram

Installation

Create a Virtual Environment: Run the following command to create a virtual environment named venv:
```
python3 -m venv venv
```
Activate the Virtual Environment: After creating the virtual environment, activate it using the following command:

On macOS and Linux:
```
source venv/bin/activate
```
On Windows:
```
.\venv\Scripts\activate
```
Install the Required Packages: Once the virtual environment is activated, install the required packages using the requirements.txt file:
```
pip install -r requirements.txt
```
Set Up Centralized Folder: Upload team documents to the data/team_docs/ directory for analysis.
Configure API Keys:
- For paper retrieval, integrate APIs such as Semantic Scholar, arXiv, or other academic databases.
- Add your OpenAI API Key to a .env file in the project directory. The .env file should include the following:
```
OPENAI_API_KEY=your_openai_api_key_here
```
- Make sure the application loads this key from the .env file (or from the process environment) when it starts.
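The README does not say how the key is loaded, so the following is only one possible approach, using nothing beyond the standard library. The helper names `load_env_file` and `get_openai_key` are hypothetical, and the parser handles only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_openai_key(env_path: Path = Path(".env")) -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and env_path.exists():
        key = load_env_file(env_path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early with a clear error is usually friendlier than letting the first API call fail midway through an analysis run.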

Usage

Run the Main Application: To run the main application with a UI, use the following command:
```
python app.py
```
After running, access the application in your browser at: http://127.0.0.1:5000.
- Upload your files via the UI.
- Click on 'Start Analysis' to begin the analysis.
- Once the analysis is complete, download the results directly from the UI.
- Use the chatbot feature to ask questions about internal or external papers.
Run Offline Analysis: To analyze team documents offline without the UI, run:
```
python scripts/analyze_team_docs.py
```

The system will:
- Summarize internal documents.
- Retrieve relevant papers.
- Generate research insights and feedback.

Results will be saved in the output/ folder.

Outputs

The system generates a structured report including:

Summarized Internal Context
Highlighted Research Gaps & Opportunities
Optimized Keywords for Literature Search
Annotated External Papers
- Tiered relevance levels
- Follow-up actions and collaboration recommendations
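To make the report structure concrete, here is a small sketch that models the four sections as data and renders them as plain text. The class and field names (`Report`, `AnnotatedPaper`, `follow_up`, and so on) are assumptions for illustration and do not reflect the actual output format written to the output/ folder.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedPaper:
    title: str
    tier: str        # "Highly Relevant" or "Moderately Relevant"
    follow_up: str   # suggested follow-up action or collaboration note


@dataclass
class Report:
    internal_context: str
    research_gaps: list[str]
    keywords: list[str]
    papers: list[AnnotatedPaper] = field(default_factory=list)


def render_report(report: Report) -> str:
    """Render the report sections in order, listing papers by tier."""
    lines = ["Summarized Internal Context", report.internal_context, ""]
    lines += ["Highlighted Research Gaps & Opportunities"]
    lines += [f"- {gap}" for gap in report.research_gaps]
    lines += ["", "Optimized Keywords for Literature Search", ", ".join(report.keywords), ""]
    lines += ["Annotated External Papers"]
    for tier in ("Highly Relevant", "Moderately Relevant"):
        for paper in report.papers:
            if paper.tier == tier:
                lines.append(f"- [{tier}] {paper.title}: {paper.follow_up}")
    return "\n".join(lines)
```

Grouping papers by tier in the renderer keeps the most relevant items at the top regardless of the order in which they were retrieved.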

Future Enhancements

Internal Artifacts Review

Enable multimodal data ingestion to analyze various types of data, including:
- Text documents (current implementation)
- Images and diagrams (e.g., flowcharts, annotated graphs)
- Audio and video content (e.g., meeting recordings, presentations)
Generate insights that combine analysis across multiple data modalities for richer understanding.

External Paper Retrieval

Scale search to multiple academic repositories and websites:
- Add support for databases such as IEEE Xplore, Springer, PubMed, and others.
- Implement aggregation mechanisms to prioritize search results across platforms.
Introduce adaptive search strategies to optimize queries dynamically based on project needs.

Project Improvement Recommendation

Enhance recommendation capabilities by:
- Suggesting proactive collaboration opportunities with authors of relevant external papers.
- Incorporating trend analysis to predict upcoming research directions and topics based on retrieved papers.
- Providing visualizations of research gaps, collaboration pathways, and key insights for easier interpretation.

Contributions

We welcome contributions! Please open an issue to suggest improvements or submit a pull request.

License

This project is licensed under the MIT License.

Contact

For questions or collaborations, reach out via email: [email protected], [email protected].

Happy researching! 🚀
