An automated document analyzer for Paperless-ngx that uses the OpenAI API and Ollama (Mistral, Llama, Phi 3, Gemma 2) to automatically analyze and tag your documents.
It features an automatic mode, a manual mode, support for both Ollama and OpenAI, a chat function for querying your documents with AI, and a modern, intuitive web interface.
paperless-ai makes changes to the documents in your production Paperless-ngx instance that cannot be easily undone. Configure it carefully and think twice. Please test the results in a separate development environment first, and be sure to back up your documents and metadata beforehand.
Thank you for all your support, bug reports, and feature requests!
If you upgrade from 1.x to 2.1.x or later:
- You must now set up a user, because the web app requires authentication. I know many of you only use it in secured and encapsulated networks and don't care about authentication, but I think we all do well to secure our data as much as possible.
- You have to set the username that the API token belongs to. Many bugs and issues were traced back to documents having different owners or access rights than the API key could provide.
- Thanks for listening, love ya!
- Automatic Scanning: Identifies and processes new documents within Paperless-ngx.
- AI-Powered Analysis: Leverages OpenAI API and Ollama (Mistral, Llama, Phi 3, Gemma 2) for precise document analysis.
- Metadata Assignment: Automatically assigns titles, tags, and correspondent details.
- Predefined Processing Rules: Specify which documents to process based on existing tags. (Optional)
- Selective Tag Assignment: Use only selected tags for processing. (Disables the prompt dialog)
- Custom Tagging: Assign a specific tag (of your choice) to AI-processed documents for easy identification.
- AI-Assisted Analysis: Manually analyze documents with AI support in a modern web interface. (Accessible via the /manual endpoint)
- Document Querying: Ask questions about your documents and receive accurate, AI-generated answers.
- Streamlined Configuration: Easy-to-use setup interface available at /setup.
- Dashboard Overview: A clean and organized dashboard for monitoring and managing document processing.
- Error Handling: Automatic restarts and health monitoring for improved stability.
- Health Checks: Ensures system integrity and alerts on potential issues.
- Docker Integration: Full Docker support, including health checks, resource management, and persistent data storage.
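
The tag-based processing rule from the feature list boils down to a filter over documents. The following is a minimal sketch of that idea, not paperless-ai's actual code: the function name `select_for_processing`, the dict layout of a document, and the choice that one matching tag is enough are all assumptions made for illustration.

```python
def select_for_processing(documents, required_tags):
    """Return the documents that carry at least one of the required tags.

    An empty `required_tags` means no rule is configured, so every
    document qualifies (the rule is optional).
    """
    required = set(required_tags)
    if not required:
        return list(documents)
    return [doc for doc in documents if required & set(doc.get("tags", []))]


# Hypothetical documents; the "id" and "tags" fields are illustrative only.
docs = [
    {"id": 1, "tags": ["inbox"]},
    {"id": 2, "tags": ["archived"]},
    {"id": 3, "tags": []},
]
selected = select_for_processing(docs, ["inbox"])
print([doc["id"] for doc in selected])
```

With the rule set to `["inbox"]`, only the first document is selected; with an empty rule, all three pass through.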
- Docker and Docker Compose
- Access to a Paperless-ngx installation
- OpenAI API key or your own Ollama instance with your chosen model running and reachable.
- Basic understanding of cron syntax (for scan interval configuration)
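
As a quick refresher on cron syntax, the sketch below splits an expression into its fields. It assumes the standard five-field format (minute, hour, day of month, month, day of week); whether paperless-ai also accepts extensions such as a seconds field is not covered here, and the helper `describe_cron` is purely illustrative.

```python
CRON_FIELDS = ("minute", "hour", "day of month", "month", "day of week")


def describe_cron(expression):
    """Split a five-field cron expression into a field-name -> value mapping.

    This only checks the number of fields; it does not validate the
    values inside each field.
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "*/30 * * * *" runs every 30 minutes in standard cron.
print(describe_cron("*/30 * * * *"))
```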
For installation instructions, visit the project Wiki.
- Document Discovery
  - Periodically scans Paperless-ngx for new documents
  - Tracks processed documents in a local SQLite database
- AI Analysis
  - Sends document content to OpenAI API or Ollama for analysis
  - Extracts relevant tags and correspondent information
  - Uses GPT-4o-mini or your custom Ollama model for accurate document understanding
- Automatic Organization
  - Creates new tags if they don't exist
  - Creates new correspondents if they don't exist
  - Updates documents with analyzed information
  - Marks documents as processed to avoid duplicate analysis
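
The "track processed documents, skip them on the next scan" step above can be sketched with SQLite as follows. This illustrates the pattern only: the table name `processed_documents`, its single column, and the helper functions are hypothetical and do not reflect the schema paperless-ai actually uses.

```python
import sqlite3


def open_tracker(path=":memory:"):
    """Open (or create) the tracking database; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_documents ("
        "document_id INTEGER PRIMARY KEY)"
    )
    return conn


def unprocessed(conn, document_ids):
    """Return the ids that have not been marked as processed yet."""
    rows = conn.execute("SELECT document_id FROM processed_documents")
    seen = {row[0] for row in rows}
    return [doc_id for doc_id in document_ids if doc_id not in seen]


def mark_processed(conn, document_id):
    """Record a document as processed; repeating the call is harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO processed_documents (document_id) VALUES (?)",
        (document_id,),
    )
    conn.commit()


conn = open_tracker()
mark_processed(conn, 1)
print(unprocessed(conn, [1, 2, 3]))  # ids still waiting for analysis
```

Using the document id as primary key together with `INSERT OR IGNORE` keeps the marking step idempotent, so a document that is seen twice is never recorded or analyzed twice.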
You can now analyze your documents manually with the help of AI in a beautiful web interface, reachable via the /manual endpoint.
The application is configured through the web interface on the /setup route.
You don't need to (and can't) set the environment variables through Docker.
The application comes with full Docker support:
- Automatic container restart on failure
- Health monitoring
- Volume persistence for database
- Resource management
- Graceful shutdown handling
# Start the container
docker-compose up -d
# View logs
docker-compose logs -f
# Restart container
docker-compose restart
# Stop container
docker-compose down
# Rebuild and start
docker-compose up -d --build
The application provides a health check endpoint at /health that returns one of the following:
# Healthy system
{
  "status": "healthy"
}
# System not configured
{
  "status": "not_configured",
  "message": "Application setup not completed"
}
# Database error
{
  "status": "database_error",
  "message": "Database check failed"
}
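
A monitoring script could interpret these responses along the following lines. This is a minimal sketch that assumes the body is exactly one of the JSON objects shown above; the function name `interpret_health` is illustrative, and fetching the body over HTTP is left out.

```python
import json


def interpret_health(body):
    """Map a /health response body to (is_healthy, human-readable detail)."""
    payload = json.loads(body)
    status = payload.get("status", "unknown")
    if status == "healthy":
        return True, "healthy"
    # Non-healthy responses carry an explanatory "message" field.
    message = payload.get("message", "no message provided")
    return False, f"{status}: {message}"


print(interpret_health('{"status": "healthy"}'))
print(interpret_health(
    '{"status": "not_configured", "message": "Application setup not completed"}'
))
```

Treating every status other than "healthy" as a failure keeps the check conservative if further status values are added later.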
The application includes a debug interface, accessible via /debug, that helps administrators monitor and troubleshoot the system's data:
- View all system tags
- Inspect processed documents
- Review correspondent information
- Navigate to: http://your-instance:3000/debug
- The interface provides:
  - Interactive dropdown to select data category
  - Tree view visualization of JSON responses
  - Color-coded data representation
  - Collapsible/expandable data nodes
Endpoint | Description |
---|---|
/debug/tags | Lists all tags in the system |
/debug/documents | Shows processed document information |
/debug/correspondents | Displays correspondent data |
The debug interface also integrates with the health check system, showing a configuration warning if the system is not properly set up.
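
For scripted access, the debug endpoints listed in the table can be addressed by building their URLs from the instance's base address. The sketch below only constructs the URLs; the helper `debug_url` and the category keys are illustrative, and the base URL is the placeholder used above.

```python
from urllib.parse import urljoin

# Paths taken from the endpoint table; the short keys are illustrative.
DEBUG_ENDPOINTS = {
    "tags": "/debug/tags",
    "documents": "/debug/documents",
    "correspondents": "/debug/correspondents",
}


def debug_url(base_url, category):
    """Build the full URL of a debug endpoint for the given data category."""
    try:
        path = DEBUG_ENDPOINTS[category]
    except KeyError:
        raise ValueError(f"unknown debug category: {category!r}") from None
    return urljoin(base_url, path)


print(debug_url("http://your-instance:3000", "tags"))
```

Actually requesting these URLs is left out on purpose: since the web app requires authentication, a real client would likely need to log in first, and how that works is not described here.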
To run the application locally without Docker:
- Install dependencies: npm install
- Start the development server: npm run test
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Store API keys securely
- Restrict container access
- Monitor API usage
- Regularly update dependencies
- Back up your database
This project is licensed under the MIT License - see the LICENSE file for details.
- Paperless-ngx for the amazing document management system
- OpenAI API
- The Express.js and Node.js communities for their excellent tools
If you encounter any issues or have questions:
- Check the Issues section
- Create a new issue if yours isn't already listed
- Provide detailed information about your setup and the problem
- Support for custom AI models
- Support for multiple language analysis
- Advanced tag matching algorithms
- Custom rules for document processing
- Enhanced web interface with statistics