Feather is a lightweight framework for statistical testing and validation of LLM outputs and behaviors. With Feather, you can implement comprehensive test suites, automated evaluations, and behavioral checks to ensure your AI applications perform reliably and align with specified requirements.
- 📊 Statistical Testing: Comprehensive testing suite for model behavior validation
- ✍️ Evaluations: Quantitative and qualitative metrics to measure model performance
- 🛡️ Validations: Simple safety checks and output validation
- Grab your API key here: app.pegasi.ai
- Quickstart Evals notebook:
- DeepSeek-R1 on FinQA notebook:
- Establish AI validators
- Set up out-of-the-box Judges
- Add distribution-based testing
- Expand statistical validation tools
- Improve test results visualization
- Enable custom test case creation
- Add community-driven test suites
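
To make the idea of statistically testing LLM outputs concrete, here is a minimal sketch in plain Python. It does not use Feather's API, which is not shown in this README; the helper names (`binomial_lower_tail`, `pass_rate_test`, `is_valid_json`) and the sample outputs are hypothetical and exist only for illustration. The sketch runs a validator over a batch of outputs and applies a one-sided exact binomial test to ask whether the observed pass rate is credibly below a required rate.

```python
import json
import math
from typing import Callable, Sequence


def binomial_lower_tail(k: int, n: int, p0: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p0), the one-sided exact p-value."""
    return sum(
        math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1)
    )


def pass_rate_test(
    outputs: Sequence[str],
    check: Callable[[str], bool],
    required_rate: float,
    alpha: float = 0.05,
) -> dict:
    """Test H0: true pass rate >= required_rate against H1: it is lower.

    Hypothetical helper for illustration; not part of Feather.
    """
    n = len(outputs)
    if n == 0:
        raise ValueError("need at least one output to test")
    passes = sum(1 for output in outputs if check(output))
    p_value = binomial_lower_tail(passes, n, required_rate)
    return {
        "n": n,
        "passes": passes,
        "observed_rate": passes / n,
        "p_value": p_value,
        # Reject H0 when so few passes would be unlikely at the required rate.
        "reject": p_value < alpha,
    }


def is_valid_json(text: str) -> bool:
    """Example validator (an assumption): the output must parse as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Made-up model outputs, half of which are valid JSON.
sample_outputs = ['{"answer": 42}', "not json"] * 5
result = pass_rate_test(sample_outputs, is_valid_json, required_rate=0.9)
print(result)
```

An exact binomial test is used here because batches of model outputs are often small, where a normal approximation to the pass rate can be unreliable. Treat the 0.9 required rate and 0.05 significance level as placeholders to tune for your own application.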