A simple, opinionated benchmark for testing the viability of Efficient Language Models (ELMs) for personal use cases.
Uses bun, promptfoo, and ollama for a minimalist, cross-platform, local LLM prompt testing & benchmarking experience.
- Install Bun
- Install Ollama
- Install llama3

  ```sh
  ollama run llama3
  ```

- Install phi3

  ```sh
  ollama run phi3
  ```

- Install gemma

  ```sh
  ollama run gemma
  ```
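`ollama run <model>` downloads the model on first use and drops into an interactive chat. If you only want to fetch the weights, `ollama pull` downloads without starting a session, and `ollama list` confirms what is installed:

```sh
ollama pull llama3   # download the model without opening a chat session
ollama list          # verify llama3, phi3, and gemma are available locally
```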
- Set up the .env variables

  ```sh
  cp .env.sample .env
  ```

- Add your OpenAI API key to the .env file
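A minimal sketch of the resulting file, assuming the variable is named `OPENAI_API_KEY` (the name promptfoo's OpenAI provider reads); check `.env.sample` for the exact keys this repo expects:

```sh
# .env -- assumes OPENAI_API_KEY, the variable promptfoo's OpenAI provider reads;
# check .env.sample for the exact names this repo expects.
OPENAI_API_KEY=sk-...
```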
- Install dependencies:

  ```sh
  bun i
  ```

- Run the minimal tests:

  ```sh
  bun minimal
  ```

- Open the test viewer:

  ```sh
  bun view
  ```

- Run the ELM-ITV tests:

  ```sh
  bun elm
  ```
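Each of these `bun` commands is an alias for a script in the repo's package.json. A rough sketch of the mapping, assuming each alias runs promptfoo against the matching suite's config (the actual commands live in package.json):

```jsonc
// package.json (sketch): script names match the commands above; the exact
// promptfoo invocations are assumptions, so check the repo's actual scripts.
{
  "scripts": {
    "minimal": "promptfoo eval -c BENCH__minimal_test_suite/promptfooconfig.yaml",
    "view": "promptfoo view",
    "elm": "promptfoo eval -c BENCH__efficient_language_models/promptfooconfig.yaml"
  }
}
```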
- First, watch the video where we walk through ELMs and this codebase.
- To get started, take a look at `BENCH__minimal_test_suite/` to get an idea of how to structure a basic test suite.
- Next, take a look at the `BENCH__efficient_language_models/` test suite to get an idea of how you can set up viability tests for ELMs in your own use cases.
- Explore other ollama-based models you can test
- Or OpenAI models
- Or Anthropic models
- Or Groq models
- Modify `BENCH__minimal_test_suite/` or `BENCH__efficient_language_models/` to suit your needs
- Create a new test with the "Create a new test suite" script (see the scripts section below)
Every test suite follows the same structure:

- `/BENCH__<name of test suite>`
  - `/prompt.txt` - the prompt(s) to test
  - `/test.yaml` - variables and assertions
  - `/promptfooconfig.yaml` - llm model config
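To make the layout concrete, here is a minimal sketch of how the files fit together, following promptfoo's documented config shape. The provider IDs, variable names, and assertions are illustrative, not this repo's actual contents:

```yaml
# promptfooconfig.yaml (sketch): wires the prompt, models, and test cases together.
prompts:
  - file://prompt.txt # assumes prompt.txt references a {{question}} variable
providers:
  - ollama:chat:llama3 # local models via the promptfoo Ollama provider
  - ollama:chat:phi3
  # - openai:gpt-4o-mini # cloud providers work too (uses the API key from .env)
tests: file://test.yaml
```

```yaml
# test.yaml (sketch): each entry is a test case with variables and assertions.
- vars:
    question: What is the capital of France?
  assert:
    - type: contains
      value: Paris
```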
- Create a new test suite:

  ```sh
  bun run ./scripts/new_prompt_test
  ```

- Run a test prompt against a running ollama server:

  ```sh
  bun run ./scripts/ollama_local_model_call
  ```
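Under the hood the script talks to a local Ollama server; the equivalent raw request (default port 11434, documented in the Ollama api.md docs linked below) looks like this, with an illustrative model and prompt:

```sh
# Generate a completion from a locally running Ollama server.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```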
- Ollama model library
- LMSYS Chatbot Arena Leaderboard
- Ollama api.md docs
- Promptfoo Ollama Provider
- Promptfoo LLM Providers
- Promptfoo Assertions