A simple, opinionated benchmark for testing the viability of Efficient Language Models (ELMs) for personal use cases.
Uses bun, promptfoo, and ollama for a minimalist, cross-platform, local LLM prompt testing & benchmarking experience.
- Install Bun
- Install Ollama
- Install llama3

  ```sh
  ollama run llama3
  ```

- Install phi3

  ```sh
  ollama run phi3
  ```

- Install gemma

  ```sh
  ollama run gemma
  ```
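`ollama run <model>` downloads the model on first use and drops into an interactive chat. If you only want to fetch the weights, `ollama pull` downloads without starting a session, and `ollama list` confirms what is installed:

```sh
ollama pull llama3   # download the model without opening a chat session
ollama list          # verify llama3, phi3, and gemma are available locally
```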
- Set up the .env variables

  ```sh
  cp .env.sample .env
  ```

- Add your OpenAI API key to the .env file
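A minimal sketch of the resulting file, assuming the variable is named `OPENAI_API_KEY` (the name promptfoo's OpenAI provider reads); check `.env.sample` for the exact keys this repo expects:

```sh
# .env -- assumes OPENAI_API_KEY, the variable promptfoo's OpenAI provider reads;
# check .env.sample for the exact names this repo expects.
OPENAI_API_KEY=sk-...
```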
- Install dependencies:

  ```sh
  bun i
  ```

- Run the minimal tests:

  ```sh
  bun minimal
  ```

- Open the test viewer:

  ```sh
  bun view
  ```

- Run the ELM-ITV tests:

  ```sh
  bun elm
  ```
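Each of these `bun` commands is an alias for a script in the repo's package.json. A rough sketch of the mapping, assuming each alias runs promptfoo against the matching suite's config (the actual commands live in package.json):

```jsonc
// package.json (sketch): script names match the commands above; the exact
// promptfoo invocations are assumptions, so check the repo's actual scripts.
{
  "scripts": {
    "minimal": "promptfoo eval -c BENCH__minimal_test_suite/promptfooconfig.yaml",
    "view": "promptfoo view",
    "elm": "promptfoo eval -c BENCH__efficient_language_models/promptfooconfig.yaml"
  }
}
```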
- First, watch the video where we walk through ELMs and this codebase.
- To get started, take a look at `BENCH__minimal_test_suite/` to get an idea of how to structure a basic test suite.
- Next, take a look at the `BENCH__efficient_language_models/` test suite to get an idea of how you can set up viability tests for ELMs in your own use cases.
- Explore other ollama-based models you can test
- Or OpenAI models
- Or Anthropic models
- Or Groq models
- Modify `BENCH__minimal_test_suite/` or `BENCH__efficient_language_models/` to suit your needs
- Create a new test with the "Create a new test suite" script (see the scripts section below)
Every test suite follows the same structure:

- `/BENCH__<name of test suite>`
  - `/prompt.txt` - the prompt(s) to test
  - `/test.yaml` - variables and assertions
  - `/promptfooconfig.yaml` - llm model config
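To make the layout concrete, here is a minimal sketch of how the files fit together, following promptfoo's documented config shape. The provider IDs, variable names, and assertions are illustrative, not this repo's actual contents:

```yaml
# promptfooconfig.yaml (sketch): wires the prompt, models, and test cases together.
prompts:
  - file://prompt.txt # assumes prompt.txt references a {{question}} variable
providers:
  - ollama:chat:llama3 # local models via the promptfoo Ollama provider
  - ollama:chat:phi3
  # - openai:gpt-4o-mini # cloud providers work too (uses the API key from .env)
tests: file://test.yaml
```

```yaml
# test.yaml (sketch): each entry is a test case with variables and assertions.
- vars:
    question: What is the capital of France?
  assert:
    - type: contains
      value: Paris
```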
- Create a new test suite:

  ```sh
  bun run ./scripts/new_prompt_test
  ```

- Run a test prompt against a running ollama server:

  ```sh
  bun run ./scripts/ollama_local_model_call
  ```
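Under the hood the script talks to a local Ollama server; the equivalent raw request (default port 11434, documented in the Ollama api.md docs linked below) looks like this, with an illustrative model and prompt:

```sh
# Generate a completion from a locally running Ollama server.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```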
- Ollama model library
- LMSYS Chatbot Arena Leaderboard
- Ollama api.md docs
- Promptfoo Ollama Provider
- Promptfoo LLM Providers
- Promptfoo Assertions