GitHub - softwaredoug/local-llm-judge: Local LLM as a search relevance judge

Local LLM Search Relevance Judge

(Runs only on Apple Silicon, using MLX)

Using the WANDS dataset, this project uses a local LLM (Qwen 2.5) to evaluate pairwise search relevance preference.

The LLM strategies here attempt to recover the pairwise relevance preference of the WANDS human labelers. See this blog post
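To make "recovering the pairwise preference" concrete, here is a minimal sketch of how agreement with the human labels could be scored. It is not taken from this repo: the label names (`Exact`, `Partial`, `Irrelevant`), the `LHS`/`RHS`/`Neither` answer format, and the function names are assumptions for illustration.

```python
from typing import Iterable, Optional, Tuple

# Assumed ordering of the human relevance labels (higher = more relevant).
LABEL_GRADE = {"Irrelevant": 0, "Partial": 1, "Exact": 2}


def human_preference(label_lhs: str, label_rhs: str) -> Optional[str]:
    """Return 'LHS' or 'RHS' for the product humans rated higher, None on a tie."""
    lhs, rhs = LABEL_GRADE[label_lhs], LABEL_GRADE[label_rhs]
    if lhs == rhs:
        return None
    return "LHS" if lhs > rhs else "RHS"


def score(pairs: Iterable[Tuple[str, str, str]]) -> Tuple[float, float]:
    """Score LLM choices against human labels.

    Each pair is (label_lhs, label_rhs, llm_choice), where llm_choice is
    'LHS', 'RHS', or 'Neither' (hypothetical answer format).
    Returns (precision, coverage):
      precision = correct choices / pairs where the LLM picked a side
      coverage  = pairs where the LLM picked a side / pairs with a human preference
    """
    total = decided = correct = 0
    for label_lhs, label_rhs, llm_choice in pairs:
        truth = human_preference(label_lhs, label_rhs)
        if truth is None:
            continue  # humans expressed no preference, so skip the pair
        total += 1
        if llm_choice in ("LHS", "RHS"):
            decided += 1
            correct += int(llm_choice == truth)
    precision = correct / decided if decided else 0.0
    coverage = decided / total if total else 0.0
    return precision, coverage
```

Reporting precision and coverage separately lets a strategy abstain with `Neither` on hard pairs without being counted as wrong.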

$ poetry install

Get Qwen from Hugging Face and convert it to MLX format

$ mkdir -p ~/.mlx
$ poetry run mlx_lm.convert --hf-path Qwen/Qwen2.5-7B-Instruct --mlx-path ~/.mlx/Qwen2.5-7B-Instruct/ -q

Run local judge

$ poetry run python -m local_llm_judge.main --verbose --eval-fn name
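The repo's actual prompts are not shown here. As a rough sketch of what an evaluation function that compares products by name might do (assuming that is what `--eval-fn name` selects), the helpers below build a pairwise prompt and parse the model's reply. `build_name_prompt`, `parse_choice`, and the `LHS`/`RHS`/`Neither` answer format are hypothetical.

```python
def build_name_prompt(query: str, name_lhs: str, name_rhs: str) -> str:
    """Build a pairwise relevance prompt from product names only (hypothetical wording)."""
    return (
        "You are judging search relevance for a furniture store.\n"
        f"Search query: {query}\n"
        f"Product LHS name: {name_lhs}\n"
        f"Product RHS name: {name_rhs}\n"
        "Which product is more relevant to the query? "
        "Answer with exactly one of: LHS, RHS, Neither."
    )


def parse_choice(response: str) -> str:
    """Map a free-form model reply to 'LHS', 'RHS', or 'Neither'.

    Returns a side only when exactly one of the two tokens appears;
    anything ambiguous or empty is treated as 'Neither'.
    """
    text = response.upper()
    has_lhs = "LHS" in text
    has_rhs = "RHS" in text
    if has_lhs and not has_rhs:
        return "LHS"
    if has_rhs and not has_lhs:
        return "RHS"
    return "Neither"
```

Treating ambiguous replies as `Neither` keeps the judge conservative: it only counts a preference when the model commits to one side.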

Optionally - Talk to Qwen

$ poetry run python -m local_llm_judge.shell
