Stars

⚖️ Evaluation

10 repositories

A general framework for evaluating the performance of large language models (LLMs) based on a peer-review mechanism among LLMs

Python · 16 stars · 2 forks · Updated Aug 3, 2024

Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"

Python · 937 stars · 64 forks · Updated Sep 25, 2024

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.

Go · 139 stars · 5 forks · Updated Dec 17, 2024

AI Observability & Evaluation

Jupyter Notebook · 4,312 stars · 318 forks · Updated Dec 18, 2024

Doing simple retrieval from LLMs at various context lengths to measure accuracy

Jupyter Notebook · 1,611 stars · 177 forks · Updated Aug 17, 2024

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al., arXiv:2403.16950)

Python · 40 stars · 1 fork · Updated Jul 11, 2024

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

Jupyter Notebook · 903 stars · 55 forks · Updated Dec 16, 2024

Prompt optimization from scratch

Python · 533 stars · 37 forks · Updated Dec 13, 2024

An open-source visual programming environment for battle-testing prompts to LLMs.

TypeScript · 2,417 stars · 189 forks · Updated Dec 16, 2024

Evals for agents

Python · 2 stars · 2 forks · Updated Dec 4, 2024