The official evaluation suite and dynamic data release for MixEval.
benchmark
evaluation
benchmarking-suite
evaluation-framework
benchmarking-framework
foundation-models
large-language-models
large-language-model
llm-inference
llm-evaluation
large-multimodal-models
llm-evaluation-framework
benchmark-mixture
mixeval
Updated Nov 10, 2024 - Python