Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

NeurIPS 2024 Datasets and Benchmarks Track

Mucong Ding* · Chenghao Deng* · Jocelyn Choo · Zichu Wu · Aakriti Agrawal · Avi Schwarzschild · Tianyi Zhou · Tom Goldstein · John Langford · Anima Anandkumar · Furong Huang

[Paper] · [Dataset] · [Project page] · [X (Twitter)]

This repository hosts the codebase for the paper "Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization" (https://arxiv.org/abs/2409.18433) by Mucong Ding*, Chenghao Deng*, Jocelyn Choo, Zichu Wu, Aakriti Agrawal, Avi Schwarzschild, Tianyi Zhou, Tom Goldstein, John Langford, Anima Anandkumar, and Furong Huang.

We are still working on the final version of the evaluation code for Easy2Hard-Bench. See you soon!

Citing

Please cite our work if you find it helpful:

@inproceedings{ding2024easyhardbench,
  title={Easy2Hard-Bench: Standardized Difficulty Labels for Profiling {LLM} Performance and Generalization},
  author={Mucong Ding and Chenghao Deng and Jocelyn Choo and Zichu Wu and Aakriti Agrawal and Avi Schwarzschild and Tianyi Zhou and Tom Goldstein and John Langford and Anima Anandkumar and Furong Huang},
  booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024},
}

umd-huang-lab/Easy2Hard-Bench