forked from stanford-crfm/helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).


yancong222/helm

 
 


Welcome! This repository contains all the assets for the CRFM benchmarking project, which includes the following features:

  • Collection of datasets in a standard format (e.g., NaturalQuestions)
  • Collection of models accessible via a unified API (e.g., GPT-3, MT-NLG, OPT, BLOOM)
  • Collection of metrics beyond accuracy (efficiency, bias, toxicity, etc.)
  • Collection of perturbations for evaluating robustness and fairness (e.g., typos, dialect)
  • Modular framework for constructing prompts from datasets
  • Proxy server for managing accounts and providing a unified interface for accessing models
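
To make two of the features above more concrete (prompt construction and perturbations), here is a minimal, self-contained sketch. It does not use this repository's API: `Example`, `build_prompt`, and `typo_perturbation` are hypothetical names invented for illustration, and the adjacent-letter swap is only a toy stand-in for a real typo perturbation.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical record type: one question/answer pair from a dataset."""
    question: str
    answer: str


def build_prompt(train: list[Example], query: str, instructions: str = "") -> str:
    """Join in-context examples and the query into a single few-shot prompt."""
    blocks = [instructions] if instructions else []
    blocks += [f"Question: {ex.question}\nAnswer: {ex.answer}" for ex in train]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(blocks)


def typo_perturbation(text: str, rate: float, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate` (a toy typo model)."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each letter moves at most once
        else:
            i += 1
    return "".join(chars)
```

One way to use the sketch is to build a prompt from the clean query and another from `typo_perturbation(query, rate=0.1)`, then compare a model's answers on the two as a rough robustness check.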

To read more:


Languages

  • Python 94.3%
  • JavaScript 4.6%
  • Other 1.1%