MultiPL-E is a system for translating unit test-driven neural code generation benchmarks to new languages. We have used MultiPL-E to translate two popular Python benchmarks (HumanEval and MBPP) to 18 other programming languages.
For more information:
- MultiPL-E is part of the BigCode Code Generation LM Harness. This is the easiest way to use MultiPL-E.
- We have a tutorial on how to use MultiPL-E directly.
- Read our paper, *MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation*.
- The MultiPL-E dataset of translated prompts is available on the Hugging Face Hub (see the loading sketch after this list).
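
As a quick illustration, the snippet below sketches how the translated prompts might be loaded with the Hugging Face `datasets` library. The dataset path (`nuprl/MultiPL-E`), the configuration name (`humaneval-lua`), and the field names are assumptions here; check the Hub page for the exact configurations and schema.

```python
from datasets import load_dataset

# Assumed dataset path and configuration name (one configuration per
# benchmark/language pair); verify both on the Hugging Face Hub page.
problems = load_dataset("nuprl/MultiPL-E", "humaneval-lua", split="test")

for problem in problems:
    # Field names below are assumptions: each row is expected to carry the
    # problem name, the prompt translated into the target language, and the
    # unit tests used to check generated completions.
    print(problem["name"])
    print(problem["prompt"])
    print(problem["tests"])
    break
```

A completion sampled from a model for `problem["prompt"]` would then be concatenated with `problem["tests"]` and executed in the target language's toolchain to compute pass rates.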
- Version 0.3.0 (work in progress)
- Version 0.2.0: used to evaluate SantaCoder