AI2's Mosaic team has created benchmark datasets for a range of commonsense understanding tasks. To track progress, each dataset has its own leaderboard, available at https://leaderboard.allenai.org/
This repository provides baseline implementations and evaluation scripts for the following datasets:
- αNLI: Abductive Natural Language Inference
- VCR: Visual Commonsense Reasoning
- HellaSwag: Can a Machine Really Finish Your Sentence?
- Social IQA: Commonsense Reasoning about Social Interactions
- Physical IQA: Commonsense Reasoning about Physical Interactions
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale