DATASETS.md
Studying historical GitHub datasets lets you identify trends over time and answer questions about the broader state of open-source software development.

Here is a curated list of academic datasets containing historical GitHub data:

| Name | URL | Dataset Type |
| --- | --- | --- |
| GHTorrent | http://ghtorrent.org/ | MySQL and Mongo database dumps of the GitHub event stream |
| GHTorrent - BigQuery | http://ghtorrent.org/gcloud.html | Access to GHTorrent data via BigQuery |
| GH Archive | https://www.gharchive.org/ | Compressed JSON dumps of the GitHub event stream |
| GH Archive - BigQuery | https://bigquery.cloud.google.com/table/githubarchive:day.20190827?pli=1&tab=preview | Access to GH Archive data via BigQuery |
| GitHub - BigQuery | https://console.cloud.google.com/marketplace/details/github/github-repos?filter=solution-type:dataset&id=46ee22ab-2ca4-4750-81a7-3ee0f0150dcb | Access to a full snapshot of the content of more than 2.8 million open source GitHub repositories via BigQuery |
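
As a rough illustration of working with a compressed JSON event dump such as those GH Archive provides, the sketch below counts events by type. It assumes the dump is gzip-compressed, holds one JSON object per line, and that each event carries a `type` field; check these assumptions against the dataset's own documentation. The helper names `count_event_types` and `make_sample_dump` are hypothetical, and the sample events are made up so the example runs without downloading anything.

```python
import gzip
import io
import json
from collections import Counter


def count_event_types(source):
    """Count events by "type" in a gzip-compressed, one-JSON-object-per-line dump.

    `source` may be a file path or a binary file-like object.
    The line-per-event layout and the "type" field are assumptions.
    """
    counts = Counter()
    with gzip.open(source, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            counts[event.get("type", "unknown")] += 1
    return counts


def make_sample_dump():
    """Build a tiny in-memory dump with invented events, for demonstration only."""
    events = [
        {"type": "PushEvent", "repo": {"name": "example/one"}},
        {"type": "WatchEvent", "repo": {"name": "example/two"}},
        {"type": "PushEvent", "repo": {"name": "example/two"}},
    ]
    raw = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return io.BytesIO(gzip.compress(raw))


sample_counts = count_event_types(make_sample_dump())
for event_type, n in sample_counts.most_common():
    print(event_type, n)
```

To run the same count over a downloaded dump, pass its local path to `count_event_types` instead of the in-memory sample. Reading line by line keeps memory use flat even for large files.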