Have you ever worked with billions of records? Unless you work as a data engineer, you have probably never processed data at that scale. The GitHub activity dataset, which records developers' activity on GitHub since 2012, contains billions of log entries.
To handle this, we will use Google BigQuery, a powerful and cost-effective data warehouse. Even without spending much money, you can analyze years of GitHub records with Google BigQuery.
pip install --upgrade google-cloud-bigquery
pip install --upgrade pandas-gbq
pip install --upgrade six
pip install --upgrade pyarrow
If you want to run the scripts in this repository, download the credentials JSON file as described in the link and save it in the credentials/ folder.
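One common way to point the Google client libraries at that key file is the GOOGLE_APPLICATION_CREDENTIALS environment variable. A minimal sketch follows; the file name `service-account.json` is a placeholder, not a name taken from this repository.

```python
import os

# Point Google client libraries at the downloaded service-account key.
# "credentials/service-account.json" is a placeholder path; use the name
# of the file you actually saved in the credentials/ folder.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials/service-account.json"

# Any BigQuery client created after this point will pick up the key file.
print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```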
Read this article to learn how to create a personal access token, which you need in order to use the GitHub API.
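The GitHub REST API accepts a personal access token in the Authorization header. The sketch below only builds the request without sending it, since a real token is required; the token value shown is a placeholder, and the stdlib `urllib` is used so no extra packages are needed.

```python
import urllib.request

# Placeholder: substitute the personal access token you generated.
token = "ghp_your_token_here"

# Build an authenticated request against the GitHub REST API.
req = urllib.request.Request(
    "https://api.github.com/user",
    headers={"Authorization": f"token {token}"},
)

# urllib.request.urlopen(req) would actually perform the call;
# here we only inspect the header that would be sent.
print(req.get_header("Authorization"))
```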
Handling Google BigQuery from Jupyter is easy. Let's look at three ways to work with Google BigQuery in Jupyter.
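As a sketch of one of those approaches (assuming google-cloud-bigquery is installed and credentials are configured), the query below counts event types in one day of BigQuery's public GitHub Archive dataset. The client call is left commented out because it needs valid credentials and a billing-enabled project.

```python
# SQL against one day of the public GitHub Archive dataset on BigQuery.
query = """
SELECT type, COUNT(*) AS events
FROM `githubarchive.day.20200101`
GROUP BY type
ORDER BY events DESC
"""

# With credentials configured, the calls below return a pandas DataFrame:
# from google.cloud import bigquery
# client = bigquery.Client()
# df = client.query(query).to_dataframe()
```

The other two common routes are the `%%bigquery` cell magic that ships with google-cloud-bigquery and `pandas_gbq.read_gbq`, both of which accept the same SQL.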
This repository is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.