A repository that analyzes and summarizes activity patterns in GitHub public events

vienna-project/github-analysis

Introduction

Have you ever processed billions of records? Unless you work as a data engineer, you probably haven't. The GitHub activity dataset, which records developers' activities on GitHub since 2012, contains billions of log records.

To handle this, we will use Google BigQuery, a powerful and effective data warehouse tool. Even without spending a lot of money, you can analyze years of GitHub records with Google BigQuery.

Requirements

1. Install Python packages
pip install --upgrade google-cloud-bigquery
pip install --upgrade pandas-gbq
pip install --upgrade six
pip install --upgrade pyarrow
2. Get BigQuery credentials

If you want to run the scripts in this repository, download the credential JSON file as described in the link and save it in the credentials/ folder.
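As a rough illustration of how the saved credential file might be used, the sketch below builds a small query and runs it through the BigQuery client. The file name `service-account.json`, the helper names, and the 10 GiB billing cap are assumptions made for this example, as is the table naming `githubarchive.day.YYYYMMDD` with a `type` column; check them against your own setup before running.

```python
from pathlib import Path

# Hypothetical file name; use the name of the key file you downloaded.
CREDENTIALS_PATH = Path("credentials") / "service-account.json"


def build_event_count_query(day: str) -> str:
    """Return SQL that counts events per type for one day (YYYYMMDD).

    Assumes the public GitHub Archive tables are named
    `githubarchive.day.YYYYMMDD` and have a `type` column.
    """
    if len(day) != 8 or not day.isdigit():
        raise ValueError("day must look like YYYYMMDD")
    return (
        "SELECT type, COUNT(*) AS n_events\n"
        f"FROM `githubarchive.day.{day}`\n"
        "GROUP BY type\n"
        "ORDER BY n_events DESC"
    )


def run_query(sql: str,
              credentials_path: Path = CREDENTIALS_PATH,
              max_bytes_billed: int = 10 * 1024**3):
    """Run `sql` on BigQuery and return the result as a pandas DataFrame."""
    # Imported lazily so the query builder works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(str(credentials_path))
    # Cap the bytes billed so an overly broad query fails instead of being charged.
    job_config = bigquery.QueryJobConfig(maximum_bytes_billed=max_bytes_billed)
    return client.query(sql, job_config=job_config).to_dataframe()
```

Calling `run_query(build_event_count_query("20200101"))` would then return one row per event type for that day, provided the credential file exists and the account can run queries.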

3. Get GitHub credentials

Read this article. You need to create a personal access token to use the GitHub API.
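Once a personal access token exists, it is sent with each API request in the `Authorization` header. The sketch below uses only the standard library; the environment variable name `GITHUB_TOKEN` and the helper names are assumptions for this example, and the public events endpoint is used simply as a convenient target.

```python
import json
import os
import urllib.request

API_URL = "https://api.github.com/events"  # lists recent public events


def token_from_env(name: str = "GITHUB_TOKEN") -> str:
    """Read the token from an environment variable (hypothetical name)."""
    return os.environ.get(name, "")


def build_request(token: str, url: str = API_URL) -> urllib.request.Request:
    """Build an authenticated request without sending it."""
    if not token:
        raise ValueError("a personal access token is required")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def fetch_public_events(token: str) -> list:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(token), timeout=30) as resp:
        return json.load(resp)
```

Keeping the token in an environment variable rather than in a script avoids committing it to the repository by accident; `fetch_public_events(token_from_env())` would perform the actual call.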

Reading List

Handling Google BigQuery in Jupyter is easy. Let's learn 3 ways to do it.
Before we start analyzing the data, let's see how the GitHub Archive is structured.
Grasp the overall picture of the GitHub activity log.
Understand how to obtain the data through the GitHub API.

Copyright: CC BY-SA 4.0

This repository is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
