
Data processing (Spark SQL, MySQL and pandas); big data queries of AWS S3 (Presto via Athena, Qubole and boto3)


yuanDataScience/Big_Data_Tech


Big_Data_Tech

Querying data from cloud platforms is critical for big data analysis, and data manipulation and wrangling are essential skills for getting data into the right format for analysis. This repo collects functions and code I routinely use to query big data on AWS and to format my datasets. It currently covers Presto queries, Spark SQL, MySQL and pandas, along with tools for querying AWS S3 through Athena, Qubole and boto3.
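As an illustration of the Athena + boto3 workflow, here is a minimal sketch (not code from this repo) of running a Presto SQL query through Athena and collecting the rows. It assumes `client` is a boto3 Athena client (`boto3.client("athena")`) and relies on the client's `start_query_execution`, `get_query_execution` and `get_query_results` calls; the helper name `run_athena_query` is hypothetical. Athena returns every result cell as a string, so the sketch does no type conversion.

```python
import time
from typing import Any, Dict, List


def run_athena_query(
    client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
) -> List[Dict[str, str]]:
    """Hypothetical helper: run `sql` on Athena and return rows as dicts.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    # Submit the query; Athena writes its result files under output_location (an s3:// URI).
    start = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    # Poll until the query reaches a terminal state (it starts as QUEUED/RUNNING).
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {query_id} {state}: {reason}")

    # Page through the results with NextToken; each cell is a {"VarCharValue": ...} dict.
    rows: List[List[str]] = []
    kwargs: Dict[str, str] = {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL cells have no VarCharValue key; they are mapped to "" here.
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    if not rows:
        return []
    # For SELECT queries the first returned row holds the column names.
    header, data = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in data]
```

Typical usage, with a placeholder database and results bucket: `run_athena_query(boto3.client("athena"), "SELECT * FROM events LIMIT 10", "my_db", "s3://my-bucket/athena-results/")`. Passing the client in, rather than creating it inside the function, keeps credentials and region settings out of the helper and lets it be tested with a stub.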
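Before querying, it is often useful to see which files sit under an S3 prefix. The sketch below (not code from this repo) does this with boto3's `list_objects_v2` paginator, which is needed because a single `list_objects_v2` call returns at most 1,000 keys. It assumes `client` is a boto3 S3 client (`boto3.client("s3")`); the function name `iter_s3_objects` is hypothetical.

```python
from typing import Any, Iterator, Tuple


def iter_s3_objects(client: Any, bucket: str, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Hypothetical helper: yield (key, size_in_bytes) for each object under s3://bucket/prefix.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # The paginator follows continuation tokens across the 1,000-key pages for us.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" key at all.
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

For example, with a placeholder bucket and prefix, `sum(size for _, size in iter_s3_objects(boto3.client("s3"), "my-bucket", "logs/2023/"))` gives the total bytes under that prefix. Because the helper is a generator, it never holds the full listing in memory.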
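On the wrangling side, a common task is reshaping long query output (one row per metric) into a wide table with one column per metric. The sketch below (not code from this repo) uses pandas `pivot_table`; the function name `to_wide` and the column names `user_id`, `date`, `metric` and `value` are hypothetical, chosen only for illustration.

```python
import pandas as pd


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape long rows (user_id, date, metric, value) to one row per user/date."""
    cleaned = df.dropna(subset=["value"]).copy()        # drop rows with no measurement
    cleaned["date"] = pd.to_datetime(cleaned["date"])   # parse date strings into datetimes
    wide = cleaned.pivot_table(
        index=["user_id", "date"],
        columns="metric",
        values="value",
        aggfunc="sum",  # duplicate (user_id, date, metric) rows are summed
    )
    wide.columns.name = None  # remove the leftover "metric" label on the column index
    return wide.reset_index()
```

`aggfunc="sum"` makes duplicate rows for the same user, date and metric add up, whereas `DataFrame.pivot` would raise an error on them; user/date/metric combinations with no data come out as `NaN`.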
