Querying data from cloud platforms is critical for big data analysis. Data manipulation and wrangling are equally important skills for getting data into a format suitable for analysis. This repo collects the functions and code that I routinely use to query big data from AWS and format my datasets. Currently, the techniques in this repo include Presto queries, Spark SQL, MySQL, and Pandas, along with tools for querying AWS S3 using Athena, Qubole, and boto3.
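As an illustration of the kind of helper described above, here is a minimal sketch of running a Presto-style SQL query on Athena through boto3. It is not taken from the repo's own code. The function name `run_athena_query` and its parameters are hypothetical. It assumes the caller passes in a client created with `boto3.client("athena")` and an S3 output location that the caller is allowed to write to. It reads only the first page of results, so large result sets would need pagination.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` on Athena and return the first page of rows as a list of dicts.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    The function name and arguments are illustrative, not part of this repo.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page of results is read here.
    rows = client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    if not rows:
        return []

    # For SELECT queries the first row holds the column names.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]
```

All values come back as strings (or `None` for nulls), so the returned list can be passed to `pandas.DataFrame(...)` and cast to the desired types from there.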
yuanDataScience/Big_Data_Tech
About
Data processing (Spark SQL, MySQL, and Pandas); big data queries against AWS S3 (Presto via Athena, Qubole, and boto3)