Big_Data_Tech

Querying data from cloud platforms is critical for big data analysis, and data manipulation and wrangling are essential skills for getting data into formats suitable for analysis. This repo collects routinely used functions and code that I use to query big data from AWS and to format my datasets. The techniques currently covered include Presto queries, Spark SQL, MySQL, and Pandas, along with tools for querying AWS S3 using Athena, Qubole, and boto3.
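A minimal sketch of the Athena-via-boto3 pattern, assuming a standard boto3 Athena client. The helper name `run_athena_query`, the polling interval, and the timeout are illustrative assumptions, not code taken from the notebooks. The helper starts a query with `start_query_execution`, polls `get_query_execution` until the query reaches a terminal state, and returns the S3 URI where Athena wrote the results. The client is passed in as an argument so the logic can be exercised without AWS credentials.

```python
import time
from typing import Any

# Terminal states reported by Athena's GetQueryExecution API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0, timeout_seconds: float = 300.0) -> str:
    """Hypothetical helper: run `sql` on Athena and return the S3 URI of the results.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until Athena reports a terminal state or the timeout expires.
    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"Athena query {query_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Athena query {query_id} ended in state {state}: {reason}")

    # Full S3 path of the result file Athena wrote under the configured OutputLocation.
    return execution["ResultConfiguration"]["OutputLocation"]
```

Against a real account, a call such as `run_athena_query(boto3.client("athena"), "SELECT ...", "my_database", "s3://my-bucket/athena-results/")` would apply, where the database and bucket names are placeholders. The returned `s3://` URI can be passed to `pandas.read_csv` if the `s3fs` package is installed.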

.ipynb_checkpoints
Athena_s3_Query_Pandas_Tools.ipynb
Data_Transfer_Pandas_s3.ipynb
File_transfer_storage_s3_by_cli_Athena_hive_tables.ipynb
MySQL_Pandas_Spark_Window_Functions.ipynb
RDS
README.md
SageMaker
athena_docker
athena_reagent
aws_account_config_tool.rtf
qubole_presto_tool.ipynb
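As a small, self-contained illustration of the SQL window functions that MySQL_Pandas_Spark_Window_Functions.ipynb covers, here is a sketch that uses Python's built-in `sqlite3` as a local stand-in for MySQL (SQLite 3.25+ supports window functions; MySQL supports them from 8.0). The table, its columns, and the sample rows are hypothetical and not taken from the notebook.

```python
import sqlite3

# Hypothetical sample data: (customer_id, order_date, amount).
ORDERS = [
    ("a", "2023-01-01", 10.0),
    ("a", "2023-01-05", 25.0),
    ("a", "2023-02-01", 5.0),
    ("b", "2023-01-03", 40.0),
    ("b", "2023-01-04", 15.0),
]

# ROW_NUMBER ranks each customer's orders by date; SUM ... OVER gives a running total.
QUERY = """
SELECT customer_id,
       order_date,
       amount,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank,
       SUM(amount)  OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
FROM orders
ORDER BY customer_id, order_date
"""


def window_function_demo(rows=ORDERS):
    """Load `rows` into an in-memory table and return the windowed query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in window_function_demo():
        print(row)
```

In Pandas, a roughly equivalent result comes from sorting by `order_date` and then using `groupby("customer_id").cumcount() + 1` for the rank and `groupby("customer_id")["amount"].cumsum()` for the running total; in PySpark, the same window can be expressed with `Window.partitionBy("customer_id").orderBy("order_date")` and `F.row_number().over(...)`.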

yuanDataScience/Big_Data_Tech

Folders and files

Latest commit

History

Repository files navigation

Big_Data_Tech

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages