-
yfin-etl Public
Forked from Mega-Barrel/yfin-etl
Yahoo Finance ETL script
Python MIT License Updated Jul 21, 2023 -
spark Public
Forked from apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
Scala Apache License 2.0 Updated Apr 11, 2021 -
spark-redshift Public
Forked from spark-redshift-community/spark-redshift
Performant Redshift data source for Apache Spark
Scala Apache License 2.0 Updated Jan 7, 2021 -
wos-spark-manager-api Public
Forked from arsuryan/wos-spark-manager-api
A Flask-based application that lets IBM Watson OpenScale read and write files on a remote HDFS, and run and get details about jobs on a remote Spark cluster.
Python Apache License 2.0 Updated Dec 3, 2020 -
chombo Public
Forked from pranab/chombo
Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
Java Updated Nov 12, 2020 -
airbyte Public
Forked from airbytehq/airbyte
Airbyte is an open-source data integration engine that helps you consolidate your data in your warehouses.
Java MIT License Updated Sep 24, 2020 -
Udacity_Nanodegree_Project Public
Udacity Nanodegree AWS Cloud Architect Project Work
HCL Updated Jul 16, 2020 -
flink-registry-avro-row-schema Public
Forked from ztore/flink-registry-avro-row-schema
Flink Avro Format schema that supports schema registry.
-
SqlShift Public
Forked from himanshpal/SqlShift
MySQL to Redshift data transfer using Apache Spark.
Scala MIT License Updated Oct 14, 2019 -
ndscheduler Public
Forked from Nextdoor/ndscheduler
A flexible Python library for building your own cron-like system, with REST APIs and a Web UI.
Python BSD 2-Clause "Simplified" License Updated Jun 2, 2019 -
metorikku Public
Forked from YotpoLtd/metorikku
A simplified, lightweight ELT Framework based on Apache Spark
Scala MIT License Updated Jan 11, 2019 -
data-engineering-gcp Public
Forked from jorwalk/data-engineering-gcp
Data Engineering on Google Cloud Platform
Jupyter Notebook Updated Oct 29, 2018 -
babar Public
Forked from criteo/babar
Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive, ...) on YARN.
Java Apache License 2.0 Updated Sep 7, 2018 -
JSON-Linting Public
Forked from bennadel/JSON-Linting
This application provides simple, secure, 100% client-side, network-free JSON linting that ensures that no one else is seeing the data you are testing. This is worry-free JSON linting.
HTML Updated Jun 29, 2018 -
Optimus Public
Forked from hi-primus/optimus
🚀 Optimus is the missing framework for cleansing (cleaning and much more), pre-processing and exploratory data analysis in a distributed fashion with Apache Spark.
Python Apache License 2.0 Updated May 30, 2018 -
crossdata Public
Forked from Stratio/crossdata
Easy access to big things. Library for Apache Spark extending and improving its capabilities
Scala Apache License 2.0 Updated Feb 22, 2018 -
pyspark-example-project Public
Forked from AlexIoannides/pyspark-example-project
Example project and best practices for Python-based Spark ETL jobs and applications.
Python Updated Jan 18, 2018 -
developer-roadmap Public
Forked from kamranahmedse/developer-roadmap
Roadmap to becoming a web developer in 2018
Updated Jan 11, 2018 -
bigdata-ecosystem Public
Forked from zenkay/bigdata-ecosystem
BigData Ecosystem Dataset
HTML Other Updated Nov 19, 2017 -
Spark Streaming Kafka Example
Apache License 2.0 Updated May 31, 2017 -
cm_api Public
Forked from cloudera/cm_api
Cloudera Manager API Client
Java Apache License 2.0 Updated Apr 19, 2017 -
Will contain all the code related to different file format analysis
HTML Updated Apr 15, 2017 -
spark-json-schema Public
Forked from zalando-incubator/spark-json-schema
JSON schema parser for Apache Spark
Scala MIT License Updated Apr 6, 2017 -