New! Production-Ready, Docker/Kubernetes, and NetflixOSS-based PipelineIO
Follow Wiki to Setup Docker-based Environment
Building an End-to-End Streaming Analytics and Recommendations Pipeline with Spark, Kafka, and TensorFlow
Part 1 (Analytics and Visualizations)
- Analytics and Visualizations Overview (Live Demo!)
- Verify Environment Setup (Docker, Cloud Instance)
- Notebooks (Zeppelin, Jupyter/IPython)
- Interactive Data Analytics (Spark SQL, Hive, Presto; see the sketch after this agenda)
- Graph Analytics (Spark, Elastic, NetworkX, TitanDB)
- Time-series Analytics (Spark, Cassandra)
- Visualizations (Kibana, Matplotlib, D3)
- Approximate Queries (Spark SQL, Redis, Algebird)
- Workflow Management (Airflow)
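To give a flavor of the interactive analytics portion of Part 1, here is a minimal Spark SQL sketch of the kind of ad-hoc query typically run from a Zeppelin or Jupyter notebook. It assumes a Spark 2.x session (as provisioned by the Docker-based environment) and a hypothetical JSON event log; the file path and column names are illustrative only, not part of the workshop materials.

```python
# Minimal Spark SQL sketch: load a (hypothetical) JSON dataset and query it interactively.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("interactive-analytics").getOrCreate()

# Load semi-structured event data into a DataFrame and expose it to SQL.
events = spark.read.json("/data/events.json")   # hypothetical path
events.createOrReplaceTempView("events")

# Ad-hoc aggregation over the event stream's historical data.
top_items = spark.sql("""
    SELECT itemId, COUNT(*) AS views
    FROM events
    GROUP BY itemId
    ORDER BY views DESC
    LIMIT 10
""")
top_items.show()
```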
Part 2 (Streaming and Recommendations)
- Streaming and Recommendations (Live Demo!)
- Streaming (NiFi, Kafka, Spark Streaming, Flink)
- Cluster-based Recommendation (Spark ML, Scikit-Learn)
- Graph-based Recommendation (Spark ML, Spark Graph)
- Collaborative-based Recommendation (Spark ML; see the ALS sketch after this agenda)
- NLP-based Recommendation (CoreNLP, NLTK)
- Geo-based Recommendation (Elasticsearch)
- Hybrid On-Premise+Cloud Auto-scale Deploy (Docker)
- Save Workshop Environment for Your Use Cases
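As a taste of the collaborative filtering item above, below is a minimal sketch of training an ALS recommendation model with Spark ML. It assumes a Spark 2.x session and a hypothetical CSV of (userId, itemId, rating) rows; the path and column names are illustrative, not taken from the workshop repo.

```python
# Collaborative filtering sketch using Spark ML's ALS (alternating least squares).
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("als-recommendations").getOrCreate()

# Hypothetical explicit-feedback ratings: (userId, itemId, rating).
ratings = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("/data/ratings.csv"))          # hypothetical path

training, test = ratings.randomSplit([0.8, 0.2], seed=42)

# Train a latent-factor model; rank and regParam would normally be tuned.
als = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
          rank=10, maxIter=10, regParam=0.1)
model = als.fit(training)

# Score held-out ratings and report RMSE, dropping cold-start NaN predictions.
predictions = model.transform(test).na.drop()
rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                           predictionCol="prediction").evaluate(predictions)
print("RMSE: %.3f" % rmse)
```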
- San Francisco: Saturday, April 23rd (SOLD OUT)
- San Francisco: Saturday, June 4th (SOLD OUT)
- Washington DC: Saturday, June 18th (SOLD OUT)
- Los Angeles: Sunday, July 10th (SOLD OUT)
- Seattle: Saturday, July 30th (SOLD OUT)
- Santa Clara: Saturday, August 6th (SOLD OUT)
- Chicago: Saturday, August 27th (SOLD OUT)
- New York: Saturday, October 1st (SOLD OUT)
- Munich: Saturday, October 15th (SOLD OUT)
- London: Saturday, October 22nd (SOLD OUT)
- Brussels: Saturday, October 29th (SOLD OUT)
- Madrid: Saturday, November 19th (SOLD OUT)
- Bangalore: Saturday, December 10th
- London: Saturday, January 7th, 2017
- Tokyo: Coming Soon, 2017
- Shanghai: Coming Soon, 2017
- Beijing: Coming Soon, 2017
- Sydney: Coming Soon, 2017
- Melbourne: Coming Soon, 2017
- Sao Paulo: Coming Soon, 2017
- Rio de Janeiro: Coming Soon, 2017
The goal of this workshop is to build an end-to-end, streaming data analytics and recommendations pipeline on your local machine using Docker and the latest streaming analytics and machine learning tools:
- First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, IPython, and Elasticsearch (a minimal streaming sketch follows this list).
- Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
- Last, we productionize our pipeline and serve live recommendations to our users!
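The streaming ingestion step in the first bullet above can be sketched as follows: consume a Kafka topic with Spark Streaming and count events per item in each micro-batch. This is a minimal example assuming the Spark 1.6/2.0-era DStream API with the Kafka 0.8 direct connector; the broker address and topic name are hypothetical.

```python
# Streaming ingestion sketch: Kafka -> Spark Streaming -> per-item counts.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="streaming-pipeline")
ssc = StreamingContext(sc, batchDuration=5)      # 5-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["item-views"],                        # hypothetical topic
    kafkaParams={"metadata.broker.list": "localhost:9092"})

# Each record arrives as a (key, value) pair; count views per item in each batch.
counts = (stream.map(lambda kv: (kv[1], 1))
                .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```

In a fuller pipeline, these per-batch counts would be written to a sink such as Cassandra or Elasticsearch and picked up by the recommendation and visualization stages described above.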