
Real-time, End-to-End, Advanced Analytics and Machine Learning Recommendation Pipeline


rakesharya/pipeline

 
 


End-to-End, Real-time ML Reference Data Pipeline

Gitter Chat Room

Powered by the PANCAKE STACK!


Upcoming PANCAKE STACK Workshops!

Title

Building an End-to-End Streaming Analytics and Recommendations Pipeline with Spark, Kafka, and TensorFlow

Agenda (Full Day)

Part 1 (Analytics and Visualizations)

  • Analytics and Visualizations Overview (Live Demo!)
  • Verify Environment Setup (Docker, Cloud Instance)
  • Notebooks (Zeppelin, Jupyter/iPython)
  • Interactive Data Analytics (Spark SQL, Hive, Presto)
  • Graph Analytics (Spark, Elastic, NetworkX, TitanDB)
  • Time-series Analytics (Spark, Cassandra)
  • Visualizations (Kibana, Matplotlib, D3)
  • Approximate Queries (Spark SQL, Redis, Algebird)
  • Workflow Management (Airflow)

Part 2 (Streaming and Recommendations)

  • Streaming and Recommendations (Live Demo!)
  • Streaming (NiFi, Kafka, Spark Streaming, Flink)
  • Cluster-based Recommendation (Spark ML, Scikit-Learn)
  • Graph-based Recommendation (Spark ML, Spark Graph)
  • Collaborative-based Recommendation (Spark ML)
  • NLP-based Recommendation (CoreNLP, NLTK)
  • Geo-based Recommendation (ElasticSearch)
  • Hybrid On-Premise+Cloud Auto-scale Deploy (Docker)
  • Save Workshop Environment for Your Use Cases

Locations and Dates

Suggest a City and Date

Description

The goal of this workshop is to build an end-to-end streaming data analytics and recommendations pipeline on your local machine using Docker and the latest streaming analytics tools.

  • First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, iPython, and ElasticSearch.
  • Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
  • Last, we productionize our pipeline and serve live recommendations to our users!
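As a rough illustration of the second step, here is a minimal sketch of item-based collaborative filtering that is updated one event at a time. It assumes events arrive as (user, item) pairs; the `CooccurrenceRecommender` class, its methods, and the sample events are hypothetical and not taken from the workshop, which uses Spark ML for this. The sketch uses only the Python standard library so it can be run without the Docker environment.

```python
from collections import defaultdict


class CooccurrenceRecommender:
    """Toy item-based collaborative filter (hypothetical, for illustration only)."""

    def __init__(self):
        # user -> set of items that user has interacted with so far
        self.items_by_user = defaultdict(set)
        # item -> other item -> number of users who interacted with both
        self.cooccurrence = defaultdict(lambda: defaultdict(int))

    def observe(self, user, item):
        """Fold a single (user, item) event into the model."""
        seen = self.items_by_user[user]
        if item in seen:
            return  # a repeated event does not change the counts
        for other in seen:
            self.cooccurrence[item][other] += 1
            self.cooccurrence[other][item] += 1
        seen.add(item)

    def recommend(self, user, k=3):
        """Return up to k unseen items ranked by co-occurrence with the user's items."""
        seen = self.items_by_user.get(user, set())
        scores = defaultdict(int)
        for item in seen:
            for other, count in self.cooccurrence[item].items():
                if other not in seen:
                    scores[other] += count
        # Highest score first; ties broken by item name for a stable order.
        ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
        return [item for item, _ in ranked[:k]]


# Made-up sample events standing in for a stream (for example, a Kafka topic).
events = [
    ("alice", "spark"), ("alice", "kafka"),
    ("bob", "spark"), ("bob", "kafka"), ("bob", "tensorflow"),
    ("carol", "kafka"), ("carol", "tensorflow"),
]

model = CooccurrenceRecommender()
for user, item in events:
    model.observe(user, item)

print(model.recommend("alice"))
print(model.recommend("carol"))
```

Because each event only touches the counts for one user's items, the model can be refreshed as new data arrives instead of being retrained from scratch, which is the property a streaming recommendation pipeline relies on. A production pipeline would more likely use a distributed implementation such as the collaborative filtering in Spark ML.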

Architecture Overview

Follow the Wiki to set up the Docker-based environment.

Pipeline Architecture Overview (diagram)

Screenshots

  • Apache Zeppelin Notebooks
  • Stanford CoreNLP Sentiment Analysis
  • Jupyter/iPython Notebooks
  • SparkR Notebooks
  • TensorFlow Notebooks
  • Apache NiFi Data Flows
  • Airflow Workflows
  • Presto Queries
  • Tableau Integration
  • Beeline Command-line Hive Client
  • Log Visualization with Kibana & Logstash
  • Spark, Spark Streaming, and Spark SQL Admin UIs
  • Ganglia System and JVM Metrics Monitoring UIs

Tools Overview

Apache Spark, Redis, Apache Cassandra, Apache Kafka, NiFi, ElasticSearch, Logstash, Kibana, Apache Zeppelin, Ganglia, Hadoop HDFS, iPython Notebook, Docker

