This repository contains the code of a small project that implements a data stream with Apache Kafka, consumed by a Neo4j sink instance.
Instances of Kafka, Zookeeper, and Schema Registry, together with Neo4j, are deployed using the definitions in a docker-compose.yml file.
A Java application defines two producers that publish movies from different sources; in this case, the sources are two movie datasets obtained from Kaggle: Netflix Movies and TV Shows and The Movies Dataset.
The Java application reads the CSV files at a given rate and sends the records, under the specified topic names, to the Kafka broker deployed with Docker. The data is serialized using Avro.
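To make the flow concrete, here is a minimal, self-contained sketch of a producer that serializes a record with Avro and sends it to the broker. It is illustrative only: the `movies` topic, the `Movie` schema, and the field names are hypothetical stand-ins for the project's actual ones, and the addresses assume the local defaults used elsewhere in this README.

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import io.confluent.kafka.serializers.KafkaAvroSerializer;

public class AvroProducerSketch {

    // Hypothetical schema: the real project derives its schemas from the CSV datasets.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Movie\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"title\",\"type\":\"string\"}]}";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // e.g. BOOTSTRAP_SERVERS_ADDR
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", KafkaAvroSerializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081"); // Schema Registry container

        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // In the real application, each CSV row would be mapped to a record like this.
            String id = "m1";
            GenericRecord movie = new GenericData.Record(schema);
            movie.put("id", id);
            movie.put("title", "Example Movie");

            producer.send(new ProducerRecord<>("movies", id, movie));
            producer.flush();
        }
    }
}
```

Pointing the serializer at the Schema Registry lets `KafkaAvroSerializer` register the schema and embed its ID in every message, which is what allows downstream consumers such as the Neo4j sink to deserialize the records.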
The Neo4j instance acting as a sink polls records from the Kafka broker; whenever records arrive, a custom Cypher query, specified via the environment variable `NEO4J_streams_sink_topic_cypher_<topic-name>`, is executed to merge them into the graph.
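As a hedged sketch, such a variable could look like the following in the Docker environment configuration. The topic name `movies` and the record fields are hypothetical; `event` is the name the neo4j-streams sink binds each incoming record to:

```
NEO4J_streams_sink_topic_cypher_movies=MERGE (m:Movie {id: event.id}) SET m.title = event.title
```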
The main components of the stack used in this project are:
- Docker v20.10.6
- Zookeeper (cp-zookeeper)
- Kafka (cp-enterprise-kafka)
- Schema Registry (cp-schema-registry)
- Neo4j v4.2.6
- APOC v4.2.0.2
- neo4j-streams-4.0.8
- Java 11
- Avro serializer v5.3.0
Other libraries were also used to develop the Java source code.
- Download `credits.csv` and `movies_metadata.csv` from The Movies Dataset and place them in the `data/` folder. Do the same for `netflix_titles.csv` from Netflix Movies and TV Shows.
- There is a file called `.env.development` in the root folder where the environment variables are declared. Create a `.env` file and fill in the placeholder declarations there, e.g. `BOOTSTRAP_SERVERS_ADDR=localhost:9092`.
- Run `mvn clean` followed by `mvn package`.
- Inside the `docker/` folder, run `docker-compose --env-file .env up`. This creates the containers; wait long enough for everything to initialize, keeping an eye on the logs.
- To execute the producers, run the `main()` function of `src/main/java/com/movies/graph/MoviesProducer.java` (see the command sketch after this list). The Neo4j instance will consume the records automatically.
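If you prefer the command line to an IDE, one possible invocation is via the Maven exec plugin; this is a sketch that assumes the class takes no arguments:

```
mvn exec:java -Dexec.mainClass="com.movies.graph.MoviesProducer"
```

After the producers finish, the merged movies should be visible in the Neo4j browser at http://localhost:7474, assuming the compose file maps the default port.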
Thanks to Bruno Berisso for helping develop the idea and generously giving his time to answer questions.