Game of Thrones Network Analysis

Introduction

This project performs an in-depth analysis of the Game of Thrones network using Apache Spark GraphX, Neo4j, and Spark ML. The dataset used for the analysis is linked from the project repository, which also includes a diagram of the project architecture.

Project Overview

The project comprises several key steps:

  1. Read the Dataset

  2. Connect Spark GraphX with Neo4j

  3. Import Dataset into Neo4j:

    • Use the Cypher query language to perform CRUD operations on the dataset in Neo4j.
  4. Apache Zeppelin with GraphX:

    • Integrate Apache Zeppelin with GraphX.
    • Read data from Neo4j.
    • Conduct exploratory data analysis.
    • Execute five graph algorithms of your choice using GraphX.
    • Visualize the results.
  5. Create a Customizable Dashboard:

    • Develop a customizable dashboard to visualize dataset information and the results of graph algorithms.
  6. Spark ML:

    • Use Spark ML to apply machine learning algorithms to the dataset.
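
As a rough illustration of steps 1 and 4, the sketch below reads a small edge list and computes two common graph measures, weighted degree and PageRank. It is plain Python rather than GraphX, so it shows what such an algorithm computes, not how the project's notebooks implement it. The column names (Source, Target, Weight), the sample rows, and the function names are assumptions made for illustration; the real dataset may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the real dataset's column names and rows may differ.
SAMPLE_CSV = """Source,Target,Weight
A,B,5
A,C,3
B,C,4
C,D,2
"""


def read_edges(text):
    """Parse an edge-list CSV into (source, target, weight) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Source"], row["Target"], float(row["Weight"])) for row in reader]


def weighted_degree(edges):
    """Sum incident edge weights per node, treating edges as undirected."""
    degree = defaultdict(float)
    for source, target, weight in edges:
        degree[source] += weight
        degree[target] += weight
    return dict(degree)


def pagerank(edges, damping=0.85, iterations=50):
    """Unweighted PageRank by power iteration on an undirected view of the graph."""
    neighbors = defaultdict(set)
    for source, target, _ in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    nodes = sorted(neighbors)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor shares its current rank equally among its own neighbors.
            incoming = sum(rank[n] / len(neighbors[n]) for n in neighbors[node])
            new_rank[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


edges = read_edges(SAMPLE_CSV)
degrees = weighted_degree(edges)
ranks = pagerank(edges)

# Print nodes from highest to lowest PageRank.
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{node}: degree={degrees[node]:.0f} pagerank={ranks[node]:.3f}")
```

GraphX provides its own distributed implementations of algorithms such as PageRank, connected components, and triangle counting, which would take the place of the hand-written functions here when working on the real dataset.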

Docker-Compose Configuration

Use the provided docker-compose.yml file to set up the cluster. The included services are:

  • Zeppelin (Apache Zeppelin 0.10.0)
  • Spark Master (Bitnami Spark 3.1.2)
  • Neo4j (Bitnami Neo4j 5)

Adjust the volume mounts and port mappings in docker-compose.yml to match your environment.

Running the Project

Clone the Repository

# Clone the repository from GitHub
git clone https://github.com/rmakaoui/Project_GraphX_SparkML_neo4j.git
cd Project_GraphX_SparkML_neo4j

Download the Game of Thrones dataset and place it in the appropriate location.

Start the Project with Docker Compose

# Run from the project directory (skip this cd if you are already there)
cd Project_GraphX_SparkML_neo4j

# Start the services in the background with Docker Compose
docker-compose up -d

After running these commands, the Docker services (Zeppelin, Spark Master, Neo4j) will start, and you can access Apache Zeppelin at http://localhost:8080 in your browser.
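
The containers can take a little while to become reachable after `docker-compose up -d` returns. A minimal sketch of a readiness check is shown here, assuming Zeppelin is published on port 8080 as stated above; the helper name `wait_for_port` and its parameters are hypothetical and not part of the project.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows the port is open, not that the app is fully ready.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 8080)` before opening the browser would cover the Zeppelin case. The same helper could be pointed at the Spark Master or Neo4j ports, whichever ones docker-compose.yml actually publishes.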

Then follow the steps outlined in the guide for data analysis, graph algorithms, and machine learning.
