Game of Thrones Network Analysis

Introduction

This project performs an in-depth analysis of the Game of Thrones network using Apache Spark GraphX, Neo4j, and Spark ML. The dataset used for this analysis is available here.

[Figure: overview of the project architecture]

Project Overview

The project comprises several key steps:

Read the Dataset:
- Use the provided Game of Thrones dataset.
Connect Spark GraphX with Neo4j:
- Follow the instructions in the Neo4j Spark Connector documentation to establish a connection.
Import Dataset into Neo4j:
- Use the Cypher query language to perform CRUD operations on the dataset in Neo4j.
Apache Zeppelin with GraphX:
- Integrate Apache Zeppelin with GraphX.
- Read data from Neo4j.
- Conduct exploratory data analysis.
- Execute five graph algorithms of your choice using GraphX.
- Visualize the results.
Create a Customizable Dashboard:
- Develop a customizable dashboard to visualize dataset information and the results of graph algorithms.
Spark ML:
- Use Spark ML to apply machine learning algorithms to the dataset.
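
To make the "graph algorithms" step more concrete, here is a minimal pure-Python sketch of two candidate algorithms, weighted degree and PageRank, run on a tiny edge list. It is not GraphX code and is not taken from the project notebooks. The column names `Source`, `Target`, `Weight` and the sample rows are assumptions made for illustration, and the graph is treated as undirected. Scores are normalized to sum to 1, so they are not expected to match the values GraphX's own PageRank would report.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The real dataset's column names are ASSUMED
# to be Source, Target, Weight; adjust them to match the actual file.
SAMPLE_CSV = """Source,Target,Weight
Jon,Sam,10
Jon,Arya,5
Arya,Sansa,7
Sansa,Jon,3
"""


def read_edges(text):
    """Parse CSV text into a list of (source, target, weight) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Source"], row["Target"], float(row["Weight"])) for row in reader]


def weighted_degree(edges):
    """Sum the weights of all edges touching each node (undirected view)."""
    degree = defaultdict(float)
    for source, target, weight in edges:
        degree[source] += weight
        degree[target] += weight
    return dict(degree)


def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on an undirected graph."""
    # Build a symmetric adjacency map: node -> {neighbor: weight}.
    neighbors = defaultdict(dict)
    for source, target, weight in edges:
        neighbors[source][target] = neighbors[source].get(target, 0.0) + weight
        neighbors[target][source] = neighbors[target].get(source, 0.0) + weight

    nodes = list(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        updated = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            total = sum(neighbors[node].values())
            # Distribute this node's rank to neighbors in proportion to weight.
            for other, weight in neighbors[node].items():
                updated[other] += damping * rank[node] * weight / total
        rank = updated
    return rank


if __name__ == "__main__":
    edges = read_edges(SAMPLE_CSV)
    print("weighted degree:", weighted_degree(edges))
    ranks = pagerank(edges)
    for name in sorted(ranks, key=ranks.get, reverse=True):
        print(f"{name}: {ranks[name]:.4f}")
```

Comparing a small reference implementation like this against the notebook output on a handful of edges can help catch loading mistakes, such as swapped columns or weights read as strings, before running the full analysis.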

Docker-Compose Configuration

Use the provided docker-compose.yml file to set up the cluster. The included services are:

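- Zeppelin (Apache Zeppelin 0.10.0)
- Spark Master (Bitnami Spark 3.1.2)
- Neo4j (Bitnami Neo4j 5)

Make sure the volumes and published ports in docker-compose.yml match your local setup. For example, the steps below expect Zeppelin to be reachable on port 8080.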

Running the Project

Clone the Repository

# Clone the repository from GitHub
git clone https://github.com/rmakaoui/Project_GraphX_SparkML_neo4j.git
cd Project_GraphX_SparkML_neo4j

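Download the Game of Thrones dataset and place it in the location the notebooks read from. The repository contains a Dataset folder, which is the likely place, but check the project guide for the exact path.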

Start the Project with Docker Compose

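# Run from the project directory (skip the cd if you are already there)
cd Project_GraphX_SparkML_neo4j

# Start the services in the background with Docker Compose
# (on newer Docker installations the command is "docker compose up -d")
docker-compose up -d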

After running these commands, the Docker services (Zeppelin, Spark Master, Neo4j) will start, and you can access Apache Zeppelin at http://localhost:8080 in your browser.
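
The containers can take a little while to become ready after `docker-compose up -d` returns. As an optional check, the following sketch polls the Zeppelin URL mentioned above using only the Python standard library. The function name `service_is_up` and the timeout value are illustrative choices, not part of the project.

```python
import http.client
import urllib.error
import urllib.request


def service_is_up(url, timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    # Ignore any system proxy settings: the target is a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status code.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed reply.
        return False


if __name__ == "__main__":
    zeppelin_url = "http://localhost:8080"  # URL given in this README
    state = "reachable" if service_is_up(zeppelin_url) else "not reachable yet"
    print(f"Zeppelin at {zeppelin_url} is {state}")
```

If the check keeps failing, `docker-compose ps` and `docker-compose logs` show whether the containers started and why they might have exited.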

Follow the steps outlined in the project guide (GuideProjet.pdf or GuideProjet.docx in the repository) for data analysis, graph algorithms, and machine learning.
