Data science and machine learning have traditionally built models on the assumption that individual data points are independent of one another. However, this ignores a potentially strong signal: the relationships between data points. We will treat the data as a graph, a network of data points and the relationships connecting them, and explore how a graph database can help unlock that signal.
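To make the idea of relationships as a signal concrete, here is a minimal sketch in plain Python. The paper ids, topic labels, and citation pairs are hypothetical toy data invented for illustration, and treating citations as undirected is a simplifying assumption, not a method prescribed by this tutorial. It shows two features (degree and neighbor label counts) that a row-by-row view of the same table would not expose.

```python
from collections import defaultdict

# Hypothetical toy table: one row per paper, with a topic label.
papers = {
    "p1": "ML",
    "p2": "ML",
    "p3": "DB",
    "p4": "ML",
}

# Hypothetical relationships: (citing paper, cited paper).
citations = [("p1", "p2"), ("p2", "p4"), ("p3", "p4"), ("p1", "p4")]


def build_adjacency(edges):
    """Build an undirected adjacency map (assumption: direction is ignored)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def neighbor_label_counts(node, adjacency, labels):
    """Count the labels of a node's direct neighbors."""
    counts = defaultdict(int)
    for neighbor in adjacency[node]:
        counts[labels[neighbor]] += 1
    return dict(counts)


adjacency = build_adjacency(citations)

# Degree: how many papers each paper is connected to.
degree = {node: len(adjacency[node]) for node in papers}

for node in papers:
    print(node, papers[node], degree[node],
          neighbor_label_counts(node, adjacency, papers))
```

Each printed row combines a paper's own label with features derived purely from its connections. A model that sees only the `papers` table has no access to the last two columns.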
We will cover the following in this tutorial:
- An introduction to graphs and graph theory
- How working with a graph differs from working with columnar data, such as SQL tables or pandas DataFrames
- A brief exploration of basic Python packages that can be used to interact with graphs
- Creation of a free Neo4j Sandbox graph database
- A crash course in the Cypher query language
- Creation of a basic graph of the Cora dataset
- Generation of graph embeddings
- Evaluation and comparison of machine learning models based on word embeddings versus graph embeddings
This tutorial assumes no previous knowledge.
The following resources are useful for this tutorial:
- "Graph Algorithms: Practical Examples in Apache Spark and Neo4j" (free book)
- Create a Neo4j Sandbox
- Google Colab
- The Neo4j Cheat Sheet and Quick Reference
- Cypher Manual
- Neo4j Cypher Reference Card
- Advanced Cypher Query Tuning (video)
- Graph Data Science Library API Docs
- Bite-Sized Neo4j for Data Scientists (video series)
- My Medium Articles