Graph database that uses a Cypher language for querying.
- Use Case is complex queries that would be too expensive with many cartesian product joins in RDBMS
- Not as suited to simple aggregates / large set-oriented / graph global operations
- Labeled Property Graph (LPG) Database of nodes and relationships
- Small footprint
- powerful traversal framework
- Cypher - declarative query language (SQL for graphs but instead uses ASCII art to represent patterns)
- Gremlin Traversal Language
- ACID compliant
- granular modelling (don't use rich properties, split in to entities and link them to allow for more flexible querying)
- avoid dense/super nodes - too highly connected as this impacts traversal performance
- create artificial "meta" nodes of 100 relationships to fan out the user base to an artist
- Community Edition - Single Server (no horizontal scalability)
- Enterprise Edition:
- Clustering (HA & LB) - Master -> Slaves, LB different queries to different slaves to get shard caching of subsets of the graph on each node for better performance (3.0+) - HA master election is based on Paxos
- advanced monitoring
- advanced caching
- online backups
graph local -> anchor vertex (node), traverse edges (relationships) from there
- specialised graph modelling required based on use case queries
- highly flexible schema / dynamic schema evolution
Example:
- news index ElasticSearch tokenize nouns -> neo4j <- clients Sentiment
neo4j start
Browse to:
Nodes <--- Relationships ---> Nodes
Nodes + Relationships each have Properties (key:value pairs)
node.property
/ relationship.property
- boolean, numeric, string or array of boolean, numeric, string
Label - organizes nodes into groups (added in 2.0 late 2013)
nouns => entities verbs => relationships adjectives => properties
Nagios TODO:
- neo4j replication lag check
Talend
-> JDBC
-> Rest API
(inefficient)
Generate Cypher create
statements on the Neo4j web page.
- reads nodes + relationships CSV, generate CSVs + import using:
import-cypher -d ',' -i 'nodes.csv' -o 'nodes-output.log' 'CREATE (n:#{Label}) {id:{Node},name:{Name}}) RETURN *';
import-cypher -d ',' -i 'rels.csv' -o 'rels-output.log' 'MATCH (from {id:{From}}), (to {id:{To}}) CREATE from-[:#{Relationship Type}]->to RETURN *;'
LOAD CSV WITH HEADERS FROM "file:/path/to/file.csv" AS NODES CREATE (n {id: nodes.Node, name: nodes.Name, type: nodes.Label}) RETURN n;
For 200k+ rows
LOAD CSV WITH HEADERS FROM "file:/path/to/file.csv" AS RELS MATCH (from {id: rels.From}), (to {id: rels.To}) CREATE from-[:REL {type: rels {type: rels.`Relationship Type`}]->to RETURN from,to;
Assign labels:
MATCH (m {type:"Male"}), (f {type:"Female"}) SET m:Male, f:Female RETURN m,f;
Create relationships:
MATCH (n)-[r1 {type:"MOTHER_OF"}]->(m), (s)-[r2 {type:"FATHER_OF"}]->(t) CREATE n[:MOTHER_OF]->m, s-[:FATHER_OF]->(t) RETURN *;
Remove duplicate relationships:
MATCH ()->[r:REL]-() delete r;
https://github.com/jexp/batch-import
- can append, default overwrite
- tsv nodes - node name label
- rels - start end type
import.sh test.db nodes.tsv rels.tsv &&
mv test.db "$NEO4J_HOME/data" &&
neo4j start
The query language used by Neo4j.
Identifier used for query reference purpose to RETURN
identifier.properties
Syntax | Description |
---|---|
(a) |
Nodes given identifier a |
(a) --> (b) |
Nodes a have a relationship to nodes b |
(a) -[r]-> (b) |
Relationship given identifier r between nodes a and b |
(a)-[:ACTED_IN]->(m) |
Find nodes a that have an ACTED_IN type relationship to nodes m . Relationship type is by convention all UPPERCASE_WITH_UNDERSCORES |
(m {title:"The Matrix"}) |
Find nodes where the title property equals "The Matrix" |
MATCH (node:Label)-[relationship_id:relationship_type)->(node2:Label)
RETURN node.propertyA, node2.propertyB, relationship_id.propertyC, type(relationship_id) ;
WHERE - both these queries are equivalent ways
MATCH (node:Label) WHERE node.property = "value" AND node.property2 = "value2"
MATCH (node:Label {property:"value", property2:"value2"} )
CREATE INDEX ON :Label(property);
Find all a
nodes with any relationships to any other nodes:
MATCH (a) ---> ( )
RETURN a
Command | Description |
---|---|
MATCH |
Matches the given graph pattern in the graph data |
WHERE |
Filters using predicates or anchors pattern elements |
RETURN |
Returns and projects result data, also handles aggregation |
ORDER BY |
Sorts the query result |
SKIP/LIMIT |
Paginates the query result |
Command | Description |
---|---|
CREATE |
Creates nodes and relationships |
MERGE |
Creates nodes uniquely |
CREATE UNIQUE |
Creates relationships uniquely |
DELETE |
Removes nodes, relationships |
SET |
Updates properties and labels |
REMOVE |
Removes properties and labels |
FOREACH |
Performs updating actions once per element in a list |
WITH |
Divides a query into multiple, distinct parts and passes results from one to the next |
INDEX |
|
DISTINCT |
MATCH (actor)-[:ACTED_IN]->(movie)<-[:DIRECTED]-(director)
RETURN actor.name, movie.title, director.name;
Path:
MATCH p=(a)-->(b)
RETURN p
Return just the nodes or the relationships:
RETURN nodes(p)
RETURN rels(p)
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, m.title
SKIP 10
LIMIT 10;
MATCH (p:Person)
RETURN p.name
ORDER BY p.born
LIMIT 5;
Browse nodes:
MATCH (n) RETURN (n) LIMIT 20;
See Labels:
MATCH (n) WHERE has(n.name) RETURNS labels(n);
See types of relationships:
MATCH (n:Person)-[r]->() RETURN DISTINCT type(r);
MATCH (a)-[:ACTED_IN]->()
RETURN DISTINCT a
ORDER BY a.born
LIMIT 5
Aggregate functions:
count(x)
min(x)
max(x)
avg(x)
sum(x)
collect(x) # Collect all the values into a list
Ported from private Knowledge Base page 2013+