diff --git a/integration/apache-spark/apache-spark.adoc b/integration/apache-spark/apache-spark.adoc
index deb54952..9bcf958d 100644
--- a/integration/apache-spark/apache-spark.adoc
+++ b/integration/apache-spark/apache-spark.adoc
@@ -53,7 +53,7 @@ The infrastructure is set up using Docker containers, there are dedicated contai
 * [Presentation: Combining Neo4j and Apache Spark using Docker]
 
 [[preprocessing]]
-== Spark Preprocessing
+== Spark for Data Preprocessing
 
 One example of pre-processing raw data (Chicago Crime dataset) into a format that's well suited for import into Neo4j, was demonstrated by http://twitter.com/markhneedham[Mark Needham].
 He combined a number of functions into a Spark-job that takes the existing data, cleans and aggregates it and outputs fragments which are recombined later to larger files.