A quick proof of concept (POC) on RDDs (Resilient Distributed Datasets) in Azure Databricks.
The data is taken from http://bioinfo.uib.es/~joemiro/marvel.html. A copy is included here so that the notebook's code keeps working.
The following article explains the POC: http://vincentlauzon.com/2018/01/17/azure-databricks-rdd-resilient-distributed-dataset/.
The IPython notebook can be viewed here on GitHub.