Update:
.gitignore
README.md
dataframes
features-tour
h1b-visa
rdds
ted
vplauzon committed Jun 21, 2020
1 parent ffed465 commit b9a7edc
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions features-tour/notebooks/02-spark.py
@@ -0,0 +1,34 @@
# Databricks notebook source
# MAGIC %md
# MAGIC
# MAGIC # Notebook demoing Spark
# MAGIC
# MAGIC This is a markdown cell.

# COMMAND ----------

# Simple Python
1 + 1

# COMMAND ----------

# Declare the path to a sample blob
sampleBlobPath = '/mnt/source/wikipedia/year=2020/month=05/day=05/hour=04/part-merged.snappy.parquet'

# COMMAND ----------

# Read the parquet blob
sample = spark.read.parquet(sampleBlobPath)
# Access to the blob storage relies on credential passthrough

# COMMAND ----------

# What is the "sample" object?
sample
# A DataFrame: a lazily evaluated, read-only table backed by some storage

# COMMAND ----------

# Let's look at the top of the blob
display(sample)
# This job takes longer than the read above because DataFrames are lazy:
# spark.read.parquet only set up the DataFrame, while display() actually reads rows
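
One way to check the lazy-loading explanation is to time the read and a small action side by side. The sketch below is not part of the commit: `timed` and `compare_read_vs_action` are hypothetical helper names, `spark` is assumed to be the session the Databricks notebook provides, and `limit(...).collect()` stands in for `display()` so that the result can be measured in plain Python.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_read_vs_action(spark, path: str, rows: int = 1000) -> Tuple[float, float]:
    """Hypothetical helper: time defining a DataFrame against running an action.

    `spark` is assumed to be the SparkSession available in the notebook.
    """
    # Defining the DataFrame: Spark sets it up without loading any rows.
    df, read_seconds = timed(lambda: spark.read.parquet(path))
    # An action: Spark now has to read the file to return rows to the driver.
    _, action_seconds = timed(lambda: df.limit(rows).collect())
    return read_seconds, action_seconds
```

In the notebook, `compare_read_vs_action(spark, sampleBlobPath)` would be expected to return a larger second value than the first, although the actual numbers depend on the cluster, the file size, and caching.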
