Tech Talk: Top Tuning Tips for Spark 3.0 and Delta Lake on Databricks

2020-10-27 | Watch the video | This folder contains the notebooks used in this tutorial.

Apache Spark™ has become the de facto open-source standard for big data processing thanks to its ease of use and performance. The open-source Delta Lake project extends Spark's lead with capabilities such as ACID transactions, schema enforcement, and time travel. These features help data lakes and data pipelines deliver high-quality, reliable data to downstream data teams for analytics and machine learning projects.

In this tech talk, we will discuss the top tuning tips for Apache Spark 3.0 and Delta Lake on Databricks. Come prepared to ask your questions and join Joe Widen, Chris Hoshino-Fish, and Denny Lee to discuss when to use which join operations, how to pick your machine sizes, how to help speed up your merge operations, and how to make your jobs easier!

Speakers

Chris Hoshino-Fish is a Solutions Architect at Databricks. Chris is an active member of the Performance Subject Matter Expert group and a former Principal Consultant focused on Data Engineering, working with several Fortune 500 Databricks customers. Prior to Databricks, Chris spent 3.5 years as a data engineer at an adtech company, managing pipelines built on Apache Spark. Chris has a B.A. in Computational Mathematics from the University of California, Santa Cruz.

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server, and served as Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Master's in Biomedical Informatics from Oregon Health Sciences University.

Joe Widen is a Solutions Architect at Databricks. Joe leads the Performance and Delta SME horizontal initiatives and helps customers succeed with the Databricks Unified Analytics Platform. Joe has worked with Spark, and with Hadoop more generally, for 5 years, with previous stops at Hortonworks and Capital One.