## Tech Talk: Top Tuning Tips for Spark 3.0 and Delta Lake on Databricks

2020-10-27 | [Watch the video](https://www.youtube.com/watch?v=hcoMHnTcvmg&feature=youtu.be) | This folder contains the notebooks used in this tutorial.

Apache Spark™️ has become the de facto open-source standard for big data processing because of its ease of use and performance. The open-source Delta Lake project extends Spark’s lead with new capabilities such as ACID transactions, Schema Enforcement, and Time Travel. These features help ensure that data lakes and data pipelines deliver high-quality, reliable data to downstream data teams for successful data analytics and machine learning projects.

In this tech talk, we will discuss the top tuning tips for Apache Spark 3.0 and Delta Lake on Databricks. Come prepared with your questions and join Joe Widen, Chris Hoshino-Fish, and Denny Lee to discuss when to use which join operation, how to pick your machine sizes, how to speed up your merge operations, and how to make your jobs easier!

### Speakers ###

Chris Hoshino-Fish is a Solutions Architect at Databricks. Chris is an active member of the Performance Subject Matter Expert group and a former Principal Consultant focused on Data Engineering, working with several Fortune 500 Databricks customers. Before joining Databricks, Chris spent 3.5 years as a data engineer at an adtech company, managing pipelines built on Apache Spark. Chris has a B.A. in Computational Mathematics from the University of California, Santa Cruz.

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server, and served as Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Master's in Biomedical Informatics from Oregon Health Sciences University.

Joe Widen is a Solutions Architect at Databricks. Joe leads the Performance and Delta SME horizontal initiatives and helps customers succeed with the Databricks Unified Analytics Platform. Joe has been working with Spark, and with Hadoop more generally, for 5 years, with previous stops at Hortonworks and Capital One.