- Practice thoroughly from these notebooks and the PDF below
- de-mod-0-get-started-with-pyspark-programming
- de-mod-1-get-started-with-databricks-data-science-and-engineering-workspace
- de-mod-2-transform-data-with-spark
- de-mod-3-manage-data-with-delta-lake
- de-mod-4-build-data-pipelines-with-delta-live-tables
- de-mod-5-deploy-workloads-with-databricks-workflows
- Udemy Practice [You can try this. Use Udemy for Business for free]
- Practice "PracticeExam" questions available in this repo.
- Every option in a "PracticeExam" question can become a question of its own in the actual exam.
- Read Databricks Certified Associate Data Engineer Exam thoroughly.
- Read Manage data with Delta Lake
- Read Build pipeline with DLT
Repo link
- Why might data not be available in a Delta Lake table?
- Hint: Consider commands like `vacuum`, `merge`, or `optimize`.
- What does Delta Lake become in the context of a data platform?
- Hint: Think about the concept of a single source of truth.
- What does a Delta table contain in terms of history, metadata, and data?
- Hint: Does it have single or multiple files?
- What is an advantage of Delta Lake over a traditional data lake?
- Which component is a web application a part of in Databricks architecture?
- Hint: Control plane or Data plane?
- What operations need to be done outside of a Databricks repo?
- Hint: Consider operations like pull, push, commit, or clone.
- What is an advantage of using a repo over notebook versioning in Databricks?
- Hint: Consider branching.
- Is Delta Lake ACID compliant?
- How can you avoid duplicates when merging data in a Delta Lake table?
- Hint: Use the `MERGE` command.
- What should you consider when using the `INSERT OVERWRITE` command?
- What is the purpose of using Z-ordering in Databricks?
- Hint: For query performance optimization.
- Why might the `COPY INTO` command not work in a particular code block?
- Hint: Refer to this documentation.
- What happens when using `Expect` or `Drop` on violation in Delta Live Tables (DLT)?
- Hint: Refer to this documentation.
- When should you use `GRANT ALL PRIVILEGES` in Unity Catalog?
- When should you use `GRANT USAGE` in Unity Catalog?
- What is an advantage of using array functions in Spark?
- How do you set `processingTime` to 5 seconds in a streaming query?
- Hint: Refer to the practice exam question.
- How do you configure a streaming query for continuous processing in a production environment?
- Hint: Refer to practice exam Q36.
- Which physical object should you create for 10 tables so that other teams can use them in Databricks?
- How can you delete metadata but retain the data files in a Delta table?
- Hint: Consider the concept of an external table.
- What happens when a stream is marked as "Streaming Live" in Databricks?
- Hint: Refer to the practice exam question.
- How can you mark a table as containing PII data in Databricks?
- Hint: Use a `comment` during table creation.
- How can you describe a database in Databricks to get the path for `customer360`?
- What is the advantage of a gold table over a silver table in Delta Lake?
- What is the difference between a bronze table and a raw table in Delta Lake?
- How do you identify silver or bronze tables in Databricks?
- Hint: Refer to practice exam Q31.
- How do you create dependent tasks in a Delta Live Tables (DLT) pipeline?
- How can you speed up query execution in Databricks?
- Hint: Refer to the practice exam question.
- How can you prevent a specific block of code from running on Sundays in Databricks?
- Where can you see data quality metrics in Delta Live Tables (DLT)?
- How can you execute a Delta Live Tables (DLT) pipeline?
- How can you save costs using a serverless SQL warehouse or control DBU usage?
- If a manager is concerned about project over-costing, how can you save costs in Databricks?
- Hint: Consider the impact of using serverless endpoints, auto-stop, etc.
- How can you reduce cluster costs in Databricks?
- Hint: Consider adding an auto-stop setting in a SQL endpoint.
- Refer to question Q40 from the practice exam.
- Refer to question Q1 from the practice exam.
- Refer to question Q3 from the practice exam.
- Which command or method is more appropriate for accessing a table in PySpark: `spark.table("mytable")`, `spark.delta.table("mytable")`, or `spark.sql("mytable")`?
- What is the JDBC driver name for SQLite when connecting via Spark?
- Given two tables, `march_transaction` and `april_transaction`, how can you create a new table `all_transaction` without duplicates?
- Hint: Consider using join, merge, or union.
- Refer to question Q27 from the practice exam.
- Refer to question Q33 from the practice exam.
- How do you check the failed status of a task in a Delta Live Tables (DLT) pipeline?
- Which should be used to trigger alerts in Databricks: a webhook or an email alert?
- How can you speed up query execution using cluster pools in Databricks?
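One of the questions above asks how to keep a block of code from running on Sundays. A minimal sketch in plain Python, as it might appear in a notebook cell, is shown here. The helper name `should_run` and the placeholder `print` call are hypothetical, not taken from the practice exam, and the exam may expect a different phrasing of the same check.

```python
from datetime import date
from typing import Optional

SUNDAY = 6  # date.weekday() counts Monday as 0 and Sunday as 6


def should_run(today: Optional[date] = None) -> bool:
    """Return False on Sundays and True on every other day."""
    if today is None:
        today = date.today()
    return today.weekday() != SUNDAY


if should_run():
    # Hypothetical placeholder for the block that must be skipped on Sundays.
    print("Running the guarded block")
```

Note that `date.today()` follows the clock of the machine running the code, so it is worth checking which time zone the cluster uses before relying on this check.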
No matter what, please attempt the practice exam thoroughly. Each answer option in a question can become a question of its own. You will have ample time during the assessment.
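For the `march_transaction` / `april_transaction` question in the list above, the difference between `UNION` and `UNION ALL` can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for Spark SQL so that it runs anywhere, and the columns `id` and `amount` and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Spark SQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and rows; row (2, 20.0) appears in both months.
    CREATE TABLE march_transaction (id INTEGER, amount REAL);
    CREATE TABLE april_transaction (id INTEGER, amount REAL);
    INSERT INTO march_transaction VALUES (1, 10.0), (2, 20.0);
    INSERT INTO april_transaction VALUES (2, 20.0), (3, 30.0);

    -- UNION removes duplicate rows; UNION ALL would keep them.
    CREATE TABLE all_transaction AS
        SELECT * FROM march_transaction
        UNION
        SELECT * FROM april_transaction;
""")

rows = conn.execute(
    "SELECT id, amount FROM all_transaction ORDER BY id"
).fetchall()

# Count the rows UNION ALL would have produced, for comparison.
union_all_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT * FROM march_transaction
        UNION ALL
        SELECT * FROM april_transaction
    )
""").fetchone()[0]

print(rows)             # three distinct rows
print(union_all_count)  # four rows, including the duplicate
```

In Spark SQL, `UNION` likewise returns distinct rows, while the DataFrame `union` method keeps duplicates, so it is usually followed by `distinct()` or `dropDuplicates()`.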