A Professional Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems.
The Data Engineer also analyzes data to gain insight into business outcomes, builds statistical models to support decision-making, and creates machine learning models to automate and simplify key business processes.
The Google Cloud Certified - Professional Data Engineer exam assesses your ability to:
- Build and maintain data structures and databases
- Design data processing systems
- Analyze data and enable machine learning
- Model business processes for analysis and optimization
- Design for reliability
- Visualize data and advocate policy
- Design for security and compliance
This repository contains a collection of resources that will help you prepare.
- Google Cloud Platform (GCP) services you can use to manage data throughout its entire lifecycle, from initial acquisition to final visualization.
- Migrating On-Premises Hadoop Infrastructure to Google Cloud Platform
- User Contributed Study Guide
- Certification Exam Guide
Codelabs provide a guided, tutorial-style, hands-on coding experience. Most codelabs step you through the process of building a small application or adding a new feature to an existing application. They cover a wide range of topics such as Android Wear, Google Compute Engine, Project Tango, and Google APIs on iOS.
https://codelabs.developers.google.com/
https://github.com/GoogleCloudPlatform/training-data-analyst
https://codelabs.developers.google.com/codelabs/cpb100-compute-engine/
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/
This notebook demonstrates how to use Datalab to display the earthquakes that have happened over the past 7 days. The data come from USGS, and we will use the Python module basemap to do the plotting. https://github.com/GoogleCloudPlatform/datalab-samples/blob/master/basemap/earthquakes.ipynb
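Before plotting, the earthquake records have to be read and filtered. The following is a minimal sketch of that step, assuming the data arrive as CSV text with `time`, `latitude`, `longitude`, `depth`, `mag`, and `place` columns. The sample rows are made up for illustration, and `load_quakes` is a hypothetical helper, not code from the notebook. Plotting with basemap is left out.

```python
import csv
import io

# Made-up sample rows using a subset of the columns assumed above.
# These are NOT real events; they only illustrate the parsing step.
SAMPLE_CSV = """time,latitude,longitude,depth,mag,place
2024-01-01T00:00:00.000Z,35.10,-117.60,8.2,2.4,Sample location A
2024-01-02T12:30:00.000Z,-20.50,-70.10,45.0,5.1,Sample location B
2024-01-03T06:15:00.000Z,61.20,-150.00,30.5,,Sample location C
"""


def load_quakes(csv_text, min_magnitude=0.0):
    """Return (latitude, longitude, magnitude) tuples at or above min_magnitude."""
    quakes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["mag"]:
            # Skip rows where the magnitude field is blank.
            continue
        mag = float(row["mag"])
        if mag >= min_magnitude:
            quakes.append((float(row["latitude"]), float(row["longitude"]), mag))
    return quakes


if __name__ == "__main__":
    for lat, lon, mag in load_quakes(SAMPLE_CSV, min_magnitude=2.0):
        print(f"{lat:8.2f} {lon:8.2f}  M{mag}")
```

The resulting tuples are in a shape that a plotting library can consume directly, for example as marker positions scaled by magnitude.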
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/
- Create Cloud SQL instance
- Create database tables by importing .sql files from Cloud Storage
- Populate the tables by importing .csv files from Cloud Storage
- Allow access to Cloud SQL
- Explore the rentals data using SQL statements from CloudShell
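The steps above (create tables from a schema, load CSV rows, then query) can be rehearsed locally before touching Cloud SQL. This is only a sketch: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for a Cloud SQL instance, and the `accommodation` schema, the sample rows, and the `cheapest` helper are hypothetical, not the codelab's actual files.

```python
import csv
import io
import sqlite3

# Hypothetical schema, standing in for a .sql file imported from Cloud Storage.
SCHEMA_SQL = """
CREATE TABLE accommodation (
    id TEXT PRIMARY KEY,
    title TEXT,
    price INTEGER,
    bedrooms INTEGER
);
"""

# Made-up rows, standing in for a .csv file imported from Cloud Storage.
ROWS_CSV = """1,Comfy Quiet Chalet,50,3
2,Cozy Calm Hut,65,2
3,Agreeable Calm Place,65,4
"""


def build_database():
    """Create the table and populate it from the CSV text."""
    conn = sqlite3.connect(":memory:")  # local stand-in, not Cloud SQL
    conn.executescript(SCHEMA_SQL)
    rows = [
        (rid, title, int(price), int(bedrooms))
        for rid, title, price, bedrooms in csv.reader(io.StringIO(ROWS_CSV))
    ]
    conn.executemany("INSERT INTO accommodation VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def cheapest(conn, max_price):
    """Return (title, price) pairs at or below max_price, cheapest first."""
    cur = conn.execute(
        "SELECT title, price FROM accommodation "
        "WHERE price <= ? ORDER BY price, title",
        (max_price,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_database()
    for title, price in cheapest(db, 65):
        print(price, title)
```

SQL dialects differ between SQLite and the engines Cloud SQL offers, so treat the queries as practice for the workflow rather than as statements to paste into CloudShell unchanged.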
https://codelabs.developers.google.com/codelabs/cpb100-dataproc/
- Launch Dataproc
- Run Spark ML jobs using Dataproc
https://codelabs.developers.google.com/codelabs/cpb100-datalab
gcloud compute zones list
datalab create mydatalabvm --zone us-central1-a
datalab connect mydatalabvm
https://codelabs.developers.google.com/codelabs/cpb100-bigquery-dataset/
- Use BigQuery and Datalab to explore and visualize data
- Build a Pandas dataframe that will be used as the training dataset for machine learning using TensorFlow
https://codelabs.developers.google.com/codelabs/cpb100-tensorflow/
- Use TensorFlow to create a neural network model to forecast taxicab demand in NYC
Machine Learning APIs
- Learn how to invoke ML APIs from Python and use their results. https://codelabs.developers.google.com/codelabs/cpb100-translate-api/
- Cloud Datastore: https://cloud.google.com/datastore/
- Cloud Bigtable: https://cloud.google.com/bigtable/
- Google BigQuery: https://cloud.google.com/bigquery/
- Cloud Datalab: https://cloud.google.com/datalab/
- TensorFlow: https://www.tensorflow.org/
- Cloud Machine Learning: https://cloud.google.com/ml/
- Vision API: https://cloud.google.com/vision/
- Translate API: https://cloud.google.com/translate/
- Speech API: https://cloud.google.com/speech/
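As an illustration of invoking one of the ML APIs listed above from Python and using its result, here is a minimal sketch for the Translate API. It assumes the v2 REST endpoint shown in the code and an API key passed as the `key` query parameter, so check both against the current Translate API documentation. The codelab may use a client library instead; this sketch uses only the standard library. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed v2 REST endpoint; verify against the current API documentation.
TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_request(text, target, api_key):
    """Build (but do not send) a POST request asking for a translation."""
    url = TRANSLATE_ENDPOINT + "?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps({"q": text, "target": target, "format": "text"})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_translations(response_body):
    """Extract the translated strings from a JSON response body."""
    payload = json.loads(response_body)
    return [item["translatedText"] for item in payload["data"]["translations"]]


def translate(text, target, api_key):
    """Send the request; needs network access and a valid API key."""
    with urllib.request.urlopen(build_request(text, target, api_key)) as resp:
        return parse_translations(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hand-written response in the assumed shape, so the demo runs offline.
    canned = '{"data": {"translations": [{"translatedText": "hola"}]}}'
    print(parse_translations(canned))
```

Keeping request construction and response parsing in separate functions makes both testable without a key; only `translate` touches the network.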