rdelaguila/data-engineering-gcp

Google - Professional Data Engineer

A Professional Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems.

The Data Engineer also analyzes data to gain insight into business outcomes, builds statistical models to support decision-making, and creates machine learning models to automate and simplify key business processes.

The Google Cloud Certified - Professional Data Engineer exam assesses your ability to:

  • Build and maintain data structures and databases
  • Design data processing systems
  • Analyze data and enable machine learning
  • Model business processes for analysis and optimization
  • Design for reliability
  • Visualize data and advocate policy
  • Design for security and compliance

This repository contains a collection of resources that will help you prepare.

Conceptual Knowledge Articles

Case Studies

Google Developers Codelabs

Google Developers Codelabs provide guided, hands-on coding tutorials. Most codelabs step you through building a small application or adding a new feature to an existing one. They cover a wide range of topics such as Android Wear, Google Compute Engine, Project Tango, and Google APIs on iOS.

https://codelabs.developers.google.com/

Labs and demos for courses for GCP Training

https://github.com/GoogleCloudPlatform/training-data-analyst

In this lab you spin up a virtual machine, configure its security, and access it remotely.

https://codelabs.developers.google.com/codelabs/cpb100-compute-engine/

In this lab you carry out the steps of an ingest-transform-and-publish data pipeline manually.

https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/
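
The three stages named above (ingest, transform, publish) can be sketched in plain Python. This is only an illustration of the pattern, not the lab's actual code: the sample readings are made up, the function names are hypothetical, and a local directory stands in for a Cloud Storage bucket.

```python
import csv
import io
import json
import tempfile
from pathlib import Path

# Made-up sample input standing in for a downloaded file; not real data.
SAMPLE_CSV = """station,reading
north,12.5
south,
east,7.0
"""


def ingest(raw_text):
    """Ingest: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with no reading and convert readings to floats."""
    cleaned = []
    for row in rows:
        if not row["reading"]:
            continue  # skip incomplete rows
        cleaned.append({"station": row["station"], "reading": float(row["reading"])})
    return cleaned


def publish(records, out_dir):
    """Publish: write the records as JSON (a local directory stands in for a bucket)."""
    out_path = Path(out_dir) / "readings.json"
    out_path.write_text(json.dumps(records, indent=2))
    return out_path


def run_pipeline(raw_text, out_dir):
    """Chain the three stages and return the path of the published file."""
    return publish(transform(ingest(raw_text)), out_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(SAMPLE_CSV, tmp).read_text())
```

Keeping each stage as its own function mirrors carrying out the steps manually: every stage can be run and inspected on its own before the next one starts.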

Geographic data in Datalab

This notebook demonstrates how to use Datalab to display the earthquakes that have happened over the past 7 days. The data come from USGS, and we will use the Python module basemap to do the plotting.

https://github.com/GoogleCloudPlatform/datalab-samples/blob/master/basemap/earthquakes.ipynb
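
Before plotting, the events need to be limited to the 7-day window and turned into coordinates and marker sizes. The sketch below shows one way that preparation could look. It is not taken from the notebook: the events are made up, the function names are hypothetical, and the size formula is an arbitrary illustrative choice. The resulting lists are the kind of input a scatter-style plotting call would take.

```python
from datetime import datetime, timedelta, timezone

# Made-up events as (time, latitude, longitude, magnitude); not real USGS data.
EVENTS = [
    (datetime(2024, 1, 9, tzinfo=timezone.utc), 35.0, -118.0, 5.0),
    (datetime(2024, 1, 1, tzinfo=timezone.utc), 61.2, -150.1, 6.0),
    (datetime(2024, 1, 4, tzinfo=timezone.utc), -6.1, 130.5, 3.0),
]


def recent(events, now, days=7):
    """Keep only the events that fall within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [event for event in events if cutoff <= event[0] <= now]


def marker_size(magnitude, base=2.0):
    """Grow the marker with the square of the magnitude (illustrative choice)."""
    return base * max(magnitude, 0.0) ** 2


def plot_columns(events):
    """Split events into longitudes, latitudes, and marker sizes for plotting."""
    lons = [event[2] for event in events]
    lats = [event[1] for event in events]
    sizes = [marker_size(event[3]) for event in events]
    return lons, lats, sizes


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    print(plot_columns(recent(EVENTS, now)))
```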

Setup rentals data in Cloud SQL

https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/

  • Create Cloud SQL instance
  • Create database tables by importing .sql files from Cloud Storage
  • Populate the tables by importing .csv files from Cloud Storage
  • Allow access to Cloud SQL
  • Explore the rentals data using SQL statements from CloudShell
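
The create, populate, and explore steps above can be rehearsed locally before touching Cloud SQL. The sketch below uses Python's built-in sqlite3 module as a stand-in for Cloud SQL, so it is not the lab's own procedure; the table schema, the sample rows, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical schema, standing in for a .sql file imported from Cloud Storage.
SCHEMA_SQL = """
CREATE TABLE accommodation (
    id TEXT PRIMARY KEY,
    title TEXT,
    location TEXT,
    price REAL,
    rooms INTEGER
);
"""

# Made-up rows, standing in for a .csv file imported from Cloud Storage.
ACCOMMODATION_CSV = """1,Cozy cabin,Lakeside,120.0,2
2,City loft,Downtown,200.0,1
3,Family house,Suburb,150.0,4
"""


def build_database():
    """Create the table, then populate it from the CSV text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_SQL)
    rows = csv.reader(io.StringIO(ACCOMMODATION_CSV))
    conn.executemany("INSERT INTO accommodation VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def cheapest_with_rooms(conn, min_rooms):
    """Explore the data: rentals with at least `min_rooms` rooms, cheapest first."""
    cursor = conn.execute(
        "SELECT title, price FROM accommodation WHERE rooms >= ? ORDER BY price",
        (min_rooms,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_database()
    print(cheapest_with_rooms(connection, 2))
    connection.close()
```

The SQL statements themselves carry over to a MySQL-style database with little change; only the connection setup differs.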

Recommendations ML with Dataproc

https://codelabs.developers.google.com/codelabs/cpb100-dataproc/

  • Launch Dataproc
  • Run Spark ML jobs using Dataproc

Create ML dataset with BigQuery

https://codelabs.developers.google.com/codelabs/cpb100-datalab

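# List the available zones and pick one close to you
gcloud compute zones list
# Create the Datalab VM first (the zone shown is an example)
datalab create mydatalabvm --zone us-central1-a
# Then connect to the VM you created
datalab connect mydatalabvm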

https://codelabs.developers.google.com/codelabs/cpb100-bigquery-dataset/

  • Use BigQuery and Datalab to explore and visualize data
  • Build a Pandas dataframe that will be used as the training dataset for machine learning using TensorFlow

https://codelabs.developers.google.com/codelabs/cpb100-tensorflow/

  • Use TensorFlow to create a neural network model to forecast taxicab demand in NYC

Machine Learning APIs
