Become a GCP Data Engineer
Become a Professional Data Engineer on GCP. This 4-day training combines presentations, demos, and hands-on labs: you will learn how to design data processing systems, build end-to-end data pipelines, analyze data, and carry out machine learning. The course is part of Google's Data Engineering track, which leads to the Professional Data Engineer certificate.
What you'll learn
For everyone familiar with SQL, ETL, and a programming language
- Design and build data processing systems on GCP
- Analyze data using GCP services
- Train and deploy machine learning models
- Handle structured, unstructured, and streaming data
The schedule
Google Cloud Dataproc Overview
- Creating and managing clusters
- Leveraging custom machine types and preemptible worker nodes
- Scaling and deleting clusters
- Lab: Creating Hadoop Clusters with Google Cloud Dataproc
Running Dataproc Jobs
- Running Pig and Hive jobs
- Separation of storage and compute
- Lab: Running Hadoop and Spark Jobs with Dataproc
- Lab: Submit and monitor jobs
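To give a flavor of the lab work, below is a minimal PySpark word count, the kind of job you would submit to a Dataproc cluster (for example with gcloud dataproc jobs submit pyspark). The gs:// paths are placeholders, not course material.

```python
# Minimal PySpark word count, sketching the kind of job submitted to a
# Dataproc cluster in the labs. The gs:// paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

lines = spark.sparkContext.textFile("gs://your-bucket/input.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("gs://your-bucket/output")

spark.stop()
```

Because Dataproc separates storage from compute, a job like this reads from and writes to Cloud Storage rather than cluster-local HDFS, so the cluster itself stays disposable.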
Integrating Dataproc With Google Cloud Platform
- Customize clusters with initialization actions
- BigQuery Support
- Lab: Leveraging Google Cloud Platform Services
Making Sense of Unstructured Data With Google’s Machine Learning APIs
- Google’s Machine Learning APIs
- Common ML Use Cases
- Invoking ML APIs
- Lab: Adding Machine Learning Capabilities to Big Data Analysis
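As a sketch of what invoking the pre-trained ML APIs looks like, here is label detection with the Cloud Vision API, assuming the google-cloud-vision Python client (version 2+; older releases spelled the request type vision.types.Image). The file name is a placeholder.

```python
# Sketch: calling the Cloud Vision API (one of the pre-trained ML APIs)
# via the google-cloud-vision client. The file name is a placeholder.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Ask the API to label the image's contents.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, label.score)
```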
Serverless Data Analysis With BigQuery
- What is BigQuery?
- Queries and Functions
- Lab: Writing queries in BigQuery
- Loading data into BigQuery
- Exporting data from BigQuery
- Lab: Loading and exporting data
- Nested and repeated fields
- Querying multiple tables
- Lab: Complex queries
- Performance and pricing
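To make this concrete, here is a minimal query from Python with the google-cloud-bigquery client, run against one of Google's public sample datasets:

```python
# Run a query against a BigQuery public dataset and stream the rows back.
from google.cloud import bigquery

client = bigquery.Client()  # picks up your default project and credentials

query = """
    SELECT word, SUM(word_count) AS occurrences
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY occurrences DESC
    LIMIT 10
"""
for row in client.query(query):  # iterating waits for the job to finish
    print(row.word, row.occurrences)
```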
Serverless, Autoscaling Data Pipelines With Dataflow
- The Beam programming model
- Data pipelines in Beam Python
- Data pipelines in Beam Java
- Lab: Writing a Dataflow pipeline
- Scalable Big Data processing using Beam
- Lab: MapReduce in Dataflow
- Incorporating additional data
- Lab: Side inputs
- Handling stream data
- GCP Reference architecture
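For a taste of the Beam programming model, a tiny word-count pipeline in Beam Python is sketched below. It runs locally on the DirectRunner; the same code runs on Cloud Dataflow once you pass DataflowRunner and the usual GCP pipeline options.

```python
# A tiny Beam pipeline: create a PCollection, transform it, print results.
import apache_beam as beam

with beam.Pipeline() as pipeline:  # DirectRunner by default
    (pipeline
     | "Create" >> beam.Create(["the beam programming model",
                                "data pipelines in beam"])
     | "Split" >> beam.FlatMap(str.split)           # words
     | "PairWithOne" >> beam.Map(lambda w: (w, 1))  # (word, 1)
     | "Count" >> beam.CombinePerKey(sum)           # (word, n)
     | "Print" >> beam.Map(print))
```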
Getting Started With Machine Learning
- What is machine learning (ML)?
- Effective ML: concepts, types
- ML datasets: generalization
- Lab: Explore and create ML datasets
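One way to make the generalization discussion concrete is a repeatable train/validation split: hash a stable example id instead of sampling randomly, so the split survives re-runs and new data. The sketch below is illustrative only; the id column and 80/20 ratio are assumptions, not the course's exact recipe.

```python
# Repeatable dataset split: hash a stable id into 10 buckets and assign
# buckets 0-7 to training, 8-9 to validation. Illustrative only.
import hashlib

def assign_split(example_id: str) -> str:
    bucket = int(hashlib.md5(example_id.encode()).hexdigest(), 16) % 10
    return "train" if bucket < 8 else "validation"

print(assign_split("ride-0001"))  # same answer every run for the same id
```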
Building ML Models With TensorFlow
- Getting started with TensorFlow
- Lab: Using tf.learn
- TensorFlow graphs and loops + lab
- Lab: Using low-level TensorFlow + early stopping
- Monitoring ML training
- Lab: Charts and graphs of TensorFlow training
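The course targets the TensorFlow 1.x API, so the sketch below uses the low-level graph-and-session style from the "graphs and loops" module, fitting a one-variable linear model; under TensorFlow 2 the same calls live behind tf.compat.v1.

```python
# Low-level TensorFlow (1.x style): build a graph, then run it in a session.
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])

w = tf.Variable(0.0)
b = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x + b - y))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2 * xs + 1  # the model should recover w=2, b=1

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):  # the training loop you monitor in the labs
        sess.run(train_op, feed_dict={x: xs, y: ys})
    print(sess.run([w, b]))  # approaches [2.0, 1.0]
```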
Scaling ML Models With Cloud ML
- Why Cloud ML?
- Packaging up a TensorFlow model
- End-to-end training
- Lab: Run an ML model locally and in the cloud
Feature Engineering
- Creating good features
- Transforming inputs
- Synthetic features
- Preprocessing with Cloud ML
- Lab: Feature engineering
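Here is a sketch of those transformations using the tf.feature_column API from the TensorFlow 1.x era this course targets (since deprecated in newer releases); the column names and bucket boundaries are hypothetical.

```python
# Bucketizing raw inputs and crossing them into a synthetic feature.
import tensorflow as tf

# Raw numeric inputs (hypothetical pickup coordinates).
lat = tf.feature_column.numeric_column("pickup_latitude")
lon = tf.feature_column.numeric_column("pickup_longitude")

# Transform: discretize the continuous values into ranges.
lat_buckets = tf.feature_column.bucketized_column(
    lat, boundaries=[38.0, 39.0, 40.0, 41.0, 42.0])
lon_buckets = tf.feature_column.bucketized_column(
    lon, boundaries=[-76.0, -75.0, -74.0, -73.0, -72.0])

# Synthetic feature: cross the buckets to approximate location cells.
location = tf.feature_column.crossed_column(
    [lat_buckets, lon_buckets], hash_bucket_size=1000)
```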
Architecture of Streaming Analytics Pipelines
- Stream data processing: Challenges
- Handling variable data volumes
- Dealing with unordered/late data
- Lab: Designing streaming pipeline
Ingesting Variable Volumes
- What is Cloud Pub/Sub?
- How it works: Topics and Subscriptions
- Lab: Simulator
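Publishing is only a few lines with the google-cloud-pubsub client; the project, topic, payload, and attribute below are placeholders in the spirit of the simulator lab.

```python
# Publish a message to a Pub/Sub topic. Ids and payload are placeholders.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your-project", "traffic-events")

# Message bodies are bytes; extra keyword arguments become attributes.
future = publisher.publish(topic_path, b"lane=2,speed=57", sensor="I-5")
print(future.result())  # blocks until the server acks; returns message id
```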
Implementing Streaming Pipelines
- Challenges in stream processing
- Handle late data: watermarks, triggers, accumulation
- Lab: Stream data processing pipeline for live traffic data
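That watermark/trigger/accumulation vocabulary maps directly onto Beam's WindowInto transform. A sketch in Beam Python follows; the window size, early-firing interval, and allowed lateness are arbitrary example values, not the lab's exact settings.

```python
# Fixed one-minute windows with an early firing every 30 s of processing
# time, accumulating panes, and one minute of allowed lateness.
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.trigger import (AccumulationMode,
                                            AfterProcessingTime,
                                            AfterWatermark)

windowed = beam.WindowInto(
    window.FixedWindows(60),
    trigger=AfterWatermark(early=AfterProcessingTime(30)),
    accumulation_mode=AccumulationMode.ACCUMULATING,
    allowed_lateness=60)

# Attach to a streaming PCollection as: events | "Window" >> windowed
```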
Streaming Analytics and Dashboards
- Streaming analytics: from data to decisions
- Querying streaming data with BigQuery
- What is Google Data Studio?
- Lab: Build a real-time dashboard to visualize processed data
High Throughput and Low Latency With Bigtable
- What is Cloud Bigtable?
- Designing Bigtable schema
- Ingesting into Bigtable
- Lab: Streaming into Bigtable
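Finally, writing to Bigtable from Python is short with the google-cloud-bigtable client; the ids, column family, and row-key scheme below are placeholders illustrating the schema-design idea of packing your query dimensions into the row key.

```python
# Write one cell to a Bigtable row. All ids are placeholders.
from google.cloud import bigtable

client = bigtable.Client(project="your-project")
table = client.instance("your-instance").table("traffic")

# Schema design: encode the query dimensions (road, lane, time) in the key.
row = table.direct_row(b"I-5#lane2#20240101T120000")
row.set_cell("readings", "speed", b"57")
row.commit()
```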