IN-COMPANY TRAINING PROGRAMS

Contact Giovanni Lanzani if you want to know more about custom data & AI training for your teams. He’ll be happy to help you!
Data Science Learning Journey - Senior

Data Science with Spark Training

Apache Spark is a powerful, open-source processing engine built around speed, ease of use, and advanced analytics. In this course, you will learn to unlock its full potential and master this challenging tool.

This training is for you if…

You have worked with Python before and you want to know how to scale to large datasets
You have started, or are about to start, working with large data
You know the concepts of machine learning and you want to know how to apply them at scale

This training is not for you if…

You won’t be working with Spark but want to learn Python (check out the Python for Data Analysis training instead)
You want a deep dive into machine learning (check out the Certified Data Science with Python training instead)

What you'll learn

Spark basics

Spark execution and the Spark session
Transformations vs. actions
Laziness and lineage: how Spark optimizes code
How to use the Spark UI

DataFrames

Spark DataFrames vs pandas DataFrames
How to load and save DataFrames
How to join data
User-defined functions and pandas user-defined functions (with performance implications)
Window operations

Advanced Spark

How to apply partitioning and how Spark reads and writes data
Shuffling, narrow and wide operations, and their impact on performance
The Catalyst optimizer
Scheduling and job execution
Caching and persistence levels

Spark.ml

Machine learning with Spark
Pre-processing data and feature engineering
Model selection
Pipeline API
Advanced topics

Spark structured streaming

Structured streaming
Machine learning & streaming
Windows and aggregations
Fault tolerance & Kafka
Kafka as a source and sink

The schedule

Training Day 1

Spark execution and Spark sessions
DataFrame methods, properties, and actions
APIs: (Py)Spark DataFrame vs Spark SQL
Reading and writing data in Spark

Training Day 2

The anatomy of a Spark job
Narrow and wide transformations
Window functions

Training Day 3

Applied machine learning in Spark
Spark structured streaming
Integrating Apache Spark with Apache Kafka

After the training you will be able to:

Process large-scale data using PySpark
Understand the fundamentals of Apache Spark
Perform machine learning on large-scale data

Learning journey

Machine Learning Engineering Learning Journey

Kubernetes

MLOps Training

Certified Data Science with Python Foundation Training

Production Ready Machine Learning Training

ML System Design

Docker

Google Cloud Platform Fundamentals: Big Data & Machine Learning Training

Data Science with Spark Training

Meet your trainer

Vadim Nelidov

Data Enchanter

Vadim is a Data Scientist passionate about solving data-driven problems and sharing his analytical insights to make data literacy a reality for all.

Flexible delivery

The Right Format For Your Preferred Learning Style

In-Classroom & In-Company Training

Online, Instructor-Led Training

Hybrid and Blended Learning

Self-Paced Training

Structured, to-the-point, good combination of theory and practical examples, very knowledgeable trainer who can explain concepts very well

Data scientist

It was a hands-on and tangible course. We could apply what we learned in a matter of minutes. The trainer did a great job of answering ad-hoc questions that complemented the material. We appreciated the fact that we could apply what we were taught directly to our company.

Technical Leader & Software Architect

I liked every aspect of this training and would like to thank the trainers. They did an excellent job of explaining how to use Spark for data science. This is the fourth GoDataDriven training I’ve followed. All were great, but this was the best one so far.

Data Scientist

Climbing a steep Python and Machine Learning curve in three days. This would have taken me months on my own.

Data Scientist
