IN-COMPANY TRAINING PROGRAMS

Contact Giovanni Lanzani if you want to know more about custom data & AI training for your teams. He’ll be happy to help you!

Data Science Learning Path - Senior

Data Science with Spark Course

Apache Spark is a powerful open-source processing engine built for speed, ease of use, and advanced analytics. In the Data Science with Spark course, our experienced consultants will teach you how to unlock the full potential of this challenging tool and put it to work for yourself.
“I liked every aspect of the course, and I would like to thank the instructors. They did a great job of explaining how Spark can be used for data science. This is the fourth GoDataDriven course I have attended. All of them were good, but this one was the best so far.” – Data Scientist, Knab

This training is for you if…

  • You have worked with Python before and you want to know how to scale to large datasets

  • You have started, or are about to start, working with large data

  • You know the concepts of machine learning and you want to know how to apply them at scale

This training is not for you if…

What you'll learn

Spark basics

  • Spark execution and the Spark session
  • Transformations vs. actions
  • Laziness and lineage: how Spark optimizes code
  • How to use the Spark UI
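
To make the difference between transformations and actions concrete, here is a minimal plain-Python analogy. It does not use Spark: the `LazyDataset` class and its method names are hypothetical and only mimic the idea that transformations record steps (the lineage) while an action triggers the actual computation.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, source, steps=()):
        self._source = list(source)
        self._steps = tuple(steps)  # recorded lineage: (name, function) pairs

    def map(self, fn):
        # Transformation: only records the step, nothing is computed yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: only records the step, nothing is computed yet.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def lineage(self):
        # Names of the recorded steps, in the order they will run.
        return [name for name, _ in self._steps]

    def collect(self):
        # Action: replays the recorded steps and materializes the result.
        rows = iter(self._source)
        for name, fn in self._steps:
            rows = map(fn, rows) if name == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records every value the mapping function actually sees


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
ran_before_action = list(calls)  # still empty: no action has been called
result = ds.collect()            # the action runs the whole pipeline
print(ds.lineage(), ran_before_action, result)
```

In this sketch, `double` is never called until `collect()` runs. Spark follows the same general pattern, but it additionally uses the recorded lineage to plan and optimize the work before executing it.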

DataFrames

  • Spark DataFrames vs pandas DataFrames
  • How to load and save DataFrames
  • How to join data
  • User-defined functions (UDFs) and pandas UDFs, with their performance implications
  • Window operations

Advanced Spark

  • How to apply partitioning and how Spark reads and writes data
  • Shuffling, narrow vs. wide operations, and their impact on performance
  • The Catalyst optimizer
  • About scheduling and job execution
  • About caching and persistence levels

Spark.ml

  • Machine learning with Spark
  • Pre-processing data and feature engineering
  • Model selection
  • Pipeline API
  • Advanced topics

The schedule

Training Day 1
  • Spark basics
  • Advanced Spark
  • DataFrames
Training Day 2
  • Window functions
  • Spark.ml
Training Day 3
  • Structured Streaming with Spark
  • Integrating Apache Spark with Apache Kafka

After the training you will be able to:

  • Process large-scale data using PySpark
  • Understand the fundamentals of Apache Spark
  • Perform machine learning on large-scale data

Meet your trainer

Vadim Nelidov

Data Enchanter

Vadim is a Data Scientist who is passionate about solving data-driven problems and sharing his analytical insights to make data literacy a reality for all.

Flexible delivery

The Right Format For Your Preferred Learning Style

In-Classroom & In-Company Training
Online, Instructor-Led Training
Hybrid and Blended Learning
Self-Paced Training

Get in touch with the experts

Have any questions?

Contact Giovanni Lanzani, our Managing Director of Learning and Development, if you want to know more. He’ll be happy to help you!

You can also reach him by phone at +31 6 51 20 6163
