IN-COMPANY TRAINING PROGRAMS
Contact Giovanni Lanzani if you want to know more about custom data & AI training for your teams. He’ll be happy to help you!
Data Processing at Scale
Data is knowledge, and knowledge is power. But processing data efficiently becomes challenging as volumes grow. This training takes a deep dive into Apache Spark, one of the most popular and scalable tools for transforming large datasets.
What you'll learn
- How Apache Spark works, including its advanced features
- How to write efficient ETL jobs
- Basic and advanced use of the API to transform data
- How to think in terms of distributed systems when writing Spark jobs
The schedule
The program consists of both theory and hands-on exercises.
- Inner workings of Apache Spark
- Loading data from various formats
- Basic and advanced DataFrame operations
- Window and user-defined functions
- Unit testing
- Hands-on exercise: analyzing large-scale logs to find trending topics
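
To give a feel for what "thinking in terms of distributed systems" means for an exercise like the trending-topics one, here is a minimal plain-Python sketch. It is not Spark code and not taken from the course material. It only illustrates the general shape of a distributed aggregation: each partition is counted independently, and the partial results are then merged. The log format (`<timestamp> <topic>`), the sample data, and all function names are assumptions made up for illustration.

```python
from collections import Counter
from functools import reduce


def count_partition(lines):
    """Count topics within a single partition, independently of the others."""
    counts = Counter()
    for line in lines:
        # Hypothetical log format: "<timestamp> <topic>"
        _, topic = line.split(" ", 1)
        counts[topic] += 1
    return counts


def merge_counts(left, right):
    """Combine two partial results. Addition is associative and commutative,
    so the order in which partitions are merged does not matter."""
    return left + right


def trending_topics(partitions, top_n):
    # "Map" side: every partition produces its own partial count.
    partial = [count_partition(p) for p in partitions]
    # "Reduce" side: partial counts are merged into one global count.
    total = reduce(merge_counts, partial, Counter())
    return total.most_common(top_n)


# Two made-up partitions of a log file.
partitions = [
    ["2024-01-01T10:00 spark", "2024-01-01T10:01 kafka", "2024-01-01T10:02 spark"],
    ["2024-01-01T10:03 spark", "2024-01-01T10:04 airflow", "2024-01-01T10:05 kafka"],
]
result = trending_topics(partitions, top_n=2)
print(result)
```

Because the merge step only ever sees small per-partition summaries, the same pattern scales to data that does not fit on one machine, which is the kind of reasoning a framework like Spark asks of its users.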
Data Engineering Learning Journey
This online course is perfect for
Data and Machine Learning Engineers who transform large volumes of data. Basic experience with Python is required. If you’re not quite there yet, we recommend the Python for Data Engineers course as preparation for this training.
What will you learn during Data Processing at Scale?
After this training, you will understand how Apache Spark works and have the essential skills to write efficient Spark ETL jobs that process large datasets.
Andrew Snare
Big Data Hacker
Andrew is a Big Data Hacker at GoDataDriven. He is an experienced software engineer with a deep understanding of numerous technologies and languages.
Andrew is a certified Cloudera, Databricks, and Cassandra instructor, and also enjoys sharing his experiences on stage, for example at Goto Conference.