Getting Started with Apache Airflow

Julian de Ruiter – Apache Airflow Trainer

Machine learning engineer and consultant Julian de Ruiter teaches the Apache Airflow course. He recently used Apache Airflow at a bank to collect data from several external systems into a central data lake, which the bank uses to perform analyses and train machine-learning models. We asked Julian to share some of the challenges he faced on the project and to explain how he uses and teaches Apache Airflow.

Flexible Framework for Data Workflows

“One of the main challenges in such a project is being able to talk to different systems and to ensure that data is fetched and processed on time,” he explained. “Apache Airflow makes this much easier by providing a flexible and extensible framework for defining data workflows. Combined with its strong scheduling semantics, it’s relatively easy to build complex workflows that run on a regular schedule and keep your data lake up to date,” he said.
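To give a flavour of what that looks like in practice, here is a minimal sketch of a scheduled Airflow workflow. It assumes Airflow 2.x; the DAG id, tasks, and daily schedule are illustrative rather than taken from Julian’s project.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def _fetch_source_data(**context):
    # Placeholder for a call to one of the external source systems.
    print("Fetching data for", context["ds"])


def _load_into_data_lake(**context):
    # Placeholder for writing the fetched data into the data lake.
    print("Loading data for", context["ds"])


with DAG(
    dag_id="example_data_lake_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # Airflow's scheduling semantics: one run per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    fetch = PythonOperator(
        task_id="fetch_source_data", python_callable=_fetch_source_data
    )
    load = PythonOperator(
        task_id="load_into_data_lake", python_callable=_load_into_data_lake
    )

    fetch >> load  # run the tasks in order: fetch first, then load
```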

Julian also explained how he teaches the course. “We start from scratch and work through several different use cases, everyone at their own level,” he said. “The training begins with building a simple example workflow, but by the end of the day everyone will have built more realistic workflows that automate the extraction and loading of data from a database into cloud-based storage suitable for further processing,” he explained.
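As a rough illustration of that kind of extract-and-load workflow, the sketch below copies one day of rows from a Postgres database into S3 as a CSV file. The table name, connection ids, and bucket are hypothetical, the course material may well use different databases or cloud providers, and it assumes the Postgres and Amazon provider packages (plus pandas) are installed.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.postgres.hooks.postgres import PostgresHook


def _extract_and_load(ds, **_):
    # Extract one day of rows from the source database...
    db = PostgresHook(postgres_conn_id="source_db")  # assumed connection id
    df = db.get_pandas_df(
        "SELECT * FROM orders WHERE order_date = %(ds)s", parameters={"ds": ds}
    )
    # ...and load them into cloud storage as CSV for further processing.
    s3 = S3Hook(aws_conn_id="data_lake")  # assumed connection id
    s3.load_string(
        df.to_csv(index=False),
        key=f"raw/orders/{ds}.csv",
        bucket_name="example-data-lake",  # assumed bucket name
        replace=True,
    )


with DAG(
    dag_id="export_orders_to_s3",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_and_load", python_callable=_extract_and_load)
```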

Did you know that...

... Julian is writing a book on Apache Airflow together with fellow GoDataDriven data engineer Bas Harenslak.

Find out more about the book

Testing Workflows

“Many people struggle with setting up a development environment and testing their workflows in Apache Airflow,” explained Julian, “so we spend considerable time on this subject. Everyone learns how to set up a local development environment, how to build custom Airflow components, and how to approach automated testing in Airflow.”
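One common approach to automated testing, and a reasonable stand-in for the kind of checks covered in the course, is a “DAG integrity” test that loads every DAG file and fails on import errors. The sketch below uses pytest and assumes your DAG files live in a dags/ folder.

```python
# test_dag_integrity.py - run with pytest; the dags/ path is illustrative.
from airflow.models import DagBag


def test_dags_load_without_errors():
    # Parsing the DAG folder surfaces import errors (including cyclic dependencies).
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert dag_bag.import_errors == {}, f"Import errors: {dag_bag.import_errors}"


def test_dags_have_at_least_one_task():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tasks, f"DAG {dag_id} has no tasks"
```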

In one of Julian’s recent training sessions, an analyst was using Google’s BigQuery to analyze data from a particular source, and she wanted to automate loading that data with Airflow. “I thought it was really cool to see how quickly she managed to get this process up and running in Apache Airflow, after less than a day of training,” he smiled.
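For context, a load into BigQuery can be expressed as a single Airflow task. The sketch below shows one plausible shape for such an automation, loading CSV exports from Cloud Storage into a BigQuery table; the bucket, dataset, and table names are hypothetical, and it assumes the Google provider package with a configured GCP connection.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="load_source_into_bigquery",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    GCSToBigQueryOperator(
        task_id="gcs_to_bigquery",
        bucket="example-landing-bucket",               # assumed bucket
        source_objects=["exports/{{ ds }}/*.csv"],     # templated per run date
        destination_project_dataset_table="analytics.source_data",  # assumed table
        source_format="CSV",
        autodetect=True,                   # infer the schema from the files
        write_disposition="WRITE_TRUNCATE",  # replace the table contents each run
    )
```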

Interested in the Apache Airflow training?

Head over to Xebia Academy to check out the full program and to save yourself a spot.

Check out the Apache Airflow course