GoDataDriven Open Source Contribution: March 2017 Edition

This article is featured in the free magazine "Data Science in Production".

As announced last month, we are trying to collect all the contributions we
make to the open source world, whether to existing or to new projects.

This second edition starts with Fokko, who contributed to five different projects: Druid,
Docker-Druid, Airflow, Flink, and scalatra-sample-app.

Yours truly, on the other hand, fixed an incorrect description of the UnpackContent processor in NiFi
(PR 1558) and open sourced a project to provision Google Cloud Engine instances to ease the
deployment of classroom trainings [1]. In fact, we often face a lot of challenges when delivering
training where Spark is involved:

Since Google Cloud Engine makes it extremely easy to create clusters, the project more or less
assumes that that’s what you’re using. That said, it should be easy enough to modify. Personally, I’m
working on getting Anaconda + JupyterHub integrated so that users don’t even need SSH
access to the machine.

That’s all for the second edition. As always, we’re hiring Data Scientists and Data Engineers. Head
over to our career page if you’re interested. You will
get plenty of opportunities to give back to the community.


  1. Huge props to Ron as he created the first working Ansible implementation! 
