I was hired by Touchpoints to build a scalable service that used various APIs to aggregate user data about the client's customers.

The project was particularly interesting because the company had no technical employees, so the service had to be user-friendly enough for non-technical staff to use and manage.

The main service was a Hadoop cluster on Amazon EMR (Elastic MapReduce) that ran parallelized API query jobs. I wrote these jobs with the Cascading framework on top of Hadoop. The service also handled data cleaning and reconciliation automatically.

I also built a web interface in Flask that let employees reconcile data manually and manage running jobs.

Tools Used

Hadoop, Cascading, Java, Python, Flask, Data Science