I am a data engineer, solutions architect and principal consultant on BigData projects with 15 years of industry experience.
My primary interest is building large-scale distributed systems. I have experience building both frameworks and the applications that use them. I was an early contributor, committer and project lead of Hadoop MapReduce; Hadoop on Demand, the earliest provisioning system for Hadoop on a shared cluster; and the first version of the Capacity Scheduler, which continues to be one of the main schedulers in Hadoop today. Since then, I have led teams on other BigData projects, including Amazon EMR-based solutions. Of late, I have developed near realtime ...