This is data for runs scored by players from different countries in different years. Let's assume an external process is writing this data into a directory in CSV format. Write a Flume configuration to copy the data to HDFS, and then write a Pig script that processes it to find the sum of runs scored and balls played by each player.
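
For the ingestion half of the exercise, a Flume agent could be sketched roughly as follows. This is one possible layout, not a definitive answer: it assumes a spooling-directory source, a memory channel, and an HDFS sink. The agent name agent1, the component names, and both directory paths are hypothetical placeholders, since the exercise does not specify them.

```properties
# Hypothetical agent "agent1" with one source, one channel, and one sink.
agent1.sources = csvSource
agent1.channels = memChannel
agent1.sinks = hdfsSink

# Spooling-directory source: picks up files dropped into a local directory.
# The path below is an assumed placeholder for the directory in the exercise.
agent1.sources.csvSource.type = spooldir
agent1.sources.csvSource.spoolDir = /tmp/cricket/incoming
agent1.sources.csvSource.channels = memChannel

# In-memory channel: simple, but events are lost if the agent stops.
agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 10000
agent1.channels.memChannel.transactionCapacity = 1000

# HDFS sink: writes events as plain text so the CSV lines stay readable.
# The target path is an assumed placeholder.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel
agent1.sinks.hdfsSink.hdfs.path = /user/flume/cricket
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.writeFormat = Text
# Roll to a new HDFS file every 300 seconds; disable size- and count-based rolling.
agent1.sinks.hdfsSink.hdfs.rollInterval = 300
agent1.sinks.hdfsSink.hdfs.rollSize = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```

If this were saved as cricket.conf (a hypothetical file name), the agent could be started with something like: flume-ng agent --conf conf --conf-file cricket.conf --name agent1. Note that the spooling-directory source expects files to be complete and unchanged once they appear in the directory, so the external process would likely need to write each file elsewhere and move it in when finished. The Pig script would then load the CSV files from the HDFS path, group the records by player, and sum the runs and balls columns.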