This document discusses how to efficiently handle dynamic-width files in Spark using Scala, Spark RDDs, and DataFrames. It demonstrates reading a dynamic-width source file from mainframe sources, defining a schema, executing code to create a DataFrame with that schema, registering the DataFrame as a temporary table, and running analytical queries on the temporary table.
Dynamic Width File in Spark_2016
How to Handle a Dynamic Width File in Spark
A dynamic-width file is a common type of source from mainframe systems. The demonstration below is one of the efficient ways to handle a dynamic-width file using Scala, Spark RDDs, and DataFrames. Check this code and execute it in your REPL.
Source File
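The source file itself appears only as an image in the original slides. A hypothetical sample with the same shape implied by the final query (a fixed prefix of id, first name, last name, and subject count, followed by a variable number of subject:marks pairs) might look like:

```text
101,John,Smith,3,Maths:90,Physics:80,Chemistry:70
102,Jane,Doe,2,Maths:85,Biology:95
```

The record width varies because each student carries a different number of subjects, which is what makes the file "dynamic width".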
Schema of the File
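The schema slide is also an image in the original. Based on the column names the final query references (id, fname, lname, numberofsubject, and the array column subjectwisemarks), the schema can be sketched as Scala case classes; the field types are an assumption:

```scala
// Hypothetical schema reconstructed from the column names used in the final query.
case class SubjectMarks(subject: String, marks: Int)

case class Score(
  id: Int,
  fname: String,
  lname: String,
  numberofsubject: Int,
  subjectwisemarks: Array[SubjectMarks]
)
```

Because subjectwisemarks is an array of structs, Spark can derive a nested DataFrame schema from these case classes automatically.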
Code to be Executed
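The executed code is likewise an image in the original deck. A sketch of the approach, assuming the hypothetical sample format above and the Spark 1.6-era spark-shell (where `sc` and `sqlContext` are predefined), could be:

```scala
// Intended for spark-shell (Spark 1.x, circa 2016), where sc and sqlContext
// are predefined. File path and parsing rules are assumptions.
import sqlContext.implicits._

case class SubjectMarks(subject: String, marks: Int)
case class Score(id: Int, fname: String, lname: String,
                 numberofsubject: Int, subjectwisemarks: Array[SubjectMarks])

// Parse one dynamic-width record: a fixed 4-field prefix followed by a
// variable number of "subject:marks" pairs.
def parseLine(line: String): Score = {
  val fields = line.split(",")
  val pairs = fields.drop(4).map { p =>
    val Array(subject, marks) = p.split(":")
    SubjectMarks(subject, marks.toInt)
  }
  Score(fields(0).toInt, fields(1), fields(2), fields(3).toInt, pairs)
}

// "scores.txt" is a placeholder path for the mainframe extract.
val scoreDF = sc.textFile("scores.txt").map(parseLine).toDF()
scoreDF.printSchema()
```

Parsing in an RDD `map` and converting with `toDF()` handles the varying record width without needing a fixed-column reader.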
Dataframe Schema
Registering as Temp Table and Showing the Data
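Assuming the DataFrame built in the previous step is named scoreDF (an assumption; the original code is an image), registering it and showing the data in a Spark 1.x spark-shell might look like:

```scala
// registerTempTable is the Spark 1.x API current in 2016;
// Spark 2.x+ renamed it createOrReplaceTempView.
scoreDF.registerTempTable("score")

// Show the data from the temp table.
sqlContext.sql("SELECT * FROM score").show()
```

The name "score" matches the table referenced by the analytical query below.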
Implementing an Analytical Query on the Temp Table
SELECT id, fname, lname,
       CAST(SUM(subject_wise_marks.marks) / numberofsubject AS Double) AS percentage
FROM score
LATERAL VIEW explode(subjectwisemarks) marks_table AS subject_wise_marks
GROUP BY id, fname, lname, numberofsubject;
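As a sanity check, the query's explode-then-aggregate logic can be mirrored in plain Scala collections (the sample rows are hypothetical; the real data appears only in the deck's images):

```scala
// Each record: (id, fname, lname, numberofsubject, pairs of (subject, marks)).
// Hypothetical sample rows standing in for the deck's source data.
val score = Seq(
  (101, "John", "Smith", 3, Seq(("Maths", 90), ("Physics", 80), ("Chemistry", 70))),
  (102, "Jane", "Doe",   2, Seq(("Maths", 85), ("Biology", 95)))
)

// LATERAL VIEW explode flattens subjectwisemarks to one row per subject;
// grouping by the student columns then sums marks and divides by the
// subject count, as in the CAST(... AS Double) percentage above.
val percentages = score.map { case (id, fname, lname, n, marks) =>
  (id, fname, lname, marks.map(_._2).sum.toDouble / n)
}

percentages.foreach(println)
// (101,John,Smith,80.0) and (102,Jane,Doe,90.0) for the sample rows
```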
Result