COLLEGE SCORECARD
ANALYSIS USING HIVE
Presented By:
Abhishek Kumar
Anurag Anand
Aditya Patil
Siva Sai
TABLE OF CONTENTS
 What is Big Data
 College Scorecard
 What is our Project
 What Technology We Used
 SDLC
 Hive Queries
 Graphs and Final Results
 Conclusion
 GitHub and Data Source
 References
BIG DATA AROUND THE WORLD
BIG DATA ECO-SYSTEMS
Applications of Big Data
Homeland Security
Smarter Healthcare
Sales
Telecom
Manufacturing
Traffic Control Analytics
Search Quality
DATA SET SOURCE -
http://catalog.data.gov/dataset/road-traffic-injuries-2002-2010
THE RAW DATA
COLLEGE SCORECARDS MAKE IT EASIER FOR
STUDENTS TO SEARCH FOR A COLLEGE THAT IS A
GOOD FIT FOR THEM. THEY CAN USE THE COLLEGE
SCORECARD TO FIND OUT:
 Popular colleges among students
 Affordability
 Net price
 Number of enrollments
 State with the most universities
TECHNOLOGIES WE USED
 Microsoft Power BI
 Apache Ambari - Version 2.1.2
 Hortonworks Sandbox with HDP 2.4
 Apache Hive
 Microsoft Excel
 PuTTY - Release 0.65
 Google Fusion Tables
SYSTEM DEVELOPMENT LIFE CYCLE
Planning
 Defined scope
 Requirement gathering
 Time estimation
Analysis
 Gathered data from Data.gov
Design
 Gathered required software such as Azure, Power View, Microsoft Power BI
Implementation
 Developed queries and created tables
Testing
 Analysis made on the created tables using graphs and maps
WHAT IS OUR PROJECT
 Data analysis is done on college student data:
Cost of college tuition fees
Admission rate
Popular colleges
Popular states
Biggest universities
 Data analysis is done using an HDFS cluster and HiveQL.
 Analyzed data is displayed using MS Power BI and Power Query in the form of graphs and maps.
CREATING THE CLUSTER IN SANDBOX
USING PUTTY TO LOG IN TO THE CLUSTER
AND CHECK HIVE STATUS
PROCESS OF ANALYSIS
Step 1 - Cleaning the data by removing unwanted and NULL columns.
Step 2 - Loading the data into HDFS.
Step 3 - Running HQL queries.
Step 4 - Saving the results in CSV files.
Step 5 - Combining the results into one Excel file.
Step 6 - Analyzing the data through Power BI and Excel.
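The step of saving query results as CSV can be done from inside Hive. The following is a minimal sketch, assuming the newcost2011 table that this deck creates on the cost-table slide; the output directory /tmp/biggest_2011 is a hypothetical name, and the slides do not show which export method the authors actually used.

```sql
-- Write the result of a query as comma-separated text files.
-- '/tmp/biggest_2011' is a hypothetical output directory, not from the slides.
-- Hive creates one or more files (for example 000000_0) inside it.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/biggest_2011'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT INSTNM, CITY, UGDS
FROM newcost2011
WHERE UGDS IS NOT NULL
ORDER BY UGDS DESC
LIMIT 100;
```

Hive does not quote fields in this output, so an institution name that itself contains a comma would shift the columns when opened in Excel. Using a tab as the delimiter and importing the file as tab-separated text is one way around that.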
FLOWCHART OF DATA ANALYSIS
Downloaded data from Data.gov
Uploaded the txt files into HDFS using Ambari
Created tables using HiveQL
Analysed data using queries, Microsoft Power BI and Power View
Analysis of bar and line graphs
DATA UPLOAD VIA AMBARI
 We put the data into HDFS using Ambari.
CREATING THE TABLES
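CREATING A COST TABLE
-- Text table for the 2011 cost, admission and enrollment columns
CREATE TABLE newcost2011 (
  UNITID INT, INSTNM STRING, CITY STRING, CONTROL INT,
  ADM_RATE FLOAT, ADM_RATE_ALL FLOAT, TUITFTE FLOAT,
  TUITIONFEE_IN FLOAT, TUITIONFEE_OUT FLOAT, COSTT4_A FLOAT, UGDS INT
)
COMMENT 'This is the Student 2011 data'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'  -- tab-separated input; the backslash was lost in the slide text
STORED AS TEXTFILE;
-- LOAD DATA INPATH moves the HDFS file into the table, replacing existing rows
LOAD DATA INPATH '/tmp/newcost2011.txt' OVERWRITE INTO TABLE newcost2011;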
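After the load, a few quick checks can confirm that the file was parsed as intended. This is a sketch of such checks against the newcost2011 table above; the checks are an illustration and do not appear in the original slides.

```sql
-- Confirm the column names and types match the CREATE TABLE statement.
DESCRIBE newcost2011;

-- Count the loaded rows; compare with the line count of the source file.
SELECT COUNT(*) AS row_count FROM newcost2011;

-- Spot-check a few rows. If the numeric columns come back NULL in every row,
-- the field delimiter probably does not match the file.
SELECT UNITID, INSTNM, CITY, ADM_RATE, COSTT4_A
FROM newcost2011
LIMIT 5;
```

With a delimited text table, Hive returns NULL for any value it cannot parse as the declared type, which is why all-NULL numeric columns are a useful warning sign.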
OUTPUT OF QUERY
RESULTS OF THE QUERY
DOWNLOADING RESULTS INTO CSV
FORMAT
HIVE QUERY FOR SORTING DATA
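The query on this slide was an image and is not in the transcript. As an illustration only, a sorting query over the newcost2011 table might look like the sketch below, which ranks institutions by COSTT4_A (the Scorecard's average annual cost of attendance). The choice of column and the limit of 10 are assumptions.

```sql
-- Ten most expensive institutions by average annual cost of attendance.
-- Rows with no reported cost are skipped so they do not crowd the ranking.
SELECT INSTNM, CITY, COSTT4_A
FROM newcost2011
WHERE COSTT4_A IS NOT NULL
ORDER BY COSTT4_A DESC
LIMIT 10;
```

Changing DESC to ASC gives the cheapest institutions instead.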
MICROSOFT POWER BI USED FOR
DATA ANALYSIS
COMBINING THE RESULT QUERY
TOGETHER FOR ANALYSIS
GRAPHICAL REPRESENTATION USING POWER BI
GRAPHICAL REPRESENTATION USING POWER VIEW
GRAPHICAL REPRESENTATION USING GOOGLE
FUSION TABLES
Big Data Project using HIVE - college scorecard
GRAPHICAL REPRESENTATION USING POWER BI
MOST COSTLY UNIVERSITIES ARE ON THE EAST COAST
OVERALL CHEAPEST COLLEGE
BEST ADMIT RATES AMONG
COLLEGES
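A ranking by admission rate could be produced with a query along these lines. This is a sketch against the newcost2011 table; the minimum-enrollment filter of 1000 undergraduates is a hypothetical cut-off added to keep very small institutions out of the list, and it is not taken from the slides.

```sql
-- Institutions with the highest admission rates.
-- ADM_RATE is stored as a fraction between 0 and 1.
SELECT INSTNM, CITY, ADM_RATE, UGDS
FROM newcost2011
WHERE ADM_RATE IS NOT NULL
  AND UGDS >= 1000            -- hypothetical size cut-off, not from the slides
ORDER BY ADM_RATE DESC, UGDS DESC
LIMIT 10;
```

The second sort key breaks ties between institutions that report the same admission rate by listing the larger one first.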
CONCLUSION
STATE WITH HIGHEST NUMBER OF
UNIVERSITIES
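Counting universities per state needs a state column, which the newcost2011 table shown in this deck does not have. The sketch below assumes a hypothetical table named college2011 that also keeps the Scorecard state-abbreviation column STABBR, and it reads the deck's "10000 student" figure as "at least 10,000 undergraduates"; both are assumptions.

```sql
-- Number of large universities per state.
-- college2011 is a hypothetical table name; it is assumed to contain
-- STABBR (state abbreviation) and UGDS (undergraduate enrollment).
SELECT STABBR, COUNT(*) AS num_universities
FROM college2011
WHERE UGDS >= 10000
GROUP BY STABBR
ORDER BY num_universities DESC
LIMIT 10;
```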
MOST POPULAR COLLEGE MAJORS
CONCLUSION
 The costliest university is New York University.
 Most of the costly universities are located on the East Coast, i.e. New York and the nearby area.
 The biggest university is University of Phoenix-Online Campus.
 The biggest major is business.
 The state with the most universities with 10000 students is California, with 16.
 CUNY College of Staten Island has the highest admission rate, i.e. it is the easiest to get admission there.
 The cheapest college is High Point University.
LINKS
 GitHub link (code only):
https://github.com/abhimisedu/CIS520GroupF
 Dataset link (dataset size 1580 MB uncompressed):
http://catalog.data.gov/dataset/college-scorecard
REFERENCES
 https://azure.microsoft.com
 www.Data.gov
 http://www.lynda.com/Hadoop-tutorials/
 http://www.tutorialspoint.com/big_data_tutorials.htm
 http://searchstorage.techtarget.com/guides/Big-data-tutorial-Everything-you-need-to-know
THANK YOU
ANY QUERIES?

