Department of Mechanical Engineering
Humility, Entrepreneurship, Teamwork, Learning, Social Responsibility, Respect for Individual, Deliver The Promise
GMR Institute of Technology, Rajam
Term Paper
Final-Review
GMR Institute of Technology
An Autonomous Institute Affiliated to JNTUK, Kakinada
Department of Computer Science Engineering
TITLE
Performance Analysis of MapReduce Task in Big Data Using Hadoop
November 13, 2016
by
M. S. V. S. K. Avadhani
(14341A05A4)
Under the guidance and supervision of
Mrs. K. Jayasri
Assistant Professor
Department of Computer Science Engineering
ABSTRACT
• Big Data refers to data sets so large that traditional data management systems cannot handle them.
• Data comes in three forms: structured, unstructured, and semi-structured. Most big data is unstructured.
• Unstructured data is difficult to handle.
• Hadoop is a technological answer to Big Data.
• The Apache Hadoop project provides tools and techniques for handling this huge amount of data: the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing.
• This paper discusses work done on Hadoop: a number of files are supplied as input to the system, and the performance of Hadoop is then analysed.
ABSTRACT (contd.)
• It also discusses how the map and reduce methods behave as the number of files increases, and the number of bytes written and read by these tasks.
Keywords: Big data, Hadoop, HDFS, MapReduce.
Hadoop
• Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of commodity hardware.
• Storage: HDFS (Hadoop Distributed File System)
• Processing: MapReduce
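HDFS:
A file system designed for storing huge data sets on a cluster of commodity hardware, with a streaming access pattern.
A classic Hadoop cluster runs 5 services. The NameNode, Secondary NameNode, and DataNode belong to HDFS; the JobTracker and TaskTracker schedule and run MapReduce tasks.
• NameNode
• Secondary NameNode
• JobTracker
• DataNode
• TaskTracker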
MapReduce:
• MapReduce is a processing technique and programming model for distributed computing, based on Java.
• A MapReduce job consists of two main tasks, Map and Reduce.
• Map stage: the mapper's job is to process the input data and emit intermediate key-value pairs.
• Reduce stage: this stage combines the shuffle step, which groups the map output by key, with the reduce step, which combines the values for each key.
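As a minimal sketch of the two stages described above, the following plain-Python example imitates a word count job in memory. It does not use Hadoop; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample lines are illustrative assumptions, not part of the Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key before they reach reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: two short lines that share the word "how".
    sample = ["how are you", "how is Hadoop"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run on different nodes and the shuffle moves data between them; here all three steps run in one process purely to show the data flow.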
• The performance of the MapReduce task, measured by bytes written, file bytes read, and reduce input records, is recorded in the adjacent table.
• The number of bytes written by the MapReduce task does not increase at the same rate as the number of files.
The reason is that the reduce function combines the map output. For example, if the word "how" occurs twice, it is saved only once, and its count is increased by one.
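The effect described above can be illustrated with a small in-memory sketch, assuming every input file draws its words from the same limited vocabulary. The `VOCABULARY` list, the generated file contents, and the helper names are made up for illustration, and the byte counts are those of this toy output, not Hadoop counters.

```python
from collections import Counter

# Hypothetical vocabulary shared by all input files (an assumption).
VOCABULARY = ["how", "are", "you", "big", "data", "hadoop"]


def make_file(index):
    """Build the text of one toy input file deterministically."""
    # Each file holds three words picked by rotating through the vocabulary.
    words = [VOCABULARY[(index + i) % len(VOCABULARY)] for i in range(3)]
    return " ".join(words)


def word_count_output(files):
    """Return the reduced output as 'word<TAB>count' lines."""
    counts = Counter(word for text in files for word in text.split())
    return "\n".join(f"{word}\t{count}" for word, count in sorted(counts.items()))


def measure(num_files):
    """Return (bytes read, bytes written) for a job over num_files files."""
    files = [make_file(i) for i in range(num_files)]
    bytes_read = sum(len(text.encode("utf-8")) for text in files)
    bytes_written = len(word_count_output(files).encode("utf-8"))
    return bytes_read, bytes_written


if __name__ == "__main__":
    for n in (10, 100, 1000):
        read, written = measure(n)
        print(f"files={n:5d}  bytes read={read:7d}  bytes written={written:4d}")
```

Because a repeated word only raises an existing count, the output gains a few digits while the input grows by whole files. Real data sets have larger vocabularies, so the gap would be less extreme than in this sketch.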
Conclusion:
We analysed the performance of a MapReduce task as the number of input files increases, using the MapReduce word count application. The results show that the bytes written do not increase in proportion to the number of files.
Thank you.
-Avadhani M.k


