際際滷

際際滷Share a Scribd company logo
R Programming
What is R?
 R is worlds most widely used statistics programming language .
R is a programming language and software environment for
 Statistical analysis.
 Graphics representation and reporting .
R provides a suite of operators for calculations on arrays, lists,
vectors and matrices.
History
 R is a programming language it was an
implementation over S language. R was first
designed by Ross Ihaka and Robert Gentleman
at the University of Auckland in 1993
 It was stable released on October 31st 2014 the
four months ago, by R Development Core
Team Under GNU General Public License
Introduction
 R is a programming language and software environment for statistical computing
and graphics
 The R language is widely used among statisticians software and data analysis
 It compiles and runs on a wide variety of UNIX platforms, Windows and Mac OS.
 R can be downloaded and installed from CRAN website, CRAN stands for
Comprehensive R Archive Network
R - Data Types
Primitive (or atomic) data types in R are:
 Numeric (integer, double, complex)
 Character
 Logical
 Function
Text Mining with R
 R is an open source language and environment for statistical computing and
graphics. It includes packages like tm, SnowballC, ggplot2 and wordcloud, which
are used to carry out the earlier-mentioned steps in text processing. The first
prerequisite is that Rand R Studio need to be installed on your machine. R is an
open source language and environment for statistical computing and graphics. It
includes packages like tm, SnowballC, ggplot2 and wordcloud, which are used to
carry out the earlier-mentioned steps in text processing. The first prerequisite is
that Rand R Studio need to be installed on your machine.
Packages Used in Text Mining
 RSQLite, SQLite Interface for R
 tm, framework for text mining applications
 SnowballC, text stemming library
 Wordloud, for making wordCloud visualizations
 Syuzhet, text sentiment analysis
Data Mining with R programming
Reading SQLite data in R
 Docs <- Corpus(docs,VectorSource(docs$comments))
# Get all the emails sent by Hillary
 Comm <- read.csv(comments.csv, header = TRUE)
 emailRaw <- paste(emailHillary$EmailBody, collapse=" // ")
Cleaning Text in R
 Install.packages(tm)
 Install.packages(NLP)
 Load text mining package - library(tm)
 docs <- Corpus(VerctorSum(emailRaw))  Corpus it is a collection of text
documents
Processing text in R
 docs <- tm_map(docs, content_transformer(tolower))  It makes all the words to
lower cases.
 docs <- tm_map(docs, removeNumbers) - It removes numbers
 docs <- tm_map(docs, removeWords, stopWords(english))  It removes stop
words like the, is, of
 docs <- tm_map(docs, removePunctuation)  It removes Punctuation
 docs <- tm_map(docs, stripWhiteSpace)  It removes extra White Spaces
SnowballC to Stem Text
 #Text stemming (reduces words to their root form)
 library("SnowballC")
 docs <- tm_map(docs, stemDocument)
 # Remove additional stopwords
 docs <- tm_map(docs, removeWords, c("clintonemailcom", "stategov", "hrod"))
SnowballC to Stem Text
 dtm <- TermDocumentMatrix(docs)
 m <- as.matrix(dtm)
 v <- sort(rowSums(m),decreasing=TRUE)
 d <- data.frame(word = names(v),freq=v)
 head(d, 10)
Some picture
Visualizations
 #Wordcloud
 Uses two libraries libraries  wordcloud and
RcolorBrewer
 #Sentiment Analysis
 Uses library - syuzhet
Data Mining with R programming
Data Mining with R programming
Data Mining with R programming
Data Mining with R programming
Data Mining with R programming
Data Mining with R programming
Data Mining with R programming
Data Mining with R programming
k

More Related Content

What's hot (20)

2nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 12nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 1
Aahwini Esware gowda
Dynamic HTML (DHTML)
Dynamic HTML (DHTML)Dynamic HTML (DHTML)
Dynamic HTML (DHTML)
Himanshu Kumar
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in Python
Jagriti Goswami
Data science life cycle
Data science life cycleData science life cycle
Data science life cycle
Manoj Mishra
Reading Data into R
Reading Data into RReading Data into R
Reading Data into R
Kazuki Yoshida
Client & server side scripting
Client & server side scriptingClient & server side scripting
Client & server side scripting
baabtra.com - No. 1 supplier of quality freshers
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
Yanchang Zhao
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with R
Kazuki Yoshida
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
Satya Narayana
Unit 4 data storage and querying
Unit 4   data storage and queryingUnit 4   data storage and querying
Unit 4 data storage and querying
Ravindran Kannan
Loops in R
Loops in RLoops in R
Loops in R
Chris Orwa
Data frame operations
Data frame operationsData frame operations
Data frame operations
19MSS011dhanyatha
2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS
koolkampus
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
Benazir Income Support Program (BISP)
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
koolkampus
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
Peter Reimann
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
Ashikapokiya12345
Data models
Data modelsData models
Data models
Usman Tariq
Isam
IsamIsam
Isam
Javed Khan
Unit 1: Introduction to DBMS Unit 1 Complete
Unit 1: Introduction to DBMS Unit 1 CompleteUnit 1: Introduction to DBMS Unit 1 Complete
Unit 1: Introduction to DBMS Unit 1 Complete
Raj vardhan
2nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 12nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 1
Aahwini Esware gowda
Dynamic HTML (DHTML)
Dynamic HTML (DHTML)Dynamic HTML (DHTML)
Dynamic HTML (DHTML)
Himanshu Kumar
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in Python
Jagriti Goswami
Data science life cycle
Data science life cycleData science life cycle
Data science life cycle
Manoj Mishra
Reading Data into R
Reading Data into RReading Data into R
Reading Data into R
Kazuki Yoshida
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
Yanchang Zhao
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with R
Kazuki Yoshida
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
Satya Narayana
Unit 4 data storage and querying
Unit 4   data storage and queryingUnit 4   data storage and querying
Unit 4 data storage and querying
Ravindran Kannan
2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS
koolkampus
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
koolkampus
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
Peter Reimann
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
Ashikapokiya12345
Unit 1: Introduction to DBMS Unit 1 Complete
Unit 1: Introduction to DBMS Unit 1 CompleteUnit 1: Introduction to DBMS Unit 1 Complete
Unit 1: Introduction to DBMS Unit 1 Complete
Raj vardhan

Similar to Data Mining with R programming (20)

R programming Language , Rahul Singh
R programming Language , Rahul SinghR programming Language , Rahul Singh
R programming Language , Rahul Singh
Ravi Basil
R programming language
R programming languageR programming language
R programming language
Keerti Verma
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
NareshKarela1
R basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxR basics for MBA Students[1].pptx
R basics for MBA Students[1].pptx
rajalakshmi5921
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptx
ranapoonam1
Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programming
hemasri56
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
Unmesh Baile
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R Studio
Rupak Roy
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
Alvaro Gil
R_L1-Aug-2022.pptx
R_L1-Aug-2022.pptxR_L1-Aug-2022.pptx
R_L1-Aug-2022.pptx
ShantilalBhayal1
R language
R languageR language
R language
K狸sh淡r Kr樽h
R programming
R programmingR programming
R programming
Pooja Sharma
Basics-of-R-programming.9625714.powerpoint.pptx
Basics-of-R-programming.9625714.powerpoint.pptxBasics-of-R-programming.9625714.powerpoint.pptx
Basics-of-R-programming.9625714.powerpoint.pptx
MSANDHYARANI3
Basics of R programming for analytics [Autosaved] (1).pdf
Basics of R programming for analytics [Autosaved] (1).pdfBasics of R programming for analytics [Autosaved] (1).pdf
Basics of R programming for analytics [Autosaved] (1).pdf
suanshu15
R_Language_study_forstudents_R_Material.ppt
R_Language_study_forstudents_R_Material.pptR_Language_study_forstudents_R_Material.ppt
R_Language_study_forstudents_R_Material.ppt
Suresh Babu
Brief introduction to R Lecturenotes1_R .ppt
Brief introduction to R  Lecturenotes1_R .pptBrief introduction to R  Lecturenotes1_R .ppt
Brief introduction to R Lecturenotes1_R .ppt
geethar79
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
ArchishaKhandareSS20
Lecture1 r
Lecture1 rLecture1 r
Lecture1 r
Sandeep242951
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
vikassingh569137
Modeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptModeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.ppt
anshikagoel52
R programming Language , Rahul Singh
R programming Language , Rahul SinghR programming Language , Rahul Singh
R programming Language , Rahul Singh
Ravi Basil
R programming language
R programming languageR programming language
R programming language
Keerti Verma
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
NareshKarela1
R basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxR basics for MBA Students[1].pptx
R basics for MBA Students[1].pptx
rajalakshmi5921
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptx
ranapoonam1
Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programming
hemasri56
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
Unmesh Baile
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R Studio
Rupak Roy
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
Alvaro Gil
Basics-of-R-programming.9625714.powerpoint.pptx
Basics-of-R-programming.9625714.powerpoint.pptxBasics-of-R-programming.9625714.powerpoint.pptx
Basics-of-R-programming.9625714.powerpoint.pptx
MSANDHYARANI3
Basics of R programming for analytics [Autosaved] (1).pdf
Basics of R programming for analytics [Autosaved] (1).pdfBasics of R programming for analytics [Autosaved] (1).pdf
Basics of R programming for analytics [Autosaved] (1).pdf
suanshu15
R_Language_study_forstudents_R_Material.ppt
R_Language_study_forstudents_R_Material.pptR_Language_study_forstudents_R_Material.ppt
R_Language_study_forstudents_R_Material.ppt
Suresh Babu
Brief introduction to R Lecturenotes1_R .ppt
Brief introduction to R  Lecturenotes1_R .pptBrief introduction to R  Lecturenotes1_R .ppt
Brief introduction to R Lecturenotes1_R .ppt
geethar79
Modeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptModeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.ppt
anshikagoel52

Recently uploaded (20)

decarbonization steel industry rev1.pptx
decarbonization steel industry rev1.pptxdecarbonization steel industry rev1.pptx
decarbonization steel industry rev1.pptx
gonzalezolabarriaped
Integration of Additive Manufacturing (AM) with IoT : A Smart Manufacturing A...
Integration of Additive Manufacturing (AM) with IoT : A Smart Manufacturing A...Integration of Additive Manufacturing (AM) with IoT : A Smart Manufacturing A...
Integration of Additive Manufacturing (AM) with IoT : A Smart Manufacturing A...
ASHISHDESAI85
Soil Properties and Methods of Determination
Soil Properties and  Methods of DeterminationSoil Properties and  Methods of Determination
Soil Properties and Methods of Determination
Rajani Vyawahare
15. Smart Cities Big Data, Civic Hackers, and the Quest for a New Utopia.pdf
15. Smart Cities Big Data, Civic Hackers, and the Quest for a New Utopia.pdf15. Smart Cities Big Data, Civic Hackers, and the Quest for a New Utopia.pdf
15. Smart Cities Big Data, Civic Hackers, and the Quest for a New Utopia.pdf
NgocThang9
Cyber Security_ Protecting the Digital World.pptx
Cyber Security_ Protecting the Digital World.pptxCyber Security_ Protecting the Digital World.pptx
Cyber Security_ Protecting the Digital World.pptx
Harshith A S
Cloud Computing concepts and technologies
Cloud Computing concepts and technologiesCloud Computing concepts and technologies
Cloud Computing concepts and technologies
ssuser4c9444
GM Meeting 070225 TO 130225 for 2024.pptx
GM Meeting 070225 TO 130225 for 2024.pptxGM Meeting 070225 TO 130225 for 2024.pptx
GM Meeting 070225 TO 130225 for 2024.pptx
crdslalcomumbai
Wireless-Charger presentation for seminar .pdf
Wireless-Charger presentation for seminar .pdfWireless-Charger presentation for seminar .pdf
Wireless-Charger presentation for seminar .pdf
AbhinandanMishra30
The Golden Gate Bridge a structural marvel inspired by mother nature.pptx
The Golden Gate Bridge a structural marvel inspired by mother nature.pptxThe Golden Gate Bridge a structural marvel inspired by mother nature.pptx
The Golden Gate Bridge a structural marvel inspired by mother nature.pptx
AkankshaRawat75
How to Make an RFID Door Lock System using Arduino
How to Make an RFID Door Lock System using ArduinoHow to Make an RFID Door Lock System using Arduino
How to Make an RFID Door Lock System using Arduino
CircuitDigest
Piping-and-pipeline-calculations-manual.pdf
Piping-and-pipeline-calculations-manual.pdfPiping-and-pipeline-calculations-manual.pdf
Piping-and-pipeline-calculations-manual.pdf
OMI0721
only history of java.pptx real bihind the name java
only history of java.pptx real bihind the name javaonly history of java.pptx real bihind the name java
only history of java.pptx real bihind the name java
mushtaqsaliq9
Indian Soil Classification System in Geotechnical Engineering
Indian Soil Classification System in Geotechnical EngineeringIndian Soil Classification System in Geotechnical Engineering
Indian Soil Classification System in Geotechnical Engineering
Rajani Vyawahare
TM-ASP-101-RF_Air Press manual crimping machine.pdf
TM-ASP-101-RF_Air Press manual crimping machine.pdfTM-ASP-101-RF_Air Press manual crimping machine.pdf
TM-ASP-101-RF_Air Press manual crimping machine.pdf
ChungLe60
Taykon-Kalite belgeleri
Taykon-Kalite belgeleriTaykon-Kalite belgeleri
Taykon-Kalite belgeleri
TAYKON
Engineering at Lovely Professional University (LPU).pdf
Engineering at Lovely Professional University (LPU).pdfEngineering at Lovely Professional University (LPU).pdf
Engineering at Lovely Professional University (LPU).pdf
Sona
Env and Water Supply Engg._Dr. Hasan.pdf
Env and Water Supply Engg._Dr. Hasan.pdfEnv and Water Supply Engg._Dr. Hasan.pdf
Env and Water Supply Engg._Dr. Hasan.pdf
MahmudHasan747870
US Patented ReGenX Generator, ReGen-X Quatum Motor EV Regenerative Accelerati...
US Patented ReGenX Generator, ReGen-X Quatum Motor EV Regenerative Accelerati...US Patented ReGenX Generator, ReGen-X Quatum Motor EV Regenerative Accelerati...
US Patented ReGenX Generator, ReGen-X Quatum Motor EV Regenerative Accelerati...
Thane Heins NOBEL PRIZE WINNING ENERGY RESEARCHER
Introduction to Safety, Health & Environment
Introduction to Safety, Health  & EnvironmentIntroduction to Safety, Health  & Environment
Introduction to Safety, Health & Environment
ssuserc606c7
How to Build a Maze Solving Robot Using Arduino
How to Build a Maze Solving Robot Using ArduinoHow to Build a Maze Solving Robot Using Arduino
How to Build a Maze Solving Robot Using Arduino
CircuitDigest
decarbonization steel industry rev1.pptx
decarbonization steel industry rev1.pptxdecarbonization steel industry rev1.pptx
decarbonization steel industry rev1.pptx
gonzalezolabarriaped
Integration of Additive Manufacturing (AM) with IoT : A Smart Manufacturing A...
Integration of Additive Manufacturing (AM) with IoT : A Smart Manufacturing A...Integration of Additive Manufacturing (AM) with IoT : A Smart Manufacturing A...
Integration of Additive Manufacturing (AM) with IoT : A Smart Manufacturing A...
ASHISHDESAI85
Soil Properties and Methods of Determination
Soil Properties and  Methods of DeterminationSoil Properties and  Methods of Determination
Soil Properties and Methods of Determination
Rajani Vyawahare
15. Smart Cities Big Data, Civic Hackers, and the Quest for a New Utopia.pdf
15. Smart Cities Big Data, Civic Hackers, and the Quest for a New Utopia.pdf15. Smart Cities Big Data, Civic Hackers, and the Quest for a New Utopia.pdf
15. Smart Cities Big Data, Civic Hackers, and the Quest for a New Utopia.pdf
NgocThang9
Cyber Security_ Protecting the Digital World.pptx
Cyber Security_ Protecting the Digital World.pptxCyber Security_ Protecting the Digital World.pptx
Cyber Security_ Protecting the Digital World.pptx
Harshith A S
Cloud Computing concepts and technologies
Cloud Computing concepts and technologiesCloud Computing concepts and technologies
Cloud Computing concepts and technologies
ssuser4c9444
GM Meeting 070225 TO 130225 for 2024.pptx
GM Meeting 070225 TO 130225 for 2024.pptxGM Meeting 070225 TO 130225 for 2024.pptx
GM Meeting 070225 TO 130225 for 2024.pptx
crdslalcomumbai
Wireless-Charger presentation for seminar .pdf
Wireless-Charger presentation for seminar .pdfWireless-Charger presentation for seminar .pdf
Wireless-Charger presentation for seminar .pdf
AbhinandanMishra30
The Golden Gate Bridge a structural marvel inspired by mother nature.pptx
The Golden Gate Bridge a structural marvel inspired by mother nature.pptxThe Golden Gate Bridge a structural marvel inspired by mother nature.pptx
The Golden Gate Bridge a structural marvel inspired by mother nature.pptx
AkankshaRawat75
How to Make an RFID Door Lock System using Arduino
How to Make an RFID Door Lock System using ArduinoHow to Make an RFID Door Lock System using Arduino
How to Make an RFID Door Lock System using Arduino
CircuitDigest
Piping-and-pipeline-calculations-manual.pdf
Piping-and-pipeline-calculations-manual.pdfPiping-and-pipeline-calculations-manual.pdf
Piping-and-pipeline-calculations-manual.pdf
OMI0721
only history of java.pptx real bihind the name java
only history of java.pptx real bihind the name javaonly history of java.pptx real bihind the name java
only history of java.pptx real bihind the name java
mushtaqsaliq9
Indian Soil Classification System in Geotechnical Engineering
Indian Soil Classification System in Geotechnical EngineeringIndian Soil Classification System in Geotechnical Engineering
Indian Soil Classification System in Geotechnical Engineering
Rajani Vyawahare
TM-ASP-101-RF_Air Press manual crimping machine.pdf
TM-ASP-101-RF_Air Press manual crimping machine.pdfTM-ASP-101-RF_Air Press manual crimping machine.pdf
TM-ASP-101-RF_Air Press manual crimping machine.pdf
ChungLe60
Taykon-Kalite belgeleri
Taykon-Kalite belgeleriTaykon-Kalite belgeleri
Taykon-Kalite belgeleri
TAYKON
Engineering at Lovely Professional University (LPU).pdf
Engineering at Lovely Professional University (LPU).pdfEngineering at Lovely Professional University (LPU).pdf
Engineering at Lovely Professional University (LPU).pdf
Sona
Env and Water Supply Engg._Dr. Hasan.pdf
Env and Water Supply Engg._Dr. Hasan.pdfEnv and Water Supply Engg._Dr. Hasan.pdf
Env and Water Supply Engg._Dr. Hasan.pdf
MahmudHasan747870
Introduction to Safety, Health & Environment
Introduction to Safety, Health  & EnvironmentIntroduction to Safety, Health  & Environment
Introduction to Safety, Health & Environment
ssuserc606c7
How to Build a Maze Solving Robot Using Arduino
How to Build a Maze Solving Robot Using ArduinoHow to Build a Maze Solving Robot Using Arduino
How to Build a Maze Solving Robot Using Arduino
CircuitDigest

Data Mining with R programming

  • 2. What is R? R is worlds most widely used statistics programming language . R is a programming language and software environment for Statistical analysis. Graphics representation and reporting . R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
  • 3. History R is a programming language it was an implementation over S language. R was first designed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1993 It was stable released on October 31st 2014 the four months ago, by R Development Core Team Under GNU General Public License
  • 4. Introduction R is a programming language and software environment for statistical computing and graphics The R language is widely used among statisticians software and data analysis It compiles and runs on a wide variety of UNIX platforms, Windows and Mac OS. R can be downloaded and installed from CRAN website, CRAN stands for Comprehensive R Archive Network
  • 5. R - Data Types Primitive (or atomic) data types in R are: Numeric (integer, double, complex) Character Logical Function
  • 6. Text Mining with R R is an open source language and environment for statistical computing and graphics. It includes packages like tm, SnowballC, ggplot2 and wordcloud, which are used to carry out the earlier-mentioned steps in text processing. The first prerequisite is that Rand R Studio need to be installed on your machine. R is an open source language and environment for statistical computing and graphics. It includes packages like tm, SnowballC, ggplot2 and wordcloud, which are used to carry out the earlier-mentioned steps in text processing. The first prerequisite is that Rand R Studio need to be installed on your machine.
  • 7. Packages Used in Text Mining RSQLite, SQLite Interface for R tm, framework for text mining applications SnowballC, text stemming library Wordloud, for making wordCloud visualizations Syuzhet, text sentiment analysis
  • 9. Reading SQLite data in R Docs <- Corpus(docs,VectorSource(docs$comments)) # Get all the emails sent by Hillary Comm <- read.csv(comments.csv, header = TRUE) emailRaw <- paste(emailHillary$EmailBody, collapse=" // ")
  • 10. Cleaning Text in R Install.packages(tm) Install.packages(NLP) Load text mining package - library(tm) docs <- Corpus(VerctorSum(emailRaw)) Corpus it is a collection of text documents
  • 11. Processing text in R docs <- tm_map(docs, content_transformer(tolower)) It makes all the words to lower cases. docs <- tm_map(docs, removeNumbers) - It removes numbers docs <- tm_map(docs, removeWords, stopWords(english)) It removes stop words like the, is, of docs <- tm_map(docs, removePunctuation) It removes Punctuation docs <- tm_map(docs, stripWhiteSpace) It removes extra White Spaces
  • 12. SnowballC to Stem Text #Text stemming (reduces words to their root form) library("SnowballC") docs <- tm_map(docs, stemDocument) # Remove additional stopwords docs <- tm_map(docs, removeWords, c("clintonemailcom", "stategov", "hrod"))
  • 13. SnowballC to Stem Text dtm <- TermDocumentMatrix(docs) m <- as.matrix(dtm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.frame(word = names(v),freq=v) head(d, 10)
  • 14. Some picture Visualizations #Wordcloud Uses two libraries libraries wordcloud and RcolorBrewer #Sentiment Analysis Uses library - syuzhet
  • 23. k

Editor's Notes

  • #3: Old programming No multithreading Data loaded directly into memory limits fuctionlaity for larger datasets Sandboxsubsample data Microsoft working on multicore r h2o