R Programming
What is R?
 R is one of the world's most widely used statistical programming languages.
R is a programming language and software environment for:
 Statistical analysis
 Graphical representation and reporting
R provides a suite of operators for calculations on arrays, lists,
vectors and matrices.
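As a small illustration of these operators, the sketch below uses made-up values (all variable names are hypothetical) to show element-wise arithmetic on vectors, the difference between `*` and `%*%` on matrices, and applying a function across a list.

```r
# Element-wise arithmetic on numeric vectors
x <- c(1, 2, 3)
y <- c(10, 20, 30)
vec_sum <- x + y        # adds matching elements
vec_scaled <- x * 2     # the scalar is recycled across the vector

# Matrices: element-wise product vs. matrix product
m <- matrix(1:4, nrow = 2)   # filled column by column
elementwise <- m * m
mat_product <- m %*% m

# Lists can hold vectors of different lengths; sapply visits each element
values <- list(a = 1:3, b = c(2.5, 4.5))
means <- sapply(values, mean)

print(vec_sum)
print(mat_product)
print(means)
```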
 R is a programming language that began as an
implementation of the S language. It was first
designed by Ross Ihaka and Robert Gentleman
at the University of Auckland in 1993.
 A stable version was released on October 31, 2014
by the R Development Core Team under the
GNU General Public License.
 R is a programming language and software environment for statistical computing
and graphics.
 The R language is widely used among statisticians for developing statistical software
and for data analysis.
 It compiles and runs on a wide variety of UNIX platforms, Windows and Mac OS.
 R can be downloaded and installed from the CRAN website. CRAN stands for
Comprehensive R Archive Network.
R - Data Types
Primitive (or atomic) data types in R are:
 Numeric (integer, double, complex)
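The numeric types listed above can be inspected with `typeof()`. This is a minimal sketch with made-up values. R also has other atomic types, such as character and logical, which are not shown here.

```r
i <- 42L       # integer (the L suffix requests an integer)
d <- 3.14      # double, the default for numeric literals
z <- 2 + 3i    # complex

# typeof() reports the internal storage type of each value
types <- c(typeof(i), typeof(d), typeof(z))
print(types)
```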
Text Mining with R
 R is an open source language and environment for statistical computing and
graphics. Add-on packages like tm, SnowballC, ggplot2 and wordcloud are used to
carry out the earlier-mentioned steps in text processing. The first prerequisite
is that R and RStudio need to be installed on your machine.
Packages Used in Text Mining
 RSQLite, SQLite Interface for R
 tm, framework for text mining applications
 SnowballC, text stemming library
 wordcloud, for making word cloud visualizations
 syuzhet, text sentiment analysis
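Before running the later steps it can help to check which of these packages are already installed. The sketch below is one possible way to do that. `missing_packages` is a hypothetical helper written for this example, not part of any package, and the install line is left commented out.

```r
# Packages named in the list above
pkgs <- c("RSQLite", "tm", "SnowballC", "wordcloud", "syuzhet")

# Hypothetical helper: returns the packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing from CRAN
# if (length(to_install) > 0) install.packages(to_install)
```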
Data Mining with R programming
Reading SQLite data in R
 Comm <- read.csv("comments.csv", header = TRUE)
 Docs <- Corpus(VectorSource(Comm$comments))
# Get all the emails sent by Hillary
# (emailHillary is the data frame of query results, with an EmailBody column)
 emailRaw <- paste(emailHillary$EmailBody, collapse = " // ")
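The slide does not show how `emailHillary` is read from SQLite. The sketch below is one way it could be done with DBI and RSQLite. The in-memory database, the `Emails` table, the `Sender` column, and the sample rows are all assumptions made for illustration and are not the original data.

```r
library(DBI)

# In-memory database standing in for a real .sqlite file (assumption)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table with the EmailBody column used above
emails <- data.frame(
  Sender = c("Hillary Clinton", "Someone Else", "Hillary Clinton"),
  EmailBody = c("Please print.", "See attached.", "Thanks, will call."),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "Emails", emails)

# Query only the rows of interest
emailHillary <- dbGetQuery(
  con, "SELECT EmailBody FROM Emails WHERE Sender = 'Hillary Clinton'"
)
dbDisconnect(con)

# Collapse all bodies into one string, as in the slide
emailRaw <- paste(emailHillary$EmailBody, collapse = " // ")
print(emailRaw)
```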
Cleaning Text in R
 Load text mining package - library(tm)
 docs <- Corpus(VectorSource(emailRaw)) - A corpus is a collection of text documents
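In tm, `VectorSource()` treats each element of a character vector as one document. The sketch below uses a made-up `emailRaw` string to show the difference between loading the collapsed text as a single document and splitting it back into one document per email. The split step is an optional variation, not something the slide does.

```r
library(tm)

# Hypothetical raw text standing in for emailRaw
emailRaw <- "Please print. // Thanks, will call."

# One element in the vector gives a corpus with one document
docs <- Corpus(VectorSource(emailRaw))
n_single <- length(docs)

# Optional: split on the separator to get one document per email
parts <- strsplit(emailRaw, " // ", fixed = TRUE)[[1]]
docs_split <- Corpus(VectorSource(parts))
n_split <- length(docs_split)

print(c(n_single, n_split))
```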
Processing text in R
 docs <- tm_map(docs, content_transformer(tolower)) - Converts all words to lower case
 docs <- tm_map(docs, removeNumbers) - Removes numbers
 docs <- tm_map(docs, removeWords, stopwords("english")) - Removes stop words like "the", "is", "of"
 docs <- tm_map(docs, removePunctuation) - Removes punctuation
 docs <- tm_map(docs, stripWhitespace) - Removes extra white space
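To see what these transformations do, the sketch below applies them in the same order to two made-up sentences. It uses `VCorpus` rather than `Corpus` as an assumption, to keep document extraction simple. The `trimws()` call at the end is an extra step, since `stripWhitespace` collapses runs of spaces but can leave one at the start of a document.

```r
library(tm)

# Hypothetical sample text
raw <- c("The 2 QUICK brown foxes!!", "Is   the fox of the forest quick?")
docs <- VCorpus(VectorSource(raw))

docs <- tm_map(docs, content_transformer(tolower))       # lower case
docs <- tm_map(docs, removeNumbers)                      # drop digits
docs <- tm_map(docs, removeWords, stopwords("english"))  # drop stop words
docs <- tm_map(docs, removePunctuation)                  # drop punctuation
docs <- tm_map(docs, stripWhitespace)                    # collapse spaces

# Pull the cleaned text back out as a plain character vector
cleaned <- vapply(seq_along(docs),
                  function(i) as.character(docs[[i]]),
                  character(1))
cleaned <- trimws(cleaned)
print(cleaned)
```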
SnowballC to Stem Text
 #Text stemming (reduces words to their root form)
 docs <- tm_map(docs, stemDocument)
 # Remove additional stopwords
 docs <- tm_map(docs, removeWords, c("clintonemailcom", "stategov", "hrod"))
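`stemDocument` relies on the stemmer in SnowballC. Calling `wordStem()` directly on a few made-up words is a quick way to see what stemming does to individual terms.

```r
library(SnowballC)

# Hypothetical sample words
words <- c("emails", "emailed", "meeting", "meetings")

# wordStem() reduces each word to its stem
stems <- wordStem(words, language = "english")
print(data.frame(word = words, stem = stems))
```

Related forms of a word should map to the same stem, which is why stemmed text gives more compact frequency counts.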
Term-Document Matrix and Word Frequencies
 dtm <- TermDocumentMatrix(docs)
 m <- as.matrix(dtm)
 v <- sort(rowSums(m),decreasing=TRUE)
 d <- data.frame(word = names(v),freq=v)
 head(d, 10)
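The same steps can be tried on a tiny made-up corpus to see the shape of the result. Note that, by default, `TermDocumentMatrix` on this kind of corpus ignores words shorter than three characters, so "me" is not counted below.

```r
library(tm)

# Hypothetical three-document corpus
docs <- VCorpus(VectorSource(c("call me today",
                               "call the office",
                               "office call")))

dtm <- TermDocumentMatrix(docs)            # rows are terms, columns are documents
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)   # total count per term
d <- data.frame(word = names(v), freq = v)
print(head(d, 10))
```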
Some picture
 Uses two libraries  wordcloud and
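A word cloud can be drawn from a word and frequency table like `d`. The sketch below uses a made-up table. Loading RColorBrewer for the colour palette is an assumption here, since the slide does not name the second library.

```r
library(wordcloud)
library(RColorBrewer)   # assumption: used here only for the colour palette

# Hypothetical word frequencies
d <- data.frame(word = c("call", "office", "today", "meeting"),
                freq = c(12, 7, 4, 2),
                stringsAsFactors = FALSE)

set.seed(1234)   # makes the layout reproducible
wordcloud(words = d$word, freq = d$freq,
          min.freq = 1, max.words = 100,
          random.order = FALSE,          # most frequent words in the centre
          colors = brewer.pal(8, "Dark2"))
```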
 #Sentiment Analysis
 Uses library - syuzhet
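A minimal sketch of sentiment scoring with syuzhet, using two made-up sentences. `get_sentiment()` returns one numeric score per input string, and `get_nrc_sentiment()` returns counts per emotion category along with positive and negative columns.

```r
library(syuzhet)

# Hypothetical sample sentences
sentences <- c("I love this wonderful plan.",
               "This is a terrible, awful mistake.")

# One score per sentence; higher means more positive
scores <- get_sentiment(sentences, method = "syuzhet")
print(scores)

# Emotion and polarity counts from the NRC lexicon
emotions <- get_nrc_sentiment(sentences)
print(emotions[, c("positive", "negative")])
```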
