Text Mining:
Detecting Insults in Social
Media Conversations
Group 7:
Dinesh Reddy Srirangapalle
Oluwafunke Balogun
Rajsingh Rathore
Firdous Farooque Shaikh
Aditya Trivedi
Introduction
• Social media has become a very powerful platform.
• About 1.8 billion people worldwide use some form of social media.
• 73% of the US population uses social media.
• As mobile devices spread and become cheaper, more and more people are joining these platforms, and the number of active social media users grows every day.
• Social media now plays a big role not only in people's personal lives but also in their professional lives.
Pros and cons of social media
Pros:
• Social media helps people connect and keep in touch.
• It helps people stay updated with current topics and news.
• It helps people discover new trends and topics.
• It helps organizations better understand their customers.
Cons:
• It gives people the freedom to say anything to anyone with little or no adverse consequence.
• It has been proven to have negative effects on small children, such as social awkwardness and an inability to connect with real people.
• Another serious concern is cyberbullying.
Problem Statement
With the growth in the number of social media users, we have seen a growth in crimes against children and in the number of cyberbullying cases.
According to cyberbullying.org (http://cyberbullying.org/facts/):
• In the U.S., the average child starts using the internet as early as age 11-12.
• 36% of parents take no action to limit or monitor their children's activities on the internet.
• 25% of high school students have encountered cyberbullying at some point in their lives.
As a result, we are seeing an increase in:
• Children dropping out of school
• Lower performance in school
• Disorders such as anxiety and depression
In some extreme cases, victims of cyberbullying have resorted to suicide.
Purpose
We know that social media plays an important role in our society.
The purpose of our project is to:
• MONITOR insults
• PREVENT insults
while maintaining:
• FREEDOM
• FLEXIBILITY
on social media platforms.
The project also tries to cover slang and other unconventional insults, which are usually hard to detect and are not found in a standard English dictionary.
Significance of the project
• Over the last couple of decades, the internet has changed our society forever.
• There are still few strict laws or rules limiting what people can do and say on social media platforms.
• There is a need for a mechanism that monitors and blocks insults on social media.
• Our project tries to provide a way for social media to be what it was meant to be, not what some rude and mean people have made of it.
The design problem and scope for the project
Scope:
• To build a system that can detect whether a comment is insulting or not
• To design a machine learning system that can classify online posts and comments as insulting or not
The design problem of our project is:
Analysis and design considerations for a system to detect insults in social media conversations.
Assumptions and Limitations of the project
Assumptions:
• Assumptions related to the data set
• Assumptions related to the tools that we are using
Limitations:
Our project is limited by:
• Limitations of the tools
• Limitations of the data set
• Limited use of Semantria because of the trial version
• Knowledge and skill of the team
Literature Review
Data Analytics:
The science of examining raw data to draw conclusions that support better decisions.
Social Media Analytics:
Analyzing data collected from blogs and social media websites for organizational or social benefit.
Reputation System:
Computes and publishes reputation scores for a set of objects within a domain or community.
Text Mining:
Similar to "text analysis". It derives high-quality information from text by analyzing patterns and trends.
Systems Development & Methodology
• Get the dataset
• Understand the dataset

Variable | Description | Role | Level
Insult | 1 = Insult, 0 = Non-Insult | Target | Unary
Date | Time at which the comment was made | Rejected | Interval
Comment | Social conversation | Input | Text
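
As a rough illustration of the "understand the dataset" step, the sketch below reads a file with the three variables listed above and keeps only the target and the text input, since Date is rejected. The sample rows, the date format, and the function name `load_comments` are assumptions made for this example; they are not taken from the project's data.

```python
import csv
import io

# Hypothetical sample rows; the project's real dataset is not reproduced here.
SAMPLE_CSV = '''Insult,Date,Comment
1,20120618192155Z,"You are such an idiot"
0,20120528192215Z,"Thanks for sharing this article"
0,,"I disagree, but good point"
'''

def load_comments(handle):
    """Return (label, comment) pairs, ignoring the rejected Date column."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append((int(row["Insult"]), row["Comment"]))
    return rows

rows = load_comments(io.StringIO(SAMPLE_CSV))

# Share of comments labeled as insults in the sample.
insult_share = sum(label for label, _ in rows) / len(rows)
print(len(rows), "comments,", round(insult_share, 2), "labeled as insults")
```

Checking the share of insults early is useful because a heavily unbalanced target affects how later results should be read.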
Analysis of the Dataset:
• SAS E-Miner
• Semantria for Excel
Semantria for Excel:
• Semantria is an Excel add-on that we used for sentiment analysis, categorization, easy customization, entity extraction and visualization.
• Semantria for Excel modes: Detailed Mode and Discovery Mode
Results & Discussions of Semantria Analysis
Facet Table:
• A combination of facets and attributes is taken into consideration to build a system of insults.
Themes Breakdown:
• The extracted text and its graphical representation show the themes into which the insults have been grouped.
Entity Sentiment Breakdown:
• The Entity Sentiment Breakdown chart extracts frequently mentioned proper nouns from the social conversations and assigns each a sentiment score.
Category Breakdown:
• Based on the 'Relevancy Score' technique used by the tool, we get a Category Breakdown, which segments the insults into various categories.
Queries Breakdown:
• The Queries Breakdown segments the insults into groups defined by queries built from Boolean operators.
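
The idea of grouping comments with Boolean queries can be sketched as follows. This is not Semantria's query syntax; the query names, the word lists, and the helpers `matches` and `categorize` are hypothetical and only illustrate AND / OR / NOT matching on the words of a comment.

```python
import re

def tokens(text):
    """Lowercase word set of a comment."""
    return set(re.findall(r"[a-z']+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """AND over all_of, OR over any_of, NOT over none_of."""
    words = tokens(text)
    if not all(term in words for term in all_of):
        return False
    if any_of and not any(term in words for term in any_of):
        return False
    return not any(term in words for term in none_of)

# Hypothetical queries; a real list would come from the analyst.
QUERIES = {
    "intelligence": dict(any_of=("idiot", "stupid", "moron")),
    "directed": dict(all_of=("you",), any_of=("idiot", "loser"), none_of=("not",)),
}

def categorize(text):
    """Names of every query the comment satisfies."""
    return [name for name, query in QUERIES.items() if matches(text, **query)]

print(categorize("You are an idiot"))
print(categorize("You are not an idiot"))
```

A comment can satisfy several queries at once, so the breakdown counts may sum to more than the number of comments.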
SAS Enterprise Miner
System diagram used to analyze the text data
Results of various nodes
Text Parsing Node
• Breaks the text into words and tags each with its part of speech
• The largest number of words are nouns
Text Filter Node
Concept Linking
• Shows the terms that co-occur with a center term
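
Co-occurrence with a center term can be approximated by counting, over all comments that contain the center term, how often every other word appears. The sketch below is a simplified stand-in, not the computation SAS Enterprise Miner performs; the function name and sample comments are assumptions for illustration.

```python
import re
from collections import Counter

def co_occurring_terms(comments, center, top=10):
    """Count words that appear in the same comment as the center term."""
    counts = Counter()
    for comment in comments:
        words = set(re.findall(r"[a-z']+", comment.lower()))
        if center in words:
            # Each word counts once per comment, excluding the center itself.
            counts.update(words - {center})
    return counts.most_common(top)

# Hypothetical comments, used only to exercise the function.
comments = ["you are an idiot", "what an idiot you are", "nice day"]
linked = dict(co_occurring_terms(comments, "idiot"))
print(linked)
```

Terms with high counts are the ones a concept-link diagram would draw closest to the center term.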
Text Rule Builder
Rules are generated from a subset of terms
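
One simple way to picture rules built from a subset of terms: keep only the terms that appear often enough and almost always in insulting comments, then flag any comment containing one of them. This is a minimal sketch under that assumption and not the algorithm used by the Text Rule Builder node; the thresholds, function names, and training examples are hypothetical.

```python
import re
from collections import Counter

def words(text):
    """Lowercase word set of a comment."""
    return set(re.findall(r"[a-z']+", text.lower()))

def build_rules(labeled, min_support=2, min_precision=0.9):
    """Keep terms seen at least min_support times, mostly in insults."""
    seen = Counter()
    in_insults = Counter()
    for label, text in labeled:
        for word in words(text):
            seen[word] += 1
            if label == 1:
                in_insults[word] += 1
    return {
        word
        for word, n in seen.items()
        if n >= min_support and in_insults[word] / n >= min_precision
    }

def predict(rules, text):
    """1 if the comment contains any rule term, else 0."""
    return int(bool(rules & words(text)))

# Hypothetical labeled comments: 1 = insult, 0 = non-insult.
labeled = [
    (1, "you are an idiot"),
    (1, "total idiot move"),
    (0, "you are right"),
    (0, "nice move"),
]
rules = build_rules(labeled)
print(rules, predict(rules, "What an idiot"), predict(rules, "good point"))
```

Common words such as "you" are dropped because they appear equally in both classes, which is why only a subset of terms ends up in the rules.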
Text Cluster Node
Cluster Hierarchy
Cluster Frequency
Results – Text cluster node
Recommendation
• Large social media platforms should carry out in-house analysis to help add to the library of words and set a benchmark for smaller social media platforms to follow.
• CIOs should tap into the usefulness of social media and learn which tool works best in their organization's environment for in-house analysis.
• Further research can be conducted on these topics with a larger dataset to allow for more accuracy and insight.
Conclusion
• Our analysis shows the need for more sophisticated tools and expertise in order to carry out text mining and achieve the desired results.
• Text mining is still at an elementary stage and is still gaining acceptance.
• Vendors are trying to build more user-friendly text mining tools while also embedding text mining into data mining tools. The likes of SAS have already done this.
• Our analysis is restricted to the data set we have. We did not aim to create a library but to show how word parsing and filtering can be done in text mining.
Questions?
Thank you!
