Igor Santos
Igor Miñambres-Marcos
Carlos Laorden
Patxi Galán-García
Aitor Santamaría-Ibirika
Pablo G. Bringas
Twitter Content-based Spam Filtering - CISIS 2013
Detecting spammer accounts
Content-based analysis
spam (TweetSpike), ham (Legitimate)
[Figure: diagram labelled t1 to t3 and m1 to m11]
[Figure labels: legitimate, spam, testing, probability]
Dynamic Markov Chain (DMC)
Prediction by Partial Match (PPM)
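The idea behind compression-based filtering can be sketched roughly as follows: a message is assigned to the class whose training text helps compress it most. This is only an illustration under loud assumptions. It uses Python's zlib (DEFLATE) as a stand-in, not DMC or PPM, and the corpora, the function names (compressed_size, extra_bytes, classify) and the tie-breaking rule are hypothetical and do not come from the slides.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs for the UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, message: str) -> int:
    """Additional compressed bytes when message is appended to corpus."""
    return compressed_size(corpus + "\n" + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Pick the class whose corpus makes the message cheapest to encode.

    Ties go to "ham" (an assumption: prefer not to block legitimate text).
    """
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "ham"


# Tiny made-up corpora, for illustration only.
SPAM_CORPUS = "\n".join([
    "win free followers now click here",
    "cheap pills best price click here",
    "get rich fast free money now",
])
HAM_CORPUS = "\n".join([
    "see you at the seminar tomorrow",
    "great talk about machine learning today",
    "lunch with the lab group at noon",
])

print(classify("win free followers now click here", SPAM_CORPUS, HAM_CORPUS))
print(classify("see you at the seminar tomorrow", SPAM_CORPUS, HAM_CORPUS))
```

DMC and PPM build adaptive statistical models rather than the dictionary matching used by DEFLATE, so this sketch conveys only the decision rule, not the models evaluated in the slides.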
Classifier                              Acc. (%)  Sp    Sr    F-Measure  AUC
Random Forest N=50                      96.42     0.98  0.94  0.96       0.99
DMC without Adaptation                  95.99     0.96  0.95  0.96       0.99
Random Forest N=10                      95.96     0.97  0.94  0.95       0.99
PPM without Adaptation                  94.80     0.97  0.91  0.94       0.99
Naive Bayes Multinomial Word Frequency  94.94     0.95  0.93  0.94       0.98
Bayes K2                                94.12     0.99  0.88  0.93       0.98
DMC with Adaptation                     93.11     0.94  0.90  0.92       0.98
C4.5                                    95.79     0.98  0.92  0.95       0.97
KNN K=3                                 93.71     0.97  0.89  0.93       0.97
SVM PVK                                 95.81     0.97  0.93  0.95       0.96
PPM with Adaptation                     76.50     0.78  0.69  0.72       0.86
Naive Bayes                             72.72     0.64  0.89  0.75       0.76
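As a sanity check on the table, the F-Measure column can be approximately recomputed from the Sp and Sr columns. This assumes that Sp and Sr denote spam precision and spam recall (the slides do not expand the abbreviations) and that F-Measure is their harmonic mean. The function name f_measure is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Random Forest N=50 row: Sp = 0.98, Sr = 0.94
print(round(f_measure(0.98, 0.94), 2))  # 0.96

# Bayes K2 row: Sp = 0.99, Sr = 0.88
print(round(f_measure(0.99, 0.88), 2))  # 0.93
```

Because the table reports Sp and Sr already rounded to two decimals, a recomputed value can differ from the listed F-Measure by 0.01 for some rows.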
A new and public dataset of Twitter spam to serve as evaluation
Adaptation of content-based spam filtering to Twitter
A new compression-based text filtering library for the ML tool WEKA
Enhance this approach using social network features
Semantic capabilities by studying the linguistic relationships
