Charlie Greenbacker, founder and co-organizer of the DC NLP meetup group, provides a "crash course" in Natural Language Processing techniques and applications.
2. Agenda
• Introduction & Motivation
• Famous Examples
• Basics
• Major Task Areas
• Protips
• Resources
3. Introduction
& Motivation
By "NLP" we mean...
Natural Language Processing
(#NLProc)
aka Computational Linguistics, Text Analytics, etc.
not Neuro-linguistic Programming! (#NLP)
4. Introduction
& Motivation
Natural Language Processing is...
Using computers to process (i.e., analyze,
understand, generate, etc.) natural human
languages (e.g., English, Chinese, Klingon).
Hello, world! 你好世界
5. That sounds hard... why should I care?
• Most of the knowledge created by humans is unstructured text (information overload)
• Need some way to make sense of it all
• Enable quantitative analysis of text data
6. Famous Examples
• Siri (Apple, SRI, Nuance): Speech Recognition/Generation
• IBM Watson: Question Answering
• Google Translate: Machine Translation
16. Major Task Areas
Assistive Technologies
• Text simplification
• Predictive text input
• Alternative interfaces
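Predictive text input can be illustrated with a very small next-word model. The sketch below is a hypothetical example, not taken from the talk: it counts which word follows which in a toy corpus (a bigram model) and suggests the most frequent continuations. The names `train_bigrams` and `suggest` are illustrative only; real keyboards rely on far larger models and corpora.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with its successor: (w0, w1), (w1, w2), ...
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def suggest(follows, prev_word, k=3):
    """Return up to k of the most frequent next words after prev_word."""
    counts = follows.get(prev_word.lower(), Counter())
    return [word for word, _ in counts.most_common(k)]

model = train_bigrams(["I want to go home", "I want to eat", "I need to go out"])
print(suggest(model, "to"))  # ['go', 'eat']
print(suggest(model, "I"))   # ['want', 'need']
```

Unseen words simply produce no suggestions; a real system would back off to more general statistics.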
17. Major Task Areas
NLG + Automatic Summarization
• Generating text from data
• Extractive summarization
• Abstractive summarization
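Extractive summarization selects existing sentences rather than writing new ones. A minimal sketch, assuming a simple word-frequency heuristic (one classic approach, not necessarily the one the talk has in mind): score each sentence by the average document frequency of its non-stopword words, then keep the top-scoring sentences in their original order. The stopword list and the `summarize` and `content_words` helpers are hypothetical and deliberately tiny.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer)
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "for", "on", "that"}

def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(i):
        tokens = content_words(sentences[i])
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    # Pick the highest-scoring sentence indices, then restore document order
    best = sorted(range(len(sentences)), key=score, reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(best)]

doc = "NLP helps computers read text. Computers process text quickly. The weather was nice."
print(summarize(doc, 1))  # ['Computers process text quickly.']
```

Abstractive summarization, by contrast, would generate new sentences and cannot be reduced to this kind of selection.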
18. Major Task Areas
Machine Translation
• From source to target, and back!
• Single terms work... sometimes
• Idioms, metaphors, cultural references
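To see why single terms work only sometimes, consider a naive word-for-word translator. This hypothetical sketch uses a hand-written five-entry Spanish-to-English lexicon (an assumption for illustration). It handles isolated words, but it keeps Spanish word order and translates the idiom "tomar el pelo" (roughly "to pull someone's leg") literally.

```python
# Hypothetical five-word Spanish -> English lexicon, for illustration only
LEXICON = {"el": "the", "gato": "cat", "negro": "black", "tomar": "take", "pelo": "hair"}

def word_for_word(sentence):
    """Translate each word independently; unknown words pass through unchanged."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

print(word_for_word("gato"))           # cat  (single term: fine)
print(word_for_word("el gato negro"))  # the cat black  (should be "the black cat")
print(word_for_word("tomar el pelo"))  # take the hair  (idiom: "to pull someone's leg")
```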
19. Major Task Areas
Sentiment Analysis
• Polarity, intensity, direction
• "Easy" for movie/product reviews
• "Impossible" for nearly anything else
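A lexicon-based scorer is one simple way to capture polarity and intensity. The sketch below assumes tiny, hypothetical word lists: each sentiment word contributes a signed score (polarity), intensifiers such as "very" scale the next sentiment word (intensity), and negators flip its sign. It does not attempt direction (who or what the sentiment targets). Its reliance on overtly evaluative words hints at why reviews are comparatively easy, while sarcasm or domain-specific wording would defeat a fixed lexicon like this.

```python
import re

# Hypothetical mini-lexicons (assumptions for illustration, not a published resource)
POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0, "boring": -1.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5}
NEGATORS = {"not", "never"}

def sentiment_score(text):
    """Sum word polarities; modifiers apply to the next sentiment word only."""
    score, modifier = 0.0, 1.0
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in INTENSIFIERS:
            modifier *= INTENSIFIERS[word]   # intensity
        elif word in NEGATORS:
            modifier *= -1.0                 # flip polarity
        elif word in POLARITY:
            score += modifier * POLARITY[word]
            modifier = 1.0                   # reset after use
    return score

print(sentiment_score("A great movie"))                     # 2.0
print(sentiment_score("Not good"))                          # -1.0
print(sentiment_score("Boring plot, really awful acting"))  # -4.0
```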
20. Protips
• Domain adaptation
(retrain your models, social media != news)
• Assume everything is in beta
(error rates compound, translate last,
consult the research literature)
• Evaluation is essential
(human judges, "gold standard" data,
cross-validation, appropriate metrics)
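Two of these tips reduce to simple arithmetic. The sketch below uses hypothetical helper names. It computes precision, recall, and F1 of a system's output against a "gold standard" set. It also estimates end-to-end pipeline accuracy under the simplifying assumption that stage errors are independent and any single error spoils the result. Under that assumption, three stages that are each 90% accurate yield only about 73% overall, which shows concretely how error rates compound.

```python
import math

def precision_recall_f1(predicted, gold):
    """Compare a set of system outputs to a set of gold-standard annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

def pipeline_accuracy(stage_accuracies):
    """End-to-end accuracy, assuming independent errors at each stage."""
    return math.prod(stage_accuracies)

# Example: entities found by a hypothetical tagger vs. gold annotations
p, r, f = precision_recall_f1({"Paris", "Obama", "Apple", "Monday"}, {"Paris", "Obama", "Google"})
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")              # P=0.50 R=0.67 F1=0.57
print(f"{pipeline_accuracy([0.9, 0.9, 0.9]):.3f}")  # 0.729
```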
22. Resources
(books)
• Natural Language Processing with Python (Bird, Klein, and Loper)
• Speech and Language Processing (Jurafsky and Martin)
• Foundations of Statistical Natural Language Processing (Manning and Schütze)