NLP "Crash Course"
Charlie Greenbacker
dcnlp.org
Agenda
• Introduction & Motivation
• Famous Examples
• Basics
• Major Task Areas
• Protips
• Resources
Introduction
& Motivation
By "NLP" we mean...
Natural Language Processing
(#NLProc)
aka Computational Linguistics, Text Analytics, etc.
not Neuro-linguistic Programming! (#NLP)
Introduction
& Motivation
Natural Language Processing is...
Using computers to process (i.e., analyze,
understand, generate, etc.) natural human
languages (e.g., English, Chinese, Klingon).
Hello, world! 你好世界
That sounds hard... why should I care?
• Most of the knowledge created by humans
  is unstructured text (information overload)
• Need some way to make sense of it all
• Enable quantitative analysis of text data
Famous Examples
Siri (Apple, SRI, Nuance)
Speech Recognition/Generation
IBM Watson
Question Answering
Google Translate
Machine Translation
Basics
• Segmentation
• Part-of-speech tagging
• Noun phrase (NP) chunking
• Parsing
• Word sense disambiguation
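As a rough illustration of the first item, segmentation can be sketched with two regular expressions. This is a naive baseline using only the Python standard library, not what any particular toolkit does; the function names `split_sentences` and `tokenize` are made up for this example.

```python
import re

def split_sentences(text):
    """Naive sentence segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive word segmentation: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

if __name__ == "__main__":
    for sentence in split_sentences("Hello, world! NLP is fun. Is it hard?"):
        print(tokenize(sentence))
```

A rule this simple breaks on abbreviations such as "Dr. Smith" and does nothing useful for languages written without spaces, which is part of why segmentation is treated as a task in its own right.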
Basics
• Stop words, stemming/lemmatization
• Frequency analysis
  (terms, n-grams, TF-IDF)
• Machine learning (classification,
  clustering, recommendation)
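The frequency-analysis items can be sketched in a few lines of standard-library Python. This assumes whitespace tokenization, a tiny illustrative stop word list, and the plain `tf * log(N / df)` weighting; real toolkits use larger stop lists and often smoothed variants. All names here (`STOP_WORDS`, `terms`, `ngrams`, `tf_idf`) are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a standard one).
STOP_WORDS = {"the", "a", "an", "of", "is", "on", "and"}

def terms(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [terms(d) for d in docs]
    # Document frequency: number of documents each term appears in.
    df = Counter(t for toks in tokenized for t in set(toks))
    n_docs = len(docs)
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights
```

With this weighting, a term that occurs in every document gets weight zero, which is the intuition behind TF-IDF: words that appear everywhere say little about any single document.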
Major Task Areas
Question Answering
• Match query with knowledge base
• Closed domain vs open domain
• Reasoning about intent of question
Major Task Areas
Speech Recognition
• Speech to text
• Trained/untrained user models
• Voice-based interfaces
Major Task Areas
Named Entity Recognition
• Entity extraction
• Persons, organizations, locations
• Grammar, syntax, phrasing
Major Task Areas
Entity Resolution
• Linking names to ground truth
• Disambiguating similar names
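One simple way to picture entity resolution is an alias table backed by fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library; the entity list, the alias table, and the `resolve` function are hypothetical, and a real system would also use context to tell apart entities whose names are genuinely identical.

```python
from difflib import get_close_matches

# Hypothetical "ground truth" records and known aliases.
KNOWN_ENTITIES = ["International Business Machines", "Apple Inc.", "Google LLC"]
ALIASES = {"ibm": "International Business Machines", "apple": "Apple Inc."}

def resolve(mention, cutoff=0.6):
    """Map a surface mention to a canonical entity name, or None if nothing is close."""
    key = mention.strip().lower()
    if key in ALIASES:                      # exact alias hit
        return ALIASES[key]
    lowered = {name.lower(): name for name in KNOWN_ENTITIES}
    matches = get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

if __name__ == "__main__":
    for mention in ["IBM", "Gogle LLC", "Zzzz"]:
        print(mention, "->", resolve(mention))
```

The `cutoff` value is a judgment call: set it too low and unrelated names get merged, too high and misspellings are missed.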
Major Task Areas
Co-reference Resolution
• Finding antecedents for pronouns
• Name resolution
Major Task Areas
Relationship Extraction
• Attribute values
• SVO triples
• Populating ontologies
Major Task Areas
Information Retrieval
• Query expansion
• Relevancy of results
• "More like this"
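Relevancy ranking and "more like this" are often explained with cosine similarity over bag-of-words vectors. The following is a minimal sketch under that assumption, using raw term counts and whitespace tokenization; the function names are invented for illustration, and production search engines add weighting, indexing, and much more.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: term -> raw count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def more_like_this(query, docs):
    """Return docs ordered from most to least similar to the query."""
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
```

Documents sharing no terms with the query score zero, which is where query expansion (adding synonyms or related terms to the query) comes in.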
Major Task Areas
Assistive Technologies
• Text simplification
• Predictive text input
• Alternative interfaces
Major Task Areas
NLG + Automatic Summarization
• Generating text from data
• Extractive summarization
• Abstractive summarization
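Extractive summarization selects existing sentences rather than writing new ones. A classic baseline, sketched here under simplifying assumptions (regex sentence splitting, no stop word removal), scores each sentence by the average corpus frequency of its words and keeps the top scorers. The `summarize` function is hypothetical; abstractive summarization needs text generation and is not covered by this sketch.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    """Pick the highest-scoring sentences, returned in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        # Average frequency of the sentence's words across the whole text.
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return [s for s in sentences if s in top]

if __name__ == "__main__":
    text = "NLP is fun. NLP helps computers process language. Cats sleep."
    print(summarize(text, max_sentences=2))
```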
Major Task Areas
Machine Translation
• From source to target, and back!
• Single terms work... sometimes
• Idioms, metaphors, cultural references
Major Task Areas
Sentiment Analysis
• Polarity, intensity, direction
• "Easy" for movie/product reviews
• "Impossible" for nearly anything else
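To make "polarity" concrete, here is a deliberately simple lexicon-based scorer. The word lists are tiny and invented for this example, and the negation rule only flips the word immediately after a negator; it is a sketch of the idea, not a usable classifier.

```python
# Hypothetical mini-lexicons; real sentiment lexicons have thousands of entries.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "boring"}
NEGATORS = {"not", "never", "no"}

def polarity(text):
    """Sum +1/-1 per lexicon word; a negator flips only the next word."""
    score = 0
    negate = False
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in NEGATORS:
            negate = True
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)
        score += -value if negate else value
        negate = False
    return score
```

A sarcastic line such as "Oh great, another delay" still scores as positive here, which hints at why sentiment gets hard once you leave review-style text.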
Protips
• Domain adaptation
  (retrain your models, social media != news)
• Assume everything is in beta
  (error rates compound, translate last,
  consult the research literature)
• Evaluation is essential
  (human judges, "gold standard" data,
  cross-validation, appropriate metrics)
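Two of these tips reduce to simple arithmetic. The sketch below computes precision, recall, and F1 for a set of predictions against gold standard data, and estimates end-to-end accuracy for a pipeline when each stage's errors are assumed to be independent (a simplification). The function names and the sample entity sets are made up for illustration.

```python
import math

def precision_recall_f1(gold, predicted):
    """Set-based precision, recall, and F1 against gold standard items."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def pipeline_accuracy(stage_accuracies):
    """End-to-end accuracy if every stage must be right and errors are independent."""
    return math.prod(stage_accuracies)

if __name__ == "__main__":
    gold = {"Apple", "IBM", "Google", "Nuance"}
    predicted = {"Apple", "IBM", "Siri"}
    print(precision_recall_f1(gold, predicted))
    print(pipeline_accuracy([0.9, 0.9, 0.9]))
```

Under the independence assumption, three stages that are each 90% accurate give roughly 73% end to end, which is the sense in which error rates compound.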
Resources
(toolkits)
Stanford CoreNLP
Java, GPL
Apache OpenNLP
Java, Apache License
NLTK
Python, Apache License
Resources
(books)
Natural Language
Processing with Python
Bird, Klein, and Loper
Speech and Language
Processing
Jurafsky and Martin
Foundations of Statistical
Natural Language Processing
Manning and Schütze
Resources
(groups)
ACL (Association for
Computational Linguistics)
Conferences, Workshops, Journals, SIGs
DC NLP
NLP Meetups
Data Community DC
NLP Workshops
Questions?
Charlie Greenbacker
dcnlp.org
@greenbacker
