
Aadish Chopra
Natural Language Processing and its
Application in Video Transcript Analysis & Survey Building

Video Transcripts
The same healthcare conversation can be transcribed in several different ways:
Doctor: How are you ? (Smiles)
Doctor: How are you ?
Doctor: Hwwrru ?
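One way to make such variants comparable before analysis is to normalize each line. This is a minimal sketch under stated assumptions: it drops parenthesized non-verbal annotations such as "(Smiles)", lowercases the text, and expands shorthand through a small hypothetical `SHORTHAND` dictionary that, in practice, would have to be built from the transcripts themselves.

```python
import re

# Hypothetical shorthand dictionary; a real one would be built from the transcripts.
SHORTHAND = {"hwwrru": "how are you"}

def normalize(utterance: str) -> tuple[str, str]:
    """Split off the speaker label and return (speaker, normalized text)."""
    speaker, sep, text = utterance.partition(":")
    if not sep:  # no "Speaker:" prefix on this line
        speaker, text = "", utterance
    # Remove non-verbal annotations such as "(Smiles)".
    text = re.sub(r"\([^)]*\)", " ", text)
    # Lowercase and keep only runs of letters/apostrophes, which drops stray "?".
    tokens = re.findall(r"[a-z']+", text.lower())
    # Expand known shorthand forms.
    tokens = [SHORTHAND.get(tok, tok) for tok in tokens]
    return speaker.strip(), " ".join(tokens)

for line in ["Doctor: How are you ? (Smiles)", "Doctor: How are you ?", "Doctor: Hwwrru ?"]:
    print(normalize(line))
```

All three variants map to the same pair, ("Doctor", "how are you"), so later steps such as tf-idf treat them as one utterance.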

Transcripts: tf-idf
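Exploratory analysis via term frequency–inverse document frequency (tf-idf)
Tf-idf weights a word highly when it is frequent in one transcript but rare across the others, so the top-weighted words indicate what each transcript is about
Each transcript can then be represented as a vector of word weights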
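A minimal tf-idf sketch in plain Python, assuming the transcripts are already tokenized. It uses the textbook weighting tf(t, d) × log(N / df(t)); libraries such as scikit-learn apply smoothing by default, so their exact numbers differ. The example transcripts are made up.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical, already-tokenized transcripts.
transcripts = [
    ["doctor", "asked", "about", "chest", "pain", "pain"],
    ["doctor", "adjusted", "the", "dosage", "dosage"],
]
for vec in tf_idf(transcripts):
    # The highest-weighted term hints at what the transcript is about.
    print(max(vec, key=vec.get))
```

This prints "pain" and then "dosage". "doctor" gets a weight of 0 because it occurs in every transcript, which is exactly the down-weighting of uninformative words that makes tf-idf useful for exploration.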

Transcripts: Bag of Words
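Each transcript is represented by its word frequencies. Meaning (for example, sentiment) can then be attached to those words in two ways:
• Manually curated word lists
• Open source lexicons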
Merits
• Computationally inexpensive
Demerits
• Performs poorly where context matters, because word order is discarded
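The demerit above is easy to demonstrate. In this minimal sketch (whitespace tokenization is a simplifying assumption), two made-up sentences with opposite meanings produce identical bags of words, because only the counts survive and word order is lost.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace and count tokens; word order is discarded."""
    return Counter(text.lower().split())

a = bag_of_words("the doctor was not rude the nurse was helpful")
b = bag_of_words("the doctor was rude the nurse was not helpful")
print(a == b)  # True: the position of "not" is lost
```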

Transcripts: BOW
Open source lexicons and resources (some accessed through Java-based interfaces) that can be used from both R and Python:
https://wordnet.princeton.edu/
http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/
http://www.wjh.harvard.edu/~inquirer/homecat.htm
https://medium.com/mlreview/understanding-lstm-and-its-diagrams-37e2f46f1714

Example of Bag of Words
Sample entries from the MPQA subjectivity lexicon, which tags each word in the bag with a subjectivity type and a prior polarity
Type        Length  Word           Stemmed  POS     Prior polarity
strongsubj  1       acrimoniously  n        anypos  negative
weaksubj    1       active         n        adj     positive
strongsubj  1       acumen         n        noun    positive
strongsubj  1       adamant        n        adj     negative
weaksubj    1       admission      n        noun    positive
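A sketch of how lexicon entries like these can score a bag of words. It assumes the MPQA clues file's one-entry-per-line `key=value` layout (fields `type`, `len`, `word1`, `pos1`, `stemmed1`, `priorpolarity`). The weighting (strong clues count 2, weak clues 1) is a hypothetical choice, and the `pos1` and `stemmed1` fields are ignored for simplicity.

```python
# The five entries from the table above, in the assumed key=value line format.
LEXICON_LINES = [
    "type=strongsubj len=1 word1=acrimoniously pos1=anypos stemmed1=n priorpolarity=negative",
    "type=weaksubj len=1 word1=active pos1=adj stemmed1=n priorpolarity=positive",
    "type=strongsubj len=1 word1=acumen pos1=noun stemmed1=n priorpolarity=positive",
    "type=strongsubj len=1 word1=adamant pos1=adj stemmed1=n priorpolarity=negative",
    "type=weaksubj len=1 word1=admission pos1=noun stemmed1=n priorpolarity=positive",
]

# Hypothetical weights: strong subjectivity clues count double.
WEIGHT = {"strongsubj": 2, "weaksubj": 1}
SIGN = {"positive": 1, "negative": -1}  # "neutral"/"both" fall back to 0 below

def parse_entry(line: str) -> dict[str, str]:
    """Split the key=value fields of one lexicon line into a dict."""
    return dict(field.split("=", 1) for field in line.split())

def build_lexicon(lines: list[str]) -> dict[str, int]:
    """Map each word to a signed, weighted polarity score (POS and stemming ignored)."""
    lexicon = {}
    for line in lines:
        entry = parse_entry(line)
        lexicon[entry["word1"]] = WEIGHT[entry["type"]] * SIGN.get(entry["priorpolarity"], 0)
    return lexicon

def score(tokens: list[str], lexicon: dict[str, int]) -> int:
    """Sum the polarity of every token in the bag; unknown words contribute 0."""
    return sum(lexicon.get(tok, 0) for tok in tokens)

lex = build_lexicon(LEXICON_LINES)
print(score("the patient was adamant".split(), lex))    # -2
print(score("active and full of acumen".split(), lex))  # 3
```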

Word2vec and LSTM
Word2vec is particularly useful for capturing the meaning of words.
It learns a vector for each word from the context words
that appear around it (the center word).
LSTM is a recurrent neural network whose gated memory cells
let it retain context across long sequences. Training it is
resource intensive and typically benefits from a GPU
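Word2vec's skip-gram variant makes "context words around the center word" concrete: it is trained to predict each context word from the center word (CBOW does the reverse). The sketch below only generates those (center, context) training pairs for an assumed window size; it does not train the embeddings, which a library such as gensim would do.

```python
def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Return (center, context) pairs for every word within `window` positions of a center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["doctor", "prescribed", "the", "medicine"], window=1))
```

A larger window captures broader, more topical context; a smaller one tends to emphasize syntactic neighbors.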

Video Transcripts
What can we find out?
• Emotions: We can tell users what kind of video it is. If we know a user's preferences, we can use cosine
similarity to recommend videos whose content matches them
  • Comedy, romance, action
• Context: We can tell what a video is about
• Advertisement insertion points: Transcripts can help identify suitable points to insert ads. Google has also
announced that advertisers will soon be able to target viewers based on their Google search history, in addition
to the viewing behavior YouTube already targets.
• From healthcare videos, we can infer how a patient and a doctor interact
• Unusual events, such as two ads merged into one video, can be detected
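A minimal sketch of the cosine-similarity recommendation described above. The genre scores per video and the user preference profile are hypothetical numbers; in practice the video scores might come from an emotion or genre classifier run over each transcript.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical genre scores, dimensions = [comedy, romance, action].
videos = {
    "video_a": [0.9, 0.1, 0.0],
    "video_b": [0.1, 0.8, 0.1],
    "video_c": [0.0, 0.2, 0.9],
}
user = [0.2, 0.1, 0.8]  # hypothetical preference profile: mostly action

# Rank videos by how closely their genre mix matches the user's preferences.
ranked = sorted(videos, key=lambda name: cosine(user, videos[name]), reverse=True)
print(ranked)
```

Cosine similarity compares the direction of the vectors rather than their magnitude, so a video is matched on its genre mix, not on how strongly the classifier scored it overall.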

Survey
Problem statement: Focus vision offers a fixed set of question types for a survey.
Suppose a customer, John, comes to us for the first time with a healthcare survey.
After John builds his survey, we can suggest a few more questions in the same category
We can recommend questions based on the similarity between word vectors
Or, if we know the survey's category, we can suggest our own custom template
For example, a question can fall into any of the following categories:
Healthcare
Market Research question
Greetings
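A sketch of question recommendation with word vectors, assuming a question's vector is the average of its word vectors. The 3-dimensional `WORD_VECTORS` and the `QUESTION_BANK` entries are toy, hypothetical values; a real system would load vectors from a trained word2vec model. `template_for` illustrates the category-template alternative.

```python
import math
import re

# Toy word vectors (hypothetical); real ones would come from a trained word2vec model.
WORD_VECTORS = {
    "pain": [0.9, 0.1, 0.0],
    "doctor": [0.8, 0.2, 0.1],
    "symptoms": [0.9, 0.0, 0.1],
    "brand": [0.1, 0.9, 0.1],
    "purchase": [0.0, 0.8, 0.2],
    "welcome": [0.1, 0.1, 0.9],
    "thank": [0.0, 0.2, 0.9],
}

# Hypothetical question repository, tagged by category.
QUESTION_BANK = [
    ("Healthcare", "How severe is your pain today?"),
    ("Healthcare", "When did your symptoms start?"),
    ("Market Research", "Which brand did you purchase last?"),
    ("Greetings", "Welcome, and thank you for joining."),
]

def sentence_vector(text: str) -> list[float]:
    """Average the vectors of known words; unknown words are skipped."""
    words = re.findall(r"[a-z]+", text.lower())
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(user_question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k bank questions most similar to a question the user wrote."""
    query = sentence_vector(user_question)
    return sorted(QUESTION_BANK,
                  key=lambda item: cosine(query, sentence_vector(item[1])),
                  reverse=True)[:k]

def template_for(category: str) -> list[str]:
    """Category-template alternative: every bank question in that category."""
    return [q for cat, q in QUESTION_BANK if cat == category]

print(recommend("Did the doctor explain your symptoms?"))
```

For John's healthcare question, both suggestions come from the Healthcare category, because "doctor" and "symptoms" sit close to the other healthcare words in the toy vector space.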

Recommendation Engine
We will first build a repository of questions and then evolve the model using users' interaction data.
Because a new user has no interaction history at first, the model may suffer from the cold start problem
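A sketch of one way to handle the cold start problem, assuming interactions are logged as (user, question) pairs. While a user has no history, the function falls back to a curated category template; once history exists, it scores questions used by other users who share questions with this user. This simple neighborhood-based collaborative filtering is swapped in here for illustration, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict

def recommend(user_id: str, interactions: list[tuple[str, str]],
              templates: dict[str, list[str]], category: str, k: int = 3) -> list[str]:
    """Suggest up to k questions, falling back to templates when data is missing."""
    by_user = defaultdict(set)
    for user, question in interactions:
        by_user[user].add(question)
    mine = by_user.get(user_id, set())
    if not mine:
        # Cold start: no history for this user, so use the curated template.
        return templates.get(category, [])[:k]
    scores = Counter()
    for other, questions in by_user.items():
        if other == user_id:
            continue
        overlap = len(mine & questions)  # how similar the other user is
        if overlap:
            for q in questions - mine:
                scores[q] += overlap
    if not scores:
        # Still too little data: template questions the user has not used yet.
        return [q for q in templates.get(category, []) if q not in mine][:k]
    return [q for q, _ in scores.most_common(k)]

templates = {"Healthcare": ["t1", "t2", "t3", "t4"]}
log = [("john", "q1"), ("ann", "q1"), ("ann", "q2"), ("bob", "q3")]
print(recommend("new_user", log, templates, "Healthcare"))  # ['t1', 't2', 't3']
print(recommend("john", log, templates, "Healthcare"))      # ['q2']
```

As the interaction log grows, more requests take the collaborative path and fewer fall back to the template, which is the evolution described above.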
