Identifying unbalanced support class using unlabeled feedback data

The document discusses word embedding models and their ability to capture semantic similarity between words. It gives similarity scores for related word pairs such as "image" and "photo" or "day" and "year", and notes that, based on these scores, "macro" is contextually more similar to "excel" than to "word".

The input layer consists of the one-hot encoded input context words for a word window of size C and a vocabulary of size V. The output word Y is initially one-hot encoded over the same vocabulary of size V.
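The architecture described above (C one-hot context words in, one output word Y over the vocabulary) corresponds to the continuous bag-of-words (CBOW) variant of word2vec. The sketch below shows one forward pass under stated assumptions: the hidden size N and the weight matrices `W_in` (V x N) and `W_out` (N x V) are illustrative names, the hidden layer averages the context embeddings, and a full softmax is used. Production word2vec implementations usually replace the full softmax with negative sampling or hierarchical softmax for speed.

```python
import numpy as np

def one_hot(index, V):
    """One-hot vector of length V with a 1 at `index`."""
    v = np.zeros(V)
    v[index] = 1.0
    return v

def cbow_forward(context_ids, W_in, W_out):
    """Forward pass of a CBOW-style network (illustrative sketch).

    context_ids: indices of the C context words in the window
    W_in:  V x N input->hidden weights (row i is word i's embedding)
    W_out: N x V hidden->output weights
    Returns a probability distribution over the V vocabulary words.
    """
    V = W_in.shape[0]
    X = np.stack([one_hot(i, V) for i in context_ids])  # C x V one-hot inputs
    h = (X @ W_in).mean(axis=0)   # average of the C context embeddings, shape (N,)
    scores = h @ W_out            # one score per vocabulary word, shape (V,)
    scores -= scores.max()        # shift for numerical stability before exp
    probs = np.exp(scores)
    return probs / probs.sum()    # softmax

def cross_entropy(probs, target_id):
    """Loss against the one-hot target Y: reduces to -log p(target word)."""
    return -np.log(probs[target_id])
```

For example, with a toy vocabulary of V = 10 words and N = 4 hidden units, `cbow_forward([1, 2, 4, 5], W_in, W_out)` predicts the middle word of a window of size C = 4, and `cross_entropy(probs, 3)` scores that prediction against word 3.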

>>> model.wv.similarity('image', 'photo')
0.72919375102299688
>>> model.wv.similarity('image', 'picture')
0.81212739465811012
>>> model.wv.similarity('day', 'year')
0.56522373632906497
>>> model.wv.similarity('day', 'month')
0.69417721213130068
>>> model.wv.similarity('word', 'macro')
0.18828742650457433
>>> model.wv.similarity('excel', 'macro')
0.38306930830401564
Image, photo, and picture appear in similar contexts (0.73 and 0.81).
Day, month, and year appear in similar contexts (0.57 and 0.69).
Macro is contextually closer to Excel than to Word (0.38 vs 0.19).
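In gensim, `model.wv.similarity(w1, w2)` returns the cosine similarity between the two word vectors. The sketch below computes the same quantity directly; the three-dimensional vectors are hypothetical values chosen for illustration, whereas trained word2vec vectors typically have 100 or more dimensions.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: u.v / (|u| |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy vectors, NOT taken from a trained model.
toy_vectors = {
    "image":   [0.9, 0.1, 0.0],
    "picture": [0.8, 0.2, 0.1],
    "year":    [0.0, 0.1, 0.9],
}

sim_related = cosine_similarity(toy_vectors["image"], toy_vectors["picture"])
sim_unrelated = cosine_similarity(toy_vectors["image"], toy_vectors["year"])
```

Words that share contexts end up with vectors pointing in similar directions, so `sim_related` is close to 1 while `sim_unrelated` is close to 0.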

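[Figure: words grouped into Cluster 1, Cluster 2, Cluster 3]

Word-level features (one column per word, 70,000 words):

              Word 1   Word 2   …   Word #70000
Sentence 1    0        0        …   1
Sentence 2    0        0        …   0

Cluster-level features (one column per word cluster, 7,000 clusters):

              Cluster 1   Cluster 2   …   Cluster #7000
Sentence 1    0           0           …   1
Sentence 2    0           0           …   0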
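The tables replace roughly 70,000 word columns with about 7,000 cluster columns, but the slides do not say how the clusters are built. The sketch below assumes k-means on the Word2Vec vectors and then marks, for each sentence, which clusters its words fall into. The function names `kmeans_labels` and `cluster_features` are hypothetical, and this plain NumPy k-means builds a full distance matrix, so for a real 70,000-word vocabulary something like scikit-learn's `MiniBatchKMeans` would be more practical.

```python
import numpy as np

def kmeans_labels(vectors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster id for each row of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct word vectors chosen at random (fancy indexing copies).
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every vector to every centroid.
        dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # leave an empty cluster's centroid where it was
                centroids[j] = members.mean(axis=0)
    return labels

def cluster_features(sentences, word_to_cluster, n_clusters, binary=True):
    """One row per tokenized sentence, one column per word cluster."""
    X = np.zeros((len(sentences), n_clusters), dtype=int)
    for i, tokens in enumerate(sentences):
        for tok in tokens:
            c = word_to_cluster.get(tok)
            if c is None:       # out-of-vocabulary word: no cluster to mark
                continue
            if binary:
                X[i, c] = 1     # presence/absence, like the 0/1 tables
            else:
                X[i, c] += 1    # number of the sentence's words in this cluster
    return X
```

Because semantically similar words share a cluster column, a classifier sees "photo" and "picture" as the same feature, which is one plausible reason the cluster features could generalize better than raw word counts.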

F1-score for the Support class:

         CountVectorizer   Word2Vec
App 1    0.68              0.84
App 2    0.59              0.73
App 3    0.62              0.85
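The table reports F1 for a single class, which is a more informative measure than accuracy when that class is a small share of the data. Per-class F1 is the harmonic mean of precision and recall computed with that class treated as positive; scikit-learn's `f1_score` offers the same metric. The sketch below computes it by hand; the label strings "Support" and "Other" are hypothetical.

```python
def class_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class (e.g. 'Support') treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["Support", "Other", "Support", "Support"]
y_pred = ["Support", "Support", "Other", "Support"]
precision, recall, f1 = class_f1(y_true, y_pred, "Support")
```

Here two of three predicted Support items are correct and two of three true Support items are found, so precision, recall, and F1 are all 2/3.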

Editor's Notes

In addition to understanding data platforms and data science technologies and tools, teams need a well-defined process to develop and deploy data science solutions efficiently. Many enterprise data science teams today face challenges with standardization, collaboration, and incorporating a mature process into how they develop and deploy advanced analytics solutions. In this session, we will look at these process-related challenges and how to address them using the Microsoft Team Data Science Process (TDSP).
This session is a hands-on tutorial with the following objectives:
- Understand the process-related challenges that our enterprise customers' data science teams face today, and how Microsoft's Team Data Science Process (TDSP) can help.
- Incorporate standardization, collaborative development, and DevOps practices into data science projects using the TDSP.
- Create data science projects using the standardized TDSP structure, artifacts, and documentation templates.
- Form collaborative data science teams, plan and execute data science projects under an Agile development framework in Visual Studio, and collaborate on code development and review using a version control system such as Git.
- Address DevOps practices, e.g. unit testing and continuous integration, in data science projects.
- Assess data platform and security options for data science projects.
- Instantiate and use TDSP in Azure ML Workbench (Vienna).
