Unsupervised Sentence-embeddings by Manifold
Approximation and Projection
Deep Kayal
deep.kayal@pm.me
Setting the tone
Modern NLP systems are increasingly powered by transfer
learning
But often the downstream task is not known a priori, or adaptation
is not possible, e.g. in search
In these cases we need universal sentence encoders
A pretrained model maps each sentence to a vector:
Who are you? -> [0.2 0.3 -0.01 0.4 ...]
Where is this? -> [0.8 0.1 -0.5 0.4 ...]
This is Amsterdam. -> [0.5 0.9 0.9 0.3 ...]
...
Commonly used sentence encoders
- Avg word2vec
- Doc2vec
- Sentence BERT (BERT fine-tuned on SNLI dataset)
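As an illustration of the simplest encoder in this list, averaged word vectors, here is a minimal sketch. The vocabulary `toy_vectors`, its values, and the helper `average_embedding` are hypothetical and made up for this example; a real system would load trained word2vec vectors instead.

```python
import numpy as np

# Toy 3-dimensional "word vectors"; the values are made up for illustration.
toy_vectors = {
    "this": np.array([0.1, 0.3, 0.0]),
    "is": np.array([0.0, 0.1, 0.2]),
    "amsterdam": np.array([0.9, 0.2, 0.4]),
}


def average_embedding(sentence, vectors, dim=3):
    """Average the vectors of all in-vocabulary tokens of a sentence."""
    # Very crude tokenisation: lowercase, drop "." and "?", split on spaces.
    tokens = sentence.lower().replace(".", "").replace("?", "").split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:  # no known words: fall back to a zero vector
        return np.zeros(dim)
    return np.mean(known, axis=0)


embedding = average_embedding("This is Amsterdam.", toy_vectors)
```

Every sentence ends up with a vector of the same length, regardless of how many words it contains, but word order and the individual word positions are lost in the average.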
Related Work
- Word mover's distance, Matt Kusner et al.
- Word mover's embeddings, Lingfei Wu et al.
Contributions of this work
Observation: Word mover's distance (WMD) is one of many ways to
compute the distance between sets of words
Contribution 1:
Test and compare other common set-distance metrics
- WMD
- Hausdorff distance
- Energy distance
Contributions of this work
Observation: Using a set-distance metric, we can construct a
neighbourhood graph using sentences and these distances
Contribution 2:
Generate fixed-dimensional embeddings such that they preserve the
above neighbourhood graph
- Uniform manifold approximation and projection (UMAP)
Distance metrics
- WMD
- Hausdorff distance
- Energy distance
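The three set distances can be sketched as follows, treating each sentence as a set of word vectors (one row per word). These are the standard textbook definitions with Euclidean ground distance and, for WMD, uniform word weights; that choice is an assumption, and the exact formulation used in the talk may differ. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def hausdorff_distance(X, Y):
    """Symmetric Hausdorff distance between two sets of word vectors."""
    D = cdist(X, Y)  # pairwise Euclidean distances, shape (len(X), len(Y))
    # Farthest "nearest neighbour" in each direction, then the larger of the two.
    return max(D.min(axis=1).max(), D.min(axis=0).max())


def energy_distance(X, Y):
    """Energy distance between the empirical distributions of X and Y."""
    between = cdist(X, Y).mean()
    within_x = cdist(X, X).mean()
    within_y = cdist(Y, Y).mean()
    # Clip tiny negative values caused by floating-point error.
    return np.sqrt(max(2.0 * between - within_x - within_y, 0.0))


def word_movers_distance(X, Y):
    """WMD with uniform word weights (an assumption), solved as a linear program."""
    n, m = len(X), len(Y)
    cost = cdist(X, Y).ravel()  # flow T[i, j] is stored at index i * m + j
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0  # word i of X ships out mass 1/n
    for j in range(m):
        A_eq[n + j, j::m] = 1.0  # word j of Y receives mass 1/m
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    result = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return result.fun
```

All three functions take two arrays of shape (number of words, vector dimension), so sentences of different lengths can be compared directly. The linear program is only practical for short sentences; dedicated optimal-transport solvers are faster.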
Steps to generate embeddings
- Build an approximate nearest-neighbour graph over the sentences using a set distance
- Initialise a low-dimensional graph and minimise the cross entropy between the two graph representations
- The points of the optimised low-dimensional graph are the desired embeddings
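The ingredients of these steps can be sketched in a few lines, under several simplifying assumptions: the neighbour search below is exact (not approximate), the low-dimensional edge weight uses the curve 1 / (1 + a d^(2b)) with placeholder values a = b = 1, and the optimisation loop itself is omitted. All function names are hypothetical.

```python
import numpy as np


def knn_graph(distances, k):
    """For each point, return the indices and distances of its k nearest neighbours."""
    D = np.array(distances, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    idx = np.argsort(D, axis=1)[:, :k]
    return idx, np.take_along_axis(D, idx, axis=1)


def low_dim_weights(d, a=1.0, b=1.0):
    """Edge weight in the low-dimensional space for embedding distance d."""
    return 1.0 / (1.0 + a * d ** (2.0 * b))


def fuzzy_cross_entropy(v, w, eps=1e-12):
    """Cross entropy between high-dim edge weights v and low-dim edge weights w."""
    v = np.clip(v, eps, 1.0 - eps)
    w = np.clip(w, eps, 1.0 - eps)
    return np.sum(v * np.log(v / w) + (1.0 - v) * np.log((1.0 - v) / (1.0 - w)))
```

The cross entropy is zero when both graphs assign identical edge weights and grows as they disagree, which is why minimising it pulls the low-dimensional points towards a layout that preserves the neighbourhood graph. In practice, the umap-learn package's `UMAP` class accepts `metric="precomputed"` together with a pairwise distance matrix, which is one way to run the whole procedure on set distances; whether the talk used exactly that route is not stated here.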
Evaluation
Sentence classification task on 6 datasets
Experimental Settings
First test:
- Use kNN with the set-distances to classify sentences directly
- Versus our method of generating embeddings from the
neighbourhood graph
- We classify the generated embeddings with a linear SVM
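The kNN side of this comparison needs only a matrix of set distances between test and training sentences. A minimal sketch, assuming majority voting and a hypothetical helper name `knn_predict` (the value of k used in the experiments is not given here):

```python
from collections import Counter

import numpy as np


def knn_predict(test_to_train, train_labels, k=3):
    """Label each test sentence by majority vote among its k nearest training sentences.

    test_to_train[i][j] is the set distance between test sentence i
    and training sentence j.
    """
    test_to_train = np.asarray(test_to_train, dtype=float)
    predictions = []
    for row in test_to_train:
        nearest = np.argsort(row, kind="stable")[:k]  # indices of the k smallest distances
        votes = Counter(train_labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

The embedding side of the comparison would instead feed the generated vectors to a linear classifier, for example scikit-learn's `LinearSVC`, so the two tests differ in whether the distances are used directly or first turned into fixed-dimensional embeddings.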
Experimental Settings
Second test:
- Test 6 other popular approaches to produce sentence
embeddings
- Versus our method of generating embeddings from the
neighbourhood graph
Results
- Embeddings + classifier vs kNN
- Comparison of various embeddings
Takeaways
- We propose a novel sentence embedding mechanism using set distances
and neighbourhood graph approximation
- The embeddings capture information better than the distance
metric alone
- The embeddings compare favourably with various other efficient
mechanisms
