LEARNING FROM SETS
ANDREW CLEGG
IN A NUTSHELL
ABOUT ME
• Yelp (starting next week!)
• Etsy, Pearson, Last.fm, AstraZeneca, consulting
• Bioinformatics, information retrieval, natural language processing (UCL/Birkbeck)
• Main interests: search, recommendations, personalization
• @andrew_clegg
• http://andrewclegg.org/
LEARNING DEEP REPRESENTATIONS FOR UNORDERED ITEM SETS
LEARNING FROM ITEM COLLECTIONS
PROBLEM STATEMENT
• A lot of real-world data consists of collections of objects
• User's session on a website (list of events)
• Products in a shopping cart (bag of items)
• Product titles (list of words)
• Songs played in a user's history (list of items)
• Movies liked in a user's signup flow (set of items)
LEARNING FROM ITEM COLLECTIONS
PROBLEM STATEMENT
• A lot of real-world data consists of collections of objects
• User's session on a website (list of events) → ORDERED
• Products in a shopping cart (bag of items) → ORDERED OR NOT
• Product titles (list of words) → ORDERED… OR NOT?
• Songs played in a user's history (list of items) → ORDERED
• Movies liked in a user's signup flow (set of items) → UNORDERED
LEARNING FROM ITEM COLLECTIONS
PROBLEM STATEMENT
• Learning representations for variable-length sequences is "easy"
• RNNs, LSTMs, GRUs
• Input = sequence of embeddings
• Output = embedding for whole sequence
• Very effective but not always the cheapest or easiest to train
• But what if the data is unordered?
• What if it's ordered, but that ordering is uninformative?
HOW CAN WE LEARN A SINGLE EMBEDDING FROM A BAG OR SET OF ITEM EMBEDDINGS?
(WHICH MIGHT NOT WORK VERY WELL)
REALLY SIMPLE APPROACH
• Learn item embeddings in an unsupervised manner
• e.g. "Item2Vec", Barkan & Koenigstein 2016
• word2vec (skip-gram with negative sampling) on item IDs
• Average them together to get an embedding for the set/bag (sketched below)
• Often used in text mining / IR as a baseline or lower bound
• e.g. "word centroid distance" from Kusner et al 2015
[Diagram: embeddings for Item 05, Item 17 and Item 23 combined by element-wise mean]
Issues:
• Not task oriented
• Embeddings can't adapt to problem domain
• No guarantee that taking the mean is the best strategy
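A minimal sketch of this baseline in Python, assuming gensim's word2vec implementation (gensim 3.x API); the basket data, dimensions and the set_embedding helper are illustrative, not from the talk:

    # Minimal sketch: Item2Vec-style embeddings, then an element-wise mean.
    # Assumes gensim 3.x (newer versions rename `size` to `vector_size`).
    import numpy as np
    from gensim.models import Word2Vec

    baskets = [["item_05", "item_17", "item_23"],   # each basket is a
               ["item_17", "item_42", "item_05"]]   # "sentence" of item IDs

    # sg=1 -> skip-gram, negative=5 -> negative sampling, as in Item2Vec
    model = Word2Vec(baskets, size=50, sg=1, negative=5, window=10, min_count=1)

    def set_embedding(items):
        # Element-wise mean of the item vectors: one vector per set/bag.
        vecs = [model.wv[i] for i in items if i in model.wv]
        return np.mean(vecs, axis=0)

    print(set_embedding(["item_05", "item_17", "item_23"]).shape)  # (50,)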
LEARN EMBEDDINGS WHILE TRAINING ON A TASK
NEURAL BAG-OF-ITEMS
• Common baseline in NLP tasks: neural bag-of-words
• Initialize embeddings randomly
• Or from unsupervised pre-training, or third-party data
• Take mean (or sometimes sum)
• Feed into network, update embeddings via backprop (sketched below)
[Diagram: item embeddings → element-wise mean → output layer or rest of network; errors propagate back into the embeddings]
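As a hedged illustration in Keras (the library used for the experiments later in this deck), a neural bag-of-items needs only a few lines; the vocabulary size, padded sequence length and sigmoid output here are assumptions:

    # Minimal sketch of a neural bag-of-items, not the talk's exact code.
    from keras.models import Sequential
    from keras.layers import Embedding, GlobalAveragePooling1D, Dense

    NUM_ITEMS, DIM, MAX_LEN = 50000, 50, 100  # assumed sizes

    model = Sequential([
        Embedding(NUM_ITEMS, DIM, input_length=MAX_LEN),  # trainable by default
        GlobalAveragePooling1D(),        # element-wise mean over the item axis
        Dense(1, activation='sigmoid'),  # output layer (task-dependent)
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # model.fit(...) then updates the embedding table via backprop too.
    # Caveat: zero-padding is averaged in as well; a masked mean is a common fix.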
COMPOSE EMBEDDINGS VIA NON-LINEAR TRANSFORMATIONS
DEEP AVERAGING NETWORKS
• "Deep Unordered Composition Rivals Syntactic Methods for Text Classification" (Iyyer et al 2015)
• Developed for sentiment classification & question answering
• Proposed as a cheap alternative to recursive neural networks
• In a nutshell:
• Don't use mean of embeddings directly
• Take mean and pass it through some fully-connected layers (sketched below)
• Probably prior art somewhere?
[Diagram: item embeddings → element-wise mean → FC1 → FC2 → output layer or rest of network; the activation of the last FC layer is the representation of the whole set, and errors propagate back into the FC layers and embeddings]
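A sketch of a DAN in Keras under the same assumptions as before; the dim-50 embedding and two dim-50 ReLU layers mirror the experiment later in this deck, but treat this as an illustration rather than the exact insta-keras code:

    # Deep averaging network: mean first, then fully-connected layers.
    from keras.models import Sequential
    from keras.layers import Embedding, GlobalAveragePooling1D, Dense

    NUM_ITEMS, DIM, MAX_LEN = 50000, 50, 100  # assumed sizes

    dan = Sequential([
        Embedding(NUM_ITEMS, DIM, input_length=MAX_LEN),
        GlobalAveragePooling1D(),      # element-wise mean
        Dense(50, activation='relu'),  # FC1
        Dense(50, activation='relu'),  # FC2: its activation represents the set
        Dense(1),                      # linear output (regression here)
    ])
    dan.compile(optimizer='adam', loss='mse')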
"THE DEEP LAYERS OF THE DAN AMPLIFY TINY DIFFERENCES IN THE VECTOR AVERAGE THAT ARE PREDICTIVE OF THE OUTPUT LABELS." (Iyyer et al)
"I really loved Rosamund Pike's performance in the movie Gone Girl"
"I really liked Rosamund Pike's performance in the movie Gone Girl"
"I really despised Rosamund Pike's performance in the movie Gone Girl"
All three sentences have very similar vector means
REMOVING ENTIRE EMBEDDINGS FROM THE MEAN
WORD DROPOUT
• Additional contribution: alternative dropout scheme
• Don't add dropout after fully-connected layers
• Instead, randomly drop words from the input sentences (sketched below)
• Maybe somewhat specific to sentiment and question answering?
• Most words in a sentence don't affect the sentiment
• Most words in a sentence don't describe the actual answer
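One plausible way to implement word dropout, as an assumption rather than the paper's code: replace a random subset of token IDs with the padding ID each time a batch is drawn, so whole embeddings disappear from the input entirely:

    # Word dropout sketch: zero out entire tokens, not individual units.
    import numpy as np

    def word_dropout(item_ids, p=0.3, pad_id=0):
        ids = np.array(item_ids)               # copy; shape (batch, max_len)
        drop = np.random.rand(*ids.shape) < p  # Bernoulli mask per token
        ids[drop & (ids != pad_id)] = pad_id   # never "drop" existing padding
        return ids

    batch = np.array([[12, 7, 301, 0, 0]])
    print(word_dropout(batch, p=0.5))  # e.g. [[12  0 301  0  0]]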
DEEP AVERAGING NETWORKS FOR ECOMMERCE DATA
PREDICTING GROCERY RE-ORDERS
INSTACART KAGGLE CONTEST
Simplified version of task, for trying out DANs:
• Given previous order (n of ~50K products)…
• Predict what % of items in it will be re-ordered in next order (target sketched below)
• Use only the items in the previous order (not user, metadata etc.)
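The target itself is just a set overlap; a toy sketch, treating each order as a set of product IDs (the actual Instacart CSV wrangling is omitted):

    # Fraction of the previous order's items that reappear in the next order.
    def reorder_fraction(prev_order, next_order):
        if not prev_order:
            return 0.0
        return len(prev_order & next_order) / float(len(prev_order))

    print(reorder_fraction({1, 2, 3, 4}, {2, 4, 9}))  # 0.5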
TRAIN ON 2,893,386 SAMPLES, VALIDATE ON 321,488 SAMPLES
DAN VS GRU HEAD-TO-HEAD
DAN input: unordered item IDs
Dim-50 item embedding (2,484,450 trainable params)
Mean + 2x dim-50 dense ReLU layers (5,100 trainable params)
Single linear output (51 trainable params)

GRU input: ordered item IDs
Dim-50 item embedding (2,484,450 trainable params)
GRU with 25 units + ReLU activation (5,700 trainable params)
Single linear output (26 trainable params)
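For concreteness, a hedged Keras sketch of the GRU side (the DAN side matches the sketch shown earlier); 25 units over dim-50 embeddings reproduce the 5,700 and 26 parameter counts above, while the padding length is an assumption:

    # GRU baseline under the same assumptions as the DAN sketch.
    from keras.models import Sequential
    from keras.layers import Embedding, GRU, Dense

    NUM_ITEMS, DIM, MAX_LEN = 49689, 50, 100  # 49,689 x 50 = 2,484,450

    gru = Sequential([
        Embedding(NUM_ITEMS, DIM, input_length=MAX_LEN),
        GRU(25, activation='relu'),  # reads the ordered item sequence
        Dense(1),                    # linear output: predicted re-order %
    ])
    gru.compile(optimizer='adam', loss='mse')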
TRAINED WITH ADAM (ALL DEFAULTS) ON GOOGLE GPU BOX
DAN VS GRU HEAD-TO-HEAD
DAN
Batch size: 100
MSE loss
One epoch: 4 minutes
Mean training loss: 0.0631
Validation loss: 0.0626
Competitive result in minutes
GRU
Batch size: 100
MSE loss
One epoch: 5 hours
Mean training loss: 0.0626
Validation loss: 0.0614
Slightly better result… in hours!
DAN MATCHED GRU PERFORMANCE IN 12 MINUTES
DAN VS GRU HEAD-TO-HEAD
[Chart: DAN training and validation loss over epochs 1 to 5 (y-axis 0.0568 to 0.0640), with a reference line at 0.0615 ← GRU performance after 5 hours]
SOME REMARKS
DAN VS GRU HEAD-TO-HEAD
• Tried 'neural bag-of-items' (no hidden layers) for comparison
• Training time per epoch similar to DAN (few secs faster)
• Validation loss flattened out at 0.063 (worse than DAN at epoch 0)
• Not a thorough investigation: no hyperparameter search
• No dropout, weight decay, batch norm, etc.
• Item dropout (i.e. word dropout) didn't seem to help
• Unlike text mining tasks, all items in bag are (potentially) important
ANY QUESTIONS?
THANKS!
• Code available on GitHub: andrewclegg/insta-keras
• Feel free to grab me afterwards to chat about anything
• Or ping me on Twitter: @andrew_clegg