To stand out from the crowd, sellers employ creative, sometimes disruptive titles for their products in online stores to improve their search relevancy or attract the attention of customers. As part of the CIKM AnalytiCup 2017, the challenge is to build a product title quality model that can automatically grade the clarity and the conciseness of a product title. Our proposed Bagging Model for Product Title Quality with Noise outperformed the other entries and won the CIKM Cup 2017 competition.
CIKM AnalytiCup 2017: Bagging Model for Product Title Quality with Noise
3. Team Members
Tam T. Nguyen
nthanhtam@gmail.com
Postdoctoral Research Fellow
Ryerson University
Kaggle Grandmaster
Hossein Fani
hosseinfani@gmail.com
PhD Student
University of New Brunswick
Gilberto Titericz
giba1978@gmail.com
Machine Learning Expert
AirBnb Inc.
Kaggle Grandmaster
Ebrahim Bagheri
ebrahim.bagheri@gmail.com
Associate Professor
Ryerson University
5. Problem Setting
Example titles:
hot sexy red clutch rug sack travel backpack unisex cheap with free gift
Hot Sexy Tom Clovers Womens Mens Classy Look Cool Simple Style Casual Canvas Crossbody Messenger Bag Handbag Fashion Bag Tote Handbag Gray
Targets:
1. clarity
2. conciseness
7. Data Set
Clarity: a title is clear if, within five seconds, one can understand the title, tell what the product is, and quickly figure out the key attributes (color, size, model, ...).
Conciseness: a title is concise if it is short enough while still containing all the necessary information. It is not concise if it is too long, with many unnecessary words, or so short that it is unclear what the product is.
10. 1. Cleansing
Noise
HTML tags in short_description (94%)
Missing Values
product_type (less than 1%)
category_lvl_3 (about 6%): filled with category_lvl_2
description (less than 1%)
Outliers
price in {-1, 999999, 9999999}
price normalization based on country
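The cleansing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the team's code. The slide does not say how outlier prices are treated or which normalization is used, so treating the sentinel prices as missing and using a per-country z-score are assumptions here. So are the `country` column name and the `price_norm` output field.

```python
import re
from statistics import mean, pstdev

TAG_RE = re.compile(r"<[^>]+>")
# Sentinel prices listed on the slide; treating them as missing is an assumption.
PRICE_OUTLIERS = {-1, 999999, 9999999}


def strip_html(text):
    """Remove HTML tags and collapse the remaining whitespace."""
    return " ".join(TAG_RE.sub(" ", text or "").split())


def clean(rows):
    """Return cleaned copies of `rows` (a list of dicts); inputs are not mutated."""
    cleaned = []
    for row in rows:
        row = dict(row)
        row["short_description"] = strip_html(row.get("short_description"))
        # Missing category_lvl_3 falls back to category_lvl_2, as on the slide.
        if not row.get("category_lvl_3"):
            row["category_lvl_3"] = row.get("category_lvl_2")
        if row.get("price") in PRICE_OUTLIERS:
            row["price"] = None
        cleaned.append(row)

    # Assumed normalization: z-score of price within each country.
    prices_by_country = {}
    for row in cleaned:
        if row["price"] is not None:
            prices_by_country.setdefault(row["country"], []).append(row["price"])
    stats = {c: (mean(p), pstdev(p)) for c, p in prices_by_country.items()}

    for row in cleaned:
        if row["price"] is None:
            row["price_norm"] = None
        else:
            mu, sigma = stats[row["country"]]
            row["price_norm"] = (row["price"] - mu) / sigma if sigma > 0 else 0.0
    return cleaned
```

Normalizing within each country keeps prices in different currencies comparable, which is presumably why the slide ties normalization to country.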
19. Bagging Models
[Diagram: four 10-fold sets (10-Fold Set 1 to 10-Fold Set 4). Levels: Base Model (fold bagging), Ensemble Model (fold bagging), Final Prediction (set fold bagging). Operations shown per set: BLEND, STACK, BLEND; a single BLEND produces the final prediction.]
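As a rough illustration of fold bagging and set fold bagging, the sketch below trains one model per fold, averages the per-fold test predictions, and then averages again across several differently shuffled fold sets. It covers only the averaging (BLEND) part, not stacking. The simple mean used as the blend and the toy `fit_rate_model` base learner are assumptions for illustration, not the models used by the team.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed):
    """Shuffle 0..n-1 with the given seed and split the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_rate_model(X, y):
    """Hypothetical base learner: positive rate per feature value,
    falling back to the global rate for unseen values."""
    global_rate = mean(y)
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(x, []).append(label)
    rates = {x: mean(labels) for x, labels in groups.items()}
    return lambda x: rates.get(x, global_rate)


def fold_bagging(X, y, X_test, k, seed, fit=fit_rate_model):
    """Train one model per fold (on the other k-1 folds) and
    average the k models' predictions on X_test."""
    preds = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds.append([model(x) for x in X_test])
    return [mean(col) for col in zip(*preds)]


def set_fold_bagging(X, y, X_test, k=10, seeds=(1, 2, 3, 4)):
    """Repeat fold bagging over several fold sets (one per seed)
    and average the per-set predictions."""
    per_set = [fold_bagging(X, y, X_test, k, s) for s in seeds]
    return [mean(col) for col in zip(*per_set)]
```

Using four seeds mirrors the four 10-fold sets in the diagram; averaging over different splits reduces the dependence of the final prediction on any single partition of noisy labels.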
#4: On Lazada, we have millions of products across thousands of categories.
To stand out from the crowd, sellers employ creative, sometimes disruptive efforts to improve their search relevancy or attract the attention of customers.
Product titles like this degrade the user experience by cluttering the site with irrelevant, misleading titles. In this challenge, we provide you with a set of product titles, descriptions, and attributes, together with the associated title quality scores (clarity and conciseness) as labeled by our internal QC team.
Your task is to build a product title quality model that can automatically grade the clarity and the conciseness of a product title.
Judging a book by its cover.
#15: Contraposition
Use one target as a feature for the other. This is problematic in practice, since we do not have the labels for the validation or test sets.
#17: In addition to the given attributes, we extract more features from the textual fields, title and short_description.
Feature selection:
stability selection
recursive feature elimination with cross-validation
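To make the idea of extracting features from the title text concrete, here is a small sketch. The notes do not list the actual features, so every feature name below (`n_chars`, `repeat_ratio`, and so on) is an illustrative assumption rather than the team's feature set.

```python
import re

WORD_RE = re.compile(r"[A-Za-z0-9]+")


def title_features(title):
    """Compute a few hypothetical surface features of a product title."""
    words = WORD_RE.findall(title)
    lowered = [w.lower() for w in words]
    n_words = len(words)
    n_unique = len(set(lowered))
    return {
        "n_chars": len(title),
        "n_words": n_words,
        "n_unique_words": n_unique,
        # Share of words that repeat an earlier word (case-insensitive).
        "repeat_ratio": 1 - n_unique / n_words if n_words else 0.0,
        "n_digits": sum(ch.isdigit() for ch in title),
        # Share of words that start with an uppercase letter.
        "capitalized_ratio": (
            sum(w[0].isupper() for w in words) / n_words if n_words else 0.0
        ),
    }
```

Length and repetition features of this kind relate naturally to conciseness, where overly long titles with redundant words are penalized. Candidate features would then be pruned with the selection methods named above.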