Better Contextual Suggestions from ClueWeb12 Using Domain Knowledge Inferred from the Open Web
Thaer Samar, Alejandro Bellogin and Arjen P. de Vries

Our Submission
- Contextual suggestion model:
  1. Find attractions in ClueWeb12
  2. Generate user profiles
  3. Compute the similarity between candidate attractions and users
  4. Rank suggestions per (user, context) pair
- RQ: Can we improve the performance of contextual suggestions by applying domain knowledge?
- Approach:
  - Filter the collection using domain knowledge to create sub-collections
  - Apply the same contextual suggestion model to the different sub-collections
  - Compare the differences in effectiveness
  - Two runs, based on two sub-collections created differently

Creating Sub-collections (1)
- GeoFiltered sub-collection
  - Apply a geographical filter
  - Require an exact mention of the given context, in the format {City, ST}, e.g., Miami, FL
  - Exclude documents that mention multiple contexts, e.g., a Wikipedia page about cities in the state of Florida
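The GeoFiltered rule can be sketched as follows. This is a minimal Python illustration, not the authors' code: the context list, the word-boundary regular expressions, and the function names are assumptions made for the example.

```python
import re

# Hypothetical contexts in the {City, ST} format used by the track.
CONTEXTS = ["Miami, FL", "Orlando, FL", "Portland, OR"]


def compile_context_patterns(contexts):
    """Build one case-sensitive, exact-mention pattern per context."""
    # \b keeps "Miami, FL" from matching inside "Miami, FLA".
    return {c: re.compile(r"\b" + re.escape(c) + r"\b") for c in contexts}


def geo_filter(text, patterns):
    """Return the only context the document mentions, or None.

    Documents that mention no context, or more than one context
    (e.g. a page listing many Florida cities), are excluded.
    """
    matched = [c for c, pattern in patterns.items() if pattern.search(text)]
    return matched[0] if len(matched) == 1 else None
```

For example, a page mentioning only "Miami, FL" is assigned to that context, while a page mentioning both "Miami, FL" and "Orlando, FL" yields None and is dropped.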

Creating Sub-collections (2)
- TouristFiltered sub-collection
  - Apply domain knowledge extracted from the structure of the Open Web
- Domain oriented
  - Manual list of tourist websites: {yelp, tripadvisor, wikitravel, zagat, expedia, orbitz, travel.yahoo}
  - From ClueWeb12, extract every document whose host is in the list (TouristListFiltered), e.g., http://www.zagat.com/miami
  - Expand TouristListFiltered:
    - Extract its outlinks
    - Search for the outlinks in ClueWeb12 (TouristOutlinksFiltered)
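A minimal sketch of the domain-oriented filter and its outlink expansion, assuming each ClueWeb12 record is available as a (URL, outlinks) pair. The full domain names (e.g., yelp.com) are assumptions, since the slide lists only site names, and matching subdomains is a design choice of this sketch.

```python
from urllib.parse import urlsplit

# Assumed domain names for the slide's manual list of tourist sites.
TOURIST_DOMAINS = {
    "yelp.com", "tripadvisor.com", "wikitravel.org", "zagat.com",
    "expedia.com", "orbitz.com", "travel.yahoo.com",
}


def host_of(url):
    """Host name of a URL (urlsplit lower-cases it and drops any port)."""
    return urlsplit(url).hostname or ""


def in_domain_list(url, domains):
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = host_of(url)
    return any(host == d or host.endswith("." + d) for d in domains)


def tourist_list_filtered(docs, domains=TOURIST_DOMAINS):
    """Yield the (url, outlinks) records whose host is in the domain list."""
    for url, outlinks in docs:
        if in_domain_list(url, domains):
            yield url, outlinks


def tourist_outlinks_filtered(tlf_docs, collection_urls):
    """Outlinks of TouristListFiltered pages that are themselves in the collection."""
    targets = {link for _, outlinks in tlf_docs for link in outlinks}
    return targets & set(collection_urls)
```

Suffix matching with a leading dot keeps a host such as notyelp.com from being mistaken for yelp.com.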

TouristFiltered sub-collection
- Attraction oriented
  - Use the Foursquare API to get attractions for the given contexts (50 attractions per context)
    - Format: attraction name, URL, e.g., Cortés Restaurant, http://cortesrestaurant.com (Miami, FL)
  - If the URL of an attraction is missing, use the Google API with a query such as: Cortés Restaurant Miami, FL
  - For the attractions found, get the host names of their URLs (1,454 unique hosts)
  - From ClueWeb12, get every document whose host is one of these hosts (AttractionFiltered)
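The host-based AttractionFiltered step might look like the sketch below. The Foursquare and Google lookups are not reproduced: `find_url` is a hypothetical stand-in for the Google fallback, and the (name, URL, context) record layout is an assumption.

```python
from urllib.parse import urlsplit


def attraction_hosts(attractions, find_url):
    """Collect unique host names for (name, url, context) attraction records.

    find_url is a hypothetical fallback (the slides used the Google API)
    mapping a query such as "Cortés Restaurant Miami, FL" to a URL or None.
    """
    hosts = set()
    for name, url, context in attractions:
        if not url:
            url = find_url(f"{name} {context}")
        host = urlsplit(url).hostname if url else None
        if host:
            hosts.add(host)
    return hosts


def attraction_filtered(doc_urls, hosts):
    """Keep collection URLs whose exact host is one of the attraction hosts."""
    return [u for u in doc_urls if urlsplit(u).hostname in hosts]
```

Matching exact host names keeps the filter strict: a page on www.cortesrestaurant.com would not match cortesrestaurant.com unless both variants are in the host set.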

Sub-collections Summary
- ClueWeb12: 733,019,372 docs
  - GeoFiltered (mentions of {City, ST}): 8,883,068 docs
  - TouristFiltered:
    - TouristListFiltered: 175,260 docs
    - TouristOutlinksFiltered: 97,678 docs
    - AttractionFiltered: 102,604 docs

Generating User Profiles
- For each user:
  - Aggregate the descriptions of the attractions rated by the user
  - Split the aggregated descriptions into positive and negative profiles based on the ratings
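One way to build the two profiles, assuming ratings on a 0 to 4 scale. The thresholds (3 and above positive, 1 and below negative, 2 ignored) are illustrative assumptions; the slide only says the split is based on the ratings.

```python
def build_profiles(ratings, descriptions, pos_min=3, neg_max=1):
    """Split one user's rated attractions into positive and negative profiles.

    ratings: {attraction_id: rating}; descriptions: {attraction_id: text}.
    pos_min and neg_max are assumed thresholds, not values from the slides.
    Returns (positive_text, negative_text), each an aggregation of descriptions.
    """
    positive, negative = [], []
    for attraction_id, rating in ratings.items():
        text = descriptions.get(attraction_id)
        if not text:
            continue  # nothing to aggregate for this attraction
        if rating >= pos_min:
            positive.append(text)
        elif rating <= neg_max:
            negative.append(text)
        # ratings strictly between the thresholds are treated as neutral
    return " ".join(positive), " ".join(negative)
```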

Similarity
- Represent attractions and user profiles in a weighted vector space model (VSM)
  - Vector element: <term, frequency>
- Cosine similarity
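A small sketch of term-frequency vectors and cosine similarity using only the standard library. How the positive and negative profiles are combined is not specified on the slide; the `score` function below (positive similarity minus negative similarity) is a hypothetical choice.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag of words {term: frequency} over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


def score(attraction_text, positive_text, negative_text):
    """Hypothetical combination: similarity to the positive profile minus
    similarity to the negative one (not specified on the slides)."""
    a = tf_vector(attraction_text)
    return cosine(a, tf_vector(positive_text)) - cosine(a, tf_vector(negative_text))
```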

Ranked Suggestions
- For each (user, context) pair:
  - Rank suggestions based on their similarity score
  - Generate titles to represent each attraction:
    - Extract from <title> or <header> tags
  - Generate descriptions tailored to the user:
    - Extract the content of the <description> tag
    - Break documents into sentences
    - Rank sentences based on their similarity with the user
    - Concatenate until 512 bytes are reached
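The sentence-ranking step for tailored descriptions can be sketched like this. The regex sentence splitter is deliberately naive, and `similarity` is any callable that scores a sentence against the user (for instance, cosine similarity to the positive profile); both are assumptions rather than the authors' implementation. The 512-byte limit is measured on the UTF-8 encoding.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tailored_description(text, similarity, limit=512):
    """Concatenate the sentences most similar to the user, up to `limit` bytes.

    similarity: callable that scores one sentence against the user profile.
    Sentences are added in descending score order; concatenation stops at the
    first sentence that would push the UTF-8 length past the limit.
    """
    ranked = sorted(split_sentences(text), key=similarity, reverse=True)
    parts, size = [], 0
    for sentence in ranked:
        # +1 accounts for the joining space after the first sentence.
        extra = len(sentence.encode("utf-8")) + (1 if parts else 0)
        if size + extra > limit:
            break
        parts.append(sentence)
        size += extra
    return " ".join(parts)
```

Stopping at the first sentence that does not fit follows the "concatenate until 512 bytes" wording; skipping it and trying shorter sentences would be an equally reasonable variant.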

Results (General Performance)
- TouristFiltered performs better than GeoFiltered

Analysis (TouristFiltered vs. GeoFiltered)
- Percentage of topics where TouristFiltered is better than, equal to, or worse than GeoFiltered
- E.g., TouristFiltered gives a better result for 33.1% of the judged topics
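The per-topic comparison behind these percentages can be computed as below, assuming each run is available as a mapping from topic ID to a per-topic score (e.g., P@5). The function name and the tie tolerance are illustrative.

```python
def win_tie_loss(run_a, run_b, eps=1e-9):
    """Percentages of shared topics where run_a scores better than, equal to,
    or worse than run_b. Inputs map topic id -> per-topic metric value."""
    topics = set(run_a) & set(run_b)  # only topics judged for both runs
    if not topics:
        return 0.0, 0.0, 0.0
    better = sum(run_a[t] > run_b[t] + eps for t in topics)
    worse = sum(run_a[t] < run_b[t] - eps for t in topics)
    equal = len(topics) - better - worse
    n = len(topics)
    return tuple(100.0 * k / n for k in (better, equal, worse))
```

For instance, with run A = {1: 0.4, 2: 0.2, 3: 0.0, 4: 0.6} and run B = {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.6}, `win_tie_loss(A, B)` returns (25.0, 50.0, 25.0).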

Analysis (Decomposed metric dimensions)
- P@5 and MRR consider three dimensions of relevance: geographical (geo), description (desc), and document (doc) relevance
- Considering only the desc and doc relevance, the two runs perform almost the same

Analysis (Decomposed metric evaluation)
- Considering only the geo aspect, TouristFiltered is more geographically appropriate
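To illustrate the decomposition, the sketch below computes P@5 and reciprocal rank over any subset of the three dimensions. It assumes that a suggestion counts as relevant only when it is relevant in every dimension being considered; the judgment format is hypothetical.

```python
DIMENSIONS = ("geo", "desc", "doc")


def is_relevant(judgment, dims):
    """Assumed rule: relevant only if judged relevant in every dimension in dims."""
    return all(judgment.get(d, False) for d in dims)


def precision_at_5(ranked, dims=DIMENSIONS):
    """ranked: per-rank judgments, e.g. [{"geo": True, "desc": False, "doc": True}, ...]."""
    return sum(is_relevant(j, dims) for j in ranked[:5]) / 5


def reciprocal_rank(ranked, dims=DIMENSIONS):
    """1/rank of the first relevant suggestion, or 0.0 if there is none."""
    for rank, judgment in enumerate(ranked, start=1):
        if is_relevant(judgment, dims):
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ranked_by_topic, dims=DIMENSIONS):
    """MRR over topics: the mean of the per-topic reciprocal ranks."""
    scores = [reciprocal_rank(r, dims) for r in ranked_by_topic.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

Passing `dims=("desc", "doc")` or `dims=("geo",)` reproduces the two views on these slides: a run can match another on desc and doc while differing on geo.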

Analysis (Effect of sub-collection parts)
- The TouristFiltered sub-collection consists of three parts:
  - TouristListFiltered (TLF)
  - TouristOutlinksFiltered (TOF)
  - AttractionFiltered (AF)
- Measure how each part contributes to the performance
- The major improvement in performance comes after adding the AttractionFiltered (AF) part

Conclusions
- Applying domain knowledge about sites that are more likely to offer attractions leads to better suggestions
- The best results are obtained when identifying attractions through specialized services like Foursquare
- Our approach emphasizes the importance of accurate geo-information for high precision
- Reproducibility of research results: recommendations are drawn from ClueWeb12, a fixed web crawl, rather than the live web

Future Work
- Think of each part of the TouristFiltered collection as a binary filter
  - Each document in a part passes the corresponding filter
- Filter documents based on a score obtained by combining different weighted filters
  - Each filter can represent a different source of knowledge
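The proposed weighted combination of binary filters could look like this. The weights, the threshold, and the example predicates are hypothetical; the slide describes the idea, not a concrete scoring scheme.

```python
def combined_filter_score(doc, filters):
    """Weighted sum of the binary filters a document passes.

    filters: list of (weight, predicate) pairs; each predicate is one binary
    filter, such as "host is in the tourist list".
    """
    return sum(weight for weight, passes in filters if passes(doc))


def select_documents(docs, filters, threshold):
    """Keep documents whose combined score reaches the threshold."""
    return [d for d in docs if combined_filter_score(d, filters) >= threshold]
```

With weights 0.5 (listed tourist host) and 0.3 (mentions the context) and a threshold of 0.6, only documents passing both filters are kept; lowering the threshold to 0.5 would also admit tourist-site pages without a context mention.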
Thanks!

Contextual Suggestion 2014
