Kohei Shinden, Atsuki Maruta, Makoto P. Kato
University of Tsukuba
KASYS at the NTCIR-15 WWW-3 Task
Background
• NTCIR-15 WWW-3 Task
  • Ad-hoc document retrieval tasks for web documents
• Proposed search model using BERT (Birch)
  • Yilmaz et al.: Cross-Domain Modeling of Sentence-level Evidence for Document Retrieval, EMNLP 2019
  • BERT has been successfully applied to a broad range of NLP tasks including document ranking tasks.
Birch (Yilmaz et al., 2019)
• Applies a sentence-level relevance estimator, learned from QA and microblog search datasets, to ad-hoc document retrieval
1. The sentence-level relevance estimator is obtained by fine-tuning the pre-trained BERT model with QA and microblog search data.
2. Calculate the BM25 score for the query and the document, and BERT scores for the query and each sentence of the document.
3. The document score is a weighted sum of the BM25 score and the score of the highest BERT-scoring sentence in the document.
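[Figure: Birch overview. A pre-trained BERT model is fine-tuned on sentence-level relevance judgement datasets to obtain the BERT model. Example "Halloween Pictures": the document sentences "Trick or Treat...", "Children get candy...", and "Pumpkin sweets..." are shown with the scores 0.7, 0.3, 0.1, and 0.4 (BM25 score and BERT scores), and BERT + BM25 = 0.6.]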
Details of Birch
• The document score is a weighted sum of the BM25 score and the scores of the highest BERT-scoring sentences in the document
• This assumes that the most relevant sentences in a document are good indicators of document-level relevance [1]
  • $S_{\mathrm{BM25}}(d)$: the BM25 score of document $d$
  • $S_{\mathrm{BERT}}(p_i)$: the relevance of the $i$-th highest-scoring sentence $p_i$, as estimated by BERT
  • $w_i$: a hyper-parameter tuned on a validation set
[1] Yilmaz et al.: Cross-Domain Modeling of Sentence-level Evidence for Document Retrieval, EMNLP 2019
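The score combination described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `birch_score`, the interpolation weight `alpha`, and the exact form of the sum are assumptions modeled on the interpolation described by Yilmaz et al., since the slide's own formula is not preserved in the text.

```python
from typing import Sequence


def birch_score(
    bm25: float,
    sentence_scores: Sequence[float],
    weights: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Combine a document-level BM25 score with top sentence-level BERT scores.

    Assumed form: alpha * bm25 + (1 - alpha) * sum_i weights[i] * top_i,
    where top_i is the i-th highest BERT sentence score. len(weights)
    decides how many top sentences are used (1, 2, or 3 in the slides).
    """
    # Keep only the highest-scoring sentences, one per weight.
    top = sorted(sentence_scores, reverse=True)[: len(weights)]
    # Weighted sentence-level evidence; zip stops early for short documents.
    evidence = sum(w * s for w, s in zip(weights, top))
    return alpha * bm25 + (1.0 - alpha) * evidence
```

In practice `alpha` and `weights` would be tuned on a validation set, as the slide states for the sentence weights; the values passed in any call are purely illustrative.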
Preliminary Experiment Details
• Preliminary experiments to select datasets and hyper-parameters suitable for ranking web documents
[Table: training and validation datasets per model. Datasets: NTCIR-14 WWW-2 Test Collection (with its original qrels), Robust04, MS MARCO, TREC CAR, TREC MB.]
Model MB: ✓ ✓
Model CAR: ✓ ✓
Model MS MARCO: ✓ ✓
Model CAR → MB: ✓ ✓ ✓
Model MS MARCO → MB: ✓ ✓ ✓
The checkmarks represent the datasets used for training.
Preliminary Experiment Results & Discussion
• Evaluated the prediction results of Birch models
• Top k sentences: uses the k sentences with the highest BERT scores for ranking
[Figure: nDCG@10 of the BM25 baseline and of the MB, CAR, MS MARCO, CAR → MB, and MS MARCO → MB models using the top 1, 2, and 3 sentences. Labeled values: 0.3098, 0.3112, 0.3103, 0.3266, 0.3312, 0.3318.]
MS MARCO → MB is the best. Thus, we submitted runs based on MS MARCO → MB and CAR → MB.
Official Evaluation Results & Discussion
• Achieved the best performance in terms of nDCG, Q, and iRBU among all participants.
KASYS-E-CO-NEW-1:
- MS MARCO → MB
- Top 3 sentences
KASYS-E-CO-NEW-4:
- MS MARCO → MB
- Top 2 sentences
KASYS-E-CO-NEW-5:
- CAR → MB
- Top 3 sentences
[Figure: nDCG, Q, ERR, and iRBU of the baseline, KASYS-E-CO-NEW-1, KASYS-E-CO-NEW-4, and KASYS-E-CO-NEW-5. Labeled values: 0.6935, 0.7123, 0.7959, 0.9389.]
• MS MARCO → MB is the best. The CAR → MB model achieved similar scores.
• MS MARCO and TREC CAR probably give better results because they target web document retrieval and provide a large amount of data.
• BERT is also effective for web document retrieval.
Summary of NEW Runs
• Achieved the best performance in terms of nDCG, Q, and iRBU among all participants.
• The effectiveness of BERT in ad hoc web document retrieval tasks was verified.
• MS MARCO → MB is the best. The CAR → MB model achieved similar scores.
• BERT is also effective for web document retrieval.
[Figure: nDCG, Q, ERR, and iRBU of the baseline, KASYS-E-CO-NEW-1 (MS MARCO → MB, top 3 sentences), KASYS-E-CO-NEW-4 (MS MARCO → MB, top 2 sentences), and KASYS-E-CO-NEW-5 (CAR → MB, top 3 sentences). Labeled values: 0.6935, 0.7123, 0.7959, 0.9389.]
REP Runs
Abstract of REP Runs
• Replicating and reproducing the THUIR runs at the NTCIR-14 WWW-2 Task
• We check whether the relative order of the models' results is consistent with the original results.
[Figure: THUIR and KASYS (ours) each compare BM25 with LambdaMART (learning-to-rank model); the question is whether BM25 < LambdaMART also holds for our runs.]
Replication Procedure 1
[Figure: WWW-2 and WWW-3 topics (e.g. disney, switch, Canon, honda, Pokemon, ice age, ...) are input to the BM25 algorithm, which ranks documents from the ClueWeb collection and outputs ranked web documents (e.g. 1st: Disney shop, 2nd: Tokyo Disney resort, 3rd: Disney official, ...). The ranked documents are input to a feature extraction program, which extracts eight features (TF, IDF, document length, BM25, and LMIR scores). The steps up to this point belong to BM25; the LambdaMART steps start from here.]
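The term-based features named in the procedure above (TF, IDF, TF-IDF, document length, BM25) can be sketched as below. This is a rough illustration under stated assumptions, not the THUIR or KASYS feature extractor: the function name `extract_features`, the smoothed IDF variant, the summation over query terms, and the defaults `k1 = 1.2` and `b = 0.75` are all assumed, and the language-model (LMIR) features are omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def extract_features(
    query: List[str],
    doc: List[str],
    doc_freq: Dict[str, int],
    num_docs: int,
    avg_doc_len: float,
    k1: float = 1.2,
    b: float = 0.75,
) -> Dict[str, float]:
    """Compute query-document features summed over the query terms.

    doc_freq maps a term to the number of documents containing it;
    num_docs and avg_doc_len are collection statistics.
    """
    counts = Counter(doc)
    doc_len = len(doc)
    tf = idf = tfidf = bm25 = 0.0
    for term in query:
        f = counts[term]  # term frequency in this document
        df = doc_freq.get(term, 0)
        # Smoothed, non-negative IDF; one of several common variants.
        term_idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        tf += f
        idf += term_idf
        tfidf += f * term_idf
        # BM25 length normalization in the denominator.
        norm = f + k1 * (1.0 - b + b * doc_len / avg_doc_len)
        bm25 += term_idf * f * (k1 + 1.0) / norm
    return {
        "tf": tf,
        "idf": idf,
        "tfidf": tfidf,
        "doc_len": float(doc_len),
        "bm25": bm25,
    }
```

Each returned value would become one column of the feature vector that the learning-to-rank model consumes.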
Replication Procedure 2
[Figure: the feature extraction program outputs the features extracted from each document as lines such as "qid:001 1:0.2 ...", "qid:001 1:0.5 ...", "qid:001 1:0.1 ...", and "qid:001 1:0.9 ...". LambdaMART, trained on MQ Track and validated on the WWW-1 test collection, takes these features as input and outputs the re-ranked web documents (e.g. 1st: Disney official, 2nd: Disney shop, 3rd: Tokyo Disney resort, ...).]
• MQ Track: a dataset of relevance judgments for topic-document pairs.
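The feature lines in the procedure above ("qid:001 1:0.2 ...") resemble the SVMlight/LETOR text format that learning-to-rank tools commonly read. A minimal sketch of writing one such line follows; the helper name `format_letor_line` and the leading relevance label are assumptions, since the slides show only the `qid` and feature fields.

```python
from typing import Sequence


def format_letor_line(label: int, qid: str, features: Sequence[float]) -> str:
    """Format one query-document pair as '<label> qid:<qid> 1:<v1> 2:<v2> ...'.

    Feature indices start at 1, following the usual convention of the format.
    """
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{label} qid:{qid} {pairs}"
```

All lines that share a `qid` form one ranking list, so a list-wise or pair-wise learner such as LambdaMART only compares documents retrieved for the same topic.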
Implementation Details
• Features for learning to rank
  • TF, IDF, TF-IDF, document length, BM25 score, and three language-model-based IR scores
• Differences from the original paper
  • THUIR extracted the features from four fields (whole document, anchor text, title, and URL), whereas we extracted them from the whole document only
  • We normalized the features using their maximum and minimum values because the original paper did not describe how the features were normalized
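Normalization by maximum and minimum values can be sketched as follows. This is an illustrative sketch only: the function name `min_max_normalize` and the handling of constant features are assumptions, and the slides do not say whether normalization was applied per query or over the whole collection.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale one feature's values to [0, 1] via (x - min) / (max - min).

    A constant feature (max == min) is mapped to all zeros to avoid
    division by zero; this fallback is an assumption.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]
```

The function would be applied to each feature column separately, for example to the BM25 scores of all candidate documents.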
Preliminary Evaluation Results with Original WWW-2 qrels
[Figure: nDCG@10, Q@10, and nERR@10 of LambdaMART and BM25, ours versus original.]
• Our results are lower than the original results
• LambdaMART results were above BM25 for all evaluation metrics
• Succeeded in reproducing the run
Official Evaluation Results
[Figure: WWW-3 official result. nDCG, Q, ERR, and iRBU of LambdaMART and BM25.]
• BM25 results were above LambdaMART for all evaluation metrics
• Failed to reproduce the run
[Figure: WWW-2 official result. nDCG, Q, ERR, and iRBU of LambdaMART and BM25.]
Conclusion
• In the original paper, LambdaMART gave better results than BM25; in contrast, our BM25 results were better than our LambdaMART results
• We failed to replicate and reproduce the original paper
Suggestions
• In web search tasks, it is probably more effective to extract features from all fields
• Papers should state clearly how features are normalized
Summary of All Runs
NEW runs
• Achieved the best performance in terms of nDCG, Q, and iRBU among all participants
• The effectiveness of BERT in ad hoc web document retrieval tasks was verified.
• MS MARCO → MB is the best. The CAR → MB model achieved similar scores.
• BERT is also effective for web document retrieval.
REP runs
• In the original paper, LambdaMART gave better results than BM25; in contrast, our BM25 results were better than our LambdaMART results
• We failed to replicate and reproduce the original paper
