Evaluation in Information Retrieval

(Book chapter from C.D. Manning, P. Raghavan, and H. Schütze,
Introduction to Information Retrieval)

Dishant Ailawadi
INF384H / CS395T: Concepts of Information Retrieval (and Web Search), Fall 2011




Outline

• Why Evaluation?
• Standard test collections
• Precision and Recall
• Mean Average Precision
• Kappa Statistic
• R-Precision
• Summary




Why Evaluation?

• There are many retrieval models, algorithms, and systems. Which one is the best?
• Measure the effect of adding new features.
• How far down the ranked list will a user need to look to find some or all relevant documents?
• Difficulty: relevance is not binary but continuous. How do we decide whether a document is relevant?



Standard Test Collections

A standard test collection consists of three things:
1. A document collection.
2. A set of queries on this collection.
3. A set of relevance judgments on those queries.

Each document in the test collection is given a binary classification (relevant or non-relevant) with respect to a query. This decision is referred to as the gold standard or ground truth judgment of relevance.




Standard Test Collections

• Cranfield: collected in the UK in the 1950s. Too small to be used nowadays.
• TREC (Text REtrieval Conference)
    - Early TREC had 50 information needs; TREC 6-8 provide 150 information needs over more than 500 thousand articles.
    - More recent work uses GOV2, a collection of 25 million pages that is now available for research.
• NTCIR: East Asian language and cross-language IR systems.
• Cross Language Evaluation Forum (CLEF)
• Reuters-21578: the collection most used for text classification.
    



Evaluation Measures

                 Relevant               Non-Relevant
Retrieved        True positives (tp)    False positives (fp)
Not Retrieved    False negatives (fn)   True negatives (tn)

recall = (number of relevant documents retrieved) / (total number of relevant documents) = tp / (tp + fn)

precision = (number of relevant documents retrieved) / (total number of documents retrieved) = tp / (tp + fp)

accuracy (how many correct selections?) = (tp + tn) / (tp + fp + fn + tn)
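As a minimal sketch, the three measures above can be computed directly from the four contingency-table counts. The function name and the example counts below are hypothetical and chosen only for illustration.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from contingency-table counts."""
    retrieved = tp + fp          # everything the system returned
    relevant = tp + fn           # everything that should have been returned
    total = tp + fp + fn + tn    # the whole collection
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts: 6 documents retrieved, 4 of them relevant,
# 2 relevant documents missed, 992 non-relevant documents correctly left out.
p, r, a = retrieval_metrics(tp=4, fp=2, fn=2, tn=992)
print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
```

With these counts accuracy is close to 1 even though a third of the relevant documents were missed, because true negatives dominate the collection. This is one reason precision and recall are usually preferred over accuracy for retrieval.
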
An Example

Let total # of relevant docs = 6. Check each new recall point:

 n   doc #   relevant   recall / precision
 1   588     x          R = 1/6 = 0.167;  P = 1/1 = 1
 2   589     x          R = 2/6 = 0.333;  P = 2/2 = 1
 3   576
 4   590     x          R = 3/6 = 0.5;    P = 3/4 = 0.75
 5   986
 6   592     x          R = 4/6 = 0.667;  P = 4/6 = 0.667
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x          R = 5/6 = 0.833;  P = 5/13 = 0.38
14   990

One relevant document is missing from the ranking, so the system never reaches 100% recall.
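The recall and precision values in this example can be reproduced with a short sketch. It assumes the ranking is given as a list of booleans (relevant or not, in rank order); the function and variable names are illustrative.

```python
def recall_precision_points(relevant_flags, total_relevant):
    """Return (rank, recall, precision) at each rank where a relevant doc appears."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            points.append((rank, hits / total_relevant, hits / rank))
    return points


# Ranks marked "x" in the example; 6 relevant documents exist in total.
RELEVANT_RANKS = {1, 2, 4, 6, 13}
flags = [rank in RELEVANT_RANKS for rank in range(1, 15)]
points = recall_precision_points(flags, total_relevant=6)
for rank, recall, precision in points:
    print(f"rank {rank:2d}: R = {recall:.3f}, P = {precision:.3f}")
```

Only five points are produced because the sixth relevant document never appears in the ranking.
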

Combining Precision & Recall

F-Measure: weighted harmonic mean (HM) of precision and recall.

    F_β = (1 + β^2) P R / (β^2 P + R)

The value of β controls the trade-off:
• β = 1: weight precision and recall equally.
• β > 1: weight recall more.
• β < 1: weight precision more.

For β = 1:

    F = 2 P R / (P + R) = 2 / (1/R + 1/P)
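A minimal sketch of the F-measure, assuming the standard weighted form F_β = (1 + β^2) P R / (β^2 P + R), which reduces to 2PR / (P + R) for β = 1. The function name and the sample values are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid 0/0 when nothing relevant was retrieved
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical system with low precision and perfect recall.
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F = {f_measure(0.5, 1.0, beta):.3f}")
```

Because recall is the stronger of the two values here, the score rises as β grows and falls as β shrinks, which matches the trade-off described above.
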

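Precision-Recall curve

Interpolated precision: used to smooth the saw-tooth shape of the curve. The interpolated precision at recall level r is the highest precision found at any recall level r' >= r.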
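As a sketch, interpolation can be written as "take the best precision at this recall level or any higher one". The (recall, precision) points below are taken from the ranked-list example earlier in the deck; the function name and the convention of returning 0 when no point reaches the requested recall are assumptions.

```python
def interpolated_precision(points, recall_level):
    """Highest precision among points whose recall is >= recall_level."""
    candidates = [p for r, p in points if r >= recall_level]
    return max(candidates) if candidates else 0.0


# (recall, precision) at each relevant document in the earlier example.
points = [(1 / 6, 1 / 1), (2 / 6, 2 / 2), (3 / 6, 3 / 4), (4 / 6, 4 / 6), (5 / 6, 5 / 13)]
for level in (0.0, 0.4, 0.6, 0.8, 1.0):
    print(f"recall {level:.1f}: interpolated P = {interpolated_precision(points, level):.3f}")
```

Recall 1.0 is never reached in that example, so its interpolated precision is 0 under this convention.
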

11-point Interpolated Average Precision

Recall   Interp. Precision
0.0      1.00
0.1      0.67
0.2      0.63
0.3      0.55
0.4      0.45
0.5      0.41
0.6      0.36
0.7      0.29
0.8      0.13
0.9      0.10
1.0      0.08
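The 11-point interpolated average precision is the arithmetic mean of the interpolated precision at the 11 recall levels 0.0, 0.1, ..., 1.0. A short sketch using the values from the table above (the names are illustrative):

```python
# (recall level, interpolated precision) rows copied from the table.
TABLE = [
    (0.0, 1.00), (0.1, 0.67), (0.2, 0.63), (0.3, 0.55),
    (0.4, 0.45), (0.5, 0.41), (0.6, 0.36), (0.7, 0.29),
    (0.8, 0.13), (0.9, 0.10), (1.0, 0.08),
]


def eleven_point_average(rows):
    """Mean interpolated precision over the 11 standard recall levels."""
    if len(rows) != 11:
        raise ValueError("expected exactly 11 recall levels")
    return sum(precision for _, precision in rows) / len(rows)


average = eleven_point_average(TABLE)
print(f"11-point interpolated average precision = {average:.3f}")
```

For this table the mean is 4.67 / 11, roughly 0.425.
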

Single Figure Measures

Mean Average Precision (MAP): the mean of the average precision values over all queries.
Example (average precision for one query): (1 + 1 + 0.75 + 0.667 + 0.38 + 0)/6 = 0.633

Normalized Discounted Cumulative Gain (NDCG): for non-binary notions of relevance.
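A minimal sketch of average precision and MAP. It follows the worked example above: precision is summed at each rank where a relevant document appears, and relevant documents that are never retrieved contribute 0. The second query is hypothetical and only there to show the averaging step.

```python
def average_precision(relevant_flags, total_relevant):
    """Average of precision values at each relevant rank; misses count as 0."""
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


def mean_average_precision(queries):
    """queries: list of (relevant_flags, total_relevant) pairs, one per query."""
    return sum(average_precision(flags, n) for flags, n in queries) / len(queries)


# Query 1: the ranked list from the earlier example (6 relevant docs, 5 retrieved).
query1 = ([rank in {1, 2, 4, 6, 13} for rank in range(1, 15)], 6)
# Query 2: hypothetical, 2 relevant docs found at ranks 1 and 3.
query2 = ([True, False, True], 2)

ap1 = average_precision(*query1)
map_score = mean_average_precision([query1, query2])
print(f"AP(query 1) = {ap1:.4f}, MAP = {map_score:.4f}")
```

With unrounded precisions the first query gives about 0.6335; the slide's 0.633 comes from rounding 5/13 to 0.38 before averaging.
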



Assessing Relevance

Pooling: to obtain a subset of the collection related to a query
• Use a set of search engines/algorithms.
• The top-k results (k is between 20 and 50 in TREC) are merged into a pool; duplicates are removed.
• Present the documents in a random order to analysts for relevance judgments.
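The pooling steps above can be sketched as follows. The runs, document IDs, and the `seed` parameter are hypothetical; a real pool would be built from the systems under evaluation.

```python
import random


def build_pool(runs, k, seed=None):
    """Merge the top-k results of each run, drop duplicates, and shuffle."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:k])       # top-k of this system; the set removes duplicates
    pooled = sorted(pool)              # fixed starting order so a seed gives a repeatable shuffle
    random.Random(seed).shuffle(pooled)
    return pooled


# Hypothetical rankings from three systems for one query.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d4", "d5"],
    ["d6", "d1", "d7"],
]
pooled = build_pool(runs, k=2, seed=0)
print(pooled)
```

Shuffling keeps assessors from inferring which system returned a document or how highly it was ranked.
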


Kappa Statistic:
If we have multiple judges on one information need, how consistent are those judges?

    kappa = (P(A) - P(E)) / (1 - P(E))

P(A) is the proportion of the times that the judges agreed.
P(E) is the proportion of the times they would be expected to agree by chance.

Example: Kappa Statistic

                      Judge 2 Relevance
                      Yes    No    Total
Judge 1      Yes      300    20    320
Relevance    No        10    70     80
             Total    310    90    400

Observed proportion of the times the judges agreed:
    P(A) = (300 + 70) / 400 = 370/400 = 0.925

Pooled marginals:
    P(nonrelevant) = (80 + 90) / (400 + 400) = 170/800 = 0.2125
    P(relevant) = (320 + 310) / (400 + 400) = 630/800 = 0.7875

Probability that the two judges agreed by chance (max value = 1, min = 0.5):
    P(E) = P(nonrelevant)^2 + P(relevant)^2 = 0.2125^2 + 0.7875^2 = 0.665

Kappa statistic:
    kappa = (P(A) - P(E)) / (1 - P(E)) = (0.925 - 0.665) / (1 - 0.665) = 0.776

A kappa value between 0.67 and 0.8 is taken as fair agreement, but a value below 0.67 is seen as data providing a dubious basis for evaluation.
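A short sketch of the kappa calculation for two judges and binary relevance, using pooled marginals to estimate chance agreement. The function and argument names are illustrative; the counts are those of the table in this example.

```python
def kappa_pooled(yes_yes, yes_no, no_yes, no_no):
    """Kappa for two judges; arguments are cells (judge 1, judge 2) of the 2x2 table."""
    total = yes_yes + yes_no + no_yes + no_no
    p_agree = (yes_yes + no_no) / total                # P(A)
    judge1_yes = yes_yes + yes_no
    judge2_yes = yes_yes + no_yes
    p_rel = (judge1_yes + judge2_yes) / (2 * total)    # pooled marginal for "relevant"
    p_nonrel = 1 - p_rel
    p_chance = p_rel ** 2 + p_nonrel ** 2              # P(E)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_agree - p_chance) / (1 - p_chance)


kappa = kappa_pooled(yes_yes=300, yes_no=20, no_yes=10, no_no=70)
print(f"kappa = {kappa:.3f}")
```

For these counts the result is about 0.776, which falls in the "fair agreement" band.
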
Evaluation

R-PRECISION:

 n   doc #   relevant
 1   588     x
 2   589     x
 3   576
 4   590     x
 5   986
 6   592     x
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x
14   990

R = # of relevant docs = 7
R-Precision = 4/7 = 0.571 (4 of the top 7 documents are relevant)

A/B Test: precisely one change between the current and previous system. We evaluate the effect of that change on the system.
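R-precision is the precision over the top R results, where R is the number of relevant documents for the query. A minimal sketch under the slide's assumption R = 7 (the names are illustrative):

```python
def r_precision(relevant_flags, total_relevant):
    """Fraction of the top-R ranked documents that are relevant (R = total_relevant)."""
    if total_relevant == 0:
        return 0.0
    top_r = relevant_flags[:total_relevant]
    return sum(top_r) / total_relevant   # True counts as 1


# Ranks marked "x" in the list; the slide assumes R = 7 relevant documents.
flags = [rank in {1, 2, 4, 6, 13} for rank in range(1, 15)]
score = r_precision(flags, total_relevant=7)
print(f"R-Precision = {score:.3f}")
```

Only the first 7 ranks are inspected, so the relevant document at rank 13 does not affect the score.
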




Summary

• F-Measure: combines precision and recall.
• Recall-precision graph: conveys more information than a single-number measure.
• Mean average precision: single-number value, popular measure.
• Normalized Discounted Cumulative Gain (NDCG): single-number summary for each rank level, emphasizing top-ranked documents; relevance judgments are only needed to a specific rank depth (e.g., 10).
• Kappa measure: judgment reliability.
• R-Precision: only need to examine the top Rel documents, where Rel is the number of relevant documents.




THANK YOU!



