ABCs of IRT
November 18, 2010
Diane M. Talley, MA
Stephen B. Johnson, PhD
James A. Penny, PhD
Psychometrics as Science and Art
2010 ICE Educational Conference
IRT and Classical
Concepts of IRT
A logit
The abc's
Benefits
Pre-equating
Immediate scoring
Population invariance
Assumptions
Implications
The right tools for the job
Data
Program
Tool
Classical versus IRT
Classical Model                               | IRT Model
Traditional                                   | Modern
Requires less strict adherence to assumptions | Requires stricter adherence to assumptions
Sample dependent                              | Population invariant
Statistics (p = difficulty, point-biserial = discrimination) | Probability-based statistics (b = difficulty, a = discrimination, c = guessing)
Simple scoring model (raw score)              | Scoring is more complex
What's a logit?
[Figure: the logistic curve linking Ability (theta) to Probability, with the Performance Standard marked on the ability scale.]
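In standard IRT notation (spelled out here, not on the slide), a logit is the log-odds of a correct response. In the one-parameter (Rasch) model, the log-odds equal the gap between a candidate's ability and the item's difficulty, both expressed on the same logit scale:

$$P(u=1\mid\theta)=\frac{e^{\,\theta-b}}{1+e^{\,\theta-b}},\qquad \ln\frac{P}{1-P}=\theta-b.$$

A candidate whose ability exactly matches an item's difficulty has log-odds of 0, that is, a 50% chance of success.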
b (difficulty)
[Figure: "Paint by Numbers Leonardo": five item characteristic curves, P(u=1|THETA) plotted against THETA from -3 to +3, alike in shape but shifted along the ability scale as difficulty b changes.]
a (discrimination) and b
[Figure: three item characteristic curves, P(u=1|THETA) against THETA from -3 to +3, differing in both steepness (a) and location (b).]
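Adding the discrimination parameter a steepens or flattens the curve. The standard two-parameter logistic (2PL) form, consistent with the P(u=1|THETA) axis label above, is:

$$P(u=1\mid\theta)=\frac{1}{1+e^{-a(\theta-b)}},$$

where the slope of the curve at theta = b is a/4, so a larger a separates candidates near the item's difficulty more sharply.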
a, b, and c (guessing)
[Figure: three item characteristic curves, P(u=1|THETA) against THETA from -3 to +3; a nonzero c raises the lower asymptote, so even low-ability candidates retain some chance of a correct response.]
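As a concrete illustration, here is a minimal Python sketch (not from the presentation; the parameter values are arbitrary) of the standard 3PL response function. Setting c = 0 recovers the 2PL, and additionally fixing a = 1 recovers the one-parameter curve from the earlier slides:

```python
import numpy as np

def p_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    a: discrimination (slope), b: difficulty (location),
    c: pseudo-guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 25)            # ability scale, in logits
icc = p_3pl(theta, a=1.2, b=0.5, c=0.2)   # one item characteristic curve
print(p_3pl(0.0, a=1.2, b=0.5, c=0.2))    # chance of success at theta = 0
```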
Fit statistics
[Figures: "Comparison of Infit and Outfit" by item order, plus separate "Infit Mean Square Plot" and "Outfit Mean Square Plot" panels showing mean-square (MSQ) values against item order for roughly 30 items.]
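These plots use the conventional Rasch mean-square fit statistics. Writing E and W for the modeled mean and variance of person n's score on item i, and z for the standardized residual, the usual definitions are:

$$z_{ni}=\frac{x_{ni}-E_{ni}}{\sqrt{W_{ni}}},\qquad \text{Outfit}_i=\frac{1}{N}\sum_n z_{ni}^{2},\qquad \text{Infit}_i=\frac{\sum_n (x_{ni}-E_{ni})^{2}}{\sum_n W_{ni}}.$$

Both have an expected value near 1.0; values well above 1 flag noisy, underfitting items, and values well below 1 flag overly predictable, overfitting ones. Outfit is unweighted and thus sensitive to outlying responses from candidates far from the item's difficulty; infit is information-weighted and emphasizes responses near it.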
Population Invariance

Classical Difficulty Values (p)
          High Performing   Low Performing
Item 3         .92               .70
Item 2         .80               .60
Item 1         .50               .15

IRT Difficulty Values (b)
          High Performing   Low Performing
Item 3        -0.75             -0.75
Item 2         0.00              0.00
Item 1         1.50              1.50

The classical p-values shift with the ability of the sample; the IRT difficulty estimates do not.
IRT Pre-Equating
What does it mean?
Why would you want to do it?
What does it mean for building item banks and forms?
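In brief, pre-equating means calibrating items onto the bank scale ahead of time, so a raw-score-to-theta conversion table exists before the form is ever administered; that is what makes immediate scoring possible. A hypothetical sketch (the three items and their parameters are invented for illustration): sum the item response functions into the test characteristic curve, then invert it.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# Hypothetical banked parameters (a, b, c) for a tiny 3-item form.
items = [(1.2, -0.75, 0.2), (1.0, 0.00, 0.2), (0.8, 1.50, 0.2)]

# Test characteristic curve: expected raw score at each theta.
thetas = np.linspace(-3, 3, 601)
tcc = sum(p_3pl(thetas, a, b, c) for a, b, c in items)

# Invert the TCC into a raw-to-theta lookup, built before anyone tests.
for raw in range(1, len(items)):
    theta_hat = thetas[np.argmin(np.abs(tcc - raw))]
    print(f"raw score {raw} -> theta {theta_hat:+.2f}")
```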
Test Information Function (TIF)
[Figure: "Comparison of Test Information Functions": information plotted against theta from -3 to +3 for Form A and Form B.]
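The information plotted here is, in the 2PL case, the standard sum of item information functions, and its reciprocal square root gives the conditional standard error of measurement, which is why forms are built to pile information near the performance standard:

$$I(\theta)=\sum_i a_i^{2}\,P_i(\theta)\,\bigl(1-P_i(\theta)\bigr),\qquad \mathrm{SEM}(\theta)=\frac{1}{\sqrt{I(\theta)}}.$$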
Assumptions
Unidimensionality: the items measure a single underlying trait.
Local independence: once ability is accounted for, responses to different items are statistically independent (formalized below).
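Formally, local independence says that the joint probability of a response pattern factors into a product over items:

$$P(u_1,\dots,u_n\mid\theta)=\prod_{i=1}^{n}P(u_i\mid\theta).$$

Items sharing a common stimulus, such as a scenario set, are the typical threat to this assumption.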
Implications
Item writing
Leave those scored items alone!
Focused item writing targeting the performance standard
Assembly
Items selected for a form should cluster around the performance standard
Testing and Reporting
Field test items for pre-equating/on-demand scoring
Form assignment
Scoring
Recalibration
Harder to explain to stakeholders
Does IRT make sense for you?
What is the size and maturity of your program and item bank?
Do you like to tinker with items?
Do your program requirements change frequently?
How experienced/capable are your item writers?
How do you score candidates?
IRT or number correct
Do you hold scores or do immediate scoring?
Can you afford a psychometrician?
Questions?
Diane M. Talley dtalley@castleworldwide.com
James A. Penny jpenny@castleworldwide.com
Stephen B. Johnson sjohnson@castleworldwide.com
919.572.6880
www.castleworldwide.com