Getting one voice:
tuning up experts' assessment in measuring accessibility
Silvia Mirri
Ludovico A. Muratori
Paola Salomoni
Matteo Battistelli

Department of Computer Science
University of Bologna
Summary


- Introduction
- Automatic and manual accessibility evaluations
- Our proposed metric
- Conclusions and future work




Introduction

                    Web accessibility evaluations

             automatic tools + human assessment


Metrics quantify the accessibility level or the impact of
barriers, providing a numerical synthesis
- automatic tools return binary values
- human assessments are subjective and can take values
  from a continuous range


Our main goal

Providing a metric to measure how far a Web page is
from its accessible version, taking into account

- the integration of human assessments with automatic
  evaluations on the same target
- multiple human assessments




Steps

1. Merging the manual evaluations with the automatic ones

2. Combining the assessments coming from different
   human evaluators
   - Values are distributed within a given range
   - The more experts' assessments contribute to a computed
     value, the more stable and reliable that value is




Automatic and manual evaluations: an example

The combination of the IMG element and its ALT
attribute:
1. If the ALT attribute is omitted, the automatic check outputs 1
2. If the ALT attribute is present, the automatic check outputs 0

Manual evaluation might state that:
- there is no lack of information once the image is hidden (this
  can happen in case 1, if the image is purely decorative)
- there is a lack of information once the image is hidden




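As an illustration, a minimal binary check of this kind could be sketched as follows, assuming Python's standard html.parser; the function name and structure are illustrative, not the actual tool behind the slides:

```python
from html.parser import HTMLParser

class ImgAltCheck(HTMLParser):
    """Binary automatic check: counts IMG elements lacking an ALT attribute."""
    def __init__(self):
        super().__init__()
        self.errors = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs parsed from the tag
        if tag == "img" and "alt" not in dict(attrs):
            self.errors += 1  # ALT omitted: an automatic error

def check_alt(html: str) -> int:
    """Return 1 if any IMG lacks ALT (case 1), 0 otherwise (case 2)."""
    checker = ImgAltCheck()
    checker.feed(html)
    return 1 if checker.errors else 0

print(check_alt('<img src="logo.png">'))         # 1: ALT omitted
print(check_alt('<img src="logo.png" alt="">'))  # 0: ALT present
```

The second call shows why the manual evaluation remains necessary: the automatic check accepts an empty ALT, but only a human can judge whether hiding the image actually loses information.
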
Our metric
- A first version of our metric (Barriers Impact Factor) is
  computed on the basis of a barrier-error association table
- This table reports the list of assistive
  technologies/disabilities affected by each error:
  - screen reader/blindness
  - screen magnifier/low vision
  - color blindness
  - input device independence/movement impairments
  - deafness
  - cognitive disabilities
  - photosensitive epilepsy


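As a sketch of how such an association table might be represented, assuming a simple mapping keyed by error type (the error names below are hypothetical examples, not the paper's actual entries):

```python
# Illustrative barrier-error association table: each error maps to the
# assistive technologies / disability groups it affects.
# Error names are hypothetical, not the paper's actual table entries.
BARRIER_TABLE = {
    "img-missing-alt": ["screen reader/blindness"],
    "low-contrast-text": ["screen magnifier/low vision", "color blindness"],
    "keyboard-trap": ["input device independence/movement impairments"],
    "video-without-captions": ["deafness"],
    "flashing-content": ["photosensitive epilepsy"],
    "complex-navigation": ["cognitive disabilities"],
}

def affected_groups(error: str) -> list[str]:
    """Return the AT/disability groups affected by a given error."""
    return BARRIER_TABLE.get(error, [])

print(affected_groups("img-missing-alt"))  # ['screen reader/blindness']
```
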
Our metric

Comparing automatic checks with WCAG 2.0 success
criteria, we identified their relationships:

    A check fails → a certain error occurs, or a
                    manual check is necessary

- Each barrier is related to one success criterion and to
  one conformance level (A, AA, or AAA)
- Manual evaluations take values in the real interval [0, 1]:
  - 1 means that an accessibility error occurs
  - 0 means the absence of that accessibility error

Our metric

[Formula slide: the CBIF formula, combining automatic and
manual results per barrier, was presented here as an image]
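Although the image is lost, the formula can be reconstructed, hedged, from the surrounding slides: the next slide weights manual against automatic checks via m(i) and a(i), and the worked example later (a=1, m=2, average manual score 0.8, automatic output 0, CBIF=0.53) is consistent with a per-barrier weighted average. The following LaTeX is that inference, not the slide's verbatim content:

```latex
% Hedged reconstruction of the combined metric for barrier i:
% A(i) in {0,1} is the binary automatic check result,
% M(i) in [0,1] is the averaged manual assessment,
% a(i), m(i) are their respective weights.
\[
  \mathrm{CBIF}(i) \;=\; \frac{a(i)\,A(i) + m(i)\,M(i)}{a(i) + m(i)}
\]
% Worked example from the assessment slide:
% a=1, A=0, m=2, M=0.8  =>  CBIF = (1*0 + 2*0.8)/(1+2) = 1.6/3 ≈ 0.53
```
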
Weighting automatic and manual checks

1. m(i) = a(i): the formula is a mere average of the
   automatically and manually detected errors
2. m(i) > a(i): a failure in the manual assessment is considered
   more significant than the automatic one
3. m(i) < a(i): a failure in the automatic assessment is considered
   more significant than the manual one

The quadrant tables order the four (automatic, manual) outcome
combinations under the two unequal weightings:

                 AUTOMATIC                          AUTOMATIC
                 0        1                         0        1
 MANUAL  [0, …)  I        III      MANUAL  [0, …)   I        II
         (…, 1]  II       IV               (…, 1]   III      IV
Some considerations

- The more human operators provide evaluations of an
  accessibility barrier, the more reliable the computed
  accessibility level is
- This behavior is similar to that of online rating systems
- A new user's rating can be influenced by evaluations
  already expressed by other users
- Variance must be considered in order to reinforce the
  computed accessibility level



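One plausible way to let the number of experts and their disagreement temper the computed value is the standard error of the mean, sketched below; this is an assumed illustration, not the method the slides propose:

```python
from statistics import mean, pvariance
from math import sqrt

def reliability_interval(ratings: list[float]) -> tuple[float, float, float]:
    """Return (mean, population variance, standard error) for expert ratings.

    A wider standard error (few raters, or high disagreement) signals a less
    stable value; more concordant experts shrink it. Illustrative only.
    """
    n = len(ratings)
    avg = mean(ratings)
    var = pvariance(ratings)  # population variance, as on the example slide
    sem = sqrt(var / n)       # standard error of the mean
    return avg, var, sem

# The five expert ratings from the assessment example on the next slide:
avg, var, sem = reliability_interval([0.7, 1.0, 0.8, 1.0, 0.5])
print(round(avg, 3), round(var, 3), round(sem, 3))  # 0.8 0.036 0.085
```
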
A first assessment

PAGE CONTENT
    An image with ALT="Image" (a placeholder), no link, no title

MANUAL EVALUATIONS
    Expert A: 0.7
    Expert B: 1
    Expert C: 0.8
    Expert D: 1
    Expert E: 0.5

AUTOMATIC EVALUATION
    0 (no known errors; 1 alert: placeholder text detected)

CBIF
    m = 2, a = 1
    Average = 0.8
    Variance = 0.036
    CBIF = 0.53
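A short sketch reproducing the numbers on this slide, using the weighted-average combination inferred earlier (the cbif function name and signature are illustrative):

```python
from statistics import mean, pvariance

def cbif(auto: int, manual: list[float], a: float = 1.0, m: float = 2.0) -> float:
    """Combine a binary automatic result with averaged manual assessments.

    Weighted-average combination inferred from the slide's numbers;
    a and m correspond to the weights a(i) and m(i) on the weighting slide.
    """
    return (a * auto + m * mean(manual)) / (a + m)

manual = [0.7, 1.0, 0.8, 1.0, 0.5]
print(round(mean(manual), 2))       # 0.8   (Average)
print(round(pvariance(manual), 3))  # 0.036 (Variance)
print(round(cbif(0, manual), 2))    # 0.53  (CBIF: automatic output is 0)
```
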
Conclusions

- We have defined an accessibility metric that evaluates
  barriers as a whole, combining results from automatic
  tools with manual evaluations done by experts
- The metric has been preliminarily tested by measuring
  accessibility barriers in several local public
  administration Web sites
- Five experts are manually evaluating barriers related to
  WCAG 2.0 success criterion 1.1.1 (using an automatic
  monitoring system to verify the page content and to
  collect data from the manual evaluations)

Future Work


- Propose and discuss weights for the whole WCAG 2.0
  set of barriers

- Investigate how the number of experts involved in the
  evaluation, together with their rating variance, could
  influence the reliability of the computed values




Contacts


       Thank you for your attention!

       For further information:
silvia.mirri@unibo.it




