AN UPDATE




   Prepared by Nadia Millington & Luis Rosenthal
Quality of phone



                      Ideally, a Nokia 6300 (or above) will
                       allow appropriate visualisation of the
                       image, given its resolution and screen size.

                      If microworkers do not have an
                       appropriate phone, they can obtain
                       one via a microfinance loan, or
                       we can develop a scheme whereby
                       refurbished high-end phones from the
                       first world (which have been fully
                       depreciated) are sent to the BOP
                       at a fraction of the cost (some as low
                       as USD 20), providing good image
                       visualisation and adequate screen
                       size.
Data transmission costs

                          The money that the
                          microworkers earn is expected to
                          be significantly higher than the
                          data costs, based on our quick,
                          informal review of phone costs
                          in 3 developing countries.
                          Assuming each job pays 20 US
                          cents, we see data charges, the
                          microworkers' only cost, as a
                          small percentage of their
                          earnings (2-15%). We expect even
                          these percentages to fall once we
                          have thoroughly reviewed all the
                          available packages.
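
As a rough illustration of the arithmetic above, the sketch below converts the 2-15% range into cents per job, using the 20 US cents per job stated on this slide. The function names are hypothetical, and the only figures used are those quoted above.

```python
# Sketch of the data-cost arithmetic. The 20-cent payment and the 2-15%
# range come from the slide; the function names are illustrative only.

PAY_PER_JOB_CENTS = 20.0


def data_cost_cents(pay_cents: float, share: float) -> float:
    """Data charge per job implied by a given share of earnings."""
    return pay_cents * share


def net_pay_cents(pay_cents: float, share: float) -> float:
    """What the microworker keeps per job after data charges."""
    return pay_cents - data_cost_cents(pay_cents, share)


if __name__ == "__main__":
    for share in (0.02, 0.15):
        cost = data_cost_cents(PAY_PER_JOB_CENTS, share)
        net = net_pay_cents(PAY_PER_JOB_CENTS, share)
        print(f"share {share:.0%}: data {cost:.1f}c, net {net:.1f}c")
```

Under these figures the data charge works out to between 0.4 and 3 cents per job, leaving the microworker between 17 and 19.6 cents.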
Can the services be automated by a computer?

High accuracy OCR software can read more than 400 characters/second.

However:
OCR software is not efficient at recognizing handwriting or at distinguishing between fonts that closely resemble handwriting. In such cases manual entry performs better than OCR.
Data entry provides complete flexibility, allowing micro operators to prepare digital documents from multiple formats: even audio recordings of spending can be included, as can notes on partial payments scribbled on the receipts.

OCR may be efficient during the initial level of a data entry service but cannot substitute for it, because recognition of typewritten text is still not 100% accurate even where clear imaging is available. OCR software accuracy ranges from 71% to 95%, but total accuracy can be achieved only by human review. Errors occur because of:
   Distinguishing noise from text: dots and accents may be mistaken for noise, and vice versa.
   Mistaking graphics or geometry for text: this leads to non-text being sent to recognition.
   Mistaking text for graphics or geometry: in this case the text will not be passed to the recognition stage. This often happens if characters are connected to graphics.

The accuracy of OCR systems is, in practice, directly dependent upon the quality of the input documents. Unlike human readers, OCR is not very tolerant of bad picture quality. As such, OCR used on receipts is expected to have higher error rates. The main difficulties that receipts, invoices etc. pose for OCR are:
   Variations in shape, due to serifs and style variations.
   Deformations, caused by broken characters, smudged characters and speckle.
   Variations in spacing, due to subscripts, superscripts, skew and variable spacing.
   Mixture of text and graphics.

Common OCR issues include mistaking an "ni" for an "m".
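
To see why accuracy in the 71% to 95% range still calls for human review, here is a minimal sketch. It assumes, beyond what the slide states, that these figures are per-character accuracies, that character errors are independent, and that a receipt holds 200 characters (a made-up length used purely for illustration).

```python
# Illustrative only: per-character accuracy, independence of errors and the
# 200-character receipt length are assumptions, not figures from the slides.

def expected_errors(num_chars: int, char_accuracy: float) -> float:
    """Expected number of misread characters in a document."""
    return num_chars * (1.0 - char_accuracy)


def prob_all_correct(num_chars: int, char_accuracy: float) -> float:
    """Chance that every character is read correctly, if errors are independent."""
    return char_accuracy ** num_chars


if __name__ == "__main__":
    receipt_chars = 200  # hypothetical receipt length
    for accuracy in (0.71, 0.95):
        errors = expected_errors(receipt_chars, accuracy)
        clean = prob_all_correct(receipt_chars, accuracy)
        print(f"accuracy {accuracy:.0%}: ~{errors:.0f} errors, P(clean) = {clean:.2e}")
```

With these assumptions, a 200-character receipt would carry about 10 expected errors at 95% accuracy and about 58 at 71%, so an error-free receipt without human review would be rare.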
When OCR doesn't work

These imperfections may cause problems in different parts of an OCR system's recognition process,
resulting in misclassifications.
Finally

          Most OCR has some human interaction. Modern optical character
          recognition software relies on human interaction to correct
          misrecognized characters. Even though the software often reliably
          identifies low-confidence output, the simple language and
          vocabulary models employed are insufficient to automatically
          correct mistakes. A developer of the lemon.com software confirms
          this, stating: "Whenever the machine learning system or the OCR
          system have a low confidence result, it can ask for human
          assistance, usually with a multiple choice answer or a request to
          edit an entry."

          In models where OCR does not use human intervention, the
          consumer is expected to correct their own errors, which is not a
          value proposition AskMom would ever employ, as we are selling
          convenience.

          It is possible to enhance the AskMom business model with OCR
          technology on the front end, using microworkers for quality
          assurance and low-confidence results. The use of microworkers
          would still mean that we operate at costs below those of other
          players. However, the human element is key, as it differentiates
          us. It gives AskMom greater flexibility to record complex,
          poorly printed receipts accurately from all parts
          of the world (offering a global solution), as opposed to other
          options like lemon, which only works within the US.
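
The OCR-first arrangement described above can be sketched as a simple confidence check. This is only an illustration under stated assumptions: the `OcrField` type, the `route_fields` function and the 0.90 threshold are hypothetical, and the OCR engine is assumed to report a confidence between 0 and 1 for each field.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OcrField:
    """One value read from a receipt, with the engine's confidence (assumed 0.0-1.0)."""
    name: str
    text: str
    confidence: float


def route_fields(
    fields: List[OcrField], threshold: float = 0.90
) -> Tuple[List[OcrField], List[OcrField]]:
    """Split OCR output into auto-accepted fields and fields for microworker review.

    The threshold is a hypothetical tuning value, not a figure from the slides.
    """
    accepted: List[OcrField] = []
    needs_review: List[OcrField] = []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


if __name__ == "__main__":
    sample = [
        OcrField("total", "12.50", 0.98),
        OcrField("merchant", "Cafe Nirn", 0.62),  # made-up low-confidence read
    ]
    auto, review = route_fields(sample)
    print("accepted:", [f.name for f in auto])
    print("sent to microworkers:", [f.name for f in review])
```

In this sketch only the low-confidence fields reach a microworker, which reflects the idea of using OCR on the front end while keeping human review for quality assurance.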


Ask mom updated submitted april 2nd
