Building a Spanish MMTx by
using Automatic Translation and
Biomedical Ontologies
Francisco Carrero 1,2 ; José Carlos Cortizo 1,2 ; José Mª Gómez 3
              1    Wipley, Social Gaming Platform
                   http://www.wipley.com
               2   Universidad Europea de Madrid
                   http://www.esp.uem.es/gsi
              3    Optenet
                   http://www.esp.uem.es/gsi
Outline

   The MIRCAT project
   The challenge
   English MetaMap, a big effort
   Approaching a Spanish MetaMap
   Experiments
   Discussion of the Results and Future Work
                                               Francisco Carrero Garcia
The MIRCAT Project
The Interface




The MIRCAT Project
System's Architecture




The Challenge
Our Goal




            [Figure: diagram relating a medical record to English docs and Spanish docs]

The Challenge
The problem




     We can extract UMLS concepts from English texts using
     MetaMap...
     ...but there is no Spanish version of MetaMap
     Is it difficult to construct a tool like MetaMap?


English MetaMap
A big Effort




                  ~3 years!!

Approaching Spanish MetaMap
Two Main Approaches Considered




Approaching Spanish MetaMap
Our Approach: Translation and Reuse




                    [Figure: translation-and-reuse pipeline, with one step marked "Optional"]



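The translation-and-reuse idea can be sketched as a two-step pipeline: translate the Spanish text into English, then run an existing English concept extractor such as MetaMap on the result. This is a minimal sketch under stated assumptions. `spanish_concepts`, `toy_translate`, `toy_extract`, the phrase table and the `CUI_*` identifiers are hypothetical stand-ins. They are not the authors' code, a real translation API, or real UMLS concept IDs.

```python
import re
from typing import Callable, List

# Hypothetical interfaces: the slides do not specify how the components are wired.
Translator = Callable[[str], str]              # Spanish text -> English text
ConceptExtractor = Callable[[str], List[str]]  # English text -> concept IDs


def spanish_concepts(text_es: str, translate: Translator,
                     extract: ConceptExtractor) -> List[str]:
    """Translate Spanish text to English, then reuse an English concept extractor."""
    return extract(translate(text_es))


# Toy stand-ins for a machine translator and for MetaMap (illustrative only).
TOY_ES_EN = {"dolor de cabeza": "headache", "fiebre": "fever"}
TOY_LEXICON = {"headache": "CUI_HEADACHE", "fever": "CUI_FEVER"}  # hypothetical IDs


def toy_translate(text_es: str) -> str:
    # Naive phrase substitution; untranslated words pass through unchanged.
    text = text_es.lower()
    for es, en in TOY_ES_EN.items():
        text = text.replace(es, en)
    return text


def toy_extract(text_en: str) -> List[str]:
    # Look up each word in a tiny lexicon, keeping document order.
    return [TOY_LEXICON[w] for w in re.findall(r"[a-z]+", text_en) if w in TOY_LEXICON]


print(spanish_concepts("Fiebre y dolor de cabeza", toy_translate, toy_extract))
# ['CUI_FEVER', 'CUI_HEADACHE']
```

Passing the translator and the extractor in as functions keeps either one swappable. This matters here because trying other translation tools is listed as future work.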
Experimental Design
Text Collections


      MedlinePlus medical news
          http://www.nlm.nih.gov/medlineplus/newsbydate.html
          An excellent online resource
          2,000 news items, some in English, some in Spanish
          600 available in both languages

Experiments
Experimental Design

     MetaMap extracts concepts, which allows multiple representations:
         A => compound concepts
         B => simple concepts
         1 => resolves ambiguity by adding all candidate concepts
         2 => ignores ambiguity by choosing the first candidate
         4 representations: A1, A2, B1, B2
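The four representations can be illustrated with a small sketch. It assumes a hypothetical input in which each phrase carries MetaMap-style candidate mappings at a compound ("A") and a simple ("B") granularity, with each mapping given as a tuple of concept IDs. The data layout, the `bag_of_concepts` function and the `C_*` identifiers are illustrative only. They are not MetaMap's output format or the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: for each phrase, candidate mappings at two
# granularities; each candidate mapping is a tuple of concept IDs.
Phrase = Dict[str, List[Tuple[str, ...]]]
GRANULARITY = {"A": "compound", "B": "simple"}


def bag_of_concepts(phrases: List[Phrase], granularity: str, strategy: int) -> Counter:
    """Build one of the A1/A2/B1/B2 representations as a concept-count bag.

    granularity: "A" (compound concepts) or "B" (simple concepts)
    strategy: 1 = keep every candidate concept, 2 = keep only the first candidate
    """
    if strategy not in (1, 2):
        raise ValueError("strategy must be 1 or 2")
    key = GRANULARITY[granularity]
    bag: Counter = Counter()
    for phrase in phrases:
        candidates = phrase.get(key, [])
        if not candidates:
            continue
        if strategy == 1:
            # Union over all candidates; de-duplicating within a phrase is an assumption.
            concepts = {c for mapping in candidates for c in mapping}
        else:
            concepts = set(candidates[0])
        bag.update(concepts)
    return bag


doc = [{
    "compound": [("C_LUNG_CANCER",), ("C_LUNG_NEOPLASM",)],
    "simple": [("C_LUNG", "C_CANCER"), ("C_LUNG", "C_CRAB")],
}]
reps = {g + str(s): bag_of_concepts(doc, g, s) for g in "AB" for s in (1, 2)}
print(sorted(reps["B1"]))  # ['C_CANCER', 'C_CRAB', 'C_LUNG']
```

Strategy 1 keeps the spurious "C_CRAB" reading alongside the intended one. Strategy 2 avoids it only if the first candidate happens to be correct.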
Experiments
Filtering


      Data representations with many features do not usually
      perform very well in text tasks
      Many classifiers lose prediction accuracy when faced with
      many irrelevant or redundant/correlated features (the “curse
      of dimensionality”)
      We apply Zipf's Law to filter the attributes
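Below is a sketch of one common reading of Zipf-based filtering. It is an assumption, not taken from the paper: Luhn-style cut-offs. Features are ranked by collection frequency, and the filter discards both the most frequent ones (the head of the Zipf curve) and the rarest ones (its long tail). `zipf_filter`, `drop_top` and `min_count` are hypothetical names, and real thresholds would need tuning on the collection.

```python
from collections import Counter
from typing import List, Set


def zipf_filter(docs: List[Counter], drop_top: int, min_count: int) -> Set[str]:
    """Return the features kept after Luhn-style cut-offs on the Zipf curve.

    drop_top: number of highest-ranked (most frequent) features to discard
    min_count: minimum collection frequency a feature needs to be kept
    """
    totals: Counter = Counter()
    for doc in docs:
        totals.update(doc)
    # Rank features by collection frequency (Zipf's rank-frequency ordering).
    ranked = [feature for feature, _ in totals.most_common()]
    too_common = set(ranked[:drop_top])
    return {f for f, c in totals.items() if f not in too_common and c >= min_count}


docs = [
    Counter({"C_PATIENT": 5, "C_FLU": 2, "C_RARE": 1}),
    Counter({"C_PATIENT": 4, "C_FLU": 1, "C_FEVER": 2}),
]
keep = zipf_filter(docs, drop_top=1, min_count=2)
# Project every document onto the kept features.
filtered = [Counter({f: c for f, c in d.items() if f in keep}) for d in docs]
print(sorted(keep))  # ['C_FEVER', 'C_FLU']
```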

Experimental Results
Number of concepts for each representation




Experimental Results
Average Similarities




Experimental Results
Latest Experiments (not in the IDEAL paper)




Discussion of the Results
Translation

      The worst (similarity) results are obtained with the most
      complex (closest to human) representation: A1
      B1 is less complex and produces the best results
      => Our model seems better suited to a plain bag-of-concepts
      representation
         Similar to the bag-of-words representation widely used in
         text processing tasks
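Treating each document as a bag of concepts, the similarity between an English news item and its translated Spanish counterpart can be sketched as cosine similarity over concept counts, averaged over the document pairs. The slides do not name the similarity measure, so cosine is an assumption here. The function names and toy data are also hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-concepts vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_similarity(pairs: List[Tuple[Counter, Counter]]) -> float:
    """Mean cosine over (English original, translated Spanish) document pairs."""
    if not pairs:
        return 0.0
    return sum(cosine(en, es) for en, es in pairs) / len(pairs)


same = Counter({"C_FLU": 1, "C_FEVER": 1})
pairs = [(same, Counter(same)), (Counter({"C_FLU": 1}), Counter({"C_COUGH": 1}))]
print(round(average_similarity(pairs), 3))  # 0.5
```

Under this measure, a representation that splits or disambiguates concepts differently in the two languages (as a compound, keep-all representation like A1 may) lowers the overlap term `dot` directly.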
Discussion of the Results
Classification


      All results are comparable to classification on the original
      English texts
      In some cases, they are even better
      Best results using A2+Zipf: +7.8% in AUC
      UNMKD representations never achieve worse classification
      results than English
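AUC (area under the ROC curve) is the metric behind the A2+Zipf result. The sketch below computes it from classifier scores using the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive document scores higher than a randomly chosen negative one, with ties counting half. The function name and toy scores are hypothetical. The slides do not say which classifier or AUC implementation was used.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The double loop costs O(P·N) comparisons. For large collections, a rank-based computation gives the same value more cheaply.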

Conclusions and Future Work

   The “easy way” to construct a Spanish MetaMap is promising
   Google Translate seems a good tool for adapting English resources
   to other languages (such as Spanish)
   We should try other translation tools
   We are working on applying this approach to other text tasks
   (such as Information Retrieval and Filtering)

Ending...




   Thank you very much for your attention




Any Questions?






Presentation at IDEAL 2008
