Past, present and future
of scientific literature search

Ramón Alonso-Allende
Science Cycle
[Diagram: the science cycle (search, read, write, experiment) next to a timeline of search: 1990s, 2000s, Today, Future. Labels: Relevance; Search = Integration + Meaning + Social; Value system: + Complete, + Easy, - Time.]
Information systems
[Timeline: 1995, 2000, 2005, 2010]
Talk at the CBM
Searches in PubMed
[Chart: searches (1000s) per year, 1997 to 2008; vertical axis from 0 to 1,000,000.]
Challenges

 Handling huge amounts of information.
 Ambiguity of language.
 Time.
 Staying up to date.

                                                jordinho_dp
A lot of heterogeneous information
[Chart: growth per year, 1992 to 2007; vertical axis from 0 to 80,000,000; series: GB, PDB, Medline, SwissProt.]
43% of human genes
have ambiguous names
Some data

 5,892 terms can be genes or diseases
 3,963 names refer to 2 different genes
 One term refers to 114 genes

[Chart: number of terms (0 to 4,000) by number of concepts (2 to 9); series: Disease, Genes, Drugs.]
Some examples

sps
 stiff-man syndrome (Diseases or Syndromes)
 polystyrene sulfonate (Pharmacological substances)
 systolic blood pressure (Biological functions)
 spermine synthase (Genes and Proteins)

AAt1
 annuloaortic ectasia (Diseases or Syndromes)
 alanine aminotransferase (Genes and Proteins)
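
The examples above can be read as a small term-to-senses dictionary. The following is a minimal sketch in Python, assuming a plain in-memory index; the names `SENSES`, `build_index` and `is_ambiguous` are hypothetical and are not taken from the talk or from any novo|seek API.

```python
from collections import defaultdict

# Senses copied from the slide; the data structure is an illustrative assumption.
SENSES = [
    ("sps", "stiff-man syndrome", "Diseases or Syndromes"),
    ("sps", "polystyrene sulfonate", "Pharmacological substances"),
    ("sps", "systolic blood pressure", "Biological functions"),
    ("sps", "spermine synthase", "Genes and Proteins"),
    ("AAt1", "annuloaortic ectasia", "Diseases or Syndromes"),
    ("AAt1", "alanine aminotransferase", "Genes and Proteins"),
]


def build_index(senses):
    """Map each lower-cased term to the list of (meaning, category) it can denote."""
    index = defaultdict(list)
    for term, meaning, category in senses:
        index[term.lower()].append((meaning, category))
    return dict(index)


def is_ambiguous(index, term):
    """A term is ambiguous when it has more than one recorded sense."""
    return len(index.get(term.lower(), [])) > 1


INDEX = build_index(SENSES)

if __name__ == "__main__":
    for meaning, category in INDEX["sps"]:
        print(f"sps -> {meaning} ({category})")
```

A lookup like this only detects that a term is ambiguous; choosing the right sense needs the context of the sentence.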
Language ambiguity

Synonyms: different words for the same biomedical entity
 In human there are at least 5,418 genes with synonyms (38% of the total genome)
 Drugs have a commercial name and a chemical name

Homonyms: the same name for different biomedical entities
 The symbol PAP is an alias for:
  PAP (pancreatitis-associated protein)
  MRPS30 (mitochondrial ribosomal protein S30)
  PAPOLA (poly(A) polymerase alpha)

Acronyms: a reduced word representing a biomedical entity
 SCT stands for:
  Stem cell transplant
  Secretin
  Salmon calcitonin
Unmanageable
 More than 25 million documents, counting scientific articles, grants
  and biomedical patents, are relevant sources of information for
  biomedical researchers.

 2,000 new scientific papers are published every day.
 It takes 5 years to read the new scientific material produced
  every 24 hours.

 Scan 130 journals and read 27 articles per day to follow a
  single disease, such as breast cancer.
Staying up to date

Search engine alerts
Emailed eTOCs
RSS feeds
Search tasks & lab work by discipline
[Chart: % of time (0% to 80%) by discipline: All, Biochemistry, Mol. & Cell Biol., Genetics, Biotechnology, Bioinformatics, Medicine, Other; series: Searching literature, Searching data from DBs, Working in the lab.]
Roos, A., Kumpulainen, S., Järvelin, K. and Hedlund, T. (2008). "The information environment of researchers in molecular medicine". Information Research, 13(3), paper 353.
[Available at http://InformationR.net/ir/13-3/paper353.html]
How we tackle the challenges
We tackle the challenges by:

 Integrating information for the user.
 Analyzing the text (text mining).
 Offering useful functionality.

 Technology + Simple interface = - Time
Data integration

Sequence DBs: UniProt, GenBank, RefSeq, PIR, EMBL, Entrez Protein, UniSTS
Gene DBs: GDB, Ensembl, Entrez Gene, UniGene, H-InvDB, MGC, HGNC
Pathway DBs: KEGG, EC, Reactome
Domain DBs: Pfam, PROSITE, SMART, ProDom, InterPro
Other DBs: Affymetrix, GO, PDB, MIM, CCDS, HPRD, HGNC
Text mining
Gene: GH1               Gene: GGH
Growth Hormone 1        Gamma Glutamyl Hydrolase
GeneID: 2688            GeneID: 8836

Synonym: GHN            Synonym: conjugase
Synonym: GH             Synonym: GH

adenoma (0.300)         antifolate (2.850)
adipocyte (0.418)       carboxypeptidase (12.618)
adipose (0.324)         folate (0.674)
age-related (0.442)     gamma-glu-x (15.452)
genotropin (19.368)     antifolylpoly-gamma-glutamate (12.054)
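
One way to read the slide above: both genes share the synonym GH, and each gene carries a list of weighted context terms that help decide which gene a mention refers to. The sketch below is only an illustration of that idea, assuming the score of a gene is the sum of the weights of its context terms found in the text. The scoring rule and the names `CONTEXT_WEIGHTS`, `score_genes` and `disambiguate` are assumptions, not the method described in the talk.

```python
import re

# Context terms and weights copied from the slide, keyed by GeneID.
# Summing weights is an assumed scoring rule, used here only for illustration.
CONTEXT_WEIGHTS = {
    "2688": {  # Growth Hormone 1
        "adenoma": 0.300,
        "adipocyte": 0.418,
        "adipose": 0.324,
        "age-related": 0.442,
        "genotropin": 19.368,
    },
    "8836": {  # Gamma Glutamyl Hydrolase
        "antifolate": 2.850,
        "carboxypeptidase": 12.618,
        "folate": 0.674,
        "gamma-glu-x": 15.452,
        "antifolylpoly-gamma-glutamate": 12.054,
    },
}

TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def score_genes(text):
    """Return {gene_id: summed weight of that gene's context terms present in text}."""
    tokens = set(TOKEN.findall(text.lower()))
    return {
        gene_id: sum(w for term, w in weights.items() if term in tokens)
        for gene_id, weights in CONTEXT_WEIGHTS.items()
    }


def disambiguate(text):
    """Pick the gene with the highest score, or None when no context term matches."""
    scores = score_genes(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("GH deficiency was treated with genotropin"))
```

Tokens are matched whole, so a sentence containing only "antifolate" does not also count as a match for "folate".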
Indexed data

Medline abstracts: 18 M
Open access full text: 145,000
R&D project abstracts: 1.5 M

> 200 M relations
> 4 M concepts
Comparison: use case
Looking for the gene SCT

 PubMed: SCT is solid-cystic tumor
 Google Scholar: SCT is the name of an author
 novo|seek: SCT is the meaning you are looking for:
  - Secretin
  - Stem cell transplantation
novo|seek vs. Google Scholar

Google Scholar: no way to focus the search beyond brain: time-consuming
Technology
      Search more efficiently.
      Extract more information.
      Relate different sources of information.
      Save time.
[Diagram: Semantic Search, Concept relations, Knowledge Extraction, Discovery]

by L cornide
Semantic Search

          Conceptual search
               e.g. Search for breast cancer
                    Detection of breast carcinoma cells in effusions is associated with rapidly fatal outcome
                    Women who do not receive regular mammograms are more likely than others to have breast cancer
                    diagnosed at an advanced stage
                    [...] thereby providing higher cytotoxicity against the 4T1 mouse mammary carcinoma cell line

                   All of these keywords refer to the same biomedical concept, so a search for breast cancer
                   will retrieve all three documents

          Use of context and semantic information to identify the relevant information
               e.g. Search for CAT, which could refer to the enzyme catalase or to the animal, cat.
                 [...] activity of antioxidant enzymes (GSH-Px, SOD, CAT) and content of malondialdehyde (MDA) were
                       determined
                 [...] 26 free-living lynx, 53 domestic cats, 28 dogs, 33 red foxes (Vulpes vulpes) [...]

                 The same keyword refers to different biomedical concepts. Using the context, we can
                 identify that only the first sentence talks about an enzyme
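
The conceptual search example above can be sketched as a mapping from surface forms to one concept, followed by retrieval on the concept instead of the literal string. This is a minimal sketch under that assumption. The `SURFACE_FORMS` table and both function names are hypothetical, and a real system would use a full biomedical thesaurus, not three hand-written entries.

```python
# Surface forms taken from the three example sentences; the mapping is an assumption.
SURFACE_FORMS = {
    "breast cancer": "breast cancer",
    "breast carcinoma": "breast cancer",
    "mammary carcinoma": "breast cancer",
}

DOCS = [
    "Detection of breast carcinoma cells in effusions is associated with rapidly fatal outcome",
    "Women who do not receive regular mammograms are more likely than others to have "
    "breast cancer diagnosed at an advanced stage",
    "thereby providing higher cytotoxicity against the 4T1 mouse mammary carcinoma cell line",
    "26 free-living lynx, 53 domestic cats, 28 dogs, 33 red foxes (Vulpes vulpes)",
]


def concepts_in(text):
    """Return the set of concepts whose surface forms occur in the text."""
    lowered = text.lower()
    return {concept for form, concept in SURFACE_FORMS.items() if form in lowered}


def concept_search(query, docs):
    """Retrieve every document that mentions the concept behind the query."""
    target = SURFACE_FORMS.get(query.lower())
    if target is None:
        return []
    return [doc for doc in docs if target in concepts_in(doc)]


if __name__ == "__main__":
    for hit in concept_search("breast cancer", DOCS):
        print(hit)
```

A plain keyword search for "breast cancer" would return only the second sentence; the concept mapping is what brings in the other two.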


by L cornide
Concept Relations


     e.g. Search for Alzheimer's disease
          The apolipoprotein E gene (APOE) polymorphism genotyping has an allegedly important predictive value
          for coronary heart disorders and Alzheimer's disease.
          Apolipoprotein E (apoE), a ligand for the low-density lipoprotein receptor family, has been implicated in
          modulating glial inflammatory responses and the risk of neurodegeneration associated with Alzheimer's
          disease.
          Although many genes have been suggested to be associated with AD, with the exception of APOE, most
          polymorphic variants of potential risk exhibit a very weak association with AD

               The protein apolipoprotein E and Alzheimer's disease are related, with a relevance of 36%
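
The slide does not say how the relevance figure is computed. As a hedged illustration only, one simple way to score a relation between two concepts is their co-occurrence across documents, for instance the Jaccard index. The toy corpus below is invented for the example and is not data from the talk, and the function name `jaccard_relevance` is hypothetical.

```python
# Each document is reduced to the set of concepts detected in it.
# This toy corpus is hypothetical and exists only to exercise the function.
TOY_CORPUS = [
    {"APOE", "Alzheimer's disease"},
    {"APOE", "Alzheimer's disease"},
    {"Alzheimer's disease"},
    {"APOE", "coronary heart disease"},
    {"Alzheimer's disease", "tau"},
]


def jaccard_relevance(corpus, concept_a, concept_b):
    """Documents mentioning both concepts divided by documents mentioning either."""
    with_a = {i for i, doc in enumerate(corpus) if concept_a in doc}
    with_b = {i for i, doc in enumerate(corpus) if concept_b in doc}
    union = with_a | with_b
    if not union:
        return 0.0
    return len(with_a & with_b) / len(union)


if __name__ == "__main__":
    score = jaccard_relevance(TOY_CORPUS, "APOE", "Alzheimer's disease")
    print(f"relevance: {score:.0%}")
```

Other co-occurrence measures would fit the same interface; Jaccard is used here only because it is easy to check by hand.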




by L cornide
Knowledge Extraction

         Based on the relations detected between concepts, we can automatically extract
          knowledge from text

          e.g. Obtain the knowledge about breast cancer, extracted from the literature
               [...] BRCA1 or BRCA2 [...] Information was recorded on prophylactic mastectomy, prophylactic
               oophorectomy, use of tamoxifen [...] had a bilateral prophylactic oophorectomy. [...] breast cancer, 248
               (18.0%) had had a prophylactic bilateral mastectomy. Among those who did not have a prophylactic
               mastectomy, only 76 women (5.5%) took tamoxifen and 40 women (2.9%) took raloxifene for breast
               cancer prevention. [...].


          The genes BRCA1 and BRCA2 are related to breast cancer. Tamoxifen and raloxifene are drugs
          used in its treatment, and mastectomy and oophorectomy are common procedures to treat it.
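
The summary above amounts to grouping the concepts related to breast cancer by their type. The sketch below assumes the concepts and their types have already been detected. The list `RELATED_TO_BREAST_CANCER` restates the slide's example, while the type labels and the function name `summarize` are hypothetical.

```python
from collections import defaultdict

# Concepts named in the slide's summary; the type labels are illustrative assumptions.
RELATED_TO_BREAST_CANCER = [
    ("BRCA1", "gene"),
    ("BRCA2", "gene"),
    ("tamoxifen", "drug"),
    ("raloxifene", "drug"),
    ("mastectomy", "procedure"),
    ("oophorectomy", "procedure"),
]


def summarize(related):
    """Group related concepts by type, with names sorted for stable output."""
    by_type = defaultdict(list)
    for name, kind in related:
        by_type[kind].append(name)
    return {kind: sorted(names) for kind, names in by_type.items()}


if __name__ == "__main__":
    for kind, names in summarize(RELATED_TO_BREAST_CANCER).items():
        print(f"{kind}: {', '.join(names)}")
```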




by L cornide
Make new discoveries

         Discover hidden relations between concepts that have not been described
          before in the scientific literature

           e.g. Fish oil and Raynaud's syndrome, from evidence in the literature
                [...] meal fatty acids appear to be an important determinant of vascular reactivity, with fish oils
                significantly improving postprandial endothelium-independent vasodilation
                Numerous studies have documented longer bleeding times and decreased platelet aggregation in
                subjects ingesting omega-3 fatty acids
                vasomotor pain, in particular the fact of reactional vasodilation during Raynaud's syndrome,
                inflammation in the region surrounding zones of ischemic necrosis, and infection of ulcers
                Objective judgement on effects of medicine in patients with Raynaud's phenomenon--measurement of
                cutaneous blood flow using laser Doppler flowmeter and platelet aggregation activity


           By finding evidence that relates fish oils to vasodilation and platelet aggregation,
           and evidence that links these two functions to Raynaud's syndrome, we can uncover
           a relation that was not previously described in the literature: the possible treatment of
           Raynaud's syndrome with fish oil.
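
The reasoning above follows an A-B-C pattern: A (fish oil) is linked to B (vasodilation, platelet aggregation), B is linked to C (Raynaud's syndrome), and no direct A-C link has been seen. A minimal sketch of that pattern follows, assuming the pairwise relations have already been extracted from text. The `RELATIONS` list restates the slide's example, and the function names are hypothetical.

```python
from collections import defaultdict

# Pairwise relations stated in the slide's example, treated as undirected links.
RELATIONS = [
    ("fish oil", "vasodilation"),
    ("fish oil", "platelet aggregation"),
    ("vasodilation", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from concept pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def hidden_links(graph, start):
    """Concepts two steps from start with no direct link, plus the bridging concepts."""
    direct = graph.get(start, set())
    candidates = defaultdict(set)
    for bridge in direct:
        for far in graph.get(bridge, set()):
            if far != start and far not in direct:
                candidates[far].add(bridge)
    return dict(candidates)


if __name__ == "__main__":
    graph = build_graph(RELATIONS)
    for target, bridges in hidden_links(graph, "fish oil").items():
        print(f"fish oil -> {sorted(bridges)} -> {target}")
```

Each candidate comes back with its bridging concepts, so a reader can check the supporting sentences before treating the link as a hypothesis worth testing.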



by L cornide
The future

   Structured information.

   User identifier.

   The article of the future.

   Social search.
http://beta.cell.com/erickson/
Social Search
[Diagram: three flavors of social search: Collective, Collaborative Q&A, Friend-Filtered]
http://www.readwriteweb.com/archives/3_flavors_of_social_search_what_to_expect.php
Beta testers

  Collaborate in the development of one of the
  leading biomedical search engines on the market.

  Access to the latest updates of our
  search engine.

  A guaranteed gift.

www.novoseek.com/betatesters.html
Contact


           Ramón Alonso-Allende
           Marketing & Business Development
           allende@bioalma.com
           Phone: +34 91 141 71 50
