Moving From Noise to Signal: Semantic Web
Agenda
  • Introduction
  • Semantic Web
      • What is the Semantic Web?
      • Why does it matter?
      • How to semantify the Web?
  • Web 3.0
  • Linked Data
Introduction
  • 1.3+ billion people are connected to the web
  • 2006: 161 EB of information created or replicated (1 EB = 1 billion GB)
      • Technical information doubled every 2 years
  • By 2010: a roughly sixfold increase to 988 EB (approximately 1 ZB)
      • Technical information will double every 72 hours
  • Computers, mobile phones, intelligent devices
  • The Internet is broken: not one web, unable to communicate
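The figures on this slide can be checked with a few lines of arithmetic. This is only a unit-conversion sketch: the two volumes come from the slide, and decimal SI prefixes (1 ZB = 1000 EB) are assumed.

```python
# Unit check for the figures quoted on the slide (decimal SI prefixes assumed).
GB_PER_EB = 10**9   # 1 EB = 1 billion GB, as the slide states
EB_PER_ZB = 10**3   # assumption: 1 ZB = 1000 EB (decimal prefixes)

created_2006_eb = 161    # figure from the slide
forecast_2010_eb = 988   # figure from the slide

growth_factor = forecast_2010_eb / created_2006_eb
forecast_2010_zb = forecast_2010_eb / EB_PER_ZB

print(f"2006 volume: {created_2006_eb * GB_PER_EB:.3e} GB")
print(f"Growth 2006 -> 2010: {growth_factor:.2f}x")
print(f"2010 forecast: {forecast_2010_zb:.3f} ZB")
```

The ratio comes out slightly above six, which is why "six times" and "approximately 1 ZB" are both roundings.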
Information Overload
  • Is that really how the Web experience is supposed to feel?
  • Key problem: how to share meaning?
  • Filtering, not aggregating. Not more, just smarter.
Semantics?
  • Related to syntax
  • Syntax: how you say something (letters, punctuation, grammar), e.g. HTML
  • Semantics: the meaning behind what you say
  • Example: "I Love Technology" vs. "I ♥ Technology" (different syntax, same meaning)
What's the big deal?
  • Internet: standard way to communicate
      • Parrot: mimics without understanding
  • The Web
      • Store and retrieve documents on the Internet
      • Syntax to display the document (HTML)
  • Search engines
      • Find any website that we want
  • Life is good! Can we make it any better? How?
The Answer: Semantic Web
  • Understand the meaning behind web pages
  • Web of Things vs. Web of Documents
      • Things can be ANYTHING: people, places, pets, events, music, movies, organizations
      • Not only identify these things but also the relationships between them (human-like!)
  • Embed semantics in HTML documents: microformats, RDF
  • It's not about the future, it's about today!
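A minimal sketch of the "things plus relationships" idea, written as subject-predicate-object statements in the style RDF uses. All identifiers (`ex:Alice`, `ex:owns`, and so on) and the helper functions are hypothetical, chosen only for illustration; a real system would use an RDF library and full URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a thing, a relationship, and another thing or value."""
    subject: str
    predicate: str
    obj: str


# Hypothetical facts about a person, a pet, and a place.
TRIPLES = [
    Triple("ex:Alice", "rdf:type", "ex:Person"),
    Triple("ex:Alice", "ex:livesIn", "ex:Paris"),
    Triple("ex:Alice", "ex:owns", "ex:Rex"),
    Triple("ex:Rex", "rdf:type", "ex:Pet"),
    Triple("ex:Paris", "rdf:type", "ex:Place"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def pets_of_people(triples):
    """Follow the ex:owns relationship from people to pets."""
    people = {t.subject for t in match(triples, predicate="rdf:type", obj="ex:Person")}
    pets = {t.subject for t in match(triples, predicate="rdf:type", obj="ex:Pet")}
    return [
        (t.subject, t.obj)
        for t in match(triples, predicate="ex:owns")
        if t.subject in people and t.obj in pets
    ]


print(pets_of_people(TRIPLES))
```

The query works on the relationships themselves rather than on the text of a page, which is the difference between a web of things and a web of documents.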
The Possibilities
Why Semantic Web?
  • Spend less time searching
  • Spend less time looking at things that do not matter
  • Spend less time explaining what we want to computers
  • Bottom line: improve the online experience!
Cartoon by Geek and Poke
It's all about the noise
  • Web 1.0: Get (hear & see) noise
  • Web 2.0: Make noise
  • Web 3.0: Filter the noise
  • Web 4.0: Going deaf... or SmartNoise
Semantifying the Web: Approaches (Bottom-Up)
  • Annotating information in web pages with machine-readable tags
  • Technical challenges
      • Representational complexity
      • How to create annotations: manually or automatically?
      • How much can be transformed?
      • Standards issue
  • Business challenges
      • It's primitive
      • Consumer value?
      • How to market?
  • Recent wins
      • Yahoo search engine to support RDF and microformats (MF)
      • Dapper: automated annotation tool
Annotation Technologies
  • Trade-off between simplicity and completeness
  • RDF
      • Graph-based: things, attributes, relationships
      • Precise but complex
      • Basic unit: the triple (subject, predicate, object)
  • Microformats
      • Use specific class names in the HTML class attribute (the attribute CSS selectors also target), not visual styles
      • Compact, embedded in HTML
      • Gaining popularity because of their simplicity
      • Popular microformats:
          • hCard: describes personal and company contact information
          • hReview: adds metadata to review pages
          • hCalendar: describes events
      • Limitations
          • No way to describe type hierarchies
          • Somewhat cryptic, because the focus is on keeping annotations to a minimum
      • Adopters: Flickr, Eventful, and LinkedIn
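To make the microformat idea concrete, here is a rough sketch that pulls contact fields out of an hCard-annotated snippet using only Python's standard `html.parser`. The class names `vcard`, `fn`, `org`, `tel`, and `email` come from hCard; the sample markup, the `HCardParser` class, and the choice to support only this small subset of properties are assumptions made for illustration, not a complete hCard implementation.

```python
from html.parser import HTMLParser

# Small subset of hCard property class names handled by this sketch.
HCARD_PROPERTIES = {"fn", "org", "tel", "email"}


class HCardParser(HTMLParser):
    """Collect text of hCard properties found inside class="vcard" elements."""

    def __init__(self):
        super().__init__()
        self.cards = []    # one dict per vcard found
        self._stack = []   # (tag, property name or None, is vcard root)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._stack.append((tag, prop, is_root))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed void elements).
        while self._stack:
            if self._stack.pop()[0] == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        inside_card = any(is_root for _, _, is_root in self._stack)
        if not text or not inside_card:
            return
        # Assign the text to the innermost enclosing property, if any.
        for _, prop, _ in reversed(self._stack):
            if prop:
                card = self.cards[-1]
                card[prop] = (card.get(prop, "") + " " + text).strip()
                break


# Hypothetical page fragment annotated with hCard class names.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>
  <span class="tel">+1-555-0100</span>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE)
parser.close()
print(parser.cards)
```

Note how little the markup says: `fn` marks a formatted name and nothing more, which illustrates both the simplicity and the "no type hierarchies" limitation listed on the slide.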
Semantifying the Web: Approaches (Top-Down)
  • Focused on leveraging information in existing web pages as-is
  • NLP tools (entity extraction)
      • Calais and TextWise: APIs that recognize people, companies, and places in documents
  • Vertical search engines: ZoomInfo, Spock, and Retrevo
  • Dapper, BlueOrganizer, ClearForest: recognize objects in web pages and annotate them
  • Yahoo! Shortcuts, Snap, Smartlinks: recognize objects in text and links
  • Challenges
      • Not 100% accurate; ambiguities remain
      • May not scale well
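The ambiguity challenge can be illustrated with a toy entity extractor. This is a deliberately naive dictionary lookup, not how Calais, TextWise, or any other tool named above works; the `GAZETTEER` contents and function names are hypothetical.

```python
import re

# Hypothetical gazetteer: surface form -> entity types it could refer to.
GAZETTEER = {
    "Paris": ["place", "person"],
    "Apple": ["company", "thing"],
    "Yahoo": ["company"],
}


def extract_entities(text):
    """Return (surface form, candidate types) for each known name in the text."""
    found = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token in GAZETTEER:
            found.append((token, GAZETTEER[token]))
    return found


def ambiguous(entities):
    """Names that map to more than one candidate type."""
    return [name for name, types in entities if len(types) > 1]


entities = extract_entities("Paris visited the Apple store after the Yahoo meeting")
print(entities)
print("Needs disambiguation:", ambiguous(entities))
```

Without annotations from the page author, the extractor has to guess whether "Paris" is a place or a person, which is why top-down results are never fully precise.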
Map+ add-on for Firefox; vertical search engine Spock
More Annotations -> Structured Web -> More Precise Top-Down
Web 3.0 = Semantic Web = Linked Data: Are They Equal?
Structured Data
  • RDBMS
      • Powerful and flexible
      • Pre-defined relationships and usage of data
      • Too constraining and too structured
      • Schema changes are expensive
      • Virtually impossible to make different databases talk to each other
  • Linked Data
      • Establishes linkages at the data level (RDF)
      • Bridges the gap between unstructured and structured data
      • Does not add any semantic meaning to the information
Linked Data
  • Medium for the Semantic Web
  • It does not create smart data, only enables it
  • Relies on clean, granular, structured data
      • Pre-structured
      • Pre-connected
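A small sketch of what "linkages at the data level" could look like: two independently published datasets that share nothing but an identifier. The URIs, field names, and the `link` helper are all made up for illustration; plain Python dictionaries stand in for RDF here.

```python
# Dataset 1 (hypothetical): artist records keyed by a shared URI.
MUSIC_DB = {
    "http://example.org/artist/42": {"name": "Example Band", "genre": "jazz"},
}

# Dataset 2 (hypothetical): events that point at artists by the same URIs.
EVENTS_DB = [
    {"event": "Summer Festival", "performer": "http://example.org/artist/42", "city": "Lyon"},
    {"event": "Open Mic", "performer": "http://example.org/artist/99", "city": "Oslo"},
]


def link(events, artists):
    """Attach the artist record to each event whose performer URI is known."""
    linked = []
    for event in events:
        artist = artists.get(event["performer"])  # None when the URI is unknown
        linked.append({**event, "artist": artist})
    return linked


linked = link(EVENTS_DB, MUSIC_DB)
for row in linked:
    print(row["event"], "->", row["artist"])
```

The join succeeds only because both sources already use clean, shared identifiers, and the code still has no idea what "genre" means: the link enables smarter use of the data without adding meaning to it.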
Further Reading
  • RDF, OWL, Microformats, FOAF
  • Linked Data
  • Semantic APIs
