
Dealing with the “new” data in the
“Cloud” – Linked Data

London - New York - Dubai - Mumbai 2011

Table of Contents

Definitions
History
The Modigliani Test
Link Data
Raw Data
Resource Description Framework
Linked Data Principles
Publishing Linked Data
Faceted Browsers
On-the-fly Mashups
SPARQL
What is a Linked Data Application
Characteristics of a Linked Data Application
Contact Us

Definitions
RDF: The RDF data model is similar to classic conceptual
modelling approaches such as Entity-Relationship or Class
diagrams, as it is based upon the idea of making statements about
resources (in particular Web resources) in the form of subject-
predicate-object expressions. These expressions are known as
triples in RDF terminology. The subject denotes the resource, and
the predicate denotes traits or aspects of the resource and
expresses a relationship between the subject and the object. For
example, one way to represent the notion "The sky has the colour
blue" in RDF is as the triple: a subject denoting "the sky", a
predicate denoting "has the colour", and an object denoting "blue".
RDF is an abstract model with several serialization formats (i.e.,
file formats), and so the particular way in which a resource or
triple is encoded varies from format to format.

SPARQL: SPARQL (SPARQL Protocol and RDF Query Language,
pronounced "sparkle") is a query language for RDF data, standardized by the W3C.

Linked Data: Linked Data describes a method of publishing
structured data, so that it can be interlinked and become more
useful. It builds upon standard Web technologies, such as HTTP
and URIs, but rather than using them to serve web pages for
human readers, it extends them to share information in a way that
can be read automatically by computers. This enables data from
different sources to be connected and queried.

History

Linked Data Design Issues by Tim Berners-Lee July 2006
Linked Open Data Project WWW2007
First LOD Cloud May 2007
BBC publishes Linked Data 2008
NY Times announcement SemTech2009 - ISWC09
Data.gov.uk publishes Linked Data 2010

The Modigliani Test

• Show me all the locations of all the original paintings
by Modigliani
• Daniel Koller (@dakoller) showed that you can find
them with a SPARQL query on DBpedia

Peak cloud based data - linked data

Search for

Football players who went to the University of
Texas at Austin and played for the Dallas Cowboys as
cornerbacks

Why can’t we just FIND it…

Using the current Web (Internet + links + documents), this
is terribly inefficient

So what is the problem?
• We aren’t always interested in documents
  - We are interested in THINGS
  - These THINGS might be in documents
• We can read an HTML document rendered in a browser and find
what we are searching for
  - This is hard for computers. It’s typically based on
guesswork from some primitive NLP engine, or simple
keyword search

What do we need to do?

Make it easy for computers/software to find THINGS

How can we do that?

• Besides publishing documents on the web
- which computers can’t understand easily
• Let’s publish something that computers can
understand

RAW DATA!
But don’t we already publish raw data in
RDBMS, XML, CSV, etc.?

Yes!

But it’s not in a consistent format, and very
difficult to integrate (or “link”).

For example, how do I know that the
Wael Elrifai on Facebook is the same
as the Wael Elrifai on Twitter?

Don’t we already have a standard
way of publishing on the web?

We have a standardized way of
publishing documents on the web, right?
HTML

Then why can’t we have a standard way
of publishing data on the Web?

Resource Description Framework (RDF)
A data model
• A way to model data
• e.g. relational databases use the relational data model
RDF is a triple data model
• Labeled graph
• Subject, Predicate, Object
<Wael> <was born in> <Beirut>
<Beirut> <is part of> <the Lebanon>
<Wael> <likes> <the Semantic Web>
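To make the triple model concrete, here is a minimal Python sketch (standard library only, not part of the original deck) that stores the three statements above as plain (subject, predicate, object) tuples and looks them up by pattern. Short labels stand in for URIs purely for readability, and the match helper is a hypothetical name chosen for illustration, not an API of any RDF library.

```python
from typing import Iterator, Optional, Set, Tuple

# One RDF statement: (subject, predicate, object).
Triple = Tuple[str, str, str]

# The slide's example statements; plain labels stand in for URIs.
graph: Set[Triple] = {
    ("Wael", "was born in", "Beirut"),
    ("Beirut", "is part of", "the Lebanon"),
    ("Wael", "likes", "the Semantic Web"),
}


def match(g: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples that agree with every given position; None matches anything."""
    for subj, pred, obj in g:
        if ((s is None or subj == s) and (p is None or pred == p)
                and (o is None or obj == o)):
            yield (subj, pred, obj)


# Everything the graph says about Wael, sorted for a stable display order.
for triple in sorted(match(graph, s="Wael")):
    print(triple)
```

Because every fact has the same three-part shape, one lookup function serves any question about the data, which is the property the rest of the deck builds on.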

RDF can be serialized in different ways

RDF/XML
RDFa (RDF in HTML)
N3
Turtle
JSON
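The serializations differ in syntax, not in the statements they carry. As a minimal sketch in Python, the function below writes triples as N-Triples, the simple one-statement-per-line format that Turtle extends: URIs go in angle brackets, literal values in double quotes, and each statement ends with a full stop. The http://example.org/ URIs are hypothetical placeholders, and the sketch deliberately ignores blank nodes, language tags and datatypes.

```python
from typing import List, Tuple, Union


class URI(str):
    """Marks a string as a URI rather than a literal value."""


Term = Union[URI, str]


def term_to_nt(term: Term) -> str:
    """Render one term: <uri> for URIs, "text" for literals."""
    if isinstance(term, URI):
        return f"<{term}>"
    # Escape backslashes first, then quotes and line breaks.
    escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: List[Tuple[URI, URI, Term]]) -> str:
    """One line per statement, each terminated by ' .'."""
    return "".join(f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} .\n"
                   for s, p, o in triples)


EX = "http://example.org/"  # hypothetical namespace for the example
triples = [
    (URI(EX + "Wael"), URI(EX + "wasBornIn"), URI(EX + "Beirut")),
    (URI(EX + "Wael"), URI(EX + "name"), "Wael Elrifai"),
]
print(to_ntriples(triples), end="")
```

The same list of triples could be written out as RDF/XML or Turtle instead; only the rendering function would change.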

So does that mean that I have to
publish my data in RDF now?

You don’t have to… but it sure
would be nice.

Databases back up documents
THINGS have PROPERTIES:
A book has a title, an author, …

Isbn               Title                         Author        PublisherID  ReleaseDate
978-0-596-15381-6  Programming the Semantic Web  Toby Segaran  1            July 2009
…                  …                             …             …            …

PublisherID  PublisherName
1            O’Reilly Media
…            …

This is a THING:
A book titled “Programming the Semantic Web” by Toby Segaran

Let’s represent the data in RDF

book --title--> “Programming the Semantic Web”
book --author--> “Toby Segaran”
book --isbn--> “978-0-596-15381-6”
book --publisher--> Publisher
Publisher --name--> “O’Reilly”
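The step from the tables to the graph can be sketched in a few lines of Python. In this hypothetical mapping (not the deck's own tooling), each book row becomes a resource with its own URI, each column becomes a property, and the PublisherID foreign key becomes a link to the publisher's resource instead of a bare number. The http://example.org/ base URI and the row_uri helper are assumptions for illustration; tools such as D2R Server automate this kind of mapping with their own configuration.

```python
from typing import List, Tuple

BASE = "http://example.org/"  # hypothetical base URI for the published data

# The two tables above, as rows.
books = [{"Isbn": "978-0-596-15381-6", "Title": "Programming the Semantic Web",
          "Author": "Toby Segaran", "PublisherID": 1, "ReleaseDate": "July 2009"}]
publishers = [{"PublisherID": 1, "PublisherName": "O'Reilly Media"}]


def row_uri(table: str, key: object) -> str:
    """Mint a URI for a row from its table name and primary key."""
    return f"{BASE}{table}/{key}"


def rows_to_triples() -> List[Tuple[str, str, str]]:
    triples = []
    for b in books:
        s = row_uri("book", b["Isbn"])
        triples.append((s, "title", b["Title"]))
        triples.append((s, "author", b["Author"]))
        triples.append((s, "isbn", b["Isbn"]))
        triples.append((s, "releaseDate", b["ReleaseDate"]))
        # The foreign key becomes a link to another resource, not a number.
        triples.append((s, "publisher", row_uri("publisher", b["PublisherID"])))
    for p in publishers:
        triples.append((row_uri("publisher", p["PublisherID"]), "name",
                        p["PublisherName"]))
    return triples


for t in rows_to_triples():
    print(t)
```

Turning the foreign key into a URI is the design choice that matters: it is what lets the publisher be described, and later linked, as a THING in its own right.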

Remember that we are on the web

Everything on the web is identified by a URL

And now let’s link the data to other data

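http://…/isbn978 --title--> “Programming the Semantic Web”
http://…/isbn978 --author--> “Toby Segaran”
http://…/isbn978 --isbn--> “978-0-596-15381-6”
http://…/isbn978 --publisher--> http://…/publisher1
http://…/publisher1 --name--> “O’Reilly”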

And now consider the data from Revyu.com

http://…/isbn978 --hasReview--> http://…/review1
http://…/review1 --description--> “Awesome Book”
http://…/review1 --reviewer--> http://…/reviewer
http://…/reviewer --name--> “Wael Elrifai”

Let’s start to link data

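http://…/isbn978 (Revyu) --hasReview--> http://…/review1
http://…/review1 --description--> “Awesome Book”
http://…/review1 --hasReviewer--> http://…/reviewer
http://…/reviewer --name--> “Wael Elrifai”
http://…/isbn978 (Revyu) --sameAs--> http://…/isbn978 (publisher)
http://…/isbn978 (publisher) --title--> “Programming the Semantic Web”
http://…/isbn978 (publisher) --author--> “Toby Segaran”
http://…/isbn978 (publisher) --isbn--> “978-0-596-15381-6”
http://…/isbn978 (publisher) --publisher--> http://…/publisher1
http://…/publisher1 --name--> “O’Reilly”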
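Why the sameAs link matters can be shown with a small Python sketch. It merges the publisher's graph and the Revyu graph, uses a union-find structure to treat URIs joined by sameAs (in practice usually owl:sameAs) as one resource, and then finds the book's reviews even though the title and the review live in different datasets. The short names pub: and revyu: are hypothetical stand-ins for the elided http://…/ URIs, and collapsing equivalent URIs into one representative is only one of several possible strategies.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical short names stand in for the elided http://…/ URIs.
publisher_graph: Set[Triple] = {
    ("pub:isbn978", "title", "Programming the Semantic Web"),
    ("pub:isbn978", "author", "Toby Segaran"),
    ("pub:isbn978", "publisher", "pub:publisher1"),
    ("pub:publisher1", "name", "O'Reilly"),
}
revyu_graph: Set[Triple] = {
    ("revyu:isbn978", "hasReview", "revyu:review1"),
    ("revyu:review1", "description", "Awesome Book"),
    ("revyu:review1", "hasReviewer", "revyu:reviewer"),
    ("revyu:reviewer", "name", "Wael Elrifai"),
    ("revyu:isbn978", "sameAs", "pub:isbn978"),  # the link between the datasets
}


def canonical_map(triples: Set[Triple]) -> Dict[str, str]:
    """Group URIs connected by sameAs (union-find); map each to one representative."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "sameAs":
            root_o = find(o)
            parent[find(s)] = root_o
    return {x: find(x) for x in list(parent)}


def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Union the graphs, renaming sameAs-equivalent URIs to their representative."""
    union: Set[Triple] = set().union(*graphs)
    canon = canonical_map(union)

    def rename(term: str) -> str:
        return canon.get(term, term)

    return {(rename(s), p, rename(o)) for s, p, o in union if p != "sameAs"}


merged = merge(publisher_graph, revyu_graph)
# The title comes from one dataset, the review from the other.
book = next(s for s, p, o in merged
            if p == "title" and o == "Programming the Semantic Web")
reviews = sorted(o for s, p, o in merged if s == book and p == "hasReview")
print(reviews)
```

Without the sameAs statement the two isbn978 URIs stay unrelated and the review list comes back empty, which is exactly the "how do I know it is the same thing" problem raised earlier.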

Data on the Web that is in RDF and
is linked to other RDF data is
LINKED DATA

Linked Data Principles

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up
(dereference) those names.
3. When someone looks up a URI, provide
useful information.
4. Include links to other URIs so that they can
discover more things.
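Principles 2 and 3 can be sketched with Python's standard urllib: the client looks up an HTTP URI and uses the Accept header to ask for an RDF serialization rather than an HTML page (content negotiation). The preferred media types below are an assumption; which formats a given server actually offers varies, and redirects such as 303 See Other are followed automatically by urlopen.

```python
import urllib.request

# Assumed preference order: Turtle first, RDF/XML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Prepare a GET for a Linked Data URI that asks for RDF instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> str:
    """Look the URI up and return the RDF text the server sends back."""
    # urlopen follows HTTP redirects (such as 303 See Other) on its own.
    with urllib.request.urlopen(build_lookup_request(uri), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, dereference("http://dbpedia.org/resource/London") would return DBpedia's description of London, assuming the server is reachable and honours the Accept header.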

Linked Data makes the web appear to be
a single global database!
The same can be done inside your company!

What if you wanted to know your company’s
EBITDA for Catalonia in 2010?

You could have an EDW (enterprise data warehouse) pre-aggregate and
distribute the data, have an analyst calculate it on
the spot, or…

Linked data in your internal semantic
web could relate all transactions to
linked financial formulae!

You ask the question, tell your system
where to look (this can be prebuilt as part of
the question) and voilà!
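A toy Python sketch of this idea follows. Everything in it is hypothetical: the account names, the adds/subtracts properties and the amounts are invented, and EBITDA is simplified to revenue minus cost of sales minus operating expenses (excluding depreciation and amortization). The point is that the formula is stored as linked statements next to the ledger entries, so the question is answered by following links at query time rather than from a pre-aggregated table.

```python
from typing import List, Tuple

# (subject, property, value); values may be strings or numbers.
Triple = Tuple[str, str, object]


def entry(eid: str, region: str, year: int, account: str, amount: float) -> List[Triple]:
    """One hypothetical ledger entry, expressed as four statements about it."""
    return [(eid, "region", region), (eid, "year", year),
            (eid, "account", account), (eid, "amount", amount)]


data: List[Triple] = [
    # The formula is data too (simplified): EBITDA = Revenue - CostOfSales
    # - OperatingExpenses, with depreciation and amortization left out.
    ("EBITDA", "adds", "Revenue"),
    ("EBITDA", "subtracts", "CostOfSales"),
    ("EBITDA", "subtracts", "OperatingExpenses"),
    *entry("e1", "Catalonia", 2010, "Revenue", 1000),
    *entry("e2", "Catalonia", 2010, "CostOfSales", 400),
    *entry("e3", "Catalonia", 2010, "OperatingExpenses", 250),
    *entry("e4", "Catalonia", 2010, "Depreciation", 60),  # excluded by the formula
    *entry("e5", "Madrid", 2010, "Revenue", 800),         # different region
    *entry("e6", "Catalonia", 2009, "Revenue", 900),      # different year
]


def value(subject: str, prop: str) -> object:
    """Return the value of a property (this sketch assumes exactly one per subject)."""
    return next(o for s, p, o in data if s == subject and p == prop)


def evaluate(measure: str, region: str, year: int) -> float:
    """Follow each entry's account link into the measure's formula and sum with signs."""
    signs = {o: 1 for s, p, o in data if s == measure and p == "adds"}
    signs.update({o: -1 for s, p, o in data if s == measure and p == "subtracts"})
    total = 0.0
    for s, p, o in data:
        if (p == "account" and o in signs
                and value(s, "region") == region and value(s, "year") == year):
            total += signs[o] * value(s, "amount")
    return total


print(evaluate("EBITDA", "Catalonia", 2010))
```

With the invented figures above it prints 350.0 (1000 - 400 - 250); changing the formula statements changes the answer without touching any aggregation code.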

I can query a database with SQL. Is
there a way to query Linked Data with a
query language?

Yes! There is actually a standardized
language for that: SPARQL

FIND all the reviews of the book
“Programming the Semantic Web”
by people who live in London

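http://…/isbn978 (Revyu) --hasReview--> http://…/review1
http://…/review1 --description--> “Awesome Book”
http://…/review1 --hasReviewer--> http://…/reviewer
http://…/reviewer --name--> “Wael Elrifai”
http://…/reviewer --sameAs--> http://waelworldwide.com
http://waelworldwide.com --name--> “Wael Elrifai”
http://waelworldwide.com --livesIn--> http://dbpedia.org/London
http://…/isbn978 (Revyu) --sameAs--> http://…/isbn978 (publisher)
http://…/isbn978 (publisher) --title--> “Programming the Semantic Web”
http://…/isbn978 (publisher) --author--> “Toby Segaran”
http://…/isbn978 (publisher) --isbn--> “978-0-596-15381-6”
http://…/isbn978 (publisher) --publisher--> http://…/publisher1
http://…/publisher1 --name--> “O’Reilly”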
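A SPARQL engine answers a question like this by matching a basic graph pattern: a list of triple patterns containing variables, joined on the variables they share. The Python sketch below is a deliberately naive version of that idea (nested-loop join, no indexes) over the graph above, with hypothetical short names revyu: and pub: for the elided URIs. It mirrors what a SPARQL WHERE clause expresses, but it is not a SPARQL implementation.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# The slide's graph; revyu:/pub: are hypothetical names for the elided URIs.
graph: List[Triple] = [
    ("revyu:isbn978", "hasReview", "revyu:review1"),
    ("revyu:review1", "description", "Awesome Book"),
    ("revyu:review1", "hasReviewer", "revyu:reviewer"),
    ("revyu:reviewer", "name", "Wael Elrifai"),
    ("revyu:reviewer", "sameAs", "http://waelworldwide.com"),
    ("http://waelworldwide.com", "livesIn", "http://dbpedia.org/London"),
    ("revyu:isbn978", "sameAs", "pub:isbn978"),
    ("pub:isbn978", "title", "Programming the Semantic Web"),
    ("pub:isbn978", "author", "Toby Segaran"),
]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`; None if they conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable: bind it or check the earlier binding
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:  # a constant that does not match
            return None
    return extended


def bgp(g: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Solve a basic graph pattern by joining the triple patterns one at a time."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended_solutions: List[Binding] = []
        for sol in solutions:
            for t in g:
                b = match_pattern(pattern, t, sol)
                if b is not None:
                    extended_solutions.append(b)
        solutions = extended_solutions
    return solutions


# Reviews of "Programming the Semantic Web" by people who live in London.
query = [
    ("?book", "title", "Programming the Semantic Web"),
    ("?revyuBook", "sameAs", "?book"),
    ("?revyuBook", "hasReview", "?review"),
    ("?review", "hasReviewer", "?reviewer"),
    ("?reviewer", "sameAs", "?person"),
    ("?person", "livesIn", "http://dbpedia.org/London"),
    ("?review", "description", "?text"),
]
for row in bgp(graph, query):
    print(row["?review"], row["?text"])
```

Note that the answer needs facts from three sources (the publisher, Revyu and the reviewer's own site), joined only through shared URIs.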

This looks cool, but let’s be realistic.
What is the incentive to publish
Linked Data?

What was your incentive to publish
an HTML (Intranet) page in 1990?

1) Share data in documents
2) Because your neighbor was doing it

So why should we publish
Linked Data in 2011?

1) Share data as data
2) Because your neighbor is doing it

You’ll be among good company…

Linked Data Publishers
UK Government
US Government
BBC
Open Calais – Thomson Reuters
Freebase
NY Times
Best Buy
CNET
DBpedia

How can I publish Linked Data?

Publishing Linked Data
• Legacy Data in Relational Databases
  - D2R Server
  - Virtuoso
  - Triplify
  - Ultrawrap
• CMS
  - Drupal 7
• Native RDF Stores
  - Databases for RDF (Triple Stores): AllegroGraph, Jena, Sesame, Virtuoso
  - Talis Platform (Linked Data in the Cloud)
• In HTML with RDFa

Consuming Linked Data by Humans

HTML Browsers
RDF can be serialized in RDFa
Have you heard of
• Yahoo’s SearchMonkey
• Google Rich Snippets?
They are consuming RDFa
But WHY?

Because there is life beyond ten
blue links

Google and Yahoo are starting to crawl
RDFa!

The Semantic Web is a reality!

The Reality

• Yahoo is crawling data that is in RDFa and
Microformats using specific vocabularies
  - FOAF
  - GoodRelations

• Google is crawling RDFa and Microformats that
use the Google vocabulary

Linked Data Browsers

Tabulator
• http://www.w3.org/2005/ajar/tab
OpenLink
• http://ode.openlinksw.com/
Zitgist DataViewer
• http://dataviewer.zitgist.com/
Marbles
• http://www5.wiwiss.fu-berlin.de/marbles/
Explorator
• http://www.tecweb.inf.puc-rio.br/explorator

http://dev.semsol.com/2010/semtech/

Time to create new and innovative
ways to interact with Linked Data

This may be one of the Killer Apps that we have all been
waiting for

http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg

Where can I find SPARQL Endpoints?

DBpedia: http://dbpedia.org/sparql
MusicBrainz: http://dbtune.org/musicbrainz/sparql
U.S. Census: http://www.rdfabout.com/sparql
Semantic Crunchbase: http://cb.semsol.org/sparql
More endpoints: http://esw.w3.org/topic/SparqlEndpoints
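A SPARQL endpoint is queried over plain HTTP: the query text is sent as the query parameter of a GET request, and asking for application/sparql-results+json returns the result table in the W3C JSON results format. The Python sketch below builds such a request and flattens the JSON bindings into dictionaries. Whether the endpoints listed above are still online is not assumed; the parsing step can be tried offline on a hand-written result document.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """GET request carrying the query in the 'query' parameter (SPARQL protocol)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_results(body: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results to one {variable: value} dict per solution."""
    doc = json.loads(body)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in doc["results"]["bindings"]]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> List[Dict[str, str]]:
    """Send the query and parse the response (needs network access)."""
    with urllib.request.urlopen(build_query_request(endpoint, query),
                                timeout=timeout) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

For example, run_query("http://dbpedia.org/sparql", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5") would return up to five rows, each a dict keyed by "s", if the endpoint is reachable. Flattening drops the type information (URI or literal) that each JSON cell also carries, which is acceptable for display but not for every use.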

• Querying a single dataset is quite boring
compared to:
• Issuing SPARQL queries over multiple datasets

• How can you do this?
1. Issue follow-up queries to different endpoints
2. Query a central collection of datasets
3. Build a store with copies of relevant datasets
4. Use a query federation system

Follow-up Queries

• Idea: issue follow-up queries over other
datasets based on results from previous
queries
• Substituting placeholders in query templates
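Here is a minimal Python sketch of the template idea, using the standard string.Template class. The template text, the $resource placeholder and the safe_iri check are hypothetical choices: each URI bound in a first result set is checked for characters that cannot appear inside a SPARQL <…> IRI reference and then substituted into a query aimed at another dataset. Each resulting string could then be sent to a second endpoint.

```python
from string import Template
from typing import Dict, Iterable, List

# Hypothetical template: describe whatever resource the first query returned.
FOLLOW_UP = Template("SELECT ?p ?o WHERE { <$resource> ?p ?o } LIMIT 100")

# Characters that may not appear inside a SPARQL <IRI> reference.
FORBIDDEN = set('<>"{}|^`\\')


def safe_iri(uri: str) -> str:
    """Reject values that would break out of the <...> in the template."""
    if any(ch in FORBIDDEN or ord(ch) <= 0x20 for ch in uri):
        raise ValueError(f"not usable as an IRI: {uri!r}")
    return uri


def follow_up_queries(first_results: Iterable[Dict[str, str]], var: str) -> List[str]:
    """Build one follow-up query per value bound to `var` in the first results."""
    return [FOLLOW_UP.substitute(resource=safe_iri(row[var]))
            for row in first_results if var in row]


print(follow_up_queries([{"city": "http://dbpedia.org/resource/London"}], "city"))
```

Checking the URI before substitution matters because a value containing > or a space would otherwise break, or silently change, the follow-up query.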

Getting Started

• Finding URIs
• Finding Additional Data
• Finding SPARQL Endpoints

What is a Linked Data application

Software system that makes use of data on the
web from multiple datasets AND that benefits
from links between the datasets

Characteristics of Linked Data Applications

• Consume data that is published on the web following
the Linked Data principles
• Discover further information by following the links
between different data sources
• Combine the consumed linked data with data from
other sources (not necessarily Linked Data)
• Expose the combined data back to the web
following the Linked Data principles
• Offer value to end-users

Examples

• http://data-gov.tw.rpi.edu/wiki
• http://dbrec.net/
• http://fanhu.bz/
• http://data.nytimes.com/schools/schools.html
• http://sig.ma
• http://visinav.deri.org/semtech2010/

Hot Research Topics

• Interlinking Algorithms
• Provenance and Trust
• Dataset Dynamics
• UI
• Distributed Query

Contact

PEAK Consulting Headquarters
90 Long Acre, Covent Garden
London WC2E 9RZ
United Kingdom
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478

United States
11 Penn Plaza, 5th floor
New York, NY 1000
United States
Tel: +1 (212) 946 4824
Fax: +1 (212) 946 2801

United Arab Emirates
Unit P12 Rimal, The Walk
PO Box 487 177 Dubai
United Arab Emirates
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478

http://www.peakconsulting.eu
info@peakconsulting.eu
