
Dean Allemang
(870) Semantic Mash-ups using RDF, RSS and Microformats
Peer-Refereed Talk
Tuesday, 2007-06-26, 13:40 - 14:20, Arena 6
- Dean Allemang - TopQuadrant Inc. (speaker)
- Holger Knublauch
- Willie Milnor
Abstract
The name "mash-up" started out as a reference in music, where two or
more music sources were brought together into a single work. All too often, a
web mash-up refers to a site that takes information from a single web site and
displays it in a novel way, like having Google Maps display all the coffeeshops
in a certain zip code. While this is a useful thing to do, it seems odd to call
it a mash-up, since there is only one source for the information. Combining
information from multiple sources and displaying it, even through an open API
like Google Maps, requires a program that translates each source into the API.
The mashing-up is under program control, that is, under the control of the
programmer.
In this presentation we outline the idea of a semantic mash-up, where the
mash-up program is a model-driven architecture. This puts the structure of the
mash-up under model control, rather than program control. It is still necessary
to translate each information source into a semantic structure (i.e., RDF), but
once that has been done, the structure of the mash-up is specified by a model,
rather than by program code.
Seen this way, Semantic Mash-ups are an example of model-driven architecture
(MDA), where a program is specified by a model rather than by program code. The
advantage of MDA versus conventional system construction by programming is that
modeling is ostensibly more accessible to a wider class of users than
programming. The holy grail of MDA is to empower "business users" to
construct systems the way they want to, without a need for intervention by a
programmer.
While MDA is an attractive idea in principle, often it turns out that the
process of modeling, which is supposed to be a non-technical activity, is just
as technical as writing a program in a general-purpose language. If Semantic
Mash-ups were just MDA in disguise, there would be no reason to pay them much
heed.
We argue, however, that because Semantic Mash-ups do not attempt to provide
anything close to a general-purpose system construction capability, it is a
lot easier for the process of modeling to be accessible to a general audience.
Describing how to combine information in a mash-up is a fairly simple modeling
task, one that can actually be done by users with fewer technical skills than
are required by a more ambitious general-purpose programming language.
The W3C standard language for sharing information on the semantic web, RDF, is
the ultimate mash-up language. It provides an elegant framework to describe
information based on the global naming convention that is already in use
throughout the World Wide Web, the URI. Merging information from multiple
sources is a simple merging process in which all information about a particular
"resource" (URI) is brought together from multiple sources into one
place.
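The merging step described above can be sketched in a few lines of Python. This
is only an illustration, not TopBraid or Simile code: each source is modeled as
a plain set of (subject, predicate, object) triples, and every URI and value
shown is a hypothetical example. A full RDF merge also has to keep blank nodes
from different sources apart, which this sketch ignores.

```python
# Each source is a set of (subject, predicate, object) triples.
# All URIs and literal values here are hypothetical, for illustration only.
VOCAB = "http://example.org/vocab#"
CAFE = "http://example.org/cafe/42"

source_a = {
    (CAFE, VOCAB + "name", "Bean There"),
    (CAFE, VOCAB + "zip", "94103"),
}
source_b = {
    (CAFE, VOCAB + "name", "Bean There"),
    (CAFE, VOCAB + "rating", "4.5"),
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, uri):
    """Return every (predicate, object) pair known about one resource."""
    return sorted((p, o) for (s, p, o) in graph if s == uri)


merged = merge(source_a, source_b)
for predicate, value in describe(merged, CAFE):
    print(predicate, value)
```

Because both sources use the same URI for the coffee shop, no matching logic is
needed: the union alone brings the name, zip code and rating together.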
The idea of using RDF as the basis for a mash-up was pioneered by the MIT Simile
project in 2005. Simile tools could be used to convert information from a number
of sources into RDF, merge them into a single source, then display them in a
number of ways, including maps (using the Google Map API), faceted search,
timelines and graph displays.
The Simile tools do not take advantage of the other layers of the W3C semantic
web stack, in particular, RDFS and OWL. As we shall see, these languages are
indispensable for allowing a semantic model to describe how to combine
information from multiple sources, i.e., for making a semantic mash-up model.
In this presentation, we will describe a system we have built for enabling
semantic mash-ups. The system, called TopBraid, is based on the W3C standard
languages for semantic modeling (RDF, RDFS and OWL) and is built using the Eclipse
platform. The system supports the following user tasks for constructing a
semantic mash-up:
1) Semantic Mash-up designers use the desktop interface TopBraid Composer to
describe what information should be combined and in what way for a
semantic mash-up
2) Plug-in programmers build simple interfaces from RDF to a display plug-in
(e.g., Maps, Calendars, spreadsheets, timelines, etc.)
3) Content providers mark up information in a way that makes it more amenable to
mashing up (using microformats or RDFa)
The system is deployed using the Eclipse framework server-side; all plug-ins and
models that are available in the desktop environment are also available in the
delivery system. Any display that is available in the Eclipse rich client is
also available for presentation on a web browser (thin client).
Data sources that are already in RDF-compliant forms (e.g., RDFa, RSS 1.0,
GRDDL-enabled microformats or even RDF/XML itself) are imported directly into
TopBraid Composer at mash-up design time. Other data sources can be converted
into RDF using an automated mark-up strategy pioneered by the Simile project's
Solvent tool, whereby web page fragments are selected and marked up with
semantic metadata.
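As a rough illustration of what converting a non-RDF source involves, the
sketch below maps the fields of one extracted record to triples. It does not
reproduce Solvent or TopBraid; the namespace, the field names and the
field-to-predicate mapping are all assumptions made for this example.

```python
# Hypothetical namespace and field mapping, chosen only for illustration.
BASE = "http://example.org/"

FIELD_TO_PREDICATE = {
    "title": BASE + "vocab#name",
    "lat": BASE + "vocab#latitude",
    "long": BASE + "vocab#longitude",
}


def record_to_triples(record_id, record):
    """Turn one extracted record (a dict) into a set of triples.

    Fields without a mapping are skipped rather than guessed at.
    """
    subject = BASE + "item/" + record_id
    return {
        (subject, FIELD_TO_PREDICATE[field], value)
        for field, value in record.items()
        if field in FIELD_TO_PREDICATE
    }


scraped = {"title": "Bean There", "lat": "37.77", "phone": "555-0100"}
triples = record_to_triples("42", scraped)
for triple in sorted(triples):
    print(triple)
```

Once a source has been reduced to triples in this way, it can be merged with
sources that were in RDF from the start.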
Once these sources have been imported into TopBraid Composer, it is a simple
matter to describe various combinations of information using the constructs of
RDFS and OWL. RDFS provides a class structure in which sets of individuals are
described as Classes; a mash-up of multiple sets (each set representing
information from a different source) is specified as a common superclass of the
classes to be mashed up. The semantics of RDFS imply that the display of a
class includes all members of its subclasses. This allows a modeler to define
several layers of mash-ups, depending on the level of detail that is useful for
a particular display.
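The superclass idea can be illustrated with a small Python sketch. It mimics
only the RDFS rule that a member of a subclass is also a member of its
superclasses; it is not how TopBraid implements inference. The class and
instance names are hypothetical, and each class is assumed to have at most one
direct superclass in an acyclic hierarchy, to keep the example short.

```python
# Hypothetical class hierarchy: each source class sits under a mash-up class.
subclass_of = {
    "CoffeeShop": "Venue",
    "Bookstore": "Venue",
    "Venue": "PlaceOfInterest",
}

# Hypothetical instances, each typed with the class its source assigned.
instance_of = {
    "cafe42": "CoffeeShop",
    "books7": "Bookstore",
    "park3": "PlaceOfInterest",
}


def superclasses(cls):
    """Yield cls followed by each of its ancestors, nearest first."""
    while cls is not None:
        yield cls
        cls = subclass_of.get(cls)


def members(target):
    """Return all instances of target, including those of its subclasses."""
    return sorted(
        inst for inst, cls in instance_of.items() if target in superclasses(cls)
    )


print(members("CoffeeShop"))
print(members("Venue"))
print(members("PlaceOfInterest"))
```

Displaying "Venue" mashes up the two source classes, while displaying
"PlaceOfInterest" gives a coarser layer that includes everything beneath it.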
The system has been deployed with a small number of display plug-ins for maps,
spreadsheets, calendars and timelines, and forms the focus of a semantic web
training course that TopQuadrant runs at regular intervals. During the two-day
hands-on session, course participants create their own semantic mash-up by
finding and creating information sources, merging them together, and displaying
them using the plug-ins provided with TopBraid.
Future work includes incorporating more display modes in the form of Eclipse
plug-ins, providing more capabilities to support markup of unstructured data,
and building translators for information from other structured data sources.







