2. Semantic Web Changes
If the Semantic Web means anything, it means changing the Web's infrastructure such that information exchanges between computers alone become as ubiquitous, cheap, and easy as exchanges between humans, mediated by the Web, are already. One vital goal, however, is to make inter-machine exchanges possible without doing permanent damage to the ecology of the Web: inter-machine exchanges are not meant to replace or supplant inter-human ones, merely to supplement them.
2.1 Beyond Hypertext
So far we've argued that the Web's hypertext model, though expressively impoverished in comparison to other hypertext models, has been widely successful in and across a great many parts of society, including higher education. The differences between text and hypertext have called forth and made possible interesting differences in the way academic communities constitute themselves and enact their scholarly practices.
The success of the Web suggests, however, that the network effect is more important than the expressivity of the hypertext model. In some sense the fact that millions of people are engaged in a wide diversity of interesting projects and activities using the Web overwhelms the fact that the Web's hypertext model is relatively inexpressive. It is rather astonishing to explore the rich webs of signification and linkage which have been created on the Web with only the lowly, unidirectional link. PageRank, the algorithm at the heart of Google's search ranking, is based on the unidirectional link, together with some assumptions, which turn out to be mostly correct, about popularity and relevance. That is, we end up getting a lot of power out of a relatively inexpressive hypertext model, with its untyped, unidirectional link, combined with the network effect.
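To see how much can be squeezed out of untyped, one-way links, here is a minimal sketch of the textbook PageRank power iteration. The page names and link structure are hypothetical, and the damping factor of 0.85 is the conventional illustrative value; this is the simplified published formulation, not Google's production ranking system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank computed by power iteration.

    links maps each page to the pages it links to. Links are untyped and
    one-way: nothing is known about why one page points at another.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: most pages point back at the home page.
links = {
    "home": ["snow", "two-cultures"],
    "snow": ["home"],
    "two-cultures": ["home", "snow"],
    "blog": ["home"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:13} {score:.3f}")
```

The most-linked-to page ("home") comes out on top, while "blog", which nothing links to, keeps only the baseline share; no link types were needed to get there.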
Thus, as we begin to see some of the building blocks of the Semantic Web put into place, we anticipate that there will be new practices and institutions that are called forth by these new technologies (just as these new technologies are themselves being called forth by a different set of practices and institutions). As we've focused so far on the transition from text to hypertext, we'll now take up the transition from hypertext to hypertextual knowledge representation or hyperkrep.
There are at least two technologies, in addition to the existing Web infrastructure itself, which are key to the Semantic Web: RDF and OWL. RDF, the Resource Description Framework, is an assertional knowledge representation language, commonly serialized as an XML vocabulary (RDF/XML), which allows anyone to say anything about anything. How does it accomplish this? The first point to make is that RDF is based on a formally specified semantics, grounded in model theory.
The main idea behind RDF is that knowledge can be represented as a graph of directed, labeled arcs; one makes assertions about a thing by associating subjects and objects by way of predicates. Put the other way around, RDF graphs are full of things called "triples", which are three-tuples, or assertions, containing subject, predicate, and object terms. What makes RDF particularly useful in the context of the Web and the Semantic Web is that the value of these terms -- subject, predicate, object -- may each be a URI (the object may also be a literal value, such as a string). "URI" stands for Uniform Resource Identifier; it is the term most commonly used for what was formerly called a URL, or Uniform Resource Locator.
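The triple model is small enough to sketch directly. The following is an illustrative model of our own, not the API of any RDF library: a triple is a named tuple, a graph is a set of triples, and a query is a pattern in which None matches any term. The example.org URIs are placeholders, and for brevity the sketch uses plain strings for both URIs and literal values, which real RDF keeps distinct.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # a URI
    predicate: str  # a URI naming the relationship: the label on the arc
    object: str     # a URI, or a literal value such as a string


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples in graph that fit the pattern; None is a wildcard."""
    return {
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    }


A = "http://example.org/a"
graph = {
    # Two directed, labelled arcs leaving the same node.
    Triple(A, "http://example.org/linksTo", "http://example.org/b"),
    Triple(A, "http://example.org/label", "Node A"),
}

# Everything the graph asserts about A:
for t in sorted(match(graph, s=A)):
    print(t.predicate, "->", t.object)
```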
Let's take a concrete, if contrived and simplistic, example. You are a philosopher of science and a member of the (mythical, as far as we know) C.P. Snow Society. The society maintains a presence on the Web at http://www.cpsnow.org/, which includes a few notable resources: a page about C.P. Snow himself, http://www.cpsnow.org/cpsnow, and a page about his famous little book, The Two Cultures and the Scientific Revolution, http://www.cpsnow.org/two-cultures. Imagine, further, that you would like to represent some knowledge; for example, "C.P. Snow wrote a book called The Two Cultures and the Scientific Revolution".
How might you go about encoding some bits of knowledge such that Semantic Web agents could interpret them? Let's begin by rewriting our simple sentence in a longer but slightly more literal form: "There is a book that is titled 'The Two Cultures...' and its author is 'C.P. Snow'". More awkward, more wooden, and more verbose, but this version of our sentence is semantically equivalent.
How might we encode this strange set of sentences in RDF? That is, how might we encode it as a set of three-tuples of the form (subject, predicate, object)? First we will give the encoding, then we will explain it:
(http://www.cpsnow.org/two-cultures, rdf:type, cpss:book)
(http://www.cpsnow.org/two-cultures, dc:author, http://www.cpsnow.org/cpsnow)
(http://www.cpsnow.org/two-cultures, dc:date, "...")
(http://www.cpsnow.org/two-cultures, dc:title, "The Two Cultures and the Scientific Revolution")
What have we done here? First, we've said that the web resource http://www.cpsnow.org/two-cultures is (or, more accurately, represents a thing which is) a book. The term form "xxx:yyy" is a kind of abbreviation, known as an XML qualified name or "qname". It means that we're using a term from a particular vocabulary, or set of terms, identified by the prefix; often that vocabulary is an existing one rather than one we made up ourselves. The RDF specifications from the World Wide Web Consortium (W3C) specify that "rdf:type" is a term which means, roughly, "is-a". You can read that first triple as, roughly, "the web resource http://www.cpsnow.org/two-cultures is of the type cpss:book". Perhaps the C.P. Snow Society doesn't know or approve of existing sets of terms which define "book", so it has defined its own, using the prefix "cpss".
The second triple can be read as saying that "there is a web resource, http://www.cpsnow.org/two-cultures, whose author is the entity which another web resource, http://www.cpsnow.org/cpsnow, is or represents". We know the first of these web resources, the one in the subject position in the second triple, is a book, because that was the assertion made in the first triple. Putting these together, we've now said that there is a book, identified by such-and-such a web resource, which was authored by some entity, identified in turn by such-and-such a web resource.
Lastly, the two final triples say that this web resource, which we now know to be a book, has a specific date and the title "The Two Cultures...". Rather than making up our own terminology for date and title, we use the well-known Dublin Core metadata standard, denoted by its common qname prefix "dc:".
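To make the qname abbreviation and the "machine-readable format" concrete, here is a minimal sketch that expands qnames into full URIs and writes the type and title triples above in N-Triples, a simple line-based RDF syntax. The rdf:, dc: and foaf: namespace URIs are the published ones; the cpss: namespace URI is a hypothetical stand-in. The expansion is plain concatenation and does not check that a vocabulary actually defines the resulting term.

```python
# Prefix -> namespace URI. rdf:, dc: and foaf: are the published namespaces;
# the cpss: namespace is a hypothetical one for the C.P. Snow Society.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "cpss": "http://www.cpsnow.org/terms#",
}


class Literal(str):
    """Marks a plain string value, as opposed to a URI or a qname."""


def expand(qname):
    """Expand a qname such as 'dc:title' by concatenating namespace and local name."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


def term(value):
    """Render one term in N-Triples syntax."""
    if isinstance(value, Literal):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    if "://" in value:
        return f"<{value}>"  # already a full URI
    return f"<{expand(value)}>"  # a qname


def to_ntriples(triples):
    """One triple per line: subject, predicate, object, then a full stop."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


book = "http://www.cpsnow.org/two-cultures"
triples = [
    (book, "rdf:type", "cpss:book"),
    (book, "dc:title", Literal("The Two Cultures and the Scientific Revolution")),
]
print(to_ntriples(triples))
```

Once expanded, a term such as the Dublin Core title URI is a global name: any program that reads these lines can tell it apart from a "title" someone else defined.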
That's not so difficult. We've expressed a helpful bit of knowledge, and we've done so in a way that can easily be turned into a format that Semantic Web agents can understand -- a format backed by a rigorous, formal semantics. Now, suppose we want to say a bit more about C.P. Snow himself, the natural person? We can start to see a bit of the promised power of the Semantic Web by taking this question a little further.
Even though all of the web resources discussed so far are mythical, there is a good chance that you have been assuming a particular thing about them, namely, that if there were such resources on the Web, what you would find when you used your web browser to visit them would be some HTML. That's a perfectly reasonable assumption, given the past 10 or so years of history and experience with the Web. That is, if you pointed your browser at http://www.cpsnow.org/two-cultures you would expect to see a page describing the book in HTML.
But, in another sense, it's dead wrong. And here's why. The existing Web works because web resources represent (and, sometimes, just are) interesting things in the world. And these resources, standing in for (or being) interesting things in the world, often point to other resources, which in turn stand in for (or are) other interesting things in the world. Imagine, then, that instead of finding HTML, meant for human consumption, at those web resources, one could find RDF meant for machine consumption. So, instead of (or in addition to) finding an HTML page giving the biographical details of C.P. Snow, one may find an RDF document which includes the following triples:
(http://www.cpsnow.org/cpsnow, rdf:type, foaf:Person)
(http://www.cpsnow.org/cpsnow, foaf:name, "Charles Percy Snow")
(http://www.cpsnow.org/cpsnow, foaf:img, http://www.cpsnow.org/cpsnow.jpg)
(http://www.cpsnow.org/cpsnow, foaf:gender, "male")
You can read the first triple as saying, roughly, that "there is a web resource, http://www.cpsnow.org/cpsnow, which represents a natural person". In this case we're using the term foaf:Person, which means we're using the term "Person" drawn from a vocabulary called "Friend of a Friend" (FOAF), a common way to represent information about natural persons on the Semantic Web. The second triple says that "the natural person this web resource represents is named 'Charles Percy Snow'"; the third, that an image of that person can be found at http://www.cpsnow.org/cpsnow.jpg; and the fourth, that the person is of the male gender.
Note that the network effect is once again present! The C.P. Snow Society let the Dublin Core folks define terms for publication metadata and let the Friend of a Friend vocabulary define terms for describing people. DC and FOAF, in turn, may link to other documents that represent other types of information, and so on. Instead of every document making up its own representation, documents are linked into a Web of semantic representation.
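A minimal sketch of that network effect, under clearly stated assumptions: a hypothetical in-memory dictionary stands in for the Web (no real HTTP requests are made), the book's RDF is assumed to state its author with the book as subject (book, dc:author, person), and all terms are plain strings. An agent starts at the book's URI, follows every URI it finds that has RDF behind it, and merges what it fetches into one graph, so that a single query can use Dublin Core and FOAF terms together.

```python
from collections import deque

BOOK = "http://www.cpsnow.org/two-cultures"
SNOW = "http://www.cpsnow.org/cpsnow"

# Hypothetical stand-in for the Web: URI -> triples found when dereferencing it.
WEB = {
    BOOK: [
        (BOOK, "rdf:type", "cpss:book"),
        (BOOK, "dc:title", "The Two Cultures and the Scientific Revolution"),
        (BOOK, "dc:author", SNOW),  # assumed direction: the book's author is Snow
    ],
    SNOW: [
        (SNOW, "rdf:type", "foaf:Person"),
        (SNOW, "foaf:name", "Charles Percy Snow"),
        (SNOW, "foaf:img", "http://www.cpsnow.org/cpsnow.jpg"),
        (SNOW, "foaf:gender", "male"),
    ],
}


def crawl(start):
    """Breadth-first: fetch RDF at start, follow subject/object URIs, merge everything."""
    graph, seen, queue = set(), {start}, deque([start])
    while queue:
        for triple in WEB.get(queue.popleft(), []):
            graph.add(triple)
            for node in (triple[0], triple[2]):
                if node in WEB and node not in seen:
                    seen.add(node)
                    queue.append(node)
    return graph


graph = crawl(BOOK)

# One query spanning two vocabularies: dc: for the book, foaf: for its author.
for book, _, author in sorted(t for t in graph if t[1] == "dc:author"):
    names = [o for s, p, o in graph if s == author and p == "foaf:name"]
    print(book, "was written by", names)
```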
One may quickly see, or so we think, that if a great many affinity groups within higher education -- study groups, learned societies, scholarly conferences and colloquiums, departments, colleges, seminars, groups of students, groups of students and a faculty member, and so on -- develop in the next five years even one hundredth as many RDF resources as they have created HTML resources in the past five years, then the Semantic Web will become very rich in knowledge: knowledge that machines and agents can discover and consume.
OWL is a newly developed ontology language for the Web. An ontology language is a means by which one can formally describe a knowledge domain, with the goal of enabling computers to provide various kinds of reasoning services about that domain, and about the knowledge described by an ontology for that domain. In our current, technical usage, an ontology is a formal specification of a knowledge domain: what individuals and classes of individuals there are in that domain, the relationships which obtain between these individuals and classes, their proper and apparent parts, and so on. Thus, using OWL one can formally specify a knowledge domain, describing its most salient features and constituents, and then use that formal specification to make assertions about what there is in that domain. You can feed all of that to a computer which will reason about the domain and its knowledge for you. And, here's the most tantalizing bit, you can do all of this on, in, and with the Web, in both interesting and powerful ways.
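OWL reasoners do far more than this, but the basic idea of handing a computer a domain description plus some facts and letting it draw conclusions can be shown with a tiny forward-chaining sketch. It applies only two RDFS-style subclass rules (subclass links are transitive; an instance of a class is an instance of its superclasses) until nothing new follows. The classes cpss:Work and cpss:Artifact are hypothetical additions for illustration.

```python
def infer(triples):
    """Apply two subclass rules repeatedly until no new triples follow:
    1. A subClassOf B and B subClassOf C  =>  A subClassOf C
    2. x type A and A subClassOf B        =>  x type B
    """
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == "rdfs:subClassOf"}
        new = {(a, "rdfs:subClassOf", c)
               for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "rdf:type", sup)
                for x, p, cls in graph if p == "rdf:type"
                for c, sup in sub if c == cls}
        if new <= graph:  # fixpoint reached: nothing new was derived
            return graph
        graph |= new


BOOK = "http://www.cpsnow.org/two-cultures"
ontology_and_facts = {
    ("cpss:book", "rdfs:subClassOf", "cpss:Work"),      # hypothetical class
    ("cpss:Work", "rdfs:subClassOf", "cpss:Artifact"),  # hypothetical class
    (BOOK, "rdf:type", "cpss:book"),                     # the asserted fact
}
graph = infer(ontology_and_facts)
print(sorted(o for s, p, o in graph if s == BOOK and p == "rdf:type"))
# ['cpss:Artifact', 'cpss:Work', 'cpss:book']
```

Only one fact about the book was asserted; the other two types follow from the (tiny) ontology.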
Two brief points: First, we all spend some amount of our brain power -- almost entirely without consciously knowing that this is what we are doing -- dealing with informal, implicit ontologies. In order to act meaningfully at all within particular social contexts, we need to have understood something roughly like an ontology of that context. In any situation or context there will be features which we attend to, because they just are the salient features of that context, and an even larger number of things about the situation which we do not attend to, which we cannot even call features, because they are the background noise against which salience emerges. Second, unlike humans, computers can provide reasoning services over a knowledge domain only when the domain and the knowledge have been formally and rigorously specified in advance, and only because some human has implemented various reasoning algorithms in a way which that computer can apply.
From these two points we may be able to conclude that ordinary people, with the right support and motivation, can learn to use the formal tools of computerized ontology languages, like OWL, to represent the things which they already know in a way which computers can then reason about, as a supplement and aid to human interests. It's worth noting that the alternative, expecting the computer to understand and reason with human concepts and language, is far beyond the current state of the art, if achievable at all.
So far nothing we have said about ontology languages and reasoning systems is specific to OWL as an ontology language for the Web. However, OWL has been specifically crafted out of its Webbish forerunners, particularly SHOE and DAML+OIL, to take advantage of some of the interesting things about the Web. OWL is intended to be an ontology language that has the following features: it should operate at the scale of the Web; it should be distributed across many systems, allowing people to share ontologies and parts of ontologies; it should be compatible with the Web's ways of achieving accessibility and internationalization; and it should be, relative to most prior knowledge representation systems, easy to get started with, non-proprietary, and open. In short, OWL was based on the same principles we mentioned about the Web itself much earlier in this discourse -- openness and scalability to allow a network effect.
Insofar as OWL accomplishes or will accomplish these goals, it will do so by virtue of the fact that it was designed by a collection of knowledge representation and Web experts, with the explicit goal of making a formal knowledge representation (KR) language work on the world's first globally distributed hypermedia system. This is a relatively new thing to aim at in the history of KR systems. In some ways, the OWL Working Group (WG) is among the most ambitious of the W3C's many WGs. It is often said of W3C WGs that they are not meant to do new work, that is, to do new research into some field; rather, they are meant to standardize and specify things which are already known in such a way that makes open computing possible and proprietary vendor lock-in improbable. In the case of the OWL WG, however, this general rule was broken. While OWL has precursors, the most important of which is DAML+OIL, it took a non-trivial amount of real, new technical work to make OWL into a practical ontology language for the Web.
Despite our enthusiasm for OWL, we have to temper it with a dose of realism. OWL can be and probably is everything good which people have said about it; if so, that in and of itself will not mean that the Semantic Web visions will be widely achieved. Whether or not the Semantic Web ever happens, in as robust and important a sense as the original Web happened, depends on a complex set of factors and their interactions, only some of which are under anyone's direct control.
Having OWL means a few things are no longer true. First, it is no longer true that the Semantic Web can be dismissively written off as a bit of magical, wishful thinking on the part of some Utopian-leaning technologists. OWL provides a real foundation for the Semantic Web, rooted in the rich research and engineering tradition of KR and description logics. Second, it is no longer true that RDF and RDF Schemas are the obvious choices for a certain class of Web applications. OWL will soon be considered in some cases a better choice than RDF alone; it is more expressive and, in the OWL Full variant, upwardly compatible with RDF.
To see how OWL can be used, we return to our earlier example. Suppose the C.P. Snow Society wants to organize the bibliographic information it has already encoded in RDF. To take a simple example, it would like to distinguish between works by Snow and works about him. In OWL, we can express these concepts using class expressions, in particular, restrictions on the various properties a work has. For example, the class of works by C.P. Snow is just the set of works which have http://www.cpsnow.org/cpsnow (the person designated by this URI) as their dc:author, while the class of works about C.P. Snow is just the set of works which have http://www.cpsnow.org/cpsnow as (one of) their dc:subject(s). We can easily express these definitions in OWL, give names to these concepts (e.g., http://www.cpsnow.org/WorksByCPSnow and http://www.cpsnow.org/WorksAboutCPSnow), and expect an OWL system to correctly infer which of the works we've already described fall into which class. The C.P. Snow Society can build upon these concepts to express the distinction between works and articles written solely by Snow and collaborative works (e.g., by defining WorksByOnlySnow as the subclass of WorksByCPSnow whose members have exactly one author, and CollaborationsWithSnow as the subclass of WorksByCPSnow whose members have at least one author who isn't Snow).
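The definitions above can be approximated in a few lines of Python over RDF-style triples. This is a sketch, not OWL: the work URIs other than two-cultures, and the coauthor and biographer URIs, are hypothetical, and counting authors makes a closed-world assumption that the recorded author list is complete. An OWL reasoner works under the open-world assumption, without assuming that different URIs name different individuals, so it would place a work in WorksByOnlySnow or CollaborationsWithSnow only if the exact number of authors, or an author's being distinct from Snow, were asserted or entailed.

```python
SNOW = "http://www.cpsnow.org/cpsnow"

# Hypothetical bibliographic facts (work, property, value).
facts = {
    ("http://www.cpsnow.org/two-cultures", "dc:author", SNOW),
    ("http://www.cpsnow.org/joint-essay", "dc:author", SNOW),
    ("http://www.cpsnow.org/joint-essay", "dc:author", "http://example.org/coauthor"),
    ("http://www.cpsnow.org/snow-biography", "dc:author", "http://example.org/biographer"),
    ("http://www.cpsnow.org/snow-biography", "dc:subject", SNOW),
}


def values(work, prop):
    """All recorded values of prop for work."""
    return {o for s, p, o in facts if s == work and p == prop}


works = {s for s, _, _ in facts}

# Analogous to owl:hasValue restrictions on dc:author and dc:subject.
works_by_snow = {w for w in works if SNOW in values(w, "dc:author")}
works_about_snow = {w for w in works if SNOW in values(w, "dc:subject")}

# Closed-world approximations of the two subclasses of WorksByCPSnow.
works_by_only_snow = {w for w in works_by_snow if values(w, "dc:author") == {SNOW}}
collaborations_with_snow = {w for w in works_by_snow
                            if values(w, "dc:author") - {SNOW}}

print(sorted(works_by_only_snow))
print(sorted(collaborations_with_snow))
```

The biography lands in the "about" class but not the "by" class, and the joint essay lands in CollaborationsWithSnow; in an OWL version these class definitions would live in the published ontology rather than in program code.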
While helpful for organizing the C.P. Snow Society's Web site, such an ontology only becomes interesting, and only becomes a true Web ontology, when it is published on the Web, along with the facts (expressed in RDF) it is meant to organize, for all and sundry to examine, use, extend, or dispute. Anyone, anywhere on the Web, could then take the facts and impose an alternative or rival organization upon them, or take both the facts and the ontology and refine the ontology in greater detail. In this way, the Semantic Web enables non-coordinated (and even non-cooperative) collaboration about a domain of discourse, one in which the conceptual work is aided and abetted by programs. Not only will our Web agents find and aggregate information from the Web (without fragile and error-prone "scraping" of HTML pages), but they will also be able to give some initial guidance about whether certain aggregations make sense.