Linked Data

Originally published on July 25, 2016 by Steve Meyer

Here we will be collecting information about Linked Data initiatives at the UW-Madison Libraries. First up, we are sharing an internal position paper that provides the philosophical foundation for our focus on using Linked Data to enhance library discovery.

Linked Data & Libraries: Where to Begin?

With many library Linked Data projects underway, one of the key questions facing libraries is, “Where to begin?” Many of the major initiatives getting attention involve efforts to convert the library catalog wholesale out of MARC and into RDF: think BIBFRAME, WorldCat Linked Data, and the RDA Registry. Such efforts are massive in scale and are primarily the purview of the maintainers of cooperative data sets and vendor systems.

However, there is another frame of reference for approaching Linked Data in libraries: can we enhance the library discovery experience with data sets that are not traditionally curated by libraries? How do we as librarians avail ourselves of this rich data? What kinds of technologies can help us to access and exploit it? This line of questioning leads to a focus on diverse data sets and vocabularies with which we may not yet have experience. This is the start of a very exciting exploration…

A Multilingual Approach Through the Patron Experience

We would like to propose that libraries, in parallel with the major initiatives underway to redesign cataloging itself, experiment with additional data that relates to, but is external to, our core catalog metadata. Given the prevalence of Linked Data sets for authority files, it is time to begin embracing their links and the other data we will find at those links. To do this, Linked Data initiatives and projects must embrace the World Wide Web and learn to speak languages, as it were, outside the walls of libraryland. Our initial efforts within the UW-Madison Libraries are finding this approach productive.

Remember the Links

“With linked data, when you have some of it, you can find other, related, data.” – Linked Data. Tim Berners-Lee

While the internal data structures and systems in the library have been the focus of Linked Data efforts, they can inhibit connections to the broadest possible network, the Internet itself. We should embrace the connectedness that is at the core of Linked Data.

The purpose of linking data sets together is to replicate for the world’s data what has been so successful in the linking together of the world’s documents on the World Wide Web. In 2016, it is easy to take for granted the fundamentally simple and astonishingly profound impact of the basic architecture of the World Wide Web: it connects things together and promotes access across multiple sources of information throughout the Internet via linking. Documents from any source on the Web may be connected to one another by a simple, powerful method.

Connecting Data: To What End?

When applying these principles to data, the same spirit of connectedness requires a little more attention since the presumed first client of Linked Data is the machine. The machine requires a little more context or instruction so that it knows what to do with the data it encounters. Unlike the human client consuming documents on the Web, the machine cannot by itself ascribe meaning to the data it encounters. A person will be able to make judgments about the links s/he encounters in a web page document and decide whether or not to follow the link to continue consuming more information.

These judgments are informed by the language, imagery and communication mechanisms built into the published documents themselves. Our machines, insofar as they are still tools that perform basic tasks according to their instructions, will need context in the form of a vocabulary that provides the semantics that can be associated with the operating instructions of the machines. The relevant machines in this context, computers or computer programs that crawl the web and retrieve more pieces of data and information to assist in human activity, simply do not understand human language or imagery as the person does. Furthermore, these machines do not have the same form of agency when it comes to consuming information. They are our agents, not their own.

This is to say that Linked Data requires a sort of language that is specified by people, but actionable by machines. The language is composed of a few different parts:

  1. Semantic Structure/Grammar: the W3C has already established the triple structure (subject, predicate, object) as the basic unit of data assertion.
  2. Serialization: this is also already established by the W3C. Triples should be expressed in RDF, in whichever serialization (for example Turtle, RDF/XML, JSON-LD, or N-Triples) suits your processing environment.
  3. Vocabulary: herein lies an important, strategic question for libraries…
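The first two parts of this language can be made concrete with a small sketch. What follows is an illustration under stated assumptions, not a description of any system in use at the UW-Madison Libraries: the `IRI` and `Triple` types and the `to_ntriples` function are hypothetical helpers, the `example.org` identifiers are placeholders, and N-Triples is chosen only because it is the simplest W3C serialization to write by hand. The two predicates (`foaf:name` and `owl:sameAs`) are real, widely used vocabulary terms.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identifier; written inside angle brackets in N-Triples."""


class Triple(NamedTuple):
    """One data assertion: subject, predicate, object."""
    subject: IRI
    predicate: IRI
    obj: Union[IRI, str]  # another resource, or a plain string literal


def to_ntriples(triple: Triple) -> str:
    """Serialize a single triple as one N-Triples line (hypothetical helper)."""
    def term(value: Union[IRI, str]) -> str:
        if isinstance(value, IRI):
            return f"<{value}>"
        # Plain literal: escape backslashes and double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


# Placeholder identifiers, invented for illustration only.
AUTHOR = IRI("http://example.org/authority/person/123")
EXTERNAL_AUTHOR = IRI("http://example.org/external/person/abc")

# Real vocabulary terms from FOAF and OWL.
FOAF_NAME = IRI("http://xmlns.com/foaf/0.1/name")
OWL_SAME_AS = IRI("http://www.w3.org/2002/07/owl#sameAs")

triples = [
    Triple(AUTHOR, FOAF_NAME, "Example Author"),    # description
    Triple(AUTHOR, OWL_SAME_AS, EXTERNAL_AUTHOR),   # link to another data set
]

lines = [to_ntriples(t) for t in triples]
print("\n".join(lines))
```

The structure and the serialization are the same for every triple. What changes from one data set to the next is the vocabulary, that is, which predicates appear in the middle position.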

Describe, But Don’t Forget to Link

Major efforts like BIBFRAME are focused largely on the translation of the MARC record standard into an entity-oriented model that lends itself to RDF serialization.

To that extent, BIBFRAME aims to be a robustly expressive RDF version of present-day library bibliographic description. This is a universe of metadata that is unparalleled by any other to date. However, its primary goal is thorough and robust description, which is not the same as linking. Linked Data is built upon a fundamentally different premise. Rather than focus on the totality and completeness of description achieved within a self-contained system, Linked Data is founded on a model for knowledge acquisition and use that depends upon pluralism: cooperative and linked description situated within a larger ecosystem. Basing data description on a linking model is an acknowledgment that no single source of information can ever be complete because new assertions can always be made about a thing. By simply changing your vantage point and perspective, you will see differently and describe differently.

What does this mean from a practical perspective? Libraries are service-oriented organizations that aim to connect “every reader [to] his/her book.” We connect patrons to information that we curate in one form or another, but our capacity is finite and therefore we cannot manage every piece of information that may be relevant to that process. The library is simply not the publisher of an encyclopedia with comprehensive description about the author of every work in our catalog. Many of the day-to-day resource management operations are largely about efficient inventory control for physical and digital things.

However, in the 21st century, we have an amazing opportunity to embrace Linked Data on the open Web to situate the things in our inventory within a broader information context. We can use links to other information published about the same things our data describes to enhance the library services with information we have not traditionally curated.

Connected Data is Inherently Heterogeneous

This is where deliberate and strategic consideration comes into play when picking vocabularies. The questions can be framed as:

  • With whom do you wish to speak?
  • Who knows something that might help our patrons understand our collections?

As a prerequisite, therefore, the key question becomes: In what languages do you wish to be fluent?

Being fluent exclusively in library vocabularies will only enable us to speak to the world of libraries. We are quite fluent in this domain and do a lot with library data presently. What we really want to be able to do is speak to the broader World Wide Web. We want to talk to non-library data sets like DBpedia or GeoNames to get biographical or geographic information about the things in our authority files. We want to speak to published data sets produced by MusicBrainz or Getty Research that offer additional data points about musicians and artists not captured in MARC bibliographic records.
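As a rough sketch of what this kind of conversation might look like in practice, the example below starts from a library authority entry and follows its links into a second data set that uses a different vocabulary. Everything here is an assumption made for illustration: the `lib:` and `ext:` identifiers and property names are invented placeholders, the data is made up, and both data sets are held in memory. A real implementation would retrieve the external statements over the Web and would need to understand the vocabulary actually used by the source it consults.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"  # the link that connects one data set to another

# Invented placeholder data: what the library knows about a person.
LIBRARY_DATA = [
    ("lib:person/123", "lib:preferredName", "Example Author"),
    ("lib:person/123", SAME_AS, "ext:person/abc"),
]

# Invented placeholder data: what an external, non-library source knows.
EXTERNAL_DATA = [
    ("ext:person/abc", "ext:birthPlace", "Example City"),
    ("ext:person/abc", "ext:occupation", "Example Occupation"),
]


def index_by_subject(*datasets):
    """Group (predicate, object) pairs under each subject, across data sets."""
    index = defaultdict(list)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            index[subject].append((predicate, obj))
    return index


def describe(subject, index):
    """Collect statements about a subject, following sameAs links as found."""
    seen, queue, facts = set(), [subject], []
    while queue:
        current = queue.pop()
        if current in seen:
            continue  # guard against circular links
        seen.add(current)
        for predicate, obj in index.get(current, []):
            if predicate == SAME_AS:
                queue.append(obj)  # another identifier for the same thing
            else:
                facts.append((current, predicate, obj))
    return facts


library_only = describe("lib:person/123", index_by_subject(LIBRARY_DATA))
linked = describe("lib:person/123", index_by_subject(LIBRARY_DATA, EXTERNAL_DATA))

for fact in linked:
    print(fact)
```

With only the library data indexed, the description stops at the preferred name. Once the external data set is reachable, the same starting point also yields the statements published elsewhere, but only if the consuming program knows what `ext:birthPlace` and `ext:occupation` mean. That is the practical sense in which fluency in non-library vocabularies matters.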

For the individual library, there are big dividends to reap by focusing its Linked Data efforts on learning new data vocabularies and models. This is the foundation for the experiments in which we are currently engaged. The focus is on making sure that when we provide services based on our data, “you can find other, related, data” that might assist a patron. This will be accomplished by connecting to data beyond the internal world of libraries.