Unifying and Augmenting Metadata as Linked Open Data

1. Metadata systems

The IISH has five different source systems for which metadata are created. They are:

  • Archives Information Management System
  • Integrated Library System
  • Research Data Repository
  • Electronic depot (e.g. born-digital materials)
  • The IISH Website

The sources are described in various kinds of metadata that are retrievable via APIs. Figure 1 below shows how the metadata are retrieved via an Extract, Transform, Load (ETL) pipeline called TriplyETL. The resulting RDF is uploaded to the IISH Knowledge Graph in a TriplyDB triple store. Next, the RDF is imported into Spinque Desk, a visual design tool for search strategies. Finally, the data are delivered to a Drupal website.

Figure 2 below repeats Figure 1, but this time shows the brands of the components. On the left-hand side, most components are open source, while the conversion and the search strategies are closed source. The Linked Data in the triple store is of course portable to other products, but the ETL and the search strategy are not. The website is built in Drupal, but maintained by an external company.

The details of the metadata pipeline are provided in Figure 3. The dark blue parts are implemented in the pipeline; the lighter coloured parts are not yet. A first item on the agenda is to implement the finding aids using the Records in Contexts (RiC) ontology.

2. Retrieving metadata from the IISG Knowledge Graph

The SPARQL endpoint of the IISG Knowledge Graph is https://druid.datalegend.net/IISG/iisg-kg/sparql/iisg-kg. We can write a SPARQL query to count the number of items per type (class) in the IISG Knowledge Graph. Figure 4 below shows that images and books are the most common content types in the IISG collection. If you click on the text 'try this query yourself' next to the graph, you can manipulate the query results or their visualisation.

Figure 4. Types (classes) in the IISG Knowledge Graph
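
A minimal sketch of the kind of query behind Figure 4 is shown below; the exact prefixes and any named-graph selection in the production query may differ.

    # Count the number of items per type (class) in the IISG Knowledge Graph.
    SELECT ?type (COUNT(?item) AS ?count)
    WHERE {
      ?item a ?type .
    }
    GROUP BY ?type
    ORDER BY DESC(?count)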

To illustrate the advantage of using Linked Data, we will now search for all items with a specific word in the title, e.g. "Gouda". While the different metadata ontologies prescribe different terms for a title, such as title, name, or even 245 12$a, we can simply query for sdo:name and retrieve, for every source type, the number of items with "Gouda" in the title.

Table 1. Number of items with a given term in the title, by source type
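
A sketch of such a query is given below; the schema.org prefix and the case-insensitive filter are assumptions, and the query used for Table 1 may differ in detail.

    PREFIX sdo: <https://schema.org/>

    # Count, per source type, the items whose sdo:name (title) contains "Gouda".
    SELECT ?type (COUNT(DISTINCT ?item) AS ?count)
    WHERE {
      ?item a ?type ;
            sdo:name ?name .
      FILTER(CONTAINS(LCASE(STR(?name)), "gouda"))
    }
    GROUP BY ?type
    ORDER BY DESC(?count)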

3. Moving beyond metadata

Images are often part of a collection. The collection of press photographer Ben van Meerendonk is one of the better-known image collections at the IISG. In a pilot study, Rinske Zandhuis applied the YOLO object detection algorithm to detect objects in part of the collection. This now allows visitors to go beyond the metadata and find specific images using their memories of objects in photographs. This is illustrated in the query below, where a user can search for a Wikidata-defined object to retrieve photographs containing that particular object. If you click on the link provided with a result, you are directed to a IIIF server, allowing you to directly view the section of the image in which the object was detected.

Figure 5. Searching images via object detection. Click an image to see the section where the object was detected.
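
A hypothetical sketch of such a query is given below. The properties linking a detected region to a Wikidata entity and to its photograph (sdo:about and sdo:isPartOf) are illustrative assumptions; the Knowledge Graph may model the detections differently, for instance with Web Annotations.

    PREFIX sdo: <https://schema.org/>
    PREFIX wd:  <http://www.wikidata.org/entity/>

    # Hypothetical sketch: find photographs in which a bicycle (wd:Q11442) was detected.
    SELECT ?photo ?region
    WHERE {
      ?region sdo:about    wd:Q11442 ;   # the detected object, as a Wikidata entity
              sdo:isPartOf ?photo .      # the photograph the region belongs to
    }
    LIMIT 25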

Another example of moving beyond the metadata is a project consisting of 14 hours of video of an interview with the late former Prime Minister of the Netherlands, Wim Kok. Using the Whisper algorithm, Ikrame Zirar transcribed the audio into text. The text was divided into sections to serve as captions for the video. Those captions were converted into Linked Data, allowing us to search through the 14 hours of the interview, for example for what Wim Kok would do on Sunday evenings to prepare for the week ahead.

Table 2. Searching through video for spoken text. Click a link to view and listen to the interview scene.
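
A sketch of such a caption search is given below, assuming each caption is modelled as a clip with a text and start/end offsets; the property choices, taken from schema.org, are assumptions about the modelling.

    PREFIX sdo: <https://schema.org/>

    # Hypothetical sketch: find interview sections in which the word "zondag" (Sunday) is spoken.
    SELECT ?section ?text ?start ?end
    WHERE {
      ?section a sdo:Clip ;
               sdo:text        ?text ;
               sdo:startOffset ?start ;
               sdo:endOffset   ?end .
      FILTER(CONTAINS(LCASE(STR(?text)), "zondag"))
    }
    ORDER BY ?start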

4. Discussion

In this so-called 'data story' we have seen how the IISG has set up a pipeline to derive metadata from multiple systems and to harmonise those metadata using Linked Open Data. By using Linked Data, a single term allows one to search for all 'creators' of works, even though these 'creators' are named differently in the original data, while the original roles mentioned in the underlying sources (e.g. 'author' or 'illustrator') are preserved.
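
As an illustration, a query along the following lines could retrieve all creators together with their original roles; the schema.org Role pattern used here is an assumption about how the Knowledge Graph preserves the role labels.

    PREFIX sdo: <https://schema.org/>

    # Hypothetical sketch: a single term (sdo:creator) finds all creators,
    # while the intermediate role node keeps the original role label.
    SELECT ?work ?person ?originalRole
    WHERE {
      ?work sdo:creator ?roleNode .
      ?roleNode sdo:roleName ?originalRole ;   # e.g. "author" or "illustrator"
                sdo:creator  ?person .
    }
    LIMIT 25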

Furthermore, we have illustrated how it is possible to move beyond the metadata of images and videos to help users find the 'right' material. The IISG has only just started exploring these opportunities, but is excited to expand these methods across its collections.

For a final visualisation feature, we need the help of the reader. Thanks to CLARIAH and these authors, we can now transform online data stories like this one directly into print. For details we refer the reader to this FAIR data story. But for now, we would like to ask the reader to append ?lncs to the URL in the browser and press Enter. This feature is implemented in TriplyDB, but it is open source and also available in CLARIAH.