Heterogeneity in representations of IISG film metadata

This story is based on a conversation I had with Frank de Jong, archivist at the IISG. He mentioned that metadata on movies has been stored at varying levels of granularity. What follows are a couple of the examples he mentioned.

The first example shows a record describing a single movie that happens to be on a single tape. Notice how the movie appears to have no title. Moreover, in the original record the description says 'U-matic 45 min', rather than just '45 min'.

Example of a single record describing a single film (on a single tape)

A brief exploration of the film titles shows that many items have no title.

The next example also shows a single record for a single film, but this record relates to two tapes (a 35mm and a 16mm). Notice that the type of information differs from the example above: there the description mentioned the length of the film, whereas here it mentions the physical width (35mm or 16mm). We can also see that the movie has a proper title, including a language tag, nl, indicating that the film is in Dutch.

In the next query, a single record describes multiple movies. Notice how the 'content' field provides more in-depth information than the records above, for which that information was not available.

Films are also represented as part of an archive, as illustrated by the film Toekomst 36. Worryingly, the Linked Data representation does not appear to contain direct links from the archive to the individual audio and visual materials, as used to be the case in the search view on the previous version of the IISG website.

A film might also be part of a collection of video and audio samples, as portrayed by this search view. The collection does not appear to have been modeled as such in Linked Data: I am unable to find the collection id, or any of the descriptors such as "18.630 foto's/negatieven, 2530 dia's, 356 geluidsbanden/cassettes, 630 films/filmfragmenten" (18,630 photos/negatives, 2,530 slides, 356 audio tapes/cassettes, 630 films/film fragments), in the Linked Data representation.

Finally, most films are covered only by their title in a list provided as a PDF on the old website. Neither these films nor the lists themselves appear to be represented in the Linked Open Data. Ideally, these lists would be converted to collection descriptions as above or, in the case of stand-alone film titles, described as records. In addition, the available audio and MovingImage materials ought to be 'playable' in Linked Open Data. This is currently impossible, because the links to the raw files are preceded by a reference to a default player, which refused to play any of the video files I encountered and played hardly any of the audio files.

Use case: The Future of '36

Above we have seen that films are archived in different formats, sometimes as records and sometimes as part of an archive, and that in the latter case there is no link between the film itself and the archive.

But with some ugly text matching we are still able to retrieve and connect a bit more information on a given film. To illustrate this, I will focus on the film "De Toekomst '36".

First, I will check which entries have the string "De Toekomst '36" as (part of) their description. If you execute the query yourself (click on 'Try this query yourself'), you will notice that it takes a long time: it is an expensive query to run. But here we need it, because there are no links between the items we are interested in.
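The original query is not reproduced here, so the following is only a minimal sketch of what such a text match could look like, written as a Python helper that builds a SPARQL 1.1 query string. The helper name and the choice to scan every literal (instead of one specific description property) are my assumptions, not the query used in this story. It also shows why the query is expensive: the pattern `?item ?property ?text` forces the endpoint to inspect every literal in the dataset.

```python
def build_description_search(needle: str, limit: int = 50) -> str:
    """Return a SPARQL 1.1 query that scans every literal for `needle`.

    Hypothetical helper for illustration; not the query used in the story.
    """
    # Escape backslashes and double quotes so the needle stays a valid
    # SPARQL string literal (the apostrophe in '36 needs no escaping).
    escaped = needle.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?item ?property ?text WHERE {\n"
        "  ?item ?property ?text .\n"          # unconstrained pattern: a full scan
        "  FILTER(isLiteral(?text))\n"
        f'  FILTER(CONTAINS(LCASE(STR(?text)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = build_description_search("De Toekomst '36")
print(query)
```

If the property holding the description were known, replacing `?property` with that property would narrow the scan considerably.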

The query returns 10 results that contain the full string "Toekomst '36" in one of their descriptions. To understand what kind of items these results are, we can look at their 'form' description, but this is only available for the first five items. However, every item has one or more types. For example, items 3-5 are in poster form and have the types StillImage and Poster.

Notice how item six seems to be of a peculiar type: title. Judging from the URI, the title seems to be authoritative: an official way of spelling a particular name or, in this case, title. If you click on the item and use the browser, you will see that the title is linked to two other items: 796824 and 796828. These two items also appear in our query result set, as numbers 4 and 5. Being a poster with exactly the same title, item 3 might also have been linked to the authoritative title. The fact that it is not may mean that the poster has the same title but refers to a different work. Or it is simply another small anomaly in the metadata representation.
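The check described above (which posters share the title string but are not linked to the authoritative title) can be sketched over a handful of in-memory triples. Only the identifiers 796824 and 796828 come from the catalogue; "poster-3", "title-authority" and the predicate names are hypothetical stand-ins, since the real URIs are not shown here.

```python
# Hypothetical triples. Only 796824 and 796828 are identifiers from the text;
# "poster-3", "title-authority" and all predicate names are made up.
TRIPLES = [
    ("796824", "rdf:type", "Poster"),
    ("796828", "rdf:type", "Poster"),
    ("poster-3", "rdf:type", "Poster"),
    ("796824", "ex:label", "De Toekomst '36"),
    ("796828", "ex:label", "De Toekomst '36"),
    ("poster-3", "ex:label", "De Toekomst '36"),
    ("796824", "ex:authorityTitle", "title-authority"),
    ("796828", "ex:authorityTitle", "title-authority"),
]


def subjects(predicate, obj):
    """Return all subjects that have the given predicate/object pair."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


linked = subjects("ex:authorityTitle", "title-authority")
same_label = subjects("ex:label", "De Toekomst '36")

# Items that carry the same title string but lack the authority link.
unlinked = same_label - linked
print(sorted(unlinked))  # ['poster-3']
```

A set difference like this only flags candidates; whether the unlinked poster refers to the same work still has to be judged by hand.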

The last four search results (7-10) are various metadata fields of the same item, or rather ArchivalMaterial or Collection, and the first search result is of that kind as well. Moreover, clicking on the first search result shows that this archive is asserted to be the same (owl:sameAs) as the archive underlying the last four search results.
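Because owl:sameAs declares two URIs to denote the same thing, search results 1 and 7-10 can be folded into a single entity. Below is a minimal sketch of that folding using a small union-find; the identifiers "archive-a" and "archive-b" are placeholders I introduce for the two archive URIs, which are not spelled out in this story.

```python
def canonical_ids(same_as_pairs):
    """Map every identifier to one representative per owl:sameAs cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller identifier as representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {node: find(node) for node in list(parent)}


# Placeholder URIs: result number -> archive the result belongs to.
hits = {1: "archive-a", 7: "archive-b", 8: "archive-b", 9: "archive-b", 10: "archive-b"}
canon = canonical_ids([("archive-a", "archive-b")])

grouped = {}
for number, uri in hits.items():
    grouped.setdefault(canon.get(uri, uri), []).append(number)
print(grouped)  # {'archive-a': [1, 7, 8, 9, 10]}
```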

Now that we know what's available, let's explore it a little, starting with the posters. The two posters linked via their title provide only scarce descriptions, among them the year of publication and the countries the film is about. Thumbnail images are provided as well. In the query below, I concatenate these pieces of information and use the result as part of an image description. Please note that, while some words in the description are hard-coded by me, the countries and the year are values derived from the catalogue, as is the image header.
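The concatenation step can be sketched outside SPARQL as well. In the snippet below the fixed words play the role of the hard-coded parts and the arguments stand in for catalogue values. The function name is hypothetical, and the year and countries passed in are placeholder values chosen for illustration only, not values read from the IISG catalogue.

```python
def poster_caption(title, year, countries):
    """Join fixed words with catalogue values into one caption string."""
    parts = [f"Poster for {title}"]
    if year is not None:
        parts.append(f"published in {year}")  # skipped when no year is recorded
    if countries:
        parts.append("concerning " + " and ".join(countries))
    return ", ".join(parts) + "."


# Placeholder values, NOT taken from the catalogue.
caption = poster_caption("De Toekomst '36", 1936, ["Spain", "Netherlands"])
print(caption)
```

Skipping absent fields matters here, because the posters do not all carry the same metadata.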

Unfortunately, we cannot show an image for the third poster, but it does come with quite some interesting metadata. Despite these substantive differences and the different metadata fields, we can still combine the best of the three posters, as shown in the query below, where I combine the image with metadata on the collaborators of the movie and with metadata on the content of the movie taken from the archive.

The result describing the DVD provides yet more interesting metadata. Specifically, it mentions topics that are authoritative to the IISG, one of which is 'Spanish Civil War'. By combining various traits of the earlier search, we can offer further items of interest to the reader. For example, below I show four results that are also of type Poster and have 'Spanish Civil War' as a topic. This would be a typical 'what you also might like' recommendation.
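The selection logic behind such a recommendation is a simple filter: same type, shared topic, not already shown, capped at four. A minimal sketch over in-memory records follows; the `Item` structure, the function name and the sample URIs are hypothetical and merely stand in for catalogue items.

```python
from typing import NamedTuple


class Item(NamedTuple):
    uri: str
    types: frozenset
    topics: frozenset


def recommend(items, seen, wanted_type, topic, limit=4):
    """Return up to `limit` unseen items of `wanted_type` that share `topic`."""
    picks = [
        item for item in items
        if wanted_type in item.types
        and topic in item.topics
        and item.uri not in seen
    ]
    return picks[:limit]


# Hypothetical catalogue: six posters on the topic, one unrelated book.
catalogue = [
    Item(f"poster-{n}", frozenset({"StillImage", "Poster"}),
         frozenset({"Spanish Civil War"}))
    for n in range(1, 7)
] + [Item("book-1", frozenset({"Text"}), frozenset({"Spanish Civil War"}))]

suggestions = recommend(catalogue, seen={"poster-1"},
                        wanted_type="Poster", topic="Spanish Civil War")
print([item.uri for item in suggestions])
# ['poster-2', 'poster-3', 'poster-4', 'poster-5']
```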

Posters you may also be interested in