Lippe brick-makers project: data story

Mentions of responsibility

Motivation

This data story offers an improved way to integrate and navigate an important collection of digitized archival documents and the transcriptions and data that were extracted from those sources.

Background

"In the 19th century, up to 40% of all male workers left the Principality of Lippe every spring. Until the fall, they worked in brick works in northern Germany, the Netherlands, Scandinavia and Eastern Europe. The seasonal migrant work of the Lippe brickmakers dates back to the 17th century. The administration of the Principality of Lippe recorded them in detail. For the years from 1778 to 1869, the State Archives of North Rhine-Westphalia (LAV NRW), Department of East Westphalia-Lippe, in Detmold holds over 100,000 records on more than 30,000 brickmakers. Nowhere else in the world is it possible to find so much detailed information on seasonal workers with their origins and destinations for such a long and early period. They are invaluable for the history of migrations, work and family, but also for genealogists with ancestors from Lippe.Thanks to the cooperation between the International Institute of Social History in Amsterdam (IISG) and the LAV NRW, this data is made available embedded in its historical context." (Source: https://web.archive.org/web/20231017194904/https://iisg.nl/migration/ziegler/).

Access to the data

The historical documents and the data derived from these sources have been offered in these forms:

  1. As digital copies of the archival records, in many cases accompanied by a transcription:
  2. As archival descriptions of the different groups of sources (descriptions of the most important fonds "L77" and "L79"):
  3. As a database in which the personal names from the passport and brick messenger lists are stored in such a way that they can be easily searched by name, place of origin in Lippe and destination.
  • This "database" was made by researchers Jan Lucassen and Piet Lourens, who spent several years transcribing the original scans and extracting details from the tables in the original sources. It is offered in the Dataverse repository of the IISH in comma-separated values (.csv) format: https://hdl.handle.net/10622/ZKS9BA.

Users of this collection find it difficult to search and navigate across the different sources (as explained in the section "Defining the problem" below) and to find connections between the scans and the corresponding transcripts and extracted data. This motivated the writing of this data story, which shows a more integrated way to navigate between collections and data.

Defining the problem

The materials derived for the Lippische Ziegler project are described as an IISH Collection: https://hdl.handle.net/10622/ARCH03497. If you click on the tab 'content list' you will be able to view all records within the collection. Each record consists of multiple scans, which you can view by clicking 'online access'. To search through the archival materials, go to the tab 'search in text' and enter text. This is an example of the current situation:

  • If you search for the string 'lohn' in the 'search in text' tab described above, the first result reads:
    • Found on page 675 for item 4 :
    • Bauerschaft 165 Lohne Hellmann Adolph Filges Meister Assemissen 165 Lohne Hellmann Fritz
  • To view this text on the original scan, click on 'item 4', which will bring you to the content list tab. There, click 'online access' below item 4, which will open a new page or tab with a viewer of all scans in 'item 4'. At the top of the page it reads: "Image 1 of 900". Change '1' to 675 and you will end up at the right scan. Use the zoom and arrow buttons to navigate the page in more detail.
  • Note: sometimes the numbering of the page is off by 1-3 pages. This alignment was corrected, and the updated links were added to the dataset: https://hdl.handle.net/10622/1RNBFT.

Solution

Storing the data

To facilitate access, the scans of the archival materials have been stored at the IISG in this dataverse: https://datasets.iisg.amsterdam/dataverse/lippe

This dataverse (or project data container) contains three datasets:

  1. A dataset called "Lippe brickmakers: archival pages transcriptions" which contains the transcribed text from the archival sources mentioned above: https://hdl.handle.net/10622/1RNBFT.

  2. A dataset called "Lippe brick-makers: database" which contains the structured data extracted from the transcripts by Lucassen & Lourens: https://hdl.handle.net/10622/ZKS9BA.

  3. A dataset called "Lippe brick-makers database: sample with geodata" which contains a small sample from the database enriched with correct geo-data, to showcase how proper identification of place names, and its conversion to linked open data, can facilitate historical research.

Improving access via Linked Open Data

This data story provides different ways to search and access the datasets and the scans by using a direct link to the page scans both from the page transcriptions and from the database.

The raw data (.csv) was converted to triples using the code deposited here: DOI. The basic "data model" used in that conversion is the following:

[Figure: data model diagram (lippe_concepts_part1)]
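To make the conversion step more concrete, the following is a minimal sketch of how CSV rows could be turned into N-Triples (.nt) lines. It is not the project's deposited code: the namespace `https://example.org/lippe/`, the record numbering, and the one-predicate-per-column mapping are assumptions made for illustration, and the sample row is made up apart from the column names "Namen", "Vornamen" and "Fabrik" mentioned in this story. It also assumes column names contain no spaces or other characters that are invalid in an IRI.

```python
import csv
import io

# Hypothetical namespace; the project's real URIs are not given in this story.
BASE = "https://example.org/lippe/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Give each CSV row one subject and one triple per non-empty column."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{BASE}record/{index}>"
        for column, value in row.items():
            if not value:
                continue  # skip empty cells instead of emitting empty literals
            predicate = f"<{BASE}def/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Illustrative row only; the Fabrik cell is left empty on purpose.
sample_csv = "Namen,Vornamen,Fabrik\nSieckmann,Hermann,\n"
ntriples = rows_to_ntriples(sample_csv)
for line in ntriples:
    print(line)
```

A real conversion would follow the data model shown above, for instance by linking each record to its page scan with a URI rather than storing every value as a plain literal.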

1. Search in the archival pages transcriptions

This section gives access to the linked data version of the archival pages transcriptions (the complete dataset can be found here: https://datasets.iisg.amsterdam/dataset.xhtml?persistentId=hdl:10622/1RNBFT).

Users can find words in the transcriptions and get a direct link to the scanned page where the term occurs, with no intermediary steps. Any word can be searched for (the name of a person, factory, place, etc.), keeping in mind that the transcriptions are plain text, so there is not yet a way to distinguish between these kinds of entities.

Find words (strings of, for instance, person names, place names, factory names, etc.) in the page transcriptions, which are linked to the digital scan.
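The idea behind this search can be sketched as a case-insensitive substring match over page transcriptions, where every hit carries the link to its scan. This is only an illustration of the principle, not the query used in the data story: the `Page` class, its field names, and the `example.org` links are hypothetical, and only the first transcription snippet comes from the example in "Defining the problem".

```python
from dataclasses import dataclass


@dataclass
class Page:
    item: int      # item number in the content list
    page: int      # page (image) number within the item
    text: str      # transcribed text of the page
    scan_url: str  # direct link to the scanned page (placeholder values here)


def search_pages(pages: list[Page], term: str) -> list[Page]:
    """Return the pages whose transcription contains the term, ignoring case."""
    needle = term.casefold()
    return [p for p in pages if needle in p.text.casefold()]


pages = [
    Page(4, 675, "Bauerschaft 165 Lohne Hellmann Adolph Filges Meister Assemissen",
         "https://example.org/scan/item4/675"),
    Page(4, 676, "illustrative text on another page",
         "https://example.org/scan/item4/676"),
]

hits = search_pages(pages, "lohn")
for hit in hits:
    print(hit.item, hit.page, hit.scan_url)
```

Because the match is on raw strings, searching for "lohn" also finds "Lohne"; the search cannot tell whether the hit is a person, a place or a factory.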

2. Search in the brickmakers database (per person name)

In this version of the data, the brickmakers database can be queried by person name. The data has been converted from a csv file to linked open data (.nt file) and is available here: https://hdl.handle.net/10622/ZKS9BA. The link from the database to the scans and to the transcribed texts is given in the results without intermediary steps. Person names are shown as transcribed by the researchers. This query looks for strings in the columns "Namen" and "Vornamen" (last name and first name). Because in some cases it was not clear which was the first name and which the last name (e.g., Hermann Sieckmann = Sieckmann Hermann), we created a column called "person name redundant", which joins Namen + Vornamen + Namen to facilitate searching in either order. Thus, if you start typing "Sieckmann" you may get a list with values such as "Hermann Sieckmann Hermann", but the resulting table will only show "Namen" and "Vornamen". Note: you can change the number of results displayed. To do so, use the button "Go to Query" and change the value at the end of the code (Filter: ##).

Search for "work events" in the database using a person name.
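The "person name redundant" column can be illustrated with a short sketch. The function names are hypothetical, and the input values reproduce the example above, that is, a record in which "Hermann" ended up in "Namen" and "Sieckmann" in "Vornamen". Repeating the first column at the end means that both word orders occur as substrings of the joined value.

```python
def person_name_redundant(namen: str, vornamen: str) -> str:
    """Join Namen + Vornamen + Namen so either word order appears as a substring."""
    return f"{namen} {vornamen} {namen}"


def matches(query: str, namen: str, vornamen: str) -> bool:
    """Case-insensitive substring match against the redundant name."""
    return query.casefold() in person_name_redundant(namen, vornamen).casefold()


redundant = person_name_redundant("Hermann", "Sieckmann")
print(redundant)
print(matches("Hermann Sieckmann", "Hermann", "Sieckmann"))
print(matches("Sieckmann Hermann", "Hermann", "Sieckmann"))
```

The redundancy lives only in the search column; the result table can still display the two original columns separately.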

3. Search in the brickmakers database (per factory name)

In this version of the data, the brickmakers database can be queried by factory name. The data has been converted from a csv file to linked open data (.nt file) and is available here: https://hdl.handle.net/10622/ZKS9BA. The link from the database to the scans and to the transcribed texts is given in the results without intermediary steps. Person names are shown as transcribed by the researchers. This query looks for strings in the column "Fabrik". Note: you can change the number of results displayed. To do so, use the button "Go to Query" and change the value at the end of the code (Filter: ##).

lippe-brickmakers-database-query-fabrik
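As a rough idea of what such a query might look like, the sketch below builds a SPARQL query string that filters on a factory name and caps the number of results. It is not the query behind the data story: the prefix `https://example.org/lippe/def/`, the predicate `ex:Fabrik`, and the assumption that the adjustable value at the end of the code is a SPARQL `LIMIT` are all hypothetical.

```python
def build_factory_query(term: str, limit: int = 10) -> str:
    """Build a SPARQL query matching a substring of the factory name."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Escape backslashes and double quotes so the term stays inside the literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX ex: <https://example.org/lippe/def/>\n"  # hypothetical namespace
        "SELECT ?record ?fabrik WHERE {\n"
        "  ?record ex:Fabrik ?fabrik .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?fabrik)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {limit}"  # the value to change to show more or fewer results
    )


query = build_factory_query("ziegelei", limit=25)
print(query)
```

Changing the `limit` argument corresponds to editing the value at the end of the query in the query editor.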

Visualize the places from a sample of the database enriched with Geodata

  • The data specialist (Liliana Melgar) and the main researcher (Jan Lucassen) evaluated the candidate places returned by the World Historical Gazetteer (WHG) as possible matches for the places (as strings) transcribed by the researchers. In our sample, 12 of the 18 records (i.e., unique place names) got 35 hits in total.
  • Decision about the place of origin ("Ortschaft")
    • We assume the workers lived in Lippe and that the place of residence (or departure) is within the area of Lippe as it's defined in the researchers' publications
  • Decision tree to select the place of destination ("Wohin")
    • If the name of the place is the same for the province and the capital, the capital is chosen
    • The information about the places related to the factory and/or the factory owners is taken into account to decide between candidates
    • If none of the above is clear, the "most important" place is selected (this is defined as the place with the most inhabitants)
  • Examples:
    • For a place called Obendorf we got about five potential place candidates, some of them on different continents. Within Europe, there were also a few places with the same name. After following the decision tree above to pin down the place of destination, the evidence pointed to the location in Hannover.
    • There was a place called Norddorf which had only one potential candidate, but this happened to be on an island. The researcher investigated the factors above and also the family names (Akermann / Ackermann), confirming that the evidence did indicate that the workers also went to this island (Norddorf).
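The decision tree for the place of destination can be written down as a small function. This is a sketch of the rules listed above, not the procedure the team actually followed, which was a manual evaluation: the `Candidate` class, its field names, the `kind` labels, and the choice to apply the population fallback within the factory-supported candidates when several remain are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    kind: str                   # hypothetical labels, e.g. "capital", "province", "village"
    inhabitants: int            # population figure used for the final fallback
    near_factory: bool = False  # True if factory or owner information points here


def choose_destination(candidates: list[Candidate]) -> Optional[Candidate]:
    """Apply the three rules in order and return the selected candidate."""
    if not candidates:
        return None  # nothing to choose from; the place stays "Unknown"
    # Rule 1: the same name covers a province and its capital -> choose the capital.
    kinds = {c.kind for c in candidates}
    if {"province", "capital"} <= kinds:
        return next(c for c in candidates if c.kind == "capital")
    # Rule 2: information about the factory or its owners singles out one candidate.
    supported = [c for c in candidates if c.near_factory]
    if len(supported) == 1:
        return supported[0]
    # Rule 3: otherwise take the "most important" place, i.e. the most inhabitants.
    return max(supported or candidates, key=lambda c: c.inhabitants)


# Made-up candidates for illustration; the figures are not real data.
options = [
    Candidate("Sampleplace", "village", 500),
    Candidate("Sampleplace", "village", 2000, near_factory=True),
]
print(choose_destination(options))
```

In practice each rule required the researchers' judgement, for example checking family names in the Norddorf case, so a function like this could at most pre-rank candidates for review.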

Note: the dot near Africa represents the "Unknown" locations, that is, places for which we don't have the coordinates.

Query a sample of the database enriched with geodata

Further reading

More details on the Lippische Ziegler project can be found at http://www.iisg.nl/migration/ziegler/, including links to the Landesarchiv Nordrhein-Westfalen (LAV NRW), Abteilung Ostwestfalen-Lippe, in Detmold, which holds the original archival materials.

Next steps

This is a summary of what was done to create this story:

  • Wrong page alignments detected in the first version of the data story were corrected in the source data (https://datasets.iisg.amsterdam/dataverse/lippe). The corrections were applied to all the transcriptions made by the main researchers.
  • All the scanned pages now have a link to their digital viewer. The only remaining archive piece to include is Signature 4718 (inventory number .5 here: https://search.iisg.amsterdam/Record/ARCH03497). The German archive has to be contacted to provide these scans.
  • All the data was improved (for example, correcting some parsing errors in the initial csv files, and checking as much as possible that it was all consistent). The data is available and documented in Dataverse (https://datasets.iisg.amsterdam/dataverse/lippe). The links can also be found in the data story itself.
  • We converted it all to Linked Open Data. The scripts and the data are all available (the links are in Dataverse and/or in the Data story). In this way, it’s possible to ensure reproducibility and transparency of the data.
  • The data story was improved by adding as much content as we thought necessary for users to understand the data and to search it.
  • We also added all the related publications to the main researcher's ORCID profile: https://orcid.org/0009-0009-6065-3947.

Further development