⚙️ Challenges and solutions towards building and visualising FAIR data for traditional games
by: Carlos Utrilla Guerrero and Vincent Emonet
Like almost all research disciplines, digital humanities is poised to enter an era of unprecedented large-scale analysis, powered by massive (public) digital collections and hundreds of millions of records on the web. However, this growing amount of humanities data is largely unstructured, making it nearly impossible to connect to other datasets for richer analysis, and in some cases limiting its usefulness and reusability.
🎯 Objective
In this story, we will focus on the lessons learned about the state of the art of data modelling methodologies and digital tools for making largely unstructured humanities data more interoperable (the I in FAIR):
- Several studies have proposed semantic web technologies and FAIR approaches as a set of recommended solutions that support better data modelling, data storytelling, and increased data reusability. These technologies have become increasingly important precisely because so much humanities data remains unstructured and disconnected.
We will focus on the solutions and tools we encountered, including CLARIAH and UM public services, and how these can be applied in your own research to make your data more interoperable.
🧑‍💻 Methodology
The following section describes the process of building an online resource to explore the historical context of traditional games. We introduce the data model using established standards, in particular the Wikidata and Schema.org ontologies, to support data interoperability and longevity, as well as to provide stable digital representations of traditional games.
Graph Data Model
This is a diagrammatic representation of all the data in the PLAYFAIR-KG, which was uploaded to Druid. The image highlights the links among custom items created specifically for the PLAYFAIR dataset, generic items already existing in Wikidata, and the range of properties used to create the links. The map is meant to be used as a tool for those looking to create their own queries into the dataset, by providing explicit QIDs and PIDs for specific and generic Games, as well as properties within a Game. "Specific items" in the map refer to Game items at Ruleset level, such as the Wikidata properties for historical Periods and the GeoNames properties for Regions. Concepts and properties are annotated with general ontologies (e.g. Schema.org, GeoNames or Wikidata) and dedicated digital humanities ontologies (e.g. Getty). Here is the RDF serialisation for each level:
- Data Model Game table - describes a Game (i.e. the Schema.org ontology)
- Data Model Ruleset Regions - describes the Rulesets of a Game given geographical Regions (i.e. the GeoNames ontology)
- Data Model Ruleset Periods - describes the Rulesets of a Game given historical Periods (i.e. the Wikidata ontology)
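To make the three levels concrete, a minimal Turtle sketch could look like the following. The entity IRIs and the ruleset property names (ludeme:rulesetOf, ludeme:hasRegion, ludeme:hasPeriod) are illustrative assumptions, not the actual PLAYFAIR-KG terms:

```turtle
@prefix schema: <https://schema.org/> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ludeme: <https://w3id.org/ludeme/> .
@prefix ludata: <https://w3id.org/ludeme/data/> .

# Game level (Schema.org ontology)
ludata:senet a schema:Game ;
    rdfs:label "Senet" .

# Ruleset level, linked to a geographical Region (GeoNames)
# and a historical Period (Wikidata) — property names are hypothetical
ludata:senet-ruleset-1 a ludeme:Ruleset ;
    ludeme:rulesetOf ludata:senet ;
    ludeme:hasRegion <https://sws.geonames.org/357994/> ;        # Egypt
    ludeme:hasPeriod <http://www.wikidata.org/entity/Q11768> .   # Ancient Egypt
```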
Digital tools & techniques for humanities
We have utilised two open-source tools for RDF conversion:
🎮 Run CoW for the Game table level

- The input CSV files are in ./data
- Generate the metadata file with CSVW mappings:
  cow_tool build data/tableGames.csv
- Edit the generated JSON file:
  - Change the base URI to w3id.org/ludeme
  - Add propertyUrl to map to our predicates (cf. https://www.w3.org/TR/tabular-data-primer/#property-names)
- You can check our JSON skeleton schema in ./data
- Run the CSVW mappings to generate RDF:
  cow_tool convert data/tableGames.csv
- The output RDF files are in ./data
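For reference, the propertyUrl addition in the generated CSVW metadata might look like the fragment below. The column name gameName is a hypothetical example; the exact structure depends on what cow_tool build produces for your table:

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "tableGames.csv",
  "@id": "https://w3id.org/ludeme/tableGames.csv",
  "tableSchema": {
    "columns": [
      {
        "name": "gameName",
        "propertyUrl": "https://w3id.org/ludeme/gameName"
      }
    ]
  }
}
```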
🪄 Convert spreadsheets to Linked Data for the Ruleset level
The LDWizard helps you convert spreadsheets into Linked Data (RDF) relevant to the Humanities and Social Sciences. It guides you through a straightforward process of mapping the different columns to a type and properties, in order to create valid Linked Data entities:
🛤️ Upload the spreadsheets in the CSV format (Comma-Separated Values) into the application.
🆔 Define which column identifies the entity that will be created for each row (it becomes the identifier of the entity).
🔍 Then, for each column, search for and define which standard property best matches the column's data for the entity.
🖥️ Finally, you can download the result in Linked Data format (Resource Description Framework, RDF).
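The steps above turn each spreadsheet row into one Linked Data entity. As an illustration, a single converted row might serialise to Turtle like this (IRI and property choices are hypothetical, shown only to indicate the shape of the output):

```turtle
@prefix schema: <https://schema.org/> .
@prefix ludata: <https://w3id.org/ludeme/data/> .

# The key column becomes the entity IRI; the remaining columns
# become properties mapped to standard vocabulary terms
ludata:senet-ruleset-1
    schema:name "Senet (Egyptian ruleset)" ;
    schema:spatialCoverage "Egypt" .
```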
🖼️ Data Visualisations
As part of our research, we investigated approaches and designs for data visualisation based on the data Ludeme has collected about Games and Rulesets in our PLAYFAIR-KG. We hope that these graphical representations of the data, and the variety of formats, will encourage further research on interoperability, as well as inspire cultural heritage research using linked open data with contemporary art in general and performance-based art practices in particular.
"SPARQL" data visualisation example
PREFIX schema: <https://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ludeme: <https://w3id.org/ludeme/>
PREFIX ludata: <https://w3id.org/ludeme/data/>

SELECT * WHERE {
  ?game a schema:Game ;
        rdfs:label ?title ;
        schema:material ?material .
}
ORDER BY DESC(?game)
LIMIT 50
Chart organization example
Bar Chart Example
Network for Game data
This view renders SPARQL CONSTRUCT results as a graph. It works for Turtle, TriG, N-Triples and N-Quads responses. For performance reasons, at most 1,000 results can be visualised.
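A minimal CONSTRUCT query that could feed this network view, reusing the class and properties from the SELECT example above, might look like:

```sparql
PREFIX schema: <https://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Build a small graph of games, their labels and materials,
# to be rendered as nodes and edges in the network view
CONSTRUCT {
  ?game rdfs:label ?title ;
        schema:material ?material .
} WHERE {
  ?game a schema:Game ;
        rdfs:label ?title ;
        schema:material ?material .
}
LIMIT 100
```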
Gallery Example
The following SPARQL query binds an HTML string, consisting of a header and an image (img), to the ?widget variable. This results in a gallery with 10 widgets, each displaying a Senet Game.
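A sketch of such a gallery query is shown below. The ?widget variable is the convention used by the Druid/TriplyDB gallery renderer; the image property (schema:image) and the label filter are assumptions for illustration:

```sparql
PREFIX schema: <https://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?widget WHERE {
  ?game a schema:Game ;
        rdfs:label ?title ;
        schema:image ?img .          # image property assumed
  FILTER(CONTAINS(STR(?title), "Senet"))
  # Concatenate a header and an img tag into the HTML widget string
  BIND(CONCAT("<h3>", STR(?title), "</h3><img src=\"", STR(?img), "\"/>") AS ?widget)
}
LIMIT 10
```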