Analysing Stature: Europe vs. Asia in the 19th Century
By Karin Wienholts, Klaas Krab & Lennart Jetten
In 2020, economic historians Baten and Blum assembled a large collection of anthropometric data covering the period 1810-1984 and used it to analyse what they call "anthropometric divergence" between affluent and poor areas of the world, in relation to socio-economic developments linked to globalisation (Jörg Baten and Matthias Blum, 'Global Height Trends in Industrial and Developing Countries, 1810-1984: An Overview', ResearchGate concept paper, January 2020).
Part of the data they used is published as linked data on Druid. This allows us to repeat parts of their analysis and to explore how anthropometric trends may correlate with potential explanatory historical factors. Of special interest here is the 'Great Divergence' in economic power, which Pomeranz postulated to have accelerated in the 19th century as a result of the industrial revolution taking place in Europe (Kenneth Pomeranz, The Great Divergence: China, Europe, and the Making of the Modern World Economy. Princeton University Press, 2000), and how it might correlate with the anthropometric divergences reported by Baten and Blum. As anthropometric data are notoriously susceptible to bias (they rarely approach the ideal of a random population sample; see e.g. Björn Quanjer & Jan Kok, 'Drafting the Dutch: Selection Biases in Dutch Conscript Records in the Second Half of the Nineteenth Century', Social Science History 44 (2020): 501–524), it is also of interest to see whether rigorously limiting the analysis to the least biased data leads to results that differ from those of Baten and Blum.
This Data Story makes use of the Druid dataset microHeights, which contains human stature data from a wide variety of countries and world regions. We selected the part with the fewest known biases: data from conscription records, censuses, anthropological or ethnographic studies and hospital records; these are presumed to be closest to random population samples. We used these selected data to test our hypothesis that the accelerated Great Divergence in the 19th century is reflected in a higher percentage increase in average height in Europe than on other continents. To investigate this, we compared how the stature of European and Asian adult males born in the nineteenth century developed over that century. We queried the data using SPARQL and have made our results and queries available for others to use.
We wrote a query selecting men older than 17 from datasets that met our selection criteria. Presumably some individuals still grew after the age of 17, but this was considered a minor limitation. In addition, we added the condition that the country of birth equals the country of residence at measurement, to make sure that the selected individuals grew up, and reached the recorded height, in the countries that we visualise in the figures below. Subsequently, the sample set was broadened in steps: first by removing the condition that country of birth equals country of residence, and then by relaxing the restriction to dataset types considered non-biased, initially by admitting military volunteers and finally by removing all restrictions on dataset types. The aim of this broadening was to investigate whether the results change when the sample set becomes larger, but also more biased.
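The selection and its stepwise broadening can be sketched outside SPARQL as a simple filter. This is a minimal illustration, not the query we ran: the `Record` fields, the dataset-type labels and the sample records are hypothetical and do not reflect the actual microHeights vocabulary or data.

```python
from dataclasses import dataclass

# Hypothetical labels for the dataset types treated as least biased in the text.
LEAST_BIASED_TYPES = frozenset(
    {"conscription", "census", "anthropological", "hospital"}
)


@dataclass
class Record:
    sex: str
    age: int
    height_cm: float
    birth_country: str
    residence_country: str
    dataset_type: str


def select(records, require_same_country=True, allowed_types=LEAST_BIASED_TYPES):
    """Return adult males (older than 17) that satisfy the optional conditions.

    allowed_types=None means no restriction on dataset type.
    """
    selected = []
    for r in records:
        if r.sex != "male" or r.age <= 17:
            continue
        if require_same_country and r.birth_country != r.residence_country:
            continue
        if allowed_types is not None and r.dataset_type not in allowed_types:
            continue
        selected.append(r)
    return selected


# Invented records, for illustration only.
sample = [
    Record("male", 21, 165.0, "Germany", "Germany", "conscription"),
    Record("male", 17, 160.0, "Germany", "Germany", "conscription"),
    Record("male", 25, 170.0, "India", "Germany", "volunteer"),
    Record("male", 30, 162.0, "China", "China", "prison"),
]

strict = select(sample)
no_country_condition = select(sample, require_same_country=False)
with_volunteers = select(
    sample,
    require_same_country=False,
    allowed_types=LEAST_BIASED_TYPES | {"volunteer"},
)
no_type_condition = select(sample, require_same_country=False, allowed_types=None)
```

Each call relaxes one more condition, so every result contains the previous one; this mirrors the trade-off between sample size and bias described above.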
Table 1: Results for the primary query set up to fulfill all our requirements of the data - query can be accessed through the option to "try this query yourself". Countries for which calculated averages are listed fulfill our requirements.
Map 1: The modern countries which contain the geographic origins of the datasets that meet our primary selection criteria.
The query shows that only individuals from a limited number of datasets from Germany, India, Pakistan and the Philippines meet the selection criteria. Map 1 shows the geographical locations of these countries; on this basis we compare Europe (represented by Germany) with Asia (India, Pakistan and the Philippines).
Figure 1: Average height (cm) per decade/country of birth of adult (age 18 or higher) males born between 1799 – 1900, and resident in the country of birth - strict selection of dataset type.
Figure 2: Average height (cm) per decade/region of birth of adult (age 18 or higher) males born between 1799 – 1900, and resident in their country of birth.
Figures 1 and 2 show that Germans have the highest average height overall, compared both with the individual Asian countries and with Asia as a whole, but it is the trend over time that is informative. Unfortunately, a quantitative measure for trends cannot be calculated in SPARQL queries, but the overall impression is that there are no large increases in height in this period.
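For readers who export the query results, two simple trend measures could be computed outside SPARQL: the percentage change between the first and last decade (the quantity our hypothesis refers to) and an ordinary least-squares slope through the decade averages. The sketch below is an assumption about how this might be done, not part of our analysis; the function names are hypothetical and the observations are invented.

```python
def decade_of(year):
    """Map a birth year to the start of its decade, e.g. 1815 -> 1810."""
    return (year // 10) * 10


def decade_means(observations):
    """observations: iterable of (birth_year, height_cm) pairs.

    Returns {decade: mean height}, ordered by decade.
    """
    sums = {}
    for year, height in observations:
        decade = decade_of(year)
        total, count = sums.get(decade, (0.0, 0))
        sums[decade] = (total + height, count + 1)
    return {d: total / count for d, (total, count) in sorted(sums.items())}


def percent_change(means):
    """Percentage change of the mean height from the first to the last decade."""
    decades = sorted(means)
    first, last = means[decades[0]], means[decades[-1]]
    return 100.0 * (last - first) / first


def ols_slope(means):
    """Unweighted least-squares slope of mean height against decade (cm per year)."""
    xs = sorted(means)
    ys = [means[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented observations, for illustration only.
observations = [
    (1801, 160.0), (1809, 162.0), (1815, 163.0), (1822, 164.0), (1828, 166.0),
]
means = decade_means(observations)
change = percent_change(means)
slope = ols_slope(means)
```

Both measures ignore the number of observations behind each decade average, so they should be read as descriptive only.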
A feature of the data sets generated by the SPARQL query is that they contain samples of different sizes, which may complicate the analysis of differences. These sample sizes are summarised in Tables 2 and 3.
Table 2: Number of observations associated with Figure 1.
Table 3: Number of observations associated with Figure 2.
Ideally, the data in Figures 1 and 2 should be analysed statistically, using a regression method to detect different trends over time. However, Tables 2 and 3 show that the sample sizes behind the datapoints differ considerably, so such a regression analysis requires every datapoint to be accompanied by a measure of variability such as a confidence interval. Sadly, SPARQL does not (yet) include a statistical toolkit; alternatives we considered do not adhere to the European FAIR principles for research data. Our conclusions are therefore very tentative.
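To make the requirement concrete, the sketch below shows one way such an analysis could look outside SPARQL: a 95% confidence interval for each decade average, and a regression in which each datapoint is weighted by the inverse of its squared standard error, so that averages based on few observations count for less. It is an illustration under stated assumptions (a normal approximation, which is poor for very small samples, and invented numbers), not an analysis we carried out; the function names are hypothetical.

```python
import math
import statistics


def mean_with_ci(heights, z=1.96):
    """Mean, standard error and approximate 95% confidence interval.

    Uses the normal approximation; needs at least two observations.
    """
    n = len(heights)
    mean = statistics.mean(heights)
    se = statistics.stdev(heights) / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)


def weighted_slope(points):
    """Weighted least-squares slope for points given as (x, y, standard_error).

    Each point is weighted by 1 / standard_error**2. Returns the slope and
    its standard error, assuming the standard errors are known.
    """
    weights = [1.0 / se ** 2 for _, _, se in points]
    total_w = sum(weights)
    mean_x = sum(w * x for w, (x, _, _) in zip(weights, points)) / total_w
    mean_y = sum(w * y for w, (_, y, _) in zip(weights, points)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, (x, _, _) in zip(weights, points))
    sxy = sum(
        w * (x - mean_x) * (y - mean_y) for w, (x, y, _) in zip(weights, points)
    )
    return sxy / sxx, math.sqrt(1.0 / sxx)


# Invented values, for illustration only.
mean, se, ci = mean_with_ci([160.0, 162.0, 164.0])
slope, slope_se = weighted_slope(
    [(1800, 161.0, 1.0), (1810, 163.0, 1.0), (1820, 165.0, 1.0)]
)
```

Comparing the slopes of two regions, together with their standard errors, would give a first indication of whether the trends differ by more than the sampling variation.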
Removing the condition that country of birth equals country of residence did not introduce new countries into the sample. The sample set was then broadened by including data from groups of military volunteers (which might introduce a social bias). This introduces new countries into the results, but Figure 3 seems to confirm the results presented in Figure 1, and Figure 4 those of Figure 2.
Figure 3: Average height (cm) per decade/country of birth of adult (age 18 or higher) males born between 1799 – 1900 - loose selection of dataset type.
Figure 4: Average height (cm) per decade/continent of birth of adult (age 18 or higher) males born between 1799 – 1900 - loose selection of dataset type.
We subsequently removed all conditions on dataset type out of an explicit desire to include China, as this is a crucial part of Pomeranz's concept of the Great Divergence. The results can be found in Figures 5 and 6.
Figure 5: Average height (cm) per decade/country of birth of adult (age 18 or higher) males born between 1799 – 1900 - no selection of dataset type.
Figure 6: Average height (cm) per decade/continent of birth of adult (age 18 or higher) males born between 1799 – 1900 - no selection of dataset type.
Comparison of the present results with the findings of Baten and Blum shows that, regardless of whether potentially biased data are rigorously excluded or re-introduced (first volunteers, thereafter all dataset types), there is little evidence for a 19th-century anthropometric divergence between Europe and Asia corresponding to an acceleration of the Great Divergence triggered by the early coal-based Industrial Revolution. Our results show a persistent difference in stature between Europe and Asia in the 19th century, but the limitations of SPARQL and of other tools that adhere to the FAIR principles, together with our desire to ensure reproducibility, prevent us from investigating this difference further.