A long-term exploratory analysis of male stature and occupation in microHeights, 1720-1910

By Nick Van den Broeck, Jasper Segerink and Christophe De Coster

Image source: Wikimedia Commons

I. Introduction

How much, and why, did a population’s stature change over time and space? This is a general but important question, as height is often considered a key indicator of economic conditions. Put differently, observed variations in height are often attributed to differences in the socio-economic background of individuals or societal groups (Steckel (2009); Floud et al. (2012); Depauw (2017)). Comparing the average heights of structurally different occupational groups, such as farmers and labourers, therefore takes us straight to the core of debates on the effects of capitalism and (proto-)industrialization. With this in mind, we set out to investigate microHeights, a dataset containing over 360,000 observations on the stature of individuals between the 17th and 20th centuries, covering multiple countries across the globe.

Our research question is therefore: ‘How did the average stature of adult males belonging to different occupational groups change over time?’

II. To query is to standardize

Using microHeights confronted us with one major challenge: the standardization of the variable occupation, or rather, the lack of it. The table below lists the 100 most frequently registered occupations in microHeights.

Different languages, abbreviations, inconsistent capitalization, spelling mistakes, numeric references and symbols: the occupations are recorded in anything but a standardized format such as HISCO/HISCLASS codes. An automated clean-up process was unable to assign such a code to each observation without considerable error margins. Using the query above, we randomly selected five of the fifteen most common occupations in microHeights or, more precisely, of the occupations whose spellings occurred most frequently. The selection was thus made irrespective of potential age- or gender-related occupational connotations. It consisted of labourer, day labourer, farmer, shoemaker and maidservant. Manually checking the spelling variants for each occupation required us to return to the individual datasets. It was therefore necessary to find out which datasets contained data on the variable ‘occupation’:

Of the 32 datasets, 14 contain data on the variable occupation. Most of these datasets are related to prisons or the military, although two slave datasets also stand out. The query also revealed that the completeness of the variable differs per dataset. For nine datasets, a value for occupation was nearly always available, whereas for the others it was available for only 7 to 60 percent of the observations.
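The completeness check boils down to counting, per dataset, the share of observations with a non-empty occupation. A minimal Python sketch, assuming the observations have been exported as (dataset, occupation) pairs; the function name and the toy dataset names are hypothetical, and this is not the SPARQL query we used:

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def occupation_completeness(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, float]:
    """Percentage of observations per dataset with a non-empty occupation value."""
    totals: Dict[str, int] = defaultdict(int)
    filled: Dict[str, int] = defaultdict(int)
    for dataset, occupation in records:
        totals[dataset] += 1
        # Missing values and blank strings both count as "no occupation recorded".
        if occupation is not None and occupation.strip():
            filled[dataset] += 1
    return {dataset: 100 * filled[dataset] / totals[dataset] for dataset in totals}


# Invented toy records for two hypothetical datasets.
records = [
    ("prison_a", "farmer"),
    ("prison_a", None),
    ("prison_a", " "),
    ("prison_a", "labourer"),
    ("army_b", "shoemaker"),
]
print(occupation_completeness(records))  # {'prison_a': 50.0, 'army_b': 100.0}
```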

The individual datasets were then loaded into RStudio to trace the spelling variants of the five occupations (see R-script). There were in fact 15 datasets that registered occupations, but two of them were unusable. The ‘Cuba Army Recruits’ dataset distinguishes between occupational types (unskilled, professional, ...) that are not registered as occupations in microHeights, which explains why this dataset does not show up in the above script (remove filter # on line 19). Similarly, ‘UK: Prisoners […] at Wandsworth’ uses codes 2 to 6, referring to the Armstrong classification (Horrell et al. (2009)). For the five selected occupations, we manually sifted through these individual datasets to trace variations in spelling, capitalization and abbreviation (see R-script).
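The manual tracing of variants can be expressed as a lookup table from normalized spellings to one standard key. The sketch below is a hypothetical Python illustration rather than our R-script: the VARIANTS table holds only a few examples (the French ‘cordonnier’ for shoemaker, the German ‘dienstmagd’ for maidservant), and the normalization steps (Unicode form, trimming, lower-casing, dropping a trailing abbreviation dot) are assumptions about what such a clean-up involves. Dataset-specific abbreviations such as ‘d’ in ‘Czech Prison Repy’ would need a per-dataset table, since a single letter is ambiguous across datasets.

```python
import unicodedata
from typing import Optional


def normalize(raw: str) -> str:
    """Unify Unicode form, case, surrounding whitespace and a trailing abbreviation dot."""
    text = unicodedata.normalize("NFC", raw).strip().lower().rstrip(".")
    return " ".join(text.split())  # collapse repeated internal spaces


# Hypothetical, illustrative variant table: standard key -> spellings found in the sources.
VARIANTS = {
    "shoemaker": {"shoemaker", "cordonnier"},
    "maidservant": {"maidservant", "dienstmagd", "dienstm", "dienstmädchen"},
    "day labourer": {"day labourer", "day laborer"},
}

# Reverse lookup: normalized spelling -> standard key.
LOOKUP = {normalize(v): key for key, variants in VARIANTS.items() for v in variants}


def standardize(raw: str) -> Optional[str]:
    """Return the standard occupation key, or None if the spelling is not (yet) mapped."""
    return LOOKUP.get(normalize(raw))


print(standardize("CORDONNIER"))     # shoemaker
print(standardize(" Dienstm. "))     # maidservant
print(standardize("Day  Labourer"))  # day labourer
print(standardize("sailor"))         # None
```

Returning None for unmapped spellings keeps unknown occupations visible for a further manual pass instead of silently assigning them to a category.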

Initially, we intended to use the occupation URIs in the microHeights dataset as the UID to link our cleaned file to. Ideally, each of the different spellings found in an individual dataset would have had its exact twin as a URI in microHeights, such as /occupation/#spelling_variant. But this was not the case. For example, ‘CORDONNIER’ appeared fully capitalized in the individual dataset, but in microHeights only '/occupation/cordonnier' existed. At the same time, separate URIs existed for the uncapitalized '/occupation/trabalhador' and the capitalized '/occupation/Trabalhador'. Another telling example was ‘dienstmagd’ in the dataset ‘Czech Prison Repy’. Applying the filter on line 20 in the query above, SPARQL returns 114 observations for ‘dienstmagd’ in this dataset. However, when we explored the raw data in R (see R-script), only 10 were counted for that exact spelling. Including spelling variants such as ‘d’, ‘D’, ‘dienstm’, ‘Dienstm’ and ‘Dienstmädchen’, the total rises to 124, implying that 10 observations are lost in microHeights. This suggests that the conversion of raw occupational data to microHeights’ URIs did not happen in a systematic manner, or at least not in one we are aware of. Consequently, we only standardized the URIs of occupations that we did find in microHeights.
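The capitalization problem can be made visible by grouping spellings (or URI suffixes) under a case-insensitive key and flagging keys with more than one written form. A hypothetical Python sketch; the function names are ours, and the usage example only reuses the ‘dienstmagd’ figures reported above (124 raw observations against 114 in microHeights):

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def case_collisions(spellings: Iterable[str]) -> Dict[str, Set[str]]:
    """Group spellings that differ only in capitalization or surrounding whitespace.

    Returns {case-insensitive key: set of written forms} for keys with several forms,
    i.e. spellings that would ideally share a single occupation URI.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for spelling in spellings:
        groups[spelling.strip().casefold()].add(spelling)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


def missing_observations(raw_count: int, linked_count: int) -> int:
    """Observations present in the raw dataset but not linked in microHeights."""
    return raw_count - linked_count


uri_suffixes = ["trabalhador", "Trabalhador", "cordonnier"]
# Sort the forms so the printed output is stable (sets have no fixed order).
print({k: sorted(v) for k, v in case_collisions(uri_suffixes).items()})
# {'trabalhador': ['Trabalhador', 'trabalhador']}
print(missing_observations(124, 114))  # 10
```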

Ruben Schalk kindly linked microHeights with our cleaned occupation 'keys', using URI_MH as the UID. The query below provides an overview of how the observations in the standardized and filtered data compare to those in the original datasets per birth decade. We filtered out all females, and adulthood is broadly defined as ages 20 to 55.
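The filters and the link can be sketched in Python as follows, assuming each observation carries an occupation URI, a sex, an age at measurement and a birth year. The field names, the "male" coding and the `keys` dictionary standing in for our cleaned occupation keys are hypothetical; the actual link was made by Ruben Schalk on URI_MH.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class Observation:
    occupation_uri: str  # hypothetical field: occupation URI as in microHeights
    sex: str             # hypothetical coding: "male" / "female"
    age: Optional[int]   # age at measurement, None if unknown
    height_cm: float
    birth_year: int


def adult_males(
    observations: Iterable[Observation],
    keys: Dict[str, str],
    min_age: int = 20,
    max_age: int = 55,
) -> Iterator[Tuple[str, Observation]]:
    """Yield (standard occupation, observation) for males aged min_age..max_age
    whose occupation URI has a cleaned key (an inner join on the URI)."""
    for obs in observations:
        if obs.sex != "male" or obs.age is None:
            continue
        if not min_age <= obs.age <= max_age:
            continue
        occupation = keys.get(obs.occupation_uri)
        if occupation is not None:
            yield occupation, obs


keys = {"/occupation/cordonnier": "shoemaker"}
observations = [
    Observation("/occupation/cordonnier", "male", 30, 165.0, 1790),
    Observation("/occupation/cordonnier", "male", 17, 160.0, 1800),  # too young
    Observation("/occupation/dienstmagd", "female", 25, 155.0, 1800),  # female, no key
]
print([(occ, obs.age) for occ, obs in adult_males(observations, keys)])  # [('shoemaker', 30)]
```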

III. As above so below? An analysis of height.

The query above gives an impression of the general development of stature for our four remaining professions. The polynomial trendline (R² = 0.584) indicates that between 1660 and 1910 average height increased from 157.5 cm to 161.8 cm, peaking at 166.2 cm in 1800. Remarkable is the 10 cm increase in height during the early eighteenth century, from 158 to 168 cm, for the four occupations. However, this development might be influenced by the lower number of observations before 1720. The overall decrease in height during the nineteenth century is likewise disrupted in the 1850 birth decade by the lower number of observations. The representativeness of this sample is supported by the fact that a comparable development, except for the 1770s to 1800s, can be observed when all height observations with an occupation in microHeights are taken into account.
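For readers unfamiliar with the trendline statistic: a polynomial trendline of degree d fits average height per birth decade as a polynomial in time, and R² measures the share of the variance that the fitted curve accounts for. The formulas below are the standard least-squares definitions, with the degree d left general and h_i taken as the average height in birth decade t_i.

```latex
\hat{h}(t) = \sum_{k=0}^{d} \beta_k \, t^k,
\qquad
(\beta_0, \dots, \beta_d) = \arg\min_{\beta} \sum_{i=1}^{n} \bigl(h_i - \hat{h}(t_i)\bigr)^2

R^2 = 1 - \frac{\sum_{i=1}^{n} \bigl(h_i - \hat{h}(t_i)\bigr)^2}{\sum_{i=1}^{n} \bigl(h_i - \bar{h}\bigr)^2},
\qquad
\bar{h} = \frac{1}{n} \sum_{i=1}^{n} h_i
```

Assuming the trendline was fitted to the decade averages, an R² of 0.584 means it accounts for roughly 58 percent of their variation.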

The real power of our HISCO-standardization, however, lies in the ability to trace the height of specific occupations through time.

The query above tracks the average height of our four male occupations from 1720 to 1910. This periodization stems from a lower limit of 50 observations of standardized occupations (see query 3). With an average of around 163 to 164 cm and occasional upsurges to 167 to 168 cm, the height of labourers remained the most stable throughout the research period. Day labourers, on the other hand, experienced a visible decrease throughout the nineteenth century, from 166 to 161 cm. The height of farmers was relatively volatile during the 1700s. From 1800 to 1880, it stabilized around a tall 168 cm, but decreased during the last decades. In contrast to farmers, the average height of shoemakers increased throughout the 18th century, reaching its zenith of 174 cm in 1800. Afterwards, it declined continuously to 162 cm.
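The aggregation behind this graph can be sketched as a group-by on birth decade and occupation. In this hypothetical Python version, a birth decade is kept only if it holds at least 50 observations of standardized occupations in total; that is our reading of the lower limit mentioned above, and a per-occupation threshold would be an equally plausible alternative. The toy rows are invented and are not microHeights data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def decade_means(
    rows: Iterable[Tuple[str, int, float]], min_obs: int = 50
) -> Dict[int, Dict[str, float]]:
    """Mean height per birth decade and occupation.

    rows: (standard occupation, birth year, height in cm).
    Decades with fewer than min_obs observations in total are dropped.
    """
    by_decade: Dict[int, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for occupation, birth_year, height_cm in rows:
        decade = (birth_year // 10) * 10  # e.g. 1807 -> 1800
        by_decade[decade][occupation].append(height_cm)

    result: Dict[int, Dict[str, float]] = {}
    for decade in sorted(by_decade):
        groups = by_decade[decade]
        if sum(len(heights) for heights in groups.values()) < min_obs:
            continue  # too few observations in this birth decade
        result[decade] = {occ: sum(h) / len(h) for occ, h in groups.items()}
    return result


# Invented toy rows: 30 + 20 observations in the 1800s, only 10 in the 1810s.
rows = (
    [("farmer", 1801, 168.0)] * 30
    + [("shoemaker", 1805, 170.0)] * 20
    + [("farmer", 1811, 160.0)] * 10
)
print(decade_means(rows))  # {1800: {'farmer': 168.0, 'shoemaker': 170.0}}
```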

When the occupations are compared with each other and with the general adult male population in the same graph, these differences become even more pronounced.

Broadly speaking, two phases can be distinguished, with 1820 as the tipping point. Before this birth decade, shoemakers consistently measured well above the average (based on all occupations), but afterwards this was no longer the case. Farmers, who before 1820 always measured below the average, then rose above all other occupations until the turn of the twentieth century. The height of (day) labourers, on the other hand, remained below the average throughout.
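The comparison with the general average amounts to subtracting, per birth decade, the all-occupation mean from each occupation's mean; a positive value means the occupation measured above the average. A hypothetical Python sketch, with invented toy values in the usage example:

```python
from typing import Dict


def deviations(
    occupation_means: Dict[int, Dict[str, float]], overall_means: Dict[int, float]
) -> Dict[int, Dict[str, float]]:
    """Difference in cm between each occupation's mean and the all-occupation mean
    of the same birth decade; decades without an overall mean are skipped."""
    out: Dict[int, Dict[str, float]] = {}
    for decade, by_occupation in sorted(occupation_means.items()):
        if decade not in overall_means:
            continue
        base = overall_means[decade]
        out[decade] = {occ: round(mean - base, 1) for occ, mean in by_occupation.items()}
    return out


# Invented toy values, not results from microHeights.
occupation_means = {
    1810: {"shoemaker": 169.0, "farmer": 163.5},
    1830: {"shoemaker": 164.0, "farmer": 168.0},
}
overall_means = {1810: 165.0, 1830: 165.0}
print(deviations(occupation_means, overall_means))
# {1810: {'shoemaker': 4.0, 'farmer': -1.5}, 1830: {'shoemaker': -1.0, 'farmer': 3.0}}
```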

These divergent trends might be explained by different levels of skill and of adaptability to socio-economic developments. Shoemaking was a handicraft until the 19th century, after which it became increasingly industrialized. As shoemaking became part of a more accessible production process, the skills it required declined (Riello (2003), Blewett (1983)), which may help explain why shoemakers lost their height advantage. Meanwhile, from the 1820s onwards, farmers’ height increased remarkably and stood out compared to all other occupations, with differences averaging circa 3 cm in some birth decades. While we still have to look into the specificities of each individual dataset, this hints that farmers were especially well off, possibly due to their more secure and direct access to food compared to other professions (in rural areas and cities alike), which influenced their nutritional intake and stature (see for example Depauw (2017)). Both labourers and day labourers faced structurally tougher working conditions, and their height varied accordingly (see Kirby (1995)).

IV. Concluding Remarks

With this concise datastory we have hinted at the importance of studying height in tandem with occupation. Our preliminary results reveal highly differentiated trends according to adult males’ occupations, in line with the socio-economic vulnerabilities of those occupations as put forward in the literature. But this is merely the tip of the iceberg. Further analyses across space and time using microHeights data can dig deeper into these trends along the axes of gender, age and origin. By treating the microHeights database as a whole as our level of analysis, we did not take into account that countries and regions were characterized by different economic developments over time, which, as suggested, influence stature. A relational analysis with GDP data from Clio-Infra could follow up on this suggestion.

But before this is possible, a more elaborate standardization of the occupation variable in microHeights is urgently needed. At the time of writing, studying an occupation in microHeights is not optimal, and we have exposed some inconsistencies in the conversion of occupations to microHeights’ URIs that need revisiting. This datastory has shown that standardization is possible through an in-depth linguistic analysis of the individual datasets, albeit requiring manual intervention for some of them. When more occupations are standardized, analyses based on HISCLASS can further tease out socially differentiated trends in stature.

This datastory was created as part of the 2021 Posthumus course 'Data management for Historians'.