The Integrated Census Microdata (I-CeM) project was a three-year programme that produced a standardised, integrated dataset of most of the censuses of Great Britain for the period 1851 to 1911 [1].
By making available to academic researchers the detailed information collected about every resident of Great Britain at each decennial census from 1851 to 1911, the I-CeM project has transformed the research landscape for the economic, social, and demographic history of the country during a period of profound change in the wake of the industrial revolution.
The I-CeM dataset is one of the most important historical datasets in the world, and puts British social scientific research, and the Department of History at the University of Essex, at the forefront of international efforts in the field. The dataset is made available at the UK Data Archive at the University of Essex in two forms – in an anonymised version available online to accredited researchers, and in a full version via secure data access arrangements.
What has the project involved?
The I-CeM team have created machine-readable parish-level tables of population counts by census year, derived from the published census tables for the period 1851-1911, and have developed a consistent enumeration geography for Great Britain over time.
This framework of standardised administrative geography and digitised population counts has been used to reconcile data received from commercial partners. The I-CeM team have also constructed a wide range of data enhancements as derived variables to augment the transcribed census data.
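The reconciliation step described above can be pictured as a comparison between published parish totals and the number of transcribed person records for the same parish and census year. The following is only a minimal sketch under stated assumptions: the function name `reconcile_counts`, the 5 per cent tolerance, and the sample parishes are all hypothetical and are not taken from the I-CeM project's own procedures.

```python
from collections import Counter


def reconcile_counts(published, records, tolerance=0.05):
    """Flag parish/year units where transcribed and published counts disagree.

    published: dict mapping (parish, year) -> published population count
    records:   iterable of (parish, year) tuples, one per transcribed person
    tolerance: hypothetical relative discrepancy allowed before flagging
    """
    transcribed = Counter(records)
    flagged = {}
    for unit in sorted(published.keys() | transcribed.keys()):
        expected = published.get(unit, 0)
        found = transcribed.get(unit, 0)
        if expected == 0:
            # Records exist for a unit with no published total.
            mismatch = found > 0
        else:
            mismatch = abs(found - expected) / expected > tolerance
        if mismatch:
            flagged[unit] = (expected, found)
    return flagged


# Made-up illustrative data, not real census figures.
published = {("Parish A", 1881): 4, ("Parish B", 1881): 2}
records = [("Parish A", 1881)] * 4 + [("Parish B", 1881)]
flagged = reconcile_counts(published, records)
```

Here `flagged` contains only Parish B, whose published total (2) differs from the single transcribed record. A check of this kind only identifies units needing attention; deciding whether the transcription or the geography is at fault remains a manual judgement.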
Standardised coding schemes, harmonised over time, have been established, along with data dictionaries and thesauri, to facilitate the coding of census variables, including relationship to household head, marital condition, disability, place of work and place of birth. Occupational coding schemes have been mapped to relevant historical and international standards to ensure comparability.
The I-CeM project has also created an online interface and bespoke dissemination tool through which authenticated users can access versions of the I-CeM data, or generate their own datasets, together with supporting documentation. The enhanced, integrated I-CeM datasets thus make the British census enumerators’ books from the decennial censuses of 1851-1911 available to the research community in a standardised, comparable form.
The I-CeM dataset is based on digital transcriptions of the original data made by a genealogical service provider, FindMyPast, for its own purposes. FindMyPast had a commercial incentive to ensure that the transcription process was as accurate as possible. However, no transcription can be 100 per cent accurate, and further mis-specifications will have been introduced during coding.
That said, the original census enumerators’ books were themselves transcripts from household schedules, which were in turn approximations to the realities of Victorian life, so any inaccuracies in the I-CeM dataset are a matter of degree rather than of kind.
The I-CeM dataset is a substantially accurate rendering of the original census returns, and any limitations of the data are thoroughly outweighed by the convenience of having 60 years of data downloadable for the whole of Great Britain in a standardised form.
In addition, although the original digital data have been coded and standardised, the original text and numerical strings have always been preserved in separate variables, so that researchers can go back to the original transcription.
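The idea of coding a variable while keeping the transcribed string can be sketched as follows. This is an illustration only: the thesaurus entries, the field names `relationship_raw` and `relationship_code`, and the `UNKNOWN` fallback are assumptions made for the example and do not reproduce the actual I-CeM variables or coding schemes.

```python
import re

# Hypothetical thesaurus mapping transcribed variants to standard codes.
RELATIONSHIP_THESAURUS = {
    "head": "HEAD",
    "wife": "WIFE",
    "son": "SON",
    "dau": "DAUGHTER",
    "daur": "DAUGHTER",
    "daughter": "DAUGHTER",
    "serv": "SERVANT",
    "servant": "SERVANT",
}


def normalise(text):
    """Lower-case the string and drop punctuation before the lookup."""
    return re.sub(r"[^a-z ]", "", text.lower()).strip()


def code_relationship(record):
    """Return a copy of the record with a standardised code added.

    The raw transcription in 'relationship_raw' is left untouched, so the
    original string can always be consulted alongside the code.
    """
    coded = dict(record)
    key = normalise(record["relationship_raw"])
    coded["relationship_code"] = RELATIONSHIP_THESAURUS.get(key, "UNKNOWN")
    return coded


example = code_relationship({"relationship_raw": "Daur."})
```

In this sketch `example` holds both `"Daur."` in `relationship_raw` and `"DAUGHTER"` in `relationship_code`. Strings that the thesaurus does not recognise receive the fallback code rather than being altered, so a researcher who doubts a coding decision can return to the transcription.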
[1] England and Wales, 1851-1861 and 1881-1911; Scotland, 1851-1901.