In this post we continue our new series — ‘Historical Research in the Digital Age’ — which explores historians’ use and understanding of the digital tools and sources that shape modern research culture. The series explores the impact and implications of digital resources (positive and negative) for how historians work today.
In Part Three we hear from Anna McNally, a qualified archivist with twenty years of professional experience. Here, Anna considers the development of digitised archives from the early 2000s, the behind-the-scenes work of digital archives, and how this influences the work we’re able to do as historical researchers.
The series is hosted by Ian Milligan from the University of Waterloo, Ontario, whose new book, The Transformation of Historical Research in the Digital Age, is now available as a free Open Access download from Cambridge University Press. Later in the series we’ll hear from historian-users of digital resources and tools, and from those who research in environments where digital resources remain limited.
Why digitise?
The answers to this question might seem obvious to a historian: to make collections accessible to remote audiences, to enable text searching, and to allow for new types of research. Yet how often do you pose the question of a resource you are using: why has this been digitised?
In this post, I’m going to look at how the decisions behind digitisation can impact on your experience as a historical researcher, and some of the questions you might want to consider as you approach a digitised resource.
How we got here
Archival digitisation projects started in earnest around 2000. One of the earliest UK projects, Kent Archives’ tithe maps, originally applied for funding for microfilming but was carried out as a digitisation project, 1998-2002, and supplied on CD-ROM. The practicalities of dial-up internet restricted the size of the images for many early projects, and so digitised documents usually formed part of an educational resource, such as the Moving Here website, or served as ‘shop windows’ to drive in-person visits to the physical collections. A report from the 2004 Gerald Aylmer Seminar noted a digitisation project as having ‘generated a demand for original records’, which I remember as being a common expectation at that time. Expectations of greater, off-site access duly followed. In 2009 the UK Government published Archives for the 21st Century, which included the recommendation that archives should work towards ‘comprehensive online access for archive discovery through catalogues and to digitised archive content by citizens at a time and place that suits them’.
The subsequent decade saw a move towards a presumption in favour of digitisation as a means of opening up archives. The widespread closure of archives to the public in March 2020, as a result of the Covid-19 pandemic, further tilted the scales in favour of digital access. Rather than driving visitors to the archive reading room as it did 20 years ago, today digitisation is perceived to be of sufficient quality to keep researchers away.
Researchers accessing digitised archives in 2023 are therefore increasingly approaching them as the norm. Yet the material they are using was often digitised—and importantly selected for digitisation—at a time when digital access was viewed as an added bonus rather than as the primary means of access.
Although there are many criteria for selection, the priority has always been to digitise popular material. This is because it has benefits for users (in providing convenient access), for archive staff (in minimising reproduction and retrieval enquiries for the same material), and for the objects themselves (since repeated handling causes damage). This was unproblematic when researchers expected archives to have only a small proportion of their collections available digitally and there was a tacit understanding about what would be available.
As new generations of researchers approach the archive without the experience of using digitised records pre-pandemic, we can no longer assume that this tacit understanding is in place. There is a danger that researchers will see the selection of material for digitisation as a form of secondary appraisal, after the initial selection for preservation in the archive, and indicative of the records’ perceived research value rather than simply a popularity contest.
Faced with an online interface of millions of images, it is easy for researchers to assume that the proportion of digitised material is much higher than it actually is. Yet the UK National Archives has only digitised an estimated 10% of its collection, a figure that is likely to be significantly lower for the majority of UK archives.
Why software matters
This perception of abundance is compounded by digitised images often being hosted in a separate interface to the archival catalogue, due to the technical constraints of both types of software. This can also result in significant differences in the metadata between the two interfaces. For example, the catalogue for the University of Leicester’s Gorrie Collection is hosted in CalmView software while the digitised collection is available through CONTENTdm. The catalogue provides additional information about the physical material that is not mentioned in the digitised collection, while the digitised collection includes descriptions of each item that aren’t available in the catalogue.
In the case of the Gorrie Collection, the entire fonds is available digitally, but this is unusual. There is a combination of physical and legal restrictions that prevents a significant number of archival items from being digitised, which makes digitisation of a complete collection rare. Over the last decade the Tate Archive has embarked on a particularly ambitious digitisation programme, making more than 75,000 items available.
The Tate’s digitised archive collections are presented through a traditional hierarchical arrangement, ensuring that the digitised items can be understood in context. Yet understanding what hasn’t been digitised still requires navigating across both interfaces. For example, the correspondence series in Ithell Colquhoun’s archive (TGA 929/1) numbers 18 items on the digitised site and 2,544 items in the catalogue. As an alternative approach, the Norfolk Record Office has chosen to digitise files in their entirety for their Second Air Division Digital Archive, but with redactions and removals due to copyright noted in the metadata.
Anticipated audiences
While some of the projects mentioned so far are textual materials, the majority of digitised archival items are still primarily visual. Digitised photographs are useful for publications, websites and social media, allowing archives to support the mission of their parent organisation. In a previous role, I scanned nearly a thousand photographs for use across five volumes to celebrate the 175th anniversary of the University of Westminster. Having been digitised, these images were later added to the online catalogue to support access, but the primary purpose was to support the institution’s marketing and communications.
As such, in most cases only the image side of the photograph was scanned; where a photograph was too large for the office scanner, sometimes the edges were cropped. While this isn’t an issue where the digitised image is used for illustrative purposes, it can pose problems for researchers for whom the photographic form or annotations convey relevant information. Increasingly, however, some digitisation projects are taking these users into account. Catalogue metadata also often doesn’t include the size of the original object, which can be difficult to discern from the digitised image.
Although archive catalogues are intended to be used by the entire spectrum of researchers, digitisation projects must be planned with a specific audience in mind. Yet regardless of the target user, the type of research imagined is usually analogous to that which is carried out in the reading room. Digitised documents are primarily delivered to the researcher as JPEGs or PDFs, with zooming in the only interactivity that’s enabled beyond what would be possible in person.
This type of access was previously described as a ‘virtual reading room’ although the term changed meaning during the Covid-19 pandemic lockdowns in the UK and came to describe access to archives and special collections via Zoom. At present few digitisation projects anticipate machine-readable access to the data. Computational access techniques such as machine learning or text mining present enormous opportunities for opening up hidden histories within collections, both when applied to the archival documents and to the collection metadata itself. Sometimes it may be possible to apply these techniques to older digitisation projects, but researchers may need to contact the archive staff in order to gain access to the materials in a format they can use.
Digitisation means more work for archivists
Digitisation means a doubling of preservation efforts, with both the physical and digital version now requiring ongoing care and maintenance. At the very least a digitised copy requires storage, with the attendant costs, while proper maintenance means regular checking to ensure that the digital file remains usable and uncorrupted over time.
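As an illustration of what this regular checking can involve, the sketch below records a checksum for each file in a folder of scans and later reports any file that has gone missing or changed. It is a minimal example under stated assumptions, not a description of any particular archive’s workflow: the folder layout and the function names (sha256_of, build_manifest, check_fixity) are hypothetical, and production systems typically rely on dedicated digital preservation software.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file under `folder` (hypothetical layout)."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def check_fixity(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or no longer match."""
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

The manifest would be built once, when the images are first stored, and kept separately from them. Running check_fixity on a schedule then returns an empty list while every file is intact, and the names of any files that need restoring from a backup copy otherwise.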
With some older digitisation projects now approaching their 20th anniversary, archivists are also increasingly having to find ways to preserve the websites that host the digitisation projects, alongside the original documents and digitised images, as examples of scholarly output. Yet a 2017 study found that even well-regarded resources like Early English Books Online were regularly being cited as if the writer had looked at the original document instead of the digitised version. These practices make it difficult for archive professionals to advocate for funding both to maintain existing digital resources and to create new ones.
Digitisation opens up a lot of opportunities, both in providing access to audiences who might not have previously been able to travel to see the documents and in enabling new types of research. As more collections are digitised, it will be crucial for researchers to be able both to approach digital collections critically, and to articulate to funding bodies why they sometimes still need to look at the original.
About the Author
Anna McNally is a qualified archivist with twenty years of professional experience in managing collections in academic and museum contexts. She has also taught and written on critical approaches to archives. A list of Anna’s publications and projects can be found on her personal website.
‘HISTORICAL RESEARCH IN THE DIGITAL AGE’: ABOUT THIS SERIES
‘Historical Research in the Digital Age’ is a 6-part series of posts on the Royal Historical Society’s blog, published between December 2022 and February 2023. The series is designed and hosted by Ian Milligan, Professor of History at the University of Waterloo, Ontario. It’s prompted by Ian’s new book, The Transformation of Historical Research in the Digital Age (available Open Access via Cambridge University Press, 2022), which considers the impact and implications of digital resources for contemporary historical practice.
In addition to his own essay, ‘We Are All Digital Now: And what this means for historical research’ (December 2022), Ian invites five contributors to continue the discussion from several perspectives:
- the builder of digital tools for historians: Part 2, with William J. Turkel
- the archivist-interpreter who mediates between resources (and their commercial providers) and users: Part 3, with Anna McNally
- the collaborative and interdisciplinary researcher who brings historical and computer science knowledge to big data: Part 4, with Ruth Ahnert
- the many without access to such resources given the many ‘digital disparities’ of infrastructure and sources that exist: Part 5, with Gerben Zaagsma
- the historian-user who applies digital resources to their work, and the implications of this: Part 6, with Jo Guldi
ALSO AVAILABLE IN THIS SERIES
Part One: ‘We Are All Digital Now: and what this means for historical research’, by Ian Milligan
Part Two: ‘Tools for the Trade: and how historians can make best use of them’, by William J. Turkel
Part Three: ‘Why Archivists Digitise: and why it matters’, by Anna McNally
Part Four: ‘Researching with Big Data: and how historians can work collaboratively’, by Ruth Ahnert
Part Five: ‘Digitising History from a Global Context: and what this tells us about access and inequality’, by Gerben Zaagsma
FURTHER RHS BLOG SERIES
The Society’s blog, Historical Transactions, offers regular think pieces on historical research projects and approaches to the past. These include several previous series, addressing wide-ranging questions concerning historical methods and the value of historical thinking.
Recent series include ‘Writing Race’ and ‘What is History For?’ We welcome proposals for other short series of posts, bringing historians together to discuss topics, practices and values. If you’d like to suggest an RHS blog series, please email: philip.carter@royalhistsoc.org.