In this post we continue our new series — ‘Historical Research in the Digital Age’ — which explores historians’ use and understanding of the digital tools and sources that shape modern research culture. The series explores the impact and implications of digital resources (positive and negative) for how historians work today.
In Part Two we hear from Professor William J. Turkel, an experienced builder of digital tools for historians. In this post we’re introduced to selected resources and guides to tools. William also explains the thinking we should undertake as historians when seeking to use digital tools, or indeed to create our own digital resources, whether singly or as a team.
The series is hosted by Ian Milligan from the University of Waterloo, Ontario, whose new book, The Transformation of Historical Research in the Digital Age, is now available as a free Open Access download from Cambridge University Press. Later in the series we’ll hear from university archivists and librarians who are responsible for mediating and interpreting digital resources for students and researchers, from historian-users of digital resources and tools, and from those who research in environments where digital resources remain limited.
There are many ways to build digital resources for historical research, and I have little experience with most of them.
I have always worked by myself, or in small teams that had at most one other programmer. The computational literacy of my team members has varied widely and they have typically had no background in traditional computer science. Furthermore, our tools were intended for internal use only, although we shared our code as open source. So I cannot provide much perspective on the development of the large public-facing websites that are typically a first point of contact for people searching for digital history online. I also have not contributed to the development of tools used by historians (and others) such as Zotero, Transkribus, Voyant, or Mallet.
But I have been programming and teaching beginners how to program since the late 1970s. Here, in this second article for the series ‘Historical Research in the Digital Age’, I will focus on why you might want to try building your own digital resources and how to get started.
One reason to explore programming is that it becomes possible to automate research tasks that you are already doing. Two decades ago, John Unsworth identified seven ‘scholarly primitives’: discovering, annotating, comparing, referring, sampling, illustrating, and representing. Each of these activities can be automated with tools that you build yourself.
For example, I was recently collaborating with Michael Bartlett, an emeritus Civil Engineering professor who wanted to create a database of historic bridge images. One of the ways that we discovered these images was by using Wikidata to retrieve identifiers for linked open data and then using the Library of Congress identifier for a given bridge to retrieve the images themselves (the process is shown here with code). In digital resources, such stable identifiers are used for referring. In the same project we automated the annotation of images with metadata such as latitude and longitude, automated the comparison of photographs taken in close proximity to one another, and so on. Each of these tasks could have been done by hand, but it would have been considerably more time consuming and expensive. Programming never does your research for you, but you can use it to speed up routine work.
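As a rough illustration of the discovery step described above, the sketch below builds (but does not send) a request to the public Wikidata SPARQL endpoint, asking for items of a given class that carry a given external identifier. It is not the project's actual code. The function names are hypothetical, and the example values are assumptions that should be checked on Wikidata before use: `Q12280` as the item for "bridge" and `P244` as the property for the Library of Congress authority ID.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_identifier_query(class_qid: str, property_pid: str, limit: int = 10) -> str:
    """Return a SPARQL query for items of a class that have a given identifier.

    class_qid and property_pid are Wikidata IDs supplied by the caller;
    this helper (a hypothetical name) only assembles the query text.
    """
    return f"""
SELECT ?item ?itemLabel ?identifier WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} ;
        wdt:{property_pid} ?identifier .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Return the GET URL that would submit the query and ask for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"


# Assumed example values: Q12280 (bridge) and P244 (Library of Congress authority ID).
query = build_identifier_query("Q12280", "P244", limit=5)
url = build_request_url(query)

print(query)
print(url)
```

Keeping the query builder separate from the network call makes it easy to inspect the query by eye, or paste it into the Wikidata query editor, before fetching anything. The identifiers returned would then be the stable handles used to look up images elsewhere.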
In this case, programming also served as a way of structuring our interactions so we could both teach and learn from one another. Going into the project, I knew little of the history of bridge construction or civil engineering more generally; Mike was interested in learning more about the databases, image and text mining, and machine learning techniques I had been using in my own work.
Along the way we got to know each other better, had some fun, and created resources we both will use in the future. If your own research abuts another discipline, especially one of the social or natural sciences, then developing some skill with programming has the additional benefit of helping you to develop a new literacy for interdisciplinary collaboration. Historians and geographers may approach maps differently, for example, but the degree to which they do so becomes more salient and explicit in discussions about the construction of a database to support a geographic information system (GIS). Programming forces you to clarify your assumptions to the point where not only the computer, but also other programmers, can understand them.
Becoming a ‘Programming Historian’
Collaboration can also make it easier to learn how to program.
When she was a post-doc, Kim Martin organised self-study groups to work on lessons from the Programming Historian, an open content, open access, open source site that ‘publish[es] novice-friendly, peer-reviewed tutorials that help humanists learn a wide range of digital tools, techniques, and workflows to facilitate research and teaching.’ Nobody in Kim’s groups was an experienced programmer, but participants provided a supportive environment for one another. Whoever had figured something out could temporarily take the lead, and group members successfully completed projects that some might have found too intimidating to approach by themselves.
If there’s a single attribute that distinguishes people who find it easier to learn to code than others, it is an ability to hang in there when something does not work, experience the frustration, and keep trying to solve the problem. If a group is motivated to succeed, team members who are tenacious — who enjoy figuring out why things did not work the way they expected — can often propel the others through temporary snags. Team members who are not so determined or detail oriented often bring other useful qualities to the group, such as enthusiasm, interpersonal skills, or lateral thinking.
Whether you are working by yourself or with a group of friends, programming is something that you have to practice every day or two if you want to learn to do it at all. It is like learning to speak a natural language or play a musical instrument. If you are just starting out, it helps to choose a programming language like Python, which is relatively ‘high-level’. This means you don’t have to worry about the details of how your particular computer hardware works and you can focus on the problem you want to solve. Python (and many other languages) can also be used in web-based Jupyter notebooks that allow you to combine your code with narrative text, interactive visualisations, research notes, and source citations. As Jennifer Guiliano writes in her excellent new Primer for Teaching Digital History: Ten Design Principles (Duke University Press, 2022):
If I was forced to choose only one assignment tied to methods to build into a digital history course for the rest of my career, it would be the research notebook that lays bare what students do in their work as historians.
Rather than focusing exclusively on conclusions, a digital history research notebook asks students to reveal how they reached their argument’s conclusions. It is entirely about methods. What evidence was kept? What evidence was discarded? What transformations did the evidence undergo? Did you introduce new evidence?
The digital history research notebook is an assignment that encourages techniques and heuristics to be laid bare to the reader. (p. 76)
Jupyter notebooks also play well with platforms that can be used directly for historical research and teaching. For example, ITHAKA Constellate enables the use of notebook-driven exploration of more than 27 million journal articles and 2 million historical newspaper issues. Text analytics can also be done using Jupyter notebooks on features extracted from more than 15 million volumes by the HathiTrust Research Center.
Working with source materials
Providers like ITHAKA and HathiTrust create digital resources that can provide a strong foundation for developing custom code. When you are considering a project, here are some things to ask about the provider:
- Permission: are you allowed to build on their resources? The Google N-gram Viewer is very useful for digital history assignments. If you need to create a custom project with n-grams, you can download their data for reuse under a Creative Commons license.
- Scale: does the site offer access to more sources than you could read in a lifetime? All of the providers I mention here do.
- Consistency: nothing slows you down as much as having to write code to handle exceptions. A good example of consistency is provided by Chronicling America, the Library of Congress site that provides access to millions of digitised pages from 140K historic American newspaper titles. Every page in the collection can easily be reached by building a URL using a consistent template.
- Stability: once you have written your code, any changes to the site can break it. While code maintenance is an unavoidable task, the less frequently you have to do it, the better.
- Help: can you find instructions for accomplishing a particular task? The Internet Archive, for example, not only allows you to download many files, but also provides detailed instructions on how to do batch downloading.
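To make the point about consistency concrete, here is a minimal sketch of building page URLs from a template. The pattern shown (`/lccn/<LCCN>/<date>/ed-<edition>/seq-<page>/`) is the one Chronicling America has used for page views, but treat it, the `ocr.txt` suffix, and the example LCCN as assumptions to confirm against the site's current documentation, since providers do change their URL schemes. The function name `page_url` is hypothetical.

```python
from datetime import date, timedelta

# Assumed base address for the Chronicling America site.
BASE_URL = "https://chroniclingamerica.loc.gov"


def page_url(lccn: str, issue_date: date, edition: int = 1, page: int = 1,
             suffix: str = "") -> str:
    """Build the URL for one newspaper page from a consistent template.

    lccn is the Library of Congress Control Number of the title;
    suffix can name a derivative of the page, such as "ocr.txt".
    """
    return (f"{BASE_URL}/lccn/{lccn}/{issue_date.isoformat()}"
            f"/ed-{edition}/seq-{page}/{suffix}")


# Illustrative LCCN; substitute the title you are actually studying.
lccn = "sn83030214"
start = date(1905, 1, 1)

front_page = page_url(lccn, start)
front_page_text = page_url(lccn, start, suffix="ocr.txt")

# Front-page URLs for seven consecutive days (some issues may not exist).
week = [page_url(lccn, start + timedelta(days=n)) for n in range(7)]

print(front_page)
print(front_page_text)
```

Because every URL follows the same shape, a short loop is enough to generate addresses for a whole run of issues. The code that later fetches those addresses still needs to cope with dates on which no issue was published.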
Learning from what’s out there, and seeing what it makes possible
As you begin to integrate programming into your own research and teaching, one useful strategy is to search for examples of what you can do with particular techniques. In 2013, Miriam Posner posted an article titled ‘How did they make that?’, analysing a series of digital history examples in terms of the skills and software required to produce each. A gallery of primary sources, for example, was built with Omeka using HTML, CSS and PHP. She also linked to introductory tutorials to help the reader get started if they wanted to produce something similar.
Studying digital resources — especially ones that you find inspiring — can also provide opportunities to publish in journals like Reviews in Digital Humanities. Also of value is the Models of Argument-Driven Digital History site which provides previously published digital history articles newly ‘annotated by their authors to highlight the use of digital methods to make historical arguments.’ Most of the techniques on display required a substantial amount of programming, but the ability to see how a method relates to an argument can provide a strong motivation to learn that method. For more experienced programmers, journals like Digital Humanities Quarterly and the Journal of Cultural Analytics (both Open Access) provide regular sources of inspiration, as does the literature of computer science.
About the author
William J. Turkel is Professor of History at The University of Western Ontario, Canada. His research involves computational history, big history, and science and technology studies, with a focus on methods.
William has taught, trained, collaborated and written with many of the leading historians practising today, in North America and Europe. A selection of his publications and projects in the field is also available from his personal website.
‘Historical Research in the Digital Age’: about this series
‘Historical Research in the Digital Age’ is a 6-part series of posts on the Royal Historical Society’s blog, published between December 2022 and February 2023. The series is designed and hosted by Ian Milligan, Professor of History at the University of Waterloo, Ontario. It’s prompted by Ian’s new book, The Transformation of Historical Research in the Digital Age (available Open Access via Cambridge University Press, 2022), which considers the impact and implications of digital resources for contemporary historical practice.
In addition to his own essay, ‘We Are All Digital Now: And what this means for historical research’ (December 2022), Ian invites five contributors to continue the discussion from several perspectives:
- the builder of digital tools for historians: Part 2, with William J. Turkel
- the archivist-interpreter who mediates between resources (and their commercial providers) and users: Part 3, with Anna Mcnally
- the historian-user who applies digital resources to their work, and the implications of this: Part 4, with Jo Guldi
- the many without access to such resources given the many ‘digital disparities’ of infrastructure and sources that exist: Part 5, with Gerben Zaagsma
- the collaborative and interdisciplinary researchers who bring historical and computer science knowledge to big data: Part 6, with Ruth Ahnert
Currently available in the series
Part One: ‘We are all Digital Now: and what this means for historical research’, by Ian Milligan
Part Two: ‘Tools for the Trade: and how historians can make best use of them’, by William J. Turkel
Part Three: ‘Why Archivists Digitise: and why it matters’, by Anna Mcnally