Geografie 2025, 130, 271-297
Digital innovations in historical climatology: Classifying weather and climatic extremes and their impacts on societies using machine learning on written documents
This article explores how digital innovations, particularly machine learning and natural language processing, can streamline and enhance workflows in historical climatology. The field has traditionally relied on time-consuming manual analysis of historical documents. It now benefits from modern digital tools at every research stage, from source discovery to publication. Focusing on the classification of large volumes of unstructured text, the study examines methods ranging from manual keyword searches and Bayesian models to advanced large language models. Using the tambora.org corpus, it extracts and categorizes references to weather extremes, such as thunderstorms and heavy rainfall, and to their impacts on mobility. The paper compares these approaches in terms of accuracy, resource demands (runtime and memory use), and their ability to interpret historical language. It argues that digital methods, especially AI, can transform the extraction and classification of climate data from historical texts and offer researchers in historical climatology significant support.
Funding
This work was supported in part by the DEMUR Project within the DFG Priority Program “On the Way to the Fluvial Anthroposphere” (DFG-SPP 2361). This paper benefited from participation in the Climate Reconstruction and Impacts from the Archives of Societies (CRIAS) working group of the Past Global Changes (PAGES) project.