
Benchmarking Food Identities: the Challenge of Capturing Food Over Time

By Teresa Paccosi

When we think about food in history, we often picture ingredients, recipes, or famous, maybe strange, dishes. But food has never been just about eating. Across centuries, it has been used as medicine, symbol, ritual object, cosmetic ingredient, and even a marker of status or identity. Understanding this rich and shifting landscape is far more complex than simply spotting the word bread or wine in an old text.

This is one of the challenges we have taken on in the TRIFECTA project.

Our aim is to build a system that can automatically detect references to food in historical texts, analyse how they are used, and trace how their meanings change over time. That might sound straightforward, but it actually raises some difficult questions. One major issue is what researchers call the long tail: the many items that appear only rarely in texts [1]. In large digital collections, common foods such as bread or beer show up again and again. But less frequent items, perhaps a particular herb used in healing, or a spice mentioned in a ritual context, risk being missed by automated systems that are often trained to detect the most frequent terms. From a cultural or historical perspective, however, these rare mentions can be extremely revealing. They may point to local traditions, specialised knowledge, or forgotten practices.
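To make the long tail concrete, here is a minimal sketch in Python. The list of mentions is an invented toy sample, not project data, and the `min_count` threshold and the function name `split_head_and_tail` are assumptions chosen purely for illustration.

```python
from collections import Counter

# Invented toy sample of food mentions; a real collection would be far larger.
mentions = [
    "bread", "beer", "bread", "wine", "bread", "beer",
    "ginger", "bread", "wine", "saffron", "beer", "hyssop",
]


def split_head_and_tail(mentions, min_count=2):
    """Split terms into frequent ones (the head) and rare ones (the tail).

    `min_count` is an arbitrary illustrative threshold, not a project setting.
    """
    counts = Counter(mentions)
    head = {term: n for term, n in counts.items() if n >= min_count}
    tail = {term: n for term, n in counts.items() if n < min_count}
    return head, tail


head, tail = split_head_and_tail(mentions)
print("head:", sorted(head))  # frequent terms such as bread and beer
print("tail:", sorted(tail))  # rare terms that a frequency-driven system may miss
```

Even in this tiny sample, half of the distinct terms occur only once, which is the pattern that makes rare items easy to overlook.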

For this reason, we are not interested in food solely as nourishment. We are developing a model that also captures its uses in other processes, such as medicinal or ritual ones. By widening the lens in this way, we can identify food items that may seem marginal in overall frequency but are central within specific domains of practice. This helps us map how foods move between spheres of life and acquire new meanings across time and context.

To do this properly, we first needed a clear and consistent way of defining what counts as “food” and how its uses should be described. We therefore designed a structured annotation scheme [2]: a set of guidelines that help human researchers mark up texts consistently and keep divergence between annotators to a minimum. These guidelines record not only the food item itself and its characteristics, but also its purpose, context, and function. For instance, in the sentence “ginger soothes sore throats”, our scheme identifies the food item, ginger, and also highlights its typical use in a medical context, signalled by the word soothes. Thanks to this scheme, we can analyse a relatively uncommon food such as ginger on a large scale, especially in its frequent role as a natural anti-inflammatory.
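As a rough illustration, the ginger example could be stored as a small structured record. This is only a sketch: the class name `FoodAnnotation` and its field names are hypothetical and do not reproduce the actual labels used in our annotation scheme.

```python
from dataclasses import dataclass


@dataclass
class FoodAnnotation:
    """One annotated food mention. All field names are hypothetical."""
    sentence: str   # the text being annotated
    food_item: str  # the food mentioned
    trigger: str    # the word that signals how the food is used
    domain: str     # the sphere of use, e.g. medical, ritual, culinary

    def span(self, text):
        """Return (start, end) character offsets of `text` in the sentence."""
        start = self.sentence.find(text)
        if start == -1:
            raise ValueError(f"{text!r} not found in sentence")
        return (start, start + len(text))


record = FoodAnnotation(
    sentence="ginger soothes sore throats",
    food_item="ginger",
    trigger="soothes",
    domain="medical",
)

print(record.span(record.food_item))  # character offsets of the food item
print(record.span(record.trigger))    # character offsets of the trigger word
```

Keeping character offsets alongside the labels is one common way to tie an annotation back to the exact place in the text where it was made.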

Our theoretical foundation is Frame Semantics [3], developed by the linguist Charles J. Fillmore. In simple terms, this approach suggests that words evoke entire situations, or “frames”. For example, a word such as harvest brings to mind farmers, crops, tools, seasons, and social practices. Meaning is not isolated; it is embedded in a structured scene or situation. Applied to the food domain, this perspective allows us to go beyond naming an ingredient. When a historical text mentions honey, for instance, it may be part of a healing recipe, a religious offering, or a cosmetic preparation. Each of these situations involves different participants, intentions, and effects. By modelling these as interconnected frames, we can represent who uses a food, for what reason, in which context, and with what outcome.

At present, we are manually annotating historical texts in English and Dutch. This carefully marked material will form our benchmark dataset: a reference corpus that shows, in detail, how food and its uses have been identified by humans trained on our annotation scheme. Once completed, the benchmark will be used to train a supervised machine learning model. In practice, this means explicitly teaching the system to recognise the same patterns that annotators have highlighted. By learning from many examples, the model can begin to identify similar structures in new, unseen texts. The ultimate goal is to apply it to large collections of historical sources, extracting structured information on a scale that manual reading alone could not sustain.
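One common way to prepare annotated text for supervised training is to give every token a tag in the BIO convention (Beginning, Inside, Outside). The sketch below assumes that convention for illustration only; it is not necessarily the encoding our model will use, and the label names `FOOD`, `TRIGGER`, and `TARGET` are hypothetical.

```python
def to_bio(tokens, spans):
    """Convert annotated token spans into one BIO tag per token.

    `spans` holds (start, end, label) triples over token positions,
    with `end` exclusive. Tokens outside every span are tagged "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the span
    return tags


tokens = "ginger soothes sore throats".split()

# Hypothetical human annotations for the example sentence.
spans = [(0, 1, "FOOD"), (1, 2, "TRIGGER"), (2, 4, "TARGET")]

tags = to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Pairs of tokens and tags like these are what a sequence labelling model learns from, so that it can propose the same kinds of spans in unseen texts.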

With this approach, we hope to uncover long-term patterns in how foods shift in meaning and function. A plant that once appeared mainly in medical contexts might later become an everyday ingredient. A substance associated with ritual activities might turn into a luxury commodity. By combining careful theoretical modelling, detailed annotation, and automated analysis, we can begin to trace these transformations across centuries. Food history, in other words, is not just about what people ate. It is about how substances travelled through different areas of life, acquiring new roles and shedding old ones. By building tools that can capture this complexity, we aim to open up fresh perspectives on the cultural and social life of food.

[1] Long tail: https://en.wikipedia.org/wiki/Long_tail

[2] Linguistic annotation: https://en.wikipedia.org/wiki/Text_annotation#Linguistic_annotation

[3] Frame semantics: https://en.wikipedia.org/wiki/Frame_semantics_(linguistics)


Jiaqi Presents Semantic Change Research at CHR 2025 in Luxembourg

In December, TRIFECTA team member Jiaqi travelled to Luxembourg to attend and present at CHR 2025, the 6th Conference on Computational Humanities Research, held from 9 to 12 December at the Luxembourg Centre for Contemporary and Digital History (C²DH) at the University of Luxembourg.

A Venue Where Past Meets Future

The conference took place on the University’s Belval campus. There is something poetic about discussing historical language change here, given the campus’s own remarkable transformation. Where steelworkers once tended roaring blast furnaces that helped build Europe’s railways, researchers now gather in sleek modern buildings to study the past with computational tools.

The campus sits at the centre of what was once Luxembourg’s industrial heartland. Two massive blast furnaces still tower over the site, preserved as monuments to the region’s steel-producing heritage. The university buildings have been designed to echo this history: the 85-metre Maison du Savoir deliberately mirrors the proportions of the old furnaces, while the Maison du Livre wraps around a former ore silo. Walking between sessions, you pass reflective water basins that occupy spaces where molten steel once flowed. It’s a striking reminder that transformation, whether of industrial sites or of word meanings, is a constant in human history.

Tracing How Words Change

Jiaqi presented a paper on semantic change detection in historical Dutch newspapers, exploring how computational methods can help us understand how the meanings of words evolve over time. This work is in line with TRIFECTA’s mission of developing better knowledge graphs for humanities research, as understanding how concepts shift and transform across historical periods is crucial for accurately representing historical knowledge.

Connections and Conversations

CHR has always been a warm and welcoming community for researchers working at the intersection of computation and the humanities, and this year was no exception. The conference brought together scholars working on everything from historical NLP to cultural analytics, and the conversations over coffee were just as valuable as the formal sessions. It was particularly inspiring to connect with others grappling with similar challenges around temporal analysis and historical languages.

We look forward to building on these connections and continuing to advance our research for the TRIFECTA project. We are also already looking ahead to the next edition, CHR2027, in Manchester!