
Benchmarking Food Identities: the Challenge of Capturing Food Over Time

By Teresa Paccosi

When we think about food in history, we often picture ingredients, recipes, or famous, maybe strange, dishes. But food has never been just about eating. Across centuries, it has been used as medicine, symbol, ritual object, cosmetic ingredient, and even a marker of status or identity. Understanding this rich and shifting landscape is far more complex than simply spotting the word bread or wine in an old text.

This is one of the challenges we have taken on in the TRIFECTA project.

Our aim is to build a system that can automatically detect references to food in historical texts, analyse how they are used, and trace how their meanings change over time. That might sound straightforward, but it actually raises some difficult questions. One major issue is what researchers call the long tail: the many items that appear only rarely in texts [1]. In large digital collections, common foods such as bread or beer show up again and again. But less frequent items, perhaps a particular herb used in healing, or a spice mentioned in a ritual context, risk being missed by automated systems that are often trained to detect the most frequent terms. From a cultural or historical perspective, however, these rare mentions can be extremely revealing. They may point to local traditions, specialised knowledge, or forgotten practices.
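To make the long tail concrete, here is a minimal Python sketch that splits a list of food mentions into frequent items and rare ones. The function name `split_head_and_tail`, the `min_count` threshold, and the toy mentions are all assumptions made for illustration; they are not taken from the project's data or code.

```python
from collections import Counter


def split_head_and_tail(mentions, min_count=2):
    """Split mention counts into frequent ("head") and rare ("tail") items.

    min_count is a hypothetical cut-off: items seen fewer times than this
    are treated as long-tail items.
    """
    counts = Counter(mentions)
    head = {item: n for item, n in counts.items() if n >= min_count}
    tail = {item: n for item, n in counts.items() if n < min_count}
    return head, tail


# Toy data, invented for this example only.
mentions = ["bread", "beer", "bread", "hyssop", "beer", "bread", "galangal"]
head, tail = split_head_and_tail(mentions)
# head holds bread and beer; tail holds hyssop and galangal
```

A system tuned only on the `head` items would learn a great deal about bread and beer and next to nothing about the herbs in `tail`, which is exactly the risk described above.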

For this reason, we are not interested in food solely as nourishment. We are developing a model that also captures its uses in other processes, such as medicinal or ritual ones. By widening the lens in this way, we can identify food items that may seem marginal in overall frequency but are central within specific domains of practice. This helps us map how foods move between spheres of life and acquire new meanings across time and context.

To do this properly, we first needed a clear and consistent way of defining what counts as “food” and how its uses should be described. We therefore designed a structured annotation scheme [2]: a set of guidelines that help human annotators mark up texts consistently and keep divergence between them to a minimum. These guidelines record not only the food item itself and its characteristics, but also its purpose, context, and function. For instance, in the sentence “ginger soothes sore throats”, our scheme not only identifies the food item, ginger, but also highlights its typical use in a medical context, prompted by the word soothes. Thanks to this scheme, we can analyse a relatively uncommon food such as ginger on a large scale, especially in its frequent role as a natural anti-inflammatory.
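As a rough illustration of what one annotated mention might look like as data, the Python sketch below stores the ginger example together with character offsets for the food item and the trigger word. The class `FoodAnnotation` and its field names are hypothetical; the actual scheme records more information and may be organised quite differently.

```python
from dataclasses import dataclass


@dataclass
class FoodAnnotation:
    """One annotated food mention (all field names are hypothetical)."""
    sentence: str
    food_item: str   # the food being mentioned
    trigger: str     # the word that signals how the food is used
    context: str     # hypothetical label such as "medical" or "ritual"

    def span_of(self, phrase: str) -> tuple[int, int]:
        """Return (start, end) character offsets of phrase in the sentence."""
        start = self.sentence.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in sentence")
        return start, start + len(phrase)


example = FoodAnnotation(
    sentence="ginger soothes sore throats",
    food_item="ginger",
    trigger="soothes",
    context="medical",
)
food_span = example.span_of(example.food_item)     # (0, 6)
trigger_span = example.span_of(example.trigger)    # (7, 14)
```

Storing offsets rather than only the words themselves makes it possible to compare what two annotators marked in the same sentence.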

Our theoretical foundation is Frame Semantics [3], developed by the linguist Charles J. Fillmore. In simple terms, this approach suggests that words evoke entire situations, or “frames”. For example, a word such as harvest brings to mind farmers, crops, tools, seasons, and social practices. Meaning is not isolated; it is embedded in a structured scene or situation. Applied to the food domain, this perspective allows us to go beyond naming an ingredient. When a historical text mentions honey, for instance, it may be part of a healing recipe, a religious offering, or a cosmetic preparation. Each of these situations involves different participants, intentions, and effects. By modelling these as interconnected frames, we can represent who uses a food, for what reason, in which context, and with what outcome.
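The idea that one food can take part in several frames can be sketched as a small data structure. In the Python example below, the class `FrameInstance`, the frame names, the role names, and the role fillers are all invented for illustration; they are not the project's frame inventory.

```python
from dataclasses import dataclass, field


@dataclass
class FrameInstance:
    """A single use of a food within a frame (hypothetical structure)."""
    frame: str                       # the situation the text evokes
    food: str                        # the food item involved
    roles: dict[str, str] = field(default_factory=dict)  # who, why, outcome


# Invented examples mirroring the three situations mentioned for honey.
instances = [
    FrameInstance("Healing", "honey", {"purpose": "treat an ailment"}),
    FrameInstance("Offering", "honey", {"purpose": "honour a deity"}),
    FrameInstance("Cosmetics", "honey", {"purpose": "soften the skin"}),
    FrameInstance("Healing", "ginger", {"purpose": "soothe a sore throat"}),
]


def frames_for(food, instances):
    """List, in alphabetical order, the frames in which a food appears."""
    return sorted({inst.frame for inst in instances if inst.food == food})


honey_frames = frames_for("honey", instances)
# ['Cosmetics', 'Healing', 'Offering']
```

Grouping instances by frame in this way is what lets the same word, honey, be counted separately as a remedy, an offering, and a cosmetic.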

At present, we are manually annotating historical texts in English and Dutch. This carefully marked material will form our benchmark dataset: a reference corpus that shows, in detail, how food and its uses have been identified by humans trained on our annotation scheme. Once completed, the benchmark will be used to train a supervised machine learning model. In practice, this means explicitly teaching the system to recognise the same patterns that annotators have highlighted. By learning from many examples, the model can begin to identify similar structures in new, unseen texts. The ultimate goal is to apply it to large collections of historical sources, extracting structured information on a scale that manual reading alone could not sustain.
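One common way to turn human annotations into training examples for a supervised sequence labeller is BIO tagging, where each token is marked as the Beginning of a span, Inside a span, or Outside any span. The sketch below assumes this encoding purely for illustration; which encoding and model the project will use is not specified here, and the label names `FOOD`, `USE_TRIGGER`, and `AILMENT` are hypothetical.

```python
def to_bio(tokens, spans):
    """Convert token-level spans into BIO labels.

    spans is a list of (start, end, label) tuples, with end exclusive.
    Spans are assumed not to overlap.
    """
    labels = ["O"] * len(tokens)
    for start, end, label in spans:
        labels[start] = f"B-{label}"
        for i in range(start + 1, end):
            labels[i] = f"I-{label}"
    return labels


tokens = ["ginger", "soothes", "sore", "throats"]
# Hypothetical annotation of the example sentence.
spans = [(0, 1, "FOOD"), (1, 2, "USE_TRIGGER"), (2, 4, "AILMENT")]
bio_labels = to_bio(tokens, spans)
# ['B-FOOD', 'B-USE_TRIGGER', 'B-AILMENT', 'I-AILMENT']
```

Pairs of `tokens` and `bio_labels` like this one are the kind of input from which a model can learn to label new, unseen sentences.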

With this approach, we hope to uncover long-term patterns in how foods shift in meaning and function. A plant that once appeared mainly in medical contexts might later become an everyday ingredient. A substance associated with ritual activities might turn into a luxury commodity. By combining careful theoretical modelling, detailed annotation, and automated analysis, we can begin to trace these transformations across centuries. Food history, in other words, is not just about what people ate. It is about how substances travelled through different areas of life, acquiring new roles and shedding old ones. By building tools that can capture this complexity, we aim to open up fresh perspectives on the cultural and social life of food.

[1] Long tail: https://en.wikipedia.org/wiki/Long_tail

[2] Linguistic annotation: https://en.wikipedia.org/wiki/Text_annotation#Linguistic_annotation

[3] Frame semantics: https://en.wikipedia.org/wiki/Frame_semantics_(linguistics)


TRIFECTA Team Takes Home Two Awards @LDK

Gauri accepting the Best Student Paper Award from conference chair Andon Tchechmedjiev

In September, team members Gauri, Jiaqi, Marieke, Rik, and Teresa travelled to Naples, Italy to attend and present at LDK 2025, the 5th Conference on Language, Data and Knowledge. This conference is central to the different research strands in the project, so it was great to (re)connect with colleagues and see what they are working on (and eat great food).

We presented the following papers:

  • Veruska Zamborlini, Jiaqi Zhu, Marieke van Erp, and Arianna Betti. Philosophising Lexical Meaning as an OntoLex-Lemon Extension. (presented at the satellite OntoLex workshop). This research is part of our knowledge modelling strand and in this paper we investigated how we can represent different aspects and meanings of a concept through time or in different contexts;
  • Gauri Bhagwat, Marieke van Erp, Teresa Paccosi, Rik Hoekstra. Detecting Changing Culinary Trends Through Historical Recipes. This research is part of our food history use case, and presents an analysis of different editions of a cookbook as well as newspaper recipes to see how ingredient use changes over time;
  • Marieke van Erp, Jiaqi Zhu, Vera Provatorova. Tracing Organisation Evolution in Wikidata. This paper is an investigation of how change is represented in one of the largest and most commonly used knowledge graphs. As we are considering feeding any data generated within the project back into this, it is necessary to know if existing data models are a suitable fit;
  • Andrea Schimmenti, Stefano De Giorgis, Fabio Vitali, Marieke van Erp. Old Reviews, New Aspects: Aspect Based Sentiment Analysis and Entity Typing for Book Reviews with LLMs. In this collaboration with the University of Bologna and the Italian National Research Council, we investigated the use of large language models to analyse opinions in a data-scarce domain. Although book reviews lie outside TRIFECTA’s main maritime and food history use cases, we think it is important to explore connections to other domains and to see how tools perform there, so that we can judge whether they transfer to our own use cases.

While it’s already great to get papers accepted to a conference and to present and discuss them with colleagues, it was even cooler to see our efforts recognised: Detecting Changing Culinary Trends Through Historical Recipes, coordinated by Gauri, won the Best Student Paper Award, and our Tracing Organisation Evolution in Wikidata paper won the Best Poster Award!

The winning poster