Benchmarking Food Identities: The Challenge of Capturing Food Over Time

By Teresa Paccosi

When we think about food in history, we often picture ingredients, recipes, or famous, maybe strange, dishes. But food has never been just about eating. Across centuries, it has been used as medicine, symbol, ritual object, cosmetic ingredient, and even a marker of status or identity. Understanding this rich and shifting landscape is far more complex than simply spotting the word bread or wine in an old text.

This is one of the challenges we have taken on in the Trifecta project.

Our aim is to build a system that can automatically detect references to food in historical texts, analyse how they are used, and trace how their meanings change over time. That might sound straightforward, but it raises some difficult questions. One major issue is what researchers call the long tail: the many items that appear only rarely in texts [1]. In large digital collections, common foods such as bread or beer show up again and again. But less frequent items, such as a particular herb used in healing or a spice mentioned in a ritual context, risk being missed by automated systems, which are typically trained on data dominated by the most frequent terms. From a cultural or historical perspective, however, these rare mentions can be extremely revealing. They may point to local traditions, specialised knowledge, or forgotten practices.
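To make the long-tail problem concrete, here is a minimal sketch that assumes we already have a flat list of food mentions extracted from a corpus. The function name `split_head_tail`, the cut-off of two occurrences, and the toy mention list are all hypothetical choices for illustration; they are not part of the Trifecta pipeline.

```python
from collections import Counter

def split_head_tail(mentions, min_count=2):
    """Split food terms into frequent 'head' items and rare 'long-tail' items.

    A term belongs to the tail if it occurs fewer than `min_count` times.
    The threshold is an illustrative assumption, not a project setting.
    """
    counts = Counter(mentions)
    head = {term: n for term, n in counts.items() if n >= min_count}
    tail = {term: n for term, n in counts.items() if n < min_count}
    return head, tail

# Toy list of mentions, invented for illustration only.
mentions = ["bread", "beer", "bread", "wine", "bread", "beer", "rue", "saffron"]
head, tail = split_head_tail(mentions)
print(sorted(head))  # ['beer', 'bread']
print(sorted(tail))  # ['rue', 'saffron', 'wine']
```

In a real collection the tail is usually far longer than the head, so a system tuned mostly on frequent items sees each rare item too seldom to learn it reliably.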

For this reason, we are not interested in food solely as nourishment. We are developing a model that also captures its uses in other practices, such as medicine or ritual. By widening the lens in this way, we can identify food items that may seem marginal in overall frequency but are central within specific domains of practice. This helps us map how foods move between spheres of life and acquire new meanings across time and context.

To do this properly, we first needed a clear and consistent way of defining what counts as “food” and how its uses should be described. We therefore designed a structured annotation scheme [2]: a set of guidelines that help human annotators mark up texts consistently, keeping disagreement between them to a minimum. The scheme records not only the food item itself and its characteristics, but also its purpose, context, and function. For instance, in the sentence “ginger soothes sore throats”, our scheme not only identifies the food item, ginger, but also captures its use in a medical context, signalled by the word soothes. Applied across many texts, this lets us analyse a relatively uncommon food such as ginger on a large scale, especially in its frequent role as a natural anti-inflammatory.
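As a rough illustration of what a single annotation might look like in machine-readable form, the sketch below stores the ginger example as character-offset spans. The class names, the field names (`food`, `trigger`, `use`) and the labels `FOOD`, `USE_TRIGGER` and `MEDICAL` are hypothetical; they do not reproduce the actual Trifecta guidelines.

```python
from dataclasses import dataclass

@dataclass
class Span:
    start: int  # character offset where the span begins
    end: int    # character offset just past the span's last character
    label: str  # hypothetical category name for the span

    def text(self, sentence: str) -> str:
        """Return the covered text from the original sentence."""
        return sentence[self.start:self.end]

@dataclass
class FoodAnnotation:
    food: Span     # the food item itself
    trigger: Span  # the word that signals how the food is used
    use: str       # the domain of use, e.g. medical or ritual

sentence = "ginger soothes sore throats"
ann = FoodAnnotation(
    food=Span(0, 6, "FOOD"),
    trigger=Span(7, 14, "USE_TRIGGER"),
    use="MEDICAL",
)
print(ann.food.text(sentence), ann.trigger.text(sentence), ann.use)
# ginger soothes MEDICAL
```

Keeping offsets rather than copied strings makes it easy to check annotations against the source text and to compare the work of different annotators on the same sentence.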

Our theoretical foundation is Frame Semantics [3], developed by the linguist Charles J. Fillmore. In simple terms, this approach suggests that words evoke entire situations, or “frames”. For example, a word such as harvest brings to mind farmers, crops, tools, seasons, and social practices. Meaning is not isolated; it is embedded in a structured scene or situation. Applied to the food domain, this perspective allows us to go beyond naming an ingredient. When a historical text mentions honey, for instance, it may be part of a healing recipe, a religious offering, or a cosmetic preparation. Each of these situations involves different participants, intentions, and effects. By modelling these as interconnected frames, we can represent who uses a food, for what reason, in which context, and with what outcome.
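One way to picture this frame-based view is to treat each mention as an instance of a frame whose named roles answer the questions above: who, for what reason, in which context, and with what outcome. The sketch below encodes the three honey situations in this way. The frame names, role names and role values are invented for illustration and are not taken from FrameNet or from our scheme.

```python
from dataclasses import dataclass, field

@dataclass
class FrameInstance:
    frame: str  # the situation evoked, e.g. a healing scene (hypothetical name)
    food: str   # the food item taking part in the situation
    # Frame elements: who uses the food, why, where, and with what result.
    roles: dict = field(default_factory=dict)

# Invented examples mirroring the three honey situations in the text.
honey_uses = [
    FrameInstance("Healing", "honey",
                  {"user": "physician", "purpose": "treat a wound", "outcome": "recovery"}),
    FrameInstance("Offering", "honey",
                  {"user": "worshipper", "purpose": "honour a deity", "context": "temple"}),
    FrameInstance("Cosmetic", "honey",
                  {"purpose": "soften the skin", "context": "beauty preparation"}),
]

def frames_for(food, instances):
    """Return the set of frames in which a given food appears."""
    return {inst.frame for inst in instances if inst.food == food}

print(sorted(frames_for("honey", honey_uses)))  # ['Cosmetic', 'Healing', 'Offering']
```

The same food appears in several frames, and the role values distinguish the situations; that separation between the item and the situation it takes part in is what the frame-based approach is meant to provide.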

At present, we are manually annotating historical texts in English and Dutch. This carefully marked material will form our benchmark dataset: a reference corpus that shows, in detail, how food and its uses have been identified by human annotators trained in our annotation scheme. Once completed, the benchmark will be used to train a supervised machine learning model. In practice, this means explicitly teaching the system to recognise the same patterns that annotators have highlighted. By learning from many examples, the model can begin to identify similar structures in new, unseen texts. The ultimate goal is to apply it to large collections of historical sources, extracting structured information on a scale that manual reading alone could not sustain.
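Supervised sequence-labelling models are commonly trained on one tag per token. As a minimal sketch, assuming whitespace tokenisation and the widely used BIO convention (B = beginning of a span, I = inside a span, O = outside), the function below turns character-offset spans into such training labels. The function `spans_to_bio` and the labels, including `CONDITION`, are hypothetical; the project's actual tokeniser and label set may differ.

```python
def spans_to_bio(sentence, spans):
    """Convert character-offset spans into (token, BIO tag) pairs.

    `spans` is a list of (start, end, label) tuples. A token is tagged
    B-label if it starts a span, I-label if it lies inside one, else O.
    """
    tagged = []
    offset = 0
    for token in sentence.split():
        start = sentence.index(token, offset)  # locate token in the original string
        end = start + len(token)
        offset = end
        tag = "O"
        for s_start, s_end, label in spans:
            if s_start <= start and end <= s_end:
                tag = ("B-" if start == s_start else "I-") + label
                break
        tagged.append((token, tag))
    return tagged

sentence = "ginger soothes sore throats"
spans = [(0, 6, "FOOD"), (7, 14, "USE_TRIGGER"), (15, 27, "CONDITION")]
for token, tag in spans_to_bio(sentence, spans):
    print(token, tag)
# ginger B-FOOD
# soothes B-USE_TRIGGER
# sore B-CONDITION
# throats I-CONDITION
```

Token and tag pairs like these are the kind of input a token-classification model learns from; the multi-word span "sore throats" shows why the I- tags are needed.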

With this approach, we hope to uncover long-term patterns in how foods shift in meaning and function. A plant that once appeared mainly in medical contexts might later become an everyday ingredient. A substance associated with ritual activities might turn into a luxury commodity. By combining careful theoretical modelling, detailed annotation, and automated analysis, we can begin to trace these transformations across centuries. Food history, in other words, is not just about what people ate. It is about how substances travelled through different areas of life, acquiring new roles and shedding old ones. By building tools that can capture this complexity, we aim to open up fresh perspectives on the cultural and social life of food.
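Once mentions are annotated with a domain of use and a date, a shift such as "from medicine to the kitchen" can be measured as a change in proportions. The sketch below assumes each mention of one food is available as a (year, domain) pair; the function `domain_shares_by_century` and the toy mentions of an unnamed plant are invented for illustration and are not results from our corpus.

```python
from collections import Counter, defaultdict

def domain_shares_by_century(mentions):
    """For each century, compute the share of mentions in each domain.

    `mentions` is an iterable of (year, domain) pairs for a single food item.
    """
    by_century = defaultdict(Counter)
    for year, domain in mentions:
        century = (year - 1) // 100 + 1  # e.g. years 1501-1600 -> 16th century
        by_century[century][domain] += 1
    shares = {}
    for century, counts in sorted(by_century.items()):
        total = sum(counts.values())
        shares[century] = {domain: n / total for domain, n in counts.items()}
    return shares

# Invented mentions of a hypothetical plant, for illustration only.
mentions = [(1550, "medical"), (1575, "medical"), (1590, "culinary"),
            (1720, "culinary"), (1750, "culinary"), (1780, "medical"), (1790, "culinary")]
for century, share in domain_shares_by_century(mentions).items():
    print(century, share)
```

In this toy data the medical share falls from two thirds in the 16th century to a quarter in the 18th; with real annotations, such proportions would also need to be read against how many texts survive from each period.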

[1] On the long tail: https://en.wikipedia.org/wiki/Long_tail

[2] On linguistic annotation: https://en.wikipedia.org/wiki/Text_annotation#Linguistic_annotation

[3] On frame semantics: https://en.wikipedia.org/wiki/Frame_semantics_(linguistics)
