Bolette fights for the Danish language in the age of algorithms

Bolette Sandford Pedersen has been working with language models since 1989. Back then, the work went under the name of machine translation systems, and artificial intelligence was not part of the equation. Today the field has exploded, and the professor from the University of Copenhagen has just been named to Denmark's Top 100 Women in AI. Meet the computational linguist from DFM who refuses to let closed, foreign models define our Danish language community.

Something has changed when Bolette Sandford Pedersen attends family gatherings. It is not the terrible uncle jokes or the homemade songs. What has changed is that she no longer has to spend half an hour explaining what she does for a living. Today her field of research — language models — has become common knowledge.

"Society's attention to my field has absolutely exploded. I have been doing this for nearly 40 years, and the enormous interest is fun, but also challenging," says Bolette Sandford Pedersen, who heads the Centre for Language Technology at the University of Copenhagen.

With the launch of ChatGPT, the public truly opened its eyes to the technology Bolette has dedicated her professional life to since 1989, when she was developing machine translation systems. This has led to her playing a central role at Danish Foundation Models (DFM) in evaluating language models adapted to the Danish language, helping to ensure that tomorrow's artificial intelligence actually understands what it means to write and read Danish.

In May she was also named one of the ten most influential women in AI in Denmark in the category 'Research and Science'.

"I am deeply honoured by this recognition, because it matters greatly to me that we create a diverse development environment where different genders, ages, and cultures contribute and ensure different perspectives on AI. It is our differences that enrich the conversation about what language models should actually be capable of," says the computational linguist.

Rye bread work and shoes that are too small

At DFM, Bolette Sandford Pedersen has spearheaded a unique dataset of thousands of typical Danish metaphors, designed to evaluate large AI models on figurative language, an area where the models frequently fail.

"'Rugbrødsarbejde' ('rye bread work', meaning steady, unglamorous effort), for example, is a wonderful part of our figurative language in Denmark that requires quite a bit of cultural context to understand, because the concept refers back to a staple of our diet: rye bread. Fixed expressions like 'at gå i for små sko' ('to wear shoes that are too small', meaning to underachieve) or 'at være som snydt ud af næsen på nogen' ('to be blown out of someone's nose', meaning to be the spitting image of someone) are also wildly misinterpreted by the models. At the same time, the models happily invent compound words like 'våbenstilstand' and 'isvogn', words we might understand but which simply do not exist in Danish," she says.

Ultimately, she believes, the fight for good Danish language models is also a fight for sovereignty. If we let English-trained models set the agenda without properly adapting them to our context, we risk not only impoverishing the Danish language but fundamentally changing its very structures.

"I am not that worried about the loanwords we import from English. I am more worried about our linguistic family silver: how we construct sentences, and how our vocabulary and figurative language are put together in Danish. Our entire system of meaning is drifting towards English, for example when we start using the word 'hjælpsom' (helpful) about things other than people, simply because English uses 'helpful' more broadly. It is moving fast right now, and I think we need to keep a watchful eye on developments."

Push-ups for the language brain

Although Bolette Sandford Pedersen has dedicated her life to language technology, she warns against becoming too comfortable in its company.

"If we are not careful, our language risks becoming extremely homogenised by artificial intelligence. The models tend to write in a very standardised, heavy style with many passive constructions and noun-heavy phrasing. It would be incredibly sad for us as a language community if the use of language models led to the Danish language deteriorating, and we lost all the small, delightful devices we have for expressing ourselves as exactly who we are," she says.

Her advice to the curious dinner companions at family gatherings — and to Danes in general — is therefore just as analogue as it is blunt:

"I honestly fear that we will collectively become less sharp, because we are lazy by nature and will therefore always resort to the easiest solution, including when it comes to checking and verifying AI answers. We should read at least 20 pages of good fiction every day, just as we do push-ups for the body," she concludes.

Danish Foundation Models is part of the Danish government's strategic AI initiative. The goal is to post-train, evaluate, maintain, and provide open access to large Danish language models. The project is a collaboration between Aarhus University, the University of Southern Denmark, the University of Copenhagen, and the Alexandra Institute.