From Ancient Greek to AI: UNSILO welcomes Kasper Fyhn Jacobsen

Kasper is UNSILO’s latest recruit, and talked to us about his fascinating background.  

I was born in a small village near Esbjerg, Jutland, and I’ve always been interested in language. I studied classics (Greek and Latin) at school, and I enjoyed grammar and translation. I then studied linguistics at Aarhus University. I did well in my degree, and then took a gap year between my BA and MA, when I worked as a night porter in a hotel. I was working between 10 at night and 7 in the morning: many silent hours, and time to think. I had to do lots of manual office tasks, and I felt sure that there must be an easier way to do this kind of thing; it should be possible to automate it. When I went back to university, I considered doing a new BA in computing, and actually took some supplementary courses in maths before starting the Master’s. As part of the Master’s degree, I had the opportunity to work as an intern at UNSILO. I was amazed by the trust placed in me compared with the other jobs I had done: Søren Andersen, associate director of engineering, came by my desk and asked how I was, and if I was comfortable. They were happy for me to work at home, and said, “We know you can work well.”

It was good to go from being a student to being an intern, with the internship overlapping with my thesis, and then to start right away at UNSILO. My first day was the day after I handed in my thesis, and I got a top grade!

On the Master’s, I was able to choose which projects I worked on. One project analyzed recorded speech in different dialects of Danish. We developed a tool for counting the number of syllables per second.

How many dialects are there in Denmark?  

More than in our data! We had just five dialects. But it is a continuum, so not easy to state a precise number.  

I learned about Natural Language Processing (NLP) largely through self-chosen projects during my Master’s, with the dialect study as one example, and then from my time at UNSILO.

Currently, I’m living in Esbjerg (over on the west coast of Jutland) because my girlfriend is a newly qualified doctor doing medical practice at the local hospital. I will be coming back to Aarhus in September. They say Aarhus is the smallest big city you can find: it has all the institutions and cultural stuff, but it is still quite small.  

What is it like as a linguist working in a machine-learning team?  

As linguists, we are trained to identify underlying mechanisms (i.e. “rules”) at various levels in language, e.g. phonology, grammar, social differences etc. Hence, when looking at a specific linguistic problem that we want a computer to be able to handle, we often know the underlying mechanisms already. We can then either hard-code our understanding of those rules, or apply machine learning (ML) and let the computer work out those rules on its own. ML tends to outperform hard-coded rules, though it is less transparent (because ML models can be harder to inspect and comprehend). The idea of identifying patterns of any kind from large amounts of data (without knowing the rules explicitly), as we do for everything else in our lives, is what more and more linguists believe goes on in language acquisition, rather than activating certain innate/hardwired capabilities.

At UNSILO, we are very good at finding patterns. We do concept extraction using our basic linguistic knowledge of texts – we tag content by part of speech (POS). It might seem very basic, but the concepts we identify are noun phrases, and having identified them as concepts, we then combine them with quantitative evidence (counting) to create a more meaningful result.  
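
To illustrate the idea of combining noun phrases with counting, here is a minimal sketch in Python. It is not UNSILO’s actual pipeline: the hand-tagged sample sentence, the coarse tag names (ADJ, NOUN and so on), the simple pattern “optional adjectives followed by one or more nouns”, and the function names are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical pre-tagged input: (token, POS tag) pairs with coarse tags.
# A real system would get these from a POS tagger; here they are hand-written.
TAGGED = [
    ("Neural", "ADJ"), ("networks", "NOUN"), ("learn", "VERB"),
    ("patterns", "NOUN"), ("from", "ADP"), ("large", "ADJ"),
    ("data", "NOUN"), ("sets", "NOUN"), (".", "PUNCT"),
    ("Neural", "ADJ"), ("networks", "NOUN"), ("need", "VERB"),
    ("large", "ADJ"), ("data", "NOUN"), ("sets", "NOUN"), (".", "PUNCT"),
]


def noun_phrases(tagged):
    """Yield lowercased noun phrases: optional adjectives, then one or more nouns."""
    phrase, has_noun = [], False
    for token, tag in tagged:
        if tag == "NOUN":
            phrase.append(token.lower())
            has_noun = True
        elif tag == "ADJ" and not has_noun:
            phrase.append(token.lower())
        else:
            # Any other tag (or an adjective after a noun) closes the phrase.
            if has_noun:
                yield " ".join(phrase)
            # An adjective that follows a noun starts a new candidate phrase.
            phrase = [token.lower()] if tag == "ADJ" else []
            has_noun = False
    if has_noun:
        yield " ".join(phrase)


def frequent_concepts(tagged, min_count=2):
    """Return noun phrases seen at least min_count times, sorted alphabetically."""
    counts = Counter(noun_phrases(tagged))
    return sorted(p for p, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    print(Counter(noun_phrases(TAGGED)))
    print(frequent_concepts(TAGGED))
```

In this toy input, “patterns” occurs only once and so falls below the threshold, while the two repeated phrases are kept. The counting step is what turns a plain list of noun phrases into a ranked set of candidate concepts.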

In the future, we will be making more use of the computational power available to us. With neural networks becoming cheaper and cheaper, there are a lot of tasks we can do. At the moment, everything we do is sequential. But that is not how things work in our brains. Ideally, we want to mimic what works in our brains and carry out thought processes simultaneously.

Many thanks, Kasper! 
