Do research-paper recommendations work?

The UNSILO Recommend tool in action on the Cambridge Core website

Most content-based websites contain recommendations. Whether you are using Amazon, Netflix, Spotify, or Google Scholar, you will find some kind of link to other recommended content. How effective are these links? Can we compare recommender tools from one site to another? And is it valuable for a publisher to add recommender links?

One thing to make clear immediately is that academic recommender tools work in a very different way to other kinds of recommendation. For an e-commerce site, recommendations are typically calculated by tracking what people bought. If you buy an inkjet printer, the chances are you will want to buy printer ink next, so it makes sense to provide a recommender tool based on what users bought after buying the item searched for.

But this kind of recommendation is of little use for academic searching. Academics aren’t buying anything, so we don’t know whether the article links they found were useful or not. You could argue that citations from other articles are some measure of success: if several academics cited this article, it must be important. But that isn’t necessarily the criterion of relevance for academics: what a researcher frequently wants to know is whether anyone has investigated (say) the effect of aspirin on diabetes patients, whether or not that work has yet been cited. The UNSILO recommender algorithm does not make any use of citations; it is completely content-based, without any usage tracking.

A simple way to contrast recommenders is to compare how each behaves when no relevant link is found. A tool such as Netflix is designed to provide the user with something new to watch, even if you search for a title Netflix doesn’t offer (Netflix never returns zero results from a search). In contrast, academics are not interested in always finding a hit; they want only relevant hits, not just any hit. Unfortunately, even Google Scholar returns hits if a meaningless combination of terms is entered (try searching for “aspirin metallurgy volcanoes trainers” on Google Scholar, and you still get 30+ results). Despite its name, Google Scholar is fundamentally a string-matching tool just like its bigger parent, and it isn’t difficult to see its limitations.

The introduction of AI-based tools has transformed the provision of recommendations, moving from string search to concept-based search. Let’s say I am interested in articles on diabetes and aspirin. However much of an expert I might be, I certainly don’t know how the researchers phrased their title: is it “diabetes” or “diabetic”, or is diabetes referred to more elliptically, as “impaired glucose tolerance”? A concept-based tool such as UNSILO, which is not reliant on a few keywords for each article, can find all the main variants of phrases and ensure a search produces comprehensive results.
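To make the difference between string search and concept-based search concrete, here is a minimal Python sketch. It is not UNSILO's algorithm: the CONCEPTS map is a tiny hand-written stand-in for whatever a real concept-extraction system would learn, and the function names and the example title are invented for illustration.

```python
# Hypothetical, hand-written concept map. A real concept-based tool would
# derive variants from the content itself rather than from a fixed list.
CONCEPTS = {
    "diabetes": {"diabetes", "diabetic", "impaired glucose tolerance"},
    "aspirin": {"aspirin", "acetylsalicylic acid"},
}


def concepts_in(text: str) -> set[str]:
    """Return the concepts whose known variants appear in the text."""
    lowered = text.lower()
    return {
        concept
        for concept, variants in CONCEPTS.items()
        if any(variant in lowered for variant in variants)
    }


def string_match(query: str, title: str) -> bool:
    """Plain string search: every query term must appear literally."""
    lowered = title.lower()
    return all(term in lowered for term in query.lower().split())


def concept_match(query: str, title: str) -> bool:
    """Concept search: every concept in the query must appear in the title."""
    wanted = concepts_in(query)
    return bool(wanted) and wanted <= concepts_in(title)


query = "diabetes aspirin"
title = "Aspirin use in patients with impaired glucose tolerance"  # made-up title

print("string match: ", string_match(query, title))
print("concept match:", concept_match(query, title))
```

The string search misses this title because the word “diabetes” never appears in it, while the concept search finds it through the “impaired glucose tolerance” variant.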

If we do succeed in providing links to relevant content, how can we measure whether those links are successful? Major surveys of how academics search for and find content (e.g. How Readers Discover Content, a long-term study updated every three years) show that academics use search and discovery very extensively, despite (we may assume) being subject-matter experts in their chosen domain. In other words, checking citations, listening to presentations, and reading articles still leaves gaps for an academic, gaps that they fill using search. The details of how they search may vary, but it’s clear that most if not all academics are involved in searching for new content as part of their academic work.

Perhaps the most common metric used by publishers is the click-through rate (CTR). Here, it means how often people who open a content page such as this one click on one of the recommended links on the right. This simple measure enables different recommender tools to be compared, and also gives an idea of how widely used these tools are. Has anyone measured the use of these links? Yes, a 2014 research paper showed that ratings by humans correlated quite closely with CTR (perhaps not surprising):

Ratings in the user study correlated strongly with CTR. This indicates that explicit user satisfaction (ratings) is a good approximation of the acceptance rate of recommendations (CTR), and vice versa. (Beel and Langer, 2014)

In other words, researchers click on the articles they judge to be most relevant. Interestingly, the same paper found that what it termed “inferred ground-truths” (such as citations) are generally flawed and far less valuable for evaluating research paper recommender systems.

What kind of click-through rates are found in practice? From publisher feedback, we hear that a typical recommender system may achieve at best a 6.5% CTR. Although this figure may seem low, anyone familiar with actual usage of websites will know that the statistics for interaction with a site beyond the first search are always very low. The click-through rate for display adverts on a webpage is typically 0.1%, and even that figure often comprises mainly bots, automated systems that click on all the links on a site on a regular basis.
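As a rough illustration of how a CTR figure like this might be calculated, here is a minimal Python sketch. It assumes CTR means the share of page views that led to at least one recommendation click; the PageView record, the bot heuristic, and the sample data are all hypothetical, and real bot filtering is far more involved than a user-agent check.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """One view of a content page (hypothetical log record)."""
    user_agent: str
    clicked_recommendation: bool


def is_probable_bot(view: PageView) -> bool:
    # Crude illustrative heuristic only, based on the user-agent string.
    markers = ("bot", "crawler", "spider")
    return any(marker in view.user_agent.lower() for marker in markers)


def click_through_rate(views: list[PageView], exclude_bots: bool = True) -> float:
    """Share of page views in which a recommended link was clicked."""
    if exclude_bots:
        views = [v for v in views if not is_probable_bot(v)]
    if not views:
        return 0.0
    clicks = sum(1 for v in views if v.clicked_recommendation)
    return clicks / len(views)


# Made-up sample data: four human page views and one bot page view.
views = [
    PageView("Mozilla/5.0", True),
    PageView("Mozilla/5.0", False),
    PageView("Mozilla/5.0", False),
    PageView("Mozilla/5.0", False),
    PageView("ExampleBot/1.0", True),
]

print(f"CTR excluding bots: {click_through_rate(views):.1%}")
print(f"CTR including bots: {click_through_rate(views, exclude_bots=False):.1%}")
```

Even in this toy data, a single bot that clicks every link noticeably inflates the figure, which is why it matters whether a reported CTR has been filtered.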

Can click-through rates be compared? An article in Forbes (Fou, 2020) points out that every website will have a different result for CTR, and the results are not comparable – the time a user spends on a hotel booking site will be much greater than the time taken by an academic to find one scholarly article. So comparisons are only meaningful within the same site, or with other websites that have a similar function.

Another metric of site engagement is page views per session. For instance, an average of 2 pages per session means that each visitor to the website viewed two pages before leaving. While this metric will vary widely from one site to another, it is a good way of identifying trends over time on a single website, or of comparing two websites with a similar function (such as two academic publisher sites). Using a recommender system typically results in a clear increase in pages per session. A typical value when using a recommender system will be above 2.
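Pages per session is simple to compute from a page-view log. The sketch below assumes each log entry is a (session ID, URL) pair; that format, the function name, and the sample log are invented for illustration, and analytics packages each define a session in their own way.

```python
from collections import Counter


def pages_per_session(page_views: list[tuple[str, str]]) -> float:
    """Average number of page views per session.

    Each entry is a (session_id, url) pair (a hypothetical log format).
    """
    views_by_session = Counter(session_id for session_id, _url in page_views)
    if not views_by_session:
        return 0.0
    return sum(views_by_session.values()) / len(views_by_session)


# Made-up log: one session with three page views, one with a single view.
log = [
    ("s1", "/article/1"),
    ("s1", "/article/2"),
    ("s1", "/article/3"),
    ("s2", "/article/1"),
]

print(pages_per_session(log))
```

Tracking this number before and after a recommender is switched on, over the same site, is one way to see the trend described above.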

Given the differences between sites, it would be invidious to single out specific products, but publishers report to us widely differing results when comparing recommender tools. There is a very clear difference between recommenders: it is common for one tool to see 2x or even 3x the usage of another. We are happy to provide access to UNSILO Recommender tools for evaluation purposes, and we strongly recommend trialing two or more products in this way, ideally via an A/B test lasting at least a month.
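When an A/B test like this finishes, one common way to check whether the difference in CTR is more than noise is a two-proportion z-test. The Python sketch below is one possible approach under simplifying assumptions (independent page views, large samples), not a prescribed method; the function name and the counts are made up.

```python
import math


def two_proportion_z_test(
    clicks_a: int, views_a: int, clicks_b: int, views_b: int
) -> tuple[float, float]:
    """Compare the CTR of recommender A and recommender B.

    Returns (z, p_value) for a two-sided two-proportion z-test.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both tools perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        raise ValueError("No variation in the data; the test is undefined.")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: tool B has twice the CTR of tool A on equal traffic.
z, p_value = two_proportion_z_test(
    clicks_a=200, views_a=10_000, clicks_b=400, views_b=10_000
)
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

A small p-value suggests the gap between the two tools is unlikely to be chance. Running the test for at least a month also helps to smooth out weekly and seasonal swings in traffic.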

One conclusion is very clear: recommender tools work. Any recommender is good, but concept-based tools seem to provide an experience that researchers come back to again and again. Recommended links are a vital part of the academic workflow, and they significantly increase engagement. After all, any publisher who misses out on a significant enhancement in site usage and benefit to users is not making the most of their content.

