The problem with keywords

Keywords have been with us since the 1960s, when digital content first started to become a reality, as a way of facilitating discovery. A glance at the Google Ngram Viewer shows a dramatic increase in their use up to around 2000, since when references to “keyword” have plateaued: 

Where do keywords come from? Some publishers mandate that authors provide a number of keywords for any submitted article. For example, a publisher’s guide for researchers on the use of keywords states that “most journals require authors to select 4-8 keywords (or phrases) to accompany a manuscript to facilitate online searches”. How should the author identify these keywords? Very sensibly, the guide suggests that for medical content, the author should use keywords from MeSH, the Medical Subject Headings curated by the US National Library of Medicine. But the specific details become quite challenging. It is recommended that authors should: 

  • Avoid using esoteric terminology 
  • Omit very general search terms such as “cell” 
  • Avoid any abbreviations that may have multiple meanings 
  • Use Google Scholar to find “commonly used, yet specific terms” 

This is quite a challenge in information management terms, and it would be quite a challenge for any author to follow all these rules. Perhaps it’s not surprising that many authors stick to the simplest response, providing the suggested four to eight words. For example, an article entitled “Associations between aspirin use and the risk of cancers: a meta-analysis of observational studies” has just four keywords: Aspirin, Cancers, Meta-analysis, and Observational studies. That sounds straightforward enough, but the article abstract clearly states that the study is about the regular use of aspirin, rather than occasional use – a distinction that would not be picked up by these keywords.  
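Parts of that guidance amount to a screening process, and the mechanical parts can be sketched in code. The example below is purely illustrative: the stoplist, the abbreviation list, and the function name are invented for this sketch, not taken from any journal’s actual workflow.

```python
# Hypothetical screening of candidate keywords against the guidance above.
# A real workflow would curate these lists (e.g. from MeSH); the entries
# here are invented examples.
TOO_GENERAL = {"cell", "human", "adult", "study"}        # overly broad terms
AMBIGUOUS_ABBREVIATIONS = {"CA", "MS", "PD"}             # e.g. "CA": cancer? California?

def screen_keywords(candidates):
    """Split candidate keywords into accepted and rejected-with-reason."""
    accepted, rejected = [], []
    for kw in candidates:
        if kw.lower() in TOO_GENERAL:
            rejected.append((kw, "too general"))
        elif kw.isupper() and kw in AMBIGUOUS_ABBREVIATIONS:
            rejected.append((kw, "ambiguous abbreviation"))
        else:
            accepted.append(kw)
    return accepted, rejected
```

Applied to a candidate list such as `["aspirin", "cell", "MS", "meta-analysis"]`, this keeps “aspirin” and “meta-analysis” while flagging the other two – but note that the harder rules (esoteric terminology, checking Google Scholar for “commonly used, yet specific terms”) resist this kind of simple automation, which is part of the author’s burden described above.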

Do keywords still have a function? It would appear so from the number of articles that contain extensive lists of keywords. You would think that the availability of full-text indexing would make keywords obsolete; nonetheless, they seem to be more widely used today than ever before. Some articles have extensive lists of keywords, presumably compiled by information professionals involved in the production of the article. Here is one example, an article about liver transplantation, with no fewer than 41 keywords:  

Keywords for one academic article

The selection of keywords is intriguing – it certainly doesn’t follow the guidance given to authors quoted above (terms like “adult” and “human” are very general indeed). The article itself is a meta-analysis, a review of clinical trials, which (commendably) already has a clearly documented search strategy used to find the clinical trials described in the article. So what is the function of these keywords? To make the article discoverable? If so, why include “Bayes Theorem”, which looks very peripheral to the subject of the article, and which is not included in the search strategy used to conduct the search? And what do combinations of terms mean, for example “combination / mortality”? There is also, of course, the problem that many of these terms are polysemous – that is, they can have more than one meaning, depending on the context. The term “combination”, for example, might appear in many different contexts. Clearly, considerable human effort has gone into compiling this list, but its position in the discovery workflow is not entirely clear. Nonetheless, compiling large numbers of keywords is very popular: there are many websites that offer to identify relevant (or not-so-relevant) keywords for any word or phrase, such as the site illustrated at the top of this article.  

Perhaps a more effective methodology might be to understand the use case behind these keywords. If the goal is for a researcher to find content about liver transplantation, then there are machine-based tools for this purpose. UNSILO uses a concept cluster to “index” any document in a corpus. That concept cluster, which comprises hundreds of concepts for any document, each ranked by its relevance for that article compared to others in the corpus, is then used to identify related content, or (depending on the use case) a relevant reviewer or journal for that article. Having created a concept cluster, the system can then find “more like this”: not on the basis of arbitrary concepts selected by a human, but using concepts based on actual occurrences in the corpus. Most remarkably, a tool like Classify can identify the synonymous and closely related terms for any term in the corpus. Here, for example, are some of the related concepts identified from PubMed for “liver transplantation”: 
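UNSILO does not publish the internals of its concept clustering, but the general idea – rank each document’s terms against the rest of the corpus, then compare documents by their ranked-term profiles to find “more like this” – can be sketched with standard TF-IDF weighting and cosine similarity. This is a toy illustration under those assumptions, not UNSILO’s actual algorithm:

```python
import math
from collections import Counter

def tf_idf_vectors(corpus):
    """Weight each term in each document by TF-IDF: a simple stand-in for
    ranking concepts by relevance relative to the rest of the corpus."""
    tokenised = [doc.lower().split() for doc in corpus]
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))
    n = len(corpus)
    return [{t: (count / len(tokens)) * math.log(n / doc_freq[t])
             for t, count in Counter(tokens).items()}
            for tokens in tokenised]

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0

def more_like_this(corpus, query_index, top_n=1):
    """Rank every other document by similarity to the query document."""
    vectors = tf_idf_vectors(corpus)
    query = vectors[query_index]
    scores = [(i, cosine(query, v))
              for i, v in enumerate(vectors) if i != query_index]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]
```

On a tiny corpus – say one abstract about liver transplantation outcomes, one about hepatic graft survival after liver transplantation, and one about aspirin and cancer risk – the first document’s closest match is the second, driven entirely by term occurrences rather than by any human-selected keywords.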

Of course it is possible to argue over the details of the two approaches, human and machine, but it seems clear from the above that the machine identifies some concepts not included in the 41-term keyword list. For example, the system finds that “hepatic transplantation” is a close synonym for “liver transplantation”; there is no similar concept in the human-generated keywords.  
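How might a machine discover that “hepatic transplantation” and “liver transplantation” are near-synonyms without a human-curated thesaurus? One classical approach – a plausible stand-in here, not necessarily what Classify does – is distributional similarity: terms that occur in similar contexts accumulate similar context-count vectors, and the cosine between those vectors measures relatedness. A minimal sketch:

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """For each token, count the words appearing within `window` positions
    of it across the corpus (its distributional profile)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two context-count vectors."""
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0
```

With multi-word terms treated as single tokens, two terms that are used interchangeably in the corpus (such as “liver_transplantation” and “hepatic_transplantation” appearing in near-identical sentences) end up with near-identical context vectors and hence a high cosine, while an unrelated term like “aspirin_use” scores near zero – no dictionary of synonyms required.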

In conclusion, this quick assessment is not intended to be a simple points-scoring exercise. The message should be that both approaches, human and machine, can contribute, and that we should be working to identify ways of making the most effective combination of the two – something UNSILO Classify does very well, using its “human-in-the-loop” configuration dashboard.  

Note: the concept expansion shown above is available in UNSILO Classify for most terms and phrases used in the corpus.
