Identifying semantically related terms

Although the word “semantic” is bandied about these days with little concern for accuracy, most searching we carry out with search engines remains anything but semantic. UNSILO has been involved in identifying and using semantic links between concepts since its foundation, but it was sometimes difficult to demonstrate to users exactly how the semantic tool was working, or even to show that it was genuinely semantic.

A recent enhancement to the UNSILO Classify tool has made explicit how the underlying extraction engine reveals semantic relationships. UNSILO Classify was built to enable users to create subject collections, even if they were not experts in the topic. One problem with building a subject collection is knowing all the various ways a topic can be described. We are all familiar with synonyms in English, and medical terms frequently have both a common name and a technical name for the same condition or process: “cardiac” and “heart”, “dental” and “tooth”, for example. This is one of the fundamental challenges in searching, and Boolean tools, which are string-based, have never successfully solved it.

At a medical conference recently, participants in a workshop about systematic reviews noticed that a Boolean search missed many relevant hits because it did not include some key synonyms. The article under discussion examined the effect on mental health of using hand-held devices. The Boolean search to find relevant articles included “smartphone”, “mobile phone”, and “handheld device”, but did not include “cellphone”. It is very difficult, even for information professionals, to identify all the various ways of naming a topic, so it was perhaps not surprising that it took a class of 30 professional searchers to discover one missing term in a search strategy.
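The gap described in that workshop can be reproduced in a few lines. The following is a minimal sketch of a string-based OR search, not any real search engine's implementation; the document titles and the function name boolean_or_search are invented for illustration.

```python
# Hypothetical article titles, invented purely for illustration.
documents = [
    "Smartphone use and anxiety in adolescents",
    "Mobile phone dependence and sleep quality",
    "Cellphone use and depressive symptoms in students",
    "Handheld device screen time and wellbeing",
]


def boolean_or_search(docs, terms):
    """Return the docs that contain at least one term as a
    case-insensitive substring (term1 OR term2 OR ...)."""
    lowered = [term.lower() for term in terms]
    return [doc for doc in docs if any(term in doc.lower() for term in lowered)]


# The search strategy from the workshop: "cellphone" is absent.
query = ["smartphone", "mobile phone", "handheld device"]

hits = boolean_or_search(documents, query)
missed = [doc for doc in documents if doc not in hits]

print("Missed:", missed)
```

The search has no way of knowing that the missed title is about the same topic: string matching only ever finds the strings it was given.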

The UNSILO engine is based around the analysis of a corpus; it identifies the most related words and/or phrases for a term from that corpus.

UNSILO Classify identifies concepts by looking at words and phrases in context. This means it is able to identify degrees of synonymy from any corpus of text. The benefit of this technique is that the tool identifies all the related terms from that subject-specific corpus (for the corpus shown here, medicine and healthcare). The innovation for UNSILO Classify is that semantic links are now revealed for any concept in a single click. That means, for example, that the user can see common equivalent terms for “kidney disease”, including “renal disease”, “renal failure”, and so on:

Showing related concepts for “kidney diseases” from a major medical corpus

This feature is simply not possible using Boolean tools; the relationship between “kidney” and “renal” is not something that a string search can identify without external human input.

It should be noted that the new UNSILO Classify capability is not, strictly speaking, finding only synonyms. It finds related terms that appear in the same context as the original concept. For example, “kidney function” and “kidney failure” are not synonymous, but they frequently appear close together in medical literature.
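The general idea of finding terms that appear in the same context can be illustrated with a simple co-occurrence model. This is a minimal sketch under stated assumptions, not a description of how the UNSILO engine works: the five-sentence corpus is invented, single words stand in for concepts, and cosine similarity over sentence-level co-occurrence counts is just one common way to compare contexts.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration; a real corpus would be far larger.
corpus = [
    "chronic kidney disease reduces filtration",
    "chronic renal disease reduces filtration",
    "acute kidney failure requires dialysis",
    "acute renal failure requires dialysis",
    "regular exercise improves heart health",
]


def context_vectors(sentences):
    """For each word, count the other words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_terms(term, vectors, top_n=3):
    """Rank every other word by how similar its contexts are to the term's."""
    target = vectors[term]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


vectors = context_vectors(corpus)
ranked = related_terms("kidney", vectors)
print(ranked[0][0])  # the word whose contexts most resemble those of "kidney"
```

In this toy corpus “kidney” and “renal” never occur in the same sentence, yet “renal” ranks first because the two words are surrounded by the same neighbours. Words such as “disease” and “failure” also score above zero, which mirrors the point above: the ranking surfaces related terms, not only synonyms.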

Here, the human input takes over. The automatic system suggests concepts, but does not automatically expand the search – users do that. Users can decide for themselves which of the terms are synonymous and should be included in a subject collection. Here is the real innovation: humans are very good at looking at a list and selecting relevant terms from it; humans are very poor at starting with a blank sheet of paper, as it were, and trying to guess all the related terms for a concept. In this case, the term “semantic linking” is fully justified.
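The division of labour between system and user can be sketched as follows. This is an illustrative outline only: the function name build_collection_terms is hypothetical, and the user's choice shown here is invented. The suggested terms are the ones mentioned earlier in this post.

```python
def build_collection_terms(seed, suggestions, accepted):
    """Return the seed term plus only those suggestions the user accepted.

    Nothing is added automatically: a suggestion the user has not
    accepted never reaches the subject collection.
    """
    return [seed] + [term for term in suggestions if term in accepted]


# Terms suggested by the system for the seed concept.
suggestions = ["renal disease", "renal failure", "kidney function", "kidney failure"]

# Hypothetical user decision: only one suggestion is a true equivalent.
accepted = {"renal disease"}

terms = build_collection_terms("kidney disease", suggestions, accepted)
print(terms)
```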
