AI, Citations, and Bias

[Image: training image of a cat for a supervised AI tool. Public domain, CC0.]

A common misconception about AI tools for academic text is that they all work in the same way, and as a result UNSILO is frequently (but wrongly) grouped with tools based on, for example, usage or citations. In this post, we explain the difference.

One fundamental distinction between AI tools is whether they use “supervised” or “unsupervised” methods. Supervised tools make use of a training set, which is typically (although not always) a human construct. Suppose, for example, you want a machine to distinguish pictures of dogs from pictures of cats. You show the machine 50 pictures labelled “cat” and 50 pictures labelled “dog”; these 100 pictures are your training set. When shown a new picture, the machine decides, based on the training set you provided, whether it depicts a dog or a cat.
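
To make this concrete, here is a minimal sketch of supervised classification in Python, using scikit-learn. The “pictures” are random feature vectors standing in for real image features, so everything except the 50/50 labelled training set is an invented assumption for illustration.

```python
# A minimal sketch of supervised learning with scikit-learn. The
# "pictures" are random feature vectors standing in for real image
# features; the labels are the human-made training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# 50 labelled "cat" examples and 50 labelled "dog" examples.
X_train = np.vstack([rng.normal(0.0, 1.0, (50, 8)),   # cat features
                     rng.normal(2.0, 1.0, (50, 8))])  # dog features
y_train = np.array(["cat"] * 50 + ["dog"] * 50)

model = LogisticRegression().fit(X_train, y_train)

# A new, unlabelled picture: the decision rests entirely on the training set.
new_picture = rng.normal(2.0, 1.0, (1, 8))
print(model.predict(new_picture))  # almost certainly ['dog']
```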

Clearly, the success or otherwise of this procedure depends on how good the training set is. If the training set is large enough, covering all possible manifestations of the cases we want to find, the quality can be very good indeed. For example, systems trained to convert handwritten numbers and characters into digital form can today interpret human handwriting more accurately than humans can, and of course many times faster.

Of course, there will often be “edge cases” that the machine cannot interpret correctly. But with supervised learning, one key principle to note is that the system cannot perform any better than the human-curated data supplied to it. If the pictures of dogs and cats are only sufficient to separate, say, 80% of the images, then the machine, however good the algorithm, cannot exceed that figure.
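
A toy demonstration of that ceiling (not UNSILO code): below, the two classes overlap by construction so that only about 80% of examples can ever be separated, and the trained classifier duly tops out near 80% accuracy, however it is tuned.

```python
# The two feature distributions overlap so that the best possible
# decision rule separates only ~80% of examples (Bayes accuracy ~0.80).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000

def sample(n):
    X = np.concatenate([rng.normal(0.0, 1.0, n),      # class "cat"
                        rng.normal(1.683, 1.0, n)])   # class "dog", overlapping
    y = np.array([0] * n + [1] * n)
    return X.reshape(-1, 1), y

X_train, y_train = sample(n)
model = LogisticRegression().fit(X_train, y_train)

X_test, y_test = sample(n)
print(model.score(X_test, y_test))  # ~0.80: the data, not the algorithm, sets the limit
```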

A supervised model simply replicates whatever bias the humans who created the training set have built into it; if, for example, a training set of scientists comprises 25% females and 75% males, that disparity will persist in any algorithmic extraction from the corpus.
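
A toy illustration of that point, using the 25/75 split described above: any procedure that faithfully reproduces the training distribution reproduces the skew with it.

```python
# Toy demonstration: sampling from a biased training distribution
# reproduces the bias; no algorithmic step removes it.
import random

random.seed(0)
training_set = ["female"] * 25 + ["male"] * 75   # the 25%/75% split from the text

extracted = random.choices(training_set, k=100_000)
print(extracted.count("female") / len(extracted))  # ~0.25: the disparity persists
```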

UNSILO’s unsupervised concept extraction does not use a human-built training set. It compares a text against a corpus, typically comprising millions of words, and finds, by comparison, the words and phrases that are distinctive about this text. It uses words in context to identify syntactic similarities (“bridge”, “bridges”), semantic similarities (“bridge”, “crossing”), and synonyms (“kidney”, “renal”). Unlike the supervised method, this way of identifying concepts does not depend on humans identifying concepts in advance: the system trains itself on the corpus provided. No system is perfect, but here the limitation is simply the size of the corpus.
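
As an illustration of corpus-relative extraction, here is a hedged sketch using TF-IDF, a standard statistic for finding terms distinctive of one document relative to a corpus. This is a stand-in, not UNSILO’s actual model, and plain TF-IDF does not capture the semantic similarities and synonyms mentioned above, which require context-based methods such as word embeddings.

```python
# Sketch: score terms by how distinctive they are for one document
# relative to a background corpus (TF-IDF as a stand-in method).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "outcomes of kidney transplantation and graft survival",
    "bridge construction and the structural engineering of crossings",
    "gene expression profiles in healthy tissue",
    # ...in practice, a corpus of millions of words
]
document = "renal transplantation and long term graft survival in renal disease"

vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
vectorizer.fit(corpus + [document])

scores = vectorizer.transform([document]).toarray()[0]
terms = vectorizer.get_feature_names_out()

# The highest-scoring terms are those distinctive of this document
# relative to the corpus.
for score, term in sorted(zip(scores, terms), reverse=True)[:5]:
    print(f"{term}: {score:.2f}")
```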

To summarize: the unsupervised concept extraction used by UNSILO does not have the limitation of a human-curated training set, but it may still contain inadvertent bias, because inequalities in the corpus will remain. If, for example, most science articles have in the past been written by males, then that gender bias will persist in any AI operation on the corpus.

Nor is UNSILO’s related API based in any way on citations or usage. The developers of UNSILO felt strongly that the fundamental use case for researchers exploring academic content is not to see how many times an article has been downloaded or read; that figure may or may not correspond to the soundness of its argument. Instead, the related API simply identifies overlapping concepts between articles. Where keywords are usually limited to as few as four or five terms, UNSILO extracts over 200 concepts for every academic article, computes a relevance score for each concept in that document, and matches articles on the concepts in that order of relevance. Hence UNSILO is free from the criticisms levelled at citation-based tools, for example the assumption that the more citations an article has, the more important it must be.
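
A sketch of how concept-overlap matching might look, assuming each article carries concepts with per-document relevance scores as described above. The combination rule (a weighted overlap over shared concepts) is an assumption for illustration, not UNSILO’s published formula.

```python
# Rank candidate articles by overlapping weighted concepts: citation
# counts and download figures appear nowhere in the ranking.
def related_score(a: dict[str, float], b: dict[str, float]) -> float:
    """Weight shared concepts by their relevance in both articles."""
    return sum(w * b[concept] for concept, w in a.items() if concept in b)

article = {"renal transplantation": 0.92, "graft survival": 0.81,
           "immunosuppression": 0.64}          # in practice, 200+ concepts
candidates = {
    "paper-A": {"renal transplantation": 0.88, "dialysis": 0.70},
    "paper-B": {"graft survival": 0.90, "immunosuppression": 0.75},
    "paper-C": {"bridge engineering": 0.95},
}

for name in sorted(candidates,
                   key=lambda k: related_score(article, candidates[k]),
                   reverse=True):
    print(name, round(related_score(article, candidates[name]), 3))
```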

When considering bias, it is worth comparing the bias of any manual process that the AI tool works alongside. It seems unfair to point out a bias in an AI tool when that bias pre-existed in the corpus on which the tool works. For example, many publishers keep a database of peer reviewers: authors who have reviewed articles for them in the past. Such a database inevitably reflects a historical perspective. Today, more STEM articles are published by Chinese authors than by Western authors, but of course that preponderance will not be reflected in the publishers’ reviewer databases. When UNSILO suggests a peer reviewer, it uses only subject relevance as the criterion for selection. The human editor then makes the decision, and is free to select any names they prefer. Our goal is to provide ten or so names, all of them qualified to review by subject matter and by having recently authored in the field.
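
To illustrate, here is a hypothetical sketch of relevance-only reviewer suggestion. The candidate pool, its fields, and the recency cutoff are all invented for illustration, and the ranked list is only a suggestion for the human editor.

```python
# Hypothetical sketch: rank candidate reviewers by subject relevance alone,
# keep only recent authors, and hand the editor a short list to choose from.
def subject_relevance(manuscript: dict, concepts: dict) -> float:
    return sum(w * concepts.get(c, 0.0) for c, w in manuscript.items())

manuscript = {"renal transplantation": 0.9, "graft survival": 0.8}
pool = [
    {"name": "Reviewer 1", "last_authored": 2019,
     "concepts": {"renal transplantation": 0.9}},
    {"name": "Reviewer 2", "last_authored": 2020,
     "concepts": {"graft survival": 0.85, "renal transplantation": 0.6}},
    {"name": "Reviewer 3", "last_authored": 2020,
     "concepts": {"bridge engineering": 0.9}},   # no subject overlap: filtered out
]

RECENT_SINCE = 2018  # hypothetical recency cutoff
candidates = [r for r in pool
              if r["last_authored"] >= RECENT_SINCE
              and subject_relevance(manuscript, r["concepts"]) > 0]
shortlist = sorted(candidates,
                   key=lambda r: subject_relevance(manuscript, r["concepts"]),
                   reverse=True)[:10]
for r in shortlist:
    print(r["name"])   # the editor, not the tool, makes the final choice
```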

