The Publishing Research Consortium (PRC) has published a report on “Text Mining of Journal Literature”. This is a major survey, with over 500 responses, and so the results were eagerly anticipated, especially because the same organisation carried out a similar survey in 2011. What does it tell us about text mining today? It would appear from the survey that there has been no major breakthrough in the last five years to turn text mining into a mainstream academic technique.
• Around 75% of researchers have never done any text mining.
• But around two-thirds of respondents were open to learning more about the technique, mainly for the literature review stage of research.
The survey sample was derived from the mailing lists of four publishers: Elsevier, SAGE, Taylor & Francis and Wiley. It is worth noting (although not stated explicitly) that the survey is only of scientific researchers. This makes it all the more surprising that the reported take-up is so low. The survey did not define text mining in detail, but it appears to restrict itself to tools for the large-scale assessment of scholarly content, and to exclude the text analytics tools built into commonly available search interfaces, such as SpringerLink and Elsevier, to search and classify content – a use that is much more widespread (yet not evaluated here). Given Google’s extensive implementation of text analytics tools, it is probably true to say that no researcher today can be unaware of text analytics. While the report suggests that more text mining tools should be “plug and play”, the truth is that most researchers are already using text mining tools, even if they are not always aware that such tools sit behind their information search and retrieval.
Oddly, although perhaps as befits an article for an academic readership, the survey begins by measuring the number of scholarly papers devoted to text mining, which has grown by around 10% per year over a five-year period.
Most text-mining users have been using the technique for some time. They use it primarily to extract information, concepts and new facts (around 70%), followed by classification of documents (40%). How would non-users like to use it? It is always dangerous to draw conclusions from people's views of a tool they do not use in practice, but their stated preference is clearly for the literature review (over 50%), followed by finding hidden links between content (nearly 50%).
Not surprisingly, the highest use of text mining is in life sciences (41% of respondents compared with an average of 24%). The report includes a helpful regional breakdown, which yields some rather surprising results. For example, North America has the lowest use of text mining of all world regions, and South America the highest. The country with the highest proportion of text-mining usage was India, with 30% of respondents using it (compared to an average of 17%).
While the report concludes that text mining is still not very widely known or used, it could be argued that text mining tools are now present in many disciplines, within and outside academia, and can only become more widespread in the coming years as academics become more familiar with their capabilities.