The recent MarkLogic World London event (June 13th-15th, 2016) had several sessions either devoted to or relating to text analytics and text mining. Increasingly, MarkLogic customers are using these tools to improve productivity and to reduce manual processing. One of the most interesting presentations was from KPMG, who presented a solution they have built for automating many aspects of financial compliance, using machine-learning tools for concept extraction and manipulation.
Michael Henry, who gave a witty and entertaining talk, had a lovely term for the current state of financial compliance as carried out by humans: “stare and compare”. Basically, a lot of humans have to stare at a lot of documents and compare them to other documents for compliance to be achieved. Compliance tools have become increasingly necessary in the past few years, as we all know, and for banks, compliance costs have in some cases now reached 20% of net revenue (up from a single-figure percentage a few years ago). Typical compliance requirements for wealth management include:
KYC = know your customer
AML = anti-money laundering
NNC = negative news checking
A typical requirement today is for the regulator to demand that a bank carries out a KYC check on all 200,000 of its existing customers within the next 12 months. Banks carry out this kind of checking by a manual read of relevant documents, such as mortgage applications. Information that is considered relevant is then copied by hand to a compliance register. This highly manual (and deeply unexciting) task, the “stare and compare”, is typically carried out by large teams of offshore agents. Not surprisingly, the staff turnover in these teams is very high; nobody wants to do such work.
Instead, KPMG introduced a more automated process. Using a combination of a MarkLogic database, some OCR tools and a content enrichment framework with semantic tools to infer conclusions with a stated confidence score, the new system was able to reduce the human input by 80%. The remaining 20% of the time is taken up by humans checking the issues that still remain when the automatic system has done its work. So successful has this operation been that KPMG are now supplying this compliance automation tool to 14 banks in the US. Remarkably, the new system has provided several advantages in addition to reducing costs. A much greater number of compliance checks are now run on the captured documents than was ever possible before. When the regulations change, all the documents can be checked again against the new criteria. Finally, the number of errors inherent in a large-scale manual checking process has been reduced (ask any human to carry out a very repetitive task for long enough and they start to make mistakes).
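The division of labour described above, where the system acts on inferences it is confident about and passes the rest to a human checker, can be sketched roughly as follows. This is only an illustration, not KPMG's implementation: the `Finding` type, its field names and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-off: findings at or above it are accepted automatically.
REVIEW_THRESHOLD = 0.9


@dataclass
class Finding:
    document_id: str
    field: str         # e.g. "customer_name"
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the enrichment step


def triage(findings, threshold=REVIEW_THRESHOLD):
    """Split findings into auto-accepted ones and ones needing human review."""
    accepted, needs_review = [], []
    for finding in findings:
        if finding.confidence >= threshold:
            accepted.append(finding)
        else:
            needs_review.append(finding)
    return accepted, needs_review


# Made-up sample data for illustration only.
findings = [
    Finding("doc-1", "customer_name", "A. Example", 0.97),
    Finding("doc-1", "date_of_birth", "1970-01-01", 0.62),
    Finding("doc-2", "customer_name", "B. Example", 0.91),
]
accepted, needs_review = triage(findings)
print(len(accepted), "accepted;", len(needs_review), "for human review")
```

Where the threshold sits is a policy decision: raising it sends more findings to human checkers, lowering it sends fewer.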
The principles that KPMG followed are to a large extent applicable to any text analytics process, such as extracting concepts from academic content:
1. Collect data only once, if possible, by converting unstructured documents into machine-readable form. Remove people from the digitisation process as far as possible (which means restating your criteria in a format that a machine can validate).
2. Run the documents past a content enrichment framework (avoiding the “stare and compare”).
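As a rough illustration of what restating criteria "in a format that a machine can validate" might look like, the sketch below expresses each criterion as a named rule applied to an extracted record. The rule names, record fields and sample values are all hypothetical and are not taken from the KPMG system.

```python
import re

# Hypothetical criteria: each rule name maps to a check over one extracted record.
RULES = {
    "has_customer_name": lambda rec: bool(rec.get("customer_name", "").strip()),
    "dob_is_iso_date": lambda rec: re.fullmatch(
        r"\d{4}-\d{2}-\d{2}", rec.get("date_of_birth", "")
    ) is not None,
}


def check(record, rules=RULES):
    """Return the sorted names of the rules that the record fails."""
    return sorted(name for name, rule in rules.items() if not rule(record))


# Made-up sample records for illustration only.
records = [
    {"customer_name": "A. Example", "date_of_birth": "1970-01-01"},
    {"customer_name": "", "date_of_birth": "01/01/1970"},
]
failures = [check(record) for record in records]
print(failures)
```

Because the criteria live in data rather than in people's heads, a change in regulation becomes a new entry in `RULES`, and every stored record can be checked again without re-reading the documents.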
Publishers are, for the most part, not involved in financial compliance, but any objective assessment of the publisher workflow reveals many areas of extensive human input, including indexing and tagging, and much of this could be fully or largely replaced by automated solutions – not least from UNSILO. Many people in the audience were surprised that such a sensitive area as compliance could be handled via automated tools; yet it was clear that the results were better than those of the original, fully manual, operation. Financial compliance is yet another example of text analytics being applied to an area that, until a few years ago, was considered to require human curation.