Text Interrogator: Extracting Text From Financial Documents Like a Boss

By Dan Huss

Extracting and applying detailed information from large numbers of lengthy documents has been a major pain point in many industries for years. For as long as crucial information has been stored in this format, brute force has been the primary way, and in many cases the only way, to synthesize it. But relying on human accuracy and comprehension, not to mention human attention span and bandwidth, means that important information can be siloed or missed entirely for indefinite periods of time.

Enter Text Interrogator, a revolutionary artificial intelligence tool from the Gravity AI marketplace. Text Interrogator automates the extraction of information from documents through a question-and-answer process that mimics how a person reads a document and answers questions about it. With this tool, organizations in many different industries can turbocharge the accuracy and efficiency of document synthesis and focus on what they do best, rather than on tedious data extraction.


The Problem of Information Synthesis

For many years, digitizing information from contracts and other financial or legal documents into spreadsheets has been the best available way to turn volumes of information into searchable databases. The constant challenge is that every document is unique in the types of information it includes and in how that information is formatted. In other words, a lot can be missed when cataloguing and searching this information, because in many cases human reasoning is needed to draw parallels or connections that go beyond simple keyword searching.

But this kind of comprehension-driven synthesis is severely limited by human work capacity. For example, it may be no problem to look through one contract to find the name of the signatory or the total cost of services, but repeating this process across hundreds or thousands of documents, each of which may be dozens of pages long, can take days of work. The problem only gets worse as it moves from theory to practice, and it raises further questions: What does it cost to have a person dedicated to data extraction? Does your firm have a person with the knowledge and bandwidth to dedicate to this task? What happens if that person, the sole expert on this type of information, is on vacation or leaves the firm? Ultimately, this seemingly innocent, menial task of reading documents for specific bits of information can be a major pain point, because it can fall on the critical path of larger tasks and slow down operations, or even cause oversights, missed opportunities, and lost revenue.

The Solution

Text Interrogator is an artificial intelligence tool that takes on this previously human-driven task by mimicking the human synthesis process, but at scale and in real time. At its heart, Text Interrogator finds information in documents using natural language processing (NLP), a field at the intersection of linguistics, data science, and artificial intelligence. The model is based on Google's pre-trained Bidirectional Encoder Representations from Transformers (BERT) language model and has been further trained on the Stanford Question Answering Dataset (SQuAD). From an application standpoint, this means that, instead of searching by keyword, users can ask specific questions about documents, such as “What is the total value of the invoice?”, “What is the name of the author?” or “When is this document dated?”, and Text Interrogator combs the documents and finds the answers in real time. Text Interrogator is also combined with Optical Character Recognition (OCR) technology, which allows it to extract information from PDFs, scanned documents, and image files, so it is not limited to text files that originate on computers.

Because Text Interrogator processes requests in the form of questions, it alleviates the headaches caused by inconsistent formatting or organization of information across documents. In a keyword search, for example, every spelling variation, synonym, or alternative phrasing for a given type of information has to be accounted for manually, or else it is overlooked. The problem is exacerbated when documents or information fields are organized differently from one to the next. The language knowledge Text Interrogator gains from its NLP training allows it to detect these nuances and account for them immediately, as if a human were reading through the documents, but without the time and effort required of a human and without the lapses in attention that lead to human error.
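
The keyword-search limitation described above is easy to reproduce. The short sketch below uses three invented invoice snippets and a hypothetical keyword_search helper; it is only meant to show how an exact-phrase match overlooks documents that express the same information in different words.

```python
# Three invented snippets that all state an invoice total, phrased differently.
documents = {
    "invoice_a.txt": "Total value of invoice: $4,200",
    "invoice_b.txt": "Amount due upon receipt: $1,150",
    "invoice_c.txt": "Invoice totalling $980 payable in 30 days",
}


def keyword_search(docs, keyword):
    """Return names of documents containing the exact keyword, ignoring case."""
    return [name for name, text in docs.items() if keyword.lower() in text.lower()]


hits = keyword_search(documents, "total value")
missed = [name for name in documents if name not in hits]
print("found:", hits)
print("missed:", missed)
```

Only the first document is found. Catching the other two with keywords alone would mean anticipating every phrasing in advance, which is the manual effort a question-based approach is meant to remove.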

Who It Helps

The primary intended use of the Text Interrogator model is extracting information from financial documents, which makes it immediately useful to customers in the business and finance sectors. But because its information extraction is query-based and its NLP foundation is adaptable, it can also search document repositories for specific types of information, prioritize incoming documents and applications to find the best fits, and analyze investment notes to break down total investment estimates by area and date range. In other words, the platform has a wide range of possible applications, including legal, real estate, lending, investing, and many other verticals.

Does It Really Work?

The real impact of automation is seen at scale, and many industries are starting to realize the incredible value of artificial intelligence. Recently, JP Morgan reported that by automating previously human-driven, mundane finance tasks, it could complete 360,000 hours' worth of work in a matter of seconds and, in doing so, save millions of dollars. This is precisely the mindset that drives the utility of Text Interrogator. In a recent experiment, Text Interrogator successfully extracted court decisions from a collection of legal documents. By the numbers, there are 286,000 civil case filings in the United States each year. If it took just 15 minutes to digitize each court case, the task would take 71,500 hours to complete; that's more than 8 years of round-the-clock work. Using Text Interrogator, these cases could be combed for specific answers within a matter of seconds. Of course, the legal sector is just one example of the many verticals with similar volumes of data that need to be parsed on a constant basis, and Text Interrogator can automate that task and save time and money for these industries as well.
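
The back-of-the-envelope estimate above can be checked with a few lines of arithmetic. The case count and minutes per case come from the paragraph; the two "hours per year" constants are assumptions added here to show what the total looks like both as nonstop calendar time and as one person's full-time workload.

```python
CASES_PER_YEAR = 286_000        # figure quoted in the article
MINUTES_PER_CASE = 15           # figure quoted in the article
HOURS_PER_YEAR = 24 * 365       # assumption: working around the clock
WORK_HOURS_PER_YEAR = 40 * 52   # assumption: one full-time employee

total_hours = CASES_PER_YEAR * MINUTES_PER_CASE / 60
continuous_years = total_hours / HOURS_PER_YEAR
full_time_years = total_hours / WORK_HOURS_PER_YEAR

print(f"Total hours: {total_hours:,.0f}")
print(f"Years of nonstop work: {continuous_years:.1f}")
print(f"Years for one full-time employee: {full_time_years:.1f}")
```

Under these assumptions, the 71,500 hours come to a little over 8 years of nonstop work, or roughly 34 years for a single person working 40-hour weeks.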

Although its performance and accuracy are considered to be high, artificial intelligence such as Text Interrogator still works most effectively with a human in the loop. Text Interrogator makes this kind of data validation simple by highlighting the information it used to make its judgments, for ease of human fact-checking. No artificial intelligence platform is perfect, and human input to flag mistakes or misjudgments is crucial to making its performance more accurate over time. Performance can also be optimized by frequently retraining and recalibrating Text Interrogator on the most up-to-date corpora of information.
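
As a rough illustration of what a human-in-the-loop review step might look like, the sketch below marks the answer span inside its source text and sends low-confidence answers to a reviewer. Everything in it is hypothetical: the ExtractedAnswer class, the highlight and needs_review helpers, the 0.8 threshold, and the confidence values are invented for the example and do not describe Text Interrogator's actual interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedAnswer:
    question: str
    context: str       # the passage the answer was taken from
    start: int         # character offset where the answer begins
    end: int           # character offset just past the answer
    confidence: float  # hypothetical model confidence between 0 and 1


def highlight(answer: ExtractedAnswer) -> str:
    """Wrap the answer span in [[ ]] so a reviewer can spot it in context."""
    c = answer.context
    return f"{c[:answer.start]}[[{c[answer.start:answer.end]}]]{c[answer.end:]}"


def needs_review(answers: List[ExtractedAnswer], threshold: float = 0.8):
    """Return the answers whose confidence falls below the threshold."""
    return [a for a in answers if a.confidence < threshold]


context = "Payment of $4,200 is due on March 1, 2021."
amount_at = context.index("$4,200")
date_at = context.index("March 1, 2021")

answers = [
    ExtractedAnswer("What is the total value of the invoice?", context,
                    amount_at, amount_at + len("$4,200"), 0.95),
    ExtractedAnswer("When is this document dated?", context,
                    date_at, date_at + len("March 1, 2021"), 0.55),
]

review_queue = needs_review(answers)
for item in review_queue:
    print(item.question, "->", highlight(item))
```

Here the date answer is queued for review, which is the right outcome: the passage gives a payment due date, not the date of the document, and a reviewer looking at the highlighted span can catch that at a glance.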

Get Started

Are you interested in learning more about how Text Interrogator can optimize document parsing within your business or industry? You can book a test of the program today, or contact the developer with further questions. As with all tools in the Gravity AI marketplace, Text Interrogator has been vetted and approved by regulatory bodies governing its core industries, meaning it is ready to download and begin implementing today.