ExtractIQ

White Paper

When you have eliminated the impossible, whatever remains, however improbable, must be the truth!

This famous Sherlock Holmes line was written by Sir Arthur Conan Doyle in the novel “The Sign of the Four” and repeated by Spock in the movie “Star Trek VI: The Undiscovered Country”. Both are highly analytical, logical characters, and both are saying that if we have sufficient information about all the possible outcomes of a given problem, we are in a much better position to determine the correct solution.

How often in business are we faced with the same dilemma? We ask many questions to frame possible outcomes and rely on data to back them up. In the absence of accurate data, we have to rely on intuition and experience to make educated guesses. This white paper will not give you Vulcan superhuman powers or move you into 221B Baker Street, London, but it will give you some insight into how intelligent document processing (IDP) technology can put data at your fingertips in a way that was not previously possible.

What is Intelligent Document Processing?

It is all about discovering useful things from content. The content can be virtually anything, from a scanned letter or a social media message to an audio recording. In each case, the content contains sentences and words. But we can already read the contents of a document, so what is the big deal?

Unfortunately, it is not so easy for a computer to do what a human can do so readily. Imagine you were given a document written in Portuguese and you do not speak that language. All you would see is a list of words, without being able to understand anything about the meaning of the document. This is what the computer “sees” when it reads the contents. Intelligent Document Processing (IDP) is technology that gives software the cognitive capabilities to understand what has been written and put it into context.

It can take us quite a long time to read the contents of a document. The Chronicle of Higher Education estimates that we can read about 100 words per minute, so roughly 200 pages in a day. A computer can read that same content in a fraction of a second. This is the real opportunity: we can now read huge amounts of content in a short time, which was not previously possible.

This white paper explores the technology that makes this possible and gives you an idea of some of the benefits as well.

Let us start by looking at what we want to achieve when software “reads” the contents of a document. When a human reads content, the brain translates strings of letters into words. At the same time, language processing, or comprehension, gives meaning to those words and integrates them with our existing knowledge. We are constantly learning and increasing our knowledge.

But what about devices? Do they behave the same as humans? Over 30 million speech recognition devices were sold globally last year, and this number is expected to grow to nearly 60 million this year. When you ask Alexa, Siri, Google, or Cortana, “What’s the weather going to be like today?” the device records your voice. That recording is sent over the Internet to computers, which parse it into commands they understand. The system then sends the relevant output back to your device. Every time a mistake is made interpreting your request, that data is used to make the system smarter the next time around. The software is constantly learning, just like the human.

The same natural language processing used by voice-activated devices is also leveraged by content analytics. The software can break down the individual words within the content to understand the “things” that are referenced and the context of the subject matter. The content might range from a short sentence within a social media message to hundreds of pages within a report.
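As a rough illustration of what "breaking down the individual words" can mean, the sketch below splits text into word tokens and picks out the most frequent meaningful terms. It is a minimal example only: the stop-word list and the function names `tokenize` and `key_terms` are assumptions made for illustration, and a real IDP product would rely on far richer language models than simple word counts.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "on", "was"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def key_terms(text, top_n=3):
    """Return the most frequent tokens that are not stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return [term for term, _ in Counter(tokens).most_common(top_n)]


if __name__ == "__main__":
    message = "The invoice for the order was sent, and the invoice is due in May."
    print(tokenize(message))
    print(key_terms(message, top_n=1))
```

Counting words is only the first step. It tells the software which "things" appear in the content, but not yet what is being said about them.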

Often the goal is to recognize and extract information in order to categorize the content. Wikipedia defines categorization as “… an activity that consists of putting things into categories based on their similarities or common criteria. It allows humans to organize things, objects, and ideas”. By understanding the content, we can act much like a librarian, organizing materials so that information can be easily found by others. Assigning keywords or tags that describe the content allows it to be organized and searched without needing to be read. The computer can automatically organize and categorize large quantities of content and so create a digital library.
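The librarian idea can be sketched in a few lines. The example below assigns each document to the category whose keywords it matches most often, then groups the documents into a small "digital library". The categories, keywords, and function names are hypothetical and chosen purely for illustration; production systems typically learn categories from examples rather than from a hand-written keyword list.

```python
import re

# Hypothetical categories and keywords, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "payment", "amount", "due"},
    "complaint": {"complaint", "refund", "unhappy", "faulty"},
    "contract": {"agreement", "party", "term", "clause"},
}


def categorize(text):
    """Pick the category sharing the most keywords with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # No keyword matched at all: leave the document uncategorized.
    return best if scores[best] > 0 else "uncategorized"


def build_library(documents):
    """Group document names by their assigned category."""
    library = {}
    for name, text in documents.items():
        library.setdefault(categorize(text), []).append(name)
    return library


if __name__ == "__main__":
    docs = {
        "a.txt": "I want a refund, the unit is faulty",
        "b.txt": "Invoice amount due",
    }
    print(build_library(docs))
```

Once every document carries a category, finding "all complaints" becomes a lookup rather than a reading exercise.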

In other situations, we might already have a specific topic and need to quickly find all content that makes some reference to it. Perhaps we want to understand the sentiment regarding the subject, for example by finding content in which negative views are expressed about it. This process can sift through huge quantities of content and, via analytical processing, single out the content with a high probability of relevance to the topic under investigation. In a short time, a small subset of relevant content is identified for further investigation.
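To make the sifting step concrete, here is a minimal sketch that keeps only the documents that mention a topic and lean negative. The word lists and the functions `sentiment_score` and `negative_mentions` are assumptions made for this example; counting positive and negative words is a deliberately crude stand-in for the analytical processing a real system would apply.

```python
import re

# Hypothetical sentiment word lists, for illustration only.
NEGATIVE_WORDS = {"bad", "poor", "terrible", "slow", "broken", "disappointed"}
POSITIVE_WORDS = {"good", "great", "excellent", "fast", "helpful", "happy"}


def sentiment_score(text):
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z]+", text.lower())
    positive = sum(w in POSITIVE_WORDS for w in words)
    negative = sum(w in NEGATIVE_WORDS for w in words)
    return positive - negative


def negative_mentions(documents, topic):
    """Return documents that mention the topic and score below zero."""
    topic = topic.lower()
    return [
        text
        for text in documents
        if topic in text.lower() and sentiment_score(text) < 0
    ]


if __name__ == "__main__":
    docs = [
        "The delivery was slow and the box arrived broken",
        "Delivery was fast, great service",
        "The website is terrible",
    ]
    for hit in negative_mentions(docs, "delivery"):
        print(hit)
```

Out of three documents, only the first is returned: it mentions the topic and its wording is negative. The same filter applied to thousands of documents yields the small subset worth a human's attention.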

We will now explore a little more about how the system is able to achieve these results. If you would like to read the whole white paper, please provide your name and email address and we will send you a link to download it straight away.

White Paper Request