ExtractIQ

Machine Learning

Machine learning is an application of artificial intelligence (AI) that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed.

We configure the ExtractIQ Recognition Engine to process content to extract useful metadata. We accomplish this in several ways with different recognition approaches.

In some situations, the recognition engine needs to understand what type of content is being processed so that it can apply specific techniques to extract the metadata. One example is identifying the invoice type so that zonal recognition, which depends on the layout of the invoice, can take place. There is no easy way to perform this invoice classification, so a series of tests needs to be performed, based upon the analysis of a sample:

Recognition Tests

Is there reference to the term “Invoice”?

Is there reference to the term “ACME Inc.”?

If both tests are satisfied, are we confident that we have found an Invoice provided by ACME Inc.? We think it is possible, but equally we may have located a letter referring to an ACME Inc. invoice and not an actual invoice.

So, we are not highly confident we have a match and therefore need to learn more about these invoices to improve our confidence level.

After processing some more invoices we add the following tests:

Recognition Tests

Is there reference to the term “Total”?

Is there reference to the term “105 Crosby Drive”?

We learnt that all the invoices refer to the term “Total” and the ACME Inc. street address. When all four tests are satisfied, our confidence that a match has been found rises to a high level.
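The idea of adding tests to raise confidence can be sketched in a few lines of Python. This is only an illustration, not the ExtractIQ Recognition Engine or its API: the `RecognitionTest` class, the `confidence` function, the two sample texts, and the scoring rule (confidence as the fraction of tests satisfied) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionTest:
    """A single term-presence test (hypothetical structure, not the ExtractIQ API)."""
    term: str

    def passes(self, text: str) -> bool:
        # Case-insensitive check that the term appears somewhere in the text.
        return self.term.lower() in text.lower()


def confidence(text: str, tests: list[RecognitionTest]) -> float:
    """Fraction of tests the text satisfies (an assumed scoring rule)."""
    if not tests:
        return 0.0
    passed = sum(1 for test in tests if test.passes(text))
    return passed / len(tests)


# The two tests we started with.
INITIAL_TESTS = [RecognitionTest("Invoice"), RecognitionTest("ACME Inc.")]

# The tests added after processing more invoices.
LEARNED_TESTS = INITIAL_TESTS + [
    RecognitionTest("Total"),
    RecognitionTest("105 Crosby Drive"),
]

# Made-up sample texts, for illustration only.
letter = "Dear customer, please find details of your ACME Inc. invoice enclosed."
invoice = "ACME Inc., 105 Crosby Drive. Invoice 0001. Total: 250.00"

print(confidence(letter, INITIAL_TESTS))   # the letter satisfies both initial tests
print(confidence(letter, LEARNED_TESTS))   # but only half of the extended set
print(confidence(invoice, LEARNED_TESTS))  # the invoice satisfies all four
```

With only the first two tests, the letter is indistinguishable from a real invoice. Once the two learned tests are added, the letter scores lower and the invoice still passes every test, which is the improvement in confidence described above.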

In this simple example, we are evolving and improving the efficiency of the recognition process based upon machine learning. We are teaching the software to improve its performance over time.