ExtractIQ

Recognition Engine

When Content Analytics software processes the words in a file, it needs to distinguish key information from ordinary language elements. It searches for clues and patterns that identify this information. This is the job of a Recognition Engine: software that systematically processes all the words to extract what is most relevant to the task at hand.

The Recognition Engine is taught how best to perform this task for the type of content being processed. There are five main recognition approaches, which can be used alone or in combination to produce the best result. As content is processed, feedback on recognition performance is used to improve the configuration, creating a continuous cycle of improvement. In this way the Recognition Engine learns as more content is processed over time.

1. Content

Content – this involves linguistic analysis of the words within the content to extract useful information.
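
As a rough illustration of the Content approach, the sketch below pulls two fields out of plain text with regular expressions. It is a simplified stand-in, not the Recognition Engine's actual method: real linguistic analysis is considerably richer, and the pattern names, field names, and sample text are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration only; a real engine would rely on
# much richer linguistic analysis than two regular expressions.
PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s+(?:No\.?|Number)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"\bTotal\s*:?\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})", re.IGNORECASE
    ),
}


def extract_from_text(text):
    """Return the first match for each pattern that occurs in the text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


sample = "Invoice No: INV-2041\nThank you for your order.\nTotal: $1,250.00"
print(extract_from_text(sample))
```

Fields whose pattern does not occur are simply left out of the result, so downstream steps can tell "not found" apart from an empty value.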

2. System

System – information from the context of the content is referenced in this case; examples would be the system properties of a file and any data encoded within the filename or path.
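
To make the idea of data encoded in a filename or path concrete, here is a minimal sketch. The naming convention (document type, customer id, and date separated by underscores) and the function name are hypothetical, chosen only for this example.

```python
import re
from pathlib import PurePosixPath

# Hypothetical naming convention: <doctype>_<customer id>_<YYYYMMDD>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<doc_type>[a-z]+)_(?P<customer_id>\d+)_(?P<date>\d{8})$"
)


def extract_from_path(path_str):
    """Collect information from the path itself, without opening the file."""
    path = PurePosixPath(path_str)
    info = {
        "extension": path.suffix.lstrip("."),
        "folder": path.parent.name,
    }
    # Add the fields encoded in the filename when it follows the convention.
    match = FILENAME_PATTERN.match(path.stem)
    if match:
        info.update(match.groupdict())
    return info


print(extract_from_path("/archive/finance/invoice_00417_20230915.pdf"))
```

File system properties such as size or modification time could be added in the same way, for instance from Python's `os.stat`, when the file is available on disk.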

3. Zonal

Zonal – specific spatial areas within content can be isolated to identify and extract information. An example would be recognizing a company logo to understand the layout of a business document like an invoice or purchase order.
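
The zonal idea can be sketched as follows, assuming the layout has already been identified (for example from a logo) and that each word comes with page coordinates, as OCR output typically provides. The `Word` and `Zone` types, the coordinates, and the zone names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position on the page
    y: float  # vertical position on the page


@dataclass
class Zone:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word):
        """True when the word's position falls inside this rectangle."""
        return (self.left <= word.x <= self.right
                and self.top <= word.y <= self.bottom)


def extract_zones(words, zones):
    """Join the words found in each zone, in reading order."""
    result = {}
    for zone in zones:
        inside = sorted(
            (w for w in words if zone.contains(w)),
            key=lambda w: (w.y, w.x),
        )
        result[zone.name] = " ".join(w.text for w in inside)
    return result


# Hypothetical invoice layout: supplier name top-left, invoice number top-right.
words = [
    Word("ACME", 40, 30), Word("Ltd", 90, 30),
    Word("Invoice", 400, 30), Word("INV-2041", 470, 30),
    Word("Widget", 40, 300),
]
zones = [
    Zone("supplier", 0, 0, 200, 60),
    Zone("invoice_number", 450, 0, 600, 60),
]
print(extract_zones(words, zones))
```

In practice a different set of zones would be selected for each recognized layout, so the same code can serve invoices from many suppliers.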

4. Database

Database – information extracted from the content can be used to look up related information in organizational and internet databases. As an example, after finding a product code, the engine can look up the product description.
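
The product code example might look like the sketch below, which uses an in-memory SQLite database as a stand-in for an organizational database. The table name, columns, and sample rows are assumptions made for illustration.

```python
import sqlite3


def build_demo_catalog():
    """Create a small in-memory product table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (code TEXT PRIMARY KEY, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [
            ("WDG-100", "Steel widget, 100 mm"),
            ("GSK-220", "Rubber gasket, 220 mm"),
        ],
    )
    conn.commit()
    return conn


def lookup_description(conn, product_code):
    """Return the description for a product code, or None if it is unknown."""
    row = conn.execute(
        "SELECT description FROM products WHERE code = ?", (product_code,)
    ).fetchone()
    return row[0] if row else None


conn = build_demo_catalog()
print(lookup_description(conn, "WDG-100"))
print(lookup_description(conn, "XXX-000"))
```

The query is parameterized because the code being looked up comes from extracted document content and should not be trusted as part of the SQL text.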

5. Application

Application – for some document types, the native application can be leveraged to extract information. For example, the title of a Microsoft Word document can be extracted by referencing a bookmark value.

Combining these approaches provides powerful recognition across all types of content, so the engine can be configured flexibly to meet each customer's requirements.

For more detailed information on the Recognition Engine, please see our White Paper here.