ExtractIQ

Named Entity Recognition

Named Entity Recognition (NER) is a subtask of natural language processing (NLP) that locates named entities in a body of text and classifies them into predefined categories. This lets us extract the names of people, organizations, geographic locations, and similar entities from content as metadata. That metadata can then be used to catalog the content for future reference.
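To make the "extract, then catalog" idea concrete, here is a minimal sketch. It deliberately swaps in the most naive technique possible, a hand-made lookup table (a gazetteer), in place of a trained NER model. The `GAZETTEER` contents, the document IDs, and the helper names `extract_entities` and `build_catalog` are all hypothetical and for illustration only. The point is the shape of the pipeline: entities with category labels come out of the text and are indexed back to the documents that mention them.

```python
from collections import defaultdict

# Hypothetical gazetteer: a hand-made lookup table standing in for a trained NER model.
GAZETTEER = {
    "Neil Armstrong": "PERSON",
    "NASA": "ORGANIZATION",
    "Ohio": "LOCATION",
}

def extract_entities(text):
    """Return (entity, label) pairs for every gazetteer entry that appears in text.

    Naive substring matching: no context, no ambiguity handling.
    """
    return [(name, label) for name, label in GAZETTEER.items() if name in text]

def build_catalog(documents):
    """Map each (entity, label) pair to the sorted IDs of the documents mentioning it."""
    catalog = defaultdict(set)
    for doc_id, text in documents.items():
        for entity in extract_entities(text):
            catalog[entity].add(doc_id)
    return {entity: sorted(ids) for entity, ids in catalog.items()}

docs = {
    "doc1": "Neil Armstrong flew for NASA.",
    "doc2": "NASA has a research center in Ohio.",
}
catalog = build_catalog(docs)
print(catalog[("NASA", "ORGANIZATION")])  # ['doc1', 'doc2']
```

A real NER system replaces `extract_entities` with a statistical or neural model that can recognize names it has never seen. The cataloging step can stay the same.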

In school we were taught that a proper noun names “a specific person, place, or thing,” which sets it apart from an ordinary (common) noun.

Unfortunately, this seemingly simple distinction turns into a challenging computational linguistics task: extracting named entities, such as persons, organizations, or locations, from content. More formally, NER is the identification of named entities in machine-readable text by annotating each one with a category tag, so that the tagged information can then be extracted.
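One widely used way to express "annotation with categorization tags" is the BIO scheme. In this scheme, each token is tagged `B-X` (beginning of an entity of type X), `I-X` (inside that entity), or `O` (outside any entity). The sketch below assumes that scheme and uses illustrative labels `PER` and `LOC`. The tags are assigned by hand here, whereas a real system would predict them. The helper `bio_to_spans` is hypothetical. It shows how token-level tags are turned back into extracted entities.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) spans.

    B-X starts a new entity of type X, I-X continues it, and O marks a non-entity token.
    """
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # Close any open entity and start a new one (a stray I- is treated like B-).
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continue the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O": outside any entity, so close the open one if there is one.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans

tokens = ["Neil", "Armstrong", "was", "born", "in", "Wapakoneta", ",", "Ohio", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('Neil Armstrong', 'PER'), ('Wapakoneta', 'LOC'), ('Ohio', 'LOC')]
```

Tagging per token, rather than marking spans directly, is what lets NER be treated as a sequence-labeling problem. The hard part is predicting the right tag for each token.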

Let us look at an example:

“Buzz Aldrin joined Armstrong and became the second human to set foot on the Moon.”

When we read this fragment, it is obvious to almost anyone that it refers to the Moon landing in 1969. A computer, however, might read “Armstrong” as “Neil Armstrong”, “Louis Armstrong”, or the town of Armstrong, Ontario, in Canada. To resolve this, the named entity processing has to analyze the surrounding words (here, “Buzz Aldrin” and “the Moon”) and use them to connect the mention to the right entity. In each case, the system then selects the most likely interpretation.
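A toy illustration of "using nearby words to pick the most likely outcome" is a context-overlap score. Each candidate entity gets a bag of words typically seen near it, and the candidate sharing the most words with the sentence wins. This is a deliberately simplified stand-in. Production systems typically rely on knowledge bases and learned representations rather than hand-picked word lists. The `CANDIDATES` context words and the `disambiguate` helper below are hypothetical, chosen only to mirror the Armstrong example above.

```python
import re

# Hypothetical candidate entities, each with a hand-picked bag of context words.
CANDIDATES = {
    "Neil Armstrong": {"moon", "apollo", "astronaut", "aldrin", "nasa", "landing"},
    "Louis Armstrong": {"jazz", "trumpet", "singer", "band", "music"},
    "Armstrong, Ontario": {"canada", "ontario", "township", "town"},
}

def disambiguate(text, candidates=CANDIDATES):
    """Pick the candidate whose context words overlap most with the words in text.

    Returns the best candidate and the overlap score of every candidate.
    On a tie, max() keeps the candidate listed first.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & context) for name, context in candidates.items()}
    return max(scores, key=scores.get), scores

sentence = "Buzz Aldrin joined Armstrong and became the second human to set foot on the Moon."
best, scores = disambiguate(sentence)
print(best)    # Neil Armstrong
print(scores)  # {'Neil Armstrong': 2, 'Louis Armstrong': 0, 'Armstrong, Ontario': 0}
```

Here “Aldrin” and “Moon” tip the decision toward Neil Armstrong. A sentence about jazz and trumpets would instead favor Louis Armstrong. The same principle, context deciding between candidates, carries over to more sophisticated methods.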