Named Entity Recognizer

The script relies on the named entity recognition capabilities offered by spaCy.

NER entity types

It identifies and indexes persons, places, organizations, etc. At the moment it can handle 6 different languages. In English, one can select among 10 kinds of entities:

  • LOC: non-GPE locations, mountain ranges, bodies of water.
  • ORG: companies, agencies, institutions, etc.
  • PERSON: people, including fictional.
  • GPE: countries, cities, states (GeoPolitical Entity). Use the CorTexT geocoding service to locate the extracted geographical entities on a map, then refine their coordinates (latitude and longitude) with the GeoEdit Tool.
  • PRODUCT: objects, vehicles, foods, etc. (Not services.)
  • EVENT: named hurricanes, battles, wars, sports events, etc.
  • WORK_OF_ART: titles of books, songs, etc.
  • DATE: absolute or relative dates or periods.
  • TIME: times smaller than a day.
  • MONEY: monetary values, including unit.

In the 5 other languages, 4 kinds of entities are automatically retrieved:

  • LOC: countries, cities, states, mountain ranges, bodies of water (which corresponds to LOC and GPE combined in English). Use the CorTexT geocoding service to filter and locate the geographical entities on a map, then refine their coordinates (latitude and longitude) with the GeoEdit Tool.
  • ORG: companies, agencies, institutions, etc.
  • PERSON: people, including fictional.
  • MISC: miscellaneous entities that do not fit any of the categories above.

Parameters

language

Choose the original language of your selected textual field(s).

Named Entity Types

Select the types of entities to extract. See Named Entity Recognizer entity types for the list of entities that can be extracted, which depends on the selected language.

Minimum Frequency

Minimum number of occurrences an entity must reach to be kept (the threshold is applied for each category in each time step).

List size

Maximum number of distinct entities to extract. Only the N most frequent entities are kept in each category and for each time step.
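
To make the interaction between Minimum Frequency and List size more concrete, here is a minimal Python sketch of how such a filter could work. It is an illustration only, not the CorTexT implementation: the function name `select_entities`, its arguments, the tie-breaking rule and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def select_entities(mentions, min_frequency=2, list_size=3):
    """Keep the most frequent entities per (time step, category).

    `mentions` is an iterable of (time_step, category, entity) tuples,
    one tuple per occurrence. Hypothetical helper, not CorTexT code.
    """
    # Count occurrences separately for each (time step, category) pair.
    counts = defaultdict(Counter)
    for time_step, category, entity in mentions:
        counts[(time_step, category)][entity] += 1

    selected = {}
    for key, counter in counts.items():
        # Minimum Frequency: drop entities seen fewer than min_frequency times.
        frequent = [(e, n) for e, n in counter.items() if n >= min_frequency]
        # List size: keep the N most frequent entities.
        # Ties are broken alphabetically (an assumption of this sketch).
        frequent.sort(key=lambda pair: (-pair[1], pair[0]))
        selected[key] = frequent[:list_size]
    return selected


# Invented toy data: a single time step with two categories.
mentions = [
    (1, "PERSON", "Moses"), (1, "PERSON", "Moses"), (1, "PERSON", "Moses"),
    (1, "PERSON", "Aaron"), (1, "PERSON", "Aaron"),
    (1, "PERSON", "Joshua"),
    (1, "GPE", "Egypt"), (1, "GPE", "Egypt"),
]
result = select_entities(mentions, min_frequency=2, list_size=1)
```

In this sketch the frequency threshold is applied first and the list is truncated afterwards, so an entity below the threshold is never kept even when the list is not full.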

A correspondence analysis as an example

As an illustration, see the result of a correspondence analysis combining several types of named entities extracted from the Bible:

Correspondence analysis with 3*10 named entities (PER + GPE + ORG) and the books of the Bible as a supplementary field