The script relies on the named entity recognition capabilities of spaCy.
It identifies and indexes persons, places, organizations, etc. It currently handles 5 languages. In English, one can select among 10 kinds of entities (dates: DATE, products: PRODUCT, people: PERSON, geopolitical entities: GPE, etc.). In the 4 other languages, 4 kinds of entities are retrieved automatically: geographic entities (LOC), organizations (ORG), persons (PERSON), and geopolitical entities (GPE).
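The selection and indexing step described above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the function names `extract_with_spacy` and `index_entities`, the model name `en_core_web_sm`, and the sample pairs are all assumptions made for the example. The spaCy calls used (`spacy.load`, `doc.ents`, `ent.text`, `ent.label_`) are standard, but running `extract_with_spacy` requires spaCy and the chosen model to be installed.

```python
from collections import defaultdict


def extract_with_spacy(text, model_name="en_core_web_sm"):
    """Return (entity text, label) pairs found by a spaCy pipeline.

    Hypothetical helper: needs spaCy and the named model installed.
    """
    import spacy  # imported lazily so index_entities works without spaCy

    nlp = spacy.load(model_name)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def index_entities(pairs, keep_labels):
    """Group entity strings by label, keeping only the selected labels."""
    index = defaultdict(set)
    for text, label in pairs:
        if label in keep_labels:
            index[label].add(text)
    # Sort each group so the result is deterministic.
    return {label: sorted(texts) for label, texts in index.items()}


# Hand-written pairs standing in for spaCy output (illustrative, not real model output).
sample = [
    ("Moses", "PERSON"),
    ("Egypt", "GPE"),
    ("Moses", "PERSON"),
    ("forty days", "DATE"),
]
selected = index_entities(sample, keep_labels={"PERSON", "GPE"})
print(selected)  # {'PERSON': ['Moses'], 'GPE': ['Egypt']}
```

With spaCy available, the sample list could be replaced by `extract_with_spacy(some_text)`; the set passed as `keep_labels` plays the role of the entity kinds selected by the user.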
As an illustration, see the result of a correspondence analysis of named entities extracted from the Bible: