Truncation for Corpus Terms Indexer


Hello,
I would like to index my corpus on the basis of a list of terms previously identified by me. I will use the Corpus Terms Indexer script.
I would like to know whether (and how) truncations can be used in this list. For example, if I enter “chenalis*”, I would like it to match the terms “chenalisation”, “chenalisations”, “chenaliser”…
Thanks for your help!
Déborah

Question Tags: Text

1 Answer


Lionel Staff answered 4 years ago

Dear Déborah,
I would recommend doing it in two steps:

1. Perform a (large) lexical extraction to identify the different forms of each noun phrase in your corpus, based on grammatical variations.
2. Work with the resulting list (you can even add forms that were not detected, using “|&|newform1|&|newform2”), and keep only the forms that are in the “list of terms previously identified by” you.
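If the extracted list is long, the selection in the second step could be done outside Cortext with a small script. The following is only a sketch in Python, not a Cortext feature: the function names (`expand_truncations`, `to_forms_entry`) and the sample forms are hypothetical, and it assumes you have exported the extracted forms as a plain list of strings. It matches each truncated term such as “chenalis*” against the extracted forms and joins the matches with the “|&|” separator mentioned above.

```python
from fnmatch import fnmatchcase

SEPARATOR = "|&|"  # separator quoted in the answer above


def expand_truncations(patterns, extracted_forms):
    """Map each truncated term (e.g. 'chenalis*') to the extracted forms it matches.

    Matching is case-insensitive; '*' stands for any run of characters.
    """
    expanded = {}
    for pattern in patterns:
        expanded[pattern] = sorted(
            form
            for form in extracted_forms
            if fnmatchcase(form.lower(), pattern.lower())
        )
    return expanded


def to_forms_entry(forms):
    """Join matched forms with the '|&|' separator."""
    return SEPARATOR.join(forms)


# Hypothetical sample data standing in for a real lexical extraction.
extracted_forms = ["chenalisation", "chenalisations", "chenaliser", "chenal", "canal"]
patterns = ["chenalis*"]

expanded = expand_truncations(patterns, extracted_forms)
entry = to_forms_entry(expanded["chenalis*"])
print(entry)
```

Note that `fnmatchcase` also treats `?` and `[...]` as wildcards, so terms containing those characters would need escaping. How the resulting entry is pasted back into the term list depends on the file Cortext produced, so check its layout before editing it.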

During the re-indexation step, you may want to have a look at “Use the shared dictionary”, but only to add a few other word variations (additions that are not based on grammar).
I hope this helps.
L
