Hello,
I would like to index my corpus using a list of terms that I identified beforehand. I plan to use the Corpus Terms Indexer script.
I would like to know whether it is possible (and if so, how) to include truncations in this list. For example, if I enter “chenalis*”, I would like to match the terms “chenalisation”, “chenalisations”, “chenaliser”…
Thanks for your help!
Déborah
Dear Déborah,
I would recommend doing it in two steps:
- perform a (large) lexical extraction to identify the different forms of each noun phrase in your corpus, based on grammatical variations;
- edit the resulting list (you can even add forms that were not detected, using “|&|newform1|&|newform2”), and keep only the forms that belong to your list of previously identified terms.
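If your list uses truncations such as “chenalis*”, the selection in the second step can be done with a small script before re-indexing. The sketch below assumes you have exported the extracted forms and your own term list as plain Python lists. It treats a trailing “*” as a prefix match, matches other terms exactly, and joins each term's variants with the “|&|” separator mentioned above. The function names are hypothetical, and the exact file layout the indexer expects is not reproduced here.

```python
def expand_truncations(extracted_forms, term_list):
    """Map each term in term_list to the extracted forms it matches.

    A term ending in '*' is a truncation: it matches every form that
    starts with the text before the '*'. Other terms must match exactly.
    Matching ignores case.
    """
    matches = {term: [] for term in term_list}
    for form in extracted_forms:
        lowered = form.lower()
        for term in term_list:
            stem = term.lower()
            if stem.endswith("*"):
                hit = lowered.startswith(stem[:-1])  # truncation: prefix match
            else:
                hit = lowered == stem  # plain term: exact match
            if hit:
                matches[term].append(form)
    return matches


def join_variants(forms, sep="|&|"):
    # Hypothetical helper: joins the variants of one term with the "|&|"
    # separator from the answer; the surrounding file format is assumed.
    return sep.join(forms)


if __name__ == "__main__":
    forms = ["chenalisation", "chenalisations", "chenaliser", "chenal", "érosion"]
    terms = ["chenalis*", "érosion"]
    for term, variants in expand_truncations(forms, terms).items():
        print(term, "->", join_variants(variants))
```

Here “chenal” is left out because it does not start with “chenalis”. Because every extracted form is checked against every term, a form that matches two truncations will appear under both. You may want to review such overlaps by hand.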
During the re-indexing step you may want to look at the “Use the shared dictionary” option, but only to add a few other word variants (additions that are not based on grammar).
I hope this helps.
L