I would like to index my corpus using a list of terms that I identified beforehand. I will use the Corpus Terms Indexer script.
I would like to know whether it is possible (and if so, how) to include truncations in this list. For example, if I enter “chenalis*”, I would like to match the terms “chenalisation”, “chenalisations”, “chenaliser”…
Thanks for your help!
I would recommend doing it in two steps:
- run a (broad) lexical extraction to identify the different forms of each noun phrase in your corpus, based on grammatical variations;
- work from that extraction (you can even add forms that were not detected, using “|&|newform1|&|newform2”), and keep only the forms that belong to the “list of terms previously identified by” you.
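If you export the extracted forms as a plain list, the second step can be partly automated by treating a trailing “*” as a prefix match. The sketch below is only an illustration under some assumptions. It assumes the forms are available as a list of strings (one form per entry). It also assumes that a term list line has the form `form1|&|form2|&|…`, inferred from the “|&|” separator mentioned above. The function names are hypothetical and are not part of the Corpus Terms Indexer script.

```python
def expand_truncations(patterns, extracted_forms):
    """Hypothetical helper: map each pattern to the extracted forms it matches.

    A pattern ending in '*' is treated as a truncation (case-insensitive prefix
    match); any other pattern must match a form exactly (case-insensitive).
    """
    expanded = {}
    for pattern in patterns:
        if pattern.endswith("*"):
            prefix = pattern[:-1].lower()
            matches = [f for f in extracted_forms if f.lower().startswith(prefix)]
        else:
            matches = [f for f in extracted_forms if f.lower() == pattern.lower()]
        # Remove duplicates and sort so the output is stable
        expanded[pattern] = sorted(set(matches))
    return expanded


def to_term_lines(expanded):
    """Hypothetical helper: join the matched forms of each pattern with '|&|'.

    Patterns with no matching form are skipped.
    """
    return ["|&|".join(forms) for forms in expanded.values() if forms]


if __name__ == "__main__":
    # Made-up sample of extracted forms, for illustration only
    forms = ["chenalisation", "chenalisations", "chenaliser", "chenal", "lit mineur"]
    expanded = expand_truncations(["chenalis*", "lit mineur"], forms)
    for line in to_term_lines(expanded):
        print(line)
```

Run on the sample forms, this prints `chenalisation|&|chenalisations|&|chenaliser` and then `lit mineur`. A prefix match is deliberately crude. You should still check the matched forms by hand before indexing, because a short truncation can pull in unrelated words.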