I have uploaded a terms list and indexed a corpus with it in order to create a contrast analysis graph.
I expect the graph to display only the main form of each word in my list, but it shows plurals as well. For example, my terms list includes the following line:
road;road;road|&|roads|&|Road|&|Roads|&|all-weather road|&|all-weather roads
And both “road” and “roads” appear on the graph in very different places…
Same problem if I only have ‘road;road;road’ in the terms list. Plural “roads” appears as well.
Would you please help me solve this problem?
I forgot to mention that I tried with both the “naive” and the “advanced” tokenizer…
If I have understood correctly, the option you chose tokenizes the indexed term list you want to work with, so what you describe is the normal behaviour.
In contrast analysis, when you set the parameter “What is the nature of the data, textual or categorical ?” to text (with the naïve tokenizer, for example), the classes of your term list are split again on punctuation and spaces, as if each one were an ordinary sentence in a full-text field.
Since you have built your own classification and updated those classes through a term list, you should choose categorical instead, which works directly with the full names of your classes.
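To illustrate the difference, here is a minimal Python sketch. It is not the platform's actual code: the function names `textual_tokens` and `categorical_tokens` and the splitting rule are assumptions made up for this example, and only show the general idea of re-splitting a class name versus keeping it whole.

```python
import re

def textual_tokens(value):
    # Hypothetical "text" mode: split the class name again on anything
    # that is not a letter, digit, or underscore (spaces, hyphens, ...).
    return [tok for tok in re.split(r"\W+", value) if tok]

def categorical_tokens(value):
    # Hypothetical "categorical" mode: keep the class name as one unit.
    return [value]

label = "all-weather road"
print(textual_tokens(label))      # the class name is cut into pieces
print(categorical_tokens(label))  # the class name stays intact
```

In this sketch, the textual mode breaks "all-weather road" into separate words, while the categorical mode keeps the full class name, which is what you want when the classes come from your own term list.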
I hope it helps,