Contrast Analysis graph -> problem with indexed terms

freddie2310 asked 2 weeks ago

Hello,
I have uploaded a terms list and indexed a corpus with it in order to create a contrast analysis graph.
I expect the graph to display the words from my list in their main form only, but it shows plurals as well. For example, my terms list includes the following line:
road;road;road|&|roads|&|Road|&|Roads|&|all-weather road|&|all-weather roads
And both “road” and “roads” appear on the graph in very different places…
 
The same problem occurs if I only have ‘road;road;road’ in the terms list: the plural “roads” appears as well.
Could you please help me solve this problem?
 
Thanks
Frederique Bordignon

freddie2310 replied 2 weeks ago

I forgot to mention that I tried both the “naive” and the “advanced” tokenizer…

1 Answer
Lionel Staff answered 2 weeks ago

Dear Frederique,
If I have understood correctly, the option you have chosen tokenizes the indexed term list you want to work with. So what you describe is the normal behaviour.
In contrast analysis, when you set the parameter “What is the nature of the data, textual or categorical ?” to text (with the naïve tokenizer, for example), it splits the classes of your term list again on punctuation (spaces…), as if each class were an ordinary sentence in a full-text field.
As you have built your own classification and updated those classes through a term list, you should choose category, which works directly with the full name of your classes.
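As a rough illustration of the difference between the two settings, here is a minimal Python sketch. It is not CorText's actual code: the sample labels, the splitting rule and the function names are all hypothetical, chosen only to show how treating class labels as text breaks them into tokens, while treating them as categories keeps each full label intact.

```python
import re
from collections import Counter

# Hypothetical indexed field: each document carries a list of class labels.
docs = [
    ["all-weather road", "road"],
    ["road"],
]

def count_as_text(documents):
    """Treat each label as free text: split it on spaces and punctuation."""
    counts = Counter()
    for labels in documents:
        for label in labels:
            tokens = [t for t in re.split(r"[^\w]+", label.lower()) if t]
            counts.update(tokens)
    return counts

def count_as_category(documents):
    """Treat each label as one indivisible category."""
    counts = Counter()
    for labels in documents:
        counts.update(labels)
    return counts

text_counts = count_as_text(docs)
category_counts = count_as_category(docs)

print("text:", dict(text_counts))
print("category:", dict(category_counts))
```

In this sketch the text treatment no longer contains the label “all-weather road” as a single item, whereas the category treatment counts it exactly as it was written in the list.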
I hope this helps,
Lionel
