Hi guys, I am currently working on a rather complicated corpus, with a very specific set of questions I want to ask of it. The problem is that term extraction does not help much in separating the useful from the superfluous. I sense there is a lot of noise in the data, at least with respect to my interests. It is a big set of texts in which many issues are discussed, and I just want to focus on one. Therefore, I’ve made a list of the words I consider descriptive of my interests. What I am having trouble with is using this self-made dictionary to filter the text content: I want to keep only the text entries where these words are used. I have been stumbling over this for a couple of hours with no luck. Maybe I am just too tired. But I was wondering if someone could give me basic instructions on this. It would be really, really appreciated.
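If the corpus can be exported, the same dictionary filtering can be sketched in plain Python. This is not how the platform’s own indexing or query scripts work; it is a minimal sketch assuming the corpus is a list of text strings and the dictionary a list of words, and `filter_entries` is a hypothetical helper name. It keeps an entry if at least one dictionary term appears in it as a whole word, ignoring case.

```python
import re

def filter_entries(entries, dictionary):
    """Keep only the entries that contain at least one dictionary term.

    Matching is case-insensitive and on whole words, so "agua" matches
    "Agua" but not "aguacate".
    """
    # Deduplicate, drop blank lines, and put longer terms first so that
    # multi-word terms are tried before their shorter sub-terms.
    terms = sorted({t.strip() for t in dictionary if t.strip()}, key=len, reverse=True)
    if not terms:
        return []  # an empty dictionary matches nothing
    # One alternation pattern; re.escape protects terms containing ".", "(", etc.
    # In Python 3, \b is Unicode-aware, so accented letters count as word characters.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [e for e in entries if pattern.search(e)]


# Made-up example entries, for illustration only.
entries = [
    "El precio del agua subió.",
    "Hablamos de fútbol.",
    "La sequía afecta al campo.",
]
dictionary = ["agua", "sequía"]
kept = filter_entries(entries, dictionary)
```

Here `kept` holds the first and third entries. Compiling a single pattern keeps the scan to one pass per entry, which matters on a big corpus with a long word list.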
I have managed to index a list of terms, so all these terms are now merged under one label. That is as far as I’ve gotten.
Thanks in advance!
I’ve tried using a ‘pivot word’, but I must not be doing it properly, because it crashes. The documentation doesn’t say much about how to format a ‘pivot word’ or whether more than one can be used.
Could you not simply use the query script against this “one label” you created to identify your target topic?
I think I managed to find and solve the problem. The texts were in Spanish, and the accents got in the way and made text recognition difficult.
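For anyone hitting the same accent problem when matching a word list against Spanish text in their own scripts, one common workaround is to fold accents away on both sides before comparing. This is a minimal sketch, not what the platform does internally; `strip_accents` and `contains_term` are hypothetical helper names. It uses Unicode NFD normalization, which splits a letter like “í” into “i” plus a combining accent, and then drops the combining marks.

```python
import re
import unicodedata

def strip_accents(text):
    # NFD decomposes "í" into "i" + U+0301 (combining acute accent);
    # keeping only non-combining characters removes the accent.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def contains_term(entry, terms):
    # Fold accents and case on both the entry and each term, then look
    # for the term as a whole word.
    folded = strip_accents(entry).casefold()
    for term in terms:
        t = strip_accents(term).casefold()
        if re.search(r"\b" + re.escape(t) + r"\b", folded):
            return True
    return False
```

With this, `contains_term("La SEQUIA afecta al campo", ["sequía"])` is `True` even though the entry was typed without the accent. One caveat for Spanish: the tilde on “ñ” is also a combining mark, so “año” folds to “ano”. If that distinction matters for your dictionary, the folding needs an exception for “ñ”.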
Thanks for your quick response Jean-Philippe!