matias.milia asked 2 years ago

Hi guys, I am currently working on a fairly complicated corpus with a very specific set of questions I want to ask. The problem is that terms extraction is not much help in separating the useful from the superfluous. I sense that there is a lot of noise in the data, at least with respect to my interests. It is a big set of texts in which many issues are discussed, and I just want to focus on one. Therefore, I’ve made a list of the words I consider descriptive of my interests. What I am having trouble with is using this self-made dictionary to filter the text content: I want to keep only the text entries where these words are used. I have been stuck on this for a couple of hours with no luck. Maybe I am just too tired, but I was wondering if someone could give me basic instructions on this. It would be really, really appreciated.
I have managed to index a list of terms, so now all these terms are merged under one label. That is as far as I have got.
Thanks in advance!
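
For readers who want to try this kind of dictionary filtering outside the platform, here is a minimal sketch in plain Python. It is not the platform's own method; the function names, the term list, and the sample corpus are all hypothetical, and it assumes each text entry is available as a string.

```python
import re

def build_term_pattern(terms):
    """Compile one case-insensitive regex that matches any term as a whole word or phrase."""
    # Longest terms first, so multi-word phrases are tried before their sub-words.
    ordered = sorted(set(terms), key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    # The lookarounds stop a term from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)

def filter_entries(entries, terms):
    """Keep only the entries that mention at least one dictionary term."""
    if not terms:
        return []
    pattern = build_term_pattern(terms)
    return [entry for entry in entries if pattern.search(entry)]

# Hypothetical dictionary and corpus, for illustration only.
my_terms = ["climate change", "emissions"]
corpus = [
    "New report on climate change policy.",
    "Local football results.",
    "Emissions fell last year.",
]
kept = filter_entries(corpus, my_terms)
```

With this sample data, `kept` holds the first and third entries. Whole-word matching is a design choice here: it avoids keeping entries where a term only appears inside a longer, unrelated word.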

matias.milia replied 2 years ago

I’ve tried using a ‘pivot word’, but I must not be doing it properly, because it crashes. The documentation doesn’t say much about how to format a ‘pivot word’ or whether it is possible to use more than one.

Jean-Philippe Cointet Staff replied 2 years ago

Could you not simply use the query script against this “one label” you created to identify your target topic?

matias.milia replied 2 years ago

I think I managed to find and solve the problem. The text was in Spanish, and the accents got in the way and made text recognition difficult.
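
The thread does not say exactly how the accents interfered, so the following is only one plausible way to work around this kind of issue: a small Python sketch that compares terms and text after stripping accents and case. The function names and sample strings are hypothetical, and this is not the platform's own behaviour.

```python
import unicodedata

def strip_accents(text):
    """Remove combining accent marks, e.g. 'energía' becomes 'energia'."""
    # NFD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Category 'Mn' (nonspacing mark) covers the combining accents.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def normalize(text):
    """Accent-free, case-folded form used on both sides of the comparison."""
    return strip_accents(text).casefold()

def mentions_any(entry, terms):
    """True if any term occurs in the entry, ignoring accents and case."""
    normalized_entry = normalize(entry)
    return any(normalize(term) in normalized_entry for term in terms)

# Hypothetical examples, for illustration only.
found = mentions_any("La transición energética avanza", ["energetica"])
missed = mentions_any("Resultados de fútbol", ["energía"])
```

Here `found` is `True` and `missed` is `False`. Two caveats: this uses simple substring matching rather than whole-word matching, and stripping marks also turns `ñ` into `n`, which can merge distinct Spanish words, so it may be too aggressive for some dictionaries.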

matias.milia replied 2 years ago

Thanks for your quick response Jean-Philippe!