Filter terms index according to a list of terms


Matthieu P asked 3 years ago

Hello!
I have a subcorpus that I built from a wider corpus by keeping only the documents that belong to one of the Louvain clusters detected in that wider corpus. I would like to carry out a network mapping on that subcorpus, but only on some specific terms.
More precisely, I have a list of terms that I identified through a tf-idf analysis I ran outside of Cortext. I want to restrict my analyses in Cortext to the terms with the highest tf-idf scores (let’s say the top 100) in the cluster I am interested in. All the terms that are not on this list should then be removed from my subcorpus, so that only the top 100 remain. I could use the “top nodes” option in the network mapping panel, but since it is based on frequency, some terms I don’t want still appear.
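The tf-idf ranking described above was computed outside Cortext, and the question does not say exactly how. As a minimal sketch, assuming each cluster is pooled into a single "document" and terms are already extracted, selecting the top-N terms of one cluster could look like this (the function name and input format are hypothetical):

```python
import math
from collections import Counter


def top_tfidf_terms(clusters, target, n=100):
    """Return the n terms of cluster `target` with the highest tf-idf.

    Assumption: each cluster is pooled into a single "document", so the
    idf measures in how many clusters a term appears.
    clusters: dict mapping a cluster label to a list of documents, each
              document being a list of already-extracted terms (hypothetical format).
    """
    # Term counts per cluster, all documents of a cluster pooled together.
    counts = {
        label: Counter(term for doc in docs for term in doc)
        for label, docs in clusters.items()
    }
    # Cluster frequency: number of clusters in which each term occurs.
    df = Counter()
    for cluster_counts in counts.values():
        df.update(cluster_counts.keys())

    target_counts = counts[target]
    total = sum(target_counts.values())
    n_clusters = len(counts)
    scores = {
        term: (freq / total) * math.log(n_clusters / df[term])
        for term, freq in target_counts.items()
    }
    # Highest score first; ties broken alphabetically so the result is stable.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:n]


clusters = {
    "c1": [["network", "louvain", "graph"], ["louvain", "modularity"]],
    "c2": [["graph", "text"], ["text", "corpus"]],
}
print(top_tfidf_terms(clusters, "c1", n=2))  # ['louvain', 'modularity']
```

A term that occurs in every cluster ("graph" here) gets an idf of zero, which is what pushes generic, high-frequency terms out of the list, unlike a frequency-based "top nodes" selection.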
I tried filtering the subcorpus with the “query” script but didn’t manage to do it. I feel there is a simple way to do this, but my Cortext skills are a bit rusty since I haven’t used it for weeks.
Do you have any tips? Thanks a lot, again!

1 Answer


Lionel Staff answered 3 years ago

Dear Matthieu,

The most straightforward way to achieve what you want is to:

1. Run a list builder to extract all the main forms of your terms. The result should correspond to the list you already have; if it does, you do not need to run it again.
2. Run a list indexer, selecting the “Define a custom list of entities” option, and specify the name you want for your new variable.

This second task will index only the terms you have selected.
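To illustrate the idea behind indexing with a custom list (this is not Cortext’s actual implementation), here is a minimal sketch assuming raw document texts and case-insensitive whole-word matching; the function name and input format are hypothetical:

```python
import re


def index_with_custom_list(docs, terms):
    """Index each document with the terms of a custom list only.

    docs:  dict mapping a document id to its raw text (hypothetical input).
    terms: the custom list, e.g. the top-100 tf-idf terms. Each term should
           start and end with a word character for \\b to apply.
    Returns, for each document, the sorted listed terms found in its text
    (case-insensitive, whole-word matches). Words outside the list are ignored.
    """
    # One compiled pattern per listed term; \b prevents partial-word matches.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    return {
        doc_id: sorted(term for term, pattern in patterns.items() if pattern.search(text))
        for doc_id, text in docs.items()
    }


docs = {
    "d1": "Louvain clustering of a co-word network",
    "d2": "Building a text corpus",
}
print(index_with_custom_list(docs, ["louvain", "network", "corpus"]))
# {'d1': ['louvain', 'network'], 'd2': ['corpus']}
```

Because only listed terms can ever be matched, the resulting variable contains nothing outside the top-100 list, which is why no separate filtering of the corpus is needed.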

Another strategy would be to build a block list containing all the unwanted terms. With this second strategy, you can compute the network on the full set of terms and remove the unwanted terms only in the visualized networks.
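The block-list strategy can be sketched in a generic way: build the network from all terms, then drop the blocked terms only from the copy that is displayed. This is a minimal illustration, assuming a document-to-terms index; raw co-occurrence counts stand in for whatever proximity measure the network mapping actually uses, and the function names are hypothetical:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(index):
    """Count, for each pair of terms, the documents in which both appear.

    index: dict mapping a document id to its list of terms (hypothetical format).
    """
    edges = Counter()
    for terms in index.values():
        # set() ignores repeated terms; sorted() gives each pair one canonical order.
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


def drop_blocked(edges, block_list):
    """Return a copy of the network without edges touching a blocked term.

    The input network is left untouched, so anything computed on it
    still reflects the full set of terms.
    """
    blocked = set(block_list)
    return {pair: w for pair, w in edges.items() if blocked.isdisjoint(pair)}


index = {
    "d1": ["graph", "louvain", "text"],
    "d2": ["louvain", "text"],
    "d3": ["graph", "text"],
}
full = cooccurrence_edges(index)
print(drop_blocked(full, ["text"]))  # {('graph', 'louvain'): 1}
```

The difference from the custom-list approach is where the filtering happens: here the unwanted terms still take part in building the network and are only hidden afterwards.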
I hope this helps,
L
