Filter terms index according to a list of terms

CorText Manager Q&A forum
Category: Text processing
Matthieu P asked 3 years ago

Hello!
I have a subcorpus that I built from a wider corpus (I filtered the wider corpus to keep only the documents that belong to one of the Louvain clusters detected in it). I would like to carry out a network mapping on that subcorpus, but only on some specific terms.
More precisely, I have a list of terms that I identified through a tf-idf analysis I ran outside of Cortext. I want to restrict my Cortext analyses to the terms with the highest tf-idf in the cluster I am interested in (let’s say the top 100). All terms that do not match this list should then be removed from my subcorpus, so that only this top 100 remains. I could use the “top nodes” option in the network mapping panel, but since it is based on frequency, some terms that I don’t want still appear.
I tried filtering the subcorpus with the “query” script but I didn’t manage to do it. I have the feeling that there is a simple way to do this, but my Cortext skills are a bit rusty, since I haven’t used it for weeks.
Would you have any tips? Thanks a lot, again!

1 Answer
Lionel Staff answered 3 years ago

Dear Matthieu,
 
The most straightforward way to achieve what you want is to:

  1. Run a list builder to extract the main forms of the terms in your list. The result should correspond to the list you already have; if it does, you do not need to run it again;
  2. Run a list indexer, select the “Define a custom list of entities” option, and specify the name you want for your new variable.

This second task will index only the terms you have selected.
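
As a minimal sketch of how the custom list for step 2 could be prepared outside Cortext, assuming the tf-idf scores are available as a term-to-score mapping: the Python snippet below keeps the 100 highest-scoring terms and writes them one per line. The function names, the file name and the one-term-per-line layout are assumptions made for illustration; the exact file format expected by the list indexer should be checked in the Cortext documentation.

```python
def top_terms(scores, n=100):
    """Return the n terms with the highest score, best first.

    `scores` is a hypothetical dict mapping each term to its tf-idf score.
    """
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


def write_term_list(terms, path):
    """Write one term per line (assumed layout, not a confirmed Cortext format)."""
    with open(path, "w", encoding="utf-8") as handle:
        for term in terms:
            handle.write(term + "\n")


if __name__ == "__main__":
    # Toy scores standing in for the tf-idf results computed outside Cortext.
    example_scores = {"climate change": 0.91, "energy": 0.40, "policy": 0.75}
    write_term_list(top_terms(example_scores, n=100), "top_terms.txt")
```

Sorting on the score first and the term second keeps the output stable when several terms share the same tf-idf value.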
 
Another strategy would be to build a block list, i.e. a list of all the unwanted terms. With this second strategy, you can compute the network on the full set of terms and remove the unwanted terms only from the visualized networks.
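
For the block list strategy, the unwanted terms are simply every indexed term that is not in the list you want to keep. Here is a rough Python sketch of that complement, assuming both lists have been exported as plain lists of strings; the function name is hypothetical, and comparing terms case-insensitively is an assumption that may not match how Cortext normalizes terms.

```python
def build_block_list(all_terms, keep_terms):
    """Return, in alphabetical order, the terms of all_terms absent from keep_terms."""
    # Normalize the terms to keep (trimmed, lower-cased) for the comparison.
    keep = {term.strip().lower() for term in keep_terms}
    blocked = set()
    for term in all_terms:
        cleaned = term.strip()
        # Skip empty entries and anything that belongs to the keep list.
        if cleaned and cleaned.lower() not in keep:
            blocked.add(cleaned)
    return sorted(blocked)


if __name__ == "__main__":
    indexed = ["Climate change", "energy", "policy", "energy "]
    wanted = ["climate change"]
    print(build_block_list(indexed, wanted))
```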
I hope this helps.
L