Create subcorpus from specific terms

Matthieu P asked 8 months ago

Hello,
First, thank you very much for Cortext, I’ve been using it for a week and it is really helping me achieve some textual analysis I’ve been trying to do for weeks.

I searched the Q&A section but could not find an answer to my problem. I have a corpus of articles that I want to filter according to a list of extracted ngrams and terms to create a subcorpus. This subcorpus should contain all the articles that mention any of these specific terms and ngrams at least once. I am not at all familiar with SQL and couldn’t find an option in the Query script that would allow me to upload a list of terms as a basis for filtering. Would you mind giving me some advice?

Thanks a lot,
Matthieu

1 Answer
Lionel Staff answered 8 months ago

Dear Matthieu,
Yes! In fact, there are two ways to achieve what you want.
When you run a lexical extraction, the documents are tagged with the main forms (specific terms) according to the corresponding list of forms (ngrams). So, when using the query script, you do not have to worry about the list of forms (ngrams).
The most straightforward way, as you suggest, is to use the query script directly:

  • Build a query that lists all the selected forms, using the field produced by the lexical extraction,
  • and define the conditions.
  • The query may look like this: data = 'term 1' OR data = 'term 2' OR data = 'term n', where term n corresponds to the extracted main forms (specific terms).
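
For a short list of terms, typing the OR chain by hand is tedious and error-prone, so the query pattern above can be generated from a list. This is only a minimal sketch: `build_term_query` is a hypothetical helper (not part of CorText), the field name `data` is taken from the example above and should be replaced by the field your lexical extraction produced, and doubling single quotes is an assumption based on standard SQL escaping, which may not match what the query script expects.

```python
def build_term_query(terms, field="data"):
    """Return an OR-chained query string, e.g. data = 'a' OR data = 'b'.

    Hypothetical helper: it only builds text to paste into the query script.
    """
    clauses = []
    seen = set()
    for term in terms:
        cleaned = term.strip()
        # Skip empty entries and duplicates so the query stays short.
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        # Assumption: single quotes are escaped by doubling, as in standard SQL.
        escaped = cleaned.replace("'", "''")
        clauses.append(f"{field} = '{escaped}'")
    return " OR ".join(clauses)


# Replace these placeholders with the extracted main forms (specific terms).
terms = ["term 1", "term 2", "term n"]
query = build_term_query(terms)
print(query)
# prints: data = 'term 1' OR data = 'term 2' OR data = 'term n'
```

The printed string can then be pasted into the query script as the condition.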

Use the “build a new database” parameter to extract these documents into an entirely new database, and add a custom name for the new database that will store the results of the query.
But it also depends on the number of terms you want to use for the selection. This strategy is not really suitable if you have 30 or more keywords. In that case, you may want to use a combination of the lexical extraction (already done), list builder and list indexer.
I hope this helps.
L

Matthieu P replied 8 months ago

Thanks a lot, it works perfectly well! 🙂