Indexing problem

Johann Grémont asked 2 months ago

Dear Team Cortext, 
I’m writing to you because I’m having trouble indexing the corpus in Cortext.
Here are the steps I am taking.
1. Creation of a list of x terms (terms extraction)
2. Export to a spreadsheet in ods format, in particular to group terms.
Example: I have 4 similar stems (canc, cancer, cancer prostat, cancer pediatr) with the following 4 main forms: cancer, cancers, cancers de la prostate, cancers pédiatriques, and the following 4 Forms: cancer, cancer|&|Cancer|&|cancers|&|cancers de la prostate|&|cancers pédiatriques; cancers; cancers de la prostate; cancers pédiatriques
I’d like to group them under the same main form, “cancers”.
To do this, in the Forms column, I gather “cancer|&|Cancer|&|cancers|&|cancers pédiatriques|&|cancers de la prostate”.

3. Save in CSV format, tab separator, UTF8 encoding
4. Upload to Cortext in CSV format
5. Check term list
6. Indexing
Step 6 is where the difficulty arises. Some of the terms in the Forms column disappear. I only get “cancer|&|cancers”.
Hence my question. What step do I need to take to ensure that the terms in the Forms column don’t disappear?
Many thanks in advance for your feedback,
Yours sincerely
 
 

1 Answer
Lionel Staff answered 2 months ago

Dear Johann,
Strange!
From what I understand, the steps you’ve described correspond to what’s expected.

  1. Terms extraction script
  2. Download the list of terms
  3. Work on the spreadsheet: the forms are what will be indexed, and the main forms are used in the analyses that follow.
  4. Upload the updated list
  5. Update your dataset using the Corpus terms Indexer script

Have you deleted the merged rows or added a w in the last column?
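
To rule out a problem introduced while editing the spreadsheet, the grouping from the question can also be reproduced outside the spreadsheet and compared with the file that was uploaded. The following is a minimal Python sketch, not Cortext code: the column names `main form` and `forms` and the sample rows are assumptions based on the example in the question, so they should be adapted to the header of the downloaded list.

```python
import csv
import io

# Separator between variants in the Forms column, as shown in the question.
SEP = "|&|"


def merge_forms(rows, main_form):
    """Gather every variant from the given rows into one Forms string,
    keeping first-seen order and dropping exact duplicates."""
    seen = []
    for row in rows:
        for form in row["forms"].split(SEP):
            form = form.strip()
            if form and form not in seen:
                seen.append(form)
    return {"main form": main_form, "forms": SEP.join(seen)}


# Illustrative rows only (hypothetical layout of the term list).
rows = [
    {"main form": "cancer", "forms": "cancer|&|Cancer"},
    {"main form": "cancers", "forms": "cancers"},
    {"main form": "cancers de la prostate", "forms": "cancers de la prostate"},
    {"main form": "cancers pédiatriques", "forms": "cancers pédiatriques"},
]

merged = merge_forms(rows, "cancers")

# Write the result as tab-separated text; encode as UTF-8 when saving to disk.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["main form", "forms"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerow(merged)
tsv_text = buffer.getvalue()
```

Here the four source rows collapse into a single row whose Forms cell lists all five variants. If the uploaded file already looks like this and the variants still disappear at indexing, the spreadsheet step is probably not the cause.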