Dear Team Cortext,
I’m writing to you because I’m having trouble indexing the corpus in Cortext.
Here are the steps I am taking.
1. Creation of a list of x terms (terms extraction)
2. Export to a spreadsheet in ODS format, mainly to group terms.
Example: I have 4 similar stems (canc, cancer, cancer prostat, cancer pediatr) with the following 4 main forms: cancer, cancers, cancers de la prostate, cancers pédiatriques and the following Forms: cancer, cancer|&|Cancer|&|cancers|&|cancers de la prostate|&|cancers pédiatriques; cancers; cancers de la prostate; cancers pédiatriques
I’d like to group them under the same main form, “cancers”.
To do this, in the Forms column, I gather “cancer|&|Cancer|&|cancers|&|cancers pédiatriques|&|cancers de la prostate”.
3. Save in CSV format, tab separator, UTF-8 encoding
4. Upload to Cortext in CSV format
5. Check term list
6. Indexing
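For reference, the grouping in steps 2 and 3 can also be sketched in Python instead of a spreadsheet. This is only an illustration: the column names ("stem", "main form", "forms"), the sample rows, and the helper names merge_rows and to_tsv are assumptions made for the example, not the exact layout of the file Cortext exports.

```python
import csv
import io

SEP = "|&|"  # separator between variants in the Forms column

def merge_rows(rows, stems_to_merge, main_form):
    """Collapse the rows whose stem is in stems_to_merge into one row.

    The merged row takes the place of the first matching row, keeps that
    row's stem, and lists every distinct form once, in order of appearance.
    """
    merged = None
    out = []
    for row in rows:
        if row["stem"] not in stems_to_merge:
            out.append(dict(row))
            continue
        if merged is None:
            merged = {"stem": row["stem"], "main form": main_form, "forms": []}
            out.append(merged)
        for form in row["forms"].split(SEP):
            form = form.strip()
            if form and form not in merged["forms"]:
                merged["forms"].append(form)
    if merged is not None:
        merged["forms"] = SEP.join(merged["forms"])
    return out

def to_tsv(rows, fieldnames=("stem", "main form", "forms")):
    """Serialise rows as tab-separated text (column names are assumed)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Illustrative rows, loosely based on the example above.
rows = [
    {"stem": "canc", "main form": "cancer", "forms": "cancer"},
    {"stem": "cancer", "main form": "cancers",
     "forms": "cancer|&|Cancer|&|cancers"},
    {"stem": "cancer prostat", "main form": "cancers de la prostate",
     "forms": "cancers de la prostate"},
    {"stem": "cancer pediatr", "main form": "cancers pédiatriques",
     "forms": "cancers pédiatriques"},
]

stems = {"canc", "cancer", "cancer prostat", "cancer pediatr"}
grouped = merge_rows(rows, stems, "cancers")
print(to_tsv(grouped))
```

To write the result to disk, the string returned by to_tsv can be saved with open(path, "w", encoding="utf-8", newline=""), which gives the tab-separated, UTF-8 file described in step 3.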
Step 6 is where the difficulty arises. Some of the terms in the Forms column disappear. I only get “cancer|&|cancers”.
Hence my question: what step do I need to take to ensure that the terms in the Forms column don’t disappear?
Many thanks in advance for your feedback,
Yours sincerely
Dear Johann,
Strange!
From what I understand, the steps you’ve described correspond to what’s expected.
- Terms extraction script
- Download the list of terms
- Work on the spreadsheet, where the forms are what will be indexed and the main forms are what the subsequent analyses use.
- Upload the updated list
- Update your dataset using the Corpus terms Indexer script
Have you deleted the merged rows or added a w in the last column?
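A quick way to check this before uploading is to look for forms that are still listed on more than one row, which is what remains when forms are copied into one row but the original rows are left in place. The sketch below is only an illustration: the column names ("stem", "forms"), the sample rows, and the helper name duplicated_forms are assumptions, not part of Cortext.

```python
from collections import defaultdict

SEP = "|&|"  # separator between variants in the Forms column

def duplicated_forms(rows):
    """Return {form: [stems]} for every form listed on more than one row."""
    seen = defaultdict(list)
    for row in rows:
        for form in row["forms"].split(SEP):
            form = form.strip()
            if form and row["stem"] not in seen[form]:
                seen[form].append(row["stem"])
    return {form: stems for form, stems in seen.items() if len(stems) > 1}

# Illustrative rows: forms were gathered into the "cancer" row,
# but the two original rows were not removed.
rows = [
    {"stem": "cancer",
     "forms": "cancer|&|Cancer|&|cancers|&|cancers pédiatriques|&|cancers de la prostate"},
    {"stem": "cancer prostat", "forms": "cancers de la prostate"},
    {"stem": "cancer pediatr", "forms": "cancers pédiatriques"},
]

for form, stems in duplicated_forms(rows).items():
    print(f"{form!r} appears under stems: {', '.join(stems)}")
```

An empty result means each form belongs to exactly one row of the list.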