Problem with the corpus list indexer

CorText Manager Q&A forum, Category: Text processing
Clement Fromageau asked 1 year ago

Hi everyone,
My corpus consists of 33 texts, each with its date (from 1989 to 2021) in the file name.
I want to make a demographic graph comparing the extracted terms with their year of occurrence.
The second tutorial video covers exactly this case, but I have a problem:

  • Step “list builder” : OK
  • Then I downloaded the TSV file and added a column to extract the date from the document titles (see first pic)
  • Step “upload file and data parsing”: I saved it in CSV UTF-8 format and uploaded it with data parsing
  • Step “Corpus list indexer”: the result is not the same as in the tutorial. The file I want to upload is shown in the first picture, and the CorText result in the second.
  • I don’t know why, or whether it is a problem that the result file is in TSV format (third pic).
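The “added column” step amounts to pulling a year out of each file name. A minimal sketch of that extraction, assuming the year appears in the name as a standalone four-digit number and that only years in the corpus range (1989 to 2021) count; `year_from_title` and the example file names are hypothetical:

```python
import re
from typing import Optional

# Assumption: a year is a standalone 4-digit number starting with 19 or 20
# (not preceded or followed by another digit).
YEAR_PATTERN = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def year_from_title(title: str, first: int = 1989, last: int = 2021) -> Optional[str]:
    """Return the first year within [first, last] found in the title, or None."""
    for match in YEAR_PATTERN.finditer(title):
        year = int(match.group(1))
        if first <= year <= last:
            return str(year)
    return None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    for name in ["report_1989.txt", "speech-2021-final.txt", "notes.txt"]:
        print(name, "->", year_from_title(name))
```

Names without a recognizable year return `None`, so they can be spotted and fixed by hand before building the list.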

First pic
Second pic
Third pic
Thank you in advance for your answer.
Clement

2 Answers
Lionel Staff answered 1 year ago

Dear Clement,

  • You no longer need to “data parse” a TSV list; we have improved the user experience for this specific step.

For the other questions, I cannot help you without seeing the data; could you invite me to your project? Please invite me using lionel dot villard at esiee dot fr
Best regards,
Lionel

Clement Fromageau replied 1 year ago

OK for the TSV.
I have sent you an invitation because I am still stuck at the list indexer.

Lionel Staff answered 1 year ago

Dear Clement,
It was only a formatting problem. To make it work:

  1. use TSV (tab-separated values) for lists, dictionaries, …
  2. to make the process straightforward, name the second column “ISIpubdate”; it will automatically fill the right variable used for temporal analysis
  3. upload the list through the “upload file” button and box
  4. run a corpus list indexer with: define a custom list of entities = yes (this excludes/filters out all values that are not in the list, so you are sure to have only years in the ISIpubdate variable) + add a dictionary of equivalent strings = yes (in your case, this builds the correspondence between the file name values and the years, i.e. between the first and second columns of your term list)
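A minimal sketch of steps 1, 2 and 4, assuming you already have each file name paired with its year. It writes a two-column, tab-separated list whose second column is named “ISIpubdate”, and then models what the two indexer options do together: drop values not in the list, and replace each listed value with its second-column equivalent. This is only a conceptual model, not CorText’s code; the first header `filename`, the function names and the example data are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def term_list_tsv(mapping: Dict[str, str]) -> str:
    """Serialize file name -> year pairs as a tab-separated list.

    The second column is named ISIpubdate as suggested above; the first
    header ("filename") is a hypothetical placeholder."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["filename", "ISIpubdate"])
    for name, year in mapping.items():
        writer.writerow([name, year])
    return buffer.getvalue()


def save_term_list(mapping: Dict[str, str], path: str) -> None:
    """Write the list to disk as UTF-8 text."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(term_list_tsv(mapping))


def simulate_list_indexer(values: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Conceptual model of the two options (not CorText code):
    - custom list of entities = yes: values absent from the list are dropped
    - dictionary of equivalent strings = yes: each kept value becomes its year"""
    return [mapping[v] for v in values if v in mapping]


if __name__ == "__main__":
    mapping = {"report_1989.txt": "1989", "speech_2021.txt": "2021"}
    print(term_list_tsv(mapping), end="")
    print(simulate_list_indexer(["report_1989.txt", "unlisted.txt", "speech_2021.txt"], mapping))
```

Calling `save_term_list(mapping, "term_list.tsv")` produces a file ready for the “upload file” box; the filtering step is why only years end up in the ISIpubdate variable.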

I hope it helps!!
L