Hello,
I am unable to use a custom list of entities (countries, in this case). I created it in Google Sheets, with three columns with unique headings and one country per row. I then downloaded it as a .tsv and changed the file extension to .csv. The file uploads, but I am unable to parse it as a term list because the option is not available: only ‘dataset’ and ‘cortext db’ are offered.
When I try to index my corpus with the terms list it does not give me any frequency information about the terms.
Thanks for any help.
Dear brandonwg,
There is no longer any need to parse a term list. You can drag and drop your list of terms and use it directly. You can now even use a .tsv file without changing the file extension. Which script do you want to use: term list indexer or list indexer?
Could you copy and paste the first three lines of your file here?
Best regards
L
This question is also related to: https://docs.cortext.net/question/uploading-parsing-custom-list/
Hello Lionel,
Thank you for your prompt response ! Here are the first three lines:
col1 country1 country2
Afghanistan Afghanistan Afghanistan
Albania Albania Albania
I think I was mistaken: I was initially trying to use the list indexer, but I now realize I should be using the terms indexer. If I do this, should the first column still be a unique entry? That is, rather than the first column repeating the country name as above, should it have its own value?
Thank you again.
Yes, your file is not formatted in a way that works with CorText.
You have two ways to deal with this:
– corpus term indexer: with grammatical and lexical capabilities (roots, plurals…). Can work after a term extraction script; https://docs.cortext.net/corpus-terms-indexer/
– corpus list indexer: for simply modifying a list of entities, without any full text capabilities. https://docs.cortext.net/corpus-list-indexer/ . So, to use it you need a variable with unique values that you want to modify.
Corpus list indexer can be run after list builder.
For corpus list indexer, your file should be as follows:
– old-name -> new-name
– Tab separated (utf8)
– optionally, some other variables you may want to add to your corpus
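As a rough illustration of the layout described above, here is a minimal Python sketch that builds such a tab-separated, UTF-8 file. The header names ("old-name", "new-name", "region"), the output file name and the example rows are assumptions made up for this example; whether CorText expects a header row, or these exact names, is not stated in this thread, so check the corpus list indexer documentation before relying on it.

```python
import csv
import io


def build_mapping_tsv(rows, extra_header=()):
    """Return tab-separated text: old name, new name, then optional extra variables."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Header names are hypothetical placeholders, not confirmed CorText names.
    writer.writerow(["old-name", "new-name", *extra_header])
    writer.writerows(rows)
    return buf.getvalue()


# Each row maps a value found in the corpus (old name) to the value that
# should replace it (new name); the third column is an optional extra variable.
rows = [
    ("Afghanistan", "Afghanistan", "Asia"),
    ("Islamic Republic of Afghanistan", "Afghanistan", "Asia"),
    ("Albania", "Albania", "Europe"),
]
tsv_text = build_mapping_tsv(rows, extra_header=["region"])

if __name__ == "__main__":
    # Write as UTF-8; newline="" keeps the line endings exactly as generated.
    with open("countries.tsv", "w", encoding="utf-8", newline="") as fh:
        fh.write(tsv_text)
```

The point of the sketch is only that the first column holds the values as they appear in the corpus, the second holds their replacements, and the columns are separated by real tab characters rather than spaces.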
That makes sense, thank you ! I will continue working with it.
See below