Author Keyword and Indexed Keyword in Parsing Scopus RIS format

CorText Manager Q&A forum. Category: Data processing
Quentin asked 4 years ago

Hello,
I would like to know how author keywords and indexed keywords are handled when parsing the Scopus RIS format. In a CSV export of a corpus from Scopus, these appear in separate fields. Parsing the Scopus RIS format seems to return only one “keywords” field. Is there a way to separate author and indexed keywords without parsing a CSV corpus?
Many thanks!
Quentin

Lionel Staff replied 4 years ago

Dear Quentin,

Short answer
As far as I know, Scopus does not include the indexed keywords in its RIS exports, so CorText Manager can only deal with the author keywords.

Long answer
In many cases, these indexed keywords are questionable. They are built on external classifications or ontologies that try to be as generic as possible. In addition, most of the time the methods used to tag a document are not precisely defined.
If you want to enrich the list of author keywords in your corpus, you could also run a lexical extraction script. Based on this new list of extracted terms, you could build a classification using a network mapping script (co-occurrences of keywords). With these two steps, you would be able to produce an endogenous classification for the documents of your corpus.
I hope it helps!
L

Quentin replied 4 years ago

Dear Lionel,

It helps! But it still seemed strange to me that a large part of my corpus items had so many keywords, and a manual check confirmed that parsing the Scopus RIS format put all the indexed keywords in the “Keywords” field (at least for the items I checked). Was this an error, or is there an option in the parsing script parameters that I missed?
Thanks for your help!
Quentin

2 Answers
Lionel Staff answered 4 years ago

See below!

Quentin answered 4 years ago

Thank you for these short and long answers!
Another question about parsing the Scopus RIS format: the variable “Language of Original Document” appears in the CSV format of the Scopus export, but does not seem to be available after parsing the Scopus RIS format. Is there a way to get this information without re-uploading a CSV version of the corpus? (My aim is to separate the English and French subsets.)
Thanks for your help,
Quentin

Lionel Staff replied 4 years ago

Dear Quentin,

Exactly: the language field of the Scopus RIS format is not yet parsed by CorText Manager. We should add it.
• The best way is to build an index from the CSV with two columns: the ID or the title of the article, and the corresponding language value (e.g. FR or EN);
• Then upload it as a resource and apply it to your corpus with the Corpus List indexer (https://docs.cortext.net/corpus-list-indexer/).
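
The two-column index described above could be built with a few lines of Python. This is only a minimal sketch under assumptions: it uses the title as the key, takes the column name “Language of Original Document” from the CSV export mentioned in the question, and writes a tab-separated file. The function names, the sample rows and the choice of delimiter are illustrative, so the exact layout expected by the Corpus List indexer should be checked against its documentation.

```python
import csv
import io


def build_language_index(scopus_csv, key_column="Title",
                         language_column="Language of Original Document"):
    """Return (key, language) pairs for every row that has both values.

    key_column is an assumption: use whichever column (title or an ID)
    matches the field available in the parsed corpus.
    """
    reader = csv.DictReader(scopus_csv)
    pairs = []
    for row in reader:
        key = (row.get(key_column) or "").strip()
        language = (row.get(language_column) or "").strip()
        if key and language:  # skip records with a missing key or language
            pairs.append((key, language))
    return pairs


def write_index(pairs, out_file, delimiter="\t"):
    """Write the pairs as a two-column file (tab-separated by assumption)."""
    writer = csv.writer(out_file, delimiter=delimiter, lineterminator="\n")
    writer.writerows(pairs)


# Tiny in-memory sample standing in for a real Scopus CSV export.
sample = io.StringIO(
    "Title,Language of Original Document\n"
    "A study of keywords,English\n"
    "Une étude des mots-clés,French\n"
    "Record without language,\n"
)
index = build_language_index(sample)
out = io.StringIO()
write_index(index, out)
print(out.getvalue())
```

With a real export, the sample would be replaced by a file opened with `open(path, newline="", encoding="utf-8-sig")`, which also tolerates a leading byte order mark, and the output written to a file that can then be uploaded as a resource.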
I hope it helps!
Lionel