Trouble parsing a SCOPUS corpus

CorText Manager Q&A forum | Category: Data processing
matias.milia asked 4 years ago

Hi, I’ve been having some problems parsing a corpus exported from SCOPUS. I chose to export every available field, saved the files on my Mac and then compressed them into a .zip file. When I upload it, I get an error. This has already happened to me with two different databases. One consists of 265578 documents in 170 .ris files that weigh 1.92 GB altogether (621.6 MB zipped); the other has 11723 documents that weigh 52 MB (18.1 MB zipped).
I have tried converting the .ris files to .csv (using ris_converter), but it drops some information in the process.
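If the goal is only to inspect the data outside CorText, a conversion that keeps every tag is easy to sketch. The Python below is a minimal, hypothetical example (it is not ris_converter, and the function names are illustrative), assuming the standard RIS layout of "TAG  - value" lines with each record closed by an "ER" line. Every tag becomes a CSV column and repeated tags such as several AU authors are joined with " ; ". Whether CorText Manager accepts the resulting columns is a separate question this sketch does not address.

```python
import csv
import re
import sys

# Assumes the standard RIS layout: two-character tag, two spaces, hyphen, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_records(lines):
    """Return one dict per record, mapping each tag to its list of values."""
    records, current, last_tag = [], {}, None
    for line in lines:
        text = line.rstrip("\r\n")
        match = TAG_LINE.match(text)
        if match is None:
            # Treat a non-blank untagged line as a continuation of the previous
            # field (an assumption; not every RIS producer wraps long values).
            if text.strip() and last_tag in current:
                current[last_tag][-1] += " " + text.strip()
            continue
        tag, value = match.groups()
        if tag == "ER":
            if current:
                records.append(current)
            current, last_tag = {}, None
        else:
            current.setdefault(tag, []).append(value)
            last_tag = tag
    if current:  # tolerate a final record with no closing ER line
        records.append(current)
    return records


def write_csv(records, out, separator=" ; "):
    """Write one row per record; the columns are the union of all tags seen."""
    columns = sorted({tag for record in records for tag in record})
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow({tag: separator.join(vals) for tag, vals in record.items()})


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (file names are up to you): python ris_to_csv.py input.ris output.csv
    with open(sys.argv[1], encoding="utf-8-sig") as src:
        recs = parse_records(src)
    with open(sys.argv[2], "w", newline="", encoding="utf-8") as dst:
        write_csv(recs, dst)
```

Keeping multi-valued fields in a single cell with a separator avoids guessing how many author or affiliation columns a record needs.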
It is worth mentioning that the databases do parse if I select .ris (standard). But then some fields, such as ‘Country’, are not available. I tried to do some ‘curation’ of the database according to the information available ( https://en.wikipedia.org/wiki/RIS_(file_format) ), but I can’t manage to match each field code in the database to the list on Wikipedia.
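To line up the codes in an export against the list on Wikipedia, it may help to first see which tags actually occur and how often. A minimal sketch in Python, assuming the same two-character "TAG  - value" layout; the directory argument is whatever folder holds the unzipped .ris files, and the script and function names are hypothetical.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Assumes the standard RIS layout: tag (letter, then letter or digit),
# two spaces, then a hyphen.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -")


def count_tags(lines):
    """Return a Counter mapping each RIS tag to the number of lines using it."""
    counts = Counter()
    for line in lines:
        match = TAG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_tags_in_dir(directory):
    """Add up tag counts over every .ris file in a directory."""
    total = Counter()
    for path in sorted(Path(directory).glob("*.ris")):
        # utf-8-sig drops a leading byte-order mark; errors="replace" keeps
        # one odd byte from aborting the whole scan.
        with path.open(encoding="utf-8-sig", errors="replace") as handle:
            total += count_tags(handle)
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Prints one tag per line with its count, most frequent first.
    for tag, n in count_tags_in_dir(sys.argv[1]).most_common():
        print(f"{tag}\t{n}")
```

A tag that appears in the export but not in the Wikipedia table, or one whose count is far off from the number of documents, is a reasonable place to start looking.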
So, back to where I started: I think some field in the SCOPUS export is causing the trouble. Does anyone know which one it might be? How can I deal with this if the database won’t parse? Should I just do a new export, and if so, which fields should I drop? Any suggestions?


Here is the error log:

2018-10-04 12:49:05 INFO : Parsing Script Started
2018-10-04 12:49:05 INFO : 
                           Source:
                                                         Type of Data: dataset
                                                         Corpus Format: ris (scopus)
                                                         Ignore entries with incorrectly formatted time steps: true
                           
2018-10-04 12:49:07 INFO : Preparing raw data
2018-10-04 12:49:07 INFO : Parsing file /srv/local/documents/93da/93da5d6516dec7cc742c76792d040a63/ener-92-2016-argmex/2010_16_ARGMEX_CHEM_MATE.ris
2018-10-04 12:49:21 INFO : Parsing file /srv/local/documents/93da/93da5d6516dec7cc742c76792d040a63/ener-92-2016-argmex/2010_16_ARGMEX_REST.ris
2018-10-04 12:49:26 INFO : Parsing file /srv/local/documents/93da/93da5d6516dec7cc742c76792d040a63/ener-92-2016-argmex/2010_16_ARGMEX_PHYS_EART_CENG.ris
2018-10-04 12:49:37 INFO : Parsing file /srv/local/documents/93da/93da5d6516dec7cc742c76792d040a63/ener-92-2016-argmex/2007_09_ARGMEX.ris
2018-10-04 12:49:38 DEBUG : Something went wrong while trying to parse, are you sure you selected the correct corpus format ?
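The log stops right after announcing 2007_09_ARGMEX.ris, which suggests (without proving it) that this file holds the entry the parser choked on. One way to narrow things down is to scan each .ris file inside the uploaded .zip for lines that do not follow the "TAG  - value" layout. A minimal Python sketch, with hypothetical function names; flagged lines are only candidates for a closer look, since some exporters legitimately continue a long value on an untagged line.

```python
import io
import itertools
import re
import sys
import zipfile

# Assumes the standard RIS layout: tag (letter, then letter or digit),
# two spaces, then a hyphen.
TAG_LINE = re.compile(r"^[A-Z][A-Z0-9]  -")


def suspicious_lines(lines):
    """Yield (line_number, text) for every non-blank line without an RIS tag."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.strip() and not TAG_LINE.match(text):
            yield number, text


def find_problems(archive, limit=5):
    """Map each .ris member of a zip archive to its first `limit` suspicious lines.

    `archive` may be a path or a binary file-like object.
    """
    problems = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Skip "__MACOSX/" metadata entries that the macOS Finder "Compress"
            # command may add; they are not RIS data.
            if name.startswith("__MACOSX/") or not name.lower().endswith(".ris"):
                continue
            with zf.open(name) as raw:
                # utf-8-sig drops a leading byte-order mark if there is one.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", errors="replace")
                found = list(itertools.islice(suspicious_lines(text), limit))
            if found:
                problems[name] = found
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, lines in find_problems(sys.argv[1]).items():
        for number, text in lines:
            # Truncate long lines so the report stays readable.
            print(f"{name}:{number}: {text[:80]!r}")
```

Running it on the same .zip that was uploaded checks exactly what the parser received, including any extra entries the archiving step added.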
Jean-Philippe Cointet Staff answered 4 years ago

Hi Matias, 
Thanks for the bug report. The problem you had was caused by mis-formatted references. It is now fixed, and parsing Scopus RIS files should hopefully go smoothly from now on.