Hi, I’ve been having problems parsing a corpus exported from Scopus. I exported every available field, saved the files on my Mac, and then compressed them into a .zip file. When I upload that archive, I get an error. This has happened with two different databases. One consists of 265,578 documents spread across 170 .ris files that weigh 1.92 GB in total (621.6 MB zipped); the other has 11,723 documents that weigh 52 MB (18.1 MB zipped).
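One thing worth ruling out, though the log does not confirm it: archives made with the Mac Finder’s “Compress” command can include a __MACOSX/ folder and hidden “._” copies of each file, which a parser might try to read as .ris files. A minimal Python sketch (assuming Python 3.9+) that rebuilds the archive with only the real .ris files; the function name zip_ris_only and the paths are hypothetical:

```python
import zipfile
from pathlib import Path


def zip_ris_only(src_dir: str, out_zip: str) -> list[str]:
    """Zip every .ris file under src_dir, skipping macOS metadata files.

    Returns the archive names that were written, in sorted order.
    """
    src = Path(src_dir)
    added = []
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*.ris")):
            rel = path.relative_to(src)
            # Skip AppleDouble "._name" files and anything inside __MACOSX/.
            if rel.name.startswith("._") or "__MACOSX" in rel.parts:
                continue
            arcname = rel.as_posix()
            zf.write(path, arcname)
            added.append(arcname)
    return added
```

For example, zip_ris_only("ener-92-2016-argmex", "argmex_clean.zip") would produce an archive containing only the exported files.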
I tried converting the .ris files to .csv (using ris_converter), but some information was dropped in the process.
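A conversion can keep every field if each RIS tag becomes its own column and repeated tags (several AU lines, for instance) are joined into one cell. A minimal sketch, assuming UTF-8 files with one “XX  - value” entry per line; parse_ris, ris_to_csv and the "; " separator are my own illustrative choices, not how ris_converter works. Note that joining is only lossless if no value already contains the separator:

```python
import csv
import re
from pathlib import Path

# An RIS line looks like "AU  - Smith, J.": two-character tag, two spaces, dash.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Split RIS text into records, keeping every value of every tag."""
    records, current, last_tag = [], {}, None
    for raw in text.splitlines():
        line = raw.rstrip()
        m = TAG_LINE.match(line)
        if m:
            tag, value = m.group(1), m.group(2) or ""
            if tag == "ER":  # end of record
                records.append(current)
                current, last_tag = {}, None
                continue
            current.setdefault(tag, []).append(value)
            last_tag = tag
        elif line and last_tag:
            # Assumption: an untagged line continues the previous field.
            current[last_tag][-1] += " " + line.strip()
    if current:  # keep a final record that lacks ER
        records.append(current)
    return records


def ris_to_csv(ris_paths, out_csv, sep="; ") -> list[str]:
    """Write one CSV row per record and one column per tag seen anywhere."""
    records = []
    for p in ris_paths:
        # utf-8-sig strips a byte-order mark if the file starts with one.
        records.extend(parse_ris(Path(p).read_text(encoding="utf-8-sig")))
    columns = sorted({tag for rec in records for tag in rec})
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        for rec in records:
            # Missing tags become empty cells (DictWriter's default restval).
            writer.writerow({tag: sep.join(vals) for tag, vals in rec.items()})
    return columns
```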
It is worth mentioning that the databases do parse if I select the .ris (standard) format. But when I do so, some fields, such as ‘Country’, are not available. I tried to ‘curate’ the database using the information available at https://en.wikipedia.org/wiki/RIS_(file_format), but I can’t match each variable code in my files to the tags listed on Wikipedia.
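To compare the codes in the export against the Wikipedia list, it may help to first count which tags actually occur. A minimal sketch, assuming UTF-8 files with one “XX  - value” entry per line; tag_inventory is a hypothetical helper, and which tag (if any) carries affiliation or country data is something to confirm from its output rather than assume:

```python
import re
from collections import Counter
from pathlib import Path

# Matches the two-character tag at the start of an RIS line, e.g. "AU  - ...".
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -")


def tag_inventory(folder: str) -> Counter:
    """Count how many lines use each RIS tag across all .ris files in folder."""
    counts = Counter()
    for path in sorted(Path(folder).glob("*.ris")):
        # errors="replace" keeps a badly encoded byte from stopping the count.
        with open(path, encoding="utf-8-sig", errors="replace") as fh:
            for line in fh:
                m = TAG_LINE.match(line)
                if m:
                    counts[m.group(1)] += 1
    return counts
```

Printing tag_inventory("ener-92-2016-argmex").most_common() lists each tag with its frequency, which can then be checked line by line against the Wikipedia table.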
So, back to where I started: I suspect some field in the Scopus export is causing the trouble. Any idea which one it might be? How can I deal with this if the database won’t parse? Should I just do a new export, and if so, which fields should I drop? Any suggestions?
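Rather than re-exporting from Scopus for every guess, one option is to write copies of the files with a chosen set of tags removed and upload those, narrowing down the troublesome field by elimination. A minimal sketch, assuming UTF-8 files with one “XX  - value” entry per line; drop_tags is a hypothetical helper, and which tags to try dropping is left to you:

```python
import re
from pathlib import Path

# Matches the two-character tag at the start of an RIS line, e.g. "AB  - ...".
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -")


def drop_tags(src: str, dst: str, tags_to_drop: set[str]) -> int:
    """Copy an RIS file, omitting every line whose tag is in tags_to_drop.

    Untagged lines that follow a dropped tag are treated as its continuation
    and dropped too. Returns the number of lines removed.
    """
    removed = 0
    dropping = False
    kept = []
    text = Path(src).read_text(encoding="utf-8-sig")
    for line in text.splitlines(keepends=True):
        m = TAG_LINE.match(line)
        if m:
            # A new tag decides whether this line and its continuations go.
            dropping = m.group(1) in tags_to_drop
        if dropping:
            removed += 1
        else:
            kept.append(line)
    Path(dst).write_text("".join(kept), encoding="utf-8")
    return removed
```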
—
Here is the log for the errors.
2018-10-04 12:49:05 INFO : Parsing Script Started
2018-10-04 12:49:05 INFO : Source: Type of Data: dataset Corpus Format: ris (scopus) Ignore entries with incorrectly formatted time steps: true
2018-10-04 12:49:07 INFO : Preparing raw data
2018-10-04 12:49:07 INFO : Parsing file /srv/local/documents/93da/93da5d6516dec7cc742c76792d040a63/ener-92-2016-argmex/2010_16_ARGMEX_CHEM_MATE.ris
2018-10-04 12:49:21 INFO : Parsing file /srv/local/documents/93da/93da5d6516dec7cc742c76792d040a63/ener-92-2016-argmex/2010_16_ARGMEX_REST.ris
2018-10-04 12:49:26 INFO : Parsing file /srv/local/documents/93da/93da5d6516dec7cc742c76792d040a63/ener-92-2016-argmex/2010_16_ARGMEX_PHYS_EART_CENG.ris
2018-10-04 12:49:37 INFO : Parsing file /srv/local/documents/93da/93da5d6516dec7cc742c76792d040a63/ener-92-2016-argmex/2007_09_ARGMEX.ris
2018-10-04 12:49:38 DEBUG : Something went wrong while trying to parse, are you sure you selected the correct corpus format ?
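The log stops one second after it starts on 2007_09_ARGMEX.ris, which suggests (but does not prove) that the problem is in that file. A minimal local check, assuming the export is UTF-8 with one “XX  - value” entry per line; check_ris is a hypothetical helper. It flags lines that are not tag lines (which may be legitimate continuations, so treat them as suspects), tags outside a TY…ER record, and a missing final ER:

```python
import re
from pathlib import Path

# Matches the two-character tag at the start of an RIS line, e.g. "TY  - JOUR".
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -")


def check_ris(path: str) -> list[str]:
    """Return human-readable descriptions of suspicious spots in an RIS file."""
    try:
        text = Path(path).read_bytes().decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    problems = []
    in_record = False
    for no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines between records are fine
        m = TAG_LINE.match(line)
        if not m:
            problems.append(f"line {no}: not an RIS tag line: {line[:60]!r}")
            continue
        tag = m.group(1)
        if tag == "TY":
            if in_record:
                problems.append(f"line {no}: TY before previous record's ER")
            in_record = True
        elif tag == "ER":
            if not in_record:
                problems.append(f"line {no}: ER without matching TY")
            in_record = False
        elif not in_record:
            problems.append(f"line {no}: {tag} outside a record")
    if in_record:
        problems.append("end of file: last record has no ER")
    return problems
```

Running it on the local copy of 2007_09_ARGMEX.ris and printing each returned line points to the exact line numbers to inspect.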