Parsing a modified .RIS (Scopus) database

matias.milia asked 4 years ago

Hi,
I have been working on some SCOPUS data (in .RIS format). I processed the database beforehand and added some specific variables that were not available before (and that are not usually part of SCOPUS data exports). I have parsed the database using the two different parsing options (.RIS SCOPUS and .RIS regular), and I notice a difference: when parsing as .RIS (SCOPUS) I get two variables (Geolocalization: city and countries) that are very interesting, but I cannot find a way to recover them when parsing as .RIS (regular). Is there any way to process my .RIS (regular) database separately to recover this information? How would you recommend doing so?

Question Tags: data parsing, scopus

1 Answer

Lionel Staff answered 4 years ago

Dear Matias,
Yes, the ris and ris scopus parsers behave differently, and some preprocessing is done for ris scopus while building the dataset.
One of the most problematic aspects of the ris standard is that it is not designed to support author addresses.
You have several options to achieve what you want. One of the most straightforward ways is to build the new variables beforehand in a TSV file, and use corpus list indexer to add them to the ris scopus dataset. The idea is to have the document ids in the first column and the new variables in the other columns, and then to use those ids to index your corpus.
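
As a rough illustration of the TSV layout described above (document id in the first column, new variables in the following columns), here is a minimal Python sketch. The function name `write_index_tsv`, the variable name `my_variable`, the example ids and the file name are all hypothetical, and whether corpus list indexer expects a header row is an assumption to check on your side, which is why it is optional here.

```python
import csv


def write_index_tsv(path, variables_by_id, variable_names, header=True):
    """Write one row per document: id first, then the new variables.

    variables_by_id maps a document id to a dict {variable name: value}.
    The header row is an assumption; pass header=False if it is not wanted.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        if header:
            writer.writerow(["id", *variable_names])
        for doc_id, values in variables_by_id.items():
            # Missing variables are written as empty cells.
            row = [doc_id] + [values.get(name, "") for name in variable_names]
            writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical ids and values, for illustration only.
    write_index_tsv(
        "new_variables.tsv",
        {"doc-1": {"my_variable": "a"}, "doc-2": {"my_variable": "b"}},
        ["my_variable"],
    )
```

The ids in the first column need to match the ids used in the dataset you are indexing, otherwise the new variables cannot be attached to the right documents.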

From what I have understood, I think you can even export this information directly from the ris standard dataset using the export csv feature of the corpus explorer script, and then add it to the ris scopus dataset.

I hope it helps.

matias.milia replied 4 years ago

Thanks, Lionel! Yes, I thought about that, but I dismissed the idea since SCOPUS does not have an identification number as Web of Science does. Nevertheless, with your suggestion, I gave it a second thought and found a way around it. The URL is a helpful field to identify the documents as it is a unique reference. I cleaned the root of the URL (https://www.scopus.com/inward/record.uri?eid=2-s2.0-) and kept the identifier (all the text included after that). So far, it seems like it is working; thanks!
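
For anyone wanting to reproduce the URL cleaning described above, here is a minimal Python sketch. It only strips the URL root quoted in this reply and keeps all the text after it; the function name `scopus_id_from_url` and the example identifier are hypothetical, and how the URL field is read from the .RIS file is left out.

```python
SCOPUS_URL_ROOT = "https://www.scopus.com/inward/record.uri?eid=2-s2.0-"


def scopus_id_from_url(url: str) -> str:
    """Return the text that follows the Scopus URL root.

    If the URL does not start with the expected root, it is returned
    unchanged (apart from surrounding whitespace) so nothing is lost.
    """
    url = url.strip()
    if url.startswith(SCOPUS_URL_ROOT):
        return url[len(SCOPUS_URL_ROOT):]
    return url


if __name__ == "__main__":
    # Hypothetical identifier, for illustration only.
    print(scopus_id_from_url(SCOPUS_URL_ROOT + "00000000000"))
```

The same cleaning has to be applied both to the id column of the TSV file and to the field used as id in the dataset, so that the two sides match.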
