Integrating datasets and deleting duplicates

CorText Manager Q&A forum · Category: Data processing
matias.milia asked 6 years ago

Hi, I am running a set of different query strings on the Scopus database and I would like to integrate them all into one single dataset. Since the searches will probably overlap, the key in this case is to avoid duplicate documents, so I would like to delete the ones that appear more than once. I am thinking that the DOI could be an easy way to identify repeated entries. But then I wonder: is the list builder the tool I would need to use? Since the DOI is not really a category, and I would need to delete only the exact duplicates rather than similar entries, I started to have doubts about it.
For further information, I am exporting the records as .ris and there are around 4,000 entries, so there are far too many to deduplicate manually.
I could export everything as .csv and work on it in OpenOffice, but with such a big pile of data I am afraid it would crash.
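In case it helps to show what I mean, this is the kind of DOI-based deduplication I had in mind, as a rough sketch only (the file names are made up; "DO" and "ER" are the standard RIS tags for DOI and end-of-record):

```python
# Rough sketch: merge several Scopus .ris exports and keep one record per DOI.
# File names are hypothetical; this is not a CorText feature.
from pathlib import Path

def read_ris_records(path):
    """Split a RIS file into records, each returned as a list of raw lines."""
    records, current = [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        current.append(line)
        if line.startswith("ER  -"):   # "ER" closes a RIS record
            records.append(current)
            current = []
    return records

def record_doi(record):
    """Return the DOI of a record (DO tag), or None if absent."""
    for line in record:
        if line.startswith("DO  -"):
            return line[6:].strip().lower()
    return None

seen, merged = set(), []
for ris_file in ["query1.ris", "query2.ris", "query3.ris"]:  # hypothetical names
    for rec in read_ris_records(ris_file):
        doi = record_doi(rec)
        if doi is not None and doi in seen:
            continue                    # duplicate already found by an earlier query
        if doi is not None:
            seen.add(doi)
        merged.append(rec)              # records without a DOI are kept as-is

Path("merged_deduplicated.ris").write_text(
    "\n".join("\n".join(rec) for rec in merged) + "\n", encoding="utf-8"
)
```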
Thanks in advance,
Matías.

Jean-Philippe Cointet Staff replied 6 years ago

Sorry for the lack of responsiveness. Your question is linked to the way duplicates are managed when parsing datasets in the manager. Duplicate management still depends very much on the data format you are using and, as you noticed, removal of duplicate entries is not yet possible for the .ris format. Additionally, I am not sure the DOI information is systematically present… We are working on setting up a robust system for that purpose, but it may take some time.

The best solution for your precise problem would probably be to perform a single search combining all the queries on Scopus, if possible.
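In the meantime, if you do deduplicate the exports yourself, one possible fallback for records with no DOI (just a sketch, not a CorText feature) is to key on a normalized title, using the TI or T1 tags of the RIS record:

```python
# Hedged sketch: build a deduplication key for one RIS record,
# falling back on a normalized title when the DO (DOI) tag is absent.
import re

def dedup_key(record_lines):
    """record_lines: the raw lines of one RIS record (up to its 'ER  -' line)."""
    title = None
    for line in record_lines:
        if line.startswith("DO  -"):                    # DOI tag: preferred key
            return ("doi", line[6:].strip().lower())
        if line.startswith(("TI  -", "T1  -")):         # title tags: fallback key
            title = re.sub(r"[^a-z0-9 ]", "", line[6:].strip().lower())
    return ("title", title) if title else None          # None: no usable key

# Example: a record without a DOI still gets a comparable key from its title.
print(dedup_key(["TI  - A sample article title", "ER  - "]))
```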

matias.milia replied 6 years ago

Thanks a lot, I just found a way of merging the searches directly on Scopus: it is called ‘Combine queries…’ and I think it will do the trick. It lets me write quite specific query lines and then merge them before exporting. So, problem solved!
Thanks again!