I am trying to build a network between authors and cited authors from a RIS file (Scopus).
Unfortunately, the authors appear as the surname followed by a comma, then the first initial and a period (Name, P.).
But the cited authors have neither period nor comma (Name P).
How can I harmonize the way the names are written?
Thanks in advance.
There are several ways to deal with this issue (which is a classic problem). Basically, you should build a dictionary that maps the two forms of a given name to one harmonized name.
- You can extract the two lists by running list builder on the two variables. Download the two lists and merge them using LibreOffice Calc or Google Sheets, then harmonize them manually, with the name variations in the first column and the harmonized names in the second. Upload the result and index your corpus using list indexer.
- You can proceed the same way, but use a fuzzy matching tool to calculate a similarity between the name variations.
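If you prefer to prepare the table with a script rather than by hand, here is a minimal Python sketch of both options. It is only an illustration: the function names are made up for this example, the sample names are invented, and it assumes the two forms differ only by the comma and the period. The similarity threshold of 0.85 is an arbitrary starting point that you would need to tune on your own lists.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Turn 'Name, P.' into 'Name P': commas become spaces, periods are dropped."""
    without_commas = name.replace(",", " ")
    without_periods = re.sub(r"\.", "", without_commas)
    # Collapse any run of whitespace into a single space.
    return " ".join(without_periods.split())


def build_mapping(variants):
    """Dictionary: raw name variation -> harmonized name (hypothetical helper)."""
    return {variant: normalize(variant) for variant in variants}


def similar_pairs(names, threshold=0.85):
    """List pairs of distinct harmonized names that are still close to each other.

    These are candidates to check by hand (typos, transliterations, etc.).
    The threshold is an assumption, not a recommended value.
    """
    unique = sorted({normalize(n) for n in names})
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            score = SequenceMatcher(None, first.lower(), second.lower()).ratio()
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


if __name__ == "__main__":
    authors = ["Dupont, P.", "Martin, L."]   # invented sample data
    cited = ["Dupont P", "Dupond P"]         # invented sample data

    # Two columns: name variation, harmonized name.
    for variation, harmonized in build_mapping(authors + cited).items():
        print(f"{variation}\t{harmonized}")

    # Near-duplicates left after normalization, to review manually.
    for first, second, score in similar_pairs(authors + cited):
        print(f"check: {first} / {second} ({score:.2f})")
```

The first part only covers the punctuation difference described in the question. The fuzzy matching part does not merge anything by itself: it just lists close pairs so that you can decide which ones really are the same person before filling in the second column.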
See a similar question here: https://docs.cortext.net/question/replace-name-duplicats-in-a-list-of-authors/
I hope this helps.