Duplicate links in co-occurrence network

Cortext Manager Q&A forum · Category: Network mapping
Patrik Blik asked 4 days ago

Hi, I’ve created a couple of networks from Scopus data through a by now somewhat routine procedure: curation (the ‘keywords’ field needs to be split by semicolon despite the subfield-separator option specified at CSV upload) -> term extraction -> network mapping of the resulting ‘Terms’ field. I have now noticed that the downloaded gexf file has dual links between most nodes, e.g. the pair below. I leave most options on default, so which option might be causing the co-occurrence to be split between the two directions, especially with uneven weights? (If 35 co-occurs with 123 at a rate of 0.62, how can 123 co-occur with 35 at 0.33?)

<edge id="340" source="35" target="123" weight="0.617122727778">
<attvalues>
<attvalue for="0" value="low"/>
<attvalue for="1" value="123"/>
<attvalue for="2" value="35"/>
<attvalue for="3" value="0.617122727778"/>
</attvalues>
</edge>

<edge id="1226" source="123" target="35" weight="0.333208200853">
<attvalues>
<attvalue for="0" value="low"/>
<attvalue for="1" value="35"/>
<attvalue for="2" value="123"/>
<attvalue for="3" value="0.333208200853"/>
</attvalues>
</edge>

1 Answer
Lionel Staff answered 4 days ago

Dear Patrik Blik,
Thank you for using CorTexT Manager!
What may appear as duplicates in your network are actually not duplicates. This occurs because, depending on the selected parameters, the proximity measure matrix (derived from the co-occurrence matrix) is not symmetrical. There are two main reasons for this:

  • The chosen Proximity Measure, which is set by default to “Distributional” (see documentation). This is an indirect measure that compares how similar the semantic contexts of two keywords are.
  • The use of “Find the Optimal Proximity Threshold” (see documentation), which introduces a threshold that further modifies the proximity matrix.
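To illustrate how a symmetric co-occurrence count can yield two different directed weights, here is a minimal sketch. It uses conditional probability as a stand-in measure; this is an assumption chosen for simplicity and is not the formula behind the “Distributional” measure. The terms and counts are made up.

```python
# Illustrative only: conditional probability is NOT claimed to be CorTexT's
# "Distributional" measure; it is just a simple asymmetric normalisation.

# Hypothetical counts: number of documents containing each term.
occurrences = {"A": 10, "B": 40}
# Raw co-occurrence is symmetric: A with B equals B with A.
cooccurrence_ab = 8


def conditional_proximity(cooc, occ_source):
    """Share of the source term's documents that also contain the target."""
    return cooc / occ_source


a_to_b = conditional_proximity(cooccurrence_ab, occurrences["A"])
b_to_a = conditional_proximity(cooccurrence_ab, occurrences["B"])

print(f"A -> B: {a_to_b:.2f}")  # 8 of A's 10 documents also contain B
print(f"B -> A: {b_to_a:.2f}")  # 8 of B's 40 documents also contain A
```

The same pair of terms ends up with two different weights because each direction is normalised by a different term's frequency, which is the kind of asymmetry that produces two edges per node pair in the export.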

If your goal is to visualize and analyze the complete network using raw occurrence values, we suggest the following settings:

  • Set “Automatically define the Proximity Measure” to No, and choose “Proximity Measure” = Raw
  • Enable “Edges Filtering Advanced Settings”, and set “Find the Optimal Proximity Threshold” to No
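After re-running the script with these settings, you can check whether the exported file still contains pairs of opposite edges. The following is a rough sketch using only the Python standard library; the function name `reciprocal_edges` is hypothetical, and the sample string is a trimmed version of the two edges from the question rather than a complete GEXF file.

```python
import xml.etree.ElementTree as ET


def reciprocal_edges(gexf_text):
    """Return (source, target, weight, reverse_weight) for every node pair
    that is linked by edges in both directions."""
    root = ET.fromstring(gexf_text)
    weights = {}
    for elem in root.iter():
        # Drop any XML namespace prefix so the check works with or without one.
        if elem.tag.rsplit("}", 1)[-1] == "edge":
            key = (elem.get("source"), elem.get("target"))
            weights[key] = float(elem.get("weight", "1"))
    pairs = []
    for (source, target), weight in weights.items():
        # Report each pair once; ids are compared as strings.
        if source < target and (target, source) in weights:
            pairs.append((source, target, weight, weights[(target, source)]))
    return pairs


# Trimmed sample built from the two edges quoted in the question.
sample = """<gexf><graph><edges>
<edge id="340" source="35" target="123" weight="0.617122727778"/>
<edge id="1226" source="123" target="35" weight="0.333208200853"/>
</edges></graph></gexf>"""

for source, target, weight, reverse in reciprocal_edges(sample):
    print(f"{source} -> {target}: {weight}, {target} -> {source}: {reverse}")
```

For a downloaded file, read its contents into a string first (for example with `open(path, encoding="utf-8").read()`) and pass that to the function. An empty result means no node pair is linked in both directions.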

Also, why not use our RIS (Scopus) parser instead of the CSV upload?
I hope this helps!
CorTexT Team