Dear CorTexT team,

I am currently working on co-occurrence networks and, for readability, I have reduced the number of neighbours to 5. I would like to know how the threshold is calculated, because many nodes are still connected to more than 5 different nodes.

So, if for a given node A the number of co-occurrences ranges from 2 to 150, will the network only show links with nodes that co-occur between 145 and 150 times with node A? And if 15 words fall in this range, will all 15 be linked to node A?

Thank you in advance for your help,

Anne-Lise

Dear Anne-Lise,

Top N edges, Top N neighbours and Proximity Threshold are all parameters that are dependent on the chosen proximity measure.

“Number of top neighbours to consider” lets users select, for each node, its closest neighbours, where neighbours are ranked according to the selected proximity measure.

If RAW is chosen, the raw value (frequency) of the co-occurrences is used without any further calculation (useful, for example, to capture directly how intense the relations are in a co-authorship network or in a collaboration network between metropolitan areas). In an adjacency matrix based on RAW frequencies, taking the top 3 neighbours of a given node A means that only the 3 highest values among the nodes linked to A are kept (let’s say nodes B, C and D). But node A can perfectly well be in the top 3 neighbours list of other nodes of the network (let’s say nodes E and F), and so end up with more than 3 edges in total. E and F are not in the top 3 list of A, but A is in the top 3 list of E and F. By setting the “Number of top neighbours to consider”, users keep what matters locally for each node while preserving the overall network structure.
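To make this concrete, here is a minimal Python sketch of a per-node top-N selection on raw counts. It is only an illustration of the idea described above, not CorTexT's actual code: the data, the function names `top_n_neighbours` and `undirected_edges`, and the tie handling (ties are broken by insertion order here; how CorTexT treats ties is not covered in this thread) are all assumptions.

```python
# Toy symmetric co-occurrence counts (hypothetical data, not from CorTexT).
cooc = {
    "A": {"B": 150, "C": 148, "D": 145, "E": 10, "F": 8},
    "B": {"A": 150, "C": 90, "D": 80},
    "C": {"A": 148, "B": 90, "D": 70},
    "D": {"A": 145, "B": 80, "C": 70},
    "E": {"A": 10, "F": 3},
    "F": {"A": 8, "E": 3},
}


def top_n_neighbours(cooc, n):
    """For each node, keep its n neighbours with the highest raw count."""
    return {
        node: sorted(weights, key=weights.get, reverse=True)[:n]
        for node, weights in cooc.items()
    }


def undirected_edges(top):
    """An edge is kept if at least one endpoint selected the other."""
    return {frozenset((u, v)) for u, nbrs in top.items() for v in nbrs}


top = top_n_neighbours(cooc, 3)
edges = undirected_edges(top)
degree_of_a = sum(1 for edge in edges if "A" in edge)

print(top["A"])      # the 3 neighbours selected by A
print(degree_of_a)   # number of edges that touch A in the final network
```

In this toy case A selects B, C and D, but E and F each select A as well, so A ends up with 5 edges even though N is 3.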

I hope it helps,

L

Dear Lionel,

Thank you for your very clear answer. I will easily be able to transpose it to my case, where the proximity measure is distributional.

I have another question related to this one. To analyse my network in detail (in order to publish my results), I switched to Gephi to access all the interesting statistics (betweenness, etc.). If I understood correctly, CorTexT produces a directed graph (since there are in- and out-degrees). But my case is about co-occurrence of keywords in an academic corpus, and I don’t really understand how a co-occurrence can have an orientation. Could you explain how CorTexT decides whether a link is inbound or outbound in my case?

I can send you the details of the settings I made (and even the keywords extraction) if that would be helpful.

Thanks again for your help and for this incredible platform that is CorTexT!

Anne-Lise

There could be different reasons, depending on the chosen parameters.

But in your case I feel that the answer is the same as above 🙂

As soon as you select a top N of edges for each node, you introduce a kind of directionality, with an in-degree (when A is in the top N of other nodes) and an out-degree (for the nodes that are in the top N of A).
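A small Python sketch of how such a selection yields in- and out-degrees, under the assumption that each edge points from the node doing the selecting to the neighbour it selected. The `top` lists are hypothetical, and the edge orientation used in the exported file is an assumption, not something confirmed in this thread.

```python
from collections import Counter

# Hypothetical top-2 selections: node -> the neighbours it selected.
top = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "E": ["A", "F"],
    "F": ["A", "E"],
}

# One directed edge per selection: (selecting node, selected neighbour).
directed = [(src, dst) for src, nbrs in top.items() for dst in nbrs]

out_degree = Counter(src for src, _ in directed)  # how many nodes A selected
in_degree = Counter(dst for _, dst in directed)   # how many nodes selected A

print(out_degree["A"], in_degree["A"])
```

Here A has an out-degree of 2 (it selected B and C) and an in-degree of 4 (B, C, E and F all selected it), even though the underlying co-occurrence counts are symmetric.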

L
