I would like to analyze a corpus from the WOS based on keyword co-occurrences (concatenated Keywords and ISIID fields), the goal being to divide the corpus into themes and sub-themes according to clusters.
I have several questions about how references are distributed across clusters in network mapping. Each reference is associated with a cluster, but I would like to reduce the number of clusters (to 15, then to between 5 and 10) so as to:
– have nested clusters, if possible, when their number is reduced in a second analysis
– obtain roughly homogeneous cluster sizes (avoid clusters with few nodes)
– obtain a thematically coherent distribution of the references
– avoid, as much as possible, references without a cluster.
I’ve used the distributional proximity measure and modified the number of nodes to vary the result.
Can I play with the parameters size community threshold (1 for now) and proximity threshold (0 for now) to get the desired result, and what is their exact role?
Should I keep the Louvain community detection algorithm, or rather use clique percolation to exclude nodes with few connections?
And how are the cluster names defined (a priori, they are not the first 2 items of each cluster)?
Thank you in advance for your answers
Have a nice day
Yes, exactly, you may want to play with these parameters:
- Network Analysis and layout > Louvain resolution script, with the Parameter resolution value, which ranges from 0.1 to 4.9. The default value is 1, at which the balance between linkages inside clusters and linkages between clusters is optimal. Resolution is a parameter of the Louvain community detection algorithm that affects the size of the recovered clusters. Smaller resolutions recover smaller clusters, and therefore more clusters; larger values recover clusters containing more nodes, and therefore fewer clusters. In CorText Manager, the resolution parameter does not affect the overall structure of the linkages between nodes.
- Edges > Edges filtering advanced settings > Find the Optimal Proximity Threshold and Proximity Threshold. With Find the Optimal Proximity Threshold set to yes, CorText Manager uses a strategy to determine a threshold where: just before, the network is divided into subcomponents (unlinked partitions of the graph) and, just after, the nodes of the network are more widely linked to the other nodes. So this strategy tends to minimize the number of links while preserving the overall connectedness of the network. If you want to change this behaviour, you can set Find the Optimal Proximity Threshold to no and play with the Proximity Threshold parameter (from 0 to 1) and the other filters (Number of top edges to consider and Number of top neighbours to consider).
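To make the idea behind the optimal proximity threshold more concrete, here is a minimal sketch in plain Python. It is not CorText Manager's actual code: it assumes the strategy amounts to picking the highest proximity value at which all nodes still belong to a single connected component. The function name `optimal_threshold` and the toy edge list are hypothetical.

```python
def optimal_threshold(nodes, edges):
    """Return the highest weight t such that keeping only edges with
    weight >= t still leaves every node in one connected component.
    Returns None if the nodes never end up in a single component.
    (Assumed interpretation of the strategy, not CorText's own code.)"""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = len(parent)
    # Add edges from strongest to weakest; the edge that merges the last
    # two components gives the threshold we are looking for.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
            if components == 1:
                return w
    return None


# Toy network: (node, node, proximity between 0 and 1).
nodes = ["a", "b", "c", "d", "e"]
edges = [
    ("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),
    ("d", "e", 0.6), ("c", "d", 0.3), ("a", "e", 0.1),
]

threshold = optimal_threshold(nodes, edges)
kept = [e for e in edges if e[2] >= threshold]
print(threshold)  # 0.3
print(len(kept))  # 5
```

In this toy case, any threshold above 0.3 would cut the link between c and d and split the network in two, while the weakest edge (0.1) can be dropped without disconnecting anything. Setting the Proximity Threshold by hand corresponds to replacing `threshold` with your own value.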
The cluster naming strategy is in fact simple: each cluster is named after its 2 nodes with the highest centrality, where CorText Manager measures the centrality of nodes in terms of Degree (number of distinct links). The name of each cluster is the combination of its two selected nodes.
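As an illustration of this naming rule, here is a small sketch in plain Python. Several details are assumptions rather than documented behaviour: degree is counted over the whole network (not only inside the cluster), ties are broken alphabetically, and the two labels are joined with " & ". The function `name_clusters` and the example data are hypothetical.

```python
from collections import defaultdict


def name_clusters(edges, membership):
    """Name each cluster after its two highest-degree nodes.
    edges: iterable of (node, node) pairs.
    membership: dict mapping node -> cluster id."""
    # Degree = number of distinct neighbours, so use a set per node.
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)

    by_cluster = defaultdict(list)
    for node, cluster in membership.items():
        by_cluster[cluster].append(node)

    names = {}
    for cluster, members in by_cluster.items():
        # Highest degree first; alphabetical order on ties (assumption).
        ranked = sorted(members, key=lambda n: (-len(neighbours[n]), n))
        names[cluster] = " & ".join(ranked[:2])  # separator is an assumption
    return names


edges = [
    ("graphene", "oxide"), ("graphene", "sensor"), ("graphene", "film"),
    ("graphene", "policy"), ("oxide", "sensor"), ("oxide", "policy"),
    ("policy", "energy"), ("policy", "market"), ("energy", "market"),
    ("energy", "film"),
]
membership = {
    "graphene": 0, "oxide": 0, "sensor": 0, "film": 0,
    "policy": 1, "energy": 1, "market": 1,
}

names = name_clusters(edges, membership)
print(names)
```

This also shows why the cluster names are not simply the first 2 items listed in each cluster: the order depends on how many distinct links each node has, not on its position in the list.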
I hope it helps!