Edges Definition

This page describes the parameters that control the edges of the maps produced by the mapping script.

Proximity measure

Once you have defined the fields of inquiry along with the correct time periods, co-occurrence networks should be transformed according to a given proximity measure. One can choose between direct measures (chi2, mutual information, raw, cramer) and indirect measures (cosine, distributional). Direct measures only take into account the raw number of co-occurrences between two nodes, while indirect measures account for the global distribution of co-occurrences of the two target nodes with all the other nodes.
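
The difference between the two families can be illustrated with a minimal sketch. It is not the script's own code: the function names, the toy co-occurrence table and the choice to leave the two target nodes out of their own profiles are all assumptions made for illustration. The raw measure stands in for the direct family and the cosine for the indirect one.

```python
import math

# Hypothetical symmetric co-occurrence counts: cooc[a][b] = number of joint appearances.
cooc = {
    "gene":    {"cell": 4, "dna": 2},
    "protein": {"cell": 2, "dna": 1},
    "cell":    {"gene": 4, "protein": 2},
    "dna":     {"gene": 2, "protein": 1},
}

def raw_proximity(cooc, a, b):
    """Direct measure: only the co-occurrence count between a and b matters."""
    return cooc[a].get(b, 0)

def cosine_proximity(cooc, a, b):
    """Indirect measure: compare how a and b co-occur with all the other nodes."""
    others = [n for n in cooc if n not in (a, b)]  # assumption: a and b are excluded
    va = [cooc[a].get(n, 0) for n in others]
    vb = [cooc[b].get(n, 0) for n in others]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

direct = raw_proximity(cooc, "gene", "protein")
indirect = cosine_proximity(cooc, "gene", "protein")
```

In this toy table "gene" and "protein" never appear together, so the direct proximity is zero, yet their profiles over "cell" and "dna" are proportional, so the indirect proximity is maximal.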

A standard strategy is to choose a direct measure such as chi2 for heterogeneous networks and an indirect measure such as distributional (or, more classically, cosine) for homogeneous networks. This is the default behavior of the script if you don’t specify any metric at this step.

The raw measure is also useful when one does not want to alter the original co-occurrence statistics, for example when plotting a collaboration network.

Heterogeneous measures (with the “het” suffix) allow the production of affiliation networks made of nodes from the first field only, whose proximity is computed by comparing their profile according to the second field. The scalar product or the cosine measure is then used to provide networks such as authors connected when sharing the same research interests.
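
As a rough sketch of the idea, assume the first field is authors and the second is keywords. Each author is described by a keyword profile, and two authors are linked by the cosine of their profiles. The data, the function name and the decision to drop zero-weight pairs are hypothetical and do not reproduce the script's internals.

```python
import math
from itertools import combinations

# Hypothetical profiles: how often each author (first field) uses each keyword (second field).
profiles = {
    "alice": {"networks": 3, "ecology": 1},
    "bob":   {"networks": 2},
    "carol": {"genomics": 5},
}

def affiliation_network(profiles):
    """Link first-field nodes by the cosine of their second-field profiles."""
    def norm(vec):
        return math.sqrt(sum(x * x for x in vec.values()))

    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        # Scalar product over the keywords the two authors share.
        dot = sum(w * profiles[b].get(k, 0) for k, w in profiles[a].items())
        if dot > 0:  # assumption: pairs with nothing in common get no edge
            edges[(a, b)] = dot / (norm(profiles[a]) * norm(profiles[b]))
    return edges

author_edges = affiliation_network(profiles)
```

Only "alice" and "bob" end up connected here, since "carol" shares no keyword with either of them. The resulting network contains authors only, even though keywords were used to compute it.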

Metrics explained

A more thorough and technical explanation of the metrics is available on the metrics page.

Network Filtering

An important step when producing a network map is to carefully define the filtering steps. Edges of the weighted network (the weight being equal to proximities between nodes) should be filtered in order to get rid of insignificant edges that may make it difficult for users to visualize its most significant features. Three possibilities are offered: (i) exclude any link whose weight is below the distance threshold, (ii) only select the N most significant edges (top edges according to their weight), (iii) only select the N edges connected to the closest neighbours of each node in the network (top neighbours).
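
The three filtering options can be sketched as follows, assuming edges are stored as a dictionary mapping node pairs to weights. The function names are hypothetical. For top neighbours, the sketch assumes an edge is kept as soon as one of its two endpoints ranks it among its N strongest links; the script may apply a different rule.

```python
# Hypothetical weighted edges: (node, node) -> proximity.
edges = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2, ("c", "d"): 0.1}

def filter_by_threshold(edges, threshold):
    """(i) Drop every edge whose weight is below the threshold."""
    return {e: w for e, w in edges.items() if w >= threshold}

def filter_top_edges(edges, n):
    """(ii) Keep the n heaviest edges of the whole network."""
    ranked = sorted(edges.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])

def filter_top_neighbours(edges, n):
    """(iii) For each node, keep the edges to its n closest neighbours."""
    incident = {}
    for (a, b), w in edges.items():
        incident.setdefault(a, []).append(((a, b), w))
        incident.setdefault(b, []).append(((a, b), w))
    kept = {}
    for node_edges in incident.values():
        ranked = sorted(node_edges, key=lambda kv: kv[1], reverse=True)
        for edge, w in ranked[:n]:
            kept[edge] = w  # kept if at least one endpoint selects it
    return kept

by_threshold = filter_by_threshold(edges, 0.5)
top_edges = filter_top_edges(edges, 1)
top_neighbours = filter_top_neighbours(edges, 1)
```

On this toy network the threshold and top edges filters both isolate node "d", whereas top neighbours keeps its only link because every node retains its strongest connection.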

Although users are free to define their own filtering strategy, a simple rule of thumb is to use top neighbours filtering with indirect measures, and distance threshold or top edges filtering with direct measures. One should also keep in mind that direct measures cannot be below 0 but have no upper bound (except for the cramer measure, which is always below 1), whereas indirect proximity measures lie between 0 and 1.

Advanced Settings

When two different fields have been selected, the heterogeneous edges option computes not only links between nodes in each field but also links between nodes falling in the same category.

Color Edges – Edges are colored according to the color of the node.

Only take “short range” co-occurrences – This parameter allows you to control the range over which a co-occurrence is valid. It is especially useful when producing semantic maps of long documents. By default, every joint appearance of two words in a record is considered a co-occurrence event, even if the two words are very far apart. If one activates the short-range option, it becomes possible to define the maximum number of sentences that may separate two words for their co-occurrence to be considered significant. Typically, 5 to 10 sentences is a reasonable value. Additionally, use the context decay speed if you wish to weight each co-occurrence by a factor inversely proportional to the distance (still in number of sentences) between the two words.
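
A minimal sketch of the short-range idea is given below, assuming each record is already split into sentences and each sentence reduced to a set of terms. The function name, the parameters and the decay form 1 / (1 + d) are assumptions. The text only says the factor is inversely proportional to the distance d, so the +1 is added here to keep the weight defined when both words share a sentence.

```python
from collections import defaultdict
from itertools import combinations

def short_range_cooccurrences(sentences, max_distance, decay=False):
    """Count co-occurrences of terms at most max_distance sentences apart.

    sentences: list of sets of terms, one set per sentence, in reading order.
    decay: if True, weight each event by 1 / (1 + d) (assumed decay form).
    """
    positions = defaultdict(list)
    for index, terms in enumerate(sentences):
        for term in terms:
            positions[term].append(index)

    weights = defaultdict(float)
    for a, b in combinations(sorted(positions), 2):
        for i in positions[a]:
            for j in positions[b]:
                d = abs(i - j)  # distance in number of sentences
                if d <= max_distance:
                    weights[(a, b)] += 1.0 / (1 + d) if decay else 1.0
    return dict(weights)

# Hypothetical record of five sentences; two of them contain no indexed term.
record = [{"network", "map"}, {"edge"}, set(), set(), {"cluster"}]
plain = short_range_cooccurrences(record, max_distance=2)
decayed = short_range_cooccurrences(record, max_distance=2, decay=True)
```

With a maximum distance of 2 sentences, "cluster" is too far from every other term to be linked to anything. With decay enabled, terms sharing a sentence keep their full weight while terms one sentence apart count for half.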

Democratic – This last option is still experimental. By default, for each document, a group is built connecting every pair of entities. For instance, an author co-occurrence matrix will index the collaboration between each pair of co-authors. This seems a reasonable assumption, but one may prefer not to give too much importance to very large events gathering dozens or even hundreds of authors compared to smaller events. Put differently, by default, two people collaborating in a very large team have the same weight in the final map as two people working on their own. This option alleviates this effect and guarantees that each article has the same contribution to the final map.
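
One way to picture the effect is the sketch below. It assumes that the democratic option spreads a total weight of 1 per document evenly over its pairs; this normalisation is a guess made for illustration, not the documented behavior of the script, and the function name and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_weights(documents, democratic=False):
    """Accumulate pair weights over documents given as lists of entities."""
    weights = defaultdict(float)
    for entities in documents:
        pairs = list(combinations(sorted(set(entities)), 2))
        if not pairs:
            continue  # a single-entity document produces no edge
        # Assumed normalisation: each document shares a total weight of 1.
        share = 1.0 / len(pairs) if democratic else 1.0
        for pair in pairs:
            weights[pair] += share
    return dict(weights)

# Hypothetical corpus: one two-author article and one four-author article.
documents = [["ann", "bob"], ["cat", "dan", "eve", "fay"]]
default_weights = cooccurrence_weights(documents)
democratic_weights = cooccurrence_weights(documents, democratic=True)
large_team_total = sum(w for pair, w in democratic_weights.items() if "cat" in pair or "dan" in pair or "eve" in pair or "fay" in pair)
```

By default the four-author article contributes six edges of weight 1, six times more than the two-author article. Under the assumed normalisation, each of its six edges weighs 1/6, so both articles contribute the same total.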
