I am wondering how the “project records onto clusters” option is computed (I would like to run some analysis on the list of records, together with the clusters each record belongs to). More precisely:
- When I choose not to assign a unique cluster to each record, I can see in the corpus explorer data frame that each record has its own list of clusters. My guess is that Cortext takes the list of nodes corresponding to a given record and swaps them for the names of the clusters each node belongs to. For example, say I have a node called “Bobby” in the record “text_1”, and the network mapping results assign “Bobby” to the cluster “A”: then “Bobby” would turn into “A” in the list of non-unique clusters assigned to “text_1”. Is this how it works? (Sorry, my example is a bit peculiar.)
- Then, if I choose to assign a unique cluster to each record, how is this cluster chosen? My guess is that Cortext keeps the most frequent cluster, based on the list of nodes (and thus the clusters they belong to).
Thanks a lot again for everything!
Hello. Your assumptions are incorrect. From the documentation:
Project records onto clusters
By default, once the cluster structure of the map has been determined, every article is matched against each cluster composition to assess how close their content are. A document may then be assigned to zero or several clusters at once. Additionally, a new table (whose name starts with “projection_cluster” followed by the field name) will be created in the database. This new table can be used as a new field for further analysis.
Assign a unique cluster to each record (best match)
This option can be activated to limit the maximum number of clusters assigned to each document to 1. The most similar cluster to a given document is then selected, as long as this similarity is higher than a predetermined threshold (meaning that some documents may still stay unassigned).
In other words, there is a measure of content similarity between a cluster and a record, whose details are not documented, and clusters are attributed to each record on that basis. With the unique option, the record simply gets the cluster with the highest similarity, provided that similarity is above the threshold; otherwise the record stays unassigned.
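To make the difference with a node-by-node swap concrete, here is a minimal Python sketch of a similarity-based projection. It is only an illustration: since the actual measure is not documented, Jaccard overlap between the record's terms and the cluster's terms is used as a stand-in, and the function names, the threshold value of 0.2 and the toy data are all assumptions of mine, not Cortext's implementation.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity (assumption): shared terms / all terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def project_record(
    record_terms: Set[str],
    clusters: Dict[str, Set[str]],
    threshold: float = 0.2,  # hypothetical value, the real threshold is not documented
    unique: bool = False,
) -> List[str]:
    """Return the clusters whose similarity to the record exceeds the threshold.

    With unique=True, keep only the best match; the result may still be empty.
    """
    scores = {name: jaccard(record_terms, terms) for name, terms in clusters.items()}
    matches = sorted(
        (name for name, score in scores.items() if score > threshold),
        key=lambda name: scores[name],
        reverse=True,
    )
    return matches[:1] if unique else matches


# Toy data (hypothetical): two clusters and two records.
clusters = {"A": {"bobby", "alice", "carol"}, "B": {"dave", "erin"}}
text_1 = {"bobby", "alice", "dave"}
text_2 = {"zoe"}

print(project_record(text_1, clusters))               # several clusters possible
print(project_record(text_1, clusters, unique=True))  # best match only
print(project_record(text_2, clusters, unique=True))  # may stay unassigned
```

The point of the sketch is that the record is compared with each cluster as a whole, so a record can end up with zero, one or several clusters regardless of how many of its nodes fall in each one.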