Cooc_deviation scale

Amy Weissenbach asked 4 months ago

Hi, I am using cooc_deviation to shade heat maps, and I’m a bit confused about how to interpret the scale. The documentation suggests that a cooc_deviation measure of “2” means that a term’s number of citations in a document type is “twice what should be expected” if terms were uniformly distributed across document types. But in that case, how to make sense of cooc_deviation scores between -1 and 1?
I have a corpus of letters, and I’m trying to create a heat map to look at what people say in typed letters as compared to handwritten letters. The scale on my heat map goes from -0.2 to 0.2. I’m guessing that this means that for terms shaded at 0.2, they appear 20% more often in typed letters than one would expect if terms were uniformly distributed across typed and handwritten letters? (And that if the measure were 2.0, then this would actually indicate that the term appeared 200% more often, not 200% as often…?) If I’m wrong, what does a cooc_deviation score of 0.2 mean?
Thanks very much!

1 Answer
Lionel Staff answered 3 months ago

Dear Amy,

For heat maps, keep in mind that the hot and cold zones are built from the association between two (or even three) variables: on one side, the one or two variables plotted in the network map (keywords in your case, if I understand correctly), and on the other side, the categories (typed letters and handwritten letters). So you should define which category is used for the heat map with the "value of the field you wish to plot the heatmap of" parameter.
So, in the end, you are measuring the deviation of an association (a co-occurrence in the broad sense) between a keyword (shown in the network) and the type of letter, within your documents (letters of all types).
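To make the idea of a baseline concrete, here is a minimal sketch of how an expected co-occurrence count could be computed if keywords were spread evenly across letter types. This is the standard independence expectation, used here as an assumption. It is not taken from the CorTexT source, and the function name and the corpus figures are hypothetical.

```python
def expected_cooccurrences(n_keyword_docs, n_category_docs, n_total_docs):
    """Expected number of documents containing both the keyword and the
    category, assuming the keyword is spread evenly across categories.

    Assumption: plain independence baseline, not necessarily the exact
    baseline used by cooc_deviation.
    """
    if n_total_docs <= 0:
        raise ValueError("n_total_docs must be positive")
    return n_keyword_docs * n_category_docs / n_total_docs


# Hypothetical corpus: 1000 letters, 400 of them typed,
# and a keyword that appears in 50 letters overall.
expected_typed = expected_cooccurrences(50, 400, 1000)
print(expected_typed)  # 20.0
```

The deviation then compares the number of typed letters that actually contain the keyword with this expected value.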
Positive and negative cooc_deviation values follow the same logic, only reversed for negative values. The trick is that the measure is fully symmetrical: large deviations are measured in the same way whether the association is over-represented or under-represented.

  • If you find -1, it means that you would need 100% more co-occurrences than observed to reach the baseline (the baseline is twice the observed count)
  • If you find -2, it means that you would need 200% more co-occurrences than observed to reach the baseline (the baseline is three times the observed count)
  • If you find +2, it means that there are 200% more co-occurrences than the baseline predicts (the observed count is three times the baseline)
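
The three cases above can be sketched as a small function. This is only one reading of the description in this answer, written to reproduce those three values. It is an assumption, not the confirmed cooc_deviation formula, and the name symmetric_deviation is hypothetical.

```python
import math


def symmetric_deviation(observed, expected):
    """Signed, symmetric deviation of an observed count from a baseline.

    Assumption: this follows the verbal description in the answer above,
    not a formula taken from the tool itself.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed < 0:
        raise ValueError("observed must not be negative")
    if observed >= expected:
        # Over-representation: share of extra co-occurrences relative to the baseline.
        return observed / expected - 1.0
    if observed == 0:
        # No co-occurrence at all: the baseline can never be reached by scaling.
        return -math.inf
    # Under-representation: share of extra co-occurrences needed to reach the baseline.
    return -(expected / observed - 1.0)


print(symmetric_deviation(60, 20))  # 2.0: three times the baseline
print(symmetric_deviation(10, 20))  # -1.0: the baseline is twice the observed count
print(symmetric_deviation(20, 60))  # -2.0: the baseline is three times the observed count
```

Under this reading, a score of about 0.2 (for example 24 observed against 20 expected) would mean roughly 20% more co-occurrences than the baseline, and a score of about -0.2 would mean that roughly 20% more co-occurrences than observed are needed to reach it.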

I hope it helps,