The profiling script offers analytical capabilities very similar to those of contingency analysis. Given two fields of interest, it takes a target entity in the first field and produces a visualization of how strongly the entities of the second field are over- or under-represented in the documents tagged with this target entity.
For instance, one may want to visualize the full profile of one source with respect to the year of publication, the words used in its documents, or any other kind of metadata attached to individual documents.
Two measures of the salience of entity i (from field 1) toward entity j (from field 2) are available. Consider a simple case: measuring which delegations, among all the countries participating in the Conference of the Parties, have been mentioning issues related to adaptation (identified through prior topic modeling) in the ENB corpus. Of course, we know that certain countries are more vocal than others, so we want our measure to be normalized. In practice, we evaluate whether the proportion of times Tuvalu talks about adaptation (compared to any other topic) is higher or lower than the average value across all countries, either including Tuvalu (the one-versus-all option) or excluding it (the one-versus-the-rest option). This deviation is measured as follows. We compare the empirical number of mentions of topic j by country i with its expected value, which is the product of the proportion of times i is mentioned and the raw number of times topic j is mentioned. We then compute a deviation rate that is symmetric, meaning that values can range from minus infinity to plus infinity.
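As a rough illustration of the computation described above, here is a minimal Python sketch. The function name `deviation`, the `counts` dictionary, and the toy numbers are hypothetical. The use of a base-2 log ratio as the symmetric deviation rate is also an assumption made for the example; the script's actual formula may differ.

```python
import math

def deviation(counts, i, j, one_vs_rest=False):
    """Symmetric deviation of entity i (field 1) toward entity j (field 2).

    counts maps (entity_i, entity_j) pairs to co-occurrence counts.
    The log2(observed / expected) form is an assumption of this sketch.
    """
    n_ij = counts.get((i, j), 0)                             # observed mentions of j by i
    n_i = sum(c for (a, _), c in counts.items() if a == i)   # all mentions by i
    n_j = sum(c for (_, b), c in counts.items() if b == j)   # all mentions of j
    total = sum(counts.values())

    if one_vs_rest:
        # Baseline built from every entity except i (one versus the rest).
        rest_total = total - n_i
        expected = n_i * (n_j - n_ij) / rest_total if rest_total else 0.0
    else:
        # Baseline built from every entity, i included (one versus all):
        # proportion of mentions by i times the raw count of j.
        expected = n_i * n_j / total if total else 0.0

    if n_ij == 0 and expected == 0:
        return 0.0            # nothing observed, nothing expected
    if n_ij == 0:
        return -math.inf      # never observed although expected
    if expected == 0:
        return math.inf       # observed although never expected
    return math.log2(n_ij / expected)

# Made-up counts, for illustration only (not taken from the ENB corpus).
counts = {
    ("Tuvalu", "adaptation"): 30,
    ("Tuvalu", "mitigation"): 10,
    ("USA", "adaptation"): 20,
    ("USA", "mitigation"): 140,
}

print(deviation(counts, "Tuvalu", "adaptation"))                    # one versus all
print(deviation(counts, "Tuvalu", "adaptation", one_vs_rest=True))  # one versus the rest
print(deviation(counts, "USA", "adaptation"))
```

With these toy counts, Tuvalu gets a positive score for adaptation and the USA a negative one. The one-versus-the-rest score for Tuvalu is larger than the one-versus-all score, because Tuvalu's own mentions no longer inflate the baseline.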
The typical output is a histogram, as shown above.