I have run the demography script to explore the temporal distribution of the topics I extracted with the ‘Topic modeling’ script. I now have some doubts about how to interpret their evolution over time. How is the occurrence of a topic (as a group of words) calculated over time? In other words, what does the y-axis indicate when the variables used are topics rather than single terms?
It is the number of documents associated with each topic in each period. Keep in mind, however, that this is a raw count of documents.
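A raw count of this kind can be sketched as follows. This is only an illustration of the counting principle described above, not the demography script's actual code. The toy corpus, the topic labels `energy` and `policy`, and the helper `raw_topic_counts` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each document has a period and the list of
# topic labels assigned to it (names are illustrative only).
documents = [
    {"period": 2018, "topics": ["energy", "policy"]},
    {"period": 2018, "topics": ["energy"]},
    {"period": 2019, "topics": ["policy"]},
    {"period": 2019, "topics": ["energy", "policy"]},
]


def raw_topic_counts(docs):
    """Return {period: Counter(topic -> number of documents with that topic)}."""
    counts = defaultdict(Counter)
    for doc in docs:
        # set() ensures a topic listed twice for one document counts once
        for topic in set(doc["topics"]):
            counts[doc["period"]][topic] += 1
    return counts


counts = raw_topic_counts(documents)
for period in sorted(counts):
    print(period, dict(sorted(counts[period].items())))
```

Note that in 2018 the topic counts add up to 3 although there are only 2 documents. A document carrying several topics is counted once under each of them.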
So there are two issues when using topic names (from the Topic modeling script) with the demography script:
- Each document may have more than one topic, so the same document is counted under each of its topics;
- The topics assigned to a document do not all represent its content with the same intensity. Some topics may be strongly present in a document while others are marginal. The demography script therefore does not show the real evolution of a topic's importance in the content of the documents.
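To see why the second issue matters, here is a minimal sketch that compares a raw document count with a sum of per-document topic weights. It assumes the topic model gives each document a weight (for example, a proportion) for each topic. The weights, the `threshold` parameter, and the helper `raw_and_weighted` are hypothetical and not taken from either script.

```python
from collections import defaultdict

# Hypothetical document-topic weights, e.g. proportions from a topic model.
documents = [
    {"period": 2018, "weights": {"energy": 0.9, "policy": 0.1}},
    {"period": 2018, "weights": {"energy": 0.8, "policy": 0.2}},
    {"period": 2019, "weights": {"energy": 0.1, "policy": 0.9}},
    {"period": 2019, "weights": {"energy": 0.2, "policy": 0.8}},
]


def raw_and_weighted(docs, threshold=0.0):
    """Return (raw document counts, summed topic weights) per period and topic.

    A topic is considered present in a document when its weight exceeds
    `threshold` (a hypothetical cut-off chosen for illustration).
    """
    raw = defaultdict(lambda: defaultdict(int))
    weighted = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        for topic, weight in doc["weights"].items():
            if weight > threshold:
                raw[doc["period"]][topic] += 1          # one per document
                weighted[doc["period"]][topic] += weight  # intensity-aware
    return raw, weighted


raw, weighted = raw_and_weighted(documents)
for period in sorted(raw):
    for topic in sorted(raw[period]):
        print(period, topic, raw[period][topic], round(weighted[period][topic], 2))
```

Here the raw count stays at 2 for both topics in both periods, so a demography-style curve would look flat, while the summed weights show a clear shift from `energy` to `policy`.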
I hope this helps.