Topic modeling – how to find out the list of topics?

mhrousseau asked 1 month ago

I am currently running a topic modeling script on a set of data, and we have limited the script to 40 topics. We can see the key terms assigned to each topic; however, we are not able to see a list of the actual topics that the script has created. Is there a way to see this in the script? Thank you!

Jean-Philippe Cointet Staff replied 1 month ago

The list of topics is shown in the visualization (click the eye icon next to the HTML file): https://documents.cortext.net/6a78/6a78e74d413fb0ee103d9234d88f50a0/97517/vislda.html#topic=1&lambda=1&term=

The top 10 most important words for each topic are also shown in the log (just click the green flag), as in the example below:
2018-11-08 18:40:02 INFO : topic #0 (0.100): 0.046*"cop" + 0.034*"item" + 0.033*"agenda" + 0.026*"chair" + 0.022*"sbsta" + 0.021*"presid" + 0.021*"report" + 0.018*"consult" + 0.015*"matter" + 0.014*"plenari"
2018-11-08 18:40:02 INFO : topic #1 (0.100): 0.060*"parti" + 0.030*"inform" + 0.022*"develop" + 0.022*"communic" + 0.019*"gef" + 0.019*"nation" + 0.019*"paragraph" + 0.018*"report" + 0.018*"text" + 0.017*"countri"
2018-11-08 18:40:02 INFO : topic #2 (0.100): 0.033*"propos" + 0.020*"parti" + 0.019*"eu" + 0.019*"articl" + 0.018*"text" + 0.017*"said" + 0.015*"issu" + 0.015*"support" + 0.014*"us" + 0.014*"group"
2018-11-08 18:40:02 INFO : topic #3 (0.100): 0.031*"group" + 0.022*"deleg" + 0.020*"inform" + 0.020*"text" + 0.018*"consult" + 0.017*"issu" + 0.016*"negoti" + 0.015*"discuss" + 0.014*"work" + 0.012*"contact"
2018-11-08 18:40:02 INFO : topic #4 (0.100): 0.042*"cdm" + 0.026*"project" + 0.017*"activ" + 0.016*"mechan" + 0.012*"board" + 0.012*"fund" + 0.012*"said" + 0.011*"lulucf" + 0.011*"ji" + 0.010*"parti"
2018-11-08 18:40:02 INFO : topic #5 (0.100): 0.038*"develop" + 0.036*"countri" + 0.027*"climat" + 0.016*"said" + 0.016*"chang" + 0.014*"action" + 0.013*"need" + 0.013*"technolog" + 0.012*"call" + 0.011*"mitig"
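To turn such a log into a plain list of topics (one line per topic with its top words), a small parser can help. The Python sketch below assumes every topic line follows the format of the example above (`topic #N (...): weight*"word" + ...`); the function `parse_topics` and the regular expressions are hypothetical helpers, not part of CorText. It accepts both straight and curly quotes, so lines copied from this page also work.

```python
import re

# One topic line, as in the example log: 'topic #0 (0.100): 0.046*"cop" + ...'
TOPIC_LINE = re.compile(r'topic #(\d+) \([^)]*\): (.*)$')
# One weighted term; accepts straight or curly quotes around the word
TERM = re.compile(r'([\d.]+)\*["“”]([^"“”]+)["“”]')


def parse_topics(log_lines):
    """Return {topic_id: [(word, weight), ...]} for every topic line found."""
    topics = {}
    for line in log_lines:
        match = TOPIC_LINE.search(line)
        if match is None:
            continue  # not a topic line: skip it
        terms = [(word, float(weight)) for weight, word in TERM.findall(match.group(2))]
        topics[int(match.group(1))] = terms
    return topics


# Two lines from the example log, shortened to their first terms
log = [
    '2018-11-08 18:40:02 INFO : topic #0 (0.100): 0.046*"cop" + 0.034*"item" + 0.033*"agenda"',
    '2018-11-08 18:40:02 INFO : topic #1 (0.100): 0.060*"parti" + 0.030*"inform"',
]
for topic_id, terms in sorted(parse_topics(log).items()):
    print(f"Topic {topic_id}: {', '.join(word for word, _ in terms)}")
# Topic 0: cop, item, agenda
# Topic 1: parti, inform
```

Run over a full log, this yields one entry per topic labeled by its top words; the topics themselves carry no names, so giving each one a human-readable label remains a manual step.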

Finally, two variables whose names start with “projection_cluster_LDA” show which topics were assigned to each document, and with what intensity. By default, only the topics whose prevalence in a document is higher than a threshold Th are stored, Th being the inverse of the number of topics (for example, Th = 1/40 = 0.025 with 40 topics).
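The threshold rule can be illustrated with a short Python sketch. It assumes a document's topic prevalences are available as a dictionary `{topic_id: prevalence}`; this input format and the function names `assigned_topics` and `documents_per_topic` are hypothetical and do not describe how CorText actually stores the projection_cluster_LDA variables. The prevalence values in the example are made up for illustration.

```python
from collections import Counter


def assigned_topics(topic_weights, num_topics):
    """Topics whose prevalence in one document is higher than Th = 1 / num_topics.

    topic_weights: hypothetical {topic_id: prevalence} mapping for a single document.
    Returns (topic_id, prevalence) pairs, strongest topic first.
    """
    threshold = 1.0 / num_topics
    kept = [(t, w) for t, w in topic_weights.items() if w > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def documents_per_topic(corpus_weights, num_topics):
    """Count, for each topic, the documents in which it passes the threshold."""
    counts = Counter()
    for topic_weights in corpus_weights:
        for topic_id, _ in assigned_topics(topic_weights, num_topics):
            counts[topic_id] += 1
    return counts


# Made-up prevalences for two documents in a 40-topic model (Th = 0.025)
doc_a = {0: 0.60, 3: 0.30, 7: 0.02, 12: 0.08}  # topic 7 falls below Th
doc_b = {0: 0.50, 5: 0.50}
print(assigned_topics(doc_a, 40))  # [(0, 0.6), (3, 0.3), (12, 0.08)]
print(documents_per_topic([doc_a, doc_b], 40))
```

Counting documents per topic in this way gives one rough overview of how widely each topic is spread across the corpus.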

Hope it’s clearer this way!
