orianabras asked 7 years ago

Dear colleagues, I am working with a database of publications. I would like to know whether top research institutions publish certain types of publications (ISIDT) more than others. I am doing this by running a chi2 analysis between the variable ISIDT and the variable research institution, choosing for example 200 nodes. Is this a correct way of proceeding? Am I correct in interpreting that I will get a distribution of the types of publications that the top 200 research institutions publish? Thank you very much for your collaboration. Best, Oriana

2 Answers
Jean-Philippe Cointet Staff answered 7 years ago

Yes, it is a good strategy! Note, though, that the resulting network may not make much sense from a global perspective (I mean that clusters may not mean anything in particular given this choice of node types).

I would advise you to manually set the chi2 threshold above a certain value (2 for instance), so that only the most relevant biases of certain institutions towards certain document types show up.

Be careful with document types too, as some of them may be very rare in the corpus. I would advise you either to first create a new table (with corpus_list_indexer) containing only the most important ones (articles, reviews, etc.), or to choose a different number of nodes for document types than for institutions (use the advanced settings in the nodes panel to define another number of top nodes in field2).

Finally, it may also prove useful to use contingency analysis directly. The final results will probably be hardly readable with such a high number of items, but the raw distributions of document types per institution will be displayed in the log. You can then simply analyze this table in a spreadsheet. Hope it will work, good luck!
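
If you export the data and want to check the same idea outside the tool, here is a minimal Python sketch. It is not the CorText implementation: it assumes one (institution, document type) pair per publication, and it uses the signed per-cell Pearson chi2 contribution, (observed - expected)^2 / expected, which may differ from the measure the platform computes. The function names and the toy records are hypothetical.

```python
from collections import Counter

def contingency(records):
    """Count publications per (institution, doc_type) pair.

    records: iterable of (institution, doc_type) tuples, one per publication
    (an assumed input layout, not a CorText export format).
    """
    counts = Counter(records)
    institutions = sorted({inst for inst, _ in counts})
    doc_types = sorted({doc for _, doc in counts})
    return {i: {d: counts.get((i, d), 0) for d in doc_types} for i in institutions}

def chi2_contributions(table):
    """Signed per-cell chi2 contribution: positive means over-representation."""
    row_tot = {i: sum(row.values()) for i, row in table.items()}
    col_tot = Counter()
    for row in table.values():
        col_tot.update(row)
    total = sum(row_tot.values())
    scores = {}
    for i, row in table.items():
        for d, observed in row.items():
            expected = row_tot[i] * col_tot[d] / total
            if expected == 0:
                continue  # empty row or column, no score to compute
            score = (observed - expected) ** 2 / expected
            scores[(i, d)] = score if observed >= expected else -score
    return scores

def strong_biases(scores, threshold=2.0):
    """Keep only pairs over-represented at or above the threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}

# Toy data: institution A publishes mostly articles, B mostly reviews.
records = ([("A", "article")] * 9 + [("A", "review")]
           + [("B", "article")] + [("B", "review")] * 9)
table = contingency(records)
scores = chi2_contributions(table)
biases = strong_biases(scores, threshold=2.0)
```

The `table` dictionary plays the role of the raw distribution of document types per institution, and `strong_biases` mimics the effect of raising the threshold so that only marked over-representations remain.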

orianabras answered 7 years ago

Thank you! I will try.