Network mapping on Pubmed abstract text

CorText Manager Q&A forum / Category: Data processing
bienvenu thomas asked 6 years ago

Hi
I would like to perform network mapping on Pubmed abstracts (raw text).
When I import the abstracts as such into Cortext, the “text” field is not among the available options.
When I import the abstracts as text, the script returns the following error message:
Thanks for your help!

Debug Log:
2018-12-14 23:33:22  -  Apparently at least one of the network you produced is empty, please review the choice of your parameters (fields and proximity metric choice)"
Jean-Philippe Cointet Staff answered 6 years ago

Dear Thomas, 
It seems you are trying to produce a network map using the abstract field as both field1 and field2. To produce a semantic (co-word) network, you first need to extract terms from the text, using named entity recognition or term extraction.
I hope this helps!
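To make the suggested workflow concrete, here is a minimal Python sketch of how a co-word network can be built once terms have been extracted: two terms are linked, and the link weight is the number of documents in which both appear. It does not reproduce Cortext's own term extraction. `VOCABULARY`, `extract_terms` (a naive lowercase substring match), `add_terms`, `coword_network` and the sample records are all hypothetical, for illustration only. The extracted terms are added as an extra field on each record, so the document id and year stay attached to them.

```python
from collections import Counter
from itertools import combinations

# Made-up sample records: metadata (id, year) kept next to the raw abstract.
records = [
    {"id": "1", "year": 2016, "abstract": "Gene expression profiling of breast cancer tumours."},
    {"id": "2", "year": 2017, "abstract": "Breast cancer risk and gene expression in cohorts."},
    {"id": "3", "year": 2018, "abstract": "Deep learning for gene expression data."},
]

# Hypothetical vocabulary standing in for the output of a real term extraction step
# (not Cortext's API).
VOCABULARY = ["gene expression", "breast cancer", "deep learning", "cohort"]


def extract_terms(text, vocabulary):
    """Return the sorted vocabulary terms found in text (naive lowercase substring match)."""
    lowered = text.lower()
    return sorted({term for term in vocabulary if term in lowered})


def add_terms(records, vocabulary):
    """Return copies of the records with an extra 'terms' field; other fields are untouched."""
    return [{**record, "terms": extract_terms(record["abstract"], vocabulary)} for record in records]


def coword_network(records):
    """Count, for each pair of terms, the number of documents in which both appear."""
    edges = Counter()
    for record in records:
        # Terms are sorted, so each pair is always emitted in the same order.
        for pair in combinations(record["terms"], 2):
            edges[pair] += 1
    return edges


enriched = add_terms(records, VOCABULARY)
network = coword_network(enriched)
for (a, b), weight in sorted(network.items()):
    print(a, "--", b, weight)
# breast cancer -- cohort 1
# breast cancer -- gene expression 2
# cohort -- gene expression 1
# deep learning -- gene expression 1
```

The substring match also catches "cohorts" for "cohort", which is only a crude stand-in for the stemming or lemmatization a real term extraction would apply.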

bienvenu thomas replied 6 years ago

Hi Jean-Philippe
Many thanks for your swift answer. Sorry, I don’t see how this would work, since these Cortext scripts break down the structure (document, sentences, date, authors, etc.). Ideally, Network Mapping would include a preprocessing option for stemming (or equivalent) that does not alter the structure.

Thanks again for this brilliant website!

Jean-Philippe Cointet Staff replied 6 years ago

Thank you Thomas.
I’m really sorry, but I don’t think I understood your question. Could you tell me again which data format you are using (XML from PubMed or some other format?) and what kind of map you would like to produce?

bienvenu thomas replied 6 years ago

Many thanks again, Jean-Philippe
Three formats: PubMed XML, plain text and ISI.
Ideally, I would like to visualize homogeneous network dynamics among concepts (terms) contained within abstracts (with Pubmed) and/or full-text articles.
Hope this makes more sense.
Thank you

Jean-Philippe Cointet Staff replied 6 years ago

I think I understand better.
Unfortunately, Cortext does not allow combining data that come in different formats in the same dataset.
The only way to proceed would be to import the datasets separately first, isolate the information they all share (which is challenging, since authors won’t be formatted the same way, citations are not stored consistently, etc.) and then aggregate that information in a single file. If you only want to work with abstracts, this is doable using the options offered in the corpus explorer (which allows you to export selected fields as a CSV, for instance).
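As an illustration of the "isolate shared fields and aggregate them in one file" step, here is a minimal Python sketch, done outside Cortext, that pulls a few fields from a PubMed XML export and writes them to a CSV. The chosen columns (`pmid`, `year`, `title`, `abstract`), the helper names and the tiny inline sample are assumptions for illustration; the sketch does not claim this is the CSV layout Cortext expects. Real PubMed records can also store the date as `MedlineDate` instead of `Year`, which this sketch does not handle.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample shaped like a PubMed XML export; real files are much larger.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>111</PMID>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2017</Year></PubDate></JournalIssue></Journal>
        <ArticleTitle>First title</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">Background text.</AbstractText>
          <AbstractText Label="RESULTS">Results text.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>222</PMID>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2018</Year></PubDate></JournalIssue></Journal>
        <ArticleTitle>Second title</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

FIELDS = ["pmid", "year", "title", "abstract"]  # hypothetical choice of shared columns


def records_from_pubmed_xml(xml_text):
    """Yield one dict per article holding the fields chosen in FIELDS."""
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        # Structured abstracts are split into several AbstractText elements; join them.
        parts = ["".join(node.itertext()).strip()
                 for node in citation.findall("Article/Abstract/AbstractText")]
        yield {
            "pmid": citation.findtext("PMID", default=""),
            "year": citation.findtext("Article/Journal/JournalIssue/PubDate/Year", default=""),
            "title": citation.findtext("Article/ArticleTitle", default=""),
            "abstract": " ".join(part for part in parts if part),
        }


def write_csv(records, handle):
    """Write the records to a CSV file-like object with a fixed header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()
write_csv(records_from_pubmed_xml(SAMPLE_XML), buffer)
print(buffer.getvalue())
```

Records from the plain-text and ISI imports would need their own parsers mapping into the same columns before being appended to the same file; that mapping is where the author-name and citation inconsistencies mentioned above would have to be resolved.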