Using Extracted Terms in Heterogeneous Network Mapping

CorText Manager Q&A forum, Category: Text processing
Bobbele1 asked 5 years ago

Hey guys,
I am currently writing my master's thesis. One part of it is a heterogeneous network mapping analysis.
To do this, I used the Terms Extraction script to extract a list of 500 terms from my corpus, which contains 1300 PDF files downloaded and zipped from LexisNexis. This list is supposed to be the basis for my mapping.
Contrary to what the manual says, I cannot select the extracted list for the analysis in either Field 1 or Field 2.
A few days ago this worked with a test dataset. But now I cannot select any term list for analysis, no matter which dataset I use.
Can you help me solve this problem? I am quite pressed for time :/

Thanks in advance and best regards,
a desperate student

Jean-Philippe Cointet Staff replied 5 years ago

Dear student,
Did the “term extraction” script terminate normally?
It should produce a CSV file containing all the extracted terms, and it should also create a new variable (which should appear as Terms in the network mapping parameter form) that holds the index of every term!
Is “Terms” missing from the list of available variables?

Bobbele1 replied 5 years ago

Dear Jean-Philippe,
Thank you very much for the quick response. The “term extraction” script terminated successfully.
It also produced a CSV file containing all the terms, so far so good. What do you mean by a new variable? When I start the network mapping script, the only choices offered are “terms”, “timesteps”, “filename” and “text”.
The problem is that “terms” relates to the whole corpus and not to the extracted term list, which should be offered according to the documentation. I hope this answers your question.

Maybe you can use the script log:
Lexical extraction parameters:
Textual Fields:
– filename
– text
Minimum Frequency: ‘3.’
List length: ‘100’
Monogramms are forbidden: true
Maximal length (max number of words): ‘2’
Lexical extraction advanced settings: true
Frequency computation level: document level
Ranking Principle: chi2
linguistic pre-processing: true
grammatical criterion: noun phrase
Pivot Word: ”
Starting Character: ”
Sampling: false
Automatically index the corpus: true
Optionnaly you can name the new indexation that will be generated: ”
Choose Original Timescale: Standard Periods
Number of time slices: ‘1’
time slices distribution: homogeneous

Thanks again !


Jean-Philippe Cointet Staff replied 5 years ago

The new variable is precisely called Terms!
What exactly do you mean when you write that “The problem is that “terms” does relate to the whole corpus and not to the extracted terms list which should be offered according to the documentation.”?
The entities stored in the Terms variable should be the same as those listed in the CSV file you mention.
Can you run the corpus explorer to check that the indexation worked properly?
In any case, you should already be able to map a term list using Terms as both Field 1 and Field 2, shouldn't you?

Bobbele1 replied 5 years ago

The point is that I edited the extracted term list by deleting metadata and non-specific words, and of course I saved my edits. I assumed this edited list would be the one called “terms” in network mapping (?), so I would like to use it when generating the heterogeneous network map.
Using “terms” always gives me a map that contains all the words from the initial extracted term list, including nouns I had deleted.
Another option offered is “PC_Terms_Terms”, which I thought was the edited term list. Unfortunately, the script does not run successfully with it; it fails with an error. I have no idea what to do.
The corpus explorer also worked properly.

Jean-Philippe Cointet Staff answered 5 years ago

I think I understood what is happening now.
If you manually edit a list of terms, you still need to index it after uploading it to your project space.
That is, you have to run the “corpus term indexer” script, using the newly uploaded CSV file for the indexation.
By default, the Terms variable will be replaced with the new indexation. Alternatively, you can give the new indexation a different name.

Bobbele1 replied 5 years ago

Thank you so much, I will give it a try after lunch. I will let you know!

Bobbele1 replied 5 years ago

It worked out. Thank you very much for your support!

Best Regards