Corpus list indexer

This script is closely tied to the list builder script. It gives users full control over the set of items that may later be mapped or analyzed. Technically, it creates a new field based on the user's selection.

How to use the script


  • Field – Select the field you wish to work on
  • Define a custom list of entities – If yes, provide a csv file listing the items to index in the target field (only the first column of the tab-delimited csv file is considered). If no, every entity present in the chosen field is indexed.
  • Add a dictionary of equivalent strings – If yes, provide a csv file made of pairs of equivalent strings. Entities from the first column are automatically replaced by the corresponding entities in the second column (remember that the default csv format is tab-delimited; use OpenOffice if you want to edit the file in spreadsheet software).
  • Add a null label to every article with no matching tag – Labels as “null” every article whose field contains none of the tags chosen by the user.
  • Count only one occurrence per article during indexation – Useful when the same entry should not be counted several times for a given document. For instance, to compute the distribution of articles published by the USA in a scientific database, it may be useful to first reindex the Country field with this option, so that articles written by at least one American author are counted only once. By default, if several scientists with different US affiliations publish a paper, the article is indexed with several occurrences of USA in the raw database.
  • Finally, one can give a custom name to the newly generated field.
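The options above can be illustrated with a minimal Python sketch. It is not the CorText implementation: the function names (`load_first_column`, `load_equivalences`, `index_field`), the in-memory data layout, and the order in which the options are applied (equivalences first, then the custom list, then deduplication, then the null label) are all assumptions made for illustration.

```python
import csv
import io

def load_first_column(tsv_text):
    """Read a tab-delimited file and keep only its first column (the custom list)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [row[0] for row in reader if row and row[0]]

def load_equivalences(tsv_text):
    """Read pairs of equivalent strings: first column -> second column."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}

def index_field(articles, entities=None, equivalences=None,
                add_null=False, count_once=False):
    """Build a new field from an existing one (hypothetical helper).

    articles: dict mapping an article id to the list of values in the chosen field.
    """
    equivalences = equivalences or {}
    allowed = set(entities) if entities is not None else None
    indexed = {}
    for article_id, values in articles.items():
        # Assumed order: replace equivalent strings before filtering.
        mapped = [equivalences.get(value, value) for value in values]
        if allowed is not None:
            # Custom list: keep only the listed entities.
            mapped = [value for value in mapped if value in allowed]
        if count_once:
            # Keep one occurrence per article, preserving first-seen order.
            mapped = list(dict.fromkeys(mapped))
        if add_null and not mapped:
            # No matching tag: label the article "null".
            mapped = ["null"]
        indexed[article_id] = mapped
    return indexed

# Toy data: the Country field of three articles.
articles = {
    "a1": ["USA", "USA", "France"],
    "a2": ["U.S.A."],
    "a3": ["Brazil"],
}
entities = load_first_column("USA\tignored column\nFrance\tignored column\n")
equivalences = load_equivalences("U.S.A.\tUSA\n")
new_field = index_field(articles, entities, equivalences,
                        add_null=True, count_once=True)
```

With these toy inputs, article `a1` keeps a single USA entry, `a2` is normalized to USA, and `a3` receives the null label because Brazil is not in the custom list.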

Note: When re-indexing new values onto an existing list, use a two-column file: column 1 holds the filename and column 2 the replacement value. Therefore, if you want to re-index several values (such as date, source, etc.), you must upload a separate two-column file for each.

If you are replacing a temporal field, the table must be named “ISIpubdate”.
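As a rough sketch of how such two-column files could be prepared outside the online editor, the Python snippet below writes one tab-delimited table per value to re-index. The helper name `build_reindex_table`, the file names, and the values are hypothetical, and the sketch assumes that no header row is expected; it does not handle the table naming described above.

```python
import csv
import io

def build_reindex_table(values_by_filename):
    """Return tab-delimited text: column 1 = filename, column 2 = replacement value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for filename, value in values_by_filename.items():
        writer.writerow([filename, value])
    return buffer.getvalue()

# One separate two-column table per value to re-index (hypothetical data).
dates = {"speech_01.txt": "2014", "speech_02.txt": "2015"}
sources = {"speech_01.txt": "press", "speech_02.txt": "radio"}

dates_table = build_reindex_table(dates)
sources_table = build_reindex_table(sources)
```

Each resulting string can then be saved to its own file and uploaded separately.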

Note that the csv files used to feed the list indexer can be produced directly with the online csv editor.

See the video below for a demo of how to add time step information to a dataset imported from raw text files:
