Data Processing

In this category, you will find instructions for preparing and parsing your original raw data. Data Slicer is dedicated to transforming numeric data into binned categories. Finally, querying facilities are explained for extracting sub-corpora or enriching existing tables in your corpus.

Data processing documentation

Upload Corpus

To use the data parser, you first need to “upload a corpus” as a zipped file containing all the raw files that make up your corpus (e.g. a set of isi output files, csv files, etc.). Once you have chosen your original dataset, you must select its type from the list of available formats (csv, factiva, pubmed, isi). ...
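As an illustration of the zipping step, here is a minimal sketch that packs a folder of raw files into a single archive using Python's standard library. The helper name `zip_corpus`, the folder layout and the file names are hypothetical, and placing the files at the root of the archive is an assumption, not a documented requirement of the upload form.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_corpus(source_dir, archive_path):
    """Pack every file found under source_dir into one zip archive (hypothetical helper)."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the corpus folder, not absolute paths.
                archive.write(path, arcname=path.relative_to(source_dir).as_posix())
    return archive_path


# Demo with two made-up csv files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    raw_dir.mkdir()
    (raw_dir / "records_1.csv").write_text("title,year\nA,2001\n")
    (raw_dir / "records_2.csv").write_text("title,year\nB,2002\n")

    archive = zip_corpus(raw_dir, Path(tmp) / "corpus.zip")
    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())
```

The resulting `corpus.zip` is the kind of file you would then select in the upload form before choosing its format.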

Data Parsing

What does the parsing step do? “Data parser” is a generic parsing script that handles a wide range of data formats: isi files (as downloaded from the Web of Science), Factiva datasets, Pubmed, RIS files, batches of simple text files or any file in csv format (please use the “robust csv” parser in that case, see ...

Query Corpus

This script allows you to query your corpus and build a subcorpus or create new fields with sql-like queries. Two querying modes are offered, sql being the standard one. Querying your corpus in a sql-like mode: the principle is to allow users to directly perform sql-like queries on their corpus. Query type: choose sql ...
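To give a feel for what a sql-like query that extracts a subcorpus looks like, here is a minimal sketch using an in-memory SQLite database. The table name `documents`, its columns and the sample rows are assumptions made for illustration only; the actual table and field names exposed by the script depend on your parsed corpus.

```python
import sqlite3

# In-memory database standing in for a parsed corpus (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO documents (id, title, year) VALUES (?, ?, ?)",
    [
        (1, "Graph mining", 2009),
        (2, "Topic models", 2014),
        (3, "Science mapping", 2018),
    ],
)

# Keep only the documents published from 2012 onwards: this is the subcorpus.
subcorpus = conn.execute(
    "SELECT id, title FROM documents WHERE year >= ? ORDER BY id",
    (2012,),
).fetchall()
conn.close()

print(subcorpus)  # [(2, 'Topic models'), (3, 'Science mapping')]
```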

Data Slicer

Data Slicer slices numeric data (provided that they are integer values) into any given number of quantiles (to be chosen in the form). For example, if one has a database compiling information about individuals, including their age, it may be useful to transform this field into bins of various significant ages. In turn, it ...
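The idea of slicing integer values into quantiles can be sketched as follows. This is not the script's own implementation: the function names, the rank-based cut-point convention and the sample ages are all assumptions chosen to keep the example short.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Return the n_bins - 1 inner cut points splitting the values into equal-sized groups."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut point k is the value at rank k * n / n_bins (simple convention, no interpolation).
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def slice_values(values, n_bins):
    """Map each value to the index of its quantile bin (0 is the lowest bin)."""
    edges = quantile_edges(values, n_bins)
    # bisect_right sends a value equal to a cut point into the upper bin.
    return [bisect_right(edges, v) for v in values]


ages = [18, 22, 25, 31, 38, 44, 52, 67]  # made-up ages
edges = quantile_edges(ages, 4)
bins = slice_values(ages, 4)

print(edges)  # [25, 38, 52]
print(bins)   # [0, 0, 1, 1, 2, 2, 3, 3]
```

With four bins, each age falls into one quartile, so the original numeric field can be replaced by a small set of categories.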

Upload resource

You can upload any kind of document (doc, pdf, PowerPoint) into your project. This is particularly useful for sharing these documents with the participants you are working with in the project or, for example, for storing in your project a scientific article that would be useful for your analysis. But you may want to ...

Data Curation

The data curation script helps you apply transformations to your corpus. Database level. Rename a Database: rename your corpus/database with a new name. Useful for shortening the names of databases built by Query Corpus, which are usually long. Remove duplicate entries: this option allows you to get rid ...

