This documentation provides detailed information about CorText Manager.
Once you have produced and uploaded your corpus, the first required step is to parse it (a parsing script is launched automatically after upload). This task converts the original corpus into a convenient format (namely an SQLite database) that is tractable for further processing.
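As a rough illustration of why the SQLite format is convenient, the snippet below builds and queries a tiny in-memory database. The table and column names here are hypothetical, chosen only for the example; the actual schema of a parsed CorText .db file depends on your corpus and may differ.

```python
import sqlite3

# Hypothetical schema: a parsed corpus stored as (field, doc_id, value) rows.
# Real CorText .db files may use a different layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corpus (field TEXT, doc_id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO corpus VALUES (?, ?, ?)",
    [
        ("title", 1, "Thai green curry"),
        ("cuisine", 1, "thai"),
        ("title", 2, "Margherita pizza"),
        ("cuisine", 2, "italian"),
    ],
)

# Count documents per value of a given field, e.g. the 'cuisine' field --
# the kind of aggregation that scripts such as Demography rely on.
rows = conn.execute(
    "SELECT value, COUNT(DISTINCT doc_id) FROM corpus "
    "WHERE field = 'cuisine' GROUP BY value ORDER BY value"
).fetchall()
print(rows)  # [('italian', 1), ('thai', 1)]
```

Storing every field as rows in one indexed database makes such cross-field aggregations cheap, which is what makes the parsed file tractable for the downstream scripts.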
Different tools (scripts) are then at your disposal to analyze this dataset. Before launching them, be sure to always select the parsed file (the .db file) as the corpus. It is also advisable to read the job logs by clicking on the flag (green or red) that appears on the project page once a script has started. If a job has failed, the last line of the log should provide a succinct explanation. If it does not, please don't hesitate to report the bug on the forum. The available scripts are listed below and are grouped into five families in this documentation: Data Exploration, Data Processing, Text Processing, Data Analysis and Time Processing.
- Demography will generate basic descriptive statistics about the structure and evolution of the main fields in your dataset,
- Lexical Extraction automatically extracts lists of pertinent terms using NLP techniques,
- Named Entity Recognizer detects named entities such as persons, organizations, locations, etc.,
- One can also index databases with a custom term list; a dedicated interface is provided to easily create your own lists,
- Heterogeneous Networks Mapping performs homogeneous and heterogeneous network analysis and produces intelligible and tunable representations of dynamics,
- Contingency Matrix provides a direct visualization of existing correlations between distinct fields in your data,
- Period Detector longitudinally analyzes the composition of your data to automatically detect structurally distinct periods,
- You can customize the periods you wish to work on with Period Slicer. Quantitative data may also be very easily pre-processed with the Data Slicer script,
- Use Query A Corpus to create any sub-corpus resulting from a complex query,
- Different scripts allow you to filter and clean categorical lists: List Builder and Corpus List Indexer,
- Distant Reading builds an interface that allows you to compare the dynamic profiles of words in a dynamic corpus,
- The Correspondence Analysis script provides minimal facilities for performing a multiple correspondence analysis on any set of variables,
- Word2Vec Explorer maps large numbers of words whose positions have been learned using the word2vec model.
If you don’t have any dataset available, please feel free to use this dataset of recipes from a former Kaggle competition, featuring a set of almost 40 000 cooking recipes along with their regional cuisine origin. Simply upload the zip file, parse the dataset as a JSON file and start exploring (see the maps produced with this dataset below)!
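For readers preparing their own corpus instead, the upload step above can be sketched as packaging JSON records into a zip archive. The record fields below (id, cuisine, ingredients) merely mirror the Kaggle recipes example; check the parser's expectations for the exact format your own corpus should follow.

```python
import io
import json
import zipfile

# Hypothetical mini-corpus shaped like the Kaggle recipes dataset.
recipes = [
    {"id": 1, "cuisine": "greek", "ingredients": ["feta", "olive oil"]},
    {"id": 2, "cuisine": "indian", "ingredients": ["cumin", "lentils"]},
]

# Package the JSON file into a zip archive ready for upload.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("recipes.json", json.dumps(recipes, indent=2))

print(f"archive of {len(recipes)} recipes, {buf.getbuffer().nbytes} bytes")
```

Once such an archive is uploaded and parsed as a JSON corpus, the scripts listed above can be run against the resulting .db file.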