Several tools are provided to help you browse your data:
- Corpus Explorer provides a table-like view of your dataset;
- the Demography script gives an overall idea of the dynamics of entities in each field;
- Distant Reading provides a complete interface for evaluating the temporal trends of textual entities;
- Word2Vec Explorer lets you browse the structure of a large vocabulary learned from a corpus.
Data exploration documentation
Corpus Explorer provides an online, table-like view of your dataset, so you can read the content of your corpus directly. Several filtering options are also provided, either globally through the search box at the top or field by field through the individual search boxes at the bottom. A column may also be hidden using the Toggle option at the...
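The exact matching rules of the online search boxes are not described here, so the following is only a minimal sketch of how a global filter and per-field filters can combine. It assumes a case-insensitive substring match, a global box that keeps a row if any field matches, and per-field boxes that must all match. `filter_rows` and the sample rows are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


def matches(value: str, query: str) -> bool:
    # Assumed matching rule: case-insensitive substring search
    return query.lower() in value.lower()


def filter_rows(rows: List[Row], global_query: str = "",
                field_queries: Optional[Dict[str, str]] = None) -> List[Row]:
    field_queries = field_queries or {}
    kept = []
    for row in rows:
        # Global search box: keep the row only if ANY of its fields matches
        if global_query and not any(matches(v, global_query) for v in row.values()):
            continue
        # Per-field search boxes: every non-empty query must match its own column
        if all(matches(row.get(field, ""), q) for field, q in field_queries.items() if q):
            kept.append(row)
    return kept


rows = [
    {"title": "Climate policy in Europe", "year": "2010", "journal": "Energy Policy"},
    {"title": "Gene editing ethics", "year": "2015", "journal": "Nature"},
    {"title": "Carbon markets", "year": "2015", "journal": "Energy Policy"},
]
print([r["title"] for r in filter_rows(rows, global_query="policy")])
# → ['Climate policy in Europe', 'Carbon markets']
print([r["title"] for r in filter_rows(rows, field_queries={"year": "2015", "journal": "energy"})])
# → ['Carbon markets']
```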
Demography processes each field of the corpus and counts how the raw number of occurrences of the top items evolves over time. You will only be asked to specify the number of top items you wish to evaluate. If you previously customized periods, you can optionally use them instead of the original time stamps. The script creates two...
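As a rough illustration of the counting step described above, the sketch below selects the top N items of one field over the whole corpus and then counts their raw occurrences at each time step. The input shape (one `(time_step, items)` pair per document) and the `demography` function are assumptions for illustration, not the script's actual code or output format.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def demography(docs: Iterable[Tuple[int, List[str]]], top_n: int) -> Dict[str, Dict[int, int]]:
    """docs: hypothetical (time_step, items of one field) pairs, one per document."""
    totals: Counter = Counter()                      # item -> count over the whole corpus
    per_step: Dict[int, Counter] = defaultdict(Counter)  # time step -> item counts
    for step, items in docs:
        totals.update(items)
        per_step[step].update(items)
    top_items = [item for item, _ in totals.most_common(top_n)]
    steps = sorted(per_step)
    # Raw (non-normalized) occurrences of each top item at each time step
    return {item: {step: per_step[step][item] for step in steps} for item in top_items}


docs = [
    (2010, ["France", "Germany"]),
    (2010, ["France"]),
    (2011, ["Germany", "Italy"]),
    (2012, ["France", "Germany"]),
]
print(demography(docs, top_n=2))
# → {'France': {2010: 2, 2011: 0, 2012: 1}, 'Germany': {2010: 1, 2011: 1, 2012: 1}}
```

Custom periods could be handled the same way by mapping each time stamp to its period label before counting.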
Inspired by Franco Moretti's work on literary corpora, this script provides a complete interface (files suffixed with distant.html in the resulting dataset directory) for comparing the dynamics of a series of items in a corpus. It is mainly designed to compare words from a given textual field but could be used for other purposes… Distant reading parameters Textual...
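Comparing the dynamics of words usually means normalizing their counts by the amount of text available at each time step. The sketch below assumes that normalization (relative frequency per year) and uses a naive lowercase `\w+` tokenizer; it is not the script's own method, and `word_trends` and the sample documents are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

TOKEN = re.compile(r"\w+")  # naive tokenizer (assumption)


def word_trends(docs: Iterable[Tuple[int, str]], words: List[str]) -> Dict[str, Dict[int, float]]:
    counts: Dict[int, Counter] = defaultdict(Counter)  # year -> token counts
    sizes: Counter = Counter()                         # year -> total number of tokens
    for year, text in docs:
        tokens = TOKEN.findall(text.lower())
        counts[year].update(tokens)
        sizes[year] += len(tokens)
    years = sorted(sizes)
    # Relative frequency of each requested word for each year
    return {
        w: {y: (counts[y][w.lower()] / sizes[y] if sizes[y] else 0.0) for y in years}
        for w in words
    }


docs = [
    (1850, "the whale and the sea"),
    (1850, "a storm at sea"),
    (1900, "the city and the machine"),
]
trends = word_trends(docs, ["sea", "machine"])
print(trends["sea"], trends["machine"])
```

Relative rather than raw frequencies keep years with more text from dominating the comparison.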
W2V Explorer learns an embedding for every word of a corpus (above a given frequency threshold) with the Word2Vec model (Mikolov et al. 2013) and visualizes the position of words in a reduced two-dimensional space generated by t-SNE (van der Maaten and Hinton, 2008). Words are also clustered according to their proximity using the HDBSCAN algorithm (Campello et al. 2013). Set...
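The notion of "proximity" between learned word vectors is commonly measured with cosine similarity. The sketch below only illustrates that measure with a nearest-neighbour lookup. It does not train Word2Vec, run t-SNE, or cluster with HDBSCAN, and the toy vectors and the `nearest` helper are made up for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word: str, vectors: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    # Rank every other word by its cosine similarity to `word`
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


toy = {  # hypothetical 3-dimensional embeddings
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
    "pear": [0.15, 0.1, 0.95],
}
print(nearest("apple", toy, k=1))
```

Real Word2Vec vectors have hundreds of dimensions, which is why a reduction such as t-SNE is needed before plotting them in 2D.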
The Contrast Analysis script shows how much two sub-corpora (defined by the user within a dataset) differ in the words used in their textual content, or in the entities of any categorical field. It relies on the excellent scattertext library (Kessler J.S., 2017). See below an interactive example showing which words were used relatively more often...
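To make "relatively more often" concrete, the sketch below scores each term with a smoothed log ratio of its relative frequencies in two sub-corpora. This is a simple stand-in technique, not scattertext's own scoring; the smoothing constant `alpha` and the `log_ratio` helper are assumptions. The tokens can equally be words or values of a categorical field.

```python
import math
from collections import Counter
from typing import Dict, List


def log_ratio(tokens_a: List[str], tokens_b: List[str], alpha: float = 1.0) -> Dict[str, float]:
    """Positive score: relatively more frequent in A; negative: relatively more frequent in B."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(count_a) | set(count_b)
    # Add-alpha smoothing so terms absent from one sub-corpus get a finite score
    denom_a = len(tokens_a) + alpha * len(vocab)
    denom_b = len(tokens_b) + alpha * len(vocab)
    return {
        w: math.log2(((count_a[w] + alpha) / denom_a) / ((count_b[w] + alpha) / denom_b))
        for w in vocab
    }


sub_corpus_a = "solar wind solar grid".split()
sub_corpus_b = "coal grid coal oil".split()
scores = log_ratio(sub_corpus_a, sub_corpus_b)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```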