Date format

tommv asked 5 years ago

Hello dear Cortext friends.
I am trying to import a corpus (json) and one of the columns contains dates (in the format “2017-02-26”).
However, Cortext does not seem to recognise the dates as such.
Should I use a different date format, or run a script to parse them?
Thank you for your help
Tommaso

Jean-Philippe Cointet Staff replied 5 years ago

Hi Tommaso, could you please send your dataset, or even a small sample of it?
Your date seems to be correctly formatted. Maybe there is another issue…

tommv replied 5 years ago

Wow that’s a quick reaction!
I’ve uploaded the dataset here https://www.dropbox.com/s/sdnokhyl3e5fk5x/BeforeAfter_USA_dates.json.zip?dl=0
The column with the dates is “daterelease” and ideally I’d like to slice my database by month (but the timeslicer script only takes years from what I can understand).
PS. There is another column called “period” which contains text indicating ‘before’ or ‘after’ a certain date. Not sure if this creates a conflict.
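
For readers who want to rule out malformed dates before uploading, here is a minimal sketch of a local check. It assumes the JSON file is a list of records and that the date column is the “daterelease” field named above; the function name `find_bad_dates` and the sample records are hypothetical and not part of Cortext.

```python
import json
from datetime import datetime


def find_bad_dates(records, field="daterelease", fmt="%Y-%m-%d"):
    """Return (index, value) pairs whose date field does not parse with fmt."""
    bad = []
    for i, record in enumerate(records):
        value = record.get(field)
        try:
            datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            # TypeError: field missing or not a string; ValueError: wrong format.
            bad.append((i, value))
    return bad


# Hypothetical sample standing in for the real corpus (assumed list-of-records layout).
sample = json.loads(
    '[{"docid": 1, "daterelease": "2017-02-26"},'
    ' {"docid": 2, "daterelease": "26/02/2017"},'
    ' {"docid": 3}]'
)
print(find_bad_dates(sample))
```

An empty result would suggest the dates themselves are well formed and the problem lies elsewhere, which is what turned out to be the case in this thread.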

2 Answers
Jean-Philippe Cointet Staff answered 5 years ago

The issue is now resolved.
Below is the log I obtained when uploading and parsing your dataset with the time resolution set to month, starting from January 1985.
The data spans months 362 to 386, which seems OK.
Sorry again for the bug…
2017-06-27 20:55:04 INFO : Parsing Script Started
2017-06-27 20:55:04 INFO :
Source:
Type of Data: dataset
Corpus Format: json
To search for duplicate entries, type the json unique field (if any): ''
Tweets extracted from Twitter API: other
If your json file includes a time entry, please indicate the attribute name: daterelease
Time Granularity: month
Starting Year: '1985'
If your json file is weighted, please type the name of the column including the weights of each entry: ''
Ignore entries with incorrectly formatted time steps: true
2017-06-27 20:55:04 INFO : Preparing raw data
2017-06-27 20:55:04 INFO : Parsing file /srv/local/documents/4d80/4d80be866332f32a8d479403c4a01f1e/beforeafter-usa-dates-json/BeforeAfter_USA_dates.json
2017-06-27 20:55:05 INFO : 1554 total entries
2017-06-27 20:55:05 INFO : Fields extracted: [u'docid',
u'Title',
u'daterelease',
u'country',
u'period',
u'ISIpubdate',
u'source',
u'text']
2017-06-27 20:55:05 INFO : Temporal data spanning from 362 to 386
2017-06-27 20:55:05 INFO : Parsing ended successfully
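
To see why the range 362 to 386 is plausible, the month numbers can be reproduced by hand. The sketch below assumes that January of the starting year counts as month 1; the thread does not state the exact offset Cortext uses, so treat this convention and the helper names `month_index` and `index_to_month` as assumptions. Under it, the example date “2017-02-26” from the question maps to 386, which matches the upper bound in the log.

```python
from datetime import date


def month_index(d, start_year=1985):
    """Months elapsed since the start year, assuming January of start_year is month 1."""
    return (d.year - start_year) * 12 + d.month


def index_to_month(index, start_year=1985):
    """Inverse of month_index: return (year, month) for a 1-based month index."""
    years, month0 = divmod(index - 1, 12)
    return (start_year + years, month0 + 1)


print(month_index(date(2017, 2, 26)))  # 386
print(index_to_month(362))             # (2015, 2)
print(index_to_month(386))             # (2017, 2)
```

With this convention the corpus would run from February 2015 to February 2017; a zero-based convention would shift both ends by one month.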

tommv replied 5 years ago

It worked!
Thank you very much for the help.

BTW, the new interface of the manager and the improved speed of analysis are amazing!