Parsing JSON files from RetroNews

Aurore Flamion asked 2 years ago

Hi,
I’m trying to parse a corpus from RetroNews, which I can only export as JSON files.
I can’t get it to work: the only message I get from CorText Manager is ‘Debug Log: Error! Log file not found’. Can you help me solve the problem? As a test, I tried to upload a corpus consisting of a single JSON file (here is the file: https://drive.google.com/file/d/10yZ0e2jzloj6qChFd2cND8MHglyERTke/view?usp=sharing).
Thank you for your help!

3 Answers
Lionel Staff answered 2 years ago

Dear Aurore,
The problem is that the documents from RetroNews are divided into pages in the JSON file, and each page has its own variables (text, number of words, size…). Also, the hierarchy of the pages is not the same as that of the original document.
Do you have any other export options from RetroNews?
L

Aurore Flamion answered 2 years ago

Thank you for your reply! I will ask the RetroNews team.
For now, this is the only automatic download format that I have access to.
Could a Python script make this data work with CorText?

Thank you very much!

Lionel Staff answered 2 years ago

I do not know whether you are familiar with transforming data structures, but there are several ways to achieve what you want to do.

  • Python is one of them
  • You could produce a new JSON file by concatenating the sub-elements (mainly the text) of all pages and moving them to the top level of the hierarchy
  • You could try to transform the JSON into a CSV
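
To illustrate the second and third options, here is a minimal Python sketch. It is only a starting point: the key names "pages" and "text" are assumptions, not the confirmed RetroNews schema, so they would need to be adapted to the keys actually present in the exported file. The function names and the "page_count" field are also made up for this example, and the sample document is invented test data.

```python
import csv
import io
import json

# Assumed key names: the real RetroNews export may use different ones.
PAGES_KEY = "pages"
TEXT_KEY = "text"


def flatten_document(doc):
    """Join the per-page texts into one top-level text field."""
    pages = doc.get(PAGES_KEY, [])
    texts = [p.get(TEXT_KEY, "") for p in pages if isinstance(p, dict)]
    # Keep only simple top-level values (strings, numbers, ...) so that
    # every remaining field fits in a single CSV cell.
    flat = {
        key: value
        for key, value in doc.items()
        if key != PAGES_KEY and not isinstance(value, (dict, list))
    }
    # Note: this overwrites any existing top-level text field.
    flat[TEXT_KEY] = "\n".join(t for t in texts if t)
    flat["page_count"] = len(pages)
    return flat


def documents_to_csv(docs, out):
    """Write one CSV row per flattened document to the file-like object out."""
    rows = [flatten_document(d) for d in docs]
    # Collect column names in order of first appearance.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Invented sample document, only to show the shape the sketch expects.
sample = {
    "title": "Example title",
    "date": "1900-01-01",
    "pages": [
        {"text": "Page one.", "words": 2},
        {"text": "Page two.", "words": 2},
    ],
}

# Option 2: a new JSON with the text moved to the top level.
print(json.dumps(flatten_document(sample), ensure_ascii=False, indent=2))

# Option 3: the same data as CSV.
buffer = io.StringIO()
documents_to_csv([sample], buffer)
print(buffer.getvalue())
```

For a real file, the documents would be read with json.load from the export, and the CSV written to a file opened with newline="" and encoding="utf-8". Whether the export holds one document or a list of documents is something to check in the file itself.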

There are probably other options as well, but all of them require a little work on your side. It is not very difficult, but it does take some skills in data transformation and coding.
I hope this helps.