I’m trying to parse a corpus from RetroNews. I can only export it as JSON files.
I can’t get it to work: the only message I get from Cortext Manager is ‘Debug Log: Error! Log file not found’. Can you help me solve the problem? As a test, I tried to upload a corpus made of a single JSON file (here is the file: https://drive.google.com/file/d/10yZ0e2jzloj6qChFd2cND8MHglyERTke/view?usp=sharing).
Thank you for your help!
The problem comes from the fact that the documents from RetroNews are divided into pages in the JSON file, where each page has its own variables (text, number of words, size…). Also, the hierarchy of the pages is not the same as in the original document.
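To check this page-level structure in your own export, a small generic script can list every distinct key path in the file. It makes no assumption about RetroNews's actual field names; the `$.pages[].text` style of notation is just an illustrative convention chosen for this sketch, with list positions collapsed to `[]` so that repeated pages appear only once.

```python
import json
import sys


def key_paths(node, prefix="$"):
    """Yield one path per leaf value, e.g. '$.pages[].text'.

    List indices are collapsed to '[]' so that every page of a document
    contributes the same paths instead of one path per page.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from key_paths(value, f"{prefix}.{key}")
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, f"{prefix}[]")
    else:
        yield prefix


def summarize(data):
    """Return the distinct key paths, in the order they are first seen."""
    return list(dict.fromkeys(key_paths(data)))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_json.py export.json  (script name is arbitrary)
    with open(sys.argv[1], encoding="utf-8") as f:
        for path in summarize(json.load(f)):
            print(path)
```

Paths that contain `[]` before the text field are the nested, per-page variables that need to be brought up to the document level.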
Do you have more export options from RetroNews ?
Thank you for your reply! I will ask the RetroNews team.
For now, this is the only automatic download format that I have access to.
Could a Python script make this data work with Cortext?
Thank you very much !
I do not know if you are familiar with transforming data structures, but there are many ways to achieve what you want.
- Python is one of them
- You could produce a new JSON file by concatenating the sub-elements of all pages (mainly the text) and moving them to the top level of the hierarchy
- You could try to transform the JSON into a CSV
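A minimal Python sketch of the two options above, assuming each document in the export is a JSON object whose pages sit in a list under a `pages` key, with each page holding its text under `text` and its page number under `page`. These key names are hypothetical: check them against the actual RetroNews file and adjust the constants. The script joins the page texts into one top-level `text` field (sorted by page number when every page has an integer one), keeps the plain document-level fields, drops the other page-level variables, and writes the result both as JSON and as CSV.

```python
import csv
import json
import sys
from pathlib import Path

# HYPOTHETICAL key names: adapt them to the real structure of the export.
PAGES_KEY = "pages"
TEXT_KEY = "text"
PAGE_NUMBER_KEY = "page"


def flatten_document(doc):
    """Return one flat record per document with all page texts joined."""
    pages = doc.get(PAGES_KEY, [])
    # Restore the original page order when every page carries an int number.
    if all(isinstance(p.get(PAGE_NUMBER_KEY), int) for p in pages):
        pages = sorted(pages, key=lambda p: p[PAGE_NUMBER_KEY])
    # Keep plain document-level values (title, date, ...), skip nested ones.
    flat = {k: v for k, v in doc.items()
            if k != PAGES_KEY and not isinstance(v, (dict, list))}
    flat[TEXT_KEY] = "\n".join(str(p.get(TEXT_KEY, "")) for p in pages)
    return flat


def load_documents(path):
    """Accept a file holding either one document or a list of documents."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


def write_json(docs, path):
    with Path(path).open("w", encoding="utf-8") as f:
        json.dump(docs, f, ensure_ascii=False, indent=2)


def write_csv(docs, path):
    # Union of all field names, so documents with missing fields still fit.
    fieldnames = sorted({k for d in docs for k in d})
    with Path(path).open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(docs)


if __name__ == "__main__" and len(sys.argv) > 1:
    src = Path(sys.argv[1])
    docs = [flatten_document(d) for d in load_documents(src)]
    write_json(docs, src.with_name(src.stem + "_flat.json"))
    write_csv(docs, src.with_name(src.stem + "_flat.csv"))
```

Run as `python flatten_export.py export.json` (the script name is arbitrary); it writes `export_flat.json` and `export_flat.csv` next to the input. Whether Cortext Manager accepts these files as-is depends on the format options chosen at upload, which this sketch does not check.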
There are probably other options, but all of them require a little work on your side: nothing too difficult, but it does need some skills in data transformation, coding…
I hope it helps.