Europresse corpus parsing

Hannah asked 3 years ago

Hello,
I can’t use the parsing script on a Europresse corpus (exported as HTML, then zipped).
The error displayed is “Log file not found”.
Thanks for your help!
Hannah

3 Answers

Lionel Staff answered 3 years ago

Dear Hannah,
Apparently, Europresse has added a new class to (some of) the article titles.
If you know how to open HTML files, just:

replace every occurrence of “sm-margin-TopNews ” (without the quotes; the trailing space is important)
with an empty string: “”

Save it as a new file. Zip it and parse it.
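
For anyone who would rather script the workaround than edit the file by hand, here is a minimal Python sketch of the steps above. It assumes the export is a single HTML file; the function names are hypothetical, and only the class string (including its trailing space) comes from this thread. The file is handled as raw bytes so that no guess about its encoding is needed.

```python
import zipfile
from pathlib import Path

# Class name reported in this thread; the trailing space is part of the match.
NEEDLE = b"sm-margin-TopNews "


def strip_class(html: bytes) -> bytes:
    """Return the HTML with every occurrence of the extra class name removed."""
    return html.replace(NEEDLE, b"")


def clean_and_zip(src_html: str, dst_zip: str) -> int:
    """Write a cleaned copy of src_html into a new zip archive.

    The original file is left untouched. Returns the number of
    occurrences that were removed.
    """
    src = Path(src_html)
    raw = src.read_bytes()  # bytes in, bytes out: no encoding assumption
    count = raw.count(NEEDLE)
    with zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # Keep the original file name inside the archive.
        zf.writestr(src.name, strip_class(raw))
    return count
```

For example, clean_and_zip("export.html", "export_fixed.zip") (both file names are placeholders) would produce a new archive ready to upload and parse. A return value of 0 means the class was not found in that file.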
We will work on it soon. Many thanks for the report!
L

Hannah replied 3 years ago

Thank you for your answer! I’m not sure how to make changes to the HTML file, though (would RStudio work?). I’m trying to figure this out by myself; I’ll let you know how it goes 🙂
Best,
Hannah

Lionel Staff answered 3 years ago

Dear Hannah,
Even simpler: a basic text editor will do, such as Notepad++ or whatever you are used to.
I hope it helps!
L

Hannah replied 3 years ago

Got it! It worked, thanks a lot, and have a nice day!

Lionel Staff answered 3 years ago

Dear Hannah,

The Europresse parser was updated this morning. It should now work as before.
Please do not hesitate to get in touch if you face any further issues.

Kind regards
L
