Europresse corpus parsing

CorText Manager Q&A forum · Category: Data processing
Hannah asked 2 years ago

Hello,
I can’t run the parsing script on a Europresse corpus (exported as HTML, then zipped).
The error displayed is “Log file not found”.
Thanks for your help!
Hannah

3 Answers
Lionel Staff answered 2 years ago

Dear Hannah,
Apparently, Europresse has added a new class to (some of) the article titles.
If you know how to edit HTML files, just:

  • find every occurrence of “sm-margin-TopNews ” (without the quotes; the trailing space is important)
  • replace it with an empty string: “”

Save it as a new file, zip it, and parse it.
We will work on a fix soon. Many thanks for the report!
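When a corpus contains many HTML files, the same find-and-replace can be scripted instead of done by hand. The following is a minimal Python sketch, not part of CorText Manager: it assumes the Europresse export is a zip of UTF-8 encoded `.html`/`.htm` files and that removing the exact string `sm-margin-TopNews ` (with its trailing space) is the only change needed. The function names `strip_class` and `clean_europresse_zip` are hypothetical.

```python
import zipfile

# Class string reported in this thread; the trailing space is intentional.
OBSOLETE_CLASS = "sm-margin-TopNews "


def strip_class(html: str, target: str = OBSOLETE_CLASS) -> str:
    """Remove every occurrence of the target class string from the HTML text."""
    return html.replace(target, "")


def clean_europresse_zip(src: str, dst: str) -> int:
    """Write a copy of the zip at `src` to `dst`, stripping the class
    from every .html/.htm member (assumed to be UTF-8).

    Non-HTML members are copied unchanged.
    Returns the number of HTML members that were actually modified.
    """
    changed = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(
        dst, "w", compression=zipfile.ZIP_DEFLATED
    ) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith((".html", ".htm")):
                text = data.decode("utf-8")  # assumption: UTF-8 export
                cleaned = strip_class(text)
                if cleaned != text:
                    changed += 1
                    data = cleaned.encode("utf-8")
            zout.writestr(info.filename, data)
    return changed
```

For example, `clean_europresse_zip("export.zip", "export_clean.zip")` (hypothetical file names) writes a cleaned copy of the archive, which can then be uploaded and parsed. Working on a copy keeps the original export intact in case the replacement needs to be undone.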
L

Hannah replied 2 years ago

Thank you for your answer! I’m not sure how to make changes to the HTML file, though (would RStudio work?). I’m trying to figure this out by myself; I’ll let you know how it goes 🙂
best,
Hannah

Lionel Staff answered 2 years ago

Dear Hannah,
It’s even simpler: a basic text editor will do, such as Notepad++ or any other editor you are used to.
I hope it helps!
L

Hannah replied 2 years ago

Got it! It worked. Thank you so much, and have a nice day!

Lionel Staff answered 2 years ago

Dear Hannah,

The Europresse parser was updated this morning. It should now work as before.
Please do not hesitate to get in touch if you face any further issues.

Kind regards,
L