Europresse corpus parsing

CorText Manager Q&A forum, Category: Data processing
Hannah asked 1 year ago

Hello,
I can’t use the parsing script on a Europresse corpus (exported as HTML, then zipped).
The error displayed is “Log file not found”.
Thanks for your help!
Hannah

3 Answers
Lionel Staff answered 1 year ago

Dear Hannah,
Apparently, Europresse has added a new class to (some of) the article titles.
If you know how to open HTML files, just:

  • replace every occurrence of “sm-margin-TopNews ” (without the quotes; the trailing space is important)
  • with an empty string: “”

Save it as a new file. Zip it and parse it.
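
For anyone who would rather script this than edit each file by hand, here is a minimal Python sketch of the same workaround. It is not part of CorText: the function name strip_extra_class is made up for illustration, and it assumes the export is a zip archive whose articles are .html or .htm files. It works on raw bytes, so it only assumes the files use an ASCII-compatible encoding.

```python
import zipfile
from pathlib import Path

# Class name quoted in the answer above; the trailing space is intentional.
EXTRA_CLASS = b"sm-margin-TopNews "


def strip_extra_class(src_zip: str, dst_zip: str) -> int:
    """Copy src_zip to dst_zip, deleting EXTRA_CLASS from every HTML member.

    Non-HTML members are copied unchanged. Returns the number of
    occurrences that were removed.
    """
    removed = 0
    with zipfile.ZipFile(src_zip) as src, zipfile.ZipFile(
        dst_zip, "w", compression=zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            data = src.read(info)
            if Path(info.filename).suffix.lower() in {".html", ".htm"}:
                removed += data.count(EXTRA_CLASS)
                data = data.replace(EXTRA_CLASS, b"")
            dst.writestr(info.filename, data)
    return removed
```

Calling strip_extra_class("export.zip", "export_fixed.zip") (both file names are placeholders) writes a new archive and leaves the original untouched, which matches the "save it as a new file" advice. The new archive is the one to upload and parse.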
We will work on it soon. Many thanks for the report!
L

Hannah replied 1 year ago

Thank you for your answer! I’m not sure how to make changes to the HTML file, though (would RStudio work?). I’m trying to figure this out by myself; I’ll let you know how it goes 🙂
Best,
Hannah

Lionel Staff answered 1 year ago

Dear Hannah,
Even simpler: a basic text editor will do, such as Notepad++ or whatever you are used to.
I hope this helps!
L
 

Hannah replied 1 year ago

Got it! It worked. Thanks a lot and have a nice day!

Lionel Staff answered 11 months ago

Dear Hannah,

The Europresse parser was updated this morning. It should now work as before.
Please do not hesitate to get in touch if you face any further issues.

Kind regards
L