I can’t use the parsing script on a Europresse corpus (exported as HTML, then zipped).
The error displayed is “Log file not found”.
Thanks for your help!
Apparently, Europresse has added a new class to (some of) the article titles.
If you know how to edit HTML files in a text editor, just:
- replace every occurrence of “sm-margin-TopNews ” (without the quotes; the space at the end is important)
- with an empty string: “”
Save it as a new file. Zip it and parse it.
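If editing the files by hand is awkward, the same search-and-replace can be scripted. Here is a minimal sketch in Python, assuming the export is a zip containing `.html` (or `.htm`) files; the function name `clean_zip` and the file names in the usage line are only placeholders, not part of the parsing script.

```python
import zipfile

# The class name to strip; the trailing space is intentional.
TARGET = b"sm-margin-TopNews "


def clean_zip(src_path, dst_path):
    """Copy src_path to dst_path, removing TARGET from every HTML file.

    Returns the number of occurrences removed.
    """
    removed = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name.lower().endswith((".html", ".htm")):
                removed += data.count(TARGET)
                data = data.replace(TARGET, b"")
            # Non-HTML entries are copied unchanged.
            dst.writestr(name, data)
    return removed


# Hypothetical usage (file names are placeholders):
# clean_zip("europresse_export.zip", "europresse_export_clean.zip")
```

It works on raw bytes, so the encoding of the HTML files does not matter, and it writes a new zip rather than overwriting the original export.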
We will work on it soon; many thanks for the report!
Thank you for your answer! I’m not sure how to make changes to the HTML file, though (would RStudio work?). I’m trying to figure this out by myself; I’ll let you know how it goes 🙂