Hello,
I can’t use the parsing script on a Europresse corpus (exported as HTML, then zipped).
The error displayed is “Log file not found”.
Thanks for your help!
Hannah
Dear Hannah,
Apparently, Europresse has added a new class to (some of) the article titles.
If you know how to open HTML files, just:
- replace every occurrence of “sm-margin-TopNews ” (without the quotes; the space at the end is important)
- with an empty string: “”
Save the result as a new file, zip it, and parse it.
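If you would rather not edit the files by hand, the same replacement can be sketched in Python. This is only a rough sketch, not part of the parsing script: the function name strip_class and the archive names in the usage line are made up for illustration, and it assumes the HTML files are stored in an ASCII-compatible encoding such as UTF-8 or Latin-1.

```python
import zipfile

# The string to remove; the trailing space is intentional.
TARGET = b"sm-margin-TopNews "


def strip_class(src_zip, dst_zip, target=TARGET):
    """Copy src_zip to dst_zip, removing `target` from every .html/.htm file.

    Hypothetical helper. It works on raw bytes, so it assumes an
    ASCII-compatible file encoding. Returns the number of occurrences removed.
    """
    removed = 0
    with zipfile.ZipFile(src_zip) as zin, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as zout:
        for name in zin.namelist():
            data = zin.read(name)
            if name.lower().endswith((".html", ".htm")):
                removed += data.count(target)
                data = data.replace(target, b"")
            # Other files are copied unchanged.
            zout.writestr(name, data)
    return removed
```

For example, strip_class("corpus.zip", "corpus_fixed.zip") would write a cleaned copy next to the original export, and the new archive is the one to give to the parsing script. The original zip is left untouched, so nothing is lost if the result is not what you expect.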
We will work on it soon. Many thanks for the report!
L
Thank you for your answer! I’m not sure how to make changes to the HTML file, though (would RStudio work?). I’m trying to figure this out by myself; I’ll let you know how it goes 🙂
best,
Hannah
Dear Hannah,
Even simpler: a basic text editor will do, such as Notepad++ or whichever one you are used to.
I hope this helps!
L
Got it! It worked. Thanks a lot, and have a nice day!