Hi, I am trying to upload a corpus of text extracted from Mexican digital media, but the files won’t parse. They are all encoded in Unicode (UTF-8); I don’t know if that could be a problem for Cortext. I have done this in the past and the process was quite straightforward, so I don’t know if I am doing something wrong this time. Anyway, here is the error I get from the script log:
2020-05-27 02:43:22 DEBUG : Something went wrong while trying to parse, are you sure you selected the correct corpus format ?
Any ideas? There are 126 documents, each in a separate ‘.txt’ file, all zipped together in a single ‘.zip’ file.
PS: Sorry, I tried to paste the complete error log, but the server blocked me every time, saying ‘A potentially unsafe operation has been detected in your request to this site. Your access to this service has been limited. (HTTP response code 403)’
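In case it helps anyone checking a similar archive before uploading, here is a minimal sketch of a local sanity check on the zip. It only tests whether each ‘.txt’ member decodes as UTF-8, starts with a byte order mark, or is empty. The function name `find_encoding_problems` and the file name `corpus.zip` are made up for illustration, and nothing here is confirmed to be the cause of the Cortext parsing error; it just rules out the encoding on the local side.

```python
import io
import zipfile


def find_encoding_problems(zip_source):
    """Return (member_name, problem) pairs for .txt members that look suspicious.

    zip_source may be a path to a .zip file or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            # Skip folders and anything that is not a .txt file.
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            raw = archive.read(info)
            # A UTF-8 byte order mark is valid, but some parsers dislike it.
            if raw.startswith(b"\xef\xbb\xbf"):
                problems.append((info.filename, "starts with a UTF-8 byte order mark"))
            # Strict decoding fails on bytes that are not valid UTF-8,
            # e.g. a file that was actually saved as Latin-1.
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                problems.append((info.filename, f"invalid UTF-8 at byte {exc.start}"))
            # Flag files that contain nothing but whitespace.
            if not raw.strip():
                problems.append((info.filename, "file is empty"))
    return problems
```

Calling `find_encoding_problems("corpus.zip")` on the archive returns an empty list when every text file passes these three checks; any entries in the list name the file and what looked off about it.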
Dear Matias,
Could you invite me to your project so that I can check your txt dataset? I will leave the project afterwards.
My address: lionel dot villard at esiee dot com
Best regards
L
Thanks Lionel, I have already sent you an invitation to join the project. FYI, the text files still need some cleaning; this was just my first pass at it.
Keep me posted if there is something I can do.
Thanks again!
I made a mistake in my own email address!
lionel dot villard at esiee dot fr
Could you invite me again?
No worries, it happens. I’ve already invited you again.
Thanks for your help!