Problem parsing DOCX corpus

CorText Manager Q&A forum / Category: Data processing
Emma Bogler asked 2 years ago
Hello!

I am trying to parse a small sub-corpus of about 70 DOCX files, which I uploaded as a .zip archive (as usual). Every time I run the parsing script, I get the following error messages:

It seems that the generated database does not contain any table, did you select the correct source for your dataset?
It seems that the generated database is empty or corrupted, did you select the correct source for your dataset?

I am at a loss since the larger corpus from which this sub-corpus was drawn parses normally. Any ideas? Thanks in advance for your help!
2 Answers
Lionel Staff answered 2 years ago

Dear Emma,
That’s strange!
Could you invite me to your project using lionel dot villard at esiee dot fr?
Best regards,
Lionel

Lionel Staff answered 2 years ago

Dear Emma,
I tested DOCX parsing this morning with a test dataset, and everything seems to work as usual! Any progress on your side?
Could the problem come from a corrupted DOCX file or a corrupted zip file?
I hope it helps,
L
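The corrupted-file hypothesis can be checked locally before re-uploading. What follows is a minimal sketch using only Python's standard `zipfile` module. It is not part of CorText Manager, and the function name `check_docx_corpus` and the particular checks are illustrative assumptions. The sketch relies on the fact that a DOCX file is itself a zip package whose main text lives in `word/document.xml`. It also flags `__MACOSX/._*` metadata entries (added by the macOS Finder when compressing a folder) and `~$*` Word lock files. Both of these end in `.docx` but are not real documents, and they might trip up a parser.

```python
import io
import sys
import zipfile
import zlib

# Errors that reading a damaged zip member can raise.
READ_ERRORS = (zipfile.BadZipFile, zlib.error, EOFError)


def check_docx_corpus(zip_source):
    """Scan a corpus archive (hypothetical helper) and report suspicious entries.

    zip_source may be a file path or a binary file-like object.
    Returns a list of (entry_name, problem) tuples; an empty list only means
    these limited checks found nothing wrong.
    """
    problems = []
    with zipfile.ZipFile(zip_source) as outer:
        docx_names = [n for n in outer.namelist() if n.lower().endswith(".docx")]
        if not docx_names:
            problems.append(("<archive>", "no .docx entries found"))
        for name in docx_names:
            base = name.rsplit("/", 1)[-1]
            # macOS Finder adds __MACOSX/._* metadata; Word leaves ~$* lock files.
            if name.startswith("__MACOSX/") or base.startswith(("._", "~$")):
                problems.append((name, "hidden system/lock file, not a real DOCX"))
                continue
            try:
                data = outer.read(name)  # raises BadZipFile on a CRC mismatch
            except READ_ERRORS as exc:
                problems.append((name, f"unreadable in outer archive: {exc}"))
                continue
            # A DOCX must itself open as a zip container.
            try:
                inner = zipfile.ZipFile(io.BytesIO(data))
            except zipfile.BadZipFile:
                problems.append((name, "not a zip container, so not a valid DOCX"))
                continue
            with inner:
                if "word/document.xml" not in inner.namelist():
                    problems.append((name, "missing word/document.xml"))
                    continue
                try:
                    bad = inner.testzip()  # first member with a bad CRC, or None
                except READ_ERRORS as exc:
                    bad = str(exc)
                if bad is not None:
                    problems.append((name, f"corrupted part inside DOCX: {bad}"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for entry, problem in check_docx_corpus(sys.argv[1]):
        print(f"{entry}: {problem}")
```

Usage, assuming the script is saved as `check_corpus.py` (a hypothetical name): `python check_corpus.py my_corpus.zip`. Each printed line names an entry that may be worth removing or re-saving from Word before zipping again. An empty output does not prove that the files will parse. It only rules out the structural problems checked here.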