Problem parsing DOCX corpus


Emma Bogler asked 3 years ago

Hello!

I am trying to parse a small sub-corpus of about 70 DOCX files, which I uploaded as a .zip (as usual). Every time I run the parsing script I get the following error logs:

It seems that the generated database does not contain any table, did you select the correct source for your dataset?
It seems that the generated database is empty or corrupted, did you select the correct source for your dataset?

I am at a loss since the larger corpus from which this sub-corpus was drawn parses normally. Any ideas? Thanks in advance for your help!

Question Tags: Parsing error

2 Answers

Lionel Staff answered 3 years ago

Dear Emma,
That’s strange!
Could you invite me to your project using lionel dot villard at esiee dot fr?
Best regards,
Lionel

Lionel Staff answered 3 years ago

Dear Emma,
I tested DOCX parsing this morning with a test dataset: everything seems to work as usual! Any progress on your side?
Could the problem come from a corrupted DOCX file or a corrupted zip archive?
I hope it helps,
L
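
One way to check the corruption hypothesis locally before re-uploading is sketched below. This is not part of Cortext Manager; it is a small standalone script using only Python's standard `zipfile` module. The function name `check_docx_archive` is made up for illustration, and the script assumes that a healthy DOCX file is itself a zip container whose main part is stored at `word/document.xml` (the conventional location, though a valid file could in principle place it elsewhere).

```python
import io
import zipfile
import zlib


def check_docx_archive(zip_source):
    """Return a list of (member_name, problem) tuples for a zip of DOCX files.

    zip_source may be a path or a binary file-like object.
    An empty list means no problem was detected by these checks.
    """
    try:
        outer = zipfile.ZipFile(zip_source)
    except zipfile.BadZipFile:
        return [("<archive>", "outer archive is not a valid zip file")]

    problems = []
    with outer:
        for name in outer.namelist():
            if not name.lower().endswith(".docx"):
                continue  # only DOCX members are inspected
            try:
                # Reading the member verifies its CRC in the outer archive.
                data = outer.read(name)
                # A DOCX file is itself a zip container.
                with zipfile.ZipFile(io.BytesIO(data)) as inner:
                    if inner.testzip() is not None:
                        problems.append((name, "damaged part inside the DOCX"))
                    elif "word/document.xml" not in inner.namelist():
                        # Assumption: main part at its conventional location.
                        problems.append((name, "no word/document.xml found"))
            except (zipfile.BadZipFile, zlib.error) as exc:
                problems.append((name, f"unreadable: {exc}"))
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for member, problem in check_docx_archive(sys.argv[1]):
            print(f"{member}: {problem}")
```

Running it as `python check_docx.py subcorpus.zip` (both file names are placeholders) prints one line per suspicious file. An empty result only rules out the kinds of corruption tested here; it does not guarantee that the parser will accept the corpus.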
