Trouble parsing a PDF corpus


matias.milia asked 8 years ago

Hi, I am trying to parse a corpus of PDF files, and some of them fail to parse. I suspect the number of pages might be the issue: some 50-page documents get parsed, while others of 199 pages don't. Could the page count be causing the problem? Any ideas on how to work around this? For now I will try splitting the files and see what happens.

Question Tags: corpus, parse, PDF
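
For anyone wanting to try the splitting workaround mentioned in the question, here is a minimal sketch. It assumes the third-party pypdf library is installed, and the function names (`page_ranges`, `split_pdf`) and the 50-page chunk size are illustrative choices only, picked because 50-page documents were reported to parse. Whether page count is actually the cause is not confirmed here.

```python
from pathlib import Path


def page_ranges(total_pages, chunk_size):
    """Return (start, stop) page-index pairs that cover total_pages in chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(src, out_dir, chunk_size=50):
    """Write src as several smaller PDFs of at most chunk_size pages each.

    chunk_size=50 is an assumption, not a documented parser limit.
    """
    # Imported lazily so page_ranges stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(src))
    ranges = page_ranges(len(reader.pages), chunk_size)

    written = []
    for part, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # Hypothetical naming scheme: report_part01.pdf, report_part02.pdf, ...
        target = out_dir / f"{src.stem}_part{part:02d}.pdf"
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Calling `split_pdf("report.pdf", "split_corpus")` on a 199-page file would produce four parts, which can then be uploaded in place of the original. If the smaller parts parse while the full file does not, that supports the page-count hypothesis; if they still fail, the cause is probably something else in the file.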
