Trouble parsing a PDF corpus

CorText Manager Q&A forum · Category: Data processing
matias.milia asked 5 years ago

Hi, I am trying to parse a corpus of PDF files, but some of them fail to parse. I suspect the number of pages in a document might be the issue: some 50-page documents parse fine, while some 199-page ones do not. Could the page count be causing the problem? Does anyone have ideas on how to work around this? In the meantime, I will try splitting the files and see what happens.
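A minimal sketch of the splitting workaround, assuming Python and the third-party pypdf library (neither is required or provided by CorText Manager). The helpers `chunk_page_ranges` and `split_pdf` are hypothetical names, and the default of 50 pages per chunk is only an illustration based on the sizes mentioned above, not a known limit of the parser.

```python
from __future__ import annotations

from pathlib import Path


def chunk_page_ranges(num_pages: int, max_pages: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) page ranges of at most max_pages pages each."""
    if num_pages < 0 or max_pages < 1:
        raise ValueError("num_pages must be >= 0 and max_pages >= 1")
    return [(start, min(start + max_pages, num_pages))
            for start in range(0, num_pages, max_pages)]


def split_pdf(path: str, max_pages: int = 50) -> list[Path]:
    """Write each page chunk of `path` to its own PDF next to the original.

    Assumes the third-party pypdf package is installed (pip install pypdf).
    """
    # Imported lazily so chunk_page_ranges works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    src = Path(path)
    reader = PdfReader(src)
    outputs: list[Path] = []
    ranges = chunk_page_ranges(len(reader.pages), max_pages)
    for part, (start, end) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for page_index in range(start, end):
            writer.add_page(reader.pages[page_index])
        # e.g. report.pdf -> report_part01.pdf, report_part02.pdf, ...
        out = src.with_name(f"{src.stem}_part{part:02d}{src.suffix}")
        with open(out, "wb") as fh:
            writer.write(fh)
        outputs.append(out)
    return outputs
```

For a 199-page file, `chunk_page_ranges(199, 50)` gives `[(0, 50), (50, 100), (100, 150), (150, 199)]`, so `split_pdf` would write four files. If every chunk of a failing file then parses, the page count is a plausible culprit; if one chunk still fails, the problem more likely lies in those particular pages.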