PDF filenames changed after parsing

CorText Manager Q&A forum, Category: Data processing
etancoigne asked 2 months ago

Hello Cortext!
I uploaded a corpus of 3’000 PDFs as a zip file. Each PDF has a name based on the pattern “Year_Volume_whatever.pdf”.
I parsed it successfully (“Split the text content by sentence” : No, “Ignore entries with incorrectly formatted time steps” : Yes)
I got 3 fields : Time Steps, text, filename. While exploring the corpus, I could see that Cortext cropped my initial filename to fill the “filename” field: Now it is “Volume_whatever”. The field “Time Steps” is filled with what seems to be the “Volume” part of my filename (no years at all).
I planned to index my corpus with another database that I have, which has a “filename” field corresponding to my initial filenames (“Year_Volume_whatever.pdf”).
How can I prevent Cortext from changing the filenames I used?
Thanks a lot,
Elise

Lionel Staff replied 2 months ago

See below!

1 Answer
Lionel Staff answered 2 months ago

Hello Elise,
That's strange! Could you invite me into your project: lionel dot villard at esiee dot fr ?
See you there 🙂
Lionel
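
While the cause is being looked into, one possible stopgap is to rebuild the link between the cropped names and the original ones outside of Cortext. The sketch below is not a Cortext feature. It assumes that the cropping only removes the leading “Year_” part (and possibly the “.pdf” extension), as described in the question. The function names and the example filenames are hypothetical.

```python
from pathlib import PurePath


def cropped_key(original_name: str) -> str:
    """Guess the cropped name for an original filename.

    Assumption: the parser drops everything up to and including the
    first underscore ("Year_") and the file extension.
    """
    stem = PurePath(original_name).stem          # "1999_12_intro.pdf" -> "1999_12_intro"
    _year, sep, rest = stem.partition("_")       # split on the first underscore only
    return rest if sep else stem                 # no underscore: keep the stem unchanged


def build_lookup(original_names):
    """Map each cropped name back to its original filename.

    Raises ValueError if two originals crop to the same name, because
    the mapping would then be ambiguous (same volume and title in two years).
    """
    lookup = {}
    for name in original_names:
        key = cropped_key(name)
        if key in lookup:
            raise ValueError(f"{name!r} and {lookup[key]!r} both crop to {key!r}")
        lookup[key] = name
    return lookup


if __name__ == "__main__":
    # Hypothetical filenames following the "Year_Volume_whatever.pdf" pattern.
    originals = ["1999_12_intro.pdf", "2003_27_methods.pdf"]
    lookup = build_lookup(originals)
    print(lookup["12_intro"])
```

The year can be recovered the same way from the first part of the original filename. If the “filename” field in Cortext turns out to keep the “.pdf” extension, strip it before looking the name up.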