Hello,
I’ve started running into problems parsing a corpus of txt data from Lexis Nexis. I’ve had occasional problems with Lexis Nexis data before, but today the errors are continuous. To confirm it’s not the data, I’ve tried a range of small and large corpora downloaded from Lexis Nexis, and have tried each of the various parsing settings. Each time I get the following:
Debug Log: Error! Log file not found.
Can anyone advise?
Thanks
Laurie
Dear Laurie,
It seems like some of the articles you uploaded did not have any LOAD-DATE entry (which contains the date information).
The script was modified to ignore them and is now processing the files properly, but as a consequence some articles may be missing…
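To illustrate the kind of change involved, here is a minimal sketch (not the actual parsing script) of skipping articles that lack a LOAD-DATE line. It assumes each article in the txt export starts with a header line such as "1 of 250 DOCUMENTS" and that the date appears on a line starting with "LOAD-DATE:". The function names are hypothetical.

```python
import re

# Assumption: each article starts with a line such as "3 of 250 DOCUMENTS".
DOC_HEADER = re.compile(r"^[ \t]*\d+ of \d+ DOCUMENTS[ \t]*$", re.MULTILINE)
# Assumption: the date is given on a line such as "LOAD-DATE: January 5, 2015".
LOAD_DATE = re.compile(r"^LOAD-DATE:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def split_articles(raw_text):
    """Split a raw export into article bodies, dropping anything before the first header."""
    parts = DOC_HEADER.split(raw_text)
    return [part.strip() for part in parts[1:] if part.strip()]


def filter_dated(articles):
    """Keep (load_date, article) pairs and count the articles skipped for lacking a LOAD-DATE."""
    kept, skipped = [], 0
    for article in articles:
        match = LOAD_DATE.search(article)
        if match is None:
            skipped += 1  # no date information: ignore this article
            continue
        kept.append((match.group(1), article))
    return kept, skipped
```

Reporting the skipped count alongside the kept articles makes it possible to see how many articles went missing for a given upload.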
Dear Jean-Philippe
Thanks for such a quick response!
The script now parses the text, and I can make sure LOAD-DATE is included in future data collection. However, I have a new problem: in the Corpus Explorer, the “Title” field is sometimes wrong. Most of the time the Title is correct, but in some instances the script seems to mis-recognise it and gives the Section or the Date instead. Is this a problem with the script or with the formatting of the Lexis Nexis data?
Best
Laurie
The parsing script has not changed, so I guess it comes from this specific dataset…
Have you tried downloading another set of data from Lexis Nexis to see whether you can reproduce the bug?
Something may have gone wrong when you downloaded this specific dataset. Another possibility is that Lexis Nexis has changed the way it formats its data, in which case we will try to modify the parser accordingly.
Please keep us posted and, if you don’t mind, share your project with me. That would let me look at the original data directly.
Ok, I uploaded a new Lexis Nexis corpus and it still produces the same (occasional) Title errors described above. I have shared the project with you and look forward to your response. Thanks!
L
I can’t see it. Did you share it with my complete address?
Login is jphcoi
hosted @gmail.com