Hello,
I hope you are well! I noticed that the documentation pages and video tutorials recommend “robust csv”. However, I saved my file as a UTF-8 comma-delimited file, and I could only parse it as “csv”; parsing it as “robust csv” gave me an error.
What is the difference between parsing a corpus as “csv” and as “robust csv”?
Thank you for your time!
Kate
Hello Kate,
The two parser types do similar things.
The ‘csv’ parser type handles specific characters (UTF-8) line by line, while the ‘robust csv’ parser type does not, which lets it process large files faster.
We have one hypothesis for the error you got with the ‘robust csv’ parser. It concerns the parameter
‘Time Field’ (If your csv file includes a time entry, please indicate the attribute name).
You cannot use the reserved word ‘year’ inside a fieldname in your file. For example, if you have a fieldname like “Year of birth”, the ‘Data parsing script’ will convert it to “Year_of_birth”.
So when you parse the file and fill in the ‘Time Field’, you should write “Year_of_birth” with underscores; otherwise it will not work as expected.
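If it helps to check in advance which names to type into ‘Time Field’, here is a minimal Python sketch that previews the converted fieldnames from a csv header. It assumes the only change is that spaces become underscores, as in the “Year of birth” example; the real ‘Data parsing script’ may apply other rules. The function names `normalize_fieldname` and `normalized_header` are hypothetical and not part of the tool.

```python
import csv
import io


def normalize_fieldname(name):
    # Assumption: the only conversion is space -> underscore,
    # as in "Year of birth" -> "Year_of_birth".
    return name.replace(" ", "_")


def normalized_header(csv_text):
    # Read only the first row (the header) of a comma-delimited text.
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [normalize_fieldname(field) for field in header]


# Made-up sample data, for illustration only.
sample = "Name,Year of birth,City\nAda,1815,London\n"
print(normalized_header(sample))
# prints ['Name', 'Year_of_birth', 'City']
```

In this example the value to enter as the ‘Time Field’ would be “Year_of_birth”.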
I hope this helps!
Tatiana