Difference between csv and robust csv in data parsing

CorText Manager Q&A forum | Category: Data processing
Kate Li asked 2 months ago

Hello,
I hope you are well! I noticed that the documentation pages and video tutorials recommend “robust csv,” but I saved my file as a UTF-8 comma-delimited file, and I could only parse it as “csv,” not as “robust csv” (that gave me an error).
What is the difference between parsing a corpus as “csv” and as “robust csv”?
Thank you for your time!
Kate

1 Answer
Tatiana Sanchez (Staff) answered 2 weeks ago

Hello Kate,

Both parser types do essentially the same job.
The ‘csv’ parser processes the file line by line and handles special (UTF-8) characters as it goes. The ‘robust csv’ parser skips this per-line character handling, which lets it process large files faster.
As for the error you got with ‘robust csv’, we have one hypothesis. It concerns the parameter
‘Time Field’ (If your csv file includes a time entry, please indicate the attribute name).
‘year’ is a reserved word, so you cannot use it as-is inside a field name in your file. For example, if you have a field named “Year of birth”, the ‘Data parsing script’ will convert it to “Year_of_birth”.
When you then parse the file and fill in the ‘Time Field’, you should write “Year_of_birth” with underscores; otherwise it will not work as expected.
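Because the name you type in ‘Time Field’ has to match the converted field name, it can help to preview your header before uploading. The Python sketch below is a hypothetical helper, not part of CorText Manager: it assumes the conversion simply replaces spaces with underscores (the only rule the “Year of birth” example confirms), and it flags any field whose name contains ‘year’. The function names are illustrative only.

```python
import csv
import io
from typing import Iterable


def normalize_field_name(name: str) -> str:
    # Assumption: spaces become underscores, matching the
    # "Year of birth" -> "Year_of_birth" example. The actual CorText
    # script may apply further rules that are not documented here.
    return name.strip().replace(" ", "_")


def inspect_header(lines: Iterable[str]) -> list[tuple[str, str, bool]]:
    """Return (original, normalized, contains_year) for each header field."""
    header = next(csv.reader(lines))
    return [
        (field, normalize_field_name(field), "year" in field.lower())
        for field in header
    ]


def inspect_file(path: str) -> list[tuple[str, str, bool]]:
    # 'utf-8-sig' decodes plain UTF-8 and also drops a leading byte-order
    # mark, which some spreadsheet tools write when exporting UTF-8 CSV.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return inspect_header(f)


if __name__ == "__main__":
    sample = "Title,Year of birth,Abstract\nA paper,1980,Some text\n"
    for original, normalized, has_year in inspect_header(io.StringIO(sample)):
        note = "  <- contains 'year'" if has_year else ""
        print(f"{original!r} -> {normalized!r}{note}")
```

To check your own corpus, call `inspect_file("your_file.csv")` and use the normalized name it reports for ‘Time Field’, keeping in mind that the real conversion may differ from this assumed rule.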

I hope this helps!

Tatiana