Hello,
I am running into a problem when performing a temporal analysis on a corpus built from Factiva. This seems to be a basic question about initial data formatting, but I haven’t found any questions or answers dealing with Factiva corpora, or with the possibility of automating proper formatting.
Could you help me?
The corpus was built in the usual way: analyses of the textual fields work as expected and all the other fields are recovered correctly.
However, when I run analyses that rely on the time variables, I get errors.
Going back to the data by exploring the corpus, I noticed that the variables “PublicationDate” and “ISIpubdate” cannot be used in their native format: they appear as strings similar to “1st of February 2021”, possibly with other linking words specific to the language of each corpus (all European languages).
Have I missed one of the earlier steps needed to parse the corpus correctly? Or is it due to a change in the native Factiva formatting? In any case, could you point me to a simple and efficient way to automate, via Cortext, the correct formatting of the temporal fields so that I can perform temporal analyses?
Thank you in advance,
Nicolas
Dear Nicolas,
It does not matter whether your browser is set to Spanish or English (or French, in my case); what matters is the language configuration of Factiva itself.
Just switch it from Spanish to English: open the options panel in the top right section and select “Idioma” 🙂
And it should work.
Enjoy
Lionel
It sounds like a change on the Factiva side. Let us have a look and investigate the issue; we will keep you posted. Could you send us an example of a source file whose time information is not parsed properly?
Thank you
The problem occurs, for example, with a query targeting 4 major Italian newspapers (keyword “innova*”, classic paper editions of “Il Messaggero”, “La Stampa”, “La Repubblica”, “Corriere della Sera”) restricted to the 500 most recent articles as of today.
Between the HTML file and the CSV file obtained with “Corpus explorer” there is no difference: the “PD” / “Publication Date” field contains a string such as “10 de mayo de 2021” or “26 de abril de 2021 ET 08:55”, because the browser and the Factiva interface were configured in Spanish at the time of downloading.
Nicolas
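For files that were already downloaded with the Spanish interface and cannot easily be exported again, the date strings can also be normalised outside Cortext before re-importing. What follows is only a minimal sketch under stated assumptions, not a Cortext feature: it assumes the strings look like the two examples quoted above, the function name `parse_factiva_spanish_date` is hypothetical, and any trailing time such as “ET 08:55” is simply ignored.

```python
import re
from datetime import date

# Spanish month names mapped to month numbers.
SPANISH_MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4,
    "mayo": 5, "junio": 6, "julio": 7, "agosto": 8,
    "septiembre": 9, "setiembre": 9, "octubre": 10,
    "noviembre": 11, "diciembre": 12,
}

# Matches "<day> de <month> de <year>" anywhere in the string,
# e.g. "10 de mayo de 2021" or "26 de abril de 2021 ET 08:55".
DATE_PATTERN = re.compile(
    r"(\d{1,2})\s+de\s+([a-záéíóú]+)\s+de\s+(\d{4})", re.IGNORECASE
)


def parse_factiva_spanish_date(raw):
    """Hypothetical helper: return a datetime.date, or None if unparseable."""
    match = DATE_PATTERN.search(raw)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = SPANISH_MONTHS.get(month_name.lower())
    if month is None:
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # Impossible calendar date, e.g. "31 de febrero de 2021".
        return None


if __name__ == "__main__":
    for raw in ("10 de mayo de 2021", "26 de abril de 2021 ET 08:55"):
        parsed = parse_factiva_spanish_date(raw)
        print(raw, "->", parsed.isoformat() if parsed else None)
```

Returning None instead of raising makes it easy to list the rows that still need manual attention. Other interface languages would need their own month table and linking words.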