query a corpus to create a subcorpus

CorText Manager Q&A forum, Category: Data processing
orianabras asked 6 years ago

Dear colleagues, In a database of publications I would like to perform analysis taking into account only certain types of documents. I think I can do it by making a query in the database. Is it advisable to create a new project to make the query? Or can I make the query in my current project and how should I proceed when I want to do analysis for the whole database again? Also, I would like to ask, when writing the condition for the query how do I write several conditions, for example, several types of documents? I hope I could clearly express my doubts. Thank you very much for your time and collaboration. Oriana

5 Answers
Jean-Philippe Cointet Staff answered 6 years ago

The whole database won't be affected: the query script can create a brand new database containing only the selected documents. Regarding multiple-criteria queries, after selecting ISIDT as a field, it is possible to write things like: data = 'article' or data = 'review'. It's even possible to use any boolean operator accepted by the SQLite syntax. For instance, NOT data = 'article' should exclude articles. Lastly, you can use the LIKE syntax to query raw textual content: data like '%bird%' selects only the documents whose abstract (for instance) includes the word "bird".
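
For readers who want to try these conditions outside CorText first, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (a table named ISIDT with an id column and a data column) and the sample rows are assumptions made for illustration; they are not taken from an actual CorText database, and the script itself may build its query differently.

```python
import sqlite3

# Hypothetical stand-in for a CorText field table: one row per document,
# with the field value stored in a column named `data` (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ISIDT (id INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO ISIDT VALUES (?, ?)",
    [(1, "article"), (2, "review"), (3, "Meeting Abstract"), (4, "article")],
)


def matching_ids(condition):
    """Return the ids of the rows that satisfy a WHERE condition."""
    rows = conn.execute(f"SELECT id FROM ISIDT WHERE {condition} ORDER BY id")
    return [row[0] for row in rows]


# Several document types combined with OR.
either_type = matching_ids("data = 'article' OR data = 'review'")

# Negation: everything that is not an article.
not_article = matching_ids("NOT data = 'article'")

# Substring match with LIKE; % stands for any run of characters.
like_review = matching_ids("data LIKE '%view%'")

print(either_type, not_article, like_review)
```

Note that every string value is wrapped in straight single quotes, which is what SQLite expects for text literals.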

orianabras answered 6 years ago

Merci Jean-Philippe!

orianabras answered 6 years ago

Hello again, I tried to use the condition "NOT data = 'Meeting Abstract'" and it did not work: it returned an unfiltered database. The condition "data != 'Meeting Abstract'" did work and returned a database filtered on that condition. Did I miss some detail about how to write the query using NOT? Thank you.

Lionel Staff answered 6 years ago

It should work by following these steps:

  1. Choose the script: Query (Query a corpus to extract a subcorpus)
  2. Stay with the default value for option Query Type: SQL
  3. Choose your variable with Target table name: Publications Type
  4. Define your query, condition (*): data <> 'Meeting Abstract'
  5. If you want to build an entire new database, check the Build an entirely new database excluding articles that do not match the query for every fields (*) option with: Yes
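
As a side check, the three ways of writing this exclusion can be compared in plain SQLite. The sketch below uses Python's sqlite3 module with a made-up table (doc_type, with id and data columns; both names are assumptions). It only shows how SQLite itself evaluates the conditions and says nothing about how the CorText script parses what is typed in the condition box.

```python
import sqlite3

# Made-up table standing in for the "Publications Type" variable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_type (id INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO doc_type VALUES (?, ?)",
    [(1, "Article"), (2, "Meeting Abstract"), (3, "Review"), (4, None)],
)

# Three spellings of the same exclusion.
conditions = [
    "data <> 'Meeting Abstract'",
    "data != 'Meeting Abstract'",
    "NOT data = 'Meeting Abstract'",
]

# Run each condition and collect the ids it keeps.
results = {
    cond: [
        row[0]
        for row in conn.execute(
            f"SELECT id FROM doc_type WHERE {cond} ORDER BY id"
        )
    ]
    for cond in conditions
}

for cond, ids in results.items():
    print(cond, "->", ids)
```

In plain SQLite all three conditions keep the same rows, and a row whose value is NULL is dropped by each of them, so documents with no type recorded would not survive this kind of filter.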

Hope it helps!

Lionel Staff replied 6 years ago

Be careful: in data <> 'Meeting Abstract', the value must be wrapped in single quotes.

orianabras answered 6 years ago

Thank you Lionel!