Structural Analysis for a huge database (265k entries)

matias.milia asked 5 years ago

Hi, I am working with a fairly large database (around 265k entries) and trying to run a Structural Analysis. I found that the script stops computing if the number of entities goes over 30,000, because it becomes too computationally heavy. I have a total of 319,189 authors, which is the field I am using as input.
I am only aiming to understand the newcomer dynamics (ratio and total number per year), so I don't need the full analysis. Anyway, I have some questions that you might be able to help me with; I couldn’t find further information on the documentation page https://docs.cortext.net/structural-analysis/

  1. How are the TOP ENTITIES calculated?
    I activated ‘top nodes filtering for the whole period’ and can only analyse about 10% of my population, so I need to know how the data is being selected in order to interpret the results correctly. I assume the script keeps the most significant authors, but I don’t know how it determines that.
  2. Is it possible to somehow set these TOP ENTITIES aside in a separate database or variable?
    This would be useful for understanding the composition of the top-entities population. I know this might be asking a bit ‘too much’, but I had to ask.
  3. Which alternative method would you suggest for counting authors for each year of the period?
    If I manage to do this, it would help me better describe the field and the bias introduced by the filtering.
  4. Is there any other way to show the newcomer evolution in such a large database?
    I have been thinking it over a lot, but I can’t seem to find a way around this.
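
To make questions 3 and 4 concrete, this is roughly the computation I have in mind if it had to be done outside CorText. It is only a minimal sketch under my own assumptions: the function name `newcomer_stats` and the input layout (one `(year, [authors])` pair per entry, exported from the database) are hypothetical, author names are assumed to be already disambiguated, and "newcomer" is taken to mean an author who does not appear in any earlier year of the period. It does not reproduce what the Structural Analysis script does internally.

```python
from collections import defaultdict


def newcomer_stats(records):
    """Count active authors and newcomers per year.

    records: iterable of (year, [author, ...]) pairs, one per database entry.
    This layout is an assumption, not a CorText export format.
    """
    # Collect the set of distinct authors active in each year.
    authors_by_year = defaultdict(set)
    for year, authors in records:
        authors_by_year[year].update(authors)

    seen = set()  # authors observed in any earlier year
    stats = {}
    for year in sorted(authors_by_year):
        active = authors_by_year[year]
        newcomers = active - seen
        stats[year] = {
            "authors": len(active),
            "newcomers": len(newcomers),
            # Share of this year's authors who appear for the first time.
            "ratio": len(newcomers) / len(active) if active else 0.0,
        }
        seen |= active
    return stats


# Toy data, invented for illustration only.
records = [
    (2001, ["A", "B"]),
    (2002, ["B", "C"]),
    (2002, ["C", "D"]),
    (2003, ["A", "D"]),
]

for year, row in newcomer_stats(records).items():
    print(year, row)
```

Since this only keeps sets of author names in memory, it should not hit the 30,000-entity limit, but by construction every author in the first year counts as a newcomer, so the first years of the period would need to be read with care.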

I know my questions might be difficult to answer, so thanks in advance for your kind effort.
Thanks a lot!

matias.milia replied 5 years ago

I found a way to check whether the dynamics I was observing for these 30,000 TOP ENTITIES hold for a broader sample. I ran the PERIOD DETECTOR script on just the ‘author’ field with the TOP 200,000 authors, and found that the dynamics I observed do reflect the dynamics of the field. This partially answers my question, so I thought it could be useful to post it here.

matias.milia replied 5 years ago

Anyway, I would still like to know how the TOP ENTITIES are calculated. Does anybody know where to find more information on this?