Period Detector

Screenshot from 2016-08-16 15:09:28

Two periods detected

Period detector directly works on the frequency distribution of (a) given field(s) to produce a matrix that maps the distance between the composition of the dataset at two different time steps (in the example above indexed by years for instance).

Screenshot from 2016-08-16 15:10:47

One should define the fields to consider for constructing frequency vector profiles at each time step (optionally one can choose to restrain the computation to the N most  frequent items – a useful feature when field distribution is heterogeneous as dissimilarity measure may then be sensible to noise). The “dissimilarity” between two time-steps is then computed as 1 minus the cosine value between vectors formed by the frequency values of each  “year”. For instance on the example above, every diagonal cells score 0 because profiles are perfectly aligned. The whiter the cell, the most dissimilar two time steps are.

The script also automatically computes the partition which optimally divides the time in a given number of periods. The algorithm searches the cut times that optimize the sum of the homogeneities of each sub-block.

If set to zero, the number of slices will be automatically computed using statistical   criterion explained in Tibshirani, Walther and Hastie paper (2001).