Period Detector

Two periods detected

The period detector works directly on the frequency distribution of one or more given fields. It produces a matrix whose cells give the distance between the composition of the dataset at two different time steps (in the example above, time steps are indexed by year).

First define the fields used to build a frequency vector profile for each time step. Optionally, the computation can be restricted to the N most frequent items, which is useful when the field distribution is heterogeneous, because the dissimilarity measure may then be sensitive to noise. The “dissimilarity” between two time steps is then computed as 1 minus the cosine between the vectors formed by the frequency values of each “year”. In the example above, every diagonal cell scores 0 because a profile is perfectly aligned with itself. The whiter the cell, the more dissimilar the two time steps are.
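The computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the script's actual code: the function name `dissimilarity_matrix`, the `top_n` parameter, and the input layout (one row per time step, one column per item) are assumptions made for the example.

```python
import numpy as np

def dissimilarity_matrix(freq, top_n=None):
    """Return the matrix of 1 - cosine between time-step profiles.

    freq: array-like of shape (n_steps, n_items), one row per time step
          (hypothetical layout, chosen for this sketch).
    top_n: if given, keep only the top_n items with the highest total frequency.
    """
    freq = np.asarray(freq, dtype=float)
    if top_n is not None:
        # Rank items by their total frequency over all time steps.
        keep = np.argsort(freq.sum(axis=0))[::-1][:top_n]
        freq = freq[:, keep]
    norms = np.linalg.norm(freq, axis=1, keepdims=True)
    # Avoid dividing by zero; an empty time step keeps an all-zero profile,
    # so its dissimilarity to every step (itself included) comes out as 1.
    norms[norms == 0] = 1.0
    unit = freq / norms
    # Clip to absorb floating-point rounding just outside [0, 1].
    cosine = np.clip(unit @ unit.T, 0.0, 1.0)
    return 1.0 - cosine
```

With three time steps where the first two share the same profile and the third uses a different item, `dissimilarity_matrix([[2, 0], [4, 0], [0, 3]])` gives 0 between the first two steps and 1 between either of them and the third.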

The script also automatically computes the partition that optimally divides the timeline into a given number of periods. The algorithm searches for the cut times that optimize the sum of the homogeneities of the sub-blocks.
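One possible way to carry out such a search is dynamic programming over contiguous blocks of time steps. The sketch below is an assumption-laden illustration, not the script's documented algorithm: the homogeneity measure is not specified here, so the example treats a block as more homogeneous when the sum of its pairwise dissimilarities is lower, and minimizes the total over all blocks. The function name `best_cuts` is hypothetical.

```python
import numpy as np

def best_cuts(D, k):
    """Split time steps 0..n-1 into k contiguous periods.

    D: square dissimilarity matrix (zero diagonal assumed).
    Returns the start indices of periods 2..k, i.e. the cut times.
    Block cost (an assumption of this sketch): sum of pairwise
    dissimilarities inside the block.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of time steps")

    # 2D prefix sums: P[a, b] = sum of D[:a, :b].
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = D.cumsum(axis=0).cumsum(axis=1)

    def cost(i, j):
        # Sum over the square block i..j-1, halved to count each pair once.
        return (P[j, j] - P[i, j] - P[j, i] + P[i, i]) / 2.0

    # best[m, j]: minimal total cost of splitting steps 0..j-1 into m blocks.
    best = np.full((k + 1, n + 1), np.inf)
    prev = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1, i] + cost(i, j)
                if c < best[m, j]:
                    best[m, j] = c
                    prev[m, j] = i

    # Walk back from the last block to recover each block's start index.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = prev[m, j]
        starts.append(int(j))
    return sorted(starts)[1:]  # drop the first block's start (always 0)
```

For a matrix with two clearly separated groups of time steps, calling `best_cuts(D, 2)` returns a single cut at the first step of the second group. The triple loop costs on the order of k·n² operations, which is usually acceptable when time steps are years.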

If the requested number of periods is set to zero, the number of slices is computed automatically using the statistical criterion (the gap statistic) described by Tibshirani, Walther and Hastie (2001).