## Composition of the dataset at different time steps

Period Detector works directly on the frequency distribution of one or more given fields. It produces a matrix that maps the distance between the composition of the dataset at any two time steps (in the example above, time steps are indexed by year).

#### Field

Select the field(s) to consider when constructing the frequency vector profile of each time step. The “dissimilarity” between two time steps is then computed as 1 minus the cosine similarity between the vectors formed by the frequency values of each “year”. In the example above, for instance, every diagonal cell scores 0 because each profile is perfectly aligned with itself. The whiter the cell, the more dissimilar the two time steps are.
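The dissimilarity described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the script's actual implementation: the function names (`frequency_profiles`, `cosine_dissimilarity`, `dissimilarity_matrix`) and the `(time_step, item)` input format are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def frequency_profiles(records):
    """Group (time_step, item) pairs into one frequency Counter per time step."""
    profiles = {}
    for step, item in records:
        profiles.setdefault(step, Counter())[item] += 1
    return profiles


def cosine_dissimilarity(p, q):
    """Return 1 minus the cosine similarity between two frequency profiles."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        # Assumption: an empty profile is treated as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def dissimilarity_matrix(profiles):
    """Return the sorted time steps and the square matrix of dissimilarities."""
    steps = sorted(profiles)
    matrix = [
        [cosine_dissimilarity(profiles[a], profiles[b]) for b in steps]
        for a in steps
    ]
    return steps, matrix


# Hypothetical toy data: 2000 and 2001 share the same items, 2002 shares none.
records = [(2000, "a"), (2000, "b"), (2001, "a"), (2001, "b"), (2002, "c")]
steps, matrix = dissimilarity_matrix(frequency_profiles(records))
```

In this toy data, the years 2000 and 2001 have identical profiles, so their dissimilarity is close to 0, while 2002 shares no item with the other years and scores 1 against both.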

#### Enter the number of periods you wish to detect

The script also automatically computes the partition that optimally divides time into a given number of periods. The algorithm searches for the cut times that optimize the sum of the homogeneities of each sub-block. If set to **zero**, the number of periods is computed automatically using a statistical criterion (Tibshirani et al., 2001).
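One possible way to search for such cut times is dynamic programming over contiguous blocks of time steps. The sketch below is an assumption about how this could be done, not the script's documented method: it measures the heterogeneity of a block as the sum of pairwise dissimilarities inside it and minimizes the total over all blocks. The function name `best_partition` is hypothetical, and the automatic choice of the number of periods (the gap statistic) is not covered.

```python
def best_partition(matrix, k):
    """Split n ordered time steps into k contiguous periods.

    `matrix` is a square dissimilarity matrix (list of lists).
    Returns the indices at which a new period starts (k - 1 cut points).
    """
    n = len(matrix)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of time steps")

    def block_cost(i, j):
        # Assumed cost: sum of pairwise dissimilarities among steps i..j-1.
        return sum(matrix[a][b] for a in range(i, j) for b in range(a + 1, j))

    inf = float("inf")
    # best[p][j]: minimal cost of splitting the first j steps into p periods.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                cost = best[p - 1][i] + block_cost(i, j)
                if cost < best[p][j]:
                    best[p][j] = cost
                    back[p][j] = i

    # Walk back through the stored choices to recover the cut points.
    cuts = []
    j = n
    for p in range(k, 0, -1):
        j = back[p][j]
        if p > 1:
            cuts.append(j)
    return sorted(cuts)


# Hypothetical matrix with two obvious blocks: steps 0-1 and steps 2-3.
toy = [
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
]
cuts = best_partition(toy, 2)  # a new period starts at index 2
```

Note that with this particular cost, adding more periods never increases the total, which is why a separate criterion is needed to choose the number of periods automatically.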

#### Top items

Restrict the computation to the N most frequent items from the selected field(s). This is useful when the field distribution is heterogeneous, as the dissimilarity measure may then be sensitive to noise.
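As an illustration of this filter, the sketch below keeps only the N items that are most frequent over the whole dataset before any dissimilarity is computed. The function name `restrict_to_top_items` and the dictionary-of-Counters input are assumptions made for the example, and ranking items by their total count across all time steps is one plausible reading of "most frequent".

```python
from collections import Counter


def restrict_to_top_items(profiles, n):
    """Keep only the n globally most frequent items in every time-step profile."""
    totals = Counter()
    for profile in profiles.values():
        totals.update(profile)
    top = {item for item, _ in totals.most_common(n)}
    return {
        step: Counter({k: v for k, v in profile.items() if k in top})
        for step, profile in profiles.items()
    }


# Hypothetical profiles: "a" and "c" are the two most frequent items overall.
profiles = {
    2000: Counter({"a": 5, "b": 1}),
    2001: Counter({"a": 3, "c": 2, "d": 1}),
}
filtered = restrict_to_top_items(profiles, 2)
```

Rare items such as `b` and `d` are dropped here, so they no longer contribute to the vectors compared between time steps.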

## What to do next?

Based on the matrix and the detected periods, you may want to apply these periods with the Period Slicer script to define and use **Custom periods** in your dataset.

## Reference

Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. J. R. Statist. Soc. B, 63, Part 2, pp. 411-423.