cluster

tike.cluster.by_scan_grid(pool, shape, dtype, scan, *args, fly=1)

Split the field of view into a 2D grid.

Use masks to divide the data into a 2D grid of spatially contiguous regions.

Parameters
  • shape (tuple of int) – The number of grid divisions along each dimension.

  • dtype (List[str]) – The datatypes of the args after splitting.

  • scan ((nscan, 2) float32) – The 2D coordinates of the scan positions.

  • args ((nscan, ...) float32 or None) – The arrays to be split by scan position.

  • fly (int) – The number of scan positions per frame.

  • pool (ThreadPool) –

Returns

  • order (List[array[int]]) – The locations of the inputs in the original arrays.

  • scan (List[array[float32]]) – The divided 2D coordinates of the scan positions.

  • args (List[array[float32]] or None) – Each input divided into regions or None if arg was None.
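The splitting contract above (a list of index arrays in `order`, plus matching pieces of `scan` and of each arg) can be illustrated with a small NumPy sketch. It is not tike's implementation: the helpers `stripe_labels` and `split_by_grid` are hypothetical, they ignore `pool`, `dtype`, and `fly`, and they assume each axis is cut into stripes holding (nearly) equal numbers of positions, with grid cells formed by intersecting those stripes. The last lines show how `order` puts the pieces back in their original places.

```python
import numpy as np


def stripe_labels(coords, n):
    """Hypothetical helper: label each coordinate with one of n equal-count stripes."""
    # Rank of each coordinate when sorted along this axis (0 .. len - 1).
    rank = np.empty(len(coords), dtype=int)
    rank[np.argsort(coords, kind="stable")] = np.arange(len(coords))
    # Map ranks onto stripe labels 0 .. n - 1.
    return rank * n // len(coords)


def split_by_grid(shape, scan, *args):
    """Hypothetical sketch (not tike's by_scan_grid): split positions into a grid."""
    rows = stripe_labels(scan[:, 0], shape[0])
    cols = stripe_labels(scan[:, 1], shape[1])
    cell = rows * shape[1] + cols  # flat index of each position's grid cell
    order = [np.flatnonzero(cell == c) for c in range(shape[0] * shape[1])]
    split_scan = [scan[o] for o in order]
    # Split every extra array the same way; None stays None.
    split_args = [None if a is None else [a[o] for o in order] for a in args]
    return order, split_scan, split_args


rng = np.random.default_rng(0)
scan = rng.random((100, 2)).astype(np.float32)
intensity = rng.random((100, 8, 8)).astype(np.float32)
order, parts, (split_intensity,) = split_by_grid((2, 2), scan, intensity)

# order records where each piece came from, so the split can be undone.
restored = np.empty_like(scan)
for o, part in zip(order, parts):
    restored[o] = part
```

Cells can end up unevenly populated (or even empty) because each axis is balanced independently; only the stripes, not their intersections, hold equal counts.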

tike.cluster.by_scan_stripes(scan, n, fly=1, axis=0)

Return n boolean masks that split the field of view into stripes.

Use masks to divide the data into spatially contiguous regions along the selected position axis.

Split scan into three stripes:

>>> [scan[s] for s in by_scan_stripes(scan, 3)]

FIXME: Only uses the first view to divide the positions. Assumes the positions on all angles are distributed similarly.

Parameters
  • scan ((nscan, 2) float32) – The 2D coordinates of the scan positions.

  • n (int) – The number of stripes.

  • fly (int) – The number of scan positions per frame.

  • axis (int (0 or 1)) – Which spatial dimension to divide along, i.e., horizontal or vertical.

Returns

mask – A list of boolean arrays which divide the scan positions into n stripes.

Return type

list of (nscan,) boolean
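A sketch of how stripe masks like those described above might be built with plain NumPy. `stripe_masks` is a hypothetical stand-in for `by_scan_stripes`, not its source. It assumes that every `fly` consecutive positions form one frame (the parameter description implies frames but does not spell out their layout), keeps each frame inside a single stripe, and balances the stripes by frame count.

```python
import numpy as np


def stripe_masks(scan, n, fly=1, axis=0):
    """Hypothetical stand-in for by_scan_stripes: n boolean masks, one per stripe.

    Assumes each run of `fly` consecutive positions is one frame, and keeps
    every frame inside a single stripe.
    """
    nscan = scan.shape[0]
    if nscan % fly != 0:
        raise ValueError("The number of scan positions must be a multiple of fly.")
    nframe = nscan // fly
    # Describe each frame by the mean coordinate of its positions along `axis`.
    frame_coord = scan[:, axis].reshape(nframe, fly).mean(axis=1)
    # Rank frames along the axis and map ranks onto stripes 0 .. n - 1.
    rank = np.empty(nframe, dtype=int)
    rank[np.argsort(frame_coord, kind="stable")] = np.arange(nframe)
    frame_stripe = rank * n // nframe
    # Expand the per-frame labels back to one label per scan position.
    position_stripe = np.repeat(frame_stripe, fly)
    return [position_stripe == i for i in range(n)]


rng = np.random.default_rng(1)
scan = rng.random((60, 2)).astype(np.float32)

# Same usage pattern as the docstring example: three stripes along axis 0.
stripes = [scan[s] for s in stripe_masks(scan, 3)]

# Two stripes along axis 1 with three positions per frame.
fly_masks = stripe_masks(scan, 2, fly=3, axis=1)
```

Ranking frames instead of thresholding coordinates keeps the stripes nearly equal in size even when the positions are unevenly distributed along the axis.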

tike.cluster.cluster_compact(*args, **kwargs)
tike.cluster.cluster_wobbly_center(*args, **kwargs)
tike.cluster.compact(population, num_cluster, max_iter=500)

Return the indices that divide population into compact clusters.

Uses an approach inspired by the naive k-means algorithm, except that it returns equally sized clusters.

Parameters
  • population ((M, N) array_like) – The M samples of an N dimensional population that needs to be clustered.

  • num_cluster (int (0..M]) – The number of clusters into which to divide the M samples.

Returns

indices – The indices of population that belong to each cluster. Clusters are sorted from largest to smallest.

Return type

(num_cluster,) list of array of integer

Raises

ValueError – If num_cluster is less than 1 or more than 65535. The implementation uses uint16 cluster tags, so it cannot represent more than 65535 clusters.
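One way to get equally sized, k-means-like clusters is to alternate a capacity-limited nearest-center assignment with a center update. The sketch below does that with NumPy. It is not tike's `compact`: `compact_sketch` and its `seed` parameter are hypothetical, and it rejects `num_cluster` outside [1, M] instead of checking the uint16 limit.

```python
import numpy as np


def compact_sketch(population, num_cluster, max_iter=500, seed=0):
    """Hypothetical sketch of equal-size, k-means-inspired clustering (not tike's code).

    Each iteration greedily assigns samples to their nearest center that still
    has room, then moves every center to the mean of its cluster.
    """
    population = np.asarray(population, dtype=float)
    m = population.shape[0]
    if not 1 <= num_cluster <= m:
        raise ValueError("num_cluster must be in the range [1, M].")
    # Cluster capacities differ by at most one sample and sum to M.
    capacity = np.full(num_cluster, m // num_cluster)
    capacity[: m % num_cluster] += 1
    rng = np.random.default_rng(seed)
    centers = population[rng.choice(m, num_cluster, replace=False)]
    labels = np.full(m, -1)
    for _ in range(max_iter):
        # Squared distance from every sample to every center, shape (M, K).
        dist = ((population[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        new_labels = np.full(m, -1)
        room = capacity.copy()
        # Visit (sample, center) pairs from closest to farthest.
        for flat in np.argsort(dist, axis=None, kind="stable"):
            sample, cluster = divmod(int(flat), num_cluster)
            if new_labels[sample] == -1 and room[cluster] > 0:
                new_labels[sample] = cluster
                room[cluster] -= 1
        if np.array_equal(new_labels, labels):
            break  # the assignment no longer changes
        labels = new_labels
        centers = np.stack(
            [population[labels == k].mean(axis=0) for k in range(num_cluster)]
        )
    clusters = [np.flatnonzero(labels == k) for k in range(num_cluster)]
    # Sort clusters from largest to smallest, as documented for compact().
    return sorted(clusters, key=len, reverse=True)


rng = np.random.default_rng(3)
points = rng.random((31, 2))
clusters = compact_sketch(points, 3)
sizes = [len(c) for c in clusters]  # 31 samples -> [11, 10, 10]
```

Because the capacities sum to M, the greedy pass always places every sample, and each cluster is filled exactly to its capacity; that is what makes the sizes equal up to the remainder of M / num_cluster.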

tike.cluster.wobbly_center(population, num_cluster)

Return the indices that divide population into heterogeneous clusters.

Uses a contrarian approach to clustering: it maximizes the heterogeneity inside each cluster so that each cluster can capture the full variance of the original population. This yields clusters that are similar to each other as well as to the original population itself.

Parameters
  • population ((M, N) array_like) – The M samples of an N dimensional population that needs to be clustered.

  • num_cluster (int (0..M]) – The number of clusters into which to divide the M samples.

Returns

indices – The indices of population that belong to each cluster.

Return type

(num_cluster,) list of array of integer

Raises

ValueError – If num_cluster is less than 1 or more than 65535. The implementation uses uint16 cluster tags, so it cannot represent more than 65535 clusters.

References

Mishra, Megha, Chandrasekaran Anirudh Bhardwaj, and Kalyani Desikan. “A Maximal Heterogeneity Based Clustering Approach for Obtaining Samples.” arXiv preprint arXiv:1709.01423 (2017).
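For contrast with `compact`, the sketch below builds internally diverse but mutually similar clusters using a much simpler heuristic than the cited paper: rank the samples by distance from the population mean and deal them out round-robin. `heterogeneous_sketch` is hypothetical; it is not the maximal-heterogeneity algorithm that `wobbly_center` is based on.

```python
import numpy as np


def heterogeneous_sketch(population, num_cluster):
    """Hypothetical sketch: internally diverse clusters that resemble each other.

    A simple round-robin heuristic, NOT the maximal-heterogeneity algorithm of
    Mishra et al. (2017) that wobbly_center cites.
    """
    population = np.asarray(population, dtype=float)
    m = population.shape[0]
    if not 1 <= num_cluster <= m:
        raise ValueError("num_cluster must be in the range [1, M].")
    # Order samples from nearest to farthest from the population mean.
    dist = np.linalg.norm(population - population.mean(axis=0), axis=1)
    ranked = np.argsort(dist, kind="stable")
    # Deal ranked samples out like cards, so every cluster gets samples
    # from both the core and the fringes of the population.
    return [ranked[k::num_cluster] for k in range(num_cluster)]


rng = np.random.default_rng(2)
points = rng.normal(size=(100, 2))
clusters = heterogeneous_sketch(points, 4)
```

Each cluster takes every num_cluster-th sample from the same ranking, so all clusters span a similar range of distances from the mean, which is the kind of heterogeneity within clusters and similarity between them that the description aims for.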