cluster

tike.cluster.by_scan_grid(*args, pool, shape, dtype, destination, scan, fly=1)

Split the field of view into a 2D grid.

Divide the data into a 2D grid of spatially contiguous regions.

Parameters
  • shape (tuple of int) – The number of grid divisions along each dimension.

  • dtype (List[str]) – The datatypes of the args after splitting.

  • scan ((nscan, 2) float32) – The 2D coordinates of the scan positions.

  • args ((nscan, ...) float32 or None) – The arrays to be split by scan position.

  • fly (int) – The number of scan positions per frame.

  • pool (ThreadPool) –

  • destination (List[str]) –

Returns

  • order (List[array[int]]) – The locations of the inputs in the original arrays.

  • scan (List[array[float32]]) – The divided 2D coordinates of the scan positions.

  • args (List[array[float32]] or None) – Each input divided into regions or None if arg was None.
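The grid split can be illustrated with a small NumPy sketch. It is not tike's implementation: it assumes equal-width cells spanning the scan extent and one position per frame (fly=1), and it ignores pool, destination, and dtype (device placement). The helper name grid_split_sketch is hypothetical.

```python
import numpy as np


def grid_split_sketch(scan, shape, *args):
    """Hypothetical sketch: bin scan positions into a shape[0] x shape[1] grid.

    Assumes equal-width cells over the scan extent and one position per frame
    (fly=1); device placement (pool, destination, dtype) is ignored.
    """
    scan = np.asarray(scan, dtype=np.float32)
    cell_index = []
    for dim, n in enumerate(shape):
        coord = scan[:, dim]
        # Interior cell boundaries, equally spaced between the min and max coordinate.
        edges = np.linspace(coord.min(), coord.max(), n + 1)[1:-1]
        # side="right" maps each coordinate to a cell index in 0..n-1.
        cell_index.append(np.searchsorted(edges, coord, side="right"))
    # Combine the per-dimension indices into one row-major cell number.
    cell = np.ravel_multi_index(tuple(cell_index), shape)
    order = [np.flatnonzero(cell == i) for i in range(int(np.prod(shape)))]
    split_scan = [scan[o] for o in order]
    # Split every extra array the same way; None arguments pass through as None.
    split_args = [None if a is None else [np.asarray(a)[o] for o in order] for a in args]
    return order, split_scan, split_args
```

Because every extra array is indexed with the same order arrays, the i-th region of each argument stays aligned with the i-th region of scan, which mirrors the relationship between the documented order, scan, and args return values.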

tike.cluster.by_scan_stripes(scan, n, fly=1, axis=0)

Return n boolean masks that split the field of view into stripes.

Divide the data into spatially contiguous regions along one axis of the scan positions.

Split scan into three stripes:

>>> [scan[s] for s in by_scan_stripes(scan, 3)]

FIXME: Only uses the first view to divide the positions. Assumes the positions on all angles are distributed similarly.

Parameters
  • scan ((nscan, 2) float32) – The 2D coordinates of the scan positions.

  • n (int) – The number of stripes.

  • fly (int) – The number of scan positions per frame.

  • axis (int (0 or 1)) – The spatial dimension to divide along, i.e., horizontal or vertical.

Returns

mask – A list of boolean arrays which divide the scan positions into n stripes.

Return type

list of (nscan, ) boolean
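A minimal sketch of how n boolean stripe masks could be built with NumPy. It assumes equal-width stripes across the field of view and fly=1; the documented function may place its boundaries differently (for equal counts, compare stripes_equal_count below). The name stripe_masks_sketch is hypothetical.

```python
import numpy as np


def stripe_masks_sketch(scan, n, axis=0):
    """Hypothetical sketch: n boolean masks of equal-width stripes along `axis`."""
    coord = np.asarray(scan)[:, axis]
    # Equal-width interior boundaries spanning the field of view (an assumption).
    edges = np.linspace(coord.min(), coord.max(), n + 1)[1:-1]
    # Stripe number 0..n-1 for every scan position.
    stripe = np.searchsorted(edges, coord, side="right")
    # One mask per stripe; each position is True in exactly one mask.
    return [stripe == i for i in range(n)]
```

Usage follows the doctest pattern above: `[scan[s] for s in stripe_masks_sketch(scan, 3)]` yields the positions of each stripe.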

tike.cluster.by_scan_stripes_contiguous(*args, pool, shape, dtype, destination, scan, fly=1, batch_method, num_batch)

Split data into stripes and create contiguously ordered batches.

Divide the field of view into one stripe per device; within each stripe, create batches according to batch_method, and load the batches into contiguous blocks in device memory.

Parameters
  • shape (tuple of int) – The number of grid divisions along each dimension.

  • dtype (List[str]) – The datatypes of the args after splitting.

  • scan ((nscan, 2) float32) – The 2D coordinates of the scan positions.

  • args ((nscan, ...) float32 or None) – The arrays to be split by scan position.

  • fly (int) – The number of scan positions per frame.

  • batch_method – The method for determining the batches after dividing the positions among GPUs.

  • pool (ThreadPool) –

  • destination (List[str]) –

  • num_batch (int) –

Returns

  • order (List[array[int]]) – The locations of the inputs in the original arrays.

  • batches (List[List[array[int]]]) – The locations of the elements of each batch.

  • scan (List[array[float32]]) – The divided 2D coordinates of the scan positions.

  • args (List[array[float32]] or None) – Each input divided into regions or None if arg was None.

Return type

Tuple[List[numpy.typing.NDArray], List[List[numpy.typing.NDArray]]]
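The point of the contiguous ordering is that a batch need not be contiguous in space to be contiguous in memory. The sketch below illustrates this under stated assumptions: stripes have equal counts along one axis, and a hypothetical round-robin rule stands in for batch_method. Each stripe is then stably reordered by batch label, so every batch becomes one slice of the device array. The function name stripes_contiguous_sketch and its num_device argument are hypothetical; pool, destination, and dtype are ignored.

```python
import numpy as np


def stripes_contiguous_sketch(scan, num_device, num_batch, axis=0):
    """Hypothetical sketch: one stripe per device, batches stored contiguously.

    Assumes equal-count stripes along `axis` and uses an interleaved
    (round-robin) assignment as a stand-in for batch_method.
    """
    coord = np.asarray(scan)[:, axis]
    # One equal-count stripe of original indices per device.
    stripes = np.array_split(np.argsort(coord, kind="stable"), num_device)
    order, batches = [], []
    for stripe in stripes:
        # Stand-in batch_method: position k of the stripe goes to batch k % num_batch.
        label = np.arange(len(stripe)) % num_batch
        # Stable sort by batch label so that each batch occupies one contiguous block.
        perm = np.argsort(label, kind="stable")
        order.append(stripe[perm])
        # Block boundaries of each batch within this device's reordered array.
        bounds = np.concatenate([[0], np.cumsum(np.bincount(label, minlength=num_batch))])
        batches.append([np.arange(bounds[b], bounds[b + 1]) for b in range(num_batch)])
    return order, batches
```

With this layout, `order[d]` maps device-local rows back to the original arrays, and each entry of `batches[d]` is a run of consecutive row numbers, so a batch can be read as a single slice.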

tike.cluster.cluster_compact(*args, **kwargs)
tike.cluster.cluster_wobbly_center(*args, **kwargs)
tike.cluster.compact(population, num_cluster, max_iter=500)

Return the indices that divide population into compact clusters.

Uses an approach that is inspired by the naive k-means algorithm, but it returns equally sized clusters.

Parameters
  • population ((M, N) array_like) – The M samples of an N dimensional population that needs to be clustered.

  • num_cluster (int (0..M]) – The number of clusters in which to divide M samples.

Returns

indices – The indices of population that belong to each cluster. Clusters are sorted from largest to smallest.

Return type

(num_cluster,) list of array of integer

Raises

ValueError – If num_cluster is less than 1 or more than 65535. The implementation uses uint16 as the cluster tag, so it cannot represent more than 65535 clusters.
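The equal-size requirement is what distinguishes this from plain k-means. The sketch below shows one way to satisfy it, and it is a swapped-in capacity-constrained heuristic rather than tike's algorithm: centers are refined k-means style, but each assignment pass fills clusters greedily from the closest (sample, center) pairs up to fixed capacities that differ by at most one. Listing the larger capacities first keeps the clusters sorted from largest to smallest. The name compact_sketch and its seed argument are hypothetical, and the extra num_cluster <= M check is an assumption.

```python
import numpy as np


def compact_sketch(population, num_cluster, max_iter=500, seed=0):
    """Hypothetical sketch of equal-size, k-means-like clustering."""
    population = np.asarray(population, dtype=float)
    m = len(population)
    # Mirrors the documented uint16 limit; the num_cluster <= M check is assumed.
    if not 1 <= num_cluster <= min(m, 65535):
        raise ValueError("num_cluster must be in [1, min(M, 65535)]")
    rng = np.random.default_rng(seed)
    # Capacities differ by at most one, sum to M, and are listed largest first.
    capacity = np.full(num_cluster, m // num_cluster)
    capacity[: m % num_cluster] += 1
    centers = population[rng.choice(m, num_cluster, replace=False)]
    labels = np.zeros(m, dtype=np.uint16)
    for _ in range(max_iter):
        # Distance from every sample to every center, shape (M, num_cluster).
        dist = np.linalg.norm(population[:, None, :] - centers[None, :, :], axis=-1)
        remaining = capacity.copy()
        assigned = np.zeros(m, dtype=bool)
        new_labels = np.zeros(m, dtype=np.uint16)
        # Visit (sample, center) pairs from closest to farthest; fill until full.
        for flat in np.argsort(dist, axis=None):
            i, c = divmod(flat, num_cluster)
            if not assigned[i] and remaining[c] > 0:
                new_labels[i] = c
                assigned[i] = True
                remaining[c] -= 1
        # Every cluster is non-empty because each capacity is at least one.
        centers = np.stack(
            [population[new_labels == c].mean(axis=0) for c in range(num_cluster)]
        )
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return [np.flatnonzero(labels == c) for c in range(num_cluster)]
```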

tike.cluster.stripes_equal_count(population, num_cluster, dim=0)

Return indices dividing the population into stripes of equal count.

The population is divided along the provided dimension into clusters with approximately equal numbers of elements.

Parameters
  • population ((M, N) array_like) – The M samples of an N dimensional population that needs to be clustered.

  • num_cluster (int (0..M]) – The number of clusters in which to divide M samples.

  • dim (int) – The dimension (of N) along which the population is divided.

Returns

indices – The indices of population that belong to each cluster.

Return type

(num_cluster,) list of array of integer
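Equal-count stripes reduce to sorting along one dimension and cutting the sorted order into near-equal runs. A minimal sketch with NumPy, assuming that sort-and-split strategy; the name stripes_equal_count_sketch is hypothetical.

```python
import numpy as np


def stripes_equal_count_sketch(population, num_cluster, dim=0):
    """Hypothetical sketch: sort along `dim`, then cut into near-equal runs."""
    population = np.asarray(population)
    # Indices of the samples ordered by their coordinate along `dim`.
    order = np.argsort(population[:, dim], kind="stable")
    # array_split makes cluster sizes differ by at most one.
    return np.array_split(order, num_cluster)
```

A stable sort keeps tied samples in their original order, so the result is deterministic when several samples share a coordinate.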

tike.cluster.wobbly_center(population, num_cluster)

Return the indices that divide population into heterogeneous clusters.

Uses a contrarian approach to clustering: it maximizes the heterogeneity inside each cluster so that each cluster captures the entire variance of the original population. This yields clusters that are similar to each other and to the original population itself.

Parameters
  • population ((M, N) array_like) – The M samples of an N dimensional population that needs to be clustered.

  • num_cluster (int (0..M]) – The number of clusters in which to divide M samples.

Returns

indices – The indices of population that belong to each cluster.

Return type

(num_cluster,) list of array of integer

Raises

ValueError – If num_cluster is less than 1 or more than 65535. The implementation uses uint16 as the cluster tag, so it cannot represent more than 65535 clusters.

References

Mishra, Megha, Chandrasekaran Anirudh Bhardwaj, and Kalyani Desikan. “A Maximal Heterogeneity Based Clustering Approach for Obtaining Samples.” arXiv preprint arXiv:1709.01423 (2017).
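The "wobbly center" idea can be illustrated with a simplified greedy sketch. It is an approximation inspired by the description above, not necessarily the exact procedure of Mishra et al. or tike. Clusters take turns; on each turn a cluster claims the unassigned sample farthest from its current center, and that center is then recomputed, so it "wobbles" as members are added. Every center starts at the population mean, and the name wobbly_center_sketch is hypothetical.

```python
import numpy as np


def wobbly_center_sketch(population, num_cluster):
    """Hypothetical greedy sketch of maximal-heterogeneity clustering."""
    population = np.asarray(population, dtype=float)
    m = len(population)
    if not 1 <= num_cluster <= min(m, 65535):
        raise ValueError("num_cluster must be in [1, min(M, 65535)]")
    unassigned = np.ones(m, dtype=bool)
    members = [[] for _ in range(num_cluster)]
    # Every cluster's center starts at the population mean (an assumption).
    centers = np.tile(population.mean(axis=0), (num_cluster, 1))
    c = 0
    while unassigned.any():
        candidates = np.flatnonzero(unassigned)
        # Pick the unassigned sample farthest from this cluster's current center.
        dist = np.linalg.norm(population[candidates] - centers[c], axis=1)
        pick = candidates[np.argmax(dist)]
        members[c].append(pick)
        unassigned[pick] = False
        # The center "wobbles": recompute it from the cluster's current members.
        centers[c] = population[members[c]].mean(axis=0)
        # Round-robin turns keep cluster sizes within one of each other.
        c = (c + 1) % num_cluster
    return [np.array(idx, dtype=int) for idx in members]
```

Taking the farthest sample pulls extreme values into every cluster, which is what spreads the population's variance across clusters instead of concentrating similar samples together.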

tike.cluster.wobbly_center_random_bootstrap(population, num_cluster, boot_fraction=0.95)

Return the indices that divide population into heterogeneous clusters.

Uses a hybrid approach to generate heterogeneous clusters. First, a fraction of the population is divided into clusters randomly; then the wobbly center algorithm distributes the remaining samples with the goal of maximizing intracluster heterogeneity.

Parameters
  • population ((M, N) array_like) – The M samples of an N dimensional population that needs to be clustered.

  • num_cluster (int (0..M]) – The number of clusters in which to divide M samples.

  • boot_fraction ((0, 1]) – The fraction of each cluster that is randomly assigned before the wobbly center algorithm starts.

Returns

indices – The indices of population that belong to each cluster.

Return type

(num_cluster,) list of array of integer

Raises

ValueError – If num_cluster is less than 1 or more than 65535. The implementation uses uint16 as the cluster tag, so it cannot represent more than 65535 clusters.

References

Mishra, Megha, Chandrasekaran Anirudh Bhardwaj, and Kalyani Desikan. “A Maximal Heterogeneity Based Clustering Approach for Obtaining Samples.” arXiv preprint arXiv:1709.01423 (2017).
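The two phases can be sketched as follows. This is an illustration under assumptions, not tike's implementation: the random phase seeds each cluster with int(boot_fraction * floor(M / num_cluster)) samples (an assumed sizing rule), and the second phase uses a simplified round-robin "farthest from current center" rule in place of the full wobbly center algorithm. The function name and its seed argument are hypothetical.

```python
import numpy as np


def bootstrap_wobbly_sketch(population, num_cluster, boot_fraction=0.95, seed=0):
    """Hypothetical sketch: random seeding followed by greedy heterogeneity."""
    population = np.asarray(population, dtype=float)
    m = len(population)
    if not 1 <= num_cluster <= min(m, 65535):
        raise ValueError("num_cluster must be in [1, min(M, 65535)]")
    if not 0 < boot_fraction <= 1:
        raise ValueError("boot_fraction must be in (0, 1]")
    rng = np.random.default_rng(seed)
    # Phase 1 (assumed sizing): randomly seed each cluster with boot_fraction
    # of its fair share, floor(M / num_cluster).
    n_boot = int(boot_fraction * (m // num_cluster))
    perm = rng.permutation(m)
    members = [list(perm[c * n_boot:(c + 1) * n_boot]) for c in range(num_cluster)]
    unassigned = np.ones(m, dtype=bool)
    unassigned[perm[:num_cluster * n_boot]] = False
    # Phase 2: clusters take turns claiming the unassigned sample farthest
    # from their current center; an empty cluster uses the population mean.
    overall = population.mean(axis=0)
    c = 0
    while unassigned.any():
        center = population[members[c]].mean(axis=0) if members[c] else overall
        candidates = np.flatnonzero(unassigned)
        dist = np.linalg.norm(population[candidates] - center, axis=1)
        pick = candidates[np.argmax(dist)]
        members[c].append(pick)
        unassigned[pick] = False
        c = (c + 1) % num_cluster
    return [np.array(idx, dtype=int) for idx in members]
```

The random phase is cheap and already makes each cluster roughly representative; the greedy phase then only has to place the remaining samples, which is where the per-candidate distance computations are spent.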