communicators
Module for communicators using threadpool and MPI.
This module implements both point-to-point and collective communications among multiple GPUs and multiple nodes.
class tike.communicators.Comm(gpu_count, mpi=<class 'tike.communicators.mpi.MPIComm'>, pool=<class 'tike.communicators.pool.ThreadPool'>, **kwargs)[source]
    Bases: object

    A ptychography communicator.

    Composes the multiprocessing and multithreading communicators to handle synchronization and communication among both GPUs and nodes.

    gpu_count : int
        The number of GPUs to use per process.

    mpi : class
        The multi-processing communicator.

    pool : class
        The multi-threading communicator.
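The composition described above can be sketched in pure Python without GPUs or MPI. `FakeMPI` and `ComposedComm` below are stand-ins invented for illustration (not the tike API), with `concurrent.futures.ThreadPoolExecutor` playing the role of the thread-level communicator:

```python
from concurrent.futures import ThreadPoolExecutor

class FakeMPI:
    """Stand-in for a process-level communicator: a single process."""
    rank, size = 0, 1

    def allreduce(self, x):
        return x  # with one process, reduction is the identity

class ComposedComm:
    """Compose a process-level and a thread-level communicator."""
    def __init__(self, gpu_count, mpi=FakeMPI, pool=ThreadPoolExecutor):
        self.gpu_count = gpu_count
        self.mpi = mpi()
        self.pool = pool(max_workers=gpu_count)

comm = ComposedComm(gpu_count=2)
# Thread level: one task per GPU; process level: combine across nodes.
results = list(comm.pool.map(lambda g: g * g, range(comm.gpu_count)))
print(comm.mpi.allreduce(sum(results)))  # 1
```

The design point is separation of concerns: the pool handles intra-node parallelism, the MPI object handles inter-node communication, and the communicator merely composes the two.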
class tike.communicators.MPIComm[source]
    Bases: object

    A Python wrapper class for MPI.

    Many clusters do not support inter-node GPU-GPU communications, so we first gather the data into main memory and then communicate it.

    rank : int
        The identity of this process.

    size : int
        The total number of MPI processes.
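The gather-through-host-memory strategy can be sketched without MPI or GPUs: each device buffer is first copied to main memory, then the host copies are combined. `to_host` and `gather_via_host` are hypothetical helpers for illustration, with NumPy standing in for both the device and host arrays:

```python
import numpy as np

def to_host(device_array):
    # Hypothetical device-to-host copy; with CuPy this would be cupy.asnumpy.
    return np.asarray(device_array)

def gather_via_host(device_buffers):
    """Stage each buffer through main memory, then concatenate."""
    host_copies = [to_host(buf) for buf in device_buffers]
    return np.concatenate(host_copies, axis=0)

chunks = [np.full(2, rank) for rank in range(3)]  # one chunk per "node"
print(gather_via_host(chunks))  # [0 0 1 1 2 2]
```

Staging through host memory costs an extra copy, but it works on clusters whose interconnect cannot address GPU memory directly.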
class tike.communicators.ThreadPool(workers)[source]
    Bases: concurrent.futures.thread.ThreadPoolExecutor

    A Python thread pool plus scatter-gather methods.

    A Pool is a context manager which provides access to and communications amongst workers.

    workers : int or tuple(int)
        The number of GPUs to use, or a tuple of the device numbers of the GPUs to use. If fewer GPUs are available than requested, workers are allocated only for the available GPUs.

    Raises:
        ValueError – When invalid GPU device ids are provided, or when the current CUDA device does not match the first GPU id in the list of workers.
    gather(x: list, worker=None, axis=0) → cupy.array[source]
        Concatenate x on a single worker along the given axis.
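A minimal sketch of the concatenation semantics, with NumPy standing in for CuPy (the two share the `concatenate` API):

```python
import numpy as np

# One 2x2 block per worker, concatenated on the chosen axis.
x = [np.zeros((2, 2)), np.ones((2, 2))]
print(np.concatenate(x, axis=0).shape)  # (4, 2)
print(np.concatenate(x, axis=1).shape)  # (2, 4)
```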
    map(func, *iterables, **kwargs)[source]
        ThreadPoolExecutor.map, but wraps each call in a cuda.Device context.
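The wrapping behavior can be sketched with a stand-in context manager in place of `cupy.cuda.Device` (`fake_device` and `device_map` are illustrative names, not part of tike):

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

@contextmanager
def fake_device(device_id):
    # Stand-in for cupy.cuda.Device: activates a device for this thread.
    yield device_id

def device_map(func, *iterables, workers=2):
    """Like ThreadPoolExecutor.map, but each call runs inside a device context."""
    def wrapped(device_id, *args):
        with fake_device(device_id):
            return func(*args)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        device_ids = range(len(iterables[0]))
        return list(pool.map(wrapped, device_ids, *iterables))

print(device_map(lambda a: a + 1, [10, 20]))  # [11, 21]
```

Activating the device inside the worker thread matters because the current CUDA device is thread-local state; without the context, every thread would operate on the default device.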
    reduce_gpu(x: list, worker=None) → cupy.array[source]
        Reduce x by addition to one GPU from all other GPUs.
    reduce_mean(x: list, axis, worker=None) → cupy.array[source]
        Reduce x by averaging along the given axis to one GPU from all other GPUs.
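A NumPy sketch of both reductions (NumPy stands in for CuPy; in tike the per-worker copies would live on different GPUs):

```python
import numpy as np

x = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]  # one copy per worker

total = sum(x[1:], start=x[0].copy())  # reduce_gpu: elementwise sum on one worker
mean = np.mean(np.stack(x), axis=0)    # reduce_mean: average over the worker axis

print(total)  # [4. 6.]
print(mean)   # [2. 3.]
```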
    shutdown(wait=True)
        Clean up the resources associated with the Executor.

        It is safe to call this method several times. Otherwise, no other methods can be called after this one.

        Parameters:
            wait – If True, then shutdown will not return until all running futures have finished executing and the resources used by the executor have been reclaimed.