communicators
Module for communicators using thread pools and MPI.
This module implements both point-to-point (p2p) and collective communication among multiple GPUs and multiple nodes.
-
class tike.communicators.Comm(gpu_count, mpi=<class 'tike.communicators.mpi.MPIComm'>, pool=<class 'tike.communicators.pool.ThreadPool'>, **kwargs)
Bases: object
A Ptychography communicator.
Compose the multiprocessing and multithreading communicators to handle synchronization and communication among both GPUs and nodes.
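The composition described above can be sketched with standard-library stand-ins. This is a minimal illustration, not tike's implementation: `SingleProcessComm`, `ComposedComm`, and `allreduce_sum` are hypothetical names, and a plain `ThreadPoolExecutor` takes the place of the GPU-aware pool.

```python
from concurrent.futures import ThreadPoolExecutor


class SingleProcessComm:
    """Hypothetical stand-in for a process-level (MPI-like) communicator."""

    rank = 0
    size = 1

    def allreduce_sum(self, value):
        # With a single process, the global sum is just the local value.
        return value


class ComposedComm:
    """Illustrative composition of a thread pool and a process communicator."""

    def __init__(self, worker_count, process_comm=SingleProcessComm,
                 pool=ThreadPoolExecutor):
        self.worker_count = worker_count
        self.process_comm = process_comm()
        self.pool = pool(max_workers=worker_count)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.pool.shutdown(wait=True)

    def allreduce_sum(self, chunks):
        # Stage 1: each thread reduces the chunk owned by one worker.
        partial_sums = list(self.pool.map(sum, chunks))
        # Stage 2: combine worker results, then combine across processes.
        return self.process_comm.allreduce_sum(sum(partial_sums))


with ComposedComm(worker_count=2) as comm:
    total = comm.allreduce_sum([[1, 2, 3], [4, 5, 6]])
```

Passing the two communicator classes as constructor arguments, as the signature above does, lets either layer be swapped out without touching the other.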
-
gpu_count
The number of GPUs to use per process.
- Type
int
-
mpi
The multi-processing communicator.
- Type
class
-
pool
The multi-threading communicator.
- Type
class
-
-
class tike.communicators.MPIComm
Bases: object
A Python wrapper class for MPI.
Many clusters do not support inter-node GPU-to-GPU communication, so the data is first gathered into main memory and then communicated.
-
rank
The identity of this process.
- Type
int
-
size
The total number of MPI processes.
- Type
int
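A common use of `rank` and `size` is to decide which share of the work each process handles. The sketch below shows one generic round-robin scheme; `items_for_rank` is a hypothetical helper, not part of tike, and the values of `rank` and `size` are passed in by hand instead of being read from a real MPI communicator.

```python
def items_for_rank(items, rank, size):
    """Return the round-robin share of `items` owned by process `rank`."""
    if not 0 <= rank < size:
        raise ValueError(f"rank {rank} is outside the range [0, {size})")
    # Process r takes items r, r + size, r + 2 * size, ...
    return items[rank::size]


scan_positions = list(range(10))
# Simulate what each of three processes would select.
shares = [items_for_rank(scan_positions, rank, 3) for rank in range(3)]
```

Every item lands in exactly one share, so the processes cover the whole list without overlap.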
-
-
class tike.communicators.ThreadPool(workers)
Bases: concurrent.futures.thread.ThreadPoolExecutor
Python thread pool plus scatter gather methods.
A Pool is a context manager which provides access to and communications amongst workers.
-
workers
The number of GPUs to use or a tuple of the device numbers of the GPUs to use. If fewer GPUs are available than requested, only workers for the available GPUs are allocated.
- Type
int, tuple(int)
- Raises
ValueError – When invalid GPU device ids are provided, or when the current CUDA device does not match the first GPU id in the list of workers.
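The rules above for `workers` can be sketched as a small validation routine. This is an assumption-laden illustration, not tike's code: `resolve_workers` is a hypothetical helper, and the number of available devices and the current device id are passed in as plain integers instead of being queried from CUDA.

```python
def resolve_workers(workers, available, current_device=0):
    """Turn a worker count or a tuple of device ids into validated ids."""
    if isinstance(workers, int):
        if workers < 1:
            raise ValueError(f"Provide workers > 0, not {workers}.")
        # Requesting more GPUs than exist only yields the available ones.
        ids = tuple(range(min(workers, available)))
    else:
        ids = tuple(workers)
        for device_id in ids:
            if not 0 <= device_id < available:
                raise ValueError(f"{device_id} is not a valid device id.")
    if not ids or ids[0] != current_device:
        raise ValueError("The current device must be the first worker id.")
    return ids


clamped = resolve_workers(4, available=2)
explicit = resolve_workers((1, 0), available=2, current_device=1)
```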
-
gather(x: list, worker=None, axis=0) → cupy.array
Concatenate x on a single worker along the given axis.
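To show what concatenating along an axis means, the sketch below uses nested Python lists in place of `cupy` arrays and ignores device placement entirely. `gather_blocks` is a hypothetical helper that only handles two-dimensional blocks.

```python
def gather_blocks(blocks, axis=0):
    """Concatenate 2-D nested lists along axis 0 (rows) or axis 1 (columns)."""
    if axis == 0:
        # Stack the rows of every block one after another.
        return [list(row) for block in blocks for row in block]
    if axis == 1:
        # Join matching rows of every block side by side.
        return [[v for row in rows for v in row] for rows in zip(*blocks)]
    raise ValueError("this sketch only handles axis 0 or 1")


per_worker = [
    [[1, 2], [3, 4]],  # block held by worker 0
    [[5, 6], [7, 8]],  # block held by worker 1
]
stacked_rows = gather_blocks(per_worker, axis=0)
stacked_cols = gather_blocks(per_worker, axis=1)
```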
-
map(func, *iterables, **kwargs)
ThreadPoolExecutor.map, but wraps call in a cuda.Device context.
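One way such a wrapper could look is sketched below with the standard library only. `fake_device` is a hypothetical stand-in for a CUDA device context that records the active device in thread-local storage, and `DevicePool` is an illustrative subclass, not tike's `ThreadPool`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

_active = threading.local()


@contextmanager
def fake_device(device_id):
    """Hypothetical stand-in for entering a CUDA device context."""
    previous = getattr(_active, "device", None)
    _active.device = device_id
    try:
        yield
    finally:
        _active.device = previous


class DevicePool(ThreadPoolExecutor):
    """Thread pool whose map runs each call inside a device context."""

    def __init__(self, workers):
        self.workers = tuple(workers)
        super().__init__(max_workers=len(self.workers))

    def map(self, func, *iterables, **kwargs):
        def call_on_device(worker, *args):
            # Pair each call with one worker id and enter its context.
            with fake_device(worker):
                return func(*args, **kwargs)

        return list(super().map(call_on_device, self.workers, *iterables))


def scale(x, factor=1):
    # Report which (fake) device was active while this call ran.
    return (_active.device, x * factor)


with DevicePool((0, 1)) as pool:
    results = pool.map(scale, [10, 20], factor=3)
```

In this sketch, keyword arguments are forwarded to `func` and not to the executor, and the worker ids are zipped with the iterables so that each call runs under exactly one device.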
-
reduce_gpu(x: list, worker=None) → cupy.array
Reduce x by addition to one GPU from all other GPUs.
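Reduction by addition can be pictured as an element-wise sum accumulated on one chosen worker. The sketch below uses flat Python lists in place of `cupy` arrays and omits the device-to-device copies; `reduce_to_worker` is a hypothetical name.

```python
def reduce_to_worker(parts, worker=0):
    """Sum equal-length lists element-wise onto the list owned by `worker`."""
    total = list(parts[worker])
    for index, part in enumerate(parts):
        if index == worker:
            continue
        # On real hardware, `part` would first be copied to the target GPU.
        total = [a + b for a, b in zip(total, part)]
    return total


per_worker = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = reduce_to_worker(per_worker, worker=1)
```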
-
reduce_mean(x: list, axis, worker=None) → cupy.array
Reduce x by taking the mean along the given axis, collecting the result on one GPU.
-
shutdown(wait=True)
Clean up the resources associated with the Executor.
It is safe to call this method several times. Otherwise, no other methods can be called after this one.
- Parameters
wait – If True then shutdown will not return until all running futures have finished executing and the resources used by the executor have been reclaimed.
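The shutdown behavior is inherited from `concurrent.futures.ThreadPoolExecutor`, so it can be demonstrated with the standard-library class alone; no GPU or tike import is involved in this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)
future = pool.submit(pow, 2, 10)

# wait=True blocks until running futures finish and resources are freed.
pool.shutdown(wait=True)
# Calling shutdown again is harmless.
pool.shutdown(wait=True)

# Results of work submitted before shutdown remain available.
result = future.result()

# New work is rejected once the executor has been shut down.
try:
    pool.submit(pow, 2, 3)
    rejected = False
except RuntimeError:
    rejected = True
```

Using the pool as a context manager (`with ... as pool:`) calls `shutdown(wait=True)` automatically on exit.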
-