Worker
Dask-CUDA workers extend the standard Dask worker in two ways:
Advanced networking configuration
GPU Memory Pool configuration
These options can be set for a single-machine cluster through LocalCUDACluster,
or passed to workers on the command line through dask-cuda-worker.
Single Cluster configuration
Dask-CUDA can be configured for single-machine clusters with multiple GPUs, such as a DGX-1 or DGX-2. Below is an example of configuring a single-machine Dask cluster on a DGX-2 with an RMM pool and NVLink enabled:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask_cuda.initialize import initialize
# Configurations
protocol = "ucx"
interface = "enp6s0" # DGX-2
enable_tcp_over_ucx = True
enable_nvlink = True
enable_infiniband = False
initialize(
    create_cuda_context=True,
    enable_tcp_over_ucx=enable_tcp_over_ucx,
    enable_infiniband=enable_infiniband,
    enable_nvlink=enable_nvlink,
)
cluster = LocalCUDACluster(
    local_directory="/tmp/USERNAME",
    protocol=protocol,
    interface=interface,
    enable_tcp_over_ucx=enable_tcp_over_ucx,
    enable_infiniband=enable_infiniband,
    enable_nvlink=enable_nvlink,
    rmm_pool_size="25GB",
)
client = Client(cluster)
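The rmm_pool_size value applies to each worker, that is, to each GPU, so the memory reserved across the machine is the per-worker size multiplied by the number of GPUs. The following is a minimal sketch of that arithmetic, assuming a DGX-2 with 16 GPUs and decimal units (1 GB = 10^9 bytes). The helpers `parse_size` and `total_pool_bytes` are hypothetical names introduced for illustration and are not part of Dask-CUDA.

```python
# Hypothetical helpers for illustration only; not part of Dask-CUDA.
# Assumes decimal units, i.e. 1 GB = 10**9 bytes.
_UNITS = {
    "B": 1,
    "K": 10**3, "KB": 10**3,
    "M": 10**6, "MB": 10**6,
    "G": 10**9, "GB": 10**9,
    "T": 10**12, "TB": 10**12,
}


def parse_size(text):
    """Convert a size string such as '25GB' or '5000M' to bytes."""
    text = text.strip().upper()
    # Try the longest suffixes first so that 'GB' is not read as 'B'.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * _UNITS[suffix])
    return int(text)  # a bare number is taken as a byte count


def total_pool_bytes(pool_size, n_gpus):
    """Memory reserved by RMM pools across all workers (one per GPU)."""
    return parse_size(pool_size) * n_gpus


per_gpu = parse_size("25GB")
total = total_pool_bytes("25GB", n_gpus=16)  # assumption: 16 GPUs (DGX-2)
print(f"per GPU: {per_gpu} bytes, whole machine: {total} bytes")
```

With these assumptions, a 25GB pool per worker reserves 400GB across the machine, which is worth comparing against the total device memory available before choosing a pool size.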
Command Line Tool
New configuration options:
  --interface TEXT                The external interface used to connect to
                                  the scheduler, usually an ethernet interface
                                  is used for connection, and not an
                                  InfiniBand interface (if one is available).
  --device-memory-limit TEXT      Bytes of memory per CUDA device that the
                                  worker can use. This can be an integer
                                  (bytes), float (fraction of total device
                                  memory), string (like 5GB or 5000M), 'auto',
                                  or zero for no memory management (i.e.,
                                  allow full device memory usage).
  --rmm-pool-size TEXT            If specified, initialize each worker with an
                                  RMM pool of the given size, otherwise no RMM
                                  pool is created. This can be an integer
                                  (bytes) or string (like 5GB or 5000M).
                                  NOTE: This size is a per worker (i.e., per
                                  GPU) configuration, and not cluster-wide!
  --enable-tcp-over-ucx / --disable-tcp-over-ucx
                                  Enable TCP communication over UCX
  --enable-infiniband / --disable-infiniband
                                  Enable InfiniBand communication
  --enable-nvlink / --disable-nvlink
                                  Enable NVLink communication
  --net-devices TEXT              When None (default), 'UCX_NET_DEVICES' will
                                  be left to its default. Otherwise, it must
                                  be a non-empty string with the interface
                                  name. Normally used only with
                                  --enable-infiniband to specify the interface
                                  to be used by the worker, such as 'mlx5_0:1'
                                  or 'ib0'.
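To show how the options above combine, the following sketch assembles a dask-cuda-worker command line that mirrors the LocalCUDACluster example (NVLink and TCP over UCX enabled, a 25GB RMM pool per GPU). It only builds and prints the command, it does not start a worker. The helper `build_worker_command` and the scheduler address `ucx://10.0.0.1:8786` are hypothetical and used purely for illustration; only flags listed above are emitted.

```python
import shlex


def build_worker_command(
    scheduler,
    interface=None,
    device_memory_limit=None,
    rmm_pool_size=None,
    net_devices=None,
    enable_tcp_over_ucx=False,
    enable_infiniband=False,
    enable_nvlink=False,
):
    """Return an argument list for dask-cuda-worker (hypothetical helper)."""
    cmd = ["dask-cuda-worker", scheduler]

    # Options that take a value are added only when one was given.
    valued = [
        ("--interface", interface),
        ("--device-memory-limit", device_memory_limit),
        ("--rmm-pool-size", rmm_pool_size),
        ("--net-devices", net_devices),
    ]
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]

    # Boolean switches are added only when enabled.
    switches = [
        ("--enable-tcp-over-ucx", enable_tcp_over_ucx),
        ("--enable-infiniband", enable_infiniband),
        ("--enable-nvlink", enable_nvlink),
    ]
    cmd += [flag for flag, enabled in switches if enabled]
    return cmd


cmd = build_worker_command(
    "ucx://10.0.0.1:8786",  # hypothetical scheduler address
    interface="enp6s0",
    rmm_pool_size="25GB",  # per worker (per GPU), not cluster-wide
    enable_tcp_over_ucx=True,
    enable_nvlink=True,
)
print(shlex.join(cmd))
```

Building the arguments as a list, rather than as one string, lets the same value be passed directly to a process launcher without shell quoting issues.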