Note that the features described below are currently experimental.
A container image is available for the Multi-Process Service (MPS) control daemon.
Only Volta MPS is supported.
More information on Volta MPS can be found in the Volta architecture whitepaper:
Volta provides very high throughput and low latency for deep learning inference, particularly when there is a batching system in place to aggregate images to submit to the GPU simultaneously to maximize performance. Without such a batching system, individual inference jobs do not fully utilize execution resources of a GPU. Volta MPS provides an easy option to improve throughput while satisfying latency targets, by permitting many individual inference jobs to be submitted concurrently to the GPU and improving overall GPU utilization.
- An NVIDIA GPU with architecture >= Volta (compute capability 7.0).
- A supported version of Docker.
- The NVIDIA Container Runtime for Docker.
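To check the first prerequisite, the minimal sketch below tests whether a compute capability string is at least 7.0 (Volta). The function name `is_volta_or_newer` is hypothetical, and the input is assumed to be in `major.minor` form (for example `7.0` or `8.6`).

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the compute capability passed as $1
# (assumed "major.minor" form, e.g. "7.0") is Volta (7.0) or newer.
is_volta_or_newer() {
  major=${1%%.*}          # keep only the text before the first dot
  [ "$major" -ge 7 ]      # any 7.x or higher satisfies ">= 7.0"
}
```

On recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints one such value per GPU, which could be fed to this function; older drivers may not support the `compute_cap` query field.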
If you are using Docker Compose, you need a version of Docker Compose that supports the Compose file format version 2.3.
This file format might further restrict the version of the Docker Engine you need.
A docker-compose.yml file is provided in the sample repository on GitLab:
https://gitlab.com/nvidia/samples/tree/master/mps
$ git clone https://gitlab.com/nvidia/samples.git /tmp/samples
$ cd /tmp/samples/mps
$ export NVIDIA_VISIBLE_DEVICES=0
$ export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=33
$ docker-compose up
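Before running `docker-compose up`, it can help to validate the two variables exported above. The sketch below is a hypothetical pre-flight check (`check_mps_env` is not part of the sample repository). It assumes `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` is a whole number between 1 and 100, which may be stricter than what MPS itself accepts.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the variables exported before
# `docker-compose up`. Returns non-zero and prints a message on failure.
check_mps_env() {
  # A GPU must be selected for the MPS daemon and its clients.
  if [ -z "${NVIDIA_VISIBLE_DEVICES:-}" ]; then
    echo "NVIDIA_VISIBLE_DEVICES is not set" >&2
    return 1
  fi
  pct=${CUDA_MPS_ACTIVE_THREAD_PERCENTAGE:-}
  # Assumption: only whole-number percentages are accepted by this sketch.
  case $pct in
    ''|*[!0-9]*)
      echo "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE must be an integer" >&2
      return 1
      ;;
  esac
  # A percentage of the GPU's threads only makes sense in 1..100.
  if [ "$pct" -lt 1 ] || [ "$pct" -gt 100 ]; then
    echo "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE must be between 1 and 100" >&2
    return 1
  fi
}
```

With the values above (`0` and `33`), the check passes, and one could chain it as `check_mps_env && docker-compose up`.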
Note: If you want the CUDA sample (here, nbody) to run on multiple GPUs, you need to edit the command-line arguments passed to the nbody executable, e.g.:
$ cat cuda-samples/Dockerfile
FROM nvidia/cuda:9.0-base-ubuntu16.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        cuda-samples-$CUDA_PKG_VERSION && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /usr/local/cuda/samples/5_Simulations/nbody
RUN make -j"$(nproc)"
# Edit the numdevices option so that it can run on multiple devices
CMD ["./nbody", "-benchmark", "-i=10000", "-numdevices=8"]
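Rather than hard-coding `8`, the value for `-numdevices` could be derived from the GPUs exposed to the container. The sketch below is a hypothetical helper, `count_visible_devices`, that counts the entries of a comma-separated `NVIDIA_VISIBLE_DEVICES` list. It assumes the list names GPUs explicitly; the special value `all` cannot be resolved without querying the host, so the sketch returns an error for it.

```shell
#!/bin/sh
# Hypothetical helper: print how many GPUs a comma-separated
# NVIDIA_VISIBLE_DEVICES value names (e.g. "0,1,2" -> 3).
count_visible_devices() {
  list=$1
  case $list in
    all) return 1 ;;      # assumption: "all" is not resolved by this sketch
  esac
  count=0
  old_ifs=$IFS
  IFS=,                   # split the list on commas only
  for dev in $list; do
    if [ -n "$dev" ]; then
      count=$((count + 1))
    fi
  done
  IFS=$old_ifs
  echo "$count"
}
```

Note that the exec form of `CMD` shown above does not perform variable substitution, so a computed count would have to be passed when the command is run (for example by overriding the command) rather than through an environment variable inside `CMD [...]`.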
To learn more about the implementation details of containerizing MPS, you can look at the comments in the docker-compose.yml file.
The following diagram summarizes the flow and the interactions for Docker Compose:
