Containers

Learn to deploy Docker containers on TensorDock in under 5 minutes

Introduction

TensorDock Marketplace allows you to easily deploy Docker containers on cloud GPUs. Features include:
  • Load distribution: Container replicas are deployed on multiple hostnodes so that requests to containers can be processed in parallel. This reduces the workload on each machine and allows you to make more requests to the container per minute.
  • Scalability: You can change the number of deployed replicas at any time. Our system will automatically scale the number of replicas depending on GPU usage, so you don't have to worry about handling this logic if you don't want to.
You will need to set an SSH public key in your organization settings before deploying a container.
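If you don't already have a key pair, here is a minimal sketch of generating one with OpenSSH; the ed25519 key type and default file path are just common choices:

# generate a new SSH key pair, accepting the default path (~/.ssh/id_ed25519)
ssh-keygen -t ed25519
# print the public key so it can be pasted into your organization settings
cat ~/.ssh/id_ed25519.pub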

Example: Deploy a Docker LLM API

In this example, we'll be deploying a vLLM container. Once it's deployed, we will be able to generate endlessly fascinating text by using the container's API.
vLLM has certain hardware requirements that must be met for it to deploy properly. If you are deploying a vLLM container, make sure each replica has at least 2 GPUs and 8 GB of RAM, and ensure that the GPU types you select support quantization.
  1. Configure basic details about your container, such as its name and Docker image source.
  2. Configure how your container should be deployed on TensorDock's hostnodes. Since resource pricing varies by hostnode, you can cap the hourly rate of each replica. You can also select which GPU types your container should be deployed on; your additional choices are used as fallbacks in case your first choice of GPU is out of stock.
  3. Customize the runtime configuration of your Docker container. The configuration below is equivalent to the Docker command:
docker run \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  --env "HUGGING_FACE_HUB_TOKEN=MY_TOKEN" \
  vllm/vllm-openai:latest \
  --model thebloke/llama-2-70b-chat-awq
  4. Deploy your container. It can take roughly 10 minutes for the container to boot, but the duration ultimately depends on the container's size.
  5. Test the deployed API via the following URL:
https://seahorse-app-gt2it.ondigitalocean.app/{YOUR_CONTAINER_ID}/{API_PATH}
The path is forwarded to the container's API, so we can test a completion by sending a request to the /v1/completions route. The request URL looks like this:
https://seahorse-app-gt2it.ondigitalocean.app/df1f50cf-cfc7-4da0-89f1-80c460c49afe/v1/completions
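As a quick sanity check, you can send a request to that URL with curl. The sketch below assumes the container is running the vLLM OpenAI-compatible server configured above, so the request body follows the standard /v1/completions schema; the prompt and sampling parameters are illustrative only:

# request a short completion from the deployed model (substitute your own container ID)
curl https://seahorse-app-gt2it.ondigitalocean.app/df1f50cf-cfc7-4da0-89f1-80c460c49afe/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "thebloke/llama-2-70b-chat-awq",
    "prompt": "San Francisco is a",
    "max_tokens": 64,
    "temperature": 0.7
  }'

A successful response is a JSON object whose choices[0].text field contains the generated text. If you get an error instead, the model may still be loading; the /v1/models route can be used to confirm that the server is ready.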