(self-managed-http-load-test)=
You need a working nvcf-cli configured against your self-managed cluster.
If you have not set this up yet, follow the {ref}self-hosted-cli guide to
install the binary and the {ref}cli-configuration section to point it at your
gateway.
Verify the CLI can reach the cluster before continuing:
./nvcf-cli init
./nvcf-cli api-key generateUse the load_tester_supreme container for load testing. It is purpose-built for high-throughput benchmarking and includes:
- gRPC + HTTP + SSE endpoints in a single image
- Tunable
repeats,delay, andsizefields to shape request/response profiles - Built-in OpenTelemetry tracing
The source, build instructions, and registry push examples are in the nv-cloud-function-helpers repository. Build and push the image to whichever container registry your cluster has credentials for:
git clone https://github.com/NVIDIA/nv-cloud-function-helpers.git
cd nv-cloud-function-helpers/examples/function_samples/load_tester_supreme
# Build
docker build --platform linux/amd64 -t load_tester_supreme .
# Tag and push (replace with your registry -- NGC, ECR, etc.)
docker tag load_tester_supreme nvcr.io/<your-org>/load_tester_supreme:latest
docker push nvcr.io/<your-org>/load_tester_supreme:latest:::{tip}
To check which registries your cluster recognises, run
./nvcf-cli registry list.
:::
Then create the function and deploy it using the CLI:
# Create the function (HTTP)
./nvcf-cli function create \
--name "load-tester-supreme" \
--image "nvcr.io/<your-org>/load_tester_supreme:latest" \
--inference-url "/echo" \
--inference-port 8000 \
--health-uri "/health" \
--health-port 8000 \
--health-timeout PT30S
# Deploy (adjust GPU type and instance type for your cluster)
./nvcf-cli function deploy create \
--gpu H100 \
--instance-type NCP.GPU.H100_1x \
--min-instances 1 \
--max-instances 1 \
--function-id <function id> \
--version-id <version id>
# Generate an API key for invocations
./nvcf-cli api-key generateOnce deployed, note the following -- you will need them for the run script:
- Function ID -- the UUID returned by
function create - Function Version ID -- the UUID of the specific deployed version
- API key -- from
./nvcf-cli api-key generate(begins withnvapi-)
Your gateway address is the external address of the Envoy Gateway deployed with the control plane. To retrieve it:
export GATEWAY_ADDR=$(kubectl get gateway nvcf-gateway -n envoy-gateway \
-o jsonpath='{.status.addresses[0].value}')
echo "Gateway Address: $GATEWAY_ADDR"On AWS EKS this is an ELB hostname (e.g.
a1b2c3d4.us-east-1.elb.amazonaws.com). For a local deployment (Kind,
k3d, Docker Desktop) it is typically localhost or 127.0.0.1.
git clone https://github.com/NVIDIA/nv-cloud-function-helpers.git
cd nv-cloud-function-helpers/examples/load-testsInstall k6 if you don't have it:
# macOS
brew install k6# Linux (Debian/Ubuntu)
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
--keyserver hkp://keyserver.ubuntu.com:80 \
--recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
| sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update && sudo apt-get install k6The run*.sh scripts are gitignored, so each user creates their own locally.
Create run_http_self_managed_test.sh in the load-tests directory:
#!/bin/bash
set -e
export GATEWAY_ADDR=<your-gateway-address>
export TOKEN=<your-nvapi-key>
export FUNCTION_ID=<your-function-id>
# This local/self-managed example uses `http://` because it does not assume TLS
# termination. Use `https://` if your gateway terminates TLS.
# `/echo` matches the inference path configured for the load_tester_supreme
# example. If your function uses a different inference path, replace `/echo`
# with that path in `HTTP_SUPREME_NVCF_URL` and the curl example below.
export HTTP_SUPREME_NVCF_URL="http://${FUNCTION_ID}.invocation.${GATEWAY_ADDR}/echo"
export SENT_MESSAGE_SIZE=32
export RESPONSE_COUNT=1
k6 run functions/supreme_http_test.js \
--vus 10 --duration 60s \
-e TOKEN=${TOKEN} \
-e HTTP_SUPREME_NVCF_URL=${HTTP_SUPREME_NVCF_URL} \
-e SENT_MESSAGE_SIZE=${SENT_MESSAGE_SIZE} \
-e RESPONSE_COUNT=${RESPONSE_COUNT}Make it executable and run:
chmod +x run_http_self_managed_test.sh
./run_http_self_managed_test.shEach VU simulates a single concurrent HTTP client, sending requests in a loop and holding the invocation connection open while waiting for a response. The number of VUs directly controls the concurrency hitting your endpoint.
| VUs | Simulates |
|---|---|
| 1--5 | Smoke test -- verify the endpoint works under minimal load |
| 10--50 | Light load -- a small team or service calling the function |
| 100--500 | Moderate load -- multiple services or a rollout with real traffic |
| 1000+ | Stress test -- find the breaking point or max throughput |
Fixed VUs for a set duration (simplest approach):
# 10 concurrent users for 1 minute
k6 run functions/supreme_http_test.js --vus 10 --duration 60s ...
# 200 concurrent users for 10 minutes
k6 run functions/supreme_http_test.js --vus 200 --duration 10m ...Ramping VUs with a config file (recommended for real load tests):
Example k6_rampup_config.json:
{
"cloud": {
"projectID": 3695020
},
"scenarios": {
"rampup_scenario": {
"executor": "ramping-vus",
"startVUs": 0,
"gracefulRampDown": "30s",
"gracefulStop": "30s",
"stages": [
{ "duration": "1m", "target": 5 },
{ "duration": "2m", "target": 5 },
{ "duration": "1m", "target": 25 },
{ "duration": "2m", "target": 25 },
{ "duration": "1m", "target": 100 },
{ "duration": "2m", "target": 100 },
{ "duration": "1m", "target": 500 },
{ "duration": "2m", "target": 500 },
{ "duration": "1m", "target": 1000 },
{ "duration": "2m", "target": 1000 },
{ "duration": "1m", "target": 0 }
]
}
}
}k6 run functions/supreme_http_test.js \
--config k6_rampup_config.json \
-e TOKEN=${TOKEN} \
-e HTTP_SUPREME_NVCF_URL=${HTTP_SUPREME_NVCF_URL} \
-e SENT_MESSAGE_SIZE=${SENT_MESSAGE_SIZE} \
-e RESPONSE_COUNT=${RESPONSE_COUNT}| Variable | Purpose |
|---|---|
TOKEN |
Your nvapi-* bearer token from ./nvcf-cli api-key generate |
FUNCTION_ID |
Function ID for invocation routing |
HTTP_SUPREME_NVCF_URL |
Invocation URL for the function inference path: http://<function-id>.invocation.<gateway-addr>/echo |
SENT_MESSAGE_SIZE |
Size of the test payload in bytes |
RESPONSE_COUNT |
Number of responses the server should return |
Then verify the endpoint works with curl:
curl -v -X POST \
"${HTTP_SUPREME_NVCF_URL}" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-H "Nvcf-Poll-Seconds: 5" \
-d '{"message": "hello", "repeats": 1}'You should receive a 200 OK response with the Nvcf-Status: fulfilled
header.