This document describes the development guidelines for contributing to the ML pipeline project. Please check the main page for instructions on how to deploy an ML pipeline system.
The pipeline system is included in Kubeflow. See the Getting Started guide for how to deploy it with Kubeflow.
To use GKE, the Docker images need to be uploaded to a public Docker repository, such as Google Container Registry (GCR).
To build the API server image and upload it to GCR on x86_64 (amd64) machines:

```
# Run in the repository root directory
$ docker build -t gcr.io/<your-gcp-project>/api-server:latest -f backend/Dockerfile .
# Push to GCR
$ gcloud auth configure-docker
$ docker push gcr.io/<your-gcp-project>/api-server:latest
```

To build the API server image and upload it to GCR on non-x86_64 machines (such as aarch64 machines):

```
# Run in the repository root directory
$ docker build --platform linux/amd64 -t gcr.io/<your-gcp-project>/api-server:latest -f backend/Dockerfile .
# Push to GCR
$ gcloud auth configure-docker
$ docker push gcr.io/<your-gcp-project>/api-server:latest
```

Building on Apple Silicon: when building on Apple Silicon (M1/M2/M3), QEMU emulation causes issues with FIPS builds. Use `FIPS_ENABLED=0` to disable FIPS.

```
# Build all amd64 images
FIPS_ENABLED=0 make -C backend image_all
# Build arm64 images for local Kind testing
FIPS_ENABLED=0 TARGETARCH=arm64 make -C backend image_all
```

Build options:
- `FIPS_ENABLED=0`: disables FIPS (required for Apple Silicon)
- `FIPS_ENABLED=1` (default): FIPS-compliant build for production
- `TARGETARCH=arm64|amd64`: target architecture (default: amd64)

Production images are built on amd64 CI runners with FIPS enabled.
To build the scheduled workflow controller image and upload it to GCR:
```
# Run in the repository root directory
$ docker build -t gcr.io/<your-gcp-project>/scheduledworkflow:latest -f backend/Dockerfile.scheduledworkflow .
# Push to GCR
$ gcloud auth configure-docker
$ docker push gcr.io/<your-gcp-project>/scheduledworkflow:latest
```

To build the viewer CRD controller image and upload it to GCR:
```
# Run in the repository root directory
$ docker build -t gcr.io/<your-gcp-project>/viewer-crd-controller:latest -f backend/Dockerfile.viewercontroller .
# Push to GCR
$ gcloud auth configure-docker
$ docker push gcr.io/<your-gcp-project>/viewer-crd-controller:latest
```

To build the persistence agent image and upload it to GCR:
```
# Run in the repository root directory
$ docker build -t gcr.io/<your-gcp-project>/persistenceagent:latest -f backend/Dockerfile.persistenceagent .
# Push to GCR
$ gcloud auth configure-docker
$ docker push gcr.io/<your-gcp-project>/persistenceagent:latest
```

To build the visualization server image and upload it to GCR:
```
# Run in the repository root directory
$ docker build -t gcr.io/<your-gcp-project>/visualization:latest -f backend/Dockerfile.visualization .
# Push to GCR
$ gcloud auth configure-docker
$ docker push gcr.io/<your-gcp-project>/visualization:latest
```

To build the frontend image and upload it to GCR:
```
# Run in the repository root directory
$ docker build -t gcr.io/<your-gcp-project>/frontend:latest -f frontend/Dockerfile .
# Push to GCR
$ gcloud auth configure-docker
$ docker push gcr.io/<your-gcp-project>/frontend:latest
```

Minikube can use your local Docker images, so you don't need to upload them to a remote repository.
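When you do push to GCR, note that every image above follows the same two-command build/push pattern, so the repetition can be scripted. A minimal sketch, where `PROJECT`, `DRY_RUN`, and the `run` helper are our own illustrative names (not part of the repository); `DRY_RUN=1` only prints the commands so you can review them before running for real:

```shell
#!/usr/bin/env bash
# Map image name -> Dockerfile, mirroring the sections above.
declare -A IMAGES=(
  [api-server]=backend/Dockerfile
  [scheduledworkflow]=backend/Dockerfile.scheduledworkflow
  [viewer-crd-controller]=backend/Dockerfile.viewercontroller
  [persistenceagent]=backend/Dockerfile.persistenceagent
  [visualization]=backend/Dockerfile.visualization
  [frontend]=frontend/Dockerfile
)

PROJECT="${PROJECT:-my-gcp-project}"   # replace with your GCP project
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands only; 0 = execute

run() {
  # Echo the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

for name in "${!IMAGES[@]}"; do
  run docker build -t "gcr.io/${PROJECT}/${name}:latest" -f "${IMAGES[$name]}" .
  run docker push "gcr.io/${PROJECT}/${name}:latest"
done
```

Run once with `DRY_RUN=1` to inspect the generated commands, then with `DRY_RUN=0 PROJECT=<your-gcp-project>` (after `gcloud auth configure-docker`) to build and push for real.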
For example, to build the API server image:

```
$ docker build -t ml-pipeline-api-server -f backend/Dockerfile .
```

Python-based visualizations are a new method to visualize results within the Kubeflow Pipelines UI. For more information about Python-based visualizations, please visit the documentation page. To create predefined visualizations, please check the developer guide.
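On the Minikube point above: for the cluster to pick up a locally built image, the `docker` CLI must target Minikube's Docker daemon. A minimal sketch, assuming a running Minikube cluster; the helper name `use_minikube_docker` is ours, and `minikube docker-env` is the real command that prints the needed `export` lines:

```shell
use_minikube_docker() {
  # `minikube docker-env` emits `export DOCKER_HOST=...` (and related) lines
  # for the cluster's Docker daemon; eval applies them to the current shell.
  eval "$(minikube docker-env)"
}

# Usage, from the repository root:
#   use_minikube_docker
#   docker build -t ml-pipeline-api-server -f backend/Dockerfile .
# The image is now in the cluster's image cache; reference it in a pod spec
# with an imagePullPolicy that does not force a remote pull (e.g. Never).
```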
To run unit tests for the API server:

```
cd backend/src/ && go test ./...
```

TODO: add instruction
```
pip install ./dsl/ --upgrade && python ./dsl/tests/main.py
pip install ./dsl-compiler/ --upgrade && python ./dsl-compiler/tests/main.py
```

Check this page for more details.
Q: How do I access the database directly?
You can inspect the MySQL database directly by running:
```
kubectl run -it --rm --image=docker.io/library/mysql:8.4 --restart=Never mysql-client -- mysql -h mysql
mysql> use mlpipeline;
mysql> select * from jobs;
```

Q: How do I inspect the object store directly?
Minio provides its own UI to inspect the object store directly:
```
kubectl port-forward -n ${NAMESPACE} $(kubectl get pods -l app=minio -o jsonpath='{.items[0].metadata.name}' -n ${NAMESPACE}) 9000:9000
```

Access Key: `minio`
Secret Key: `minio123`

Q: I see an error about exceeding the GitHub rate limit when deploying the system. What can I do?
See the Ksonnet troubleshooting page.
Q: How do I check my API server log?
API server logs are located in the /tmp directory of the pod. To open a shell in the pod, run:
```
kubectl exec -it -n ${NAMESPACE} $(kubectl get pods -l app=ml-pipeline -o jsonpath='{.items[0].metadata.name}' -n ${NAMESPACE}) -- /bin/sh
```

or

```
kubectl logs -n ${NAMESPACE} $(kubectl get pods -l app=ml-pipeline -o jsonpath='{.items[0].metadata.name}' -n ${NAMESPACE})
```

Q: How do I check my cluster status if I am using Minikube?
Minikube provides a dashboard for the deployment:

```
$ minikube dashboard
```