Skip to content

Conversation

@qweeah
Copy link
Contributor

@qweeah qweeah commented Dec 24, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR adds support for service-account-based image pull authentication (aka Identity Binding) from Azure Container Registry (ACR), implementing KEP-4412 projected service account tokens for kubelet image credential providers.

Changes

Data Model

Added ServiceAccountImagePullProfile to SecurityProfile with fields:

  • Enabled: Enable/disable service account-based image pull
  • DefaultClientID: Default managed identity client ID
  • DefaultTenantID: Default managed identity tenant ID
  • LocalAuthoritySNI: SNI endpoint for Identity Bindings Local Authority

Added getter methods to SecurityProfile for null-safe access.

Implementation Paths

1. Legacy CSE (Template-based)

  • pkg/agent/variables.go: Template variables for CSE scripts (using the naming convention serviceAccountImagePull...)
  • parts/linux/cloud-init/artifacts/cse_cmd.sh: Environment variable declarations (e.g., SERVICE_ACCOUNT_IMAGE_PULL_ENABLED, SERVICE_ACCOUNT_IMAGE_PULL_DEFAULT_CLIENT_ID)
  • parts/linux/cloud-init/artifacts/cse_config.sh: Credential provider config generation

2. AKSNodeConfig (Proto-based)

  • aks-node-controller/proto/aksnodeconfig/v1/security_profile.proto: Proto definitions (ServiceAccountImagePullProfile)
  • aks-node-controller/parser/parser.go: Environment variable generation
  • aks-node-controller/parser/helper.go: Null-safe helper functions

Both paths converge at cse_config.sh to generate /var/lib/kubelet/credential-provider-config.yaml with identity binding arguments (--ib-default-client-id, --ib-default-tenant-id, --ib-sni-name) and token attributes.

Testing

  • Unit Tests: Updated tests across variables_test.go, datamodel/types_test.go, and parser/helper_test.go to reflect the new naming convention.
  • E2E Tests: Verified scenarios (enabled, disabled, network isolated).

All tests validate that the credential provider config file contains the correct identity binding configuration.

Cluster Support

  • Public cloud (Azure Commercial)
  • AKS Custom Cloud
  • Network Isolated (NI/airgap) clusters

Which issue(s) this PR fixes:

Fixes #

Requirements:

  • uses conventional commit messages
  • includes documentation
  • adds unit tests
  • adds e2e tests
  • tested upgrade from previous version
  • commits are GPG signed and Github marks them as verified

Special notes for your reviewer:

Dual implementation approach (legacy CSE + modern AKSNodeConfig) for backward compatibility. Both paths are fully tested and generate identical credential provider configuration. The renaming from ImagePullIdentity to ServiceAccountImagePull ensures consistency with the upstream feature name.

Release note:

AgentBaker now supports service-account-based image pull authentication from Azure Container Registry (ACR) via ServiceAccountImagePullProfile in SecurityProfile, using projected service account tokens (KEP-4412) for authentication.

@qweeah qweeah changed the title feat: Add ImagePullIdentityProfile to SecurityProfile for identity binding-based image pull feat: add ImagePullIdentityProfile for identity binding-based image pull Dec 24, 2025
@qweeah qweeah marked this pull request as ready for review December 29, 2025 00:35
@norshtein
Copy link
Member

This PR's logic LGTM. But I don't quiet understand what is step 2 "RP integration and configuration flow". Agentbaker won't rely on anything from AKS RP, I think you could make all changes in the same PR and add E2E for it? Here is an example PR: #7059 .

@github-actions
Copy link
Contributor

github-actions bot commented Dec 29, 2025

The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJan 13, 2026, 3:40 AM

type SecurityProfile struct {
PrivateEgress *PrivateEgress `json:"privateEgress,omitempty"`
PrivateEgress *PrivateEgress `json:"privateEgress,omitempty"`
ServiceAccountImagePullProfile *ServiceAccountImagePullProfile `json:"serviceAccountImagePullProfile,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very sure, but this profile belongs to ContainerService.Properties, it's not very common(?) to directly edit this struct, we typically append NodeBootstrappingConfiguration for feature implementation. Node sig reviewers may have more insight into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the comment here mentioning that Properties stands for the AKS cluster definition. Therefore, I thought that AKS-related settings ought to be placed in properties?

Signed-off-by: Billy Zha <[email protected]>
Copy link
Member

@norshtein norshtein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Kindly request review from node sig folks.

IB_ARGS=""
if [ "${SERVICE_ACCOUNT_IMAGE_PULL_ENABLED}" = "true" ]; then
IB_TOKEN_ATTRIBUTES="
tokenAttributes:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it possible to remove \n in the line above So it will look like

IB_TOKEN_ATTRIBUTES="tokenAttributes:
      serviceAccountTokenAudience: api://AKSIdentityBinding
...

and put ${IB_TOKEN_ATTRIBUTES} to the next line (line1080)? Seems more readable

optionalServiceAccountAnnotationKeys:
- kubernetes.azure.com/acr-client-id"
IB_ARGS="
- --ib-sni-name=${IDENTITY_BINDINGS_LOCAL_AUTHORITY_SNI}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Remove the \n here?

})
}

func Test_Ubuntu2204Gen2_ImagePullIdentityBinding_Enabled(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some tests for Scriptless? Not necessarily all 4 tests, or 2 most significant tests are also good.
Scriptless is a new way to provision a node. We start to release it to production recently and should be available in all regions by the end of this month.

Check tests_ with suffix _scriptless for reference.

type SecurityProfile struct {
PrivateEgress *PrivateEgress `json:"privateEgress,omitempty"`
PrivateEgress *PrivateEgress `json:"privateEgress,omitempty"`
ServiceAccountImagePullProfile *ServiceAccountImagePullProfile `json:"serviceAccountImagePullProfile,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServiceAccountImagePullProfile doesn't sound like a SecurityProfile to me. Maybe just put it at the same level of SecurityProfile? (Move it one level up). But this is just my opinion. Will ask other teammates for inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

components This pull request updates cached components on Linux or Windows VHDs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants