test: Shutdown Kafka Brokers before ZooKeeper/KRaft Controllers #956

Merged: NickLarsenNZ merged 6 commits into main from test/shutdown-kafka-before-zookeeper on May 8, 2026

NickLarsenNZ (Member) commented Apr 10, 2026

I noticed tests would sometimes fail because deleting the namespace timed out. It turned out that ZooKeeper pods were being shut down before the Kafka brokers, leaving the brokers still trying to communicate with ZooKeeper (or the KRaft controller).
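
The ordering constraint described above can be sketched as a dependency sort: whatever depends on another component is stopped first. This is only an illustration, assuming hypothetical component names (`kafka-broker`, `zookeeper`, `kraft-controller`) and a hypothetical `teardown_order` helper. It is not the mechanism used by the kuttl steps in this PR.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def teardown_order(depends_on):
    """Return components in a safe shutdown order.

    `depends_on` maps each component to the components it needs at runtime.
    Startup order is dependencies first, so teardown is the reverse:
    dependents are stopped before the things they talk to.
    """
    startup = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(startup))


# Hypothetical component names, for illustration only.
zookeeper_cluster = {"kafka-broker": {"zookeeper"}}
kraft_cluster = {"kafka-broker": {"kraft-controller"}}

print(teardown_order(zookeeper_cluster))
print(teardown_order(kraft_cluster))
```

In both cases the broker comes out first, which matches the intent of shutting brokers down before the service they are still trying to reach.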

Tip

This fix was failing for KRaft clusters. After brokers have been scaled down and we attempt to scale down controllers, we see this error:

Warning  BuildConfigMap  2s (x22 over 107s)  kafkacluster.kafka.stackable.tech  failed to build configmap: no Kraft controllers found to build

This might depend on #955

I have reverted cc853b9 and will raise it in a new PR.

Last test results

Note: Only KRaft tests were failing; since cc853b9 has been reverted, those failures are no longer relevant to this PR.

--- FAIL: kuttl/harness/configuration_kafka-latest-3.9.1_openshift-false (312.10s)
--- FAIL: kuttl/harness/logging_kafka-3.9.1_zookeeper-latest-3.9.4_openshift-false (511.70s)
--- FAIL: kuttl/harness/operations-kraft_kafka-kraft-3.9.1_openshift-false (587.05s)
--- FAIL: kuttl/harness/operations-kraft_kafka-kraft-4.1.1_openshift-false (649.95s)
--- FAIL: kuttl/harness/smoke-kraft_kafka-kraft-3.9.1_openshift-false (553.01s)
--- FAIL: kuttl/harness/smoke-kraft_kafka-kraft-4.1.1_openshift-false (759.79s)
--- FAIL: kuttl/harness/upgrade_upgrade_old-3.9.1_upgrade_new-4.1.1_use-client-tls-false_use-client-auth-tls-false_openshift-false (396.50s)
--- FAIL: kuttl/harness/upgrade_upgrade_old-3.9.1_upgrade_new-4.1.1_use-client-tls-false_use-client-auth-tls-true_openshift-false (398.25s)
--- FAIL: kuttl/harness/upgrade_upgrade_old-3.9.1_upgrade_new-4.1.1_use-client-tls-true_use-client-auth-tls-false_openshift-false (395.12s)
--- FAIL: kuttl/harness/upgrade_upgrade_old-3.9.1_upgrade_new-4.1.1_use-client-tls-true_use-client-auth-tls-true_openshift-false (413.45s)
--- PASS: kuttl/harness/cluster-operation_kafka-latest-3.9.1_zookeeper-latest-3.9.4_openshift-false (55.44s)
--- PASS: kuttl/harness/delete-rolegroup_kafka-3.9.1_zookeeper-latest-3.9.4_openshift-false (38.70s)
--- PASS: kuttl/harness/kerberos_kafka-3.9.1_zookeeper-latest-3.9.4_openshift-false_krb5-1.21.1_kerberos-realm-CLUSTER.LOCAL_kerberos-backend-mit_broker-listener-class-cluster-internal_bootstrap-listener-class-cluster-internal (127.73s)
--- PASS: kuttl/harness/kerberos_kafka-3.9.1_zookeeper-latest-3.9.4_openshift-false_krb5-1.21.1_kerberos-realm-CLUSTER.LOCAL_kerberos-backend-mit_broker-listener-class-cluster-internal_bootstrap-listener-class-external-stable (135.15s)
--- PASS: kuttl/harness/kerberos_kafka-3.9.1_zookeeper-latest-3.9.4_openshift-false_krb5-1.21.1_kerberos-realm-CLUSTER.LOCAL_kerberos-backend-mit_broker-listener-class-cluster-internal_bootstrap-listener-class-external-unstable (149.73s)
--- PASS: kuttl/harness/kerberos_kafka-3.9.1_zookeeper-latest-3.9.4_openshift-false_krb5-1.21.1_kerberos-realm-PROD.MYCORP_kerberos-backend-mit_broker-listener-class-cluster-internal_bootstrap-listener-class-cluster-internal (140.33s)
--- PASS: kuttl/harness/kerberos_kafka-3.9.1_zookeeper-latest-3.9.4_openshift-false_krb5-1.21.1_kerberos-realm-PROD.MYCORP_kerberos-backend-mit_broker-listener-class-cluster-internal_bootstrap-listener-class-external-stable (174.00s)
--- PASS: kuttl/harness/kerberos_kafka-3.9.1_zookeeper-latest-3.9.4_openshift-false_krb5-1.21.1_kerberos-realm-PROD.MYCORP_kerberos-backend-mit_broker-listener-class-cluster-internal_bootstrap-listener-class-external-unstable (144.87s)
--- PASS: kuttl/harness/opa_kafka-latest-3.9.1_zookeeper-latest-3.9.4_opa-latest-1.12.3_use-opa-tls-false_openshift-false_krb5-1.21.1 (124.94s)
--- PASS: kuttl/harness/opa_kafka-latest-3.9.1_zookeeper-latest-3.9.4_opa-latest-1.12.3_use-opa-tls-true_openshift-false_krb5-1.21.1 (146.65s)
--- PASS: kuttl/harness/smoke_kafka-3.9.1_zookeeper-3.9.4_use-client-tls-false_openshift-false (79.51s)
--- PASS: kuttl/harness/smoke_kafka-3.9.1_zookeeper-3.9.4_use-client-tls-true_openshift-false (89.54s)
--- PASS: kuttl/harness/tls_kafka-3.9.1_zookeeper-latest-3.9.4_use-client-tls-false_use-client-auth-tls-false_openshift-false (46.43s)
--- PASS: kuttl/harness/tls_kafka-3.9.1_zookeeper-latest-3.9.4_use-client-tls-false_use-client-auth-tls-true_openshift-false (196.36s)
--- PASS: kuttl/harness/tls_kafka-3.9.1_zookeeper-latest-3.9.4_use-client-tls-true_use-client-auth-tls-false_openshift-false (228.95s)
--- PASS: kuttl/harness/tls_kafka-3.9.1_zookeeper-latest-3.9.4_use-client-tls-true_use-client-auth-tls-true_openshift-false (192.11s)
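
A long result list like the one above is easier to read as a tally. The following is a minimal sketch, assuming each result line has the shape `--- STATUS: name (seconds)` as shown above. The names `RESULT_LINE` and `summarize` are hypothetical and not part of kuttl.

```python
import re
from collections import Counter

# Matches lines such as:
# --- FAIL: kuttl/harness/smoke-kraft_kafka-kraft-3.9.1_openshift-false (553.01s)
RESULT_LINE = re.compile(r"^--- (PASS|FAIL): (\S+) \((\d+(?:\.\d+)?)s\)$")


def summarize(lines):
    """Count PASS/FAIL results and collect the names of failed tests."""
    counts = Counter()
    failed = []
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a result line
        status, name, _seconds = match.groups()
        counts[status] += 1
        if status == "FAIL":
            failed.append(name)
    return counts, failed


sample = [
    "--- FAIL: kuttl/harness/smoke-kraft_kafka-kraft-3.9.1_openshift-false (553.01s)",
    "--- PASS: kuttl/harness/smoke_kafka-3.9.1_zookeeper-3.9.4_use-client-tls-false_openshift-false (79.51s)",
]
counts, failed = summarize(sample)
print(dict(counts))
print(failed)
```

Feeding the full output through `summarize` gives the PASS and FAIL totals and the list of failing test names, which makes it quicker to check which test families are affected.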

This ensures Kafka pods don't hang when ZooKeeper is shut down before Kafka on namespace deletion by kuttl
This ensures Kafka brokers don't hang when KRaft controllers are shut down on namespace deletion by kuttl
This reverts commit cc853b9.

This relies on a bug fix that should come in with #955
@NickLarsenNZ NickLarsenNZ marked this pull request as ready for review May 7, 2026 15:49
@NickLarsenNZ NickLarsenNZ moved this to Development: Waiting for Review in Stackable Engineering May 7, 2026
@NickLarsenNZ NickLarsenNZ requested review from adwk67 and sbernauer May 7, 2026 15:50
@NickLarsenNZ NickLarsenNZ enabled auto-merge May 7, 2026 15:51
NickLarsenNZ (Member, Author) commented May 7, 2026

See also #963. @adwk67 and I discussed whether the gracefulShutdownTimeout: 60s setting might be masking the issue.

We can bring back the broker downscaling (for KRaft clusters) with #955.

@adwk67 adwk67 moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering May 8, 2026
NickLarsenNZ and others added 2 commits May 8, 2026 10:05
adwk67 (Member) left a comment

LGTM

@NickLarsenNZ NickLarsenNZ added this pull request to the merge queue May 8, 2026
Merged via the queue into main with commit 4e3e44e May 8, 2026
10 checks passed
@NickLarsenNZ NickLarsenNZ deleted the test/shutdown-kafka-before-zookeeper branch May 8, 2026 08:21