Skip to content

USHIFT-6957: Use nodename as Prometheus instance#6659

Open
sjug wants to merge 1 commit into
openshift:mainfrom
sjug:prom-instance-nodename
Open

USHIFT-6957: Use nodename as Prometheus instance#6659
sjug wants to merge 1 commit into
openshift:mainfrom
sjug:prom-instance-nodename

Conversation

@sjug
Copy link
Copy Markdown
Contributor

@sjug sjug commented May 12, 2026

Switches the Prometheus instance label from host:port to the Kubernetes NodeName so the same queries work uniformly against MicroShift and OpenShift cluster-monitoring. The label value is now resolved by replicating MicroShift's own NodeName logic (node.hostnameOverride from /etc/microshift/config.yaml and drop-ins under config.d/ in lexical order, lowercased; fallback to OS hostname) instead of using the inventory hostname, so the scrape label tracks what kubelet actually advertises even when an override is configured.

Label Prometheus metrics with the Kubernetes NodeName instead of
host:port to match the OpenShift cluster-monitoring convention so the
same tooling and queries can target MicroShift and OpenShift uniformly.
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 12, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 12, 2026

@sjug: This pull request references USHIFT-6957 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Label Prometheus metrics with the Kubernetes NodeName instead of host:port to match OpenShift cluster-monitoring's convention so the same tooling and queries can target MicroShift and OpenShift uniformly.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

Walkthrough

The PR updates Ansible infrastructure to resolve Kubernetes node names from MicroShift configuration files and propagate that name through Prometheus scrape job configurations, kubelet logging blocks, and network query defaults, while adding CGNAT network support.

Changes

Kubernetes Node Name Resolution & Prometheus Configuration

Layer / File(s) Summary
Node name resolution from MicroShift config
ansible/roles/common/tasks/nodename.yml
New task file reads /etc/microshift/config.yaml and drop-in files under /etc/microshift/config.d, parses YAML to extract node.hostnameOverride, computes merged override (later drop-ins take precedence), and sets kubernetes_nodename fact with fallback to ansible_hostname.
Playbook orchestration and fact usage
ansible/setup-node.yml, ansible/roles/microshift-start/tasks/main.yml
Playbook adds new "Resolve microshift nodename" play to run nodename resolution before logging setup. microshift-start includes nodename task and updates Prometheus instance reference to use kubernetes_nodename instead of inventory_hostname:9100.
Prometheus scrape job instance labels
ansible/roles/add-kubelet-logging/tasks/main.yml, ansible/roles/install-logging/templates/prometheus.yml.j2
Kubelet and cadvisor scrape jobs now add labels.instance derived from kubernetes_nodename (with fallback to host). Template-based node, process, and crio jobs similarly emit per-host targets with instance labels.
Network query defaults and CGNAT support
ansible/roles/common/tasks/files/prom-network-query.py
Default Prometheus URL changes from http://zen3:9091 to http://prometheus:9091. IP validation now treats CGNAT range (100.64.0.0/10) as local/private in addition to loopback, link-local, and RFC1918. Help text updated to reflect new defaults.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: using Kubernetes nodename as the Prometheus instance label instead of host:port, which is the core objective across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR contains only Ansible config files (YAML, Python, Jinja2), no Go test files. The custom check for Ginkgo test names is not applicable.
Test Structure And Quality ✅ Passed PR contains no Ginkgo test code. Changes are only to Ansible roles, tasks, templates, and Python utilities. Custom check for test quality requirements is not applicable.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests added in this PR. Changes are limited to Ansible playbooks/roles for Prometheus configuration.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests added. PR contains only Ansible automation and configuration changes (YAML, Python, Jinja templates). SNO test compatibility check not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR is Ansible automation/configuration code only. No Kubernetes manifests, operator code, or scheduling constraints introduced. Check applies only to deployment manifests and operator code.
Ote Binary Stdout Contract ✅ Passed PR modifies only Ansible playbooks, Python utility scripts, and Jinja2 templates. Contains no OTE binaries, Go test code, or Ginkgo test framework. Check not applicable.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests added in this PR. Changes consist only of Ansible playbooks, roles, Python scripts, and Jinja2 templates. Check is not applicable.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from ggiguash and pmtk May 12, 2026 17:20
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sjug
Once this PR has been reviewed and has the lgtm label, please assign jerpeter1 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
ansible/roles/common/tasks/nodename.yml (2)

33-54: ⚡ Quick win

Add error handling for malformed YAML in config files.

The from_yaml filter will raise an error if config files contain invalid YAML, but there's no explicit handling for this case. A malformed drop-in will fail the entire playbook rather than gracefully falling back to ansible_hostname.

Consider wrapping the YAML parsing in a rescue block or validating files before parsing.

🛡️ Option: Add validation task

Before line 40, add:

    - name: Validate YAML syntax in config files
      ansible.builtin.command:
        cmd: python3 -c "import yaml; yaml.safe_load(open('{{ item }}'))"
      loop: "{{ _microshift_configs.results | selectattr('content', 'defined') | map(attribute='item') | list }}"
      register: _yaml_validation
      failed_when: false
      changed_when: false

    - name: Warn about invalid YAML files
      ansible.builtin.debug:
        msg: "Warning: {{ item.item }} contains invalid YAML and will be skipped"
      loop: "{{ _yaml_validation.results | selectattr('rc', 'defined') | selectattr('rc', 'ne', 0) | list }}"
      when: _yaml_validation is defined

Then filter out invalid files in the merge logic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ansible/roles/common/tasks/nodename.yml` around lines 33 - 54, The YAML
parsing in the set_fact block for _microshift_hostname_override uses from_yaml
on r.content and can raise on malformed YAML; update the logic to handle parse
errors by wrapping the from_yaml call in a Jinja2 try/rescue (or pre-validate
files) so that if parsing fails for a given r in _microshift_configs.results it
is skipped and does not abort the playbook, and ensure the namespace value
fallback remains ansible_hostname; specifically modify the template inside the
Compute hostnameOverride task (referencing _microshift_configs, parsed,
ns.value, and from_yaml) to catch exceptions (or check validity before parsing)
and only set ns.value when parsing succeeds.

43-54: ⚖️ Poor tradeoff

Consider simplifying the Jinja template logic.

The namespace-based iteration and nested conditionals are dense. This could be refactored into a custom Ansible filter or a separate Python script for better testability and maintainability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ansible/roles/common/tasks/nodename.yml` around lines 43 - 54, The Jinja loop
is too dense—replace the inline namespace/loop with a custom Ansible filter
(e.g., filter name extract_hostname_override) that accepts
_microshift_configs.results, decodes each r.content, parses YAML, and returns
parsed.node.hostnameOverride or '' as a default; then call
extract_hostname_override from the template instead of the current block.
Implement the filter in filter_plugins (function name extract_hostname_override)
to handle b64decode/from_yaml, mapping checks for parsed and parsed.node, and
safe defaults so the template becomes a single function call returning ns.value.
ansible/setup-node.yml (1)

51-57: 💤 Low value

Minor: Inconsistent capitalization in task name.

The play name uses "Resolve microshift nodename" (line 51) but the task name uses "resolve kubernetes nodename" (line 56). Consider consistent capitalization for readability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ansible/setup-node.yml` around lines 51 - 57, The play uses "Resolve
microshift nodename" but the task is titled "resolve kubernetes nodename" which
is inconsistent; update the task name string "resolve kubernetes nodename" to
match the play's capitalization and wording (e.g., "Resolve microshift nodename"
or at least capitalize and standardize to "Resolve Kubernetes nodename") so both
the play name and the task name are consistent; edit the task title in the
include_tasks block that references roles/common/tasks/nodename.yml.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@ansible/roles/common/tasks/nodename.yml`:
- Around line 33-54: The YAML parsing in the set_fact block for
_microshift_hostname_override uses from_yaml on r.content and can raise on
malformed YAML; update the logic to handle parse errors by wrapping the
from_yaml call in a Jinja2 try/rescue (or pre-validate files) so that if parsing
fails for a given r in _microshift_configs.results it is skipped and does not
abort the playbook, and ensure the namespace value fallback remains
ansible_hostname; specifically modify the template inside the Compute
hostnameOverride task (referencing _microshift_configs, parsed, ns.value, and
from_yaml) to catch exceptions (or check validity before parsing) and only set
ns.value when parsing succeeds.
- Around line 43-54: The Jinja loop is too dense—replace the inline
namespace/loop with a custom Ansible filter (e.g., filter name
extract_hostname_override) that accepts _microshift_configs.results, decodes
each r.content, parses YAML, and returns parsed.node.hostnameOverride or '' as a
default; then call extract_hostname_override from the template instead of the
current block. Implement the filter in filter_plugins (function name
extract_hostname_override) to handle b64decode/from_yaml, mapping checks for
parsed and parsed.node, and safe defaults so the template becomes a single
function call returning ns.value.

In `@ansible/setup-node.yml`:
- Around line 51-57: The play uses "Resolve microshift nodename" but the task is
titled "resolve kubernetes nodename" which is inconsistent; update the task name
string "resolve kubernetes nodename" to match the play's capitalization and
wording (e.g., "Resolve microshift nodename" or at least capitalize and
standardize to "Resolve Kubernetes nodename") so both the play name and the task
name are consistent; edit the task title in the include_tasks block that
references roles/common/tasks/nodename.yml.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 17ac164f-4570-41c7-8dd0-e7f8bb042889

📥 Commits

Reviewing files that changed from the base of the PR and between db194f3 and 70d39b3.

📒 Files selected for processing (6)
  • ansible/roles/add-kubelet-logging/tasks/main.yml
  • ansible/roles/common/tasks/files/prom-network-query.py
  • ansible/roles/common/tasks/nodename.yml
  • ansible/roles/install-logging/templates/prometheus.yml.j2
  • ansible/roles/microshift-start/tasks/main.yml
  • ansible/setup-node.yml

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 12, 2026

@sjug: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants