fix: benchmark frequency on CWF, use pcm-tpmi to count number of compute dies#676
fix: benchmark frequency on CWF, use pcm-tpmi to count number of compute dies#676harp-intel wants to merge 2 commits intomainfrom
Conversation
…ute dies Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
There was a problem hiding this comment.
Pull request overview
Updates PerfSpect’s x86 frequency benchmark script to better handle multi-die Intel CPUs (GNR/CWF) by interleaving core IDs across dies, and switches compute-die counting from lspci-based heuristics to pcm-tpmi output parsing.
Changes:
- Extend special interleaving logic to apply to both GNR (model 173) and CWF (model 221).
- Replace compute-die detection from
lspcidevice counting to parsingpcm-tpmioutput, and recomputedies_per_socketaccordingly. - Update the frequency benchmark script dependency list to include
pcm-tpmiinstead oflspci.
| output=$(pcm-tpmi 2 0x10 -d -b 26:26) | ||
| compute_die_count=0 | ||
| while read -r line; do | ||
| if [[ $line == *"entry"* && $line == *"instance"* && $line == *"value 0"* ]]; then | ||
| compute_die_count=$((compute_die_count + 1)) |
There was a problem hiding this comment.
pcm-tpmi execution/parsing isn’t validated here. If pcm-tpmi isn’t present on the target PATH, fails, or its output format changes, compute_die_count will stay 0 and the script will later miscompute or crash. Consider checking the command exit status and that the parsed count is a positive integer before proceeding (otherwise fall back to no interleaving or emit a clear error and exit).
| num_sockets=$(lscpu | grep -E '^Socket\(s\):' | awk '{print $2}') | ||
| # echo "Number of devices: $num_devices" | ||
| # echo "Number of sockets: $num_sockets" | ||
| num_devices_per_die=2 | ||
| # Calculate the number of dies per socket | ||
| dies_per_socket=$((num_devices / num_sockets / num_devices_per_die)) | ||
| dies_per_socket=$(( compute_die_count / num_sockets )) | ||
| # echo "Number of dies per socket: $dies_per_socket" | ||
| # Calculate the number of cores per die | ||
| cores_per_die=$((num_cores_per_socket / dies_per_socket)) |
There was a problem hiding this comment.
dies_per_socket=$(( compute_die_count / num_sockets )) can evaluate to 0 (or error) if num_sockets is 0/empty or compute_die_count < num_sockets, which then causes a division-by-zero in cores_per_die=$((num_cores_per_socket / dies_per_socket)). Add guards to ensure num_sockets > 0 and dies_per_socket >= 1 (and optionally that the division is exact) before using these values.
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
| output=$(pcm-tpmi 2 0x10 -d -b 26:26) | ||
| compute_die_count=0 | ||
| while read -r line; do | ||
| if [[ $line == *"entry"* && $line == *"instance"* && $line == *"value 0"* ]]; then | ||
| compute_die_count=$((compute_die_count + 1)) | ||
| fi | ||
| done <<< "$output" | ||
| # echo "Number of compute dies: $compute_die_count" | ||
| num_sockets=$(lscpu | grep -E '^Socket\(s\):' | awk '{print $2}') | ||
| # echo "Number of devices: $num_devices" | ||
| # echo "Number of sockets: $num_sockets" | ||
| num_devices_per_die=2 | ||
| # Calculate the number of dies per socket | ||
| dies_per_socket=$((num_devices / num_sockets / num_devices_per_die)) | ||
| dies_per_socket=$(( compute_die_count / num_sockets )) | ||
| # echo "Number of dies per socket: $dies_per_socket" | ||
| # Calculate the number of cores per die | ||
| cores_per_die=$((num_cores_per_socket / dies_per_socket)) | ||
| # echo "Number of cores per die: $cores_per_die" |
There was a problem hiding this comment.
The compute-die detection logic can produce compute_die_count=0 (e.g., pcm-tpmi failure/permission issue/unexpected output), which then makes dies_per_socket=0 and triggers a division-by-zero when computing cores_per_die. Please add explicit error handling: check pcm-tpmi exit status, validate compute_die_count and num_sockets are >0, and ensure compute_die_count is evenly divisible by num_sockets before doing arithmetic; otherwise return a clear error or skip interleaving.
| var numDies int | ||
| var coreMultiplier int | ||
| if strings.Contains(arch, cpus.UarchSRF) { | ||
| numDies = 4 | ||
| coreMultiplier = 4 | ||
| } else if strings.Contains(arch, cpus.UarchCWF) { | ||
| numDies = 3 | ||
| coreMultiplier = 4 | ||
| } else if strings.Contains(arch, cpus.UarchGNR_X3) { | ||
| archMultiplier = 3 | ||
| numDies = 3 | ||
| coreMultiplier = 1 | ||
| } else if strings.Contains(arch, cpus.UarchGNR_X2) { | ||
| archMultiplier = 2 | ||
| numDies = 2 | ||
| coreMultiplier = 1 | ||
| } else { | ||
| archMultiplier = 1 | ||
| numDies = 1 | ||
| coreMultiplier = 1 | ||
| } | ||
| for _, count := range bucketCoreCounts { | ||
| if startRange > count { | ||
| adjustedCount := count * coreMultiplier | ||
| if startRange > adjustedCount { | ||
| break | ||
| } | ||
| if archMultiplier > 1 { | ||
| totalCoreCount := count * archMultiplier | ||
| if totalCoreStartRange > int(totalCoreCount) { | ||
| break | ||
| } | ||
| totalCoreBuckets = append(totalCoreBuckets, fmt.Sprintf("%d-%d", totalCoreStartRange, totalCoreCount)) | ||
| totalCoreStartRange = int(totalCoreCount) + 1 | ||
| totalCoreCount := adjustedCount * numDies | ||
| if totalCoreStartRange > int(totalCoreCount) { | ||
| break | ||
| } | ||
| dieCoreBuckets = append(dieCoreBuckets, fmt.Sprintf("%d-%d", startRange, count)) | ||
| startRange = int(count) + 1 | ||
| totalCoreBuckets = append(totalCoreBuckets, fmt.Sprintf("%d-%d", totalCoreStartRange, totalCoreCount)) | ||
| totalCoreStartRange = int(totalCoreCount) + 1 | ||
| dieCoreBuckets = append(dieCoreBuckets, fmt.Sprintf("%d-%d", startRange, adjustedCount)) | ||
| startRange = int(adjustedCount) + 1 |
There was a problem hiding this comment.
GetSpecFrequencyBuckets now has new SRF/CWF-specific bucket math (numDies/coreMultiplier) but there are no unit tests covering this function. Since internal/extract/frequency_test.go already exists for related helpers, please add cases that validate the generated bucket ranges/headers for SRF and CWF (including the new CWF 3-die behavior and the coreMultiplier scaling) to prevent regressions.
multi-die processors requires core interleaving for frequency benchmarking
This pull request updates the logic for detecting and handling specific CPU models in the
internal/script/scripts.goscript, particularly for GNR and CWF architectures. The main improvements include extending support to an additional CPU model, switching fromlspcitopcm-tpmifor more accurate compute die detection, and updating dependencies accordingly.CPU Model Support and Detection:
lspcioutput to usingpcm-tpmi, which provides a more accurate count.dies_per_socketto use the new compute die count.Dependency Updates:
pcm-tpmiinstead oflspci, reflecting the new detection method.