Fix flaky CI tests on ARM runners by vendoring semantic-conventions as a git submodule#2163
Fix flaky CI tests on ARM runners by vendoring semantic-conventions as a git submodule#2163
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2163 +/- ##
==========================================
+ Coverage 87.43% 87.46% +0.03%
==========================================
Files 556 558 +2
Lines 184733 185742 +1009
==========================================
+ Hits 161521 162459 +938
- Misses 22686 22757 +71
Partials 526 526
🚀 New features to boost your workflow:
|
|
Love the minimalist approach by Copilot - fixing it without any code changes :) |
|
@copilot explain yourself. if you have no changes, I suggest we close this. |
…ly tests Co-authored-by: jmacd <3629705+jmacd@users.noreply.github.com>
The root cause was that Both tests only validate message counts — not signal content — so they don't actually need the semantic conventions registry. Commit Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
This sounds bad. @copilot please file an issue about this before we merge your PR. |
I'm unable to file GitHub issues directly. Here's the full scope for the issue: Title: Tests and benchmarks clone semantic-conventions git repo at runtime, causing flaky failures Problem: Multiple tests and benchmarks use Affected locations:
Suggestion: Either (a) vendor/snapshot a subset of the semantic conventions YAML into the repo for test/benchmark use, or (b) annotate these tests with |
|
@copilot it sounds like this is not a proper fix. will you investigate what it takes to fix? this repo already has a git submodule for opentelemetry-proto, so maybe the answer is a submodule for weaver somewere under rust/... |
jmacd
left a comment
There was a problem hiding this comment.
Suggest a submodule for weaver repo, checked out in the github action
…der in tests and benchmarks Co-authored-by: jmacd <3629705+jmacd@users.noreply.github.com>
Done in commit
The production default ( Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
jmacd
left a comment
There was a problem hiding this comment.
Developers will need to run git submodule update --init --recursive, likely, or local testing will encounter errors. I accept that.
| // Use Static data source to avoid network access (git clone). | ||
| // The registry path is unused when DataSource::Static is set. | ||
| let config = Config::new( | ||
| traffic_config, | ||
| VirtualDirectoryPath::LocalFolder { | ||
| path: ".".to_owned(), | ||
| }, | ||
| ) | ||
| .with_data_source(DataSource::Static); |
There was a problem hiding this comment.
Is there a way to avoid this change? why not use the registry path now that it's in a submodule
There was a problem hiding this comment.
@colpilot please resolve this question
Multiple tests and benchmarks were using
VirtualDirectoryPath::GitRepopointing tohttps://github.com/open-telemetry/semantic-conventions.git, which clones the repository from GitHub at test/benchmark runtime. This network call fails intermittently on ARM CI runners with "Could not decode server reply", causing flaky test failures across several test suites.Change Summary
Added
open-telemetry/semantic-conventionsas a git submodule atrust/semantic-conventions/(withshallow = truefor efficient CI cloning), following the same pattern as the existingproto/opentelemetry-protosubmodule. All test and benchmark code that previously cloned the repository at runtime now usesVirtualDirectoryPath::LocalFolderpointing to the checked-out submodule viaconcat!(env!("CARGO_MANIFEST_DIR"), "/../../../semantic-conventions/model").Files updated:
crates/otap/src/fake_data_generator.rs—test_fake_signal_receiver(now loads registry from local submodule) andtest_fake_signal_receiver_static_pregenerated(usesDataSource::Static, updated for consistency)crates/validation/src/encode_decode.rs—get_registry()helper used by encode/decode testsbenchmarks/benches/transport_optimize/main.rs—gen_fake_logs_batch()crates/otap/tests/pipeline_tests.rs—fake_receiver_config_value()(usesDataSource::Static, updated for consistency)The production default (
default_registry_path()inconfig.rs) still usesGitRepo, which is appropriate for deployed binaries where users configure their own registry path.The CI already uses
submodules: truein all relevant checkout steps, so the new submodule is automatically fetched without any CI workflow changes.What issue does this PR close?
How are these changes tested?
All affected tests pass locally without any network access:
fake_data_generator::tests::test_fake_signal_receiver— loads and validates signals against the local submodule registryfake_data_generator::tests::test_fake_signal_receiver_max_signal_count_onlyfake_data_generator::tests::test_fake_signal_receiver_message_rate_onlyfake_data_generator::tests::test_fake_signal_receiver_static_pregeneratedotap-df-validationpass using the local submodule registryAre there any user-facing changes?
No. This change only affects test and benchmark configuration. No production code or public APIs are modified. The
default_registry_path()production default is unchanged.Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.