EfficientDiD methodology validation (PR-B): sieve outcome-regression upgrade + Verified Components by igerber · Pull Request #521 · igerber/diff-diff

igerber · 2026-06-01T16:40:30Z

Summary

EfficientDiD methodology-review PR-B (source validation): validates the implementation against Chen, Sant'Anna & Xie (2025, arXiv:2506.17729v1) and promotes the METHODOLOGY_REVIEW.md row to Complete. (PR-A EfficientDiD methodology review (PR-A): Chen-Sant'Anna-Xie (2025) paper-review fidelity artifact #515 landed the paper-review fidelity artifact; this is the substantive source-vs-code pass.)
Behavior change — the covariate doubly-robust outcome regression m̂(X) is upgraded from linear OLS to a growing polynomial sieve (basis-dimension-penalized AIC/BIC, no fixed K≤5 ceiling) across all three nuisance sieves, so the covariate path matches Theorem 4.1 / Assumption C.1's growing-sieve regime (dimension rate p_K = comb(K+d,d) = o(n) for low-dimensional covariates) rather than only approximating it. sieve_k_max=1 forces every covariate-path sieve to degree 1. The Theorem A.1 Hausman statistic was extracted into a behavior-preserving _hausman_quadratic_form helper, and estimate_outcome_regression gained a criterion guard.
New tests/test_methodology_efficient_did.py (24 tests): paper-equation-numbered Verified Components (Eq 3.5/3.13 weights, Eq 3.9 generated-outcome telescoping, §4.1 no-cov closed form, Cor 3.1/3.2 PT-Post=CS, Thm 4.1 SE, Thm A.1 Hausman incl. the restricted−efficient covariance direction + rank-deficient-V DOF + direction guard) plus growing-sieve / multivariate / weight-scale-invariance / zero-weight-padding / last_cohort-DR coverage.
Docs reconciled to the growing-sieve framing across REGISTRY, the paper review, module/class docstrings, docs/api/efficient_did.rst, docs/choosing_estimator.rst, diff_diff/guides/llms-full.txt, CHANGELOG, and the tracker. HRS Table 6 anchor tightened from 0.1·SE → 0.05·SE, with the openICPSR-116186 data license + 656-vs-652 sample difference documented in tests/data/README.md.

Methodology references (required if estimator / math changes)

Method name(s): EfficientDiD — covariate doubly-robust path (sieve outcome regression + sieve propensity ratio (Eq 4.1–4.2) + kernel-smoothed Ω*(X)); Hausman PT-All vs PT-Post pretest (Theorem A.1).
Paper / source link(s): Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). Efficient Difference-in-Differences and Event Study Estimators, arXiv:2506.17729v1. Paper review on file: docs/methodology/papers/chen-santanna-xie-2025-review.md.
Any intentional deviations from the source (and why): documented in docs/methodology/REGISTRY.md ## EfficientDiD — overall_att cohort-size-weighted vs paper ES_avg uniform (Eq 2.3, recoverable from event-study output); fixed-weight multiplier bootstrap (the analytical path carries the (G_g−π_g) WIF correction); Hausman effective-rank DOF (= |E| when well-conditioned); vcov_type narrow to {hc1} (IF-based variance, no single design matrix); polynomial sieve basis + Silverman kernel bandwidth (paper leaves the basis/kernel open); i.i.d.-asymptotics SEs with cluster/survey EIF variants as documented library extensions. The prior linear-OLS efficiency deviation is eliminated by the sieve upgrade. The growing sieve attains the bound for low-dimensional covariates; high-dimensional X would call for the paper's ML-nuisance option (Remark 4.2).

Validation

Tests added/updated: tests/test_methodology_efficient_did.py (new), tests/test_efficient_did_validation.py (anchor tightened), tests/data/README.md. Existing tests/test_efficient_did.py / tests/test_survey_phase3.py / tests/test_survey_phase6.py stay green — the cap removal is a no-op for groups under ~3,125 units. Full affected suite: 361 passed (incl. slow) under DIFF_DIFF_BACKEND=python; black / ruff clean on changed code; mypy adds zero errors.
Backtest / simulation / notebook evidence: HRS Table 6 cross-paper replication on the paper's own data (matches all six ATT(g,t) + ES(0/1/2) + ES_avg to <0.03 published SE) + Compustat Monte-Carlo validation (unbiasedness, the Callaway-Sant'Anna efficiency gain, coverage, SE calibration).

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

github-actions · 2026-06-01T16:51:30Z

Overall Assessment
⛔ Blocker

Executive Summary

The main blocker is a silent survey-weighted correctness issue: zero-weight padded rows can still change sieve selection in the DR path, so EfficientDiD can return different ATT/SE from analytically equivalent data with no warning.
Aside from that, the substantive methodology shift looks aligned with Chen, Sant'Anna & Xie (2025) and the updated Methodology Registry: the sieve outcome-regression move is documented, the Hausman covariance direction is correct, and the existing documented deviations remain properly labeled.
The new methodology file improves coverage materially, but it only exercises the K>5 path on the new outcome-regression helper, not on the two propensity helpers that also changed behavior.
The added performance debt from rebuilding polynomial bases is already tracked in TODO.md, so I do not treat it as a blocker.

Methodology

P0 Impact: zero-weight survey padding is not actually inert in the weighted sieve logic. In the new outcome-regression helper, raw n_group drives auto-k_max, the n_basis < n_group admissibility cap, and the IC itself at diff_diff/efficient_did_covariates.py:115 through L186. The same raw-count dependence remains in the changed propensity-ratio and inverse-propensity helpers at diff_diff/efficient_did_covariates.py:330 through L414 and diff_diff/efficient_did_covariates.py:515 through L587. With WLS coefficients/RSS otherwise unchanged, analytically irrelevant zero-weight rows can still change which K values are admissible or selected, and therefore change the final DR estimate with no warning. That violates both the survey-weight semantics of the Section 4 nuisance fits and the contract the PR itself states in tests/test_methodology_efficient_did.py:524. Concrete fix: when unit_weights is present, derive active positive-weight masks once and use positive-weight support counts for the auto-K rule, n_basis cap, and IC sample-size terms across all three sieve helpers; then add an end-to-end padded-vs-filtered survey regression test for EfficientDiD(..., covariates=..., survey_design=...).

No other source-material mismatches stood out in the changed estimator math. The REGISTRY-documented deviations (overall_att weighting, effective-rank Hausman df, fixed-weight bootstrap aggregation, vcov_type={"hc1"}) are clearly labeled, so I did not count them as defects.

Code Quality

No additional findings beyond the blocker above.

Performance

P3 Impact: the no-ceiling change increases repeated _polynomial_sieve_basis(X, K) work across all three nuisance fits, but the PR already records that follow-up at TODO.md:165. Concrete fix: none for approval; the tracking is sufficient.

Maintainability

No additional findings.

Tech Debt

No untracked tech-debt findings beyond the already-tracked basis-cache follow-up.

Security

No findings.

Documentation/Tests

P2 Impact: the new methodology suite only validates the K>5 behavior on estimate_outcome_regression at tests/test_methodology_efficient_did.py:580, even though the PR also changes the auto-K behavior of estimate_propensity_ratio_sieve and estimate_inverse_propensity_sieve at diff_diff/efficient_did_covariates.py:338 and diff_diff/efficient_did_covariates.py:521. The only zero-weight-padding check is helper-local and non-end-to-end, so the user-visible DR estimator surface is still unpinned. Concrete fix: add direct K>5 regression tests for both propensity helpers and one full-estimator survey test comparing zero-weight-padded data to the same data filtered to positive-weight units.

Path to Approval

Make zero-weight rows analytically irrelevant in all three weighted sieve helpers by basing auto-K, admissibility caps, and IC sample-size terms on positive-weight support rather than raw row counts.
Add a deterministic regression test that fails under the current raw-count logic, plus an end-to-end EfficientDiD(..., covariates=..., survey_design=...) padded-vs-filtered invariance test.
Add K>5 regression coverage for the propensity-ratio and inverse-propensity helpers, or an end-to-end DR fixture that exercises the uncapped path there as well.

Verification note: this was a static review only. I could not run the Python test suite in the current environment because the project dependencies are not installed here.

github-actions · 2026-06-01T18:16:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 089bc1d70c9e54db5d8be355d4182daa0a7b6a3e

Overall Assessment

⛔ Blocker

Executive Summary

The prior blocker on sieve order selection appears fixed: the outcome-regression, propensity-ratio, and inverse-propensity helpers now all key auto-K selection off positive-weight support, and the new methodology file adds direct regression coverage for the uncapped K>5 propensity paths.
P0 [Newly identified]: overidentified survey DR fits are still not zero-weight invariant when kernel_bandwidth=None. Auto Silverman bandwidth is still computed from the full unweighted covariate matrix, so zero-weight padded rows can change Omega*(X) and therefore the efficient combination weights.
The new end-to-end zero-weight invariance test does not cover that path: its fixture is effectively just-identified (H=1), so compute_per_unit_weights() short-circuits to 1 and the kernel/bandwidth path is bypassed.
Aside from that remaining issue, the sieve-OR upgrade, Hausman extraction, and the documented deviations look aligned with the updated registry and the source material. citeturn0view2turn0view3
The repeated-basis performance cost and the tiny survey-SE zero-weight correction are both properly tracked in TODO.md and are non-blocking.

Methodology

P0 [Newly identified] Impact: zero-weight survey padding can still change the DR point estimate in overidentified covariate fits when kernel_bandwidth=None. _silverman_bandwidth() still uses the full unweighted row count and dispersion of covariate_matrix, and compute_omega_star_conditional() auto-calls it before forming per-unit efficient weights. In any H>1 cell, that bandwidth feeds directly into Omega*(X) and then compute_per_unit_weights(), so analytically irrelevant zero-weight rows can still move ATT(g,t) and downstream aggregates with no warning. This is not about the choice of Silverman itself; the paper leaves the kernel/bandwidth choice open. The defect is the dependence of the auto selector on zero-weight rows. Concrete fix: on survey fits, derive auto bandwidth from the positive-weight support only (ideally a documented weighted-support analogue), or fail closed unless kernel_bandwidth is explicit when zero-weight rows are present. Relevant code: diff_diff/efficient_did_covariates.py:L767-L777, diff_diff/efficient_did_covariates.py:L898-L899, diff_diff/efficient_did_covariates.py:L1015-L1069, diff_diff/efficient_did.py:L823-L825. citeturn0view2turn0view3
No other undocumented source-material mismatches stood out in the changed estimator math. The remaining deviations are explicitly labeled in docs/methodology/REGISTRY.md:L1060-L1072.

Code Quality

No findings.

Performance

P3 Impact: the uncapped growing sieve increases repeated basis construction work, but that follow-up is already tracked and does not block approval. Concrete fix: none for this PR; keep the existing cache follow-up in TODO.md:L166.

Maintainability

No findings.

Tech Debt

P3 Impact: the tiny survey-SE zero-weight non-invariance from PSU counting is explicitly tracked and is appropriately non-blocking. Concrete fix: none for approval; the follow-up is already recorded in TODO.md:L90.

Security

No findings.

Documentation/Tests

P2 Impact: the new zero-weight invariance coverage overstates the surface it actually pins. The end-to-end fixture in TestSurveyZeroWeightInvariance uses one treated cohort and only periods 1,2, which leaves a single valid comparison moment; compute_per_unit_weights() therefore returns all ones and the kernel/bandwidth path is never exercised. That means the new test cannot catch the remaining auto-bandwidth drift described above, even though the changelog now claims exact point-estimate invariance. Concrete fix: add an end-to-end survey regression with at least one overidentified DR cell (H>1), kernel_bandwidth=None, and zero-weight padded units carrying extreme covariates so the bandwidth dependence is forced to show up. Relevant refs: tests/test_methodology_efficient_did.py:L785-L844, diff_diff/efficient_did_weights.py:L87-L99, diff_diff/efficient_did_covariates.py:L1037-L1041, CHANGELOG.md:L14.

Path to Approval

Make auto kernel_bandwidth=None ignore zero-weight rows on survey fits by basing the selector on the positive-weight support, and document that rule in REGISTRY.md.
Add an overidentified survey invariance regression with H>1 and kernel_bandwidth=None, comparing a positive-weight-only fit to the same data padded with zero-weight units carrying extreme covariates.
Keep the “exact zero-weight invariance” changelog/registry/test language only after the bandwidth path is fixed; otherwise narrow the claim to sieve-order selection / just-identified cases.

Static review only. I could not execute the estimator or test suite in this environment because the Python dependencies here are not installed (numpy was unavailable).

github-actions · 2026-06-01T18:29:17Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b8ef9074f60304dce1a2694bd6cf0f8f9d49a694

Overall Assessment

✅ Looks good

Executive Summary

All prior P1+ findings appear resolved. The zero-weight survey blocker is fixed in code by moving auto Silverman bandwidth selection onto positive-weight support, and the new overidentified H>1 regression actually exercises that path in the test suite. diff_diff/efficient_did_covariates.py:L767-L792, diff_diff/efficient_did_covariates.py:L913-L915, tests/test_methodology_efficient_did.py:L901-L940
The affected method is EfficientDiD’s covariate DR path: sieve outcome regression, propensity ratio, inverse propensity, conditional Omega*(X), and the Hausman PT-All vs PT-Post pretest. The changed implementation matches the updated methodology contract in the registry and paper-review notes. diff_diff/efficient_did.py:L212-L286, docs/methodology/REGISTRY.md:L1060-L1068, docs/methodology/papers/chen-santanna-xie-2025-review.md:L189-L193
I did not find a new undocumented methodology deviation, missing assumption check rising to P1, or incorrect SE/inference pattern in the changed files. The estimator continues to use safe_inference() rather than inline t_stat = effect / se logic. diff_diff/efficient_did.py:L1096-L1128, diff_diff/efficient_did.py:L1186-L1235
The remaining issues in the changed area are non-blocking and already tracked: tiny zero-weight survey SE drift in shared PSU corrections, and extra basis-construction cost after removing the fixed K<=5 ceiling. TODO.md:L90, TODO.md:L166
Verification was static only. I attempted a targeted pytest slice, but pytest is not installed in this environment.

Methodology

P3-informational Impact: the remaining source-material deviations are documented implementation choices rather than defects: polynomial sieve basis choice, Silverman bandwidth, cohort-size-weighted overall_att, and effective-rank Hausman DOF. Concrete fix: none; documentation is already explicit. docs/methodology/REGISTRY.md:L1060-L1068, docs/methodology/papers/chen-santanna-xie-2025-review.md:L189-L193

Code Quality

No findings. Parameter propagation is complete for sieve_k_max and sieve_criterion: validation, get_params(), all three nuisance fits, results metadata, and hausman_pretest pass-through all line up. diff_diff/efficient_did.py:L346-L427, diff_diff/efficient_did.py:L928-L1018, diff_diff/efficient_did.py:L1238-L1283, diff_diff/efficient_did_results.py:L124-L186

Performance

P3-informational Impact: the growing-sieve path rebuilds _polynomial_sieve_basis(X, K) inside each nuisance helper for every candidate K, so large covariate-adjusted fits do avoidable extra work. Concrete fix: keep the already-added cache follow-up in TODO.md. diff_diff/efficient_did_covariates.py:L166-L173, diff_diff/efficient_did_covariates.py:L392-L400, diff_diff/efficient_did_covariates.py:L584-L590, TODO.md:L166

Maintainability

No findings. Extracting _hausman_quadratic_form() is behavior-preserving and improves isolated testability of the covariance-direction and effective-rank logic. diff_diff/efficient_did.py:L142-L202, diff_diff/efficient_did.py:L1917-L1937, tests/test_methodology_efficient_did.py:L391-L432

Tech Debt

P3-informational (tracked in TODO.md) Impact: survey SEs are still not exactly zero-weight-invariant because the shared PSU finite-sample correction counts zero-weight PSUs; the PR now correctly limits the “exact invariance” claim to point estimates and makes the residual SE drift explicit in tests. Concrete fix: count only positive-weight PSUs in _compute_stratified_psu_meat, as already recorded. TODO.md:L90, tests/test_methodology_efficient_did.py:L856-L862, tests/test_methodology_efficient_did.py:L939-L940

Security

No findings.

Documentation/Tests

No findings. The new methodology suite now covers both missing surfaces from the previous review: just-identified zero-weight order-selection invariance and overidentified Omega*(X) bandwidth invariance, plus direct growing-sieve and Hausman checks. tests/test_methodology_efficient_did.py:L528-L567, tests/test_methodology_efficient_did.py:L591-L764, tests/test_methodology_efficient_did.py:L772-L940
Verification note: static review only; I could not execute the tests here because pytest is unavailable.

…upgrade + zero-weight-invariant nuisance selectors Source-validation pass for Chen, Sant'Anna & Xie (2025, arXiv:2506.17729v1); promotes the METHODOLOGY_REVIEW.md row to Complete. - Covariate doubly-robust outcome regression m_hat(X): linear OLS -> growing polynomial sieve (dimension-penalized AIC/BIC, no fixed K<=5 ceiling), so the covariate path attains the semiparametric efficiency bound under Assumption C.1; the K<=5 cap is removed from the two pre-existing propensity sieves too. - Theorem A.1 Hausman statistic extracted into a behavior-preserving _hausman_quadratic_form helper; criterion guard on estimate_outcome_regression. - Zero-weight (survey-subpopulation / padded) rows are now fully inert for the DR point estimate: all three sieve order-selectors key off the positive-weight support (n_pos = sum(w>0)) for auto-k_max / the n_basis cap / the IC sample-size terms, and the auto Silverman bandwidth for the kernel Omega*(X) is evaluated on the positive-weight support (this last only bites overidentified H>1 cells where Omega* feeds the efficient weights). Both were CI-codex P0s, both fail-under-bug verified. - New tests/test_methodology_efficient_did.py: paper-equation Verified Components (weights, generated-outcome telescoping, no-cov closed form, PT-Post=CS, SE, Hausman) + growing-sieve / propensity K>5 / zero-weight invariance (just- identified H=1 sieve-order AND overidentified H>1 kernel-bandwidth). - HRS Table 6 anchor tightened 0.1*SE -> 0.05*SE + data license / 656-vs-652 docs; REGISTRY / CHANGELOG / paper-review / docstrings reconciled to the growing-sieve + positive-weight-support contract; two P3 follow-ups (basis-rebuild cache, survey-vcov zero-weight-PSU DOF correction) tracked in TODO.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T19:24:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f632acb9948566167d10b2a9fd47bfc85aec77d7

Overall Assessment

✅ Looks good

Executive Summary

Prior P1+ issues appear resolved; I found no unmitigated P0/P1 findings in the changed diff.
Affected method: EfficientDiD covariate DR path and Hausman PT-All/PT-Post pretest.
The sieve outcome-regression upgrade is documented in REGISTRY.md and matches the local Chen/Sant’Anna/Xie review artifact’s §4 flexible-nuisance contract.
sieve_k_max / sieve_criterion propagation is complete across validation, fit-time nuisances, get_params(), and result metadata.
Remaining issues are P3/tracked: zero-weight survey SE drift and avoidable sieve-basis recomputation.
Verification was static only; this runner has no pytest or numpy, so I could not execute tests.

Methodology

Severity: P3-informational
Impact: Remaining deviations are documented implementation choices, not defects: polynomial basis choice, Silverman bandwidth, effective-rank Hausman DOF, cohort-size overall_att, fixed-weight bootstrap aggregation, and cluster/survey EIF extensions.
Concrete fix: None required. See docs/methodology/REGISTRY.md:L1060-L1068 and METHODOLOGY_REVIEW.md:L658-L663.
Severity: No finding
Impact: The changed outcome regression now uses the documented polynomial sieve with AIC/BIC order selection and positive-weight support handling. Fit-time calls pass the same sieve controls into outcome regression, propensity ratio, and inverse propensity helpers.
Concrete fix: None. See diff_diff/efficient_did_covariates.py:L36-L240, diff_diff/efficient_did.py:L918-L1018.

Code Quality

Severity: No finding
Impact: New parameters are validated, returned by get_params(), threaded through all nuisance fits, and stored on EfficientDiDResults.
Concrete fix: None. See diff_diff/efficient_did.py:L300-L427, diff_diff/efficient_did_results.py:L126-L186.
Severity: No finding
Impact: No new inline inference anti-pattern found in the changed EfficientDiD paths; pointwise and aggregate inference still routes through safe_inference().
Concrete fix: None. See diff_diff/efficient_did.py:L1096-L1128, diff_diff/efficient_did.py:L1623-L1689.

Performance

Severity: P3-informational, tracked in TODO.md
Impact: The growing-sieve path rebuilds polynomial bases per candidate K in each nuisance helper, which can add avoidable cost for large covariate-adjusted fits.
Concrete fix: Cache _polynomial_sieve_basis(X, K) per fit and share it across the three sieve helpers. Already tracked at TODO.md:L166.

Maintainability

Severity: No finding
Impact: _hausman_quadratic_form() extraction preserves the restricted-minus-efficient covariance direction and makes effective-rank behavior directly testable.
Concrete fix: None. See diff_diff/efficient_did.py:L142-L202, diff_diff/efficient_did.py:L1917-L1937.

Tech Debt

Severity: P3-informational, tracked in TODO.md
Impact: Survey sandwich SEs are still not exactly zero-weight invariant because shared survey PSU finite-sample corrections count zero-weight PSUs. Point estimates are covered as invariant; SE drift is documented and tracked.
Concrete fix: Count only positive-weight PSUs inside _compute_stratified_psu_meat. See TODO.md:L90, tests/test_methodology_efficient_did.py:L856-L862.

Security

Severity: No finding
Impact: I did not see secrets, credential material, or new unsafe I/O/network behavior in the changed files.
Concrete fix: None.

Documentation/Tests

Severity: No finding
Impact: The new methodology tests cover the main changed surfaces: nonlinear sieve OR, linear fallback, invalid criterion guard, growing order above 5, propensity sieve cap removal, Hausman direction/rank, and zero-weight survey invariance for both order selection and overidentified bandwidth paths.
Concrete fix: None. See tests/test_methodology_efficient_did.py:L443-L618, tests/test_methodology_efficient_did.py:L725-L764, tests/test_methodology_efficient_did.py:L826-L940.
Verification note: Could not run tests because python -m pytest failed with No module named pytest, and direct import checks failed with No module named numpy.

igerber force-pushed the feature/efficient-did-sieve-validation branch from b8ef907 to f632acb Compare June 1, 2026 19:20

igerber added the ready-for-ci Triggers CI test workflows label Jun 1, 2026

igerber merged commit 1bf0002 into main Jun 1, 2026
33 of 34 checks passed

igerber deleted the feature/efficient-did-sieve-validation branch June 1, 2026 21:34

igerber mentioned this pull request Jun 2, 2026

Bump version to 3.5.1 #526

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EfficientDiD methodology validation (PR-B): sieve outcome-regression upgrade + Verified Components#521

EfficientDiD methodology validation (PR-B): sieve outcome-regression upgrade + Verified Components#521
igerber merged 1 commit into
mainfrom
feature/efficient-did-sieve-validation

igerber commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jun 1, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant