Skip to content

EfficientDiD methodology validation (PR-B): sieve outcome-regression upgrade + Verified Components#521

Merged
igerber merged 1 commit into
mainfrom
feature/efficient-did-sieve-validation
Jun 1, 2026
Merged

EfficientDiD methodology validation (PR-B): sieve outcome-regression upgrade + Verified Components#521
igerber merged 1 commit into
mainfrom
feature/efficient-did-sieve-validation

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Jun 1, 2026

Summary

  • EfficientDiD methodology-review PR-B (source validation): validates the implementation against Chen, Sant'Anna & Xie (2025, arXiv:2506.17729v1) and promotes the METHODOLOGY_REVIEW.md row to Complete. (PR-A EfficientDiD methodology review (PR-A): Chen-Sant'Anna-Xie (2025) paper-review fidelity artifact #515 landed the paper-review fidelity artifact; this is the substantive source-vs-code pass.)
  • Behavior change — the covariate doubly-robust outcome regression m̂(X) is upgraded from linear OLS to a growing polynomial sieve (basis-dimension-penalized AIC/BIC, no fixed K≤5 ceiling) across all three nuisance sieves, so the covariate path matches Theorem 4.1 / Assumption C.1's growing-sieve regime (dimension rate p_K = comb(K+d,d) = o(n) for low-dimensional covariates) rather than only approximating it. sieve_k_max=1 forces every covariate-path sieve to degree 1. The Theorem A.1 Hausman statistic was extracted into a behavior-preserving _hausman_quadratic_form helper, and estimate_outcome_regression gained a criterion guard.
  • New tests/test_methodology_efficient_did.py (24 tests): paper-equation-numbered Verified Components (Eq 3.5/3.13 weights, Eq 3.9 generated-outcome telescoping, §4.1 no-cov closed form, Cor 3.1/3.2 PT-Post=CS, Thm 4.1 SE, Thm A.1 Hausman incl. the restricted−efficient covariance direction + rank-deficient-V DOF + direction guard) plus growing-sieve / multivariate / weight-scale-invariance / zero-weight-padding / last_cohort-DR coverage.
  • Docs reconciled to the growing-sieve framing across REGISTRY, the paper review, module/class docstrings, docs/api/efficient_did.rst, docs/choosing_estimator.rst, diff_diff/guides/llms-full.txt, CHANGELOG, and the tracker. HRS Table 6 anchor tightened from 0.1·SE → 0.05·SE, with the openICPSR-116186 data license + 656-vs-652 sample difference documented in tests/data/README.md.

Methodology references (required if estimator / math changes)

  • Method name(s): EfficientDiD — covariate doubly-robust path (sieve outcome regression + sieve propensity ratio (Eq 4.1–4.2) + kernel-smoothed Ω*(X)); Hausman PT-All vs PT-Post pretest (Theorem A.1).
  • Paper / source link(s): Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). Efficient Difference-in-Differences and Event Study Estimators, arXiv:2506.17729v1. Paper review on file: docs/methodology/papers/chen-santanna-xie-2025-review.md.
  • Any intentional deviations from the source (and why): documented in docs/methodology/REGISTRY.md ## EfficientDiDoverall_att cohort-size-weighted vs paper ES_avg uniform (Eq 2.3, recoverable from event-study output); fixed-weight multiplier bootstrap (the analytical path carries the (G_g−π_g) WIF correction); Hausman effective-rank DOF (= |E| when well-conditioned); vcov_type narrow to {hc1} (IF-based variance, no single design matrix); polynomial sieve basis + Silverman kernel bandwidth (paper leaves the basis/kernel open); i.i.d.-asymptotics SEs with cluster/survey EIF variants as documented library extensions. The prior linear-OLS efficiency deviation is eliminated by the sieve upgrade. The growing sieve attains the bound for low-dimensional covariates; high-dimensional X would call for the paper's ML-nuisance option (Remark 4.2).

Validation

  • Tests added/updated: tests/test_methodology_efficient_did.py (new), tests/test_efficient_did_validation.py (anchor tightened), tests/data/README.md. Existing tests/test_efficient_did.py / tests/test_survey_phase3.py / tests/test_survey_phase6.py stay green — the cap removal is a no-op for groups under ~3,125 units. Full affected suite: 361 passed (incl. slow) under DIFF_DIFF_BACKEND=python; black / ruff clean on changed code; mypy adds zero errors.
  • Backtest / simulation / notebook evidence: HRS Table 6 cross-paper replication on the paper's own data (matches all six ATT(g,t) + ES(0/1/2) + ES_avg to <0.03 published SE) + Compustat Monte-Carlo validation (unbiasedness, the Callaway-Sant'Anna efficiency gain, coverage, SE calibration).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

Overall Assessment
⛔ Blocker

Executive Summary

  • The main blocker is a silent survey-weighted correctness issue: zero-weight padded rows can still change sieve selection in the DR path, so EfficientDiD can return different ATT/SE from analytically equivalent data with no warning.
  • Aside from that, the substantive methodology shift looks aligned with Chen, Sant'Anna & Xie (2025) and the updated Methodology Registry: the sieve outcome-regression move is documented, the Hausman covariance direction is correct, and the existing documented deviations remain properly labeled.
  • The new methodology file improves coverage materially, but it only exercises the K>5 path on the new outcome-regression helper, not on the two propensity helpers that also changed behavior.
  • The added performance debt from rebuilding polynomial bases is already tracked in TODO.md, so I do not treat it as a blocker.

Methodology

  • P0 Impact: zero-weight survey padding is not actually inert in the weighted sieve logic. In the new outcome-regression helper, raw n_group drives auto-k_max, the n_basis < n_group admissibility cap, and the IC itself at diff_diff/efficient_did_covariates.py:115 through L186. The same raw-count dependence remains in the changed propensity-ratio and inverse-propensity helpers at diff_diff/efficient_did_covariates.py:330 through L414 and diff_diff/efficient_did_covariates.py:515 through L587. With WLS coefficients/RSS otherwise unchanged, analytically irrelevant zero-weight rows can still change which K values are admissible or selected, and therefore change the final DR estimate with no warning. That violates both the survey-weight semantics of the Section 4 nuisance fits and the contract the PR itself states in tests/test_methodology_efficient_did.py:524. Concrete fix: when unit_weights is present, derive active positive-weight masks once and use positive-weight support counts for the auto-K rule, n_basis cap, and IC sample-size terms across all three sieve helpers; then add an end-to-end padded-vs-filtered survey regression test for EfficientDiD(..., covariates=..., survey_design=...).

No other source-material mismatches stood out in the changed estimator math. The REGISTRY-documented deviations (overall_att weighting, effective-rank Hausman df, fixed-weight bootstrap aggregation, vcov_type={"hc1"}) are clearly labeled, so I did not count them as defects.

Code Quality

  • No additional findings beyond the blocker above.

Performance

  • P3 Impact: the no-ceiling change increases repeated _polynomial_sieve_basis(X, K) work across all three nuisance fits, but the PR already records that follow-up at TODO.md:165. Concrete fix: none for approval; the tracking is sufficient.

Maintainability

  • No additional findings.

Tech Debt

  • No untracked tech-debt findings beyond the already-tracked basis-cache follow-up.

Security

  • No findings.

Documentation/Tests

  • P2 Impact: the new methodology suite only validates the K>5 behavior on estimate_outcome_regression at tests/test_methodology_efficient_did.py:580, even though the PR also changes the auto-K behavior of estimate_propensity_ratio_sieve and estimate_inverse_propensity_sieve at diff_diff/efficient_did_covariates.py:338 and diff_diff/efficient_did_covariates.py:521. The only zero-weight-padding check is helper-local and non-end-to-end, so the user-visible DR estimator surface is still unpinned. Concrete fix: add direct K>5 regression tests for both propensity helpers and one full-estimator survey test comparing zero-weight-padded data to the same data filtered to positive-weight units.

Path to Approval

  1. Make zero-weight rows analytically irrelevant in all three weighted sieve helpers by basing auto-K, admissibility caps, and IC sample-size terms on positive-weight support rather than raw row counts.
  2. Add a deterministic regression test that fails under the current raw-count logic, plus an end-to-end EfficientDiD(..., covariates=..., survey_design=...) padded-vs-filtered invariance test.
  3. Add K>5 regression coverage for the propensity-ratio and inverse-propensity helpers, or an end-to-end DR fixture that exercises the uncapped path there as well.

Verification note: this was a static review only. I could not run the Python test suite in the current environment because the project dependencies are not installed here.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 089bc1d70c9e54db5d8be355d4182daa0a7b6a3e


Overall Assessment

⛔ Blocker

Executive Summary

  • The prior blocker on sieve order selection appears fixed: the outcome-regression, propensity-ratio, and inverse-propensity helpers now all key auto-K selection off positive-weight support, and the new methodology file adds direct regression coverage for the uncapped K>5 propensity paths.
  • P0 [Newly identified]: overidentified survey DR fits are still not zero-weight invariant when kernel_bandwidth=None. Auto Silverman bandwidth is still computed from the full unweighted covariate matrix, so zero-weight padded rows can change Omega*(X) and therefore the efficient combination weights.
  • The new end-to-end zero-weight invariance test does not cover that path: its fixture is effectively just-identified (H=1), so compute_per_unit_weights() short-circuits to 1 and the kernel/bandwidth path is bypassed.
  • Aside from that remaining issue, the sieve-OR upgrade, Hausman extraction, and the documented deviations look aligned with the updated registry and the source material. citeturn0view2turn0view3
  • The repeated-basis performance cost and the tiny survey-SE zero-weight correction are both properly tracked in TODO.md and are non-blocking.

Methodology

  • P0 [Newly identified] Impact: zero-weight survey padding can still change the DR point estimate in overidentified covariate fits when kernel_bandwidth=None. _silverman_bandwidth() still uses the full unweighted row count and dispersion of covariate_matrix, and compute_omega_star_conditional() auto-calls it before forming per-unit efficient weights. In any H>1 cell, that bandwidth feeds directly into Omega*(X) and then compute_per_unit_weights(), so analytically irrelevant zero-weight rows can still move ATT(g,t) and downstream aggregates with no warning. This is not about the choice of Silverman itself; the paper leaves the kernel/bandwidth choice open. The defect is the dependence of the auto selector on zero-weight rows. Concrete fix: on survey fits, derive auto bandwidth from the positive-weight support only (ideally a documented weighted-support analogue), or fail closed unless kernel_bandwidth is explicit when zero-weight rows are present. Relevant code: diff_diff/efficient_did_covariates.py:L767-L777, diff_diff/efficient_did_covariates.py:L898-L899, diff_diff/efficient_did_covariates.py:L1015-L1069, diff_diff/efficient_did.py:L823-L825. citeturn0view2turn0view3
  • No other undocumented source-material mismatches stood out in the changed estimator math. The remaining deviations are explicitly labeled in docs/methodology/REGISTRY.md:L1060-L1072.

Code Quality

No findings.

Performance

  • P3 Impact: the uncapped growing sieve increases repeated basis construction work, but that follow-up is already tracked and does not block approval. Concrete fix: none for this PR; keep the existing cache follow-up in TODO.md:L166.

Maintainability

No findings.

Tech Debt

  • P3 Impact: the tiny survey-SE zero-weight non-invariance from PSU counting is explicitly tracked and is appropriately non-blocking. Concrete fix: none for approval; the follow-up is already recorded in TODO.md:L90.

Security

No findings.

Documentation/Tests

  • P2 Impact: the new zero-weight invariance coverage overstates the surface it actually pins. The end-to-end fixture in TestSurveyZeroWeightInvariance uses one treated cohort and only periods 1,2, which leaves a single valid comparison moment; compute_per_unit_weights() therefore returns all ones and the kernel/bandwidth path is never exercised. That means the new test cannot catch the remaining auto-bandwidth drift described above, even though the changelog now claims exact point-estimate invariance. Concrete fix: add an end-to-end survey regression with at least one overidentified DR cell (H>1), kernel_bandwidth=None, and zero-weight padded units carrying extreme covariates so the bandwidth dependence is forced to show up. Relevant refs: tests/test_methodology_efficient_did.py:L785-L844, diff_diff/efficient_did_weights.py:L87-L99, diff_diff/efficient_did_covariates.py:L1037-L1041, CHANGELOG.md:L14.

Path to Approval

  1. Make auto kernel_bandwidth=None ignore zero-weight rows on survey fits by basing the selector on the positive-weight support, and document that rule in REGISTRY.md.
  2. Add an overidentified survey invariance regression with H>1 and kernel_bandwidth=None, comparing a positive-weight-only fit to the same data padded with zero-weight units carrying extreme covariates.
  3. Keep the “exact zero-weight invariance” changelog/registry/test language only after the bandwidth path is fixed; otherwise narrow the claim to sieve-order selection / just-identified cases.

Static review only. I could not execute the estimator or test suite in this environment because the Python dependencies here are not installed (numpy was unavailable).

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: b8ef9074f60304dce1a2694bd6cf0f8f9d49a694


Overall Assessment

✅ Looks good

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

  • No findings.

Documentation/Tests

…upgrade + zero-weight-invariant nuisance selectors

Source-validation pass for Chen, Sant'Anna & Xie (2025, arXiv:2506.17729v1);
promotes the METHODOLOGY_REVIEW.md row to Complete.

- Covariate doubly-robust outcome regression m_hat(X): linear OLS -> growing
  polynomial sieve (dimension-penalized AIC/BIC, no fixed K<=5 ceiling), so the
  covariate path attains the semiparametric efficiency bound under Assumption
  C.1; the K<=5 cap is removed from the two pre-existing propensity sieves too.
- Theorem A.1 Hausman statistic extracted into a behavior-preserving
  _hausman_quadratic_form helper; criterion guard on estimate_outcome_regression.
- Zero-weight (survey-subpopulation / padded) rows are now fully inert for the DR
  point estimate: all three sieve order-selectors key off the positive-weight
  support (n_pos = sum(w>0)) for auto-k_max / the n_basis cap / the IC sample-size
  terms, and the auto Silverman bandwidth for the kernel Omega*(X) is evaluated on
  the positive-weight support (this last only bites overidentified H>1 cells where
  Omega* feeds the efficient weights). Both were CI-codex P0s, both fail-under-bug
  verified.
- New tests/test_methodology_efficient_did.py: paper-equation Verified Components
  (weights, generated-outcome telescoping, no-cov closed form, PT-Post=CS, SE,
  Hausman) + growing-sieve / propensity K>5 / zero-weight invariance (just-
  identified H=1 sieve-order AND overidentified H>1 kernel-bandwidth).
- HRS Table 6 anchor tightened 0.1*SE -> 0.05*SE + data license / 656-vs-652 docs;
  REGISTRY / CHANGELOG / paper-review / docstrings reconciled to the growing-sieve
  + positive-weight-support contract; two P3 follow-ups (basis-rebuild cache,
  survey-vcov zero-weight-PSU DOF correction) tracked in TODO.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the feature/efficient-did-sieve-validation branch from b8ef907 to f632acb Compare June 1, 2026 19:20
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: f632acb9948566167d10b2a9fd47bfc85aec77d7


Overall Assessment

✅ Looks good

Executive Summary

  • Prior P1+ issues appear resolved; I found no unmitigated P0/P1 findings in the changed diff.
  • Affected method: EfficientDiD covariate DR path and Hausman PT-All/PT-Post pretest.
  • The sieve outcome-regression upgrade is documented in REGISTRY.md and matches the local Chen/Sant’Anna/Xie review artifact’s §4 flexible-nuisance contract.
  • sieve_k_max / sieve_criterion propagation is complete across validation, fit-time nuisances, get_params(), and result metadata.
  • Remaining issues are P3/tracked: zero-weight survey SE drift and avoidable sieve-basis recomputation.
  • Verification was static only; this runner has no pytest or numpy, so I could not execute tests.

Methodology

  • Severity: P3-informational
    Impact: Remaining deviations are documented implementation choices, not defects: polynomial basis choice, Silverman bandwidth, effective-rank Hausman DOF, cohort-size overall_att, fixed-weight bootstrap aggregation, and cluster/survey EIF extensions.
    Concrete fix: None required. See docs/methodology/REGISTRY.md:L1060-L1068 and METHODOLOGY_REVIEW.md:L658-L663.

  • Severity: No finding
    Impact: The changed outcome regression now uses the documented polynomial sieve with AIC/BIC order selection and positive-weight support handling. Fit-time calls pass the same sieve controls into outcome regression, propensity ratio, and inverse propensity helpers.
    Concrete fix: None. See diff_diff/efficient_did_covariates.py:L36-L240, diff_diff/efficient_did.py:L918-L1018.

Code Quality

  • Severity: No finding
    Impact: New parameters are validated, returned by get_params(), threaded through all nuisance fits, and stored on EfficientDiDResults.
    Concrete fix: None. See diff_diff/efficient_did.py:L300-L427, diff_diff/efficient_did_results.py:L126-L186.

  • Severity: No finding
    Impact: No new inline inference anti-pattern found in the changed EfficientDiD paths; pointwise and aggregate inference still routes through safe_inference().
    Concrete fix: None. See diff_diff/efficient_did.py:L1096-L1128, diff_diff/efficient_did.py:L1623-L1689.

Performance

  • Severity: P3-informational, tracked in TODO.md
    Impact: The growing-sieve path rebuilds polynomial bases per candidate K in each nuisance helper, which can add avoidable cost for large covariate-adjusted fits.
    Concrete fix: Cache _polynomial_sieve_basis(X, K) per fit and share it across the three sieve helpers. Already tracked at TODO.md:L166.

Maintainability

  • Severity: No finding
    Impact: _hausman_quadratic_form() extraction preserves the restricted-minus-efficient covariance direction and makes effective-rank behavior directly testable.
    Concrete fix: None. See diff_diff/efficient_did.py:L142-L202, diff_diff/efficient_did.py:L1917-L1937.

Tech Debt

  • Severity: P3-informational, tracked in TODO.md
    Impact: Survey sandwich SEs are still not exactly zero-weight invariant because shared survey PSU finite-sample corrections count zero-weight PSUs. Point estimates are covered as invariant; SE drift is documented and tracked.
    Concrete fix: Count only positive-weight PSUs inside _compute_stratified_psu_meat. See TODO.md:L90, tests/test_methodology_efficient_did.py:L856-L862.

Security

  • Severity: No finding
    Impact: I did not see secrets, credential material, or new unsafe I/O/network behavior in the changed files.
    Concrete fix: None.

Documentation/Tests

  • Severity: No finding
    Impact: The new methodology tests cover the main changed surfaces: nonlinear sieve OR, linear fallback, invalid criterion guard, growing order above 5, propensity sieve cap removal, Hausman direction/rank, and zero-weight survey invariance for both order selection and overidentified bandwidth paths.
    Concrete fix: None. See tests/test_methodology_efficient_did.py:L443-L618, tests/test_methodology_efficient_did.py:L725-L764, tests/test_methodology_efficient_did.py:L826-L940.

  • Verification note: Could not run tests because python -m pytest failed with No module named pytest, and direct import checks failed with No module named numpy.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jun 1, 2026
@igerber igerber merged commit 1bf0002 into main Jun 1, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/efficient-did-sieve-validation branch June 1, 2026 21:34
@igerber igerber mentioned this pull request Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant