#864 development constant tail bug#868
Conversation
…eds to be reduced by the tail
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #868 +/- ##
==========================================
+ Coverage 86.90% 87.02% +0.12%
==========================================
Files 87 87
Lines 4932 4978 +46
Branches 624 634 +10
==========================================
+ Hits 4286 4332 +46
Misses 456 456
Partials 190 190
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…nto #864_DevelopmentConstant_Tail_Bug
| ): | ||
| patterns[int(ddim)] = 1.0 | ||
|
|
||
| fit_patterns = patterns if self.style == "ldf" else cdf_patterns |
There was a problem hiding this comment.
Still keeping track of both LDFs and CDFs.
| ldf = xp.array([float(fit_patterns[int(item)]) for item in obj.ddims]) | ||
| ldf = ldf[None, None, None, :] |
There was a problem hiding this comment.
Same as before.
| ldf = xp.concatenate((ldf[..., :-1] / ldf[..., 1:], ldf[..., -1:]), -1) | ||
|
|
||
| # apply tail_cdf to the last ldfs of the triangle | ||
| ldf[..., -1] = ldf[..., -1] * tail_cdfs |
There was a problem hiding this comment.
This is after following through everything, and just need to multiply the last element with the tail.
|
Hopefully this helps a bit. I can try to improve the readability to reduce tech debt a bit more tomorrow. |
|
@henrydingliu do you have Claude code? |
…nto #864_DevelopmentConstant_Tail_Bug
it's a lot of catching of corner cases. i actually just had an idea after reviewing your other PR. you know how @genedan created a fake 2nd latest diagonal in all the friedland data? what if we just take that idea to implement
i develop in databricks. so i just use the built-in AI for basic syntax type acceleration/debugging. this is what the databricks AI tells me when I ask what model it is. I’m an OpenAI-provided assistant accessed through Databricks, but the specific model identifier may be abstracted away by the platform. If you need the exact deployment/model name, the best place to check is your Databricks or API configuration where this assistant was set up. I also have a paid copilot 365. i do a lot of ideation and high-level solution design through that. Claude Opus is one of the models available. |
Hmm ok, I tried a few times to ask Cursor to refactor my code and it worked pretty terribly. I am going to go through another pass to do more cleanup but feel free to give it a try! |
| ).fit_transform(raa) | ||
| assert np.all( | ||
| np.round(result.cdf_.to_frame().values.flatten(), 6) | ||
| == np.array([1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1]) |
There was a problem hiding this comment.
isn't this the same problem as what you originally wanted to address? i.e. fed 11 cdf into developmentconstant but only get 10 back
There was a problem hiding this comment.
ah, okay. the original code only gave you 9 back
There was a problem hiding this comment.
on third thought, i think this test actually shows a bug. the ldf_ would show 120-132 to be 1.1, which is not what was originally supplied.
There was a problem hiding this comment.
Sorry, you think the RHS is wrong? Why? The cdf from 120-ult is 1.1, as supplied?
There was a problem hiding this comment.
i'm not saying that the rhs is wrong. i'm saying this test isn't capturing an error in the untested ldf_. the ldf_ at 120 coming out of DevelopmentConstant is 1.1, but the actual ldf at 120 that was supplied to the estimator was 1.
| obj = self._set_fit_groups(X).val_to_dev().copy() | ||
|
|
||
| xp = obj.get_array_module() | ||
| obj = obj.iloc[..., :1, :-1]*0+1 |
There was a problem hiding this comment.
i think this is the only line you have to change.
if self.style == "cdf":
obj = obj.iloc[..., :1, :]*0+1
else:
obj = obj.iloc[..., :1, :-1]*0+1There was a problem hiding this comment.
Nah, that's the lazy way. You aren't going to catch all the edge cases. Just bring my tests in and try your code. You'll fail a bunch.
There was a problem hiding this comment.
obj = obj.iloc[..., :1, :]*0+1 is fundamentally changing the structure, this is now saying all development period will now develop one more, including the oldest origin period.
And if it's LDF style, then you don't? This is wrong.
There was a problem hiding this comment.
Try to bring my tests in, and you can see if you can catch all the edge cases more cleanly. I'm sure there's a way.
I think for this PR, reviewing the tests is actually more important than the code itself.
There was a problem hiding this comment.
Nah, that's the lazy way. You aren't going to catch all the edge cases. Just bring my tests in and try your code. You'll fail a bunch.
it definitely is the lazy way. it's also done in this PR, literally just a few lines down
obj = obj.iloc[..., :1, dev_slice] * 0 + 1
Try to bring my tests in, and you can see if you can catch all the edge cases more cleanly
good idea. i can do that
There was a problem hiding this comment.
Oh and if you want to try to use the tests and let AI solve it, you can give that a try if you have access to a good AI agent.
Cursor couldn't do it and just kept iterating itself until I killed it.
There was a problem hiding this comment.
Ha, it's very possible I made mistakes on the tests. So let's make sure the tests are right.
Your implementation is much cleaner, let's just make sure it can catch all the edge cases. I think we are super close.
There was a problem hiding this comment.
i think each test needs to test both the ldf_ and the cdf_. we'd just have to manually calculate the cdf/ldf from the supplied vector.
| 1.4641, | ||
| 1.331, | ||
| 1.21, | ||
| ] |
There was a problem hiding this comment.
i think this test actually shows a bug. the ldf_ would show 120-132 to be 1.21, which is not what was originally supplied.
There was a problem hiding this comment.
The CDF, not LDF, is 1.21. Line 365 checks the CDF.
There was a problem hiding this comment.
yes, the cdf is 121. that is correct. but the underlying ldf_ at 120 is also 1.21, which is not what was supplied to the estimator
There was a problem hiding this comment.
Let me check... I will add tests to check both the cdf_ and ldf_
| else: | ||
| raise ValueError("callable axis needs to be 0 or 1") | ||
|
|
||
| patterns = self.patterns(rows.iloc[0]) |
There was a problem hiding this comment.
Callable path determines shape from first row only
Low Severity
When patterns is callable, self.patterns(rows.iloc[0]) is called to determine include_last and dev_slice based on the first row's pattern length. Each subsequent row is then independently processed in _callable_row, which may produce a different row_tail_cdf. If different rows return patterns of different lengths, the obj skeleton shape (determined solely by the first row) may be inappropriate for other rows — for example, if the first row's pattern is short (include_last=False) but another row's is long, the obj.ddims will have one fewer period than that row needs.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 6f28c21. Configure here.
| ) | ||
| assert np.all( | ||
| np.round(result.ldf_.to_frame().values.flatten(), 6) | ||
| == np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.1]) |
There was a problem hiding this comment.
this doesn't match "reported_pattern"
There was a problem hiding this comment.
It's not supposed to be, the reported_pattern is in CDF form, but the LHS/RHS check here is in LDF form.
There was a problem hiding this comment.
Let me know if you agree lol.
I had to be super careful, made many mistakes before. Just double check to see if you are aligned.
There was a problem hiding this comment.
the 1.1 comes at the 10th in the returned ldf_, instead of the 11th element implied by the original pattern
There was a problem hiding this comment.
Because the data object doesn't have the 11th origin period?
There was a problem hiding this comment.
right. so we have an issue here because the ldf_ returns changes, beyond filling with additional 1.0's, depending on how large the triangle is. in theory, we would want DevelopmentConstant().fit(trI_9x9).ldf_[:-1] to be equal to DevelopmentConstant().fit(trI_8x8).ldf_ but that's not happening
There was a problem hiding this comment.
Let me reply you on the main thread.
| 1.1, | ||
| 1.1, | ||
| 1.1, | ||
| 1.1**2, |
There was a problem hiding this comment.
this doesn't match reported_patterns
There was a problem hiding this comment.
It does? The only thing that's different is the last LDF, those need to be grouped, or it will be incorrectly discarded.
There was a problem hiding this comment.
there's no 1.21 in the original ldf pattern.
There was a problem hiding this comment.
Yes because the pattern extends beyond what is needed by the data object. Are you suggesting that you just discard the last one?
There was a problem hiding this comment.
no, the PR is what's discarding the last ldf. i'm saying we need to keep the last ldf, that would make it consistent with Development
raa = cl.load_sample('raa')
dev = cl.Development().fit(raa)
raa_1987 = raa[(raa.valuation <= '1988-01-01')]
print(dev.transform(raa).ldf_)
print(dev.transform(raa_1987).ldf_) 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 2.999359 1.623523 1.270888 1.171675 1.113385 1.041935 1.033264 1.016936 1.009217
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 2.999359 1.623523 1.270888 1.171675 1.113385 1.041935 1.033264 1.016936 1.009217
So I think the above is only true if the pattern supplied is shorter than a length of 7. Try this example, and comment/uncomment the pattern length, everything is working as expected to me. When pattern is raa10x10 = cl.load_sample("raa")
raa9x9 = raa10x10[raa10x10.valuation<'1990']
reported_patterns = {
12: 1.1,
24: 1.1,
36: 1.1,
48: 1.1,
60: 1.1,
72: 1.1,
84: 1.1,
96: 1.1,
108: 1.1,
# 120: 1.1,
# 132: 1.1,
}
raa10x10_ldf_ = cl.DevelopmentConstant(
patterns=reported_patterns, style="ldf"
).fit_transform(raa10x10).ldf_
raa9x9_ldf_ = cl.DevelopmentConstant(
patterns=reported_patterns, style="ldf"
).fit_transform(raa9x9).ldf_
print(raa10x10_ldf_)
print(raa9x9_ldf_)In raa10x10 = cl.load_sample("raa")
raa9x9 = raa10x10[raa10x10.valuation<'1990']
reported_patterns = {
12: 1.1,
24: 1.1,
36: 1.1,
48: 1.1,
60: 1.1,
72: 1.1,
84: 1.1,
96: 1.1,
108: 1.1,
# 120: 1.1,
# 132: 1.1,
}
raa10x10_ldf_ = cl.DevelopmentConstant(
patterns=reported_patterns, style="cdf"
).fit_transform(raa10x10).ldf_
raa9x9_ldf_ = cl.DevelopmentConstant(
patterns=reported_patterns, style="cdf"
).fit_transform(raa9x9).ldf_
print(raa10x10_ldf_)
print(raa9x9_ldf_) |
|
Sorry, but I am not following your question/feedback, but can you give me an example of where you think my implementation doesn't give the correct answer, and what the correct answer should be? |
right. when the supplied ldf or cdf is longer, the resulting ldf_ is distorted. right now you are forcing those tests to pass by comparing the resulting ldf_ to something other than what was supplied. |
this implementation doesn't give the right answer when the supplied pattern is longer. the correct answer should replicate the supplied ldf or the implied ldf of the suppled cdf exactly (not counting factors of 1 to fill the space). basically, if elsewhere in the package we want stuff like raa = cl.load_sample('raa')
dev = cl.Development().fit(raa)
raa_1987 = raa[(raa.valuation <= '1988-01-01')]
print(dev.transform(raa).ldf_)
print(dev.transform(raa_1987).ldf_) |
I don't think I agree with this. You are saying to just discard the extra LDF pattern beyond the triangle object? Look at this example: raa10x10 = cl.load_sample("raa")
raa9x9 = raa10x10[raa10x10.valuation<'1990']
reported_patterns = {
12: 1.1,
24: 1.1,
36: 1.1,
48: 1.1,
60: 1.1,
72: 1.1,
84: 1.1,
96: 1.1,
108: 1.1, #9th LDF, or if this is a CDF, it would've been 1.21
120: 1.1, #10th's LDF, not CDF
# 132: 1.1,
}
raa10x10_ldf_ = cl.DevelopmentConstant(
patterns=reported_patterns, style="ldf"
).fit_transform(raa10x10).ldf_
raa9x9_ldf_ = cl.DevelopmentConstant(
patterns=reported_patterns, style="ldf"
).fit_transform(raa9x9).ldf_
print(raa10x10_ldf_)
print(raa9x9_ldf_)Returns: 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120 120-132
(All) 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.21
# ^ this 9th's LDF should
# be 1.1 instead of 1.21You are saying, we should just discard the remaining
This is different, in this example, you are estimating patterns using a set of data, and estimating another pattern using a subset of that data, it should be clear that the pattern estimated using the subset lacks something (i.e. the tail). Let me try to se if I can convince you with another example: Do you agree that these two patterns are the same? DC_LDF = cl.DevelopmentConstant(
patterns={
12: 1.1,
24: 1.1,
36: 1.1,
48: 1.1,
60: 1.1,
72: 1.1,
84: 1.1,
96: 1.1,
108: 1.1,
120: 1.1,
}, style="ldf"
)
DC_CDF = cl.DevelopmentConstant(
patterns={
12: 1.1**10,
24: 1.1**9,
36: 1.1**8,
48: 1.1**7,
60: 1.1**6,
72: 1.1**5,
84: 1.1**4,
96: 1.1**3,
108: 1.1**2,
120: 1.1,
}, style="cdf"
)If so, then you should get the same CDF, not LDF, no matter what triangle object you fit on. Again, in my opinion, one of the current implementation flaw that I think is that if the pattern provided is shorter, and it is in LDF form, the extra pattern is basically discarded. My PR fixes that. |
…nto #864_DevelopmentConstant_Tail_Bug
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 881f922. Configure here.
|
@henrydingliu do you agree with my latest post? I am trying to resolve the last bug bot comment. |
|
@henrydingliu do you have notifications muted? 😜 |


Summary of Changes
Addressed two bugs in the DevelopmentConstant()
Related GitHub Issue(s)
Fixes #864
Additional Context for Reviewers
This PR fixes both bugs, even though only 1 is reported on #864.
There was also an old bug(?) in
test_constant_callable_axis1, not sure whypatterns.valueshadpatterns.values[:, :-1]dropped the last value. This is corrected.uv run pytest) and documentation changes (uv run jb build docs --builder=custom --custom-builder=doctest)Note
Medium Risk
Changes core reserve development pattern logic (LDF/CDF/tail) used in projections; behavior is well covered by new tests but affects actuarial outputs when patterns differ in length or style from the triangle.
Overview
DevelopmentConstantnow normalizes external age→factor patterns through a new_prepare_cdf_patternspath: LDF inputs are converted to CDFs before fitting, patterns longer than the triangle keep a tail factor (rebased in-triangle CDFs + tail on the last LDF) instead of dropping ages beyond the triangle, and shorter patterns warn and pad missing development ages with 1.0. Callable patterns use the same preparation per row, with clearer handling of whether the last development period is included.Tests add coverage for tail/no-tail, exact/short/long CDF and LDF patterns, incremental triangles, and fix
test_constant_callable_axis1to compare full CDF values (not[:, :-1]).utility_functionsonly has trivial import/formatting cleanup.Reviewed by Cursor Bugbot for commit a04a8ca. Bugbot is set up for automated code reviews on this repo. Configure here.