Fail HTLCs from late counterparty commitment updates in ChannelMonitor by joostjager · Pull Request #4434 · lightningdevkit/rust-lightning

joostjager · 2026-02-23T09:45:12Z

ldk-reviews-bot · 2026-02-23T09:45:15Z

👋 Thanks for assigning @wpaulino as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

codecov · 2026-02-23T12:51:54Z

Codecov Report

❌ Patch coverage is 92.72388% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.05%. Comparing base (90b79e4) to head (56c813f).
⚠️ Report is 31 commits behind head on main.

Files with missing lines	Patch %	Lines
lightning/src/chain/chainmonitor.rs	90.22%	22 Missing and 4 partials ⚠️
lightning/src/chain/channelmonitor.rs	91.66%	7 Missing and 3 partials ⚠️
lightning/src/util/test_utils.rs	96.87%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4434      +/-   ##
==========================================
+ Coverage   85.97%   86.05%   +0.08%     
==========================================
  Files         159      159              
  Lines      104722   105759    +1037     
  Branches   104722   105759    +1037     
==========================================
+ Hits        90030    91013     +983     
- Misses      12191    12227      +36     
- Partials     2501     2519      +18

Flag	Coverage Δ
tests	`86.05% <92.72%> (+0.08%)`	⬆️

lightning/src/ln/channelmanager.rs

ldk-reviews-bot · 2026-02-23T20:02:42Z

👋 The first review has been submitted!

TheBlueMatt

Hmmmmmmmmm, it's really not quite so simple. If a monitor update is InProgress we generally expect the in-memory ChannelMonitor to have actually seen the new state update (though we expect that on restart there's a good chance it won't have that information and it'll be replayed). That means the monitor is theoretically "in charge" of resolving the HTLCs in question. While I think this patch is correct, I'm a bit hesitant to end up with two things in charge of resolving a specific HTLC. ISTM it also wouldn't be so complicated to immediately mark HTLCs as failed in the monitor upon receiving updates that try to add new HTLCs after a channel has been closed.

joostjager · 2026-02-24T07:32:55Z

> ISTM it also wouldn't be so complicated to immediately mark HTLCs as failed in the monitor upon receiving updates that try to add new HTLCs after a channel has been closed.

Isn't the problem with this that the monitor can't see whether the commitment was already sent to the peer or blocked by async persistence?

TheBlueMatt · 2026-02-24T12:55:21Z

Right, the monitor has to assume it may have been sent to the peer. That's fine, though, it should be able to fail the HTLC(s) once the commitment transaction is locked.

joostjager · 2026-02-24T15:20:57Z

Hmm, it seems that the original issue isn't an issue then. I confirmed that mining the force close commitment tx indeed generates the payment failed event. Or is there still something missing?

TheBlueMatt · 2026-02-24T16:22:50Z

In the general case (ChannelMonitor in-memory in the same machine that runs ChannelManager) I believe there is not a runtime issue, no. I was thinking you'd identified an issue in the crash case, though (where we crash without persisting the monitor update and on restart lose it), but actually I'm not sure this is any different from existing payments logic: if you start a payment and nothing persists, on restart it's expected to be gone and downstream code has to handle that.

joostjager · 2026-02-24T16:38:20Z

The one that I identified wasn't a crash case. An assertion in the fuzzer identified that a payment was lost, but with current understanding I think it's just that the fuzzer didn't push the force-close forward enough (mine). Will close it for now and can look again if the payment remains lost even with confirmation of the commit tx. Unit test did confirm that it should just work.

joostjager · 2026-03-02T08:07:16Z

Reopening as this fix is now required for #4351.

joostjager · 2026-03-02T08:10:56Z

> Hmmmmmmmmm, it's really not quite so simple. If a monitor update is InProgress we generally expect the in-memory ChannelMonitor to have actually seen the new state update (though we expect that on restart there's a good chance it won't have that information and it'll be replayed). That means the monitor is theoretically "in charge" of resolving the HTLCs in question. While I think this patch is correct, I'm a bit hesitant to end up with two things in charge of resolving a specific HTLC. ISTM it also wouldn't be so complicated to immediately mark HTLCs as failed in the monitor upon receiving updates that try to add new HTLCs after a channel has been closed.

Now that we've decided that for deferred writes we no longer want to expect the in-memory ChannelMonitor to have seen the changes, I think the above doesn't apply anymore, and we might have to accept that two things are in charge of resolving a specific HTLC.

lightning/src/ln/channel.rs

lightning/src/chain/channelmonitor.rs

TheBlueMatt

> Now that we've decided that for deferred writes we no longer want to expect the in-memory ChannelMonitor to have seen the changes, I think the above doesn't apply anymore, and we might have to accept that two things are in charge of resolving a specific HTLC.

I don't think so. Instead, what we'll need to do is detect a new local counterparty state ChannelMonitorUpdate after the channel has closed and track it to fail the HTLCs from it once the commitment transaction has 6 confs.

TheBlueMatt · 2026-03-02T16:15:38Z

Oh actually no I don't think we need to wait for 6 confs since we never sent the message, ie we can fail it immediately.

joostjager · 2026-03-03T10:29:58Z

> Oh actually no I don't think we need to wait for 6 confs since we never sent the message, ie we can fail it immediately.

Does the monitor know that a message was never sent? #4434 (comment) suggests that we should assume we did send.

I steered Claude to a potential fix: main...joostjager:rust-lightning:chain-mon-internal-deferred-writes-fix-contract. I based it on the deferred writes branch so that tests can get delayed in-memory updates.

Is this the direction you have in mind? It is unfortunate that we need to add this extra logic to a system that is already complicated.

ldk-reviews-bot · 2026-03-12T11:18:16Z

🔔 1st Reminder

Hey @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-03-14T11:18:50Z

🔔 2nd Reminder

Hey @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-03-16T11:19:04Z

🔔 3rd Reminder

Hey @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

wpaulino

LGTM, though CI is sad likely due to the failed_back_htlc_ids change.

joostjager · 2026-03-17T08:59:22Z

The round-trip check failed because failed_back_htlc_ids is not serialized. We work around it by temporarily copying the set onto the deserialized monitor before comparison, then clearing it. It's ugly but avoids modifying production code beyond making the field pub(crate).
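
The workaround described above can be illustrated with a toy model. This is a minimal sketch and not LDK's test code: `ToyMonitor`, `encode`, `decode`, and `round_trip_matches` are hypothetical names, and the byte format is invented for the example. The only idea taken from the comment is that a non-serialized set is copied onto the deserialized value before comparison and cleared afterwards.

```rust
use std::collections::HashSet;

/// Toy stand-in for a monitor; NOT the real `ChannelMonitor`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyMonitor {
    pub latest_update_id: u64,
    /// In-memory only: deliberately left out of `encode`.
    pub failed_back_htlc_ids: HashSet<u64>,
}

/// Hypothetical serializer that persists only `latest_update_id`.
pub fn encode(m: &ToyMonitor) -> Vec<u8> {
    m.latest_update_id.to_be_bytes().to_vec()
}

/// Hypothetical deserializer; the in-memory set always comes back empty.
pub fn decode(bytes: &[u8]) -> Option<ToyMonitor> {
    if bytes.len() != 8 {
        return None;
    }
    let mut arr = [0u8; 8];
    arr.copy_from_slice(bytes);
    Some(ToyMonitor {
        latest_update_id: u64::from_be_bytes(arr),
        failed_back_htlc_ids: HashSet::new(),
    })
}

/// Round-trip check that tolerates the unserialized field.
pub fn round_trip_matches(original: &ToyMonitor) -> bool {
    let mut restored = match decode(&encode(original)) {
        Some(m) => m,
        None => return false,
    };
    // Temporarily copy the in-memory-only set so `==` compares persisted state only.
    restored.failed_back_htlc_ids = original.failed_back_htlc_ids.clone();
    let equal = restored == *original;
    // Clear it again so the restored value reflects what was actually persisted.
    restored.failed_back_htlc_ids.clear();
    equal
}
```

Without the temporary copy, a monitor whose set is non-empty would never compare equal to its own round-tripped form, which matches the CI failure described above.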

lightning/src/chain/channelmonitor.rs

ldk-claude-review-bot · 2026-03-17T09:22:46Z

lightning/src/chain/chainmonitor.rs

+				None => {
+					debug_assert!(false, "flush count exceeded queue length");
+					return;
+				},


In release builds, this silently returns without logging when count exceeds the queue length. This can happen if flush is called concurrently (e.g., from TestChainMonitor::release_pending_monitor_events auto-flush racing with the background processor's explicit flush). A log_error! before the return would make such races observable in production, rather than silently under-flushing:

Suggested change

-				None => {
-					debug_assert!(false, "flush count exceeded queue length");
-					return;
-				},
+				None => {
+					debug_assert!(false, "flush count exceeded queue length");
+					log_error!(logger, "flush count exceeded queue length");
+					return;
+				},
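
To make the point about release builds concrete, here is a minimal sketch of a flush loop that stays observable when the requested count exceeds the queue length. It is an illustration under stated assumptions: `flush`, `FlushReport`, and the `u32` queue entries are hypothetical, and a recorded message stands in for the `log_error!` call. The `debug_assert!` from the diff is omitted because it compiles to nothing when debug assertions are disabled, which is exactly the case the comment is concerned with.

```rust
use std::collections::VecDeque;

/// Outcome of a flush attempt in this toy model.
#[derive(Debug, PartialEq)]
pub struct FlushReport {
    /// Entries that were actually flushed, in queue order.
    pub flushed: Vec<u32>,
    /// Set when the caller asked for more entries than were queued.
    pub error: Option<String>,
}

/// Pops up to `count` entries from the front of `queue`.
pub fn flush(queue: &mut VecDeque<u32>, count: usize) -> FlushReport {
    let mut report = FlushReport { flushed: Vec::new(), error: None };
    for _ in 0..count {
        match queue.pop_front() {
            Some(update) => report.flushed.push(update),
            None => {
                // Stand-in for the logging call: record the problem instead of
                // returning silently, so an under-flush can be noticed.
                report.error = Some("flush count exceeded queue length".to_string());
                return report;
            },
        }
    }
    report
}
```

Calling `flush` with a count of 3 on a queue holding two entries flushes both and reports the overrun, rather than returning as if nothing unusual happened.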

lightning/src/chain/chainmonitor.rs

lightning/src/util/test_utils.rs

ldk-claude-review-bot · 2026-03-17T09:23:41Z

TheBlueMatt · 2026-03-17T14:00:32Z

I think feel free to squash so we can land this. Claude had one comment that might be worth fixing, up to you.

joostjager · 2026-03-17T16:29:45Z

Squashed, but needs to be rebased after #4351. Added the Claude suggestion to also log excessive flush count.

ldk-claude-review-bot · 2026-03-17T16:44:46Z

lightning/src/chain/channelmonitor.rs

+				});
+			} else {
+				// Funding spend still awaiting ANTI_REORG_DELAY: queue the failure.
+				let (txid, transaction, height, block_hash) = pending_spend_entry.clone().unwrap();


Nit/perf: pending_spend_entry.clone().unwrap() is called inside the per-HTLC loop, cloning the Option<Transaction> each iteration. For a commitment with many HTLCs, this does N clones of potentially large transaction data. Consider destructuring once before the loop:

Suggested change

-				let (txid, transaction, height, block_hash) = pending_spend_entry.clone().unwrap();
+		let pending_data = pending_spend_entry.clone();
+		for (source, payment_hash, amount_msat) in htlcs {
+			if is_source_known(source) {
+				continue;
+			}
+			if self.counterparty_fulfilled_htlcs.get(&SentHTLCId::from_source(source)).is_some() {
+				continue;
+			}
+			let htlc_value_satoshis = Some(amount_msat / 1000);
+			let logger = WithContext::from(logger, None, None, Some(payment_hash));
+			// Defensively mark the HTLC as failed back so the expiry-based failure
+			// path in `block_connected` doesn't generate a duplicate `HTLCUpdate`
+			// event for the same source.
+			self.failed_back_htlc_ids.insert(SentHTLCId::from_source(source));
+			if let Some(confirmed_txid) = self.funding_spend_confirmed {

Or alternatively, extract the pending_data tuple once and clone only transaction per entry (Txid and BlockHash are Copy).

Not sure if worth it, the tx is the large part of the clone anyway.
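
The alternative mentioned above (read the pending spend once, then clone only the transaction per entry) might look roughly like the following. This is a sketch with toy types rather than the actual monitor code: `ToyTx`, `PendingSpend`, and `queue_failures` are hypothetical, and plain integers stand in for `Txid` and `BlockHash`, which the review comment notes are `Copy`.

```rust
/// Toy stand-in for a transaction; the heavy part of the tuple.
#[derive(Clone, Debug, PartialEq)]
pub struct ToyTx(pub Vec<u8>);

/// (txid, transaction, height, block_hash) with integers standing in for hashes.
pub type PendingSpend = (u64, Option<ToyTx>, u32, u64);

/// Builds one entry per HTLC, destructuring the pending spend once before the loop.
pub fn queue_failures(pending: &Option<PendingSpend>, htlcs: &[u64]) -> Vec<(u64, PendingSpend)> {
    // Borrow the transaction and copy the cheap fields a single time.
    let (txid, tx, height, block_hash) = match pending {
        Some((txid, tx, height, block_hash)) => (*txid, tx, *height, *block_hash),
        None => return Vec::new(),
    };
    htlcs
        .iter()
        // Only the transaction is cloned per entry; the other fields are plain copies.
        .map(|htlc| (*htlc, (txid, tx.clone(), height, block_hash)))
        .collect()
}
```

Each entry still owns its own copy of the transaction, so the number of transaction clones is unchanged; what goes away is the repeated clone-and-unwrap of the whole `Option` inside the loop, which is consistent with the doubt above about whether the change is worth it.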

lightning/src/chain/chainmonitor.rs

ldk-claude-review-bot · 2026-03-17T16:45:13Z

lightning/src/chain/channelmonitor.rs

+		// Check HTLC sources against all previously-known commitments to find truly new
+		// ones. After the update has been applied, `prev_counterparty_commitment_txid` holds
+		// what was `current` before this update, so it represents the already-known
+		// counterparty state. HTLCs already present in any of these will be handled by
+		// `fail_unbroadcast_htlcs` when the spending transaction confirms.
+		let is_source_known = |source: &HTLCSource| {
+			if let Some(ref txid) = self.funding.prev_counterparty_commitment_txid {
+				if let Some(htlc_list) = self.funding.counterparty_claimable_outpoints.get(txid) {
+					if htlc_list.iter().any(|(_, s)| s.as_ref().map(|s| s.as_ref()) == Some(source))
+					{
+						return true;
+					}
+				}
+			}
+			// Note that we don't care about the case where a counterparty sent us a fresh local commitment transaction
+			// post-closure (with the `ChannelManager` still operating the channel). First of all we only care about
+			// resolving outbound HTLCs, which fundamentally have to be initiated by us. However we also don't mind
+			// looking at the current holder commitment transaction's HTLCs as any fresh outbound HTLCs will have to
+			// first come in a locally-initiated update to the counterparty's commitment transaction which we can, by
+			// refusing to apply the update, prevent the counterparty from ever seeing (as no messages can be sent until
+			// the monitor is updated). Thus, the HTLCs we care about can never appear in the holder commitment
+			// transaction.
+			if holder_commitment_htlcs!(self, CURRENT_WITH_SOURCES).any(|(_, s)| s == Some(source))
+			{
+				return true;
+			}
+			if let Some(mut iter) = holder_commitment_htlcs!(self, PREV_WITH_SOURCES) {
+				if iter.any(|(_, s)| s == Some(source)) {
+					return true;
+				}
+			}
+			false
+		};


The is_source_known closure only checks prev_counterparty_commitment_txid (what was current before this update), not all entries in counterparty_claimable_outpoints. The comment at line 4413 explains the reasoning, and the holder commitment fallback covers additional cases.

Just want to confirm a subtle assumption: this correctness relies on the fact that the monitor always holds the immediately-prior counterparty commitment in prev after the update is applied (line 3502). If a single ChannelMonitorUpdate ever contained two LatestCounterpartyCommitment steps, the second step's iteration here would check prev (which is now the first step's commitment, not the original pre-batch state), and HTLCs from the first step would appear "known" even if they were truly new. The code is correct as long as each ChannelMonitorUpdate contains at most one counterparty commitment step — is that guaranteed at the ChannelManager level?

Analysis shows it isn't possible to have two LatestCounterpartyCommitment in the same update. Interesting find though.
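
The assumption under discussion can be seen in a toy model of the prev/current bookkeeping. This is a sketch only: `ToyFunding`, `apply_step`, and the integer txids and HTLC sources are hypothetical, and the real closure also consults the holder commitments, which this model leaves out.

```rust
use std::collections::HashMap;

/// Toy model of the counterparty commitment tracking; NOT the real funding scope.
#[derive(Default)]
pub struct ToyFunding {
    pub prev_txid: Option<u32>,
    pub current_txid: Option<u32>,
    pub htlcs_by_txid: HashMap<u32, Vec<u64>>,
}

impl ToyFunding {
    /// Applies one counterparty commitment step: current becomes prev.
    pub fn apply_step(&mut self, txid: u32, htlc_sources: Vec<u64>) {
        self.prev_txid = self.current_txid.take();
        self.current_txid = Some(txid);
        self.htlcs_by_txid.insert(txid, htlc_sources);
    }

    /// Mirrors the check above: a source is "known" if the previous commitment has it.
    pub fn is_source_known(&self, source: u64) -> bool {
        self.prev_txid
            .and_then(|txid| self.htlcs_by_txid.get(&txid))
            .map_or(false, |list| list.contains(&source))
    }
}
```

With one step per update, a source first added by that step is not in `prev` and is reported as new. If a single update applied two steps back to back, a source introduced by the first step would already sit in `prev` when the second step is examined, so it would look known even though it was new relative to the state before the update.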

When a ChannelMonitorUpdate containing a new counterparty commitment is dispatched (e.g. via deferred writes) before a channel force-closes but only applied to the in-memory monitor after the commitment transaction has already confirmed on-chain, the outbound HTLCs in that update must be failed back.

Add fail_htlcs_from_update_after_funding_spend to ChannelMonitorImpl which detects this race condition during update_monitor. When a LatestCounterpartyCommitmentTXInfo or LatestCounterpartyCommitment update is applied and the funding output has already been spent, the function iterates all outbound HTLCs from the update and creates OnchainEvent::HTLCUpdate entries for those that need to be failed back. These entries mature after ANTI_REORG_DELAY blocks, giving time for the peer to potentially broadcast the newer commitment.

HTLCs that appear as non-dust outputs in the confirmed commitment (whether counterparty or holder) are skipped, as they will be resolved on-chain via the normal HTLC timeout/success path. HTLCs already fulfilled by the counterparty (tracked in counterparty_fulfilled_htlcs) are also skipped. Duplicate failures from previously-known counterparty commitments are handled gracefully by the ChannelManager.

AI tools were used in preparing this commit.
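
The selection rules in the description can be sketched as a small filter. This is an illustration under stated assumptions, not the merged function: `ToyHtlc`, `ToyHtlcUpdate`, and `htlcs_to_fail` are hypothetical, HTLCs are identified by plain integers, and `ANTI_REORG_DELAY` is set to 6 here as an assumed value for the example.

```rust
use std::collections::HashSet;

/// Assumed value for this sketch.
pub const ANTI_REORG_DELAY: u32 = 6;

/// Toy outbound HTLC carried by the late commitment update.
pub struct ToyHtlc {
    pub id: u64,
    pub amount_msat: u64,
}

/// Toy stand-in for a queued `HTLCUpdate` entry.
#[derive(Debug, PartialEq)]
pub struct ToyHtlcUpdate {
    pub id: u64,
    pub value_sat: u64,
    pub matures_at: u32,
}

/// Picks the HTLCs from a late update that should be failed back.
pub fn htlcs_to_fail(
    update_htlcs: &[ToyHtlc],
    onchain_outputs: &HashSet<u64>,
    fulfilled: &HashSet<u64>,
    spend_height: u32,
) -> Vec<ToyHtlcUpdate> {
    update_htlcs
        .iter()
        // Non-dust outputs in the confirmed commitment resolve on-chain instead.
        .filter(|h| !onchain_outputs.contains(&h.id))
        // HTLCs the counterparty already fulfilled must not be failed.
        .filter(|h| !fulfilled.contains(&h.id))
        .map(|h| ToyHtlcUpdate {
            id: h.id,
            value_sat: h.amount_msat / 1000,
            // Height at which the spend has ANTI_REORG_DELAY confirmations,
            // counting the confirming block as the first.
            matures_at: spend_height + ANTI_REORG_DELAY - 1,
        })
        .collect()
}
```

For a spend confirmed at height 100, an HTLC that is neither on-chain nor fulfilled is queued to mature at height 105 in this model; the other two categories produce no entry.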

joostjager · 2026-03-18T21:02:40Z

Rebased after merge of #4351

ldk-claude-review-bot · 2026-03-18T21:25:34Z

lightning/src/chain/channelmonitor.rs

+			// Note that we don't care about the case where a counterparty sent us a fresh local commitment transaction
+			// post-closure (with the `ChannelManager` still operating the channel). First of all we only care about
+			// resolving outbound HTLCs, which fundamentally have to be initiated by us. However we also don't mind
+			// looking at the current holder commitment transaction's HTLCs as any fresh outbound HTLCs will have to
+			// first come in a locally-initiated update to the counterparty's commitment transaction which we can, by
+			// refusing to apply the update, prevent the counterparty from ever seeing (as no messages can be sent until
+			// the monitor is updated). Thus, the HTLCs we care about can never appear in the holder commitment
+			// transaction.


Nit: The comment concludes "the HTLCs we care about can never appear in the holder commitment transaction" — yet lines 4443-4451 proceed to check them anyway. Could you rephrase to say this is a defensive/belt-and-suspenders check? Currently it reads like the check below shouldn't be needed at all.

ldk-claude-review-bot · 2026-03-18T21:25:43Z

lightning/src/chain/channelmonitor.rs

+		let is_source_known = |source: &HTLCSource| {
+			if let Some(ref txid) = self.funding.prev_counterparty_commitment_txid {
+				if let Some(htlc_list) = self.funding.counterparty_claimable_outpoints.get(txid) {
+					if htlc_list.iter().any(|(_, s)| s.as_ref().map(|s| s.as_ref()) == Some(source))
+					{
+						return true;
+					}
+				}
+			}


Minor: is_source_known checks self.funding.prev_counterparty_commitment_txid but not self.pending_funding[*].prev_counterparty_commitment_txid. In a splice scenario where the confirmed funding spend is for a pending funding output, an HTLC that was already known in that pending funding's previous counterparty commitment would be incorrectly treated as "new" and failed. Worth a TODO or a check against pending funding scopes as well?

ldk-claude-review-bot · 2026-03-18T21:25:52Z

lightning/src/chain/channelmonitor.rs

+			// When the counterparty commitment confirms, FundingSpendConfirmation matures
+			// immediately (no CSV delay), so funding_spend_confirmed is set. The new payment's
+			// commitment update then triggers immediate HTLC failure, generating payment events
+			// alongside the channel close event.


"FundingSpendConfirmation matures immediately (no CSV delay)" is misleading — FundingSpendConfirmation still requires ANTI_REORG_DELAY blocks to mature (the on_local_output_csv only adds additional delay beyond ANTI_REORG_DELAY). It appears that the payment failure events here actually come from the ChannelManager's force-close logic (failing the in-flight HTLC when CommitmentTxConfirmed is processed), not from the monitor's immediate HTLC failure path. Consider clarifying the comment.
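
The maturity rule described in this comment can be written out as a small calculation. This is a sketch of the reviewer's description, not a copy of the monitor code: `funding_spend_maturity_height` is a hypothetical helper, the formula is an assumption derived from the sentence above, and `ANTI_REORG_DELAY` is again taken to be 6 for the example.

```rust
/// Assumed value for this sketch.
pub const ANTI_REORG_DELAY: u32 = 6;

/// Height at which a funding spend confirmed at `conf_height` is treated as mature.
/// A CSV delay, when present, can only push the threshold later, never earlier.
pub fn funding_spend_maturity_height(conf_height: u32, on_local_output_csv: Option<u16>) -> u32 {
    // The confirming block counts as the first confirmation.
    let reorg_threshold = conf_height + ANTI_REORG_DELAY - 1;
    match on_local_output_csv {
        Some(csv) => {
            let csv_threshold = conf_height.saturating_add(csv as u32).saturating_sub(1);
            reorg_threshold.max(csv_threshold)
        },
        None => reorg_threshold,
    }
}
```

Under these assumptions a spend confirmed at height 100 with no CSV delay matures at height 105, not at 100, which is why "matures immediately" reads as misleading.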

TheBlueMatt

Thanks. Changes since @wpaulino's ACK are trivial so gonna land.

joostjager force-pushed the fix-force-close-in-progress-monitor-drops-htlc branch from 2d36c03 to cce8ccb Compare February 23, 2026 10:08

joostjager marked this pull request as ready for review February 23, 2026 13:49

joostjager requested a review from wpaulino February 23, 2026 17:34

wpaulino reviewed Feb 23, 2026

lightning/src/ln/channelmanager.rs (outdated, resolved)

wpaulino requested a review from TheBlueMatt February 23, 2026 20:03

joostjager force-pushed the fix-force-close-in-progress-monitor-drops-htlc branch from cce8ccb to ad3036e Compare February 23, 2026 21:11

TheBlueMatt reviewed Feb 23, 2026

joostjager closed this Feb 24, 2026

This was referenced Feb 24, 2026

Payment silently stuck when channel force-closed during in-progress monitor update #4431

Closed

Defer ChainMonitor updates and persistence to flush() #4351

Merged

joostjager reopened this Mar 2, 2026

joostjager commented Mar 2, 2026

lightning/src/ln/channel.rs (outdated, resolved)

joostjager requested a review from TheBlueMatt March 2, 2026 08:13

joostjager force-pushed the fix-force-close-in-progress-monitor-drops-htlc branch 2 times, most recently from 6421a28 to 8dac2fa Compare March 2, 2026 08:20

joostjager commented Mar 2, 2026

lightning/src/chain/channelmonitor.rs (outdated, resolved)

joostjager force-pushed the fix-force-close-in-progress-monitor-drops-htlc branch from 8dac2fa to da66b8e Compare March 2, 2026 08:23

TheBlueMatt reviewed Mar 2, 2026

joostjager requested a review from wpaulino March 10, 2026 11:17

joostjager force-pushed the fix-force-close-in-progress-monitor-drops-htlc branch from 56c813f to 25c9ca3 Compare March 11, 2026 13:18

wpaulino reviewed Mar 16, 2026

joostjager force-pushed the fix-force-close-in-progress-monitor-drops-htlc branch from baa58e4 to f1364ba Compare March 17, 2026 08:58

ldk-claude-review-bot reviewed Mar 17, 2026

lightning/src/chain/channelmonitor.rs (resolved)

ldk-claude-review-bot reviewed Mar 17, 2026

lightning/src/chain/chainmonitor.rs (resolved)

ldk-claude-review-bot reviewed Mar 17, 2026

lightning/src/chain/chainmonitor.rs (resolved)

ldk-claude-review-bot reviewed Mar 17, 2026

lightning/src/util/test_utils.rs (resolved)

joostjager force-pushed the fix-force-close-in-progress-monitor-drops-htlc branch 2 times, most recently from 9999385 to 337bef1 Compare March 17, 2026 16:29

ldk-claude-review-bot reviewed Mar 17, 2026

lightning/src/chain/chainmonitor.rs (resolved)

ldk-claude-review-bot reviewed Mar 17, 2026

joostjager force-pushed the fix-force-close-in-progress-monitor-drops-htlc branch from 337bef1 to 3e1a18c Compare March 18, 2026 21:01

ldk-claude-review-bot reviewed Mar 18, 2026

TheBlueMatt approved these changes Mar 19, 2026

TheBlueMatt merged commit 1d0586f into lightningdevkit:main Mar 19, 2026
18 of 20 checks passed

github-project-automation bot moved this to Done in Weekly Goals Mar 19, 2026
