Commit 0d1e978

joostjager and claude committed
Replace dual-sync-async persistence panic with Watch contract
Commit 0760f99 ("Disallow dual-sync-async persistence without restarting") added a panic in non-test builds when a Persist implementation returns both Completed and InProgress from the same ChannelManager instance. However, this check runs against the status that ChainMonitor returns to ChannelManager, not the raw Persist result. When ChannelMonitor::update_monitor fails (e.g. a counterparty commitment_signed arrives after a funding spend confirms), ChainMonitor persists the full monitor successfully but overrides the return value to InProgress. If the user's Persist impl only ever returns Completed, this override triggers a false mode-mismatch panic.

This replaces the panic with a per-channel contract at the Watch trait level: a Watch implementation must not return Completed for a channel update while prior InProgress updates are still pending. Switching from Completed to InProgress is always allowed, but switching back is impractical because the Watch implementation cannot observe when ChannelManager has finished processing a MonitorEvent::Completed.

The documentation on ChannelMonitorUpdateStatus is updated to describe these rules. The mode tracking and panic checks from 0760f99 are removed and replaced with a panic that validates the new contract directly on the in-flight update state. Legacy tests that switch the persister between modes mid-flight can opt out via Node::disable_monitor_completeness_assertion().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9df659a commit 0d1e978
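The per-channel contract described in the commit message can be sketched as a small standalone model. The types below (`UpdateStatus`, `ChannelUpdateTracker`) are simplified stand-ins for illustration, not the actual LDK API: a `Completed` return is only valid when no prior updates for the channel are still pending as `InProgress`, while switching to `InProgress` is always allowed.

```rust
/// Stand-in for ChannelMonitorUpdateStatus (illustrative, not LDK's type).
#[derive(Clone, Copy, PartialEq, Debug)]
enum UpdateStatus {
    Completed,
    InProgress,
}

/// Tracks, per channel, how many updates are still pending asynchronously.
struct ChannelUpdateTracker {
    in_flight: u32,
}

impl ChannelUpdateTracker {
    fn new() -> Self {
        Self { in_flight: 0 }
    }

    /// Records the status a Watch implementation returned for a new update.
    /// Returns Err if the per-channel contract is violated.
    fn record(&mut self, status: UpdateStatus) -> Result<(), &'static str> {
        match status {
            UpdateStatus::InProgress => {
                // Switching to async persistence is always allowed.
                self.in_flight += 1;
                Ok(())
            },
            UpdateStatus::Completed if self.in_flight > 0 => {
                // Completed while prior updates are still pending violates
                // the contract; in LDK this now triggers a panic.
                Err("Completed returned while prior updates are InProgress")
            },
            UpdateStatus::Completed => Ok(()),
        }
    }

    /// Called when a pending async update finishes persisting.
    fn update_persisted(&mut self) {
        self.in_flight -= 1;
    }
}

fn main() {
    let mut tracker = ChannelUpdateTracker::new();

    // Sync persistence, then switching to async: both fine.
    assert!(tracker.record(UpdateStatus::Completed).is_ok());
    assert!(tracker.record(UpdateStatus::InProgress).is_ok());

    // Returning Completed while an async update is pending is forbidden.
    assert!(tracker.record(UpdateStatus::Completed).is_err());

    // Once the pending update completes, Completed is valid again.
    tracker.update_persisted();
    assert!(tracker.record(UpdateStatus::Completed).is_ok());
}
```

Note that in real usage the "update persisted" transition corresponds to the Watch implementation calling back into ChainMonitor (and ultimately surfacing a MonitorEvent::Completed), which is exactly the signal the implementation cannot wait on synchronously, hence the one-way nature of the Completed-to-InProgress switch.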

File tree

7 files changed: +43 −29 lines


lightning/src/chain/channelmonitor.rs (1 addition, 0 deletions)

```diff
@@ -6817,6 +6817,7 @@ mod tests {
 	let legacy_cfg = test_legacy_channel_config();
 	let node_chanmgrs = create_node_chanmgrs(3, &node_cfgs, &[Some(legacy_cfg.clone()), Some(legacy_cfg.clone()), Some(legacy_cfg)]);
 	let nodes = create_network(3, &node_cfgs, &node_chanmgrs);
+	nodes[1].disable_monitor_completeness_assertion();
 	let channel = create_announced_chan_between_nodes(&nodes, 0, 1);
 	create_announced_chan_between_nodes(&nodes, 1, 2);
```

lightning/src/chain/mod.rs (4 additions, 8 deletions)

```diff
@@ -233,11 +233,10 @@ pub enum ChannelMonitorUpdateStatus {
 	/// This includes performing any `fsync()` calls required to ensure the update is guaranteed to
 	/// be available on restart even if the application crashes.
 	///
-	/// If you return this variant, you cannot later return [`InProgress`] from the same instance of
-	/// [`Persist`]/[`Watch`] without first restarting.
+	/// You cannot switch from [`InProgress`] to this variant for the same channel without first
+	/// restarting. However, switching from this variant to [`InProgress`] is always allowed.
 	///
 	/// [`InProgress`]: ChannelMonitorUpdateStatus::InProgress
-	/// [`Persist`]: chainmonitor::Persist
 	Completed,
 	/// Indicates that the update will happen asynchronously in the background or that a transient
 	/// failure occurred which is being retried in the background and will eventually complete.
@@ -263,12 +262,7 @@ pub enum ChannelMonitorUpdateStatus {
 	/// reliable, this feature is considered beta, and a handful of edge-cases remain. Until the
 	/// remaining cases are fixed, in rare cases, *using this feature may lead to funds loss*.
 	///
-	/// If you return this variant, you cannot later return [`Completed`] from the same instance of
-	/// [`Persist`]/[`Watch`] without first restarting.
-	///
 	/// [`InProgress`]: ChannelMonitorUpdateStatus::InProgress
-	/// [`Completed`]: ChannelMonitorUpdateStatus::Completed
-	/// [`Persist`]: chainmonitor::Persist
 	InProgress,
 	/// Indicates that an update has failed and will not complete at any point in the future.
 	///
@@ -328,6 +322,8 @@ pub trait Watch<ChannelSigner: EcdsaChannelSigner> {
 	/// cannot be retried, the node should shut down immediately after returning
 	/// [`ChannelMonitorUpdateStatus::UnrecoverableError`], see its documentation for more info.
 	///
+	/// See [`ChannelMonitorUpdateStatus`] for requirements on when each variant may be returned.
+	///
 	/// [`ChannelManager`]: crate::ln::channelmanager::ChannelManager
 	fn update_channel(
 		&self, channel_id: ChannelId, update: &ChannelMonitorUpdate,
```

lightning/src/ln/chanmon_update_fail_tests.rs (6 additions, 0 deletions)

```diff
@@ -175,6 +175,7 @@ fn do_test_simple_monitor_temporary_update_fail(disconnect: bool) {
 	let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
 	let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
 	let mut nodes = create_network(2, &node_cfgs, &node_chanmgrs);
+	nodes[0].disable_monitor_completeness_assertion();

 	let node_a_id = nodes[0].node.get_our_node_id();
 	let node_b_id = nodes[1].node.get_our_node_id();
@@ -316,6 +317,7 @@ fn do_test_monitor_temporary_update_fail(disconnect_count: usize) {
 	let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
 	let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
 	let mut nodes = create_network(2, &node_cfgs, &node_chanmgrs);
+	nodes[0].disable_monitor_completeness_assertion();

 	let node_a_id = nodes[0].node.get_our_node_id();
 	let node_b_id = nodes[1].node.get_our_node_id();
@@ -969,6 +971,7 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 	let node_cfgs = create_node_cfgs(3, &chanmon_cfgs);
 	let node_chanmgrs = create_node_chanmgrs(3, &node_cfgs, &[None, None, None]);
 	let mut nodes = create_network(3, &node_cfgs, &node_chanmgrs);
+	nodes[1].disable_monitor_completeness_assertion();

 	let node_a_id = nodes[0].node.get_our_node_id();
 	let node_b_id = nodes[1].node.get_our_node_id();
@@ -1500,6 +1503,7 @@ fn claim_while_disconnected_monitor_update_fail() {
 	let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
 	let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
 	let mut nodes = create_network(2, &node_cfgs, &node_chanmgrs);
+	nodes[1].disable_monitor_completeness_assertion();

 	let node_a_id = nodes[0].node.get_our_node_id();
 	let node_b_id = nodes[1].node.get_our_node_id();
@@ -1727,6 +1731,7 @@ fn first_message_on_recv_ordering() {
 	let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
 	let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
 	let mut nodes = create_network(2, &node_cfgs, &node_chanmgrs);
+	nodes[1].disable_monitor_completeness_assertion();

 	let node_a_id = nodes[0].node.get_our_node_id();
 	let node_b_id = nodes[1].node.get_our_node_id();
@@ -3849,6 +3854,7 @@ fn do_test_durable_preimages_on_closed_channel(
 	// Now reload node B
 	let manager_b = nodes[1].node.encode();
 	reload_node!(nodes[1], &manager_b, &[&mon_ab, &mon_bc], persister, chain_mon, node_b_reload);
+	nodes[1].disable_monitor_completeness_assertion();

 	nodes[0].node.peer_disconnected(node_b_id);
 	nodes[2].node.peer_disconnected(node_b_id);
```

lightning/src/ln/channelmanager.rs (20 additions, 21 deletions)

```diff
@@ -2792,12 +2792,12 @@ pub struct ChannelManager<
 	#[cfg(any(test, feature = "_test_utils"))]
 	pub(super) per_peer_state: FairRwLock<HashMap<PublicKey, Mutex<PeerState<SP>>>>,

-	/// We only support using one of [`ChannelMonitorUpdateStatus::InProgress`] and
-	/// [`ChannelMonitorUpdateStatus::Completed`] without restarting. Because the API does not
-	/// otherwise directly enforce this, we enforce it in non-test builds here by storing which one
-	/// is in use.
-	#[cfg(not(any(test, feature = "_externalize_tests")))]
-	monitor_update_type: AtomicUsize,
+	/// When set, disables the panic when `Watch::update_channel` returns `Completed` while
+	/// prior updates are still `InProgress`. Some legacy tests switch the persister between
+	/// `InProgress` and `Completed` mid-flight, which violates this contract but is otherwise
+	/// harmless in a test context.
+	#[cfg(test)]
+	pub(crate) skip_monitor_update_assertion: AtomicBool,

 	/// The set of events which we need to give to the user to handle. In some cases an event may
 	/// require some further action after the user handles it (currently only blocking a monitor
@@ -3540,8 +3540,8 @@ impl<

 			per_peer_state: FairRwLock::new(new_hash_map()),

-			#[cfg(not(any(test, feature = "_externalize_tests")))]
-			monitor_update_type: AtomicUsize::new(0),
+			#[cfg(test)]
+			skip_monitor_update_assertion: AtomicBool::new(false),

 			pending_events: Mutex::new(VecDeque::new()),
 			pending_events_processor: AtomicBool::new(false),
@@ -9965,6 +9965,15 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 			if update_completed {
 				let _ = in_flight_updates.remove(update_idx);
 			}
+			// A Watch implementation must not return Completed while prior updates are
+			// still InProgress, as this would violate the async persistence contract.
+			#[cfg(test)]
+			let skip_check = self.skip_monitor_update_assertion.load(Ordering::Relaxed);
+			#[cfg(not(test))]
+			let skip_check = false;
+			if !skip_check && update_completed && !in_flight_updates.is_empty() {
+				panic!("Watch::update_channel returned Completed while prior updates are still InProgress");
+			}
 			(update_completed, update_completed && in_flight_updates.is_empty())
 		} else {
 			// We blindly assume that the ChannelMonitorUpdate will be regenerated on startup if we
@@ -10030,23 +10039,13 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 				panic!("{}", err_str);
 			},
 			ChannelMonitorUpdateStatus::InProgress => {
-				#[cfg(not(any(test, feature = "_externalize_tests")))]
-				if self.monitor_update_type.swap(1, Ordering::Relaxed) == 2 {
-					panic!("Cannot use both ChannelMonitorUpdateStatus modes InProgress and Completed without restart");
-				}
 				log_debug!(
 					logger,
 					"ChannelMonitor update in flight, holding messages until the update completes.",
 				);
 				false
 			},
-			ChannelMonitorUpdateStatus::Completed => {
-				#[cfg(not(any(test, feature = "_externalize_tests")))]
-				if self.monitor_update_type.swap(2, Ordering::Relaxed) == 1 {
-					panic!("Cannot use both ChannelMonitorUpdateStatus modes InProgress and Completed without restart");
-				}
-				true
-			},
+			ChannelMonitorUpdateStatus::Completed => true,
 		}
 	}
@@ -19553,8 +19552,8 @@ impl<

 			per_peer_state: FairRwLock::new(per_peer_state),

-			#[cfg(not(any(test, feature = "_externalize_tests")))]
-			monitor_update_type: AtomicUsize::new(0),
+			#[cfg(test)]
+			skip_monitor_update_assertion: AtomicBool::new(false),

 			pending_events: Mutex::new(pending_events_read),
 			pending_events_processor: AtomicBool::new(false),
```
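The check added in the channelmanager.rs hunk above can be modeled in isolation. This is a hypothetical standalone sketch, not the actual LDK function: in-flight update IDs stand in for the real `in_flight_updates` vector, the completed update is removed from the list, and returning `Completed` while earlier updates remain pending panics unless the test-only skip flag (mirroring `skip_monitor_update_assertion`) is set.

```rust
/// Models the (update_completed, all_updates_done) result of applying a
/// monitor update status, with the new contract panic. Simplified stand-in
/// for the real ChannelManager logic; `u64` update IDs replace real updates.
fn handle_update_status(
    in_flight_updates: &mut Vec<u64>, // update IDs still pending for this channel
    update_id: u64,
    update_completed: bool, // true iff Watch::update_channel returned Completed
    skip_check: bool,       // test-only escape hatch for legacy mode-switching tests
) -> (bool, bool) {
    if update_completed {
        // A synchronously-completed update is no longer in flight.
        in_flight_updates.retain(|&id| id != update_id);
    } else {
        // InProgress: track the update until its completion is signaled later.
        in_flight_updates.push(update_id);
    }
    // The new contract: Completed must not be returned while prior updates
    // for the same channel are still pending.
    if !skip_check && update_completed && !in_flight_updates.is_empty() {
        panic!("Completed returned while prior updates are still InProgress");
    }
    (update_completed, update_completed && in_flight_updates.is_empty())
}

fn main() {
    let mut in_flight = Vec::new();

    // Update 1 persists asynchronously and stays in flight.
    assert_eq!(handle_update_status(&mut in_flight, 1, false, false), (false, false));

    // Update 2 returning Completed while update 1 is pending would panic;
    // with the test-only flag set the mismatch is tolerated instead.
    assert_eq!(handle_update_status(&mut in_flight, 2, true, true), (true, false));
}
```

This also shows why the second element of the tuple matters: messages held during async persistence are only released once every in-flight update for the channel has completed, not merely the latest one.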

lightning/src/ln/functional_test_utils.rs (8 additions, 0 deletions)

```diff
@@ -598,6 +598,14 @@ impl<'a, 'b, 'c> Node<'a, 'b, 'c> {
 			self.node.init_features() | self.onion_messenger.provided_init_features(peer_node_id)
 		})
 	}
+
+	/// Disables the panic when `Watch::update_channel` returns `Completed` while prior updates
+	/// are still `InProgress`. Some legacy tests switch the persister between modes mid-flight,
+	/// which violates this contract but is otherwise harmless.
+	#[cfg(test)]
+	pub fn disable_monitor_completeness_assertion(&self) {
+		self.node.skip_monitor_update_assertion.store(true, core::sync::atomic::Ordering::Relaxed);
+	}
 }

 impl<'a, 'b, 'c> std::panic::UnwindSafe for Node<'a, 'b, 'c> {}
```

lightning/src/ln/monitor_tests.rs (1 addition, 0 deletions)

```diff
@@ -3384,6 +3384,7 @@ fn test_claim_event_never_handled() {
 	let chan_0_monitor_serialized = get_monitor!(nodes[1], chan.2).encode();
 	let mons = &[&chan_0_monitor_serialized[..]];
 	reload_node!(nodes[1], &init_node_ser, mons, persister, new_chain_mon, nodes_1_reload);
+	nodes[1].disable_monitor_completeness_assertion();

 	expect_payment_claimed!(nodes[1], payment_hash_a, 1_000_000);
 	// The reload logic spuriously generates a redundant payment preimage-containing
```

lightning/src/ln/reload_tests.rs (3 additions, 0 deletions)

```diff
@@ -823,12 +823,14 @@ fn do_test_partial_claim_before_restart(persist_both_monitors: bool, double_rest

 	// Now restart nodes[3].
 	reload_node!(nodes[3], original_manager.clone(), &[&updated_monitor.0, &original_monitor.0], persist_d_1, chain_d_1, node_d_1);
+	nodes[3].disable_monitor_completeness_assertion();

 	if double_restart {
 		// Previously, we had a bug where we'd fail to reload if we re-persist the `ChannelManager`
 		// without updating any `ChannelMonitor`s as we'd fail to double-initiate the claim replay.
 		// We test that here ensuring that we can reload again.
 		reload_node!(nodes[3], node_d_1.encode(), &[&updated_monitor.0, &original_monitor.0], persist_d_2, chain_d_2, node_d_2);
+		nodes[3].disable_monitor_completeness_assertion();
 	}

 	// Until the startup background events are processed (in `get_and_clear_pending_events`,
@@ -2216,6 +2218,7 @@ fn test_reload_with_mpp_claims_on_same_channel() {
 		nodes_1_deserialized,
 		Some(true)
 	);
+	nodes[1].disable_monitor_completeness_assertion();

 	// When the claims are reconstructed during reload, PaymentForwarded events are regenerated.
 	let events = nodes[1].node.get_and_clear_pending_events();
```
