Skip to content

Support tiered data storage#692

Open
enigbe wants to merge 5 commits into
lightningdevkit:mainfrom
enigbe:2025-10-tiered-data-storage
Open

Support tiered data storage#692
enigbe wants to merge 5 commits into
lightningdevkit:mainfrom
enigbe:2025-10-tiered-data-storage

Conversation

@enigbe

@enigbe enigbe commented Nov 1, 2025

Copy link
Copy Markdown
Contributor

What this PR does

In this PR we introduce TierStore, a three-tiered (KVStore + KVStoreSync) implementation that manages data across three distinct storage layers based on criticality.

Background

As we have moved towards supporting remote storage with VssStore, we need to recognize that not all data has the same storage requirements. Currently, all data goes to a single store which creates some problems:

  • Remote storage adds latency on every read/write operation
  • Network graph and scorer data is ephemeral and easily rebuild-able from the network
  • Persisting ephemeral data to remote storage wastes bandwidth and adds unnecessary latency
  • Users persisting to remote storage have no local disaster recovery option

This PR proposes tiered storage that provides granular control over where different data types are stored. The tiers include:

  • Ephemeral store (network graph, scorer): optional fast local storage to avoid remote latency
  • Primary store (channels, payments, wallet): preferably remote storage for critical data
  • Backup store: optional (preferably local) backup for disaster recovery, with data written to (removed from) lazily to avoid blocking primary data operations

Additionally, we also permit the configuration of Node with tiered storage allowing callers to:

  • set retry parameters,
  • set backup and ephemeral stores, and to
  • build the Node with a primary store.
    These configuration options also extend to our foreign interface, allowing bindings target to build the Node with their own (KVStore + KVStoreSync) implementations. A sample Python implementation is provided and tested.

Concerns

  1. Nested Retry Logic: VssStore has built-in retry logic. Wrapping it in TierStore creates nested retries.
  2. Backup Queue Capacity: Currently hard-coded (100 in prod, 5 in tests). Ideally this should be configurable but there are concerns about exposing this to callers? What is a sensible default?
  3. Data Consistency Between Primary and Backup: Backup operates asynchronously and may lag behind or miss operations if the queue overflows. Should we consider adding a background sync task to reconcile backup with primary? Expose metrics for backup lag/queue depth? Implement catch-up logic on startup?
  4. Exposing KVStore to the FFI

Related Issues

@ldk-reviews-bot

ldk-reviews-bot commented Nov 1, 2025

Copy link
Copy Markdown

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-reviews-bot ldk-reviews-bot requested a review from tnull November 1, 2025 23:18
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 4 times, most recently from 29f47f3 to 264aa7f Compare November 4, 2025 22:07
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 2nd Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 3rd Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 4th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 5th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 6th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 7th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 8th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 9th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 10th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 11th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 12th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 13th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 264aa7f to 493dd9a Compare December 2, 2025 19:50
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 14th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 2 times, most recently from a30cbfb to 1e7bdbc Compare December 4, 2025 23:30
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 15th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 16th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull tnull left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this and excuse the delay here!

I did a first pass, generally this looks already pretty good, but it is a huge PR and in some areas could be simplified. For instance, we should drop the generic Retry logic as the concrete implementations would already implement that if they need it. Likewise, we shouldn't fallback to the backup for now as it's really only meant as disaster recovery (KVStore caching is out-of-scope for this PR now, even though we might want to explore that soon, too). There is also no need to replicate the write-ordering locks in TierStore, but we'll need them in ForeignKVStoreAdapter (which might be mergeable with DynStore?).

Generally, if you find opportunities to reduce the size of the changeset here it would be appreciated. It would also be cool if you could try to break up the PR into more feature commits, but feel free to leave as is if not.

Comment thread src/io/tier_store.rs Outdated

/// Configuration for exponential backoff retry behavior.
#[derive(Debug, Copy, Clone)]
pub struct RetryConfig {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retrying is pretty specific to the particular KVStore implementation. I don't think we should expose retrying params on top of what we already do for implementations internally.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. This has been dropped.
I considered it because users can choose their own KVStore implementation without considering retrying. I thought to add some level of control but the added complexity and size of the changeset might not be worth it.

Comment thread src/io/tier_store.rs Outdated
}

pub struct TierStoreInner {
/// For remote data.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might also be local.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and addressed here db1fe83

Comment thread bindings/ldk_node.udl Outdated
[Throws=BuildError]
Node build_with_vss_store_and_header_provider(NodeEntropy node_entropy, string vss_url, string store_id, VssHeaderProvider header_provider);
[Throws=BuildError]
Node build_with_tier_store(NodeEntropy node_entropy, DynStore primary_store);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we now also expose Builder::build_with_store?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can.
This is exposed here 451a392

Comment thread src/ffi/types.rs Outdated
Self { inner: Arc::new(adapter) }
}

pub fn from_ldk_store(store: Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>) -> Arc<Self> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rather make this an impl From<Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>> for Arc .., we can drop the wrap_storemacro and just use.into()` instead.

Comment thread src/builder.rs Outdated
}

let store = wrap_store!(Arc::new(tier_store));
self.build_with_store(node_entropy, store)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to use build_with_store_internal here to make sure we're using the same Runtime etc.

Comment thread src/io/tier_store.rs Outdated
Arc::clone(&outer_lock.entry(locking_key).or_default())
}

async fn execute_locked_write<

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, why are we replicating all the write-ordering logic here? Given the implementations are required to fullfill LDK's requirements already, can't we just call the inner write and be done with it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the reasoning here is that users might not implement the KVStore trait for their own stores as required and thus, the onus of correctness will fall to us. If we can't get it from their inner stores, the wrapper TierStore should provide some guarantees. I do agree that if the inner store already satisfies LDK's requirements, the wrapper shouldn't need to duplicate that logic. The write-ordering has been removed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have to eat my words here: it seems we have to implement key-level versioning for TierStore too as the returned write/remove futures are not immediately polled.

Codex:

  • P1 /home/tnull/workspace/ldk-node-pr-692/src/io/tier_store.rs:100: TierStore::write/remove delays delegation until the returned future is polled, so it
    loses the existing stores’ “latest call wins” ordering. Example: create old = write(old), then new = write(new), await new, then await old; TierStore will
    write old last. Existing SqliteStore assigns the version at call time and skips stale writes (/home/tnull/workspace/ldk-node-pr-692/src/io/sqlite_store/
    mod.rs:79, /home/tnull/workspace/ldk-node-pr-692/src/io/sqlite_store/mod.rs:603). This can corrupt async channel monitor persistence by letting an older
    pending update overwrite or remove newer state. TierStore needs its own call-time per-key versioning before returning the future.

Comment thread src/ffi/types.rs Outdated
Comment thread src/types.rs Outdated
Comment thread src/io/test_utils.rs Outdated
Comment thread src/ffi/types.rs Outdated
Comment thread src/ffi/types.rs Outdated
}
}

#[async_trait]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.

Well, it's 'the official'/supported way to do async traits with Uniffi: https://mozilla.github.io/uniffi-rs/latest/futures.html#exporting-async-trait-methods

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does uniffi require that in some way? Its just a comically-overkill way to sugar impl Future which is kinda nuts...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, worse, it looks like async_trait desugars to the Pin<Box<dyn ...>> version...which is dumb but at least for uniffi it shouldn't matter cause we'd have to do that anyway.

@tnull tnull left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this will need a substantial rebase now that #696 landed, sorry for that!

Besides some tedious rebase work, we now have DynStoreWrapper which can be used on the ffi and the non-ffi side. I do wonder if we can actually just merge that wrapper with the TierStore, as well as the ForeignKVStoreAdapter/DynStore. Seems all these wrapper structs may not be needed and we could just get away with one trait and one wrapper struct, mostly?

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 8dbb312 to 0ae61b7 Compare May 19, 2026 06:05
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 4th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 5th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 6th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 7th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 8th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 9th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull tnull left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse the delay here! Mostly looks good, just a few comments.

Comment thread tests/common/mod.rs
Comment thread src/io/tier_store.rs Outdated
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license <LICENSE-MIT or
// http://opensource.org/licenses/MIT>, at your option. You may not use this file except in
// accordance with one or both of these licenses.
#![allow(dead_code)] // TODO: Temporal warning silencer. Will be removed in later commit.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixup for this seems to be in the wrong commit.

Comment thread src/builder.rs Outdated
Comment thread src/io/tier_store.rs Outdated
Comment thread src/io/tier_store.rs Outdated
Comment thread src/builder.rs Outdated
Comment thread src/types.rs
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 3 times, most recently from deb81d0 to e53deb6 Compare June 11, 2026 08:32
@enigbe

enigbe commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Hi @tnull
Thanks for the review again. I've addressed the latest feedback in a series of fixup commits.

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 2 times, most recently from 4e221ed to cb7b044 Compare June 12, 2026 09:51
@enigbe enigbe requested review from TheBlueMatt and tnull June 12, 2026 09:55
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from cb7b044 to 6d8fa02 Compare June 12, 2026 12:21
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 1st Reminder

Hey @tnull @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

1 similar comment
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 1st Reminder

Hey @tnull @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from cadf81f to 1f2cd4d Compare June 16, 2026 08:34
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 2nd Reminder

Hey @tnull @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

1 similar comment
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 2nd Reminder

Hey @tnull @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull tnull left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to squash fixups!

Unfortunately, cargo test --features uniffi currently fails to compile currently. I think we might also need to revert to implement the same key-level versioning pattern as we have in other stores afterall. Excuse the back-and-forth.

Comment thread src/io/tier_store.rs Outdated
primary_namespace,
secondary_namespace,
key,
buf.clone(),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we avoid cloning if no backup store is set?

Comment thread src/io/tier_store.rs Outdated
Arc::clone(&outer_lock.entry(locking_key).or_default())
}

async fn execute_locked_write<

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have to eat my words here: it seems we have to implement key-level versioning for TierStore too as the returned write/remove futures are not immediately polled.

Codex:

  • P1 /home/tnull/workspace/ldk-node-pr-692/src/io/tier_store.rs:100: TierStore::write/remove delays delegation until the returned future is polled, so it
    loses the existing stores’ “latest call wins” ordering. Example: create old = write(old), then new = write(new), await new, then await old; TierStore will
    write old last. Existing SqliteStore assigns the version at call time and skips stale writes (/home/tnull/workspace/ldk-node-pr-692/src/io/sqlite_store/
    mod.rs:79, /home/tnull/workspace/ldk-node-pr-692/src/io/sqlite_store/mod.rs:603). This can corrupt async channel monitor persistence by letting an older
    pending update overwrite or remove newer state. TierStore needs its own call-time per-key versioning before returning the future.

Comment thread src/io/tier_store.rs Outdated
{
Ok(data) => Ok(data),
Err(e) => {
log_error!(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex:

  • P3 /home/tnull/workspace/ldk-node-pr-692/src/io/tier_store.rs:197: read_primary logs every read error as ERROR, including normal NotFound startup paths.
    Since the builder now wraps every store in TierStore (/home/tnull/workspace/ldk-node-pr-692/src/builder.rs:897), fresh nodes will log expected missing
    node_metrics, graph, scorer, event queue, peers, etc. as errors before callers handle them as defaults. The wrapper should avoid error-level logging for
    NotFound and generally let callers decide.

enigbe added 3 commits June 19, 2026 23:26
This commit adds `TierStore`, a tiered `KVStore` implementation that
routes node persistence across three storage roles:

- a primary store for durable, authoritative data
- an optional backup store for a second durable copy of primary-backed data
- an optional ephemeral store for rebuildable cached data such as the
  network graph and scorer

TierStore routes ephemeral cache data to the ephemeral store when
configured, while durable data remains primary+backup. Reads and lists
do not consult the backup store during normal operation.

For primary+backup writes and removals, this implementation treats the
backup store as part of the persistence success path rather than as a
best-effort background mirror. Earlier designs used asynchronous backup
queueing to avoid blocking the primary path, but that weakens the
durability contract by allowing primary success to be reported before
backup persistence has completed. TierStore now issues primary and backup
operations together and only returns success once both complete.

This gives callers a clearer persistence guarantee when a backup store is
configured: acknowledged primary+backup mutations have been attempted
against both durable stores. The tradeoff is that dual-store operations
are not atomic across stores, so an error may still be returned after one
store has already been updated.

Additionally, adds unit coverage for the current contract, including:
- basic read/write/remove/list persistence
- routing of ephemeral data away from the primary store
- backup participation in the foreground success path for writes and removals
Add native builder support for tiered storage by introducing
`TierStoreConfig` and builder methods for configuring ephemeral storage
and a local SQLite backup mirror.

During node construction, wrap the configured primary store in
`TierStore` and attach secondary tiers for cache-like ephemeral data and
mirrored durable backup writes. The builder constructs the backup  and
ephemeral stores internally using a dedicated SQLite database file.

Add test coverage for full-cycle backup mirroring, same-path rejection,
and UniFFI-backed builder configuration. Update `setup_builder!` so
FFI-backed builder tests can use mutable configuration helpers.
…ightningdevkit#871

- Added temporary #[allow(dead_code)] annotations to NodeBuilder::set_backup_storage_dir_path
  and NodeBuilder::set_ephemeral_storage_dir_path
- Keeps these tiered storage builder setters compilable while they remain unused before dropping
  in lightningdevkit#871
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 1f2cd4d to d424891 Compare June 19, 2026 23:38
@ldk-reviews-bot

Copy link
Copy Markdown

🔔 3rd Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

enigbe added 2 commits June 20, 2026 01:38
avoid buffer clone if backup isn't configured
Here we avoid error-level logging for primary read
misses and log partial primary/backup mutation failures
as possible divergence.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants