
Conversation

@bert-beyondloops (Contributor)

Which issue does this PR close?

This PR closes issue #19781.

Rationale for this change

Fixes the internal error `Assertion failed: !self.finished: LimitedBatchCoalescer: cannot push batch after finish`.

What changes are included in this PR?

The code change is inspired by the CoalesceBatchesStream implementation.

Are these changes tested?

An additional sqllogictest was written in limit.slt; it triggered the issue before the fix.

Are there any user-facing changes?

No

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and physical-plan (Changes to the physical-plan crate) labels on Jan 13, 2026.
```rust
let poll;
let elapsed_compute = self.metrics.baseline_metrics.elapsed_compute().clone();
loop {
    // If there is any completed batch ready, return it
```
Contributor
nit: any -> a

Contributor Author

changed.

```rust
    RecordBatchStream, SendableRecordBatchStream, Statistics,
};
use crate::coalesce::LimitedBatchCoalescer;
use crate::coalesce::PushBatchStatus::LimitReached;
```
Member

`PushBatchStatus` is imported below, so the `LimitReached` variant could be written as `PushBatchStatus::LimitReached`.

Contributor Author

changed.

```rust
}
Err(e) => {
    poll = Poll::Ready(Some(Err(e)));
LimitReached => {
```
Member

Suggested change:

```diff
-LimitReached => {
+PushBatchStatus::LimitReached => {
```

Contributor Author

changed.

```rust
poll = Poll::Ready(Some(Err(e)));
LimitReached => {
    // limit was reached, so stop early
    self.batch_coalescer.finish()?
```
Member

Suggested change:

```diff
-self.batch_coalescer.finish()?
+self.batch_coalescer.finish()?;
```

Contributor Author

changed.

```rust
    Ok(())
}

pub fn is_finished(&self) -> bool {
```
Member

Suggested change:

```diff
-pub fn is_finished(&self) -> bool {
+pub(crate) fn is_finished(&self) -> bool {
```

Contributor Author

Crate visibility added.

@bert-beyondloops (Contributor Author)

Thanks @martin-g for the review!

@Jefffrey (Contributor) left a comment

Perhaps @Dandandan or @alamb can take a look, since it seems it's related to this PR?


```
# tests with target partition set to 1
statement ok
set datafusion.execution.target_partitions = '1';
```
Contributor

nit: can we reset these configs after this test? Just so tests added after this don't inherit these config settings.

Contributor Author

I intentionally included the comment `# tests with target partition set to 1` to mark that all subsequent tests use this configuration.

I couldn't find a method to reset this setting (I'm unsure what the initial value was, and there's no obvious way to preserve and reapply it).
Looking at other slt files, I noticed that some settings are similarly modified without being reset; see order.slt as an example.
If there's a standard approach for handling this, I'd be happy to implement it.

Contributor

Maybe we can move this to a separate SLT file that has this configuration as a kind of preamble?

Contributor

The problem with these settings, as people are observing, is that they affect all subsequent tests

I like @pepijnve 's suggestion -- let's put this in a separate test. Perhaps something like limit_single_row_batches.slt

Using the setup of one target_partition and batch sizes of one to test all the limiting corner cases is a great idea. We should probably add some more basic cases to cover some other operators
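If the tests do move to their own file, the preamble could look something like the sketch below. The file name `limit_single_row_batches.slt` comes from the suggestion above; the table, query, and expected result mirror the reproducer discussed later in this thread, and the exact directives are illustrative rather than a finished test file:

```
# limit_single_row_batches.slt
# All tests in this file run with a single target partition and
# single-row batches, to exercise limit corner cases.

statement ok
set datafusion.execution.target_partitions = '1';

statement ok
set datafusion.execution.batch_size = '1';

statement ok
create table test (i INTEGER) as values (1), (2);

query I
select * from test where i <> 0 limit 1;
----
1
```

Because the settings live in a dedicated file, later tests in other files no longer inherit them.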

Contributor Author

I created an additional slt file as requested and explicitly added comments marking where the tests with target partitions set to 1 start.

@alamb (Contributor) left a comment

Thank you @bert-beyondloops @Jefffrey and @pepijnve and @martin-g

This is some great engineering.

I reviewed this PR carefully and I think it is correct, makes sense, and fixes the issue. I also spent some time trying to understand how our existing test coverage missed it (theory below).

I do think it would be nice to move the test to its own file, but we could also do that in a follow-on PR.

If anyone else is curious, the error from the reproducer before this PR is

```sql
set datafusion.execution.target_partitions = '1';
set datafusion.execution.batch_size = '1';
create table test (i INTEGER) as values (1), (2);
select * from test where i <> 0 limit 1;
```

```
Internal error: Assertion failed: !self.finished: LimitedBatchCoalescer: cannot push batch after finish.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues
```

Here is the plan

```
> explain format indent select * from test where i <> 0 limit 1;
+---------------+-----------------------------------------------------+
| plan_type     | plan                                                |
+---------------+-----------------------------------------------------+
| logical_plan  | Limit: skip=0, fetch=1                              |
|               |   Filter: test.i != Int32(0)                        |
|               |     TableScan: test projection=[i]                  |
| physical_plan | FilterExec: i@0 != 0, fetch=1                       |
|               |   DataSourceExec: partitions=1, partition_sizes=[2] |
|               |                                                     |
+---------------+-----------------------------------------------------+
2 row(s) fetched.
Elapsed 0.006 seconds.
```



```rust
    return self.metrics.baseline_metrics.record_poll(poll);
}

if self.batch_coalescer.is_finished() {
```
Contributor

I think this is the key change -- specifically that on subsequent calls to poll_next() we just drain the coalescer and don't fetch the next input batch or try to push it in.

Previously, if the input returned another batch, the FilterExec would try to add it to the BatchCoalescer (which is what triggers the assert).

This new code correctly drains the coalescer state 👍.

I suspect we haven't seen this on other queries because most/all of the ExecutionPlans that can feed a FilterExec also implement limit pushdown. Thus when the FilterExec calls poll_next on the input, the input returns None and no additional batch is pushed to the BatchCoalescer.

The default batch size for the memory exec means that it will most often only return a single batch.

I can't really figure out why setting the number of target partitions to 1 makes any difference (though I verified it is required to trigger the reproducer)

Perhaps the reason @bert-beyondloops saw this in his system is that he has some custom execution plan (that doesn't implement limit pushdown 🤔). This is fine; I am just trying to explain why we haven't hit this issue before / in our other tests.

Contributor Author

We indeed specify target partition 1 for some specific cases, since we know up front that our input is sorted and we do not want the overhead of a SortMergeExec.

Contributor Author

Without target partitions set to 1, you get a different physical plan (with multiple partitions), where the limit is applied by another type of Exec.

```rust
if self.batch_coalescer.is_finished() {
    // If input is done and no batches are ready, return None to signal end of stream.
    let poll = Poll::Ready(None);
    return self.metrics.baseline_metrics.record_poll(poll);
```
Contributor

FWIW I don't think record_poll is needed here (it only records information when there is a batch). There is nothing wrong with this call, but it is somewhat confusing as there are other paths below that return without also calling record_poll.

Contributor Author

Indeed, this wasn't consistent. But honestly, you only know this (that it records information only when there is a batch) by looking at the internals of record_poll.
In my opinion, record_poll should be called everywhere, but that is another discussion :-)

I'll adapt.

@alamb (Contributor)

alamb commented Jan 14, 2026

Looks good to me ❤️

@alamb (Contributor) left a comment

Let's give this until tomorrow to allow time for others to comment, and then I'll plan to merge it, assuming nothing else comes up.

Thank you again @bert-beyondloops
