
feat: add progressive_set_intersection to disjoint_set#14491

Closed
Devanik21 wants to merge 2 commits into TheAlgorithms:master from Devanik21:master


@Devanik21

Checklist

  • I have read the CONTRIBUTING.md file.
  • I have performed a self-review of my own code.
  • My code follows the style guidelines of this project.
  • I have added tests for my changes (doctests included).
  • All new and existing tests passed.
  • I have added the algorithm to the correct folder.
  • I have added docstrings and comments where necessary.

Description

This PR adds progressive_set_intersection() to data_structures/disjoint_set/.

Problem it addresses

Python's built-in set.intersection(*others) is implemented in C and already highly optimized. However, when intersecting many sets (50–100+) or sets of very different sizes (e.g., one set with 10 elements against several sets with millions of elements), the total number of membership checks can be reduced by ordering the inputs by size and pruning early.

This implementation demonstrates the "smallest-first + progressive pruning" heuristic:

  • Sort all input sets by size (ascending).
  • Start with a copy of the smallest set as the initial result.
  • Progressively intersect with the remaining sets (from smallest to largest).
  • Early exit as soon as the result becomes empty.
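The four steps above can be sketched in Python as follows. This is a minimal sketch, not the contents of the PR's file: it assumes the sets are passed as positional arguments (as in the Example Usage section), and returning an empty set for no input is an assumption, since the description does not say what the PR returns in that case.

```python
from collections.abc import Hashable


def progressive_set_intersection(*sets: set[Hashable]) -> set[Hashable]:
    """Intersect sets smallest-first, stopping as soon as the result is empty."""
    if not sets:
        # Assumption: no input sets yields an empty result.
        return set()
    # Step 1: order the inputs by size, smallest first (sorted() is stable).
    ordered = sorted(sets, key=len)
    # Step 2: start from a copy so the caller's smallest set is never mutated.
    result = set(ordered[0])
    # Step 3: intersect with the remaining sets, from smallest to largest.
    for other in ordered[1:]:
        result.intersection_update(other)
        # Step 4: an empty result can never grow again, so stop early.
        if not result:
            break
    return result
```

Because only `len` and membership are used, the sketch also accepts frozensets, and elements can be any hashable values such as tuples.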

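This pattern can avoid a large amount of unnecessary work for the edge cases discussed in #14368, because the running result is never larger than the smallest input and the loop stops as soon as it becomes empty.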
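To make the claim concrete, the sketch below counts membership checks in a simplified cost model: one check per element of the running result at each step. `count_probes` is a hypothetical helper, and the model does not describe how CPython's C implementation of set.intersection() iterates internally, so the numbers are an illustration rather than a benchmark.

```python
def count_probes(sets, smallest_first):
    """Return (intersection, number of membership checks) under a simple model."""
    ordered = sorted(sets, key=len) if smallest_first else list(sets)
    result = set(ordered[0])
    probes = 0
    for other in ordered[1:]:
        # One membership check per element that survived so far.
        probes += len(result)
        result = {x for x in result if x in other}
        if not result:
            break
    return result, probes


big = set(range(100_000))
big2 = set(range(50_000, 150_000))
small = {3, 70_000, 200_000}

print(count_probes([big, big2, small], smallest_first=False))  # ({70000}, 150000)
print(count_probes([big, big2, small], smallest_first=True))   # ({70000}, 5)
```

Both orders give the same answer; starting from the 3-element set means the later steps only ever check a handful of candidates.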

Why add this?

  • Strong educational value: clearly teaches an important optimization technique for multi-set operations.
  • Pure Python, zero external dependencies.
  • Works with any hashable elements.
  • Comprehensive doctests included.
  • Handles all edge cases gracefully (empty input, single set, immediate empty result).

Note: For general use cases, Python's built-in set.intersection() is still recommended. This module is mainly for learning and teaching the "prune early" strategy.
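For comparison, the recommended built-in route can still start from the smallest set: sort by size, then hand the rest to set.intersection(), which accepts any number of iterables. `builtin_intersection` is a hypothetical name used only for this sketch; the empty-input branch is needed because calling set.intersection with no set at all raises a TypeError.

```python
from collections.abc import Hashable


def builtin_intersection(*sets: set[Hashable]) -> set[Hashable]:
    """Smallest-first ordering, with the actual work delegated to set.intersection()."""
    if not sets:
        # Assumption: mirror the "empty input" case with an empty set.
        return set()
    smallest, *rest = sorted(sets, key=len)
    # set(smallest) also accepts a frozenset; intersection() returns a new set.
    return set(smallest).intersection(*rest)
```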

Algorithm Details

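  • Time complexity: O(k log k + min_size × k) on average, where k is the number of sets. Sorting costs O(k log k), and each of the at most k - 1 intersection steps costs time proportional to the running result, which never exceeds min_size. Early exit can stop well before all k sets are visited.
  • Space complexity: O(min_size + k) extra space (a copy of the smallest set plus the sorted list of k set references).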
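A short derivation of the time bound, including the sorting term. Write S_(1), ..., S_(k) for the inputs sorted by size, m = |S_(1)| (min_size), R_1 for the copy of S_(1), and R_i = R_(i-1) ∩ S_(i). It assumes average-case O(1) hash lookups and that each step costs time proportional to the smaller operand, which is R_(i-1) because |R_(i-1)| ≤ m ≤ |S_(i)|; j ≤ k is the last step executed (j < k when some R_j is empty).

```latex
\begin{aligned}
T(k, m) &= \underbrace{O(k \log k)}_{\text{sort by size}}
         + \underbrace{O(m)}_{\text{copy } S_{(1)}}
         + \sum_{i=2}^{j} O\bigl(\lvert R_{i-1} \rvert\bigr) \\
        &\le O(k \log k) + O(m) + (k-1)\, O(m) \\
        &= O(k \log k + k\, m)
\end{aligned}
```

Extra space is O(m) for the running result plus O(k) for the sorted list of references.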

Related Issue

Closes #14368

Files Changed

  • data_structures/disjoint_set/progressive_set_intersection.py (new file)

Testing

  • All doctests pass: python -m doctest -v progressive_set_intersection.py
  • Code follows repo conventions (snake_case, clear docstring, proper formatting).

Example Usage

```python
from data_structures.disjoint_set.progressive_set_intersection import progressive_set_intersection

s1 = {1, 2, 3, 4}
s2 = {2, 3, 5, 6}
s3 = {2, 3, 7}

result = progressive_set_intersection(s1, s2, s3)
print(result)  # Output: {2, 3}
```
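To see the smallest-first ordering on this example, the hypothetical helper `trace_intersection` below (a sketch, not part of the PR) records the running result after each step. s3 is visited first because it is the smallest; s1 and s2 tie at four elements, and since Python's `sorted` is stable they keep their argument order.

```python
def trace_intersection(*sets):
    """Return the sorted running result after each step of the smallest-first loop."""
    if not sets:
        return []
    ordered = sorted(sets, key=len)
    result = set(ordered[0])
    steps = [sorted(result)]
    for other in ordered[1:]:
        result &= other
        steps.append(sorted(result))
        if not result:
            break  # early exit: nothing left to intersect
    return steps


s1 = {1, 2, 3, 4}
s2 = {2, 3, 5, 6}
s3 = {2, 3, 7}
for step in trace_intersection(s1, s2, s3):
    print(step)
# [2, 3, 7]
# [2, 3]
# [2, 3]
```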

Would be happy to add sorted_array_intersection (two-pointer technique) or a bitmap version in a follow-up PR.
Thanks to @Starglen and @dinakars777 for the valuable discussion in the original issue!
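For the sorted_array_intersection follow-up mentioned above, the two-pointer technique can be sketched as below. The signature (two ascending lists in, distinct common values out) is a hypothetical choice for illustration, not something the PR defines.

```python
def sorted_array_intersection(a: list[int], b: list[int]) -> list[int]:
    """Two-pointer intersection of two ascending lists, keeping distinct values."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, advance a
        elif a[i] > b[j]:
            j += 1  # b[j] cannot appear later in a, advance b
        else:
            # Common value: emit it once, skipping duplicates already emitted.
            if not out or out[-1] != a[i]:
                out.append(a[i])
            i += 1
            j += 1
    return out


print(sorted_array_intersection([1, 2, 2, 3, 5], [2, 2, 3, 4, 5]))  # [2, 3, 5]
```

It runs in O(len(a) + len(b)) time without hashing, at the cost of requiring sorted input.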

Devanik21 and others added 2 commits April 1, 2026 13:39
This function computes the intersection of multiple sets efficiently by sorting them by size and using early termination.
Devanik21 closed this Apr 1, 2026