
data_structures: add progressive_set_intersection in disjoint_set#14490

Closed
Devanik21 wants to merge 1 commit into TheAlgorithms:master from Devanik21:master

@Devanik21

Description

This PR adds progressive_set_intersection() to the repository.

Problem it addresses

Python's built-in set.intersection(*others) is already implemented in C, and when its arguments are sets, each pairwise step iterates over the smaller of the two operands. However, when intersecting many sets (50–100+) or sets of highly imbalanced sizes (e.g., one set of 10 elements against several sets with millions of elements), a naive approach that scans a large set first, or that keeps going after the result is already empty, can perform many unnecessary membership checks.
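To make the "naive approach" concrete, here is a hypothetical baseline (naive_intersection is not part of this PR) that scans the first argument and tests each element against every other set, counting membership checks as it goes. It assumes all arguments are Python sets; the only point is that the work scales with whichever set happens to be passed first.

```python
def naive_intersection(*sets: set) -> tuple[set, int]:
    """Hypothetical baseline: scan the FIRST set and test each element against
    the other sets in argument order. Returns (result, membership_checks)."""
    if not sets:
        return set(), 0
    first, rest = sets[0], sets[1:]
    result: set = set()
    checks = 0
    for item in first:
        for other in rest:
            checks += 1
            if item not in other:
                break  # one miss is enough to reject this item
        else:
            result.add(item)  # survived every membership test
    return result, checks


big = set(range(100_000))
tiny = {3, 5, 7}

print(naive_intersection(big, tiny))  # ({3, 5, 7}, 100000)
print(naive_intersection(tiny, big))  # ({3, 5, 7}, 3)
```

Same inputs and same answer, but putting the large set first costs 100,000 checks instead of 3.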

This implementation demonstrates the "smallest-first + progressive pruning" heuristic:

  • Sort all input sets by size (ascending).
  • Start with a copy of the smallest set.
  • Progressively intersect with the remaining sets.
  • Early exit as soon as the result becomes empty.

This pattern can reduce the number of membership checks, especially when set sizes are highly imbalanced or the running intersection becomes empty early.
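The four steps above can be sketched as follows. The PR's actual file is not shown on this page, so this is an illustrative reconstruction, not the submitted code. It assumes every argument is a set (so `in` is an average O(1) hash lookup) and, as one possible convention, returns an empty set when called with no arguments.

```python
def progressive_set_intersection(*sets: set) -> set:
    """
    Intersect any number of sets, smallest first, stopping once empty.

    >>> progressive_set_intersection({1, 2, 3, 4}, {2, 3, 5, 6}, {2, 3, 7})
    {2, 3}
    >>> progressive_set_intersection()
    set()
    >>> progressive_set_intersection({1, 2})
    {1, 2}
    >>> progressive_set_intersection({1}, {2}, {1, 2})
    set()
    """
    if not sets:
        return set()  # assumed convention for "no input sets"
    ordered = sorted(sets, key=len)  # step 1: ascending by size
    result = set(ordered[0])  # step 2: copy, so the caller's set is untouched
    for other in ordered[1:]:
        # step 3: keep only the survivors that also appear in the next set
        result = {item for item in result if item in other}
        if not result:
            break  # step 4: early exit, nothing left to intersect
    return result
```

Rebuilding `result` with a comprehension keeps each pruning step explicit: only elements that survived so far are tested against the next set, so no step ever tests more than min_size elements.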

Why add this to the repo?

  • Educational value: Clearly shows an important optimization technique for multi-set operations.
  • Pure Python, zero external dependencies.
  • Works with any hashable elements (no assumptions about integer ranges or sorting).
  • Includes comprehensive doctests.
  • Handles edge cases: empty input, single set, empty intersection.

Note: For most everyday use cases, the built-in set.intersection() remains the best choice. This module is primarily intended for learning and for scenarios with many or highly imbalanced sets.
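For the everyday case, a list of sets can be handed straight to the built-in method. The helper name intersect_all is hypothetical; the only subtlety it handles is the empty list, where there is no first set to call the method on, so that case is guarded explicitly (returning an empty set is an assumed convention).

```python
def intersect_all(sets: list[set]) -> set:
    """Hypothetical helper: intersect a list of sets using the built-in method."""
    if not sets:
        return set()  # assumed convention for "no sets"
    first, *rest = sets
    # first.intersection(*rest) returns a new set and leaves `first` unmodified,
    # even when `rest` is empty.
    return first.intersection(*rest)


print(intersect_all([{1, 2, 3, 4}, {2, 3, 5, 6}, {2, 3, 7}]))  # {2, 3}
```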

Algorithm Details

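  • Time complexity: O(k log k) to sort the k input sets by size, plus at most min_size × (k - 1) membership checks, each O(1) on average for hash-based sets; that is, O(k log k + min_size × k) overall. Early termination often makes it faster in practice.
  • Space complexity: O(min_size) extra space for the copy of the smallest set, plus O(k) for the sorted list of sets.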
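The membership-check bound can be checked empirically. count_membership_checks below is a hypothetical instrumented version of the smallest-first loop (the sorting cost is not counted). On random inputs it confirms that the result matches the built-in and that the check count never exceeds min_size × (k - 1): the running result starts as the smallest set and only shrinks, and there are k - 1 pruning steps.

```python
import random


def count_membership_checks(*sets: set) -> tuple[set, int]:
    """Hypothetical instrumented smallest-first intersection.
    Returns (result, number_of_membership_checks)."""
    if not sets:
        return set(), 0
    ordered = sorted(sets, key=len)
    result = set(ordered[0])
    checks = 0
    for other in ordered[1:]:
        checks += len(result)  # one `in` test per surviving element
        result = {item for item in result if item in other}
        if not result:
            break  # early exit: remaining sets are never touched
    return result, checks


rng = random.Random(0)  # fixed seed for reproducibility
for _ in range(200):
    k = rng.randint(1, 8)
    trial_sets = [set(rng.sample(range(50), rng.randint(0, 30))) for _ in range(k)]
    result, checks = count_membership_checks(*trial_sets)
    min_size = min(len(s) for s in trial_sets)
    assert result == set.intersection(*trial_sets)
    assert checks <= min_size * (k - 1)
```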

Related Issue

Closes #14368

Files Changed

  • data_structures/disjoint_set/progressive_set_intersection.py (new file)

Testing

  • All doctests pass (python3 -m doctest -v progressive_set_intersection.py)
  • Follows repo style (snake_case, proper docstring, type hints where appropriate)

Example Usage

from data_structures.disjoint_set.progressive_set_intersection import progressive_set_intersection

s1 = {1, 2, 3, 4}
s2 = {2, 3, 5, 6}
s3 = {2, 3, 7}

result = progressive_set_intersection(s1, s2, s3)
print(result)          # Output: {2, 3}

Would be happy to add more functions (e.g., sorted-array intersection using the two-pointer technique) or a bitmap version in a follow-up PR if needed.
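For reference, the two-pointer technique mentioned above intersects two sorted sequences in a single linear pass. This is a generic sketch (two_pointer_intersection is a hypothetical name, not part of this PR) that assumes both inputs are sorted in ascending order and contain no duplicates.

```python
def two_pointer_intersection(a: list[int], b: list[int]) -> list[int]:
    """Intersect two ascending, duplicate-free lists in O(len(a) + len(b))."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])  # common element: keep it and advance both
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to appear in b; advance a
        else:
            j += 1  # b[j] is too small to appear in a; advance b
    return out


print(two_pointer_intersection([1, 2, 3, 4], [2, 3, 5, 6]))  # [2, 3]
```

Unlike the hash-based approach, this needs sorted input but no hashing, and the output stays sorted.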
Thanks to @Starglen and @dinakars777 for the discussion in the issue!

This function computes the intersection of multiple sets efficiently by sorting them by size and using early termination.
@algorithms-keeper

Closing this pull request as invalid

@Devanik21, this pull request is being closed as none of the checkboxes have been marked. It is important that you go through the checklist and mark the ones relevant to this pull request. Please read the Contributing guidelines.

If you're having trouble marking a checkbox, please read the following instructions:

  • Read a point one at a time and think if it is relevant to the pull request or not.
  • If it is, then mark it by putting an x between the square brackets, like so: [x]

NOTE: Only [x] is supported, so if you have put any other letter or symbol between the brackets, that will be marked as invalid. If that is the case, please open a new pull request with the appropriate changes.

algorithms-keeper bot closed this on Apr 1, 2026
algorithms-keeper bot added the awaiting reviews label (This PR is ready to be reviewed) on Apr 1, 2026

Labels

awaiting reviews (This PR is ready to be reviewed), invalid