Skip to content

Conversation

@discivigour
Copy link
Contributor

@discivigour discivigour commented Jan 13, 2026

Purpose

  • The current FullStartingScanner class is a bit bloated. This pr introduces PartialStartingScanner to split some of the features of FullStartingScanner class.

Tests

API and Format

Documentation

@discivigour discivigour marked this pull request as ready for review January 13, 2026 12:56
if idx_of_this_subtask >= number_of_para_subtasks:
raise Exception("idx_of_this_subtask must be less than number_of_para_subtasks")
if self.start_pos_of_this_subtask is not None:
raise Exception("with_shard and with_slice cannot be used simultaneously")
Copy link
Contributor

@XiaoHongbo-Hope XiaoHongbo-Hope Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a suitable Exception or Value type instead of generic Exception, such as ValueError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@discivigour discivigour marked this pull request as draft January 14, 2026 03:58
split.shard_file_idx_map[file.file_name] = (0, plan_end_pos - file_begin_pos)
elif file_end_pos <= plan_start_pos or file_begin_pos >= plan_end_pos:
split.shard_file_idx_map[file.file_name] = (-1, -1)
return file_end_pos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define a constant for (-1, -1), which is important for other developers to understand it.

return filtered_partitioned_files, (plan_start_pos, plan_end_pos)

def _append_only_filter_by_shard(self, partitioned_files: defaultdict) -> (defaultdict, int, int):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar code with _data_evolution_filter_by_shard, can you refactor ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants