[Parquet] Stop converting TSDB block to parquet if it has too many labels #7195

@yeya24

Is your feature request related to a problem? Please describe.
Parquet is a columnar format. However, the parquet library we use has an upper bound on the number of columns in a single parquet file, and exceeding that limit causes the library to panic.

We have a workaround that shards the parquet file during conversion when it detects that the number of columns will exceed the upper bound. However, if the TSDB block has too many labels (each label corresponds to a parquet column), the converter creates a large number of shards, which adds unnecessary complexity and cost to the conversion.

The typical parquet column limit is 32767, while I have seen TSDB blocks with 2M distinct labels. Such a block would be split into more than 60 shards, which seems excessive.
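As a rough check on the shard count above, here is a minimal sketch of the arithmetic. It assumes every distinct label maps to exactly one parquet column and ignores any non-label columns the converter may add. `shardCount` is a hypothetical helper, not part of the converter.

```go
package main

import "fmt"

// shardCount is a hypothetical helper: it returns how many parquet files
// are needed so that no file holds more than maxColumns label columns.
// It assumes one column per distinct label and no other columns.
func shardCount(labelCount, maxColumns int) int {
	if labelCount <= 0 || maxColumns <= 0 {
		return 0
	}
	// Ceiling division: a partially filled last shard still counts.
	return (labelCount + maxColumns - 1) / maxColumns
}

func main() {
	// 2M distinct labels against a 32767-column limit.
	fmt.Println(shardCount(2_000_000, 32767)) // prints 62
}
```

Under these assumptions the 2M-label block needs 62 shards, in line with the estimate of roughly 60.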

Describe the solution you'd like
If the converter finds that a TSDB block has more labels than a configured threshold, it uploads a no-convert marker for that block. On subsequent runs, the converter skips any block that already has a no-convert marker.
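The proposed flow could look roughly like the sketch below. Every name in it (`markerStore`, `memStore`, `shouldConvert`, the `maxLabels` threshold) is hypothetical and chosen for illustration; it does not reflect the converter's actual API, and an in-memory map stands in for object storage.

```go
package main

import "fmt"

// All names below are hypothetical and do not mirror the converter's real API.

// markerStore abstracts the storage where no-convert markers live.
type markerStore interface {
	HasNoConvertMarker(blockID string) bool
	WriteNoConvertMarker(blockID, reason string)
}

// memStore is an in-memory stand-in for a bucket, for illustration only.
type memStore map[string]string

func (m memStore) HasNoConvertMarker(id string) bool      { _, ok := m[id]; return ok }
func (m memStore) WriteNoConvertMarker(id, reason string) { m[id] = reason }

// shouldConvert reports whether a block should be converted to parquet.
// It skips blocks that already carry a marker, and writes a marker for
// blocks whose label count exceeds maxLabels (0 disables the check).
func shouldConvert(s markerStore, blockID string, labelCount, maxLabels int) bool {
	if s.HasNoConvertMarker(blockID) {
		return false
	}
	if maxLabels > 0 && labelCount > maxLabels {
		reason := fmt.Sprintf("block has %d labels, limit is %d", labelCount, maxLabels)
		s.WriteNoConvertMarker(blockID, reason)
		return false
	}
	return true
}

func main() {
	store := memStore{}
	fmt.Println(shouldConvert(store, "block-a", 2_000_000, 100_000)) // false: marker written
	fmt.Println(shouldConvert(store, "block-a", 10, 100_000))        // false: marker already exists
	fmt.Println(shouldConvert(store, "block-b", 500, 100_000))       // true: under the threshold
}
```

Checking for the marker first means an oversized block costs only one marker lookup on later runs, without recounting its labels.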

