afterincomparableyum commented Dec 28, 2025

Integrate existing compression infrastructure (LZ4 and ZSTD) into the C++ client write path. This enables compression during pushData operations, matching the functionality available in the Java client.

Changes:

  • Add compression support to ShuffleClientImpl:
    • Add shuffleCompressionEnabled_ flag and compressor_ member
    • Initialize compressor from CelebornConf in constructor
    • Compress data in pushData() when compression is enabled
    • Use compressed size for batchBytesSize tracking
  • Configuration integration:
    • Read compression codec from celeborn.client.shuffle.compression.codec
    • Read ZSTD compression level from celeborn.client.shuffle.compression.zstd.level
    • Default to NONE (compression disabled)
  • Retry/revive support:
    • Retry path reuses the already-compressed body buffer
    • No re-compression needed during retries
  • Testing:
    • Add CompressorFactoryTest for factory pattern and config integration
    • Add compression config tests to CelebornConfTest
    • Test offset compression support for both LZ4 and ZSTD
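
For reviewers who want the intended flow at a glance, here is a minimal, self-contained sketch of the idea: resolve the codec from config (defaulting to NONE), compress once in the push path, track the compressed size, and reuse the same body buffer on retries. Everything except the config key is hypothetical and for illustration only: `Codec`, `ConfMap`, `resolveCodec`, `ToyRleCompressor`, and `SketchClient` are not the real classes in this PR, the case-insensitive codec parsing is an assumption, the ZSTD level is not modeled, and a toy run-length encoder stands in for LZ4/ZSTD so the sketch has no external dependencies.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; NOT the real Celeborn C++ client types.
enum class Codec { NONE, LZ4, ZSTD };
using ConfMap = std::map<std::string, std::string>;

// Resolve the codec from the config key named in this PR.
// Assumption: values are matched case-insensitively; a missing key means NONE.
Codec resolveCodec(const ConfMap& conf) {
  auto it = conf.find("celeborn.client.shuffle.compression.codec");
  if (it == conf.end()) return Codec::NONE;
  std::string v = it->second;
  std::transform(v.begin(), v.end(), v.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  if (v == "NONE") return Codec::NONE;
  if (v == "LZ4") return Codec::LZ4;
  if (v == "ZSTD") return Codec::ZSTD;
  throw std::invalid_argument("unknown compression codec: " + it->second);
}

// Hypothetical compressor interface.
struct Compressor {
  virtual ~Compressor() = default;
  virtual std::vector<uint8_t> compress(const std::vector<uint8_t>& in) = 0;
};

// Toy run-length encoder used in place of LZ4/ZSTD. Emits (count, byte) pairs.
struct ToyRleCompressor : Compressor {
  int calls = 0;  // counts compress() invocations so retries can be checked
  std::vector<uint8_t> compress(const std::vector<uint8_t>& in) override {
    ++calls;
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < in.size();) {
      std::size_t j = i;
      while (j < in.size() && in[j] == in[i] && j - i < 255) ++j;
      out.push_back(static_cast<uint8_t>(j - i));
      out.push_back(in[i]);
      i = j;
    }
    return out;
  }
};

// Hypothetical client showing compress-once, retry-with-same-buffer.
struct SketchClient {
  std::shared_ptr<Compressor> compressor;  // null when the codec is NONE
  std::size_t batchBytesSize = 0;
  int sendAttempts = 0;

  // Simulates `failuresBeforeSuccess` failed sends followed by one success.
  // Returns the body that was sent.
  std::vector<uint8_t> pushData(const std::vector<uint8_t>& data,
                                int failuresBeforeSuccess) {
    // Compress exactly once, before the first send attempt.
    std::vector<uint8_t> body = compressor ? compressor->compress(data) : data;
    // Track the size that actually goes on the wire.
    batchBytesSize += body.size();
    for (int attempt = 0;; ++attempt) {
      ++sendAttempts;  // every attempt reuses `body`; no re-compression
      if (attempt >= failuresBeforeSuccess) return body;
    }
  }
};
```

Building the body before the retry loop is what makes the "no re-compression" property hold by construction: the retry path never sees the uncompressed input.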

How was this patch tested?

Unit tests, and by verifying that the code compiles.

afterincomparableyum commented Dec 28, 2025

@HolyLow this PR is a WIP; I will rebase it onto main after #3568 is merged.

This is the commit for write compression: 6abae43

afterincomparableyum changed the title from "[WIP][CELEBORN-2221][CIP-14] Add support for write compression in CppClient" to "[CELEBORN-2221][CIP-14] Support writing with compression in C++ client" on Jan 17, 2026

afterincomparableyum marked this pull request as ready for review on January 17, 2026 23:22

@afterincomparableyum

@HolyLow @SteNicholas @FMX @RexXiong Could you please help review this PR? Appreciate your help in improving this as needed!
