Skip to content

deprecate blosc enums#3963

Open
d-v-b wants to merge 15 commits intozarr-developers:mainfrom
d-v-b:deprecate-blosc-enums
Open

deprecate blosc enums#3963
d-v-b wants to merge 15 commits intozarr-developers:mainfrom
d-v-b:deprecate-blosc-enums

Conversation

@d-v-b
Copy link
Copy Markdown
Contributor

@d-v-b d-v-b commented May 10, 2026

This PR deprecates the string enums used for the blosc codec. Literal strings are used instead. We only use enums in a few places -- Literal strings are much more abundant for representing constant string types -- and I think that's because enums in python are rather cumbersome compared to Literal[<str>].

If this PR has legs, I would follow up with similar deprecations for the other enums (used in the bytes and sharding codecs).

closes partially addresses #3457

d-v-b and others added 7 commits May 10, 2026 13:03
Design for steering BloscCodec users toward literal-string parameters,
with the enum classes kept importable but deprecated on member access.

Canonical: https://gist.github.com/d-v-b/9fd3fe92f82a24c929129f42a6f11f60

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member access on BloscShuffle / BloscCname now emits DeprecationWarning
and returns the equivalent string. BloscCodec stores cname/shuffle as
literal strings; passing a real enum.Enum instance to __init__ warns.

BloscShuffle.from_int returns a str. Internal call sites in
migrate_v3 continue to work because BloscCodec accepts both forms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The BloscShuffle and BloscCname enums are deprecated; update doc
examples to the recommended literal-string form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 0000 filename is a placeholder; rename to the PR number when the
pull request is opened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The design doc is published as a public gist linked from the PR
description; the in-tree copy is no longer needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace Sphinx :class: roles and double-backticks with single
backticks in the new docstrings added by the blosc enum deprecation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-v-b d-v-b requested a review from ilan-gold May 10, 2026 21:27
@codecov
Copy link
Copy Markdown

codecov Bot commented May 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.27%. Comparing base (ad90884) to head (33a2023).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3963      +/-   ##
==========================================
+ Coverage   93.26%   93.27%   +0.01%     
==========================================
  Files          87       87              
  Lines       11721    11739      +18     
==========================================
+ Hits        10932    10950      +18     
  Misses        789      789              
Files with missing lines Coverage Δ
src/zarr/codecs/blosc.py 95.77% <100.00%> (+0.61%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

d-v-b and others added 4 commits May 11, 2026 00:03
Add tests for the ValueError paths in _parse_cname / _parse_shuffle
and the AttributeError path in the deprecated-enum metaclass, which
codecov reported as uncovered. Drop a few "type: ignore" markers
that mypy now flags as unused after the init signature widened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mypy's per-file (pre-commit) and project-wide views disagree on
whether the deliberately-wrong arguments need a type ignore. Using
typing.cast keeps both views happy and is more explicit about what
each test is asserting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-v-b d-v-b added the downstream Downstream libraries using zarr label May 11, 2026
Copy link
Copy Markdown
Contributor

@ilan-gold ilan-gold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closes #3457

Don't think this Pr does that.

Not sure why we can't use Literal in the strings https://zarr--3963.org.readthedocs.build/en/3963/api/zarr/codecs/#zarr.codecs.BloscCodec for the docs.

Any reason we are not using https://github.com/tox-dev/sphinx-autodoc-typehints?

Comment thread src/zarr/codecs/blosc.py Outdated
CName = Literal["lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd"]
"""The codec identifiers used in the blosc codec """

CNAME: Final = ("lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_args is only useful at runtime, it doesn't help you with type checking. for a small number of literal strings that won't change, writing them out explicitly as a final collection does help with runtime and type checking.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this CNAME all-caps isn't used for anything except runtime checking unless I am missing something, no?

Copy link
Copy Markdown
Contributor

@ilan-gold ilan-gold May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, why even define it? get_args(CName) + lru_cache if there is a performance concern

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's 6 strings that are defined in a spec. given the choice between get_args + lru_cache (two imports, produces something untyped) and just binding the 6 strings to a constant, the static constant is far simpler (no imports, gets typed correctly).

Anyone who wants to iterate over the values defined in the literal type needs a way of making those values. We should have been doing this in our tests, but we were not. I added some tests that exercise this.

Comment thread src/zarr/codecs/blosc.py Outdated
Comment thread tests/test_codecs/test_blosc.py Outdated
Comment thread tests/test_codecs/test_blosc.py
Comment thread tests/test_codecs/test_blosc.py Outdated
Comment thread src/zarr/codecs/blosc.py
@d-v-b
Copy link
Copy Markdown
Contributor Author

d-v-b commented May 11, 2026

Any reason we are not using https://github.com/tox-dev/sphinx-autodoc-typehints?

we aren't using sphinx!

d-v-b and others added 3 commits May 11, 2026 12:35
Rename Shuffle -> BloscShuffleLiteral and CName -> BloscCnameLiteral.
The bare Shuffle name collided with numcodecs.Shuffle re-exported
from zarr.codecs, which would have caused mkdocstrings cross-refs in
the BloscCodec docstring to resolve to the wrong symbol. The Literal
suffix also clearly distinguishes the type alias from the deprecated
BloscShuffle / BloscCname enum classes.

Update the BloscCodec docstring to reference the new names in the
Attributes and Parameters sections (Convention A from cast_value.py),
with literal values enumerated in prose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Parametrize the BloscShuffle / BloscCname member-access warning tests
  into a single test_blosc_enum_member_access_warns.
- Parametrize the cname / shuffle reject-unknown tests into a single
  test_blosc_codec_rejects_unknown driven by **kwargs.
- Parametrize the AttributeError-on-unknown-member tests into a single
  test_blosc_enum_attribute_error_for_unknown_member.
- Add a docstring to every new test explaining what behavior it verifies,
  so reviewers don't have to read the body to understand the intent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-v-b
Copy link
Copy Markdown
Contributor Author

d-v-b commented May 11, 2026

closes #3457

Don't think this Pr does that.

good catch, I updated the PR summary to say "partially addresses"

@ilan-gold
Copy link
Copy Markdown
Contributor

we aren't using sphinx!

Oh right, mkdocs is completely separate - for a moment I was thinking it used sphinx as a backbone, but I guess not

…trip

Backfill missing coverage: previously every test in the blosc suite used
only "lz4" or "zstd" for cname and only "bitshuffle" or "shuffle" for
shuffle. Add four parametrized tests driven by BLOSC_CNAME / BLOSC_SHUFFLE:

- accepts_all_cnames / accepts_all_shuffles: every value in the runtime
  tuple is accepted by BloscCodec and round-trips on the stored attribute.
  Catches drift between the BloscCnameLiteral / BloscShuffleLiteral type
  aliases and their runtime BLOSC_* counterparts.
- json_roundtrip_all_cnames / json_roundtrip_all_shuffles: BloscCodec
  to_dict / from_dict preserves every value. Codec fields are fully
  specified so the test doesn't trip over tunable-attribute state, which
  is not part of the JSON form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +172 to +173
@pytest.mark.parametrize("shuffle", BLOSC_SHUFFLE)
def test_blosc_codec_accepts_all_shuffles(shuffle: BloscShuffleLiteral) -> None:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilan-gold this demonstrates the utility of a value that matches the literal values. making this type-safe with get_args requires casting boilerplate, because get_args returns tuple[Any,...].

Comment thread src/zarr/codecs/blosc.py
The data type size in bytes used for shuffle filtering.
cname : BloscCname
The compression algorithm being used (lz4, lz4hc, blosclz, snappy, zlib, or zstd).
cname : BloscCnameLiteral
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we fully deprecate the enums, the BloscCname identifier will be available, and I would bind it to the literal type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

downstream Downstream libraries using zarr

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants