deprecate blosc enums#3963
Conversation
Design for steering BloscCodec users toward literal-string parameters, with the enum classes kept importable but deprecated on member access. Canonical: https://gist.github.com/d-v-b/9fd3fe92f82a24c929129f42a6f11f60 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member access on BloscShuffle / BloscCname now emits DeprecationWarning and returns the equivalent string. BloscCodec stores cname/shuffle as literal strings; passing a real enum.Enum instance to __init__ warns. BloscShuffle.from_int returns a str. Internal call sites in migrate_v3 continue to work because BloscCodec accepts both forms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The BloscShuffle and BloscCname enums are deprecated; update doc examples to the recommended literal-string form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 0000 filename is a placeholder; rename to the PR number when the pull request is opened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The design doc is published as a public gist linked from the PR description; the in-tree copy is no longer needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace Sphinx :class: roles and double-backticks with single backticks in the new docstrings added by the blosc enum deprecation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3963 +/- ##
==========================================
+ Coverage 93.26% 93.27% +0.01%
==========================================
Files 87 87
Lines 11721 11739 +18
==========================================
+ Hits 10932 10950 +18
Misses 789 789
🚀 New features to boost your workflow:
|
Add tests for the ValueError paths in _parse_cname / _parse_shuffle and the AttributeError path in the deprecated-enum metaclass, which codecov reported as uncovered. Drop a few "type: ignore" markers that mypy now flags as unused after the init signature widened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-python into deprecate-blosc-enums
mypy's per-file (pre-commit) and project-wide views disagree on whether the deliberately-wrong arguments need a type ignore. Using typing.cast keeps both views happy and is more explicit about what each test is asserting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
closes #3457
Don't think this Pr does that.
Not sure why we can't use Literal in the strings https://zarr--3963.org.readthedocs.build/en/3963/api/zarr/codecs/#zarr.codecs.BloscCodec for the docs.
Any reason we are not using https://github.com/tox-dev/sphinx-autodoc-typehints?
| CName = Literal["lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd"] | ||
| """The codec identifiers used in the blosc codec """ | ||
|
|
||
| CNAME: Final = ("lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd") |
There was a problem hiding this comment.
There was a problem hiding this comment.
get_args is only useful at runtime, it doesn't help you with type checking. for a small number of literal strings that won't change, writing them out explicitly as a final collection does help with runtime and type checking.
There was a problem hiding this comment.
But this CNAME all-caps isn't used for anything except runtime checking unless I am missing something, no?
There was a problem hiding this comment.
Like, why even define it? get_args(CName) + lru_cache if there is a performance concern
There was a problem hiding this comment.
it's 6 strings that are defined in a spec. given the choice between get_args + lru_cache (two imports, produces something untyped) and just binding the 6 strings to a constant, the static constant is far simpler (no imports, gets typed correctly).
Anyone who wants to iterate over the values defined in the literal type needs a way of making those values. We should have been doing this in our tests, but we were not. I added some tests that exercise this.
we aren't using sphinx! |
Rename Shuffle -> BloscShuffleLiteral and CName -> BloscCnameLiteral. The bare Shuffle name collided with numcodecs.Shuffle re-exported from zarr.codecs, which would have caused mkdocstrings cross-refs in the BloscCodec docstring to resolve to the wrong symbol. The Literal suffix also clearly distinguishes the type alias from the deprecated BloscShuffle / BloscCname enum classes. Update the BloscCodec docstring to reference the new names in the Attributes and Parameters sections (Convention A from cast_value.py), with literal values enumerated in prose. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Parametrize the BloscShuffle / BloscCname member-access warning tests into a single test_blosc_enum_member_access_warns. - Parametrize the cname / shuffle reject-unknown tests into a single test_blosc_codec_rejects_unknown driven by **kwargs. - Parametrize the AttributeError-on-unknown-member tests into a single test_blosc_enum_attribute_error_for_unknown_member. - Add a docstring to every new test explaining what behavior it verifies, so reviewers don't have to read the body to understand the intent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
good catch, I updated the PR summary to say "partially addresses" |
Oh right, mkdocs is completely separate - for a moment I was thinking it used sphinx as a backbone, but I guess not |
…trip Backfill missing coverage: previously every test in the blosc suite used only "lz4" or "zstd" for cname and only "bitshuffle" or "shuffle" for shuffle. Add four parametrized tests driven by BLOSC_CNAME / BLOSC_SHUFFLE: - accepts_all_cnames / accepts_all_shuffles: every value in the runtime tuple is accepted by BloscCodec and round-trips on the stored attribute. Catches drift between the BloscCnameLiteral / BloscShuffleLiteral type aliases and their runtime BLOSC_* counterparts. - json_roundtrip_all_cnames / json_roundtrip_all_shuffles: BloscCodec to_dict / from_dict preserves every value. Codec fields are fully specified so the test doesn't trip over tunable-attribute state, which is not part of the JSON form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| @pytest.mark.parametrize("shuffle", BLOSC_SHUFFLE) | ||
| def test_blosc_codec_accepts_all_shuffles(shuffle: BloscShuffleLiteral) -> None: |
There was a problem hiding this comment.
@ilan-gold this demonstrates the utility of a value that matches the literal values. making this type-safe with get_args requires casting boilerplate, because get_args returns tuple[Any,...].
| The data type size in bytes used for shuffle filtering. | ||
| cname : BloscCname | ||
| The compression algorithm being used (lz4, lz4hc, blosclz, snappy, zlib, or zstd). | ||
| cname : BloscCnameLiteral |
There was a problem hiding this comment.
Once we fully deprecate the enums, the BloscCname identifier will be available, and I would bind it to the literal type.
This PR deprecates the string enums used for the blosc codec.
Literalstrings are used instead. We only use enums in a few places -- Literal strings are much more abundant for representing constant string types -- and I think that's because enums in python are rather cumbersome compared toLiteral[<str>].If this PR has legs, I would follow up with similar deprecations for the other enums (used in the bytes and sharding codecs).
closespartially addresses #3457