Skip to content

GH-47435: [Python][Parquet] Add direct key encryption/decryption API#49667

Merged
adamreeve merged 11 commits into
apache:mainfrom
smaheshwar-pltr:parquet-direct-key-encryption
May 24, 2026
Merged

GH-47435: [Python][Parquet] Add direct key encryption/decryption API#49667
adamreeve merged 11 commits into
apache:mainfrom
smaheshwar-pltr:parquet-direct-key-encryption

Conversation

@smaheshwar-pltr

@smaheshwar-pltr smaheshwar-pltr commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

Rationale for this change

See #47435.

What changes are included in this PR?

Adds direct encryption / decryption Python API

Are these changes tested?

Yes, see PR.

Are there any user-facing changes?

Yes, new Python bindings.

@github-actions

github-actions Bot commented Apr 6, 2026

Copy link
Copy Markdown

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@smaheshwar-pltr smaheshwar-pltr changed the title Support Parquet direct encryption [Python][Parquet] Add direct key encryption/decryption API Apr 7, 2026
@smaheshwar-pltr

Copy link
Copy Markdown
Contributor Author

cc @ggershinsky, this is a requirement for PyIceberg encryption support, see apache/iceberg-python#3221

@ggershinsky

Copy link
Copy Markdown
Contributor

sgtm, #47435 (comment)

@smaheshwar-pltr smaheshwar-pltr changed the title [Python][Parquet] Add direct key encryption/decryption API GH-47435: [Python][Parquet] Add direct key encryption/decryption API Apr 8, 2026
@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

⚠️ GitHub issue #47435 has been automatically assigned in GitHub to PR creator.

1 similar comment
@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

⚠️ GitHub issue #47435 has been automatically assigned in GitHub to PR creator.

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
Comment on lines +726 to +733
This bypasses the KMS-based :class:`CryptoFactory` API and directly
constructs decryption properties from a plaintext key. This is useful
when the caller manages key wrapping externally (e.g. via an
application-level envelope encryption scheme).

For most use cases, prefer the higher-level :class:`CryptoFactory`
with :class:`DecryptionConfiguration`, which handles envelope
encryption and key rotation automatically.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #47435 (comment), adding this documentation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's true that the CryptoFactory handles key rotation automatically, that part of the comment should probably be removed. It does allow key rotation if you use external key material though.

@smaheshwar-pltr smaheshwar-pltr marked this pull request as ready for review April 8, 2026 19:06
@smaheshwar-pltr

Copy link
Copy Markdown
Contributor Author

Not sure who to ping here for review, @AlenkaF @raulcd @rok maybe could I please have some eyes on this?

@adamreeve adamreeve left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this @smaheshwar-pltr! I've left a few suggestions

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
Comment on lines +726 to +733
This bypasses the KMS-based :class:`CryptoFactory` API and directly
constructs decryption properties from a plaintext key. This is useful
when the caller manages key wrapping externally (e.g. via an
application-level envelope encryption scheme).

For most use cases, prefer the higher-level :class:`CryptoFactory`
with :class:`DecryptionConfiguration`, which handles envelope
encryption and key rotation automatically.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's true that the CryptoFactory handles key rotation automatically, that part of the comment should probably be removed. It does allow key rotation if you use external key material though.

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
@github-actions github-actions Bot added awaiting review Awaiting review awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 17, 2026
@github-actions github-actions Bot added awaiting review Awaiting review and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Apr 17, 2026
@smaheshwar-pltr

Copy link
Copy Markdown
Contributor Author

@adamreeve, thank you for the great review here 🙌 . I've reworked the docs now - they are consistent with my (inexperienced) understanding.

Would appreciate your eyes on this again, please let me know what you think!

@github-actions

Copy link
Copy Markdown

Revision: 3e529d5

Submitted crossbow builds: ursacomputing/crossbow @ actions-4b28d0a71b

Task Status
preview-docs GitHub Actions

@adamreeve adamreeve left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @smaheshwar-pltr, I just noticed one more minor issue that I missed first time around sorry.

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 20, 2026
@adamreeve

adamreeve commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

@smaheshwar-pltr, can you please comment on #47435 so I can assign it to you? Eg. just a "take" comment.

Then I'll merge this, the CI failures are unrelated and also failing on main.

@adamreeve

Copy link
Copy Markdown
Contributor

I've created #49821 as a follow-up issue to add support for per-column keys.


cdef cppclass CFileDecryptionPropertiesBuilder\
" parquet::FileDecryptionProperties::Builder":
CFileDecryptionPropertiesBuilder() except +

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all these APIs can raise C++ exceptions, which kind of exceptions will be raised on the Python side?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I don't think these specific calls actually reach a (C++) throw on the paths we hit, so this is being defensive. We do validations python-side that throw TypeError and ValueError on e.g. invalid length / algorithm.)

If an exception was raised somehow though, I think we'd be fine and are consistent with elsewhere in this file - from https://cython.readthedocs.io/en/latest/src/userguide/wrapping_CPlusPlus.html#exceptions, the except + here would surface it as a RuntimeError.

For example, FileMetaData::AppendRowGroups is also exposed as except +

void AppendRowGroups(const CFileMetaData& other) except +

and reachably throws ParquetException

throw ParquetException(msg);

which would Cython surfaces as RuntimeError - we even have a test for this:

with pytest.raises(RuntimeError, match=prefix + expected_error):

Comment on lines +736 to +738
b"0123456789abcdef",
b"0123456789abcdef01234567",
b"0123456789abcdef0123456789abcdef",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: reuse the KEY_ constants above?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, addressed in 640b66c (I also lifted to module scope to match the other constants)

dec_props_no_aad = pe.create_decryption_properties(
footer_key=self.KEY_128,
)
with pytest.raises(IOError):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test something about the exception message?

Suggested change
with pytest.raises(IOError):
with pytest.raises(IOError, match='XXX...'):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, addressed in 640b66c.

footer_key=self.KEY_128,
aad_prefix=b"wrong_prefix",
)
with pytest.raises(IOError):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, addressed in 640b66c.

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
CFileDecryptionPropertiesBuilder* builder
shared_ptr[CFileDecryptionProperties] props

footer_key_bytes = tobytes(footer_key)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't call tobytes as it will utf8-encode a str object.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, addressed in 640b66c I think with a isinstance(footer_key, bytes) check.

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
builder.footer_key(c_footer_key)

if aad_prefix is not None:
c_aad_prefix = tobytes(aad_prefix)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - #49667 (comment)

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
)

c_footer_key = CSecureString(<c_string>footer_key_bytes)
builder = new CFileDecryptionPropertiesBuilder()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using new? We can create a plain value, I think.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, switched to CFileDecryptionPropertiesBuilder in 640b66c as described

Comment thread python/pyarrow/_parquet_encryption.pyx Outdated
shared_ptr[CFileEncryptionProperties] props
ParquetCipher cipher

footer_key_bytes = tobytes(footer_key)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here and other instances below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I had to revert this specifically in 72029b6 due to https://github.com/apache/arrow/actions/runs/25529723022/job/76124851585 - CI failed with "C++ class must have a nullary constructor to be stack allocated". parquet::FileEncryptionProperties::Builder only declares explicit Builder(SecureString footer_key) (no default ctor), so Cython can't stack-allocate it.

@smaheshwar-pltr smaheshwar-pltr force-pushed the parquet-direct-key-encryption branch from 5c018f7 to 640b66c Compare May 2, 2026 01:06
@smaheshwar-pltr

Copy link
Copy Markdown
Contributor Author

Thank you for the review @pitrou 🙏 I believe all comments are now addressed, PTAL!

@smaheshwar-pltr

Copy link
Copy Markdown
Contributor Author

Friendly ping @adamreeve @pitrou 🙏

@smaheshwar-pltr

Copy link
Copy Markdown
Contributor Author

Friendly ping here @adamreeve @pitrou / others! Would love to get this in / reviewed!

@adamreeve adamreeve left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest changes look good to me thanks @smaheshwar-pltr.

The build timing out is a problem on other branches too so looks unrelated to this change.

@smaheshwar-pltr

Copy link
Copy Markdown
Contributor Author

Thanks @adamreeve I've pushed an empty commit in 5dd54ca to retrigger 🙏

@adamreeve

Copy link
Copy Markdown
Contributor

The timeout is happening consistently but doesn't need to pass for this PR to be merged. I've raised #49998 for this.

I'll leave this PR open a while longer in case @pitrou wants to take another look.

@adamreeve adamreeve merged commit 336fdb3 into apache:main May 24, 2026
17 of 18 checks passed
@adamreeve adamreeve removed the awaiting committer review Awaiting committer review label May 24, 2026
@conbench-apache-arrow

Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 336fdb3.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.

Mottl pushed a commit to Mottl/arrow that referenced this pull request May 26, 2026
…n API (apache#49667)

### Rationale for this change

See apache#47435.

### What changes are included in this PR?

Adds direct encryption / decryption Python API 

### Are these changes tested?

Yes, see PR.

### Are there any user-facing changes?

Yes, new Python bindings.
* GitHub Issue: apache#47435

Authored-by: Sreesh Maheshwar <smaheshwar@palantir.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants