[WIP][python] Add user identity-based token cache for RESTTokenFileIO #7564
Closed
shyjsarah wants to merge 4 commits into apache:master
- Add user_identity parameter to RESTTokenFileIO constructor
- Use (path, user_identity) as token cache key to ensure proper isolation between different users accessing the same table location
- Add _extract_user_identity() method in RESTCatalog to extract user identity from authentication provider (Bearer Token, DLF AccessKey, ECS Role, etc.)
- Implement double-checked locking pattern for thread-safe token caching
- Add comprehensive unit tests covering:
  * Multi-user isolation scenarios
  * Token cache reuse for same user
  * Table rename preservation (path-based caching)
  * Token expiry checking

This prevents token cache pollution where different users would share the same data token when accessing the same table.
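The caching approach described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `DataToken`, `TokenCache`, `get_or_load`, and the `loader` callback are hypothetical names, and the real `RESTTokenFileIO` may structure its cache and expiry check differently.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class DataToken:
    # Hypothetical token record; the real token type in the PR may differ.
    value: str
    expires_at_ms: int


class TokenCache:
    """Illustrative cache keyed by (path, user_identity)."""

    def __init__(self) -> None:
        self._tokens: Dict[Tuple[str, str], DataToken] = {}
        self._lock = threading.Lock()

    @staticmethod
    def _is_valid(token: Optional[DataToken]) -> bool:
        # A token is usable only if it exists and has not expired yet.
        now_ms = int(time.time() * 1000)
        return token is not None and token.expires_at_ms > now_ms

    def get_or_load(
        self, path: str, user_identity: str, loader: Callable[[], DataToken]
    ) -> DataToken:
        key = (path, user_identity)
        # First check, without the lock: the common case is a cache hit.
        token = self._tokens.get(key)
        if self._is_valid(token):
            return token
        with self._lock:
            # Second check, under the lock: another thread may have
            # refreshed the token while this one was waiting.
            token = self._tokens.get(key)
            if not self._is_valid(token):
                token = loader()
                self._tokens[key] = token
            return token
```

With this shape, two callers that share a path but have different identities end up with separate cache entries, while repeated calls for the same `(path, user_identity)` pair reuse the cached token until it expires.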
Use actual access_key_id from token for all DLF authentication types:
- ECS Role and STS File now extract the real access_key_id (STS.xxx) instead of role name or file path
- Ensures consistent user identity identification across all auth methods
- Simplified logic: all DLF auth types use 'dlf:{access_key_id}' format
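As a rough illustration of the identity formats mentioned in this PR, the sketch below maps an auth provider to an identity string. The classes `BearerAuth`, `DlfToken`, and `DlfAuth` are hypothetical stand-ins, not the actual pypaimon auth provider types, and raising on an unknown provider is an assumption about the intended behavior.

```python
from dataclasses import dataclass


@dataclass
class BearerAuth:
    # Hypothetical stand-in for a bearer-token auth provider.
    token: str


@dataclass
class DlfToken:
    # Hypothetical DLF token; access_key_id looks like "STS.xxx" for
    # temporary credentials obtained via ECS Role or an STS file.
    access_key_id: str


@dataclass
class DlfAuth:
    # Hypothetical stand-in covering all DLF authentication types.
    token: DlfToken


def extract_user_identity(auth_provider) -> str:
    """Return an identity string suitable for use in a cache key."""
    if isinstance(auth_provider, BearerAuth):
        return f"bear:{auth_provider.token}"
    if isinstance(auth_provider, DlfAuth):
        # All DLF auth types use the actual access key id from the token.
        return f"dlf:{auth_provider.token.access_key_id}"
    # Assumption: unknown providers are rejected rather than given a default.
    raise ValueError(f"Unsupported auth provider: {type(auth_provider).__name__}")
```

Using the access key id from the resolved token, rather than a role name or file path, means the identity follows the credentials actually in use.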
Remove trailing whitespace in blank lines to comply with PEP 8. No functional changes, only formatting fixes.
…arer token, remove fallback
Purpose
This PR implements user identity-based token cache isolation in `RESTTokenFileIO` to prevent multiple users from sharing the same data token when accessing the same table location.

Problem: The previous implementation used only the table identifier as the token cache key, causing token cache pollution when multiple users accessed the same table within a single process. All users would share the same data token regardless of their actual authentication credentials.
Solution:
- Use `(path, user_identity)` as the token cache key to ensure proper isolation between different users

Key Changes:
- Add `user_identity` parameter to `RESTTokenFileIO.__init__()` - stores user identity for cache key construction
- Add `_extract_user_identity()` method in `RESTCatalog` to extract user identity from auth provider:
  * Bearer token: `"bear:{token}"`
  * DLF: `"dlf:{access_key_id}"` (actual AK from token, e.g., STS.xxx)
- Update `try_to_refresh_token()` to use the `(path, user_identity)` tuple as cache key
- Pass `user_identity` when creating `RESTTokenFileIO` instances in `file_io_for_data()`

Tests
New Unit Tests (`test_token_cache_isolation.py`):
- `test_different_users_have_separate_token_cache` - Verifies different users get separate tokens for same path
- `test_same_user_reuses_token_cache` - Verifies same user reuses cached token
- `test_table_rename_preserves_token_cache` - Verifies table rename doesn't break token cache (path-based)
- `test_empty_user_identity_isolation` - Verifies empty user identity handling
- `test_token_cache_with_expiry_check` - Verifies token expiration checking logic
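
For readers without access to the diff, the first two tests might look roughly like the sketch below. `FakeTokenFileIO` is a hypothetical stand-in written for this example, not the real `RESTTokenFileIO`, and the paths and identities are made up.

```python
import unittest


class FakeTokenFileIO:
    """Hypothetical stand-in for RESTTokenFileIO, for illustration only."""

    # Shared across instances, keyed by (path, user_identity).
    _cache = {}

    def __init__(self, path, user_identity):
        self.path = path
        self.user_identity = user_identity

    def try_to_refresh_token(self):
        key = (self.path, self.user_identity)
        if key not in FakeTokenFileIO._cache:
            # Pretend to fetch a fresh data token for this user.
            serial = len(FakeTokenFileIO._cache)
            FakeTokenFileIO._cache[key] = f"token:{self.user_identity}:{serial}"
        return FakeTokenFileIO._cache[key]


class TokenCacheIsolationSketch(unittest.TestCase):
    def setUp(self):
        # Start every test with an empty shared cache.
        FakeTokenFileIO._cache.clear()

    def test_different_users_have_separate_token_cache(self):
        path = "oss://bucket/db/tbl"
        token_a = FakeTokenFileIO(path, "dlf:STS.A").try_to_refresh_token()
        token_b = FakeTokenFileIO(path, "dlf:STS.B").try_to_refresh_token()
        self.assertNotEqual(token_a, token_b)
        self.assertEqual(len(FakeTokenFileIO._cache), 2)

    def test_same_user_reuses_token_cache(self):
        path = "oss://bucket/db/tbl"
        first = FakeTokenFileIO(path, "dlf:STS.A").try_to_refresh_token()
        second = FakeTokenFileIO(path, "dlf:STS.A").try_to_refresh_token()
        self.assertEqual(first, second)
        self.assertEqual(len(FakeTokenFileIO._cache), 1)
```

The sketch can be run with `python -m unittest` once saved to a file.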