Aaronb/vendor base 64 poc #8578
Open: 7188ce06 wants to merge 3 commits into clerk:main from 7188ce06:aaronb/vendor-base-64-poc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| # Vendored: `base-64` | ||
|
|
||
| - **Upstream:** https://github.com/mathiasbynens/base64 | ||
| - **Vendored version:** 1.0.0 (published 2020-12-12) | ||
| - **License:** MIT (see `upstream/LICENSE-MIT.txt`) | ||
| - **Maintainer (npm):** `mathias` (single) | ||
| - **Owner inside Clerk:** `@clerk/expo` maintainers | ||
|
|
||
| ## Why this is vendored | ||
|
|
||
| `base-64` is the userland `atob`/`btoa` implementation that `@clerk/expo` polyfills onto `global` for Hermes engines that lack native versions (see `packages/expo/src/polyfills/base64Polyfill.ts`). When `@clerk/expo` is installed by a customer, the published tarball declares `base-64` as a runtime external. The customer's package manager resolves `^1.0.0` against the npm registry at install time and fetches `base-64` fresh — Clerk's own `pnpm-lock.yaml` is not in the published tarball and does not participate in the customer's install. | ||
|
|
||
| That install-time fetch, combined with the fact that `base-64`'s exports become **`global.btoa` and `global.atob` inside every Clerk-Expo customer's running app**, makes this dependency a high-leverage supply-chain target. Two attack chains motivate vendoring: | ||
|
|
||
| ### Chain 1 — Publisher account compromise | ||
|
|
||
| `base-64` is single-maintainer. If `mathias`'s npm account is compromised (phishing, token theft, hostile transfer, social engineering) and a malicious `base-64@1.0.1` is published, every customer install of `@clerk/expo` after the publish resolves `^1.0.0` to `1.0.1` and pulls the compromised bytes. The polyfill assigns the compromised `encode`/`decode` to `global.btoa`/`global.atob`. Every subsequent `btoa()` or `atob()` call anywhere in the customer's app — including third-party libraries and Clerk's own runtime — runs through the compromised implementation, silently. | ||
|
|
||
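| Historical precedents for malicious or sabotaged releases shipped through a trusted publishing channel: `event-stream` (2018, publish rights handed to a new maintainer who added a malicious dependency), `ua-parser-js` (2021, maintainer's npm account hijacked), `colors.js`/`faker.js` (2022, sabotaged by their own maintainer), `xz-utils` (2024, backdoor inserted by a co-maintainer who had built trust over years; not an npm package). Only `ua-parser-js` was an account compromise in the strict sense, but in every case consumers who accepted new versions automatically received the bad release. | ||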
|
|
||
| Pinning to an exact version (`"base-64": "1.0.0"` instead of `"^1.0.0"`) would close Chain 1 for direct dependencies: the customer's resolver would never pick up `1.0.1`. It would not close the second chain: | ||
|
|
||
| ### Chain 2 — Registry-level same-version substitution | ||
|
|
||
| If the npm registry itself serves substituted bytes for `base-64@1.0.0` (registry compromise, malicious unpublish-then-republish, npm-internal account compromise), the customer's first install of `@clerk/expo` fetches the substituted bytes, computes their hash, and records it as the trusted reference in their lockfile. There is no prior hash to compare against. Future installs with `--frozen-lockfile` "verify" against the now-poisoned hash. | ||
|
|
||
| Exact-pinning does not address Chain 2 — the resolver still routes through the registry for `1.0.0`, and whatever bytes the registry serves are what the customer gets. Vendoring is the only mechanism that closes both chains: the customer's resolver never fetches `base-64` from the npm registry because the bytes ship inside the `@clerk/expo` npm tarball. | ||
|
|
||
| | | Caret range | Exact pin | Vendored | | ||
| |---|---|---|---| | ||
| | Chain 1 (new version) | ❌ | ✅ | ✅ | | ||
| | Chain 2 (same-version substitution, first install) | ❌ | ❌ | ✅ | | ||
|
|
||
| See `Sessions/S161/PROPOSAL.md` for the broader proposal. | ||
|
|
||
| ## What's in `upstream/` | ||
|
|
||
| `upstream/` is a **byte-for-byte copy of the published `base-64@1.0.0` npm tarball.** Nothing in that directory has been modified by Clerk. | ||
|
|
||
| ``` | ||
| upstream/ | ||
| ├── base64.js (~164 lines, single-file UMD; exports {encode, decode, version}) | ||
| ├── package.json (upstream's; see "inert fields" below) | ||
| ├── LICENSE-MIT.txt (MIT) | ||
| └── README.md (upstream's README) | ||
| ``` | ||
|
|
||
| ### Inert fields in `upstream/package.json` | ||
|
|
||
| The upstream `package.json` is preserved so future refresh diffs against new `base-64` versions match byte-for-byte against `npm pack` output. These fields are **inert in this location** — they do nothing here: | ||
|
|
||
| - `scripts.*` — not executed; no install lifecycle runs against vendored code. | ||
| - `main: "base64.js"` — bundlers do not walk inner `package.json` of nested `src/vendor/` directories for relative imports; the Clerk-side `index.ts` (in this directory) handles resolution explicitly. | ||
|
|
||
| ## How consumers import it | ||
|
|
||
| Inside `@clerk/expo`: | ||
|
|
||
| ```ts | ||
| import { decode, encode } from '../vendor/base-64'; | ||
| ``` | ||
|
|
||
| The Clerk-side `index.ts` shim re-exports from `./upstream/base64.js` with typed signatures, abstracting the bundler-resolution detail (see `index.ts`). | ||
|
|
||
| ## Refreshing from upstream | ||
|
|
||
| `upstream/` is intentionally frozen. Don't routinely sync. | ||
|
|
||
| Refresh **only** when: | ||
|
|
||
| - A CVE is reported against `mathiasbynens/base64` upstream, OR | ||
| - A spec-relevant bug is discovered. | ||
|
|
||
| `base-64@1.0.0` (published 2020-12-12) is the only upstream release since 2014. Any newer upstream release should be treated as anomalous and investigated before adoption. | ||
|
|
||
| Procedure: | ||
|
|
||
| 1. `npm pack base-64@<new-version>` in a scratch directory; extract. | ||
| 2. `diff -r` against `upstream/`. | ||
| 3. Read every changed line. Apply Clerk's threat model — is this a fix you want, or a behavior change you don't? | ||
| 4. If accepting: replace `upstream/` with the new tarball contents in one commit (no other changes). | ||
| 5. Bump the `base-64` devDependency to the same version, then re-run `parity.spec.ts` so the comparison runs against the new upstream release rather than the old one. | ||
| 6. Update the vendored version in this README. | ||
|
|
||
| ## Tests | ||
|
|
||
| `__tests__/parity.spec.ts` asserts that identical inputs produce identical outputs from the upstream npm package (kept as a `devDependency` of `@clerk/expo`) and from this vendored copy. It covers RFC 4648 fixtures, `atob`/`btoa` cross-compatibility, and Unicode edge cases. | ||
|
|
||
| The upstream `base-64` stays a `devDependency` of `@clerk/expo` for as long as this parity test exists. Removing the devDep would mean giving up the empirical comparator. |
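The Chain 1 row of the README's comparison table (caret range versus exact pin) can be illustrated with a small resolution model. This is a simplified sketch, not npm's or pnpm's actual resolver: `parse`, `compare`, `satisfies`, and `resolve` are hypothetical helpers, only plain `x.y.z` versions and caret ranges with a major of 1 or higher are handled, and the `1.0.1` entry stands for an imagined malicious release.

```typescript
// Simplified model of how a package manager picks a version for a dependency spec.
// Assumptions: versions are plain "x.y.z"; caret ranges have major >= 1.
type Version = [number, number, number];

function parse(v: string): Version {
  const parts = v.split('.').map(Number);
  return [parts[0] ?? 0, parts[1] ?? 0, parts[2] ?? 0];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

function satisfies(version: string, spec: string): boolean {
  const v = parse(version);
  if (spec.startsWith('^')) {
    // ^1.0.0 accepts any 1.x.y at or above the base version.
    const base = parse(spec.slice(1));
    return v[0] === base[0] && compare(v, base) >= 0;
  }
  // An exact pin accepts only the identical version.
  return compare(v, parse(spec)) === 0;
}

// Highest published version that satisfies the spec, if any.
function resolve(spec: string, published: string[]): string | undefined {
  return published
    .filter(v => satisfies(v, spec))
    .sort((a, b) => compare(parse(a), parse(b)))
    .pop();
}

// Hypothetical registry state after a malicious 1.0.1 is published.
const registry = ['1.0.0', '1.0.1'];
console.log(resolve('^1.0.0', registry)); // → 1.0.1 (caret range picks up the new release)
console.log(resolve('1.0.0', registry)); // → 1.0.0 (exact pin ignores it)
```

Neither spec helps with Chain 2, because both still ask the registry for the bytes of whichever version is selected.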
135 changes: 135 additions & 0 deletions
packages/expo/src/vendor/base-64/__tests__/parity.spec.ts
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,135 @@ | ||
| /** | ||
| * Vendor parity test for base-64@1.0.0. | ||
| * | ||
| * Loads encode/decode from BOTH the upstream npm package (kept in | ||
| * @clerk/expo devDependencies for as long as this test exists) AND the | ||
| * vendored copy at ../upstream/base64.js. Asserts that identical inputs | ||
| * produce identical outputs from both. | ||
| * | ||
| * What this test buys: | ||
| * - The byte-equivalence check (tools/verify-vendor.sh) proves the bytes | ||
| * on disk match the upstream tarball. | ||
| * - This test proves that loading those bytes through our bundler / | ||
| * test runtime produces upstream's behavior — closing the gap between | ||
| * "the bytes are correct" and "the runtime does what upstream does." | ||
| * | ||
| * When to remove this test: | ||
| * - When the upstream `base-64` devDependency is removed from | ||
| * packages/expo/package.json, this file must be removed too (the | ||
| * `from 'base-64'` import would fail to resolve). Removing the devDep | ||
| * means losing the comparator; don't do that unless the vendoring | ||
| * approach is fully accepted. | ||
| * | ||
| * See packages/expo/src/vendor/base-64/README.md for the broader vendoring | ||
| * rationale and the customer-side attack chains that motivate it. | ||
| */ | ||
|
|
||
| // eslint-disable-next-line no-restricted-imports -- intentional: comparator for vendor parity | ||
| import { decode as upstreamDecode, encode as upstreamEncode } from 'base-64'; | ||
| import { describe, expect, it } from 'vitest'; | ||
|
|
||
| import { decode as vendoredDecode, encode as vendoredEncode } from '../'; | ||
|
|
||
| /** | ||
| * RFC 4648 §10 test vectors — canonical base64 fixtures from the spec. | ||
| * Every base64 implementation should handle these identically. | ||
| */ | ||
| const RFC4648_VECTORS: Array<[plain: string, encoded: string]> = [ | ||
| ['', ''], | ||
| ['f', 'Zg=='], | ||
| ['fo', 'Zm8='], | ||
| ['foo', 'Zm9v'], | ||
| ['foob', 'Zm9vYg=='], | ||
| ['fooba', 'Zm9vYmE='], | ||
| ['foobar', 'Zm9vYmFy'], | ||
| ]; | ||
|
|
||
| /** | ||
| * Cases beyond the RFC vectors — the polyfill use case is hijacking | ||
| * global.btoa / global.atob, so the parity surface must cover everything | ||
| * an arbitrary caller (third-party library) might throw at it. | ||
| */ | ||
| const EXTRA_VECTORS: Array<[label: string, plain: string]> = [ | ||
| ['empty', ''], | ||
| ['single null byte', '\x00'], | ||
| ['Latin-1 high', '\xff'], | ||
| ['arbitrary binary', '\x00\x01\x02\x03\x04\xfd\xfe\xff'], | ||
| ['ASCII letters', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'], | ||
| ['ASCII symbols', '!@#$%^&*()_+-=[]{}|;:,.<>?/~`\'"\\'], | ||
| ['long string', 'a'.repeat(1024)], | ||
| ['exactly 3 bytes', 'abc'], | ||
| ['exactly 4 bytes (forces == padding)', 'abcd'], | ||
| ['exactly 6 bytes (no padding)', 'abcdef'], | ||
| ['JSON shape', '{"foo":"bar","baz":[1,2,3]}'], | ||
| ]; | ||
|
|
||
| describe('base-64 vendor parity — RFC 4648 fixtures', () => { | ||
| it.each(RFC4648_VECTORS)('encode(%j) === %j', (plain, encoded) => { | ||
| expect(vendoredEncode(plain)).toBe(upstreamEncode(plain)); | ||
| expect(vendoredEncode(plain)).toBe(encoded); // canonical anchor | ||
| }); | ||
|
|
||
| it.each(RFC4648_VECTORS)('decode(%j) === %j', (plain, encoded) => { | ||
| expect(vendoredDecode(encoded)).toBe(upstreamDecode(encoded)); | ||
| expect(vendoredDecode(encoded)).toBe(plain); // canonical anchor | ||
| }); | ||
| }); | ||
|
|
||
| describe('base-64 vendor parity — extra fixtures', () => { | ||
| it.each(EXTRA_VECTORS)('encode/decode roundtrip: %s', (_label, plain) => { | ||
| const vEnc = vendoredEncode(plain); | ||
| const uEnc = upstreamEncode(plain); | ||
| expect(vEnc).toBe(uEnc); | ||
| expect(vendoredDecode(vEnc)).toBe(upstreamDecode(uEnc)); | ||
| expect(vendoredDecode(vEnc)).toBe(plain); | ||
| }); | ||
| }); | ||
|
|
||
| describe('base-64 vendor parity — deterministic fuzz', () => { | ||
| it('matches upstream for 512 deterministically generated binary strings of varying length', () => { | ||
| for (let seed = 0; seed < 512; seed++) { | ||
| const len = (seed * 37) % 256; | ||
| const chars: string[] = []; | ||
| for (let i = 0; i < len; i++) { | ||
| // Latin-1 range only (0-255) — what base-64 contracts on. | ||
| chars.push(String.fromCharCode((seed + i * 13) & 0xff)); | ||
| } | ||
| const plain = chars.join(''); | ||
| const vEnc = vendoredEncode(plain); | ||
| const uEnc = upstreamEncode(plain); | ||
| expect(vEnc, `seed=${seed}`).toBe(uEnc); | ||
| expect(vendoredDecode(vEnc), `seed=${seed}`).toBe(upstreamDecode(uEnc)); | ||
| } | ||
| }); | ||
| }); | ||
|
|
||
| describe('base-64 vendor parity — error handling', () => { | ||
| // Upstream and vendored must agree on malformed input: either both throw | ||
| // or both return the same string. We don't pin the error message. | ||
| const INVALID: Array<[label: string, invalid: string]> = [ | ||
| ['truncated padding', 'Zm9'], | ||
| ['invalid char', 'Zm$9v'], | ||
| ['stray padding', 'Zm9v='], | ||
| ]; | ||
| it.each(INVALID)('decode throws-or-matches on invalid input: %s', (_label, invalid) => { | ||
| let vErr: unknown = null; | ||
| let uErr: unknown = null; | ||
| let vResult: string | null = null; | ||
| let uResult: string | null = null; | ||
| try { | ||
| vResult = vendoredDecode(invalid); | ||
| } catch (e) { | ||
| vErr = e; | ||
| } | ||
| try { | ||
| uResult = upstreamDecode(invalid); | ||
| } catch (e) { | ||
| uErr = e; | ||
| } | ||
| // Either both threw or both produced the same output. | ||
| expect(Boolean(vErr)).toBe(Boolean(uErr)); | ||
| if (!vErr) { | ||
| expect(vResult).toBe(uResult); | ||
| } | ||
| }); | ||
| }); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| /** | ||
| * Clerk-side entry for the vendored `base-64` package. | ||
| * | ||
| * Why this file exists: | ||
| * - `upstream/` is a verbatim copy of the base-64@1.0.0 npm tarball and | ||
| * must not be modified (byte-equivalence is the security claim — see | ||
| * ./README.md). | ||
| * - The upstream ships as a UMD wrapper — its exports are assigned to | ||
| * `module.exports` inside an IIFE that TypeScript cannot trace | ||
| * statically. tsc therefore infers an empty export surface for the | ||
| * `.js` file even with `allowJs: true`. The cast below asserts the | ||
| * export shape; the parity test (`./__tests__/parity.spec.ts`) | ||
| * verifies the assertion empirically against the upstream npm | ||
| * package. | ||
| * - Consumers import from this file (`../vendor/base-64`); they should | ||
| * not reach into `upstream/` directly. | ||
| */ | ||
|
|
||
| import * as upstreamModule from './upstream/base64.js'; | ||
|
|
||
| interface Base64Module { | ||
| encode: (input: string) => string; | ||
| decode: (input: string) => string; | ||
| version: string; | ||
| } | ||
|
|
||
| // tsc infers `typeof upstreamModule` as `{}` because base-64's UMD wrapper | ||
| // hides the module.exports assignment inside an IIFE. The shape asserted | ||
| // here is verified empirically by __tests__/parity.spec.ts against the | ||
| // upstream npm package (kept as a devDependency for this purpose). | ||
| const upstream: Base64Module = upstreamModule as unknown as Base64Module; | ||
|
|
||
| /** | ||
| * Encode a binary-safe string to base64. Compatible with the WHATWG | ||
| * `btoa()` algorithm (RFC 4648 §4). Throws on non-Latin-1 input. | ||
| * | ||
| * Vendored from base-64@1.0.0 — see ./README.md. | ||
| */ | ||
| export const encode: (input: string) => string = upstream.encode; | ||
|
|
||
| /** | ||
| * Decode a base64-encoded string back to a binary string. Compatible with | ||
| * the WHATWG `atob()` algorithm (RFC 4648 §4). Throws on invalid input. | ||
| * | ||
| * Vendored from base-64@1.0.0 — see ./README.md. | ||
| */ | ||
| export const decode: (input: string) => string = upstream.decode; |
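For context, the README describes `@clerk/expo` assigning these exports to `global.btoa` and `global.atob` on engines that lack native versions. The actual `base64Polyfill.ts` is not part of this diff, so the following is only a sketch of that pattern under assumptions: `installBase64Polyfill` and `Base64Host` are hypothetical names, and the stand-in `encode`/`decode` functions exist only to keep the example self-contained.

```typescript
type Base64Fn = (input: string) => string;

// Minimal shape of a global object that may or may not provide btoa/atob.
interface Base64Host {
  btoa?: Base64Fn;
  atob?: Base64Fn;
}

// Assign the implementations only where the host has no native version.
function installBase64Polyfill(
  host: Base64Host,
  impl: { encode: Base64Fn; decode: Base64Fn },
): void {
  if (typeof host.btoa !== 'function') host.btoa = impl.encode;
  if (typeof host.atob !== 'function') host.atob = impl.decode;
}

// Stand-ins for the vendored exports (hypothetical; not real base64).
const standIn = {
  encode: (s: string) => `enc(${s})`,
  decode: (s: string) => `dec(${s})`,
};

// A host with neither function, as on an engine without native support.
const bareHost: Base64Host = {};
installBase64Polyfill(bareHost, standIn);

// A host that already has a native btoa keeps it; only atob is filled in.
const nativeBtoa: Base64Fn = s => `native(${s})`;
const partialHost: Base64Host = { btoa: nativeBtoa };
installBase64Polyfill(partialHost, standIn);
```

Whatever functions are assigned this way become the implementation seen by every caller of `btoa()`/`atob()` in the app, which is why the README treats the provenance of `encode` and `decode` as security-relevant.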
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| Copyright Mathias Bynens <https://mathiasbynens.be/> | ||
|
|
||
| Permission is hereby granted, free of charge, to any person obtaining | ||
| a copy of this software and associated documentation files (the | ||
| "Software"), to deal in the Software without restriction, including | ||
| without limitation the rights to use, copy, modify, merge, publish, | ||
| distribute, sublicense, and/or sell copies of the Software, and to | ||
| permit persons to whom the Software is furnished to do so, subject to | ||
| the following conditions: | ||
|
|
||
| The above copyright notice and this permission notice shall be | ||
| included in all copies or substantial portions of the Software. | ||
|
|
||
| THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, | ||
| EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF | ||
| MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | ||
| NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE | ||
| LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION | ||
| OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION | ||
| WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
Global `base-64` restriction conflicts with parity tests and can break lint.

This rule is global, but the PR includes parity tests that intentionally import the upstream `base-64` package for behavioral comparison. As written, those tests will be lint-blocked unless excluded.

Scope this restriction to non-test Expo source files (or add a test-file override exception).

Suggested fix (scope restriction away from test files)

     {
       name: 'repo/global',
    @@
       'no-restricted-imports': [
         'error',
         {
           paths: [
             {
               message: "Please always import from '`@clerk/shared/`<module>' instead of '`@clerk/shared`'.",
               name: '`@clerk/shared`',
             },
    -        {
    -          name: 'base-64',
    -          message:
    -            "base-64 is vendored at packages/expo/src/vendor/base-64. Import { encode, decode } from '../vendor/base-64' instead. See packages/expo/src/vendor/base-64/README.md.",
    -        },
           ],
    @@
         ],
       },
     },
    +{
    +  name: 'packages/expo base-64 restriction',
    +  files: ['packages/expo/src/**/*.{ts,tsx,js,jsx}'],
    +  ignores: ['packages/expo/src/**/__tests__/**', 'packages/expo/src/**/*.test.{ts,tsx,js,jsx}'],
    +  rules: {
    +    'no-restricted-imports': [
    +      'error',
    +      {
    +        paths: [
    +          {
    +            name: 'base-64',
    +            message:
    +              "base-64 is vendored at packages/expo/src/vendor/base-64. Import { encode, decode } from '../vendor/base-64' instead. See packages/expo/src/vendor/base-64/README.md.",
    +          },
    +        ],
    +      },
    +    ],
    +  },
    +},