Add release workflow#264
Conversation
There was a problem hiding this comment.
Idk that I have anything new to add, so I suppose I'll list the tradeoffs as I've clarified my understanding over the past few months.
The threat model that I want to see us operate under is that a compromise of a repo admin's Github PAT cannot lead to an npm publish. E.g. npm i in the 2 hour window when there's malware in someone's dep tree somewhere and I get my github cli PAT exfiled.
Today we are protected via local publishing w/ 2fa npm gate.
The tradeoff w/ TP is that we gain release automation, but we become vulnerable to the above threat. We do not currently use any npm tokens in CI, so the TP benefit of obseleting them does not apply to us.
I went down the rabbit hole on all the Github 2fa gates for Trusted Publishing I could think of, envionments w/ required approvals, verifying tags are signed by a known key, etc.
The core issue with gating at Github is a "fox in the the hen house" problem where:
- Any gate we can configure lives in Github settings, or workflows
- A compromised repo admin PAT is sufficient to disable all those gates, via API calls to reconfigure environments (dropping required reviewers, allowing self approval, tampering w/ workflows, rewriting branch protection).
- No 2fa is required to do the above via the API (I verified the environment tampering via gh cli with a dummy repo)
(claude has more technical terms for the structural issue: "GitHub's auth domain and config domain are the same trust domain. A token that can act on the repo can also rewrite the rules governing how the repo can be acted upon.")
I know none of that is in this PR, but I have been trying to figure out how to harden a TP model bc the workflow in the PR (which is the recommended setup by githhub/npm) is susceptible today to someone w/ tag push access having their GH PAT compromised.
I don't think TP can satisfy the threat model I desire of "a compromise of a repo admin's Github PAT cannot lead to an npm publish."
I evaluated the step security wait-for-secrets approach of eschewing TP and bringing in an external workflow to bring back 2fa prompt. It relies on long (90 day) lived tokens in CI, but it at least closes the risk of a gh token exfil leading to publish package. Assuming that packages and user accounts are all 2fa publishing protected, an attacker can do maximal damage within a repo w/ a repo admin's stolen token but cannot cause a publish.
I dont like any of our options to be honest. My wishlist is just that there was first party CI publishing w/ a npm 2fa challenge. The Step Security workflow is closest to that, but is not TP, and has overhead that is gross (keeping people's npm tokens in repo secrets, rotating, 3rd party touching 2fa tokens). I dont like it, but it does at least pass the threat model. So idk that I'd recommend it over local publish, but if CI publish is a requirement, that does seeem like the safest one.
I wrote this all up to get my thoughts out there and see how we feel. Am I wrong about any facts here? Is there agreement on the threat model I outlined about preventing publish on repo admin compromise? Do we want to operate under a different threat model? Are we already operating under a different threat model?
|
You're absolutely right on GitHub token exploitation and I haven't come up with anything better, so I don't think there's any acceptable solution today without some sort of additional feature development from NPM. The best case is that they enable either a staging area or 2FA that can be done async, because Trusted Publishing is preferable to managing any kind of token rotation ourselves or potentially exposing ourselves to new third parties that could be exploited. I'd prefer to keep the surface area of attack minimal and, at least with TP, that's only GitHub. |
Co-authored-by: Jon Church <me@jonchurch.com>
bjohansebas
left a comment
There was a problem hiding this comment.
LGTM, that new command definitely simplifies everything
|
@jonchurch Just a heads up but you are still blocking this PR. That said, I don't intend to merge it until latest node comes with |
| - run: npm ci | ||
| - run: npm test | ||
| - run: npm stage publish # Runs prepublish script. |
There was a problem hiding this comment.
What do yall think about:
- dropping the test run from the publish job
- splitting out the pack and publish steps, so the publish job doesn't need to run
npm ci? That firewalls the OIDC token from thenpm iwhich couldstealmint a token if somehow a poisoned install script got into the tree. Idk how duplicate stage attempts work, aka if they attempted to stage a version before us and then we stage the same version. Maybe it is extra complexity without buying us much?
name: release
on:
push:
tags:
- "v*.*.*"
jobs:
# No id-token, no environment. ALL arbitrary code runs here, where
# there's nothing to steal: dependency install scripts, your tests,
# and your prepack/prepare build (npm pack triggers those).
pack:
name: Test & pack
runs-on: ubuntu-latest
permissions:
contents: read # checkout only
steps:
- uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
with:
node-version: latest
- run: npm ci
- run: npm test # idk still if we want test at all? doesn't hurt in this step I don't think
- run: npm pack --pack-destination ./tarball # runs prepack/prepare
- uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: npm-tarball
path: ./tarball/*.tgz
if-no-files-found: error
retention-days: 1
# Holds id-token + environment. Runs NO arbitrary code: downloads the
# prebuilt tarball and stages it. `npm publish <tarball>` treats the
# tarball as opaque bytes, so no lifecycle scripts run in this job.
publish:
name: Publish to npm
needs: pack
runs-on: ubuntu-latest
environment:
name: release
permissions:
id-token: write # OIDC claim for npm trusted publishing
steps:
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
with:
node-version: latest
- uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
with:
name: npm-tarball
path: ./tarball
- run: npm stage publish ./tarball/*.tgzThere was a problem hiding this comment.
I think it's unnecessary. If the vulnerability is from a dependency, won't it still end up modifying things to get into the tarball and be published? Is the concern the id-token getting exfiltrated? I'm trying to think about what this is attempting to protect against and that's all I can think of.
There was a problem hiding this comment.
I realized in my original proposal I had npm ci --ignore-scripts but it wasn't in this PR, so I added it.
Another note is that using pack does make the things a bit more complex because I need to refactor how it "builds". It isn't a big deal independently but every package I maintain uses prepublishOnly, which runs before publish (and not before pack?), but it would need to be removed for this workflow to work. It's not an issue, but it is a footgun that makes it easy for someone manually publishing to forget to build the code therefore publishing an outdated or old version.
Edit: prepublishOnly. We can switch to prepare which resolves this but I explicitly use prepublishOnly because I don't want to build every time I run npm install. All easily solvable but I just don't know if it's even adding any extra protection to need refactoring here.
Adds a release workflow in GitHub Actions.