Skip to content

SSLSocket write and non-block fixes#354

Open
kares wants to merge 7 commits intomasterfrom
gem-error-minimal
Open

SSLSocket write and non-block fixes#354
kares wants to merge 7 commits intomasterfrom
gem-error-minimal

Conversation

@kares
Copy link
Copy Markdown
Member

@kares kares commented Apr 1, 2026

another attempt at (minimal) fixing non-blocking write + read.

similar to #351 but wout test madness (tests here are still not great but we can revisit those later)

includes #347

jsvd and others added 7 commits April 1, 2026 19:40
write() unconditionally called netWriteData.clear() after a non-blocking
flushData() that may not have flushed all encrypted bytes; discarding
unflushed TLS records, corrupting the encrypted stream and causing
'Broken pipe' or 'Connection reset' errors on subsequent writes (most
commonly seen during 'gem push' over https of large artifacts)

additionally, sysread needs to also flush pending writes before
reading since after write_nonblock, encrypted bytes could remain unsent;
without flushing first the server would never receive the complete
request body (e.g. net/http POST), causing it to time out or reset
`readAndUnwrap()` called doHandshake(blocking) assuming `exception = true`

When a post-handshake TLS event (e.g. TLS 1.3 NewSessionTicket) triggered
handshake during a non-blocking read, waitSelect raised SSLErrorWaitReadable
instead of returning :wait_readable.
- writeNonblockDataIntegrity: approximates the gem push scenario (#242)
  large payload via write_nonblock loop, then read server's byte count
  response, assert data integrity (no bytes lost)
- writeNonblockNetWriteDataState: saturates TCP buffer, then accesses
  the package-private netWriteData field directly to verify buffer
  consistency after the compact() fix
@kares
Copy link
Copy Markdown
Member Author

kares commented Apr 2, 2026

@jsvd have included #347 here, if you have any opinions please share

@kares kares requested a review from headius April 2, 2026 09:00
Copy link
Copy Markdown
Member

@headius headius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The batch of fixes looks pretty good after a first review pass. The tests are heinous but you know that. I suppose in the interest of getting fixes landed we can go with them for now but they shouldn't stay like this for long. It's very hard to follow what they're testing, or if they're actually testing what they should be.

private void doWrap(final boolean blocking) throws IOException {
netWriteData.clear();
SSLEngineResult result = engine.wrap(dummy, netWriteData);
SSLEngineResult result = engine.wrap(EMPTY_DATA.duplicate(), netWriteData);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to duplicate EMPTY_DATA? It is already zero-length and marked as read-only.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be, except engine.wrap in theory does set limit although likely not on an empty buffer.

netWriteData.clear();
try {
engine.wrap(dummy, netWriteData); // send close (after sslEngine.closeOutbound)
engine.wrap(EMPTY_DATA.duplicate(), netWriteData); // send close (after sslEngine.closeOutbound)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here, why duplicate?

totalSent += ((RubyInteger) written).getLongValue();
} catch (RaiseException e) {
if ("OpenSSL::SSL::SSLErrorWaitWritable".equals(e.getException().getMetaClass().getName())) {
System.out.println("syswrite_nonblock expected: " + e.getMessage());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be asserts or fails, so they show up as the cause of failure? stdout printlns can get lost in test harnesses.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part is mostly debug messages, the other branch fails.

they're useful esp. since a lot of time these exceptions do not have a trace.

} catch (RaiseException e) {
String errorName = e.getException().getMetaClass().getName();
if ("EOFError".equals(errorName) || "IOError".equals(errorName)) { // client closes connection
System.out.println("server-reader expected: " + e.getMessage());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again seems weird to have printlns in a test, even if it's running in a separate thread.

if ("EOFError".equals(errorName) || "IOError".equals(errorName)) { // client closes connection
System.out.println("server-reader expected: " + e.getMessage());
} else {
System.err.println("server-reader unexpected: " + e.getMessage());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we're outputting to err?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenSSL::Buffering#read_nonblock(..., exception: false) can still throw exceptions gem push fails with Broken Pipe IOError

3 participants