ssl.SSLSocket.read() / write() missing ERR_clear_error() before SSL_read_ex() / SSL_write_ex() causes spurious errors with cooperative threading #148594

@kswia

Bug report

Bug description:

Summary

_ssl__SSLSocket_read_impl and _ssl__SSLSocket_write_impl in Modules/_ssl.c do not call ERR_clear_error() before SSL_read_ex() / SSL_write_ex(). This allows stale entries on the per-thread OpenSSL error queue to corrupt the result of SSL_get_error(), causing spurious BrokenPipeError or OSError on healthy SSL connections.

Affected versions

All current CPython versions. Confirmed in 3.12 branch (line 2544) and main / 3.15-dev (line 2941).

Root cause

The do { ... } while() retry loop in _ssl__SSLSocket_read_impl (Modules/_ssl.c L2939-2942 on main):

do {
    Py_BEGIN_ALLOW_THREADS;
    retval = SSL_read_ex(self->ssl, mem, (size_t)len, &count);
    err = _PySSL_errno(retval == 0, self->ssl, retval);
    Py_END_ALLOW_THREADS;
    // ...
} while (err.ssl == SSL_ERROR_WANT_READ || err.ssl == SSL_ERROR_WANT_WRITE);

_PySSL_errno() calls SSL_get_error(ssl, retcode), which internally calls ERR_peek_error() to check the calling thread's error queue. Per the OpenSSL documentation:

In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be called in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably.

If stale error entries are present on the queue from a prior SSL operation (on the same thread but a different SSL object), SSL_get_error() misattributes them and returns SSL_ERROR_SYSCALL instead of the correct SSL_ERROR_WANT_READ.

The same issue exists in _ssl__SSLSocket_write_impl.
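The effect of a stale entry can be illustrated with a toy model of the classification step. This is a simplified sketch of the decision order as I understand it, not OpenSSL's actual code: the function name model_ssl_get_error, the "SYS" tag, and the wants_read flag are all hypothetical stand-ins.

```python
from collections import deque


def model_ssl_get_error(ret, error_queue, wants_read):
    """Toy model of how SSL_get_error() classifies a finished I/O call."""
    if ret > 0:
        return "SSL_ERROR_NONE"
    if error_queue:
        # Any queued entry takes priority over the connection's own state.
        # This model looks only at the first entry's library tag.
        first_lib = error_queue[0]
        return "SSL_ERROR_SYSCALL" if first_lib == "SYS" else "SSL_ERROR_SSL"
    if wants_read:
        return "SSL_ERROR_WANT_READ"
    return "SSL_ERROR_SYSCALL"


# Same connection state (no data yet), different queue contents.
clean = model_ssl_get_error(0, deque(), wants_read=True)
stale = model_ssl_get_error(0, deque(["SYS"]), wants_read=True)
print(clean)  # SSL_ERROR_WANT_READ
print(stale)  # SSL_ERROR_SYSCALL
```

With an empty queue the retry loop would continue. With one leftover system-level entry, the same read is classified as a hard failure.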

When this manifests

This bug rarely shows up in conventional multi-threaded programs, where each connection is typically served by its own OS thread and each OS thread has its own OpenSSL error queue. It becomes critical in cooperative multitasking frameworks (gevent, eventlet, asyncio with SSL) where multiple coroutines/greenlets share a single OS thread and thus a single OpenSSL error queue.

Concrete scenario (gevent):

  1. Greenlet A performs an SSL write on an HTTPS connection. The remote client has disconnected, so the send() call inside SSL_write_ex() fails with EPIPE. OpenSSL pushes an error entry onto the (per-thread) error queue. The greenlet handles the exception, but the error queue is not cleared.
  2. The gevent hub switches to Greenlet B, which is an AMQP consumer doing SSL_read_ex() on a healthy RabbitMQ connection.
  3. Inside SSL_read_ex(), recv() returns EAGAIN (no data available, which is normal for a non-blocking socket).
  4. SSL_get_error() finds the stale error from step 1 via ERR_peek_error() and returns SSL_ERROR_SYSCALL instead of SSL_ERROR_WANT_READ.
  5. _PySSL_errno() captures errno = 32 (stale EPIPE from step 1).
  6. CPython exits the retry loop, enters PySSL_SetError(), and raises BrokenPipeError(errno=32, "Broken pipe") on a perfectly healthy connection.
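The thread-sharing aspect of the scenario above can be sketched without OpenSSL at all. The snippet below uses threading.local as a stand-in for the per-thread error queue. The function names and the queue contents are hypothetical, and plain function calls stand in for greenlets running on one OS thread.

```python
import threading

_state = threading.local()  # stand-in for OpenSSL's per-thread error queue


def error_queue():
    """Return the calling thread's queue, creating it on first use."""
    if not hasattr(_state, "queue"):
        _state.queue = []
    return _state.queue


def failing_write():
    """'Greenlet A': a failed write leaves an entry behind."""
    error_queue().append("EPIPE from connection A")


def healthy_read():
    """'Greenlet B': report what a queue-inspecting read would find."""
    return list(error_queue())


def run_in_thread(fn):
    """Run fn on a fresh OS thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn()))
    worker.start()
    worker.join()
    return result[0]


failing_write()                                   # A runs on the main thread
same_thread_view = healthy_read()                 # B shares that thread
other_thread_view = run_in_thread(healthy_read)   # B on its own thread

print(same_thread_view)   # ['EPIPE from connection A']
print(other_thread_view)  # []
```

The reader sees the leftover entry only when it shares an OS thread with the writer, which is the situation cooperative frameworks create by design.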

Evidence

  • Disassembly: The compiled _ssl.cpython-312-x86_64-linux-gnu.so confirms no ERR_clear_error (PLT 0x9050) before SSL_read_ex (PLT 0x93b0) at the call site.
  • Production telemetry: At the moment of every BrokenPipeError, getsockopt(SO_ERROR) returns 0 (no kernel-level error), and tcpdump shows no FIN/RST from the remote side — the TCP connection is healthy.
  • Workaround validation: Calling ERR_clear_error() (via ctypes) before every _sslobj.read() in a monkey-patched ssl.SSLSocket.read() completely eliminates the spurious errors. Tested for 15+ minutes under production load with zero errors, after months of constant failures every ~90 seconds.
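A workaround along those lines might look like the sketch below. It is not the exact patch used in production: the helper names, the symbol lookup strategy, and the wait_readable / wait_writable callbacks are assumptions for illustration. The lookup first tries the _ssl extension module itself, so the symbol resolves through the libcrypto that _ssl is linked against. The find_library fallback may load a different libcrypto copy, in which case clearing its queue would have no effect on _ssl.

```python
import ctypes
import ctypes.util
import ssl


def _find_err_clear_error():
    """Return libcrypto's ERR_clear_error as a ctypes callable, or None."""
    candidates = []
    try:
        import _ssl
        # Loading the extension lets the dynamic linker resolve the symbol
        # through the libcrypto that _ssl depends on.
        candidates.append(_ssl.__file__)
    except (ImportError, AttributeError):
        pass  # e.g. _ssl is compiled into the interpreter and has no file
    candidates.append(ctypes.util.find_library("crypto"))

    for path in candidates:
        if not path:
            continue
        try:
            func = ctypes.CDLL(path).ERR_clear_error
        except (OSError, AttributeError):
            continue
        func.argtypes = []
        func.restype = None  # void ERR_clear_error(void)
        return func
    return None


_ERR_clear_error = _find_err_clear_error()


def clear_openssl_error_queue():
    """Empty the calling thread's error queue; return False if unavailable."""
    if _ERR_clear_error is None:
        return False
    _ERR_clear_error()
    return True


def read_with_clean_queue(sock, nbytes, wait_readable, wait_writable):
    """Read from a non-blocking SSL socket, clearing before each attempt."""
    while True:
        clear_openssl_error_queue()
        try:
            return sock.read(nbytes)
        except ssl.SSLWantReadError:
            wait_readable(sock)   # hypothetical callback, e.g. a hub wait
        except ssl.SSLWantWriteError:
            wait_writable(sock)
```

The clear is repeated inside the loop because other greenlets can run, and leave new entries behind, while this one waits for the socket to become ready.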

Proposed fix

Add ERR_clear_error() before SSL_read_ex() and SSL_write_ex() in their respective retry loops:

do {
    Py_BEGIN_ALLOW_THREADS;
    ERR_clear_error();  /* Prevent stale errors from affecting SSL_get_error() */
    retval = SSL_read_ex(self->ssl, mem, (size_t)len, &count);
    err = _PySSL_errno(retval == 0, self->ssl, retval);
    Py_END_ALLOW_THREADS;
    // ...
} while (err.ssl == SSL_ERROR_WANT_READ || err.ssl == SSL_ERROR_WANT_WRITE);

This matches OpenSSL's documented requirement and is consistent with how CPython already calls ERR_clear_error() in other SSL functions (e.g., _ssl__SSLSocket_do_handshake_impl, _ssl_ctx_new).

Reproducer

A minimal reproducer requires two SSL connections on the same OS thread. In pseudocode:

from gevent import monkey
monkey.patch_all()  # must run first so ssl/socket cooperate with the gevent hub

import socket
import ssl

import gevent

def writer_greenlet():
    """SSL connection that will fail, leaving a stale error on the queue."""
    ctx = ssl.create_default_context()
    sock = ctx.wrap_socket(socket.socket(), server_hostname="...")
    sock.connect(...)
    # Remote side disconnects
    try:
        sock.write(b"data")  # raises BrokenPipeError
    except BrokenPipeError:
        pass  # handled here, but the OpenSSL error queue is not cleared

def reader_greenlet():
    """Healthy SSL connection that reads and gets a spurious BrokenPipeError."""
    ctx = ssl.create_default_context()
    sock = ctx.wrap_socket(socket.socket(), server_hostname="...")
    sock.connect(...)
    # This should block waiting for data, but instead raises BrokenPipeError
    sock.read(4096)  # BrokenPipeError on a HEALTHY connection

# Both greenlets run on the same OS thread and share one OpenSSL error queue.
gevent.joinall([
    gevent.spawn(writer_greenlet),
    gevent.spawn(reader_greenlet),
])

Versions

  • CPython: 3.12.12, also present in main (3.15-dev, commit d14e31e)
  • OpenSSL: 3.5.1 (also reproducible with 3.0.x, 3.2.x)
  • OS: RHEL 10.1
  • gevent: 25.4.1 / 25.8.2

The pseudocode reproducer is schematic — in practice, the trigger requires precise greenlet switching timing. The production scenario (AMQP consumers + HTTPS server in gevent) triggers it reliably every ~90 seconds.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Labels

  • 3.13 (bugs and security fixes)
  • 3.14 (bugs and security fixes)
  • 3.15 (new features, bugs and security fixes)
  • extension-modules (C modules in the Modules dir)
  • topic-SSL
  • type-bug (An unexpected behavior, bug, or error)
