ssl.SSLSocket.read() / write() missing ERR_clear_error() before SSL_read_ex() / SSL_write_ex() causes spurious errors with cooperative threading

# Bug report

### Bug description:

### Summary

`_ssl__SSLSocket_read_impl` and `_ssl__SSLSocket_write_impl` in `Modules/_ssl.c` do not call `ERR_clear_error()` before `SSL_read_ex()` / `SSL_write_ex()`. This allows stale entries on the per-thread OpenSSL error queue to corrupt the result of `SSL_get_error()`, causing spurious `BrokenPipeError` or `OSError` on healthy SSL connections.

### Affected versions

All current CPython versions appear to be affected. Confirmed in the `3.12` branch (`Modules/_ssl.c` line 2544) and in `main` / 3.15-dev (line 2941).

### Root cause

The `do { ... } while()` retry loop in `_ssl__SSLSocket_read_impl` ([`Modules/_ssl.c` L2939-2942 on main](https://github.com/python/cpython/blob/main/Modules/_ssl.c#L2939-L2942)):

```c
do {
    Py_BEGIN_ALLOW_THREADS;
    retval = SSL_read_ex(self->ssl, mem, (size_t)len, &count);
    err = _PySSL_errno(retval == 0, self->ssl, retval);
    Py_END_ALLOW_THREADS;
    // ...
} while (err.ssl == SSL_ERROR_WANT_READ || err.ssl == SSL_ERROR_WANT_WRITE);
```

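`_PySSL_errno()` calls `SSL_get_error(ssl, retcode)`, which inspects the current thread's error queue (OpenSSL peeks at it with `ERR_peek_error()`) before it considers the `SSL_ERROR_WANT_READ` / `SSL_ERROR_WANT_WRITE` state. Per the [OpenSSL documentation](https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html):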

> In addition to `ssl` and `ret`, **`SSL_get_error()` inspects the current thread's OpenSSL error queue**. Thus, `SSL_get_error()` must be called in the same thread that performed the TLS/SSL I/O operation, and **no other OpenSSL function calls should appear in between**. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or `SSL_get_error()` will not work reliably.

If stale entries from an earlier operation (on the same thread, but on a different SSL object) are still on the queue, `SSL_get_error()` attributes them to the current call. A stale system-library entry such as `EPIPE` makes it return `SSL_ERROR_SYSCALL` instead of the correct `SSL_ERROR_WANT_READ`.

The same issue exists in `_ssl__SSLSocket_write_impl`.
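To make the ordering concrete, here is a deliberately simplified Python model of the per-thread error queue and of the decision order inside `SSL_get_error()`. The names `model_ssl_get_error`, `error_queue`, and the `"SYS"` / `"SSL"` tags are hypothetical stand-ins (for OpenSSL's `ERR_LIB_*` library codes), and real OpenSSL checks more conditions than shown. The only point is that a non-empty queue takes precedence over the "want read" state.

```python
from collections import deque

# Hypothetical stand-ins for OpenSSL's ERR_LIB_SYS / ERR_LIB_SSL library codes.
ERR_LIB_SYS = "SYS"
ERR_LIB_SSL = "SSL"

# Simulated error queue; real OpenSSL keeps one of these per OS thread.
error_queue = deque()


def model_ssl_get_error(ret, want_read):
    """Simplified model of how SSL_get_error() classifies an I/O result."""
    if ret > 0:
        return "SSL_ERROR_NONE"
    # The queue is consulted BEFORE the "want read" state, so any entry
    # left behind by an unrelated earlier call decides the result.
    if error_queue:
        lib, _reason = error_queue[0]
        return "SSL_ERROR_SYSCALL" if lib == ERR_LIB_SYS else "SSL_ERROR_SSL"
    if want_read:
        return "SSL_ERROR_WANT_READ"
    return "SSL_ERROR_SYSCALL"


# Healthy non-blocking read with no data yet, empty queue.
print(model_ssl_get_error(0, want_read=True))  # SSL_ERROR_WANT_READ

# Same read, but another connection left a stale EPIPE entry behind.
error_queue.append((ERR_LIB_SYS, 32))
print(model_ssl_get_error(0, want_read=True))  # SSL_ERROR_SYSCALL

# Emptying the queue first (the role of ERR_clear_error()) fixes the result.
error_queue.clear()
print(model_ssl_get_error(0, want_read=True))  # SSL_ERROR_WANT_READ
```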

### When this manifests

The bug rarely shows up when each connection is serviced by its own OS thread, because each OS thread has its own OpenSSL error queue. It becomes **critical** in cooperative multitasking frameworks (**gevent**, **eventlet**, **asyncio** with SSL), where many coroutines/greenlets share a single OS thread and therefore a single OpenSSL error queue.
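The difference between OS threads and greenlets can be illustrated with a hypothetical analogy: `threading.local` plays the role of OpenSSL's per-thread error queue, and plain generators driven by one round-robin loop stand in for greenlets. This is not how OpenSSL stores its queue internally; it only shows which tasks end up sharing the same queue.

```python
import threading

# Hypothetical analogy: threading.local gives every OS thread its own list,
# much as OpenSSL keeps a separate error queue per OS thread.
_state = threading.local()


def thread_error_queue():
    """Return the calling thread's (simulated) error queue."""
    if not hasattr(_state, "errors"):
        _state.errors = []
    return _state.errors


def failing_write():
    # A failed write leaves an EPIPE-like entry behind and never clears it.
    thread_error_queue().append(("SYS", 32))


def snapshot_queue():
    # What the next I/O call on this thread would find on the queue.
    return list(thread_error_queue())


# Case 1: two OS threads. The stale entry stays on the first thread.
seen_by_thread = []
t1 = threading.Thread(target=failing_write)
t1.start()
t1.join()
t2 = threading.Thread(target=lambda: seen_by_thread.append(snapshot_queue()))
t2.start()
t2.join()
print(seen_by_thread[0])  # []


# Case 2: two cooperative tasks (generators standing in for greenlets)
# driven by a round-robin loop on a single OS thread.
def task_a():
    failing_write()
    yield  # switch to the other task


def task_b(out):
    yield  # let task_a run first
    out.append(snapshot_queue())


seen_by_task = []
tasks = [task_a(), task_b(seen_by_task)]
while tasks:
    for task in list(tasks):
        try:
            next(task)
        except StopIteration:
            tasks.remove(task)
print(seen_by_task[0])  # [('SYS', 32)]
```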

**Concrete scenario** (gevent):
1. Greenlet A performs an SSL write on an HTTPS connection. The remote client has disconnected, so `SSL_write_ex()` → `send()` fails with `EPIPE`. OpenSSL pushes an error entry onto the (per-thread) error queue. The greenlet handles the exception, but the error queue is **not cleared**.
2. The gevent hub switches to Greenlet B, which is an AMQP consumer doing `SSL_read_ex()` on a healthy RabbitMQ connection.
3. `SSL_read_ex()` → `recv()` fails with `EAGAIN` (no data available yet, which is normal for a non-blocking socket).
4. `SSL_get_error()` finds the stale error from step 1 on the thread's error queue and returns `SSL_ERROR_SYSCALL` instead of `SSL_ERROR_WANT_READ`.
5. `_PySSL_errno()` captures `errno = 32` (stale EPIPE from step 1).
6. CPython exits the retry loop, enters `PySSL_SetError()`, and raises `BrokenPipeError(errno=32, "Broken pipe")` on a **perfectly healthy** connection.
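The exception type in step 6 needs no SSL-specific explanation: once an `errno` value is turned into an `OSError`, Python's `OSError` constructor selects the subclass from the errno (PEP 3151), so errno 32 (`EPIPE` on Linux) surfaces as `BrokenPipeError`. A minimal illustration, contrasted with the `ssl.SSLWantReadError` that a healthy non-blocking read is expected to raise:

```python
import errno
import ssl

# An OSError built from EPIPE is automatically a BrokenPipeError (PEP 3151).
stale = OSError(errno.EPIPE, "Broken pipe")
print(type(stale).__name__)  # BrokenPipeError

# A healthy non-blocking read with no data should raise SSLWantReadError,
# which cooperative frameworks catch and then wait on. BrokenPipeError is
# not part of that family, so it escapes to application code instead.
print(issubclass(ssl.SSLWantReadError, ssl.SSLError))  # True
print(isinstance(stale, ssl.SSLWantReadError))  # False
```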

### Evidence

- **Disassembly**: The compiled `_ssl.cpython-312-x86_64-linux-gnu.so` confirms no `ERR_clear_error` (PLT `0x9050`) before `SSL_read_ex` (PLT `0x93b0`) at the call site.
- **Production telemetry**: At the moment of every `BrokenPipeError`, `getsockopt(SO_ERROR)` returns 0 (no kernel-level error), and `tcpdump` shows no FIN/RST from the remote side — the TCP connection is healthy.
- **Workaround validation**: Calling `ERR_clear_error()` (via ctypes) before every `_sslobj.read()` in a monkey-patched `ssl.SSLSocket.read()` **completely eliminates** the spurious errors. Tested for 15+ minutes under production load with zero errors, after months of constant failures every ~90 seconds.
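The ctypes workaround can be sketched roughly as follows. This is not the actual patch used in production: `load_err_clear_error`, `read_with_clean_queue`, and the `wait` callback (which should block or yield until the socket is ready) are hypothetical helpers. Loading `ERR_clear_error` through the `_ssl` extension's shared object is an assumption that should hold on typical dynamically linked Linux builds, but it may fail where OpenSSL is linked statically or `_ssl` is built into the interpreter. The key property is that the queue is cleared before *every* attempt, including retries after a wait, because another greenlet can run (and fail) during the wait.

```python
import ctypes
import ssl


def load_err_clear_error():
    """Return OpenSSL's ERR_clear_error() as a ctypes function.

    Assumption: on Linux, dlsym() on the _ssl extension's handle also
    searches its dependencies, so this resolves the same libcrypto that
    _ssl is linked against. Static or built-in builds may not export it.
    """
    import _ssl

    fn = ctypes.CDLL(_ssl.__file__).ERR_clear_error
    fn.argtypes = []
    fn.restype = None
    return fn


def read_with_clean_queue(sslobj, nbytes, clear_error, wait):
    """Retry loop that empties the error queue before every read attempt.

    sslobj is the low-level object behind SSLSocket._sslobj. wait is a
    hypothetical callback taking "read" or "write" that blocks or yields
    until the socket is ready.
    """
    while True:
        clear_error()  # the queue must be empty right before the I/O call
        try:
            return sslobj.read(nbytes)
        except ssl.SSLWantReadError:
            wait("read")  # other greenlets may run here
        except ssl.SSLWantWriteError:
            wait("write")
```

With gevent, `wait` could, for example, dispatch to `gevent.socket.wait_read()` or `gevent.socket.wait_write()` on the socket's file descriptor. The loop mirrors the C retry condition (`WANT_READ` or `WANT_WRITE`), so it is a Python-level stand-in for the proposed fix rather than a replacement for it.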

### Proposed fix

Add `ERR_clear_error()` before `SSL_read_ex()` and `SSL_write_ex()` in their respective retry loops:

```c
do {
    Py_BEGIN_ALLOW_THREADS;
    ERR_clear_error();  /* Prevent stale errors from affecting SSL_get_error() */
    retval = SSL_read_ex(self->ssl, mem, (size_t)len, &count);
    err = _PySSL_errno(retval == 0, self->ssl, retval);
    Py_END_ALLOW_THREADS;
    // ...
} while (err.ssl == SSL_ERROR_WANT_READ || err.ssl == SSL_ERROR_WANT_WRITE);
```

This matches OpenSSL's documented requirement and is consistent with how CPython already calls `ERR_clear_error()` in other SSL functions (e.g., `_ssl__SSLSocket_do_handshake_impl`, `_ssl_ctx_new`).

### Related

- gh-115627 (commit ea9a296fce2) — Improved `PySSL_SetError` handling for `SSL_ERROR_SYSCALL`, but only in `main`; does **not** add `ERR_clear_error()` before read/write calls.
- gh-127257 (commit 7f707fa6c67) — `ERR_LIB_SYS` handling improvement, backported to 3.12; does not address this issue.

### Reproducer

A minimal reproducer requires two SSL connections on the same OS thread. In pseudocode:

```python
# gevent must patch the stdlib before socket/ssl are imported; without this,
# the sockets block the whole thread and no greenlet switching happens.
from gevent import monkey
monkey.patch_all()

import socket
import ssl

import gevent


def writer_greenlet():
    """SSL connection that will fail, leaving a stale error on the queue."""
    ctx = ssl.create_default_context()
    sock = ctx.wrap_socket(socket.socket(), server_hostname="...")
    sock.connect(...)
    # Remote side disconnects
    sock.write(b"data")  # raises BrokenPipeError and leaves a stale OpenSSL error


def reader_greenlet():
    """Healthy SSL connection that reads, but gets a spurious BrokenPipeError."""
    ctx = ssl.create_default_context()
    sock = ctx.wrap_socket(socket.socket(), server_hostname="...")
    sock.connect(...)
    # This should block waiting for data, but instead raises BrokenPipeError
    sock.read(4096)  # BrokenPipeError on a HEALTHY connection


gevent.joinall([
    gevent.spawn(writer_greenlet),
    gevent.spawn(reader_greenlet),
])
```

### Versions

- CPython: 3.12.12, also present in `main` (3.15-dev, commit d14e31ed683)
- OpenSSL: 3.5.1 (also reproducible with 3.0.x, 3.2.x)
- OS: RHEL 10.1
- gevent: 25.4.1 / 25.8.2

---

The pseudocode reproducer is schematic: in practice, the trigger requires precise greenlet-switch timing. The production scenario (AMQP consumers plus an HTTPS server in gevent) triggers it reliably every ~90 seconds.

### CPython versions tested on:

3.12

### Operating systems tested on:

Linux


### Linked PRs
* gh-148597

ssl.SSLSocket.read() / write() missing ERR_clear_error() before SSL_read_ex() / SSL_write_ex() causes spurious errors with cooperative threading #148594
