Hi, the following code causes a GPU OOM on Hopper with NVLS enabled. I am using the latest main branch.
```python
from mscclpp import Transport, TcpBootstrap, Communicator
from mscclpp._mscclpp import Context, RawGpuBuffer
import cupy as cp

cp.cuda.Device(0).use()
bootstrap = TcpBootstrap.create(0, 1)
bootstrap.initialize(bootstrap.create_unique_id(), 60)
comm = Communicator(bootstrap)
for i in range(100):
    if i % 10 == 0:
        print(f"{i=}", flush=True)
    # Allocate a 1 GiB buffer, register it, then drop both each iteration.
    mem = RawGpuBuffer(2 ** 30)
    reg = comm.register_memory(mem.data(), mem.bytes(), Transport.CudaIpc)
    del reg, mem
```
Output:

```
i=0
i=10
i=20
i=30
i=40
i=50
i=60
i=70
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
mscclpp._mscclpp.CuError: (2, 'Call to result failed./.../mscclpp/src/gpu_utils.cc:128 (Cu failure: out of memory)')
```
The loop runs fine if the memory is allocated but not registered, so the registration path appears to leak GPU memory even after both the registered memory and the buffer are deleted. Could you please check whether this reproduces on your side?