Hi, the following code causes a GPU OOM on Hopper with NVLS enabled. I am using the latest main branch.
```python
from mscclpp import Transport, TcpBootstrap, Communicator
from mscclpp._mscclpp import Context, RawGpuBuffer
import cupy as cp

cp.cuda.Device(0).use()
bootstrap = TcpBootstrap.create(0, 1)
bootstrap.initialize(bootstrap.create_unique_id(), 60)
comm = Communicator(bootstrap)
for i in range(100):
    if i % 10 == 0:
        print(f"{i=}", flush=True)
    # Allocate a 1 GiB buffer, register it, then drop both each iteration.
    mem = RawGpuBuffer(2 ** 30)
    reg = comm.register_memory(mem.data(), mem.bytes(), Transport.CudaIpc)
    del reg, mem
```
Output:

```
i=0
i=10
i=20
i=30
i=40
i=50
i=60
i=70
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
mscclpp._mscclpp.CuError: (2, 'Call to result failed./.../mscclpp/src/gpu_utils.cc:128 (Cu failure: out of memory)')
```
The loop runs fine if the memory is allocated but not registered, so the registration path appears to leak GPU memory even after both the registered memory and the buffer are deleted. Could you please check whether this reproduces on your side?