When I train my PyTorch Lightning model on two GPUs in JupyterLab with strategy="ddp_notebook", only two CPU cores are used, both pinned at 100%. How can I overcome this CPU bottleneck?

Edit: I profiled the run with PyTorchProfiler; the bottleneck turned out to be the old SSDs used on the server.

  • troye888@lemmy.one · 11 months ago

    Yup, this. If you would like more help, we need the code, or at least a minimal reproducible example.