1. "RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:278"
2. "libcudart.so.9.0 cannot open shared object file no such file or directory"
3. "Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled"
The newly installed package was built for a CUDA version that does not match the CUDA version in the image.
Before creating a training job, debug the training code in the ModelArts development environment to eliminate as many code migration errors as possible.
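While debugging, one way to confirm a mismatch of this kind is to compare the CUDA version the framework was built for with the CUDA version installed in the image. The following is a minimal sketch, not an official ModelArts tool. It assumes the package in question is PyTorch and that the image records its CUDA version in /usr/local/cuda/version.txt; that path is an assumption, some images store the version elsewhere, and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def cuda_major_minor(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as '9.0' or 'CUDA Version 9.0.176'."""
    if not version:
        return None
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def framework_cuda_version() -> Optional[str]:
    """Return the CUDA version PyTorch was built for, or None if unavailable."""
    try:
        import torch  # may fail if libcudart cannot be loaded
    except (ImportError, OSError):
        return None
    return torch.version.cuda  # None for CPU-only builds


def image_cuda_version(path: str = "/usr/local/cuda/version.txt") -> Optional[str]:
    """Read the CUDA version recorded in the image (the path is an assumption)."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def versions_match(framework: Optional[str], image: Optional[str]) -> bool:
    """True only when both versions are known and share the same major.minor."""
    built = cuda_major_minor(framework)
    installed = cuda_major_minor(image)
    return built is not None and built == installed


def report() -> str:
    """Summarize both versions and whether they agree."""
    built = framework_cuda_version()
    installed = image_cuda_version()
    status = "match" if versions_match(built, installed) else "MISMATCH or unknown"
    return f"framework CUDA: {built}, image CUDA: {installed} -> {status}"
```

Running `print(report())` in the development environment before submitting the training job shows both versions side by side. If they differ, installing a build of the package that targets the CUDA version in the image is the likely fix.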