doc-exports/docs/modelarts/umn/modelarts_trouble_0032.html
Lai, Weijian 6aa966a79a ModelArts UMN 24.3.0 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lai, Weijian <laiweijian4@huawei.com>
Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>
2024-11-02 09:04:52 +00:00


Error Message "No CUDA-capable device is detected" Displayed in Logs

Symptom

An error similar to one of the following occurs while the program is running:
1. 'failed call to cuInit: CUDA_ERROR_NO_DEVICE:  no CUDA-capable device is detected'
2. 'No CUDA-capable device is detected although requirements are installed'

Possible Causes

The possible causes are as follows:

  • CUDA_VISIBLE_DEVICES has been incorrectly set.
  • CUDA operations are performed on GPUs whose IDs are not included in CUDA_VISIBLE_DEVICES.
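CUDA_VISIBLE_DEVICES both filters and renumbers the GPUs that a process can use. For example, with CUDA_VISIBLE_DEVICES=2,3 the process sees two devices, addressed inside the program as 0 and 1, so a request for device 2 fails even though physical GPU 2 exists. An empty value hides every GPU, which typically produces the "no CUDA-capable device is detected" error. The following minimal sketch models this mapping in plain Python so both causes can be reasoned about without a GPU. It assumes integer IDs only (CUDA also accepts GPU UUIDs, which the sketch ignores) and a hypothetical 8-GPU node; the helpers `visible_gpu_ids` and `is_valid_device_index` are illustrative and not part of any ModelArts or CUDA API.

```python
import os
from typing import List, Optional


def visible_gpu_ids(env_value: Optional[str], physical_gpu_count: int) -> List[int]:
    """Return the physical GPU IDs a CUDA process would see, in logical order.

    Simplified model of CUDA_VISIBLE_DEVICES (integer IDs only, no UUIDs):
    - unset (None): all physical GPUs are visible, numbered as-is
    - listed IDs are enumerated in the order given
    - enumeration stops at the first invalid entry (empty, non-integer,
      negative, or not an existing physical GPU)
    """
    if env_value is None:
        return list(range(physical_gpu_count))
    visible: List[int] = []
    for token in env_value.split(","):
        try:
            gpu_id = int(token.strip())
        except ValueError:
            break  # e.g. an empty value or a typo: nothing after it is visible
        if gpu_id < 0 or gpu_id >= physical_gpu_count:
            break  # invalid ID: stop enumerating, as the CUDA driver does
        visible.append(gpu_id)
    return visible


def is_valid_device_index(index: int, env_value: Optional[str], physical_gpu_count: int) -> bool:
    """Check a logical device index (the N in cuda:N) against the visible GPUs."""
    return 0 <= index < len(visible_gpu_ids(env_value, physical_gpu_count))


if __name__ == "__main__":
    # Hypothetical 8-GPU node where the job is restricted to physical GPUs 2 and 3.
    print(visible_gpu_ids("2,3", 8))           # [2, 3] -> logical devices 0 and 1
    print(is_valid_device_index(2, "2,3", 8))  # False: logical device 2 does not exist
    print(visible_gpu_ids("", 8))              # []: no CUDA-capable device is visible
    # Inspect the value actually set in the current environment.
    print(visible_gpu_ids(os.environ.get("CUDA_VISIBLE_DEVICES"), 8))
```

The enumeration rules follow the CUDA documentation for CUDA_VISIBLE_DEVICES (for example, `0,2,-1,1` exposes only GPUs 0 and 2). The real behavior is decided by the CUDA driver, so treat the sketch as a reasoning aid rather than a reference implementation.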

Solution

  1. Do not change the CUDA_VISIBLE_DEVICES value in the code. Use its default value.
  2. Ensure that the GPU IDs specified in the code are among the available GPU IDs.
  3. If the error persists, print the value of CUDA_VISIBLE_DEVICES and debug the code in a notebook, or run the following commands to check whether True is returned:
    import torch
    # Prints True if PyTorch can use at least one GPU
    print(torch.cuda.is_available())
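The checks in step 3 can be combined into one small diagnostic script. The sketch below, built around a hypothetical `diagnose_cuda` function, reports the raw CUDA_VISIBLE_DEVICES value, whether PyTorch sees a usable GPU, how many devices it sees, and whether a given device index is valid. It assumes PyTorch is the training framework and still runs, with default values, when PyTorch is not installed.

```python
import os
from typing import Any, Dict


def diagnose_cuda(requested_index: int = 0) -> Dict[str, Any]:
    """Collect basic facts for debugging 'no CUDA-capable device is detected'."""
    report: Dict[str, Any] = {
        # None means the variable is unset, so every GPU on the node is visible.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
        "cuda_available": False,
        "device_count": 0,
    }
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        torch = None  # PyTorch missing: keep the default values above
    if torch is not None:
        report["torch_installed"] = True
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            report["device_count"] = torch.cuda.device_count()
    # Indices used in code (cuda:N) are logical, ranging over 0..device_count-1.
    report["requested_index_ok"] = 0 <= requested_index < report["device_count"]
    return report


if __name__ == "__main__":
    for key, value in diagnose_cuda(requested_index=0).items():
        print(f"{key}: {value}")
```

If `device_count` is 0 while the job was allocated GPUs, CUDA_VISIBLE_DEVICES is the first thing to inspect; if `requested_index_ok` is False, the code is addressing a device index outside the visible range.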

Summary and Suggestions

Before creating a training job, debug the training code in the ModelArts development environment to eliminate as many code migration errors as possible.