doc-exports/docs/modelarts/umn/modelarts_13_0043.html
Lai, Weijian 6aa966a79a ModelArts UMN 24.3.0 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lai, Weijian <laiweijian4@huawei.com>
Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>
2024-11-02 09:04:52 +00:00


Insufficient Container Space for Copying Data

Symptom

While a ModelArts training job is running, the following error is reported in the log, and data cannot be copied to the container:

OSError: [Errno 28] No space left on device
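Errno 28 is `ENOSPC` on Linux, the error raised when the file system holding the destination has no free space left. As a sketch, a training script could recognize this error from the exception itself rather than by matching the log text. The helper name `is_no_space_error` is hypothetical and is not part of ModelArts:

```python
import errno


def is_no_space_error(exc: BaseException) -> bool:
    """Return True if exc, or an exception it was chained from, is ENOSPC (Errno 28).

    Hypothetical helper for illustration only.
    """
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        # OSError and its subclasses carry the numeric error code in .errno.
        if isinstance(current, OSError) and current.errno == errno.ENOSPC:
            return True
        # Follow explicit (raise ... from ...) and implicit exception chaining.
        current = current.__cause__ or current.__context__
    return False
```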

Possible Causes

The container does not have enough disk space to hold the downloaded data.
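Before copying, a script can compare the size of the data with the free space of the target directory. The sketch below uses only the standard library (`os.walk`, `shutil.disk_usage`); the function names `tree_size` and `has_room_for` and the default 1 GiB safety margin are assumptions for illustration. On a training node, `dst_dir` would typically be `/cache`:

```python
import os
import shutil


def tree_size(path: str) -> int:
    """Total size in bytes of the regular files under path (or of path itself)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so that linked data is not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def has_room_for(src: str, dst_dir: str, margin_bytes: int = 1 << 30) -> bool:
    """Return True if the file system of dst_dir can hold src plus a safety margin.

    margin_bytes (1 GiB by default) is an arbitrary assumption; adjust as needed.
    """
    free = shutil.disk_usage(dst_dir).free
    return tree_size(src) + margin_bytes <= free
```

Calling `has_room_for(src, "/cache")` before the copy, where `src` is the local data path, turns a mid-copy Errno 28 into an early, explicit check.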

Solution

  1. Check whether data is downloaded to the /cache directory. Each GPU node has a /cache directory with 4 TB of storage.
  2. Check whether GPU resources are used. If CPU resources are used, /cache and the code directory share 10 GB of space, so the space may run out. In this case, use GPU resources instead.
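  3. Set the TMPDIR environment variable at the beginning of the training code, before any temporary files are created, so that temporary files are written to /cache:
    import os
    # Set TMPDIR for the current process. os.system('export TMPDIR=/cache') runs
    # in a separate shell and does not change the training process's environment.
    os.environ['TMPDIR'] = '/cache'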
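An `export` run through `os.system` only affects the short-lived shell that `os.system` starts, so it does not change `TMPDIR` for the training process. A minimal sketch of setting the variable in-process and confirming that Python's `tempfile` module picks it up follows. The function name `use_tmpdir` is hypothetical; `tempfile.tempdir` is reset because `tempfile` caches the directory after its first lookup:

```python
import os
import tempfile


def use_tmpdir(path: str) -> str:
    """Point TMPDIR at path for this process and return the temp dir now in effect.

    Hypothetical helper for illustration only.
    """
    os.makedirs(path, exist_ok=True)
    # Child processes started after this call inherit the variable as well.
    os.environ["TMPDIR"] = path
    # tempfile caches its choice in tempfile.tempdir; clear it so TMPDIR is re-read.
    tempfile.tempdir = None
    # If path is not writable, tempfile silently falls back to another candidate
    # directory, so callers should compare the returned value with path.
    return tempfile.gettempdir()
```

On a training node, `use_tmpdir("/cache")` should return `/cache` when that directory is writable; any other return value means temporary files are still going elsewhere.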