The following error message is displayed for a ModelArts training job:
Encountered Unknown Error EntityTooLarge Your proposed upload exceeds the maximum allowed object size.: If the signature check failed. This could be because of a time skew. Attempting to adjust the signer
OBS limits a single file upload to 5 GB. TensorFlow may save the summary file in the local cache, so each time a flush is triggered, the whole summary file overwrites the original file on OBS. Once the summary file exceeds 5 GB, the upload fails with the EntityTooLarge error and the file on OBS is no longer written.
If this problem occurs while a training job is running, add the following call to the training script to troubleshoot it:
import moxing.tensorflow as mox
mox.cache()
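To confirm that an oversized summary file is the cause, it can help to scan the local output directory for files that already exceed the 5 GB limit described above. The following is a minimal sketch that uses only the Python standard library. The helper name find_oversized_files and the constant OBS_MAX_UPLOAD_BYTES are hypothetical names introduced for illustration and are not part of MoXing or OBS, and interpreting 5 GB as 5 * 1024**3 bytes is an assumption.

```python
import os

# Assumption: the 5 GB limit is interpreted as 5 * 1024**3 bytes.
OBS_MAX_UPLOAD_BYTES = 5 * 1024 ** 3


def find_oversized_files(root, limit=OBS_MAX_UPLOAD_BYTES):
    """Return (path, size_in_bytes) pairs for files under root larger than limit.

    Hypothetical helper for illustration; results are sorted largest first.
    """
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file may have been removed or rotated during the scan.
                continue
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

For example, calling find_oversized_files("/path/to/summary_dir") (a placeholder path) returns an empty list when no file is over the limit. Any summary file listed in the result is a likely source of the EntityTooLarge error.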