Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Lai, Weijian <laiweijian4@huawei.com> Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>
TensorFlow Stops Writing TensorBoard to OBS When the Size of Written Data Reaches 5 GB
Symptom
The following error message is displayed for a ModelArts training job:
Encountered Unknown Error EntityTooLarge Your proposed upload exceeds the maximum allowed object size.:
If the signature check failed. This could be because of a time skew. Attempting to adjust the signer
Possible Cause
OBS limits the size of a file uploaded in a single operation to 5 GB. TensorFlow may keep the summary file in a local cache, so each time a flush is triggered, the whole summary file is uploaded again and overwrites the existing file on OBS. Once the summary file exceeds 5 GB, the upload fails and the file is no longer written to OBS.
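To confirm that a summary file is approaching the limit, the local log directory can be scanned for large event files. The following is a minimal sketch, not part of ModelArts or MoXing. The function name find_oversized_event_files, the warn_ratio parameter, and the interpretation of 5 GB as 5 * 1024**3 bytes are assumptions made for illustration. It relies only on the fact that TensorBoard event file names contain "tfevents".

```python
import os

# Assumption: the 5 GB limit is interpreted as 5 * 1024**3 bytes.
OBS_SINGLE_UPLOAD_LIMIT = 5 * 1024 ** 3


def find_oversized_event_files(log_dir, limit=OBS_SINGLE_UPLOAD_LIMIT, warn_ratio=0.8):
    """Return sorted (path, size_in_bytes) pairs for TensorBoard event files
    whose size is at least warn_ratio * limit.

    This is a hypothetical helper for diagnosis only.
    """
    threshold = limit * warn_ratio
    flagged = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            # TensorBoard event file names contain "tfevents".
            if "tfevents" not in name:
                continue
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size >= threshold:
                flagged.append((path, size))
    return sorted(flagged)
```

Calling find_oversized_event_files on the local summary directory of a training job returns the event files that have reached 80% of the limit by default. An empty list suggests that the summary file size is not yet the cause.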
Solution
If this problem occurs while a training job is running, troubleshoot it as follows:
- Use the local cache method:
    import moxing.tensorflow as mox
    mox.cache()