Training Job Process Exits Unexpectedly

Symptom

A training job fails, and the logs contain error information similar to the following:

[Modelarts Service Log]Training end with return code: 137

Possible Causes

The log shows that the training job exited with code 137. The training process is started when the user code is executed, so the exit code described in this section is generated after the training job's code is executed. Other common exit codes are 247 and 139.
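As a rough aid for reading such codes, the sketch below applies the common Unix shell convention that an exit code of 128 + N means the process was terminated by signal N. Whether the platform reports codes under this convention is an assumption here, and the helper name describe_exit_code is hypothetical. Under that convention, 137 corresponds to signal 9 (SIGKILL) and 139 to signal 11 (SIGSEGV). Code 247 does not map to a standard signal number on Linux, so the helper reports it without a signal name.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the shell's 128 + N signal convention.

    Hypothetical helper for illustration; it assumes the reported code
    follows that convention, which the log itself does not confirm.
    """
    if code == 0:
        return "success"
    signum = code - 128
    # Only values inside the platform's signal range can name a signal.
    if 0 < signum < signal.NSIG:
        try:
            name = signal.Signals(signum).name
        except ValueError:
            return f"exit code {code} (signal {signum} has no name here)"
        return f"terminated by {name} (signal {signum})"
    return f"exit code {code} (not a 128 + signal value on this platform)"


if __name__ == "__main__":
    # The three codes mentioned in this section.
    for code in (137, 247, 139):
        print(code, "->", describe_exit_code(code))
```

A process that receives SIGKILL is often one that the operating system stopped for using too much memory, so for code 137 the memory usage of the training code is a reasonable first thing to check.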

Troubleshooting

The error information indicates that the fault is caused by the user code.

You can use either of the following methods to locate the fault: