doc-exports/docs/modelarts/umn/develop-modelarts-0092.html
Lai, Weijian 6aa966a79a ModelArts UMN 24.3.0 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lai, Weijian <laiweijian4@huawei.com>
Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>
2024-11-02 09:04:52 +00:00


<a name="EN-US_TOPIC_0000002043177208"></a><a name="EN-US_TOPIC_0000002043177208"></a>
<h1 class="topictitle1">Viewing Training Job Events</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_en-us_topic_0000001244362299_p17841121665213">After a training job is created, its key events are recorded at the backend. You can view these events on the training job details page.</p>
<p id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_en-us_topic_0000001244362299_p12484144610439">Events help you follow the running process of a training job and locate faults more accurately when an exception occurs. The following job events are supported:</p>
<ul id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_en-us_topic_0000001244362299_ul686375464319"><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li36691623135420">Training job created.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li1121105216541">Training job failures:</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li20222155185517">Preparations timed out. The possible cause is that cross-region algorithm synchronization or shared storage creation timed out.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li960713316558">The training job is queuing and awaiting resource allocation.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li12317253185510">Failed to be queued.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li977555145613">The training job starts to run.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li4884201755611">Training job executed.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li192771230115616">Failed to run the training job.</li><li id="EN-US_TOPIC_0000002043177208__li1910204015325">The training job is preempted.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li13134643125615">The system detects that your training job may be suspended. Go to the job details page to view the cause and handle the issue.</li><li id="EN-US_TOPIC_0000002043177208__li141028164466">The training job has been restarted.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li13106202512578">The training job has been manually stopped.</li><li id="EN-US_TOPIC_0000002043177208__li17589185712451">The training job has been stopped. (Maximum running duration: 1 hour)</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li114165118576">The training job has been stopped. 
(Maximum running duration: 3 hours)</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li168022017586">The training job has been manually deleted.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li1754681712584">Billing information synchronized.</li><li id="EN-US_TOPIC_0000002043177208__li15376171245014">[worker-0] The training environment is being pre-checked.</li><li id="EN-US_TOPIC_0000002043177208__li7256111314507">[worker-0] [Duration: second] Pre-check completed.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_li104714104114">[worker-0] [Duration: second] Pre-check failed. Error: xxx</li><li id="EN-US_TOPIC_0000002043177208__li147796254522">[worker-0] The training code is being downloaded.</li><li id="EN-US_TOPIC_0000002043177208__li91133745212">[worker-0] [Duration: second] Training code downloaded.</li><li id="EN-US_TOPIC_0000002043177208__li851414586522">[worker-0] [Duration: second] Failed to download the training code. Failure cause:</li><li id="EN-US_TOPIC_0000002043177208__li6923011145314">[worker-0] The training input is being downloaded.</li><li id="EN-US_TOPIC_0000002043177208__li0742153595310">[worker-0] [Duration: second] Training input (parameter: xxx) downloaded.</li><li id="EN-US_TOPIC_0000002043177208__li07651311545">[worker-0] [Duration: second] Failed to download the training input (parameter: xxx). Failure cause:</li><li id="EN-US_TOPIC_0000002043177208__li1604853125417">[worker-0] Python dependency packages are being installed. Import the following files:</li><li id="EN-US_TOPIC_0000002043177208__li1537762710548">[worker-0] [Duration: second] Python dependency packages installed. 
Import the following files:</li><li id="EN-US_TOPIC_0000002043177208__li6704202119559">[worker-0] The training job starts to run.</li><li id="EN-US_TOPIC_0000002043177208__li4167163413559">[worker-0] Training job executed.</li><li id="EN-US_TOPIC_0000002043177208__li149022185563">[worker-0] The training output is being uploaded.</li><li id="EN-US_TOPIC_0000002043177208__li1897193320568">[worker-0] [Duration: second] Training output (parameter: xxx) uploaded.</li></ul>
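<p>If you copy event text from the console for troubleshooting, the <code>[worker-N]</code> and <code>[Duration: ...]</code> prefixes shown in the list above can be separated from the message. The following is a minimal sketch, not a ModelArts API. The names <code>parse_event</code>, <code>JobEvent</code>, and <code>FAILURE_MARKERS</code> are hypothetical. The sketch assumes that the duration is rendered as a number before the word "second" and that failure events contain "failed" or "timed out".</p>

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of an event line, inferred from the event list above:
#   [worker-N] [Duration: <number> second] <message>
# Both prefixes are optional. How the console renders the duration value
# is an assumption, so the number itself is optional too.
EVENT_RE = re.compile(
    r"^(?:\[worker-(?P<worker>\d+)\]\s*)?"
    r"(?:\[Duration:\s*(?P<duration>\d+(?:\.\d+)?)?\s*seconds?\]\s*)?"
    r"(?P<message>.*)$",
    re.DOTALL,
)

# Hypothetical heuristic: wording that marks a failure in the listed events.
FAILURE_MARKERS = ("failed", "timed out")


class JobEvent(NamedTuple):
    worker: Optional[int]        # None for job-level events
    duration_s: Optional[float]  # None when no duration value is present
    message: str
    is_failure: bool


def parse_event(text: str) -> JobEvent:
    """Split one event line into worker index, duration, and message."""
    # Every part of the pattern is optional, so the match always succeeds.
    match = EVENT_RE.match(text.strip())
    worker = match.group("worker")
    duration = match.group("duration")
    message = match.group("message")
    lowered = message.lower()
    return JobEvent(
        worker=int(worker) if worker is not None else None,
        duration_s=float(duration) if duration is not None else None,
        message=message,
        is_failure=any(marker in lowered for marker in FAILURE_MARKERS),
    )


if __name__ == "__main__":
    samples = [
        "Training job created.",
        "[worker-0] [Duration: 12 second] Training code downloaded.",
        "[worker-0] [Duration: 3 second] Pre-check failed. Error: xxx",
    ]
    for line in samples:
        print(parse_event(line))
```

<p>Job-level events such as "Failed to be queued." carry no worker prefix, so <code>worker</code> is <code>None</code> for them. Filtering on <code>is_failure</code> gives a quick shortlist of events to inspect first when locating a fault.</p>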
<p id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_en-us_topic_0000001244362299_p973155519522">During the training process, key events can be manually or automatically refreshed.</p>
<div class="section" id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_en-us_topic_0000001244362299_section282962985013"><h4 class="sectiontitle">Procedure</h4><ol id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_en-us_topic_0000001244362299_ol1251152035016"><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_en-us_topic_0000001244362299_en-us_topic_0000001206009603_li79561311145814">In the navigation pane of the ModelArts management console, choose <strong id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_b1821335077">Training Management</strong> &gt; <strong id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_b417833814714">Training Jobs</strong>. In the training job list, click a job name.</li><li id="EN-US_TOPIC_0000002043177208__en-us_topic_0000001231305846_en-us_topic_0000001244362299_li5913146155019">Click <strong id="EN-US_TOPIC_0000002043177208__b614571561617">Events</strong> to view events.</li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="develop-modelarts-0010.html">Performing a Training</a></div>
</div>
</div>