doc-exports/docs/modelarts/umn/develop-modelarts-0011.html
Lai, Weijian 6aa966a79a ModelArts UMN 24.3.0 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lai, Weijian <laiweijian4@huawei.com>
Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>
2024-11-02 09:04:52 +00:00

<a name="EN-US_TOPIC_0000002043018944"></a><a name="EN-US_TOPIC_0000002043018944"></a>
<h1 class="topictitle1">Creating a Training Job</h1>
<div id="body0000001165912040"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1165555874212">ModelArts training management enables you to create training jobs, view training statuses, and manage job versions. Model training is an iterative optimization process. Through unified training management, you can flexibly select algorithms, data, and hyperparameters to obtain the optimal input configuration and model. After comparing metrics between training versions, you can determine the most satisfactory training job.</p>
<div class="section" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_section588716131207"><h4 class="sectiontitle">Prerequisites</h4><ul id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ul153712018018"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li1937182017010">Training data is available. You can create a dataset in ModelArts or upload training data to an OBS directory.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li174229192296">You have created an algorithm either by using a preset image (<a href="develop-modelarts-0006.html">Using a Preset Image (Custom Script)</a>) or using a custom image (<a href="develop-modelarts-0077.html">Using a Custom Image</a>).</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li1864816519315">At least one empty folder has been created in OBS for storing the training output. The OBS bucket must not be encrypted because ModelArts does not support encrypted OBS buckets. When creating an OBS bucket, do not enable bucket encryption.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li3988463226">Access authorization has been configured. For details, see <a href="modelarts_08_0007.html">Configuring Access Authorization (Global Configuration)</a>.</li></ul>
</div>
<div class="section" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_section210412592420"><h4 class="sectiontitle">Creating a Training Job</h4><ol id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ol14786932332"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li1797795816273">Log in to the ModelArts management console.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li1165643715559">In the navigation pane, choose <strong id="EN-US_TOPIC_0000002043018944__b356610024910">Training Management</strong> &gt; <strong id="EN-US_TOPIC_0000002043018944__b195661807492">Training Jobs</strong>. The training job list is displayed.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li1678610328320">Click <strong id="EN-US_TOPIC_0000002043018944__b6941113124914">Create Training Job</strong>. Then, configure parameters.
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_table151111142134310" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters of a training job</caption><thead align="left"><tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row161121142204316"><th align="left" class="cellrowborder" colspan="2" valign="top" id="mcps1.3.3.2.3.2.2.4.1.1"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p171131423438">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" id="mcps1.3.3.2.3.2.2.4.1.2"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p7113342104314">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row711344234314"><td class="cellrowborder" colspan="2" valign="top" headers="mcps1.3.3.2.3.2.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p195841043144514">Name</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.2.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1622929104911">Name of a training job.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p11114042164315">The system automatically generates a name. You can rename it based on the following naming rules:</p>
<ul id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ul136493552494"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li12649755134917">The name contains 1 to 64 characters.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li8237152310516">Letters, digits, hyphens (-), and underscores (_) are allowed.</li></ul>
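<p>The naming rules above can be checked before a job is submitted. The following is a minimal sketch, assuming that "letters" means ASCII letters; the pattern and the function name <strong>is_valid_job_name</strong> are illustrative and are not part of ModelArts.</p>

```python
import re

# Assumed pattern derived from the naming rules above: 1 to 64 characters,
# limited to ASCII letters, digits, hyphens (-), and underscores (_).
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_job_name(name: str) -> bool:
    """Return True if the name satisfies the documented naming rules."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("job-resnet50_v2", "", "bad name!"):
        print(repr(candidate), is_valid_job_name(candidate))
```

<p>Running the check locally catches an invalid name before the console rejects it.</p>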
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row411494211436"><td class="cellrowborder" colspan="2" valign="top" headers="mcps1.3.3.2.3.2.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p19614154164511">Description</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.2.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p511415421439">Description of a training job.</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_table082233873315" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Algorithm parameters of a training job</caption><thead align="left"><tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row582313384334"><th align="left" class="cellrowborder" valign="top" width="18.02%" id="mcps1.3.3.2.3.3.2.4.1.1"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p3823183873312">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="14.879999999999999%" id="mcps1.3.3.2.3.3.2.4.1.2"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p16823153893316">Sub-Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="67.10000000000001%" id="mcps1.3.3.2.3.3.2.4.1.3"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1982314383332">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row112051543173415"><td class="cellrowborder" valign="top" width="18.02%" headers="mcps1.3.3.2.3.3.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p11324478344">Algorithm Type &gt; Custom algorithm &gt; Boot Mode</p>
</td>
<td class="cellrowborder" valign="top" width="14.879999999999999%" headers="mcps1.3.3.2.3.3.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p139595488345">Preset image</p>
</td>
<td class="cellrowborder" valign="top" width="67.10000000000001%" headers="mcps1.3.3.2.3.3.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p137213213354">If <strong id="EN-US_TOPIC_0000002043018944__b9843747413">Boot Mode</strong> is set to <strong id="EN-US_TOPIC_0000002043018944__b1384314477119">Preset image</strong>, select a preset engine and configure the code directory and boot file.</p>
<ul id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ul337213214356"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li153721321183515"><strong id="EN-US_TOPIC_0000002043018944__b6829153111">Code Directory</strong>: Select the code directory required for this training job. Upload code to the OBS bucket in advance. The total size of files in the directory cannot exceed 5 GB, the number of files cannot exceed 1,000, and the folder depth cannot exceed 32.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li3372112111356"><strong id="EN-US_TOPIC_0000002043018944__b157102012214">Boot File</strong>: Select the Python boot script in the code directory. The boot file must be a .py file because ModelArts supports only boot files written in Python.</li></ul>
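<p>Before uploading code to OBS, the code directory limits above (5 GB, 1,000 files, folder depth 32) can be checked on the local copy. The following is a rough sketch under two assumptions that the text does not confirm: 5 GB is read as 5 GiB, and files directly in the root directory count as depth 1. The function name <strong>check_code_directory</strong> is illustrative only.</p>

```python
import os

GIB = 1024 ** 3  # assumption: "5 GB" is interpreted as 5 GiB


def check_code_directory(root, max_bytes=5 * GIB, max_files=1000, max_depth=32):
    """Return a list of limit violations for a local code directory.

    An empty list means the directory is within all three limits.
    """
    root = os.path.abspath(root)
    total_bytes = 0
    file_count = 0
    deepest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        relative = os.path.relpath(dirpath, root)
        # Assumption: the root directory itself is at depth 1.
        depth = 1 if relative == "." else 1 + len(relative.split(os.sep))
        deepest = max(deepest, depth)
        for name in filenames:
            file_count += 1
            # lstat does not follow symlinks, so broken links do not raise.
            total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
    problems = []
    if total_bytes > max_bytes:
        problems.append(f"total size {total_bytes} bytes exceeds {max_bytes}")
    if file_count > max_files:
        problems.append(f"{file_count} files exceeds {max_files}")
    if deepest > max_depth:
        problems.append(f"folder depth {deepest} exceeds {max_depth}")
    return problems
```

<p>An empty result suggests that the directory is within the documented limits; the service may count size and depth differently.</p>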
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row38247385338"><td class="cellrowborder" valign="top" width="18.02%" headers="mcps1.3.3.2.3.3.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p6328477346">Algorithm Type &gt; Custom algorithm &gt; Boot Mode</p>
</td>
<td class="cellrowborder" valign="top" width="14.879999999999999%" headers="mcps1.3.3.2.3.3.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p495984853416">Custom image</p>
</td>
<td class="cellrowborder" valign="top" width="67.10000000000001%" headers="mcps1.3.3.2.3.3.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p88241638203311">If <strong id="EN-US_TOPIC_0000002043018944__b10744215103614">Boot Mode</strong> is set to <strong id="EN-US_TOPIC_0000002043018944__b0797219378">Custom image</strong>, specify the image, code directory, and boot command.</p>
<ul id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ul10824538113316"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li48241438193312"><strong id="EN-US_TOPIC_0000002043018944__b1690611314318">Code Directory</strong>: Select the code directory required for this training job. This parameter is optional.<p id="EN-US_TOPIC_0000002043018944__p1156617718542">Take OBS path <span class="filepath" id="EN-US_TOPIC_0000002043018944__filepath8366163345319"><b>obs://obs-bucket/training-test/demo-code</b></span> as an example. The content in the OBS path will be automatically downloaded to <span class="filepath" id="EN-US_TOPIC_0000002043018944__filepath19366163375319"><b>${MA_JOB_DIR}/demo-code</b></span> in the training container, and <strong id="EN-US_TOPIC_0000002043018944__b173672335531">demo-code</strong> (customizable) is the last-level directory of the OBS path.</p>
</li><li id="EN-US_TOPIC_0000002043018944__li676915154541"><strong id="EN-US_TOPIC_0000002043018944__b43531137521">User ID</strong>: User ID for running the container. The default value <strong id="EN-US_TOPIC_0000002043018944__b730241313506">1000</strong> is recommended. This parameter is optional.<p id="EN-US_TOPIC_0000002043018944__p1297010526505">If the UID needs to be specified, its value must be within the specified range. The UID ranges of different resource pools are as follows:</p>
<ul id="EN-US_TOPIC_0000002043018944__ul16821150205011"><li id="EN-US_TOPIC_0000002043018944__li38211203501">Public resource pool: 1000 to 65535</li><li id="EN-US_TOPIC_0000002043018944__li382190155017">Dedicated resource pool: 0 to 65535</li></ul>
</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li1282413810336"><strong id="EN-US_TOPIC_0000002043018944__b1395733614310">Boot Command</strong>: Enter the image boot command. This parameter is mandatory. The boot command will be automatically executed after the code directory is downloaded.<ul id="EN-US_TOPIC_0000002043018944__ul15531857144615"><li id="EN-US_TOPIC_0000002043018944__li855385710468">If the training boot script is a .py file, <strong id="EN-US_TOPIC_0000002043018944__b1563653195415">train.py</strong> for example, the boot command can be <strong id="EN-US_TOPIC_0000002043018944__b19636183125416">python ${MA_JOB_DIR}/demo-code/train.py</strong>.</li><li id="EN-US_TOPIC_0000002043018944__li145538572466">If the training boot script is an .sh file, <strong id="EN-US_TOPIC_0000002043018944__b226118615542">main.sh</strong> for example, the boot command can be <strong id="EN-US_TOPIC_0000002043018944__b192611260545">bash ${MA_JOB_DIR}/demo-code/main.sh</strong>.</li></ul>
<p id="EN-US_TOPIC_0000002043018944__p14196586547">Semicolons (;) and double ampersands (&amp;&amp;) can be used to combine multiple boot commands, but line breaks are not supported. <strong id="EN-US_TOPIC_0000002043018944__b3726141095413">demo-code</strong> (customizable) in the boot command is the last-level directory of the OBS path.</p>
</li></ul>
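<p>The relationship between the OBS code directory, the download location in the container, and the boot command can be sketched as follows. This is only an illustration of the rules described above; the helper names <strong>container_code_dir</strong>, <strong>boot_command</strong>, and <strong>join_commands</strong> are hypothetical and are not ModelArts functions.</p>

```python
import posixpath

# MA_JOB_DIR is kept as a literal shell variable; it is expanded in the container.
JOB_DIR_VAR = "${MA_JOB_DIR}"


def container_code_dir(obs_code_dir: str) -> str:
    """Map an OBS code directory to its download location in the container."""
    last_level = posixpath.basename(obs_code_dir.rstrip("/"))
    return f"{JOB_DIR_VAR}/{last_level}"


def boot_command(obs_code_dir: str, script: str) -> str:
    """Build a single boot command for a .py or .sh boot script."""
    if script.endswith(".py"):
        runner = "python"
    elif script.endswith(".sh"):
        runner = "bash"
    else:
        raise ValueError(f"unsupported boot script type: {script}")
    return f"{runner} {container_code_dir(obs_code_dir)}/{script}"


def join_commands(*commands: str) -> str:
    """Chain commands with a double ampersand and reject line breaks."""
    for command in commands:
        if "\n" in command or "\r" in command:
            raise ValueError("line breaks are not supported in a boot command")
    return " && ".join(commands)


if __name__ == "__main__":
    code_dir = "obs://obs-bucket/training-test/demo-code"
    print(join_commands("echo start", boot_command(code_dir, "train.py")))
```

<p>The sketch only builds the command string; whether a later command runs after an earlier one fails depends on the separator chosen.</p>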
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row2822114318439"><td class="cellrowborder" valign="top" width="18.02%" headers="mcps1.3.3.2.3.3.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p18822124374319">Algorithm Type &gt; Custom algorithm</p>
</td>
<td class="cellrowborder" valign="top" width="14.879999999999999%" headers="mcps1.3.3.2.3.3.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p128221443184312">Local Code Directory</p>
</td>
<td class="cellrowborder" valign="top" width="67.10000000000001%" headers="mcps1.3.3.2.3.3.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p282294313433">You can specify the local directory of a training container. When a training job starts, the system automatically downloads the code directory to this directory.</p>
<p id="EN-US_TOPIC_0000002043018944__p1860711552565">The default local code directory is <strong id="EN-US_TOPIC_0000002043018944__b343614542105">/home/ma-user/modelarts/user-job-dir</strong>. This parameter is optional.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row1239014413438"><td class="cellrowborder" valign="top" width="18.02%" headers="mcps1.3.3.2.3.3.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p11390341204318">Algorithm Type &gt; Custom algorithm</p>
</td>
<td class="cellrowborder" valign="top" width="14.879999999999999%" headers="mcps1.3.3.2.3.3.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p339015413439">Work Directory</p>
</td>
<td class="cellrowborder" valign="top" width="67.10000000000001%" headers="mcps1.3.3.2.3.3.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__p75018168139">Set the directory where the boot file in the training container is located. When a training job starts, the system automatically runs the <strong id="EN-US_TOPIC_0000002043018944__b10982142812419">cd</strong> command to change the work directory to the specified directory.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row139385394119"><td class="cellrowborder" valign="top" width="18.02%" headers="mcps1.3.3.2.3.3.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1486317715429">Created By</p>
</td>
<td class="cellrowborder" valign="top" width="14.879999999999999%" headers="mcps1.3.3.2.3.3.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p286313716426">My algorithms</p>
</td>
<td class="cellrowborder" valign="top" width="67.10000000000001%" headers="mcps1.3.3.2.3.3.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p386319794214">Select an algorithm or create an algorithm. For details, see <a href="develop-modelarts-0009.html">Creating an Algorithm</a>.</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_table1771312193816" frame="border" border="1" rules="all"><caption><b>Table 3 </b>Parameters of training input and output</caption><thead align="left"><tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row4713102110384"><th align="left" class="cellrowborder" valign="top" width="17.93%" id="mcps1.3.3.2.3.4.2.4.1.1"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p13713192183816">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="15.22%" id="mcps1.3.3.2.3.4.2.4.1.2"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p14713112112383">Sub-Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="66.85%" id="mcps1.3.3.2.3.4.2.4.1.3"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p18713821193817">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row1871618217389"><td class="cellrowborder" rowspan="4" valign="top" width="17.93%" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p167162021183818">Input</p>
<p id="EN-US_TOPIC_0000002043018944__p137021334817"></p>
</td>
<td class="cellrowborder" valign="top" width="15.22%" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p371614211381">Name</p>
</td>
<td class="cellrowborder" valign="top" width="66.85%" headers="mcps1.3.3.2.3.4.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p167160214386">The recommended value is <strong id="EN-US_TOPIC_0000002043018944__b618131805617">data_url</strong>. The training input must match the data input configuration set in your selected algorithm. For details, see <a href="develop-modelarts-0009.html#EN-US_TOPIC_0000002079176585__en-us_topic_0000001133351332_table126437359515">Table 2</a>.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p20716321113819">For example, if you use <strong id="EN-US_TOPIC_0000002043018944__b2693240175611">argparse</strong> in the training code to parse <strong id="EN-US_TOPIC_0000002043018944__b569354017568">data_url</strong> into the data input, set the data input parameter to <strong id="EN-US_TOPIC_0000002043018944__b146931440135619">data_url</strong> when creating the algorithm.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1171652193818">You can select a dataset or data path for data input. When the training job is started, ModelArts automatically downloads the data in the input path to the container directory for training.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row1871619210386"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1371632118387">Dataset</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p8716621173816">Select an available dataset and its version from the ModelArts <strong id="EN-US_TOPIC_0000002043018944__b1738854575718">Data Management</strong> module.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p127161521113810">Click <strong id="EN-US_TOPIC_0000002043018944__b112341049185711">Dataset</strong> and select the target dataset and its version in the dialog box displayed.</p>
<div class="note" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_note1171672133817"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1716102123818">If <strong id="EN-US_TOPIC_0000002043018944__b4820195511572">Dataset</strong> is unavailable, the training data of the selected algorithm cannot be from a dataset.</p>
</div></div>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row1971622143813"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p97168214383">Data path</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p15716172133814">Select the training data from your OBS bucket.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p207161721163816">Click <strong id="EN-US_TOPIC_0000002043018944__b1342732105811">Data path</strong> and select the OBS bucket and folder in the dialog box displayed.</p>
<div class="note" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_note47161721193820"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p6717172143817">If <strong id="EN-US_TOPIC_0000002043018944__b1140754185810">Data path</strong> is unavailable, the training data of the selected algorithm cannot be from a data path.</p>
</div></div>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__row437091384818"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__p1637051334810">Obtained from</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__p178411830104819">The following uses training input <strong id="EN-US_TOPIC_0000002043018944__b6212929165517">data_path</strong> as an example.</p>
<p id="EN-US_TOPIC_0000002043018944__p784133034814">If you select <strong id="EN-US_TOPIC_0000002043018944__b1876013475814">Hyperparameters</strong>, do as follows to obtain the training input:</p>
<pre class="screen" id="EN-US_TOPIC_0000002043018944__screen384173012485">import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--data_path')
args, unknown = parser.parse_known_args()
data_path = args.data_path </pre>
<p id="EN-US_TOPIC_0000002043018944__p4841153034815">If you select <strong id="EN-US_TOPIC_0000002043018944__b1694141265914">Environment variables</strong>, do as follows to obtain the training input:</p>
<pre class="screen" id="EN-US_TOPIC_0000002043018944__screen0841203020482">import os
data_path = os.getenv("data_path", "")</pre>
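<p>If the same training script may be used with either setting, the two approaches above can be combined. The following is a minimal sketch, assuming that a command-line value should take precedence over the environment variable; this precedence and the function name <strong>resolve_data_path</strong> are choices made for the example, not ModelArts behavior.</p>

```python
import argparse
import os


def resolve_data_path(argv=None, environ=None) -> str:
    """Return data_path from --data_path if given, else from the environment."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", default=None)
    # parse_known_args ignores any other arguments passed to the script.
    args, _unknown = parser.parse_known_args(argv)
    if args.data_path:
        return args.data_path
    # Fall back to the environment variable, or an empty string if unset.
    return environ.get("data_path", "")


if __name__ == "__main__":
    print(resolve_data_path(["--data_path", "/tmp/input", "--epochs", "3"], {}))
```

<p>Passing <strong>argv</strong> and <strong>environ</strong> explicitly keeps the function easy to test; with the defaults it reads the real command line and environment.</p>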
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row137174216383"><td class="cellrowborder" rowspan="4" valign="top" width="17.93%" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p19717421133813">Output</p>
</td>
<td class="cellrowborder" valign="top" width="15.22%" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p107170216387">Name</p>
</td>
<td class="cellrowborder" valign="top" width="66.85%" headers="mcps1.3.3.2.3.4.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p13673194756">The algorithm code reads the local path to the training output based on this parameter.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1071752112388">The recommended value is <strong id="EN-US_TOPIC_0000002043018944__b715321385820">train_url</strong>. The training output must match the data output configuration set in your selected algorithm. For details, see <a href="develop-modelarts-0009.html#EN-US_TOPIC_0000002079176585__en-us_topic_0000001133351332_table8644335195117">Table 3</a>.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p177171221203816">For example, if you use <strong id="EN-US_TOPIC_0000002043018944__b660511813582">argparse</strong> in the algorithm code to parse <strong id="EN-US_TOPIC_0000002043018944__b1160551817581">train_url</strong> into the data output, set the data output parameter to <strong id="EN-US_TOPIC_0000002043018944__b106051518135813">train_url</strong> when creating the algorithm.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1671712110387">You can select an OBS path for data output. During training, ModelArts automatically uploads the training output to the OBS path.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row87171521153819"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p271702113381">Data path</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p145791537057">This data path stores the training output. During and after the training, the system automatically synchronizes files from the local directory to the data path. Currently, only OBS paths can be set as the data path.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p8717921203811">Select the storage path of the training result (OBS path). To minimize errors, select an empty directory.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row1038343201211"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1138310311123">Obtained from</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p99011010201310">The following uses the training output <strong id="EN-US_TOPIC_0000002043018944__b16632534175820">train_url</strong> as an example.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p20383183181216">Obtain the training output from hyperparameters by using the following code:</p>
<pre class="screen" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_screen5423125191311">import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--train_url')
args, unknown = parser.parse_known_args()
train_url = args.train_url </pre>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1078117403153">Obtain the training output from environment variables by using the following code:</p>
<pre class="screen" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_screen127821340171518">import os
train_url = os.getenv("train_url", "")</pre>
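<p>After <strong>train_url</strong> is obtained, the training code writes its results to that local directory so that they can be synchronized to the OBS path. The following is a small sketch of this step; the file name <strong>metrics.json</strong> and the function <strong>save_metrics</strong> are hypothetical examples.</p>

```python
import json
import os


def save_metrics(train_url: str, metrics: dict, filename: str = "metrics.json") -> str:
    """Write metrics as JSON into the training output directory.

    Returns the path of the written file.
    """
    # Create the output directory if it does not exist yet.
    os.makedirs(train_url, exist_ok=True)
    path = os.path.join(train_url, filename)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metrics, handle, indent=2, sort_keys=True)
    return path
```

<p>Any file written under the output directory in this way is a candidate for upload; the exact synchronization timing is determined by the service.</p>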
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row72419512217"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p2025351172111">Predownload</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p7254515218">If you set <strong id="EN-US_TOPIC_0000002043018944__b310217477588">Predownload</strong> to <strong id="EN-US_TOPIC_0000002043018944__b16103747125819">Yes</strong>, the system automatically downloads the files in the training output data path to a local directory of the training container before the training job is started. Select <strong id="EN-US_TOPIC_0000002043018944__b1655194845814">Yes</strong> for <a href="develop-modelarts-0023.html">resumable training and incremental training</a>.</p>
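<p>With predownload enabled, files from a previous run are available in the local output directory when the job starts, so the training code can look for a checkpoint to resume from. The following is a sketch under a hypothetical naming scheme, <strong>ckpt_&lt;epoch&gt;.pt</strong>; ModelArts does not prescribe checkpoint names, and <strong>latest_checkpoint</strong> is an illustrative helper.</p>

```python
import os
import re

# Hypothetical naming scheme used only for this sketch: ckpt_<epoch>.pt
CHECKPOINT_PATTERN = re.compile(r"ckpt_(\d+)\.pt")


def latest_checkpoint(train_url):
    """Return the path of the highest-numbered checkpoint, or None."""
    if not os.path.isdir(train_url):
        return None
    best_epoch, best_path = -1, None
    for name in os.listdir(train_url):
        match = CHECKPOINT_PATTERN.fullmatch(name)
        # Compare epochs numerically so that ckpt_10 ranks above ckpt_2.
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = os.path.join(train_url, name)
    return best_path
```

<p>If the function returns <strong>None</strong>, the job can start training from scratch.</p>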
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row571742183816"><td class="cellrowborder" valign="top" width="17.93%" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p157170214389">Hyperparameters</p>
</td>
<td class="cellrowborder" valign="top" width="15.22%" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p12717121143813">None</p>
</td>
<td class="cellrowborder" valign="top" width="66.85%" headers="mcps1.3.3.2.3.4.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1771772114383">The value of this parameter varies according to the selected algorithm.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p9717132115381">If you have defined hyperparameters when creating an algorithm, all hyperparameters of the algorithm are displayed. Whether hyperparameters can be modified or deleted depends on how you configure the constraints when creating the algorithm. For details, see <a href="develop-modelarts-0009.html#EN-US_TOPIC_0000002079176585__en-us_topic_0000001133351332_en-us_topic_0000001071986951_section1883311313516">Defining Hyperparameters</a>.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row18718621173818"><td class="cellrowborder" valign="top" width="17.93%" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p19161171618115">Environment Variable</p>
</td>
<td class="cellrowborder" valign="top" width="15.22%" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p971882117382">None</p>
</td>
<td class="cellrowborder" valign="top" width="66.85%" headers="mcps1.3.3.2.3.4.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p167181621173818">Environment variables, which you can add as required. For details about the environment variables preset in the training container, see <a href="develop-modelarts-0104.html">Viewing Environment Variables of a Training Container</a>.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row7718421143816"><td class="cellrowborder" valign="top" width="17.93%" headers="mcps1.3.3.2.3.4.2.4.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p15718172103812">Auto Restart</p>
</td>
<td class="cellrowborder" valign="top" width="15.22%" headers="mcps1.3.3.2.3.4.2.4.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1571892173819">None</p>
</td>
<td class="cellrowborder" valign="top" width="66.85%" headers="mcps1.3.3.2.3.4.2.4.1.3 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p57182211384">Number of retries for a failed training job. If this parameter is enabled, a failed training job will be automatically re-delivered and run. On the training job details page, you can view the number of retries for a failed training job.</p>
<ul id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ul557183754814"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li7571737154819">This function is disabled by default.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li105711137174811">If you enable this function, set the number of retries. The value ranges from 1 to 3 and cannot be changed.</li></ul>
</td>
</tr>
</tbody>
</table>
</div>
<div class="note" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_note8645637121414"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p9645153781412">The training input, training output, and hyperparameters vary according to the selected algorithm.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p156451037101415">If the system displays a message for <strong id="EN-US_TOPIC_0000002043018944__b10387770315">Training Input</strong>, indicating there is no input channel for the selected algorithm, you do not need to set data input on this page.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p136451337181415">If the system displays a message for <strong id="EN-US_TOPIC_0000002043018944__b166211699318">Training Output</strong>, indicating there is no output channel for the selected algorithm, you do not need to set data output on this page.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p9645193771414">If the system displays a message for <span class="parmname" id="EN-US_TOPIC_0000002043018944__parmname656191113313"><b>Hyperparameters</b></span>, indicating the selected algorithm does not support custom hyperparameters, you do not need to set hyperparameters on this page.</p>
</div></div>
</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li76967465182">Select an instance flavor. The value range of the training parameters is consistent with the constraints of existing algorithms.
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_table117915347161" frame="border" border="1" rules="all"><caption><b>Table 4 </b>Resource parameters</caption><thead align="left"><tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row1779103431617"><th align="left" class="cellrowborder" valign="top" width="15.82%" id="mcps1.3.3.2.4.1.2.3.1.1"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p13791834141614">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="84.17999999999999%" id="mcps1.3.3.2.4.1.2.3.1.2"><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1679173412165">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row3791133461615"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p5791163491618">Resource Pool</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p177911834131610">Select a resource pool for the job. Both public and dedicated resource pools are available.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p94551230102217">If you select a dedicated resource pool, you can view details about the pool. If the pool does not have enough available cards, jobs may need to be queued. In this case, use another resource pool or reduce the number of cards required.</p>
<div class="note" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_note4713143118427"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="EN-US_TOPIC_0000002043018944__p2041372011548">Dedicated resource pools can be connected to your VPCs and subnets. For details, see <a href="resmgmt-modelarts_0012.html#EN-US_TOPIC_0000002043020048__section1473914311415">(Optional) Interconnecting a VPC with a ModelArts Network</a>.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p371393116429">If you want to change the VPC accessible to your dedicated resource pool, see <a href="resmgmt-modelarts_0012.html#EN-US_TOPIC_0000002043020048__section1473914311415">(Optional) Interconnecting a VPC with a ModelArts Network</a>.</p>
</div></div>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row179114343162"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p57912346166">Resource Type</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p12791133417164">Select CPU or GPU as needed. Set this parameter based on the resource type specified in your training code.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row4791133410162"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p079111349166">Specifications</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1179183419166">Select a resource flavor based on the resource type. If the type of resources to be used has been specified in your training code, only the options that comply with the constraints of the selected algorithm are available for you to choose. For example, if <strong id="EN-US_TOPIC_0000002043018944__b3276023112010">GPU</strong> is selected in the training code but you select <strong id="EN-US_TOPIC_0000002043018944__b2276723102011">CPU</strong> here, the training may fail.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p16791143414167">During training, ModelArts mounts NVMe SSDs to the <strong id="EN-US_TOPIC_0000002043018944__b20438144211203">/cache</strong> directory. You can use this directory to store temporary files. The data disk size varies depending on the resource type. To avoid running out of disk space during training, click <strong id="EN-US_TOPIC_0000002043018944__b166581721949">Check Input Size</strong> to check whether the disk size of the selected instance flavor is sufficient for the input size.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row17791113416169"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p11791193461619">Compute Nodes</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p20791534131612">Set the number of compute nodes. The default value is <span class="parmvalue" id="EN-US_TOPIC_0000002043018944__parmvalue1241471119420"><b>1</b></span>.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row923284913713"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p2232164911711">Job Priority</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1723217491478">When using a new-version dedicated resource pool, you can set the priority of a training job. The value ranges from 1 to 3. The default priority is <strong id="EN-US_TOPIC_0000002043018944__b414112183414">1</strong>, and the highest priority is <strong id="EN-US_TOPIC_0000002043018944__b161426181446">3</strong>. By default, the job priority can be set to <strong id="EN-US_TOPIC_0000002043018944__b5566145655819">1</strong> or <strong id="EN-US_TOPIC_0000002043018944__b9212165817587">2</strong>. After the permission to <a href="develop-modelarts-0082.html">set the highest job priority</a> is configured, the priority can be set to <strong id="EN-US_TOPIC_0000002043018944__b135238151725">1</strong> to <strong id="EN-US_TOPIC_0000002043018944__b239513171522">3</strong>.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p176845231113">You can change the priority of a pending job.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row473516541975"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1773614544710">SFS Turbo</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p1373685417712">When a dedicated resource pool is used for training, multiple SFS Turbo file systems can be mounted for one training job.</p>
<ul id="EN-US_TOPIC_0000002043018944__ul798714516113"><li id="EN-US_TOPIC_0000002043018944__li148650564711"><strong id="EN-US_TOPIC_0000002043018944__b182547470508">Name</strong>: SFS Turbo name</li><li id="EN-US_TOPIC_0000002043018944__li69875512015"><strong id="EN-US_TOPIC_0000002043018944__b1880714102110">Server Path</strong>: SFS Turbo directory</li><li id="EN-US_TOPIC_0000002043018944__li9987145114114"><strong id="EN-US_TOPIC_0000002043018944__b948613612216">Local Path</strong>: mounting path of the SFS Turbo directory in the training job</li></ul>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p12912659145111">A file system can be mounted only once and to only one path. Each mount path must be unique. A maximum of 8 disks can be mounted to a training job.</p>
<div class="note" id="EN-US_TOPIC_0000002043018944__note1315712984817"><span class="notetitle"> NOTE: </span><div class="notebody"><ul id="EN-US_TOPIC_0000002043018944__ul670183314711"><li id="EN-US_TOPIC_0000002043018944__li1270193394718">Before mounting an SFS Turbo file system to a training job, make sure that the VPC and subnet where SFS Turbo is deployed are accessible from the dedicated resource pool.</li><li id="EN-US_TOPIC_0000002043018944__li13701233124717">The mounting path cannot be the root directory (<strong id="EN-US_TOPIC_0000002043018944__b71361591236">/</strong>) or a default mounting path, such as <strong id="EN-US_TOPIC_0000002043018944__b14825182112310">/cache</strong> or <strong id="EN-US_TOPIC_0000002043018944__b1869924736">/home/ma-user/modelarts</strong>.</li></ul>
</div></div>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__row13840221420"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__p483511814529">Parallel File System</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><p id="EN-US_TOPIC_0000002043018944__p3916171013219">An OBS parallel file system can be mounted to a training job to store training data. Click <strong id="EN-US_TOPIC_0000002043018944__b553042911226">Add Mount Configuration</strong> and set the following parameters:</p>
<ul id="EN-US_TOPIC_0000002043018944__ul1691614109211"><li id="EN-US_TOPIC_0000002043018944__li169161610923"><span class="parmname" id="EN-US_TOPIC_0000002043018944__parmname591610101228"><b>Storage Configuration</b></span>: Select a parallel file system.</li><li id="EN-US_TOPIC_0000002043018944__li189162101921"><strong id="EN-US_TOPIC_0000002043018944__b176241344102214">Mount Path</strong>: Enter the path in the training container to which the parallel file system is mounted.</li></ul>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row1428113162619"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p92811714266">Persistent Log Saving</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><p id="EN-US_TOPIC_0000002043018944__p18448151533317">If you select CPU or GPU flavors, <strong id="EN-US_TOPIC_0000002043018944__b1133014915520">Persistent Log Saving</strong> is available for you to set.</p>
<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p572771718264">This function is disabled by default. ModelArts automatically stores the logs for 30 days. You can download all logs on the job details page.</p>
<p id="EN-US_TOPIC_0000002043018944__p1642921316308">After this function is enabled, select an empty OBS path for storing training logs. Ensure that you have read and write permissions to the selected OBS directory.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_row171311402318"><td class="cellrowborder" valign="top" width="15.82%" headers="mcps1.3.3.2.4.1.2.3.1.1 "><p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p171444013312">Auto Stop</p>
</td>
<td class="cellrowborder" valign="top" width="84.17999999999999%" headers="mcps1.3.3.2.4.1.2.3.1.2 "><ul id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ul874271773815"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li7530825173814">If this function is enabled and an auto stop time is set, the training job automatically stops at the specified time.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li474241712380">If this function is disabled, the training job continues to run.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li1537212614369">The options are <strong id="EN-US_TOPIC_0000002043018944__b11811532161819">1 hour</strong>, <strong id="EN-US_TOPIC_0000002043018944__b101811932171818">2 hours</strong>, <strong id="EN-US_TOPIC_0000002043018944__b16182132161816">4 hours</strong>, <strong id="EN-US_TOPIC_0000002043018944__b1418293221819">6 hours</strong>, and <strong id="EN-US_TOPIC_0000002043018944__b1018233251813">Customization</strong> (1 hour to 72 hours).</li></ul>
</td>
</tr>
</tbody>
</table>
</div>
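<p>The SFS Turbo mounting rules in the table above can be checked before you fill in the form. The following Python snippet is a minimal sketch only: the <strong>validate_sfs_mounts</strong> helper and its tuple format are hypothetical and are not part of any ModelArts SDK or API. It assumes that the limit of 8 disks applies to the number of mount entries of one training job.</p>
<pre class="screen">
```python
# Illustrative only: this helper is hypothetical and is not part of any ModelArts SDK.
# The root directory and the default mounting paths named in the table above.
RESERVED_PATHS = {"/", "/cache", "/home/ma-user/modelarts"}
# Assumption: the 8-disk limit is read as 8 mount entries per training job.
MAX_MOUNTS = 8


def validate_sfs_mounts(mounts):
    """Return a list of problems for (name, server_path, local_path) tuples."""
    problems = []
    if len(mounts) > MAX_MOUNTS:
        problems.append(f"too many mounts: {len(mounts)} (maximum {MAX_MOUNTS})")
    seen_names, seen_paths = set(), set()
    for name, _server_path, local_path in mounts:
        # Normalize so that "/cache/" and "/cache" compare equal.
        path = "/" + local_path.strip("/")
        if path in RESERVED_PATHS:
            problems.append(f"{name}: {path} is the root or a default mount path")
        if name in seen_names:
            problems.append(f"{name}: file system is already mounted")
        if path in seen_paths:
            problems.append(f"{name}: mount path {path} is already in use")
        seen_names.add(name)
        seen_paths.add(path)
    return problems


# Example: the second entry tries to use a default mounting path.
print(validate_sfs_mounts([("sfs-a", "/", "/mnt/a"), ("sfs-b", "/data", "/cache/")]))
```
</pre>
<p>An empty list means that the planned mounts do not violate any of the rules checked here. The console may apply additional checks that this sketch does not cover.</p>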
</li></ol><ol start="5" id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ol0873861001"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li18162037133010">Click <strong id="EN-US_TOPIC_0000002043018944__b43691138171817">Submit</strong> to create the training job.<p id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_p192653183295">A training job typically takes some time to run. To view its real-time status and basic information, switch to the training job list.</p>
<ul id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_ul1097185162915"><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li79965172916">In the training job list, the <strong id="EN-US_TOPIC_0000002043018944__b5197175319189">Status</strong> of a newly created training job is <strong id="EN-US_TOPIC_0000002043018944__b819745316188">Pending</strong>.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li1499195112912">When the status changes to <strong id="EN-US_TOPIC_0000002043018944__b139074552182">Completed</strong>, the training job has finished, and the generated model is stored in the corresponding training output path.</li><li id="EN-US_TOPIC_0000002043018944__en-us_topic_0000001072729016_li710035152912">If the status is <strong id="EN-US_TOPIC_0000002043018944__b1316915351919">Failed</strong> or <strong id="EN-US_TOPIC_0000002043018944__b5177143141917">Abnormal</strong>, click the job name to go to the job details page and view the logs for troubleshooting. For details, see <a href="develop-modelarts-0013.html">Training Job Details</a>.</li></ul>
</li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="develop-modelarts-0010.html">Performing a Training</a></div>
</div>
</div>