doc-exports/docs/modelarts/umn/develop-modelarts-0002.html
Lai, Weijian 6aa966a79a ModelArts UMN 24.3.0 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lai, Weijian <laiweijian4@huawei.com>
Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>
2024-11-02 09:04:52 +00:00


<a name="EN-US_TOPIC_0000002079176577"></a>
<h1 class="topictitle1">Preparing Data</h1>
<div id="body0000001211470499"><p id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_p4441511275">ModelArts uses OBS to store data and to back up and take snapshots of models, providing secure, reliable storage at low cost.</p>
<ul id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_ul10366165917395"><li id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_li536655910399"><a href="#EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_section81631146162713">OBS</a></li><li id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_li11366359113920"><a href="#EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_section471310416365">Obtaining Training Data</a></li></ul>
<div class="section" id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_section81631146162713"><a name="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_section81631146162713"></a><a name="en-us_topic_0000001180077347_section81631146162713"></a><h4 class="sectiontitle">OBS</h4><p id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_p8283145820272">OBS is a stable, secure, and efficient cloud storage service that lets you store virtually any volume of unstructured data in any format. Buckets and objects are the basic concepts in OBS. A bucket is a container for storing objects. Each bucket belongs to a specific region and has its own storage class and access permissions. A bucket is accessible through its domain name over the Internet. An object is the basic unit of data storage in OBS.</p>
<p id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_p14512131063213">OBS serves as the data storage center for ModelArts. All input data, output data, and cache data generated during AI development can be stored in and read from OBS buckets.</p>
<p id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_p3977142833620">Before using ModelArts, <a href="modelarts_08_0003.html">create an OBS bucket</a> and folders for storing data.</p>
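<p>The relationship between a bucket and the objects it contains can be illustrated with a minimal Python sketch. It assumes that paths are written in the form <code>obs://bucket-name/object-key</code>. The helper <code>parse_obs_path</code>, the <code>ObsLocation</code> type, and the sample bucket name are hypothetical and are not part of any ModelArts or OBS API.</p>

```python
from typing import NamedTuple


class ObsLocation(NamedTuple):
    """Hypothetical helper type: one bucket plus one object key."""
    bucket: str
    key: str  # An empty key refers to the bucket root.


def parse_obs_path(path: str) -> ObsLocation:
    """Split an assumed 'obs://bucket/key' path into bucket and object key."""
    prefix = "obs://"
    if not path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}: {path!r}")
    # Everything up to the first '/' is the bucket; the rest is the object key.
    bucket, _, key = path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {path!r}")
    return ObsLocation(bucket=bucket, key=key)


# Example with a made-up bucket and folder layout.
location = parse_obs_path("obs://my-bucket/datasets/train/image_001.jpg")
print(location.bucket)
print(location.key)
```

<p>Folders in this sketch are simply the slash-separated leading part of an object key, which is why the function does not treat them separately.</p>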
</div>
<div class="fignone" id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_fig8288447193120"><span class="figcap"><b>Figure 1 </b>OBS</span><br><span><img id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_image1428934783115" src="figure/en-us_image_0000002043177512.png" height="325.185" width="523.6875" title="Click to enlarge" class="imgResize"></span></div>
<div class="section" id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_section471310416365"><a name="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_section471310416365"></a><a name="en-us_topic_0000001180077347_section471310416365"></a><h4 class="sectiontitle">Obtaining Training Data</h4><p id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_p16291192681810">Use either of the following methods to obtain ModelArts training data:</p>
<ul id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_ul137347509177"><li id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_li073085211713">Datasets stored in OBS buckets<p id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_p2136114619237"><a name="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_li073085211713"></a><a name="en-us_topic_0000001180077347_li073085211713"></a>After labeling and preprocessing your dataset, upload it to an OBS bucket. When you create a training job, set <strong id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_b6471642417">Training Input</strong> to the path of the OBS bucket where the training data is stored.</p>
</li><li id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_li1073435011173">Datasets in data management<p id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_p1760131831818"><a name="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_li1073435011173"></a><a name="en-us_topic_0000001180077347_li1073435011173"></a>If your dataset has not been labeled or requires preprocessing, import it to ModelArts data management for labeling and preprocessing.</p>
</li></ul>
<div class="fignone" id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_fig1126520510815"><span class="figcap"><b>Figure 2 </b>Preparing data</span><br><span><img id="EN-US_TOPIC_0000002079176577__en-us_topic_0000001180077347_image9265175118814" src="figure/en-us_image_0000002043019208.png" height="244.38750000000002" width="523.6875" title="Click to enlarge" class="imgResize"></span></div>
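<p>As a rough illustration of the first method, the following Python sketch plans which object key each local dataset file would receive under a folder in an OBS bucket, and derives a path that could be entered as <strong>Training Input</strong>. It only computes names and does not call OBS. The function names, the bucket name, the folder layout, and the <code>obs://</code> path form are assumptions made for this example.</p>

```python
from pathlib import Path
from typing import Dict


def plan_upload(local_dir: str, prefix: str) -> Dict[str, str]:
    """Map each local file under local_dir to an object key under prefix."""
    root = Path(local_dir)
    folder = prefix.strip("/")
    plan: Dict[str, str] = {}
    for file in sorted(root.rglob("*")):
        if not file.is_file():
            continue  # Directories carry no data; only files become objects.
        relative = file.relative_to(root).as_posix()
        plan[str(file)] = f"{folder}/{relative}" if folder else relative
    return plan


def training_input_path(bucket: str, prefix: str) -> str:
    """Build the assumed 'obs://bucket/folder/' path of the uploaded dataset."""
    folder = prefix.strip("/")
    return f"obs://{bucket}/{folder}/" if folder else f"obs://{bucket}/"


if __name__ == "__main__":
    # Hypothetical local folder, bucket, and OBS folder names.
    for local_file, object_key in plan_upload("./flowers", "datasets/flowers").items():
        print(local_file, "->", object_key)
    print(training_input_path("my-bucket", "datasets/flowers"))
```

<p>The actual upload would be done with OBS Console or another OBS tool. The sketch only shows how a local folder structure can be preserved in object keys so that the training job reads the same layout.</p>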
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="modelarts_77_0148.html">Training Management</a></div>
</div>
</div>
<script language="JavaScript">
<!--
image_size('.imgResize');
var msg_imageMax = "view original image";
var msg_imageClose = "close";
//--></script>