Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Lai, Weijian <laiweijian4@huawei.com> Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>
<a name="EN-US_TOPIC_0000001910012872"></a><a name="EN-US_TOPIC_0000001910012872"></a>
<h1 class="topictitle1">Data Deredundancy</h1>
<div id="body0000001147857528"><div class="section" id="EN-US_TOPIC_0000001910012872__section82221552193917"><h4 class="sectiontitle">RRD Operator Overview</h4><p id="EN-US_TOPIC_0000001910012872__p591174864111">The data with the largest difference can be removed based on the preset proportion.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="EN-US_TOPIC_0000001910012872__table9901181815119" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Advanced parameters</caption><thead align="left"><tr id="EN-US_TOPIC_0000001910012872__row18902118121114"><th align="left" class="cellrowborder" valign="top" width="16.400000000000002%" id="mcps1.3.1.3.2.5.1.1"><p id="EN-US_TOPIC_0000001910012872__p2408105171318">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="10.48%" id="mcps1.3.1.3.2.5.1.2"><p id="EN-US_TOPIC_0000001910012872__p2408951161319">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="12.7%" id="mcps1.3.1.3.2.5.1.3"><p id="EN-US_TOPIC_0000001910012872__p54080515133">Default</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60.419999999999995%" id="mcps1.3.1.3.2.5.1.4"><p id="EN-US_TOPIC_0000001910012872__p640875131319">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="EN-US_TOPIC_0000001910012872__row1490241851111"><td class="cellrowborder" valign="top" width="16.400000000000002%" headers="mcps1.3.1.3.2.5.1.1 "><p id="EN-US_TOPIC_0000001910012872__p1969913710152">sample_ratio</p>
</td>
<td class="cellrowborder" valign="top" width="10.48%" headers="mcps1.3.1.3.2.5.1.2 "><p id="EN-US_TOPIC_0000001910012872__p640825111317">No</p>
</td>
<td class="cellrowborder" valign="top" width="12.7%" headers="mcps1.3.1.3.2.5.1.3 "><p id="EN-US_TOPIC_0000001910012872__p14408185141312">0.9</p>
</td>
<td class="cellrowborder" valign="top" width="60.419999999999995%" headers="mcps1.3.1.3.2.5.1.4 "><p id="EN-US_TOPIC_0000001910012872__p134081951171310">Proportion of data to retain. The value ranges from 0 to 1. For example, <strong id="EN-US_TOPIC_0000001910012872__b163461520104320">0.9</strong> indicates that 90% of the original data is retained.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000001910012872__row37101415191613"><td class="cellrowborder" valign="top" width="16.400000000000002%" headers="mcps1.3.1.3.2.5.1.1 "><p id="EN-US_TOPIC_0000001910012872__p671061571617">n_clusters</p>
</td>
<td class="cellrowborder" valign="top" width="10.48%" headers="mcps1.3.1.3.2.5.1.2 "><p id="EN-US_TOPIC_0000001910012872__p16710101591610">No</p>
</td>
<td class="cellrowborder" valign="top" width="12.7%" headers="mcps1.3.1.3.2.5.1.3 "><p id="EN-US_TOPIC_0000001910012872__p1471041513166">auto</p>
</td>
<td class="cellrowborder" valign="top" width="60.419999999999995%" headers="mcps1.3.1.3.2.5.1.4 "><p id="EN-US_TOPIC_0000001910012872__p071018152162">Number of data sample types. The default value is <strong id="EN-US_TOPIC_0000001910012872__b145467262439">auto</strong>, which means that the number of types is determined from the number of images in the directory. You can also specify the number explicitly, for example, <strong id="EN-US_TOPIC_0000001910012872__b125533268439">4</strong>.</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000001910012872__row1090216183118"><td class="cellrowborder" valign="top" width="16.400000000000002%" headers="mcps1.3.1.3.2.5.1.1 "><p id="EN-US_TOPIC_0000001910012872__p4408551151319">do_validation</p>
</td>
<td class="cellrowborder" valign="top" width="10.48%" headers="mcps1.3.1.3.2.5.1.2 "><p id="EN-US_TOPIC_0000001910012872__p44082510139">No</p>
</td>
<td class="cellrowborder" valign="top" width="12.7%" headers="mcps1.3.1.3.2.5.1.3 "><p id="EN-US_TOPIC_0000001910012872__p540820517138">True</p>
</td>
<td class="cellrowborder" valign="top" width="60.419999999999995%" headers="mcps1.3.1.3.2.5.1.4 "><p id="EN-US_TOPIC_0000001910012872__p74081051191312">Whether to validate data. The value can be <strong id="EN-US_TOPIC_0000001910012872__b178781555164316">True</strong> or <strong id="EN-US_TOPIC_0000001910012872__b3878455164316">False</strong>. <strong id="EN-US_TOPIC_0000001910012872__b144448586437">True</strong> indicates that data is validated before deredundancy. <strong id="EN-US_TOPIC_0000001910012872__b114455584435">False</strong> indicates that deredundancy is performed without validation.</p>
</td>
</tr>
</tbody>
</table>
</div>
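<p>As an illustration of how the parameters in Table 1 fit together, the following Python sketch validates a parameter set and estimates how many images remain after deredundancy. The names RRDParams and expected_retained are hypothetical and are not part of ModelArts. The rounding rule is also an assumption, because this topic does not state how fractional counts are handled.</p>

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class RRDParams:
    """Hypothetical container for the advanced parameters in Table 1."""

    sample_ratio: float = 0.9             # proportion of data to retain, 0 to 1
    n_clusters: Union[int, str] = "auto"  # number of sample types, or "auto"
    do_validation: bool = True            # validate data before deredundancy

    def __post_init__(self):
        if not 0 <= self.sample_ratio <= 1:
            raise ValueError("sample_ratio must be between 0 and 1")
        if self.n_clusters != "auto":
            is_int = isinstance(self.n_clusters, int) and not isinstance(self.n_clusters, bool)
            if not is_int or self.n_clusters < 1:
                raise ValueError("n_clusters must be 'auto' or a positive integer")


def expected_retained(total_images, params):
    """Estimate the number of retained images (rounding is an assumption)."""
    return round(total_images * params.sample_ratio)
```

<p>With the default sample_ratio of 0.9, expected_retained(1000, RRDParams()) returns 900.</p>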
</div>
<div class="section" id="EN-US_TOPIC_0000001910012872__section1182518553429"><h4 class="sectiontitle">Operator Input Requirements</h4><p id="EN-US_TOPIC_0000001910012872__p16537340104320">The following two types of operator input are available:</p>
<ul id="EN-US_TOPIC_0000001910012872__ul14876131134413"><li id="EN-US_TOPIC_0000001910012872__li77422281519"><span class="parmname" id="EN-US_TOPIC_0000001910012872__parmname2870181447"><b>Datasets</b></span>: Select a dataset and its version created on the ModelArts console from the drop-down list. Ensure that the dataset type is the same as the scenario type selected for this task.</li><li id="EN-US_TOPIC_0000001910012872__li76966371454"><strong id="EN-US_TOPIC_0000001910012872__b1039814206449">OBSCatalog</strong>: Select either of the following storage structures:<ul id="EN-US_TOPIC_0000001910012872__ul1453175141914"><li id="EN-US_TOPIC_0000001910012872__li1090315312190"><strong id="EN-US_TOPIC_0000001910012872__b52781156194416">Only images</strong>: The directory contains only images. The JPG, JPEG, PNG, and BMP formats are supported, and images in nested subdirectories are also read.</li><li id="EN-US_TOPIC_0000001910012872__li129034310194"><strong id="EN-US_TOPIC_0000001910012872__b14845557174412">Images and labels</strong>: The structure varies depending on the scenario type.<p id="EN-US_TOPIC_0000001910012872__p107521758151813">The following shows the directory structure in the image classification scenario. This structure supports only single-label scenarios.</p>
<pre class="screen" id="EN-US_TOPIC_0000001910012872__screen1799653232014">input_path/
--label1/
----1.jpg
--label2/
----2.jpg
--../</pre>
<p id="EN-US_TOPIC_0000001910012872__p81551055184710">The following shows the directory structure in the object detection scenario. Images in JPG, JPEG, PNG, and BMP formats are supported. The XML files are standard PASCAL VOC files.</p>
<pre class="screen" id="EN-US_TOPIC_0000001910012872__screen969651015228">input_path/
--1.jpg
--1.xml
--2.jpg
--2.xml
...</pre>
</li></ul>
</li></ul>
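<p>Before starting a task, it can help to check locally that a directory matches one of the labeled structures above. The following Python sketch lists the samples it finds in each layout. The function names are hypothetical, and the code works on a local copy of the data rather than on OBS.</p>

```python
from pathlib import Path

# Image formats listed as supported in this topic.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_classification_samples(input_path):
    """Return (image path, label) pairs for the image classification layout.

    The label is the name of the first-level subdirectory holding the image.
    """
    samples = []
    for label_dir in sorted(Path(input_path).iterdir()):
        if not label_dir.is_dir():
            continue
        for image in sorted(label_dir.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image, label_dir.name))
    return samples


def list_detection_samples(input_path):
    """Return (image path, XML path or None) pairs for the object detection layout.

    Each image is matched to the XML file that shares its base name.
    """
    samples = []
    for image in sorted(Path(input_path).iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            xml = image.with_suffix(".xml")
            samples.append((image, xml if xml.is_file() else None))
    return samples
```

<p>An image that is paired with None in the object detection result has no label file next to it.</p>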
</div>
<div class="section" id="EN-US_TOPIC_0000001910012872__section885518434215"><h4 class="sectiontitle">Output Description</h4><ul id="EN-US_TOPIC_0000001910012872__ul83177113307"><li id="EN-US_TOPIC_0000001910012872__li1276562883015"><strong id="EN-US_TOPIC_0000001910012872__b9206492451">Image classification</strong><p id="EN-US_TOPIC_0000001910012872__p276592843018">The output directory structure is as follows:</p>
<pre class="screen" id="EN-US_TOPIC_0000001910012872__screen1876515285303">output_path/
--Data/
----class1/ # If the input data has labeling information, the information is also output. class1 indicates the labeling class.
------1.jpg
----class2/
------2.jpg
------3.jpg
--output.manifest</pre>
<p id="EN-US_TOPIC_0000001910012872__p1176515287301">A manifest file example is as follows:</p>
<pre class="screen" id="EN-US_TOPIC_0000001910012872__screen37653282302">{
    "id": "xss",
    "source": "obs://home/fc8e2688015d4a1784dcbda44d840307_14.jpg",
    "usage": "train",
    "annotation": [
        {
            "name": "Cat",
            "type": "modelarts/image_classification"
        }
    ]
}</pre>
</li></ul>
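<p>To check the class balance after deredundancy, the labels in the manifest can be counted. The following Python sketch assumes that output.manifest stores one JSON object per line with the fields shown in the example above. This layout is an assumption, so adjust the parsing if your file is formatted differently. The function name count_labels is hypothetical.</p>

```python
import json
from collections import Counter


def count_labels(manifest_text):
    """Count image classification labels in manifest text.

    Assumes one JSON object per non-empty line.
    """
    counts = Counter()
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for annotation in record.get("annotation", []):
            if annotation.get("type") == "modelarts/image_classification":
                counts[annotation["name"]] += 1
    return counts
```

<p>For a manifest that contains only the example record above, the result is a counter with a single entry, Cat, whose count is 1.</p>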
<ul id="EN-US_TOPIC_0000001910012872__ul27141151590"><li id="EN-US_TOPIC_0000001910012872__li13412531197"><strong id="EN-US_TOPIC_0000001910012872__b1026462194512">Object detection</strong><div class="p" id="EN-US_TOPIC_0000001910012872__p6766425111020">The output directory structure is as follows:<pre class="screen" id="EN-US_TOPIC_0000001910012872__screen7450135814916">output_path/
--Data/
----1.jpg
----1.xml # If the input data has labeling information, the information is also output. xml indicates the label file.
----2.jpg
----3.jpg
--output.manifest</pre>
</div>
<p id="EN-US_TOPIC_0000001910012872__p384124695911">A manifest file example is as follows:</p>
<pre class="screen" id="EN-US_TOPIC_0000001910012872__screen6941163881116">{
    "source":"obs://fake/be462ea9c5abc09f.jpg",
    "annotation":[
        {
            "annotation-loc":"obs://fake/be462ea9c5abc09f.xml",
            "type":"modelarts/object_detection",
            "annotation-format":"PASCAL VOC",
            "annotated-by":"modelarts/hard_example_algo"
        }
    ]
}</pre>
</li></ul>
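<p>The label files referenced by annotation-loc use the PASCAL VOC format, so they can be read with a standard XML parser after being downloaded. The following Python sketch extracts the class name and bounding box of each object from the text of one label file. The function name read_voc_objects is hypothetical, and the code relies only on the standard PASCAL VOC elements object, name, and bndbox.</p>

```python
import xml.etree.ElementTree as ET


def read_voc_objects(xml_text):
    """Return (class name, (xmin, ymin, xmax, ymax)) for each labeled object."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:
            continue  # skip objects that carry no bounding box
        coords = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), coords))
    return objects
```

<p>Each returned tuple pairs a class name with its box corners in pixel coordinates, which is enough to spot-check that the labels of retained images were carried over.</p>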
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dataprocess-modelarts-00005.html">Data Selection</a></div>
</div>
</div>