Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_0408"></a>
<h1 class="topictitle1">Example: Using Loader to Import Data from OBS to HDFS</h1>
<div id="body1589421630819"><div class="section" id="mrs_01_0408__scb2c24634b0344dba198e390d6cea59e"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_0408__a9b16a657788a4bd7987afc1d3a64231f">To import a large volume of data from outside the cluster into the cluster, use Loader to import it from OBS to HDFS.</p>
</div>
<div class="section" id="mrs_01_0408__sb8fe6f0415124936bb9a7810db345b17"><h4 class="sectiontitle">Prerequisites</h4><ul id="mrs_01_0408__u77b44b62f4094567bab90cb33acea37d"><li id="mrs_01_0408__l3da0b33cd6d647379b1ae998f71758af">You have prepared service data.</li><li id="mrs_01_0408__l8719dce3c103490ca6b2589ab48280fe">You have created an analysis cluster.</li></ul>
</div>
<div class="section" id="mrs_01_0408__s12aa651484ea43cd846dd6d94301117b"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_0408__o7c344be096d34bc088536862a749ba75"><li id="mrs_01_0408__lca7b4202ba684befb04e8ce0b88c8e94"><span>Upload service data to your OBS file system.</span></li><li id="mrs_01_0408__l1f2a15abe5f0458091ec622d8621a9c8"><span>Obtain the AK/SK information and create an OBS link and an HDFS link.</span><p><p id="mrs_01_0408__adce6e642c8b748bebc2fb8711347521c">For details, see <a href="mrs_01_0402.html">Loader Link Configuration</a>.</p>
</p></li><li id="mrs_01_0408__lb978fb6411794f79bba2776dab66b6b1"><span>Access the Loader page.</span><p><p id="mrs_01_0408__en-us_topic_0071084995_p953337610204">If Kerberos authentication is enabled in the analysis cluster, refer to the instructions in <a href="mrs_01_0370.html">Accessing the Hue Web UI</a>.</p>
</p></li><li id="mrs_01_0408__l60efd765e66043b39b26f9a4460f2b52"><span>Click <span class="uicontrol" id="mrs_01_0408__uicontrol1659698169164433"><b>New Job</b></span>.</span></li><li id="mrs_01_0408__lc59ee384b3eb475b9638244066f282ad"><span>In <span class="parmname" id="mrs_01_0408__parmname20402369152130"><b>Information</b></span>, set parameters.</span><p><ol type="a" id="mrs_01_0408__od37b4ecbc96e4477a63660d86353f041"><li id="mrs_01_0408__l7f72db27a5dd419a908118788798ac63">In <span class="parmname" id="mrs_01_0408__parmname51594321151140"><b>Name</b></span>, enter a job name, for example, <span class="parmvalue" id="mrs_01_0408__parmvalue63986881816464"><b>obs2hdfs</b></span>.</li><li id="mrs_01_0408__l315ed74593c7407084dd54a8b47ebbe4">In <span class="parmname" id="mrs_01_0408__parmname2095538803164619"><b>From link</b></span>, select the OBS link you created.</li><li id="mrs_01_0408__ldf73c237ef764a2193eee7a0cefca3e4">In <span class="parmname" id="mrs_01_0408__parmname5528201416472"><b>To link</b></span>, select the HDFS link you created.</li></ol>
</p></li><li id="mrs_01_0408__lfc0f9b6e01894f928eb3cd1de00368d7"><span>In <span class="parmname" id="mrs_01_0408__parmname503727309164741"><b>From</b></span>, set source link parameters.</span><p><ol type="a" id="mrs_01_0408__o91654955aa5e4d90b6b43fc1e5e82377"><li id="mrs_01_0408__l400434a76922470797bc3532a47e4dc5">In <span class="parmname" id="mrs_01_0408__parmname101157131616561"><b>Bucket Name</b></span>, enter the name of the OBS file system.</li><li id="mrs_01_0408__l11d8e94881c6454bb7bb15206084c2b3">In <span class="parmname" id="mrs_01_0408__parmname829597205171631"><b>Input directory or file</b></span>, enter the path of the service data in the file system.<p id="mrs_01_0408__a0ab56965724e4f7bbe2f4bb6386f11e3">For a single file, enter the complete path including the file name. For a directory, enter the complete path of the directory.</p>
</li><li id="mrs_01_0408__ld408930f2f5d426289bd0acd45f8e485"><a name="mrs_01_0408__ld408930f2f5d426289bd0acd45f8e485"></a><a name="ld408930f2f5d426289bd0acd45f8e485"></a>In <span class="parmname" id="mrs_01_0408__parmname1652683619171836"><b>File format</b></span>, enter the type of the service data file.</li></ol>
<p id="mrs_01_0408__a1e3cdc6fd6804cf5b1c9a5d4b46f0785">For details, see <a href="mrs_01_0404.html#mrs_01_0404__sdd455438f59c455d868736ad52d1097c">obs-connector</a>.</p>
</p></li><li id="mrs_01_0408__l1e568724e0c5448fabe5ed665a4fa4f9"><span>In <span class="parmname" id="mrs_01_0408__parmname1184816220"><b>To</b></span>, set destination link parameters.</span><p><ol type="a" id="mrs_01_0408__o8312dce2b66d467b83be42187340cc3b"><li id="mrs_01_0408__leae0b854bf904fd1b153fd0af5aa9981">In <span class="parmname" id="mrs_01_0408__parmname160214853172029"><b>Output directory</b></span>, enter the directory for storing service data in HDFS.<p id="mrs_01_0408__en-us_topic_0071084995_p837884910308">If Kerberos authentication is enabled in the cluster, the user accessing Loader must have write permission on the directory.</p>
</li><li id="mrs_01_0408__ld445f5283d544c5aba02bf95ac944f3b">In <span class="parmname" id="mrs_01_0408__parmname1112461615"><b>File format</b></span>, enter the type of the service data file.<p id="mrs_01_0408__a74dda2cba61e4d069f493335ec16621c">The type must match the type set in <a href="#mrs_01_0408__ld408930f2f5d426289bd0acd45f8e485">6.c</a>.</p>
</li><li id="mrs_01_0408__lf74985ca92e248f79cddcbbd6856706b">In <span class="parmname" id="mrs_01_0408__parmname85920515717242"><b>Compression codec</b></span>, select a compression algorithm. For example, select <span class="parmvalue" id="mrs_01_0408__parmvalue67655049017256"><b>NONE</b></span> if you do not want to compress data.</li><li id="mrs_01_0408__lb725c3e1051c4647908b97e97353ab42">In <span class="parmname" id="mrs_01_0408__parmname1914292716172713"><b>Overwrite</b></span>, select <span class="parmvalue" id="mrs_01_0408__parmvalue1605762510172722"><b>True</b></span>.</li><li id="mrs_01_0408__lf1c1b54f8f134020a4c29ad0d85a4d23">Click <span class="uicontrol" id="mrs_01_0408__uicontrol1951153571172814"><b>Show Senior Parameter</b></span> and set <span class="parmname" id="mrs_01_0408__parmname1057159813172859"><b>Line Separator</b></span>.</li><li id="mrs_01_0408__l3803b4cd9de241658acf6f4d276cb9a1">Set <span class="parmname" id="mrs_01_0408__parmname122632060172950"><b>Field Separator</b></span>.</li></ol>
<p id="mrs_01_0408__a4aacc8c4d8c5409796a436d4b86c9707">For details, see <a href="mrs_01_0405.html#mrs_01_0405__s0e7a49c2520c498aa9e3d9fa84325e2e">hdfs-connector</a>.</p>
</p></li><li id="mrs_01_0408__l456b9bcbabf34860b89f476587cdf153"><span>In <span class="parmname" id="mrs_01_0408__parmname36974479154331"><b>Task Config</b></span>, set job running parameters.</span><p><ol type="a" id="mrs_01_0408__o3ee530c32e234b5d8ebeaa0561c8ef72"><li id="mrs_01_0408__ldbc98677173c4ef3afe4e696c3192a98">In <span class="parmname" id="mrs_01_0408__parmname1803883814173052"><b>Extractors</b></span>, enter the number of Map tasks.</li><li id="mrs_01_0408__l908bf3efd57b437b81f4d8781f3a69d0">In <span class="parmname" id="mrs_01_0408__parmname488126471173121"><b>Loaders</b></span>, enter the number of Reduce tasks.<p id="mrs_01_0408__a9becd3a73c5e44c49d16124bdf9ff49b">If the destination link is an HDFS link, <span class="parmname" id="mrs_01_0408__parmname8657182919211"><b>Loaders</b></span> is hidden.</p>
</li><li id="mrs_01_0408__l3d43d564f0df4ea39f51a6d9af702d54">In <span class="parmname" id="mrs_01_0408__parmname257561912173150"><b>Max error records in single split</b></span>, enter an error record threshold.</li><li id="mrs_01_0408__ld565dd7961e3433d84db354f58a367ca">In <span class="parmname" id="mrs_01_0408__parmname317851576173326"><b>Dirty data directory</b></span>, enter a directory for saving dirty data, for example, <span class="filepath" id="mrs_01_0408__filepath17339314817342"><b>/user/sqoop/obs2hdfs-dd</b></span>.</li></ol>
</p></li><li id="mrs_01_0408__lf6592a9edadb4327bd7bbb3450084c01"><span>Click <span class="uicontrol" id="mrs_01_0408__uicontrol1467444419173417"><b>Save and execute</b></span>.</span><p><p id="mrs_01_0408__ac1fb3ebd35f947cfba55e2066f6d64b8">On the <span class="parmname" id="mrs_01_0408__parmname127432635216572"><b>Manage jobs</b></span> page, view the job running result. You can click <span class="uicontrol" id="mrs_01_0408__uicontrol21689711104833"><b>Refresh</b></span> to obtain the latest job status.</p>
</p></li></ol>
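<p>The settings entered in the steps above can be written down and checked before you open the Loader page. The following is a minimal sketch, not a Loader API: the <b>Obs2HdfsJob</b> class, its field names, the <b>validate</b> function, and the sample values (bucket name, paths, and the <b>CSV_FILE</b> format string) are all hypothetical and only mirror the UI labels used in this procedure. The checks cover empty required fields and the rule that the destination file format must match the source file format. The lower bound on <b>Extractors</b> is an assumption.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Obs2HdfsJob:
    """Hypothetical record of the values entered in the Loader job wizard."""
    name: str                 # Information > Name
    bucket_name: str          # From > Bucket Name
    input_path: str           # From > Input directory or file
    source_file_format: str   # From > File format
    output_directory: str     # To > Output directory
    target_file_format: str   # To > File format
    compression_codec: str = "NONE"   # To > Compression codec
    overwrite: bool = True            # To > Overwrite
    extractors: int = 1               # Task Config > Extractors (assumed default)
    dirty_data_directory: str = "/user/sqoop/obs2hdfs-dd"


def validate(job: Obs2HdfsJob) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    required = (
        ("Name", job.name),
        ("Bucket Name", job.bucket_name),
        ("Input directory or file", job.input_path),
        ("Output directory", job.output_directory),
    )
    for label, value in required:
        if not value.strip():
            problems.append(f"{label} must not be empty")
    # The destination file format must match the source file format.
    if job.source_file_format != job.target_file_format:
        problems.append("File format in To must match File format in From")
    # Assumption: at least one Map task is needed to run the job.
    if job.extractors < 1:
        problems.append("Extractors must be at least 1")
    return problems


if __name__ == "__main__":
    job = Obs2HdfsJob(
        name="obs2hdfs",
        bucket_name="example-bucket",          # placeholder
        input_path="/input/data.csv",          # placeholder
        source_file_format="CSV_FILE",         # placeholder value
        output_directory="/user/example/out",  # placeholder
        target_file_format="CSV_FILE",
    )
    print(validate(job))
```

<p>An empty list from <b>validate</b> means the recorded values are consistent with each other. It does not confirm that the OBS path exists or that the user has write permission on the HDFS directory.</p>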
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0400.html">Using Loader</a></div>
</div>
</div>