Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_1636"></a>
<h1 class="topictitle1">Improving the BulkLoad Efficiency</h1>
<div id="body1595926919615"><div class="section" id="mrs_01_1636__sc9dfd4052a474b238112bfc35f39f30e"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1636__a653a0af268704f2697fcb43d7afa98a1">BulkLoad uses MapReduce jobs to directly generate files that comply with the internal data format of HBase, and then loads the generated StoreFiles to a running cluster. Compared with writing data through HBase APIs, BulkLoad consumes fewer CPU and network resources.</p>
<p id="mrs_01_1636__a5bb3f57630064401a4d72037fd0a9fc0">ImportTSV is an HBase table data loading tool.</p>
<div class="note" id="mrs_01_1636__note3659121463219"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1636__p136601141326">This section applies to MRS 3.<em id="mrs_01_1636__i93983353510">x</em> and later versions.</p>
</div></div>
</div>
<div class="section" id="mrs_01_1636__s48ef0970ce6f460d947d04cb3bc8a850"><h4 class="sectiontitle">Prerequisites</h4><p id="mrs_01_1636__a60123a6eb0d342c8a102b6ae6fa88829">The output path of the files generated by BulkLoad has been specified using the <span class="parmname" id="mrs_01_1636__parmname47821738855"><b>-Dimporttsv.bulk.output</b></span> parameter.</p>
</div>
<div class="section" id="mrs_01_1636__s856b324e426443708e88208f943da204"><h4 class="sectiontitle">Procedure</h4><p id="mrs_01_1636__aeeb187e2a38f4401acdd7df470bb10e5">Add the following parameter to the BulkLoad command when performing a batch loading task:</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1636__t2ab511d7d6b24a3890065fb5d617aa83" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter for improving BulkLoad efficiency</caption><thead align="left"><tr id="mrs_01_1636__r88aeccb32c3a4817a1eeb268530a91ec"><th align="left" class="cellrowborder" valign="top" width="22.62%" id="mcps1.3.3.3.2.4.1.1"><p id="mrs_01_1636__a2f9f07352bb34119b82ce3cd36d32dc6">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="54.300000000000004%" id="mcps1.3.3.3.2.4.1.2"><p id="mrs_01_1636__a3707bff6bdac4fc5942f85e7a0ff799a">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="23.080000000000002%" id="mcps1.3.3.3.2.4.1.3"><p id="mrs_01_1636__a9545585db0a346e88bebe9d2878c21a2">Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1636__r4513599d05ec4afeb9e7fdcd1f818f1a"><td class="cellrowborder" valign="top" width="22.62%" headers="mcps1.3.3.3.2.4.1.1 "><p id="mrs_01_1636__abd0af664492f4ea59b642a152477430f">-Dimporttsv.mapper.class</p>
</td>
<td class="cellrowborder" valign="top" width="54.300000000000004%" headers="mcps1.3.3.3.2.4.1.2 "><p id="mrs_01_1636__ad0a4c940285b4837bceee5f8ffe817fc">To improve performance, the construction of key-value pairs is moved from the user-defined mapper to the reducer. The mapper only needs to send the original text of each row to the reducer, which parses the record in each row and creates the key-value pairs.</p>
<div class="note" id="mrs_01_1636__nc2a7bbe9146e4935ba461978df01f0bf"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="mrs_01_1636__a31718962b1174f26864f590b397abc23">The value <span class="parmvalue" id="mrs_01_1636__parmvalue13362125515512"><b>org.apache.hadoop.hbase.mapreduce.TsvImporterByteMapper</b></span> takes effect only when the batch loading command is executed without the <i><span class="varname" id="mrs_01_1636__varname536714554516">HBASE_CELL_VISIBILITY or HBASE_CELL_TTL</span></i> option. <span class="parmvalue" id="mrs_01_1636__parmvalue1182811588514"><b>org.apache.hadoop.hbase.mapreduce.TsvImporterByteMapper</b></span> provides better performance.</p>
</div></div>
</td>
<td class="cellrowborder" valign="top" width="23.080000000000002%" headers="mcps1.3.3.3.2.4.1.3 "><p id="mrs_01_1636__af1dd7b62ad964fa1b79dee3fa4f71002">org.apache.hadoop.hbase.mapreduce.TsvImporterByteMapper</p>
<p id="mrs_01_1636__aff52d37ac1fc4e1c89857ee92241f744">or</p>
<p id="mrs_01_1636__a3bf1f0964f2d441bb745dd77cc6c2cbe">org.apache.hadoop.hbase.mapreduce.TsvImporterTextMapper</p>
</td>
</tr>
</tbody>
</table>
</div>
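<p>The following is a minimal sketch of how the parameter in Table 1 might be combined with the standard ImportTsv options in one command. The table name, column mapping, and HDFS paths are hypothetical placeholders that do not come from this guide. The script only assembles and prints the command, so it can be reviewed before it is run on a cluster client.</p>

```shell
# Dry-run sketch: build an ImportTsv BulkLoad command line and print it.
# Every value in this first group is a hypothetical placeholder.
TABLE="example_table"                 # hypothetical target HBase table
COLUMNS="HBASE_ROW_KEY,cf:c1,cf:c2"   # hypothetical column mapping
INPUT_PATH="/tmp/example/input"       # hypothetical HDFS directory holding the TSV files
OUTPUT_PATH="/tmp/example/hfiles"     # hypothetical HDFS directory for the generated StoreFiles

# Mapper class listed in Table 1. The column mapping above uses neither
# HBASE_CELL_VISIBILITY nor HBASE_CELL_TTL.
MAPPER_CLASS="org.apache.hadoop.hbase.mapreduce.TsvImporterByteMapper"

# Assemble the command piece by piece so that each option is easy to review.
CMD="hbase org.apache.hadoop.hbase.mapreduce.ImportTsv"
CMD="${CMD} -Dimporttsv.columns=${COLUMNS}"
CMD="${CMD} -Dimporttsv.bulk.output=${OUTPUT_PATH}"
CMD="${CMD} -Dimporttsv.mapper.class=${MAPPER_CLASS}"
CMD="${CMD} ${TABLE} ${INPUT_PATH}"

# Print the command instead of executing it.
echo "${CMD}"
```

<p>If the printed command looks correct for your environment, it can be run on a node where the HBase client is installed. To compare the two mappers, replace the value of <b>MAPPER_CLASS</b> with <b>org.apache.hadoop.hbase.mapreduce.TsvImporterTextMapper</b>.</p>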
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1013.html">HBase Performance Tuning</a></div>
</div>
</div>