Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_0794"></a>
<h1 class="topictitle1">Running the DistCp Command</h1>
<div id="body1590130531380"><div class="section" id="mrs_01_0794__se9608011680e423ca403d5207c374daa"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_0794__a38fb70b283514ab78798b4fe0c90e0b6">DistCp is a tool for copying large amounts of data between clusters or within a cluster. It runs MapReduce jobs to perform the copy in a distributed manner.</p>
</div>
<div class="section" id="mrs_01_0794__sa8a135a114cf4cbc8674242bd0cfebd7"><h4 class="sectiontitle">Prerequisites</h4><ul id="mrs_01_0794__ul6387132716574"><li id="mrs_01_0794__li8601330165719">The Yarn client, or a client that contains Yarn, has been installed, for example, in the <strong id="mrs_01_0794__b8319135514116">/opt/client</strong> directory.</li><li id="mrs_01_0794__la8a352e20c414015a0e60f8023ecc0a5">The system administrator has created service users for each component based on service requirements. In security mode, machine-machine users must download their keytab files, and human-machine users must change their passwords upon first login. (These requirements do not apply in normal mode.)</li><li id="mrs_01_0794__li442982112020">To copy data between clusters, the inter-cluster data copy function has been enabled on both clusters.</li></ul>
</div>
<div class="section" id="mrs_01_0794__s729d2c48e7354c7bb15884e152e37570"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_0794__o65a723c3f0644a5ca5bc95b50051ef8f"><li id="mrs_01_0794__l3a2e1b7a8d004152865881b7d9bda58b"><span>Log in to the node where the client is installed.</span></li><li id="mrs_01_0794__l918962d747ac488fae8d369a02cf0b01"><span>Run the following command to go to the client installation directory:</span><p><p id="mrs_01_0794__ae6fabba0d23e4ae4bc35c6ef79cff605"><strong id="mrs_01_0794__a5a32ca4aacc5401ca0e8e8a12040e570">cd /opt/client</strong></p>
</p></li><li id="mrs_01_0794__lf110d935709c4ac48ee0ad1420665c2e"><span>Run the following command to configure environment variables:</span><p><p id="mrs_01_0794__acfff651122a74259a5e01741838c29e9"><strong id="mrs_01_0794__a468a1a99aae541fbab02686dddf0e45a">source bigdata_env</strong></p>
</p></li><li id="mrs_01_0794__lc24dad0c1fb34eec930f24087726a7fd"><span>If the cluster is in security mode, the user who runs the DistCp command must belong to the <strong id="mrs_01_0794__b107621146842">supergroup</strong> user group and must run the following command for authentication. In normal mode, skip this step.</span><p><p id="mrs_01_0794__a2a7d2557b3594004a61ea1dc86fb7d75"><strong id="mrs_01_0794__b299183011614">kinit</strong> <em id="mrs_01_0794__a3b95ea3e27ab410b9ddb246864099722">Component service user</em></p>
</p></li><li id="mrs_01_0794__li6626247810502"><span>Run the DistCp command. For example:</span><p><p id="mrs_01_0794__p627806210507"><strong id="mrs_01_0794__b49237178104947">hadoop distcp hdfs://hacluster/source hdfs://hacluster/target</strong></p>
</p></li></ol>
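<p>If you run the procedure above repeatedly, you may want to script it. The following Python sketch is one way to do so and is not an MRS-provided tool: the helper names <strong>build_distcp_script</strong> and <strong>run_distcp</strong> and their parameters are hypothetical, and the sketch assumes the client is installed in <strong>/opt/client</strong> and that bash is available on the client node. It only assembles steps 2 to 5 into one command line; in security mode, <strong>kinit</strong> still prompts for the password.</p>

```python
import shlex
import subprocess


def build_distcp_script(src, dst, client_dir="/opt/client", principal=None, options=()):
    """Assemble steps 2-5 of the procedure as one bash command string.

    Hypothetical helper. `principal` is only needed in security mode;
    leave it as None in normal mode.
    """
    steps = [
        "cd " + shlex.quote(client_dir),  # Step 2: go to the client installation directory
        "source bigdata_env",             # Step 3: configure environment variables
    ]
    if principal is not None:
        # Step 4 (security mode only): authenticate the component service user
        steps.append("kinit " + shlex.quote(principal))
    # Step 5: run DistCp; extra options go before the source and target
    steps.append(shlex.join(["hadoop", "distcp", *options, src, dst]))
    return " && ".join(steps)


def run_distcp(script):
    # `source` is a bash built-in, so the script must run under bash, not sh.
    return subprocess.run(["bash", "-c", script], check=True)


if __name__ == "__main__":
    print(build_distcp_script("hdfs://hacluster/source", "hdfs://hacluster/target"))
```

<p>Chaining the steps with <strong>&amp;&amp;</strong> stops the script at the first failing step, for example when authentication fails, so DistCp is never started with a missing environment.</p>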
</div>
<div class="section" id="mrs_01_0794__section17302426416"><h4 class="sectiontitle">Common Usage of DistCp</h4><ol id="mrs_01_0794__ol1757016389429"><li id="mrs_01_0794__li192632919213">The following is an example of the most common usage of DistCp:<pre class="screen" id="mrs_01_0794__screen141611532139">hadoop distcp -numListstatusThreads 40 -update -delete -prbugpaxtq <em id="mrs_01_0794__i167714412139">hdfs://cluster1/source hdfs://cluster2/target</em></pre>
<div class="note" id="mrs_01_0794__note6412193754414"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0794__p54281509457">In the preceding command:</p>
<ul id="mrs_01_0794__ul8334655134616"><li id="mrs_01_0794__li3334455124620"><strong id="mrs_01_0794__b19634815910">-numListstatusThreads</strong> specifies the number of threads used to build the list of files to be copied. In this example, 40 threads are used.</li></ul>
<ul id="mrs_01_0794__ul74751122470"><li id="mrs_01_0794__li7475228471"><strong id="mrs_01_0794__b962824913918">-update -delete</strong> synchronizes the target location with the source location and deletes files that exist at the target location but not at the source location. To copy files incrementally without deleting anything at the target location, omit <strong id="mrs_01_0794__b1261432510107">-delete</strong>.</li></ul>
<ul id="mrs_01_0794__ul17709139164710"><li id="mrs_01_0794__li2709179164715">When <strong id="mrs_01_0794__b680915321101">-prbugpaxtq</strong> is used together with <strong id="mrs_01_0794__b1063864431010">-update</strong>, the status information of copied files (such as owner, permissions, and timestamps) is also updated.</li></ul>
<ul id="mrs_01_0794__ul1973171214478"><li id="mrs_01_0794__li2973101264712"><strong id="mrs_01_0794__b12771173891212">hdfs://cluster1/source</strong> indicates the source location, and <strong id="mrs_01_0794__b1437312444126">hdfs://cluster2/target</strong> indicates the target location.</li></ul>
</div></div>
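<p>To make the effect of <strong>-update -delete</strong> concrete, the following Python sketch simulates it on in-memory dictionaries that map relative paths to file contents. It illustrates the semantics described above and is not DistCp's implementation: the function names are hypothetical, MD5 stands in for the HDFS file checksum, and a file is copied when it is missing or its size or checksum differs, following the <strong>-update</strong> description in Table 1.</p>

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in for the HDFS file checksum that DistCp actually compares.
    return hashlib.md5(data).hexdigest()


def sync_update_delete(source: dict, target: dict, delete: bool = True) -> dict:
    """Return the target contents after simulating -update [-delete].

    Both dictionaries map paths relative to the source/target root to bytes.
    """
    result = dict(target)
    for path, data in source.items():
        old = target.get(path)
        # -update: copy a file that is missing or whose size or checksum differs
        if old is None or len(old) != len(data) or checksum(old) != checksum(data):
            result[path] = data
    if delete:
        # -delete: remove files that exist only at the target location
        for path in target:
            if path not in source:
                del result[path]
    return result


if __name__ == "__main__":
    src = {"a.txt": b"1", "b.txt": b"new"}
    dst = {"b.txt": b"old", "c.txt": b"stale"}
    print(sync_update_delete(src, dst))                # c.txt is deleted
    print(sync_update_delete(src, dst, delete=False))  # incremental copy keeps c.txt
```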
</li><li id="mrs_01_0794__li55701438184214">The following is an example of copying data between clusters:<pre class="screen" id="mrs_01_0794__screen1199431118119">hadoop distcp hdfs://cluster1/foo/bar <em id="mrs_01_0794__i1559365820132">hdfs://cluster2/bar/foo</em></pre>
<div class="note" id="mrs_01_0794__note68354112220"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0794__p108135442211">There must be network connectivity between cluster1 and cluster2, and the two clusters must use the same or compatible HDFS versions.</p>
</div></div>
</li><li id="mrs_01_0794__li13570143814424">The following is an example of copying multiple source directories:<pre class="screen" id="mrs_01_0794__screen7758112755111">hadoop distcp hdfs://cluster1/foo/a \
hdfs://cluster1/foo/b \
hdfs://cluster2/bar/foo</pre>
<p id="mrs_01_0794__p87101910115115">The preceding command copies folders a and b of cluster1 to the <span class="filepath" id="mrs_01_0794__filepath194131133141"><b>/bar/foo</b></span> directory of cluster2. It is equivalent to the following command:</p>
<pre class="screen" id="mrs_01_0794__screen14289534137">hadoop distcp -f hdfs://cluster1/srclist \
hdfs://cluster2/bar/foo</pre>
<p id="mrs_01_0794__p18710181095116">The content of <strong id="mrs_01_0794__b1333612514179">srclist</strong> is as follows, with one source URI per line. Before running the DistCp command, upload the <strong id="mrs_01_0794__b13842301818">srclist</strong> file to HDFS.</p>
<pre class="screen" id="mrs_01_0794__screen949281513816">hdfs://cluster1/foo/a
hdfs://cluster1/foo/b</pre>
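<p>The srclist file is plain text with one source URI per line. The following Python sketch writes such a file locally; the function names are hypothetical and the sketch does not upload anything. Upload the resulting file to HDFS before running DistCp, for example with <strong>hdfs dfs -put srclist hdfs://cluster1/srclist</strong>.</p>

```python
from pathlib import Path


def format_srclist(uris):
    """Return srclist content: one URI per line, blank entries dropped."""
    lines = [uri.strip() for uri in uris if uri.strip()]
    return "\n".join(lines) + "\n"


def write_srclist(uris, local_path="srclist"):
    # Write the list to a local file that can then be uploaded to HDFS.
    path = Path(local_path)
    path.write_text(format_srclist(uris), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(format_srclist(["hdfs://cluster1/foo/a", "hdfs://cluster1/foo/b"]), end="")
```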
</li><li id="mrs_01_0794__li7389173504218"><strong id="mrs_01_0794__b9238121314317">-update</strong> copies a file only if it does not exist at the target location or if its content differs from that of the file at the target location, and <strong id="mrs_01_0794__b13462182474311">-overwrite</strong> overwrites existing files at the target location.<p id="mrs_01_0794__p127984010428">The following example shows the difference between using neither option and using either of them (<strong id="mrs_01_0794__b10659162084810">-update</strong> or <strong id="mrs_01_0794__b74221823184820">-overwrite</strong>):</p>
<p id="mrs_01_0794__p168261893449">Assume that the files at the source location are organized as follows:</p>
<pre class="screen" id="mrs_01_0794__screen43699485012">hdfs://cluster1/source/first/1
hdfs://cluster1/source/first/2
hdfs://cluster1/source/second/10
hdfs://cluster1/source/second/20</pre>
<p id="mrs_01_0794__p139781843175010">The command without either option is as follows:</p>
<pre class="screen" id="mrs_01_0794__screen209941030145116">hadoop distcp hdfs://cluster1/source/first hdfs://cluster1/source/second hdfs://cluster2/target</pre>
<p id="mrs_01_0794__p977510176523">By default, the preceding command creates the <strong id="mrs_01_0794__b1762541974618">first</strong> and <strong id="mrs_01_0794__b782332124620">second</strong> folders at the target location. Therefore, the copy results are as follows:</p>
<pre class="screen" id="mrs_01_0794__screen857481675420">hdfs://cluster2/target/first/1
hdfs://cluster2/target/first/2
hdfs://cluster2/target/second/10
hdfs://cluster2/target/second/20</pre>
<p id="mrs_01_0794__p13421151785512">The command with either option (for example, <strong id="mrs_01_0794__b61995312460">-update</strong>) is as follows:</p>
<pre class="screen" id="mrs_01_0794__screen7206919165616">hadoop distcp -update hdfs://cluster1/source/first hdfs://cluster1/source/second hdfs://cluster2/target</pre>
<p id="mrs_01_0794__p5161620577">The preceding command copies only the contents of the source directories, not the directories themselves, to the target location. Therefore, the copy results are as follows:</p>
<pre class="screen" id="mrs_01_0794__screen22074116598">hdfs://cluster2/target/1
hdfs://cluster2/target/2
hdfs://cluster2/target/10
hdfs://cluster2/target/20</pre>
<div class="note" id="mrs_01_0794__note4652155218018"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_0794__ul186441454477"><li id="mrs_01_0794__li56441456479">If files with the same name exist in multiple source locations, the DistCp command fails.</li></ul>
<ul id="mrs_01_0794__ul3512507479"><li id="mrs_01_0794__li1069781185613">If neither <strong id="mrs_01_0794__b15143053154918">-update</strong> nor <strong id="mrs_01_0794__b19501205518495">-overwrite</strong> is used and a file to be copied already exists at the target location, the file is skipped.</li><li id="mrs_01_0794__li998319417565">When <strong id="mrs_01_0794__b128161227105012">-update</strong> is used and a file to be copied already exists at the target location with different content, the file at the target location is updated.</li><li id="mrs_01_0794__li751105014719">When <strong id="mrs_01_0794__b17970193815912">-overwrite</strong> is used and a file to be copied already exists at the target location, the file at the target location is overwritten regardless of whether its content differs.</li></ul>
</div></div>
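<p>The rules in this example can be summarized in a short Python sketch. It only models the behavior described above (where each file lands, and whether an existing target file is copied, skipped, updated, or overwritten); it does not access HDFS, and the function names and the <strong>listing</strong> argument are hypothetical.</p>

```python
import posixpath


def target_paths(sources, listing, target, update_or_overwrite=False):
    """Map each target path to the source file copied there.

    sources: source directory URIs in command-line order.
    listing: dict mapping each source directory to its files' relative paths.
    """
    mapped = {}
    for src in sources:
        for rel in listing[src]:
            if update_or_overwrite:
                # -update/-overwrite: copy only the contents of the source directory
                dest = posixpath.join(target, rel)
            else:
                # No option: recreate the source directory under the target
                dest = posixpath.join(target, posixpath.basename(src), rel)
            if dest in mapped:
                # Same file name from two source locations: DistCp fails
                raise ValueError("duplicate target path: " + dest)
            mapped[dest] = posixpath.join(src, rel)
    return mapped


def action(option, exists_at_target, content_differs):
    """Decide what happens to one file, following the rules in the note above."""
    if not exists_at_target:
        return "copy"
    if option == "-overwrite":
        return "overwrite"
    if option == "-update" and content_differs:
        return "update"
    return "skip"


if __name__ == "__main__":
    listing = {
        "hdfs://cluster1/source/first": ["1", "2"],
        "hdfs://cluster1/source/second": ["10", "20"],
    }
    for dest in target_paths(list(listing), listing, "hdfs://cluster2/target", True):
        print(dest)
```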
</li><li id="mrs_01_0794__li251694918152">The following table describes other command options:
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_0794__te8e9526e6a194fbaaae82c9ef3eac29c" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Other command options</caption><thead align="left"><tr id="mrs_01_0794__r86fb9b6d7172413ca889334e0b64da8e"><th align="left" class="cellrowborder" valign="top" width="30.080000000000002%" id="mcps1.3.4.2.5.1.2.3.1.1"><p id="mrs_01_0794__a706d40de42eb435eb92a47f3b03d32f3">Option</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="69.92%" id="mcps1.3.4.2.5.1.2.3.1.2"><p id="mrs_01_0794__a428e8878dc8c447da30f15d0824984c9"><strong id="mrs_01_0794__b16887047165419">Description</strong></p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_0794__r5b18ce7c641047428a3af9078d726920"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__a480bafc889e3412fa38741ff1688ae87">-p[rbugpcaxtq]</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p729774019144">Preserves the specified status information of copied files. When <strong id="mrs_01_0794__b15777155465412">-update</strong> is also used, the status information of a copied file is updated even if its content is not.</p>
<p id="mrs_01_0794__p136411146175711"><strong id="mrs_01_0794__b1864144645716">r</strong>: number of replicas</p>
<p id="mrs_01_0794__p1931444865712"><strong id="mrs_01_0794__b2314194811570">b</strong>: block size</p>
<p id="mrs_01_0794__p2867104995720"><strong id="mrs_01_0794__b108671749135717">u</strong>: user (owner) of the file</p>
<p id="mrs_01_0794__p17352105315718"><strong id="mrs_01_0794__b12352165355711">g</strong>: group of the file</p>
<p id="mrs_01_0794__p15941566577"><strong id="mrs_01_0794__b694205612571">p</strong>: permission</p>
<p id="mrs_01_0794__p117611758185717"><strong id="mrs_01_0794__b177611558155713">c</strong>: checksum type</p>
<p id="mrs_01_0794__p18830190195817"><strong id="mrs_01_0794__b15830160105813">a</strong>: access control list (ACL)</p>
<p><strong>x</strong>: extended attributes (XAttrs)</p>
<p id="mrs_01_0794__p1047420416588"><strong id="mrs_01_0794__b547414165814">t</strong>: timestamp</p>
<p id="mrs_01_0794__p1638711241308"><strong id="mrs_01_0794__b1788545314567">q</strong>: quota information</p>
</td>
</tr>
<tr id="mrs_01_0794__r0c3ebfe579f64409ae319329123dac63"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__a242129ef11394dd7925150ea2d4998c3">-i</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p144309346427">Ignores failures during copying.</p>
</td>
</tr>
<tr id="mrs_01_0794__r28b7fbc9a3f241afa9dcb2567f0ce7d1"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__a3429411a2ae348f1a3f9733924c74855">-log &lt;logdir&gt;</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__aa63fb3d6ff374890a5ebc240ae9a438a">Directory in which the DistCp logs are stored</p>
</td>
</tr>
<tr id="mrs_01_0794__r9aa69838aa474c329f7074b65c38f433"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__ac3a56e3435ef45b899f05d6f33a5f0f0">-v</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__a52e06ea33d44425a8e96c5fbd27c2706">Records additional information in the log specified by <strong>-log</strong>.</p>
</td>
</tr>
<tr id="mrs_01_0794__r8c9b807b44aa49778a079bda2f45c1db"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p999216223441">-m &lt;num_maps&gt;</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__a7f0de4e9eeb14dfdbb70a56666e5a380">Maximum number of copy tasks (map tasks) that can run concurrently</p>
</td>
</tr>
<tr id="mrs_01_0794__row688415165585"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p15884101655814">-numListstatusThreads</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p68841916115811">Number of threads used to build the list of files to be copied. Increasing this value can speed up DistCp when a large number of files are copied.</p>
</td>
</tr>
<tr id="mrs_01_0794__rfc7e6c467b004760ba274cb6034265f8"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__aa2eda2895cfc41df8134452d39682ca7">-overwrite</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__a1fad969b48fb4d79adfc54296e645d7d">Overwrites files that already exist at the target location.</p>
</td>
</tr>
<tr id="mrs_01_0794__r35461f5fecce4c9c814a1e2f8f98da6b"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__ac0777ae32aea459f902e0dfa0fbf2d2b">-update</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__af7226af4cbdf465ca6bd832f133ee668">Updates a file at the target location if its size or checksum differs from that of the corresponding file at the source location.</p>
</td>
</tr>
<tr id="mrs_01_0794__r260833e490d54ccda5df108a605e4f52"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__a12ccbce5c42742b6b95c7b460b999e5b">-append</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__a48153fa5295c47ed88d7e3c1b7ac1fc3">When <strong id="mrs_01_0794__b491518442219">-update</strong> is also used, the content of the source file is appended to the existing file at the target location.</p>
</td>
</tr>
<tr id="mrs_01_0794__r1ec0fd8b3dae46ed85433c708837a967"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__aadf674d0efe54fb3960a673a49e9114b">-f &lt;urilist_uri&gt;</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__a8a173172debf497e83385f312f8a34b1">Uses the content of the <strong id="mrs_01_0794__b0395130571">&lt;urilist_uri&gt;</strong> file as the list of files to be copied.</p>
</td>
</tr>
<tr id="mrs_01_0794__rd030ccf4b0584c12b83f6e7a2a5bde89"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__a7220d06cb06047f192faa3c654b5d29d">-filters</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__adf4e42e9bb664b59a76edb077bddbf72">Specifies a local file containing regular expressions, one per line. Files whose paths match any of the expressions are not copied.</p>
</td>
</tr>
<tr id="mrs_01_0794__row182421450135818"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p0242175010585">-async</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p19242155011585">Runs the <strong id="mrs_01_0794__b959422610113">distcp</strong> command asynchronously: the command returns as soon as the copy job is launched instead of waiting for the job to finish.</p>
</td>
</tr>
<tr id="mrs_01_0794__row030265512586"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p1339126134">-atomic {-tmp &lt;tmp_dir&gt;}</p>
</td>
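<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p1330215525815">Performs an atomic copy: data is first copied to a temporary directory and then moved to the target location in a single step. You can optionally specify the temporary directory with <strong>-tmp</strong>.</p>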
</td>
</tr>
<tr id="mrs_01_0794__row1122431195916"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p72246125913">-bandwidth</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p1322410120597">Bandwidth limit for each copy task, in MB/s</p>
</td>
</tr>
<tr id="mrs_01_0794__row670875205915"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p970875115911">-delete</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p1670816515593">Deletes files that exist at the target location but not at the source location. This option is usually used with <strong id="mrs_01_0794__b8989134315133">-update</strong> to synchronize the target location with the source location and delete redundant files from the target location.</p>
</td>
</tr>
<tr id="mrs_01_0794__row1850515915918"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p35051694595">-diff &lt;oldSnapshot&gt; &lt;newSnapshot&gt;</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p3505119115918">Copies the differences between the old and new snapshots of the source location to the target location. The target location must still be in the state of the old snapshot.</p>
</td>
</tr>
<tr id="mrs_01_0794__row14590529317"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p15590729913">-skipcrccheck</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p135902291618">Skips the cyclic redundancy check (CRC) between source and target files.</p>
</td>
</tr>
<tr id="mrs_01_0794__row374034415"><td class="cellrowborder" valign="top" width="30.080000000000002%" headers="mcps1.3.4.2.5.1.2.3.1.1 "><p id="mrs_01_0794__p59921449111720">-strategy {dynamic|uniformsize}</p>
</td>
<td class="cellrowborder" valign="top" width="69.92%" headers="mcps1.3.4.2.5.1.2.3.1.2 "><p id="mrs_01_0794__p18741341910">Copy strategy. The default strategy is <strong id="mrs_01_0794__b10871146161610">uniformsize</strong>, in which each copy task copies approximately the same number of bytes. With <strong>dynamic</strong>, the files are split into chunks that copy tasks pick up as they become free, so faster tasks copy more data.</p>
</td>
</tr>
</tbody>
</table>
</div>
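<p>As a reading aid for the <strong>-p</strong> option, the following Python sketch expands a <strong>-p</strong> argument into the attribute names listed in Table 1. It is hypothetical and only checks the letters; x (extended attributes, XAttrs) is included because it is part of the <strong>-p[rbugpcaxtq]</strong> syntax. The sketch requires at least one letter after <strong>-p</strong>.</p>

```python
# Attribute letters of -p[rbugpcaxtq], as described in Table 1
# (x, extended attributes, is part of the option syntax).
PRESERVE_FLAGS = {
    "r": "number of replicas",
    "b": "block size",
    "u": "owner",
    "g": "group",
    "p": "permission",
    "c": "checksum type",
    "a": "ACL",
    "x": "extended attributes (XAttr)",
    "t": "timestamp",
    "q": "quota information",
}


def decode_preserve(option):
    """Expand an option such as -prbugpaxtq into attribute names."""
    if not option.startswith("-p") or len(option) == 2:
        raise ValueError("expected -p followed by attribute letters: " + option)
    letters = option[2:]
    unknown = sorted(set(letters) - set(PRESERVE_FLAGS))
    if unknown:
        raise ValueError("unknown attribute letters: " + "".join(unknown))
    return [PRESERVE_FLAGS[letter] for letter in letters]


if __name__ == "__main__":
    print(decode_preserve("-prbugpaxtq"))
```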
</li></ol>
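<p>The default <strong>uniformsize</strong> strategy in Table 1 aims to give every copy task roughly the same number of bytes. The following Python sketch shows the idea with a simple greedy split over the file list. It is a simplified illustration with a hypothetical function name, not Hadoop's actual input-format code, which differs in details.</p>

```python
def uniform_size_splits(file_sizes, num_maps):
    """Group file indexes, in listing order, into at most num_maps splits
    whose total sizes are roughly equal (simplified uniformsize idea)."""
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    bytes_per_split = sum(file_sizes) / num_maps
    splits, current, current_bytes = [], [], 0
    for index, size in enumerate(file_sizes):
        # Start a new split when this file would push the current one past
        # the per-split target, as long as another split is still allowed.
        if current and current_bytes + size > bytes_per_split and len(splits) < num_maps - 1:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    # Four 10-byte files and two copy tasks: each task gets 20 bytes.
    print(uniform_size_splits([10, 10, 10, 10], 2))  # [[0, 1], [2, 3]]
```

<p>Because a split only closes at file boundaries, one very large file still lands in a single task; this is the situation the timeout and filter advice in the FAQs below addresses.</p>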
</div>
<div class="section" id="mrs_01_0794__section29499362105048"><h4 class="sectiontitle">FAQs of DistCp</h4><ol id="mrs_01_0794__ol744197105431"><li id="mrs_01_0794__li15025511105431">If some files to be copied are very large, increase the timeout of the MapReduce tasks that perform the copy. To do so, specify <strong id="mrs_01_0794__b514091424915">mapreduce.task.timeout</strong> (in milliseconds) in the DistCp command. For example, run the following command to change the timeout to 30 minutes:<pre class="screen" id="mrs_01_0794__screen720661703113">hadoop distcp -Dmapreduce.task.timeout=1800000 hdfs://cluster1/source hdfs://cluster2/target</pre>
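<p><strong>mapreduce.task.timeout</strong> is specified in milliseconds, so 30 minutes is 30 × 60 × 1,000 = 1,800,000. The following Python sketch, with a hypothetical helper name, builds the <strong>-D</strong> option from a value in minutes to avoid unit mistakes. It accepts only positive values.</p>

```python
def task_timeout_option(minutes):
    """Return the -D option that sets mapreduce.task.timeout (milliseconds)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    millis = int(round(minutes * 60 * 1000))
    return "-Dmapreduce.task.timeout=%d" % millis


if __name__ == "__main__":
    print(task_timeout_option(30))  # -Dmapreduce.task.timeout=1800000
```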
<p id="mrs_01_0794__p570125819494">Alternatively, you can use <strong id="mrs_01_0794__b798604252012">-filters</strong> to exclude the large files from the copy. The command example is as follows:</p>
<pre class="screen" id="mrs_01_0794__screen3701185874916">hadoop distcp -filters /opt/client/filterfile hdfs://cluster1/source hdfs://cluster2/target</pre>
<p id="mrs_01_0794__p9701558204916">In the preceding command, <em id="mrs_01_0794__i94271457172013">filterfile</em> is a local file containing regular expressions, one per line, that match the paths of files not to be copied. The following is an example:</p>
<pre class="screen" id="mrs_01_0794__screen1770111583499">.*excludeFile1.*
.*excludeFile2.*</pre>
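<p>The following Python sketch shows how such a filter file can be applied to candidate paths. It assumes that a path is excluded when an expression matches the whole path string (which is why the examples wrap each name in <strong>.*</strong>), and it uses Python's <strong>re</strong> module, whose syntax is close to, but not identical to, the Java regular expressions used by DistCp. The function names are hypothetical.</p>

```python
import re


def load_filters(lines):
    """Compile one regular expression per non-empty line of a filter file."""
    return [re.compile(line.strip()) for line in lines if line.strip()]


def is_copied(path, filters):
    # Assumption: an expression must match the entire path to exclude it.
    return not any(pattern.fullmatch(path) for pattern in filters)


if __name__ == "__main__":
    filters = load_filters([".*excludeFile1.*", ".*excludeFile2.*"])
    for path in ["hdfs://cluster1/source/excludeFile1.log",
                 "hdfs://cluster1/source/data.log"]:
        print(path, "copied" if is_copied(path, filters) else "skipped")
```

<p>To test a real filter file, pass its open file object to <strong>load_filters</strong>, since iterating over a file yields one line at a time.</p>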
</li><li id="mrs_01_0794__li41575152105932">The DistCp command exits unexpectedly with the error message "java.lang.OutOfMemoryError".<p id="mrs_01_0794__p267795031718"><a name="mrs_01_0794__li41575152105932"></a><a name="li41575152105932"></a>This is because the memory required to run the copy command exceeds the client's preset memory limit (default value: 128 MB). You can raise the client's memory limit by modifying <span class="parmname" id="mrs_01_0794__parmname343303011723"><b>CLIENT_GC_OPTS</b></span> in <em id="mrs_01_0794__i2683123115914">&lt;Client installation path&gt;</em><strong id="mrs_01_0794__b1098183315222">/HDFS/component_env</strong>. For example, to set the memory limit to 1 GB, use the following configuration:</p>
<pre class="screen" id="mrs_01_0794__screen14993084173648">CLIENT_GC_OPTS="-Xmx1G"</pre>
<p id="mrs_01_0794__p18677165061718">After the modification, run the following command to make the modification take effect:</p>
<p id="mrs_01_0794__p44359303111033"><strong id="mrs_01_0794__b08971554172214">source </strong>{<em id="mrs_01_0794__i179021754162219">Client installation path</em>}<strong id="mrs_01_0794__b119023548227">/bigdata_env</strong></p>
</li><li id="mrs_01_0794__li2486129153919">When the DistCp command is run with the dynamic strategy, it exits unexpectedly with the error message "Too many chunks created with splitRatio".<p id="mrs_01_0794__p742804114555"><a name="mrs_01_0794__li2486129153919"></a><a name="li2486129153919"></a>This problem occurs because the value of <span class="parmname" id="mrs_01_0794__parmname3428154175517"><b>distcp.dynamic.max.chunks.tolerable</b></span> (default value: 20,000) is less than the value of <span class="parmname" id="mrs_01_0794__parmname204281841185517"><b>distcp.dynamic.split.ratio</b></span> (default value: 2) multiplied by the number of map tasks. With the default values, this happens when the number of map tasks exceeds 10,000. You can use the <strong id="mrs_01_0794__b98121153122414">-m</strong> parameter to reduce the number of map tasks to less than 10,000.</p>
<pre class="screen" id="mrs_01_0794__screen158024065619">hadoop distcp -strategy dynamic -m 9500 hdfs://cluster1/source hdfs://cluster2/target</pre>
<p id="mrs_01_0794__p1348662919392">Alternatively, you can use the <strong id="mrs_01_0794__b114779832517">-D</strong> parameter to set <span class="parmname" id="mrs_01_0794__parmname1964033525513"><b>distcp.dynamic.max.chunks.tolerable</b></span> to a larger value.</p>
<pre class="screen" id="mrs_01_0794__screen20138728564">hadoop distcp -Ddistcp.dynamic.max.chunks.tolerable=30000 -strategy dynamic hdfs://cluster1/source hdfs://cluster2/target</pre>
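<p>The limit described above is simple arithmetic. The following Python sketch mirrors the condition as stated in this section (the command fails when <strong>distcp.dynamic.split.ratio</strong> multiplied by the number of map tasks exceeds <strong>distcp.dynamic.max.chunks.tolerable</strong>), so you can check a planned <strong>-m</strong> value before submitting the job. The function names are hypothetical and the defaults are the values given above.</p>

```python
def check_dynamic_chunks(num_maps, split_ratio=2, max_chunks_tolerable=20000):
    """Return the chunk count, or raise if it exceeds the tolerable maximum."""
    chunks = split_ratio * num_maps
    if chunks > max_chunks_tolerable:
        raise ValueError("Too many chunks created with splitRatio: %d x %d maps = %d > %d"
                         % (split_ratio, num_maps, chunks, max_chunks_tolerable))
    return chunks


def max_maps(split_ratio=2, max_chunks_tolerable=20000):
    # Largest -m value that stays within the limit for the given settings.
    return max_chunks_tolerable // split_ratio


if __name__ == "__main__":
    print(max_maps())                     # 10000 with the default values
    print(check_dynamic_chunks(9500))     # 19000 chunks, within the limit
    print(max_maps(max_chunks_tolerable=30000))  # 15000
```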
</li></ol>
</div>
</div>