Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: luhuayi <luhuayi@huawei.com> Co-committed-by: luhuayi <luhuayi@huawei.com>
<a name="EN-US_TOPIC_0000001811610225"></a><a name="EN-US_TOPIC_0000001811610225"></a>
<h1 class="topictitle1">Overview</h1>
<div id="body8662426"><div class="p" id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_p8818114882415"><span id="EN-US_TOPIC_0000001811610225__ph13161122917">GaussDB(DWS)</span> allows you to export data in ORC or Parquet format to MRS through an HDFS foreign table. You specify the export mode and the export data format in the foreign table. Multiple DNs export the data from GaussDB(DWS) in parallel and store it in HDFS, which improves the overall export performance.<ul id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_ul36531438163113"><li id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_en-us_topic_0000001145491123_en-us_topic_0117407657_l31f934f062fc4d73bb806420f4e0ab2a">The CN only plans data export tasks and delivers them to the DNs for execution. This frees the CN to process external requests.</li><li id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_en-us_topic_0000001145491123_en-us_topic_0117407657_ldd50fb5c0ad54fe6834d26dfefb17575">Every DN takes part in the export, so the computing capabilities and bandwidth of all DNs are fully used.</li></ul>
<ul id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_en-us_topic_0117407657_u8cb2994a33d84a4587e0a4ed60c9219f"><li id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_li03395213277">Multiple HDFS servers can export data concurrently. The export path can be empty. The naming rule of the path must be the same as that of the exported file.</li><li id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_li1659182513013">MRS connects to GaussDB(DWS) cluster nodes. The export rate is affected by the network bandwidth.</li><li id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_li529717211107">Data files in the ORC or Parquet format are supported.</li></ul>
</div>
<div class="note" id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482188_note1857455514420"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482188_p557425513447">This section uses the ORC format as an example to describe how to export data. The method for exporting Parquet data is similar. Parquet data can be exported from clusters of version 9.1.0 or later.</p>
</div></div>
<div class="section" id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_section16167108388"><h4 class="sectiontitle">Naming Rules of Exported Files</h4><p id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_p1711432817103">The rules for naming ORC and Parquet data files exported from <span id="EN-US_TOPIC_0000001811610225__ph4423185051013">GaussDB(DWS)</span> are as follows:</p>
</div>
<ol id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_ol820017121092"><li id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_li534222815912">Data exported to MRS (HDFS): When data is exported from a DN, it is stored in HDFS in the segment format. Each file is named in the format <strong id="EN-US_TOPIC_0000001811610225__b1539620111111">mpp_</strong><em id="EN-US_TOPIC_0000001811610225__i9540320171111">Database name</em><strong id="EN-US_TOPIC_0000001811610225__b1854019208112">_</strong><em id="EN-US_TOPIC_0000001811610225__i13540112017115">Schema name</em><strong id="EN-US_TOPIC_0000001811610225__b17540620151115">_</strong><em id="EN-US_TOPIC_0000001811610225__i354122011115">Table name</em><strong id="EN-US_TOPIC_0000001811610225__b15541920131119">_</strong><em id="EN-US_TOPIC_0000001811610225__i9541920101115">Node name</em><strong id="EN-US_TOPIC_0000001811610225__b354192041117">_</strong><em id="EN-US_TOPIC_0000001811610225__i15421720121116">n</em><strong id="EN-US_TOPIC_0000001811610225__b3542182017118">_</strong><em id="EN-US_TOPIC_0000001811610225__i94521524161212">UUID</em><strong id="EN-US_TOPIC_0000001811610225__b6679132015126">.</strong><em id="EN-US_TOPIC_0000001811610225__i15883359111117">Data format</em>. <em id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_i166961840154051">n</em> is a sequence number that starts from 0 and increases in ascending order, for example, 0, 1, 2, 3. The UUID is in the standard format: 32 hexadecimal characters divided into five segments by hyphens (-), in the pattern 8-4-4-4-12.</li><li id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_li838914361915">You are advised to export data from different clusters or databases to different paths. The maximum size of a single file in ORC or Parquet format is about 256 MB. (This is a soft limit. Actual files may slightly exceed it.)</li><li id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_li14301185618912">After the export is complete, the <strong id="EN-US_TOPIC_0000001811610225__en-us_topic_0000001188482216_b39333862954051">_SUCCESS</strong> file is generated.</li></ol>
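<p>The naming rule above can be checked on the client side with a short script. The following Python sketch is illustrative only and is not part of GaussDB(DWS) or MRS. The function names <strong>parse_export_name</strong> and <strong>export_finished</strong> are hypothetical, and the script assumes that the <em>Data format</em> suffix is the lowercase string <strong>orc</strong> or <strong>parquet</strong>. Because database, schema, table, and node names can themselves contain underscores, the pattern does not try to split them and anchors only on the trailing <em>n</em>, <em>UUID</em>, and <em>Data format</em> parts.</p>

```python
import re

# Assumed pattern, derived only from the naming rule described above.
# The name part (database_schema_table_node) is kept as one group because
# each component may contain underscores and cannot be split reliably.
EXPORT_NAME = re.compile(
    r"^mpp_(.+)_(\d+)_"
    r"([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
    r"\.(orc|parquet)$"  # assumption: lowercase suffix equal to the data format
)


def parse_export_name(filename):
    """Return (name_part, n, uuid, data_format), or None if the name does not match."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        return None
    name_part, n, file_uuid, data_format = match.groups()
    return name_part, int(n), file_uuid.lower(), data_format


def export_finished(listing):
    """Return True if the directory listing contains the _SUCCESS marker file."""
    return "_SUCCESS" in listing


if __name__ == "__main__":
    # Invented sample file name, used only to exercise the pattern.
    sample = "mpp_db1_public_sales_dn_6001_6002_0_123e4567-e89b-12d3-a456-426614174000.orc"
    print(parse_export_name(sample))
    print(export_finished(["_SUCCESS", sample]))
```

<p>A script like this could be run against a listing of the export path to confirm that the <strong>_SUCCESS</strong> file exists before downstream jobs read the data files.</p>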
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0159.html">Exporting ORC and Parquet Data to MRS</a></div>
</div>
</div>