Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: luhuayi <luhuayi@huawei.com> Co-committed-by: luhuayi <luhuayi@huawei.com>
<a name="EN-US_TOPIC_0000001764491676"></a><a name="EN-US_TOPIC_0000001764491676"></a>
<h1 class="topictitle1">Creating a Hudi Data Description (Foreign Table)</h1>
<div id="body0000001591976589"><p id="EN-US_TOPIC_0000001764491676__p19777144574716">A foreign table maps data on OBS. GaussDB(DWS) accesses Hudi data on OBS through foreign tables. For details, see section "CREATE FOREIGN TABLE (SQL on OBS or Hadoop)" in the <em id="EN-US_TOPIC_0000001764491676__i19943103473118">Data Warehouse Service SQL Syntax Reference</em>.</p>
<p id="EN-US_TOPIC_0000001764491676__p1350416565336">A Hudi foreign table is defined in the same way as an OBS foreign table, except that <strong id="EN-US_TOPIC_0000001764491676__b4919144010187">format</strong> must be set to <strong id="EN-US_TOPIC_0000001764491676__b16650543141813">hudi</strong>. For a Hudi bucket table, you must also set <strong id="EN-US_TOPIC_0000001764491676__b73131348196">distribute by</strong> to <strong id="EN-US_TOPIC_0000001764491676__b171479712192">hash(bk_col1,bk_col2...)</strong>. Hudi bucket tables are supported only in 9.1.0.100 and later versions.</p>
<div class="section" id="EN-US_TOPIC_0000001764491676__section41158485227"><h4 class="sectiontitle">Obtaining the Definitions of Tables on MRS</h4><p id="EN-US_TOPIC_0000001764491676__p345645216285">Hudi foreign tables on GaussDB(DWS) are read-only. Before creating a foreign table, determine the number of fields in the target data and the type of each field. <span id="EN-US_TOPIC_0000001764491676__ph472865420176">A Hudi foreign table supports a maximum of 5000 columns.</span></p>
<p id="EN-US_TOPIC_0000001764491676__p13638150102210">For example, for a Hudi table on MRS, you can use spark-sql to query the original table definitions:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001764491676__screen476432902318"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SHOW</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">rtd_mfdt_int_currency_t</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
</div>
<div class="section" id="EN-US_TOPIC_0000001764491676__section313211121246"><h4 class="sectiontitle">Compiling GaussDB(DWS) Table Definitions</h4><ul id="EN-US_TOPIC_0000001764491676__ul16980151473913"><li id="EN-US_TOPIC_0000001764491676__li598041417390">Non-bucket table<p id="EN-US_TOPIC_0000001764491676__p420851519241"><a name="EN-US_TOPIC_0000001764491676__li598041417390"></a><a name="li598041417390"></a>Copy the definitions of all columns in the MRS table, convert the column types to their GaussDB(DWS) equivalents, and create an OBS foreign table.</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001764491676__screen17808134352418"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">FOREIGN</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">rtd_mfdt_int_currency_ft</span><span class="p">(</span>
<span class="n">_hoodie_commit_time</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="n">_hoodie_commit_seqno</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="n">_hoodie_record_key</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="n">_hoodie_partition_path</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="n">_hoodie_file_name</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="p">...</span>
<span class="p">)</span><span class="n">SERVER</span><span class="w"> </span><span class="n">obs_server</span><span class="w"> </span><span class="k">OPTIONS</span><span class="w"> </span><span class="p">(</span>
<span class="n">foldername</span><span class="w"> </span><span class="s1">'/erpgc-obs-test-01/s000/sbi_fnd/rtd_mfdt_int_currency_t/'</span><span class="p">,</span>
<span class="n">format</span><span class="w"> </span><span class="s1">'hudi'</span><span class="p">,</span>
<span class="k">encoding</span><span class="w"> </span><span class="s1">'utf-8'</span>
<span class="p">)</span><span class="n">distribute</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">roundrobin</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001764491676__p2467174415309"><strong id="EN-US_TOPIC_0000001764491676__b44914683911">foldername</strong> specifies the storage path of the Hudi data on OBS. It corresponds to <strong id="EN-US_TOPIC_0000001764491676__b551921714402">LOCATION</strong> in the spark-sql table definition on MRS. The path must end with a slash (/).</p>
</li></ul>
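<p>As a rough illustration of the type conversion step, the sketch below pairs a few Spark column types with GaussDB(DWS) types that are commonly used for them. The table name, the business columns, and the OBS path are hypothetical and exist only for this example, and the mapping itself is an assumption. Confirm it against the data types listed under "CREATE FOREIGN TABLE (SQL on OBS or Hadoop)" before relying on it.</p>
<div class="codecoloring" codetype="Sql"><pre>-- Hypothetical example: table name, business columns, and path are not from a real system.
-- The trailing comment on each column shows the assumed Spark type on the MRS side.
CREATE FOREIGN TABLE currency_demo_ft (
    _hoodie_commit_time    text,            -- string
    _hoodie_commit_seqno   text,            -- string
    _hoodie_record_key     text,            -- string
    _hoodie_partition_path text,            -- string
    _hoodie_file_name      text,            -- string
    currency_code          text,            -- string
    precision_digits       int,             -- int
    source_row_id          bigint,          -- bigint
    exchange_rate          decimal(18,6),   -- decimal(18,6)
    last_update_date       timestamp        -- timestamp
) SERVER obs_server OPTIONS (
    foldername '/example-bucket/path/to/currency_demo/',  -- must end with a slash
    format 'hudi',
    encoding 'utf-8'
) DISTRIBUTE BY ROUNDROBIN;
</pre></div>
<p>The five <strong>_hoodie_*</strong> metadata columns are kept as <strong>text</strong>, as in the example above. Only the business columns need a type decision.</p>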
<ul id="EN-US_TOPIC_0000001764491676__ul244219524430"><li id="EN-US_TOPIC_0000001764491676__li844212527437">Bucket table<p id="EN-US_TOPIC_0000001764491676__p05937431434"><a name="EN-US_TOPIC_0000001764491676__li844212527437"></a><a name="li844212527437"></a>Copy the definitions of all columns in the MRS table, convert the column types to their GaussDB(DWS) equivalents, create an OBS foreign table, and specify hash distribution.</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001764491676__screen359374314319"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">FOREIGN</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">rtd_mfdt_int_currency_ft</span><span class="p">(</span>
<span class="n">_hoodie_commit_time</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="n">_hoodie_commit_seqno</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="n">_hoodie_record_key</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="n">_hoodie_partition_path</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="n">_hoodie_file_name</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="p">...</span>
<span class="p">)</span><span class="n">SERVER</span><span class="w"> </span><span class="n">obs_server</span><span class="w"> </span><span class="k">OPTIONS</span><span class="w"> </span><span class="p">(</span>
<span class="n">foldername</span><span class="w"> </span><span class="s1">'/erpgc-obs-test-01/s000/sbi_fnd/rtd_mfdt_int_currency_t/'</span><span class="p">,</span>
<span class="n">format</span><span class="w"> </span><span class="s1">'hudi'</span><span class="p">,</span>
<span class="k">encoding</span><span class="w"> </span><span class="s1">'utf-8'</span>
<span class="p">)</span><span class="n">distribute</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="n">bk_col1</span><span class="p">,</span><span class="n">bk_col2</span><span class="p">...);</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001764491676__p11593194334313"><strong id="EN-US_TOPIC_0000001764491676__b4202201013207">foldername</strong> specifies the storage path of the Hudi data on OBS. It corresponds to <strong id="EN-US_TOPIC_0000001764491676__b18203151092014">LOCATION</strong> in the spark-sql table definition on MRS. The path must end with a slash (/).</p>
<p id="EN-US_TOPIC_0000001764491676__p892213505811"><strong id="EN-US_TOPIC_0000001764491676__b1128515300202">distribute by</strong> specifies the distribution columns of the bucket table. They must match the value of <strong id="EN-US_TOPIC_0000001764491676__b1891945010205">hoodie.bucket.index.hash.field</strong> in the <strong id="EN-US_TOPIC_0000001764491676__b12493356192016">foldername/.hoodie/hoodie.index.properties</strong> file.</p>
</li></ul>
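<p>To make the relationship between the properties file and the distribution clause concrete, here is a minimal sketch. It assumes a hypothetical bucket table whose <strong>hoodie.index.properties</strong> file names two hash fields. The table name, columns, path, and property value are all invented for illustration.</p>
<div class="codecoloring" codetype="Sql"><pre>-- Hypothetical excerpt of foldername/.hoodie/hoodie.index.properties
-- (other entries omitted):
--   hoodie.bucket.index.hash.field=currency_code,org_id
--
-- The hash columns below repeat that value, in the same order.
CREATE FOREIGN TABLE currency_bucket_demo_ft (
    _hoodie_commit_time    text,
    _hoodie_commit_seqno   text,
    _hoodie_record_key     text,
    _hoodie_partition_path text,
    _hoodie_file_name      text,
    currency_code          text,
    org_id                 bigint,
    exchange_rate          decimal(18,6)
) SERVER obs_server OPTIONS (
    foldername '/example-bucket/path/to/currency_bucket_demo/',  -- must end with a slash
    format 'hudi',
    encoding 'utf-8'
) DISTRIBUTE BY HASH(currency_code, org_id);
</pre></div>
<p>Reading the property from the file before writing the statement avoids guessing the bucket columns from the table definition alone.</p>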
</div>
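<p>Once a foreign table exists, a quick read can confirm that the path and column definitions line up with the Hudi data. The queries below are only a sketch. They reuse the table name from the examples above and rely on the <strong>_hoodie_commit_time</strong> metadata column. Because Hudi foreign tables are read-only, only SELECT statements are shown.</p>
<div class="codecoloring" codetype="Sql"><pre>-- Preview a few rows to check that the columns are populated as expected.
SELECT * FROM rtd_mfdt_int_currency_ft LIMIT 10;

-- Count rows per Hudi commit, most recent commits first.
SELECT _hoodie_commit_time, count(*) AS row_count
FROM rtd_mfdt_int_currency_ft
GROUP BY _hoodie_commit_time
ORDER BY _hoodie_commit_time DESC
LIMIT 5;
</pre></div>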
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_1069.html">SQL on Hudi</a></div>
</div>
</div>