Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_24099"></a>
<h1 class="topictitle1">Reading MOR Table Views</h1>
<div id="body0000001104703162"><p id="mrs_01_24099__p106171637185916">After a MOR table is synchronized to Hive, two tables are created in Hive: <em id="mrs_01_24099__i1643884812120">Table name</em><strong id="mrs_01_24099__b164431848013">_rt</strong> and <em id="mrs_01_24099__i244420488117">Table name</em><strong id="mrs_01_24099__b4444348715">_ro</strong>. The table suffixed with <strong id="mrs_01_24099__b2099087475112413">_rt</strong> provides the real-time view, and the table suffixed with <strong id="mrs_01_24099__b823526155112413">_ro</strong> provides the read-optimized view. For example, if a Hudi table named <strong id="mrs_01_24099__b380102999112413">test</strong> is synchronized to Hive, two additional tables, <strong id="mrs_01_24099__b1405481880112413">test_rt</strong> and <strong id="mrs_01_24099__b393048769112413">test_ro</strong>, are generated in Hive.</p>
<ul id="mrs_01_24099__ul4185140152916"><li id="mrs_01_24099__li21851140172919">Reading the real-time view (using Hive and SparkSQL as an example): Directly read the Hudi table with suffix <strong id="mrs_01_24099__b95101752715">_rt</strong> stored in Hive.<pre class="screen" id="mrs_01_24099__screen972515233397">select count(*) from test_rt;</pre>
</li></ul>
<ul id="mrs_01_24099__ul57469424291"><li id="mrs_01_24099__li17746642162910">Reading the real-time view (using the Spark DataSource API as an example): The operations are the same as those for the COW table. For details, see the operations for the COW table.</li></ul>
<ul id="mrs_01_24099__ul159951244192915"><li id="mrs_01_24099__li1899504417296">Reading the incremental view (using Hive as an example):<pre class="screen" id="mrs_01_24099__screen195351156193917">set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat; // This parameter does not need to be specified for SparkSQL.
set hoodie.test.consume.mode=INCREMENTAL;
set hoodie.test.consume.max.commits=3;
set hoodie.test.consume.start.timestamp=20201227153030;
select count(*) from default.test_rt where `_hoodie_commit_time`>'20201227153030';</pre>
</li></ul>
<ul id="mrs_01_24099__ul9588748132916"><li id="mrs_01_24099__li2028420591801">Reading the incremental view (using Spark SQL as an example):<pre class="screen" id="mrs_01_24099__screen5280163515211">set hoodie.test.consume.mode=INCREMENTAL;
set hoodie.test.consume.start.timestamp=20201227153030; -- Specify the start commit of the incremental pull.
set hoodie.test.consume.end.timestamp=20210308212318; -- Specify the end commit of the incremental pull. If this parameter is not specified, the latest commit is used.
select count(*) from default.test_rt where `_hoodie_commit_time`>'20201227153030';</pre>
</li><li id="mrs_01_24099__li257317255910">Reading the incremental view (using the Spark DataSource API as an example): The operations are the same as those for the COW table; see the COW table read operations for details.</li><li id="mrs_01_24099__li18588134832918">Reading the read-optimized view (using Hive and SparkSQL as an example): Directly read the Hudi table with the <strong id="mrs_01_24099__b135425813114">_ro</strong> suffix stored in Hive.<pre class="screen" id="mrs_01_24099__screen15734412573">select count(*) from test_ro;</pre>
</li></ul>
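The incremental DataSource read deferred to above can be sketched as follows. This is a hedged sketch, not the exact MRS procedure: it assumes the Hudi `DataSourceReadOptions` constants `QUERY_TYPE_INCREMENTAL_OPT_VAL`, `BEGIN_INSTANTTIME_OPT_KEY`, and `END_INSTANTTIME_OPT_KEY` (the same naming style as the constants this page already uses), and reuses the illustrative commit timestamps and table path from the examples above; the view name is made up.

```scala
// Sketch: incremental read of a MOR table via the Spark DataSource API.
// Assumptions: Hudi bundle on the classpath; path, timestamps, and view name
// are the illustrative values used elsewhere on this page.
import org.apache.hudi.DataSourceReadOptions._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mor-incremental-read").getOrCreate()

spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL) // incremental view
  .option(BEGIN_INSTANTTIME_OPT_KEY, "20201227153030")        // pull commits after this instant
  .option(END_INSTANTTIME_OPT_KEY, "20210308212318")          // optional; defaults to the latest commit
  .load("/tmp/default/mor_bugx")                              // incremental reads take the table base path
  .createTempView("mor_incr_view")

spark.sql("select count(*) from mor_incr_view").show()
```

Note that, unlike the Hive route, no per-table `hoodie.&lt;table&gt;.consume.*` session settings are needed; the commit range is passed as read options.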
<ul id="mrs_01_24099__ul144968013016"><li id="mrs_01_24099__li104965011301">Reading the read-optimized view (using the Spark DataSource API as an example): This is similar to reading a common DataSource table.<p id="mrs_01_24099__p9538155218515"><a name="mrs_01_24099__li104965011301"></a><a name="li104965011301"></a><strong id="mrs_01_24099__b160035390112413">QUERY_TYPE_OPT_KEY</strong> must be set to <strong id="mrs_01_24099__b1598341988112413">QUERY_TYPE_READ_OPTIMIZED_OPT_VAL</strong>.</p>
<pre class="screen" id="mrs_01_24099__screen1169693015719">spark.read.format("hudi")
.option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_READ_OPTIMIZED_OPT_VAL) // Set the query type to the read-optimized view.
.load("/tmp/default/mor_bugx/*/*/*/*") // Set the path of the Hudi table to be read. The current table has three levels of partitions.
.createTempView("mycall")
spark.sql("select * from mycall").show(100)</pre>
</li></ul>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_24037.html">Read</a></div>
</div>
</div>