forked from docs/doc-exports
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
451 lines
52 KiB
HTML
451 lines
52 KiB
HTML
<a name="dli_09_0063"></a><a name="dli_09_0063"></a>
|
|
|
|
<h1 class="topictitle1">Scala Example Code</h1>
|
|
<div id="body8662426"><div class="section" id="dli_09_0063__section18688183217261"><h4 class="sectiontitle">Development Description</h4><p id="dli_09_0063__en-us_topic_0190532351_p5229154915814">The CloudTable HBase and MRS HBase can be connected to DLI as data sources.</p>
|
|
<ul id="dli_09_0063__ul11831020114918"><li id="dli_09_0063__li1683102014914">Prerequisites<p id="dli_09_0063__p1329614379491"><a name="dli_09_0063__li1683102014914"></a><a name="li1683102014914"></a>A datasource connection has been created on the DLI management console. </p>
|
|
<div class="note" id="dli_09_0063__note17925192652815"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0063__p692572617287">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
|
|
</div></div>
|
|
</li><li id="dli_09_0063__li1842519256505">Constructing dependency information and creating a Spark session<ol id="dli_09_0063__ol1875741315516"><li id="dli_09_0063__en-us_topic_0190532351_li9186182241715">Import dependencies.<div class="p" id="dli_09_0063__en-us_topic_0190532351_p562517472013"><a name="dli_09_0063__en-us_topic_0190532351_li9186182241715"></a><a name="en-us_topic_0190532351_li9186182241715"></a>Maven dependency involved<div class="codecoloring" codetype="Scala" id="dli_09_0063__en-us_topic_0190532351_screen5760163172012"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="o"><</span><span class="n">dependency</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">groupId</span><span class="o">></span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o"></</span><span class="n">groupId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">version</span><span class="o">></span><span class="mf">2.3.2</span><span class="o"></</span><span class="n">version</span><span class="o">></span>
|
|
<span class="o"></</span><span class="n">dependency</span><span class="o">></span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</div>
|
|
<div class="p" id="dli_09_0063__en-us_topic_0190532351_p13761330205">Import dependency packages.<div class="codecoloring" codetype="Scala" id="dli_09_0063__en-us_topic_0190532351_screen1761153192016"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">scala</span><span class="p">.</span><span class="nn">collection</span><span class="p">.</span><span class="n">mutable</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.{</span><span class="nc">Row</span><span class="p">,</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">}</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">rdd</span><span class="p">.</span><span class="nc">RDD</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nn">types</span><span class="p">.</span><span class="n">_</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</div>
|
|
</li><li id="dli_09_0063__en-us_topic_0190532351_li133002613132">Create a session.<div class="codecoloring" codetype="Scala" id="dli_09_0063__en-us_topic_0190532351_screen12232591413"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0063__li107571813105119">Create a table to connect to an HBase data source.<ul id="dli_09_0063__ul142611043125419"><li id="dli_09_0063__li1579911545516">The sample code is applicable, if Kerberos authentication <strong id="dli_09_0063__b234843883113">is disabled</strong> for the interconnected HBase cluster:<div class="codecoloring" codetype="Scala" id="dli_09_0063__screen57999159550"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span>
|
|
<span class="normal">8</span>
|
|
<span class="normal">9</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"CREATE TABLE test_hbase('id' STRING, 'location' STRING, 'city' STRING, 'booleanf' BOOLEAN, </span>
|
|
<span class="s"> 'shortf' SHORT, 'intf' INT, 'longf' LONG, 'floatf' FLOAT,'doublef' DOUBLE) using hbase OPTIONS (</span>
|
|
<span class="s"> 'ZKHost'='cloudtable-cf82-zk3-pa6HnHpf.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk2-weBkIrjI.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk1-WY09px9l.cloudtable.com:2181',</span>
|
|
<span class="s"> 'TableName'='table_DupRowkey1',</span>
|
|
<span class="s"> 'RowKey'='id:5,location:6,city:7',</span>
|
|
<span class="s"> 'Cols'='booleanf:CF1.booleanf,shortf:CF1.shortf,intf:CF1.intf,longf:CF1.longf,floatf:CF1.floatf,doublef:CF1.doublef')"</span>
|
|
<span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0063__li1726111438547">The sample code is applicable, if Kerberos authentication <strong id="dli_09_0063__b115681649153216">is enabled</strong> for the interconnected HBase cluster:<div class="codecoloring" codetype="Scala" id="dli_09_0063__screen5613334145515"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"CREATE TABLE test_hbase('id' STRING, 'location' STRING, 'city' STRING, 'booleanf' BOOLEAN, </span>
|
|
<span class="s"> 'shortf' SHORT, 'intf' INT, 'longf' LONG, 'floatf' FLOAT,'doublef' DOUBLE) using hbase OPTIONS (</span>
|
|
<span class="s"> 'ZKHost'='cloudtable-cf82-zk3-pa6HnHpf.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk2-weBkIrjI.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk1-WY09px9l.cloudtable.com:2181',</span>
|
|
<span class="s"> 'TableName'='table_DupRowkey1',</span>
|
|
<span class="s"> 'RowKey'='id:5,location:6,city:7', </span>
|
|
<span class="s"> 'Cols'='booleanf:CF1.booleanf,shortf:CF1.shortf,intf:CF1.intf,longf:CF1.longf,floatf:CF1.floatf,doublef:CF1.doublef',</span>
|
|
<span class="s"> 'krb5conf'='./krb5.conf',</span>
|
|
<span class="s"> 'keytab' = './user.keytab',</span>
|
|
<span class="s"> 'principal' = 'krbtest')"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ul>
|
|
|
|
<div class="tablenoborder"><a name="dli_09_0063__table15979164115531"></a><a name="table15979164115531"></a><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0063__table15979164115531" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters for creating a table</caption><thead align="left"><tr id="dli_09_0063__row697774118533"><th align="left" class="cellrowborder" valign="top" width="20.61%" id="mcps1.3.1.3.2.1.3.2.2.3.1.1"><p id="dli_09_0063__p1197754135313">Parameter</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="79.39%" id="mcps1.3.1.3.2.1.3.2.2.3.1.2"><p id="dli_09_0063__p18977174185318">Description</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="dli_09_0063__row2977144145319"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.1 "><p id="dli_09_0063__p197716417532">ZKHost</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.2 "><p id="dli_09_0063__p1297794115313">ZooKeeper IP address of the HBase cluster.</p>
|
|
<p id="dli_09_0063__p1497744185314">You need to create a datasource connection first. </p>
|
|
<ul id="dli_09_0063__ul1897704110533"><li id="dli_09_0063__li1197794119538">To access the CloudTable cluster, specify the ZooKeeper connection address in the internal network.</li><li id="dli_09_0063__li097764115535">To access the MRS cluster, specify the IP addresses and port numbers of the ZooKeeper nodes. The format is as follows: ZK_IP1:ZK_PORT1,ZK_IP2:ZK_PORT2</li></ul>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0063__row199771941125316"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.1 "><p id="dli_09_0063__p129771641115319">RowKey</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.2 "><p id="dli_09_0063__p1197774112538">Row key field of the table connected to DLI. The single and composite row keys are supported. A single row key can be of the numeric or string type. The length does not need to be specified. The composite row key supports only fixed-length data of the string type. The format is <strong id="dli_09_0063__b0176203118579"><em id="dli_09_0063__i1517113315572">attribute name 1</em>:<em id="dli_09_0063__i15176231115712">Length</em>, <em id="dli_09_0063__i101761831185713">attribute name 2</em>:<em id="dli_09_0063__i1517623114575">Length</em></strong>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0063__row79791241205317"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.1 "><p id="dli_09_0063__p159789419536">Cols</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.2 "><p id="dli_09_0063__p1697804120534">Mapping between the fields in the DLI table and the CloudTable table. In this mapping, the DLI table field is placed before the colon (:) and the CloudTable table field is placed after the colon (:). The period (.) is used to separate the column family and column name of the CloudTable table.</p>
|
|
<p id="dli_09_0063__p10979164117533">For example: <strong id="dli_09_0063__en-us_topic_0190532351_b1083385871312">DLI table field 1:CloudTable table.CloudTable table field 1</strong>, <strong id="dli_09_0063__en-us_topic_0190532351_b88711333146">DLI table field 2:CloudTable table.CloudTable table field 2</strong>, <strong id="dli_09_0063__en-us_topic_0190532351_b102421483148">DLI table field 3:CLoudTable table.CloudTable table field 3</strong></p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0063__row1497913415531"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.1 "><p id="dli_09_0063__p79797418532">krb5conf</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.2 "><p id="dli_09_0063__p139792412538">Path of the <strong id="dli_09_0063__b191607345346">krb5.conf</strong> file. This parameter is required when Kerberos authentication is enabled. The format is './krb5.conf'. For details, see <a href="dli_09_0196.html#dli_09_0196__section12676527182715">Completing Configurations for Enabling Kerberos Authentication</a>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0063__row1397994115535"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.1 "><p id="dli_09_0063__p169791941135318">keytab</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.2 "><p id="dli_09_0063__p14979541175316">Path of the <strong id="dli_09_0063__b152531128113611">keytab</strong> file. This parameter is required when Kerberos authentication is enabled. The format is './user.keytab.'. For details, see <a href="dli_09_0196.html#dli_09_0196__section12676527182715">Completing Configurations for Enabling Kerberos Authentication</a>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0063__row109799418533"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.1 "><p id="dli_09_0063__p8979154115532">principal</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.3.2.1.3.2.2.3.1.2 "><p id="dli_09_0063__p69791841185311">Username created for Kerberos authentication.</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</li></ol>
|
|
</li></ul>
|
|
</div>
|
|
<div class="section" id="dli_09_0063__section91181950123817"><h4 class="sectiontitle">Accessing a Data Source Using a SQL API</h4><ol id="dli_09_0063__en-us_topic_0190532351_ol12255153864610"><li id="dli_09_0063__en-us_topic_0190532351_li1025543814612">Insert data.<div class="codecoloring" codetype="Scala" id="dli_09_0063__en-us_topic_0190532351_screen66292184333"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into test_hbase values('12345','abc','guiyang',false,null,3,23,2.3,2.34)"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0063__en-us_topic_0190532351_li112511258204618">Query data.<div class="codecoloring" codetype="Scala" id="dli_09_0063__en-us_topic_0190532351_screen8335528349"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from test_hbase"</span><span class="p">).</span><span class="n">show</span><span class="w"> </span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
|
|
</div>
|
|
<div class="section" id="dli_09_0063__section1976565411384"><h4 class="sectiontitle">Accessing a Data Source Using a DataFrame API</h4><ol id="dli_09_0063__ol135116369374"><li id="dli_09_0063__li551173613375">Construct a schema.<div class="codecoloring" codetype="Scala" id="dli_09_0063__screen8511936133718"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">attrId</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="nc">StringType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">location</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"location"</span><span class="p">,</span><span class="nc">StringType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">city</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"city"</span><span class="p">,</span><span class="nc">StringType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">booleanf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"booleanf"</span><span class="p">,</span><span class="nc">BooleanType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">shortf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"shortf"</span><span class="p">,</span><span class="nc">ShortType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">intf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"intf"</span><span class="p">,</span><span class="nc">IntegerType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">longf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"longf"</span><span class="p">,</span><span class="nc">LongType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">floatf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"floatf"</span><span class="p">,</span><span class="nc">FloatType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">doublef</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"doublef"</span><span class="p">,</span><span class="nc">DoubleType</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">attrs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">Array</span><span class="p">(</span><span class="n">attrId</span><span class="p">,</span><span class="w"> </span><span class="n">location</span><span class="p">,</span><span class="n">city</span><span class="p">,</span><span class="n">booleanf</span><span class="p">,</span><span class="n">shortf</span><span class="p">,</span><span class="n">intf</span><span class="p">,</span><span class="n">longf</span><span class="p">,</span><span class="n">floatf</span><span class="p">,</span><span class="n">doublef</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0063__li105163693720">Construct data based on the schema type.<div class="codecoloring" codetype="Scala" id="dli_09_0063__screen3511636113718"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">mutableRow</span><span class="p">:</span><span class="w"> </span><span class="nc">Seq</span><span class="p">[</span><span class="nc">Any</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">Seq</span><span class="p">(</span><span class="s">"12345"</span><span class="p">,</span><span class="s">"abc"</span><span class="p">,</span><span class="s">"city1"</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="kc">null</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">23</span><span class="p">,</span><span class="mf">2.3</span><span class="p">,</span><span class="mf">2.34</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">rddData</span><span class="p">:</span><span class="w"> </span><span class="nc">RDD</span><span class="p">[</span><span class="nc">Row</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sparkContext</span><span class="p">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nc">Array</span><span class="p">(</span><span class="nc">Row</span><span class="p">.</span><span class="n">fromSeq</span><span class="p">(</span><span class="n">mutableRow</span><span class="p">)),</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0063__li18518367379">Import data to HBase.<div class="codecoloring" codetype="Scala" id="dli_09_0063__screen16510364371"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rddData</span><span class="p">,</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructType</span><span class="p">(</span><span class="n">attrs</span><span class="p">)).</span><span class="n">write</span><span class="p">.</span><span class="n">insertInto</span><span class="p">(</span><span class="s">"test_hbase"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0063__li1051236103716">Read data from HBase.<div class="codecoloring" codetype="Scala" id="dli_09_0063__screen115115365378"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span>
|
|
<span class="normal">8</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">map</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">mutable</span><span class="p">.</span><span class="nc">HashMap</span><span class="p">[</span><span class="nc">String</span><span class="p">,</span><span class="w"> </span><span class="nc">String</span><span class="p">]()</span>
|
|
<span class="n">map</span><span class="p">(</span><span class="s">"TableName"</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"table_DupRowkey1"</span>
|
|
<span class="n">map</span><span class="p">(</span><span class="s">"RowKey"</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"id:5,location:6,city:7"</span>
|
|
<span class="n">map</span><span class="p">(</span><span class="s">"Cols"</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"booleanf:CF1.booleanf,shortf:CF1.shortf,intf:CF1.intf,longf:CF1.longf,floatf:CF1.floatf,doublef:CF1.doublef"</span>
|
|
<span class="n">map</span><span class="p">(</span><span class="s">"ZKHost"</span><span class="p">)</span><span class="o">=</span><span class="s">"cloudtable-cf82-zk3-pa6HnHpf.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk2-weBkIrjI.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk1-WY09px9l.cloudtable.com:2181"</span>
|
|
<span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">schema</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="nc">StructType</span><span class="p">(</span><span class="n">attrs</span><span class="p">)).</span><span class="n">format</span><span class="p">(</span><span class="s">"hbase"</span><span class="p">).</span><span class="n">options</span><span class="p">(</span><span class="n">map</span><span class="p">.</span><span class="n">toMap</span><span class="p">).</span><span class="n">load</span><span class="p">().</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
|
|
</div>
|
|
<div class="section" id="dli_09_0063__section1996275616384"><h4 class="sectiontitle">Submitting a Spark Job</h4><ol id="dli_09_0063__ol3497251384"><li id="dli_09_0063__li144971754382">Generate a JAR package based on the code and upload the package to DLI.<p id="dli_09_0063__p1749619513385"><a name="dli_09_0063__li144971754382"></a><a name="li144971754382"></a></p>
|
|
<p id="dli_09_0063__p114961151385"></p>
|
|
</li><li id="dli_09_0063__li174976515386">(Optional) Add the <strong id="dli_09_0063__b1655183253817">krb5.conf</strong> and <strong id="dli_09_0063__b1965512321389">user.keytab</strong> files to other dependency files of the job when creating a Spark job in an MRS cluster with Kerberos authentication enabled. Skip this step if Kerberos authentication is not enabled for the cluster.</li><li id="dli_09_0063__li0497053385">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<p id="dli_09_0063__p1149712510387"><a name="dli_09_0063__li0497053385"></a><a name="li0497053385"></a></p>
|
|
<div class="p" id="dli_09_0063__p104977515388"><div class="note" id="dli_09_0063__note849716511388"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0063__ul849714563812"><li id="dli_09_0063__li154971654382">If the Spark version is 2.3.2 (will be offline soon) or 2.4.5, set <strong id="dli_09_0063__b138521147184213">Module</strong> to <strong id="dli_09_0063__b1285312471422">sys.datasource.hbase</strong> when you submit a job.</li><li id="dli_09_0063__li1949705183817">If the Spark version is 3.1.1, you do not need to select a module. Set <strong id="dli_09_0063__b11526252134219">Spark parameters (--conf)</strong>.<p id="dli_09_0063__p24971858385">spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/hbase/*</p>
|
|
<p id="dli_09_0063__p144971858385">spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/hbase/*</p>
|
|
</li></ul>
|
|
</div></div>
|
|
</div>
|
|
</li></ol>
|
|
</div>
|
|
<div class="section" id="dli_09_0063__section1338223113556"><h4 class="sectiontitle">Complete Example Code</h4><ul id="dli_09_0063__ul5761033165617"><li id="dli_09_0063__li17663395612">Maven dependency<div class="codecoloring" codetype="Scala" id="dli_09_0063__en-us_topic_0190532350_screen131251919174915"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="o"><</span><span class="n">dependency</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">groupId</span><span class="o">></span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o"></</span><span class="n">groupId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">version</span><span class="o">></span><span class="mf">2.3.2</span><span class="o"></</span><span class="n">version</span><span class="o">></span>
|
|
<span class="o"></</span><span class="n">dependency</span><span class="o">></span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0063__li10958125111563">Connecting to data sources through SQL APIs<ul id="dli_09_0063__ul12255131810227"><li id="dli_09_0063__li4315191511224">Sample code when Kerberos authentication is disabled<div class="codecoloring" codetype="Scala" id="dli_09_0063__screen89459272219"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span>
|
|
<span class="normal">24</span>
|
|
<span class="normal">25</span>
|
|
<span class="normal">26</span>
|
|
<span class="normal">27</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SparkSession</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SparkSql_HBase</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession session.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="cm">/**</span>
|
|
<span class="cm"> * Create an association table for the DLI association Hbase table</span>
|
|
<span class="cm"> */</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"CREATE TABLE test_hbase('id' STRING, 'location' STRING, 'city' STRING, 'booleanf' BOOLEAN, </span>
|
|
<span class="s"> 'shortf' SHORT, 'intf' INT, 'longf' LONG, 'floatf' FLOAT,'doublef' DOUBLE) using hbase OPTIONS (</span>
|
|
<span class="s"> 'ZKHost'='cloudtable-cf82-zk3-pa6HnHpf.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk2-weBkIrjI.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk1-WY09px9l.cloudtable.com:2181',</span>
|
|
<span class="s"> 'TableName'='table_DupRowkey1',</span>
|
|
<span class="s"> 'RowKey'='id:5,location:6,city:7',</span>
|
|
<span class="s"> 'Cols'='booleanf:CF1.booleanf,shortf:CF1.shortf,intf:CF1.intf,</span>
|
|
<span class="s"> longf:CF1.longf,floatf:CF1.floatf,doublef:CF1.doublef')"</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************SQL model***********************************</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into test_hbase values('12345','abc','city1',false,null,3,23,2.3,2.34)"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from test_hbase"</span><span class="p">).</span><span class="n">collect</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="p">}</span>
|
|
<span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0063__li16354422172215">Sample code when Kerberos authentication is enabled<pre class="screen" id="dli_09_0063__screen163574991615">import org.apache.spark.SparkFiles
|
|
import org.apache.spark.sql.SparkSession
|
|
|
|
import java.io.{File, FileInputStream, FileOutputStream}
|
|
|
|
object Test_SparkSql_HBase_Kerberos {
|
|
|
|
def copyFile2(Input:String)(OutPut:String): Unit ={
|
|
val fis = new FileInputStream(Input)
|
|
val fos = new FileOutputStream(OutPut)
|
|
val buf = new Array[Byte](1024)
|
|
var len = 0
|
|
while ({len = fis.read(buf);len} != -1){
|
|
fos.write(buf,0,len)
|
|
}
|
|
fos.close()
|
|
fis.close()
|
|
}
|
|
|
|
def main(args: Array[String]): Unit = {
|
|
// Create a SparkSession session.
|
|
val sparkSession = SparkSession.builder().getOrCreate()
|
|
val sc = sparkSession.sparkContext
|
|
sc.addFile("OBS address of krb5.conf")
|
|
sc.addFile("OBS address of user.keytab")
|
|
Thread.sleep(10)
|
|
|
|
val krb5_startfile = new File(SparkFiles.get("krb5.conf"))
|
|
val keytab_startfile = new File(SparkFiles.get("user.keytab"))
|
|
val path_user = System.getProperty("user.dir")
|
|
val keytab_endfile = new File(path_user + "/" + keytab_startfile.getName)
|
|
val krb5_endfile = new File(path_user + "/" + krb5_startfile.getName)
|
|
println(keytab_endfile)
|
|
println(krb5_endfile)
|
|
|
|
var krbinput = SparkFiles.get("krb5.conf")
|
|
var krboutput = path_user+"/krb5.conf"
|
|
copyFile2(krbinput)(krboutput)
|
|
|
|
var keytabinput = SparkFiles.get("user.keytab")
|
|
var keytaboutput = path_user+"/user.keytab"
|
|
copyFile2(keytabinput)(keytaboutput)
|
|
Thread.sleep(10)
|
|
/**
|
|
* Create an association table for the DLI association Hbase table
|
|
*/
|
|
sparkSession.sql("CREATE TABLE testhbase(id string,booleanf boolean,shortf short,intf int,longf long,floatf float,doublef double) " +
|
|
"using hbase OPTIONS(" +
|
|
"'ZKHost'='<em id="dli_09_0063__i1539594121717">10.0.0.146:2181</em>'," +
|
|
"'TableName'='hbtest'," +
|
|
"'RowKey'='id:100'," +
|
|
"'Cols'='booleanf:CF1.booleanf,shortf:CF1.shortf,intf:CF1.intf,longf:CF2.longf,floatf:CF1.floatf,doublef:CF2.doublef'," +
|
|
"'krb5conf'='" + path_user + "/krb5.conf'," +
|
|
"'keytab'='" + path_user+ "/user.keytab'," +
|
|
"'principal'='krbtest') ")
|
|
|
|
//*****************************SQL model***********************************
|
|
sparkSession.sql("insert into testhbase values('newtest',true,1,2,3,4,5)")
|
|
val result = sparkSession.sql("select * from testhbase")
|
|
result.show()
|
|
|
|
sparkSession.close()
|
|
}
|
|
}</pre>
|
|
</li></ul>
|
|
</li><li id="dli_09_0063__li1785173395716">Connecting to data sources through DataFrame APIs<div class="codecoloring" codetype="Scala" id="dli_09_0063__en-us_topic_0190532350_screen8533158155715"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span>
|
|
<span class="normal">24</span>
|
|
<span class="normal">25</span>
|
|
<span class="normal">26</span>
|
|
<span class="normal">27</span>
|
|
<span class="normal">28</span>
|
|
<span class="normal">29</span>
|
|
<span class="normal">30</span>
|
|
<span class="normal">31</span>
|
|
<span class="normal">32</span>
|
|
<span class="normal">33</span>
|
|
<span class="normal">34</span>
|
|
<span class="normal">35</span>
|
|
<span class="normal">36</span>
|
|
<span class="normal">37</span>
|
|
<span class="normal">38</span>
|
|
<span class="normal">39</span>
|
|
<span class="normal">40</span>
|
|
<span class="normal">41</span>
|
|
<span class="normal">42</span>
|
|
<span class="normal">43</span>
|
|
<span class="normal">44</span>
|
|
<span class="normal">45</span>
|
|
<span class="normal">46</span>
|
|
<span class="normal">47</span>
|
|
<span class="normal">48</span>
|
|
<span class="normal">49</span>
|
|
<span class="normal">50</span>
|
|
<span class="normal">51</span>
|
|
<span class="normal">52</span>
|
|
<span class="normal">53</span>
|
|
<span class="normal">54</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">scala</span><span class="p">.</span><span class="nn">collection</span><span class="p">.</span><span class="n">mutable</span>
|
|
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.{</span><span class="nc">Row</span><span class="p">,</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">}</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">rdd</span><span class="p">.</span><span class="nc">RDD</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nn">types</span><span class="p">.</span><span class="n">_</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SparkSql_HBase</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession session.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="c1">// Create an association table for the DLI association Hbase table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"CREATE TABLE test_hbase('id' STRING, 'location' STRING, 'city' STRING, 'booleanf' BOOLEAN, </span>
|
|
<span class="s"> 'shortf' SHORT, 'intf' INT, 'longf' LONG, 'floatf' FLOAT,'doublef' DOUBLE) using hbase OPTIONS (</span>
|
|
<span class="s"> 'ZKHost'='cloudtable-cf82-zk3-pa6HnHpf.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk2-weBkIrjI.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk1-WY09px9l.cloudtable.com:2181',</span>
|
|
<span class="s"> 'TableName'='table_DupRowkey1',</span>
|
|
<span class="s"> 'RowKey'='id:5,location:6,city:7',</span>
|
|
<span class="s"> 'Cols'='booleanf:CF1.booleanf,shortf:CF1.shortf,intf:CF1.intf,longf:CF1.longf,floatf:CF1.floatf,doublef:CF1.doublef')"</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************DataFrame model***********************************</span>
|
|
<span class="w"> </span><span class="c1">// Setting schema</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">attrId</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="nc">StringType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">location</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"location"</span><span class="p">,</span><span class="nc">StringType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">city</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"city"</span><span class="p">,</span><span class="nc">StringType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">booleanf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"booleanf"</span><span class="p">,</span><span class="nc">BooleanType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">shortf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"shortf"</span><span class="p">,</span><span class="nc">ShortType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">intf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"intf"</span><span class="p">,</span><span class="nc">IntegerType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">longf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"longf"</span><span class="p">,</span><span class="nc">LongType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">floatf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"floatf"</span><span class="p">,</span><span class="nc">FloatType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">doublef</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"doublef"</span><span class="p">,</span><span class="nc">DoubleType</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">attrs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">Array</span><span class="p">(</span><span class="n">attrId</span><span class="p">,</span><span class="w"> </span><span class="n">location</span><span class="p">,</span><span class="n">city</span><span class="p">,</span><span class="n">booleanf</span><span class="p">,</span><span class="n">shortf</span><span class="p">,</span><span class="n">intf</span><span class="p">,</span><span class="n">longf</span><span class="p">,</span><span class="n">floatf</span><span class="p">,</span><span class="n">doublef</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">// Populate data according to the type of schema</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">mutableRow</span><span class="p">:</span><span class="w"> </span><span class="nc">Seq</span><span class="p">[</span><span class="nc">Any</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">Seq</span><span class="p">(</span><span class="s">"12345"</span><span class="p">,</span><span class="s">"abc"</span><span class="p">,</span><span class="s">"city1"</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="kc">null</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">23</span><span class="p">,</span><span class="mf">2.3</span><span class="p">,</span><span class="mf">2.34</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">rddData</span><span class="p">:</span><span class="w"> </span><span class="nc">RDD</span><span class="p">[</span><span class="nc">Row</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sparkContext</span><span class="p">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nc">Array</span><span class="p">(</span><span class="nc">Row</span><span class="p">.</span><span class="n">fromSeq</span><span class="p">(</span><span class="n">mutableRow</span><span class="p">)),</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">// Import the constructed data into Hbase</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rddData</span><span class="p">,</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">StructType</span><span class="p">(</span><span class="n">attrs</span><span class="p">)).</span><span class="n">write</span><span class="p">.</span><span class="n">insertInto</span><span class="p">(</span><span class="s">"test_hbase"</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">// Read data on Hbase</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">map</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">mutable</span><span class="p">.</span><span class="nc">HashMap</span><span class="p">[</span><span class="nc">String</span><span class="p">,</span><span class="w"> </span><span class="nc">String</span><span class="p">]()</span>
|
|
<span class="w"> </span><span class="n">map</span><span class="p">(</span><span class="s">"TableName"</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"table_DupRowkey1"</span>
|
|
<span class="w"> </span><span class="n">map</span><span class="p">(</span><span class="s">"RowKey"</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"id:5,location:6,city:7"</span>
|
|
<span class="w"> </span><span class="n">map</span><span class="p">(</span><span class="s">"Cols"</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"booleanf:CF1.booleanf,shortf:CF1.shortf,intf:CF1.intf,longf:CF1.longf,floatf:CF1.floatf,doublef:CF1.doublef"</span>
|
|
<span class="w"> </span><span class="n">map</span><span class="p">(</span><span class="s">"ZKHost"</span><span class="p">)</span><span class="o">=</span><span class="s">"cloudtable-cf82-zk3-pa6HnHpf.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk2-weBkIrjI.cloudtable.com:2181,</span>
|
|
<span class="s"> cloudtable-cf82-zk1-WY09px9l.cloudtable.com:2181"</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">schema</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="nc">StructType</span><span class="p">(</span><span class="n">attrs</span><span class="p">)).</span><span class="n">format</span><span class="p">(</span><span class="s">"hbase"</span><span class="p">).</span><span class="n">options</span><span class="p">(</span><span class="n">map</span><span class="p">.</span><span class="n">toMap</span><span class="p">).</span><span class="n">load</span><span class="p">().</span><span class="n">collect</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="p">}</span>
|
|
<span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ul>
|
|
</div>
|
|
</div>
|
|
<div>
|
|
<div class="familylinks">
|
|
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0077.html">Connecting to HBase</a></div>
|
|
</div>
|
|
</div>
|
|
|