Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
<a name="dli_09_0061"></a>
<h1 class="topictitle1">Scala Example Code</h1>
<div id="body8662426"><div class="section" id="dli_09_0061__section1422611451282"><h4 class="sectiontitle">Prerequisites</h4><p id="dli_09_0061__p46151791993">A datasource connection has been created on the DLI management console. </p>
|
|
</div>
|
|
<div class="section" id="dli_09_0061__section206806558195"><h4 class="sectiontitle">CSS Non-Security Cluster</h4><ul id="dli_09_0061__ul17503611132012"><li id="dli_09_0061__li6503101132015">Development description<ul id="dli_09_0061__ul104834377201"><li id="dli_09_0061__li1984791742114">Constructing dependency information and creating a Spark session<ol id="dli_09_0061__en-us_topic_0190067468_ol10193636161115"><li id="dli_09_0061__en-us_topic_0190067468_li25522276128">Import dependencies.<div class="p" id="dli_09_0061__en-us_topic_0190067468_p562517472013"><a name="dli_09_0061__en-us_topic_0190067468_li25522276128"></a><a name="en-us_topic_0190067468_li25522276128"></a>Maven dependency<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen5760163172012"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="o"><</span><span class="n">dependency</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">groupId</span><span class="o">></span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o"></</span><span class="n">groupId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">version</span><span class="o">></span><span class="mf">2.3.2</span><span class="o"></</span><span class="n">version</span><span class="o">></span>
|
|
<span class="o"></</span><span class="n">dependency</span><span class="o">></span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</div>
|
|
<div class="p" id="dli_09_0061__en-us_topic_0190067468_p13761330205">Import dependency packages.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen1761153192016"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.{</span><span class="nc">Row</span><span class="p">,</span><span class="w"> </span><span class="nc">SaveMode</span><span class="p">,</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">}</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nn">types</span><span class="p">.{</span><span class="nc">IntegerType</span><span class="p">,</span><span class="w"> </span><span class="nc">StringType</span><span class="p">,</span><span class="w"> </span><span class="nc">StructField</span><span class="p">,</span><span class="w"> </span><span class="nc">StructType</span><span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</div>
|
|
</li><li id="dli_09_0061__en-us_topic_0190067468_li133002613132">Create a session.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen12232591413"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
|
|
</li><li id="dli_09_0061__li058012302229">Connecting to data sources through SQL APIs<ol id="dli_09_0061__en-us_topic_0190067468_ol7821439666"><li id="dli_09_0061__en-us_topic_0190067468_li1755222713127">Create a table to connect to a CSS data source.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen15140130103119"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"""create table css_table(id int, name string) using css options(</span>
<span class="s"> 'es.nodes' = 'to-css-1174404221-Y2bKVIqY.datasource.com:9200',</span>
<span class="s"> 'es.nodes.wan.only' = 'true',</span>
<span class="s"> 'resource' = '/mytest/css')"""</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
|
|
<div class="tablenoborder"><a name="dli_09_0061__en-us_topic_0190067468_table569314388144"></a><a name="en-us_topic_0190067468_table569314388144"></a><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0061__en-us_topic_0190067468_table569314388144" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters for creating a table</caption><thead align="left"><tr id="dli_09_0061__en-us_topic_0190067468_row136916389144"><th align="left" class="cellrowborder" valign="top" width="14.530000000000001%" id="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1"><p id="dli_09_0061__en-us_topic_0190067468_p166911838111417">Parameter</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="85.47%" id="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2"><p id="dli_09_0061__en-us_topic_0190067468_p186911238141416">Description</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="dli_09_0061__en-us_topic_0190067468_row1969243891413"><td class="cellrowborder" valign="top" width="14.530000000000001%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0190067468_p4691638191412">es.nodes</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="85.47%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0190067468_p4691238201416">CSS connection address. You need to create a datasource connection first. </p>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p8692133812143">If you have created an enhanced datasource connection, use the intranet IP address provided by CSS. The address format is <strong id="dli_09_0061__en-us_topic_0190067468_b46015209436"><em id="dli_09_0061__i3581320174314">IP1</em>:<em id="dli_09_0061__i1658152034316">PORT1</em>,<em id="dli_09_0061__i1359162064315">IP2</em>:<em id="dli_09_0061__i15916209433">PORT2</em></strong>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0190067468_row166932384147"><td class="cellrowborder" valign="top" width="14.530000000000001%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0190067468_p56921138171417">resource</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="85.47%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0190067468_p1169216389144">Name of the resource accessed through the CSS datasource connection. Use <strong id="dli_09_0061__en-us_topic_0190067468_b3725124014911">/index/type</strong> to specify the resource location (as an analogy, the <strong id="dli_09_0061__en-us_topic_0190067468_b17723112912418">index</strong> can be thought of as a <strong id="dli_09_0061__en-us_topic_0190067468_b6724142934116">database</strong> and the <strong id="dli_09_0061__en-us_topic_0190067468_b12725132912419">type</strong> as a <strong id="dli_09_0061__en-us_topic_0190067468_b972819295412">table</strong>).</p>
|
|
<div class="note" id="dli_09_0061__en-us_topic_0190067468_note2975123311388"><span class="notetitle"> NOTE: </span><div class="notebody"><ul id="dli_09_0061__en-us_topic_0190067468_ul4143105815201"><li id="dli_09_0061__en-us_topic_0190067468_li1614316583201">In Elasticsearch 6.X, a single index supports only one type, and the type name can be customized.</li><li id="dli_09_0061__en-us_topic_0190067468_li3144558182013">In Elasticsearch 7.X, a single index always uses <strong id="dli_09_0061__b1561693320111">_doc</strong> as the type name, and the type name cannot be customized. To access Elasticsearch 7.X, set this parameter to <strong id="dli_09_0061__b912443710114">index</strong>.</li></ul>
|
|
</div></div>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0190067468_row469318381145"><td class="cellrowborder" valign="top" width="14.530000000000001%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0190067468_p166937386148">pushdown</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="85.47%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0190067468_p1569315383144">Whether to enable the pushdown function of CSS. The default value is <strong id="dli_09_0061__en-us_topic_0190067468_b52421720154413">true</strong>. For tables with a large number of I/O requests, pushdown helps reduce I/O pressure when a <strong id="dli_09_0061__en-us_topic_0190067468_b1419312119506">where</strong> condition is specified.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0190067468_row8693153819143"><td class="cellrowborder" valign="top" width="14.530000000000001%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0190067468_p196931938101415">strict</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="85.47%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0190067468_p1569312384140">Whether the CSS pushdown is strict. The default value is <strong id="dli_09_0061__en-us_topic_0190067468_b1899934410564">false</strong>. Strict pushdown, which applies exact matching, can reduce I/O requests further than regular pushdown.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0190067468_row186931387140"><td class="cellrowborder" valign="top" width="14.530000000000001%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0190067468_p1669313385145">batch.size.entries</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="85.47%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0190067468_p269323817147">Maximum number of entries that can be inserted in a batch. The default value is <strong id="dli_09_0061__en-us_topic_0190067468_b176162212579">1000</strong>. If individual records are so large that the batch reaches its data volume limit before this entry count is reached, the system stops buffering and submits the batch based on the <strong id="dli_09_0061__en-us_topic_0190067468_b384217396573">batch.size.bytes</strong> parameter.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0190067468_row66937380147"><td class="cellrowborder" valign="top" width="14.530000000000001%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0190067468_p176933387148">batch.size.bytes</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="85.47%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0190067468_p9693163851419">Maximum amount of data in a single batch. The default value is <strong id="dli_09_0061__b185182041145918">1 MB</strong>. If individual records are so small that the batch reaches its entry limit before this data volume is reached, the system stops buffering and submits the batch based on the <strong id="dli_09_0061__en-us_topic_0190067468_b15553248165918">batch.size.entries</strong> parameter.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0190067468_row1769333861419"><td class="cellrowborder" valign="top" width="14.530000000000001%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0190067468_p26934384145">es.nodes.wan.only</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="85.47%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0190067468_p116931738181411">Whether to access the Elasticsearch nodes using only the domain name. The default value is <strong id="dli_09_0061__en-us_topic_0190067468_b94269131807">false</strong>. If the original internal IP address provided by CSS is used as <strong id="dli_09_0061__b168044189197">es.nodes</strong>, either leave this parameter unset or set it to <strong id="dli_09_0061__b58041918141913">false</strong>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0190067468_row1969393891416"><td class="cellrowborder" valign="top" width="14.530000000000001%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0190067468_p569316388144">es.mapping.id</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="85.47%" headers="mcps1.3.2.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0190067468_p15693163819147">Document field name that contains the document ID in the Elasticsearch node.</p>
|
|
<div class="note" id="dli_09_0061__en-us_topic_0190067468_note20720112316195"><span class="notetitle"> NOTE: </span><div class="notebody"><ul id="dli_09_0061__en-us_topic_0190067468_ul72517457221"><li id="dli_09_0061__en-us_topic_0190067468_li172518457227">A document ID is unique within the same <strong id="dli_09_0061__en-us_topic_0190067468_b122258447433">/index/type</strong>. If the field that provides the document ID has duplicate values, a document with a duplicate ID is overwritten when data is inserted into Elasticsearch.</li><li id="dli_09_0061__en-us_topic_0190067468_li1525164572214">This feature can be used as a fault tolerance solution. If a DLI job fails while data is being inserted, some of the data has already been written to Elasticsearch and would be duplicated when the job is rerun. If the document ID is set, the previously written data is overwritten when the DLI job is executed again.</li></ul>
|
|
</div></div>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
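<p>The optional parameters in Table 1 go into the same <strong>options(...)</strong> clause as <strong>es.nodes</strong> and <strong>resource</strong>. The following is a minimal sketch, not part of DLI, of assembling such a statement from a map of options. The object <strong>CssDdl</strong> and its method <strong>createTable</strong> are hypothetical names introduced for illustration, and option values are assumed to contain no single quotes.</p>

```scala
// Hypothetical helper (not a DLI API): renders a "create table ... using css"
// statement from a column list and a map of the options described in Table 1.
object CssDdl {
  def createTable(
      table: String,
      columns: Seq[(String, String)],
      options: Map[String, String]): String = {
    // Column definitions, for example "id int, name string".
    val cols = columns.map { case (name, tpe) => s"$name $tpe" }.mkString(", ")
    // Options are sorted by key so that the generated text is deterministic.
    // Assumption: values contain no single quotes, so no escaping is done.
    val opts = options.toSeq
      .sortBy(_._1)
      .map { case (key, value) => s"'$key' = '$value'" }
      .mkString(",\n  ")
    s"create table $table($cols) using css options(\n  $opts)"
  }

  def main(args: Array[String]): Unit = {
    val ddl = createTable(
      "css_table",
      Seq("id" -> "int", "name" -> "string"),
      Map(
        "es.nodes" -> "to-css-1174404221-Y2bKVIqY.datasource.com:9200",
        "es.nodes.wan.only" -> "true",
        "resource" -> "/mytest/css",
        "pushdown" -> "true",
        "batch.size.entries" -> "1000",
        "es.mapping.id" -> "id"))
    // The resulting string could then be passed to sparkSession.sql(ddl).
    println(ddl)
  }
}
```

<p>Keeping the options in a map makes it easier to toggle parameters such as <strong>pushdown</strong> or <strong>es.mapping.id</strong> per environment without editing the SQL text by hand.</p>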
|
|
<div class="note" id="dli_09_0061__en-us_topic_0190067468_note625154520182"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0061__en-us_topic_0190067468_p172521456189"><strong id="dli_09_0061__en-us_topic_0190067468_b3444034824">batch.size.entries</strong> and <strong id="dli_09_0061__en-us_topic_0190067468_b1944912349219">batch.size.bytes</strong> limit the number of data records and data volume respectively.</p>
|
|
</div></div>
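<p>To make the interaction of the two limits concrete, the following pure-Scala sketch simulates the rule described above: a batch is submitted as soon as either the entry limit or the byte limit would be exceeded. It is an illustration of the rule only, not the connector's actual implementation; <strong>BulkBatching</strong> and <strong>batches</strong> are hypothetical names.</p>

```scala
// Simulation of the batching rule only; the real connector is not involved.
object BulkBatching {
  // Splits records into batches. The open batch is closed as soon as adding
  // the next record would exceed maxEntries or maxBytes, whichever comes first.
  def batches(
      records: Seq[String],
      maxEntries: Int,
      maxBytes: Int): Vector[Vector[String]] = {
    val start = (Vector.empty[Vector[String]], Vector.empty[String], 0)
    val (closed, open, _) = records.foldLeft(start) {
      case ((done, current, bytes), record) =>
        val size = record.getBytes("UTF-8").length
        val full = current.nonEmpty &&
          (current.size + 1 > maxEntries || bytes + size > maxBytes)
        if (full) (done :+ current, Vector(record), size)
        else (done, current :+ record, bytes + size)
    }
    // Submit whatever is left in the last open batch.
    if (open.isEmpty) closed else closed :+ open
  }

  def main(args: Array[String]): Unit = {
    // Small records: the entry limit closes each batch.
    println(batches(Seq("aa", "bb", "cc", "dd", "ee"), maxEntries = 2, maxBytes = 1000))
    // Large records relative to the byte limit: the byte limit closes each batch.
    println(batches(Seq("aaa", "bbb", "c"), maxEntries = 1000, maxBytes = 5))
  }
}
```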
|
|
</li><li id="dli_09_0061__en-us_topic_0190067468_li18313911617">Insert data.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen66292184333"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into css_table values(13, 'John'),(22, 'Bob')"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
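<p>If the job running this insert fails partway and is rerun, the effect depends on whether <strong>es.mapping.id</strong> is set, as the note on that parameter explains. The sketch below simulates both cases in plain Scala, without Spark or Elasticsearch; <strong>DocIdRetry</strong>, <strong>Doc</strong>, and both methods are hypothetical names, and the model assumes that documents written without an explicit ID are simply appended.</p>

```scala
// In-memory model of a retry after a partial insert; illustration only.
object DocIdRetry {
  final case class Doc(id: Int, name: String)

  // No es.mapping.id (assumed behavior): every written row becomes a new
  // document, so rows written before the failure are duplicated on retry.
  def insertWithoutId(index: Vector[Doc], rows: Seq[Doc]): Vector[Doc] =
    index ++ rows

  // es.mapping.id = id: the id column is the document ID, so a row written
  // again replaces the existing document instead of adding a new one.
  def insertWithId(index: Map[Int, Doc], rows: Seq[Doc]): Map[Int, Doc] =
    rows.foldLeft(index)((acc, doc) => acc.updated(doc.id, doc))

  def main(args: Array[String]): Unit = {
    val rows = Seq(Doc(13, "John"), Doc(22, "Bob"))
    val partial = rows.take(1) // the first run fails after one row

    val withoutId = insertWithoutId(insertWithoutId(Vector.empty, partial), rows)
    val withId = insertWithId(insertWithId(Map.empty, partial), rows)

    println(s"without document ID: ${withoutId.size} documents")
    println(s"with document ID: ${withId.size} documents")
  }
}
```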
|
|
</li><li id="dli_09_0061__en-us_topic_0190067468_li17401941177">Query data.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen8335528349"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from css_table"</span><span class="p">)</span>
|
|
<span class="n">dataFrame</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p980612423518">Before data is inserted:</p>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p1111911912352"><span><img id="dli_09_0061__en-us_topic_0190067468_image16928173453517" src="en-us_image_0223997302.png"></span></p>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p1312675873519">Response:</p>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p186811539367"><span><img id="dli_09_0061__en-us_topic_0190067468_image091441010366" src="en-us_image_0223997303.png"></span></p>
|
|
</li><li id="dli_09_0061__en-us_topic_0190067468_li14681446870">Delete the datasource connection table.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen131211827184111"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"drop table css_table"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
|
|
</li><li id="dli_09_0061__li1982142020234">Connecting to data sources through DataFrame APIs<ol id="dli_09_0061__en-us_topic_0190067468_ol1565218286816"><li id="dli_09_0061__en-us_topic_0190067468_li17652112814813">Set connection parameters.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen20117532094"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">resource</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"/mytest/css"</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"to-css-1174405013-Ht7O1tYf.datasource.com:9200"</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
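<p>When several CSS nodes are used, the <strong>es.nodes</strong> value takes the form <strong>IP1:PORT1,IP2:PORT2</strong>. A small sketch of building that string from host and port pairs follows; <strong>CssNodes</strong> and <strong>format</strong> are hypothetical names, and the addresses in the example are placeholders rather than real endpoints.</p>

```scala
// Hypothetical helper: joins host/port pairs into the es.nodes format.
object CssNodes {
  def format(nodes: Seq[(String, Int)]): String = {
    require(nodes.nonEmpty, "at least one node is required")
    nodes.map { case (host, port) =>
      // Reject ports outside the valid TCP range.
      require((1 to 65535).contains(port), s"invalid port: $port")
      s"$host:$port"
    }.mkString(",")
  }

  def main(args: Array[String]): Unit = {
    // Placeholder addresses; replace with the addresses provided by CSS.
    val nodes = format(Seq("192.168.0.10" -> 9200, "192.168.0.11" -> 9200))
    println(nodes)
  }
}
```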
|
|
</li><li id="dli_09_0061__en-us_topic_0190067468_li17603210997">Create a schema and add data to it.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen1850883315920"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">schema</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">StructType</span><span class="p">(</span><span class="nc">Seq</span><span class="p">(</span><span class="nc">StructField</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="nc">IntegerType</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">),</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span><span class="w"> </span><span class="nc">StringType</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)))</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">rdd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sparkContext</span><span class="p">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nc">Seq</span><span class="p">(</span><span class="nc">Row</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="w"> </span><span class="s">"John"</span><span class="p">),</span><span class="nc">Row</span><span class="p">(</span><span class="mi">21</span><span class="p">,</span><span class="s">"Bob"</span><span class="p">)))</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0061__en-us_topic_0190067468_li54123813917">Import data to CSS.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen3727448564"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rdd</span><span class="p">,</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span>
|
|
<span class="n">dataFrame_1</span><span class="p">.</span><span class="n">write</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"css"</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"resource"</span><span class="p">,</span><span class="w"> </span><span class="n">resource</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.nodes"</span><span class="p">,</span><span class="w"> </span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="nc">SaveMode</span><span class="p">.</span><span class="nc">Append</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">save</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<div class="note" id="dli_09_0061__en-us_topic_0190067468_note17397174817568"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0061__en-us_topic_0190067468_p039712487568">The value of <strong id="dli_09_0061__en-us_topic_0190067468_b214525915511">SaveMode</strong> can be one of the following:</p>
|
|
<ul id="dli_09_0061__en-us_topic_0190067468_ul1929273321915"><li id="dli_09_0061__en-us_topic_0190067468_li8292633161916"><strong id="dli_09_0061__b1188293522710">ErrorIfExists</strong>: If the data already exists, the system throws an exception.</li><li id="dli_09_0061__en-us_topic_0190067468_li1229213391913"><strong id="dli_09_0061__b13984103710277">Overwrite</strong>: If the data already exists, the original data is overwritten.</li><li id="dli_09_0061__en-us_topic_0190067468_li7292833201912"><strong id="dli_09_0061__b468103916277">Append</strong>: If the data already exists, the system appends the new data to it.</li><li id="dli_09_0061__en-us_topic_0190067468_li1029353311911"><strong id="dli_09_0061__b37271641182713">Ignore</strong>: If the data already exists, no operation is performed. This is similar to the SQL statement <strong id="dli_09_0061__en-us_topic_0190067468_b18671029565">CREATE TABLE IF NOT EXISTS</strong>.</li></ul>
|
|
</div></div>
|
|
</li><li id="dli_09_0061__en-us_topic_0190067468_li1138464913105">Read data from CSS.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067468_screen5984155015578"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrameR</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"css"</span><span class="p">).</span><span class="n">option</span><span class="p">(</span><span class="s">"resource"</span><span class="p">,</span><span class="n">resource</span><span class="p">).</span><span class="n">option</span><span class="p">(</span><span class="s">"es.nodes"</span><span class="p">,</span><span class="w"> </span><span class="n">nodes</span><span class="p">).</span><span class="n">load</span><span class="p">()</span>
|
|
<span class="n">dataFrameR</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p1024062717116">Before data is inserted:</p>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p14240162712111"><span><img id="dli_09_0061__en-us_topic_0190067468_image1466859593" src="en-us_image_0223997304.png"></span></p>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p4240192771110">Response:</p>
|
|
<p id="dli_09_0061__en-us_topic_0190067468_p72400279119"><span><img id="dli_09_0061__en-us_topic_0190067468_image18991013105914" src="en-us_image_0223997305.png"></span></p>
|
|
</li></ol>
|
|
</li><li id="dli_09_0061__li16902621258">Submitting a Spark job<ol id="dli_09_0061__ol163941462277"><li id="dli_09_0061__li1692416144334">Generate a JAR package based on the code and upload the package to DLI.<p id="dli_09_0061__dli_09_0063_p1749619513385"><a name="dli_09_0061__li1692416144334"></a><a name="li1692416144334"></a></p>
|
|
<p id="dli_09_0061__dli_09_0063_p114961151385"></p>
|
|
</li><li id="dli_09_0061__li79179367183">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<p id="dli_09_0061__p86611641111819"><a name="dli_09_0061__li79179367183"></a><a name="li79179367183"></a></p>
|
|
<p id="dli_09_0061__p1936184518188"></p>
|
|
<div class="p" id="dli_09_0061__p76009382184"><div class="note" id="dli_09_0061__en-us_topic_0190067468_note1435543551919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0061__en-us_topic_0190067468_ul17825285811"><li id="dli_09_0061__en-us_topic_0197738142_li58215295819">If the Spark version is 2.3.2 (soon to be taken offline) or 2.4.5, set <strong id="dli_09_0061__b5116734175417">Module</strong> to <strong id="dli_09_0061__b4326163019533">sys.datasource.css</strong> when you submit a job.</li><li id="dli_09_0061__li6624653171317">If the Spark version is 3.1.1, you do not need to select a module. Instead, configure <strong id="dli_09_0061__b659100115413">Spark parameters (--conf)</strong> as follows:<p id="dli_09_0061__p13361102416273">spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/css/*</p>
|
|
<p id="dli_09_0061__p123611724162718">spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/css/*</p>
|
|
</li></ul>
|
|
</div></div>
|
|
</div>
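<p>The version-dependent rule in the note above can be captured in a small lookup, for example in tooling that prepares job submissions. This is only a sketch of that rule: <strong>CssSubmitSettings</strong>, <strong>Module</strong>, <strong>Conf</strong>, and <strong>forSparkVersion</strong> are hypothetical names, and versions not mentioned in the note are deliberately left undefined.</p>

```scala
// Encodes the note above as data; not a DLI API.
object CssSubmitSettings {
  sealed trait Setting
  // Spark 2.3.2 and 2.4.5: select a dependency module.
  final case class Module(name: String) extends Setting
  // Spark 3.1.1: pass Spark parameters (--conf) instead of a module.
  final case class Conf(entries: Map[String, String]) extends Setting

  private val cssJars = "/usr/share/extension/dli/spark-jar/datasource/css/*"

  def forSparkVersion(version: String): Option[Setting] = version match {
    case "2.3.2" | "2.4.5" => Some(Module("sys.datasource.css"))
    case "3.1.1" =>
      Some(Conf(Map(
        "spark.driver.extraClassPath" -> cssJars,
        "spark.executor.extraClassPath" -> cssJars)))
    case _ => None // not covered by the note; consult the DLI documentation
  }

  def main(args: Array[String]): Unit = {
    Seq("2.4.5", "3.1.1", "3.3.1").foreach { version =>
      println(s"$version: ${forSparkVersion(version)}")
    }
  }
}
```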
|
|
</li></ol>
|
|
</li></ul>
|
|
</li><li id="dli_09_0061__li2402164019284">Complete example code<ul id="dli_09_0061__ul28199335392"><li id="dli_09_0061__li1586211296397">Maven dependency<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067467_screen63558118176"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="o"><</span><span class="n">dependency</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">groupId</span><span class="o">></span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o"></</span><span class="n">groupId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">version</span><span class="o">></span><span class="mf">2.3.2</span><span class="o"></</span><span class="n">version</span><span class="o">></span>
|
|
<span class="o"></</span><span class="n">dependency</span><span class="o">></span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0061__li13507142864010">Connecting to data sources through SQL APIs<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067467_screen172412256524"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span>
|
|
<span class="normal">24</span>
|
|
<span class="normal">25</span>
|
|
<span class="normal">26</span>
|
|
<span class="normal">27</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SparkSession</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SQL_CSS</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession session.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="c1">// Create a DLI data table for DLI-associated CSS</span>
|
|
<span class="w">    </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"""create table css_table(id long, name string) using css options(</span>
|
|
<span class="s">      'es.nodes' = 'to-css-1174404217-QG2SwbVV.datasource.com:9200',</span>
|
|
<span class="s">      'es.nodes.wan.only' = 'true',</span>
|
|
<span class="s">      'resource' = '/mytest/css')"""</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************SQL model***********************************</span>
|
|
<span class="w"> </span><span class="c1">// Insert data into the DLI data table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into css_table values(13, 'John'),(22, 'Bob')"</span><span class="p">)</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">// Read data from DLI data table</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from css_table"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">dataFrame</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">// drop table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"drop table css_table"</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="p">}</span>
|
|
<span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0061__li14772323164416">Connecting to data sources through DataFrame APIs<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0190067467_screen1058825311184"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span>
|
|
<span class="normal">24</span>
|
|
<span class="normal">25</span>
|
|
<span class="normal">26</span>
|
|
<span class="normal">27</span>
|
|
<span class="normal">28</span>
|
|
<span class="normal">29</span>
|
|
<span class="normal">30</span>
|
|
<span class="normal">31</span>
|
|
<span class="normal">32</span>
|
|
<span class="normal">33</span>
|
|
<span class="normal">34</span>
|
|
<span class="normal">35</span>
|
|
<span class="normal">36</span>
|
|
<span class="normal">37</span>
|
|
<span class="normal">38</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.{</span><span class="nc">Row</span><span class="p">,</span><span class="w"> </span><span class="nc">SaveMode</span><span class="p">,</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">}</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nn">types</span><span class="p">.{</span><span class="nc">IntegerType</span><span class="p">,</span><span class="w"> </span><span class="nc">StringType</span><span class="p">,</span><span class="w"> </span><span class="nc">StructField</span><span class="p">,</span><span class="w"> </span><span class="nc">StructType</span><span class="p">}</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SQL_CSS</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">//Create a SparkSession session.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************DataFrame model***********************************</span>
|
|
<span class="w"> </span><span class="c1">// Setting the /index/type of CSS</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">resource</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"/mytest/css"</span>
|
|
<span class="w"> </span>
|
|
<span class="w">    </span><span class="c1">// Define the datasource connection address of the CSS cluster</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"to-css-1174405013-Ht7O1tYf.datasource.com:9200"</span>
|
|
|
|
<span class="w"> </span><span class="c1">//Setting schema</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">schema</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">StructType</span><span class="p">(</span><span class="nc">Seq</span><span class="p">(</span><span class="nc">StructField</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="nc">IntegerType</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">),</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span><span class="w"> </span><span class="nc">StringType</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)))</span>
|
|
<span class="w"> </span>
|
|
<span class="w">    </span><span class="c1">// Construct the data</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">rdd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sparkContext</span><span class="p">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nc">Seq</span><span class="p">(</span><span class="nc">Row</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="w"> </span><span class="s">"John"</span><span class="p">),</span><span class="nc">Row</span><span class="p">(</span><span class="mi">21</span><span class="p">,</span><span class="s">"Bob"</span><span class="p">)))</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">// Create a DataFrame from RDD and schema</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rdd</span><span class="p">,</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">//Write data to the CSS</span>
|
|
<span class="w"> </span><span class="n">dataFrame_1</span><span class="p">.</span><span class="n">write</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"css"</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"resource"</span><span class="p">,</span><span class="w"> </span><span class="n">resource</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.nodes"</span><span class="p">,</span><span class="w"> </span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="nc">SaveMode</span><span class="p">.</span><span class="nc">Append</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">save</span><span class="p">()</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">//Read data</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrameR</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"css"</span><span class="p">).</span><span class="n">option</span><span class="p">(</span><span class="s">"resource"</span><span class="p">,</span><span class="w"> </span><span class="n">resource</span><span class="p">).</span><span class="n">option</span><span class="p">(</span><span class="s">"es.nodes"</span><span class="p">,</span><span class="w"> </span><span class="n">nodes</span><span class="p">).</span><span class="n">load</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="n">dataFrameR</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
|
|
<span class="w">    </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="p">}</span>
|
|
<span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ul>
|
|
</li></ul>
|
|
</div>
|
|
<div class="section" id="dli_09_0061__section68222033184514"><h4 class="sectiontitle">CSS Security Cluster</h4><ul id="dli_09_0061__ul31921632124710"><li id="dli_09_0061__li1625821975314">Development description<ul id="dli_09_0061__ul11173185045517"><li id="dli_09_0061__li195264613574">Constructing dependency information and creating a Spark session<ol id="dli_09_0061__en-us_topic_0199537136_ol10193636161115"><li id="dli_09_0061__en-us_topic_0199537136_li25522276128">Import dependencies.<div class="p" id="dli_09_0061__en-us_topic_0199537136_p562517472013"><a name="dli_09_0061__en-us_topic_0199537136_li25522276128"></a><a name="en-us_topic_0199537136_li25522276128"></a>Maven dependency<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen5760163172012"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="o"><</span><span class="n">dependency</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">groupId</span><span class="o">></span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o"></</span><span class="n">groupId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">version</span><span class="o">></span><span class="mf">2.3.2</span><span class="o"></</span><span class="n">version</span><span class="o">></span>
|
|
<span class="o"></</span><span class="n">dependency</span><span class="o">></span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</div>
|
|
<div class="p" id="dli_09_0061__en-us_topic_0199537136_p13761330205">Import dependency packages.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen1761153192016"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.{</span><span class="nc">Row</span><span class="p">,</span><span class="w"> </span><span class="nc">SaveMode</span><span class="p">,</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">}</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nn">types</span><span class="p">.{</span><span class="nc">IntegerType</span><span class="p">,</span><span class="w"> </span><span class="nc">StringType</span><span class="p">,</span><span class="w"> </span><span class="nc">StructField</span><span class="p">,</span><span class="w"> </span><span class="nc">StructType</span><span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</div>
|
|
</li><li id="dli_09_0061__en-us_topic_0199537136_li133002613132">Create a session and set the AKs and SKs.<div class="note" id="dli_09_0061__note84662050195919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0061__p1140569541">Hard-coded or plaintext AK and SK pose significant security risks. To ensure security, encrypt your AK and SK, store them in configuration files or environment variables, and decrypt them when needed.</p>
|
|
</div></div>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen12232591413"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
<span class="n">sparkSession</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="s">"fs.obs.access.key"</span><span class="p">,</span><span class="w"> </span><span class="n">ak</span><span class="p">)</span>
|
|
<span class="n">sparkSession</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="s">"fs.obs.secret.key"</span><span class="p">,</span><span class="w"> </span><span class="n">sk</span><span class="p">)</span>
|
|
<span class="n">sparkSession</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="s">"fs.obs.endpoint"</span><span class="p">,</span><span class="w"> </span><span class="n">endpoint</span><span class="p">)</span>
|
|
<span class="n">sparkSession</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="s">"fs.obs.connection.ssl.enabled"</span><span class="p">,</span><span class="w"> </span><span class="s">"false"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
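<p>As a minimal sketch of the advice in the note above, the AK and SK can be looked up from environment variables instead of being written into the source. The variable names <strong>OBS_ACCESS_KEY</strong> and <strong>OBS_SECRET_KEY</strong> and the <strong>ObsCredentials</strong> helper are hypothetical and are not part of DLI. Decrypting encrypted values is not shown here.</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><pre>object ObsCredentials {
  // Hypothetical variable names; use whatever your deployment defines.
  private val AkVar = "OBS_ACCESS_KEY"
  private val SkVar = "OBS_SECRET_KEY"

  // Look up a required setting. Fail fast and name the missing variable,
  // but never print the value itself.
  def required(env: Map[String, String], name: String): String =
    env.get(name).filter(_.nonEmpty).getOrElse(
      throw new IllegalStateException(s"Environment variable $name is not set"))

  // Returns the pair (ak, sk). The environment is a parameter so that the
  // lookup can be exercised with a plain Map in tests.
  def load(env: Map[String, String] = sys.env): (String, String) =
    (required(env, AkVar), required(env, SkVar))
}
</pre></div></div>
<p>With this helper, the values passed to the session configuration in the snippet above could be obtained as follows:</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><pre>val (ak, sk) = ObsCredentials.load()
</pre></div></div>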
|
|
</li></ol>
|
|
</li><li id="dli_09_0061__li192717614017">Connecting to data sources through SQL APIs<ol id="dli_09_0061__en-us_topic_0199537136_ol7821439666"><li id="dli_09_0061__en-us_topic_0199537136_li1755222713127">Create a table to connect to a CSS data source.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen15140130103119"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"""create table css_table(id int, name string) using css options(</span>
|
|
<span class="s">  'es.nodes'='to-css-1174404221-Y2bKVIqY.datasource.com:9200',</span>
|
|
<span class="s">  'es.nodes.wan.only'='true',</span>
|
|
<span class="s">  'resource'='/mytest/css',</span>
|
|
<span class="s">  'es.net.ssl'='true',</span>
|
|
<span class="s">  'es.net.ssl.keystore.location'='obs://Bucket name/path/transport-keystore.jks',</span>
|
|
<span class="s">  'es.net.ssl.keystore.pass'='***',</span>
|
|
<span class="s">  'es.net.ssl.truststore.location'='obs://Bucket name/path/truststore.jks',</span>
|
|
<span class="s">  'es.net.ssl.truststore.pass'='***',</span>
|
|
<span class="s">  'es.net.http.auth.user'='admin',</span>
|
|
<span class="s">  'es.net.http.auth.pass'='***')"""</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
|
|
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0061__en-us_topic_0199537136_table569314388144" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Parameters for creating a table</caption><thead align="left"><tr id="dli_09_0061__en-us_topic_0199537136_row136916389144"><th align="left" class="cellrowborder" valign="top" width="20.61%" id="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1"><p id="dli_09_0061__en-us_topic_0199537136_p166911838111417">Parameter</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="79.39%" id="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2"><p id="dli_09_0061__en-us_topic_0199537136_p186911238141416">Description</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="dli_09_0061__en-us_topic_0199537136_row1969243891413"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p4691638191412">es.nodes</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p4691238201416">CSS connection address. You need to create a datasource connection first. </p>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p8692133812143">If you have created an enhanced datasource connection, use the intranet IP address provided by CSS. The address format is <strong id="dli_09_0061__b18612133214383"><em id="dli_09_0061__i156041132113810">IP1</em>:<em id="dli_09_0061__i66114322382">PORT1</em>,<em id="dli_09_0061__i56111232173814">IP2</em>:<em id="dli_09_0061__i1861223223812">PORT2</em></strong>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row166932384147"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p56921138171417">resource</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p1169216389144">Name of the CSS resource to access. Use <strong id="dli_09_0061__b1211113893816">/index/type</strong> to specify the resource location (as an analogy, the <strong id="dli_09_0061__b1917838193816">index</strong> can be thought of as a <strong id="dli_09_0061__b118438123814">database</strong> and the <strong id="dli_09_0061__b0182386382">type</strong> as a <strong id="dli_09_0061__b151813853816">table</strong>).</p>
|
|
<div class="note" id="dli_09_0061__en-us_topic_0199537136_note2975123311388"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="dli_09_0061__en-us_topic_0199537136_p1997614336380">1. In Elasticsearch 6.<em id="dli_09_0061__i19753418198">X</em>, a single index supports only one type, and the type name can be customized.</p>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p2464373911">2. In Elasticsearch 7.<em id="dli_09_0061__i23228115204">X</em>, a single index uses <strong id="dli_09_0061__b071695016191">_doc</strong> as the type name and cannot be customized. To access Elasticsearch 7.<em id="dli_09_0061__i46892134216">X</em>, set this parameter to <strong id="dli_09_0061__en-us_topic_0190067468_b880563813439">index</strong>.</p>
|
|
</div></div>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row469318381145"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p166937386148">pushdown</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p1569315383144">Whether to enable the pushdown function of CSS. The default value is <strong id="dli_09_0061__b43828523382">true</strong>. For tables with a large number of I/O requests, the pushdown function helps reduce I/O pressure when a <strong id="dli_09_0061__b18437555389">where</strong> condition is specified.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row8693153819143"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p196931938101415">strict</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p1569312384140">Whether the CSS pushdown is strict. The default value is <strong id="dli_09_0061__b1247310017393">false</strong>. Strict (exact match) pushdown can reduce I/O requests further than regular pushdown.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row186931387140"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p1669313385145">batch.size.entries</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p269323817147">Maximum number of entries that can be inserted in a batch. The default value is <strong id="dli_09_0061__b515016251383">1000</strong>. If individual data records are so large that the batch reaches its data volume limit before it holds this many entries, the system stops adding records and submits the batch based on the <strong id="dli_09_0061__b1589219818">batch.size.bytes</strong> parameter.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row66937380147"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p176933387148">batch.size.bytes</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p9693163851419">Maximum amount of data in a single batch. The default value is <strong id="dli_09_0061__b1213614171789">1 MB</strong>. If individual data records are so small that the batch reaches its entry limit before it holds this much data, the system stops adding records and submits the batch based on the <strong id="dli_09_0061__b1489118299812">batch.size.entries</strong> parameter.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row1769333861419"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p26934384145">es.nodes.wan.only</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p116931738181411">Whether to access the Elasticsearch node using only the domain name. The default value is <strong id="dli_09_0061__b67521737382">false</strong>. If the original internal IP address provided by CSS is used as <strong id="dli_09_0061__b725394017811">es.nodes</strong>, either leave this parameter unset or set it to <strong id="dli_09_0061__b9254104015810">false</strong>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row1969393891416"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p569316388144">es.mapping.id</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p15693163819147">Document field name that contains the document ID in the Elasticsearch node.</p>
|
|
<div class="note" id="dli_09_0061__en-us_topic_0199537136_note20720112316195"><span class="notetitle"> NOTE: </span><div class="notebody"><ul id="dli_09_0061__en-us_topic_0199537136_ul72517457221"><li id="dli_09_0061__en-us_topic_0199537136_li172518457227">The document ID is unique within the same <strong id="dli_09_0061__b95907461288">/index/type</strong>. If the field used as the document ID contains duplicate values, a document with a duplicate ID is overwritten when data is inserted into Elasticsearch.</li><li id="dli_09_0061__en-us_topic_0199537136_li1525164572214">This feature can be used as a fault tolerance solution. If a DLI job fails partway through an insert, the data already written to Elasticsearch would become redundant when the job is rerun. If the document ID is set, the previously written data is overwritten when the DLI job is executed again.</li></ul>
|
|
</div></div>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row1325964931513"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p1362113494517">es.net.ssl</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p10621334194517">Whether to connect to the security CSS cluster. The default value is <strong id="dli_09_0061__b1285995096">false</strong>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row159705241517"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p1357862632011">es.net.ssl.keystore.location</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p2555162612019">OBS bucket location of the <strong id="dli_09_0061__b8107122932210">keystore</strong> file generated by the security CSS cluster certificate.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row1976818612215"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p576816652120">es.net.ssl.keystore.pass</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p2768116112110">Password of the <strong id="dli_09_0061__b526263119228">keystore</strong> file generated by the security CSS cluster certificate.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row4815131162111"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p13816151117219">es.net.ssl.truststore.location</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p320983519233">OBS bucket location of the <strong id="dli_09_0061__b10573133492215">truststore</strong> file generated by the security CSS cluster certificate.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row13850111482118"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p128501814102115">es.net.ssl.truststore.pass</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p1285081418218">Password of the <strong id="dli_09_0061__b1734373614223">truststore</strong> file generated by the security CSS cluster certificate.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row83104913216"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p131089182115">es.net.http.auth.user</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p13102912112">Username of the security CSS cluster.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0061__en-us_topic_0199537136_row20667349142115"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.1 "><p id="dli_09_0061__en-us_topic_0199537136_p1566794910218">es.net.http.auth.pass</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.3.2.1.1.2.1.1.2.2.3.1.2 "><p id="dli_09_0061__en-us_topic_0199537136_p121191192216">Password of the security CSS cluster.</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
<div class="note" id="dli_09_0061__en-us_topic_0199537136_note625154520182"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0061__en-us_topic_0199537136_p172521456189"><span class="parmname" id="dli_09_0061__parmname118068111234"><b>batch.size.entries</b></span> and <span class="parmname" id="dli_09_0061__parmname188123117233"><b>batch.size.bytes</b></span> limit the number of data records and the data volume of each batch write, respectively.</p>
|
|
</div></div>
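<p>The security parameters in the table above can also be assembled programmatically before the table is created. The following is a minimal sketch, not the documented method: <strong>optionsClause</strong> is a hypothetical helper, the environment variable names are assumptions, and the addresses and paths are placeholders. Values that contain a single quote would need escaping, which is not handled here.</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><pre>// Hypothetical helper (not part of DLI or Spark): renders key-value pairs
// as the comma-separated body of an OPTIONS(...) clause.
def optionsClause(options: Seq[(String, String)]): String =
  options.map { case (key, value) =&gt; s"'$key'='$value'" }.mkString(",")

// Placeholder values; the environment variable names are assumptions.
val securityOptions = Seq(
  ("es.nodes", "192.168.6.204:9200"),
  ("resource", "/mytest"),
  ("es.net.ssl", "true"),
  ("es.net.ssl.keystore.location", "obs://Bucket name/path/transport-keystore.jks"),
  ("es.net.ssl.keystore.pass", sys.env.getOrElse("CSS_KEYSTORE_PASS", "")),
  ("es.net.ssl.truststore.location", "obs://Bucket name/path/truststore.jks"),
  ("es.net.ssl.truststore.pass", sys.env.getOrElse("CSS_TRUSTSTORE_PASS", "")),
  ("es.net.http.auth.user", "admin"),
  ("es.net.http.auth.pass", sys.env.getOrElse("CSS_AUTH_PASS", ""))
)

// Build the statement text; pass it to sparkSession.sql(...) to create the table.
val createTableSql =
  s"create table css_table(id long, name string) using css options(${optionsClause(securityOptions)})"
</pre></div></div>
<p>Keeping the passwords out of the SQL string literal means they do not need to appear in the source file.</p>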
|
|
</li><li id="dli_09_0061__en-us_topic_0199537136_li18313911617">Insert data.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen66292184333"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into css_table values(13, 'John'),(22, 'Bob')"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0061__en-us_topic_0199537136_li17401941177">Query data.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen8335528349"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from css_table"</span><span class="p">)</span>
|
|
<span class="n">dataFrame</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p980612423518">Before data is inserted:</p>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p1111911912352"><span><img id="dli_09_0061__en-us_topic_0199537136_image16928173453517" src="en-us_image_0266325813.png"></span></p>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p1312675873519">Response:</p>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p186811539367"><span><img id="dli_09_0061__en-us_topic_0199537136_image091441010366" src="en-us_image_0266325814.png"></span></p>
|
|
</li><li id="dli_09_0061__en-us_topic_0199537136_li14681446870">Delete the datasource connection table.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen131211827184111"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"drop table css_table"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
|
|
</li><li id="dli_09_0061__li449614121726">Connecting to data sources through DataFrame APIs<ol id="dli_09_0061__en-us_topic_0199537136_ol1565218286816"><li id="dli_09_0061__en-us_topic_0199537136_li17652112814813">Set connection parameters.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen20117532094"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">resource</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"/mytest/css"</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"to-css-1174405013-Ht7O1tYf.datasource.com:9200"</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0061__en-us_topic_0199537136_li17603210997">Create a schema and add data to it.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen1850883315920"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">schema</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">StructType</span><span class="p">(</span><span class="nc">Seq</span><span class="p">(</span><span class="nc">StructField</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="nc">IntegerType</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">),</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span><span class="w"> </span><span class="nc">StringType</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)))</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">rdd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sparkContext</span><span class="p">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nc">Seq</span><span class="p">(</span><span class="nc">Row</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="w"> </span><span class="s">"John"</span><span class="p">),</span><span class="nc">Row</span><span class="p">(</span><span class="mi">21</span><span class="p">,</span><span class="s">"Bob"</span><span class="p">)))</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0061__en-us_topic_0199537136_li54123813917">Import data to CSS.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen3727448564"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rdd</span><span class="p">,</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span>
|
|
<span class="n">dataFrame_1</span><span class="p">.</span><span class="n">write</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"css"</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"resource"</span><span class="p">,</span><span class="w"> </span><span class="n">resource</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.nodes"</span><span class="p">,</span><span class="w"> </span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl"</span><span class="p">,</span><span class="w"> </span><span class="s">"true"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.keystore.location"</span><span class="p">,</span><span class="w"> </span><span class="s">"obs://Bucket name/path/transport-keystore.jks"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.keystore.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.truststore.location"</span><span class="p">,</span><span class="w"> </span><span class="s">"obs://Bucket name/path/truststore.jks"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.truststore.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.http.auth.user"</span><span class="p">,</span><span class="w"> </span><span class="s">"admin"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.http.auth.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="nc">SaveMode</span><span class="p">.</span><span class="nc">Append</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">save</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<div class="note" id="dli_09_0061__en-us_topic_0199537136_note17397174817568"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0061__en-us_topic_0199537136_p039712487568">The value of <strong id="dli_09_0061__b1115293047">Mode</strong> can be one of the following:</p>
|
|
<ul id="dli_09_0061__en-us_topic_0199537136_ul1929273321915"><li id="dli_09_0061__en-us_topic_0199537136_li8292633161916"><strong id="dli_09_0061__b1545782315275">ErrorIfExists</strong>: If the data already exists, the system throws an exception.</li><li id="dli_09_0061__en-us_topic_0199537136_li1229213391913"><strong id="dli_09_0061__b114122514270">Overwrite</strong>: If the data already exists, the original data is overwritten.</li><li id="dli_09_0061__en-us_topic_0199537136_li7292833201912"><strong id="dli_09_0061__b1688162602715">Append</strong>: If the data already exists, the new data is appended to it.</li><li id="dli_09_0061__en-us_topic_0199537136_li1029353311911"><strong id="dli_09_0061__b544192842712">Ignore</strong>: If the data already exists, the save operation is skipped and the existing data is left unchanged. This is similar to the SQL statement <strong id="dli_09_0061__b1105378881">CREATE TABLE IF NOT EXISTS</strong>.</li></ul>
|
|
</div></div>
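<p>Spark's <strong>DataFrameWriter.mode</strong> accepts either a <strong>SaveMode</strong> value or its string name. The sketch below assumes the <strong>dataFrame_1</strong> value from the step above and only configures the writers; nothing is written until <strong>save()</strong> is called.</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><pre>import org.apache.spark.sql.SaveMode

// Two equivalent ways to request the Overwrite mode.
val writerWithEnum = dataFrame_1.write.format("css").mode(SaveMode.Overwrite)
val writerWithString = dataFrame_1.write.format("css").mode("overwrite")
</pre></div></div>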
|
|
</li><li id="dli_09_0061__en-us_topic_0199537136_li1138464913105">Read data from CSS.<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537136_screen5984155015578"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrameR</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"css"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"resource"</span><span class="p">,</span><span class="n">resource</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.nodes"</span><span class="p">,</span><span class="w"> </span><span class="n">nodes</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl"</span><span class="p">,</span><span class="w"> </span><span class="s">"true"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.keystore.location"</span><span class="p">,</span><span class="w"> </span><span class="s">"obs://Bucket name/path/transport-keystore.jks"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.keystore.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.truststore.location"</span><span class="p">,</span><span class="w"> </span><span class="s">"obs://Bucket name/path/truststore.jks"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.truststore.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.http.auth.user"</span><span class="p">,</span><span class="w"> </span><span class="s">"admin"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.http.auth.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">load</span><span class="p">()</span>
|
|
<span class="n">dataFrameR</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p1024062717116">Before data is inserted:</p>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p14240162712111"><span><img id="dli_09_0061__en-us_topic_0199537136_image1466859593" src="en-us_image_0266325815.png"></span></p>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p4240192771110">Response:</p>
|
|
<p id="dli_09_0061__en-us_topic_0199537136_p72400279119"><span><img id="dli_09_0061__en-us_topic_0199537136_image18991013105914" src="en-us_image_0266325816.png"></span></p>
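<p>The write and read steps above repeat the same connection options. One way to avoid the duplication, shown here as a sketch rather than the documented approach, is to collect the settings in a <strong>Map</strong> and pass it to the standard Spark <strong>options</strong> method. The names <strong>cssOptions</strong> and <strong>dataFrameShared</strong> are illustrative, and <strong>resource</strong>, <strong>nodes</strong>, and <strong>dataFrame_1</strong> are assumed to be defined as in the preceding steps.</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><pre>import org.apache.spark.sql.SaveMode

// Settings shared by the write and the read; values are the placeholders used above.
val cssOptions = Map(
  ("resource", resource),
  ("es.nodes", nodes),
  ("es.net.ssl", "true"),
  ("es.net.ssl.keystore.location", "obs://Bucket name/path/transport-keystore.jks"),
  ("es.net.ssl.keystore.pass", "***"),
  ("es.net.ssl.truststore.location", "obs://Bucket name/path/truststore.jks"),
  ("es.net.ssl.truststore.pass", "***"),
  ("es.net.http.auth.user", "admin"),
  ("es.net.http.auth.pass", "***")
)

// Write with the shared options.
dataFrame_1.write.format("css").options(cssOptions).mode(SaveMode.Append).save()

// Read back with the same options.
val dataFrameShared = sparkSession.read.format("css").options(cssOptions).load()
dataFrameShared.show()
</pre></div></div>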
|
|
</li></ol>
|
|
</li><li id="dli_09_0061__li191861821634">Submitting a Spark job<ol id="dli_09_0061__ol6476827139"><li id="dli_09_0061__li540795512264">Generate a JAR package based on the code and upload the package to DLI.<p id="dli_09_0061__dli_09_0063_p1749619513385_1"><a name="dli_09_0061__li540795512264"></a><a name="li540795512264"></a></p>
|
|
<p id="dli_09_0061__dli_09_0063_p114961151385_1"></p>
|
|
</li><li id="dli_09_0061__en-us_topic_0199537136_li67827509599">In the Spark job editor, select the corresponding dependency module and execute the Spark job. <div class="note" id="dli_09_0061__en-us_topic_0199537136_note1435543551919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0061__en-us_topic_0199537136_ul17825285811"><li id="dli_09_0061__en-us_topic_0199537136_li58215295819">When submitting a job, you need to specify a dependency module named <strong id="dli_09_0061__en-us_topic_0190067468_b182101714179">sys.datasource.css</strong>.</li><li id="dli_09_0061__en-us_topic_0199537136_li14401129269">For details about how to submit a job on the DLI console, see </li><li id="dli_09_0061__en-us_topic_0199537136_li193313445818">For details about how to submit a job through an API, see the <strong id="dli_09_0061__b74351139175814">modules</strong> parameter in </li></ul>
|
|
</div></div>
|
|
</li></ol>
|
|
</li></ul>
|
|
</li><li id="dli_09_0061__li314719331458">Complete example code<ul id="dli_09_0061__ul92053221464"><li id="dli_09_0061__li850518201461">Maven dependency<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537137_screen63558118176"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="o"><</span><span class="n">dependency</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">groupId</span><span class="o">></span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o"></</span><span class="n">groupId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">version</span><span class="o">></span><span class="mf">2.3.2</span><span class="o"></</span><span class="n">version</span><span class="o">></span>
|
|
<span class="o"></</span><span class="n">dependency</span><span class="o">></span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0061__li713916191710">Connecting to data sources through SQL APIs<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537137_screen172412256524"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SparkSession</span>
|
|
<span class="w"> </span>
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">csshttpstest</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="c1">// Create a DLI data table for DLI-associated CSS</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"create table css_table(id long, name string) using css options('es.nodes' = '192.168.6.204:9200','es.nodes.wan.only' = 'false','resource' = '/mytest','es.net.ssl'='true','es.net.ssl.keystore.location' = 'obs://xietest1/lzq/keystore.jks','es.net.ssl.keystore.pass' = '**','es.net.ssl.truststore.location'='obs://xietest1/lzq/truststore.jks','es.net.ssl.truststore.pass'='**','es.net.http.auth.user'='admin','es.net.http.auth.pass'='**')"</span><span class="p">)</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">//*****************************SQL model***********************************</span>
|
|
<span class="w"> </span><span class="c1">// Insert data into the DLI data table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into css_table values(13, 'John'),(22, 'Bob')"</span><span class="p">)</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">// Read data from DLI data table</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from css_table"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">dataFrame</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">// drop table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"drop table css_table"</span><span class="p">)</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="p">}</span>
|
|
<span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0061__li1010195514711">Connecting to data sources through DataFrame APIs<div class="note" id="dli_09_0061__note7140183221514"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0061__p214073215153">Hard-coded or plaintext AK and SK pose significant security risks. To ensure security, encrypt your AK and SK, store them in configuration files or environment variables, and decrypt them when needed.</p>
|
|
</div></div>
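<p>The example below refers to <strong>ak</strong> and <strong>sk</strong> without defining them. A minimal sketch of one way to supply them, assuming the keys are exposed through environment variables with the hypothetical names <strong>OBS_ACCESS_KEY</strong> and <strong>OBS_SECRET_KEY</strong>:</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><pre>// Hypothetical variable names; use whatever your deployment defines.
// The job fails fast if a key is missing instead of falling back to a literal.
val ak = sys.env.getOrElse("OBS_ACCESS_KEY", sys.error("OBS_ACCESS_KEY is not set"))
val sk = sys.env.getOrElse("OBS_SECRET_KEY", sys.error("OBS_SECRET_KEY is not set"))
</pre></div></div>
<p>If the values are stored in encrypted form, decrypt them before passing them to <strong>sparkSession.conf.set</strong>; the decryption step depends on your key management setup and is not shown here.</p>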
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0061__en-us_topic_0199537137_screen1058825311184"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span>
|
|
<span class="normal">24</span>
|
|
<span class="normal">25</span>
|
|
<span class="normal">26</span>
|
|
<span class="normal">27</span>
|
|
<span class="normal">28</span>
|
|
<span class="normal">29</span>
|
|
<span class="normal">30</span>
|
|
<span class="normal">31</span>
|
|
<span class="normal">32</span>
|
|
<span class="normal">33</span>
|
|
<span class="normal">34</span>
|
|
<span class="normal">35</span>
|
|
<span class="normal">36</span>
|
|
<span class="normal">37</span>
|
|
<span class="normal">38</span>
|
|
<span class="normal">39</span>
|
|
<span class="normal">40</span>
|
|
<span class="normal">41</span>
|
|
<span class="normal">42</span>
|
|
<span class="normal">43</span>
|
|
<span class="normal">44</span>
|
|
<span class="normal">45</span>
|
|
<span class="normal">46</span>
|
|
<span class="normal">47</span>
|
|
<span class="normal">48</span>
|
|
<span class="normal">49</span>
|
|
<span class="normal">50</span>
|
|
<span class="normal">51</span>
|
|
<span class="normal">52</span>
|
|
<span class="normal">53</span>
|
|
<span class="normal">54</span>
|
|
<span class="normal">55</span>
|
|
<span class="normal">56</span>
|
|
<span class="normal">57</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.{</span><span class="nc">Row</span><span class="p">,</span><span class="w"> </span><span class="nc">SaveMode</span><span class="p">,</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">}</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nn">types</span><span class="p">.{</span><span class="nc">IntegerType</span><span class="p">,</span><span class="w"> </span><span class="nc">StringType</span><span class="p">,</span><span class="w"> </span><span class="nc">StructField</span><span class="p">,</span><span class="w"> </span><span class="nc">StructType</span><span class="p">}</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SQL_CSS</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="s">"fs.obs.access.key"</span><span class="p">,</span><span class="w"> </span><span class="n">ak</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="s">"fs.obs.secret.key"</span><span class="p">,</span><span class="w"> </span><span class="n">sk</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************DataFrame model***********************************</span>
|
|
<span class="w"> </span><span class="c1">// Setting the /index/type of CSS</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">resource</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"/mytest/css"</span>
|
|
<span class="w"> </span>
<span class="w"> </span><span class="c1">// Define the datasource connection address of the CSS cluster</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"to-css-1174405013-Ht7O1tYf.datasource.com:9200"</span>

<span class="w"> </span><span class="c1">// Set the schema</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">schema</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">StructType</span><span class="p">(</span><span class="nc">Seq</span><span class="p">(</span><span class="nc">StructField</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="nc">IntegerType</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">),</span><span class="w"> </span><span class="nc">StructField</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span><span class="w"> </span><span class="nc">StringType</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)))</span>
<span class="w"> </span>
<span class="w"> </span><span class="c1">// Construct data</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">rdd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sparkContext</span><span class="p">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nc">Seq</span><span class="p">(</span><span class="nc">Row</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="w"> </span><span class="s">"John"</span><span class="p">),</span><span class="nc">Row</span><span class="p">(</span><span class="mi">21</span><span class="p">,</span><span class="s">"Bob"</span><span class="p">)))</span>
<span class="w"> </span>
<span class="w"> </span><span class="c1">// Create a DataFrame from RDD and schema</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rdd</span><span class="p">,</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span>
<span class="w"> </span>
<span class="w"> </span><span class="c1">// Write data to CSS</span>
<span class="w"> </span><span class="n">dataFrame_1</span><span class="p">.</span><span class="n">write</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"css"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"resource"</span><span class="p">,</span><span class="w"> </span><span class="n">resource</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.nodes"</span><span class="p">,</span><span class="w"> </span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl"</span><span class="p">,</span><span class="w"> </span><span class="s">"true"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.keystore.location"</span><span class="p">,</span><span class="w"> </span><span class="s">"obs://Bucket name/path/transport-keystore.jks"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.keystore.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.truststore.location"</span><span class="p">,</span><span class="w"> </span><span class="s">"obs://Bucket name/path/truststore.jks"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.truststore.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.http.auth.user"</span><span class="p">,</span><span class="w"> </span><span class="s">"admin"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.http.auth.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="nc">SaveMode</span><span class="p">.</span><span class="nc">Append</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="n">save</span><span class="p">()</span>
<span class="w"> </span>
<span class="w"> </span><span class="c1">// Read data from CSS</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrameR</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"css"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"resource"</span><span class="p">,</span><span class="w"> </span><span class="n">resource</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.nodes"</span><span class="p">,</span><span class="w"> </span><span class="n">nodes</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl"</span><span class="p">,</span><span class="w"> </span><span class="s">"true"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.keystore.location"</span><span class="p">,</span><span class="w"> </span><span class="s">"obs://Bucket name/path/transport-keystore.jks"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.keystore.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.truststore.location"</span><span class="p">,</span><span class="w"> </span><span class="s">"obs://Bucket name/path/truststore.jks"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.ssl.truststore.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.http.auth.user"</span><span class="p">,</span><span class="w"> </span><span class="s">"admin"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"es.net.http.auth.pass"</span><span class="p">,</span><span class="w"> </span><span class="s">"***"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">load</span><span class="p">()</span>
<span class="w"> </span><span class="n">dataFrameR</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>

<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div></td></tr></table></div>
</div>
</li></ul>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0089.html">Connecting to CSS</a></div>
</div>
</div>