Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
<a name="dli_09_0067"></a><a name="dli_09_0067"></a>
<h1 class="topictitle1">Scala Example Code</h1>
<div id="body8662426"><div class="section" id="dli_09_0067__section9281315175512"><h4 class="sectiontitle">Development Description</h4><ul id="dli_09_0067__ul1283018389553"><li id="dli_09_0067__li198301938205515">Prerequisites<p id="dli_09_0067__en-us_topic_0190647826_p1088215354811"><a name="dli_09_0067__li198301938205515"></a><a name="li198301938205515"></a>A datasource connection has been created and bound to a queue on the DLI management console. </p>
<div class="note" id="dli_09_0067__note1358715714155"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0067__p1858718570154">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
</div></div>
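<p>One way to follow this advice is to read the credentials from environment variables instead of writing them into the code. The following is a minimal sketch: the variable names <strong>RDS_USER</strong> and <strong>RDS_PASSWORD</strong> and the <strong>RdsCredentials</strong> helper are hypothetical, and decrypting an encrypted password is not shown.</p>

```scala
// Sketch: load RDS credentials from environment variables instead of hard-coding them.
// RDS_USER and RDS_PASSWORD are hypothetical variable names; use the ones your deployment defines.
object RdsCredentials {
  final case class Credentials(user: String, password: String) {
    // Mask the password so it is not leaked if the value is logged or printed.
    override def toString: String = s"Credentials($user, ******)"
  }

  private val Keys = Seq("RDS_USER", "RDS_PASSWORD")

  // Pass sys.env in a real job; accepting a plain Map keeps the logic easy to test.
  def fromEnv(env: Map[String, String]): Either[String, Credentials] = {
    val missing = Keys.filterNot(env.contains)
    if (missing.nonEmpty) Left("Missing environment variables: " + missing.mkString(", "))
    else Right(Credentials(env("RDS_USER"), env("RDS_PASSWORD")))
  }
}
```

<p>In a job you could call <strong>RdsCredentials.fromEnv(sys.env)</strong> and stop with the returned message if a variable is missing, so the job fails early with a clear cause.</p>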
</li><li id="dli_09_0067__li361730175612">Configuring dependencies and creating a Spark session<ol id="dli_09_0067__en-us_topic_0190647826_ol831808585"><li id="dli_09_0067__en-us_topic_0190647826_li1822810810586">Import dependencies.<p id="dli_09_0067__en-us_topic_0190647826_p9751145613019"><a name="dli_09_0067__en-us_topic_0190647826_li1822810810586"></a><a name="en-us_topic_0190647826_li1822810810586"></a>Add the following Maven dependency:</p>
<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen5760163172012"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>&lt;dependency&gt;
    &lt;groupId&gt;org.apache.spark&lt;/groupId&gt;
    &lt;artifactId&gt;spark-sql_2.11&lt;/artifactId&gt;
    &lt;version&gt;2.3.2&lt;/version&gt;
&lt;/dependency&gt;
</pre></div></td></tr></table></div>
</div>
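<p>If you build with sbt rather than Maven, a roughly equivalent declaration might look like the following sketch. The Scala version 2.11.12 is an assumption; any 2.11.x release makes the <strong>%%</strong> operator select the same <strong>spark-sql_2.11</strong> artifact.</p>

```scala
// build.sbt (sketch): sbt counterpart of the Maven dependency above.
// %% appends the Scala binary version, so this resolves to spark-sql_2.11.
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.2"
```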
<div class="p" id="dli_09_0067__en-us_topic_0190647826_p13761330205">Import dependency packages.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen1761153192016"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>import java.util.Properties
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.SaveMode
</pre></div></td></tr></table></div>
</div>
</div>
</li><li id="dli_09_0067__en-us_topic_0190647826_li663417557599">Create a session.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen1363475510592"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>val sparkSession = SparkSession.builder().getOrCreate()
</pre></div></td></tr></table></div>
</div>
</li></ol>
</li><li id="dli_09_0067__li61650276565">Connecting to data sources through SQL APIs<ol id="dli_09_0067__en-us_topic_0190647826_ol11169755105419"><li id="dli_09_0067__en-us_topic_0190647826_li1216955517548">Create a table to connect to an RDS data source and set connection parameters.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen187422020161315"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>// Replace the url, user, and password values with those of your RDS instance.
// A multi-line string needs triple quotes, and comments must stay outside the SQL text.
sparkSession.sql(
  """CREATE TABLE IF NOT EXISTS dli_to_rds USING JDBC OPTIONS (
    'url'='jdbc:mysql://to-rds-1174404209-cA37siB6.datasource.com:3306',
    'dbtable'='test.customer',
    'user'='root',
    'password'='######',
    'driver'='com.mysql.jdbc.Driver')""")
</pre></div></td></tr></table></div>
</div>
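<p>Rather than hard-coding the connection values in the SQL text, you can assemble the statement from variables. The sketch below uses hypothetical object and method names; it builds the URL in the MySQL and PostgreSQL formats listed in Table 1 and quotes each option. The quoting assumes Spark SQL's default backslash escaping inside single-quoted string literals.</p>

```scala
// Sketch: build the CREATE TABLE ... USING JDBC statement from variables.
// Object and method names are hypothetical. Quoting assumes Spark SQL's
// default backslash escaping in single-quoted string literals.
object JdbcTableSql {
  // Wrap a value in single quotes, escaping backslashes first, then quotes.
  private def quote(value: String): String =
    "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

  // MySQL format: protocol header://address:port
  def mysqlUrl(host: String, port: Int): String =
    s"jdbc:mysql://$host:$port"

  // PostgreSQL format: protocol header://address:port/database name
  def postgresUrl(host: String, port: Int, database: String): String =
    s"jdbc:postgresql://$host:$port/$database"

  // Options keep their given order so the generated SQL is predictable.
  def createTable(table: String, options: Seq[(String, String)]): String = {
    val opts = options.map { case (k, v) => quote(k) + "=" + quote(v) }.mkString(",\n  ")
    "CREATE TABLE IF NOT EXISTS " + table + " USING JDBC OPTIONS (\n  " + opts + ")"
  }
}
```

<p>The resulting string can then be passed to <strong>sparkSession.sql(...)</strong>, for example with the user and password loaded from environment variables.</p>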
<div class="tablenoborder"><a name="dli_09_0067__en-us_topic_0190647826_table127421320141311"></a><a name="en-us_topic_0190647826_table127421320141311"></a><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0067__en-us_topic_0190647826_table127421320141311" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters for creating a table</caption><thead align="left"><tr id="dli_09_0067__en-us_topic_0190647826_row674182010135"><th align="left" class="cellrowborder" valign="top" width="20.61%" id="mcps1.3.1.2.3.1.1.2.2.3.1.1"><p id="dli_09_0067__en-us_topic_0190647826_p10741720161314">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="79.39%" id="mcps1.3.1.2.3.1.1.2.2.3.1.2"><p id="dli_09_0067__en-us_topic_0190647826_p1474118207137">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_09_0067__en-us_topic_0190647826_row1074213203131"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p1874213208138">url</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p216518281377">JDBC URL of the RDS database. To obtain the RDS IP address, create a datasource connection first. For details, see the <em id="dli_09_0067__i1950719556117">Data Lake Insight User Guide</em>.</p>
<p id="dli_09_0067__en-us_topic_0190647826_p17658829185311">If you have created an enhanced datasource connection, use the internal network domain name or internal IP address and the database port number provided by RDS. For MySQL, the format is <strong id="dli_09_0067__en-us_topic_0190647826_b1145111262286"><em id="dli_09_0067__i66001031162812">Protocol header</em>://<em id="dli_09_0067__i2293153718284">Internal IP address</em>:<em id="dli_09_0067__i6240445192815">Internal network port number</em></strong>. For PostgreSQL, the format is <strong id="dli_09_0067__en-us_topic_0190647826_b14322192002913"><em id="dli_09_0067__i17564823307">Protocol header</em>://<em id="dli_09_0067__i1347488183015">Internal IP address</em>:<em id="dli_09_0067__i811371463015">Internal network port number</em>/<em id="dli_09_0067__i990251814301">Database name</em></strong>.</p>
<p id="dli_09_0067__en-us_topic_0190647826_p14832153644513">Examples: <strong id="dli_09_0067__b7926342185114">jdbc:mysql://192.168.0.193:3306</strong> or <strong id="dli_09_0067__b1993424215513">jdbc:postgresql://192.168.0.193:3306/postgres</strong>.</p>
</td>
</tr>
<tr id="dli_09_0067__row139373453519"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p13742720171310">dbtable</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_en-us_topic_0114776213_en-us_topic_0103157088_p570649132164">To connect to a MySQL cluster, enter <strong id="dli_09_0067__en-us_topic_0190647826_b7138193714013">Database name.Table name</strong>. To connect to a PostgreSQL cluster, enter <strong id="dli_09_0067__en-us_topic_0190647826_b157474519413">Schema name.Table name</strong>.</p>
<div class="note" id="dli_09_0067__note1652944662814"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="dli_09_0067__p195331446162817">The database and table must already exist. Otherwise, the system reports an error and the job fails.</p>
</div></div>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row20742112081315"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p18742320141311">user</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p19742162011317">RDS database username.</p>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row5742220191318"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p1674212091318">password</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p147421208131">RDS database password.</p>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row19946132619151"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_en-us_topic_0114776213_en-us_topic_0103157088_p584237211576">driver</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p694713265151">JDBC driver class name. To connect to a MySQL cluster, enter <strong id="dli_09_0067__en-us_topic_0190647826_b103515396432">com.mysql.jdbc.Driver</strong>. To connect to a PostgreSQL cluster, enter <strong id="dli_09_0067__en-us_topic_0190647826_b111731114114412">org.postgresql.Driver</strong>.</p>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row560310553369"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p115736213142">partitionColumn</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p657314251414">Numeric column used to split the data for concurrent reads.</p>
<div class="note" id="dli_09_0067__en-us_topic_0190647826_note84102206211"><span class="notetitle"> NOTE: </span><div class="notebody"><ul id="dli_09_0067__ul1795720614416"><li id="dli_09_0067__li5957362415">The <strong id="dli_09_0067__b113965131916">partitionColumn</strong>, <strong id="dli_09_0067__b14007131914">lowerBound</strong>, <strong id="dli_09_0067__b440061151911">upperBound</strong>, and <strong id="dli_09_0067__b144019161910">numPartitions</strong> parameters must be set together.</li><li id="dli_09_0067__li995836442">To improve concurrent read performance, use an auto-increment column.</li></ul>
</div></div>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row4603755123619"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p17774145171412">lowerBound</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p8774175121411">Minimum value of the column specified by <strong id="dli_09_0067__en-us_topic_0190647826_b641419454252">partitionColumn</strong>. This value is included in the range that is read.</p>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row1560375533616"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p106263918145">upperBound</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p09634592231">Maximum value of the column specified by <strong id="dli_09_0067__en-us_topic_0190647826_b552118472259">partitionColumn</strong>. This value is excluded from the range that is read.</p>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row46033559365"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p175509134149">numPartitions</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p135501013191415">Number of concurrent read tasks.</p>
<div class="note" id="dli_09_0067__en-us_topic_0190647826_note1178124011241"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="dli_09_0067__en-us_topic_0190647826_p3178240102415">When data is read, the range from <strong id="dli_09_0067__en-us_topic_0190647826_b83361539257">lowerBound</strong> to <strong id="dli_09_0067__en-us_topic_0190647826_b9337153182514">upperBound</strong> is divided evenly among the tasks. For example, with the following settings:</p>
<p id="dli_09_0067__en-us_topic_0190647826_p18913250182719">'partitionColumn'='id',</p>
<p id="dli_09_0067__en-us_topic_0190647826_p2062520610287">'lowerBound'='0',</p>
<p id="dli_09_0067__en-us_topic_0190647826_p16744113102810">'upperBound'='100',</p>
<p id="dli_09_0067__en-us_topic_0190647826_p9266172843018">'numPartitions'='2'</p>
<p id="dli_09_0067__en-us_topic_0190647826_p1361474310301">DLI starts two concurrent tasks. One task reads rows whose ID is greater than or equal to <strong id="dli_09_0067__en-us_topic_0190647826_b18536175813255">0</strong> and less than <strong id="dli_09_0067__en-us_topic_0190647826_b19536115813251">50</strong>; the other reads rows whose ID is greater than or equal to <strong id="dli_09_0067__en-us_topic_0190647826_b5536165892518">50</strong> and less than <strong id="dli_09_0067__en-us_topic_0190647826_b75371958122519">100</strong>.</p>
</div></div>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row1560395533617"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p10294816131410">fetchsize</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p20295111641419">Number of records fetched per batch when data is read. The default value is <strong id="dli_09_0067__en-us_topic_0190647826_b1250455265">1000</strong>. A larger value improves performance but consumes more memory and may cause an out-of-memory error.</p>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row14603115513362"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p714473981918">batchsize</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p14144123961915">Number of records written per batch. The default value is <strong id="dli_09_0067__en-us_topic_0190647826_b19332181711267">1000</strong>. A larger value improves performance but consumes more memory and may cause an out-of-memory error.</p>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row5603355163615"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p17393542141920">truncate</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p339304231915">Whether to truncate the existing table instead of dropping it when an <strong id="dli_09_0067__en-us_topic_0190647826_b5686162120265">overwrite</strong> is performed. The options are as follows:</p>
<ul id="dli_09_0067__ul20537152011419"><li id="dli_09_0067__li175371201245">true</li><li id="dli_09_0067__li14537172019410">false</li></ul>
<p id="dli_09_0067__en-us_topic_0190647826_p1212110107349">The default value is <span class="parmvalue" id="dli_09_0067__en-us_topic_0190647826_parmvalue1115442622618"><b>false</b></span>, indicating that the original table is dropped and a new table is created when the <strong id="dli_09_0067__en-us_topic_0190647826_b121551263263">overwrite</strong> operation is performed.</p>
</td>
</tr>
<tr id="dli_09_0067__en-us_topic_0190647826_row13602165520364"><td class="cellrowborder" valign="top" width="20.61%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.1 "><p id="dli_09_0067__en-us_topic_0190647826_p73369516203">isolationLevel</p>
</td>
<td class="cellrowborder" valign="top" width="79.39%" headers="mcps1.3.1.2.3.1.1.2.2.3.1.2 "><p id="dli_09_0067__en-us_topic_0190647826_p113361357206">Transaction isolation level. The options are as follows:</p>
<ul id="dli_09_0067__ul16848133219718"><li id="dli_09_0067__li168481532771">NONE</li><li id="dli_09_0067__li284816321172">READ_UNCOMMITTED</li><li id="dli_09_0067__li48485322071">READ_COMMITTED</li><li id="dli_09_0067__li68481132176">REPEATABLE_READ</li><li id="dli_09_0067__li2849163212710">SERIALIZABLE</li></ul>
<p id="dli_09_0067__en-us_topic_0190647826_p1895985011353">The default value is <span class="parmvalue" id="dli_09_0067__en-us_topic_0190647826_parmvalue15592230112614"><b>READ_UNCOMMITTED</b></span>.</p>
</td>
</tr>
</tbody>
</table>
</div>
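<p>The arithmetic in the <strong>numPartitions</strong> note can be illustrated with plain Scala. This simplified sketch (the <strong>PartitionRanges</strong> object is hypothetical) divides the range from <strong>lowerBound</strong> to <strong>upperBound</strong> into equal slices, one per task, and lets the last slice absorb any remainder. It models the behavior described in the note and is not Spark's own implementation, which may differ at the edges of the range.</p>

```scala
// Simplified sketch of the split described in the numPartitions note:
// [lowerBound, upperBound) is cut into numPartitions slices of equal width,
// one per concurrent task; the last slice absorbs any remainder.
object PartitionRanges {
  def split(lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[(Long, Long)] = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(upperBound - lowerBound >= numPartitions, "range must be at least numPartitions wide")
    val stride = (upperBound - lowerBound) / numPartitions
    (0 until numPartitions).map { i =>
      val start = lowerBound + i * stride
      // The last slice always ends exactly at upperBound.
      val end = if (i == numPartitions - 1) upperBound else start + stride
      (start, end)
    }
  }

  // Condition each task would apply to partitionColumn, in the note's terms.
  def conditions(column: String, lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[String] =
    split(lowerBound, upperBound, numPartitions).map { case (start, end) =>
      s"$column >= $start AND $column < $end"
    }
}
```

<p>With the example settings from the note, <strong>PartitionRanges.split(0, 100, 2)</strong> yields the two slices 0 to 50 and 50 to 100.</p>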
</li><li id="dli_09_0067__en-us_topic_0190647826_li911302371611">Insert data.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen1711312301613"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>sparkSession.sql("insert into dli_to_rds values (1, 'John', 24), (2, 'Bob', 32)")
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0067__en-us_topic_0190647826_li513113315552">Query data.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen8335528349"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>val dataFrame = sparkSession.sql("select * from dli_to_rds")
dataFrame.show()
</pre></div></td></tr></table></div>
</div>
<p id="dli_09_0067__en-us_topic_0190647826_p980612423518">Query result before the data is inserted:</p>
<p id="dli_09_0067__en-us_topic_0190647826_p1111911912352"><span><img id="dli_09_0067__en-us_topic_0190647826_image13988182014307" src="en-us_image_0223997410.png"></span></p>
<p id="dli_09_0067__en-us_topic_0190647826_p1344084843017">Query result after the data is inserted:</p>
<p id="dli_09_0067__en-us_topic_0190647826_p4683160173110"><span><img id="dli_09_0067__en-us_topic_0190647826_image1835112103113" src="en-us_image_0223997411.png"></span></p>
</li><li id="dli_09_0067__en-us_topic_0190647826_li9448332565">Delete the datasource connection table.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen83651238173211"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>sparkSession.sql("drop table dli_to_rds")
</pre></div></td></tr></table></div>
</div>
</li></ol>
</li><li id="dli_09_0067__li1746412305570">Connecting to data sources through DataFrame APIs<ol id="dli_09_0067__en-us_topic_0190647826_ol1713544165617"><li id="dli_09_0067__en-us_topic_0190647826_li474366135711">Configure datasource connection parameters.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen184594313195"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>// Set these values to the actual URL, user, password, and table of your RDS instance.
val url = "jdbc:mysql://to-rds-1174405057-EA1Kgo8H.datasource.com:3306"
val username = "root"
val password = "######"
val dbtable = "test.customer"
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0067__en-us_topic_0190647826_li78219071918">Create a DataFrame with sample data and rename its columns.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen188211017190"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>// Columns built from tuples are named _1, _2, and _3 by default;
// rename them to match the columns of the RDS table.
val dataFrame_1 = sparkSession.createDataFrame(List((8, "Jack_1", 18)))
val df = dataFrame_1.withColumnRenamed("_1", "id")
  .withColumnRenamed("_2", "name")
  .withColumnRenamed("_3", "age")
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0067__en-us_topic_0190647826_li992661185815">Write data to RDS.<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen3727448564"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>df.write.format("jdbc")
  .option("url", url)
  .option("dbtable", dbtable)
  .option("user", username)
  .option("password", password)
  .option("driver", "com.mysql.jdbc.Driver")
  .mode(SaveMode.Append)
  .save()
</pre></div></td></tr></table></div>
</div>
<div class="p" id="dli_09_0067__en-us_topic_0190647826_p93754571258"><div class="note" id="dli_09_0067__en-us_topic_0190647826_note17397174817568"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0067__en-us_topic_0190647826_p039712487568">The value of <strong id="dli_09_0067__en-us_topic_0190647826_b138892399482">SaveMode</strong> can be one of the following:</p>
<ul id="dli_09_0067__en-us_topic_0190647826_ul16164620151513"><li id="dli_09_0067__en-us_topic_0190647826_li1416452015156"><strong id="dli_09_0067__b8543101214819">ErrorIfExists</strong>: If the data already exists, an exception is thrown.</li><li id="dli_09_0067__en-us_topic_0190647826_li191651720151518"><strong id="dli_09_0067__b115941915103">Overwrite</strong>: If the data already exists, it is overwritten.</li><li id="dli_09_0067__en-us_topic_0190647826_li10165620111513"><strong id="dli_09_0067__b47101296107">Append</strong>: If the data already exists, the new data is appended to it.</li><li id="dli_09_0067__en-us_topic_0190647826_li181651720161514"><strong id="dli_09_0067__b159191820108">Ignore</strong>: If the data already exists, the save operation does nothing and the existing data is left unchanged. This is similar to the SQL statement <strong id="dli_09_0067__en-us_topic_0190647826_b4767183155614">CREATE TABLE IF NOT EXISTS</strong>.</li></ul>
</div></div>
</div>
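<p>To make the four modes concrete, the following toy model applies them to an in-memory table, where <strong>None</strong> stands for a table that does not exist yet. It is a hypothetical illustration of the semantics only, not how Spark writes over JDBC; with the real API you pass <strong>SaveMode.Append</strong> and the other values to <strong>mode()</strong>, as in the write example above. If no mode is set, Spark uses <strong>ErrorIfExists</strong>.</p>

```scala
// Toy model of the four save modes applied to an in-memory "table".
// It illustrates the semantics only; it is not Spark's JDBC writer.
object SaveModeModel {
  sealed trait Mode
  case object ErrorIfExists extends Mode
  case object Overwrite extends Mode
  case object Append extends Mode
  case object Ignore extends Mode

  // existing = None means the target table does not exist yet.
  def save[A](existing: Option[Seq[A]], rows: Seq[A], mode: Mode): Seq[A] =
    (existing, mode) match {
      case (None, _)                => rows        // no table yet: every mode simply writes
      case (Some(_), ErrorIfExists) => throw new IllegalStateException("Table already exists")
      case (Some(_), Overwrite)     => rows        // old rows are replaced
      case (Some(old), Append)      => old ++ rows // new rows follow the old ones
      case (Some(old), Ignore)      => old         // the write is skipped
    }
}
```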
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li146260357585"><a name="dli_09_0067__en-us_topic_0190647826_li146260357585"></a><a name="en-us_topic_0190647826_li146260357585"></a>Read data from RDS.<ul id="dli_09_0067__en-us_topic_0190647826_ul14686182819334"><li id="dli_09_0067__en-us_topic_0190647826_li1668592818331">Method 1: read.format()<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen26851628183317"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">jdbcDF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"jdbc"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"url"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"dbtable"</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"driver"</span><span class="p">,</span><span class="w"> </span><span class="s">"com.mysql.jdbc.Driver"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">load</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li1368692883319">Method 2: read.jdbc()<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen1768617288339"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">properties</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">Properties</span><span class="p">()</span>
|
|
<span class="n">properties</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
|
|
<span class="n">properties</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">jdbcDF2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">jdbc</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">,</span><span class="w"> </span><span class="n">properties</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ul>
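<p>The read examples pass the password in a plain variable. One way to follow the advice in the Prerequisites note about not hard-coding credentials is to read the connection settings from environment variables. The sketch below is an assumption-based helper, not a DLI API: the variable names <strong>RDS_URL</strong>, <strong>RDS_TABLE</strong>, <strong>RDS_USER</strong>, and <strong>RDS_PASSWORD</strong> are hypothetical, the MySQL driver class is assumed, and decryption is not shown because it depends on how you store the encrypted password.</p>

```scala
// Hypothetical helper: builds JDBC options from environment variables
// instead of hard-coding them. The variable names are assumptions.
object RdsOptions {
  // (JDBC option name, environment variable that supplies it)
  private val required = Seq(
    "url"      -> "RDS_URL",
    "dbtable"  -> "RDS_TABLE",
    "user"     -> "RDS_USER",
    "password" -> "RDS_PASSWORD")

  // Returns the option map, or a message listing the missing variables.
  def fromEnv(env: Map[String, String] = sys.env): Either[String, Map[String, String]] = {
    val missing = required.collect { case (_, variable) if !env.contains(variable) => variable }
    if (missing.nonEmpty) Left(s"Missing environment variables: ${missing.mkString(", ")}")
    else {
      val opts = required.map { case (option, variable) => option -> env(variable) }.toMap
      Right(opts + ("driver" -> "com.mysql.jdbc.Driver")) // assumed MySQL driver
    }
  }
}
```

<p>The resulting map can be passed to the reader in one call, for example <strong>sparkSession.read.format("jdbc").options(opts).load()</strong>, so the password never appears in the source code.</p>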
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p9686122813314">Before data is inserted</p>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p166861128123319"><span><img id="dli_09_0067__en-us_topic_0190647826_image106867286336" src="en-us_image_0223997412.png"></span></p>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p16686028133314">After data is inserted</p>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p9686328153312"><span><img id="dli_09_0067__en-us_topic_0190647826_image968652813332" src="en-us_image_0223997413.png"></span></p>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p1068617283335">Register the DataFrame returned by the <strong id="dli_09_0067__en-us_topic_0190647826_b71001851145811">read.format()</strong> or <strong id="dli_09_0067__en-us_topic_0190647826_b14788115411580">read.jdbc()</strong> method as a temporary table. You can then query the data with SQL statements.</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen16861283332"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">"customer_test"</span><span class="p">)</span>
|
|
<span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from customer_test where id = 1"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p46861428173312">Query results</p>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p7686192816337"><span><img id="dli_09_0067__en-us_topic_0190647826_image136865282330" src="en-us_image_0223997414.png"></span></p>
|
|
</li></ol>
|
|
</li><li id="dli_09_0067__li16992162645816">DataFrame-related operations<p id="dli_09_0067__en-us_topic_0190647826_p256316251443"><a name="dli_09_0067__li16992162645816"></a><a name="li16992162645816"></a>The data created by the <strong id="dli_09_0067__b1393125372112">createDataFrame()</strong> method and the data read by the <strong id="dli_09_0067__b12931053172111">read.format()</strong> and <strong id="dli_09_0067__b18938537210">read.jdbc()</strong> methods are all DataFrame objects, so you can operate on them directly, for example to query individual records, without writing SQL. (In <a href="#dli_09_0067__en-us_topic_0190647826_li146260357585">4</a>, the DataFrame data is registered as a temporary table for SQL queries.)</p>
|
|
<ul id="dli_09_0067__en-us_topic_0190647826_ul42831734124912"><li id="dli_09_0067__en-us_topic_0190647826_li1283134174915">where<p id="dli_09_0067__en-us_topic_0190647826_p19687844194911"><a name="dli_09_0067__en-us_topic_0190647826_li1283134174915"></a><a name="en-us_topic_0190647826_li1283134174915"></a>The <strong id="dli_09_0067__en-us_topic_0190647826_b1412814502209">where</strong> method takes a filter expression, whose conditions can be combined with AND and OR, and returns a DataFrame that contains only the matching rows. The following example returns the records whose id is 1 or whose age is 10 or less:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen33171610519"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="s">"id = 1 or age <=10"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p23061540145118"><span><img id="dli_09_0067__en-us_topic_0190647826_image1537955013517" src="en-us_image_0223997415.png"></span></p>
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li1052710112528">filter<p id="dli_09_0067__en-us_topic_0190647826_p15820201214527"><a name="dli_09_0067__en-us_topic_0190647826_li1052710112528"></a><a name="en-us_topic_0190647826_li1052710112528"></a>The <strong id="dli_09_0067__b9719428122218">filter</strong> method works the same way as <strong id="dli_09_0067__b472622832215">where</strong> (in Spark, where is an alias for filter) and returns the filtered DataFrame. The following is an example:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen1430455175210"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">filter</span><span class="p">(</span><span class="s">"id = 1 or age <=10"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p555112219531"><span><img id="dli_09_0067__en-us_topic_0190647826_image12333183495310" src="en-us_image_0223997416.png"></span></p>
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li152231752155319">select<p id="dli_09_0067__en-us_topic_0190647826_p19347919125416"><a name="dli_09_0067__en-us_topic_0190647826_li152231752155319"></a><a name="en-us_topic_0190647826_li152231752155319"></a>The <strong id="dli_09_0067__en-us_topic_0190647826_b1263642213208">select</strong> method returns a DataFrame that contains only the specified fields. You can select one or more fields.</p>
|
|
<ul id="dli_09_0067__en-us_topic_0190647826_ul866719515557"><li id="dli_09_0067__en-us_topic_0190647826_li11622335518">Example 1:<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen720712280554"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p376801511586"><span><img id="dli_09_0067__en-us_topic_0190647826_image1521542615812" src="en-us_image_0223997417.png"></span></p>
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li282410115615">Example 2:<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen147057205560"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="s">"name"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p981913565812"><span><img id="dli_09_0067__en-us_topic_0190647826_image493144518581" src="en-us_image_0223997418.png"></span></p>
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li569594313568">Example 3:<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen884051035712"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="s">"name"</span><span class="p">).</span><span class="n">where</span><span class="p">(</span><span class="s">"id<4"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p969625418585"><span><img id="dli_09_0067__en-us_topic_0190647826_image1857104115916" src="en-us_image_0223997419.png"></span></p>
|
|
</li></ul>
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li1933094065919">selectExpr<p id="dli_09_0067__en-us_topic_0190647826_p133711121805"><a name="dli_09_0067__en-us_topic_0190647826_li1933094065919"></a><a name="en-us_topic_0190647826_li1933094065919"></a><strong id="dli_09_0067__en-us_topic_0190647826_b861935611198">selectExpr</strong> works like select but accepts SQL expressions, so you can transform fields while selecting them. For example, you can use <strong id="dli_09_0067__en-us_topic_0190647826_b8476101616102">selectExpr</strong> to rename a field or to compute a new value from it.</p>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p66334116219">To rename the <strong id="dli_09_0067__en-us_topic_0190647826_b14966247171319">name</strong> field to <strong id="dli_09_0067__en-us_topic_0190647826_b16389133911139">name_test</strong> and add 1 to the value of <strong id="dli_09_0067__en-us_topic_0190647826_b17692115281114">age</strong>, run the following statement. Because the computed field has no alias, Spark derives its column name from the expression itself.</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen2312105913417"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="s">"name as name_test"</span><span class="p">,</span><span class="w"> </span><span class="s">"age+1"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li1249223720518">col<p id="dli_09_0067__en-us_topic_0190647826_p119341053157"><a name="dli_09_0067__en-us_topic_0190647826_li1249223720518"></a><a name="en-us_topic_0190647826_li1249223720518"></a><strong id="dli_09_0067__en-us_topic_0190647826_b972014255240">col</strong> returns a single specified field as a Column object. Unlike <strong id="dli_09_0067__en-us_topic_0190647826_b1222513142415">select</strong>, which returns a DataFrame, <strong id="dli_09_0067__en-us_topic_0190647826_b96575321245">col</strong> returns one column at a time, which you can then use in expressions such as filter conditions or computed fields. The following is an example:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen5117162121019"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">idCol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">col</span><span class="p">(</span><span class="s">"id"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0067__en-us_topic_0190647826_li1743853613133">drop<p id="dli_09_0067__en-us_topic_0190647826_p20754345201313"><a name="dli_09_0067__en-us_topic_0190647826_li1743853613133"></a><a name="en-us_topic_0190647826_li1743853613133"></a><strong id="dli_09_0067__en-us_topic_0190647826_b5773172213286">drop</strong> removes the specified field and returns a DataFrame that does not contain it. In Spark 2.0 and later, you can also pass several field names to remove them in one call. The following example removes the id field:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647826_screen174231152181411"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">"id"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0067__en-us_topic_0190647826_p41511136156"><span><img id="dli_09_0067__en-us_topic_0190647826_image4709299159" src="en-us_image_0223997420.png"></span></p>
|
|
</li></ul>
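<p>As an alternative to SQL expression strings, the operations above can also be written with Column objects. The following sketch runs on a local SparkSession with made-up sample rows rather than an RDS table; the object name, the function names, and the <strong>age_plus_one</strong> alias are illustrative only.</p>

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Illustrative sketch: Column-based versions of the string-based DataFrame operations.
object DataFrameOpsSketch {
  // Same predicate as where("id = 1 or age <= 10").
  def idOneOrAtMostTen(df: DataFrame): DataFrame =
    df.where(col("id") === 1 || col("age") <= 10)

  // Equivalent to selectExpr("id", "name as name_test", "age + 1 as age_plus_one").
  def renamedAndAged(df: DataFrame): DataFrame =
    df.select(df.col("id"), df.col("name").as("name_test"), (df.col("age") + 1).as("age_plus_one"))

  // drop accepts several column names at once (Spark 2.0 and later).
  def withoutIdAndAge(df: DataFrame): DataFrame =
    df.drop("id", "age")

  def main(args: Array[String]): Unit = {
    // Local session and made-up rows, so the sketch runs without an RDS instance.
    val spark = SparkSession.builder().master("local[1]").appName("dataframe-ops-sketch").getOrCreate()
    import spark.implicits._
    val df = Seq((1, "John", 24), (2, "Bob", 32), (3, "Amy", 9)).toDF("id", "name", "age")
    idOneOrAtMostTen(df).show()
    withoutIdAndAge(df).show()
    renamedAndAged(df).show()
    spark.stop()
  }
}
```

<p>In the actual job, you would call these functions on <strong>jdbcDF</strong> instead of the sample DataFrame.</p>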
|
|
</li><li id="dli_09_0067__li38961079590">Submitting a Spark job<ol id="dli_09_0067__ol1457716359012"><li id="dli_09_0067__li1692416144334">Generate a JAR package based on the code and upload the package to DLI.<p id="dli_09_0067__dli_09_0063_p1749619513385"><a name="dli_09_0067__li1692416144334"></a><a name="li1692416144334"></a></p>
|
|
<p id="dli_09_0067__dli_09_0063_p114961151385"></p>
|
|
</li><li id="dli_09_0067__li188823513258">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<p id="dli_09_0067__p1042917566256"><a name="dli_09_0067__li188823513258"></a><a name="li188823513258"></a></p>
|
|
<div class="p" id="dli_09_0067__p9294453142516"><div class="note" id="dli_09_0067__en-us_topic_0190647826_note1435543551919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0067__en-us_topic_0190647826_ul17825285811"><li id="dli_09_0067__en-us_topic_0197738142_li58215295819">If the Spark version is 2.3.2 (to be taken offline soon) or 2.4.5, set <strong id="dli_09_0067__b204191638593">Module</strong> to <strong id="dli_09_0067__b14419203145917">sys.datasource.rds</strong> when you submit the job.</li><li id="dli_09_0067__li6624653171317">If the Spark version is 3.1.1, you do not need to select a module. Instead, configure the following <strong id="dli_09_0067__b7800498595">Spark parameters (--conf)</strong>:<p id="dli_09_0067__p131071216162710">spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/rds/*</p>
|
|
<p id="dli_09_0067__p210761614275">spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/rds/*</p>
|
|
</li></ul>
|
|
</div></div>
|
|
</div>
|
|
</li></ol>
|
|
</li></ul>
|
|
</div>
|
|
<div class="section" id="dli_09_0067__section1337013191117"><h4 class="sectiontitle">Complete Example Code</h4><ul id="dli_09_0067__ul962710319114"><li id="dli_09_0067__li16627123115110">Maven dependency<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647825_screen1963733125215"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="o"><</span><span class="n">dependency</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">groupId</span><span class="o">></span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o"></</span><span class="n">groupId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span>
|
|
<span class="w"> </span><span class="o"><</span><span class="n">version</span><span class="o">></span><span class="mf">2.3.2</span><span class="o"></</span><span class="n">version</span><span class="o">></span>
|
|
<span class="o"></</span><span class="n">dependency</span><span class="o">></span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0067__li746475512120">Connecting to data sources through SQL APIs<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647825_screen172412256524"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span>
|
|
<span class="normal">24</span>
|
|
<span class="normal">25</span>
|
|
<span class="normal">26</span>
|
|
<span class="normal">27</span>
|
|
<span class="normal">28</span>
|
|
<span class="normal">29</span>
|
|
<span class="normal">30</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">java</span><span class="p">.</span><span class="nn">util</span><span class="p">.</span><span class="nc">Properties</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SparkSession</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SQL_RDS</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">// Create a DLI table associated with the RDS table test.customer</span>
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"""CREATE TABLE IF NOT EXISTS dli_to_rds USING JDBC OPTIONS (</span>
<span class="s"> 'url'='jdbc:mysql://to-rds-1174404209-cA37siB6.datasource.com:3306',</span>
<span class="s"> 'dbtable'='test.customer',</span>
<span class="s"> 'user'='root',</span>
<span class="s"> 'password'='######',</span>
<span class="s"> 'driver'='com.mysql.jdbc.Driver')"""</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************SQL model***********************************</span>
|
|
<span class="w"> </span><span class="c1">//Insert data into the DLI data table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into dli_to_rds values(1,'John',24),(2,'Bob',32)"</span><span class="p">)</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">//Read data from DLI data table</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from dli_to_rds"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">dataFrame</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">//drop table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"drop table dli_to_rds"</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="p">}</span>
|
|
<span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0067__li129658301628">Connecting to data sources through DataFrame APIs<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647825_screen133971647936"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span>
|
|
<span class="normal">24</span>
|
|
<span class="normal">25</span>
|
|
<span class="normal">26</span>
|
|
<span class="normal">27</span>
|
|
<span class="normal">28</span>
|
|
<span class="normal">29</span>
|
|
<span class="normal">30</span>
|
|
<span class="normal">31</span>
|
|
<span class="normal">32</span>
|
|
<span class="normal">33</span>
|
|
<span class="normal">34</span>
|
|
<span class="normal">35</span>
|
|
<span class="normal">36</span>
|
|
<span class="normal">37</span>
|
|
<span class="normal">38</span>
|
|
<span class="normal">39</span>
|
|
<span class="normal">40</span>
|
|
<span class="normal">41</span>
|
|
<span class="normal">42</span>
|
|
<span class="normal">43</span>
|
|
<span class="normal">44</span>
|
|
<span class="normal">45</span>
|
|
<span class="normal">46</span>
|
|
<span class="normal">47</span>
|
|
<span class="normal">48</span>
|
|
<span class="normal">49</span>
|
|
<span class="normal">50</span>
|
|
<span class="normal">51</span>
|
|
<span class="normal">52</span>
|
|
<span class="normal">53</span>
|
|
<span class="normal">54</span>
|
|
<span class="normal">55</span>
|
|
<span class="normal">56</span>
|
|
<span class="normal">57</span>
|
|
<span class="normal">58</span>
|
|
<span class="normal">59</span>
|
|
<span class="normal">60</span>
|
|
<span class="normal">61</span>
|
|
<span class="normal">62</span>
|
|
<span class="normal">63</span>
|
|
<span class="normal">64</span>
|
|
<span class="normal">65</span>
|
|
<span class="normal">66</span>
|
|
<span class="normal">67</span>
|
|
<span class="normal">68</span>
|
|
<span class="normal">69</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">java</span><span class="p">.</span><span class="nn">util</span><span class="p">.</span><span class="nc">Properties</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SparkSession</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SaveMode</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SQL_RDS</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************DataFrame model***********************************</span>
|
|
<span class="w"> </span><span class="c1">// Set the connection parameters: url, username, password, and dbtable.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"jdbc:mysql://to-rds-1174404209-cA37siB6.datasource.com:3306"</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"root"</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"######"</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dbtable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"test.customer"</span>
|
|
|
|
<span class="w"> </span><span class="c1">// Create a DataFrame and initialize the DataFrame data.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="nc">List</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"Jack"</span><span class="p">,</span><span class="w"> </span><span class="mi">18</span><span class="p">)))</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">// Rename the default column names (_1, _2, _3) generated by createDataFrame().</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_1"</span><span class="p">,</span><span class="w"> </span><span class="s">"id"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_2"</span><span class="p">,</span><span class="w"> </span><span class="s">"name"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_3"</span><span class="p">,</span><span class="w"> </span><span class="s">"age"</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">// Write the data to the RDS table specified by dbtable (test.customer)</span>
|
|
<span class="w"> </span><span class="n">df</span><span class="p">.</span><span class="n">write</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"jdbc"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"url"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"dbtable"</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"driver"</span><span class="p">,</span><span class="w"> </span><span class="s">"com.mysql.jdbc.Driver"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="nc">SaveMode</span><span class="p">.</span><span class="nc">Append</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">save</span><span class="p">()</span>

<span class="w"> </span><span class="c1">// Manipulate the data with DataFrame methods.</span>
<span class="w"> </span><span class="c1">// Filter out the user whose id is 1 (keep only rows where id != 1).</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">newDF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="p">.</span><span class="n">filter</span><span class="p">(</span><span class="s">"id!=1"</span><span class="p">)</span>
<span class="w"> </span><span class="n">newDF</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="w"> </span>
<span class="w"> </span><span class="c1">// Drop the id column.</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">newDF_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="n">newDF_1</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>

<span class="w"> </span><span class="c1">// Read the table data from the RDS database.</span>
<span class="w"> </span><span class="c1">// Method 1: Read data from RDS using read.format("jdbc").</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">jdbcDF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"jdbc"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"url"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"dbtable"</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"driver"</span><span class="p">,</span><span class="w"> </span><span class="s">"com.mysql.jdbc.Driver"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">load</span><span class="p">()</span>
<span class="w"> </span><span class="c1">// Method 2: Read data from RDS using read.jdbc().</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">properties</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">Properties</span><span class="p">()</span>
<span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
<span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">jdbcDF2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">jdbc</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">,</span><span class="w"> </span><span class="n">properties</span><span class="p">)</span>

<span class="w"> </span><span class="cm">/**</span>
<span class="cm"> * Register the DataFrame read by read.format() or read.jdbc() as a temporary view, and query it</span>
<span class="cm"> * with an SQL statement.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">"customer_test"</span><span class="p">)</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from customer_test where id = 1"</span><span class="p">)</span>
<span class="w"> </span><span class="n">result</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>

<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0067__li19214041634">DataFrame-related operations<div class="codecoloring" codetype="Scala" id="dli_09_0067__en-us_topic_0190647825_screen12918266556"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="w"> </span><span class="c1">// The where() method filters rows with conditions combined by "and" and "or", and returns the filtered DataFrame.</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="s">"id = 1 or age &lt;= 10"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>

<span class="w"> </span><span class="c1">// The filter() method is used in the same way as the where() method.</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">filter</span><span class="p">(</span><span class="s">"id = 1 or age &lt;= 10"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>

<span class="w"> </span><span class="c1">// The select() method accepts one or more column names and returns a DataFrame that contains only those columns.</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="s">"name"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="s">"name"</span><span class="p">).</span><span class="n">where</span><span class="p">(</span><span class="s">"id &lt; 4"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>

<span class="w"> </span><span class="cm">/**</span>
<span class="cm"> * The selectExpr() method accepts SQL expressions, so fields can be renamed or</span>
<span class="cm"> * transformed, for example by incrementing their values.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="s">"name as name_test"</span><span class="p">,</span><span class="w"> </span><span class="s">"age+1"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>

<span class="w"> </span><span class="c1">// The col() method returns a single specified field as a Column object.</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">idCol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">col</span><span class="p">(</span><span class="s">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="n">idCol</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>

<span class="w"> </span><span class="cm">/**</span>
<span class="cm"> * The drop() method returns a new DataFrame without the specified fields. Since Spark 2.0,</span>
<span class="cm"> * it accepts one or more column names.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">"id"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="s">"name"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li></ul>
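<p>To check the temporary-view query and the filter() and drop() calls from the example above without an RDS instance, the same logic can be run on a few in-memory rows. This is a minimal sketch rather than part of the DLI sample: it assumes Spark SQL 2.x or later is on the classpath, runs in local mode (<code>master("local[*]")</code>) purely for experimentation instead of on a DLI queue, and uses hypothetical names (<code>TempViewSketch</code>, <code>sampleData</code>) and made-up sample rows.</p>

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TempViewSketch {
  // Hypothetical in-memory rows standing in for the data read from RDS.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, "Jack", 18), (2, "Lucy", 25)).toDF("id", "name", "age")
  }

  def main(args: Array[String]): Unit = {
    // Local mode is for trying the logic on a workstation only; it is not a DLI setting.
    val spark = SparkSession.builder()
      .appName("TempViewSketch")
      .master("local[*]")
      .getOrCreate()

    val df = sampleData(spark)

    // Register the DataFrame as a temporary view and query it with SQL.
    df.createOrReplaceTempView("customer_test")
    spark.sql("select * from customer_test where id = 1").show()

    // filter("id != 1") keeps every row except the one whose id is 1.
    df.filter("id != 1").show()

    // drop("id") returns a new DataFrame without the id column.
    df.drop("id").show()

    spark.stop()
  }
}
```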
</div>
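<p>The DataFrame operations listed above can also be tried on in-memory data before connecting to RDS. The sketch below is a hedged illustration, assuming Spark SQL 2.x or later in local mode; <code>DataFrameOpsSketch</code> and its sample rows are hypothetical stand-ins for <code>jdbcDF</code> that reuse the column names <code>id</code>, <code>name</code>, and <code>age</code>.</p>

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameOpsSketch {
  // Hypothetical in-memory rows standing in for jdbcDF (same column names).
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, "Jack", 18), (2, "Lucy", 9), (5, "Tom", 30)).toDF("id", "name", "age")
  }

  def main(args: Array[String]): Unit = {
    // Local mode for experimentation only; it is not a DLI queue setting.
    val spark = SparkSession.builder()
      .appName("DataFrameOpsSketch")
      .master("local[*]")
      .getOrCreate()
    val jdbcDF = sampleData(spark)

    // where() and filter() accept the same condition string;
    // both keep Jack (id = 1) and Lucy (age <= 10).
    jdbcDF.where("id = 1 or age <= 10").show()
    jdbcDF.filter("id = 1 or age <= 10").show()

    // select() followed by where(): only ids below 4 remain (Jack and Lucy).
    jdbcDF.select("id", "name").where("id < 4").show()

    // selectExpr() renames name and derives a new column from age in one step.
    jdbcDF.selectExpr("id", "name as name_test", "age + 1 as age_next").show()

    // drop() with several column names removes all of them; only age remains.
    jdbcDF.drop("id", "name").show()

    spark.stop()
  }
}
```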
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0083.html">Connecting to RDS</a></div>
</div>
</div>