Reviewed-by: Rechenburg, Matthias <matthias.rechenburg@t-systems.com> Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com> Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
<a name="dli_09_0171"></a>
<h1 class="topictitle1">Calling UDFs in Spark SQL Jobs</h1>
<div id="body8662426"><div class="section" id="dli_09_0171__en-us_topic_0206789796_section20910549205110"><h4 class="sectiontitle">Scenario</h4><p id="dli_09_0171__en-us_topic_0206789796_p1383510563517">DLI allows you to query data with Hive user-defined functions (UDFs). A UDF operates on one row of data at a time, taking the values of that row as input and returning a single value, which suits single-record operations such as inserting or deleting one data record.</p>
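<p>The row-at-a-time behavior described above can be pictured with a small, JDK-only sketch. The class <strong>RowWiseSketch</strong> and its <strong>applyToRows</strong> helper are hypothetical names used for illustration; they are not part of DLI or Hive, and the loop only imitates how a query engine might call a scalar function once per row.</p>

```java
import java.util.Arrays;

// Hypothetical illustration only: not a DLI or Hive API.
class RowWiseSketch {
    // One call handles one row: two column values in, one value out.
    static int evaluate(int a, int b) {
        return a + b;
    }

    // Imitates an engine invoking the function once for every row.
    static int[] applyToRows(int[][] rows) {
        int[] results = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            results[i] = evaluate(rows[i][0], rows[i][1]);
        }
        return results;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 2}, {10, 20} };
        // Each row produces exactly one result; rows never see each other.
        System.out.println(Arrays.toString(applyToRows(rows)));
    }
}
```

<p>Because each call sees only the values of its own row, a function of this kind cannot compute totals or other aggregates across rows.</p>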
</div>
<div class="section" id="dli_09_0171__section52621635112515"><h4 class="sectiontitle">Constraints</h4><ul id="dli_09_0171__ul18355349102515"><li id="dli_09_0171__li717014702012">To perform UDF-related operations on DLI, you must create a SQL queue. The default queue cannot be used.</li><li id="dli_09_0171__li123559497259">When a UDF is used across accounts, every user other than its creator must be granted permission before using it. To grant permission:<p id="dli_09_0171__p6355114911251"><a name="dli_09_0171__li123559497259"></a><a name="li123559497259"></a>Log in to the DLI console and choose <strong id="dli_09_0171__b1945844205818">Data Management</strong> > <strong id="dli_09_0171__b125761168586">Package Management</strong>. On the displayed page, select your UDF Jar package and click <strong id="dli_09_0171__b26015395016">Manage Permissions</strong> in the <strong id="dli_09_0171__b446210452596">Operation</strong> column. On the permission management page, click <strong id="dli_09_0171__b3772335216">Grant Permission</strong> in the upper right corner and select the required permissions.</p>
</li><li id="dli_09_0171__li1043012581321">If you use a static class or interface in a UDF, wrap the code that uses it in a <strong id="dli_09_0171__b191531822915">try...catch</strong> block to capture exceptions. Otherwise, package conflicts may occur.</li></ul>
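<p>The last constraint can be sketched as follows. This is a minimal illustration under stated assumptions, not the required implementation: <strong>SafeStaticUdfSketch</strong> and its nested <strong>Parser</strong> helper are hypothetical names, the parent class <strong>org.apache.hadoop.hive.ql.exec.UDF</strong> is omitted so that the code compiles with the JDK alone, and returning <strong>null</strong> on failure is only one possible way to handle the exception.</p>

```java
// Sketch only: a real UDF class would be public and extend
// org.apache.hadoop.hive.ql.exec.UDF (from the hive-exec dependency).
class SafeStaticUdfSketch {

    // Hypothetical static helper standing in for any static class or
    // utility that the UDF depends on.
    static final class Parser {
        static int parse(String text) {
            return Integer.parseInt(text.trim());
        }
    }

    // Wraps the static call so that a failure inside the helper is
    // captured here instead of escaping from the UDF.
    public Integer evaluate(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return Parser.parse(raw);
        } catch (RuntimeException | LinkageError e) {
            // Assumption: yielding null is acceptable for unparsable input
            // or for a helper class that failed to load.
            return null;
        }
    }

    public static void main(String[] args) {
        SafeStaticUdfSketch udf = new SafeStaticUdfSketch();
        System.out.println(udf.evaluate("42"));
        System.out.println(udf.evaluate("not a number"));
    }
}
```

<p>Catching <strong>LinkageError</strong> alongside runtime exceptions is a design choice made for this sketch: class-loading problems surface as errors rather than exceptions, so a plain <strong>catch (Exception e)</strong> would not capture them.</p>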
</div>
<div class="section" id="dli_09_0171__section199842111628"><h4 class="sectiontitle">Environment Preparations</h4><p id="dli_09_0171__p8202163717211">Before you start, set up the development environment.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0171__table15851625229" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Development environment</caption><thead align="left"><tr id="dli_09_0171__row11859253210"><th align="left" class="cellrowborder" valign="top" width="27.63%" id="mcps1.3.3.3.2.3.1.1"><p id="dli_09_0171__p9852251528">Item</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="72.37%" id="mcps1.3.3.3.2.3.1.2"><p id="dli_09_0171__p8851725529">Description</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="dli_09_0171__row78519251429"><td class="cellrowborder" valign="top" width="27.63%" headers="mcps1.3.3.3.2.3.1.1 "><p id="dli_09_0171__p108522517216">OS</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="72.37%" headers="mcps1.3.3.3.2.3.1.2 "><p id="dli_09_0171__p20851825626">Windows 7 or later</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0171__row18851325325"><td class="cellrowborder" valign="top" width="27.63%" headers="mcps1.3.3.3.2.3.1.1 "><p id="dli_09_0171__p1885825624">JDK</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="72.37%" headers="mcps1.3.3.3.2.3.1.2 "><p id="dli_09_0171__p8859251424">JDK 1.8</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0171__row24601502619"><td class="cellrowborder" valign="top" width="27.63%" headers="mcps1.3.3.3.2.3.1.1 "><p id="dli_09_0171__p16497910469">IntelliJ IDEA</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="72.37%" headers="mcps1.3.3.3.2.3.1.2 "><p id="dli_09_0171__p84601601562">This tool is used for application development. Use version 2019.1 or another compatible version.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0171__row53111251665"><td class="cellrowborder" valign="top" width="27.63%" headers="mcps1.3.3.3.2.3.1.1 "><p id="dli_09_0171__p831117511968">Maven</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="72.37%" headers="mcps1.3.3.3.2.3.1.2 "><p id="dli_09_0171__p23118511064">Maven is part of the basic development environment configuration. It is used for project management throughout the software development lifecycle.</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</div>
|
|
<div class="section" id="dli_09_0171__section54791739112210"><h4 class="sectiontitle">Development Process</h4><div class="p" id="dli_09_0171__p892144112221">The process of developing a UDF is as follows:<div class="fignone" id="dli_09_0171__fig17431474448"><span class="figcap"><b>Figure 1 </b>Development process</span><br><span><img id="dli_09_0171__image37435714442" src="en-us_image_0000001200327862.png"></span></div>
|
|
|
|
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0171__table1421119391677" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Process description</caption><thead align="left"><tr id="dli_09_0171__row11211153918715"><th align="left" class="cellrowborder" valign="top" width="6.830601092896176%" id="mcps1.3.4.2.2.2.5.1.1"><p id="dli_09_0171__p11573151398">No.</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="23.936377829820454%" id="mcps1.3.4.2.2.2.5.1.2"><p id="dli_09_0171__p8211239475">Phase</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="10.685011709601874%" id="mcps1.3.4.2.2.2.5.1.3"><p id="dli_09_0171__p167011419911">Software Portal</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="58.548009367681495%" id="mcps1.3.4.2.2.2.5.1.4"><p id="dli_09_0171__p1921103911712">Description</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="dli_09_0171__row102114391879"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0171__p65761516918">1</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="23.936377829820454%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0171__p4211133911710">Create a Maven project and configure the POM file.</p>
|
|
</td>
|
|
<td class="cellrowborder" rowspan="3" valign="top" width="10.685011709601874%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0171__p81691210101">IntelliJ IDEA</p>
|
|
</td>
|
|
<td class="cellrowborder" rowspan="3" valign="top" width="58.548009367681495%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0171__p321103914719"></p>
|
|
<p id="dli_09_0171__p152111391671">Write UDF code by referring to the steps in <a href="#dli_09_0171__en-us_topic_0206789796_section164701187527">Procedure</a>.</p>
|
|
<p id="dli_09_0171__p694692512124"></p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0171__row1211123914712"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0171__p55731512916">2</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0171__p16211739576">Write UDF code.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0171__row79452250121"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0171__p79461255124">3</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0171__p10946172551215">Debug, compile, and pack the code into a Jar package.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0171__row86521956191210"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0171__p7652456101218">4</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="23.936377829820454%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0171__p10652185691214">Upload the Jar package to OBS.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="10.685011709601874%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0171__p565211562128">OBS console</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="58.548009367681495%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0171__p1165216565129">Upload the UDF Jar file to an OBS directory.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0171__row18133049101414"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0171__p1513384931416">5</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="23.936377829820454%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0171__p17133194971413">Create the UDF on DLI.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="10.685011709601874%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0171__p11133449181419">DLI console</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="58.548009367681495%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0171__p107651124156">Create a UDF on the SQL job management page of the DLI console.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0171__row9403719162"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0171__p134035191618">6</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="23.936377829820454%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0171__p114038181618">Verify and use the UDF on DLI.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="10.685011709601874%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0171__p184101541614">DLI console</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="58.548009367681495%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0171__p17403415169">Use the UDF in your DLI job.</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
<div class="section" id="dli_09_0171__en-us_topic_0206789796_section164701187527"><a name="dli_09_0171__en-us_topic_0206789796_section164701187527"></a><a name="en-us_topic_0206789796_section164701187527"></a><h4 class="sectiontitle">Procedure</h4><ol id="dli_09_0171__en-us_topic_0206789796_ol1580116925614"><li id="dli_09_0171__li1539717934515">Create a Maven project and configure the POM file. This step uses IntelliJ IDEA 2020.2 as an example.<ol type="a" id="dli_09_0171__ol2397109104513"><li id="dli_09_0171__li1039714915454">Start IntelliJ IDEA and choose <strong id="dli_09_0171__b4824667037400">File</strong> > <strong id="dli_09_0171__b19079767327400">New</strong> > <strong id="dli_09_0171__b5187221767400">Project</strong>.<div class="fignone" id="dli_09_0171__fig975857114919"><span class="figcap"><b>Figure 2 </b>Creating a project</span><br><span><img id="dli_09_0171__image9757576496" src="en-us_image_0000001245448995.png"></span></div>
|
|
</li><li id="dli_09_0171__li13857332152816">Choose <strong id="dli_09_0171__b15951273487400">Maven</strong>, set <strong id="dli_09_0171__b14194763407400">Project SDK</strong> to <strong id="dli_09_0171__b19244204717400">1.8</strong>, and click <strong id="dli_09_0171__b3437524177400">Next</strong>.<p id="dli_09_0171__p18264474285"><span><img id="dli_09_0171__image1026247102814" src="en-us_image_0000001245660555.png"></span></p>
|
|
</li><li id="dli_09_0171__li1974116643214">Set the project name, configure the storage path, and click <strong id="dli_09_0171__b11843596227400">Finish</strong>.<p id="dli_09_0171__p1740793545412"><span><img id="dli_09_0171__image164070353549" src="en-us_image_0000001245649477.png"></span></p>
|
|
</li><li id="dli_09_0171__li56201025357">Add the following content to the <strong id="dli_09_0171__b1548638417400">pom.xml</strong> file.<pre class="screen" id="dli_09_0171__screen175551817363">&lt;dependencies&gt;
    &lt;dependency&gt;
        &lt;groupId&gt;org.apache.hive&lt;/groupId&gt;
        &lt;artifactId&gt;hive-exec&lt;/artifactId&gt;
        &lt;version&gt;1.2.1&lt;/version&gt;
    &lt;/dependency&gt;
&lt;/dependencies&gt;</pre>
|
|
<div class="p" id="dli_09_0171__p1141017953718"><div class="fignone" id="dli_09_0171__fig1638171715711"><span class="figcap"><b>Figure 3 </b>Adding configurations to the POM file</span><br><span><img id="dli_09_0171__image1863881720572" src="en-us_image_0000001200329970.png"></span></div>
|
|
</div>
|
|
</li><li id="dli_09_0171__li532734873814">Choose <strong id="dli_09_0171__b10555136297400">src</strong> > <strong id="dli_09_0171__b8331314267400">main</strong> and right-click the <strong id="dli_09_0171__b6319848137400">java</strong> folder. Choose <strong id="dli_09_0171__b13611449777400">New</strong> > <strong id="dli_09_0171__b5678232607400">Package</strong> to create a package and a class file.<p id="dli_09_0171__p1024862145915"><span><img id="dli_09_0171__image82487235912" src="en-us_image_0000001245651049.png"></span></p>
|
|
<p id="dli_09_0171__p242315145436">Set the package name as you need. Then, press <strong id="dli_09_0171__b168991141173416">Enter</strong>.</p>
|
|
<p id="dli_09_0171__p20373144612015"><span><img id="dli_09_0171__image7373114619012" src="en-us_image_0000001245291807.png"></span></p>
|
|
<p id="dli_09_0171__p14790156134412">Create a Java Class file in the package path. In this example, the Java Class file is <strong id="dli_09_0171__b13426181461016">SumUdfDemo</strong>.</p>
|
|
<p id="dli_09_0171__p2295123810319"><span><img id="dli_09_0171__image14295153813318" src="en-us_image_0000001245532035.png"></span></p>
|
|
</li></ol>
|
|
</li><li id="dli_09_0171__li162811928115118">Write UDF code.<ol type="a" id="dli_09_0171__ol49451957145416"><li id="dli_09_0171__li1380025315545">The UDF class must extend <strong id="dli_09_0171__b188431911101115">org.apache.hadoop.hive.ql.exec.UDF</strong>.</li><li id="dli_09_0171__li613815335583">You must implement the <strong id="dli_09_0171__b10453611141217">evaluate</strong> method, which can be overloaded.</li></ol>
|
|
<p id="dli_09_0171__p9378145616810">For details about how to implement the UDF, see the following sample code:</p>
|
|
<pre class="screen" id="dli_09_0171__screen3313641814">package com.demo;

import org.apache.hadoop.hive.ql.exec.UDF;

// A Hive UDF that adds two integers.
public class SumUdfDemo extends UDF {
    // Called once for each input row.
    public int evaluate(int a, int b) {
        return a + b;
    }
}</pre>
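<p>Because <strong>evaluate</strong> can be overloaded, one UDF class may offer several signatures. The following is a minimal sketch of that idea, assuming a second overload for <strong>double</strong> arguments. The class name <strong>SumUdfOverloadSketch</strong> is hypothetical, and the parent class <strong>org.apache.hadoop.hive.ql.exec.UDF</strong> is omitted so that the sketch compiles with the JDK alone. In the actual project, the class would be public and extend <strong>UDF</strong> as in the sample above.</p>

```java
// Sketch only: in the real project this class would be public and extend
// org.apache.hadoop.hive.ql.exec.UDF.
class SumUdfOverloadSketch {

    // Overload intended for two integer arguments.
    public int evaluate(int a, int b) {
        return a + b;
    }

    // Overload intended for two floating-point arguments.
    public double evaluate(double a, double b) {
        return a + b;
    }

    // Quick local check of both overloads before the code is packaged.
    public static void main(String[] args) {
        SumUdfOverloadSketch udf = new SumUdfOverloadSketch();
        System.out.println(udf.evaluate(1, 2));      // int overload
        System.out.println(udf.evaluate(1.5, 2.25)); // double overload
    }
}
```

<p>Running a small <strong>main</strong> method like this in IntelliJ IDEA is one way to debug the logic locally before building the JAR file in the next step.</p>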
|
|
</li><li id="dli_09_0171__li12693471628">Use IntelliJ IDEA to compile the code and package it into a JAR file.<ol type="a" id="dli_09_0171__ol3387191918248"><li id="dli_09_0171__li469121413243">Click <strong id="dli_09_0171__b1042918500165">Maven</strong> in the toolbar on the right, then click <strong id="dli_09_0171__b742915091616">clean</strong> and <strong id="dli_09_0171__b10430450101614">compile</strong> to compile the code.<p id="dli_09_0171__p1190194252510">After the code compiles successfully, click <strong id="dli_09_0171__b422845371617">package</strong>.</p>
|
|
<p id="dli_09_0171__p23005544140"><span><img id="dli_09_0171__image17300454131412" src="en-us_image_0000001245535709.png"></span></p>
|
|
<p id="dli_09_0171__p7583182817281">The generated JAR package is stored in the <strong id="dli_09_0171__b19216269287400">target</strong> directory. In this example, <strong id="dli_09_0171__b1122019171713">MyUDF-1.0-SNAPSHOT.jar</strong> is stored in <strong id="dli_09_0171__b82211611172">D:\DLITest\MyUDF\target</strong>.</p>
|
|
<p id="dli_09_0171__p1976514307172"><span><img id="dli_09_0171__image13765143041714" src="en-us_image_0000001245456049.png"></span></p>
|
|
</li></ol>
|
|
</li><li id="dli_09_0171__li124681258143211">Log in to the OBS console and upload the file to the OBS path.<div class="note" id="dli_09_0171__note1529413575330"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0171__p429435710335">The region of the OBS bucket to which the Jar package is uploaded must be the same as the region of the DLI queue. Cross-region operations are not allowed.</p>
|
|
</div></div>
|
|
</li><li id="dli_09_0171__li125643214189">(Optional) Upload the file to DLI for package management.<ol type="a" id="dli_09_0171__ol756103218180"><li id="dli_09_0171__li1556183241810">Log in to the DLI management console and choose <strong id="dli_09_0171__b15206259185">Data Management</strong> > <strong id="dli_09_0171__b1820192519188">Package Management</strong>.</li><li id="dli_09_0171__li8565328181">On the <strong id="dli_09_0171__b1327773161815">Package Management</strong> page, click <strong id="dli_09_0171__b112779315182">Create</strong> in the upper right corner.</li><li id="dli_09_0171__li8561032181818">In the <strong id="dli_09_0171__b16428203551819">Create Package</strong> dialog, set the following parameters:<ol class="substepthirdol" id="dli_09_0171__ol7567320189"><li id="dli_09_0171__li55633241816"><strong id="dli_09_0171__b5650986537400">Type</strong>: Select <strong id="dli_09_0171__b13471127557400">JAR</strong>.</li><li id="dli_09_0171__li185693281815"><strong id="dli_09_0171__b1840105837400">OBS Path</strong>: Specify the OBS path for storing the package.</li><li id="dli_09_0171__li135663221819">Set <strong id="dli_09_0171__b19903433567400">Group</strong> and <strong id="dli_09_0171__b8883014797400">Group Name</strong> as required for package identification and management.</li></ol>
|
|
</li><li id="dli_09_0171__li756193221817">Click <strong id="dli_09_0171__b14470606207400">OK</strong>.</li></ol>
|
|
</li><li id="dli_09_0171__en-us_topic_0206789796_li9516133616203"><a name="dli_09_0171__en-us_topic_0206789796_li9516133616203"></a><a name="en-us_topic_0206789796_li9516133616203"></a>Create the UDF on DLI.<ol type="a" id="dli_09_0171__ol10758142517377"><li id="dli_09_0171__li128611920173710">Log in to the DLI console and choose <strong id="dli_09_0171__b41673201916">SQL Editor</strong>. Set <strong id="dli_09_0171__b1553414236199">Engine</strong> to <strong id="dli_09_0171__b185021826161920">spark</strong>, and select the created SQL queue and database.</li><li id="dli_09_0171__li114046144815">In the SQL editing area, enter the following statement to create a UDF and click <strong id="dli_09_0171__b3191918112020">Execute</strong>.<pre class="screen" id="dli_09_0171__screen9585108193711">CREATE FUNCTION TestSumUDF AS 'com.demo.SumUdfDemo' USING JAR 'obs://dli-test-obs01/MyUDF-1.0-SNAPSHOT.jar';</pre>
|
|
</li></ol>
|
|
</li><li id="dli_09_0171__li1547203712127">Restart the SQL queue so that the new function takes effect.<ol type="a" id="dli_09_0171__ol6915156171419"><li id="dli_09_0171__li195347431411">Log in to the DLI console and choose <strong id="dli_09_0171__b15543195442417">Queue Management</strong> from the navigation pane. In the <strong id="dli_09_0171__b18259912513">Operation</strong> column of the SQL queue, click <strong id="dli_09_0171__b8846122519256">Restart</strong>.</li><li id="dli_09_0171__li770731015168">In the <strong id="dli_09_0171__b131831634172512">Restart</strong> dialog box, click <strong id="dli_09_0171__b5914103512512">OK</strong>.</li></ol>
|
|
</li><li id="dli_09_0171__en-us_topic_0206789796_li816783552118">Call the UDF.<p id="dli_09_0171__en-us_topic_0206789796_p1064914469213"><a name="dli_09_0171__en-us_topic_0206789796_li816783552118"></a><a name="en-us_topic_0206789796_li816783552118"></a>Use the UDF created in step <a href="#dli_09_0171__en-us_topic_0206789796_li9516133616203">6</a> in a SELECT statement, for example:</p>
|
|
<pre class="screen" id="dli_09_0171__screen5786134210256">SELECT TestSumUDF(1, 2);</pre>
|
|
</li><li id="dli_09_0171__en-us_topic_0206789796_li7751241152315">(Optional) Delete the UDF.<p id="dli_09_0171__en-us_topic_0206789796_p2654842182313"><a name="dli_09_0171__en-us_topic_0206789796_li7751241152315"></a><a name="en-us_topic_0206789796_li7751241152315"></a>If the UDF is no longer used, run the following statement to delete it:</p>
|
|
<pre class="screen" id="dli_09_0171__screen58694547259">DROP FUNCTION TestSumUDF;</pre>
|
|
</li></ol>
|
|
</div>
|
|
</div>
|
|
<div>
|
|
<div class="familylinks">
|
|
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0120.html">Spark SQL Jobs</a></div>
|
|
</div>
|
|
</div>