doc-exports/docs/dli/dev/dli_09_0176.html
Su, Xiaomeng 89b6bedc33 dli_dev_0104_version
Reviewed-by: Rechenburg, Matthias <matthias.rechenburg@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2024-01-08 15:25:35 +00:00


<a name="dli_09_0176"></a><a name="dli_09_0176"></a>
<h1 class="topictitle1">Using the Spark Job to Access DLI Metadata</h1>
<div id="body0000001071746299"><div class="section" id="dli_09_0176__section1996121998"><h4 class="sectiontitle">Scenario</h4><p id="dli_09_0176__p2055272213525">DLI allows you to develop programs that run as Spark jobs and operate on databases, DLI or OBS tables, and table data. This example shows how to write a Java program and submit it as a Spark job that creates a database, creates tables, and inserts table data.</p>
</div>
<div class="section" id="dli_09_0176__section2084510151413"><h4 class="sectiontitle">Constraints</h4><ul id="dli_09_0176__ul154472811215"><li id="dli_09_0176__li1944615071810">You must create a queue to use Spark 3.1 for metadata access.</li><li id="dli_09_0176__li62671499199">The following cases are not supported:<ul id="dli_09_0176__ul138075151192"><li id="dli_09_0176__li6953161261920">If you create a database with a SQL job, you cannot write a program to create tables in that database.<p id="dli_09_0176__p1328310281491"><a name="dli_09_0176__li6953161261920"></a><a name="li6953161261920"></a>For example, the <strong id="dli_09_0176__b210222151610">testdb</strong> database is created using the SQL editor of DLI. A program package for creating the <strong id="dli_09_0176__b71457337271">testTable</strong> table in the <strong id="dli_09_0176__b179771129152811">testdb</strong> database does not work after it is submitted to a Spark Jar job.</p>
</li></ul>
</li><li id="dli_09_0176__li191801191217">The following cases are supported:<ul id="dli_09_0176__ul1289815228587"><li id="dli_09_0176__li45115308580">You can create databases and tables in a SQL job, and read and insert data using SQL statements or a Spark program.</li><li id="dli_09_0176__li14594201995815">You can create databases and tables in a Spark job, and read and insert data using SQL statements or a Spark program.</li></ul>
</li></ul>
</div>
<div class="section" id="dli_09_0176__section199842111628"><h4 class="sectiontitle">Environment Preparations</h4><p id="dli_09_0176__p8202163717211">Before developing a Spark job to access DLI metadata, set up a development environment that meets the following requirements.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0176__table15851625229" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Development environment</caption><thead align="left"><tr id="dli_09_0176__row11859253210"><th align="left" class="cellrowborder" valign="top" width="27.63%" id="mcps1.3.3.3.2.3.1.1"><p id="dli_09_0176__p9852251528">Item</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="72.37%" id="mcps1.3.3.3.2.3.1.2"><p id="dli_09_0176__p8851725529">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_09_0176__row78519251429"><td class="cellrowborder" valign="top" width="27.63%" headers="mcps1.3.3.3.2.3.1.1 "><p id="dli_09_0176__p108522517216">OS</p>
</td>
<td class="cellrowborder" valign="top" width="72.37%" headers="mcps1.3.3.3.2.3.1.2 "><p id="dli_09_0176__p20851825626">Windows 7 or later</p>
</td>
</tr>
<tr id="dli_09_0176__row18851325325"><td class="cellrowborder" valign="top" width="27.63%" headers="mcps1.3.3.3.2.3.1.1 "><p id="dli_09_0176__p1885825624">JDK</p>
</td>
<td class="cellrowborder" valign="top" width="72.37%" headers="mcps1.3.3.3.2.3.1.2 "><p id="dli_09_0176__p8859251424">JDK 1.8.</p>
</td>
</tr>
<tr id="dli_09_0176__row24601502619"><td class="cellrowborder" valign="top" width="27.63%" headers="mcps1.3.3.3.2.3.1.1 "><p id="dli_09_0176__p16497910469">IntelliJ IDEA</p>
</td>
<td class="cellrowborder" valign="top" width="72.37%" headers="mcps1.3.3.3.2.3.1.2 "><p id="dli_09_0176__p84601601562">This tool is used for application development. Use version 2019.1 or another compatible version.</p>
</td>
</tr>
<tr id="dli_09_0176__row53111251665"><td class="cellrowborder" valign="top" width="27.63%" headers="mcps1.3.3.3.2.3.1.1 "><p id="dli_09_0176__p831117511968">Maven</p>
</td>
<td class="cellrowborder" valign="top" width="72.37%" headers="mcps1.3.3.3.2.3.1.2 "><p id="dli_09_0176__p23118511064">Basic configurations of the development environment. Maven is used for project management throughout the lifecycle of software development.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="dli_09_0176__section54791739112210"><h4 class="sectiontitle">Development Process</h4><div class="p" id="dli_09_0176__p892144112221">The following figure shows the process for developing a Spark job to access DLI metadata.<div class="fignone" id="dli_09_0176__fig151414781913"><span class="figcap"><b>Figure 1 </b>Development process</span><br><span><img id="dli_09_0176__image6144713194" src="en-us_image_0000001208012082.png"></span></div>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0176__table1421119391677" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Process description</caption><thead align="left"><tr id="dli_09_0176__row11211153918715"><th align="left" class="cellrowborder" valign="top" width="6.830601092896176%" id="mcps1.3.4.2.2.2.5.1.1"><p id="dli_09_0176__p11573151398">No.</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="23.34113973458236%" id="mcps1.3.4.2.2.2.5.1.2"><p id="dli_09_0176__p8211239475">Phase</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="10.724043715846996%" id="mcps1.3.4.2.2.2.5.1.3"><p id="dli_09_0176__p167011419911">Software Portal</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="59.10421545667448%" id="mcps1.3.4.2.2.2.5.1.4"><p id="dli_09_0176__p1921103911712">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_09_0176__row1722811511589"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0176__p32288513813">1</p>
</td>
<td class="cellrowborder" valign="top" width="23.34113973458236%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0176__p622813512814">Create a queue for general use.</p>
</td>
<td class="cellrowborder" valign="top" width="10.724043715846996%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0176__p172281051989">DLI console</p>
</td>
<td class="cellrowborder" valign="top" width="59.10421545667448%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0176__p32282511387">The DLI queue is created for running your job.</p>
</td>
</tr>
<tr id="dli_09_0176__row13783723184813"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0176__p47841723154817">2</p>
</td>
<td class="cellrowborder" valign="top" width="23.34113973458236%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0176__p3784122334811">Configure the OBS file.</p>
</td>
<td class="cellrowborder" valign="top" width="10.724043715846996%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0176__p137841823104818">OBS console</p>
</td>
<td class="cellrowborder" valign="top" width="59.10421545667448%" headers="mcps1.3.4.2.2.2.5.1.4 "><ul id="dli_09_0176__ul7627144316229"><li id="dli_09_0176__li1462716432228">To create an OBS table, you need to upload the file to the OBS bucket.</li><li id="dli_09_0176__li741613871619">Configure the path for storing DLI metadata. This folder is used to store DLI metadata in <strong id="dli_09_0176__b1916418176358">spark.sql.warehouse.dir</strong>.</li></ul>
</td>
</tr>
<tr id="dli_09_0176__row102114391879"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0176__p65761516918">3</p>
</td>
<td class="cellrowborder" valign="top" width="23.34113973458236%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0176__p4211133911710">Create a Maven project and configure the POM file.</p>
</td>
<td class="cellrowborder" rowspan="3" valign="top" width="10.724043715846996%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0176__p81691210101">IntelliJ IDEA</p>
</td>
<td class="cellrowborder" rowspan="3" valign="top" width="59.10421545667448%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0176__p321103914719"></p>
<p id="dli_09_0176__p152111391671">Write a program to create a DLI or OBS table by referring to the sample code.</p>
<p id="dli_09_0176__p694692512124"></p>
</td>
</tr>
<tr id="dli_09_0176__row1211123914712"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0176__p55731512916">4</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0176__p16211739576">Write code.</p>
</td>
</tr>
<tr id="dli_09_0176__row79452250121"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0176__p79461255124">5</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0176__p10946172551215">Debug, compile, and pack the code into a Jar package.</p>
</td>
</tr>
<tr id="dli_09_0176__row86521956191210"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0176__p7652456101218">6</p>
</td>
<td class="cellrowborder" valign="top" width="23.34113973458236%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0176__p10652185691214">Upload the Jar package to OBS and DLI.</p>
</td>
<td class="cellrowborder" valign="top" width="10.724043715846996%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0176__p565211562128">OBS console</p>
</td>
<td class="cellrowborder" valign="top" width="59.10421545667448%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0176__p1165216565129">Upload the generated Spark JAR package to an OBS directory, and then add it to DLI package management.</p>
</td>
</tr>
<tr id="dli_09_0176__row18133049101414"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0176__p1513384931416">7</p>
</td>
<td class="cellrowborder" valign="top" width="23.34113973458236%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0176__p17133194971413">Create a Spark JAR job.</p>
</td>
<td class="cellrowborder" valign="top" width="10.724043715846996%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0176__p11133449181419">DLI console</p>
</td>
<td class="cellrowborder" valign="top" width="59.10421545667448%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0176__p107651124156">The Spark Jar job is created and submitted on the DLI console.</p>
</td>
</tr>
<tr id="dli_09_0176__row9403719162"><td class="cellrowborder" valign="top" width="6.830601092896176%" headers="mcps1.3.4.2.2.2.5.1.1 "><p id="dli_09_0176__p134035191618">8</p>
</td>
<td class="cellrowborder" valign="top" width="23.34113973458236%" headers="mcps1.3.4.2.2.2.5.1.2 "><p id="dli_09_0176__p114038181618">Check the execution result of the job.</p>
</td>
<td class="cellrowborder" valign="top" width="10.724043715846996%" headers="mcps1.3.4.2.2.2.5.1.3 "><p id="dli_09_0176__p184101541614">DLI console</p>
</td>
<td class="cellrowborder" valign="top" width="59.10421545667448%" headers="mcps1.3.4.2.2.2.5.1.4 "><p id="dli_09_0176__p17403415169">You can view the job running status and run logs.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="section" id="dli_09_0176__section3345113541312"><a name="dli_09_0176__section3345113541312"></a><a name="section3345113541312"></a><h4 class="sectiontitle">Step 1: Create a Queue for General Purpose</h4><div class="p" id="dli_09_0176__dli_09_0205_p628631819213">Before you submit a Spark job for the first time, you need to create a queue. For example, create a queue, name it <strong id="dli_09_0176__dli_09_0205_b1118111954613">sparktest</strong>, and set <strong id="dli_09_0176__dli_09_0205_b151891914618">Queue Usage</strong> to <strong id="dli_09_0176__dli_09_0205_b981911164473">For general purpose</strong>.<ol id="dli_09_0176__dli_09_0205_ol20286618192116"><li id="dli_09_0176__dli_09_0205_li1328719182211">In the navigation pane of the DLI management console, choose <span class="uicontrol" id="dli_09_0176__dli_09_0205_uicontrol1335111394482"><b>Queue Management</b></span>.</li><li id="dli_09_0176__dli_09_0205_li728716186219">In the upper right corner of the <span class="wintitle" id="dli_09_0176__dli_09_0205_wintitle315024624813"><b>Queue Management</b></span> page, click <b>Create Queue</b>.</li><li id="dli_09_0176__dli_09_0205_li16287181811214">Name the queue <strong id="dli_09_0176__dli_09_0205_b9798102695019">sparktest</strong> and set <b>Queue Usage</b> to <b>For general purpose</b>. For details about how to create a queue, see Creating a Queue.</li><li id="dli_09_0176__dli_09_0205_li14287318162119">Click <span class="uicontrol" id="dli_09_0176__dli_09_0205_uicontrol9930051125114"><b>Create Now</b></span> to finish creating the queue.</li></ol>
</div>
</div>
<div class="section" id="dli_09_0176__section66881652423"><a name="dli_09_0176__section66881652423"></a><a name="section66881652423"></a><h4 class="sectiontitle">Step 2: Configure the OBS Bucket File</h4><ol id="dli_09_0176__ol2028912141832"><li id="dli_09_0176__li181196724910">To create an OBS table, upload data to the OBS bucket directory.<div class="p" id="dli_09_0176__p4792103013115"><a name="dli_09_0176__li181196724910"></a><a name="li181196724910"></a>Use the following sample data to create the <strong id="dli_09_0176__b7553145893711">testdata.csv</strong> file and upload it to an OBS bucket.<pre class="screen" id="dli_09_0176__screen512023023116">12,Michael
27,Andy
30,Justin</pre>
</div>
</li><li id="dli_09_0176__li12184172017233">Log in to the OBS console. On the Bucket page, click the name of the OBS bucket you created. In this example, the bucket name is <strong id="dli_09_0176__b395919516418">dli-test-obs01</strong>. The overview page is displayed.</li><li id="dli_09_0176__li3616145163117">In the navigation pane on the left, choose <strong id="dli_09_0176__b184231754847">Objects</strong>. Click <strong id="dli_09_0176__b1042319549419">Upload Object</strong> to upload the <strong id="dli_09_0176__b146073510517">testdata.csv</strong> file to the root directory of the OBS bucket.</li><li id="dli_09_0176__li204841438205413">In the root directory of the OBS bucket, click <strong id="dli_09_0176__b11671141617519">Create Folder</strong> to create a folder and name it <strong id="dli_09_0176__b17677121612515">warehousepath</strong>. This folder stores DLI metadata, and its path is the value you later specify for <strong id="dli_09_0176__b36650841915">spark.sql.warehouse.dir</strong>.</li></ol>
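<p>The sample file used in the steps above can also be generated locally before you upload it. The following is a minimal sketch that uses only the Java standard library. The class name <strong>TestDataWriter</strong> and the output location (the current working directory) are assumptions made for illustration and are not part of the DLI sample project.</p>
<pre class="screen">import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TestDataWriter {
    // Sample rows from this step, in the order age,name.
    static final String[] ROWS = {"12,Michael", "27,Andy", "30,Justin"};

    // Joins the rows with newlines and writes them to the given file as UTF-8.
    static Path write(Path target) throws IOException {
        String content = String.join("\n", ROWS) + "\n";
        return Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location: testdata.csv in the working directory.
        Path written = write(Paths.get("testdata.csv"));
        System.out.println("Wrote " + written.toAbsolutePath());
    }
}</pre>
<p>After the class runs, upload the generated <strong>testdata.csv</strong> file to the OBS bucket as described in this step.</p>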
</div>
<div class="section" id="dli_09_0176__section155442205718"><h4 class="sectiontitle">Step 3: Create a Maven Project and Configure the POM Dependency</h4><div class="p" id="dli_09_0176__p1323313312581">This step uses IntelliJ IDEA 2020.2 as an example.<ol id="dli_09_0176__ol2397109104513"><li id="dli_09_0176__li1039714915454">Start IntelliJ IDEA and choose <strong id="dli_09_0176__b179911857123818">File</strong> &gt; <strong id="dli_09_0176__b8997657193818">New</strong> &gt; <strong id="dli_09_0176__b0997165723818">Project</strong>.<div class="fignone" id="dli_09_0176__fig975857114919"><span class="figcap"><b>Figure 2 </b>Creating a project</span><br><span><img id="dli_09_0176__image9757576496" src="en-us_image_0000001208518262.png"></span></div>
</li><li id="dli_09_0176__li13857332152816">Choose <strong id="dli_09_0176__b4888161163915">Maven</strong>, set <strong id="dli_09_0176__b1888818110395">Project SDK</strong> to <strong id="dli_09_0176__b1788961183915">1.8</strong>, and click <strong id="dli_09_0176__b10889171193911">Next</strong>.<div class="fignone" id="dli_09_0176__fig490834613819"><span class="figcap"><b>Figure 3 </b>Selecting an SDK</span><br><span><img id="dli_09_0176__image19908246113818" src="en-us_image_0000001685849073.png"></span></div>
</li><li id="dli_09_0176__li1974116643214">Set the project name, configure the storage path, and click <strong id="dli_09_0176__b9798440398">Finish</strong>.<div class="fignone" id="dli_09_0176__fig153729450399"><span class="figcap"><b>Figure 4 </b>Creating a project</span><br><span><img id="dli_09_0176__image037264583910" src="en-us_image_0000001685690365.png"></span></div>
<p id="dli_09_0176__p1740793545412">In this example, the Maven project name is <strong id="dli_09_0176__b162514319710">SparkJarMetadata</strong>, and the project storage path is <strong id="dli_09_0176__b116251431271">D:\DLITest\SparkJarMetadata</strong>.</p>
</li><li id="dli_09_0176__li56201025357">Add the following content to the <strong id="dli_09_0176__b157611863915">pom.xml</strong> file.<pre class="screen" id="dli_09_0176__screen175551817363">&lt;dependencies&gt;
    &lt;dependency&gt;
        &lt;groupId&gt;org.apache.spark&lt;/groupId&gt;
        &lt;artifactId&gt;spark-sql_2.11&lt;/artifactId&gt;
        &lt;version&gt;2.3.2&lt;/version&gt;
    &lt;/dependency&gt;
&lt;/dependencies&gt;</pre>
<div class="p" id="dli_09_0176__p1141017953718"><div class="fignone" id="dli_09_0176__fig2844132554514"><span class="figcap"><b>Figure 5 </b>Modifying the <strong id="dli_09_0176__b194006451577">pom.xml</strong> file</span><br><span><img id="dli_09_0176__image584419251457" src="en-us_image_0000001252854995.png"></span></div>
</div>
</li><li id="dli_09_0176__li532734873814">Choose <strong id="dli_09_0176__b3660050471">src</strong> &gt; <strong id="dli_09_0176__b166608501674">main</strong> and right-click the <strong id="dli_09_0176__b136607502719">java</strong> folder. Choose <strong id="dli_09_0176__b66601650671">New</strong> &gt; <strong id="dli_09_0176__b1660650775">Package</strong> to create a package and a class file.<div class="fignone" id="dli_09_0176__fig1318043994014"><span class="figcap"><b>Figure 6 </b>Creating a package</span><br><span><img id="dli_09_0176__image7180539124019" src="en-us_image_0000001685850245.png"></span></div>
<div class="p" id="dli_09_0176__p242315145436">Set the package name as you need. In this example, set <strong id="dli_09_0176__b157312561178">Package</strong> to <strong id="dli_09_0176__b1295266183713">com.</strong><strong id="dli_09_0176__b18952664378"></strong><strong id="dli_09_0176__b795214683720">dli.demo</strong> and press <strong id="dli_09_0176__b1065731013374">Enter</strong>.<div class="fignone" id="dli_09_0176__fig776782524120"><span class="figcap"><b>Figure 7 </b>Entering the package name</span><br><span><img id="dli_09_0176__image176752510418" src="en-us_image_0000001685851125.png"></span></div>
</div>
<div class="p" id="dli_09_0176__p14790156134412">Create a Java Class file in the package path. In this example, the Java Class file is <strong id="dli_09_0176__b1463365811712">DliCatalogTest</strong>.<div class="fignone" id="dli_09_0176__fig347611516426"><span class="figcap"><b>Figure 8 </b>Creating a Java class file</span><br><span><img id="dli_09_0176__image74761155429" src="en-us_image_0000001685852449.png"></span></div>
</div>
</li></ol>
</div>
</div>
<div class="section" id="dli_09_0176__section584152211144"><h4 class="sectiontitle">Step 4: Write Code</h4><p id="dli_09_0176__p19566440161818">Write the DliCatalogTest program to create a database, DLI table, and OBS table.</p>
<p id="dli_09_0176__p858265318217">For the sample code, see <a href="#dli_09_0176__section92626175315">Example Java Code</a>.</p>
<ol id="dli_09_0176__ol153077251228"><li id="dli_09_0176__li73076259225">Import the dependency.<pre class="screen" id="dli_09_0176__screen1063393611226">import org.apache.spark.sql.SparkSession;</pre>
</li><li id="dli_09_0176__li138215458228">Create a SparkSession instance.<div class="p" id="dli_09_0176__p1557132914114"><a name="dli_09_0176__li138215458228"></a><a name="li138215458228"></a>When you create a SparkSession, you need to specify <strong id="dli_09_0176__b259219309913">spark.sql.session.state.builder</strong>, <strong id="dli_09_0176__b191083349915">spark.sql.catalog.class</strong>, and <strong id="dli_09_0176__b383063610920">spark.sql.extensions</strong> parameters as configured in the following example.<pre class="screen" id="dli_09_0176__screen45224276114">SparkSession spark = SparkSession
        .builder()
        .config("spark.sql.session.state.builder", "org.apache.spark.sql.hive.UQueryHiveACLSessionStateBuilder")
        .config("spark.sql.catalog.class", "org.apache.spark.sql.hive.UQueryHiveACLExternalCatalog")
        .config("spark.sql.extensions", "org.apache.spark.sql.DliSparkExtension")
        .appName("java_spark_demo")
        .getOrCreate();
</pre>
</div>
</li><li id="dli_09_0176__li166965435316">Create a database.<div class="p" id="dli_09_0176__p752417212617"><a name="dli_09_0176__li166965435316"></a><a name="li166965435316"></a>The following sample code creates a database named <strong id="dli_09_0176__b276311015110">test_sparkapp</strong>.<pre class="screen" id="dli_09_0176__screen1084812201763">spark.sql("create database if not exists test_sparkapp").collect();</pre>
</div>
</li><li id="dli_09_0176__li1574253853513">Create a DLI table and insert test data.<pre class="screen" id="dli_09_0176__screen964994313378">spark.sql("drop table if exists test_sparkapp.dli_testtable").collect();
spark.sql("create table test_sparkapp.dli_testtable(id INT, name STRING)").collect();
spark.sql("insert into test_sparkapp.dli_testtable VALUES (123,'jason')").collect();
spark.sql("insert into test_sparkapp.dli_testtable VALUES (456,'merry')").collect();</pre>
</li><li id="dli_09_0176__li1093890183918">Create an OBS table. Replace the OBS path in the following example with the path of the file you uploaded in <a href="#dli_09_0176__section66881652423">Step 2: Configure the OBS Bucket File</a>.<pre class="screen" id="dli_09_0176__screen951176194318">spark.sql("drop table if exists test_sparkapp.dli_testobstable").collect();
spark.sql("create table test_sparkapp.dli_testobstable(age INT, name STRING) using csv options (path '<em id="dli_09_0176__i740316543170">obs://dli-test-obs01/testdata.csv</em>')").collect();</pre>
</li><li id="dli_09_0176__li15115664311">Stop the <strong id="dli_09_0176__b1598724191713">spark</strong> session.<pre class="screen" id="dli_09_0176__screen817220167439">spark.stop();</pre>
</li></ol>
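<p>Instead of hardcoding the OBS path in the statement that creates the OBS table, the statement can be assembled from a path supplied at run time. The following is a minimal sketch that uses only the Java standard library. The <strong>ObsTableDdl</strong> class, the <strong>createObsTableSql</strong> helper, and reading the path from the first program argument are hypothetical and are not part of the DLI sample code.</p>
<pre class="screen">public class ObsTableDdl {
    // Builds the CREATE TABLE statement used in this step for a given OBS file path.
    static String createObsTableSql(String obsPath) {
        // Reject values that are not OBS paths or that would break the quoted SQL literal.
        if (obsPath == null || !obsPath.startsWith("obs://") || obsPath.contains("'")) {
            throw new IllegalArgumentException("Expected an obs:// path without quotes, got: " + obsPath);
        }
        return "create table test_sparkapp.dli_testobstable(age INT, name STRING) "
                + "using csv options (path '" + obsPath + "')";
    }

    public static void main(String[] args) {
        // Assumption: the path arrives as the first program argument;
        // otherwise fall back to the sample path used in this example.
        String obsPath = args.length == 0 ? "obs://dli-test-obs01/testdata.csv" : args[0];
        System.out.println(createObsTableSql(obsPath));
    }
}</pre>
<p>In the DliCatalogTest program, the returned string could be passed to <strong>spark.sql(...)</strong> in place of the literal statement, so the same JAR package can be reused with a different bucket.</p>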
</div>
<div class="section" id="dli_09_0176__section1618514424450"><h4 class="sectiontitle">Step 5: Debug, Compile, and Pack the Code into a JAR Package</h4><ol id="dli_09_0176__ol3387191918248"><li id="dli_09_0176__li469121413243">Double-click <strong id="dli_09_0176__b172375172376">Maven</strong> in the toolbar on the right, and double-click <strong id="dli_09_0176__b32387179371">clean</strong> and <strong id="dli_09_0176__b32381817153712">compile</strong> to compile the code.<div class="p" id="dli_09_0176__p1190194252510">After the compilation succeeds, double-click <strong id="dli_09_0176__b56681219153714">package</strong> to build the JAR package.<div class="fignone" id="dli_09_0176__fig541817616463"><span class="figcap"><b>Figure 9 </b>Compiling and packaging</span><br><span><img id="dli_09_0176__image1841816619467" src="en-us_image_0000001637614490.png"></span></div>
</div>
<div class="p" id="dli_09_0176__p7583182817281">The generated JAR package is stored in the <strong id="dli_09_0176__b1773504719177">target</strong> directory. In this example, <strong id="dli_09_0176__b777518493174">SparkJarMetadata-1.0-SNAPSHOT.jar</strong> is stored in <strong id="dli_09_0176__b57753497178">D:\DLITest\SparkJarMetadata\target</strong>.<div class="fignone" id="dli_09_0176__fig79771135104617"><span class="figcap"><b>Figure 10 </b>Exporting the JAR file</span><br><span><img id="dli_09_0176__image2978635114613" src="en-us_image_0000001685695637.png"></span></div>
</div>
</li></ol>
</div>
<div class="section" id="dli_09_0176__section633044910536"><a name="dli_09_0176__section633044910536"></a><a name="section633044910536"></a><h4 class="sectiontitle">Step 6: Upload the JAR Package to OBS and DLI</h4><ol id="dli_09_0176__ol397711215413"><li id="dli_09_0176__li19776129541">Log in to the OBS console and upload the <strong id="dli_09_0176__b176891810121810">SparkJarMetadata-1.0-SNAPSHOT.jar</strong> file to the OBS path.</li><li id="dli_09_0176__li125643214189">Upload the file to DLI for package management.<ol type="a" id="dli_09_0176__ol756103218180"><li id="dli_09_0176__li1556183241810">Log in to the DLI management console and choose <strong id="dli_09_0176__b131331021141815">Data Management</strong> &gt; <strong id="dli_09_0176__b2133921161815">Package Management</strong>.</li><li id="dli_09_0176__li8565328181">On the <strong id="dli_09_0176__b10401182316188">Package Management</strong> page, click <strong id="dli_09_0176__b19401123181810">Create</strong> in the upper right corner.</li><li id="dli_09_0176__li8561032181818">In the <strong id="dli_09_0176__b145782025131817">Create Package</strong> dialog, set the following parameters:<ol class="substepthirdol" id="dli_09_0176__ol7567320189"><li id="dli_09_0176__li55633241816"><strong id="dli_09_0176__b127931328151819">Type</strong>: Select <strong id="dli_09_0176__b13799528191813">JAR</strong>.</li><li id="dli_09_0176__li185693281815"><strong id="dli_09_0176__b05051929121812">OBS Path</strong>: Specify the OBS path for storing the package.</li><li id="dli_09_0176__li135663221819">Set <strong id="dli_09_0176__b1914223171810">Group</strong> and <strong id="dli_09_0176__b0142113121812">Group Name</strong> as required for package identification and management.</li></ol>
</li><li id="dli_09_0176__li756193221817">Click <strong id="dli_09_0176__b9246736121816">OK</strong>.</li></ol>
</li></ol>
</div>
<div class="section" id="dli_09_0176__section1780916256569"><h4 class="sectiontitle">Step 7: Create a Spark Jar Job</h4><ol id="dli_09_0176__ol1811712355154"><li id="dli_09_0176__li1711733561514">Log in to the DLI console. In the navigation pane, choose <strong id="dli_09_0176__b76911047141818">Job Management</strong> &gt; <strong id="dli_09_0176__b8692947111817">Spark Jobs</strong>.</li><li id="dli_09_0176__li138433811617">On the <strong id="dli_09_0176__b14326145081817">Spark Jobs</strong> page, click <strong id="dli_09_0176__b5327350111812">Create Job</strong>.</li><li id="dli_09_0176__li144811015170">On the displayed page, configure the following parameters:
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0176__table162431837162619" frame="border" border="1" rules="all"><caption><b>Table 3 </b>Spark Jar job parameters</caption><thead align="left"><tr id="dli_09_0176__row102441372266"><th align="left" class="cellrowborder" valign="top" width="24.759999999999998%" id="mcps1.3.11.2.3.1.2.3.1.1"><p id="dli_09_0176__p22441137162618">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="75.24%" id="mcps1.3.11.2.3.1.2.3.1.2"><p id="dli_09_0176__p17244437112619">Value</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_09_0176__row112443375261"><td class="cellrowborder" valign="top" width="24.759999999999998%" headers="mcps1.3.11.2.3.1.2.3.1.1 "><p id="dli_09_0176__p12244137182615">Queue</p>
</td>
<td class="cellrowborder" valign="top" width="75.24%" headers="mcps1.3.11.2.3.1.2.3.1.2 "><p id="dli_09_0176__p62442372261">Select the DLI queue created for general purpose. For example, select the queue <strong id="dli_09_0176__b23901128112117">sparktest</strong> created in <a href="#dli_09_0176__section3345113541312">Step 1: Create a Queue for General Purpose</a>.</p>
</td>
</tr>
<tr id="dli_09_0176__row119521518125911"><td class="cellrowborder" valign="top" width="24.759999999999998%" headers="mcps1.3.11.2.3.1.2.3.1.1 "><p id="dli_09_0176__p1995281885919">Spark Version</p>
</td>
<td class="cellrowborder" valign="top" width="75.24%" headers="mcps1.3.11.2.3.1.2.3.1.2 "><p id="dli_09_0176__p12480249145916">Select a supported Spark version from the drop-down list. The latest version is recommended.</p>
</td>
</tr>
<tr id="dli_09_0176__row924483752616"><td class="cellrowborder" valign="top" width="24.759999999999998%" headers="mcps1.3.11.2.3.1.2.3.1.1 "><p id="dli_09_0176__p11244437182612">Job Name (--name)</p>
</td>
<td class="cellrowborder" valign="top" width="75.24%" headers="mcps1.3.11.2.3.1.2.3.1.2 "><p id="dli_09_0176__p824493716269">Name of a custom Spark Jar job. For example, <strong id="dli_09_0176__b183114055012">SparkTestMeta</strong>.</p>
</td>
</tr>
<tr id="dli_09_0176__row924419372264"><td class="cellrowborder" valign="top" width="24.759999999999998%" headers="mcps1.3.11.2.3.1.2.3.1.1 "><p id="dli_09_0176__p152441437132612">Application</p>
</td>
<td class="cellrowborder" valign="top" width="75.24%" headers="mcps1.3.11.2.3.1.2.3.1.2 "><p id="dli_09_0176__p62447376266">Select the package uploaded to DLI in <a href="#dli_09_0176__section633044910536">Step 6: Upload the JAR Package to OBS and DLI</a>. For example, select <strong id="dli_09_0176__b6499141271116">SparkJarMetadata-1.0-SNAPSHOT.jar</strong>.</p>
</td>
</tr>
<tr id="dli_09_0176__row1024418375266"><td class="cellrowborder" valign="top" width="24.759999999999998%" headers="mcps1.3.11.2.3.1.2.3.1.1 "><p id="dli_09_0176__p1424415377264">Main Class (--class)</p>
</td>
<td class="cellrowborder" valign="top" width="75.24%" headers="mcps1.3.11.2.3.1.2.3.1.2 "><p id="dli_09_0176__p10432740192817">The format is program package name + class name. In this example, enter <strong>com.dli.demo.DliCatalogTest</strong>.</p>
</td>
</tr>
<tr id="dli_09_0176__row10942230103217"><td class="cellrowborder" valign="top" width="24.759999999999998%" headers="mcps1.3.11.2.3.1.2.3.1.1 "><p id="dli_09_0176__p39428309320">Spark Arguments (--conf)</p>
</td>
<td class="cellrowborder" valign="top" width="75.24%" headers="mcps1.3.11.2.3.1.2.3.1.2 "><p id="dli_09_0176__p9942143013212">spark.dli.metaAccess.enable=true</p>
<p id="dli_09_0176__p4114161335">spark.sql.warehouse.dir=<em id="dli_09_0176__i3896171312334">obs://dli-test-obs01/warehousepath</em></p>
<div class="note" id="dli_09_0176__note17829151920335"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="dli_09_0176__p4829191963317">Set <strong id="dli_09_0176__b123949191132">spark.sql.warehouse.dir</strong> to the OBS path that is specified in <a href="#dli_09_0176__section66881652423">Step 2: Configure the OBS Bucket File</a>.</p>
</div></div>
</td>
</tr>
<tr id="dli_09_0176__row14347431343"><td class="cellrowborder" valign="top" width="24.759999999999998%" headers="mcps1.3.11.2.3.1.2.3.1.1 "><p id="dli_09_0176__p12347133163417">Access Metadata</p>
</td>
<td class="cellrowborder" valign="top" width="75.24%" headers="mcps1.3.11.2.3.1.2.3.1.2 "><p id="dli_09_0176__p1034710310341">Select <strong id="dli_09_0176__b1446353514158">Yes</strong>.</p>
</td>
</tr>
</tbody>
</table>
</div>
<p id="dli_09_0176__p922119483012">Retain default values for other parameters.</p>
</li><li id="dli_09_0176__li19508132763014">Click <strong id="dli_09_0176__b365717701710">Execute</strong> to submit the Spark Jar job. On the Job management page, view the running status.
</li></ol>
</div>
<div class="section" id="dli_09_0176__section6553456537"><h4 class="sectiontitle">Step 8: View Job Execution Result</h4><ol id="dli_09_0176__ol15196726195319"><li id="dli_09_0176__li019615264537">On the Job management page, view the running status. The initial status is <strong id="dli_09_0176__b146641734189">Starting</strong>.</li><li id="dli_09_0176__li1775262516375">If the job is successfully executed, the job status is <strong id="dli_09_0176__b03834410429">Finished</strong>. Perform the following operations to view the created database and table:<ol type="a" id="dli_09_0176__ol6385173317377"><li id="dli_09_0176__li1199319302379">On the DLI console, choose <strong id="dli_09_0176__b925242974318">SQL Editor</strong> in the left navigation pane. The created database <strong id="dli_09_0176__b1563155619432">test_sparkapp</strong> is displayed in the database list.</li><li id="dli_09_0176__li9387183503714">Double-click the database name to view the created DLI and OBS tables in the database.</li><li id="dli_09_0176__li223331319438">Double-click <strong id="dli_09_0176__b761764216445">dli_testtable</strong> and click <strong id="dli_09_0176__b949915459446">Execute</strong> to query data in the table.</li><li id="dli_09_0176__li433262612459">Comment out the statement for querying the DLI table, double-click the OBS table <strong id="dli_09_0176__b104741226184510">dli_testobstable</strong>, and click <strong id="dli_09_0176__b18147133214458">Execute</strong> to query the OBS table data.</li></ol>
</li><li id="dli_09_0176__li17196172614532">If the job fails, the job status is <strong id="dli_09_0176__b1522923094610">Failed</strong>. Click <strong id="dli_09_0176__b192299301467">More</strong> in the <strong id="dli_09_0176__b72291630144613">Operation</strong> column and select <strong id="dli_09_0176__b22291230204612">Driver Logs</strong> to view the running log.<p id="dli_09_0176__p91961265538">After the fault is rectified, click <strong id="dli_09_0176__b1181124820478">Edit</strong> in the <strong id="dli_09_0176__b188734508477">Operation</strong> column of the job, modify job parameters, and click <strong id="dli_09_0176__b15664105518478">Execute</strong> to run the job again.</p>
</li></ol>
</div>
<div class="section" id="dli_09_0176__section846454316231"><h4 class="sectiontitle">Follow-up Guide</h4><ul id="dli_09_0176__ul0689182420"><li id="dli_09_0176__li5421818174212">For details about the syntax for creating DLI tables, see "SQL Syntax of Batch Jobs" &gt; "Creating a DLI Table" in <em id="dli_09_0176__i69871267421">Data Lake Insight SQL Syntax Reference</em>. For details about the syntax for creating OBS tables, see "SQL Syntax of Batch Jobs" &gt; "Creating an OBS Table" in <em id="dli_09_0176__i298832604215">Data Lake Insight SQL Syntax Reference</em>.</li><li id="dli_09_0176__li880915152418">If you submit the job by calling an API, perform the following operations:<p id="dli_09_0176__p2901321193319"><a name="dli_09_0176__li880915152418"></a><a name="li880915152418"></a>Call the API for creating a batch processing job and set the request parameters as follows:</p>
<ul id="dli_09_0176__ul380915172413"><li id="dli_09_0176__li1480912114243">Set <strong id="dli_09_0176__b1476855855116">catalog_name</strong> in the request to <strong id="dli_09_0176__b129362010525">dli</strong>.</li><li id="dli_09_0176__li780961102419">Add <strong id="dli_09_0176__b010672115216">"spark.dli.metaAccess.enable":"true"</strong> to the conf parameter in the request.<p id="dli_09_0176__p157574532715">If you need to run DDL statements, also configure <strong id="dli_09_0176__b161618975219">"spark.sql.warehouse.dir": "obs://bucket/warehousepath"</strong> in the conf parameter.</p>
<p id="dli_09_0176__p1380913192417">The following example shows a complete API request.</p>
<pre class="screen" id="dli_09_0176__screen1680918113244">{
"queue":"<em id="dli_09_0176__i5809141172420">citest</em>",
"file":"<em id="dli_09_0176__i197405013277">SparkJarMetadata-1.0-SNAPSHOT.jar</em>",
"className":"DliCatalogTest",
<strong id="dli_09_0176__b11810141192415">"conf":{"spark.sql.warehouse.dir": "obs://bucket/warehousepath",</strong>
<strong id="dli_09_0176__b118113142414"> "spark.dli.metaAccess.enable":"true"</strong>},
"sc_type":"A",
"executorCores":1,
"numExecutors":6,
"executorMemory":"4G",
"driverCores":2,
"driverMemory":"7G",
<strong id="dli_09_0176__b581115172412"> "catalog_name": "dli"</strong>
}</pre>
</li></ul>
</li></ul>
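<p>The request settings described above can be sketched as a small helper that assembles the request body. This is a minimal illustration, not an official client: the function name <strong>build_batch_request</strong> and its parameters are hypothetical, and only the fields named in this guide are filled in. Sending the request (endpoint, authentication) is left out.</p>

```python
import json


def build_batch_request(queue, jar_file, class_name, warehouse_dir=None):
    """Assemble a request body for creating a batch processing job.

    Hypothetical helper: it only covers the fields discussed in this guide.
    """
    # Metadata access must be enabled explicitly in the job configuration.
    conf = {"spark.dli.metaAccess.enable": "true"}
    # The warehouse directory is only needed when the job runs DDL statements.
    if warehouse_dir is not None:
        conf["spark.sql.warehouse.dir"] = warehouse_dir
    return {
        "queue": queue,
        "file": jar_file,
        "className": class_name,
        "conf": conf,
        # The catalog name must be "dli" for DLI metadata access.
        "catalog_name": "dli",
    }


if __name__ == "__main__":
    body = build_batch_request(
        "citest",
        "SparkJarMetadata-1.0-SNAPSHOT.jar",
        "DliCatalogTest",
        warehouse_dir="obs://bucket/warehousepath",
    )
    print(json.dumps(body, indent=2))
```

<p>Resource settings such as <strong>sc_type</strong> or <strong>executorMemory</strong> from the example request can be merged into the returned dictionary before it is serialized.</p>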
</div>
<div class="section" id="dli_09_0176__section92626175315"><a name="dli_09_0176__section92626175315"></a><a name="section92626175315"></a><h4 class="sectiontitle">Example Java Code</h4><p id="dli_09_0176__p4425121915294">This example uses Java for coding. The complete sample code is as follows:</p>
<pre class="screen" id="dli_09_0176__screen1439817376536">package com.dli.demo;

import org.apache.spark.sql.SparkSession;

public class DliCatalogTest {
    public static void main(String[] args) {
        // Create a session configured for DLI metadata access.
        SparkSession spark = SparkSession
            .builder()
            .config("spark.sql.session.state.builder", "org.apache.spark.sql.hive.UQueryHiveACLSessionStateBuilder")
            .config("spark.sql.catalog.class", "org.apache.spark.sql.hive.UQueryHiveACLExternalCatalog")
            .config("spark.sql.extensions", "org.apache.spark.sql.DliSparkExtension")
            .appName("java_spark_demo")
            .getOrCreate();

        // Create the database, then create a DLI table and insert two rows.
        spark.sql("create database if not exists test_sparkapp").collect();
        spark.sql("drop table if exists test_sparkapp.dli_testtable").collect();
        spark.sql("create table test_sparkapp.dli_testtable(id INT, name STRING)").collect();
        spark.sql("insert into test_sparkapp.dli_testtable VALUES (123,'jason')").collect();
        spark.sql("insert into test_sparkapp.dli_testtable VALUES (456,'merry')").collect();

        // Create an OBS table backed by a CSV file in an OBS bucket.
        spark.sql("drop table if exists test_sparkapp.dli_testobstable").collect();
        spark.sql("create table test_sparkapp.dli_testobstable(age INT, name STRING) using csv options (path 'obs://dli-test-obs01/testdata.csv')").collect();

        spark.stop();
    }
}</pre>
</div>
<div class="section" id="dli_09_0176__section1965974920564"><h4 class="sectiontitle">Example Scala Code</h4><pre class="screen" id="dli_09_0176__screen579414145259">import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import scala.util.Try

object DliCatalogTest {
  def main(args: Array[String]): Unit = {
    // args(0): SQL statement to run; args(1): optional flag, defaults to true.
    val sql = args(0)
    val runDdl = Try(args(1).toBoolean).getOrElse(true)
    println(s"sql is $sql, runDdl is $runDdl")

    val sparkConf = new SparkConf(true)
    sparkConf
      .set("spark.sql.session.state.builder", "org.apache.spark.sql.hive.UQueryHiveACLSessionStateBuilder")
      .set("spark.sql.catalog.class", "org.apache.spark.sql.hive.UQueryHiveACLExternalCatalog")

    val spark = SparkSession.builder
      .config(sparkConf)
      .enableHiveSupport()
      .config("spark.sql.extensions", "org.apache.spark.sql.DliSparkExtension")
      .appName("SparkTest")
      .getOrCreate()
    println("catalog is " + spark.sessionState.catalog.toString)

    if (runDdl) {
      // collect() forces execution; the result is not used.
      spark.sql(sql).collect()
    } else {
      // Print the query result.
      spark.sql(sql).show()
    }
    spark.close()
  }
}</pre>
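<p>The Scala example reads the SQL statement from the first job argument and an optional flag from the second, falling back to <strong>true</strong> when the flag is missing or cannot be parsed. The following Python sketch mirrors that argument handling so the behavior is easier to see. The function name <strong>parse_job_args</strong> is hypothetical and is not part of any DLI or Spark API.</p>

```python
def parse_job_args(argv):
    """Return (sql, run_ddl) from the job arguments.

    Mirrors Try(args(1).toBoolean).getOrElse(true) in the Scala example:
    only "true" or "false" (case-insensitive) are parsed; a missing or
    unrecognized flag falls back to True.
    """
    if not argv:
        raise ValueError("the SQL statement is required as the first argument")
    sql = argv[0]
    run_ddl = True
    if len(argv) > 1 and argv[1].lower() == "false":
        run_ddl = False
    return sql, run_ddl


if __name__ == "__main__":
    # Example arguments as they might be passed to the job.
    print(parse_job_args(["create database if not exists test_sparkapp"]))
    print(parse_job_args(["select * from test_sparkapp.dli_testtable", "false"]))
```

<p>With the flag set to <strong>false</strong>, the Scala job prints the query result with <strong>show()</strong>. Otherwise it only executes the statement.</p>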
</div>
<div class="section" id="dli_09_0176__section443111922220"><h4 class="sectiontitle">Example Python Code</h4><pre class="screen" id="dli_09_0176__screen1240318340229">#!/usr/bin/python
# -*- coding: UTF-8 -*-
from __future__ import print_function
import sys
from pyspark.sql import SparkSession
if __name__ == "__main__":
    # Address of the MySQL database (the part after jdbc:mysql://), passed as the first argument.
    url = sys.argv[1]
    creatTbl = "CREATE TABLE test_sparkapp.dli_rds USING JDBC OPTIONS ('url'='jdbc:mysql://%s'," \
               "'driver'='com.mysql.jdbc.Driver','dbtable'='test.test'," \
               " 'passwdauth' = 'DatasourceRDSTest_pwd','encryption' = 'true')" % url
    # Create a session configured for DLI metadata access.
    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .config("spark.sql.session.state.builder","org.apache.spark.sql.hive.UQueryHiveACLSessionStateBuilder") \
        .config("spark.sql.catalog.class", "org.apache.spark.sql.hive.UQueryHiveACLExternalCatalog") \
        .config("spark.sql.extensions","org.apache.spark.sql.DliSparkExtension") \
        .appName("python Spark test catalog") \
        .getOrCreate()
    # Create the database and the JDBC table, then insert, overwrite, and query data.
    spark.sql("CREATE database if not exists test_sparkapp").collect()
    spark.sql("drop table if exists test_sparkapp.dli_rds").collect()
    spark.sql(creatTbl).collect()
    spark.sql("select * from test_sparkapp.dli_rds").show()
    spark.sql("insert into table test_sparkapp.dli_rds select 12,'aaa'").collect()
    spark.sql("select * from test_sparkapp.dli_rds").show()
    spark.sql("insert overwrite table test_sparkapp.dli_rds select 1111,'asasasa'").collect()
    spark.sql("select * from test_sparkapp.dli_rds").show()
    # Clean up the table and stop the session.
    spark.sql("drop table test_sparkapp.dli_rds").collect()
    spark.stop()</pre>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0203.html">Spark Jar Jobs</a></div>
</div>
</div>