Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: chenxiaoxiong <chenxiaoxiong@huawei.com> Co-committed-by: chenxiaoxiong <chenxiaoxiong@huawei.com>
<a name="dataartsstudio_01_0521"></a><a name="dataartsstudio_01_0521"></a>
<h1 class="topictitle1">Developing a DLI Spark Job</h1>
<div id="body8662426"><p id="dataartsstudio_01_0521__en-us_topic_0127305014_p499864211311">This section describes how to develop a DLI Spark job on <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text139981442183116">DataArts Factory</span>.</p>
<div class="section" id="dataartsstudio_01_0521__en-us_topic_0127305014_section997812914303"><h4 class="sectiontitle">Scenario Description</h4><p id="dataartsstudio_01_0521__en-us_topic_0127305014_p14604113343016">When you use Data Lake Insight (DLI), you typically analyze and process data with SQL. However, SQL often cannot express complex processing logic. In such cases, you can use Spark jobs instead. This section uses an example to demonstrate how to submit a Spark job on <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text97524544154">DataArts Factory</span>.</p>
<p id="dataartsstudio_01_0521__en-us_topic_0127305014_p6617121591620">The general submission procedure is as follows:</p>
<ol id="dataartsstudio_01_0521__en-us_topic_0127305014_ol10648133416167"><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li1712355204313">Create a DLI cluster to provide the physical resources on which the Spark job runs.</li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li2648734181612">Obtain a demo JAR package of the Spark job and associate it with a resource on <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text46541150104113">DataArts Factory</span>.</li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li76017368425">Create a <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text201451942184214">DataArts Factory</span> job and submit it using the <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text66741334194819">DLI Spark</span> node.</li></ol>
</div>
<div class="section" id="dataartsstudio_01_0521__en-us_topic_0127305014_section151881748103213"><h4 class="sectiontitle">Preparations</h4><ul id="dataartsstudio_01_0521__en-us_topic_0127305014_ul14782135310324"><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li1978210535325">Object Storage Service (OBS) has been enabled, and a bucket (for example, <span class="parmvalue" id="dataartsstudio_01_0521__en-us_topic_0127305014_parmvalue1498034618151"><b>obs://dlfexample</b></span>) has been created to store the JAR package of the Spark job.</li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li12147104114565">DLI has been enabled, and the Spark cluster <span class="parmvalue" id="dataartsstudio_01_0521__en-us_topic_0127305014_parmvalue9819693234"><b>spark_cluster</b></span> has been created to provide the physical resources required by the Spark job.</li></ul>
</div>
<div class="section" id="dataartsstudio_01_0521__en-us_topic_0127305014_section973185620415"><h4 class="sectiontitle">Obtaining Spark Job Code</h4><p id="dataartsstudio_01_0521__en-us_topic_0127305014_p58061011181215">The Spark job code used in this example comes from the Maven repository and can be downloaded from <a href="https://repo.maven.apache.org/maven2/org/apache/spark/spark-examples_2.10/1.1.1/spark-examples_2.10-1.1.1.jar" target="_blank" rel="noopener noreferrer">https://repo.maven.apache.org/maven2/org/apache/spark/spark-examples_2.10/1.1.1/spark-examples_2.10-1.1.1.jar</a>. This Spark job calculates an approximate value of π.</p>
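<p>The SparkPi example estimates π with a Monte Carlo method: it samples random points in a square and counts how many fall inside the inscribed circle. The following is a minimal sketch of that idea in plain Python, without Spark. It is not the code contained in the JAR package, and the function name <b>estimate_pi</b> and its parameters are assumptions made for illustration.</p>

```python
import random
from typing import Optional


def estimate_pi(num_samples: int, seed: Optional[int] = None) -> float:
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1].

    The fraction of points that land inside the unit circle approaches
    pi / 4, so multiplying that fraction by 4 gives an estimate of pi.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be a positive integer")

    rng = random.Random(seed)  # a local generator keeps runs reproducible
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # the point lies inside the unit circle
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(f"Pi is roughly {estimate_pi(200_000, seed=42)}")
```

<p>A Spark job distributes this sampling loop across executors and sums the counts, which is why the example is a convenient smoke test for a cluster.</p>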
<ol id="dataartsstudio_01_0521__en-us_topic_0127305014_ol0494151272"><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li202617185718"><a name="dataartsstudio_01_0521__en-us_topic_0127305014_li202617185718"></a><a name="en-us_topic_0127305014_li202617185718"></a><span>Download the JAR package of the Spark job code and upload it to the OBS bucket. In this example, the storage path is <span class="parmvalue" id="dataartsstudio_01_0521__en-us_topic_0127305014_parmvalue5682025273"><b>obs://dlfexample/spark-examples_2.10-1.1.1.jar</b></span>.</span></li><li id="dataartsstudio_01_0521__li10888120591"><span>On the <span id="dataartsstudio_01_0521__en-us_topic_0181092879_text185611381448">DataArts Studio</span> console, locate a workspace and click <strong id="dataartsstudio_01_0521__en-us_topic_0181092879_b65382814249">DataArts Factory</strong>.</span></li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li95886261478"><a name="dataartsstudio_01_0521__en-us_topic_0127305014_li95886261478"></a><a name="en-us_topic_0127305014_li95886261478"></a><span>In the navigation tree on the left, choose <span class="menucascade" id="dataartsstudio_01_0521__menucascade1271733514514"><b><span class="uicontrol" id="dataartsstudio_01_0521__uicontrol157101735559">Configuration</span></b> &gt; <b><span class="uicontrol" id="dataartsstudio_01_0521__uicontrol1671516351157"><span id="dataartsstudio_01_0521__text13712193511511">Manage Resource</span></span></b></span>. 
Click <strong id="dataartsstudio_01_0521__b735914585218">Create Resource</strong>, create a resource named <span class="parmvalue" id="dataartsstudio_01_0521__parmvalue166752671111"><b>spark-example</b></span> on <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text1011504212919">DataArts Factory</span>, and associate it with the JAR package uploaded in <a href="#dataartsstudio_01_0521__en-us_topic_0127305014_li202617185718">1</a>.</span><p><div class="fignone" id="dataartsstudio_01_0521__en-us_topic_0127305014_fig1964822591719"><span class="figcap"><b>Figure 1 </b>Creating a resource</span><br><span><img id="dataartsstudio_01_0521__image819775613319" src="en-us_image_0000002234235780.png" title="Click to enlarge" class="imgResize"></span></div>
</p></li></ol>
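<p>The OBS storage path in step 1 is the bucket name followed by the file name of the downloaded JAR package. The following sketch shows one way to derive that path and to fetch the JAR package with the Python standard library. The helper names <b>obs_object_path</b> and <b>download_jar</b> are assumptions made for illustration. Uploading to OBS is not shown and is assumed to be done separately, for example on the OBS console.</p>

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse

# Download URL and bucket name taken from this example.
JAR_URL = (
    "https://repo.maven.apache.org/maven2/org/apache/spark/"
    "spark-examples_2.10/1.1.1/spark-examples_2.10-1.1.1.jar"
)
BUCKET = "dlfexample"


def obs_object_path(bucket: str, source_url: str) -> str:
    """Build the obs:// path of a file stored at the root of a bucket."""
    file_name = posixpath.basename(urlparse(source_url).path)
    if not file_name:
        raise ValueError(f"URL does not end with a file name: {source_url}")
    return f"obs://{bucket}/{file_name}"


def download_jar(source_url: str, dest_dir: str = ".") -> str:
    """Download the JAR package into dest_dir and return the local path.

    Requires network access; call it only when you want to fetch the file.
    """
    file_name = posixpath.basename(urlparse(source_url).path)
    local_path = os.path.join(dest_dir, file_name)
    urllib.request.urlretrieve(source_url, local_path)
    return local_path


if __name__ == "__main__":
    # Prints the storage path used in step 1; no network access is needed.
    print(obs_object_path(BUCKET, JAR_URL))
```

<p>Deriving the path from the URL keeps the file name in OBS identical to the one in the Maven repository, which makes the resource easier to trace back to its source.</p>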
</div>
<div class="section" id="dataartsstudio_01_0521__en-us_topic_0127305014_section24471416123914"><h4 class="sectiontitle">Submitting a Spark Job</h4><p id="dataartsstudio_01_0521__en-us_topic_0127305014_p1573113318411">You need to create a job on <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text157910234314">DataArts Factory</span> and submit the Spark job using the <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text1142144333114">DLI Spark</span> node of the job.</p>
<ol id="dataartsstudio_01_0521__en-us_topic_0127305014_ol1918874083217"><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li394010553512"><span>Create a job named <span class="parmvalue" id="dataartsstudio_01_0521__en-us_topic_0127305014_parmvalue14468121114113"><b>job_DLI_Spark</b></span> for the DataArts Factory module.</span><p><div class="fignone" id="dataartsstudio_01_0521__en-us_topic_0127305014_fig17440821181815"><span class="figcap"><b>Figure 2 </b>Creating a job</span><br><span><img id="dataartsstudio_01_0521__image194563550517" src="en-us_image_0000002234235784.png" title="Click to enlarge" class="imgResize"></span></div>
</p></li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_en-us_topic_0120230640_li981518813820"><span>Go to the job development page, drag the <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text522062415116">DLI Spark</span> node to the canvas, and click the node to configure node properties.</span><p><div class="fignone" id="dataartsstudio_01_0521__en-us_topic_0127305014_fig4280516112019"><span class="figcap"><b>Figure 3 </b>Configuring node properties</span><br><span><img id="dataartsstudio_01_0521__image1874472618422" src="en-us_image_0000002269115141.png" title="Click to enlarge" class="imgResize"></span></div>
<p id="dataartsstudio_01_0521__en-us_topic_0127305014_p118591925141">Description of key properties:</p>
<ul id="dataartsstudio_01_0521__en-us_topic_0127305014_ul45472162144"><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li32681040151611"><strong id="dataartsstudio_01_0521__b165412323514">DLI Queue</strong>: Select a DLI queue.</li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li1547141601415"><strong>Job Running Resource</strong>: Maximum CPU and memory resources that a <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text17763115188">DLI Spark</span> node can use while it is running.</li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li1951525371817"><strong>Major Job Class</strong>: Main class (entry point) of the Spark job that the <span id="dataartsstudio_01_0521__en-us_topic_0127305014_text199523582186">DLI Spark</span> node runs. In this example, the main class is <span class="parmvalue" id="dataartsstudio_01_0521__en-us_topic_0127305014_parmvalue877184714196"><b>org.apache.spark.examples.SparkPi</b></span>.</li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li42511124101917"><strong id="dataartsstudio_01_0521__b277971910350">Spark program resource package</strong>: Select the resource created in <a href="#dataartsstudio_01_0521__en-us_topic_0127305014_li95886261478">3</a>.</li></ul>
</p></li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li2067134781615"><span>After the job orchestration is complete, click <span><img id="dataartsstudio_01_0521__image385374154816" src="en-us_image_0000002269195221.png"></span> to test the job.</span><p><div class="fignone" id="dataartsstudio_01_0521__en-us_topic_0127305014_fig325316594251"><span class="figcap"><b>Figure 4 </b>Job logs (for reference only)</span><br><span><img id="dataartsstudio_01_0521__image1208184105316" src="en-us_image_0000002269115137.png" title="Click to enlarge" class="imgResize"></span></div>
</p></li><li id="dataartsstudio_01_0521__en-us_topic_0127305014_li1482020111912"><span>If the logs contain no errors, save and submit the job.</span></li></ol>
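<p>The class entered for the major job class must exist in the JAR package selected as the Spark program resource package. Because a JAR file is a ZIP archive in which class <b>a.b.C</b> is stored as <b>a/b/C.class</b>, this can be checked locally before you test the job. The following is a minimal sketch under that assumption. The helper name <b>jar_contains_class</b> is hypothetical and is not part of DataArts Factory or DLI.</p>

```python
import zipfile


def jar_contains_class(jar, class_name: str) -> bool:
    """Return True if the JAR holds a .class entry for the given class.

    jar may be a path to a local JAR file or a binary file-like object.
    class_name is a fully qualified name such as
    "org.apache.spark.examples.SparkPi".
    """
    # Package separators become directory separators inside the archive.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()
```

<p>For a locally downloaded copy of the example JAR package, calling <b>jar_contains_class("spark-examples_2.10-1.1.1.jar", "org.apache.spark.examples.SparkPi")</b> is expected to return <b>True</b>. A <b>False</b> result usually points to a typo in the class name or to the wrong resource package.</p>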
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dataartsstudio_01_0520.html">Usage Guidance</a></div>
</div>
</div>
<script language="JavaScript">
<!--
initImageViewer('.imgResize');
var msg_imageMax = "view original image";
var msg_imageClose = "close";
//--></script>