Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_0367"></a>
<h1 class="topictitle1">Getting Started with Spark SQL</h1>
<div id="body1589421619401"><p id="mrs_01_0367__af359eb4111cf43d8932b2ddcdb045f5c">Spark provides Spark SQL, a SQL-like language for operating on structured data. This section describes how to get started with Spark SQL: create a table named <strong id="mrs_01_0367__b122075528446">src_data</strong>, write one data record to each row of the table, and store the data in the <strong id="mrs_01_0367__b755910137451">mrs_20160907</strong> cluster. Then use SQL statements to query the data in the table, and finally delete the table.</p>
<div class="section" id="mrs_01_0367__sf409cb8b039d45e191bed0dc51e447e3"><a name="mrs_01_0367__sf409cb8b039d45e191bed0dc51e447e3"></a><a name="sf409cb8b039d45e191bed0dc51e447e3"></a><h4 class="sectiontitle">Prerequisites</h4><div class="p" id="mrs_01_0367__ae400034bcd694dff989d9d2daddd99ea">You have obtained the AK/SK for writing data from an OBS data source to a Spark SQL table. To obtain the AK/SK, perform the following steps:<ol id="mrs_01_0367__o8e5a70b3de4b41a392d301430dfec146"><li id="mrs_01_0367__l9bf73504983446c9942869d57c938e43">Log in to the management console.</li><li id="mrs_01_0367__l809413b8944e46b8a3b8e977ccfcc4d3">Click the username and select <span class="parmname" id="mrs_01_0367__parmname10891143218460"><b>My Credentials</b></span> from the drop-down list.</li><li id="mrs_01_0367__l831818f2b8cc448ea160f122aa488bf4">On the displayed <strong id="mrs_01_0367__b1033783912310">My Credentials</strong> page, click <span class="parmname" id="mrs_01_0367__parmname8337163911237"><b>Access Keys</b></span>.</li><li id="mrs_01_0367__la2c64cad1bfe45a9969b01050a3ee534">Click <span class="uicontrol" id="mrs_01_0367__uicontrol11643183618474"><b>Create Access Key</b></span> to open the <span class="wintitle" id="mrs_01_0367__wintitle20649536144719"><b>Create Access Key</b></span> dialog box.</li><li id="mrs_01_0367__l9ee6d15b0ec941218b6e49d32e7df089">Enter your password and the verification code from the email, and click <strong id="mrs_01_0367__b1554191716501">OK</strong> to download the access key. Keep the access key secure.</li></ol>
</div>
</div>
<div class="section" id="mrs_01_0367__s56e8f18de4d644e58ee2280f2ea5ec88"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_0367__ob9d24e62c8e34d26bdf4277d0b8b5ae9"><li id="mrs_01_0367__l55e4469be18d43aea169f9a59bed629f"><span>Prepare data sources for Spark SQL analysis.</span><p><p id="mrs_01_0367__a408c0ccae0ef4b0497cf3e5c05780d0b">The sample text file is as follows:</p>
<pre class="screen" id="mrs_01_0367__sd8781613b25944b481ae59373289d09e">abcd3ghji
efgh658ko
1234jjyu9
7h8kodfg1
kk99icxz3</pre>
</p></li><li id="mrs_01_0367__l915c79cafdb0480597b0409bda9710cf"><span>Upload data to OBS.</span><p><ol type="a" id="mrs_01_0367__o04e50dc3142f40c9bd14366bd954cd6e"><li id="mrs_01_0367__l9db3bf130cde43abbcdb299269c86482">Log in to OBS Console.</li><li id="mrs_01_0367__l4908eaeaaddb45138c8d91d4f40e7f0d">Choose <strong id="mrs_01_0367__b149491328142413">Parallel File System</strong> &gt; <strong id="mrs_01_0367__b6949192818241">Create Parallel File System</strong> to create a file system named <strong id="mrs_01_0367__b1894922810249">sparksql</strong>.<p id="mrs_01_0367__a7159dad5a95342f0a29587d6a2952654"><strong id="mrs_01_0367__b3164153662418">sparksql</strong> is only an example. The file system name must be globally unique; otherwise, the parallel file system cannot be created.</p>
</li><li id="mrs_01_0367__li1989212491022">Click the name of the <strong id="mrs_01_0367__b316024313248">sparksql</strong> file system and click <strong id="mrs_01_0367__b3161124311247">Files</strong>.</li><li id="mrs_01_0367__l1f7011601300422694efae5a48f3684f">Click <strong id="mrs_01_0367__b14961050125119">Create Folder</strong> to create the <strong id="mrs_01_0367__b726465620516">input</strong> folder.</li><li id="mrs_01_0367__l1d1db7562e6e46cdaa618a7d2536571b">Go to the <strong id="mrs_01_0367__b1464741145210">input</strong> folder, choose <strong id="mrs_01_0367__b1268017935411">Upload File</strong> &gt; <strong id="mrs_01_0367__b7563131415546">add file</strong>, select the local TXT file, and click <strong id="mrs_01_0367__b19464152215548">Upload</strong>.</li></ol>
</p></li><li id="mrs_01_0367__l8392811897d2429ca1dfcbf41ca4fe1a"><span>Log in to the MRS console. In the left navigation pane, choose <strong id="mrs_01_0367__b870464910542">Clusters</strong> &gt; <strong id="mrs_01_0367__b18704134911543">Active Clusters</strong>, and click a cluster name.</span></li><li id="mrs_01_0367__l175f02b3a1ae469a8fcc7a7cf83544ea"><span>Import the text file from OBS to HDFS.</span><p><ol type="a" id="mrs_01_0367__o47f064d350e24aa5904cb06c3b0b0935"><li id="mrs_01_0367__l289dd0eb725240f49435def440805bd0">Click the <strong id="mrs_01_0367__b179142520551">Files</strong> tab.</li><li id="mrs_01_0367__l98174c22947941439783e4441b9939f1">On the <strong id="mrs_01_0367__b36261521115512">HDFS File List</strong> tab page, click <strong id="mrs_01_0367__b155511434557">Create Folder</strong>, and create a folder named <strong id="mrs_01_0367__b15937747105515">userinput</strong>.</li><li id="mrs_01_0367__lf84405107fe74f18a0da342cbf02fc8c">Go to the <strong id="mrs_01_0367__b14690195318553">userinput</strong> folder, and click <strong id="mrs_01_0367__b38028219561">Import Data</strong>.</li><li id="mrs_01_0367__lc0149c72a8a44dfe9c0fd2606e0dba71">Select the OBS and HDFS paths and click <strong id="mrs_01_0367__b1285972018560">OK</strong>.<p id="mrs_01_0367__a621c8e5678264bcfab2a9018c487907b"><strong id="mrs_01_0367__b9127138175712">OBS Path</strong>: <strong id="mrs_01_0367__b167914356573">obs://sparksql/input/sparksql-test.txt</strong></p>
<p id="mrs_01_0367__a3b7cca56e6c34f1980d3c2e7ea2a9ada"><strong id="mrs_01_0367__b221042317019">HDFS Path</strong>: <strong id="mrs_01_0367__b1565617261807">/user/userinput</strong></p>
</li></ol>
</p></li><li id="mrs_01_0367__ldd3eef379d4a4d42af77afa9b1e480f2"><span>Submit the SQL statement.</span><p><ol type="a" id="mrs_01_0367__o73c0d7304389425f82d9145eb35bc6d2"><li id="mrs_01_0367__lc5877924998c4dc38fa166544e442b6d">On the MRS console, select <strong id="mrs_01_0367__b4482616144115">Job Management</strong>. For details about how to submit the statement, see <a href="https://docs.otc.t-systems.com/en-us/usermanual/mrs/mrs_01_0524.html" target="_blank" rel="noopener noreferrer">Running a SparkSubmit or Spark Job</a>.<p id="mrs_01_0367__a9da57265752e473581353e14cbd05eef">A job can be submitted only when the <span class="parmvalue" id="mrs_01_0367__pc2ab318cd02348709ab51e04db6bb02e"><b>mrs_20160907</b></span> cluster is in the <span class="parmvalue" id="mrs_01_0367__parmvalue983710236115"><b>Running</b></span> state.</p>
</li><li id="mrs_01_0367__l88a501d0bb944d6ba3b53eccd26f7ce9">Enter the Spark SQL statement for table creation.<p id="mrs_01_0367__a87b2d8e91cd641ad88d9062d9df85399"><a name="mrs_01_0367__l88a501d0bb944d6ba3b53eccd26f7ce9"></a><a name="l88a501d0bb944d6ba3b53eccd26f7ce9"></a>When entering Spark SQL statements, ensure that the statements contain no more than 10,000 characters.</p>
<p id="mrs_01_0367__a55c4af4241774fef8b182dc9748fe4cc">Syntax:</p>
<p id="mrs_01_0367__ae470a455aef74180b202d52b0e9f70a2"><strong id="mrs_01_0367__a19a83a68944f4235978b61eb82a7c841">CREATE</strong> [EXTERNAL] <strong id="mrs_01_0367__en-us_topic_0019549323_b89976385286">TABLE</strong> [IF NOT EXISTS] <em id="mrs_01_0367__abc53e1c9ca3f4dbd9d35dcb5e2545f19">table_name</em> [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED <strong id="mrs_01_0367__a4830619a07cd4a5bbf7e4f89218337ea">BY</strong> (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED <strong id="mrs_01_0367__a813ee7d5402e4d12a8df83e45b610537">BY</strong> (col_name, col_name, ...) [SORTED <strong id="mrs_01_0367__a94d22e24ac8645a88118c80960de2418">BY</strong> (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [ROW FORMAT row_format] [STORED <strong id="mrs_01_0367__en-us_topic_0019549323_b755712083014">AS</strong> file_format] [LOCATION hdfs_path];</p>
<p id="mrs_01_0367__a4432aa2136ba4c17888bcc43c4ba3a0d">You can create an example table using either of the following two methods:</p>
<ul id="mrs_01_0367__ud72961a0a1144170965951f7cb847b80"><li id="mrs_01_0367__l9c963e203f84471cbf3a6c4ec7e0bffd">Method 1: Create table <strong id="mrs_01_0367__b08812579514">src_data</strong> and write data in every row.<ul id="mrs_01_0367__u819fe898597840819a88d607e8a929e1"><li id="mrs_01_0367__l5d72d3f50f8a453d97353dd2d1dae308">The data source is stored in the <span class="filepath" id="mrs_01_0367__f4ee9975da31c471c90095440ffbb93f6"><b>/user/userinput</b></span> folder of HDFS: <i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_0367__cmdname20444135501419">create external table</span></b></i> <i><span class="varname" id="mrs_01_0367__v4a4ef74534b84da5b79a64806b8c59c8">src_data</span></i><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_0367__cmdname1744435581417">(line string) row format delimited fields terminated by '\\n' stored as textfile location</span></b></i> '<em id="mrs_01_0367__a6622b2f8f4714d7a8824a8de09a54928">/user/</em><em id="mrs_01_0367__acaab063ca41841dd96e4a0f7adc61ea6">userinput</em>';</li><li id="mrs_01_0367__l26d5f4cd81734291bdc2fab25eb68953">The data source is stored in the <span class="filepath" id="mrs_01_0367__febcadc1ffc604f08bdbd7ef5e2b61b18"><b>/sparksql/input</b></span> folder of OBS: <strong id="mrs_01_0367__en-us_topic_0019549323_b1602490312">create external table</strong> <em id="mrs_01_0367__en-us_topic_0019549323_i899817918320">src_data</em><strong id="mrs_01_0367__a5bc53cab32e8429c86dece0365d57b11">(line string) row format delimited fields terminated by '\\n' stored as textfile location</strong> '<em id="mrs_01_0367__en-us_topic_0019549323_i51675413211">obs://AK:SK@sparksql/input</em>';<p id="mrs_01_0367__aaefc6f1bd8ba43038752e2b5c0ca1ba4">For details about how to obtain the AK/SK, see <a href="#mrs_01_0367__sf409cb8b039d45e191bed0dc51e447e3">Prerequisites</a>.</p>
</li></ul>
</li><li id="mrs_01_0367__l72d005e8791d4a27a8935855623e418f">Method 2: Create table <strong id="mrs_01_0367__b20379116187">src_data1</strong> and load data to the table in batches.<p id="mrs_01_0367__a4f1ca8550ea943a395c708f141617028"><strong id="mrs_01_0367__a54f49c233b8f40098a9de0931ac3d688">create table</strong> <em id="mrs_01_0367__a1738e948b07349708046e32faae7979a">src_data1</em> <strong id="mrs_01_0367__a00a9492ef11444029d74780376b6027e">(line string) row format delimited fields terminated by ','</strong> ;</p>
<p id="mrs_01_0367__ae3b3e71945cc4bc797389fa550b55084"><strong id="mrs_01_0367__en-us_topic_0019549323_b201193147425">load data inpath</strong> '<em id="mrs_01_0367__aed64243e07a54e9f9408ef30bee41078">/user/</em><em id="mrs_01_0367__a840f5ce9bf9c4b4f8e77ba153b4f3fd1">userinput/sparksql-test.txt</em>' <strong id="mrs_01_0367__a63c20e272d1b48b881d788017b4ad1fd">into table</strong> <em id="mrs_01_0367__en-us_topic_0019549323_i137839395428">src_data1</em>;</p>
</li></ul>
<div class="note" id="mrs_01_0367__n97342fa4140b46079eddb6a5ffc0c15e"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0367__a52a3b74c4c8e4b089ef36ec0ec8ace02">When method 2 is used, data cannot be loaded directly from OBS into the created table.</p>
</div></div>
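<p>The following is a minimal sketch of method 2 run end to end, assuming the text file was imported to <strong>/user/userinput/sparksql-test.txt</strong> in HDFS as described in the earlier step. The <strong>IF NOT EXISTS</strong> clause is taken from the syntax above, and the final count query is only an illustrative check that is not part of the original procedure. The <strong>--</strong> comment lines are for explanation only; remove them if the statement editor does not accept comments.</p>
<pre class="screen">-- Create the table only if it does not already exist.
create table if not exists src_data1 (line string) row format delimited fields terminated by ',';
-- Load the imported HDFS file into the table.
load data inpath '/user/userinput/sparksql-test.txt' into table src_data1;
-- Illustrative check: count the rows that were loaded.
select count(*) from src_data1;</pre>
<p>With Hive-compatible <strong>LOAD DATA</strong> semantics, loading from an HDFS path usually moves the source file into the table's storage directory instead of copying it. Verify this behavior on your cluster before relying on the file remaining in <strong>/user/userinput</strong>.</p>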
</li><li id="mrs_01_0367__l0c4bdea4aa5844d5a138a4848ffaffda">Enter the Spark SQL statement for table query.<p id="mrs_01_0367__aef277db2b7be4dea941c47ce7b302487"><a name="mrs_01_0367__l0c4bdea4aa5844d5a138a4848ffaffda"></a><a name="l0c4bdea4aa5844d5a138a4848ffaffda"></a>Syntax:</p>
<p id="mrs_01_0367__a3b23be6ca70a4d529210a283f38a927c"><strong id="mrs_01_0367__ab0abd593eba7419f9e2fa7300dd5f04d">SELECT</strong> col_name <strong id="mrs_01_0367__en-us_topic_0019549323_b640615474443">FROM</strong> <em id="mrs_01_0367__af78a6f06bb6d41c8b5b6bcf10dc6cd4b">table_name</em>;</p>
<p id="mrs_01_0367__ac2c7928377b944d18b6c034fdf6a297b">Example of querying all data in the <strong id="mrs_01_0367__b484001792">src_data</strong> table:</p>
<p id="mrs_01_0367__a86280a9dd77545f2af2022a295dd7936"><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_0367__c3d9e4f488825415eadd7cfa8b0edc438">select * from src_data;</span></b></i></p>
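<p>As a sketch of a few other common query forms, assuming the <strong>src_data</strong> table created above with its single <strong>line</strong> column, the following statements select a named column, filter rows, and count rows. They are illustrative variations and are not part of the original procedure. The <strong>--</strong> comment lines are for explanation only; remove them if the statement editor does not accept comments.</p>
<pre class="screen">-- Return the line column for at most three rows.
select line from src_data limit 3;
-- Return only the rows whose text contains the digit 9.
select line from src_data where line like '%9%';
-- Count all rows in the table.
select count(*) from src_data;</pre>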
</li><li id="mrs_01_0367__l359d5323c22a41159d5a6368bea4fca5">Enter the Spark SQL statement for table deletion.<p id="mrs_01_0367__aa3791b4b12384904bcb549ff9b30dec3"><a name="mrs_01_0367__l359d5323c22a41159d5a6368bea4fca5"></a><a name="l359d5323c22a41159d5a6368bea4fca5"></a>Syntax:</p>
<p id="mrs_01_0367__a2eba189dc72840fbb3f15ffe10fbab2c"><strong id="mrs_01_0367__abbda4aba50b14df5877aadc43d4fc105">DROP TABLE</strong> [IF EXISTS] <em id="mrs_01_0367__en-us_topic_0019549323_i169212486454">table_name</em>;</p>
<p id="mrs_01_0367__ad5f866ee5ed949599b2d300c67affbfc">Example of deleting the <strong id="mrs_01_0367__b11949125151118">src_data</strong> table:</p>
<p id="mrs_01_0367__af69f8cf4b7654a1f9f42ad07911dc386"><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_0367__cf2cd8f614b41412c920a13fcb389bde0">drop table src_data;</span></b></i></p>
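<p>If you created tables with both methods, a cleanup could look like the following sketch. The <strong>IF EXISTS</strong> clause from the syntax above prevents an error when a table was never created. The <strong>--</strong> comment lines are for explanation only; remove them if the statement editor does not accept comments.</p>
<pre class="screen">-- Drop the external table created with method 1.
drop table if exists src_data;
-- Drop the table created with method 2.
drop table if exists src_data1;</pre>
<p>In Hive-compatible engines, dropping an external table such as <strong>src_data</strong> normally removes only the table definition and leaves the files in its location, whereas dropping a non-external table such as <strong>src_data1</strong> normally also deletes the data loaded into it. Confirm this on your cluster if the source data must be kept.</p>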
</li><li id="mrs_01_0367__l285960a036334c6c9a952297c56dfb17">Click <strong id="mrs_01_0367__b27231725121114">Check</strong> to verify that the statements are correct.</li><li id="mrs_01_0367__l9044613714934e06888df1be13f8f2c9">Click <span class="uicontrol" id="mrs_01_0367__uicontrol9132134510116"><b>OK</b></span>.<p id="mrs_01_0367__a3ae9ba9056134bf492f510bd7e6cab79">After the Spark SQL statements are submitted, their execution results are displayed in the result column.</p>
</li></ol>
</p></li><li id="mrs_01_0367__l2131e3a7a72b4a41bca2bf862944b811"><span>Delete the cluster.</span></li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0589.html">Using Spark</a></div>
</div>
</div>