Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
<a name="dli_09_0081"></a>
|
|
|
|
<h1 class="topictitle1">PySpark Example Code</h1>
|
|
<div id="body8662426"><div class="section" id="dli_09_0081__section3685105194914"><h4 class="sectiontitle">Development Description</h4><p id="dli_09_0081__en-us_topic_0197738133_p492312464537">CloudTable OpenTSDB and MRS OpenTSDB can be connected to DLI as data sources.</p>
|
|
<ul id="dli_09_0081__ul62191935508"><li id="dli_09_0081__li221993135018">Prerequisites<p id="dli_09_0081__p246892735015"><a name="dli_09_0081__li221993135018"></a><a name="li221993135018"></a>A datasource connection has been created on the DLI management console. </p>
|
|
<div class="note" id="dli_09_0081__note1358715714155"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0081__p1858718570154">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
|
|
</div></div>
|
|
</li><li id="dli_09_0081__li55257218511">Code implementation<ol id="dli_09_0081__en-us_topic_0197738133_ol12123050181818"><li id="dli_09_0081__en-us_topic_0197738133_li1612316509182">Import dependency packages.<div class="codecoloring" codetype="Python" id="dli_09_0081__en-us_topic_0197738133_screen68181719144911"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
|
|
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">StringType</span><span class="p">,</span> <span class="n">LongType</span><span class="p">,</span> <span class="n">DoubleType</span>
|
|
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0081__en-us_topic_0197738133_li11272141817195">Create a session.<div class="codecoloring" codetype="Python" id="dli_09_0081__en-us_topic_0197738133_screen2658132002217"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">"datasource-opentsdb"</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0081__en-us_topic_0197738133_li17698293198">Create a table to connect to an OpenTSDB data source.<div class="codecoloring" codetype="Python" id="dli_09_0081__en-us_topic_0197738133_screen95431138152317"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">"create table opentsdb_test using opentsdb options(\</span>
<span class="s2">'Host'='opentsdb-3xcl8dir15m58z3.cloudtable.com:4242',\</span>
<span class="s2">'metric'='ct_opentsdb',\</span>
<span class="s2">'tags'='city,location')"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<div class="note" id="dli_09_0081__en-us_topic_0197738133_note1376719247267"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0081__en-us_topic_0197738133_p14768153018616">For details about the <strong id="dli_09_0081__b159035137210">Host</strong>, <strong id="dli_09_0081__b11903111352110">metric</strong>, and <strong id="dli_09_0081__b790420131217">tags</strong> parameters, see <a href="dli_09_0065.html#dli_09_0065__en-us_topic_0190597601_table463015581831">Table 1</a>.</p>
|
|
</div></div>
|
|
</li></ol>
|
|
</li><li id="dli_09_0081__li98591845192715">Connecting to data sources through SQL APIs<ol id="dli_09_0081__ol19158103973018"><li id="dli_09_0081__li12158839163020">Insert data.<pre class="screen" id="dli_09_0081__screen1821218553307">sparkSession.sql("insert into opentsdb_test values('aaa', 'abc', '2021-06-30 18:00:00', 30.0)")</pre>
|
|
</li><li id="dli_09_0081__li1687241143016">Query data.<pre class="screen" id="dli_09_0081__screen1913623113213">result = sparkSession.sql("SELECT * FROM opentsdb_test")</pre>
|
|
</li></ol>
|
|
</li><li id="dli_09_0081__li761708155216">Connecting to data sources through DataFrame APIs<ol id="dli_09_0081__en-us_topic_0197738133_ol62934313101"><li id="dli_09_0081__en-us_topic_0197738133_li4293143141018">Construct a schema.<div class="codecoloring" codetype="Python" id="dli_09_0081__en-us_topic_0197738133_screen6395195210104"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">"location"</span><span class="p">,</span> <span class="n">StringType</span><span class="p">()),</span>\
|
|
<span class="n">StructField</span><span class="p">(</span><span class="s2">"name"</span><span class="p">,</span> <span class="n">StringType</span><span class="p">()),</span> \
|
|
<span class="n">StructField</span><span class="p">(</span><span class="s2">"timestamp"</span><span class="p">,</span> <span class="n">LongType</span><span class="p">()),</span>\
|
|
<span class="n">StructField</span><span class="p">(</span><span class="s2">"value"</span><span class="p">,</span> <span class="n">DoubleType</span><span class="p">())])</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0081__en-us_topic_0197738133_li531012517114">Set data.<div class="codecoloring" codetype="Python" id="dli_09_0081__en-us_topic_0197738133_screen2083911515127"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">dataList</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="s2">"aaa"</span><span class="p">,</span> <span class="s2">"abc"</span><span class="p">,</span> <span class="mi">123456</span><span class="p">,</span> <span class="mf">30.0</span><span class="p">)])</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0081__en-us_topic_0197738133_li4741141821116">Create a DataFrame.<div class="codecoloring" codetype="Python" id="dli_09_0081__en-us_topic_0197738133_screen1520319134349"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">dataList</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0081__en-us_topic_0197738133_li10107173841110">Import data to OpenTSDB.<div class="codecoloring" codetype="Python" id="dli_09_0081__en-us_topic_0197738133_screen14133320357"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">insertInto</span><span class="p">(</span><span class="s2">"opentsdb_test"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0081__en-us_topic_0197738133_li1956013474119">Read data from OpenTSDB.<div class="codecoloring" codetype="Python" id="dli_09_0081__en-us_topic_0197738133_screen7259610133515"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbdDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span>\
|
|
<span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">"opentsdb"</span><span class="p">)</span>\
|
|
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"Host"</span><span class="p">,</span><span class="s2">"opentsdb-3xcl8dir15m58z3.cloudtable.com:4242"</span><span class="p">)</span>\
|
|
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"metric"</span><span class="p">,</span><span class="s2">"ct_opentsdb"</span><span class="p">)</span>\
|
|
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"tags"</span><span class="p">,</span><span class="s2">"city,location"</span><span class="p">)</span>\
|
|
<span class="o">.</span><span class="n">load</span><span class="p">()</span>
|
|
<span class="n">jdbdDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
|
|
</li><li id="dli_09_0081__li1410414355525">Submitting a Spark job<ol id="dli_09_0081__en-us_topic_0197738133_ol612481914610"><li id="dli_09_0081__li525841115179">Upload the Python code file to DLI.<p id="dli_09_0081__p648216175172"><a name="dli_09_0081__li525841115179"></a><a name="li525841115179"></a></p>
|
|
<p id="dli_09_0081__p7676212171720"></p>
|
|
</li><li id="dli_09_0081__li78195201174">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<p id="dli_09_0081__p5931225131720"><a name="dli_09_0081__li78195201174"></a><a name="li78195201174"></a></p>
|
|
<div class="p" id="dli_09_0081__p7319112271716"><div class="note" id="dli_09_0081__en-us_topic_0197738133_note1435543551919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0081__en-us_topic_0197738133_ul17825285811"><li id="dli_09_0081__en-us_topic_0197738142_li58215295819">If the Spark version is 2.3.2 (soon to be taken offline) or 2.4.5, set <strong id="dli_09_0081__b923021854915">Module</strong> to <strong id="dli_09_0081__b3230618174915">sys.datasource.opentsdb</strong> when you submit the job.</li><li id="dli_09_0081__li6624653171317">If the Spark version is 3.1.1, you do not need to select a module. Instead, configure the following <strong id="dli_09_0081__b18248151924914">Spark parameters (--conf)</strong>:<p id="dli_09_0081__p1723617371259">spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/opentsdb/*</p>
|
|
<p id="dli_09_0081__p6236153714259">spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/opentsdb/*</p>
|
|
</li></ul>
|
|
</div></div>
|
|
</div>
|
|
</li></ol>
|
|
</li></ul>
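<p>The version-specific settings in the note above can be captured in a small lookup. This is only a sketch: the function name <strong>opentsdb_job_settings</strong> and its return shape are hypothetical and not part of any DLI SDK. Only the module name and the two classpath values come from this topic.</p>

```python
# Classpath value quoted from the note on Spark 3.1.1 in this topic.
OPENTSDB_JAR_PATH = "/usr/share/extension/dli/spark-jar/datasource/opentsdb/*"


def opentsdb_job_settings(spark_version):
    """Return (modules, conf) for a Spark version covered by this topic.

    Hypothetical helper: it only restates the note above as data.
    """
    if spark_version in ("2.3.2", "2.4.5"):
        # Older versions load the connector through a dependency module.
        return ["sys.datasource.opentsdb"], {}
    if spark_version == "3.1.1":
        # Spark 3.1.1 needs no module; the connector JARs go on the classpath.
        conf = {
            "spark.driver.extraClassPath": OPENTSDB_JAR_PATH,
            "spark.executor.extraClassPath": OPENTSDB_JAR_PATH,
        }
        return [], conf
    raise ValueError("Spark version not covered by this topic: " + spark_version)


modules, conf = opentsdb_job_settings("3.1.1")
for key in sorted(conf):
    # Print each setting in the key=value form used for --conf.
    print("{0}={1}".format(key, conf[key]))
```

<p>Keeping the settings in one place makes it harder to submit a Spark 3.1.1 job with a leftover module selection, or a 2.4.5 job without one.</p>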
|
|
</div>
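<p>In the spirit of the earlier note about not hard-coding sensitive values, the connection settings for the table-creation step can also be read from the environment instead of being written into the script. The sketch below is an illustration only: the helper <strong>build_create_table_sql</strong> and the environment variable name <strong>OPENTSDB_HOST</strong> are assumptions, while the option names <strong>Host</strong>, <strong>metric</strong>, and <strong>tags</strong> are the ones used in this topic.</p>

```python
import os


def build_create_table_sql(table, host, metric, tags):
    """Build the CREATE TABLE statement for an OpenTSDB datasource table.

    Hypothetical helper; the statement layout mirrors the example in this topic.
    """
    options = "'Host'='{0}', 'metric'='{1}', 'tags'='{2}'".format(
        host, metric, ",".join(tags))
    return "create table {0} using opentsdb options({1})".format(table, options)


# OPENTSDB_HOST is an assumed variable name; the fallback is the sample
# address used elsewhere in this topic.
host = os.environ.get(
    "OPENTSDB_HOST", "opentsdb-3xcl8dir15m58z3.cloudtable.com:4242")

statement = build_create_table_sql(
    "opentsdb_test", host, "ct_opentsdb", ["city", "location"])
print(statement)
```

<p>The resulting string can be passed to <strong>sparkSession.sql()</strong> in place of the literal statement shown in the code implementation step.</p>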
|
|
<div class="section" id="dli_09_0081__section1783516613536"><h4 class="sectiontitle">Complete Example Code</h4><ul id="dli_09_0081__ul2617145113018"><li id="dli_09_0081__li16176503011">Connecting to MRS OpenTSDB through SQL APIs<pre class="screen" id="dli_09_0081__screen1024318416307"># _*_ coding: utf-8 _*_
from __future__ import print_function
from pyspark.sql.types import StructType, StructField, StringType, LongType, DoubleType
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Create a SparkSession.
    sparkSession = SparkSession.builder.appName("datasource-opentsdb").getOrCreate()

    # Create a DLI datasource table associated with MRS OpenTSDB.
    sparkSession.sql(
        "create table opentsdb_test using opentsdb options(\
        'Host'='10.0.0.171:4242',\
        'metric'='cts_opentsdb',\
        'tags'='city,location')")

    # Insert one data point through the SQL API.
    sparkSession.sql("insert into opentsdb_test values('aaa', 'abc', '2021-06-30 18:00:00', 30.0)")

    # Query the table and print the result.
    result = sparkSession.sql("SELECT * FROM opentsdb_test")
    result.show()

    # Close the session.
    sparkSession.stop()</pre>
|
|
</li><li id="dli_09_0081__li469501910305">Connecting to OpenTSDB through DataFrame APIs<pre class="screen" id="dli_09_0081__screen1895134416305"># _*_ coding: utf-8 _*_
from __future__ import print_function
from pyspark.sql.types import StructType, StructField, StringType, LongType, DoubleType
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Create a SparkSession.
    sparkSession = SparkSession.builder.appName("datasource-opentsdb").getOrCreate()

    # Create a DLI datasource table associated with CloudTable OpenTSDB.
    sparkSession.sql(
        "create table opentsdb_test using opentsdb options(\
        'Host'='opentsdb-3xcl8dir15m58z3.cloudtable.com:4242',\
        'metric'='ct_opentsdb',\
        'tags'='city,location')")

    # Prepare the data as an RDD.
    dataList = sparkSession.sparkContext.parallelize([("aaa", "abc", 123456, 30.0)])

    # Define the schema.
    schema = StructType([StructField("location", StringType()),
                         StructField("name", StringType()),
                         StructField("timestamp", LongType()),
                         StructField("value", DoubleType())])

    # Create a DataFrame from the RDD and the schema.
    dataFrame = sparkSession.createDataFrame(dataList, schema)

    # Set the datasource connection parameters.
    metric = "ct_opentsdb"
    tags = "city,location"
    Host = "opentsdb-3xcl8dir15m58z3.cloudtable.com:4242"

    # Write data to CloudTable OpenTSDB.
    dataFrame.write.insertInto("opentsdb_test")
    # OpenTSDB does not currently support CTAS, so save() cannot be used to write data.
    # dataFrame.write.format("opentsdb").option("Host", Host).option("metric", metric).option("tags", tags).mode("Overwrite").save()

    # Read data from CloudTable OpenTSDB.
    jdbdDF = sparkSession.read\
        .format("opentsdb")\
        .option("Host", Host)\
        .option("metric", metric)\
        .option("tags", tags)\
        .load()
    jdbdDF.show()

    # Close the session.
    sparkSession.stop()</pre>
|
|
</li></ul>
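<p>The SQL example writes the timestamp as a date-time string, whereas the DataFrame example stores it in a <strong>LongType</strong> column. A value for that column could be derived from a date-time string as sketched below. The helper name <strong>to_epoch</strong> is hypothetical, the string is interpreted as UTC, and this topic does not state whether the connector expects seconds or milliseconds, so the unit is left as a parameter to verify against your environment.</p>

```python
import calendar
import time


def to_epoch(text, unit="s"):
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed to be UTC) to an epoch integer.

    Hypothetical helper; unit is "s" for seconds or "ms" for milliseconds.
    """
    parsed = time.strptime(text, "%Y-%m-%d %H:%M:%S")
    seconds = calendar.timegm(parsed)  # timegm treats the struct_time as UTC
    return seconds * 1000 if unit == "ms" else seconds


# A row shaped like the one passed to parallelize() in the DataFrame example:
# (location, name, timestamp, value).
row = ("aaa", "abc", to_epoch("2021-06-30 18:00:00"), 30.0)
print(row)
```

<p>Using <strong>calendar.timegm</strong> rather than <strong>time.mktime</strong> keeps the result independent of the time zone of the machine that runs the job.</p>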
|
|
</div>
|
|
</div>
|
|
<div>
|
|
<div class="familylinks">
|
|
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0080.html">Connecting to OpenTSDB</a></div>
|
|
</div>
|
|
</div>
|
|
|