Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
<a name="dli_09_0097"></a>
<h1 class="topictitle1">PySpark Example Code</h1>
<div id="body8662426"><div class="section" id="dli_09_0097__section1822343181116"><h4 class="sectiontitle">Development Description</h4><p id="dli_09_0097__en-us_topic_0200509991_p492312464537">Redis supports only enhanced datasource connections. </p>
<ul id="dli_09_0097__ul995671919132"><li id="dli_09_0097__li695651911134">Prerequisites<p id="dli_09_0097__en-us_topic_0200509991_p1944354710257"><a name="dli_09_0097__li695651911134"></a><a name="li695651911134"></a>An enhanced datasource connection has been created on the DLI management console and bound to a queue in packages. </p>
|
|
<div class="note" id="dli_09_0097__note1358715714155"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0097__p692572617287">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
|
|
</div></div>
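<p>The advice in the note above can be followed with a few lines of standard-library Python. This is a minimal sketch, not part of any DLI API: the environment variable name <strong>REDIS_AUTH</strong> and the helper <strong>load_redis_password</strong> are hypothetical, and decryption is omitted because it depends on the key management tooling you use.</p>

```python
import os


def load_redis_password(env_var="REDIS_AUTH"):
    """Return the Redis password stored in an environment variable.

    REDIS_AUTH is a hypothetical variable name chosen for this sketch.
    """
    password = os.environ.get(env_var)
    if not password:
        # Fail early instead of sending an empty password to Redis.
        raise RuntimeError("Environment variable %s is not set" % env_var)
    return password
```

<p>With such a helper, the <strong>auth</strong> value used in the connection parameters could be assigned as <strong>auth = load_redis_password()</strong> instead of a string literal, assuming the variable is available in the environment where the job runs.</p>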
|
|
</li><li id="dli_09_0097__li183719311415">Connecting to data sources through DataFrame APIs<ol id="dli_09_0097__en-us_topic_0200509991_ol62934313101"><li id="dli_09_0097__en-us_topic_0200509991_li17921229203113">Import dependencies.<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509991_screen176071342153111"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
|
|
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">,</span> <span class="n">StringType</span>
|
|
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0097__en-us_topic_0200509991_li618044220189">Create a session.<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509991_screen18318195741816"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">"datasource-redis"</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0097__en-us_topic_0200509991_li362510127192">Set connection parameters.<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509991_screen711517279193"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">host</span> <span class="o">=</span> <span class="s2">"192.168.4.199"</span>
|
|
<span class="n">port</span> <span class="o">=</span> <span class="s2">"6379"</span>
|
|
<span class="n">table</span> <span class="o">=</span> <span class="s2">"person"</span>
|
|
<span class="n">auth</span> <span class="o">=</span> <span class="s2">"@@@@@@"</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0097__en-us_topic_0200509991_li4741141821116">Create a DataFrame.<ol type="a" id="dli_09_0097__en-us_topic_0200509991_ol1998664172010"><li id="dli_09_0097__en-us_topic_0200509991_li7987641202012">Method 1:<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509991_screen111295412013"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">dataList</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">"Katie"</span><span class="p">,</span> <span class="mi">19</span><span class="p">),(</span><span class="mi">2</span><span class="p">,</span><span class="s2">"Tom"</span><span class="p">,</span><span class="mi">20</span><span class="p">)])</span>
|
|
<span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">"id"</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>
|
|
<span class="n">StructField</span><span class="p">(</span><span class="s2">"name"</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>
|
|
<span class="n">StructField</span><span class="p">(</span><span class="s2">"age"</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">)])</span>
|
|
<span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">dataList</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0097__en-us_topic_0200509991_li167287393215">Method 2:<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509991_screen1243834818218"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([(</span><span class="mi">3</span><span class="p">,</span><span class="s2">"Jack"</span><span class="p">,</span> <span class="mi">23</span><span class="p">)])</span>
|
|
<span class="n">dataFrame</span> <span class="o">=</span> <span class="n">jdbcDF</span><span class="o">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s2">"_1"</span><span class="p">,</span> <span class="s2">"id"</span><span class="p">)</span><span class="o">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s2">"_2"</span><span class="p">,</span> <span class="s2">"name"</span><span class="p">)</span><span class="o">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s2">"_3"</span><span class="p">,</span> <span class="s2">"age"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
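<p>Method 2 can also be written without the chain of renames, because <strong>createDataFrame</strong> accepts a list of column names as its second argument. The sketch below assumes an existing SparkSession is passed in; the function name <strong>build_person_dataframe</strong> is hypothetical.</p>

```python
def build_person_dataframe(spark, rows):
    """Create a DataFrame whose columns are named id, name, and age.

    spark is an existing SparkSession; rows is a list of
    (id, name, age) tuples. Column types are inferred from the data.
    """
    return spark.createDataFrame(rows, ["id", "name", "age"])
```

<p>For example, <strong>build_person_dataframe(sparkSession, [(3, "Jack", 23)])</strong> yields the same column names as Method 2.</p>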
|
|
</li><li id="dli_09_0097__en-us_topic_0200509991_li10107173841110">Import data to Redis.<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509991_screen10458192419221"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span>
|
|
<span class="normal">8</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span>\
    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">"redis"</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"host"</span><span class="p">,</span> <span class="n">host</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"port"</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"table"</span><span class="p">,</span> <span class="n">table</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"password"</span><span class="p">,</span> <span class="n">auth</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">mode</span><span class="p">(</span><span class="s2">"Overwrite"</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">save</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<div class="note" id="dli_09_0097__en-us_topic_0200509991_note25701844152517"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0097__en-us_topic_0200509991_ul378316544259"><li id="dli_09_0097__en-us_topic_0200509991_li626991818615">Valid values of <strong id="dli_09_0097__b19104133111712">mode</strong> are <strong id="dli_09_0097__b12104113119713">Overwrite</strong>, <strong id="dli_09_0097__b1105103115710">Append</strong>, <strong id="dli_09_0097__b101053311977">ErrorIfExists</strong>, and <strong id="dli_09_0097__b31050311712">Ignore</strong>.</li><li id="dli_09_0097__en-us_topic_0200509991_li2858111015261">To specify the key column, use <strong id="dli_09_0097__en-us_topic_0200509991_b1577745121911">.option("key.column", "name")</strong>, where <strong id="dli_09_0097__en-us_topic_0200509991_b247385619196">name</strong> is the column name.</li><li id="dli_09_0097__en-us_topic_0200509991_li12261202720614">To save nested DataFrames, use <strong id="dli_09_0097__en-us_topic_0200509991_b9225942191910">.option("model", "binary")</strong>.</li><li id="dli_09_0097__en-us_topic_0200509991_li3542271091">To set a data expiration time, use <strong id="dli_09_0097__en-us_topic_0200509991_b175839582011">.option("ttl", 1000)</strong>. The value is in seconds.</li></ul>
|
|
</div></div>
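<p>The options listed in the note above can be collected in one place before writing. The following is a sketch under stated assumptions: the helper names <strong>redis_write_options</strong> and <strong>write_to_redis</strong> are hypothetical, and only the option keys named in this topic are used.</p>

```python
def redis_write_options(host, port, table, auth,
                        key_column=None, ttl=None, model=None):
    """Build the option dictionary for a Redis write.

    Optional settings are included only when a value is given.
    """
    options = {"host": host, "port": port, "table": table, "password": auth}
    if key_column is not None:
        options["key.column"] = key_column  # column whose values become keys
    if ttl is not None:
        options["ttl"] = str(ttl)           # expiration time in seconds
    if model is not None:
        options["model"] = model            # for example "binary"
    return options


def write_to_redis(data_frame, options, mode="Append"):
    """Write a DataFrame to Redis using the prepared options."""
    data_frame.write.format("redis").options(**options).mode(mode).save()
```

<p>For example, <strong>write_to_redis(dataFrame, redis_write_options(host, port, table, auth, key_column="id", ttl=1000))</strong> would append rows keyed by the <strong>id</strong> column that expire after 1000 seconds.</p>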
|
|
</li><li id="dli_09_0097__en-us_topic_0200509991_li1956013474119">Read data from Redis.<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509991_screen7259610133515"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">"redis"</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"host"</span><span class="p">,</span> <span class="n">host</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"port"</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"table"</span><span class="p">,</span> <span class="n">table</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"password"</span><span class="p">,</span> <span class="n">auth</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">()</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0097__en-us_topic_0200509991_li106413175281">View the operation result.<p id="dli_09_0097__en-us_topic_0200509991_p156515302222"><a name="dli_09_0097__en-us_topic_0200509991_li106413175281"></a><a name="en-us_topic_0200509991_li106413175281"></a><span><img id="dli_09_0097__en-us_topic_0200509991_image16565183019228" src="en-us_image_0223997787.png"></span></p>
|
|
</li></ol>
|
|
</li><li id="dli_09_0097__li028895510140">Connecting to data sources through SQL APIs<ol id="dli_09_0097__dli_09_0094_en-us_topic_0200509988_ol11291435145014"><li id="dli_09_0097__dli_09_0094_en-us_topic_0200509988_li7129135105012">Create a table to connect to a Redis data source.<pre class="screen" id="dli_09_0097__dli_09_0094_screen1812862316408">sparkSession.sql(
    "CREATE TEMPORARY VIEW person (name STRING, age INT) USING org.apache.spark.sql.redis OPTIONS (\
    'host' = '192.168.4.199',\
    'port' = '6379',\
    'password' = '######',\
    'table' = 'person')")</pre>
|
|
</li><li id="dli_09_0097__dli_09_0094_en-us_topic_0200509988_li08868382528">Insert data.<div class="codecoloring" codetype="Python" id="dli_09_0097__dli_09_0094_en-us_topic_0200509988_screen82431853135210"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">"INSERT INTO TABLE person VALUES ('John', 30),('Peter', 45)"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0097__dli_09_0094_en-us_topic_0200509988_li1777420614539">Query data.<div class="codecoloring" codetype="Python" id="dli_09_0097__dli_09_0094_en-us_topic_0200509988_screen7922121675310"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">"SELECT * FROM person"</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
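<p>The CREATE TEMPORARY VIEW statement in the steps above can be assembled from the same connection parameters used in the DataFrame example. This is a minimal sketch: <strong>create_view_sql</strong> is a hypothetical helper, and it performs no escaping, so it is only suitable for trusted values.</p>

```python
def create_view_sql(view, columns, host, port, password, table):
    """Return a CREATE TEMPORARY VIEW statement for a Redis table.

    columns is a list of (name, sql_type) pairs, for example
    [("name", "STRING"), ("age", "INT")]. Values are not escaped.
    """
    column_list = ", ".join("%s %s" % (name, sql_type)
                            for name, sql_type in columns)
    return (
        "CREATE TEMPORARY VIEW {view} ({cols}) "
        "USING org.apache.spark.sql.redis OPTIONS ("
        "'host' = '{host}', 'port' = '{port}', "
        "'password' = '{password}', 'table' = '{table}')"
    ).format(view=view, cols=column_list, host=host, port=port,
             password=password, table=table)
```

<p>The returned string can then be passed to <strong>sparkSession.sql()</strong> in place of the literal statement.</p>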
|
|
</li><li id="dli_09_0097__li1416922071513">Submitting a Spark job<ol id="dli_09_0097__en-us_topic_0200509991_ol612481914610"><li id="dli_09_0097__li6755131715306">Upload the Python code file to DLI.<p id="dli_09_0097__p199721221163017"><a name="dli_09_0097__li6755131715306"></a><a name="li6755131715306"></a></p>
|
|
<p id="dli_09_0097__p1358161903019"></p>
|
|
</li><li id="dli_09_0097__li12901527153012">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<p id="dli_09_0097__p2026693115303"><a name="dli_09_0097__li12901527153012"></a><a name="li12901527153012"></a></p>
|
|
<div class="p" id="dli_09_0097__p10346828113019"><div class="note" id="dli_09_0097__en-us_topic_0200509991_note1435543551919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0097__en-us_topic_0200509991_ul17825285811"><li id="dli_09_0097__en-us_topic_0197738142_li58215295819">If the Spark version is 2.3.2 (which will be taken offline soon) or 2.4.5, set <strong id="dli_09_0097__b421204704105239">Module</strong> to <strong id="dli_09_0097__b321834206105239">sys.datasource.redis</strong> when you submit the job.</li><li id="dli_09_0097__li6624653171317">If the Spark version is 3.1.1, you do not need to select a module. Instead, configure the following <strong id="dli_09_0097__b2083193281105239">Spark parameters (--conf)</strong>:<p id="dli_09_0097__p292175763011">spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/redis/*</p>
|
|
<p id="dli_09_0097__p392165713308">spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/redis/*</p>
|
|
</li></ul>
|
|
</div></div>
|
|
</div>
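<p>If job submission is scripted, the two Spark 3.1.1 class path settings above can be kept in one place and rendered as <strong>key=value</strong> lines. The sketch below only formats the values given in this topic; the helper names are hypothetical and it does not call any DLI API.</p>

```python
REDIS_JAR_PATH = "/usr/share/extension/dli/spark-jar/datasource/redis/*"


def redis_classpath_conf(jar_path=REDIS_JAR_PATH):
    """Return the driver and executor class path settings as a dict."""
    return {
        "spark.driver.extraClassPath": jar_path,
        "spark.executor.extraClassPath": jar_path,
    }


def as_conf_lines(conf):
    """Render settings as key=value lines, sorted by key."""
    return ["%s=%s" % (key, conf[key]) for key in sorted(conf)]
```

<p>Calling <strong>as_conf_lines(redis_classpath_conf())</strong> returns the two lines shown in the note, ready to paste into the <strong>Spark parameters (--conf)</strong> field.</p>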
|
|
</li></ol>
|
|
</li></ul>
|
|
</div>
|
|
<div class="section" id="dli_09_0097__section9305165121520"><h4 class="sectiontitle">Complete Example Code</h4><ul id="dli_09_0097__ul1792013071619"><li id="dli_09_0097__li109206021620">Connecting to data sources through DataFrame APIs<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509992_screen1239532192219"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span>
|
|
<span class="normal">22</span>
|
|
<span class="normal">23</span>
|
|
<span class="normal">24</span>
|
|
<span class="normal">25</span>
|
|
<span class="normal">26</span>
|
|
<span class="normal">27</span>
|
|
<span class="normal">28</span>
|
|
<span class="normal">29</span>
|
|
<span class="normal">30</span>
|
|
<span class="normal">31</span></pre></div></td><td class="code"><div><pre><span></span><span class="c1"># _*_ coding: utf-8 _*_</span>
|
|
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
|
|
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">,</span> <span class="n">StringType</span>
|
|
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
|
|
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
    <span class="c1"># Create a SparkSession.</span>
    <span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">"datasource-redis"</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>

    <span class="c1"># Set cross-source connection parameters.</span>
    <span class="n">host</span> <span class="o">=</span> <span class="s2">"192.168.4.199"</span>
    <span class="n">port</span> <span class="o">=</span> <span class="s2">"6379"</span>
    <span class="n">table</span> <span class="o">=</span> <span class="s2">"person"</span>
    <span class="n">auth</span> <span class="o">=</span> <span class="s2">"######"</span>

    <span class="c1"># Create a DataFrame and initialize the DataFrame data.</span>
    <span class="c1"># ******* Method 1 *******</span>
    <span class="n">dataList</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">"Katie"</span><span class="p">,</span> <span class="mi">19</span><span class="p">),(</span><span class="mi">2</span><span class="p">,</span><span class="s2">"Tom"</span><span class="p">,</span><span class="mi">20</span><span class="p">)])</span>
    <span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">"id"</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span><span class="n">StructField</span><span class="p">(</span><span class="s2">"name"</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span><span class="n">StructField</span><span class="p">(</span><span class="s2">"age"</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">)])</span>
    <span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">dataList</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>

    <span class="c1"># ******* Method 2 *******</span>
    <span class="c1"># jdbcDF = sparkSession.createDataFrame([(3,"Jack", 23)])</span>
    <span class="c1"># dataFrame = jdbcDF.withColumnRenamed("_1", "id").withColumnRenamed("_2", "name").withColumnRenamed("_3", "age")</span>

    <span class="c1"># Write data to the Redis table.</span>
    <span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">"redis"</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"host"</span><span class="p">,</span> <span class="n">host</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"port"</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"table"</span><span class="p">,</span> <span class="n">table</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"password"</span><span class="p">,</span> <span class="n">auth</span><span class="p">)</span><span class="o">.</span><span class="n">mode</span><span class="p">(</span><span class="s2">"Overwrite"</span><span class="p">)</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="c1"># Read data from the Redis table.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">"redis"</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"host"</span><span class="p">,</span> <span class="n">host</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"port"</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"table"</span><span class="p">,</span> <span class="n">table</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"password"</span><span class="p">,</span> <span class="n">auth</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">()</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>

    <span class="c1"># Close the session.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0097__li8634551141620">Connecting to data sources through SQL APIs<div class="codecoloring" codetype="Python" id="dli_09_0097__en-us_topic_0200509992_screen1994116224814"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span>
|
|
<span class="normal">15</span>
|
|
<span class="normal">16</span>
|
|
<span class="normal">17</span>
|
|
<span class="normal">18</span>
|
|
<span class="normal">19</span>
|
|
<span class="normal">20</span>
|
|
<span class="normal">21</span></pre></div></td><td class="code"><div><pre><span></span><span class="c1"># _*_ coding: utf-8 _*_</span>
|
|
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
|
|
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
|
|
|
|
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
    <span class="c1"># Create a SparkSession.</span>
    <span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">"datasource_redis"</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>

    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span>
        <span class="s2">"CREATE TEMPORARY VIEW person (name STRING, age INT) USING org.apache.spark.sql.redis OPTIONS (</span><span class="se">\</span>
<span class="s2">        'host' = '192.168.4.199',</span><span class="se">\</span>
<span class="s2">        'port' = '6379',</span><span class="se">\</span>
<span class="s2">        'password' = '######',</span><span class="se">\</span>
<span class="s2">        'table' = 'person')"</span><span class="p">)</span>

    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">"INSERT INTO TABLE person VALUES ('John', 30),('Peter', 45)"</span><span class="p">)</span>

    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">"SELECT * FROM person"</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>

    <span class="c1"># Close the session.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ul>
|
|
</div>
|
|
</div>
|
|
<div>
|
|
<div class="familylinks">
|
|
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0093.html">Connecting to Redis</a></div>
|
|
</div>
|
|
</div>
|
|
|