Hasko, Vladimir cfc48b3aed dli_dev_0104_version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
2024-05-06 09:14:57 +00:00


<a name="dli_09_0110"></a>
<h1 class="topictitle1">Java Example Code</h1>
<div id="body8662426"><div class="section" id="dli_09_0110__section52015212350"><h4 class="sectiontitle">Development Description</h4><p id="dli_09_0110__en-us_topic_0204097190_p492312464537">Mongo can be accessed only through an enhanced datasource connection.</p>
<div class="note" id="dli_09_0110__note12343132893511"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0110__p1734422863515">Document Database Service (DDS) is compatible with the MongoDB protocol.</p>
</div></div>
<ul id="dli_09_0110__ul12115629153613"><li id="dli_09_0110__li14116829103617">Prerequisites<p id="dli_09_0110__p6629155314372"><a name="dli_09_0110__li14116829103617"></a><a name="li14116829103617"></a>An enhanced datasource connection has been created on the DLI management console and bound to a queue in packages. </p>
<div class="note" id="dli_09_0110__note1358715714155"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0110__p1858718570154">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
</div></div>
</li><li id="dli_09_0110__li18191192112388">Code implementation<ol id="dli_09_0110__en-us_topic_0204097190_ol12123050181818"><li id="dli_09_0110__li2022074516506">Import dependencies.<ul id="dli_09_0110__ul15742204125119"><li id="dli_09_0110__li145285157562">Add the following Maven dependency.<div class="codecoloring" codetype="Scala" id="dli_09_0110__en-us_topic_0190647826_screen5760163172012"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="o">&lt;</span><span class="n">dependency</span><span class="o">&gt;</span>
<span class="w"> </span><span class="o">&lt;</span><span class="n">groupId</span><span class="o">&gt;</span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o">&lt;/</span><span class="n">groupId</span><span class="o">&gt;</span>
<span class="w"> </span><span class="o">&lt;</span><span class="n">artifactId</span><span class="o">&gt;</span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o">&lt;/</span><span class="n">artifactId</span><span class="o">&gt;</span>
<span class="w"> </span><span class="o">&lt;</span><span class="n">version</span><span class="o">&gt;</span><span class="mf">2.3.2</span><span class="o">&lt;/</span><span class="n">version</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">dependency</span><span class="o">&gt;</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0110__li139701324135215">Import the required classes.<pre class="screen" id="dli_09_0110__screen16454165511509">import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;
import java.util.Arrays;</pre>
</li></ul>
</li><li id="dli_09_0110__en-us_topic_0204097190_li11272141817195">Create a session.<div class="codecoloring" codetype="Java" id="dli_09_0110__en-us_topic_0204097190_screen2658132002217"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">SparkContext</span><span class="w"> </span><span class="n">sparkContext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">SparkContext</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">SparkConf</span><span class="p">().</span><span class="na">setAppName</span><span class="p">(</span><span class="s">&quot;datasource-mongo&quot;</span><span class="p">));</span>
<span class="n">JavaSparkContext</span><span class="w"> </span><span class="n">javaSparkContext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">JavaSparkContext</span><span class="p">(</span><span class="n">sparkContext</span><span class="p">);</span>
<span class="n">SQLContext</span><span class="w"> </span><span class="n">sqlContext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">SQLContext</span><span class="p">(</span><span class="n">javaSparkContext</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
</li></ol>
</li><li id="dli_09_0110__li19777165511388">Connecting to data sources through DataFrame APIs<ol id="dli_09_0110__en-us_topic_0204097190_ol62934313101"><li id="dli_09_0110__en-us_topic_0204097190_li4293143141018">Read JSON data as DataFrames.<pre class="screen" id="dli_09_0110__en-us_topic_0204097190_screen16345156123915">JavaRDD&lt;String&gt; javaRDD = javaSparkContext.parallelize(Arrays.asList("{\"id\":\"5\",\"name\":\"Ann\",\"age\":\"23\"}"));
Dataset&lt;Row&gt; dataFrame = sqlContext.read().json(javaRDD);</pre>
</li><li id="dli_09_0110__en-us_topic_0204097190_li531012517114">Set connection parameters.<pre class="screen" id="dli_09_0110__en-us_topic_0204097190_screen642017379401">String url = "192.168.4.62:8635,192.168.5.134:8635/test?authSource=admin";
String uri = "mongodb://username:pwd@host:8635/db";
String user = "rwuser";
String database = "test";
String collection = "test";
String password = "######";</pre>
<div class="note" id="dli_09_0110__note5393153410317"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0110__p139353416314">For details about the parameters, see <a href="dli_09_0114.html#dli_09_0114__en-us_topic_0204096844_table2072415395012">Table 1</a>.</p>
</div></div>
</li><li id="dli_09_0110__en-us_topic_0204097190_li10107173841110">Import data to Mongo.<pre class="screen" id="dli_09_0110__en-us_topic_0204097190_screen2799193814116">dataFrame.write().format("mongo")
.option("url",url)
.option("uri",uri)
.option("database",database)
.option("collection",collection)
.option("user",user)
.option("password",password)
.mode(SaveMode.Overwrite)
.save();</pre>
</li><li id="dli_09_0110__en-us_topic_0204097190_li1956013474119">Read data from Mongo.<div class="codecoloring" codetype="Java" id="dli_09_0110__en-us_topic_0204097190_screen7259610133515"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">sqlContext</span><span class="p">.</span><span class="na">read</span><span class="p">().</span><span class="na">format</span><span class="p">(</span><span class="s">&quot;mongo&quot;</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;url&quot;</span><span class="p">,</span><span class="n">url</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;uri&quot;</span><span class="p">,</span><span class="n">uri</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;database&quot;</span><span class="p">,</span><span class="n">database</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;collection&quot;</span><span class="p">,</span><span class="n">collection</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;user&quot;</span><span class="p">,</span><span class="n">user</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;password&quot;</span><span class="p">,</span><span class="n">password</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="na">load</span><span class="p">().</span><span class="na">show</span><span class="p">();</span>
</pre></div></td></tr></table></div>
</div>
</li></ol>
</li><li id="dli_09_0110__li1841981594019">Submitting a Spark job<ol id="dli_09_0110__ol578984916407"><li id="dli_09_0110__li12391104515287">Upload the Java code file to DLI.<p id="dli_09_0110__p1168625072818"><a name="dli_09_0110__li12391104515287"></a><a name="li12391104515287"></a></p>
<p id="dli_09_0110__p16364174632815"></p>
</li><li id="dli_09_0110__li665635342817">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<p id="dli_09_0110__p7662105816286"><a name="dli_09_0110__li665635342817"></a><a name="li665635342817"></a></p>
<div class="p" id="dli_09_0110__p13485755192819"><div class="note" id="dli_09_0110__note1979064934010"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0110__ul12790184917402"><li id="dli_09_0110__en-us_topic_0197738142_li58215295819">If the Spark version is 2.3.2 (to be taken offline soon) or 2.4.5, set <strong id="dli_09_0110__b2116599581105243">Module</strong> to <strong id="dli_09_0110__b1174158955105243">sys.datasource.mongo</strong> when you submit the job.</li><li id="dli_09_0110__li6624653171317">If the Spark version is 3.1.1, you do not need to select a module. Instead, configure the following <strong id="dli_09_0110__b62931823135512">Spark parameters (--conf)</strong>:<p id="dli_09_0110__p1520611118290">spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/mongo/*</p>
<p id="dli_09_0110__p182061411152917">spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/mongo/*</p>
</li></ul>
</div></div>
</div>
</li></ol>
</li></ul>
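<p>The password note under Prerequisites can be illustrated with a minimal sketch that reads the credentials from environment variables and assembles the <strong>uri</strong> value at run time instead of hard-coding it. The class name <strong>MongoConnectionSettings</strong>, its helper methods, and the variable names <strong>MONGO_USER</strong> and <strong>MONGO_PASSWORD</strong> are assumptions made for this illustration. Whether environment variables are available to your job depends on how it is deployed, and decrypting an encrypted password is not shown.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch only: builds a MongoDB connection URI without a hard-coded password. */
final class MongoConnectionSettings {

    // Percent-encode a user name or password so that characters such as '@' or ':'
    // cannot be mistaken for URI delimiters.
    static String encodeCredential(String value) {
        try {
            // URLEncoder targets HTML forms and writes a space as '+'; a URI needs "%20".
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Assemble mongodb://user:password@hosts/database?authSource=...
    static String buildUri(String user, String password, String hosts,
                           String database, String authSource) {
        return "mongodb://" + encodeCredential(user) + ":" + encodeCredential(password)
                + "@" + hosts + "/" + database + "?authSource=" + authSource;
    }

    // Read a mandatory environment variable (the variable names used below are hypothetical).
    static String requireEnv(String name) {
        String value = System.getenv(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        String user = requireEnv("MONGO_USER");
        String password = requireEnv("MONGO_PASSWORD");
        String uri = buildUri(user, password,
                "192.168.4.62:8635,192.168.5.134:8635", "test", "admin");
        // Log only the part after the credentials so the password never reaches the output.
        System.out.println("mongodb://***@" + uri.substring(uri.indexOf('@') + 1));
    }
}
```

<p>The resulting string can be passed as the <strong>uri</strong> option in the read and write calls. Encoding the credentials matters because a password that contains <strong>@</strong> or <strong>:</strong> would otherwise change how the URI is parsed.</p>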
</div>
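<p>The version-specific submission settings listed under "Submitting a Spark job" can be summarized in a small lookup. This is only a sketch for scripts that prepare job submissions. The class <strong>MongoJobSettings</strong> and its methods are hypothetical and not part of any DLI SDK. The module name and classpath values are the ones given in the note in that step.</p>

```java
/** Sketch only: maps a Spark version to the submission settings listed in this topic. */
final class MongoJobSettings {

    // Location of the Mongo datasource JAR files, as given in this topic.
    static final String MONGO_JARS = "/usr/share/extension/dli/spark-jar/datasource/mongo/*";

    // Module to select for the job, or an empty string when no module is needed.
    static String moduleFor(String sparkVersion) {
        switch (sparkVersion) {
            case "2.3.2":
            case "2.4.5":
                return "sys.datasource.mongo";
            case "3.1.1":
                return "";
            default:
                // This topic only covers the three versions above.
                throw new IllegalArgumentException("Unsupported Spark version: " + sparkVersion);
        }
    }

    // Spark parameters (--conf) to set for the job, or an empty array when none are needed.
    static String[] confFor(String sparkVersion) {
        switch (sparkVersion) {
            case "3.1.1":
                return new String[] {
                    "spark.driver.extraClassPath=" + MONGO_JARS,
                    "spark.executor.extraClassPath=" + MONGO_JARS
                };
            case "2.3.2":
            case "2.4.5":
                return new String[0];
            default:
                throw new IllegalArgumentException("Unsupported Spark version: " + sparkVersion);
        }
    }
}
```

<p>Rejecting unlisted versions is a deliberate choice in this sketch: guessing a module or classpath for a version this topic does not describe would fail later and less visibly.</p>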
<div class="section" id="dli_09_0110__section1691403213449"><h4 class="sectiontitle">Complete Example Code</h4><div class="codecoloring" codetype="Java" id="dli_09_0110__en-us_topic_0204097191_screen1659318340448"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="kn">import</span><span class="w"> </span><span class="nn">org.apache.spark.SparkConf</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.apache.spark.SparkContext</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.apache.spark.api.java.JavaSparkContext</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.apache.spark.sql.Dataset</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.apache.spark.sql.Row</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.apache.spark.sql.SQLContext</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.apache.spark.sql.SaveMode</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">java.util.Arrays</span><span class="p">;</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TestMongoSparkSql</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">String</span><span class="o">[]</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">SparkContext</span><span class="w"> </span><span class="n">sparkContext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">SparkContext</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">SparkConf</span><span class="p">().</span><span class="na">setAppName</span><span class="p">(</span><span class="s">&quot;datasource-mongo&quot;</span><span class="p">));</span>
<span class="w"> </span><span class="n">JavaSparkContext</span><span class="w"> </span><span class="n">javaSparkContext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">JavaSparkContext</span><span class="p">(</span><span class="n">sparkContext</span><span class="p">);</span>
<span class="w"> </span><span class="n">SQLContext</span><span class="w"> </span><span class="n">sqlContext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">SQLContext</span><span class="p">(</span><span class="n">javaSparkContext</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Alternatively, read a JSON file as a DataFrame. CSV and Parquet files are read the same way.</span>
<span class="w"> </span><span class="c1">// Dataset&lt;Row&gt; dataFrame = sqlContext.read().format(&quot;json&quot;).load(&quot;filepath&quot;);</span>
<span class="w"> </span><span class="c1">// Read RDD in JSON format to create DataFrame</span>
<span class="w"> </span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span><span class="w"> </span><span class="n">javaRDD</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">javaSparkContext</span><span class="p">.</span><span class="na">parallelize</span><span class="p">(</span><span class="n">Arrays</span><span class="p">.</span><span class="na">asList</span><span class="p">(</span><span class="s">&quot;{\&quot;id\&quot;:\&quot;5\&quot;,\&quot;name\&quot;:\&quot;Ann\&quot;,\&quot;age\&quot;:\&quot;23\&quot;}&quot;</span><span class="p">));</span>
<span class="w"> </span><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span><span class="w"> </span><span class="n">dataFrame</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sqlContext</span><span class="p">.</span><span class="na">read</span><span class="p">().</span><span class="na">json</span><span class="p">(</span><span class="n">javaRDD</span><span class="p">);</span>
<span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;192.168.4.62:8635,192.168.5.134:8635/test?authSource=admin&quot;</span><span class="p">;</span>
<span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">uri</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;mongodb://username:pwd@host:8635/db&quot;</span><span class="p">;</span>
<span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">user</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;rwuser&quot;</span><span class="p">;</span>
<span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">database</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;test&quot;</span><span class="p">;</span>
<span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">collection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;test&quot;</span><span class="p">;</span>
<span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;######&quot;</span><span class="p">;</span>
<span class="w"> </span><span class="n">dataFrame</span><span class="p">.</span><span class="na">write</span><span class="p">().</span><span class="na">format</span><span class="p">(</span><span class="s">&quot;mongo&quot;</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;url&quot;</span><span class="p">,</span><span class="n">url</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;uri&quot;</span><span class="p">,</span><span class="n">uri</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;database&quot;</span><span class="p">,</span><span class="n">database</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;collection&quot;</span><span class="p">,</span><span class="n">collection</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;user&quot;</span><span class="p">,</span><span class="n">user</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;password&quot;</span><span class="p">,</span><span class="n">password</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">mode</span><span class="p">(</span><span class="n">SaveMode</span><span class="p">.</span><span class="na">Overwrite</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="p">.</span><span class="na">save</span><span class="p">();</span>
<span class="w"> </span><span class="n">sqlContext</span><span class="p">.</span><span class="na">read</span><span class="p">().</span><span class="na">format</span><span class="p">(</span><span class="s">&quot;mongo&quot;</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;url&quot;</span><span class="p">,</span><span class="n">url</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;uri&quot;</span><span class="p">,</span><span class="n">uri</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;database&quot;</span><span class="p">,</span><span class="n">database</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;collection&quot;</span><span class="p">,</span><span class="n">collection</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;user&quot;</span><span class="p">,</span><span class="n">user</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">option</span><span class="p">(</span><span class="s">&quot;password&quot;</span><span class="p">,</span><span class="n">password</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">load</span><span class="p">().</span><span class="na">show</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Closing the JavaSparkContext also stops the underlying SparkContext.</span>
<span class="w"> </span><span class="n">javaSparkContext</span><span class="p">.</span><span class="na">close</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div></td></tr></table></div>
</div>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0113.html">Connecting to Mongo</a></div>
</div>
</div>