doc-exports/docs/dws/dev/dws_06_0108.html
Lu, Huayi ef0ada5a59 DWS DEV 20240716 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-11-02 09:07:47 +00:00

<a name="EN-US_TOPIC_0000001510400965"></a>
<h1 class="topictitle1">Ispell Dictionary</h1>
<div id="body1561195448345"><p id="EN-US_TOPIC_0000001510400965__p8060118">The Ispell dictionary template supports morphological dictionaries, which can normalize many different linguistic forms of a word into the same lexeme. For example, an English Ispell dictionary can match all declensions and conjugations of the search term <strong id="EN-US_TOPIC_0000001510400965__b207017481619">bank</strong>, such as <strong id="EN-US_TOPIC_0000001510400965__b1416243965216">banking</strong>, <strong id="EN-US_TOPIC_0000001510400965__b86761741205220">banked</strong>, <strong id="EN-US_TOPIC_0000001510400965__b13762184605210">banks</strong>, <strong id="EN-US_TOPIC_0000001510400965__b1423135355214">banks'</strong>, and <strong id="EN-US_TOPIC_0000001510400965__b153516564527">bank's</strong>.</p>
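<p>The following bash sketch illustrates the idea behind this normalization. It uses a one-word stand-in for a <strong>.dict</strong> word list and a few plain suffixes as stand-ins for <strong>.affix</strong> rules. The names <strong>STEMS</strong>, <strong>SUFFIXES</strong>, and <strong>normalize</strong> are hypothetical. Real Ispell affix files use flagged rules with conditions, so this is a simplified illustration under those assumptions, not how GaussDB(DWS) implements the template. A suffix is stripped only if the remainder is a listed stem, and an unlisted word yields no lexeme.</p>

```shell
#!/usr/bin/env bash
# Toy sketch of dictionary-based suffix stripping (hypothetical data and
# names, not the real Ispell engine): a suffix is removed only if what
# remains is a stem listed in the word list.
declare -A STEMS=([bank]=1)        # stand-in for the .dict word list
SUFFIXES=("'s" "s'" ing ed s)      # stand-in for .affix suffix rules

normalize() {
  local w=${1,,} s stem            # lowercase the input word
  [[ -n $w ]] || return 1
  # A word that is itself a listed stem is returned as is.
  if [[ -n ${STEMS[$w]+x} ]]; then
    printf '%s\n' "$w"
    return 0
  fi
  # Otherwise try each suffix; accept the first one that leaves a known stem.
  for s in "${SUFFIXES[@]}"; do
    stem=${w%"$s"}
    if [[ $stem != "$w" && -n $stem && -n ${STEMS[$stem]+x} ]]; then
      printf '%s\n' "$stem"
      return 0
    fi
  done
  return 1                         # unrecognized word: no lexeme
}

for word in bank banking banked banks "banks'" "bank's"; do
  printf '%-7s -> %s\n' "$word" "$(normalize "$word")"
done
```

<p>The sketch needs bash 4 or later for associative arrays and the <strong>${var,,}</strong> lowercase expansion.</p>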
<p id="EN-US_TOPIC_0000001510400965__p5892155115356"><span id="EN-US_TOPIC_0000001510400965__text1413194050">GaussDB(DWS)</span> does not provide any predefined Ispell dictionaries or dictionary files, so you must supply your own dictionary (<strong id="EN-US_TOPIC_0000001510400965__b1752414263359">.dict</strong>) and affix (<strong id="EN-US_TOPIC_0000001510400965__b9175172918357">.affix</strong>) files. These files can be in several open-source dictionary formats, including <strong id="EN-US_TOPIC_0000001510400965__b1159291018118">Ispell</strong>, <strong id="EN-US_TOPIC_0000001510400965__b334215128112">MySpell</strong>, and <strong id="EN-US_TOPIC_0000001510400965__b182667146117">Hunspell</strong>.</p>
<div class="section" id="EN-US_TOPIC_0000001510400965__section737061503610"><h4 class="sectiontitle">Procedure</h4><ol id="EN-US_TOPIC_0000001510400965__ol14501539114610"><li id="EN-US_TOPIC_0000001510400965__li450163974617"><span>Obtain the dictionary definition file (.dict) and affix file (.affix).</span><p><p id="EN-US_TOPIC_0000001510400965__p959419111211">You can use an open-source dictionary. Open-source dictionary files often use the extensions <strong id="EN-US_TOPIC_0000001510400965__b1781412515355">.aff</strong> and <strong id="EN-US_TOPIC_0000001510400965__b1081145413354">.dic</strong>; if so, rename them to <strong id="EN-US_TOPIC_0000001510400965__b12108125710353">.affix</strong> and <strong id="EN-US_TOPIC_0000001510400965__b299319583354">.dict</strong>. In addition, some dictionary files (for example, the Norwegian dictionary files) are not UTF-8 encoded and must be converted to UTF-8. The following commands convert the Norwegian files from ISO-8859-1 to UTF-8 and write them with the required extensions in a single step:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001510400965__screen8456192613377"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">iconv</span><span class="w"> </span><span class="o">-</span><span class="n">f</span><span class="w"> </span><span class="n">ISO_8859</span><span class="o">-</span><span class="mi">1</span><span class="w"> </span><span class="o">-</span><span class="n">t</span><span class="w"> </span><span class="n">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">-</span><span class="n">o</span><span class="w"> </span><span class="n">nn_no</span><span class="p">.</span><span class="n">affix</span><span class="w"> </span><span class="n">nn_NO</span><span class="p">.</span><span class="n">aff</span><span class="w"> </span>
<span class="n">iconv</span><span class="w"> </span><span class="o">-</span><span class="n">f</span><span class="w"> </span><span class="n">ISO_8859</span><span class="o">-</span><span class="mi">1</span><span class="w"> </span><span class="o">-</span><span class="n">t</span><span class="w"> </span><span class="n">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">-</span><span class="n">o</span><span class="w"> </span><span class="n">nn_no</span><span class="p">.</span><span class="n">dict</span><span class="w"> </span><span class="n">nn_NO</span><span class="p">.</span><span class="n">dic</span>
</pre></div></td></tr></table></div>
</div>
</p></li><li id="EN-US_TOPIC_0000001510400965__li18501639134619"><span>Create an Ispell dictionary.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001510400965__screen101864317208"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="k">DICTIONARY</span><span class="w"> </span><span class="n">norwegian_ispell</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">TEMPLATE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ispell</span><span class="p">,</span>
<span class="w"> </span><span class="n">DictFile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nn_no</span><span class="p">,</span>
<span class="w"> </span><span class="n">AffFile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nn_no</span><span class="p">,</span>
<span class="w"> </span><span class="n">FilePath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'obs://bucket01/obs.example.com accesskey=xxxxx secretkey=xxxxx region=xx-xx-xx'</span>
<span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001510400965__p436810233391">DictFile and AffFile specify the file name without its extension, so the full names of the Ispell dictionary files are <strong id="EN-US_TOPIC_0000001510400965__b3946102113417">nn_no.dict</strong> and <strong id="EN-US_TOPIC_0000001510400965__b59471226348">nn_no.affix</strong>. The files are stored in the OBS path specified by FilePath, 'obs://bucket01/obs.example.com accesskey=xxxxx secretkey=xxxxx region=<em id="EN-US_TOPIC_0000001510400965__i1294810211348"><span id="EN-US_TOPIC_0000001510400965__ph49479215346">xx-xx-xx</span></em>'. For details about the syntax and parameters for creating an Ispell dictionary, see <a href="dws_06_0183.html">CREATE TEXT SEARCH DICTIONARY</a>.</p>
</p></li><li id="EN-US_TOPIC_0000001510400965__li1550143934613"><span>Use the Ispell dictionary to split compound words.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001510400965__screen2527244202618"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">ts_lexize</span><span class="p">(</span><span class="s1">'norwegian_ispell'</span><span class="p">,</span><span class="w"> </span><span class="s1">'sjokoladefabrikk'</span><span class="p">);</span>
<span class="w"> </span><span class="n">ts_lexize</span><span class="w"> </span>
<span class="c1">---------------------</span>
<span class="w"> </span><span class="err">{</span><span class="n">sjokolade</span><span class="p">,</span><span class="n">fabrikk</span><span class="err">}</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001510400965__p199091334174211"><strong id="EN-US_TOPIC_0000001510400965__b437832232110">MySpell</strong> does not support compound words. <strong id="EN-US_TOPIC_0000001510400965__b08570353214">Hunspell</strong> supports compound words, but <span id="EN-US_TOPIC_0000001510400965__text786846749">GaussDB(DWS)</span> supports only the basic compound word operations of <strong id="EN-US_TOPIC_0000001510400965__b1239141172312">Hunspell</strong>. Because an Ispell dictionary generally recognizes only a limited set of words, it should be followed by a broader dictionary that recognizes every word, such as a Snowball dictionary.</p>
</p></li></ol>
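<p>Step 1 can also be scripted. The following bash sketch wraps the <strong>iconv</strong> conversion shown above in a hypothetical helper, <strong>prepare_ispell_files</strong>, that writes the UTF-8 <strong>.affix</strong> and <strong>.dict</strong> files in one call. The default source encoding ISO-8859-1 is an assumption taken from the Norwegian example; check the encoding of each dictionary before converting it. Output is written with shell redirection instead of <strong>-o</strong>, which may not be available in every iconv implementation.</p>

```shell
#!/usr/bin/env bash
# Sketch: convert an open-source dictionary pair (SRC.aff + SRC.dic) into the
# DST.affix + DST.dict pair used by the ispell template, re-encoding to UTF-8.
# prepare_ispell_files is a hypothetical helper; the default source encoding
# (ISO-8859-1) is an assumption and must match the actual dictionary.
set -euo pipefail

prepare_ispell_files() {
  local src=$1 dst=$2 from_enc=${3:-ISO-8859-1}
  # Affix rules: SRC.aff -> DST.affix
  iconv -f "$from_enc" -t UTF-8 "$src.aff" > "$dst.affix"
  # Word list: SRC.dic -> DST.dict
  iconv -f "$from_enc" -t UTF-8 "$src.dic" > "$dst.dict"
}

# Convert only when called with arguments: SRC DST [SOURCE_ENCODING]
if [[ $# -ge 2 ]]; then
  prepare_ispell_files "$@"
fi
```

<p>For example, if the script is saved under a hypothetical name such as prepare.sh, running <strong>bash prepare.sh nn_NO nn_no</strong> in the directory that contains nn_NO.aff and nn_NO.dic would produce nn_no.affix and nn_no.dict, the file names used in the CREATE TEXT SEARCH DICTIONARY example above.</p>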
</div>
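<p>To make the compound-word behavior and the recommended dictionary chaining more concrete, the following bash sketch splits a word into two parts that both appear in a tiny word list, and falls back to a catch-all step when the word is not recognized. The word list and the functions <strong>split_compound</strong> and <strong>lexize_chain</strong> are hypothetical. Real Ispell and Hunspell dictionaries control compounding through flags in the affix file, and the fallback here simply returns the word unchanged as a stand-in for a Snowball stemmer, so this is an illustration under those assumptions, not the GaussDB(DWS) algorithm.</p>

```shell
#!/usr/bin/env bash
# Toy sketch (hypothetical data and names): split a compound into two words
# that are both listed, mimicking the ts_lexize output format {a,b}.
declare -A WORDS=([sjokolade]=1 [fabrikk]=1)   # stand-in for a .dict file

split_compound() {
  local w=$1 n=${#1} i=1 head tail
  # A listed word is returned as is.
  if [[ -n $w && -n ${WORDS[$w]+x} ]]; then
    printf '{%s}\n' "$w"
    return 0
  fi
  # Try every split point; both halves must be listed words.
  while [[ $i -lt $n ]]; do
    head=${w:0:i}
    tail=${w:i}
    if [[ -n ${WORDS[$head]+x} && -n ${WORDS[$tail]+x} ]]; then
      printf '{%s,%s}\n' "$head" "$tail"
      return 0
    fi
    i=$((i + 1))
  done
  return 1   # not recognized: let the next dictionary handle the word
}

# Chain: try the "Ispell" step first, then a catch-all stand-in that returns
# the word unchanged (a real Snowball stemmer would reduce it to a stem).
lexize_chain() {
  split_compound "$1" || printf '{%s}\n' "$1"
}

lexize_chain sjokoladefabrikk
lexize_chain elv
```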
</div>