forked from docs/doc-exports
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Reviewed-by: Rechenburg, Matthias <matthias.rechenburg@t-systems.com> Co-authored-by: Lu, Huayi <luhuayi@huawei.com> Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
135 lines
19 KiB
HTML
135 lines
19 KiB
HTML
<a name="EN-US_TOPIC_0000001828076805"></a><a name="EN-US_TOPIC_0000001828076805"></a>
|
|
|
|
<h1 class="topictitle1">How Do I Delete Duplicate Table Data?</h1>
|
|
<div id="body0000001828076805"><p id="EN-US_TOPIC_0000001828076805__p1764923101417">When clearing dirty data in the database, you may retain only one piece of duplicate data. In this scenario, you can use the aggregate function or window function.</p>
|
|
<p id="EN-US_TOPIC_0000001828076805__p66151313143016"><strong id="EN-US_TOPIC_0000001828076805__b179723352303">Constructing Table Data</strong></p>
|
|
<ol id="EN-US_TOPIC_0000001828076805__ol1512517581186"><li id="EN-US_TOPIC_0000001828076805__li19940111122110"><span>Create a table t_customer and insert data that contains duplicate records into the table.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001828076805__screen75317243224"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span>
|
|
<span class="normal">11</span>
|
|
<span class="normal">12</span>
|
|
<span class="normal">13</span>
|
|
<span class="normal">14</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="p">(</span>
|
|
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
|
|
<span class="n">cust_name</span><span class="w"> </span><span class="nb">varchar</span><span class="p">(</span><span class="mi">32</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">COMMENT</span><span class="s1">' Name'</span><span class="p">,</span>
|
|
<span class="n">gender</span><span class="w"> </span><span class="nb">varchar</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">COMMENT</span><span class="s1">' Gender'</span><span class="p">,</span>
|
|
<span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="nb">varchar</span><span class="p">(</span><span class="mi">32</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">COMMENT</span><span class="w"> </span><span class="s1">'email'</span><span class="p">,</span>
|
|
<span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
|
|
<span class="p">)</span><span class="w"> </span><span class="p">;</span>
|
|
|
|
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'1'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Tom'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Male'</span><span class="p">,</span><span class="w"> </span><span class="s1">'high_salary@sample.com'</span><span class="p">);</span>
|
|
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'2'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Jennifer'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Female'</span><span class="p">,</span><span class="w"> </span><span class="s1">'good_job@sample.com'</span><span class="p">);</span>
|
|
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'3'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Tom'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Male'</span><span class="p">,</span><span class="w"> </span><span class="s1">'high_salary@sample.com'</span><span class="p">);</span>
|
|
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'4'</span><span class="p">,</span><span class="w"> </span><span class="s1">'John'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Male'</span><span class="p">,</span><span class="w"> </span><span class="s1">'good_job@sample.com'</span><span class="p">);</span>
|
|
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'5'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Jennifer'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Female'</span><span class="p">,</span><span class="w"> </span><span class="s1">'good_job@sample.com'</span><span class="p">);</span>
|
|
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'6'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Tom'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Male'</span><span class="p">,</span><span class="w"> </span><span class="s1">'high_salary@sample.com'</span><span class="p">);</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</p></li><li id="EN-US_TOPIC_0000001828076805__li10383151572112"><span>Query the t_customer table.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001828076805__screen88014762611"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">;</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001828076805__p194717569252"><span><img id="EN-US_TOPIC_0000001828076805__image64716567258" src="figure/en-us_image_0000001828838801.png"></span></p>
|
|
</p></li></ol>
|
|
<p id="EN-US_TOPIC_0000001828076805__p7521113103119">If the name, gender, and email of a customer are the same, the customer is regarded as a duplicate record. In the t_customer table, data whose IDs are 1, 3, and 6 is duplicate, and data whose IDs are 2 and 5 is also duplicate. Delete redundant data and retain one of them.</p>
|
|
<p id="EN-US_TOPIC_0000001828076805__p3945145418313">Method 1: Use the aggregate function <strong id="EN-US_TOPIC_0000001828076805__b14282121724814">min(expr)</strong>.</p>
|
|
<p id="EN-US_TOPIC_0000001828076805__p12561449103216">Use aggregate functions to obtain non-duplicate rows with the smallest ID through subqueries, and then use NOT IN to delete duplicate data.</p>
|
|
<ol id="EN-US_TOPIC_0000001828076805__ol1360511333401"><li id="EN-US_TOPIC_0000001828076805__li1060533310405"><span>Run the following command to query the unique row with the smallest ID:</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001828076805__screen157963563422"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span>
|
|
<span class="normal">8</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span>
|
|
<span class="w"> </span><span class="k">min</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="n">id</span><span class="p">,</span>
|
|
<span class="w"> </span><span class="n">cust_name</span><span class="p">,</span>
|
|
<span class="w"> </span><span class="n">gender</span><span class="p">,</span>
|
|
<span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="w"> </span><span class="n">cust_name</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">count</span><span class="w"> </span>
|
|
<span class="k">FROM</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span>
|
|
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">cust_name</span><span class="p">,</span><span class="n">gender</span>
|
|
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">;</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001828076805__p178441931164217"><span><img id="EN-US_TOPIC_0000001828076805__image1884419318423" src="figure/en-us_image_0000001829012173.png"></span></p>
|
|
<p id="EN-US_TOPIC_0000001828076805__p784443164211">According to the query result, duplicate data rows whose IDs are 3, 5, and 6 are filtered out.</p>
|
|
</p></li><li id="EN-US_TOPIC_0000001828076805__li22024814017"><span>Use NOT IN to filter out duplicate data rows and delete them.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001828076805__screen9418102554518"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">DELETE</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span>
|
|
<span class="w"> </span><span class="k">SELECT</span>
|
|
<span class="w"> </span><span class="k">min</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="n">id</span>
|
|
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_customer</span>
|
|
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">cust_name</span><span class="p">,</span><span class="n">gender</span>
|
|
<span class="p">);</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</p></li><li id="EN-US_TOPIC_0000001828076805__li919214542416"><span>Query the t_customer table after duplicate data is deleted.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001828076805__screen1312512920466"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">;</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001828076805__p17678192854615"><span><img id="EN-US_TOPIC_0000001828076805__image632743084618" src="figure/en-us_image_0000001829015621.png"></span></p>
|
|
<p id="EN-US_TOPIC_0000001828076805__p18282327464">The command output indicates that duplicate data has been deleted.</p>
|
|
</p></li></ol>
|
|
<p id="EN-US_TOPIC_0000001828076805__p03759316477"><strong id="EN-US_TOPIC_0000001828076805__b15871455184716">Method 2: Use the window function row_number().</strong></p>
|
|
<p id="EN-US_TOPIC_0000001828076805__p896782312471">Use PARTITION BY to partition and sort columns, generate sequence number columns, and delete rows whose sequence numbers are greater than 1.</p>
|
|
<ol id="EN-US_TOPIC_0000001828076805__ol625953213119"><li id="EN-US_TOPIC_0000001828076805__li1025993271111"><span>Partition query. Sort columns by partition and generate sequence number columns.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001828076805__screen650024591217"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span>
|
|
<span class="w"> </span><span class="n">id</span><span class="p">,</span>
|
|
<span class="w"> </span><span class="n">cust_name</span><span class="p">,</span>
|
|
<span class="w"> </span><span class="n">gender</span><span class="p">,</span>
|
|
<span class="w"> </span><span class="n">ROW_NUMBER</span><span class="p">()</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">(</span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">cust_name</span><span class="p">,</span><span class="n">gender</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="n">num</span>
|
|
<span class="k">FROM</span><span class="w"> </span><span class="n">t_customer</span><span class="p">;</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001828076805__p128153177164"><span><img id="EN-US_TOPIC_0000001828076805__image10269144201613" src="figure/en-us_image_0000001782493466.png"></span></p>
|
|
<p id="EN-US_TOPIC_0000001828076805__p6988162916171">According to the command output, the data in num>1 is duplicate.</p>
|
|
</p></li><li id="EN-US_TOPIC_0000001828076805__li1250194981617"><span>Delete the data of num>1.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001828076805__screen1218422651912"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span>
|
|
<span class="normal">8</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">DELETE</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span>
|
|
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">FROM</span><span class="p">(</span>
|
|
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span>
|
|
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">ROW_NUMBER</span><span class="p">()</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">row_num</span><span class="p">,</span><span class="n">id</span>
|
|
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="n">WINDOW</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">cust_name</span><span class="p">,</span><span class="n">gender</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="p">)</span>
|
|
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">row_num</span><span class="w"> </span><span class="o">></span><span class="mi">1</span><span class="w"> </span><span class="p">)</span>
|
|
<span class="p">);</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</p></li><li id="EN-US_TOPIC_0000001828076805__li16683315142016"><span>Query the t_customer table after duplicate data is deleted.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001828076805__screen5643337152017"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">t_customer</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">id</span><span class="p">;</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001828076805__p10169475210"><span><img id="EN-US_TOPIC_0000001828076805__image1665514718217" src="figure/en-us_image_0000001782496566.png"></span></p>
|
|
</p></li></ol>
|
|
</div>
|
|
<div>
|
|
<div class="familylinks">
|
|
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_03_0065.html">Database Usage</a></div>
|
|
</div>
|
|
</div>
|
|
|