SQL - 使用 ID NOT IN 的高级重复删除

2023-04-04数据库问题
22

本文介绍了SQL - 使用 ID NOT IN 的高级重复删除的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

我在我的表中发现了需要删除的重复项.该表包含:

I found duplicates in my table which needed to be removed. The table contains:

  1. ID - 表的唯一键
  2. STUDENT_ID - 学生的 ID
  3. SUBJECT_ID - 学生的科目
  4. CLASS_ID - 班级学生在
  5. XP_LVL - 专业水平

一个学生应该只有一个科目、班级和 XP_lvl 的记录.在这种情况下,删除重复项是基于删除所有但保留一个.

One student should have only one record of subject, class and XP_lvl. In this case the removal of duplicities is based on delete all but keep one.

在我的情况下,重复看起来像这样:

In my case duplicates looks like this:

<头>
IDSTUDENT_IDSUBJECT_IDCLASS_IDEXPERTISE_LVL
11AAA55FFECLASS8082
21AAA55FFECLASS8082
32AAB49BBCLASS8903
42AAB49BBCLASS8903
52AAB49BBCLASS8904
62AAB49BBCLASS8903

我通过创建 (CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID))) 的唯一 ID,然后通过 count.. >1 识别出所有重复项> 工作正常.

I have identified all the duplicates by creating unique ID of (CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID))) and then by having count.. >1 which works fine.

现在我需要识别所有 ID,以便我可以在从查询中删除时使用 ID NOT IN (SELECT...).

Now I need to identify all the ID so I can use ID NOT IN (SELECT...) in my delete from query.

所以我这样做了..

AND ID NOT IN (SELECT UID FROM (
SELECT
    min(ID) AS UID,
    STUDENT_ID,
    SUBJECT_ID,
    CLASS_ID    
FROM
    my_table 
GROUP BY
    STUDENT_ID,
    SUBJECT_ID,
    CLASS_ID    
HAVING
    count(CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID))) > 1))

但是,我不能使用 min/max(ID) 来选择要保留的 ID,因为正如您所见,对于学生 2AAB,存在具有不同 XP_LVL 的重复项.

However I cannot use min/max(ID) to choose which ID to keep because as you can see for student 2AAB there are duplicities with different XP_LVL.

在这种情况下,我需要选择最高 XP_LVL 的 ID 来保留和删除所有其他的.

In this case I need to select ID of highest XP_LVL to keep and delete all other.

我尝试使用 RANK、ROWNUM 不同的排序和子选择的负载,但没有想要的结果.

I tried using RANK, ROWNUM different ordering and loaaads of subselects but without desired results.

有谁知道如何有效地做到这一点?我们正在谈论 6k 重复,所以我不能一一做.提前感谢您的帮助.

Does anyone have any idea how to do it effectively ? We are talking about 6k duplicities so I cant do it one by one. Thank you for your help in advance.

推荐答案

您可以使用 ROW_NUMBER 解析函数并使用 ROWID 伪列关联删除:

You can use ROW_NUMBER analytic function and correlate the deletion using the ROWID pseudo-column:

DELETE FROM my_table
WHERE ROWID IN (
  SELECT ROWID
  FROM   (
    SELECT ROW_NUMBER() OVER (
             PARTITION BY student_id, subject_id, class_id
             ORDER BY expertise_lvl DESC
           ) AS rn
    FROM   my_table
  )
  WHERE  rn > 1
)

db<>fiddle 这里

这篇关于SQL - 使用 ID NOT IN 的高级重复删除的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

The End

相关推荐

Mysql目录里的ibtmp1文件过大造成磁盘占满的解决办法
ibtmp1是非压缩的innodb临时表的独立表空间,通过innodb_temp_data_file_path参数指定文件的路径,文件名和大小,默认配置为ibtmp1:12M:autoextend,也就是说在文件系统磁盘足够的情况下,这个文件大小是可以无限增长的。 为了避免ibtmp1文件无止境的暴涨导致...
2025-01-02 数据库问题
151

按天分组的 SQL 查询
SQL query to group by day(按天分组的 SQL 查询)...
2024-04-16 数据库问题
77

SQL 子句“GROUP BY 1"是什么意思?意思是?
What does SQL clause quot;GROUP BY 1quot; mean?(SQL 子句“GROUP BY 1是什么意思?意思是?)...
2024-04-16 数据库问题
62

MySQL groupwise MAX() 返回意外结果
MySQL groupwise MAX() returns unexpected results(MySQL groupwise MAX() 返回意外结果)...
2024-04-16 数据库问题
13

MySQL SELECT 按组最频繁
MySQL SELECT most frequent by group(MySQL SELECT 按组最频繁)...
2024-04-16 数据库问题
16

在 Group By 查询中包含缺失的月份
Include missing months in Group By query(在 Group By 查询中包含缺失的月份)...
2024-04-16 数据库问题
12