编写 SQL 以识别分组中的多个子分组

Write SQL to identify multiple subgroupings within a grouping(编写 SQL 以识别分组中的多个子分组)
本文介绍了编写 SQL 以识别分组中的多个子分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

我有一个程序可以汇总一个表中的非规范化数据并将其移动到另一个表中,由于数据错误,我们经常在插入时遇到重复的键冲突.我想为用户创建一个报告,以帮助他们确定错误的原因.

I have a program that summarizes non-normalized data in one table and moves it to another and we frequently get a duplicate key violation on the insert due to bad data. I want to create a report for the users to help them identify the cause of the error.

例如,考虑以下人为设计的简单 SQL,它汇总了 Companies 表中的数据并将其插入 CompanySum 中,该 CompanySum 的主键为 State/Zone.为了使 INSERT 不会失败,对于每个唯一的主键 State/Zone 组合,公司/代码的不同组合不能超过一个.如果有,我们希望插入失败,以便可以更正数据.

For example, consider the following contrived simple SQL which summarizes data in the table Companies and inserts it into CompanySum, which has a primary key of State/Zone. In order for the INSERT not to fail, there cannot be more than one distinct combinations of Company/Code for every unique primary key State/Zone combination. If there is, we want the insert to fail so that the data can be corrected.

INSERT INTO CompanySum
(
    [State] 
    ,[Zone] 
    ,[Company]
    ,[Code] 
    ,[Revenue] 
)
SELECT 
    --Keys of target
    [State] 
    ,[Zone] 

    --We are expecting to have one distinct combination of these fields per key grouping
    ,[Company]
    ,[Code] 

    --Aggregate
    ,SUM([Revenue])
    
FROM COMPANIES

GROUP BY
    [State] 
    ,[Zone] 
    ,[Company]
    ,[Code]

我想创建一份报告来帮助用户轻松识别和更正数据,以便在一个州/地区内只有一个不同的公司/代码组合.对于每个不同的州/地区值,我想确定州/地区内不同的公司/代码组合.如果一个州/地区中有多个公司/代码组合,我希望该州/地区中的所有记录都显示在输出中.例如,这里是示例输入和所需的输出:

I would like to create a report to help the users easily identify and correct the data so that there is only one distinct Company/Code combination within a State/Zone. For each distinct State/Zone value, I would like to identify the distinct Company/Code combinations within the State/Zone. If there are more than one Company/Code combinations within a State/Zone, I would like all of the records in the State/Zone to be displayed in the output. For example, here is the sample input and desired output:

Data:

RecordNumber    State   Zone    Company         Code    Revenue
------------    -----   ----    -------         ----    --------
1               CT      B       State of CT     65453    10
2               CT      B       State of CT     65453     3
3               CT      B       Travelers       33443    20
4               CT      C       Cigna           45678    24
5               CT      C       Cigna           45678   234
6               MI      A       GM              48089   100
7               MI      A       GM              54555   200
8               MI      B       Chrysler        43434    44


Desired Output:

RecordNumber    State   Zone    Company         Code    Revenue
------------    -----   ----    -------         ----    --------
1               CT      B       State of CT     65453     10
2               CT      B       State of CT     65453      3
3               CT      B       Travelers       33443     20
6               MI      A       GM              48089    100
7               MI      A       GM              54555    200

这是创建此测试场景所需的 DDL 和 DML

Here is the DDL and DML needed to create this test scenario

CREATE TABLE [dbo].[Companies](
    [RecordNumber] [int] NULL,
    [State] [char](2) NOT NULL,
    [Zone] [varchar](30) NOT NULL,
    [Company] [varchar](30) NOT NULL,
    [Code] [varchar](30) NOT NULL,
    [Revenue] [numeric](9, 1) NULL
) ON [PRIMARY]


CREATE TABLE [dbo].[CompanySum](
    [State] [char](2) NOT NULL,
    [Zone] [varchar](30) NOT NULL,
    [Company] [varchar](30) NOT NULL,
    [Code] [varchar](30) NOT NULL,
    [Revenue] [numeric](9, 1) NULL,
 CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED 
(
    [State] ASC,
    [Zone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


DELETE FROM [dbo].[Companies]
GO

INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (1, N'CT', N'B', N'State of CT', N'65453', CAST(10.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (2, N'CT', N'B', N'State of CT', N'65453', CAST(3.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (3, N'CT', N'B', N'Travelers', N'33443', CAST(20.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (4, N'CT', N'C', N'Cigna', N'45678', CAST(24.0 AS Numeric(9, 1)))
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (5, N'CT', N'C', N'Cigna', N'45678', CAST(234.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (6, N'MI', N'A', N'GM', N'48089', CAST(100.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (7, N'MI', N'A', N'GM', N'54555', CAST(200.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (8, N'MI', N'B', N'Chrysler', N'43434', CAST(44.0 AS Numeric(9, 1)))
GO

这是对我之前一篇文章的更好重构SQL 返回一组关键列中非关键列的唯一组合,我试图帮助澄清问题并提供一个读者可以使用的简单工作示例.

This is a hopefully better re-construction of a previous post of mine SQL to return unique combinations of non key columns within a set of key columns where I am trying to help clarify the question and provide a simple working example that readers can use.

请看这个 SQL 小提琴:

Please see this SQL Fiddle:

http://sqlfiddle.com/#!18/d0141/1

推荐答案

这是解决方案吗?

小提琴:http://sqlfiddle.com/#!18/12e9a0/9

select c.*
from
    Companies c
        inner join (
            select State, Zone
            from Companies
            group by State, Zone
            having count(distinct Company + Code) > 1
        ) as dup_state_zone
        on(
                c.State = dup_state_zone.State
            and c.Zone  = dup_state_zone.Zone
        )

已编辑 - 修复了 have 子句,有一点作弊...

Edited - Fix the having clause, with a little cheat...

这篇关于编写 SQL 以识别分组中的多个子分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

本站部分内容来源互联网,如果有图片或者内容侵犯了您的权益,请联系我们,我们会在确认后第一时间进行删除!

相关文档推荐

ibtmp1是非压缩的innodb临时表的独立表空间,通过innodb_temp_data_file_path参数指定文件的路径,文件名和大小,默认配置为ibtmp1:12M:autoextend,也就是说在文件系统磁盘足够的情况下,这个文件大小是可以无限增长的。 为了避免ibtmp1文件无止境的暴涨导致
SQL query to group by day(按天分组的 SQL 查询)
What does SQL clause quot;GROUP BY 1quot; mean?(SQL 子句“GROUP BY 1是什么意思?意思是?)
MySQL groupwise MAX() returns unexpected results(MySQL groupwise MAX() 返回意外结果)
MySQL SELECT most frequent by group(MySQL SELECT 按组最频繁)
Include missing months in Group By query(在 Group By 查询中包含缺失的月份)