SQL Server Window Functions Reading Notes, Part 2 - A Detailed Look at Window Functions
This chapter covers window aggregate functions, rank functions, distribution functions, and offset functions.

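Window Aggregate Functions

Window aggregate functions are the same aggregate functions used with grouping. The difference is that no GROUP BY is defined; instead they are applied through the OVER clause. In standard SQL, window aggregate functions support three elements: partitioning, ordering, and framing.

function_name(<arguments>) OVER( [ <window partition clause> ] [ <window order clause> [ <window frame clause> ] ] )
These three elements restrict which rows are in the window. If none of them is specified, the window contains all rows of the query result.

Partitioning

The window produced by PARTITION BY is defined relative to the current row of the query result. Take PARTITION BY CustomerID as an example. If the current row has CustomerID = 1, its window is the part of the query result where CustomerID = 1. If the current row has CustomerID = 2, its window is all rows of the query result where CustomerID = 2.
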
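Unlike GROUP BY, different window functions in the same SELECT can partition by different columns, and each row can have a different window.
Unlike a subquery, which can query any set of rows, a partition's window is always taken from the result set of the current SELECT, that is, after the FROM, WHERE, GROUP BY, and HAVING phases.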
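To see both points in action, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of SQL Server and assumes SQLite 3.25 or later, the first version with window functions. The OrderValues table, its rows, and the function name partition_totals are hypothetical. The WHERE clause removes one order of customer 2. As a result, the PARTITION BY window only sees the filtered result, while the correlated subquery still reads the whole table.

```python
import sqlite3


def partition_totals():
    """Return (orderid, custid, val, sumall, sumcust, sumcust_subq) rows.

    Runs on a hypothetical, tiny stand-in for Sales.OrderValues in SQLite.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE OrderValues (orderid INTEGER PRIMARY KEY, custid INTEGER, val INTEGER)"
    )
    con.executemany(
        "INSERT INTO OrderValues VALUES (?, ?, ?)",
        [(1, 1, 100), (2, 1, 50), (3, 2, 30), (4, 2, 20)],
    )
    rows = con.execute(
        """
        SELECT orderid, custid, val,
               SUM(val) OVER () AS sumall,                     -- window: whole query result
               SUM(val) OVER (PARTITION BY custid) AS sumcust, -- window: rows with the same custid
               (SELECT SUM(O2.val) FROM OrderValues AS O2
                 WHERE O2.custid = O1.custid) AS sumcust_subq  -- subquery reads the base table
        FROM OrderValues AS O1
        WHERE orderid <> 4                                     -- windows only see rows passing WHERE
        ORDER BY orderid
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in partition_totals():
        print(row)
```

For customer 2, sumcust is 30 but sumcust_subq is 50. Order 4 was filtered out of the query result, but not out of the table the subquery reads.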

Recall this example from the previous article:

USE TSQL2012;
GO

SELECT orderid, custid, val,
  SUM(val) OVER() AS sumall,
  SUM(val) OVER(PARTITION BY custid) AS sumcust
FROM Sales.OrderValues AS O1;

-- Query result (partial)
orderid  custid  val     sumall      sumcust
10643    1       814.50  1265793.22  4273.00
10692    1       878.00  1265793.22  4273.00
10702    1       330.00  1265793.22  4273.00
10835    1       845.80  1265793.22  4273.00
10952    1       471.20  1265793.22  4273.00
11011    1       933.50  1265793.22  4273.00
10926    2       514.40  1265793.22  1402.95
10759    2       320.00  1265793.22  1402.95
10625    2       479.75  1265793.22  1402.95

The first window function has the same window for every row, so every row gets the same value: the total of val.
The second window function can have a different window for each row because it partitions by custid. On top of the full window, it adds the filter custid = the current row's custid.

SELECT orderid, custid, val,
  CAST(100. * val / SUM(val) OVER() AS NUMERIC(5, 2)) AS pctall,
  CAST(100. * val / SUM(val) OVER(PARTITION BY custid) AS NUMERIC(5, 2)) AS pctcust
FROM Sales.OrderValues AS O1;

-- Query result (partial)
orderid  custid  val     pctall  pctcust
10643    1       814.50  0.06    19.06
10692    1       878.00  0.07    20.55
10702    1       330.00  0.03    7.72
10835    1       845.80  0.07    19.79
10952    1       471.20  0.04    11.03
11011    1       933.50  0.07    21.85
10926    2       514.40  0.04    36.67
10759    2       320.00  0.03    22.81
10625    2       479.75  0.04    34.20

Windows with different definitions can coexist in the same query.
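The percentage calculation can be sketched the same way with Python's sqlite3 module, assuming SQLite 3.25 or later. SQLite has no fixed-point NUMERIC(5, 2) type, so ROUND(..., 2) stands in for the CAST. The data and the function name percentages are hypothetical, and the values are chosen so the percentages come out exact.

```python
import sqlite3


def percentages():
    """Return (orderid, pctall, pctcust) for a hypothetical OrderValues table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE OrderValues (orderid INTEGER PRIMARY KEY, custid INTEGER, val REAL)"
    )
    con.executemany(
        "INSERT INTO OrderValues VALUES (?, ?, ?)",
        [(1, 1, 50.0), (2, 1, 30.0), (3, 1, 20.0), (4, 2, 100.0)],
    )
    rows = con.execute(
        """
        SELECT orderid,
               -- share of the grand total (window = whole result)
               ROUND(100.0 * val / SUM(val) OVER (), 2) AS pctall,
               -- share of the customer's total (window = same custid)
               ROUND(100.0 * val / SUM(val) OVER (PARTITION BY custid), 2) AS pctcust
        FROM OrderValues
        ORDER BY orderid
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in percentages():
        print(row)
```

Both windows are evaluated in one pass over the same result set, which is what lets them coexist in a single SELECT.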

Ordering and Framing

Framing also restricts the rows in the window. Usually the rows are ordered first, and the frame then specifies the first and last rows of the window relative to the current row.

function_name(<arguments>) OVER( [ <window partition clause> ] [ <window order clause> [ <window frame clause> ] ] )
The <window frame clause> has three parts: <window frame units> <window frame extent> [ <window frame exclusion> ]. SQL Server 2012 does not implement the window frame exclusion option.
The <window frame units> must be ROWS or RANGE.

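Using ROWS
ROWS BETWEEN UNBOUNDED PRECEDING | <n> PRECEDING | <n> FOLLOWING | CURRENT ROW
         AND UNBOUNDED FOLLOWING | <n> PRECEDING | <n> FOLLOWING | CURRENT ROW

UNBOUNDED PRECEDING means all rows before the current row (within the partition).
UNBOUNDED FOLLOWING means all rows after the current row (within the partition).
<n> PRECEDING and <n> FOLLOWING mean the row n rows before or after the current row.
CURRENT ROW is the current row.
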
SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN UNBOUNDED PRECEDING
                         AND CURRENT ROW) AS runqty
FROM Sales.EmpOrders;

-- Query result (partial)
empid  ordermonth               qty  runqty
1      2006-07-01 00:00:00.000  121  121
1      2006-08-01 00:00:00.000  247  368
1      2006-09-01 00:00:00.000  255  623
1      2006-10-01 00:00:00.000  143  766
1      2006-11-01 00:00:00.000  318  1084

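Here is a minimal sketch of the running total above. It uses Python's sqlite3 module rather than SQL Server and assumes SQLite 3.25 or later. The EmpOrders table is a hypothetical stand-in holding only the five employee 1 rows shown in the result above. The second window checks that the short form ROWS UNBOUNDED PRECEDING gives the same frame as the long form.

```python
import sqlite3

# The five employee 1 months shown in the query result above.
MONTHS = [
    ("2006-07-01", 121),
    ("2006-08-01", 247),
    ("2006-09-01", 255),
    ("2006-10-01", 143),
    ("2006-11-01", 318),
]


def running_totals():
    """Return (ordermonth, qty, runqty, runqty_short) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EmpOrders (empid INTEGER, ordermonth TEXT, qty INTEGER)")
    con.executemany("INSERT INTO EmpOrders VALUES (1, ?, ?)", MONTHS)
    rows = con.execute(
        """
        SELECT ordermonth, qty,
               -- explicit frame: first row of the partition through the current row
               SUM(qty) OVER (PARTITION BY empid ORDER BY ordermonth
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runqty,
               -- short form: the end of the frame defaults to CURRENT ROW
               SUM(qty) OVER (PARTITION BY empid ORDER BY ordermonth
                              ROWS UNBOUNDED PRECEDING) AS runqty_short
        FROM EmpOrders
        ORDER BY empid, ordermonth
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in running_totals():
        print(row)
```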
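The query can be written more concisely, because ROWS UNBOUNDED PRECEDING is shorthand for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:

SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS UNBOUNDED PRECEDING) AS runqty
FROM Sales.EmpOrders;

The following example defines three windows:
In the first window, ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING covers only the row before the current row.
In the second window, ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING covers only the row after the current row.
In the third window, ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING covers the row before through the row after the current row.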

SELECT empid, ordermonth,
  MAX(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prvqty,
  qty AS curqty,
  MAX(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS nxtqty,
  AVG(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS avgqty
FROM Sales.EmpOrders;

-- Query result (partial)
empid  ordermonth               prvqty  curqty  nxtqty  avgqty
1      2006-07-01 00:00:00.000  NULL    121     247     184
1      2006-08-01 00:00:00.000  121     247     255     207
1      2006-09-01 00:00:00.000  247     255     143     215
1      2006-10-01 00:00:00.000  255     143     318     238
1      2006-11-01 00:00:00.000  143     318     536     332
1      2006-12-01 00:00:00.000  318     536     304     386
1      2007-01-01 00:00:00.000  536     304     168     336
1      2007-02-01 00:00:00.000  304     168     275     249

Note the edges of the third window. In each partition, the first row has no preceding row and the last row has no following row. Their frames therefore contain fewer rows than the others, and no frame ever has more than 3 rows.

AVG averages over the rows actually in the frame. For the first row it is (121 + 247) / 2 = 184. Because qty is an integer, AVG returns an integer here and truncates the fraction: (121 + 247 + 255) / 3 = 207.67 is returned as 207.

The same edge effect makes prvqty NULL for the first row, because its one-row frame in the first window is empty.
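The three frames can be reproduced with a sketch in Python's sqlite3 module, assuming SQLite 3.25 or later. The hypothetical data is the first five months of employee 1 from the result above.

SQLite's AVG always returns a floating-point value, so the sketch wraps it in CAST(... AS INTEGER), which truncates. This imitates the integer average SQL Server returns for an integer column. Because this sample stops at November, the last row gets NULL for nxtqty and an average over only two rows.

```python
import sqlite3

# First five employee 1 months from the query result above.
MONTHS = [
    ("2006-07-01", 121),
    ("2006-08-01", 247),
    ("2006-09-01", 255),
    ("2006-10-01", 143),
    ("2006-11-01", 318),
]


def neighbours():
    """Return (ordermonth, prvqty, curqty, nxtqty, avgqty) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EmpOrders (empid INTEGER, ordermonth TEXT, qty INTEGER)")
    con.executemany("INSERT INTO EmpOrders VALUES (1, ?, ?)", MONTHS)
    rows = con.execute(
        """
        SELECT ordermonth,
               -- frame = only the previous row (empty for the first row -> NULL)
               MAX(qty) OVER (PARTITION BY empid ORDER BY ordermonth
                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prvqty,
               qty AS curqty,
               -- frame = only the next row (empty for the last row -> NULL)
               MAX(qty) OVER (PARTITION BY empid ORDER BY ordermonth
                              ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS nxtqty,
               -- frame = previous, current and next row; 2 rows at the edges
               CAST(AVG(qty) OVER (PARTITION BY empid ORDER BY ordermonth
                                   ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS INTEGER) AS avgqty
        FROM EmpOrders
        ORDER BY empid, ordermonth
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in neighbours():
        print(row)
```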
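
In this example, the PARTITION BY column and the ORDER BY column together uniquely identify a row. There are therefore no ties in the ordering within a window, and the result is deterministic.

When the PARTITION BY and ORDER BY columns together do not uniquely identify a row, the result may not be deterministic, as the following example shows.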
SET NOCOUNT ON;
USE TSQL2012;

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;
GO

CREATE TABLE dbo.T1
(
  keycol INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
  col1 VARCHAR(10) NOT NULL
);

INSERT INTO dbo.T1 VALUES
  (2, 'A'), (3, 'A'),
  (5, 'B'), (7, 'B'), (11, 'B'),
  (13, 'C'), (17, 'C'), (19, 'C'), (23, 'C');

SELECT * FROM dbo.T1;

-- Query result
keycol  col1
2       A
3       A
5       B
7       B
11      B
13      C
17      C
19      C
23      C

SELECT keycol, col1,
  COUNT(*) OVER(ORDER BY col1
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt
FROM dbo.T1;

-- Query result
keycol  col1  cnt
2       A     1
3       A     2
5       B     3
7       B     4
11      B     5
13      C     6
17      C     7
19      C     8
23      C     9

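Because there is no PARTITION BY, every row uses the same partition, namely the whole result of the SELECT. However, the ORDER BY column col1 is not unique, so rows with the same col1 value are tied. The order in which COUNT counts the tied rows is not defined.

In this example there are 2 rows with A, 3 with B, and 4 with C, and nothing determines the order of the rows within each group. SQL Server still has to put the tied rows in some order to count the rows up to each one. Which order it picks depends on the execution plan.

For example, after the following index is created, SQL Server can read the rows through it, and the same query then counts the tied rows in a different order:
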
CREATE UNIQUE INDEX idx_col1D_keycol ON dbo.T1(col1 DESC, keycol);

SELECT keycol, col1,
  COUNT(*) OVER(ORDER BY col1
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt
FROM dbo.T1;

-- Query result
keycol  col1  cnt
3       A     1
2       A     2
11      B     3
7       B     4
5       B     5
23      C     6
19      C     7
17      C     8
13      C     9

With this in mind, make the ordering unique within the window so that the result is deterministic. For example, add keycol as a tiebreaker after col1:

SELECT keycol, col1,
  COUNT(*) OVER(ORDER BY col1, keycol
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt
FROM dbo.T1;

-- Query result
keycol  col1  cnt
2       A     1
3       A     2
5       B     3
7       B     4
11      B     5
13      C     6
17      C     7
19      C     8
23      C     9
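The following sketch makes the effect of tie-breaking visible. It uses Python's sqlite3 module and assumes SQLite 3.25 or later. The hypothetical function counts takes the direction of the keycol tiebreaker as a parameter:

- Ascending keycol reproduces the deterministic result above.
- Descending keycol reproduces the numbering from the indexed query.

In other words, the cnt a given row receives depends entirely on how ties in col1 are broken.

```python
import sqlite3

# Same rows as dbo.T1 in the example above.
T1_ROWS = [(2, "A"), (3, "A"), (5, "B"), (7, "B"), (11, "B"),
           (13, "C"), (17, "C"), (19, "C"), (23, "C")]


def counts(tiebreak):
    """Map keycol -> cnt when ties in col1 are broken by keycol in the given direction.

    tiebreak is a hypothetical parameter of this sketch: "ASC" or "DESC".
    """
    if tiebreak not in ("ASC", "DESC"):
        raise ValueError("tiebreak must be 'ASC' or 'DESC'")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T1 (keycol INTEGER NOT NULL PRIMARY KEY, col1 TEXT NOT NULL)")
    con.executemany("INSERT INTO T1 VALUES (?, ?)", T1_ROWS)
    # The direction is whitelisted above, so formatting it into the SQL is safe.
    sql = f"""
        SELECT keycol,
               COUNT(*) OVER (ORDER BY col1, keycol {tiebreak}
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt
        FROM T1
    """
    result = dict(con.execute(sql).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    print(counts("ASC"))
    print(counts("DESC"))
```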

RANGE frame extent options and usage
Definition
RANGE BETWEEN UNBOUNDED PRECEDING | <val> PRECEDING | <val> FOLLOWING | CURRENT ROW
          AND UNBOUNDED FOLLOWING | <val> PRECEDING | <val> FOLLOWING | CURRENT ROW

Note up front that RANGE is not fully implemented in SQL Server 2012. Only the UNBOUNDED and CURRENT ROW delimiters are supported.
For example, this code:

SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM Sales.EmpOrders AS O1
ORDER BY empid, ordermonth;

fails with the following error:
Msg 4194, Level 16, State 1, Line 1
RANGE is only supported with UNBOUNDED and CURRENT ROW window frame delimiters.

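If such a construct were supported in 2012, it would give truly dynamic, value-based ranges. For example, the following query sums the quantities of the current month and the two months before it. It is written in standard SQL syntax, which T-SQL does not accept:

SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                RANGE BETWEEN INTERVAL '2' MONTH PRECEDING
                          AND CURRENT ROW) AS sum3month
FROM Sales.EmpOrders;

Suppose the current month is March. The two months before it are February and January, so the range covers January, February, and March. If February has no data, the result should cover only January and March. It should not step back one more month and include December, January, and March just to make up three rows. That is the difference between RANGE and ROWS. Unfortunately, SQL Server 2012 does not support this yet.

To see how ROWS differs: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW means the current row plus the two rows before it. In the same example, if the current month is March and February has no data, the frame covers December, January, and March.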
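SQL Server 2012 rejects a numeric RANGE offset, but SQLite 3.28 or later accepts one. That means the RANGE versus ROWS difference described above can be sketched with Python's sqlite3 module.

The sketch rests on a few assumptions:

- SQLite has no INTERVAL type, so months are encoded as a hypothetical integer monthno = year * 12 + (month - 1). With that encoding, RANGE 2 PRECEDING means two months.
- The quantities are made up.
- February 2007 deliberately has no row.

```python
import sqlite3


def month_no(year, month):
    """Hypothetical encoding: consecutive calendar months -> consecutive integers."""
    return year * 12 + (month - 1)


def three_month_sums():
    """Return (monthno, qty, sum_rows, sum_range) rows for one employee."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EmpOrders (empid INTEGER, monthno INTEGER, qty INTEGER)")
    # December 2006, January 2007 and March 2007; February 2007 is missing.
    con.executemany(
        "INSERT INTO EmpOrders VALUES (1, ?, ?)",
        [(month_no(2006, 12), 10), (month_no(2007, 1), 20), (month_no(2007, 3), 40)],
    )
    rows = con.execute(
        """
        SELECT monthno, qty,
               -- ROWS: current row plus the two physical rows before it
               SUM(qty) OVER (PARTITION BY empid ORDER BY monthno
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_rows,
               -- RANGE: rows whose monthno lies in [current - 2, current]
               SUM(qty) OVER (PARTITION BY empid ORDER BY monthno
                              RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_range
        FROM EmpOrders
        ORDER BY monthno
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in three_month_sums():
        print(row)
```

For March, ROWS adds December, January, and March (10 + 20 + 40 = 70). RANGE adds only January and March (20 + 40 = 60).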
In SQL Server 2012, achieving the effect of RANGE BETWEEN INTERVAL '2' MONTH PRECEDING with window functions is still quite complicated. Another option is the following alternative, which uses a correlated subquery.
It returns each employee's quantity per order month, plus the total for the three months ending with the current month (the current month and the two months before it):
SELECT empid, ordermonth, qty,
  (SELECT SUM(qty)
   FROM Sales.EmpOrders AS O2
   WHERE O2.empid = O1.empid
     AND O2.ordermonth BETWEEN DATEADD(MONTH, -2, O1.ordermonth)
                           AND O1.ordermonth) AS sum3month
FROM Sales.EmpOrders AS O1
ORDER BY empid, ordermonth;

-- Query result (partial)
empid  ordermonth               qty  sum3month
1      2006-07-01 00:00:00.000  121  121
1      2006-08-01 00:00:00.000  247  368
1      2006-09-01 00:00:00.000  255  623
1      2006-10-01 00:00:00.000  143  645
1      2006-11-01 00:00:00.000  318  716
This query achieves the effect described by RANGE BETWEEN INTERVAL '2' MONTH PRECEDING.
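The correlated-subquery workaround ports directly to SQLite, sketched here with Python's sqlite3 module. In this sketch, SQLite's date(ordermonth, '-2 months') takes the place of DATEADD(MONTH, -2, ordermonth). Months are stored as ISO 'YYYY-MM-DD' strings so that BETWEEN compares them correctly. The five rows are the employee 1 months shown above, and the function name is hypothetical.

```python
import sqlite3

# Employee 1 months from the query result above.
MONTHS = [
    ("2006-07-01", 121),
    ("2006-08-01", 247),
    ("2006-09-01", 255),
    ("2006-10-01", 143),
    ("2006-11-01", 318),
]


def three_month_sums_subquery():
    """Return (ordermonth, qty, sum3month) using a correlated subquery."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EmpOrders (empid INTEGER, ordermonth TEXT, qty INTEGER)")
    con.executemany("INSERT INTO EmpOrders VALUES (1, ?, ?)", MONTHS)
    rows = con.execute(
        """
        SELECT O1.ordermonth, O1.qty,
               -- sum over the calendar range [month - 2 months, month], gaps included
               (SELECT SUM(O2.qty)
                  FROM EmpOrders AS O2
                 WHERE O2.empid = O1.empid
                   AND O2.ordermonth BETWEEN date(O1.ordermonth, '-2 months')
                                         AND O1.ordermonth) AS sum3month
        FROM EmpOrders AS O1
        ORDER BY O1.empid, O1.ordermonth
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in three_month_sums_subquery():
        print(row)
```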
If you only need the total from the first month through the current month, RANGE and ROWS give the same result here:
SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runqty
FROM Sales.EmpOrders;

SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runqty
FROM Sales.EmpOrders;

CURRENT ROW can be omitted, because it is the default end of the frame:
SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                RANGE UNBOUNDED PRECEDING) AS runqty
FROM Sales.EmpOrders;

It can be simplified even further:
SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth) AS runqty
FROM Sales.EmpOrders;

When a window has ORDER BY but no frame clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Here PARTITION BY and ORDER BY together uniquely identify a row. So even though RANGE UNBOUNDED PRECEDING is not written out, the frame covers all rows from the start of the partition through the current row, exactly as with ROWS.

-- Query result (partial)
empid  ordermonth               qty  runqty
1      2006-07-01 00:00:00.000  121  121
1      2006-08-01 00:00:00.000  247  368
1      2006-09-01 00:00:00.000  255  623
1      2006-10-01 00:00:00.000  143  766
1      2006-11-01 00:00:00.000  318  1084
1      2006-12-01 00:00:00.000  536  1620

If ORDER BY is removed, only the partitioning by empid remains. The function then simply computes the total for each empid, and every row of the same employee gets the same value.
SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid) AS runqty
FROM Sales.EmpOrders;

-- Query result (partial)
empid  ordermonth               qty  runqty
1      2007-03-01 00:00:00.000  275  7812
1      2008-01-01 00:00:00.000  397  7812
1      2007-12-01 00:00:00.000  583  7812
1      2006-11-01 00:00:00.000  318  7812
1      2008-03-01 00:00:00.000  467  7812
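The following sketch checks the default-frame and no-ORDER-BY behavior described above. It again uses Python's sqlite3 module, assumes SQLite 3.25 or later, and uses the six employee 1 months shown earlier as hypothetical data.

SQLite follows the same standard default. With ORDER BY and no frame clause, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Without ORDER BY, all rows are peers, so the frame is the whole partition.

```python
import sqlite3

# Six employee 1 months from the running-total result above.
MONTHS = [
    ("2006-07-01", 121),
    ("2006-08-01", 247),
    ("2006-09-01", 255),
    ("2006-10-01", 143),
    ("2006-11-01", 318),
    ("2006-12-01", 536),
]


def frame_defaults():
    """Return (ordermonth, default_frame, range_frame, rows_frame, total) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EmpOrders (empid INTEGER, ordermonth TEXT, qty INTEGER)")
    con.executemany("INSERT INTO EmpOrders VALUES (1, ?, ?)", MONTHS)
    rows = con.execute(
        """
        SELECT ordermonth,
               -- ORDER BY without a frame clause: default RANGE frame
               SUM(qty) OVER (PARTITION BY empid ORDER BY ordermonth) AS default_frame,
               SUM(qty) OVER (PARTITION BY empid ORDER BY ordermonth
                              RANGE UNBOUNDED PRECEDING) AS range_frame,
               SUM(qty) OVER (PARTITION BY empid ORDER BY ordermonth
                              ROWS UNBOUNDED PRECEDING) AS rows_frame,
               -- no ORDER BY: every row sees the whole partition
               SUM(qty) OVER (PARTITION BY empid) AS total
        FROM EmpOrders
        ORDER BY empid, ordermonth
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in frame_defaults():
        print(row)
```

The three running-total columns agree only because ordermonth is unique within each employee. With duplicate ordering values, the RANGE forms would include all tied rows.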
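
So what exactly is the difference between RANGE and ROWS?
In the earlier example, we compared ROWS with a hypothetical RANGE that supports RANGE BETWEEN INTERVAL '2' MONTH PRECEDING.
In reality, as noted above, that feature is not implemented in SQL Server 2012.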
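Even in SQL Server 2012, one difference between RANGE and ROWS is already visible. RANGE ... CURRENT ROW includes all peers of the current row, meaning rows with the same ORDER BY value. ROWS, by contrast, stops at the current row itself.

The sketch below demonstrates this with Python's sqlite3 module on the dbo.T1 data from the earlier example. It assumes SQLite 3.25 or later, which follows the same standard semantics. The function name is hypothetical.

```python
import sqlite3

# Same rows as dbo.T1 in the earlier example.
T1_ROWS = [(2, "A"), (3, "A"), (5, "B"), (7, "B"), (11, "B"),
           (13, "C"), (17, "C"), (19, "C"), (23, "C")]


def rows_vs_range():
    """Return (keycol, col1, cnt_rows, cnt_range) ordered by keycol."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T1 (keycol INTEGER NOT NULL PRIMARY KEY, col1 TEXT NOT NULL)")
    con.executemany("INSERT INTO T1 VALUES (?, ?)", T1_ROWS)
    rows = con.execute(
        """
        SELECT keycol, col1,
               -- ROWS: stops at the current row; keycol breaks ties so it is deterministic
               COUNT(*) OVER (ORDER BY col1, keycol
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt_rows,
               -- RANGE: CURRENT ROW extends to every row with the same col1 (its peers)
               COUNT(*) OVER (ORDER BY col1
                              RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt_range
        FROM T1
        ORDER BY keycol
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in rows_vs_range():
        print(row)
```

Every A row gets cnt_range = 2, because both A rows are peers. The RANGE result is deterministic even without a tiebreaker, while the ROWS column needs keycol to be deterministic.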
(To be continued)
Reposted from: https://www.cnblogs.com/biwork/p/3244527.html