Building Densified Data Reports in Oracle with PARTITION BY
Data densification simply means filling in missing data. Data stored in database tables is often sparse, that is, incomplete. Take a table that records each employee's monthly sales: SalesRecord(Name, Date, Sales). If an employee is on leave for a month and has no sales, that month is usually not stored with a sales value of 0; the row is simply never written. The table therefore has missing rows, and the sales data is not continuous over time. To support later statistics, the gaps have to be filled in, and that is what data densification means.
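Here is a simple example.
Figure 1
We want to know every student's score in every subject, so the missing rows (for example, Lucy's Chinese score) have to be filled in. The result we want looks like this (the filled-in rows have a red background):
Figure 2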
The steps are as follows.
First, create a score table, Scores:
-- Create the Scores table
CREATE TABLE Scores (
    stuName VARCHAR2(10),
    subject VARCHAR2(10),
    score   NUMBER
);
Then insert the data shown in Figure 1.
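Next we densify the data along one dimension (subject), so that every student has a row for every subject. The first step is to find all subjects. Here we take them from the table itself; in practice you would usually join against a separate subject table.
Find all subjects:
SELECT DISTINCT subject FROM Scores;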
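Starting from the original table, the PARTITION BY clause of an outer join (a partitioned outer join) handles the next step:
-- One-dimensional densification
SELECT scores.stuname,
       m.subject,
       NVL(scores.score, 0)
FROM scores
PARTITION BY (scores.stuname) -- this is the key part
RIGHT JOIN (SELECT DISTINCT subject FROM scores) m
ON scores.subject = m.subject;
The result is the densified table shown in Figure 2.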
The query above is a bit cluttered. A WITH ... AS clause makes it easier to read:
WITH v1 AS (SELECT DISTINCT subject FROM scores)
SELECT scores.stuname, v1.subject, NVL(scores.score, 0)
FROM scores
PARTITION BY (scores.stuname)
RIGHT JOIN v1 ON scores.subject = v1.subject;
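The partitioned outer join is Oracle syntax. As a rough sketch of the same one-dimensional densification on an engine without it, the example below uses Python's built-in sqlite3 module: it cross-joins the distinct students with the distinct subjects and then left-joins the scores. The sample rows are hypothetical (the data behind Figure 1 is not reproduced here); only the missing Chinese score for Lucy comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (stuname TEXT, subject TEXT, score INTEGER)")

# Hypothetical sample rows, chosen so that Lucy has no Chinese score.
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("Lucy", "Math", 90),
        ("Lucy", "English", 85),
        ("Tom", "Chinese", 70),
        ("Tom", "Math", 60),
    ],
)

# Every (student, subject) pair comes from the CROSS JOIN; the LEFT JOIN
# attaches the stored score where one exists, and COALESCE fills gaps with 0.
dense_rows = conn.execute(
    """
    SELECT s.stuname, m.subject, COALESCE(sc.score, 0) AS score
    FROM (SELECT DISTINCT stuname FROM scores) s
    CROSS JOIN (SELECT DISTINCT subject FROM scores) m
    LEFT JOIN scores sc
           ON sc.stuname = s.stuname AND sc.subject = m.subject
    ORDER BY s.stuname, m.subject
    """
).fetchall()

for row in dense_rows:
    print(row)
```

With these rows the output has one line per student and subject, and the pairs that were never stored (such as Lucy and Chinese) appear with a score of 0. The Oracle form above expresses the same idea in a single join.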
That takes care of one dimension. Now suppose the table gets another column, the year, and we need every student's score in every subject for every year. The same idea applies: densify along one dimension first, then densify the result again, and so on.
WITH v1 AS (SELECT * FROM scores),                 -- the original table
     v2 AS (SELECT DISTINCT subject FROM scores),  -- all subjects
     v3 AS (SELECT DISTINCT dateyear FROM scores), -- all years
     v4 AS (SELECT v1.stuname, v2.subject, v1.score, v1.dateyear
            FROM v1
            PARTITION BY (v1.stuname)
            RIGHT JOIN v2 ON v1.subject = v2.subject) -- v4 is the table densified by subject, as in Figure 2
SELECT v4.stuname, v4.subject, NVL(v4.score, 0), v3.dateyear
FROM v4
PARTITION BY (stuname, subject) -- note the partition columns here
RIGHT JOIN v3                   -- finally, densify v4 by year
ON v4.dateyear = v3.dateyear;
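And that's it: every student now has a score for every subject in every year. The next step would be to pivot this table from rows to columns, which we'll cover another time.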
Now for densification over two dimensions. Suppose we have the following table:
YEARMONTH  STUDENT  SUBJECT  SCORE
201601     Jim      Chinese  78
201601     Jim      Math     34
201603     Jim      English  89
201605     Jim      Physics  88
201608     Jim      Math     67
201601     Joe      Math     87
201602     Joe      Chinese  87
201604     Joe      Chinese  55
201609     Joe      Math     45
201609     Joe      Physics  90

YEARMONTH  STUDENT  SUBJECT  SCORE
201601     Jim      Chinese  78
201601     Jim      Math     34
201601     Jim      English  0
201601     Jim      Physics  0
201602     Jim      Chinese  0
201602     Jim      Math     0
201602     Jim      English  0
201602     Jim      Physics  0
201603     Jim      Chinese  0
201603     Jim      Math     0
201603     Jim      English  89
201603     Jim      Physics  0
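The second table above shows part of the densified result. The filled-in rows (shown with a red background) default to a score of 0. With them in place we can see each student's (student dimension) score (score measure) in every subject (subject dimension) for every month (yearmonth dimension).
So how do we get there?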
First, the preparation: create a score table and insert the data.
-- Create the student score table
CREATE TABLE stu_score (
    yearmonth NUMBER,
    student   VARCHAR2(20),
    subject   VARCHAR2(20),
    score     NUMBER
);
-- Insert the data
INSERT INTO stu_score VALUES (201601, 'Jim', 'Chinese', 78);
INSERT INTO stu_score VALUES (201601, 'Jim', 'Math', 34);
INSERT INTO stu_score VALUES (201603, 'Jim', 'English', 89);
INSERT INTO stu_score VALUES (201605, 'Jim', 'Physics', 88);
INSERT INTO stu_score VALUES (201608, 'Jim', 'Math', 67);
INSERT INTO stu_score VALUES (201601, 'Joe', 'Math', 87);
INSERT INTO stu_score VALUES (201602, 'Joe', 'Chinese', 87);
INSERT INTO stu_score VALUES (201604, 'Joe', 'Chinese', 55);
INSERT INTO stu_score VALUES (201609, 'Joe', 'Math', 45);
INSERT INTO stu_score VALUES (201609, 'Joe', 'Physics', 90);
Likewise, create a time dimension table:
-- Create the time dimension table
CREATE TABLE DIM_DATE (
    yearmonth NUMBER
);
INSERT INTO DIM_DATE VALUES (201601);
INSERT INTO DIM_DATE VALUES (201602);
INSERT INTO DIM_DATE VALUES (201603);
INSERT INTO DIM_DATE VALUES (201604);
INSERT INTO DIM_DATE VALUES (201605);
INSERT INTO DIM_DATE VALUES (201606);
INSERT INTO DIM_DATE VALUES (201607);
INSERT INTO DIM_DATE VALUES (201608);
INSERT INTO DIM_DATE VALUES (201609);
INSERT INTO DIM_DATE VALUES (201610);
INSERT INTO DIM_DATE VALUES (201611);
INSERT INTO DIM_DATE VALUES (201612);
Then all that remains is to densify the data so that every dimension has a row:
WITH sub AS (SELECT DISTINCT subject FROM stu_score),
     t1 AS (SELECT t.yearmonth, t.student, sub.subject, t.score
            FROM stu_score t
            PARTITION BY (t.student)
            RIGHT JOIN sub ON t.subject = sub.subject)
SELECT dim_date.yearmonth, t1.student, t1.subject, NVL(t1.score, 0)
FROM t1
PARTITION BY (student, subject)
RIGHT JOIN DIM_DATE ON dim_date.yearmonth = t1.yearmonth;
Alternatively, skip the intermediate t1 and inline it as a subquery:
WITH sub AS ( -- subject dimension: select all subjects
    SELECT DISTINCT subject FROM stu_score
)
SELECT dim_date.yearmonth, t1.student, t1.subject, NVL(t1.score, 0)
FROM (SELECT t.yearmonth, t.student, sub.subject, t.score
      FROM stu_score t
      PARTITION BY (t.student)
      RIGHT JOIN sub ON t.subject = sub.subject) t1 -- densify by subject: every student has a row for every subject
PARTITION BY (student, subject)
RIGHT JOIN DIM_DATE ON dim_date.yearmonth = t1.yearmonth; -- densify by date: every month has a row
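As a way to check the expected shape of the result outside Oracle, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no partitioned outer join, so the sketch swaps in a different technique: cross-join the month, student, and subject dimensions, then left-join the stored scores. The table names and data match the statements above; the aliases d, st, and t are just illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stu_score "
    "(yearmonth INTEGER, student TEXT, subject TEXT, score INTEGER)"
)
conn.execute("CREATE TABLE dim_date (yearmonth INTEGER)")

conn.executemany(
    "INSERT INTO stu_score VALUES (?, ?, ?, ?)",
    [
        (201601, "Jim", "Chinese", 78),
        (201601, "Jim", "Math", 34),
        (201603, "Jim", "English", 89),
        (201605, "Jim", "Physics", 88),
        (201608, "Jim", "Math", 67),
        (201601, "Joe", "Math", 87),
        (201602, "Joe", "Chinese", 87),
        (201604, "Joe", "Chinese", 55),
        (201609, "Joe", "Math", 45),
        (201609, "Joe", "Physics", 90),
    ],
)
# Months 201601 through 201612, as in DIM_DATE.
conn.executemany(
    "INSERT INTO dim_date VALUES (?)",
    [(201600 + month,) for month in range(1, 13)],
)

# Build every (month, student, subject) combination, then attach the
# stored score where one exists and default the rest to 0.
dense = conn.execute(
    """
    SELECT d.yearmonth, st.student, sub.subject, COALESCE(t.score, 0) AS score
    FROM dim_date d
    CROSS JOIN (SELECT DISTINCT student FROM stu_score) st
    CROSS JOIN (SELECT DISTINCT subject FROM stu_score) sub
    LEFT JOIN stu_score t
           ON t.yearmonth = d.yearmonth
          AND t.student = st.student
          AND t.subject = sub.subject
    ORDER BY st.student, d.yearmonth, sub.subject
    """
).fetchall()

# 2 students x 4 subjects x 12 months
print(len(dense))
```

The row count is the product of the three dimensions (2 students, 4 subjects, 12 months), and only the ten stored rows carry a non-zero score. The Oracle queries above should produce the same set of rows for this data, since their partitions are also taken from the students and subjects present in stu_score.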