ClickHouse (20.4.2.9) SSB Performance Test
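ClickHouse Performance Test
Introduction to ClickHouse
ClickHouse is an open-source OLAP database developed by the Russian company Yandex. It is usually abbreviated CH (some people write CK) and is regarded as one of the fastest OLAP databases currently available, reportedly far ahead of Vertica, Sybase IQ, and similar systems. ClickHouse is probably best suited to time-series data that is ingested in streams or in batches.
CH has the following characteristics:
In short, CH is worth considering if you have the following business scenarios:
Performance test
The test uses one of the benchmarks described in the official ClickHouse documentation: SSBM (Star Schema Benchmark).
Server configuration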
    [root@p2hadoop075 ssb-dbgen-master]# uname -a
    Linux p2hadoop075 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
    [root@p2hadoop075 ssb-dbgen-master]# cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
         64 Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
    [root@p2hadoop075 ssb-dbgen-master]# grep MemTotal /proc/meminfo
    MemTotal: 527782880 kB
Introduction to the SSB model
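SSB (Star Schema Benchmark) is a data model defined by researchers at the University of Massachusetts Boston and based on real-world business applications. It is widely accepted in the industry as a fair and neutral way to simulate decision-support workloads.
Both academia and industry commonly use it to evaluate the performance of decision-support technology.
It assesses a system's overall capacity for business computing from every angle, which places higher demands on vendors.
It is widely used in bank credit and credit-card analysis, telecom operations analysis, tax analysis, and decision analysis in the tobacco industry.
The SSB benchmark consists of:
- 1 fact table: lineorder
- 4 dimension tables: customer, part, dwdate, supplier
- 13 standard SQL test queries that combine aggregate statistics, multi-table joins, sum, complex conditions, group by, order by, and so on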
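The sizes of these tables grow with the data generator's scale factor (SF). As a rough sketch, assuming the cardinality formulas from the SSB specification (customer = 30,000 × SF, supplier = 2,000 × SF, part = 200,000 × (1 + floor(log2 SF)), lineorder ≈ 6,000,000 × SF), the expected row counts can be estimated before any data is generated. The function names `ilog2` and `ssb_expected_rows` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Approximate SSB table cardinalities for a given scale factor (SF).
# Assumption: formulas as given in the SSB specification; the lineorder
# figure is only approximate because line items per order are random.

# floor(log2(n)) for a positive integer n
ilog2() {
  local n=$1 r=0
  while [ "$n" -gt 1 ]; do
    n=$((n / 2))
    r=$((r + 1))
  done
  echo "$r"
}

# Print "table rows" pairs for the scale factor given as $1
ssb_expected_rows() {
  local sf=$1
  echo "customer $((30000 * sf))"
  echo "supplier $((2000 * sf))"
  echo "part $((200000 * (1 + $(ilog2 "$sf"))))"
  echo "lineorder(approx) $((6000000 * sf))"
}

ssb_expected_rows 100
```

For SF = 100 this prints 3000000 customers, 200000 suppliers, 1400000 parts, and roughly 600 million lineorder rows. Comparing these estimates with `wc -l` on the generated files is a quick sanity check before loading.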
Generating the data
    # Download the SSBM tool
    [root@p2hadoop075 data03]# git clone https://github.com/vadimtk/ssb-dbgen.git
    [root@p2hadoop075 data03]# cd ssb-dbgen-master
    [root@p2hadoop075 ssb-dbgen-master]# make

    # Generate the test data; machine performance and disk space are limited, so -s 100 is used
    [root@sdw1 ssb-dbgen-master]# ./dbgen -s 100 -T c
    [root@sdw1 ssb-dbgen-master]# ./dbgen -s 100 -T p
    [root@sdw1 ssb-dbgen-master]# ./dbgen -s 100 -T s
    [root@sdw1 ssb-dbgen-master]# ./dbgen -s 100 -T l

    # List the files
    [root@sdw1 ssb-dbgen-master]# ll *.tbl
    -rw-r--r-- 1 root root   289529327 4月 26 17:21 customer.tbl
    -rw-r--r-- 1 root root 63289191180 4月 26 17:38 lineorder.tbl
    -rw-r--r-- 1 root root   121042413 4月 26 17:21 part.tbl
    -rw-r--r-- 1 root root    17062852 4月 26 17:21 supplier.tbl
    [root@sdw1 ssb-dbgen-master]#

    # Count the records
    [root@sdw1 ssb-dbgen-master]# wc -l *.tbl
      3000000 customer.tbl
    600037902 lineorder.tbl
      1400000 part.tbl
       200000 supplier.tbl
Creating tables on the cluster
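**Note:** do not use the ReplicatedReplacingMergeTree engine for the lineorder table. It deduplicates rows that share the same sorting key, so the resulting row count will be wrong.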
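To see why deduplication changes the row count, recall that a ReplacingMergeTree table keeps only one row per sorting-key value once its parts have been merged. The sketch below assumes the sorting key is (LOORDERDATE, LOORDERKEY), as in the lineorder table definition used in this article, and feeds in three made-up CSV rows. `count_after_dedup` is a hypothetical helper, not part of any ClickHouse tooling.

```shell
#!/usr/bin/env bash
# Read lineorder-style CSV on stdin and print two numbers:
#   total rows, and rows left if rows sharing the sorting key
#   (LOORDERDATE, LOORDERKEY) were collapsed into one.
# Assumption: column 1 is LOORDERKEY and column 6 is LOORDERDATE.
count_after_dedup() {
  awk -F',' '{ key = $6 FS $1; if (!(key in seen)) { seen[key] = 1; n++ } }
             END { print NR, n + 0 }'
}

# Three made-up line items: two belong to order 1, one to order 2.
sample='1,1,7381,155190,828,1996-01-02,5-LOW,0,17,2116823,17366547,4,2032150,74711,2,1996-02-12,TRUCK
1,2,7381,67310,163,1996-01-02,5-LOW,0,36,4598316,17366547,9,4184467,76638,6,1996-02-28,MAIL
2,1,15601,106170,1066,1996-12-01,1-URGENT,0,38,4469446,4692918,0,4469446,70570,5,1997-01-14,RAIL'

printf '%s\n' "$sample" | count_after_dedup
```

The output is `3 2`: three input rows but only two distinct keys, because both line items of order 1 share the same (date, order key) pair. Since an order normally has several line items, a replacing engine with this key would silently drop most of the fact table.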
    CREATE DATABASE ssb ON CLUSTER center_cluster;
    SHOW DATABASES;

    -- customer local table
    CREATE TABLE ssb.customer_local ON CLUSTER center_cluster
    (
        CCUSTKEY UInt32,
        CNAME String,
        CADDRESS String,
        CCITY LowCardinality(String),
        CNATION LowCardinality(String),
        CREGION LowCardinality(String),
        CPHONE String,
        CMKTSEGMENT LowCardinality(String)
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/customer_local/{shard}/replicate', '{replica}')
    ORDER BY (CCUSTKEY)
    SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

    -- customer distributed table
    CREATE TABLE ssb.customer ON CLUSTER center_cluster AS ssb.customer_local
    ENGINE = Distributed(center_cluster, ssb, customer_local, rand());

    -- lineorder local table
    CREATE TABLE ssb.lineorder_local ON CLUSTER center_cluster
    (
        LOORDERKEY UInt32,
        LOLINENUMBER UInt8,
        LOCUSTKEY UInt32,
        LOPARTKEY UInt32,
        LOSUPPKEY UInt32,
        LOORDERDATE Date,
        LOORDERPRIORITY LowCardinality(String),
        LOSHIPPRIORITY UInt8,
        LOQUANTITY UInt8,
        LOEXTENDEDPRICE UInt32,
        LOORDTOTALPRICE UInt32,
        LODISCOUNT UInt8,
        LOREVENUE UInt32,
        LOSUPPLYCOST UInt32,
        LOTAX UInt8,
        LOCOMMITDATE Date,
        LOSHIPMODE LowCardinality(String)
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/lineorder_local/{shard}/replicate', '{replica}')
    PARTITION BY toYear(LOORDERDATE)
    ORDER BY (LOORDERDATE, LOORDERKEY)
    SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

    -- lineorder distributed table
    CREATE TABLE ssb.lineorder ON CLUSTER center_cluster AS ssb.lineorder_local
    ENGINE = Distributed(center_cluster, ssb, lineorder_local, rand());

    -- part local table
    CREATE TABLE ssb.part_local ON CLUSTER center_cluster
    (
        PPARTKEY UInt32,
        PNAME String,
        PMFGR LowCardinality(String),
        PCATEGORY LowCardinality(String),
        PBRAND LowCardinality(String),
        PCOLOR LowCardinality(String),
        PTYPE LowCardinality(String),
        PSIZE UInt8,
        PCONTAINER LowCardinality(String)
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/part_local/{shard}/replicate', '{replica}')
    ORDER BY PPARTKEY
    SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

    -- part distributed table
    CREATE TABLE ssb.part ON CLUSTER center_cluster AS ssb.part_local
    ENGINE = Distributed(center_cluster, ssb, part_local, rand());

    -- supplier local table
    CREATE TABLE ssb.supplier_local ON CLUSTER center_cluster
    (
        SSUPPKEY UInt32,
        SNAME String,
        SADDRESS String,
        SCITY LowCardinality(String),
        SNATION LowCardinality(String),
        SREGION LowCardinality(String),
        SPHONE String
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/supplier_local/{shard}/replicate', '{replica}')
    ORDER BY SSUPPKEY
    SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

    -- supplier distributed table
    CREATE TABLE ssb.supplier ON CLUSTER center_cluster AS ssb.supplier_local
    ENGINE = Distributed(center_cluster, ssb, supplier_local, rand());

    # Import the data (run from the shell)
    clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.customer FORMAT CSV" < customer.tbl
    clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.part FORMAT CSV" < part.tbl
    clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.supplier FORMAT CSV" < supplier.tbl
    clickhouse-client -h p2hadoop074 --format_csv_delimiter="," --query "INSERT INTO ssb.lineorder FORMAT CSV" < lineorder.tbl

    -- Check the row counts
    SELECT COUNT(*) FROM ssb.lineorder;  -- 600037902
    SELECT COUNT(*) FROM ssb.customer;   -- 3000000
    SELECT COUNT(*) FROM ssb.part;       -- 1400000
    SELECT COUNT(*) FROM ssb.supplier;   -- 200000

    -- lineorderflat local table
    CREATE TABLE ssb.lineorderflat_local ON CLUSTER center_cluster
    (
        LOORDERKEY UInt32,
        LOLINENUMBER UInt8,
        LOCUSTKEY UInt32,
        LOPARTKEY UInt32,
        LOSUPPKEY UInt32,
        LOORDERDATE Date,
        LOORDERPRIORITY String,
        LOSHIPPRIORITY UInt8,
        LOQUANTITY UInt8,
        LOEXTENDEDPRICE UInt32,
        LOORDTOTALPRICE UInt32,
        LODISCOUNT UInt8,
        LOREVENUE UInt32,
        LOSUPPLYCOST UInt32,
        LOTAX UInt32,
        LOCOMMITDATE Date,
        LOSHIPMODE String,
        CNAME String,
        CADDRESS String,
        CCITY String,
        CNATION String,
        CREGION String,
        CPHONE String,
        CMKTSEGMENT String,
        SNAME String,
        SADDRESS String,
        SCITY String,
        SNATION String,
        SREGION String,
        SPHONE String,
        PNAME String,
        PMFGR String,
        PCATEGORY String,
        PBRAND String,
        PCOLOR String,
        PTYPE String,
        PSIZE UInt8,
        PCONTAINER String
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/ssb/lineorderflat_local/{shard}/replicate', '{replica}')
    PARTITION BY toYear(LOORDERDATE)
    ORDER BY (LOORDERDATE, LOORDERKEY)
    SETTINGS storage_policy = 'multiple_disk', index_granularity = 8192;

    -- lineorderflat distributed table
    CREATE TABLE ssb.lineorderflat ON CLUSTER center_cluster AS ssb.lineorderflat_local
    ENGINE = Distributed(center_cluster, ssb, lineorderflat_local, rand());

    -- Populate the wide (flattened) table
    INSERT INTO ssb.lineorderflat
    SELECT
        l.LOORDERKEY AS LOORDERKEY,
        l.LOLINENUMBER AS LOLINENUMBER,
        l.LOCUSTKEY AS LOCUSTKEY,
        l.LOPARTKEY AS LOPARTKEY,
        l.LOSUPPKEY AS LOSUPPKEY,
        l.LOORDERDATE AS LOORDERDATE,
        l.LOORDERPRIORITY AS LOORDERPRIORITY,
        l.LOSHIPPRIORITY AS LOSHIPPRIORITY,
        l.LOQUANTITY AS LOQUANTITY,
        l.LOEXTENDEDPRICE AS LOEXTENDEDPRICE,
        l.LOORDTOTALPRICE AS LOORDTOTALPRICE,
        l.LODISCOUNT AS LODISCOUNT,
        l.LOREVENUE AS LOREVENUE,
        l.LOSUPPLYCOST AS LOSUPPLYCOST,
        l.LOTAX AS LOTAX,
        l.LOCOMMITDATE AS LOCOMMITDATE,
        l.LOSHIPMODE AS LOSHIPMODE,
        c.CNAME AS CNAME,
        c.CADDRESS AS CADDRESS,
        c.CCITY AS CCITY,
        c.CNATION AS CNATION,
        c.CREGION AS CREGION,
        c.CPHONE AS CPHONE,
        c.CMKTSEGMENT AS CMKTSEGMENT,
        s.SNAME AS SNAME,
        s.SADDRESS AS SADDRESS,
        s.SCITY AS SCITY,
        s.SNATION AS SNATION,
        s.SREGION AS SREGION,
        s.SPHONE AS SPHONE,
        p.PNAME AS PNAME,
        p.PMFGR AS PMFGR,
        p.PCATEGORY AS PCATEGORY,
        p.PBRAND AS PBRAND,
        p.PCOLOR AS PCOLOR,
        p.PTYPE AS PTYPE,
        p.PSIZE AS PSIZE,
        p.PCONTAINER AS PCONTAINER
    FROM ssb.lineorder AS l
    INNER JOIN ssb.customer AS c ON c.CCUSTKEY = l.LOCUSTKEY
    INNER JOIN ssb.supplier AS s ON s.SSUPPKEY = l.LOSUPPKEY
    INNER JOIN ssb.part AS p ON p.PPARTKEY = l.LOPARTKEY;

    Progress: 23.84 million rows, 1.39 GB (687.66 thousand rows/s., 40.02 MB/s.)
Test queries
    -- Single-table queries
    -- Unqualified table names assume ssb is the current database.

    -- Q1.1
    SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue
    FROM lineorderflat
    WHERE (toYear(LOORDERDATE) = 1993) AND ((LODISCOUNT >= 1) AND (LODISCOUNT <= 3)) AND (LOQUANTITY < 25);

    -- Q1.2
    SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue
    FROM lineorderflat
    WHERE (toYYYYMM(LOORDERDATE) = 199401) AND ((LODISCOUNT >= 4) AND (LODISCOUNT <= 6)) AND ((LOQUANTITY >= 26) AND (LOQUANTITY <= 35));

    -- Q1.3
    SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue
    FROM lineorderflat
    WHERE (toISOWeek(LOORDERDATE) = 6) AND (toYear(LOORDERDATE) = 1994) AND ((LODISCOUNT >= 5) AND (LODISCOUNT <= 7)) AND ((LOQUANTITY >= 26) AND (LOQUANTITY <= 35));

    -- Q2.1
    SELECT sum(LOREVENUE), toYear(LOORDERDATE) AS year, PBRAND
    FROM lineorderflat
    WHERE (PCATEGORY = 'MFGR#12') AND (SREGION = 'AMERICA')
    GROUP BY year, PBRAND
    ORDER BY year ASC, PBRAND ASC;

    -- Q2.2
    SELECT sum(LOREVENUE), toYear(LOORDERDATE) AS year, PBRAND
    FROM lineorderflat
    WHERE (PBRAND >= 'MFGR#2221') AND (PBRAND <= 'MFGR#2228') AND (SREGION = 'ASIA')
    GROUP BY year, PBRAND
    ORDER BY year ASC, PBRAND ASC;

    -- Q2.3
    SELECT sum(LOREVENUE), toYear(LOORDERDATE) AS year, PBRAND
    FROM lineorderflat
    WHERE (PBRAND = 'MFGR#2239') AND (SREGION = 'EUROPE')
    GROUP BY year, PBRAND
    ORDER BY year ASC, PBRAND ASC;

    -- Q3.1
    SELECT CNATION, SNATION, toYear(LOORDERDATE) AS year, sum(LOREVENUE) AS revenue
    FROM lineorderflat
    WHERE (CREGION = 'ASIA') AND (SREGION = 'ASIA') AND (year >= 1992) AND (year <= 1997)
    GROUP BY CNATION, SNATION, year
    ORDER BY year ASC, revenue DESC;

    -- Q3.2
    SELECT CCITY, SCITY, toYear(LOORDERDATE) AS year, sum(LOREVENUE) AS revenue
    FROM lineorderflat
    WHERE (CNATION = 'UNITED STATES') AND (SNATION = 'UNITED STATES') AND (year >= 1992) AND (year <= 1997)
    GROUP BY CCITY, SCITY, year
    ORDER BY year ASC, revenue DESC;

    -- Q3.3
    SELECT CCITY, SCITY, toYear(LOORDERDATE) AS year, sum(LOREVENUE) AS revenue
    FROM lineorderflat
    WHERE ((CCITY = 'UNITED KI1') OR (CCITY = 'UNITED KI5')) AND ((SCITY = 'UNITED KI1') OR (SCITY = 'UNITED KI5')) AND (year >= 1992) AND (year <= 1997)
    GROUP BY CCITY, SCITY, year
    ORDER BY year ASC, revenue DESC;

    -- Q3.4
    SELECT CCITY, SCITY, toYear(LOORDERDATE) AS year, sum(LOREVENUE) AS revenue
    FROM lineorderflat
    WHERE ((CCITY = 'UNITED KI1') OR (CCITY = 'UNITED KI5')) AND ((SCITY = 'UNITED KI1') OR (SCITY = 'UNITED KI5')) AND (toYYYYMM(LOORDERDATE) = 199712)
    GROUP BY CCITY, SCITY, year
    ORDER BY year ASC, revenue DESC;

    -- Q4.1
    SELECT toYear(LOORDERDATE) AS year, CNATION, sum(LOREVENUE - LOSUPPLYCOST) AS profit
    FROM lineorderflat
    WHERE (CREGION = 'AMERICA') AND (SREGION = 'AMERICA') AND ((PMFGR = 'MFGR#1') OR (PMFGR = 'MFGR#2'))
    GROUP BY year, CNATION
    ORDER BY year ASC, CNATION ASC;

    -- Q4.2
    SELECT toYear(LOORDERDATE) AS year, SNATION, PCATEGORY, sum(LOREVENUE - LOSUPPLYCOST) AS profit
    FROM lineorderflat
    WHERE (CREGION = 'AMERICA') AND (SREGION = 'AMERICA') AND ((year = 1997) OR (year = 1998)) AND ((PMFGR = 'MFGR#1') OR (PMFGR = 'MFGR#2'))
    GROUP BY year, SNATION, PCATEGORY
    ORDER BY year ASC, SNATION ASC, PCATEGORY ASC;

    -- Q4.3
    SELECT toYear(LOORDERDATE) AS year, SCITY, PBRAND, sum(LOREVENUE - LOSUPPLYCOST) AS profit
    FROM lineorderflat
    WHERE (SNATION = 'UNITED STATES') AND ((year = 1997) OR (year = 1998)) AND (PCATEGORY = 'MFGR#14')
    GROUP BY year, SCITY, PBRAND
    ORDER BY year ASC, SCITY ASC, PBRAND ASC;

    -- Low-cardinality queries

    -- Q1
    SELECT count(*), LOSHIPMODE FROM lineorderflat GROUP BY LOSHIPMODE;
    -- Q2
    SELECT count(distinct LOSHIPMODE) FROM lineorderflat;
    -- Q3
    SELECT COUNT(*), LOSHIPMODE, LOORDERPRIORITY FROM ssb.lineorderflat GROUP BY LOSHIPMODE, LOORDERPRIORITY;
    -- Q4
    SELECT COUNT(*), LOSHIPMODE, LOORDERPRIORITY FROM ssb.lineorderflat GROUP BY LOSHIPMODE, LOORDERPRIORITY, LOSHIPPRIORITY;
    -- Q5
    SELECT COUNT(*), LOSHIPMODE, SCITY FROM ssb.lineorderflat GROUP BY LOSHIPMODE, SCITY;
    -- Q6
    SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY CCITY, SCITY;
    -- Q7
    SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY LOSHIPMODE, LOORDERDATE;
    -- Q8
    SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY LOORDERDATE, SNATION, SREGION;
    -- Q9
    SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY CCITY, SCITY, CNATION, SNATION;
    -- Q10
    SELECT COUNT(*) FROM (SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY LOSHIPMODE, LOORDERPRIORITY, PCATEGORY, SNATION, CNATION) T;
    -- Q11
    SELECT COUNT(*) FROM (SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY LOSHIPMODE, LOORDERPRIORITY, PCATEGORY, SNATION, CNATION, PMFGR) T;
    -- Q12
    SELECT COUNT(*) FROM (SELECT COUNT(*) FROM ssb.lineorderflat GROUP BY SUBSTR(LOSHIPMODE, 2), LOWER(LOORDERPRIORITY), PCATEGORY, SNATION, CNATION, SREGION, PMFGR) T;
Test results:
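How the timings were collected is not described. One possible approach, shown here as a sketch and not as the method actually used, is to run each query a few times through `clickhouse-client` with its `--time` option (which prints the elapsed seconds to stderr in non-interactive mode) and keep the best run. `run_clickhouse` and `best_of` are hypothetical helper names, and the database name `ssb` is taken from the table definitions in this article.

```shell
#!/usr/bin/env bash
# Runner: execute one query and print only the elapsed seconds.
# stderr (where --time writes) is redirected to the caller's stdout,
# while the query result itself is discarded.
run_clickhouse() {
  clickhouse-client --database ssb --time --query "$1" 2>&1 > /dev/null
}

# best_of RUNNER REPEATS SQL
# Calls RUNNER (a function or command that prints elapsed seconds)
# REPEATS times and prints the smallest value seen.
best_of() {
  local runner=$1 repeats=$2 sql=$3 i t best=""
  for ((i = 0; i < repeats; i++)); do
    t=$("$runner" "$sql") || return 1
    best=$(awk -v a="$t" -v b="$best" \
      'BEGIN { print (b == "" || a + 0 < b + 0) ? a : b }')
  done
  echo "$best"
}

# Example call (needs a reachable ClickHouse server):
#   best_of run_clickhouse 3 "SELECT count(*), LOSHIPMODE FROM lineorderflat GROUP BY LOSHIPMODE"
```

Passing the runner as a parameter keeps the timing loop independent of the client, so the same loop can be tried against a stub function before pointing it at a real cluster. Taking the best of several runs reduces the influence of a cold page cache on the first run; reporting the median instead would be an equally reasonable choice.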
Single-table query results
| Query | Elapsed time |
| --- | --- |
| Q1.1 | 27 |
| Q1.2 | 21 |
| Q1.3 | 18 |
| Q2.1 | 376 |
| Q2.2 | 309 |
| Q2.3 | 481 |
| Q3.1 | 792 |
| Q3.2 | 848 |
| Q3.3 | 622 |
| Q3.4 | 33 |
| Q4.1 | 919 |
| Q4.2 | 441 |
| Q4.3 | 295 |
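To compare the four query flights at a glance, the per-query figures can be summed per flight. The sketch below copies the numbers from the table above and leaves the unit as reported there; `flight_totals` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sum "query value" pairs read from stdin per flight (Q1.x, Q2.x, ...)
# and print one line per flight plus a grand total.
flight_totals() {
  awk '{ split($1, p, "."); sum[p[1]] += $2; total += $2 }
       END { for (i = 1; i <= 4; i++) print "Q" i, sum["Q" i]; print "total", total }'
}

# Values copied from the single-table results table.
results='Q1.1 27
Q1.2 21
Q1.3 18
Q2.1 376
Q2.2 309
Q2.3 481
Q3.1 792
Q3.2 848
Q3.3 622
Q3.4 33
Q4.1 919
Q4.2 441
Q4.3 295'

printf '%s\n' "$results" | flight_totals
```

This prints 66 for Q1, 1166 for Q2, 2295 for Q3, 1655 for Q4, and 5182 in total. The Q1 flight and Q3.4, which all filter on a narrow range of LOORDERDATE, are far cheaper than the queries that have to scan several years of data.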
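Low-cardinality query performance
| Query | Description | Elapsed time |
| --- | --- | --- |
| Q1 | group by 1 low-cardinality column (<50) | 0.199 |
| Q2 | count distinct on 1 low-cardinality column (<50) | 0.365 |
| Q3 | group by 2 low-cardinality columns | 2.732 |
| Q4 | group by 2 low-cardinality columns and 1 int column | 3.465 |
| Q5 | group by 2 low-cardinality columns (7*250) | 0.996 |
| Q6 | group by 2 low-cardinality columns (250*250) | 1.947 |
| Q7 | group by 1 low-cardinality column (<50) and 1 date column | 0.656 |
| Q8 | group by 2 low-cardinality columns (<50) and 1 date column | 0.978 |
| Q9 | group by 4 low-cardinality columns | 3.308 |
| Q10 | group by 5 low-cardinality columns (<50) | 4.46 |
| Q11 | group by 6 low-cardinality columns (<50) | 5.254 |
| Q12 | group by 7 low-cardinality columns (<50), some wrapped in functions | 5.868 |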
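The annotations such as 7*250 in the table refer to the cardinalities of the grouped columns. Their product is an upper bound on the number of groups a query can produce, which gives a rough feel for how large the aggregation state may become. A minimal sketch of that arithmetic, with `max_groups` as a hypothetical helper:

```shell
#!/usr/bin/env bash
# Upper bound on the number of GROUP BY groups: the product of the
# cardinalities of the grouped columns, passed as arguments.
max_groups() {
  local product=1 c
  for c in "$@"; do
    product=$((product * c))
  done
  echo "$product"
}

max_groups 7 250     # Q5: LOSHIPMODE x SCITY
max_groups 250 250   # Q6: CCITY x SCITY
```

This prints 1750 and 62500. The real number of groups can be smaller when some combinations never occur in the data, and the bound alone does not determine the runtime: Q3 has far fewer possible groups than Q6 yet is reported as slower.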