clickhouse SSB 性能测试
SSB(Star Schema Benchmark)的介紹論文地址:
https://www.cs.umb.edu/~poneil/StarSchemaB.PDF
官網鏈接 https://clickhouse.com/docs/en/getting-started/example-datasets/star-schema/
如果安裝系統時,時最小化mini安裝,經常會提示很多命令不存在
- 提示 git 不存在,使用 yum install git 安裝即可
- 提示 make: command not found,使用以下命令安裝 make yum install -y gcc gcc-c++ automake autoconf libtool make
ssb-dbgen 測試工具 GitHub 地址 https://github.com/vadimtk/ssb-dbgen
下載并編譯測試工具
git clone https://github.com/vadimtk/ssb-dbgen.git cd ssb-dbgen make之后就會在當前目錄生成 dbgen 和 qgen 這兩個可執行文件
|
| ./dbgen -s 1 -T p | part.tbl |
| ./dbgen -s 1 -T s | suppliers.tbl |
| ./dbgen -s 1 -T c | customers.tbl |
| ./dbgen -s 1 -T d | date.tbl |
| ./dbgen -s 1 -T l | lineorder.tbl |
| ./dbgen -s 1 -T a | 一次性生成以上所有表 |
添加 -h 符號,會將文件大小進行格式化,并顯示單位
使用 head -n 10 customer.tbl 命令打印前10行可以看到,tbl 文件是用逗號分隔列,然后使用換行符分割行的
這里數據量業界有一個統稱叫做SF 1SF == 1G
裝載數據 1G -s 1 == 1G #:但這樣有一個弊端 那就是如果裝載的數據量特別大的時候例如 1T 這樣基本需要花費一天的時間 所以我來用多線程的dbgen方法
方法如下:
./dbgen -vfF -s 1000 -S 1 -C 6 & ./dbgen -vfF -s 1000 -S 2 -C 6 & ./dbgen -vfF -s 1000 -S 3 -C 6 & ./dbgen -vfF -s 1000 -S 4 -C 6 & ./dbgen -vfF -s 1000 -S 5 -C 6 & ./dbgen -vfF -s 1000 -S 6 -C 6參數詳解
- -v 詳細信息
- -s 表示生成數據的規模
- -S 切分數據
- -f 覆蓋之前的文件
dss.ddl 這個文件存儲的是建表的語句
cat dss.ddl 逐一執行里面的建表語句
建表語句可以在官網拿到,使用外部工具,比如 DBeaver 建表
CREATE DATABASE IF NOT EXISTS ssb USE ssb CREATE TABLE IF NOT EXISTS customer (C_CUSTKEY UInt32,C_NAME String,C_ADDRESS String,C_CITY LowCardinality(String),C_NATION LowCardinality(String),C_REGION LowCardinality(String),C_PHONE String,C_MKTSEGMENT LowCardinality(String) ) ENGINE = MergeTree ORDER BY (C_CUSTKEY);CREATE TABLE IF NOT EXISTS lineorder (LO_ORDERKEY UInt32,LO_LINENUMBER UInt8,LO_CUSTKEY UInt32,LO_PARTKEY UInt32,LO_SUPPKEY UInt32,LO_ORDERDATE Date,LO_ORDERPRIORITY LowCardinality(String),LO_SHIPPRIORITY UInt8,LO_QUANTITY UInt8,LO_EXTENDEDPRICE UInt32,LO_ORDTOTALPRICE UInt32,LO_DISCOUNT UInt8,LO_REVENUE UInt32,LO_SUPPLYCOST UInt32,LO_TAX UInt8,LO_COMMITDATE Date,LO_SHIPMODE LowCardinality(String) ) ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);CREATE TABLE IF NOT EXISTS part (P_PARTKEY UInt32,P_NAME String,P_MFGR LowCardinality(String),P_CATEGORY LowCardinality(String),P_BRAND LowCardinality(String),P_COLOR LowCardinality(String),P_TYPE LowCardinality(String),P_SIZE UInt8,P_CONTAINER LowCardinality(String) ) ENGINE = MergeTree ORDER BY P_PARTKEY;CREATE TABLE IF NOT EXISTS supplier (S_SUPPKEY UInt32,S_NAME String,S_ADDRESS String,S_CITY LowCardinality(String),S_NATION LowCardinality(String),S_REGION LowCardinality(String),S_PHONE String ) ENGINE = MergeTree ORDER BY S_SUPPKEY;之后定位到 dbgen 生成的目錄,使用 pwd 命令,可以打印出當前目錄
接下來,使用 clickhouse-client 工具,將 tbl 中的數據導入數據庫
注意:如果你使用的不是 default 數據庫(默認數據庫),請在表前面加上數據庫前綴,否則會報如下錯誤
這是維度為1,即 -s 1 情況下的數據量
SELECT (SELECT COUNT(1) FROM customer) AS customer, (SELECT COUNT(1) FROM lineorder) AS lineorder, (SELECT COUNT(1) FROM part) AS part,(SELECT COUNT(1) FROM supplier) AS supplier
下面這段SQL,會將星型模式(star schema)轉化為 非標準化的(denormalized)平面模型(flat schema)。也就是說,將原本相關聯的表結構,通過某種關系,整合到一張表里去。
clickhouse 默認內存大小是1G。在做 SSB 的星型轉平面模型的時,如果沒有增加最大內存限制,就會報如下錯誤(DB::Exception: Memory limit (total) exceeded)
注意:set_memory_usage = 20000000000 這個函數在外部的數據庫管理工具,比如DBeaver中是無法設置的。但在 clickhouse-client 中設置卻有效(本地或遠程均可)。這是因為,該函數僅支持在 TCP 模式下被調用,即 port=9000
另外,set 函數是針對當前會話的,只要一退出,立馬又會還原成之前的樣子。
查看變量值的命令為:SELECT name,value FROM system.settings WHERE name = 'max_memory_usage'
由于0太多,不好數。我們可以使用 formatReadableSize 函數將其格式化。但由于 system.settings 這個表中的 value 是字符串類型的,因此我們必須將起轉為 Int 類型,這就要用到 toInt64 函數了(Int32長度不夠,會導致數值溢出)
完整的命令如下:SELECT name,formatReadableSize(toInt64(value)) FROM system.settings WHERE name = 'max_memory_usage';
接著先將可用內存增大后,再刪除之前創建后轉化失敗的表,然后進行測試
如果機子內存沒有這么大,上述語句無效。但可以通過 SET min_insert_block_size_rows=8192; 減小單次插入塊的行大小(默認值為 1048545)這樣就能安全通過了。
可以看到在虛擬機3G內存下的速度為:每秒71萬行,30MB/S
在執行時,打開另一個終端,在clickhouse-client 中執行 SHOW PROCESSLIST 會打印當前執行的進度
將字符串復制出來,把單引號替換成雙引號。放在在線JSON格式化頁面進行轉義
運行下面查詢語句(使用 USE ssb; 設置默認數據庫)
Q1.1
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;Q1.2
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;Q1.3
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;Q2.1
SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;
Q2.2
SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;
Q2.3
SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;Q3.1
SELECTC_NATION,S_NATION,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BYC_NATION,S_NATION,year ORDER BYyear ASC,revenue DESC;
Q3.2
SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;Q3.3
SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;Q3.4
SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;Q4.1
SELECTtoYear(LO_ORDERDATE) AS year,C_NATION,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BYyear,C_NATION ORDER BYyear ASC,C_NATION ASC;Q4.2
SELECTtoYear(LO_ORDERDATE) AS year,S_NATION,P_CATEGORY,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BYyear,S_NATION,P_CATEGORY ORDER BYyear ASC,S_NATION ASC,P_CATEGORY ASC;
Q4.3
SELECTtoYear(LO_ORDERDATE) AS year,S_CITY,P_BRAND,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BYyear,S_CITY,P_BRAND ORDER BYyear ASC,S_CITY ASC,P_BRAND ASC;總結
以上是生活随笔為你收集整理的clickhouse SSB 性能测试的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: [转载]使用 Apache Geroni
- 下一篇: DDNS请求到底请求了什么