日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 编程资源 > 编程问答 >内容正文

编程问答

clickhouse SSB 性能测试

發布時間:2023/12/18 编程问答 31 豆豆
生活随笔 收集整理的這篇文章主要介紹了 clickhouse SSB 性能测试 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

SSB(Star Schema Benchmark)的介紹論文地址:
https://www.cs.umb.edu/~poneil/StarSchemaB.PDF

官網鏈接 https://clickhouse.com/docs/en/getting-started/example-datasets/star-schema/

如果安裝系統時,時最小化mini安裝,經常會提示很多命令不存在

  • 提示 git 不存在,使用 yum install git 安裝即可
  • 提示 make: command not found,使用以下命令安裝 make yum install -y gcc gcc-c++ automake autoconf libtool make

ssb-dbgen 測試工具 GitHub 地址 https://github.com/vadimtk/ssb-dbgen

下載并編譯測試工具

git clone https://github.com/vadimtk/ssb-dbgen.git cd ssb-dbgen make

之后就會在當前目錄生成 dbgen 和 qgen 這兩個可執行文件

|

命令結果
./dbgen -s 1 -T ppart.tbl
./dbgen -s 1 -T ssuppliers.tbl
./dbgen -s 1 -T ccustomers.tbl
./dbgen -s 1 -T ddate.tbl
./dbgen -s 1 -T llineorder.tbl
./dbgen -s 1 -T a一次性生成以上所有表

添加 -h 符號,會將文件大小進行格式化,并顯示單位

使用 head -n 10 customer.tbl 命令打印前10行可以看到,tbl 文件是用逗號分隔列,然后使用換行符分割行的

這里數據量業界有一個統稱叫做SF 1SF == 1G
裝載數據 1G -s 1 == 1G #:但這樣有一個弊端 那就是如果裝載的數據量特別大的時候例如 1T 這樣基本需要花費一天的時間 所以我來用多線程的dbgen方法

方法如下:

./dbgen -vfF -s 1000 -S 1 -C 6 & ./dbgen -vfF -s 1000 -S 2 -C 6 & ./dbgen -vfF -s 1000 -S 3 -C 6 & ./dbgen -vfF -s 1000 -S 4 -C 6 & ./dbgen -vfF -s 1000 -S 5 -C 6 & ./dbgen -vfF -s 1000 -S 6 -C 6

參數詳解

  • -v 詳細信息
  • -s 表示生成數據的規模
  • -S 切分數據
  • -f 覆蓋之前的文件

dss.ddl 這個文件存儲的是建表的語句
cat dss.ddl 逐一執行里面的建表語句

建表語句可以在官網拿到,使用外部工具,比如 DBeaver 建表

CREATE DATABASE IF NOT EXISTS ssb USE ssb CREATE TABLE IF NOT EXISTS customer (C_CUSTKEY UInt32,C_NAME String,C_ADDRESS String,C_CITY LowCardinality(String),C_NATION LowCardinality(String),C_REGION LowCardinality(String),C_PHONE String,C_MKTSEGMENT LowCardinality(String) ) ENGINE = MergeTree ORDER BY (C_CUSTKEY);CREATE TABLE IF NOT EXISTS lineorder (LO_ORDERKEY UInt32,LO_LINENUMBER UInt8,LO_CUSTKEY UInt32,LO_PARTKEY UInt32,LO_SUPPKEY UInt32,LO_ORDERDATE Date,LO_ORDERPRIORITY LowCardinality(String),LO_SHIPPRIORITY UInt8,LO_QUANTITY UInt8,LO_EXTENDEDPRICE UInt32,LO_ORDTOTALPRICE UInt32,LO_DISCOUNT UInt8,LO_REVENUE UInt32,LO_SUPPLYCOST UInt32,LO_TAX UInt8,LO_COMMITDATE Date,LO_SHIPMODE LowCardinality(String) ) ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);CREATE TABLE IF NOT EXISTS part (P_PARTKEY UInt32,P_NAME String,P_MFGR LowCardinality(String),P_CATEGORY LowCardinality(String),P_BRAND LowCardinality(String),P_COLOR LowCardinality(String),P_TYPE LowCardinality(String),P_SIZE UInt8,P_CONTAINER LowCardinality(String) ) ENGINE = MergeTree ORDER BY P_PARTKEY;CREATE TABLE IF NOT EXISTS supplier (S_SUPPKEY UInt32,S_NAME String,S_ADDRESS String,S_CITY LowCardinality(String),S_NATION LowCardinality(String),S_REGION LowCardinality(String),S_PHONE String ) ENGINE = MergeTree ORDER BY S_SUPPKEY;

之后定位到 dbgen 生成的目錄,使用 pwd 命令,可以打印出當前目錄

接下來,使用 clickhouse-client 工具,將 tbl 中的數據導入數據庫

clickhouse-client --query "INSERT INTO ssb.customer FORMAT CSV" < customer.tbl clickhouse-client --query "INSERT INTO ssb.part FORMAT CSV" < part.tbl clickhouse-client --query "INSERT INTO ssb.supplier FORMAT CSV" < supplier.tbl clickhouse-client --query "INSERT INTO ssb.lineorder FORMAT CSV" < lineorder.tbl

注意:如果你使用的不是 default 數據庫(默認數據庫),請在表前面加上數據庫前綴,否則會報如下錯誤

這是維度為1,即 -s 1 情況下的數據量

SELECT (SELECT COUNT(1) FROM customer) AS customer, (SELECT COUNT(1) FROM lineorder) AS lineorder, (SELECT COUNT(1) FROM part) AS part,(SELECT COUNT(1) FROM supplier) AS supplier


下面這段SQL,會將星型模式(star schema)轉化為 非標準化的(denormalized)平面模型(flat schema)。也就是說,將原本相關聯的表結構,通過某種關系,整合到一張表里去。

DROP TABLE IF EXISTS ssb.lineorder_flat;CREATE TABLE ssb.lineorder_flat ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS SELECTl.LO_ORDERKEY AS LO_ORDERKEY,l.LO_LINENUMBER AS LO_LINENUMBER,l.LO_CUSTKEY AS LO_CUSTKEY,l.LO_PARTKEY AS LO_PARTKEY,l.LO_SUPPKEY AS LO_SUPPKEY,l.LO_ORDERDATE AS LO_ORDERDATE,l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,l.LO_QUANTITY AS LO_QUANTITY,l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,l.LO_DISCOUNT AS LO_DISCOUNT,l.LO_REVENUE AS LO_REVENUE,l.LO_SUPPLYCOST AS LO_SUPPLYCOST,l.LO_TAX AS LO_TAX,l.LO_COMMITDATE AS LO_COMMITDATE,l.LO_SHIPMODE AS LO_SHIPMODE,c.C_NAME AS C_NAME,c.C_ADDRESS AS C_ADDRESS,c.C_CITY AS C_CITY,c.C_NATION AS C_NATION,c.C_REGION AS C_REGION,c.C_PHONE AS C_PHONE,c.C_MKTSEGMENT AS C_MKTSEGMENT,s.S_NAME AS S_NAME,s.S_ADDRESS AS S_ADDRESS,s.S_CITY AS S_CITY,s.S_NATION AS S_NATION,s.S_REGION AS S_REGION,s.S_PHONE AS S_PHONE,p.P_NAME AS P_NAME,p.P_MFGR AS P_MFGR,p.P_CATEGORY AS P_CATEGORY,p.P_BRAND AS P_BRAND,p.P_COLOR AS P_COLOR,p.P_TYPE AS P_TYPE,p.P_SIZE AS P_SIZE,p.P_CONTAINER AS P_CONTAINER FROM ssb.lineorder AS l INNER JOIN ssb.customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY INNER JOIN ssb.supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY INNER JOIN ssb.part AS p ON p.P_PARTKEY = l.LO_PARTKEY;

clickhouse 默認內存大小是1G。在做 SSB 的星型轉平面模型的時,如果沒有增加最大內存限制,就會報如下錯誤(DB::Exception: Memory limit (total) exceeded)


注意:set_memory_usage = 20000000000 這個函數在外部的數據庫管理工具,比如DBeaver中是無法設置的。但在 clickhouse-client 中設置卻有效(本地或遠程均可)。這是因為,該函數僅支持在 TCP 模式下被調用,即 port=9000

另外,set 函數是針對當前會話的,只要一退出,立馬又會還原成之前的樣子。
查看變量值的命令為:SELECT name,value FROM system.settings WHERE name = 'max_memory_usage'

由于0太多,不好數。我們可以使用 formatReadableSize 函數將其格式化。但由于 system.settings 這個表中的 value 是字符串類型的,因此我們必須將起轉為 Int 類型,這就要用到 toInt64 函數了(Int32長度不夠,會導致數值溢出)

完整的命令如下:SELECT name,formatReadableSize(toInt64(value)) FROM system.settings WHERE name = 'max_memory_usage';


接著先將可用內存增大后,再刪除之前創建后轉化失敗的表,然后進行測試

SET max_memory_usage = 20000000000 DROP TABLE IF EXISTS ssb.lineorder_flat

如果機子內存沒有這么大,上述語句無效。但可以通過 SET min_insert_block_size_rows=8192; 減小單次插入塊的行大小(默認值為 1048545)這樣就能安全通過了。

可以看到在虛擬機3G內存下的速度為:每秒71萬行,30MB/S

在執行時,打開另一個終端,在clickhouse-client 中執行 SHOW PROCESSLIST 會打印當前執行的進度

將字符串復制出來,把單引號替換成雙引號。放在在線JSON格式化頁面進行轉義

運行下面查詢語句(使用 USE ssb; 設置默認數據庫)

Q1.1

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;

Q1.2

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

Q1.3

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

Q2.1

SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;


Q2.2

SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;


Q2.3

SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;

Q3.1

SELECTC_NATION,S_NATION,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BYC_NATION,S_NATION,year ORDER BYyear ASC,revenue DESC;


Q3.2

SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;

Q3.3

SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;

Q3.4

SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;

Q4.1

SELECTtoYear(LO_ORDERDATE) AS year,C_NATION,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BYyear,C_NATION ORDER BYyear ASC,C_NATION ASC;

Q4.2

SELECTtoYear(LO_ORDERDATE) AS year,S_NATION,P_CATEGORY,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BYyear,S_NATION,P_CATEGORY ORDER BYyear ASC,S_NATION ASC,P_CATEGORY ASC;


Q4.3

SELECTtoYear(LO_ORDERDATE) AS year,S_CITY,P_BRAND,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BYyear,S_CITY,P_BRAND ORDER BYyear ASC,S_CITY ASC,P_BRAND ASC;

總結

以上是生活随笔為你收集整理的clickhouse SSB 性能测试的全部內容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。