An Introduction to Parallelism in PostgreSQL 11

Published: 2023/12/18

OS: CentOS 7.4
DB: PostgreSQL 11.1

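PostgreSQL 11 strengthens parallel execution further:

B-tree indexes can be built in parallel.

Hash joins can run in parallel using a shared hash table.

UNION can run each SELECT in parallel when the individual SELECTs cannot themselves be parallelized.

Partitioned tables can be scanned in parallel.

LIMIT can be passed down to parallel workers.

This lets parallel workers use index scans and reduce the number of rows they return.

Single-evaluation queries, such as aggregate queries in a WHERE clause, and functions in the target list can be parallelized.

The new parameter parallel_leader_participation controls whether the leader process also executes the plan. It is enabled by default.

CREATE TABLE … AS, CREATE MATERIALIZED VIEW, and certain queries using UNION can run in parallel.

Parallel hash joins and parallel sequential scans perform better with many parallel workers.

EXPLAIN now reports the sort activity of parallel workers.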

# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
# su - postgres
Last login: Tue Nov 6 16:06:23 CST 2018 on pts/2
$ psql -c "select version();"
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

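Instance-level parameters

Several settings can prevent the query planner from generating a parallel query plan under any circumstances. For any parallel plan to be generated at all, the following settings must be configured as described.

max_parallel_workers_per_gather must be set to a value greater than zero. This is a special case of the more general rule that a query never uses more workers than max_parallel_workers_per_gather allows.

dynamic_shared_memory_type must be set to a value other than none. Parallel query needs dynamic shared memory to pass data between the cooperating processes.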
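
The two prerequisites above can be checked from any psql session. The following is a minimal sketch, assuming the tmp_t0 table used in the verification section of this article already exists. Setting max_parallel_workers_per_gather to 0 for the session should make the planner fall back to a serial plan.

```sql
-- Check the two settings that can rule out parallel plans entirely.
SHOW max_parallel_workers_per_gather;   -- must be greater than 0
SHOW dynamic_shared_memory_type;        -- must not be 'none'

-- Disable parallel query for this session only and look at the plan.
SET max_parallel_workers_per_gather = 0;
EXPLAIN SELECT count(1) FROM tmp_t0;    -- a serial plan has no Gather node

-- Return to the configured value.
RESET max_parallel_workers_per_gather;
```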

select *
from pg_settings ps
where 1=1
and ps.name in ('force_parallel_mode',
'max_worker_processes',
'max_parallel_workers',
'max_parallel_maintenance_workers',
'max_parallel_workers_per_gather',
--'min_parallel_relation_size',-- add 9.6,remove from 10
'min_parallel_index_scan_size',
'min_parallel_table_scan_size',
'parallel_tuple_cost',
'parallel_setup_cost',
'parallel_leader_participation'
) ;

name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart
-----+---------+------+----------+------------+------------+---------+---------+--------+---------+---------+----------+----------+-----------+------------+------------+-----------------
force_parallel_mode | off | | Query Tuning / Other Planner Options | Forces use of parallel query facilities. | If possible, run query using a parallel worker and with parallel restrictions. | user | enum | default | | | {off,on,regress} | off | off | | | f
max_parallel_maintenance_workers | 4 | | Resource Usage / Asynchronous Behavior | Sets the maximum number of parallel processes per maintenance operation. | | user | integer | session | 0 | 1024 | | 2 | 2 | | | f
max_parallel_workers | 8 | | Resource Usage / Asynchronous Behavior | Sets the maximum number of parallel workers that can be active at one time. | | user | integer | default | 0 | 1024 | | 8 | 8 | | | f
max_parallel_workers_per_gather | 2 | | Resource Usage / Asynchronous Behavior | Sets the maximum number of parallel processes per executor node. | | user | integer | default | 0 | 1024 | | 2 | 2 | | | f
max_worker_processes | 8 | | Resource Usage / Asynchronous Behavior | Maximum number of concurrent worker processes. | | postmaster | integer | default | 0 | 262143 | | 8 | 8 | | | f
min_parallel_index_scan_size | 64 | 8kB | Query Tuning / Planner Cost Constants | Sets the minimum amount of index data for a parallel scan. | If the planner estimates that it will read a number of index pages too small to reach this limit, a parallel scan will not be considered. | user | integer | default | 0 | 715827882 | | 64 | 64 | | | f
min_parallel_table_scan_size | 1024 | 8kB | Query Tuning / Planner Cost Constants | Sets the minimum amount of table data for a parallel scan. | If the planner estimates that it will read a number of table pages too small to reach this limit, a parallel scan will not be considered. | user | integer | default | 0 | 715827882 | | 1024 | 1024 | | | f
parallel_leader_participation | on | | Resource Usage / Asynchronous Behavior | Controls whether Gather and Gather Merge also run subplans. | Should gather nodes also run subplans, or just gather tuples? | user | bool | default | | | | on | on | | | f
parallel_setup_cost | 1000 | | Query Tuning / Planner Cost Constants | Sets the planner's estimate of the cost of starting up worker processes for parallel query. | | user | real | default | 0 | 1.79769e+308 | | 1000 | 1000 | | | f
parallel_tuple_cost | 0.1 | | Query Tuning / Planner Cost Constants | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend. | | user | real | default | 0 | 1.79769e+308 | | 0.1 | 0.1 | | | f
(10 rows)

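Several parameters are new in this release.
max_parallel_maintenance_workers
Sets the maximum number of parallel workers that a maintenance command (such as CREATE INDEX) may use. The default is 2.

parallel_leader_participation
I have not fully worked out this parameter, so I am leaving it at its default. Judging from the English documentation, it controls whether the leader process also executes the plan under Gather and Gather Merge nodes, which affects how efficiently a parallel query runs: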

Allows the leader process to execute the query plan under Gather and Gather Merge nodes instead of waiting for worker processes. The default is on. Setting this value to off reduces the likelihood that workers will become blocked because the leader is not reading tuples fast enough, but requires the leader process to wait for worker processes to start up before the first tuples can be produced. The degree to which the leader can help or hinder performance depends on the plan type, number of workers and query duration.
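
One way to see what this parameter does is to run the same query with the setting on and off and compare the per-worker figures. This is only a sketch, assuming the tmp_t0 table from the verification section exists; timings and row counts will vary by machine.

```sql
-- Default: the leader also executes the plan below Gather.
SET parallel_leader_participation = on;
EXPLAIN (ANALYZE, VERBOSE) SELECT count(1) FROM tmp_t0;

-- Leader only collects tuples from the workers.
SET parallel_leader_participation = off;
EXPLAIN (ANALYZE, VERBOSE) SELECT count(1) FROM tmp_t0;

-- Return to the configured value.
RESET parallel_leader_participation;
```

With VERBOSE, EXPLAIN ANALYZE prints a per-worker line under each parallel node, which helps show how much of the scan the workers handled in each case.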

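I previously had only a shallow understanding of max_worker_processes, so I went through the documentation again.

max_worker_processes
The maximum number of background worker processes the database can register. Parallel workers are one kind of background worker.
PostgreSQL background workers come in two kinds:
The first kind can only be registered from within the postmaster, by calling RegisterBackgroundWorker(BackgroundWorker *worker).
The second kind is started after the system is up, by calling RegisterDynamicBackgroundWorker(BackgroundWorker *worker, BackgroundWorkerHandle **handle). The number of background workers that can be registered is capped by max_worker_processes.

max_parallel_workers
Sets the maximum number of parallel workers the database allows.

In PostgreSQL 11, parallel workers fall into two classes:
The first is parallel query, whose degree of parallelism is controlled by max_parallel_workers_per_gather.
The second is maintenance commands (such as CREATE INDEX), whose degree of parallelism is controlled by max_parallel_maintenance_workers.

max_parallel_workers should be less than or equal to max_worker_processes.
max_parallel_workers_per_gather + max_parallel_maintenance_workers should be less than or equal to max_parallel_workers.

These parameters form a hierarchy: each limit draws from the pool defined by the level above it.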
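
The two inequalities above can be checked directly against a running instance. The query below is a sketch that reads the four settings with current_setting() and evaluates both conditions; the output column names are made up for illustration.

```sql
-- Show the four limits side by side and test the two inequalities
-- described above. All four settings are integers.
SELECT
    s.max_worker_processes,
    s.max_parallel_workers,
    s.max_parallel_workers_per_gather,
    s.max_parallel_maintenance_workers,
    s.max_parallel_workers <= s.max_worker_processes AS pool_fits_in_processes,
    s.max_parallel_workers_per_gather + s.max_parallel_maintenance_workers
        <= s.max_parallel_workers AS per_operation_fits_in_pool
FROM (
    SELECT
        current_setting('max_worker_processes')::int             AS max_worker_processes,
        current_setting('max_parallel_workers')::int             AS max_parallel_workers,
        current_setting('max_parallel_workers_per_gather')::int  AS max_parallel_workers_per_gather,
        current_setting('max_parallel_maintenance_workers')::int AS max_parallel_maintenance_workers
) AS s;
```

With the values shown in the pg_settings output earlier (8, 8, 2 and 4), both boolean columns come out true.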

Verification

Parallel full table scan

postgres=# explain select count(1) from tmp_t0;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=75279.14..75279.15 rows=1 width=8)
   ->  Gather  (cost=75278.92..75279.13 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=74278.92..74278.93 rows=1 width=8)
               ->  Parallel Seq Scan on tmp_t0  (cost=0.00..73613.74 rows=266074 width=0)
(5 rows)
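
The article does not show how tmp_t0 was created. To reproduce a plan like this one, something along the following lines should work. The column type, the generated values and the row count are assumptions chosen to resemble the output in this article (a text-like column c0 and 5,000,000 rows); this is not the author's actual script.

```sql
-- Hypothetical test table; the original definition is not given.
CREATE TABLE tmp_t0 (c0 varchar(100));

-- Fill it with 5,000,000 short strings.
INSERT INTO tmp_t0 (c0)
SELECT i::text
FROM generate_series(1, 5000000) AS g(i);

-- Refresh statistics so the planner sees the real table size.
ANALYZE tmp_t0;

-- A table well above min_parallel_table_scan_size is a candidate
-- for a Parallel Seq Scan.
EXPLAIN SELECT count(1) FROM tmp_t0;
```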

Parallel btree index creation

postgres=# create index idx_tmp_t0_x1 on public.tmp_t0 using btree (c0);
CREATE INDEX

The parallel workers are visible in top:
postgres: parallel worker for PID 2196
postgres: postgres postgres [local] CREATE INDEX
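
To request more workers for an index build and watch them from SQL rather than top, a sketch like the following can be used. The value 4 matches the session setting visible in the pg_settings output earlier. The index name idx_tmp_t0_x2 is made up for illustration, and the pg_stat_activity query has to be run from a second session while the build is in progress.

```sql
-- Session 1: allow up to 4 workers for maintenance commands, then build.
SET max_parallel_maintenance_workers = 4;
CREATE INDEX idx_tmp_t0_x2 ON public.tmp_t0 USING btree (c0);

-- Session 2 (while the build runs): list the parallel workers.
SELECT pid, backend_type, state, query
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';
```

The number of workers actually launched can be lower than the setting. It also depends on the table size, on maintenance_work_mem, and on how many slots are still free under max_parallel_workers.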

Parallel CREATE TABLE AS

postgres=# show max_parallel_maintenance_workers ;
 max_parallel_maintenance_workers
----------------------------------
 2
(1 row)

postgres=# create table tmp_t1 as select * from public.tmp_t0;
SELECT 5000000
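
The SELECT part of CREATE TABLE AS is planned like an ordinary query, so its plan can be inspected with EXPLAIN before running it. The worker limit that applies is max_parallel_workers_per_gather rather than max_parallel_maintenance_workers. The following is a minimal sketch, assuming tmp_t0 exists; the target name tmp_t2 and the filter are made up for illustration. A plain SELECT * that returns every row may still get a serial plan, because the planner charges parallel_tuple_cost for each row passed through Gather.

```sql
-- EXPLAIN accepts CREATE TABLE AS and shows the plan of its query
-- without creating the table.
EXPLAIN
CREATE TABLE tmp_t2 AS
SELECT c0
FROM public.tmp_t0
WHERE c0 LIKE '1%';
```

If the plan contains a Gather node, the query part of the statement will run in parallel.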

Parallel join

postgres=# explain select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=267463.69..267463.70 rows=1 width=8)
   ->  Gather  (cost=267463.47..267463.68 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=266463.47..266463.48 rows=1 width=8)
               ->  Parallel Hash Join  (cost=125958.51..261255.86 rows=2083045 width=0)
                     Hash Cond: ((t0.c0)::text = (t1.c0)::text)
                     ->  Parallel Seq Scan on tmp_t0 t0  (cost=0.00..91786.33 rows=2083333 width=7)
                     ->  Parallel Hash  (cost=91783.45..91783.45 rows=2083045 width=7)
                           ->  Parallel Seq Scan on tmp_t1 t1  (cost=0.00..91783.45 rows=2083045 width=7)
(9 rows)
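
To compare this plan against one that does not use the shared hash table, the planner switch enable_parallel_hash (added in PostgreSQL 11) can be turned off for the session. This is a sketch assuming tmp_t0 and tmp_t1 from above; which alternative plan the planner picks will depend on the data and the other settings.

```sql
-- With parallel hash disabled, the planner should no longer choose
-- Parallel Hash / Parallel Hash Join nodes.
SET enable_parallel_hash = off;
EXPLAIN
SELECT count(1)
FROM tmp_t0 t0, tmp_t1 t1
WHERE t0.c0 = t1.c0;

-- Return to the configured value.
RESET enable_parallel_hash;
```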

Reference:
https://www.postgresql.org/docs/11/release-11-1.html
