
How Elasticsearch determines the data storage path for an index

Published: 2025/3/15, by 豆豆

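This article, collected and compiled by 生活随笔, explains how Elasticsearch determines the data storage path for an index. I found it quite useful and am sharing it here as a reference.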

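In Elasticsearch, a node's configuration can set path.data as the directory where the node stores its data, and path.data accepts multiple values so that data can be spread over several paths. How, then, does Elasticsearch decide which of these paths a given shard is stored under? This post records what I found.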

The index creation process in Elasticsearch

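1. After the cluster's master node receives a create-index request, it runs through the index-creation steps and finally commits the newly created index to the ClusterState.
2. The master publishes the updated ClusterState to all nodes.
3. Each node that has to create a shard of the index reads its locally available path.data entries and picks one of them according to a set of rules.
4. The node creates the base shard path and saves the basic shard metadata there.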
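When a node creates a shard, it needs to know how much space is left on each of its local data directories. The Elasticsearch code in the next section reads this through java.nio's FileStore.getUsableSpace(); the minimal standalone sketch below shows that JDK call on its own. The class DataPathSpace, the method usableSpacePerPath, and the two temporary directories (standing in for path.data entries) are hypothetical and are not Elasticsearch APIs.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DataPathSpace {

    // For each data directory, look up the file store (disk/partition) it lives on
    // and record the bytes still usable by this JVM, similar to how ShardPath reads
    // nodePath.fileStore.getUsableSpace().
    static Map<Path, Long> usableSpacePerPath(List<Path> dataPaths) throws IOException {
        Map<Path, Long> result = new LinkedHashMap<>();
        for (Path p : dataPaths) {
            FileStore store = Files.getFileStore(p);
            result.put(p, store.getUsableSpace());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-ins for the entries of path.data.
        Path a = Files.createTempDirectory("data1");
        Path b = Files.createTempDirectory("data2");
        usableSpacePerPath(List.of(a, b))
                .forEach((path, bytes) -> System.out.println(path + " -> " + bytes + " bytes usable"));
    }
}
```

Two directories on the same disk report the same number, because usable space is a property of the underlying file store rather than of the directory.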

How the target directory is chosen

Source code

The work is done mainly by the selectNewPathForShard method of ShardPath:

// Excerpt from ShardPath.selectNewPathForShard: the branch taken when the index has no
// custom data path. env, shardId, avgShardSizeInBytes and dataPathToShardCount are supplied
// by the enclosing method; statePath and dataPath are declared earlier in the method.
BigInteger totFreeSpace = BigInteger.ZERO;
for (NodeEnvironment.NodePath nodePath : env.nodePaths()) {
    totFreeSpace = totFreeSpace.add(BigInteger.valueOf(nodePath.fileStore.getUsableSpace()));
}

// TODO: this is a hack!! We should instead keep track of incoming (relocated) shards since we know
// how large they will be once they're done copying, instead of a silly guess for such cases:

// Very rough heuristic of how much disk space we expect the shard will use over its lifetime, the max of current average
// shard size across the cluster and 5% of the total available free space on this node:
BigInteger estShardSizeInBytes = BigInteger.valueOf(avgShardSizeInBytes).max(totFreeSpace.divide(BigInteger.valueOf(20)));

// TODO - do we need something more extensible? Yet, this does the job for now...
final NodeEnvironment.NodePath[] paths = env.nodePaths();

// If no better path is chosen, use the one with the most space by default
NodeEnvironment.NodePath bestPath = getPathWithMostFreeSpace(env);

if (paths.length != 1) {
    Map<NodeEnvironment.NodePath, Long> pathToShardCount = env.shardCountPerPath(shardId.getIndex());

    // Compute how much space there is on each path
    final Map<NodeEnvironment.NodePath, BigInteger> pathsToSpace = new HashMap<>(paths.length);
    for (NodeEnvironment.NodePath nodePath : paths) {
        FileStore fileStore = nodePath.fileStore;
        BigInteger usableBytes = BigInteger.valueOf(fileStore.getUsableSpace());
        pathsToSpace.put(nodePath, usableBytes);
    }

    bestPath = Arrays.stream(paths)
            // Filter out paths that have enough space
            .filter((path) -> pathsToSpace.get(path).subtract(estShardSizeInBytes).compareTo(BigInteger.ZERO) > 0)
            // Sort by the number of shards for this index
            .sorted((p1, p2) -> {
                int cmp = Long.compare(pathToShardCount.getOrDefault(p1, 0L),
                        pathToShardCount.getOrDefault(p2, 0L));
                if (cmp == 0) {
                    // if the number of shards is equal, tie-break with the number of total shards
                    cmp = Integer.compare(dataPathToShardCount.getOrDefault(p1.path, 0),
                            dataPathToShardCount.getOrDefault(p2.path, 0));
                    if (cmp == 0) {
                        // if the number of shards is equal, tie-break with the usable bytes
                        cmp = pathsToSpace.get(p2).compareTo(pathsToSpace.get(p1));
                    }
                }
                return cmp;
            })
            // Return the first result
            .findFirst()
            // Or the existing best path if there aren't any that fit the criteria
            .orElse(bestPath);
}

statePath = bestPath.resolve(shardId);
dataPath = statePath;
} // closes the enclosing branch
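The size estimate at the top of this excerpt is easy to check by hand. The sketch below isolates it, assuming the usable byte count of every data path is already known; ShardSizeEstimate and estimateShardSize are hypothetical names. It keeps BigInteger as the source does, presumably so that summing several large long values cannot overflow.

```java
import java.math.BigInteger;
import java.util.List;

public class ShardSizeEstimate {

    // Same heuristic as selectNewPathForShard: the expected shard size is the larger of
    // the cluster-wide average shard size and 5% (1/20) of the total usable space
    // summed over all of this node's data paths.
    static BigInteger estimateShardSize(long avgShardSizeInBytes, List<Long> usableBytesPerPath) {
        BigInteger totFreeSpace = BigInteger.ZERO;
        for (long usable : usableBytesPerPath) {
            totFreeSpace = totFreeSpace.add(BigInteger.valueOf(usable));
        }
        return BigInteger.valueOf(avgShardSizeInBytes).max(totFreeSpace.divide(BigInteger.valueOf(20)));
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Two hypothetical data paths with 100 GB and 300 GB free, 5 GB average shard size.
        BigInteger est = estimateShardSize(5 * gb, List.of(100 * gb, 300 * gb));
        System.out.println(est.divide(BigInteger.valueOf(gb)) + " GB"); // prints 20 GB
    }
}
```

Here 5% of the 400 GB total (20 GB) outweighs the 5 GB average, so a path needs more than 20 GB of usable space to survive the filter. On a nearly full node the average shard size dominates instead.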

Analysis of the process

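1. First check whether the index has a custom data path. If it does, the shard data goes under that custom location and no selection is needed; otherwise the node chooses among its own path.data directories as described below.
2. Estimate how much disk space the new shard will use over its lifetime: the larger of the current average shard size across the cluster and 5% of the total usable space on all of this node's data paths.
3. Get all of the node's data paths (env.nodePaths()).
4. Take the path with the most free space as the default best path (getPathWithMostFreeSpace). If the node has only one data path, that path is used and the remaining selection steps are skipped.
5. Go through all paths and filter out those whose usable space does not exceed the estimated shard size. If no path is left, return the path from step 4; otherwise continue with step 6.
6. Sort the remaining paths and take the first one, applying these rules in order:
   a. prefer the path that holds the fewest shards of this index;
   b. if that is tied, prefer the path that holds the fewest shards in total (across all indices);
   c. if that is still tied, prefer the path with the most usable space.
7. Build the shard path from the chosen data path (bestPath.resolve(shardId)) and create the directories and related metadata.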
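The selection rules above can be condensed into a small standalone sketch. It is not the Elasticsearch code: DataPath, select, and the sample directory names are hypothetical, and plain long byte counts replace the BigInteger arithmetic and the NodePath/ShardId types of the real method.

```java
import java.util.Arrays;
import java.util.Comparator;

public class PathSelector {

    // Hypothetical stand-in for NodeEnvironment.NodePath plus the counters the real
    // method looks up: usable bytes on the path, shards of the index being created
    // that already live there, and shards of all indices that live there.
    record DataPath(String name, long usableBytes, long shardsOfThisIndex, int totalShards) {}

    static DataPath select(DataPath[] paths, long estShardSizeInBytes) {
        // Default choice: the path with the most usable space.
        DataPath mostFree = Arrays.stream(paths)
                .max(Comparator.comparingLong(DataPath::usableBytes))
                .orElseThrow();
        if (paths.length == 1) {
            return mostFree;
        }
        // Fewest shards of this index, then fewest shards overall, then most usable space.
        Comparator<DataPath> order = Comparator
                .comparingLong(DataPath::shardsOfThisIndex)
                .thenComparingInt(DataPath::totalShards)
                .thenComparing((a, b) -> Long.compare(b.usableBytes(), a.usableBytes()));
        return Arrays.stream(paths)
                // Keep only paths whose usable space exceeds the estimated shard size.
                .filter(p -> p.usableBytes() > estShardSizeInBytes)
                .sorted(order)
                .findFirst()
                // Fall back to the most-free path when no path has enough room.
                .orElse(mostFree);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        DataPath[] paths = {
                new DataPath("/data1", 200 * gb, 1, 10),
                new DataPath("/data2", 150 * gb, 0, 12),
                new DataPath("/data3", 10 * gb, 0, 2),
        };
        // /data3 is filtered out (10 GB is not above the 20 GB estimate); /data2 beats
        // /data1 because it holds none of this index's shards. Prints /data2.
        System.out.println(select(paths, 20 * gb).name());
    }
}
```

The example shows the key design choice: spreading one index's shards across disks takes priority over free space, so /data2 is chosen even though /data1 has more room. Free space only decides ties, or acts as the fallback when no path passes the size filter.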

Summary

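In short: unless the index has its own custom data path, a new shard is placed on the node data path that has room for the estimated shard size and holds the fewest shards of the same index, with the total shard count and then the free space as tie-breakers. If no path has enough room, the path with the most free space is used. I hope this helps with the problems you run into.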
