
MongoDB Export Scenario: Query Optimization #1

Published: 2025/5/22 | Programming Q&A | 22 | 豆豆


Original link: https://github.com/aCoder2013/blog/issues/1 (please credit the source when reposting)

Introduction

Some time ago I ran into a data-export scenario that kept getting slower the longer it ran. Exporting 1 million records took 40 to 60 minutes, and the logs showed the time per batch climbing steadily.

Cause

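Looking at the code, the export was done in batches, much like front-end pagination, implemented with skip + limit. So what is wrong with that approach? Let's look up the documentation for these two methods: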

The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to get the offset or skip position before beginning to return results. As the offset (e.g. pageNumber above) increases, cursor.skip() will become slower and more CPU intensive. With larger collections, cursor.skip() may become IO bound.

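In short, skip() gets slower as the page number grows. For an export like ours, though, there should be no need to recompute the offset on every batch and do all that wasted work; it ought to be possible to hold a single cursor and simply keep iterating. A quick search confirmed that this is indeed possible.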

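We can add a method to the persistence layer that returns a cursor for the upper layer to iterate over, so already-exported results are never walked again. This takes the total cost from O(N²) down to O(N). We can also set a batchSize, the number of documents fetched from MongoDB per round trip; note that a single batch is capped at 4 MB.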
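To see where the O(N²) comes from, here is a toy cost model, not driver code: it only counts how many documents the server has to walk to finish the export. It assumes, as the quoted documentation describes, that skip() walks every skipped document, and it uses a hypothetical page size of 8000 for the skip + limit version.

```java
/**
 * Toy cost model (not MongoDB driver code): counts documents the server
 * walks to export n documents, assuming skip() walks every skipped document.
 */
public class ExportCostModel {

    // skip + limit: page k first walks past k * pageSize documents,
    // then reads up to pageSize documents for the page itself.
    public static long skipLimitCost(long n, long pageSize) {
        long walked = 0;
        for (long offset = 0; offset < n; offset += pageSize) {
            long pageLen = Math.min(pageSize, n - offset);
            walked += offset + pageLen; // skipped docs + returned docs
        }
        return walked;
    }

    // One cursor held open for the whole export: each document is walked once.
    public static long cursorCost(long n) {
        return n;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        long pageSize = 8_000; // hypothetical page size
        System.out.println("skip+limit walks: " + skipLimitCost(n, pageSize)); // 63000000
        System.out.println("cursor walks:     " + cursorCost(n));              // 1000000
    }
}
```

For 1,000,000 documents this gives 63,000,000 documents walked with skip + limit versus 1,000,000 with a single cursor, and the gap widens as N grows because every page re-walks everything before it.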

```java
/**
 * Limits the number of elements returned in one batch. A cursor
 * typically fetches a batch of result objects and store them
 * locally.
 *
 * If {@code batchSize} is positive, it represents the size of each batch of objects retrieved. It can be adjusted to optimize
 * performance and limit data transfer.
 *
 * If {@code batchSize} is negative, it will limit of number objects returned, that fit within the max batch size limit (usually
 * 4MB), and cursor will be closed. For example if {@code batchSize} is -10, then the server will return a maximum of 10 documents and
 * as many as can fit in 4MB, then close the cursor. Note that this feature is different from limit() in that documents must fit within
 * a maximum size, and it removes the need to send a request to close the cursor server-side.
 */
```

For example, I configured 8000 here, so the MongoDB client fetches up to 8000 documents per batch.
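A rough way to reason about how many round trips the export needs: each batch holds at most batchSize documents and at most as many documents as fit under the byte cap. The sketch below is a back-of-the-envelope model, not driver logic. The average document sizes are hypothetical, and the cap is a parameter because the real limit may differ by server and driver version (4 MB is the figure quoted in the Javadoc above).

```java
/**
 * Back-of-the-envelope model (an assumption, not driver logic) of how many
 * batches a cursor needs under a document-count limit and a byte cap.
 */
public class BatchMath {

    public static final long FOUR_MB = 4L * 1024 * 1024;

    // Documents per batch: the smaller of batchSize and how many documents
    // of the assumed average size fit into maxBatchBytes (at least one).
    public static long docsPerBatch(long batchSize, long avgDocBytes, long maxBatchBytes) {
        long bySize = Math.max(1, maxBatchBytes / avgDocBytes);
        return Math.min(batchSize, bySize);
    }

    public static long batchCount(long totalDocs, long batchSize, long avgDocBytes, long maxBatchBytes) {
        long perBatch = docsPerBatch(batchSize, avgDocBytes, maxBatchBytes);
        return (totalDocs + perBatch - 1) / perBatch; // ceiling division
    }

    public static void main(String[] args) {
        // 1,000,000 documents, batchSize 8000, hypothetical 200-byte documents
        System.out.println(batchCount(1_000_000, 8_000, 200, FOUR_MB));   // 125
        // Same export with hypothetical 1 KB documents: the byte cap binds first
        System.out.println(batchCount(1_000_000, 8_000, 1_024, FOUR_MB)); // 245
    }
}
```

With 200-byte documents the batchSize of 8000 is the binding limit (125 batches for 1,000,000 documents); with 1 KB documents the 4 MB cap binds first, giving 4096 documents per batch and 245 batches.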

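After some simple local testing, we saw a dramatic performance improvement. Exporting 300,000 records the old way, pages toward the end took about 500 ms each on average, for a total of 60,039 ms. With the optimized approach, each batch took 100 to 200 ms on average, for a total of 16,667 ms (including time spent in business logic).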

Usage

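```java
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

// collection is a DBCollection and query a DBObject, both set up elsewhere
DBCursor cursor = collection.find(query).batchSize(8000);
try {
    while (cursor.hasNext()) {
        DBObject nextItem = cursor.next();
        // business logic...
    }
} finally {
    cursor.close(); // release the server-side cursor
}
```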

So, shall we take a look at what hasNext() does internally? Sure.

```java
@Override
public boolean hasNext() {
    if (closed) {
        throw new IllegalStateException("Cursor has been closed");
    }
    if (nextBatch != null) {
        return true;
    }
    if (limitReached()) {
        return false;
    }
    while (serverCursor != null) {
        // Sends a command to MongoDB to fetch the next batch
        getMore();
        if (nextBatch != null) {
            return true;
        }
    }
    return false;
}

private void getMore() {
    Connection connection = connectionSource.getConnection();
    try {
        if (serverIsAtLeastVersionThreeDotTwo(connection.getDescription())) {
            try {
                // Sends a getMore command; the documents come back in the
                // reply's "nextBatch" field, which this codec decodes
                initFromCommandResult(connection.command(namespace.getDatabaseName(),
                        asGetMoreCommandDocument(),
                        false,
                        new NoOpFieldNameValidator(),
                        CommandResultDocumentCodec.create(decoder, "nextBatch")));
            } catch (MongoCommandException e) {
                throw translateCommandException(e, serverCursor);
            }
        } else {
            // Pre-3.2 servers: legacy getMore wire-protocol message
            initFromQueryResult(connection.getMore(namespace, serverCursor.getId(),
                    getNumberToReturn(limit, batchSize, count),
                    decoder));
        }
        if (limitReached()) {
            killCursor(connection);
        }
    } finally {
        connection.release();
    }
}
```

Finally, initFromCommandResult takes the result and parses it into BSON objects.
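To make that control flow concrete, here is a simplified in-memory imitation of the hasNext()/getMore() pattern: the cursor serves documents from a locally buffered batch and asks the server for another batch only when the buffer runs out. FakeServer and BatchCursor are hypothetical stand-ins written for illustration; they mirror the structure of the driver code, not its implementation (no network, no first batch returned by find, no limit handling).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchCursorDemo {

    /** Hypothetical server: hands out the result set batch by batch. */
    static class FakeServer {
        private final List<Integer> data;
        private int position = 0;
        int getMoreCalls = 0;

        FakeServer(List<Integer> data) {
            this.data = data;
        }

        // Returns up to batchSize documents, continuing where the last call stopped.
        List<Integer> getMore(int batchSize) {
            getMoreCalls++;
            int end = Math.min(position + batchSize, data.size());
            List<Integer> batch = new ArrayList<>(data.subList(position, end));
            position = end;
            return batch;
        }

        boolean exhausted() {
            return position >= data.size();
        }
    }

    /** Client-side cursor: serves from the local batch, fetches more only when it is empty. */
    static class BatchCursor implements Iterator<Integer> {
        private final FakeServer server;
        private final int batchSize;
        private Iterator<Integer> nextBatch = null; // documents buffered locally
        private boolean serverCursorOpen = true;    // plays the role of serverCursor != null

        BatchCursor(FakeServer server, int batchSize) {
            this.server = server;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            if (nextBatch != null && nextBatch.hasNext()) {
                return true;
            }
            while (serverCursorOpen) {
                List<Integer> batch = server.getMore(batchSize); // one round trip
                serverCursorOpen = !server.exhausted();
                if (!batch.isEmpty()) {
                    nextBatch = batch.iterator();
                    return true;
                }
            }
            return false;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return nextBatch.next();
        }
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            docs.add(i);
        }
        FakeServer server = new FakeServer(docs);
        BatchCursor cursor = new BatchCursor(server, 8);
        long sum = 0;
        while (cursor.hasNext()) {
            sum += cursor.next();
        }
        System.out.println("sum=" + sum + ", getMore calls=" + server.getMoreCalls); // sum=190, getMore calls=3
    }
}
```

Iterating 20 documents with a batch size of 8 takes three round trips (8 + 8 + 4), and no document that has already been served is fetched again, which is the property that makes the cursor-based export linear.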

Summary

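In everyday coding, it pays to add timing instrumentation to every method and interface, or at an even finer granularity. These logs can be emitted at debug level, so that with a logging framework such as log4j or logback you can change the level dynamically and check timings whenever you need to, which makes optimization far more targeted. For a scenario like the one in this article, first check whether the code logic is at fault, then whether the database is (for example, missing indexes or an excessive data volume), and only then optimize accordingly, rather than diving straight into writing code.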
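As a minimal sketch of such an instrumentation point, the helper below times a block of work and logs the elapsed time at a debug-like level. It uses java.util.logging (Level.FINE) only to stay self-contained; with log4j or logback the same idea would use a debug-level log call instead. The names MethodTimer and timed are hypothetical.

```java
import java.util.function.Supplier;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class MethodTimer {
    private static final Logger LOG = Logger.getLogger(MethodTimer.class.getName());

    private MethodTimer() {
    }

    // Runs body, logs its elapsed time at FINE (debug-like), and returns its result.
    public static <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (LOG.isLoggable(Level.FINE)) { // skip building the message when disabled
                LOG.fine(name + " took " + elapsedMs + " ms");
            }
        }
    }

    public static void main(String[] args) {
        // FINE is off by default: enable it on this logger and on the root console handler.
        LOG.setLevel(Level.FINE);
        for (Handler h : Logger.getLogger("").getHandlers()) {
            h.setLevel(Level.FINE);
        }
        Integer exported = timed("exportBatch", () -> 8_000); // stand-in for real export work
        System.out.println("exported " + exported);
    }
}
```

Because the level check happens at run time, the timing lines can stay in production code and be switched on only while investigating a slow path.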

Reposted from: https://blog.51cto.com/13876921/2146959
