MySQL Source Code Walkthrough: MVCC

Published: 2023/12/31

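I. What is MVCC

MVCC stands for Multi-Version Concurrency Control. When I first ran into the term I carelessly took it to mean version control for source code, which was rather silly. MVCC is really about keeping concurrently accessed data consistent, so developers who have written code that guards shared data across multiple threads will find it easier to grasp. Later, when blockchains tried to raise transaction throughput, some chains adopted parallel transaction execution, and they also used MVCC to manage those transactions. In MySQL, when multiple clients access the server and some of them read while others write, inconsistent data can be observed (dirty reads and phantom reads, for example). This is where MVCC comes in. Concretely it applies to two transaction isolation levels, RC (Read Committed) and RR (Repeatable Read); RR is MySQL's default. Different MySQL versions may apply MVCC differently, so consult the official documentation for the version you use and treat the official documentation or the source code as the reference rather than making assumptions. If you want to dig deeper into data safety in databases, I recommend "Designing Data-Intensive Applications", which explains MVCC clearly and analyzes many related topics in more depth.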

II. MVCC in MySQL

In MySQL, MVCC takes effect under the Read Committed and Repeatable Read isolation levels, so MVCC is only worth discussing in those two cases. To implement MVCC, the InnoDB engine adds three hidden columns to every row (Oracle and other databases do something similar):
DB_ROW_ID: a 6-byte row ID. InnoDB generates it when the table has no primary key; Oracle has a similar ROWID.
DB_TRX_ID: a 6-byte transaction ID that records the last transaction that inserted or updated the row.
DB_ROLL_PTR: a 7-byte roll pointer that points to the undo log record written to the rollback segment. It links the versions of a row into a version chain. If transactions are left uncommitted for a long time, the undo logs cannot be purged and the rollback segments keep growing.
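To make the widths of the three hidden columns concrete, here is a minimal sketch that models them as fixed-size byte fields. The names `HiddenColumns`, `store_be` and `load_be` are hypothetical and exist only for this illustration; InnoDB keeps these fields inside its own record format rather than in a C++ struct, and the big-endian byte order used here is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the three hidden columns (not InnoDB's real layout).
// Byte arrays are used so that each field has exactly the documented width.
struct HiddenColumns {
  std::uint8_t db_row_id[6];    // DB_ROW_ID: 6 bytes
  std::uint8_t db_trx_id[6];    // DB_TRX_ID: 6 bytes
  std::uint8_t db_roll_ptr[7];  // DB_ROLL_PTR: 7 bytes
};

static_assert(sizeof(HiddenColumns) == 19, "6 + 6 + 7 bytes");

// Store `value` into an N-byte field, most significant byte first
// (byte order is an assumption of this sketch).
template <std::size_t N>
void store_be(std::uint8_t (&field)[N], std::uint64_t value) {
  for (std::size_t i = 0; i < N; ++i) {
    field[N - 1 - i] = static_cast<std::uint8_t>(value & 0xFF);
    value >>= 8;
  }
  assert(value == 0);  // the value must fit into N bytes
}

// Read an N-byte field back into an integer.
template <std::size_t N>
std::uint64_t load_be(const std::uint8_t (&field)[N]) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < N; ++i) {
    value = (value << 8) | field[i];
  }
  return value;
}
```

A 6-byte transaction ID can hold values up to 2^48 - 1, which is why the field can be narrower than a full 64-bit integer.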
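MVCC distinguishes two kinds of reads: snapshot reads and current reads. A snapshot read takes no locks and only reads the version visible to it. A current read covers inserts, updates, and deletes (and locking reads such as SELECT ... FOR UPDATE), and it takes locks. Why call it a read? Because an insert, update, or delete has to read its way to the target position before it can write.
MySQL has two ways of implementing transaction isolation. Besides MVCC, the focus of this article, there is LBCC (Lock-Based Concurrency Control), briefly described here through two of its locks:
Record lock: locks an index record rather than the row itself. If the table has no primary key, InnoDB creates a hidden clustered index, as mentioned above.
Gap lock: locks the gap between index records, or the gap before the first or after the last record. It is mainly used to prevent phantom reads at the RR isolation level.
MVCC cannot be discussed without the Read View (which is somewhat similar to the scenario in PBFT). How a Read View is created depends on the isolation level (RC or RR, as above): under RR a transaction creates one Read View at its first snapshot read and reuses it for later reads, while under RC a new Read View is created for every read.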
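The difference between RC and RR can be sketched in a few lines. This is a toy model under stated assumptions, not InnoDB code: `ToyTrx`, `ToyView` and `view_for_read` are hypothetical names, and the "view" only remembers the next transaction ID at the moment it was taken.

```cpp
#include <cassert>
#include <cstdint>

enum class Isolation { ReadCommitted, RepeatableRead };

// Toy snapshot: remembers only the next transaction id at creation time.
struct ToyView {
  std::uint64_t low_limit_id;
};

// Hypothetical transaction object that decides when to take a new snapshot.
struct ToyTrx {
  Isolation level;
  bool has_view;
  ToyView view;

  // Returns the view that a snapshot read would use.
  // RC: take a fresh snapshot for every read.
  // RR: take a snapshot on the first read and keep it afterwards.
  const ToyView &view_for_read(std::uint64_t next_trx_id) {
    assert(next_trx_id > 0);
    if (level == Isolation::ReadCommitted || !has_view) {
      view.low_limit_id = next_trx_id;
      has_view = true;
    }
    return view;
  }
};
```

With this model, two reads separated by other transactions' commits observe the same snapshot under RR and two different snapshots under RC, which is exactly why RC allows non-repeatable reads.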
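A query falls into one of two cases: it either runs inside the transaction that made the change or it does not. In MySQL, a pure query does not get a transaction ID; an ID is assigned only when the transaction modifies data (insert, update, or delete), and it is assigned when the transaction first performs such a write rather than when the transaction is opened.
The difference between the two cases is that a query inside the same transaction can see the data that transaction has updated.
So what is a Read View? The undo log was mentioned above; a Read View is a read view built on top of that snapshot data. For every row in the view, DB_TRX_ID identifies the version and DB_ROLL_PTR points to the next (older) version in the chain. Anyone who has worked with linked lists in C will find this easy to picture. Transaction IDs (the values stored in DB_TRX_ID) are normally allocated by incrementing a counter, so the newest transaction has the largest ID. With the Read View understood, the MVCC flow is as follows: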
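The linked-list picture can be written down directly. The sketch below is only an illustration: `RowVersion` and `first_visible` are hypothetical names, an ordinary pointer stands in for DB_ROLL_PTR, and the visibility rule is passed in as a callback so the chain walk can be shown on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Toy row version: trx_id plays the role of DB_TRX_ID and roll_ptr the role
// of DB_ROLL_PTR (nullptr marks the oldest version in the chain).
struct RowVersion {
  std::uint64_t trx_id;
  std::string value;
  const RowVersion *roll_ptr;
};

// Walk from the newest version towards older ones and return the first
// version whose writer is visible; nullptr if no version is visible.
const RowVersion *first_visible(
    const RowVersion *newest,
    const std::function<bool(std::uint64_t)> &visible) {
  assert(static_cast<bool>(visible));  // a visibility rule must be supplied
  for (const RowVersion *v = newest; v != nullptr; v = v->roll_ptr) {
    if (visible(v->trx_id)) {
      return v;
    }
  }
  return nullptr;
}
```

In InnoDB the older versions are not stored as ready-made rows; they are rebuilt from undo records, which the last part of this article looks at.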
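1. Split transaction IDs into three ranges: committed transactions; a mixed range of uncommitted and committed transactions; and transactions that have not started yet. The boundaries are the smallest and largest IDs derived from the currently active transactions, which the Read View maintains.
2. The three ranges mean the following. An ID below the minimum belongs to a transaction that has already committed, so its data is visible to the query. An ID above the maximum belongs to a transaction that had not started when the view was created, so its data is not visible. For the mixed range, the Read View keeps an array of uncommitted (active) transaction IDs: if the ID is in that array the data is not visible, otherwise it is visible. Note that there is only one such array, and the check relies on it alone.
3. Use these boundaries to classify the transaction ID found on a row: below the minimum it is committed, above the maximum it has not started, and anything in between falls into the mixed range.
4. Based on the result, decide which version of the row to use.
5. Process that version's data and return it.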
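The steps above can be sketched as a small visibility check. This is a simplified model, assuming the active IDs are kept in a sorted vector; `ToyReadView` is a hypothetical name, while the field names loosely follow the ones that appear in the InnoDB source quoted later (`up_limit_id` is the lower boundary, `low_limit_id` the upper one).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy read view (illustration only, not the InnoDB class).
struct ToyReadView {
  std::uint64_t up_limit_id;     // ids below this are committed: visible
  std::uint64_t low_limit_id;    // ids at or above this started later: hidden
  std::uint64_t creator_trx_id;  // the transaction that owns this view
  std::vector<std::uint64_t> active_ids;  // active ids, sorted ascending

  bool changes_visible(std::uint64_t id) const {
    assert(std::is_sorted(active_ids.begin(), active_ids.end()));
    if (id < up_limit_id || id == creator_trx_id) {
      return true;  // committed before the view, or our own change
    }
    if (id >= low_limit_id) {
      return false;  // started after the view was created
    }
    // Mixed range: visible only if the writer was not active at snapshot time.
    return !std::binary_search(active_ids.begin(), active_ids.end(), id);
  }
};
```

For a view with boundaries 10 and 16, creator 12 and active IDs {10, 12, 15}, ID 9 is visible, ID 11 is visible because it committed before the snapshot, ID 12 is visible because it is the reader itself, and IDs 10, 15 and 16 are not visible.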

III. Source Code Walkthrough

With the analysis above in mind, here is how the source code implements it:
1. Basic data structures
The basic data structures cover the transaction system, MVCC, and the Read View:

// storage/innobase/include
/** The transaction system central memory data structure. */
struct trx_sys_t {
  TrxSysMutex mutex; /*!< mutex protecting most fields in
                     this structure except when noted
                     otherwise */

  MVCC *mvcc; /*!< Multi version concurrency control
              manager */

  volatile trx_id_t max_trx_id; /*!< The smallest number not yet
                                assigned as a transaction id or
                                transaction number. This is declared
                                volatile because it can be accessed
                                without holding any mutex during
                                AC-NL-RO view creation. */

  std::atomic<trx_id_t> min_active_id;
  /*!< Minimal transaction id which is
  still in active state. */

  trx_ut_list_t serialisation_list;
  /*!< Ordered on trx_t::no of all the
  currenrtly active RW transactions */

#ifdef UNIV_DEBUG
  trx_id_t rw_max_trx_no; /*!< Max trx number of read-write
                          transactions added for purge. */
#endif                    /* UNIV_DEBUG */

  char pad1[64]; /*!< To avoid false sharing */

  trx_ut_list_t rw_trx_list; /*!< List of active and committed in
                             memory read-write transactions, sorted
                             on trx id, biggest first. Recovered
                             transactions are always on this list. */

  char pad2[64]; /*!< To avoid false sharing */

  trx_ut_list_t mysql_trx_list; /*!< List of transactions created
                                for MySQL. All user transactions are
                                on mysql_trx_list. The rw_trx_list
                                can contain system transactions and
                                recovered transactions that will not
                                be in the mysql_trx_list.
                                mysql_trx_list may additionally contain
                                transactions that have not yet been
                                started in InnoDB. */

  trx_ids_t rw_trx_ids; /*!< Array of Read write transaction IDs
                        for MVCC snapshot. A ReadView would take
                        a snapshot of these transactions whose
                        changes are not visible to it. We should
                        remove transactions from the list before
                        committing in memory and releasing locks
                        to ensure right order of removal and
                        consistent snapshot. */

  char pad3[64]; /*!< To avoid false sharing */

  Rsegs rsegs; /*!< Vector of pointers to rollback
               segments. These rsegs are iterated
               and added to the end under a read
               lock. They are deleted under a write
               lock while the vector is adjusted.
               They are created and destroyed in
               single-threaded mode. */

  Rsegs tmp_rsegs; /*!< Vector of pointers to rollback
                   segments within the temp tablespace;
                   This vector is created and destroyed
                   in single-threaded mode so it is not
                   protected by any mutex because it is
                   read-only during multi-threaded
                   operation. */

  /** Length of the TRX_RSEG_HISTORY list (update undo logs for committed
   * transactions). */
  std::atomic<uint64_t> rseg_history_len;

  TrxIdSet rw_trx_set; /*!< Mapping from transaction id
                       to transaction instance */

  ulint n_prepared_trx; /*!< Number of transactions currently
                        in the XA PREPARED state */

  bool found_prepared_trx; /*!< True if XA PREPARED trxs are
                           found. */
};

// storage/innobase/include/read0read.h
/** The MVCC read view manager */
class MVCC {
 public:
  /** Constructor
  @param size Number of views to pre-allocate */
  explicit MVCC(ulint size);

  /** Destructor.
  Free all the views in the m_free list */
  ~MVCC();

  /** Allocate and create a view.
  @param view View owned by this class created for the caller. Must be
  freed by calling view_close()
  @param trx Transaction instance of caller */
  void view_open(ReadView *&view, trx_t *trx);

  /**
  Close a view created by the above function.
  @param view view allocated by trx_open.
  @param own_mutex true if caller owns trx_sys_t::mutex */
  void view_close(ReadView *&view, bool own_mutex);

  /**
  Release a view that is inactive but not closed. Caller must own
  the trx_sys_t::mutex.
  @param view View to release */
  void view_release(ReadView *&view);

  /** Clones the oldest view and stores it in view. No need to
  call view_close(). The caller owns the view that is passed in.
  It will also move the closed views from the m_views list to the
  m_free list. This function is called by Purge to determine whether it should
  purge the delete marked record or not.
  @param view Preallocated view, owned by the caller */
  void clone_oldest_view(ReadView *view);

  /**
  @return the number of active views */
  ulint size() const;

  /**
  @return true if the view is active and valid */
  static bool is_view_active(ReadView *view) {
    ut_a(view != reinterpret_cast<ReadView *>(0x1));

    return (view != nullptr && !(intptr_t(view) & 0x1));
  }

  /**
  Set the view creator transaction id. Note: This should be set only
  for views created by RW transactions. */
  static void set_view_creator_trx_id(ReadView *view, trx_id_t id);

 private:
  /**
  Validates a read view list. */
  bool validate() const;

  /**
  Find a free view from the active list, if none found then allocate
  a new view. This function will also attempt to move delete marked
  views from the active list to the freed list.
  @return a view to use */
  inline ReadView *get_view();

  /**
  Get the oldest view in the system. It will also move the delete
  marked read views from the views list to the freed list.
  @return oldest view if found or NULL */
  inline ReadView *get_oldest_view() const;
  ReadView *get_view_created_by_trx_id(trx_id_t trx_id) const;

 private:
  // Prevent copying
  MVCC(const MVCC &);
  MVCC &operator=(const MVCC &);

 private:
  typedef UT_LIST_BASE_NODE_T(ReadView) view_list_t;

  /** Free views ready for reuse. */
  view_list_t m_free;

  /** Active and closed views, the closed views will have the
  creator trx id set to TRX_ID_MAX */
  view_list_t m_views;
};

/** Mapping read-write transactions from id to transaction instance, for
creating read views and during trx id lookup for MVCC and locking. */
struct TrxTrack {
  explicit TrxTrack(trx_id_t id, trx_t *trx = nullptr) : m_id(id), m_trx(trx) {
    // Do nothing
  }

  trx_id_t m_id;
  trx_t *m_trx;
};

struct TrxTrackHash {
  size_t operator()(const TrxTrack &key) const { return (size_t(key.m_id)); }
};

/** Comparator for TrxMap */
struct TrxTrackHashCmp {
  bool operator()(const TrxTrack &lhs, const TrxTrack &rhs) const {
    return (lhs.m_id == rhs.m_id);
  }
};

/** Comparator for TrxMap */
struct TrxTrackCmp {
  bool operator()(const TrxTrack &lhs, const TrxTrack &rhs) const {
    return (lhs.m_id < rhs.m_id);
  }
};

// typedef std::unordered_set<TrxTrack, TrxTrackHash, TrxTrackHashCmp> TrxIdSet;
typedef std::set<TrxTrack, TrxTrackCmp, ut_allocator<TrxTrack>> TrxIdSet;

// storage/innobase/include
// Friend declaration
class MVCC;

/** Read view lists the trx ids of those transactions for which a consistent
read should not see the modifications to the database. */
class ReadView {
  /** This is similar to a std::vector but it is not a drop
  in replacement. It is specific to ReadView. */
  class ids_t {
    typedef trx_ids_t::value_type value_type;

    /**
    Constructor */
    ids_t() : m_ptr(), m_size(), m_reserved() {}

    /**
    Destructor */
    ~ids_t() { UT_DELETE_ARRAY(m_ptr); }

    /** Try and increase the size of the array. Old elements are copied across.
    It is a no-op if n is < current size.
    @param n Make space for n elements */
    void reserve(ulint n);

    /**
    Resize the array, sets the current element count.
    @param n new size of the array, in elements */
    void resize(ulint n) {
      ut_ad(n <= capacity());

      m_size = n;
    }

    /**
    Reset the size to 0 */
    void clear() { resize(0); }

    /**
    @return the capacity of the array in elements */
    ulint capacity() const { return (m_reserved); }

    /**
    Copy and overwrite the current array contents
    @param start Source array
    @param end Pointer to end of array */
    void assign(const value_type *start, const value_type *end);

    /**
    Insert the value in the correct slot, preserving the order.
    Doesn't check for duplicates. */
    void insert(value_type value);

    /**
    @return the value of the first element in the array */
    value_type front() const {
      ut_ad(!empty());

      return (m_ptr[0]);
    }

    /**
    @return the value of the last element in the array */
    value_type back() const {
      ut_ad(!empty());

      return (m_ptr[m_size - 1]);
    }

    /**
    Append a value to the array.
    @param value the value to append */
    void push_back(value_type value);

    /**
    @return a pointer to the start of the array */
    trx_id_t *data() { return (m_ptr); }

    /**
    @return a const pointer to the start of the array */
    const trx_id_t *data() const { return (m_ptr); }

    /**
    @return the number of elements in the array */
    ulint size() const { return (m_size); }

    /**
    @return true if size() == 0 */
    bool empty() const { return (size() == 0); }

   private:
    // Prevent copying
    ids_t(const ids_t &);
    ids_t &operator=(const ids_t &);

   private:
    /** Memory for the array */
    value_type *m_ptr;

    /** Number of active elements in the array */
    ulint m_size;

    /** Size of m_ptr in elements */
    ulint m_reserved;

    friend class ReadView;
  };

 public:
  ReadView();
  ~ReadView();

  /** Check whether transaction id is valid.
  @param[in] id transaction id to check
  @param[in] name table name */
  static void check_trx_id_sanity(trx_id_t id, const table_name_t &name);

  /** Check whether the changes by id are visible.
  @param[in] id transaction id to check against the view
  @param[in] name table name
  @return whether the view sees the modifications of id. */
  bool changes_visible(trx_id_t id, const table_name_t &name) const
      MY_ATTRIBUTE((warn_unused_result)) {
    ut_ad(id > 0);

    if (id < m_up_limit_id || id == m_creator_trx_id) {
      return (true);
    }

    check_trx_id_sanity(id, name);

    if (id >= m_low_limit_id) {
      return (false);

    } else if (m_ids.empty()) {
      return (true);
    }

    const ids_t::value_type *p = m_ids.data();

    return (!std::binary_search(p, p + m_ids.size(), id));
  }

  /**
  @param id transaction to check
  @return true if view sees transaction id */
  bool sees(trx_id_t id) const { return (id < m_up_limit_id); }

  /**
  Mark the view as closed */
  void close() {
    ut_ad(m_creator_trx_id != TRX_ID_MAX);
    m_creator_trx_id = TRX_ID_MAX;
  }

  /**
  @return true if the view is closed */
  bool is_closed() const { return (m_closed); }

  /**
  Write the limits to the file.
  @param file file to write to */
  void print_limits(FILE *file) const {
    fprintf(file,
            "Trx read view will not see trx with"
            " id >= " TRX_ID_FMT ", sees < " TRX_ID_FMT "\n",
            m_low_limit_id, m_up_limit_id);
  }

  /** Check and reduce low limit number for read view. Used to
  block purge till GTID is persisted on disk table.
  @param[in] trx_no transaction number to check with */
  void reduce_low_limit(trx_id_t trx_no) {
    if (trx_no < m_low_limit_no) {
      /* Save low limit number set for Read View for MVCC. */
      ut_d(m_view_low_limit_no = m_low_limit_no);
      m_low_limit_no = trx_no;
    }
  }

  /**
  @return the low limit no */
  trx_id_t low_limit_no() const { return (m_low_limit_no); }

  /**
  @return the low limit id */
  trx_id_t low_limit_id() const { return (m_low_limit_id); }

  /**
  @return true if there are no transaction ids in the snapshot */
  bool empty() const { return (m_ids.empty()); }

#ifdef UNIV_DEBUG
  /**
  @return the view low limit number */
  trx_id_t view_low_limit_no() const { return (m_view_low_limit_no); }

  /**
  @param rhs view to compare with
  @return truen if this view is less than or equal rhs */
  bool le(const ReadView *rhs) const {
    return (m_low_limit_no <= rhs->m_low_limit_no);
  }
#endif /* UNIV_DEBUG */

 private:
  /**
  Copy the transaction ids from the source vector */
  inline void copy_trx_ids(const trx_ids_t &trx_ids);

  /**
  Opens a read view where exactly the transactions serialized before this
  point in time are seen in the view.
  @param id Creator transaction id */
  inline void prepare(trx_id_t id);

  /**
  Copy state from another view. Must call copy_complete() to finish.
  @param other view to copy from */
  inline void copy_prepare(const ReadView &other);

  /**
  Complete the copy, insert the creator transaction id into the
  m_trx_ids too and adjust the m_up_limit_id *, if required */
  inline void copy_complete();

  /**
  Set the creator transaction id, existing id must be 0 */
  void creator_trx_id(trx_id_t id) {
    ut_ad(m_creator_trx_id == 0);
    m_creator_trx_id = id;
  }

  friend class MVCC;

 private:
  // Disable copying
  ReadView(const ReadView &);
  ReadView &operator=(const ReadView &);

 private:
  /** The read should not see any transaction with trx id >= this
  value. In other words, this is the "high water mark". */
  trx_id_t m_low_limit_id;

  /** The read should see all trx ids which are strictly
  smaller (<) than this value. In other words, this is the
  low water mark". */
  trx_id_t m_up_limit_id;

  /** trx id of creating transaction, set to TRX_ID_MAX for free
  views. */
  trx_id_t m_creator_trx_id;

  /** Set of RW transactions that was active when this snapshot
  was taken */
  ids_t m_ids;

  /** The view does not need to see the undo logs for transactions
  whose transaction number is strictly smaller (<) than this value:
  they can be removed in purge if not needed by other views */
  trx_id_t m_low_limit_no;

#ifdef UNIV_DEBUG
  /** The low limit number up to which read views don't need to access
  undo log records for MVCC. This could be higher than m_low_limit_no
  if purge is blocked for GTID persistence. Currently used for debug
  variable INNODB_PURGE_VIEW_TRX_ID_AGE. */
  trx_id_t m_view_low_limit_no;
#endif /* UNIV_DEBUG */

  /** AC-NL-RO transaction view that has been "closed". */
  bool m_closed;

  typedef UT_LIST_NODE_T(ReadView) node_t;

  /** List of read views in trx_sys */
  byte pad1[64 - sizeof(node_t)];
  node_t m_view_list;
};

Looking at these data structures, they are fairly cohesive. Good cohesion lowers the learning curve considerably, since at least there is no need to keep jumping around the code. The English comments are also clear.
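One detail in MVCC::is_view_active() is easy to miss: the lowest bit of the ReadView pointer is used as a flag, which works because suitably aligned objects never have that bit set in their address. The sketch below shows the general tagged-pointer technique. `ToyView`, `mark_closed`, `strip_mark` and `is_active` are hypothetical names; the quoted source only shows the check and the stripping (`p & ~1` in view_open), so where InnoDB sets the bit is not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Any type with alignment >= 2 leaves the lowest address bit free.
struct alignas(8) ToyView {
  int dummy;
};

// Hypothetical helper: tag the pointer by setting its lowest bit.
inline ToyView *mark_closed(ToyView *v) {
  std::uintptr_t p = reinterpret_cast<std::uintptr_t>(v);
  assert((p & 0x1) == 0);  // relies on the alignment of ToyView
  return reinterpret_cast<ToyView *>(p | 0x1);
}

// Remove the tag to get back a pointer that may be dereferenced.
inline ToyView *strip_mark(ToyView *v) {
  std::uintptr_t p = reinterpret_cast<std::uintptr_t>(v);
  return reinterpret_cast<ToyView *>(p & ~std::uintptr_t{1});
}

// Same shape as the check in is_view_active(): non-null and untagged.
inline bool is_active(const ToyView *v) {
  return v != nullptr && (reinterpret_cast<std::uintptr_t>(v) & 0x1) == 0;
}
```

The benefit of this design is that the "closed" state travels with the pointer itself, so it can be tested without dereferencing the view or taking a mutex. A tagged pointer must always be stripped before it is dereferenced.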

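2. The read path
A complete MVCC read, as seen from the outside, starts with a SELECT. Its call stack was mentioned earlier:
do_command->dispatch_sql_command->mysql_execute_command ->m_sql_cmd->execute---->row_sel->row_sel_get_clust_rec, which eventually calls one of the following two functions (one for clustered index records, one for secondary index records, depending on the case):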

// storage/innobase/lock/lock0lock.cc
/** Checks that a record is seen in a consistent read.
 @return true if sees, or false if an earlier version of the record
 should be retrieved */
bool lock_clust_rec_cons_read_sees(
    const rec_t *rec,     /*!< in: user record which should be read or
                          passed over by a read cursor */
    dict_index_t *index,  /*!< in: clustered index */
    const ulint *offsets, /*!< in: rec_get_offsets(rec, index) */
    ReadView *view)       /*!< in: consistent read view */
{
  ut_ad(index->is_clustered());
  ut_ad(page_rec_is_user_rec(rec));
  ut_ad(rec_offs_validate(rec, index, offsets));

  /* Temp-tables are not shared across connections and multiple
  transactions from different connections cannot simultaneously
  operate on same temp-table and so read of temp-table is
  always consistent read. */
  if (srv_read_only_mode || index->table->is_temporary()) {
    ut_ad(view == nullptr || index->table->is_temporary());
    return (true);
  }

  /* NOTE that we call this function while holding the search
  system latch. */

  trx_id_t trx_id = row_get_rec_trx_id(rec, index, offsets);

  return (view->changes_visible(trx_id, index->table->name));
}

/** Checks that a non-clustered index record is seen in a consistent read.

 NOTE that a non-clustered index page contains so little information on
 its modifications that also in the case false, the present version of
 rec may be the right, but we must check this from the clustered index
 record.

 @return true if certainly sees, or false if an earlier version of the
 clustered index record might be needed */
bool lock_sec_rec_cons_read_sees(
    const rec_t *rec,          /*!< in: user record which
                               should be read or passed over
                               by a read cursor */
    const dict_index_t *index, /*!< in: index */
    const ReadView *view)      /*!< in: consistent read view */
{
  ut_ad(page_rec_is_user_rec(rec));

  /* NOTE that we might call this function while holding the search
  system latch. */

  if (recv_recovery_is_on()) {
    return (false);

  } else if (index->table->is_temporary()) {
    /* Temp-tables are not shared across connections and multiple
    transactions from different connections cannot simultaneously
    operate on same temp-table and so read of temp-table is
    always consistent read. */

    return (true);
  }

  trx_id_t max_trx_id = page_get_max_trx_id(page_align(rec));

  ut_ad(max_trx_id > 0);

  return (view->sees(max_trx_id));
}
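The two functions differ in how precise they can be. A clustered index record carries its own transaction ID, so the full visibility check applies to the row. A secondary index page only carries one maximum transaction ID for the whole page, so the check can only answer "certainly visible" or "look at the clustered index record". A minimal sketch of that coarse decision, with the hypothetical names `SecCheck` and `check_secondary`:

```cpp
#include <cassert>
#include <cstdint>

enum class SecCheck { DefinitelyVisible, NeedClusteredLookup };

// Coarse page-level check, modelled on ReadView::sees(): if the newest
// transaction that touched the page is older than every transaction that
// was active at snapshot time, every record on the page is visible.
// Otherwise nothing can be concluded from the page alone.
inline SecCheck check_secondary(std::uint64_t page_max_trx_id,
                                std::uint64_t view_up_limit_id) {
  assert(page_max_trx_id > 0);
  return page_max_trx_id < view_up_limit_id ? SecCheck::DefinitelyVisible
                                            : SecCheck::NeedClusteredLookup;
}
```

The trade-off is cheap bookkeeping on secondary index pages in exchange for an occasional extra lookup in the clustered index.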

Here is the function that produces the final return value:

/** Check whether the changes by id are visible.
@param[in] id transaction id to check against the view
@param[in] name table name
@return whether the view sees the modifications of id. */
bool changes_visible(trx_id_t id, const table_name_t &name) const
    MY_ATTRIBUTE((warn_unused_result)) {
  ut_ad(id > 0);

  if (id < m_up_limit_id || id == m_creator_trx_id) {
    return (true);
  }

  check_trx_id_sanity(id, name);

  if (id >= m_low_limit_id) {
    return (false);

  } else if (m_ids.empty()) {
    return (true);
  }

  const ids_t::value_type *p = m_ids.data();

  return (!std::binary_search(p, p + m_ids.size(), id));
}

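Note that this check differs in a few details from the description given earlier; the source code is authoritative, and the earlier analysis was mainly meant to explain how the mechanism is applied. The code adds two checks, an equality check and an emptiness check. Equality (id == m_creator_trx_id) means the row was changed by the reading transaction itself, so it is of course visible. If the ID lies between the two limits and the active-ID array is empty, the change is visible as well.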

3. Creating the Read View
As mentioned, under RR the first query creates the Read View. Here is the process:

// row0sel.cc
dberr_t row_search_mvcc(byte *buf, page_cur_mode_t mode,
                        row_prebuilt_t *prebuilt, ulint match_mode,
                        const ulint direction) {
  DBUG_TRACE;

  dict_index_t *index = prebuilt->index;
  ibool comp = dict_table_is_comp(index->table);
  const dtuple_t *search_tuple = prebuilt->search_tuple;
  ......
  /* Do some start-of-statement preparations */

  if (!prebuilt->sql_stat_start) {
    /* No need to set an intention lock or assign a read view */

    if (!MVCC::is_view_active(trx->read_view) && !srv_read_only_mode &&
        prebuilt->select_lock_type == LOCK_NONE) {
      ib::error(ER_IB_MSG_1031) << "MySQL is trying to perform a"
                                   " consistent read but the read view is not"
                                   " assigned!";
      trx_print(stderr, trx, 600);
      fputc('\n', stderr);
      ut_error;
    }
  } else if (prebuilt->select_lock_type == LOCK_NONE) {
    /* This is a consistent read */
    /* Assign a read view for the query */

    if (!srv_read_only_mode) {
      trx_assign_read_view(trx);  // the read view is assigned here
    }

    prebuilt->sql_stat_start = FALSE;
  } else {
  wait_table_again:
    err = lock_table(0, index->table,
                     prebuilt->select_lock_type == LOCK_S ? LOCK_IS : LOCK_IX,
                     thr);

    if (err != DB_SUCCESS) {
      table_lock_waited = TRUE;
      goto lock_table_wait;
    }
    prebuilt->sql_stat_start = FALSE;
  }
  ......
}

/** Assigns a read view for a consistent read query. All the consistent reads
 within the same transaction will get the same read view, which is created
 when this function is first called for a new started transaction.
 @return consistent read view */
ReadView *trx_assign_read_view(trx_t *trx) /*!< in/out: active transaction */
{
  ut_ad(trx->state == TRX_STATE_ACTIVE);

  if (srv_read_only_mode) {
    ut_ad(trx->read_view == nullptr);
    return (nullptr);

  } else if (!MVCC::is_view_active(trx->read_view)) {
    trx_sys->mvcc->view_open(trx->read_view, trx);
  }

  return (trx->read_view);
}

/**
Allocate and create a view.
@param view View owned by this class created for the caller. Must be
freed by calling view_close()
@param trx Transaction instance of caller */
void MVCC::view_open(ReadView *&view, trx_t *trx) {
  ut_ad(!srv_read_only_mode);

  /** If no new RW transaction has been started since the last view
  was created then reuse the the existing view. */
  if (view != nullptr) {
    uintptr_t p = reinterpret_cast<uintptr_t>(view);

    view = reinterpret_cast<ReadView *>(p & ~1);

    ut_ad(view->m_closed);

    /* NOTE: This can be optimised further, for now we only
    resuse the view iff there are no active RW transactions.

    There is an inherent race here between purge and this
    thread. Purge will skip views that are marked as closed.
    Therefore we must set the low limit id after we reset the
    closed status after the check. */

    if (trx_is_autocommit_non_locking(trx) && view->empty()) {
      view->m_closed = false;

      if (view->m_low_limit_id == trx_sys_get_max_trx_id()) {
        return;
      } else {
        view->m_closed = true;
      }
    }

    mutex_enter(&trx_sys->mutex);

    UT_LIST_REMOVE(m_views, view);

  } else {
    mutex_enter(&trx_sys->mutex);

    view = get_view();
  }

  if (view != nullptr) {
    view->prepare(trx->id);

    UT_LIST_ADD_FIRST(m_views, view);  // add to the view list managed by MVCC

    ut_ad(!view->is_closed());

    ut_ad(validate());
  }

  trx_sys_mutex_exit();
}

/**
Find a free view from the active list, if none found then allocate
a new view.
@return a view to use */
ReadView *MVCC::get_view() {
  ut_ad(mutex_own(&trx_sys->mutex));

  ReadView *view;

  if (UT_LIST_GET_LEN(m_free) > 0) {
    view = UT_LIST_GET_FIRST(m_free);
    UT_LIST_REMOVE(m_free, view);
  } else {
    view = UT_NEW_NOKEY(ReadView());

    if (view == nullptr) {
      ib::error(ER_IB_MSG_918) << "Failed to allocate MVCC view";
    }
  }

  return (view);
}

/**
Opens a read view where exactly the transactions serialized before this
point in time are seen in the view.
@param id Creator transaction id */
void ReadView::prepare(trx_id_t id) {
  ut_ad(mutex_own(&trx_sys->mutex));

  m_creator_trx_id = id;

  m_low_limit_no = m_low_limit_id = m_up_limit_id = trx_sys->max_trx_id;

  if (!trx_sys->rw_trx_ids.empty()) {
    copy_trx_ids(trx_sys->rw_trx_ids);
  } else {
    m_ids.clear();
  }

  ut_ad(m_up_limit_id <= m_low_limit_id);

  if (UT_LIST_GET_LEN(trx_sys->serialisation_list) > 0) {
    const trx_t *trx;

    trx = UT_LIST_GET_FIRST(trx_sys->serialisation_list);

    if (trx->no < m_low_limit_no) {
      m_low_limit_no = trx->no;
    }
  }

  ut_d(m_view_low_limit_no = m_low_limit_no);

  m_closed = false;
}

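The creation code at the end handles two cases, depending on whether the view pointer is null. If it is not null, the existing view object is reused; if it is null, a view is taken from the free list (or newly allocated when the free list is empty). The view is then prepared and returned.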
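What prepare() captures can be modelled in a few lines. This is a simplified sketch under stated assumptions: `ToyTrxSys`, `ToySnapshot` and `prepare_snapshot` are hypothetical names, and because copy_trx_ids() is not part of the quoted code, the sketch assumes it copies the active ID list and lowers the lower boundary to the smallest active ID.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for the relevant parts of trx_sys_t.
struct ToyTrxSys {
  std::uint64_t max_trx_id;               // next id that will be handed out
  std::vector<std::uint64_t> rw_trx_ids;  // active RW trx ids, ascending
};

// Toy stand-in for the state a read view captures.
struct ToySnapshot {
  std::uint64_t creator_trx_id;
  std::uint64_t low_limit_id;  // ids >= this are never visible
  std::uint64_t up_limit_id;   // ids < this are always visible
  std::vector<std::uint64_t> ids;
};

// Take a snapshot of the transaction system for the given creator.
inline ToySnapshot prepare_snapshot(const ToyTrxSys &sys,
                                    std::uint64_t creator) {
  ToySnapshot s;
  s.creator_trx_id = creator;
  s.low_limit_id = sys.max_trx_id;
  s.ids = sys.rw_trx_ids;
  // Assumption: with active transactions the lower boundary is the smallest
  // active id; with none, both boundaries coincide.
  s.up_limit_id = s.ids.empty() ? s.low_limit_id : s.ids.front();
  assert(s.up_limit_id <= s.low_limit_id);
  return s;
}
```

In the real code this runs while holding trx_sys->mutex, so the boundaries and the copied list describe one consistent moment.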

4. Creating and resolving MVCC versions
First, how a new version comes into being, i.e. the update operation mentioned earlier:

/** Updates a record when the update causes no size changes in its fields.
@param[in] flags Undo logging and locking flags
@param[in] cursor Cursor on the record to update; cursor stays valid and
positioned on the same record
@param[in,out] offsets Offsets on cursor->page_cur.rec
@param[in] update Update vector
@param[in] cmpl_info Compiler info on secondary index updates
@param[in] thr Query thread, or null if flags & (btr_no_locking_flag |
btr_no_undo_log_flag | btr_create_flag | btr_keep_sys_flag)
@param[in] trx_id Transaction id
@param[in,out] mtr Mini-transaction; if this is a secondary index, the caller
must mtr_commit(mtr) before latching any further pages
@return locking or undo log related error code, or
@retval DB_SUCCESS on success
@retval DB_ZIP_OVERFLOW if there is not enough space left on the compressed
page (IBUF_BITMAP_FREE was reset outside mtr) */
dberr_t btr_cur_update_in_place(ulint flags, btr_cur_t *cursor, ulint *offsets,
                                const upd_t *update, ulint cmpl_info,
                                que_thr_t *thr, trx_id_t trx_id, mtr_t *mtr) {
  dict_index_t *index;
  buf_block_t *block;
  page_zip_des_t *page_zip;
  dberr_t err;
  rec_t *rec;
  roll_ptr_t roll_ptr = 0;
  ulint was_delete_marked;
  ibool is_hashed;

  rec = btr_cur_get_rec(cursor);
  index = cursor->index;
  ut_ad(rec_offs_validate(rec, index, offsets));
  ut_ad(!!page_rec_is_comp(rec) == dict_table_is_comp(index->table));
  ut_ad(trx_id > 0 || (flags & BTR_KEEP_SYS_FLAG) ||
        index->table->is_intrinsic());
  /* The insert buffer tree should never be updated in place. */
  ut_ad(!dict_index_is_ibuf(index));
  ut_ad(dict_index_is_online_ddl(index) == !!(flags & BTR_CREATE_FLAG) ||
        index->is_clustered());
  ut_ad((flags & ~(BTR_KEEP_POS_FLAG | BTR_KEEP_IBUF_BITMAP)) ==
            (BTR_NO_UNDO_LOG_FLAG | BTR_NO_LOCKING_FLAG | BTR_CREATE_FLAG |
             BTR_KEEP_SYS_FLAG) ||
        thr_get_trx(thr)->id == trx_id);
  ut_ad(fil_page_index_page_check(btr_cur_get_page(cursor)));
  ut_ad(btr_page_get_index_id(btr_cur_get_page(cursor)) == index->id);

  DBUG_PRINT("ib_cur",
             ("update-in-place %s (" IB_ID_FMT ") by " TRX_ID_FMT ": %s",
              index->name(), index->id, trx_id,
              rec_printer(rec, offsets).str().c_str()));

  block = btr_cur_get_block(cursor);
  page_zip = buf_block_get_page_zip(block);

  /* Check that enough space is available on the compressed page. */
  if (page_zip) {
    ut_ad(!index->table->is_temporary());

    if (!btr_cur_update_alloc_zip(page_zip, btr_cur_get_page_cur(cursor), index,
                                  offsets, rec_offs_size(offsets), false,
                                  mtr)) {
      return (DB_ZIP_OVERFLOW);
    }

    rec = btr_cur_get_rec(cursor);
  }

  /* Do lock checking and undo logging */
  err = btr_cur_upd_lock_and_undo(flags, cursor, offsets, update, cmpl_info,
                                  thr, mtr, &roll_ptr);
  if (UNIV_UNLIKELY(err != DB_SUCCESS)) {
    /* We may need to update the IBUF_BITMAP_FREE
    bits after a reorganize that was done in
    btr_cur_update_alloc_zip(). */
    goto func_exit;
  }

  if (!(flags & BTR_KEEP_SYS_FLAG) && !index->table->is_intrinsic()) {
    row_upd_rec_sys_fields(rec, nullptr, index, offsets, thr_get_trx(thr),
                           roll_ptr);
  }

  was_delete_marked =
      rec_get_deleted_flag(rec, page_is_comp(buf_block_get_frame(block)));

  is_hashed = (block->index != nullptr);

  if (is_hashed) {
    /* TO DO: Can we skip this if none of the fields
    index->search_info->curr_n_fields
    are being updated? */

    /* The function row_upd_changes_ord_field_binary works only
    if the update vector was built for a clustered index, we must
    NOT call it if index is secondary */

    if (!index->is_clustered() ||
        row_upd_changes_ord_field_binary(index, update, thr, nullptr, nullptr,
                                         nullptr)) {
      /* Remove possible hash index pointer to this record */
      btr_search_update_hash_on_delete(cursor);
    }

    rw_lock_x_lock(btr_get_search_latch(index));
  }

  assert_block_ahi_valid(block);
  row_upd_rec_in_place(rec, index, offsets, update, page_zip);

  if (is_hashed) {
    rw_lock_x_unlock(btr_get_search_latch(index));
  }

  btr_cur_update_in_place_log(flags, rec, index, update, trx_id, roll_ptr, mtr);

  if (was_delete_marked &&
      !rec_get_deleted_flag(rec, page_is_comp(buf_block_get_frame(block)))) {
    /* The new updated record owns its possible externally
    stored fields */

    lob::BtrContext btr_ctx(mtr, nullptr, index, rec, offsets, block);
    btr_ctx.unmark_extern_fields();
  }

  ut_ad(err == DB_SUCCESS);

func_exit:
  if (page_zip && !(flags & BTR_KEEP_IBUF_BITMAP) && !index->is_clustered() &&
      page_is_leaf(buf_block_get_frame(block))) {
    /* Update the free bits in the insert buffer. */
    ibuf_update_free_bits_zip(block, mtr);
  }

  return (err);
}

Insert and the other operations have their own functions, which are worth reading if you are interested. A query starts in row_search_mvcc(), the function mentioned above:

dberr_t row_search_mvcc(byte *buf, page_cur_mode_t mode,
                        row_prebuilt_t *prebuilt, ulint match_mode,
                        const ulint direction) {
  ......
  else if (index == clust_index) {
    /* Fetch a previous version of the row if the current
    one is not visible in the snapshot; if we have a very
    high force recovery level set, we try to avoid crashes
    by skipping this lookup */

    if (srv_force_recovery < 5 &&
        !lock_clust_rec_cons_read_sees(rec, index, offsets,
                                       trx_get_read_view(trx))) {
      rec_t *old_vers;
      /* The following call returns 'offsets' associated with 'old_vers' */
      err = row_sel_build_prev_vers_for_mysql(
          trx->read_view, clust_index, prebuilt, rec, &offsets, &heap,
          &old_vers, need_vrow ? &vrow : nullptr, &mtr,
          prebuilt->get_lob_undo());

      if (err != DB_SUCCESS) {
        goto lock_wait_or_error;
      }

      if (old_vers == nullptr) {
        /* The row did not exist yet in
        the read view */

        goto next_rec;
      }

      rec = old_vers;
      prev_rec = rec;
      ut_d(prev_rec_debug = row_search_debug_copy_rec_order_prefix(
               pcur, index, prev_rec, &prev_rec_debug_n_fields,
               &prev_rec_debug_buf, &prev_rec_debug_buf_size));
    }
  }
  ......
}

What follows is the creation of the view and the matching and visibility checks, which were covered above. Next, look at how the data of a specific record version is produced:
row_search_mvcc -> row_sel_build_prev_vers_for_mysql -> row_vers_build_for_consistent_read -> trx_undo_prev_version_build

bool trx_undo_prev_version_build(
    const rec_t *index_rec ATTRIB_USED_ONLY_IN_DEBUG,
    mtr_t *index_mtr ATTRIB_USED_ONLY_IN_DEBUG, const rec_t *rec,
    const dict_index_t *const index, ulint *offsets, mem_heap_t *heap,
    rec_t **old_vers, mem_heap_t *v_heap, const dtuple_t **vrow, ulint v_status,
    lob::undo_vers_t *lob_undo) {
  DBUG_TRACE;

  trx_undo_rec_t *undo_rec = nullptr;
  dtuple_t *entry;
  trx_id_t rec_trx_id;
  ulint type;
  undo_no_t undo_no;
  table_id_t table_id;
  trx_id_t trx_id;
  roll_ptr_t roll_ptr;
  upd_t *update = nullptr;
  byte *ptr;
  ulint info_bits;
  ulint cmpl_info;
  bool dummy_extern;
  byte *buf;

  ut_ad(!rw_lock_own(&purge_sys->latch, RW_LOCK_S));
  ut_ad(mtr_memo_contains_page(index_mtr, index_rec, MTR_MEMO_PAGE_S_FIX) ||
        mtr_memo_contains_page(index_mtr, index_rec, MTR_MEMO_PAGE_X_FIX));
  ut_ad(rec_offs_validate(rec, index, offsets));
  ut_a(index->is_clustered());

  roll_ptr = row_get_rec_roll_ptr(rec, index, offsets);

  *old_vers = nullptr;

  if (trx_undo_roll_ptr_is_insert(roll_ptr)) {
    /* The record rec is the first inserted version */
    return true;
  }

  rec_trx_id = row_get_rec_trx_id(rec, index, offsets);

  /* REDO rollback segments are used only for non-temporary objects.
  For temporary objects NON-REDO rollback segments are used. */
  bool is_temp = index->table->is_temporary();

  ut_ad(!index->table->skip_alter_undo);

  if (trx_undo_get_undo_rec(roll_ptr, rec_trx_id, heap, is_temp,
                            index->table->name, &undo_rec)) {
    if (v_status & TRX_UNDO_PREV_IN_PURGE) {
      /* We are fetching the record being purged */
      undo_rec = trx_undo_get_undo_rec_low(roll_ptr, heap, is_temp);
    } else {
      /* The undo record may already have been purged,
      during purge or semi-consistent read. */
      return false;
    }
  }

  type_cmpl_t type_cmpl;
  ptr = trx_undo_rec_get_pars(undo_rec, &type, &cmpl_info, &dummy_extern,
                              &undo_no, &table_id, type_cmpl);

  if (table_id != index->table->id) {
    /* The table should have been rebuilt, but purge has
    not yet removed the undo log records for the
    now-dropped old table (table_id). */
    return true;
  }

  ptr = trx_undo_update_rec_get_sys_cols(ptr, &trx_id, &roll_ptr, &info_bits);

  /* (a) If a clustered index record version is such that the
  trx id stamp in it is bigger than purge_sys->view, then the
  BLOBs in that version are known to exist (the purge has not
  progressed that far);

  (b) if the version is the first version such that trx id in it
  is less than purge_sys->view, and it is not delete-marked,
  then the BLOBs in that version are known to exist (the purge
  cannot have purged the BLOBs referenced by that version
  yet).

  This function does not fetch any BLOBs. The callers might, by
  possibly invoking row_ext_create() via row_build(). However,
  they should have all needed information in the *old_vers
  returned by this function. This is because *old_vers is based
  on the transaction undo log records. The function
  trx_undo_page_fetch_ext() will write BLOB prefixes to the
  transaction undo log that are at least as long as the longest
  possible column prefix in a secondary index. Thus, secondary
  index entries for *old_vers can be constructed without
  dereferencing any BLOB pointers. */

  ptr = trx_undo_rec_skip_row_ref(ptr, index);

  ptr = trx_undo_update_rec_get_update(ptr, index, type, trx_id, roll_ptr,
                                       info_bits, nullptr, heap, &update,
                                       lob_undo, type_cmpl);
  ut_a(ptr);

  if (row_upd_changes_field_size_or_external(index, offsets, update)) {
    /* We should confirm the existence of disowned external data,
    if the previous version record is delete marked. If the trx_id
    of the previous record is seen by purge view, we should treat
    it as missing history, because the disowned external data
    might be purged already.

    The inherited external data (BLOBs) can be freed (purged)
    after trx_id was committed, provided that no view was started
    before trx_id. If the purge view can see the committed
    delete-marked record by trx_id, no transactions need to access
    the BLOB. */

    /* the row_upd_changes_disowned_external(update) call could be
    omitted, but the synchronization on purge_sys->latch is likely
    more expensive. */

    if ((update->info_bits & REC_INFO_DELETED_FLAG) &&
        row_upd_changes_disowned_external(update)) {
      bool missing_extern;

      rw_lock_s_lock(&purge_sys->latch);

      missing_extern =
          purge_sys->view.changes_visible(trx_id, index->table->name);

      rw_lock_s_unlock(&purge_sys->latch);

      if (missing_extern) {
        /* treat as a fresh insert, not to
        cause assertion error at the caller. */
        return true;
      }
    }

    /* We have to set the appropriate extern storage bits in the
    old version of the record: the extern bits in rec for those
    fields that update does NOT update, as well as the bits for
    those fields that update updates to become externally stored
    fields. Store the info: */

    entry = row_rec_to_index_entry(rec, index, offsets, heap);
    /* The page containing the clustered index record
    corresponding to entry is latched in mtr. Thus the
    following call is safe. */
    row_upd_index_replace_new_col_vals(entry, index, update, heap);

    buf = static_cast<byte *>(
        mem_heap_alloc(heap, rec_get_converted_size(index, entry)));

    *old_vers = rec_convert_dtuple_to_rec(buf, index, entry);
  } else {
    buf = static_cast<byte *>(mem_heap_alloc(heap, rec_offs_size(offsets)));

    *old_vers = rec_copy(buf, rec, offsets);
    rec_offs_make_valid(*old_vers, index, offsets);
    row_upd_rec_in_place(*old_vers, index, offsets, update, nullptr);
  }

  /* Set the old value (which is the after image of an update) in the
  update vector to dtuple vrow */
  if (v_status & TRX_UNDO_GET_OLD_V_VALUE) {
    row_upd_replace_vcol((dtuple_t *)*vrow, index->table, update, false,
                         nullptr, nullptr);
  }

#if defined UNIV_DEBUG || defined UNIV_BLOB_LIGHT_DEBUG
  ut_a(!rec_offs_any_null_extern(
      *old_vers,
      rec_get_offsets(*old_vers, index, nullptr, ULINT_UNDEFINED, &heap)));
#endif  // defined UNIV_DEBUG || defined UNIV_BLOB_LIGHT_DEBUG

  /* If vrow is not NULL it means that the caller is interested in the values of
  the virtual columns for this version.
  If the UPD_NODE_NO_ORD_CHANGE flag is set on cmpl_info, it means that the
  change which created this entry in undo log did not affect any column of any
  secondary index (in particular: virtual), and thus the values of virtual
  columns were not recorded in undo. In such case the caller may assume that the
  values of (virtual) columns present in secondary index are exactly the same as
  they are in the next (more recent) version.
  If on the other hand the UPD_NODE_NO_ORD_CHANGE flag is not set, then we will
  make sure that *vrow points to a properly allocated memory and contains the
  values of virtual columns for this version recovered from undo log.
  This implies that if the caller has provided a non-NULL vrow, and the *vrow is
  still NULL after the call, (and old_vers is not NULL) it must be because the
  UPD_NODE_NO_ORD_CHANGE flag was set for this version.
  This last statement is an important assumption made by the
  row_vers_impl_x_locked_low() function. */
  if (vrow && !(cmpl_info & UPD_NODE_NO_ORD_CHANGE)) {
    if (!(*vrow)) {
      *vrow = dtuple_create_with_vcol(v_heap ? v_heap : heap,
                                      index->table->get_n_cols(),
                                      dict_table_get_n_v_cols(index->table));
      dtuple_init_v_fld(*vrow);
    }

    ut_ad(index->table->n_v_cols);
    trx_undo_read_v_cols(index->table, ptr, *vrow,
                         v_status & TRX_UNDO_PREV_IN_PURGE, false, nullptr,
                         (v_heap != nullptr ? v_heap : heap));
  }

  if (update != nullptr) {
    update->reset();
  }

  return true;
}

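This is the function behind the version chain described earlier. Each call reads the undo log record that the current record's roll pointer refers to and rebuilds the immediately preceding version from it. Following the roll pointers one after another in this way links the versions into a live version chain.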
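
To make the idea concrete, here is a minimal sketch of rebuilding one older version from an undo record. It is a simplified model and not InnoDB code: `RowVersion`, `UndoRecord`, `kNoPrevVersion`, and `build_prev_version` are hypothetical names, the undo log is assumed to be a plain vector indexed by the roll pointer, and BLOBs, virtual columns, and purge are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for InnoDB structures.
using TrxId = std::uint64_t;
using RollPtr = std::int64_t;  // index into the undo log (an assumption)
const RollPtr kNoPrevVersion = -1;  // models "this version came from an INSERT"

struct RowVersion {
  TrxId trx_id;      // models the hidden DB_TRX_ID column
  RollPtr roll_ptr;  // models the hidden DB_ROLL_PTR column
  std::map<std::string, std::string> cols;
};

struct UndoRecord {
  TrxId prev_trx_id;       // system columns of the older version
  RollPtr prev_roll_ptr;
  std::map<std::string, std::string> old_values;  // only the changed columns
};

// Rebuilds the version that preceded `rec` into *old_vers.
// Returns false when `rec` is the first version and nothing older exists.
bool build_prev_version(const RowVersion& rec,
                        const std::vector<UndoRecord>& undo_log,
                        RowVersion* old_vers) {
  if (rec.roll_ptr == kNoPrevVersion) {
    return false;
  }
  assert(rec.roll_ptr >= 0 &&
         static_cast<std::size_t>(rec.roll_ptr) < undo_log.size());
  const UndoRecord& undo = undo_log[static_cast<std::size_t>(rec.roll_ptr)];

  // Start from a copy of the newer version, then put the old values back.
  *old_vers = rec;
  old_vers->trx_id = undo.prev_trx_id;
  old_vers->roll_ptr = undo.prev_roll_ptr;
  for (const auto& kv : undo.old_values) {
    old_vers->cols[kv.first] = kv.second;
  }
  return true;
}
```

Calling `build_prev_version` again on its own output steps one version further back each time, until it returns false. In this sketch that plays the role of the early return for an insert roll pointer in the real function, which signals the same situation by returning true with `*old_vers` left as `nullptr`.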

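In this way, by creating a read view, running its visibility check, and matching the result against the version chain, MVCC arrives at the specific version of the data that a transaction is allowed to see.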
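
The matching step can be sketched as a walk along the chain that stops at the first version the read view accepts. This is a simplified model under stated assumptions: `ReadView`, `Version`, and `find_visible` are illustrative types and not the InnoDB classes, the field names are chosen for this example, and the chain is a plain linked list instead of undo log records. The visibility rule follows the usual three-way test: committed before the view, started after the view, or in between and checked against the list of active transactions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using TrxId = std::uint64_t;

// Simplified read view (field names are assumptions for this sketch).
struct ReadView {
  TrxId up_limit_id;     // ids below this committed before the view existed
  TrxId low_limit_id;    // ids at or above this started after the view
  TrxId creator_trx_id;  // the transaction that owns the view
  std::vector<TrxId> active_ids;  // sorted ids still active at view creation

  bool changes_visible(TrxId id) const {
    assert(up_limit_id <= low_limit_id);
    if (id < up_limit_id || id == creator_trx_id) return true;
    if (id >= low_limit_id) return false;
    // In between: visible only if that transaction had already committed.
    return !std::binary_search(active_ids.begin(), active_ids.end(), id);
  }
};

// One link of a hypothetical in-memory version chain.
struct Version {
  TrxId trx_id;         // transaction that wrote this version
  int value;            // payload, reduced to one int for the example
  const Version* prev;  // next older version, nullptr at the end
};

// Returns the newest version visible to `view`, or nullptr if none is.
const Version* find_visible(const Version* newest, const ReadView& view) {
  for (const Version* v = newest; v != nullptr; v = v->prev) {
    if (view.changes_visible(v->trx_id)) return v;
  }
  return nullptr;
}
```

With a chain written by transactions 30, 12, and 5 (newest first) and a view whose limits are 10 and 20 with transaction 12 still active, the walk skips the first two versions and settles on the one written by transaction 5.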

4. Summary

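MVCC is one way of handling data synchronization and safety, and an effective means of isolating transactions. If a database executed reads and writes strictly serially, no such mechanism would be needed. In practice, though, quite a few techniques have been proposed to raise concurrency and access speed, and Designing Data-Intensive Applications (《数据密集型应用系统设计》) covers them. So make sure you understand the principles first, then compare them against a concrete implementation, and the whole chain of cause and effect becomes clear. Knowing what happens and also why it happens: that is real knowledge.
Keep at it, returning youth!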
