This section covers the main logic PostgreSQL uses to obtain a transaction snapshot; the function that implements it is GetTransactionSnapshot.
Global/static variables
/*
* Currently registered Snapshots. Ordered in a heap by xmin, so that we can
* quickly find the one with lowest xmin, to advance our MyPgXact->xmin.
*/
static int xmin_cmp(const pairingheap_node *a, const pairingheap_node *b,
void *arg);
static pairingheap RegisteredSnapshots = {&xmin_cmp, NULL, NULL};
/* first GetTransactionSnapshot call in a transaction? */
bool FirstSnapshotSet = false;
/*
* Remember the serializable transaction snapshot, if any. We cannot trust
* FirstSnapshotSet in combination with IsolationUsesXactSnapshot(), because
* GUC may be reset before us, changing the value of IsolationUsesXactSnapshot.
*/
static Snapshot FirstXactSnapshot = NULL;
/*
* CurrentSnapshot points to the only snapshot taken in transaction-snapshot
* mode, and to the latest one taken in a read-committed transaction.
* SecondarySnapshot is a snapshot that's always up-to-date as of the current
* instant, even in transaction-snapshot mode. It should only be used for
* special-purpose code (say, RI checking.) CatalogSnapshot points to an
* MVCC snapshot intended to be used for catalog scans; we must invalidate it
* whenever a system catalog change occurs.
*
* These SnapshotData structs are static to simplify memory allocation
* (see the hack in GetSnapshotData to avoid repeated malloc/free).
*/
static SnapshotData CurrentSnapshotData = {HeapTupleSatisfiesMVCC};
static SnapshotData SecondarySnapshotData = {HeapTupleSatisfiesMVCC};
SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
/* Pointers to valid snapshots */
static Snapshot CurrentSnapshot = NULL;
static Snapshot SecondarySnapshot = NULL;
static Snapshot CatalogSnapshot = NULL;
static Snapshot HistoricSnapshot = NULL;
/*
* These are updated by GetSnapshotData. We initialize them this way
* for the convenience of TransactionIdIsInProgress: even in bootstrap
* mode, we don't want it to say that BootstrapTransactionId is in progress.
*
* RecentGlobalXmin and RecentGlobalDataXmin are initialized to
* InvalidTransactionId, to ensure that no one tries to use a stale
* value. Readers should ensure that it has been set to something else
* before using it.
*/
TransactionId TransactionXmin = FirstNormalTransactionId;
TransactionId RecentXmin = FirstNormalTransactionId;
TransactionId RecentGlobalXmin = InvalidTransactionId;
TransactionId RecentGlobalDataXmin = InvalidTransactionId;
/* (table, ctid) => (cmin, cmax) mapping during timetravel */
static HTAB *tuplecid_data = NULL;
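The comment on RegisteredSnapshots says the heap is ordered by xmin so that the lowest xmin can be found quickly. As a rough sketch of that idea only, the C code below swaps PostgreSQL's pairing heap for a plain fixed-capacity binary min-heap of xmin values; toy_heap and its functions are hypothetical names, and ordinary `<` is used instead of the wraparound-aware comparison PostgreSQL applies to XIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;   /* same width as PostgreSQL's TransactionId */

#define TOY_HEAP_CAP 64

/* Hypothetical binary min-heap of snapshot xmins (stand-in for pairingheap). */
typedef struct
{
    TransactionId items[TOY_HEAP_CAP];
    size_t        n;
} toy_heap;

static void toy_swap(TransactionId *a, TransactionId *b)
{
    TransactionId t = *a;
    *a = *b;
    *b = t;
}

/* "Register" a snapshot: append its xmin, then sift it up. */
static void toy_heap_add(toy_heap *h, TransactionId xmin)
{
    assert(h->n < TOY_HEAP_CAP);
    size_t i = h->n++;
    h->items[i] = xmin;
    while (i > 0 && h->items[(i - 1) / 2] > h->items[i])
    {
        toy_swap(&h->items[(i - 1) / 2], &h->items[i]);
        i = (i - 1) / 2;
    }
}

/* Oldest registered xmin in O(1); 0 (InvalidTransactionId) if the heap is empty. */
static TransactionId toy_heap_first(const toy_heap *h)
{
    return h->n > 0 ? h->items[0] : 0;
}

/* "Unregister" the oldest snapshot: move the last item to the root, sift down. */
static void toy_heap_remove_first(toy_heap *h)
{
    assert(h->n > 0);
    h->items[0] = h->items[--h->n];
    size_t i = 0;
    for (;;)
    {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->n && h->items[l] < h->items[m]) m = l;
        if (r < h->n && h->items[r] < h->items[m]) m = r;
        if (m == i) break;
        toy_swap(&h->items[m], &h->items[i]);
        i = m;
    }
}
```

Registering snapshots with xmin 2350, 2342 and 2347 makes toy_heap_first return 2342; once that entry is removed, the next-oldest xmin (2347) surfaces at the root, which is the value the backend could then use to advance its advertised xmin.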
MyPgXact
The current backend's transaction information.
/*
* Flags for PGXACT->vacuumFlags
*
* Note: If you modify these flags, you need to modify PROCARRAY_XXX flags
* in src/include/storage/procarray.h.
*
* PROC_RESERVED may later be assigned for use in vacuumFlags, but its value is
* used for PROCARRAY_SLOTS_XMIN in procarray.h, so GetOldestXmin won't be able
* to match and ignore processes with this flag set.
*/
#define PROC_IS_AUTOVACUUM 0x01 /* is it an autovac worker? */
#define PROC_IN_VACUUM 0x02 /* currently running lazy vacuum */
#define PROC_IN_ANALYZE 0x04 /* currently running analyze */
#define PROC_VACUUM_FOR_WRAPAROUND 0x08 /* set by autovac only */
#define PROC_IN_LOGICAL_DECODING 0x10 /* currently doing logical
* decoding outside xact */
#define PROC_RESERVED 0x20 /* reserved for procarray */
/* flags reset at EOXact */
#define PROC_VACUUM_STATE_MASK \
(PROC_IN_VACUUM | PROC_IN_ANALYZE | PROC_VACUUM_FOR_WRAPAROUND)
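The PROC_VACUUM_STATE_MASK comment says those flags are reset at end of transaction, while the remaining flags describe the process itself. A minimal sketch, assuming the reset is a plain bitwise clear with the mask; reset_vacuum_state is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the listing above (PGXACT->vacuumFlags). */
#define PROC_IS_AUTOVACUUM          0x01
#define PROC_IN_VACUUM              0x02
#define PROC_IN_ANALYZE             0x04
#define PROC_VACUUM_FOR_WRAPAROUND  0x08
#define PROC_IN_LOGICAL_DECODING    0x10
#define PROC_RESERVED               0x20

/* Flags that only make sense for the duration of one transaction. */
#define PROC_VACUUM_STATE_MASK \
    (PROC_IN_VACUUM | PROC_IN_ANALYZE | PROC_VACUUM_FOR_WRAPAROUND)

/* Hypothetical helper: drop per-transaction vacuum state at end of xact,
 * keeping flags such as PROC_IS_AUTOVACUUM that describe the process. */
static uint8_t reset_vacuum_state(uint8_t flags)
{
    return (uint8_t) (flags & ~PROC_VACUUM_STATE_MASK);
}
```

For an autovacuum worker running an anti-wraparound lazy vacuum (PROC_IS_AUTOVACUUM | PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND), only PROC_IS_AUTOVACUUM survives the reset.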
/*
* Prior to PostgreSQL 9.2, the fields below were stored as part of the
* PGPROC. However, benchmarking revealed that packing these particular
* members into a separate array as tightly as possible sped up GetSnapshotData
* considerably on systems with many CPU cores, by reducing the number of
* cache lines needing to be fetched. Thus, think very carefully before adding
* anything else here.
*/
typedef struct PGXACT
{
//ID of the current top-level transaction (not a subtransaction)
//as an optimization, read-only transactions are not assigned an XID (xid = 0)
TransactionId xid; /* id of top-level transaction currently being
* executed by this proc, if running and XID
* is assigned; else InvalidTransactionId */
TransactionId xmin; /* minimal running XID as it was when we were
* starting our xact, excluding LAZY VACUUM:
* vacuum must not remove tuples deleted by
* xid >= xmin ! */
uint8 vacuumFlags; /* vacuum-related flags, see above */
bool overflowed;
bool delayChkpt; /* true if this proc delays checkpoint start;
* previously called InCommit */
uint8 nxids;
} PGXACT;
extern PGDLLIMPORT struct PGXACT *MyPgXact;
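The PGXACT comment argues that packing these fields densely reduces the number of cache lines GetSnapshotData has to fetch. The sketch below redeclares the struct with standard C types to count how many entries fit in one cache line; the 64-byte line size is an assumption (typical on x86-64), and the 12-byte struct size holds on common ABIs where bool occupies one byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Standalone redeclaration with the same field order and widths as PGXACT. */
typedef struct PGXACT
{
    TransactionId xid;
    TransactionId xmin;
    uint8_t       vacuumFlags;
    bool          overflowed;
    bool          delayChkpt;
    uint8_t       nxids;
} PGXACT;

#define CACHE_LINE_SIZE 64   /* assumption: typical x86-64 cache line */

/* How many PGXACT entries a dense array packs into one cache line. */
static size_t pgxacts_per_cache_line(void)
{
    return CACHE_LINE_SIZE / sizeof(PGXACT);
}
```

With 12-byte entries, roughly five backends' xid/xmin pairs share one 64-byte line, so scanning the dense array touches far fewer lines than walking the much larger PGPROC entries would.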
Snapshot
A pointer to a SnapshotData struct. SnapshotData can represent every possible kind of snapshot:
1. Normal MVCC snapshots
2. MVCC snapshots taken during recovery (in Hot-Standby mode)
3. Historic MVCC snapshots used during logical decoding
4. Snapshots passed to HeapTupleSatisfiesDirty()
5. Snapshots passed to HeapTupleSatisfiesNonVacuumable()
6. Snapshots used for SatisfiesAny, Toast and Self, where no members are accessed
//pointer to a SnapshotData struct
typedef struct SnapshotData *Snapshot;
//invalid snapshot
#define InvalidSnapshot ((Snapshot) NULL)
/*
* We use SnapshotData structures to represent both "regular" (MVCC)
* snapshots and "special" snapshots that have non-MVCC semantics.
* The specific semantics of a snapshot are encoded by the "satisfies"
* function.
*/
//tuple visibility test function type
typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
Snapshot snapshot, Buffer buffer);
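//Commonly used visibility routines:
//HeapTupleSatisfiesMVCC: is the tuple visible to the given MVCC snapshot?
//HeapTupleSatisfiesUpdate: can the tuple be updated? (handles concurrent updates of the same tuple)
//HeapTupleSatisfiesDirty: like SatisfiesSelf, but also treats changes of in-progress transactions as visible ("dirty" data)
//HeapTupleSatisfiesSelf: is the tuple valid "for itself", i.e. counting committed transactions plus all changes of the current transaction, including the current command?
//HeapTupleSatisfiesToast: is the tuple valid as a TOAST row?
//HeapTupleSatisfiesVacuum: is the tuple dead and removable by VACUUM?
//HeapTupleSatisfiesAny: every tuple is visible
//HeapTupleSatisfiesHistoricMVCC: visibility for catalog tables during logical decoding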
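The point of SnapshotSatisfiesFunc is that a snapshot's semantics travel with it as a function pointer, so a scan simply calls snapshot->satisfies. The toy C below imitates that dispatch with a one-field tuple header; ToyTuple, ToySnapshot and the toy_satisfies_* functions are hypothetical, and the MVCC-like test is deliberately crude (it only compares the inserting XID with xmax).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, heavily simplified tuple header: only the inserting XID. */
typedef struct ToyTuple { TransactionId xmin; } ToyTuple;

typedef struct ToySnapshot ToySnapshot;

/* Same shape as SnapshotSatisfiesFunc, minus the buffer: tuple + snapshot -> bool. */
typedef bool (*ToySatisfiesFunc) (const ToyTuple *tup, const ToySnapshot *snap);

struct ToySnapshot
{
    ToySatisfiesFunc satisfies;   /* the snapshot's semantics live in this pointer */
    TransactionId    xmax;        /* only meaningful for the MVCC-like test */
};

/* Like HeapTupleSatisfiesAny: every tuple is visible. */
static bool toy_satisfies_any(const ToyTuple *tup, const ToySnapshot *snap)
{
    (void) tup;
    (void) snap;
    return true;
}

/* Crude MVCC-like test: visible only if inserted by an XID below xmax
 * (ignores xmin, xip[], aborted inserters and deletions). */
static bool toy_satisfies_mvcc(const ToyTuple *tup, const ToySnapshot *snap)
{
    return tup->xmin < snap->xmax;
}

/* Callers never inspect the snapshot kind; they just call through the pointer. */
static bool toy_tuple_visible(const ToyTuple *tup, const ToySnapshot *snap)
{
    return snap->satisfies(tup, snap);
}
```

A tuple inserted by XID 2355 is visible through a snapshot whose satisfies is toy_satisfies_any, but not through an MVCC-like snapshot with xmax = 2350; the calling code is identical in both cases.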
/*
* Struct representing all kind of possible snapshots.
*
* There are several different kinds of snapshots:
* * Normal MVCC snapshots
* * MVCC snapshots taken during recovery (in Hot-Standby mode)
* * Historic MVCC snapshots used during logical decoding
* * snapshots passed to HeapTupleSatisfiesDirty()
* * snapshots passed to HeapTupleSatisfiesNonVacuumable()
* * snapshots used for SatisfiesAny, Toast, Self where no members are
* accessed.
*
* TODO: It's probably a good idea to split this struct using a NodeTag
* similar to how parser and executor nodes are handled, with one type for
* each different kind of snapshot to avoid overloading the meaning of
* individual fields.
*/
typedef struct SnapshotData
{
SnapshotSatisfiesFunc satisfies; /* tuple test function */
/*
* The remaining fields are used only for MVCC snapshots, and are normally
* just zeroes in special snapshots. (But xmin and xmax are used
* specially by HeapTupleSatisfiesDirty, and xmin is used specially by
* HeapTupleSatisfiesNonVacuumable.)
*
* An MVCC snapshot can never see the effects of XIDs >= xmax. It can see
* the effects of all older XIDs except those listed in the snapshot. xmin
* is stored as an optimization to avoid needing to search the XID arrays
* for most tuples.
*/
//XIDs in [2, xmin) are visible (if committed)
TransactionId xmin; /* all XID < xmin are visible to me */
//XIDs in [xmax, ∞) are invisible
TransactionId xmax; /* all XID >= xmax are invisible to me */
/*
* For normal MVCC snapshot this contains the all xact IDs that are in
* progress, unless the snapshot was taken during recovery in which case
* it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
* it contains *committed* transactions between xmin and xmax.
*
* note: all ids in xip[] satisfy xmin <= xip[i] < xmax
*/
TransactionId *xip;
uint32 xcnt; /* # of xact ids in xip[] */
/*
* For non-historic MVCC snapshots, this contains subxact IDs that are in
* progress (and other transactions that are in progress if taken during
* recovery). For historic snapshot it contains *all* xids assigned to the
* replayed transaction, including the toplevel xid.
*
* note: all ids in subxip[] are >= xmin, but we don't bother filtering
* out any that are >= xmax
*/
TransactionId *subxip;
int32 subxcnt; /* # of xact ids in subxip[] */
bool suboverflowed; /* has the subxip array overflowed? */
bool takenDuringRecovery; /* recovery-shaped snapshot? */
bool copied; /* false if it's a static snapshot */
CommandId curcid; /* in my xact, CID < curcid are visible */
/*
* An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
* snapshots.
*/
uint32 speculativeToken;
/*
* Book-keeping information, used by the snapshot manager
*/
uint32 active_count; /* refcount on ActiveSnapshot stack */
uint32 regd_count; /* refcount on RegisteredSnapshots */
pairingheap_node ph_node; /* link in the RegisteredSnapshots heap */
TimestampTz whenTaken; /* timestamp when snapshot was taken */
XLogRecPtr lsn; /* position in the WAL stream when taken */
} SnapshotData;
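The xmin/xmax/xip comments describe a three-step check for whether an XID still counts as running from the snapshot's point of view. A simplified C sketch of that rule follows, assuming plain `<` comparisons (PostgreSQL uses wraparound-aware ones) and ignoring subxip[] and recovery-shaped snapshots; ToyMvccSnapshot and toy_xid_in_snapshot are hypothetical names. An XID for which the function returns false is still visible only if that transaction committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Only the fields the rule needs; names follow SnapshotData. */
typedef struct
{
    TransactionId        xmin;   /* all XID < xmin are finished */
    TransactionId        xmax;   /* all XID >= xmax are invisible */
    const TransactionId *xip;    /* in-progress XIDs, xmin <= xip[i] < xmax; may be NULL if xcnt == 0 */
    uint32_t             xcnt;
} ToyMvccSnapshot;

/* Returns true if xid must be treated as still running (its effects invisible). */
static bool toy_xid_in_snapshot(TransactionId xid, const ToyMvccSnapshot *snap)
{
    if (xid < snap->xmin)           /* fast path: finished before the snapshot */
        return false;
    if (xid >= snap->xmax)          /* started after the snapshot was taken */
        return true;
    for (uint32_t i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)    /* was in progress when the snapshot was taken */
            return true;
    return false;
}
```

For example, with xmin = 2342, xmax = 2350 and an invented xip[] of {2342, 2345}, XID 2343 is treated as finished while 2345 and 2351 are not. With the snapshot seen in the gdb session of this section (xmin = xmax = 2350, xcnt = 0), every XID below 2350 counts as finished.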
GetTransactionSnapshot obtains the appropriate snapshot for a new query in a transaction.
/*
* GetTransactionSnapshot
* Get the appropriate snapshot for a new query in a transaction.
*
* Note that the return value may point at static storage that will be modified
* by future calls and by CommandCounterIncrement(). Callers should call
* RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
* used very long.
*/
Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}
/* First call in transaction? */
if (!FirstSnapshotSet)
{
/*
* Don't allow catalog snapshot to be older than xact snapshot. Must
* do this first to allow the empty-heap Assert to succeed.
*/
InvalidateCatalogSnapshot();
Assert(pairingheap_is_empty(&RegisteredSnapshots));
Assert(FirstXactSnapshot == NULL);
if (IsInParallelMode())
elog(ERROR,
"cannot take query snapshot during a parallel operation");
/*
* In transaction-snapshot mode, the first snapshot must live until
* end of xact regardless of what the caller does with it, so we must
* make a copy of it rather than returning CurrentSnapshotData
* directly. Furthermore, if we're running in serializable mode,
* predicate.c needs to wrap the snapshot fetch in its own processing.
*/
if (IsolationUsesXactSnapshot())
{
//transaction-snapshot mode (REPEATABLE READ or SERIALIZABLE)
/* First, create the snapshot in CurrentSnapshotData */
if (IsolationIsSerializable())
//isolation level is SERIALIZABLE
CurrentSnapshot = GetSerializableTransactionSnapshot(&CurrentSnapshotData);
else
//isolation level is REPEATABLE READ
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
/* Make a saved copy */
CurrentSnapshot = CopySnapshot(CurrentSnapshot);
FirstXactSnapshot = CurrentSnapshot;
/* Mark it as "registered" in FirstXactSnapshot */
FirstXactSnapshot->regd_count++;
pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
}
else
//not in transaction-snapshot mode (read committed): take the snapshot directly
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
//remember that this transaction's first snapshot has been taken
FirstSnapshotSet = true;
return CurrentSnapshot;
}
//transaction-snapshot mode: every later query reuses the first snapshot
if (IsolationUsesXactSnapshot())
return CurrentSnapshot;
/* Don't allow catalog snapshot to be older than xact snapshot. */
InvalidateCatalogSnapshot();
//read-committed mode: take a fresh snapshot for each new query
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
return CurrentSnapshot;
}
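The branching above can be modelled in a few lines of C to show its two observable effects: in transaction-snapshot mode every query gets the same private copy, while in read-committed mode each call overwrites the static CurrentSnapshotData, so a pointer kept from an earlier call silently changes (which is why the header comment asks callers to use RegisterSnapshot or PushActiveSnapshot). This is a sketch under heavy simplification: the toy_ names, the single xmax field and the next_xid counter are hypothetical, and historic snapshots, catalog-snapshot invalidation, parallel mode, the serializable path and the RegisteredSnapshots bookkeeping are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy isolation levels; only the ">= REPEATABLE READ" test matters,
 * mirroring IsolationUsesXactSnapshot(). */
typedef enum { TOY_READ_COMMITTED, TOY_REPEATABLE_READ, TOY_SERIALIZABLE } ToyIsoLevel;

typedef struct { uint32_t xmax; } ToySnap;     /* stand-in for SnapshotData */

static uint32_t  next_xid = 2350;              /* hypothetical "next XID" counter */
static ToySnap   current_snapshot_data;        /* like static CurrentSnapshotData */
static ToySnap  *current_snapshot;             /* like CurrentSnapshot */
static bool      first_snapshot_set;           /* like FirstSnapshotSet */

/* Like GetSnapshotData: overwrite the caller-supplied (static) struct in place. */
static ToySnap *toy_get_snapshot_data(ToySnap *dst)
{
    dst->xmax = next_xid;
    return dst;
}

static ToySnap *toy_get_transaction_snapshot(ToyIsoLevel iso)
{
    bool uses_xact_snapshot = iso >= TOY_REPEATABLE_READ;

    if (!first_snapshot_set)
    {
        current_snapshot = toy_get_snapshot_data(&current_snapshot_data);
        if (uses_xact_snapshot)
        {
            /* like CopySnapshot: the first snapshot must outlive later calls */
            ToySnap *copy = malloc(sizeof *copy);
            assert(copy != NULL);
            *copy = *current_snapshot;
            current_snapshot = copy;
        }
        first_snapshot_set = true;
        return current_snapshot;
    }
    if (uses_xact_snapshot)
        return current_snapshot;                /* reuse for the whole transaction */
    /* read committed: refresh the same static struct for each query */
    return current_snapshot = toy_get_snapshot_data(&current_snapshot_data);
}

/* Like end of transaction: release the private copy and reset the flag. */
static void toy_end_transaction(void)
{
    if (current_snapshot != &current_snapshot_data)
        free(current_snapshot);
    current_snapshot = NULL;
    first_snapshot_set = false;
}
```

In read-committed mode two calls return the same address, and the second call changes what the first pointer sees; in repeatable-read mode the copy taken on the first call comes back unchanged even after next_xid advances.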
Running a simple query inside a transaction triggers the snapshot logic:
16:35:08 (xdb@[local]:5432)testdb=# begin;
BEGIN
16:35:13 (xdb@[local]:5432)testdb=#* select 1;
Start gdb and set a breakpoint:
(gdb) b GetTransactionSnapshot
Breakpoint 1 at 0xa9492e: file snapmgr.c, line 312.
(gdb) c
Continuing.
Breakpoint 1, GetTransactionSnapshot () at snapmgr.c:312
312 if (HistoricSnapshotActive())
(gdb)
If logical decoding were in progress, the historic snapshot would be returned (not the case here):
(gdb) n
319 if (!FirstSnapshotSet)
(gdb)
First call in the transaction? Yes, so the first-snapshot branch is entered:
319 if (!FirstSnapshotSet)
(gdb) n
325 InvalidateCatalogSnapshot();
(gdb)
327 Assert(pairingheap_is_empty(&RegisteredSnapshots));
(gdb)
328 Assert(FirstXactSnapshot == NULL);
(gdb) n
330 if (IsInParallelMode())
(gdb)
Not in transaction-snapshot mode, so GetSnapshotData is called directly:
(gdb)
341 if (IsolationUsesXactSnapshot())
(gdb)
356 CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
(gdb) p CurrentSnapshotData
$1 = {satisfies = 0xa9310d <HeapTupleSatisfiesMVCC>, xmin = 2342, xmax = 2350, xip = 0x14bee40, xcnt = 2,
subxip = 0x1514fa0, subxcnt = 0, suboverflowed = false, takenDuringRecovery = false, copied = false, curcid = 0,
speculativeToken = 0, active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0,
prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb)
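GetSnapshotData has returned; inspect CurrentSnapshot. (The CurrentSnapshotData values printed above were left over from an earlier call, because line 356 had not yet executed at that point.)
Note: the backend process of transaction 2342 had been killed in the meantime; the new snapshot has xmin = xmax = 2350 and xcnt = 0, i.e. no other transaction was in progress.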
(gdb) n
358
(gdb) p CurrentSnapshot
$2 = (Snapshot) 0xf9be60
(gdb) p *CurrentSnapshot
$3 = {satisfies = 0xa9310d <HeapTupleSatisfiesMVCC>, xmin = 2350, xmax = 2350, xip = 0x14bee40, xcnt = 0,
subxip = 0x1514fa0, subxcnt = 0, suboverflowed = false, takenDuringRecovery = false, copied = false, curcid = 0,
speculativeToken = 0, active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0,
prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb)
The snapshot has been taken; step on to the return:
(gdb) n
359 return CurrentSnapshot;
(gdb)
371 }
(gdb)
exec_simple_query (query_string=0x149aec8 "select 1;") at postgres.c:1059
1059 snapshot_set = true;
(gdb)
Inspect the global variable MyPgXact:
(gdb) p MyPgXact
$7 = (struct PGXACT *) 0x7f47103c01f4
(gdb) p *MyPgXact
$8 = {xid = 0, xmin = 2350, vacuumFlags = 0 '\000', overflowed = false, delayChkpt = false, nxids = 0 '\000'}
(gdb)
Notes:
1. xid = 0 means no transaction ID has been assigned. As an optimization, PG assigns a transaction ID only when the transaction modifies data.
2. txid_current() assigns a transaction ID if none has been assigned yet; txid_current_if_assigned() does not.
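Lazy XID assignment as described in these notes can be sketched as follows. ToyXact, next_xid and the toy_ functions are hypothetical; unlike the toy, the real txid_current() returns a 64-bit value that includes the epoch, and txid_current_if_assigned() returns NULL rather than 0 when no XID has been assigned.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

static TransactionId next_xid = 2350;            /* hypothetical shared XID counter */

typedef struct { TransactionId xid; } ToyXact;   /* like PGXACT.xid */

/* Assign an XID only the first time one is needed (e.g. on the first write). */
static TransactionId toy_assign_xid_if_needed(ToyXact *x)
{
    if (x->xid == InvalidTransactionId)
        x->xid = next_xid++;
    return x->xid;
}

/* Toy analogue of txid_current(): forces assignment. */
static TransactionId toy_txid_current(ToyXact *x)
{
    return toy_assign_xid_if_needed(x);
}

/* Toy analogue of txid_current_if_assigned(): never assigns. */
static TransactionId toy_txid_current_if_assigned(const ToyXact *x)
{
    return x->xid;
}
```

A read-only transaction like the `select 1;` above keeps xid = 0; the first call that forces assignment consumes one XID, and later calls return the same value.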
DONE!
Open questions:
1. When is the information in the global variable CurrentSnapshotData initialized or changed?
2. How is GetSnapshotData implemented? (covered in the next section)