MongoDB Replica Set Sync Internals and Resyncing Data in Practice

Author: 袖梨, 2022-06-29

MongoDB is a very popular NoSQL database in the IT industry, and its flexible storage model is well liked by practitioners. This article walks through how replica set synchronization works in MongoDB, together with a real resync example.


How replica set synchronization works

The sync loop

After choosing which member to sync from, a secondary pulls that member's oplog and loops:
1. Applies the op.
2. Writes the op to its own oplog (also local.oplog.rs).
3. Requests the next op.

How does the primary know where a secondary is synced to?

The primary can tell how far a secondary has replicated from the timestamp of the oplog entries the secondary requests (see the shell sketch below).
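As a minimal mongo-shell illustration of that handshake (the collection and the "ts" field are real; the query is just a sketch of what a secondary sends):

// Newest entry in the oplog; its "ts" field is this member's sync position.
var oplog = db.getSiblingDB("local").oplog.rs;
var last = oplog.find().sort({ $natural: -1 }).limit(1).next();
// A secondary asks its sync source for everything after that timestamp:
oplog.find({ ts: { $gt: last.ts } });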

How does primary know where secondary is synced to? Well, secondary is querying primary's oplog for more results. So, if secondary requests an op written at 3pm, primary knows secondary has replicated all ops written before 3pm.

So, it goes like:
1. Do a write on primary.
2. Write is written to the oplog on primary, with a field "ts" saying the write occurred at time t.
3. {getLastError:1,w:2} is called on primary. primary has done the write, so it is just waiting for one more server to get the write (w:2).
4. secondary queries the oplog on primary and gets the op.
5. secondary applies the op from time t.
6. secondary requests ops with {ts:{$gt:t}} from primary's oplog.
7. primary updates that secondary has applied up to t because it is requesting ops > t.
8. getLastError notices that primary and secondary both have the write, so w:2 is satisfied and it returns.
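In the shell, that replication acknowledgement looks like the following (wtimeout is optional and just bounds the wait; the values here are illustrative):

// Run right after a write on the primary: block until at least 2 members
// (the primary plus one secondary) have the write, or 5 seconds pass.
db.runCommand({ getLastError: 1, w: 2, wtimeout: 5000 })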

How chained syncing works

If A syncs from B, and B syncs from C, how does C know how far A has synced? Look at the oplog-read protocol:

When A syncs from B, A says to B: "I am going to sync from your oplog; if you get any writes, let me know."

B answers: "I am not the primary; let me forward that." B then tells the primary C: "pretend I am A; I am syncing from you on A's behalf." At this point B holds two connections to C: one of its own, and one on behalf of A.

When A requests ops (writes) from B, B turns to C to fulfill the request.

A          B          C
   <====>
   <====>     <---->

<====> is the "real" sync connection. <----> is a "ghost" connection, the one B opens to C on A's behalf.

Initial sync

A brand-new member, or one being resynced from scratch, goes through initial sync.

The seven steps:
1. Check the oplog. If it is not empty, this node does not initial sync, it just starts syncing normally. If the oplog is empty, then initial sync is necessary; continue to step 2.
2. Get the latest oplog time from the source member; call this time start.
3. Clone all of the data from the source member to the destination member.
4. Build indexes on the destination. (In 2.0 this was folded into the clone step; in 2.2 indexes are built after the data is copied.)
5. Get the latest oplog time from the sync target, which is called minValid.
6. Apply the sync target's oplog from start to minValid.
7. Become a "normal" member (transition into secondary state).

My understanding: the oplog between start and minValid covers the writes that happened during the clone and have not been applied yet, i.e. the part still missing for consistency, so this step is simply an oplog replay.
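That replay boundary is persisted in local.replset.minvalid (the same collection that shows up in the log excerpts later in this article); a quick way to peek at it from the shell:

// The timestamp up to which this member must apply ops before its data
// can be considered consistent.
db.getSiblingDB("local").getCollection("replset.minvalid").findOne()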

Looking at the source, rs_initialsync.cpp, the initial sync steps are documented as follows:

/**
 * Do the initial sync for this member. There are several steps to this process:
 *
 *   Record start time.
 *   Clone.
 *   Set minValid1 to sync target's latest op time.
 *   Apply ops from start to minValid1, fetching missing docs as needed.
 *   Set minValid2 to sync target's latest op time.
 *   Apply ops from minValid1 to minValid2.
 *   Build indexes.
 *   Set minValid3 to sync target's latest op time.
 *   Apply ops from minValid2 to minValid3.
 *
 * At that point, initial sync is finished. Note that the oplog from the sync
 * target is applied three times: step 4, 6, and 8. 4 may involve refetching,
 * 6 should not. By the end of 6, this member should have consistent data.
 * 8 is "cosmetic," it is only to get this member closer to the latest op time
 * before it can transition to secondary state.
 */

The clone phase boils down to this loop (rendered here as runnable mongo-shell JS rather than the original pseudocode; sourceServer and destinationServer stand for connections such as new Mongo("host:port")):

sourceServer.getDBNames().forEach(function (dbName) {
    var srcDb = sourceServer.getDB(dbName);
    srcDb.getCollectionNames().forEach(function (collName) {
        srcDb.getCollection(collName).find().forEach(function (doc) {
            destinationServer.getDB(dbName).getCollection(collName).insert(doc);
        });
    });
});

Characteristics of initial sync

Upside: the data ends up more compact and uses less disk space, because every operation is an insert. Note that the padding factor is reset to 1.

Downside: it is slow. Taking a write lock with fsync+lock and copying the data files over is a faster way to seed a new member.
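A minimal sketch of that faster path, assuming you can tolerate blocking writes on the source member while the files are copied:

// On the source member: flush to disk and block writes.
db.adminCommand({ fsync: 1, lock: 1 })
// ...copy the dbpath files to the new member (scp/rsync/etc.), then release:
db.fsyncUnlock()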

Also note that restoring via mongodump/mongorestore carries no oplog, so it is not really suitable as a "restore from backup" strategy for seeding replica set members.

Who to sync from

For initial sync, MongoDB may pull from the primary or from a secondary; it picks the nearest eligible member.

By default, the member syncs from the closest member of the set that is either the primary or another secondary with more recent oplog entries. This prevents two secondaries from syncing from each other.

http://docs.mongodb.org/manual/core/replication-internals/

Take the log line quoted in the previous article: [rsSync] replSet syncing to: 10.0.0.106:20011

Here "syncing to" actually means "syncing from"; the wording survives for backward-compatibility reasons. As Kristina Chodorow put it, 'Backwards compatibility sucks'.

A replica set picks the nearest member (by ping time) as the sync source, using roughly this algorithm:

for each member that is healthy:
    if member[state] == PRIMARY
        add to set of possible sync targets
    if member[lastOpTimeWritten] > our[lastOpTimeWritten]
        add to set of possible sync targets

sync target = member with the min ping time from the possible sync targets

What counts as "healthy" differs between versions, but the intent is always to find members that are functioning normally. In 2.0 the check also took slave delay into account.

Run db.adminCommand({replSetGetStatus:1}) or rs.status() on a secondary to inspect its state; the syncingTo field in the output names the member this secondary is syncing from.

Version 2.2 added the replSetSyncFrom command to designate the sync source explicitly:
db.adminCommand( { replSetSyncFrom: "[hostname]:[port]" } )

or

rs.syncFrom("[hostname]:[port]")
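For example, to point a secondary at 10.0.0.107:20011 and then verify that the change took:

rs.syncFrom("10.0.0.107:20011")
rs.status()   // on a secondary, check the syncingTo field for the new source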

How is the nearest member chosen? From the source, using the then-latest 2.2.2 as an example: mongodb-src-r2.2.2/src/mongo/db/repl/rs_initialsync.cpp

Member* ReplSetImpl::getMemberToSyncTo() {
    lock lk(this);

    bool buildIndexes = true;

    // if we have a target we've requested to sync from, use it
    if (_forceSyncTarget) {
        Member* target = _forceSyncTarget;
        _forceSyncTarget = 0;
        sethbmsg( str::stream() << "syncing to: " << target->fullName() << " by request", 0);
        return target;
    }

    Member* primary = const_cast<Member*>(box.getPrimary());

    // wait for 2N pings before choosing a sync target
    if (_cfg) {
        int needMorePings = config().members.size()*2 - HeartbeatInfo::numPings;

        if (needMorePings > 0) {
            OCCASIONALLY log() << "waiting for " << needMorePings
                               << " pings from other members before syncing" << endl;
            return NULL;
        }

        buildIndexes = myConfig().buildIndexes;

        // If we are only allowed to sync from the primary, return that
        if (!_cfg->chainingAllowed()) {
            // Returns NULL if we cannot reach the primary
            return primary;
        }
    }

    // find the member with the lowest ping time that has more data than me

    // Find primary's oplog time. Reject sync candidates that are more than
    // MAX_SLACK_TIME seconds behind.
    OpTime primaryOpTime;
    static const unsigned maxSlackDurationSeconds = 10 * 60; // 10 minutes
    if (primary)
        primaryOpTime = primary->hbinfo().opTime;
    else
        // choose a time that will exclude no candidates, since we don't see a primary
        primaryOpTime = OpTime(maxSlackDurationSeconds, 0);

    if ( primaryOpTime.getSecs() < maxSlackDurationSeconds ) {
        // erh - I think this means there was just a new election
        // and we don't yet know the new primary's optime
        primaryOpTime = OpTime(maxSlackDurationSeconds, 0);
    }

    OpTime oldestSyncOpTime(primaryOpTime.getSecs() - maxSlackDurationSeconds, 0);

    Member *closest = 0;
    time_t now = 0;

    // Make two attempts. The first attempt, we ignore those nodes with
    // slave delay higher than our own. The second attempt includes such
    // nodes, in case those are the only ones we can reach.
    // This loop attempts to set 'closest'.
    for (int attempts = 0; attempts < 2; ++attempts) {
        for (Member *m = _members.head(); m; m = m->next()) {
            if (!m->hbinfo().up())
                continue;

            // make sure members with buildIndexes sync from other members w/indexes
            if (buildIndexes && !m->config().buildIndexes)
                continue;

            if (!m->state().readable())
                continue;

            if (m->state() == MemberState::RS_SECONDARY) {
                // only consider secondaries that are ahead of where we are
                if (m->hbinfo().opTime <= lastOpTimeWritten)
                    continue;

                // omit secondaries that are excessively behind, on the first attempt at least.
                if (attempts == 0 &&
                    m->hbinfo().opTime < oldestSyncOpTime)
                    continue;
            }

            // omit nodes that are more latent than anything we've already considered
            if (closest &&
                (m->hbinfo().ping > closest->hbinfo().ping))
                continue;

            if (attempts == 0 &&
                (myConfig().slaveDelay < m->config().slaveDelay || m->config().hidden)) {
                continue; // skip this one in the first attempt
            }

            map<string,time_t>::iterator vetoed = _veto.find(m->fullName());
            if (vetoed != _veto.end()) {
                // Do some veto housekeeping
                if (now == 0) {
                    now = time(0);
                }

                // if this was on the veto list, check if it was vetoed in the last "while";
                // if it was, skip.
                if (vetoed->second >= now) {
                    if (time(0) % 5 == 0) {
                        log() << "replSet not trying to sync from " << (*vetoed).first
                              << ", it is vetoed for " << ((*vetoed).second - now)
                              << " more seconds" << rsLog;
                    }
                    continue;
                }
                _veto.erase(vetoed);

                // fall through, this is a valid candidate now
            }

            // This candidate has passed all tests; set 'closest'
            closest = m;
        }
        if (closest) break; // no need for second attempt
    }

    if (!closest) {
        return NULL;
    }

    sethbmsg( str::stream() << "syncing to: " << closest->fullName(), 0);

    return closest;
}

Resyncing replica set data in practice

A MongoDB primary switchover, followed by a full resync.

Replica set layout:
PRIMARY> rs.conf()
{
    "_id" : "poptask",
    "version" : 4,
    "members" : [
        { "_id" : 0, "host" : "10.0.0.105:20011" },
        { "_id" : 1, "host" : "10.0.0.106:20011" },
        { "_id" : 2, "host" : "10.0.0.107:20011" },
        { "_id" : 3, "host" : "10.0.0.107:20012", "arbiterOnly" : true },
        { "_id" : 4, "host" : "10.0.0.106:20012", "arbiterOnly" : true },
        { "_id" : 5, "host" : "10.0.0.105:20012", "arbiterOnly" : true }
    ]
}

To move the primary from 105 to 106, the steps were:
on 107, run rs.freeze() so it sits out the election (see the sketch below);
on 105, run rs.stepDown().
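A sketch of that switchover (the durations here are illustrative, not from the original post):

// On 10.0.0.107: refuse to seek election for 120 seconds.
rs.freeze(120)
// On 10.0.0.105: step down as primary (ineligible for 60s by default),
// leaving 10.0.0.106 as the only electable data-bearing member.
rs.stepDown()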

The environment after the switchover:

10.0.0.105 (secondary)

10.0.0.106 (primary)

10.0.0.107 (secondary)

The switchover began on the evening of the 18th. 105 resynced from scratch; the data copy finished fairly quickly and the index builds ran through the next day.

Today it was 107's turn to resync, and a small surprise turned up; no harm done in the end.

After the mongod on 107 was shut down and 107's resync began, we noticed that 105 went down: it deleted its already-synced data automatically and started syncing all over again. At that point we stopped 105 and let 107 finish first.

Analysis turned up the cause:

1. 105 only looked finished; at the moment 107 was shut down, 105 was still syncing.

2. 105 had been syncing from 107 (a secondary), not from the primary 106.

3. When 107 was shut down, 105 found it could no longer reach 107, so it deleted everything it had synced and started over from the primary 106.

Lessons learned: this gave us a new appreciation of MongoDB replication; members do not always sync from the primary, and even an operation as simple as a switchover deserves caution and a full think-through beforehand.

Log on 105:
Tue Dec 18 20:00:15 [clientcursormon] mem (MB) res:61273 virt:1966680 mapped:982801
Tue Dec 18 20:02:48 [rsHealthPoll] replSet info 10.0.0.107:20012 is down (or slow to respond): socket exception
Tue Dec 18 20:02:48 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state DOWN
Tue Dec 18 20:02:48 [rsHealthPoll] replSet info 10.0.0.107:20011 is down (or slow to respond): socket exception
Tue Dec 18 20:02:48 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state DOWN
Tue Dec 18 20:02:49 [rsHealthPoll] replSet info 10.0.0.105:20012 is down (or slow to respond): socket exception
Tue Dec 18 20:02:49 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state DOWN
Tue Dec 18 20:02:49 [rsHealthPoll] replSet info 10.0.0.106:20011 is down (or slow to respond): socket exception
Tue Dec 18 20:02:49 [rsHealthPoll] replSet member 10.0.0.106:20011 is now in state DOWN
Tue Dec 18 20:02:49 [rsHealthPoll] replSet info 10.0.0.106:20012 is down (or slow to respond): socket exception
Tue Dec 18 20:02:49 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state DOWN
Tue Dec 18 20:02:50 [rsHealthPoll] replSet member 10.0.0.107:20012 is up
Tue Dec 18 20:02:50 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state ARBITER
Tue Dec 18 20:02:50 [rsHealthPoll] replSet member 10.0.0.107:20011 is up
Tue Dec 18 20:02:50 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state SECONDARY
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.105:20012 is up
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state ARBITER
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.106:20011 is up
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.106:20011 is now in state SECONDARY
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.106:20012 is up
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state ARBITER
Tue Dec 18 20:02:57 [rsHealthPoll] replSet member 10.0.0.106:20011 is now in state PRIMARY
Tue Dec 18 20:02:57 [rsSync] replSet syncing to: 10.0.0.106:20011
Tue Dec 18 20:04:15 got kill or ctrl c or hup signal 15 (Terminated), will terminate after current cmd ends
Tue Dec 18 20:04:15 [interruptThread] now exiting
Tue Dec 18 20:04:15 dbexit:
Tue Dec 18 20:04:15 [interruptThread] shutdown: going to close listening sockets...
Tue Dec 18 20:04:15 [interruptThread] closing listening socket: 7
Tue Dec 18 20:04:15 [interruptThread] closing listening socket: 8
Tue Dec 18 20:04:15 [interruptThread] closing listening socket: 9
Tue Dec 18 20:04:15 [interruptThread] removing socket file: /tmp/mongodb-20011.sock
Tue Dec 18 20:04:15 [interruptThread] shutdown: going to flush diaglog...
Tue Dec 18 20:04:15 [interruptThread] shutdown: going to close sockets...
Tue Dec 18 20:04:15 [interruptThread] shutdown: waiting for fs preallocator...
Tue Dec 18 20:04:15 [interruptThread] shutdown: lock for final commit...
Tue Dec 18 20:04:15 [interruptThread] shutdown: final commit...
Tue Dec 18 20:04:15 [interruptThread] shutdown: closing all files...
Tue Dec 18 20:04:15 [rsSync] Socket recv() errno:9 Bad file descriptor 10.0.0.106:20011
Tue Dec 18 20:04:15 [rsSync] SocketException: remote: 10.0.0.106:20011 error: 9001 socket exception [1] server [10.0.0.106:20011]
Tue Dec 18 20:04:15 ERROR: Client::shutdown not called: slaveTracking
Tue Dec 18 20:04:15 [rsSync] replSet syncThread: 10278 dbclient error communicating with server: 10.0.0.106:20011
25/491 5%
392/491 79%
Tue Dec 18 20:04:19 [interruptThread] closeAllFiles() finished
Tue Dec 18 20:04:19 [interruptThread] journalCleanup...
Tue Dec 18 20:04:19 [interruptThread] removeJournalFiles
Tue Dec 18 20:04:19 [interruptThread] shutdown: removing fs lock...
Tue Dec 18 20:04:19 dbexit: really exiting now
***** SERVER RESTARTED *****
Tue Dec 18 20:19:24 [initandlisten] MongoDB starting : pid=10105 port=20011 dbpath=/export/data/mongodb_data/ 64-bit host=xxx.com
Tue Dec 18 20:19:24 [initandlisten] db version v2.0.4, pdfile version 4.5
Tue Dec 18 20:19:24 [initandlisten] git version: 329f3c47fe8136c03392c8f0e548506cb21f8ebf
Tue Dec 18 20:19:24 [initandlisten] build info: Linux ip-10-110-9-236 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Tue Dec 18 20:19:24 [initandlisten] options: { dbpath: "/export/data/mongodb_data/", directoryperdb: true, fork: true, keyFile: "/export/data/key/key", logappend: true, logpath: "/export/data/logs/rs.log", maxConns: 12000, port: 20011, replSet: "poptask", rest: true }
Tue Dec 18 20:19:24 [initandlisten] journal dir=/export/data/mongodb_data/journal
Tue Dec 18 20:19:24 [initandlisten] recover : no journal files present, no recovery needed
Tue Dec 18 20:19:24 [initandlisten] preallocateIsFaster=true 4.48
Tue Dec 18 20:19:25 [initandlisten] preallocateIsFaster=true 8.12
Tue Dec 18 20:19:26 [initandlisten] preallocateIsFaster=true 7.96
Tue Dec 18 20:19:26 [initandlisten] preallocating a journal file /export/data/mongodb_data/journal/prealloc.0
Tue Dec 18 20:19:30 [initandlisten] preallocating a journal file /export/data/mongodb_data/journal/prealloc.1
Tue Dec 18 20:19:34 [initandlisten] preallocating a journal file /export/data/mongodb_data/journal/prealloc.2
Tue Dec 18 20:19:37 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Dec 18 20:19:37 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.105:20012
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.106:20011
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.106:20012
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.107:20011
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.107:20012
Tue Dec 18 20:19:47 [rsStart] replSet got config version 4 from a remote, saving locally
Tue Dec 18 20:19:47 [rsStart] replSet info saving a newer config version to local.system.replset
...
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.107:20011 is up
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.106:20011 is up
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.105:20012 is up
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.107:20012 is up
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.106:20011 is now in state PRIMARY
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state ARBITER
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state ARBITER
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.106:20012 is up
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state SECONDARY
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state ARBITER
 
 
 
Tue Dec 18 20:22:56 [FileAllocator] done allocating datafile /export/data/mongodb_data/local/local.26, size: 2047MB, took 7.54 secs
Tue Dec 18 20:22:58 [rsSync] ******
Tue Dec 18 20:22:58 [rsSync] replSet initial sync pending
Tue Dec 18 20:22:58 [rsSync] replSet syncing to: 10.0.0.107:20011
Tue Dec 18 20:22:58 [rsSync] build index local.me { _id: 1 }
Tue Dec 18 20:22:58 [rsSync] build index done 0 records 0 secs
Tue Dec 18 20:22:58 [rsSync] replSet initial sync drop all databases
Tue Dec 18 20:22:58 [rsSync] dropAllDatabasesExceptLocal 1
Tue Dec 18 20:22:58 [rsSync] replSet initial sync clone all databases
Tue Dec 18 20:22:58 [rsSync] replSet initial sync cloning db: admin
 
Tue Dec 18 23:00:36 [rsSync] 643882501 objects cloned so far from collection poptask_mg.TASK_SUCCESS
Tue Dec 18 23:00:48 [rsSync] build index poptask_mg.TASK_SUCCESS { _id: 1 }
Tue Dec 18 23:00:52 [clientcursormon] mem (MB) res:58149 virt:828136 mapped:413735
2137000/644712203 0%
4484800/644712203 0%
6770300/644712203 1%
...
643722400/644712203 99%
Tue Dec 18 23:48:50 [rsSync] external sort used : 645 files in 2881 secs
116300/644712203 0%
...
643308900/644712203 99%
Wed Dec 19 05:42:30 [rsSync] done building bottom layer, going to commit
Wed Dec 19 05:45:33 [rsSync] build index done 644712203 records 24284.9 secs
Wed Dec 19 05:45:35 [rsSync] build index poptask_mg.TASK_EXPIRED { _id: 1 }
Wed Dec 19 05:45:35 [rsSync] build index done 8 records 0.03 secs
Wed Dec 19 05:45:41 [rsSync] build index poptask_mg.TASK_FAILURE { _id: 1 }
Wed Dec 19 05:45:42 [rsSync] build index done 115927 records 0.62 secs
Wed Dec 19 05:45:42 [rsSync] build index poptask_mg.TASK_TO_DO { _id: 1 }
Wed Dec 19 05:45:42 [rsSync] build index done 0 records 0.01 secs
Wed Dec 19 05:46:35 [clientcursormon] mem (MB) res:25750 virt:869184 mapped:434205
Wed Dec 19 05:46:42 [rsSync] clone poptask_mg.TASK_SUCCESS201211 3464575
Wed Dec 19 05:46:43 [rsSync] 3530351 objects cloned so far from collection poptask_mg.TASK_SUCCESS201211
 
Wed Dec 19 07:31:46 [rsSync] info: indexing in foreground on this replica; was a background index build on the primary
Wed Dec 19 07:31:46 [rsSync] build index poptask_mg.TASK_SUCCESS { taskId: 1.0 }
Wed Dec 19 07:31:49 [clientcursormon] mem (MB) res:53381 virt:1131366 mapped:565213
2164200/644712203 0%
...
643273700/644712203 99%
Wed Dec 19 08:23:09 [rsSync] external sort used : 645 files in 3083 secs
 
Wed Dec 19 13:25:07 [rsSync] done building bottom layer, going to commit
Wed Dec 19 13:25:22 [rsSync] old journal file will be removed: /export/data/mongodb_data/journal/j._115
Wed Dec 19 13:25:22 [rsSync] old journal file will be removed: /export/data/mongodb_data/journal/j._116
Wed Dec 19 13:26:25 [rsSync] DR101 latency warning on journal file open 659ms
Wed Dec 19 13:27:44 [rsSync] build index done 644712203 records 21357.6 secs
Wed Dec 19 13:27:44 [rsSync] info: indexing in foreground on this replica; was a background index build on the primary
Wed Dec 19 13:27:44 [rsSync] build index poptask_mg.TASK_SUCCESS { externalTaskId: 1.0 }
1451500/644712203 0%
...
644000000/644712203 99%
Wed Dec 19 14:34:52 [rsSync] external sort used : 645 files in 4027 secs
96200/644712203 0%
226100/644712203 0%
...
643000000/644712203 99%
Thu Dec 20 02:05:41 [rsSync] external sort used : 645 files in 3595 secs
169700/644712203 0%
...
643867200/644712203 99%
Thu Dec 20 06:44:39 [rsSync] done building bottom layer, going to commit
 
Thu Dec 20 06:46:25 [rsSync] DR101 latency warning on journal file open 1695ms
 
Thu Dec 20 06:47:11 [rsSync] build index done 644712203 records 20485.7 secs
Thu Dec 20 06:47:11 [rsSync] info: indexing in foreground on this replica; was a background index build on the primary
...
Thu Dec 20 06:47:16 [rsSync] build index poptask_mg.TASK_SUCCESS201211 { taskId: 1.0 }
1271200/33636123 3%
...
Thu Dec 20 09:49:13 [rsSync] done building bottom layer, going to commit
Thu Dec 20 09:49:17 [rsSync] old journal file will be removed: /export/data/mongodb_data/journal/j._159
Thu Dec 20 09:49:38 [rsSync] old journal file will be removed: /export/data/mongodb_data/journal/j._160
Thu Dec 20 09:50:01 [rsSync] build index done 197750701 records 2770.61 secs
Thu Dec 20 09:50:01 [rsSync] replSet initial sync query minValid
Thu Dec 20 09:50:12 [rsSync] replSet initial oplog application from 10.0.0.107:20011 starting at Dec 18 20:22:52:11 to Dec 20 09:49:55:d2
Thu Dec 20 09:50:29 [rsSync] replSet initialSyncOplogApplication applied 84000 operations, synced to Dec 18 20:35:44:1c
Thu Dec 20 09:50:40 [rsSync] replSet initialSyncOplogApplication applied 219000 operations, synced to Dec 18 20:59:46:8
...
Thu Dec 20 10:18:29 [rsSync] replSet initialSyncOplogApplication applied 13481000 operations, synced to Dec 20 00:58:42:123
Thu Dec 20 10:18:35 [rsSync] Socket recv() errno:104 Connection reset by peer 10.0.0.107:20011
Thu Dec 20 10:18:35 [rsSync] SocketException: remote: 10.0.0.107:20011 error: 9001 socket exception [1] server [10.0.0.107:20011]
Thu Dec 20 10:18:35 [rsSync] replSet initial sync failing, error applying oplog 10278 dbclient error communicating with server: 10.0.0.107:20011
Thu Dec 20 10:18:35 [rsSync] Socket flush send() errno:32 Broken pipe 10.0.0.107:20011
Thu Dec 20 10:18:35 [rsSync] caught exception (socket exception) in destructor (~PiggyBackData)
Thu Dec 20 10:18:35 [rsSync] replSet initial sync failed during applyoplog
Thu Dec 20 10:18:35 [rsSync] replSet cleaning up [1]
Thu Dec 20 10:18:35 [rsSync] replSet cleaning up [2]
Thu Dec 20 10:18:36 [rsHealthPoll] DBClientCursor::init call() failed
Thu Dec 20 10:18:36 [rsHealthPoll] replSet info 10.0.0.107:20011 is down (or slow to respond): DBClientBase::findN: transport error: 10.0.0.107:20011 query: { replSetHeartbeat: "poptask", v: 4, pv: 1, checkEmpty: false, from: "10.0.0.105:20011" }Thu Dec 20 10:18:36 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state DOWN
Thu Dec 20 10:18:41 [rsSync] replSet initial sync pending
Thu Dec 20 10:18:41 [rsSync] replSet syncing to: 10.0.0.106:20011
Thu Dec 20 10:18:41 [rsSync] replSet initial sync drop all databases
Thu Dec 20 10:18:41 [rsSync] dropAllDatabasesExceptLocal 3
Thu Dec 20 10:18:41 [rsSync] removeJournalFiles
Thu Dec 20 10:18:42 [rsSync] removeJournalFiles
Thu Dec 20 10:18:43 [conn22750] warning: virtual size (480327MB) - mapped size (471258MB) is large (9003MB). could indicate a memory leak

107's resync eventually completed:
Sat Dec 22 02:00:07 [rsSync] replSet initialSyncOplogApplication applied 21693000 operations, synced to Dec 22 01:08:12:e
Sat Dec 22 02:00:15 [rsSync] replSet initial sync finishing up
Sat Dec 22 02:00:15 [rsSync] replSet set minValid=50d49943:20
Sat Dec 22 02:00:15 [rsSync] build index local.replset.minvalid { _id: 1 }
Sat Dec 22 02:00:15 [rsSync] build index done 0 records 0.021 secs
Sat Dec 22 02:00:15 [rsSync] replSet initial sync done
Sat Dec 22 02:00:16 [rsSync] replSet syncing to: 10.0.0.106:20011
Sat Dec 22 02:00:16 [rsSync] replSet SECONDARY

Shell session on 107 (flushing the page cache, checking the running mongod processes, then rebooting):
[root@107 ~]# sync
[root@107 ~]# free -m
total used free shared buffers cached
Mem: 64454 64287 167 0 305 59896
-/+ buffers/cache: 4085 60369
Swap: 16386 560 15825
[root@107 ~]# echo 3 >/proc/sys/vm/drop_caches
[root@107 ~]# free -m
total used free shared buffers cached
Mem: 64454 56087 8367 0 1 52068
-/+ buffers/cache: 4017 60437
Swap: 16386 560 15825
[root@107 ~]# free -g
total used free shared buffers cached
Mem: 62 54 8 0 0 50
-/+ buffers/cache: 3 59
Swap: 16 0 15
numactl --interleave=all /export/servers/mongodb/bin/mongod --replSet poptask --keyFile /export/data/key/key --port 20011 --dbpath /export/data/mongodb_data/ --logpath /export/data/logs/rs.log --logappend --rest --directoryperdb --maxConns 12000 --fork
root 11095 9799 0 10:51 pts/0 00:00:00 grep mongo
numactl --interleave=all /export/servers/mongodb/bin/mongod -f /export/servers/mongodb/etc/arbiter.cnf
[root@107 ~]# init 6

[root@107 ~]# free -m
total used free shared buffers cached
Mem: 64454 430 64024 0 13 251
-/+ buffers/cache: 166 64288
Swap: 16386 0 16386

After 107 rebooted (with the mongod on 105 still stopped but its arbiter running), the primary 106 stepped down and the application could no longer write; that counts as an incident.
The reason for the step-down: half of the replica set's members were down. With 6 members a majority is 4, and 106 could see only itself plus two arbiters (3 of 6), so it relinquished primary.

Log on 106:
Mon Dec 24 10:52:30 [rsHealthPoll] DBClientCursor::init call() failed
Mon Dec 24 10:52:30 [rsHealthPoll] replSet info 10.0.0.107:20011 is down (or slow to respond): DBClientBase::findN: transport error: 10.0.0.107:20011 query: { replSetHeartbeat: "poptask",
v: 4, pv: 1, checkEmpty: false, from: "10.0.0.106:20011" }
Mon Dec 24 10:52:30 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state DOWN
Mon Dec 24 10:52:37 [rsHealthPoll] DBClientCursor::init call() failed
Mon Dec 24 10:52:37 [rsHealthPoll] replSet info 10.0.0.107:20012 is down (or slow to respond): DBClientBase::findN: transport error: 10.0.0.107:20012 query: { replSetHeartbeat: "poptask",
v: 4, pv: 1, checkEmpty: false, from: "10.0.0.106:20011" }
Mon Dec 24 10:52:37 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state DOWN
Mon Dec 24 10:52:57 [rsMgr] can't see a majority of the set, relinquishing primary
Mon Dec 24 10:52:57 [rsMgr] replSet relinquishing primary state
Mon Dec 24 10:52:57 [rsMgr] replSet SECONDARY
Mon Dec 24 10:52:57 [rsMgr] replSet closing client sockets after reqlinquishing primary
Mon Dec 24 10:52:57 [rsHealthPoll] replSet info 10.0.0.106:20012 is down (or slow to respond): socket exception
Mon Dec 24 10:52:57 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state DOWN
Mon Dec 24 10:52:57 [rsMgr] replSet can't see a majority, will not try to elect self
Mon Dec 24 10:52:59 [rsHealthPoll] replSet info 10.0.0.105:20012 is down (or slow to respond): socket exception
Mon Dec 24 10:52:59 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state DOWN
Mon Dec 24 10:52:59 [rsHealthPoll] replSet member 10.0.0.106:20012 is up
Mon Dec 24 10:52:59 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state ARBITER
Mon Dec 24 10:53:01 [rsHealthPoll] replSet member 10.0.0.105:20012 is up
Mon Dec 24 10:53:01 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state ARBITER
Mon Dec 24 10:56:08 [clientcursormon] mem (MB) res:60093 virt:1373714 mapped:685986
Mon Dec 24 10:57:53 [rsHealthPoll] replSet member 10.0.0.107:20011 is up
Mon Dec 24 10:57:53 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state SECONDARY
Mon Dec 24 10:57:53 [rsMgr] not electing self, 10.0.0.105:20012 would veto
Mon Dec 24 10:57:58 [rsMgr] replSet info electSelf 1
Mon Dec 24 10:57:58 [rsMgr] replSet PRIMARY
Mon Dec 24 10:58:45 [rsHealthPoll] DBClientCursor::init call() failed
Mon Dec 24 10:58:45 [rsHealthPoll] replSet info 10.0.0.107:20011 is down (or slow to respond): DBClientBase::findN: transport error: 10.0.0.107:20011 query: { replSetHeartbeat: "poptask",
v: 4, pv: 1, checkEmpty: false, from: "10.0.0.106:20011" }
Mon Dec 24 10:58:45 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state DOWN
Mon Dec 24 10:59:05 [rsMgr] can't see a majority of the set, relinquishing primary
Mon Dec 24 10:59:05 [rsMgr] replSet relinquishing primary state
Mon Dec 24 10:59:05 [rsMgr] replSet SECONDARY
Mon Dec 24 10:59:05 [rsMgr] replSet closing client sockets after reqlinquishing primary
Mon Dec 24 10:59:05 [rsHealthPoll] replSet info 10.0.0.105:20012 is down (or slow to respond): socket exception
Mon Dec 24 10:59:05 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state DOWN
Mon Dec 24 10:59:06 [rsHealthPoll] replSet info 10.0.0.106:20012 is down (or slow to respond): socket exception
Mon Dec 24 10:59:06 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state DOWN
Mon Dec 24 10:59:07 [rsHealthPoll] replSet member 10.0.0.105:20012 is up
Mon Dec 24 10:59:07 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state ARBITER
Mon Dec 24 10:59:08 [rsHealthPoll] replSet member 10.0.0.106:20012 is up
Mon Dec 24 10:59:08 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state ARBITER
Mon Dec 24 10:59:41 [rsHealthPoll] replSet member 10.0.0.107:20011 is up
Mon Dec 24 10:59:41 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state SECONDARY
Mon Dec 24 10:59:41 [rsMgr] not electing self, 10.0.0.105:20012 would veto
Mon Dec 24 10:59:46 [rsMgr] replSet info electSelf 1
Mon Dec 24 10:59:46 [rsMgr] replSet PRIMARY
Mon Dec 24 11:01:08 [clientcursormon] mem (MB) res:60108 virt:1373684 mapped:685986
Mon Dec 24 11:01:14 [rsHealthPoll] replSet member 10.0.0.107:20012 is up
Mon Dec 24 11:01:14 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state STARTUP2
Mon Dec 24 11:01:16 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state ARBITER
Mon Dec 24 11:06:08 [clientcursormon] mem (MB) res:60132 virt:1373462 mapped:685986

OS upgrade on 105:
***** SERVER RESTARTED *****
Mon Dec 24 16:15:23 permissions on /export/data/mongodb/key/key are too open
Mon Dec 24 16:15:23 dbexit:
Mon Dec 24 16:15:23 shutdown: going to close listening sockets...
Mon Dec 24 16:15:23 shutdown: going to flush diaglog...
Mon Dec 24 16:15:23 shutdown: going to close sockets...
Mon Dec 24 16:15:23 shutdown: waiting for fs preallocator...
Mon Dec 24 16:15:23 shutdown: lock for final commit...
Mon Dec 24 16:15:23 shutdown: final commit...
Mon Dec 24 16:15:23 shutdown: closing all files...
Mon Dec 24 16:15:23 closeAllFiles() finished
Mon Dec 24 16:15:23 dbexit: really exiting now
