What is sharding and replication in MongoDB? What are their differences?
MongoDB has two concepts that confuse even intermediate programmers, so let's break them down and explain both in depth.
1- Take a deep breath. :)
2- Replication: to replicate means to reproduce or make an exact copy of something. In MongoDB replication, all data is mirrored onto other servers. This process is used for fault tolerance. If there are 4 mongo servers and your dataset is 1 terabyte, each node in the replica set holds the full 1 terabyte of data. In a replica set there is ONE master (primary) node and one or more slaves (secondaries). Read performance can be improved by adding more slaves, but not writes! Adding more slaves does not affect write throughput, because every write goes to the master first and is then propagated to the slaves.
3- Sharding: sharding, on the other hand, is a completely different concept. If you have 1 terabyte of data and 4 servers, each node will hold roughly 250 gigabytes of it. As you may have guessed, this is not fault tolerant by itself, because each part of the data resides on a separate server. Each read and write is routed to the shard that owns that part of the data, so adding more shards improves both read and write performance in the cluster. When one shard of the cluster goes down, any data on it becomes inaccessible. For that reason each shard should ideally also be a replica set, although it is not required. (A minimal sketch of setting up a sharded collection follows right after this list.)
4- Take another deep breath, and let's get back to work.
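As a quick illustration of the sharding side, here is a minimal pymongo sketch. It assumes a mongos router reachable on localhost:27017 and uses a hypothetical mydb.users collection and user_id shard key; enableSharding and shardCollection are real MongoDB admin commands, but all names here are just examples.
from pymongo import MongoClient

# Connect to the mongos query router of an existing sharded cluster (assumed address).
client = MongoClient('localhost', 27017)

# Enable sharding for a (hypothetical) database, then shard one of its collections
# on a chosen shard key; reads and writes for a given user_id are routed to one shard.
client.admin.command('enableSharding', 'mydb')
client.admin.command('shardCollection', 'mydb.users', key={'user_id': 1})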
#mongodb #mongo #shard #replica #replication #sharding #cluster
In order to connect to a MongoDB replica set in Python, you can give all the server node addresses to MongoClient. Addresses passed to MongoClient() are called the seeds. As long as at least one of the seeds is online, MongoClient discovers all the members in the replica set and determines which is the current primary and which are secondaries or arbiters.
Sample usages:
>>> MongoClient('localhost', replicaset='foo')
MongoClient(host=['localhost:27017'], replicaset='foo', ...)
>>> MongoClient('localhost:27018', replicaset='foo')
MongoClient(['localhost:27018'], replicaset='foo', ...)
>>> MongoClient('localhost', 27019, replicaset='foo')
MongoClient(['localhost:27019'], replicaset='foo', ...)
>>> MongoClient('mongodb://localhost:27017,localhost:27018/?replicaSet=foo')
MongoClient(['localhost:27017', 'localhost:27018'], replicaset='foo', ...)
Read full details here:
- http://api.mongodb.com/python/current/examples/high_availability.html#connecting-to-a-replica-set
#database #mongodb #mongo #replica_set #replication #pymongo #arbiter #master #primary #slave
Secondary Reads
By default an instance of MongoClient sends queries to the primary member of the replica set. To use secondaries for queries we have to change the read preference:
>>> client = MongoClient(
... 'localhost:27017',
... replicaSet='foo',
... readPreference='secondaryPreferred')
>>> client.read_preference
SecondaryPreferred(tag_sets=None)
Now all queries will be sent to the secondary members of the set. If there are no secondary members the primary will be used as a fallback. If you have queries you would prefer to never send to the primary you can specify that using the secondary read preference.
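As a minimal sketch of that last case, the plain secondary read preference can be set the same way (same hypothetical replica set foo as above):
from pymongo import MongoClient

# With readPreference='secondary', queries go only to secondaries; if no
# secondary is available, the query fails instead of falling back to the primary.
client = MongoClient(
    'localhost:27017',
    replicaSet='foo',
    readPreference='secondary')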
#mongodb #replica_set #replication #secondary #slave #pymongo
Months ago we talked about how to get MongoDB data changes. The problem with that article was that if your script stopped for any reason, you would lose the changes made during the downtime.
Now there is a solution that lets you resume reading from the point in time where you last stopped. MongoDB uses the BSON Timestamp type internally, for example in the replication oplog. We can use that same Timestamp and store it somewhere so we can later read from the exact point where we left off.
In Python you can import it like below:
from bson.timestamp import Timestamp
Now, to read data from that point, load the timestamp from wherever you saved it and query the oplog starting from there:
ts = YOUR_TIMESTAMP_HERE  # the Timestamp you persisted on the previous run
# `oplog` is the local.oplog.rs collection, e.g. client.local.oplog.rs
cursor = oplog.find({'ts': {'$gt': ts}},
                    cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                    oplog_replay=True)
After iterating over the cursor and catching the MongoDB changes, store the new timestamp found in the ts field of the documents you fetch from the oplog. Wrap the read in a while True loop and keep reading for as long as the cursor is alive. The point of this post is that you can persist ts somewhere and resume reading from the point where you stored it. If you remember, we previously got the last change with the query below:
last = oplog.find().sort('$natural', pymongo.DESCENDING).limit(1).next()
ts = last['ts']
That query only reads the latest ts and starts from the newest record, which is why we were missing the data written while the script was down.
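Putting it together, here is a minimal sketch of a resumable tailing loop. It assumes a replica set member reachable on localhost and uses hypothetical load_ts(), save_ts() and handle_change() helpers for persistence and processing:
import time
import pymongo
from bson.timestamp import Timestamp

client = pymongo.MongoClient('localhost', 27017)
oplog = client.local.oplog.rs

# load_ts()/save_ts() are hypothetical helpers that persist the Timestamp
# between runs (e.g. to a file or another collection).
ts = load_ts() or Timestamp(int(time.time()), 0)

while True:
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            handle_change(doc)  # hypothetical: process the oplog entry
            ts = doc['ts']
            save_ts(ts)         # persist it so a restart resumes from here
        time.sleep(1)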
#mongodb #mongo #replication #oplog #timestamp #cursor
How to configure a Delayed Replica Set Member?
Let's assume that our member is third in the array of replica members:
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
cfg.members[2].slaveDelay = 3600
rs.reconfig(cfg)
The priority is set to 0, preventing the member from ever being elected as primary. The hidden flag is set to true in order to hide the node from clients querying the database. And finally slaveDelay is set to the number of seconds we want the member to stay behind the primary node.
The use case for this is to have a replica that is used for analytical purposes, for backups, and so on.
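For reference, the same reconfiguration can be sketched from Python through the replSetGetConfig/replSetReconfig admin commands. This is only a minimal sketch, assuming you run it against the current primary and that the delayed member is still at index 2:
from pymongo import MongoClient

client = MongoClient('localhost', 27017)  # assumed address of the current primary
cfg = client.admin.command('replSetGetConfig')['config']

# Same changes as in the shell example above, applied to the third member.
cfg['members'][2]['priority'] = 0
cfg['members'][2]['hidden'] = True
cfg['members'][2]['slaveDelay'] = 3600  # stay one hour behind the primary

cfg['version'] += 1  # a reconfig must carry a higher config version
client.admin.command('replSetReconfig', cfg)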
#mongodb #mongo #replica #replication #primary #delayed_replica_set #slaveDelay
In order to see how much time your mongoDB slave is behind the primary node:
rs0:SECONDARY> db.printSlaveReplicationInfo()
source: mongo.mongo.com:27017
syncedTo: Mon Nov 12 2018 06:33:40 GMT+0000 (UTC)
-4 secs (0 hrs) behind the primary
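If you prefer to check this from Python instead of the shell, here is a minimal sketch using the replSetGetStatus admin command (the connection address is an assumption):
from pymongo import MongoClient

client = MongoClient('localhost', 27017)  # any replica set member (assumed address)
status = client.admin.command('replSetGetStatus')

# Compare each secondary's last applied optime with the primary's.
primary_optime = next(m['optimeDate'] for m in status['members']
                      if m['stateStr'] == 'PRIMARY')
for m in status['members']:
    if m['stateStr'] == 'SECONDARY':
        lag = (primary_optime - m['optimeDate']).total_seconds()
        print('%s is %.0f seconds behind the primary' % (m['name'], lag))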
#mongodb #mongo #slave #printSlaveReplicationInfo #replica #replication
How to check MongoDB replication lag in Icinga2 and get notified when it is over 15 seconds?
We assume here that you already have a replica set in place. First, download the Python script of the Nagios plugin:
cd /usr/lib/nagios/plugins
git clone git://github.com/mzupan/nagios-plugin-mongodb.git
Now the Icinga2 part. You first need to create a command for the replication lag check:
cd /etc/icinga2/conf.d/commands
Create a new file named replication_lag.conf:
object CheckCommand "check_replication_lag" {
  import "plugin-check-command"
  command = [ PluginDir + "/nagios-plugin-mongodb/check_mongodb.py", "-A", "replication_lag" ]
  arguments = {
    "-H" = "$mongo_host$"
    "-P" = "$mongo_port$"
  }
}
Create a new file in the services folder called replication_lag.conf:
apply Service for (display_name => config in host.vars.replication) {
  import "generic-service"
  check_command = "check_replication_lag"
  vars += config
  assign where host.vars.replication
}
This service gets enabled wherever it finds replication in the host config. Now, in the configuration of your secondary MongoDB hosts, add the part below:
vars.replication["Secondary DB"] = {
  mongo_host = "slave.example.com"
  mongo_port = 27017
}
#sysadmin #icinga2 #mongodb #replication #replication_lag #nagios_plugin