Tech C**P
Python and Linux instructor and developer @alirezastack
What are sharding and replication in MongoDB? What are the differences between them?

MongoDB has two concepts that can confuse even intermediate programmers! So let's break them down and explain both in depth.

1- Take a deep breath. :)

2- Replication: to replicate means to reproduce or make an exact copy of something. In MongoDB replication, the entire data set is mirrored onto other servers. This process is used for fault tolerance. If there are 4 Mongo servers and your data set is 1 terabyte, each node in the replica set will hold 1 terabyte of data.
In a replica set there is ONE master (primary) node and one or more slaves (secondaries). Read performance can be improved by adding more and more slaves, but not writes! Adding more slaves does not affect writes, because all writes go to the master first and are then propagated to the other slaves.

3- Sharding: sharding, on the other hand, is a completely different concept. If you have 1 terabyte of data and 4 servers, each node will hold 250 gigabytes of data. As you may have guessed, this is not fault tolerant by itself, because each part of the data resides on a separate server. Each read and write is sent to the corresponding shard, so if you add more shards, both read and write performance improve across the cluster. When one shard of the cluster goes down, any data on it is inaccessible. For that reason each member of the cluster should also be a replica set, although it is not required to be (see the connection sketch after this list).

4- Take another deep breath, and let's get back to work.
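
To make the difference concrete, here is a rough pymongo sketch of how a client connects to each topology. The hostnames are placeholders, not a real deployment:

from pymongo import MongoClient

# Replica set: every member holds the full data set, so listing any reachable seeds is enough.
replica_client = MongoClient('mongodb://node1.example.com:27017,node2.example.com:27017/?replicaSet=rs0')

# Sharded cluster: data is partitioned across shards; clients only talk to the mongos routers.
sharded_client = MongoClient('mongodb://mongos1.example.com:27017,mongos2.example.com:27017/')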

#mongodb #mongo #shard #replica #replication #sharding #cluster
In order to connect to a MongoDB replica set in Python, you can pass all server node addresses to MongoClient. The addresses passed to MongoClient() are called the seeds. As long as at least one of the seeds is online, MongoClient discovers all the members in the replica set and determines which is the current primary and which are secondaries or arbiters.


Sample usages:

>>> MongoClient('localhost', replicaset='foo')
MongoClient(host=['localhost:27017'], replicaset='foo', ...)
>>> MongoClient('localhost:27018', replicaset='foo')
MongoClient(['localhost:27018'], replicaset='foo', ...)
>>> MongoClient('localhost', 27019, replicaset='foo')
MongoClient(['localhost:27019'], replicaset='foo', ...)
>>> MongoClient('mongodb://localhost:27017,localhost:27018/?replicaSet=foo')
MongoClient(['localhost:27017', 'localhost:27018'], replicaset='foo', ...)

Read full details here:

- http://api.mongodb.com/python/current/examples/high_availability.html#connecting-to-a-replica-set


#database #mongodb #mongo #replica_set #replication #pymongo #arbiter #master #primary #slave
Secondary Reads

By default an instance of MongoClient sends queries to the primary member of the replica set. To use secondaries for queries we have to change the read preference:

>>> client = MongoClient(
...     'localhost:27017',
...     replicaSet='foo',
...     readPreference='secondaryPreferred')
>>> client.read_preference
SecondaryPreferred(tag_sets=None)


Now all queries will be sent to the secondary members of the set. If there are no secondary members, the primary will be used as a fallback. If you have queries you would prefer never to send to the primary, you can specify that using the secondary read preference.
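
For example, if you have reporting queries that should never touch the primary, you can create a separate client with the plain 'secondary' read preference. A minimal sketch (the client name is just illustrative); with 'secondary' such queries will fail rather than fall back to the primary when no secondary is available:

>>> reporting_client = MongoClient(
...     'localhost:27017',
...     replicaSet='foo',
...     readPreference='secondary')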

#mongodb #replica_set #replication #secondary #slave #pymongo
Months ago we talked about how to capture MongoDB data changes. The problem with that article was that if your script
stopped for any reason, you would lose the data from the downtime period.

Now we have a new solution that lets you resume reading from the point in time where you last stopped. MongoDB uses a BSON Timestamp internally, for example in the replication oplog. We can use the same Timestamp and store it somewhere so that we can read again from the exact point
we reached last time.

In Python you can import it like below:

from bson.timestamp import Timestamp


Now, to read data from that point, load the timestamp from wherever you saved it and query the oplog starting from it:

ts = YOUR_TIMESTAMP_HERE
cursor = oplog.find({'ts': {'$gt': ts}},
                    cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                    oplog_replay=True)

While traversing the cursor and catching MongoDB changes, you can store the new timestamp that resides in the ts field of each document you fetch from the oplog.

Now use a while True loop and read data for as long as the cursor is alive. The point of this post is that you can store ts somewhere and later resume reading from the point where you stored it.
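
Putting it together, a minimal sketch of such a loop is shown below. It assumes oplog points at the local.oplog.rs collection, and load_ts, save_ts and handle_change are hypothetical helpers for persisting the position and processing each change:

import time

import pymongo
from pymongo import MongoClient
from bson.timestamp import Timestamp

client = MongoClient('localhost', 27017)
oplog = client.local.oplog.rs

# Resume from the last saved position, or start from "now" on the first run.
ts = load_ts() or Timestamp(int(time.time()), 0)

while True:
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            handle_change(doc)   # hypothetical helper: process the change
            save_ts(ts)          # hypothetical helper: persist the position
        time.sleep(1)            # no new data in this batch; the server also waits on AWAIT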


If you remember, previously we got the latest change with the query below:

last = oplog.find().sort('$natural', pymongo.DESCENDING).limit(1).next()
ts = last['ts']


We read the last ts and started from the latest record only, and that's why we were missing data.

#mongodb #mongo #replication #oplog #timestamp #cursor
How to configure a Delayed Replica Set Member?

Let's assume that our member is third in the array of replica members:

cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
cfg.members[2].slaveDelay = 3600
rs.reconfig(cfg)

The priority is set to 0 (preventing the member from being elected as primary).

The hidden flag is set to true in order to hide the node from clients querying the database.

And finally slaveDelay is set to the number of seconds we want the member to stay behind the primary node.

The use case for this is a replica that is used for analytics, backups, and so on.
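
If you prefer to do the same from Python instead of the mongo shell, here is a rough pymongo sketch of the same steps (the hostname is a placeholder; note that the raw replSetReconfig command needs the config version bumped manually, which rs.reconfig() does for you in the shell):

from pymongo import MongoClient

# Connect to the current primary (placeholder hostname).
client = MongoClient('mongodb://primary.example.com:27017')

cfg = client.admin.command('replSetGetConfig')['config']
cfg['members'][2]['priority'] = 0        # never elect this member as primary
cfg['members'][2]['hidden'] = True       # hide it from clients
cfg['members'][2]['slaveDelay'] = 3600   # stay one hour behind the primary
cfg['version'] += 1                      # a reconfig requires a higher config version
client.admin.command('replSetReconfig', cfg)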

#mongodb #mongo #replica #replication #primary #delayed_replica_set #slaveDelay
In order to see how far your MongoDB slave is behind the primary node:

rs0:SECONDARY> db.printSlaveReplicationInfo()
source: mongo.mongo.com:27017
syncedTo: Mon Nov 12 2018 06:33:40 GMT+0000 (UTC)
-4 secs (0 hrs) behind the primary
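
You can pull the same information programmatically with pymongo by comparing the optime dates reported by replSetGetStatus. A minimal sketch, reusing the hostname from the output above as a placeholder:

from pymongo import MongoClient

client = MongoClient('mongodb://mongo.mongo.com:27017')
status = client.admin.command('replSetGetStatus')

# optimeDate is the wall-clock time of the last operation applied on each member.
primary_optime = next(m['optimeDate'] for m in status['members']
                      if m['stateStr'] == 'PRIMARY')
for member in status['members']:
    if member['stateStr'] == 'SECONDARY':
        lag = (primary_optime - member['optimeDate']).total_seconds()
        print('%s is %d secs behind the primary' % (member['name'], int(lag)))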

#mongodb #mongo #slave #printSlaveReplicationInfo #replica #replication
How to check MongoDB replication lag in Icinga2 and get notified when it is over 15 seconds?

We assume here that you already have a replica set in place. First, download the Python script for the Nagios plugin:

cd /usr/lib/nagios/plugins
git clone git://github.com/mzupan/nagios-plugin-mongodb.git

Now the Icinga2 part. You first need to create a command for the replication lag check:

cd /etc/icinga2/conf.d/commands

Create a new file replication_lag.conf:

object CheckCommand "check_replication_lag" {
  import "plugin-check-command"
  command = [ PluginDir + "/nagios-plugin-mongodb/check_mongodb.py", "-A", "replication_lag" ]
  arguments = {
    "-H" = "$mongo_host$"
    "-P" = "$mongo_port$"
  }
}


Create a new file in the services folder called replication_lag.conf:

apply Service for (display_name => config in host.vars.replication) {
  import "generic-service"
  check_command = "check_replication_lag"
  vars += config
  assign where host.vars.replication
}


This service gets enabled wherever it finds replication in a host's config. Now, in the secondary MongoDB host's configuration, add the part below:

vars.replication["Secondary DB"] = {
  mongo_host = "slave.example.com"
  mongo_port = 27017
}

#sysadmin #icinga2 #mongodb #replication #replication_lag #nagios_plugin