Months ago we talked about how to capture MongoDB data changes. The problem with that approach was that if your script stopped for any reason, you would lose the data changed during the downtime.
Now we have a new solution: resume reading from the point in time where you last stopped. MongoDB uses the BSON Timestamp type internally, for example in the replication oplog. We can store that same Timestamp somewhere and later read from the exact point we reached last time.
In Python you can import it like below:
from bson.timestamp import Timestamp
Now, to resume from that point, load the timestamp from wherever you saved it and query the oplog:
ts = YOUR_TIMESTAMP_HERE
cursor = oplog.find({'ts': {'$gt': ts}},
                    cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                    oplog_replay=True)
As you traverse the cursor and catch MongoDB changes, store the new timestamp that resides in the ts field of each document fetched from the oplog. Wrap the read in a while True loop and keep reading as long as the cursor is alive. The point of this post is that you can persist ts somewhere and resume from the point you stored. If you remember, we previously got the last changes with the query below:
last = oplog.find().sort('$natural', pymongo.DESCENDING).limit(1).next()
ts = last['ts']
We read the last ts and started from the most recent record, which is why we were missing data.
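Putting it all together, here is a minimal sketch of a resumable tailing loop. load_ts, save_ts, and handle_change are hypothetical helpers you would implement yourself (e.g. on top of a file or Redis):

import time
import pymongo
from bson.timestamp import Timestamp

client = pymongo.MongoClient('localhost', 27017)
oplog = client.local.oplog.rs

# load_ts/save_ts are hypothetical: persist the timestamp in a file,
# Redis, another collection, etc.
ts = load_ts() or Timestamp(0, 1)  # fall back to the epoch on first run

while True:
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            handle_change(doc)  # hypothetical: your own change handler
            ts = doc['ts']
            save_ts(ts)         # the resume point now survives restarts
        time.sleep(1)           # no new data yet; wait before retrying
    # the cursor died (rollover, network issue, ...): re-query from last ts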
#mongodb #mongo #replication #oplog #timestamp #cursor
To get a random document from a MongoDB collection you can use the aggregation framework:
db.users.aggregate( [ { $sample: { size: 1 } } ] )
NOTE: MongoDB 3.2 introduced $sample to the aggregation pipeline. Read more here: https://www.mongodb.com/blog/post/how-to-perform-random-queries-on-mongodb
This method is the fastest and most efficient way to get random data from a huge collection, e.g. one with 100M records.
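The same pipeline from pymongo, as a minimal sketch (my_db and users are assumed names):

import pymongo

client = pymongo.MongoClient('localhost', 27017)
# pull one pseudo-random document from the users collection
random_doc = next(client.my_db.users.aggregate([{'$sample': {'size': 1}}]))
print(random_doc)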
#mongodb #mongo #aggregate #sample #random
In pymongo you can give a name to your connections. This definitely helps when debugging issues or tracing entries in the MongoDB logs. The most important part of this scenario is when you use a microservice architecture and have tens of modules that work independently of each other and all send their requests to MongoDB:
mc = pymongo.MongoClient(host, port, appname='YOUR_APP_NAME')
Now if you look at the MongoDB log you would see:
I COMMAND [conn173140] command MY_DB.users appName: "YOUR_APP_NAME" command: find { find: "deleted_users", filter: {}, sort: { acquired_date: 1 }, skip: 19973, limit: 1000, $readPreference: { mode: "secondaryPreferred" }, $db: "blahblah" } planSummary: COLLSCAN keysExamined:0 docsExamined:19973 hasSortStage:1 cursorExhausted:1 numYields:312 nreturned:0 reslen:235 locks:{ Global: { acquireCount: { r: 626 } }, Database: { acquireCount: { r: 313 } }, Collection: { acquireCount: { r: 313 } } } protocol:op_query 153ms
In the above log you can see YOUR_APP_NAME as the appName.
#mongodb #mongo #pymongo #appname
Create and assign a self-signed certificate with the bash script below:
- https://gist.github.com/alirezastack/30c8849e4add4329dcc2633fbb06a638
#mongodb #ssl #self_signed
Simple bash script to take nightly MongoDB backups:
#!/bin/sh
# name the backup folder after today's date, e.g. 112318
DIR=`date +%m%d%y`
DEST=/db_backups/$DIR
mkdir $DEST
mongodump -h <your_database_host> -d <your_database_name> -u <username> -p <password> -o $DEST
NOTE: the /db_backups folder should already exist; create it with mkdir /db_backups.
Put the script in a crontab for nightly backups. First open the crontab:
sudo crontab -e
Create a new line (entry) in crontab and paste the below cron task:
45 1 * * * ../../scripts/db_backup.sh
NOTE: here our script is called db_backup.sh; use your own script name as needed, and make it executable with chmod +x /your/full_path/scripts/db_backup.sh
#mongodb #backup #cron #cronjob #coderwall #mongodump #bash
How to ignore extra fields for schema validation in Mongoengine?
Some records currently have extra fields that are not included in my model schema (by error, but I want to handle these cases). When I try to query the DB and map the records onto the schema, I get the following error:
FieldDoesNotExist: The field 'X' does not exist on the document 'Y'
To ignore this error when extra fields are present while fetching data, set strict to False in your meta dictionary:
class User(Document):
    email = StringField(required=True, unique=True)
    password = StringField()
    meta = {'strict': False}
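A quick usage sketch under that model (the database name my_db is made up; without strict set to False, iterating over documents that carry unknown fields would raise FieldDoesNotExist):

from mongoengine import connect

connect('my_db')  # hypothetical database name

# extra, unmodelled fields on stored documents are now simply ignored
for user in User.objects:
    print(user.email)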
#mongodb #mongo #python #mongoengine #strict #FieldDoesNotExist
In MongoDB you can remove duplicate documents based on a specific field:
db.yourCollection.aggregate([
{ "$group": {
"_id": { "yourDuplicateKey": "$yourDuplicateKey" },
"dups": { "$push": "$_id" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
doc.dups.shift();
db.yourCollection.remove({ "_id": {"$in": doc.dups }});
});
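If you prefer doing this from Python, a rough pymongo equivalent could look like the sketch below (my_db, yourCollection, and yourDuplicateKey are placeholder names):

import pymongo

client = pymongo.MongoClient('localhost', 27017)
coll = client.my_db.yourCollection

pipeline = [
    {'$group': {
        '_id': {'yourDuplicateKey': '$yourDuplicateKey'},
        'dups': {'$push': '$_id'},
        'count': {'$sum': 1},
    }},
    {'$match': {'count': {'$gt': 1}}},
]
for group in coll.aggregate(pipeline, allowDiskUse=True):
    # keep the first _id, delete the rest (same idea as shift() above)
    coll.delete_many({'_id': {'$in': group['dups'][1:]}})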
The shell version uses aggregation to group documents by the given key, pushing each _id into the dups field and keeping a count in the count field. $match then keeps only the groups with a count greater than 1. Finally it loops over each group and removes all duplicate documents except the first one (`shift` drops the first _id from the list, so that one is kept).
#mongodb #mongo #duplicates #duplication
How to configure a Delayed Replica Set Member?
Let's assume that our member is third in the array of replica members:
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
cfg.members[2].slaveDelay = 3600
rs.reconfig(cfg)
The priority is set to 0, preventing the member from being elected as primary. The hidden flag is set to true to hide the node from clients querying the database. And finally slaveDelay is set to the number of seconds we want the member to lag behind the primary node.
The use case for this is a replica that is used for analytical purposes, for backups, and so on.
#mongodb #mongo #replica #replication #primary #delayed_replica_set #slaveDelay
How to add self-signed certificates to replica set nodes?
https://medium.com/@rossbulat/deploy-a-3-node-mongodb-3-6-replica-set-with-x-509-authentication-self-signed-certificates-d539fda94db4
#mongo #mongodb #ssl #self_signed #openssl
To see how far your MongoDB slave is behind the primary node:
rs0:SECONDARY> db.printSlaveReplicationInfo()
source: mongo.mongo.com:27017
syncedTo: Mon Nov 12 2018 06:33:40 GMT+0000 (UTC)
-4 secs (0 hrs) behind the primary
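If you would rather check this from Python, here is a small sketch using the replSetGetStatus admin command (connection details are assumptions):

import pymongo

client = pymongo.MongoClient('localhost', 27017)
status = client.admin.command('replSetGetStatus')

# estimate each secondary's lag by comparing optimes against the primary's
primary = next(m for m in status['members'] if m['stateStr'] == 'PRIMARY')
for member in status['members']:
    if member['stateStr'] == 'SECONDARY':
        lag = (primary['optimeDate'] - member['optimeDate']).total_seconds()
        print('%s: %.0f secs behind the primary' % (member['name'], lag))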
#mongodb #mongo #slave #printSlaveReplicationInfo #replica #replication
How to check MongoDB replication lag in Icinga2 and get notified when it is over 15 seconds?
We assume here that you have a replica set in place. First download the Python script for our Nagios plugin:
cd /usr/lib/nagios/plugins
git clone git://github.com/mzupan/nagios-plugin-mongodb.git
Now the Icinga2 part. You first need to create a command for the replication lag check:
cd /etc/icinga2/conf.d/commands
Create a new file replication_lag.conf:
object CheckCommand "check_replication_lag" {
  import "plugin-check-command"
  command = [ PluginDir + "/nagios-plugin-mongodb/check_mongodb.py", "-A", "replication_lag" ]
  arguments = {
    "-H" = "$mongo_host$"
    "-P" = "$mongo_port$"
  }
}
Create a new file in the services folder called replication_lag.conf:
apply Service for (display_name => config in host.vars.replication) {
  import "generic-service"
  check_command = "check_replication_lag"
  vars += config
  assign where host.vars.replication
}
This service gets enabled wherever it finds replication in the host config. Now add the part below to each secondary MongoDB host's configuration:
vars.replication["Secondary DB"] = {
  mongo_host = "slave.example.com"
  mongo_port = 27017
}
#sysadmin #icinga2 #mongodb #replication #replication_lag #nagios_plugin
MongoDB server Load Average: 0.5 (it can reach 16)
Database Size: 100GB (it is compressed; in MySQL it reaches 300GB in size!)
Req/Sec: 500
Our server seems hungry for more requests and more data.
#mongodb #mongo #awesomeness
In MongoDB you can use $regex to find documents based on a regex pattern:
my_col.find({'name': {'$regex': '^ali.*'}})
It will find all users whose names start with ali. Now let's say you want to search users by their phone country code, which has a + in the number, like +98901... The + must be escaped in the regex, and the backslash itself must be escaped in the Python string, so you escape it twice:
my_col.find({'phone': {'$regex': '^\\+98.*'}})
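As a runnable sketch, re.escape() can build that escaping for you (my_db is an assumed database name):

import re
import pymongo

client = pymongo.MongoClient('localhost', 27017)
my_col = client.my_db.users

# names starting with "ali"
print(list(my_col.find({'name': {'$regex': '^ali.*'}})))

# phones starting with +98; re.escape() inserts the backslash for us
pattern = '^' + re.escape('+98') + '.*'
print(list(my_col.find({'phone': {'$regex': pattern}})))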
#mongoDB #pymongo #regex
Is there a way to create an ObjectId from an INT in MongoDB?
import bson
def object_id_from_int(n):
    s = str(n)
    s = '0' * (24 - len(s)) + s
    return bson.ObjectId(s)

def int_from_object_id(obj):
    return int(str(obj))

n = 12345
obj = object_id_from_int(n)
n = int_from_object_id(obj)
print(repr(obj))  # ObjectId('000000000000000000012345')
print(n)  # 12345
#mongodb #objectid #pymongo #python #bson #int
Did you know you can use jsonSchema in MongoDB to search for documents?
Let's say you have a customers collection with the data below:
{ "_id" : ObjectId("5f64bd1eca8806f2c04fcbe3"), "customer_id" : 100, "username" : "john" }
{ "_id" : ObjectId("5f64bd1eca8806f2c04fcbe5"), "customer_id" : 206, "username" : "new_customer" }
{ "_id" : ObjectId("60420df441558d6671cf54f2"), "customer_id" : "123", "username" : "Ali" }
Now let's say you want to find all documents that have a customer_id of type string instead of int. In the Mongo shell:
let ms = {required: ["customer_id"], properties: {customer_id: {bsonType: "string"}}}
This schema says: look for documents that have a customer_id field of string type. To search:
> db.customers.find({$jsonSchema: ms})
{ "_id" : ObjectId("60420df441558d6671cf54f2"), "customer_id" : "123", "username" : "Ali" }
Interesting, right? :)
#database #mongodb #jsonSchema #json_schema