Tech C**P
Python and Linux instructor and programmer @alirezastack
Months ago we talked about how to capture MongoDB data changes. The problem with that article's approach was that if your script stopped for any reason, you would lose the changes made during the downtime.

Now we have a new solution that lets you resume reading from the point in time where you last left off. MongoDB uses the BSON Timestamp type internally, for example in the replication oplog. We can take that same Timestamp, store it somewhere, and later read from the exact point we had reached.

In Python you can import it like below:

from bson.timestamp import Timestamp
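
For example, here is a minimal sketch of persisting that timestamp to a local file; the file name and helper names are assumptions for illustration, not part of pymongo:

def save_ts(ts, path='last_ts.txt'):
    # store the Timestamp as "seconds,increment"
    with open(path, 'w') as f:
        f.write('%d,%d' % (ts.time, ts.inc))

def load_ts(path='last_ts.txt'):
    # rebuild the Timestamp from the stored pair
    with open(path) as f:
        seconds, inc = f.read().split(',')
    return Timestamp(int(seconds), int(inc))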


Now, to read data from that point, load the timestamp from wherever you saved it and query the oplog for everything newer:

import pymongo

client = pymongo.MongoClient()  # adjust host/port for your deployment
oplog = client.local.oplog.rs   # the replica set oplog collection
ts = YOUR_TIMESTAMP_HERE        # e.g. load_ts() from the sketch above
cursor = oplog.find({'ts': {'$gt': ts}},
                    cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                    oplog_replay=True)

After iterating over the cursor and catching MongoDB changes, you can store the new timestamp that resides in the ts field of each document fetched from the oplog.

Now loop while the cursor is alive and keep reading data, as sketched below. The point of this post is that you can store ts somewhere and, on restart, resume from exactly where you left off.
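
A minimal sketch of that loop, reusing the save_ts helper from the sketch above (process is a hypothetical placeholder for your own change handling):

import time

while cursor.alive:
    for doc in cursor:
        process(doc)        # handle the change (hypothetical helper)
        save_ts(doc['ts'])  # persist the position so a restart resumes here
    time.sleep(1)           # no new data yet; wait before polling again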


If you remember, before this we got the latest change with the query below:

last = oplog.find().sort('$natural', pymongo.DESCENDING).limit(1).next()
ts = last['ts']


We read the last ts and started from the most recent record, which is why we were missing anything written while the script was down.

#mongodb #mongo #replication #oplog #timestamp #cursor
In order to get a random document from a MongoDB collection you can use the aggregation framework:

db.users.aggregate([ { $sample: { size: 1 } } ])

NOTE: MongoDB 3.2 introduced $sample to the aggregation pipeline.
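
The same stage works from pymongo; a quick sketch, assuming a local client and a hypothetical mydb database:

from pymongo import MongoClient

client = MongoClient()  # adjust for your deployment
random_user = next(client.mydb.users.aggregate([{'$sample': {'size': 1}}]))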


Read more here: https://www.mongodb.com/blog/post/how-to-perform-random-queries-on-mongodb


This method is the fastest and most efficient way of getting random documents, even from a huge collection of around 100M records.

#mongodb #mongo #aggregate #sample #random
In pymongo you can give a name to your connections. This definitely helps with debugging issues or tracing requests when reading MongoDB logs. It matters most when you use a microservice architecture and have tens of modules that work independently of each other and all send their requests to MongoDB:

mc = pymongo.MongoClient(host, port, appname='YOUR_APP_NAME')


Now if you look at the MongoDB log you would see:

I COMMAND [conn173140] command MY_DB.users appName: "YOUR_APP_NAME" command: find { find: "deleted_users", filter: {}, sort: { acquired_date: 1 }, skip: 19973, limit: 1000, $readPreference: { mode: "secondaryPreferred" }, $db: "blahblah" } planSummary: COLLSCAN keysExamined:0 docsExamined:19973 hasSortStage:1 cursorExhausted:1 numYields:312 nreturned:0 reslen:235 locks:{ Global: { acquireCount: { r: 626 } }, Database: { acquireCount: { r: 313 } }, Collection: { acquireCount: { r: 313 } } } protocol:op_query 153ms

In the above log you can see appName: "YOUR_APP_NAME", which tells you exactly which service issued the command.


#mongodb #mongo #pymongo #appname
How to ignore extra fields for schema validation in Mongoengine?

Some records currently have extra fields that are not included in my model schema (added by mistake, but I want to handle these cases). When I try to query the DB and map the records onto the schema, I get the following error:

FieldDoesNotExist
The field 'X' does not exist on the document 'Y'



To ignore this error when documents have extra fields, set strict to False in your model's meta dictionary:


from mongoengine import Document, StringField

class User(Document):
    email = StringField(required=True, unique=True)
    password = StringField()
    meta = {'strict': False}
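
With strict set to False, fetching documents that carry unknown fields no longer raises the error. A quick sketch, assuming a connected database:

from mongoengine import connect

connect('mydb')              # hypothetical database name
user = User.objects.first()  # extra fields in the stored document are ignored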



#mongodb #mongo #python #mongoengine #strict #FieldDoesNotExist
In MongoDB you can remove duplicate documents based on a specific field:

db.yourCollection.aggregate([
    { "$group": {
        "_id": { "yourDuplicateKey": "$yourDuplicateKey" },
        "dups": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
    doc.dups.shift();
    db.yourCollection.remove({ "_id": { "$in": doc.dups } });
});

It uses aggregation to group documents by the given key, pushing each _id into the dups array and counting occurrences in the count field. $match then keeps only the groups whose count is greater than 1. Finally, the loop removes every duplicate except the first one (shift drops the first _id from dups, so that document survives the remove).
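
For completeness, the same cleanup from pymongo; a sketch with assumed client, database, and collection names:

from pymongo import MongoClient

coll = MongoClient().mydb.yourCollection
pipeline = [
    {'$group': {'_id': {'yourDuplicateKey': '$yourDuplicateKey'},
                'dups': {'$push': '$_id'},
                'count': {'$sum': 1}}},
    {'$match': {'count': {'$gt': 1}}},
]
for group in coll.aggregate(pipeline):
    coll.delete_many({'_id': {'$in': group['dups'][1:]}})  # keep the first copy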

#mongodb #mongo #duplicates #duplication
How to configure a Delayed Replica Set Member?

Let's assume that our member is third in the array of replica members:

cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
cfg.members[2].slaveDelay = 3600
rs.reconfig(cfg)

priority is set to 0, preventing the member from being elected primary.

hidden is set to true to hide the node from clients querying the database.

And finally slaveDelay is the number of seconds we want the member to stay behind the primary node.

The use case for this is a replica that is used for analytical purposes, for backups, and so on.

#mongodb #mongo #replica #replication #primary #delayed_replica_set #slaveDelay
To see how far your MongoDB slave is behind the primary node:

rs0:SECONDARY> db.printSlaveReplicationInfo()
source: mongo.mongo.com:27017
syncedTo: Mon Nov 12 2018 06:33:40 GMT+0000 (UTC)
-4 secs (0 hrs) behind the primary

#mongodb #mongo #slave #printSlaveReplicationInfo #replica #replication
In MongoDB you can compare one field to another using $expr:

db.users.find({ $expr: { $eq: ["$created_at", "$updated_at"] } })

Here we get users whose updated_at field equals their created_at field, which means those users have not updated their profiles yet.
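
The same query from pymongo, as a minimal sketch assuming a local client and a hypothetical mydb database (note that $expr requires MongoDB 3.6+):

from pymongo import MongoClient

client = MongoClient()  # assumed local connection
untouched_users = client.mydb.users.find(
    {'$expr': {'$eq': ['$created_at', '$updated_at']}})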

#mongodb #mongo #expr #find
MongoDB server load average: 0.5 (it can reach 16)
Database size: 100 GB (compressed; the same data in MySQL reaches 300 GB!)
Req/sec: 500

Our server seems hungry for more requests and more data.

#mongodb #mongo #awesomeness