Data Analysis
Create a dataframe from a dictionary in Pandas:

import pandas

data = [{'id': 1, 'name': 'alireza'}, {'id': 2, 'name': 'Mohsen'}]
# Creating a dataframe from a list of dictionaries
df = pandas.DataFrame(data)
Now if you print the dataframe:

> df
   id     name
0   1  alireza
1   2   Mohsen
NOTE: the first column is the index column. In order to turn the dataframe back into a dictionary after your aggregation, analysis, etc., just use to_dict like below:

df.to_dict(orient='records')
[{'id': 1, 'name': 'alireza'}, {'id': 2, 'name': 'Mohsen'}]
You are right, we didn't do anything useful with the records; the goal is just to show how to turn a dataframe into a dictionary, nothing more.
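For instance, here is a minimal sketch (the id filter is made up purely for illustration) of doing a little work on the frame before converting back:

df[df['id'] > 1].to_dict(orient='records')
# [{'id': 2, 'name': 'Mohsen'}]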
NOTE: on older versions of pandas you have to use outtype='records' rather than orient='records'.

#python #pandas #to_dict #outtype #orient #dictionary #dataframe
Allow outgoing requests through the server with CSF (ConfigServer Security & Firewall)

If by debugging you can confirm that your server cannot reach an external port but other servers can, just add the desired port to
CSF with the destination IP address, in the format tcp/udp|in/out|s/d=port|s/d=ip, in /etc/csf/csf.allow:
tcp|out|d=1080|d=15.9.8.223

The above line is advanced port+IP filtering: it opens port 1080 toward the destination server 15.9.8.223. You can also enter just one IP address per line to be allowed through iptables. To apply the new changes, run csf -r on the command line after saving the filter.
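For instance, a minimal sketch (reusing the example rule above) that appends the rule and reloads CSF in one go:

# append the allow rule and reload CSF so it takes effect
echo 'tcp|out|d=1080|d=15.9.8.223' >> /etc/csf/csf.allow
csf -r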
NOTE: one IP address per line is mandatory.
NOTE: CIDR addressing is allowed with a quaded IP (e.g. 192.168.254.0/24).
NOTE: only list IP addresses, not domain names (they will be ignored).

#linux #csf #iptables
How to find documents with a specific field type?
It may happen that a specific field like credit holds mixed types: some values are numeric and some are strings. In order to find NumberLong() field types you can use $type:

db.users.find({credit: {$type: "long"}})
If you want to remove those documents, use remove instead of find to delete the documents that have the wrong type.
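For example, a minimal sketch (assuming the string values are the unwanted ones here):

db.users.remove({credit: {$type: "string"}})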
It is not sensible to do that on a users collection though; it just gives you the idea.

#mongodb #mongo #type #field_type #remove #find
Get the current directory from within a bash script:
SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
echo "$SCRIPT_DIR"
dirname "$0" extracts the directory part of the script's path, cd changes into that directory, and finally pwd prints the resulting absolute path, which in our case is stored in SCRIPT_DIR.

#linux #bash #script #shell #pwd #current_directory
With Telegram filtered, users are looking for alternative messaging applications. One of the well-known applications among tech-savvy folks is Slack. It is filtered too, but not in the way Telegram is filtered: you can call the API without any proxy or SOCKS server. The app itself is filtered, but once you log in for the first time using a proxy server, you do not need a SOCKS server anymore, as you stay logged in all the time.

One of the Slack developer kits for Python is python-slackclient. Installation is easy as pie:

pip install slackclient
When you get your bot token, all you need to do to send a message to a channel is:
from slackclient import SlackClient

slack_token = "xoxp-YOUR-TOKEN"
sc = SlackClient(slack_token)
sc.api_call(
    "chat.postMessage",
    channel="#python",
    text="Hello from Python! 🎉"
)
#python is the name of your channel. For more information go to the link below:

- http://slackapi.github.io/python-slackclient/
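The call returns a dict you can inspect; here is a minimal sketch (reusing sc from above) of checking whether the post succeeded:

response = sc.api_call("chat.postMessage", channel="#python", text="Hello again!")
if not response.get("ok"):
    # the error field says what went wrong, e.g. invalid_auth or channel_not_found
    print("Slack error:", response.get("error"))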
#python #slack #python_slackclient #im #telegram
Meltdown: the latest news on two major CPU security bugs
Two major computer processor security bugs, dubbed Meltdown and Spectre, affect nearly every device made in the last 20 years. The ramifications of how much these bugs will impact computing are still playing out, but they could lead to compromised servers for cloud platforms and other farther-reaching effects.
#news #bug #meltdown #spectre #cpu
How to add authentication to MongoDB?

At first you need to create an admin user, so bring up a mongo shell by typing
mongo
in your terminal and hitting enter. The database where users are stored is admin, so switch to the admin database:

use admin
Now, by using the createUser database method, we will create a user called myUserAdmin:

db.createUser(
  {
    user: "myUserAdmin",
    pwd: "1234qwer",
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
  }
)
Disconnect the mongo shell.
The important note about mongo is to run it with the --auth argument, otherwise authentication will not work:

mongod --auth --port 27017 --dbpath /data/db1
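After restarting mongod with --auth, connecting with the credentials created above looks like this (standard mongo shell flags):

mongo --port 27017 -u myUserAdmin -p 1234qwer --authenticationDatabase admin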
#mongodb #mongo #auth #authentication #create_user
What is
shard and replication in MongoDB? What is the difference?

MongoDB has two concepts that may confuse even intermediate programmers! So let's break them down and explain both in depth.
1- Take a deep breath. :)
2- Replication: to replicate means to reproduce or make an exact copy of something. In MongoDB replication, all data sets are mirrored onto other servers. This process is used for fault tolerance. If there are 4 mongo servers and your dataset is 1 terabyte, each node in the replica-set will hold 1 terabyte of data.

In a replica-set there is ONE master (primary) node and one or more slaves (secondaries). Read performance can be improved by adding more and more slaves, but not writes! Adding more slaves does not affect writes, because all writes go to the master first and are then propagated to the slaves.

3- Sharding: sharding, on the other hand, is a completely different concept. If you have 1 terabyte of data and 4 servers, each node will hold 250 gigabytes. As you may have guessed, it is not fault tolerant on its own, because each part of the data resides on a separate server. Each read and write is sent to the corresponding shard, so if you add more shards, both read and write performance improve across the cluster. When one shard of the cluster goes down, any data on it is inaccessible. For that reason each member of the cluster should itself be a replica-set, though it is not required to be.
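If you want to check which setup a shell is connected to, these standard mongo shell helpers give a quick view (the output depends on your deployment):

rs.status()   // replica-set view: who is primary, who are the secondaries
sh.status()   // sharded-cluster view: shards, chunk distribution, balancer state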
4- Take another deep breath, and let's get back to work.
#mongodb #mongo #shard #replica #replication #sharding #cluster
Migrate a running process into tmux
reptyr is a utility for taking an existing running program and attaching it to a new terminal. Started a long-running process over ssh, but have to leave and don't want to interrupt it? Just start a screen or tmux, use reptyr to grab the process, and then kill the ssh session and head on home.

sudo apt-get install -y reptyr  # For Ubuntu users
Send the current foreground job to the background using CTRL-Z. List all the background jobs using jobs -l; this will get you the PID:

jobs -l
[1] + 16189 suspended  vim foobar.rst

Here the PID is 16189. Start a new
tmux or screen session. I will be using tmux:

tmux
Reattach the background process using:
reptyr 16189
If this error appears:
Unable to attach to pid 16189: Operation not permitted
The kernel denied permission while attaching
Then type in the following command as root:
echo 0 > /proc/sys/kernel/yama/ptrace_scope
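Putting it all together, a recap sketch of the whole flow (the disown step is an extra precaution not covered above; replace the PID with your own):

# in the original ssh session: press CTRL-Z to suspend the job, then
bg        # resume the job in the background
jobs -l   # note its PID
disown    # detach it from this shell so logging out won't kill it

# in the new tmux session
reptyr 16189   # use the PID you noted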
#reptyr #tmux #screen #pid
1. List all Open Files with lsof Command
> lsof
COMMAND PID USER FD  TYPE DEVICE SIZE/OFF   NODE NAME
init      1 root cwd  DIR  253,0     4096      2 /
init      1 root rtd  DIR  253,0     4096      2 /
init      1 root txt  REG  253,0   145180 147164 /sbin/init
init      1 root mem  REG  253,0  1889704 190149 /lib/libc-2.12.so
The FD column stands for File Descriptor. Its values are as below:

- cwd: current working directory
- rtd: root directory
- txt: program text (code and data)
- mem: memory-mapped file

To get the count of open files you can use wc -l with lsof as follows:

lsof | wc -l
2. List a Specific User's Open Files
lsof -u alireza
COMMAND  PID    USER FD  TYPE DEVICE SIZE/OFF NODE NAME
sshd    1838 alireza cwd  DIR  253,0     4096    2 /
sshd    1838 alireza rtd  DIR  253,0     4096    2 /
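Two more invocations that come in handy (the PID is sshd's from the output above; port 80 is just an example):

lsof -p 1838   # files opened by a specific process
lsof -i :80    # network sockets involving port 80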
#linux #sysadmin #lsof #wc #file_descriptor
Today we encountered slowness on MongoDB that affected our entire infrastructure. The problem was that slowness on some specific mongo queries caused all the other queries to wait. YES, we use indexes, and YES, we ran explain on those queries and saw that they were using the index. To mitigate the issue we had to kill very slow
find queries until we fixed the issue.

The function below kills slow queries:
function (sec) {
  db.currentOp()['inprog'].forEach(function (query) {
    if (query.op !== 'query') { return; }
    if (query.secs_running < sec) { return; }
    print(['Killing query:', query.opid, 'which was running:', query.secs_running, 'sec.'].join(' '));
    db.killOp(query.opid);
  });
}
We need to save this function in mongo itself and run it there directly. To save the above function in MongoDB use db.system.js.save:

db.system.js.save({
  _id: "kill_slow_queries",
  value: function (sec) {
    db.currentOp()['inprog'].forEach(function (query) {
      if (query.op !== 'query') { return; }
      if (query.secs_running < sec) { return; }
      print(['Killing query:', query.opid, 'which was running:', query.secs_running, 'sec.'].join(' '));
      db.killOp(query.opid);
    });
  }
})
I will explain the parts of the above function in a different post. For now, you need to load the server scripts and then run it:
db.loadServerScripts()
kill_slow_queries(20)
The above call kills queries that have been running longer than 20 seconds.
NOTE: you can create a shell script and run it periodically using crontab until you fix the slowness on your server.
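A sketch of such a cron entry (the cron file path and database name are made up; adjust them to your setup):

# /etc/cron.d/kill_slow_mongo -- every minute, kill queries running longer than 20s
* * * * * root mongo mydb --eval 'db.loadServerScripts(); kill_slow_queries(20)'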
#mongodb #mongo #function #kill_slow_queries #currentOp
MongoDB has a top utility, like the Linux top command, that displays how much time was spent on reads, writes, and in total for every namespace (collection). To run mongotop you just need to run:

mongotop
The output is something like below:
root@hs-1:~# mongotop
2018-01-09T13:42:42.177+0000 connected to: 127.0.0.1
                    ns  total  read  write  2018-01-09T13:42:43Z
         users.profile   28ms  28ms    0ms
          authz.tokens    7ms   7ms    0ms
            mielin.obx    3ms   3ms    0ms
       conduc.contacts    1ms   1ms    0ms
    admin.system.roles    0ms   0ms    0ms
The above command runs every second; to increase the interval use mongotop YOUR_INTERVAL_IN_SECONDS. If you want the result in JSON use mongotop --json. If you want to return the result once and exit use mongotop --rowcount 1.
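These options can be combined with the interval argument; for example, a sketch that takes three JSON samples five seconds apart:

mongotop --json --rowcount 3 5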
#mongodb #mongo #mongotop #read #write
In previous posts we explained query slowness. Here we try to explain the different parts of the function.
db.currentOp: in-progress operations in MongoDB are displayed by this command. The response of the command is in JSON format, so you can use an expression like db.currentOp()['inprog']. The response has a lot of useful information, like lock status, numYields, and so on.

The part we are interested in is
the opid part. opid is the ID number of the query operation. The op section of each operation shows the type of the query: it can be an internal database command, an insert command, or a query. secs_running is the part we check to see whether a query has been taking a long time; it is in seconds.

db.killOp: killing an operation is just as simple as giving the opid number to killOp, as below:

db.killOp(6123213)
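Before killing anything, it helps to look first. A minimal sketch (the 10-second threshold is arbitrary) that only prints the long-running query operations:

db.currentOp()['inprog'].forEach(function (op) {
  if (op.op === 'query' && op.secs_running > 10) {
    print('opid ' + op.opid + ' has been running for ' + op.secs_running + 's');
  }
})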
This is all we've done in the previous posts to kill slow queries in MongoDB.
#mongodb #mongo #currentOp #killOp #opid
See live disk IO status by using iostat:

iostat -dx 1
The output has many columns. The parts I'm interested in for now are r/s, which refers to reads per second, and w/s, which is writes per second. To see the size read and written per second, see the rkB/s and wkB/s columns, in that order.
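If you only care about a single disk, you can name the device and cap the number of reports (sda is assumed here):

iostat -dx sda 1 10   # sda only, one report per second, ten reports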
NOTE: if you don't have iostat on your Linux OS, install it on Debian by issuing the apt-get install sysstat command.

#linux #debian #iostat #read_per_second #write_per_second #sysstat
Benchmark disk performance using
hdparm & dd

In order to get a meaningful result, run each test a couple of times.
Direct read (without cache):
$ sudo hdparm -t /dev/sda2
/dev/sda2:
Timing buffered disk reads: 302 MB in 3.00 seconds = 100.58 MB/sec
And here's a cached read:
$ sudo hdparm -T /dev/sda2
/dev/sda2:
Timing cached reads: 4636 MB in 2.00 seconds = 2318.89 MB/sec
-t: Perform timings of device reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the processing of -t using the BLKFLSBUF ioctl.
-T: Perform timings of cache reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading directly from the Linux buffer cache without disk access. This measurement is essentially an indication of the throughput of the processor, cache, and memory of the system under test.
You can use the dd command to test your hard disk too:

$ time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm ddfile

rm ddfile removes the test file created by dd's of=ddfile; the of param stands for output file.

These are some useful and simple disk benchmarking tools.
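You can also time reads with dd. A sketch (needs root for the cache drop, and assumes you kept ddfile around instead of removing it right away):

sync && echo 3 > /proc/sys/vm/drop_caches   # flush the page cache first (as root)
dd if=ddfile of=/dev/null bs=8k             # sequential read of the test file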
#linux #benchmark #hdd #dd #hard_disk #hdparm