Tech C**P
Python and Linux instructor and programmer @alirezastack
Data Analysis


Create a DataFrame from a list of dictionaries in pandas:

import pandas
data = [{'id': 1, 'name': 'alireza'}, {'id': 2, 'name': 'Mohsen'}]
# Creating a DataFrame from a list of dictionaries (one dictionary per row)
df = pandas.DataFrame(data)

Now if you print dataframe:
> df
   id     name
0   1  alireza
1   2   Mohsen

NOTE: the first column is the index column.


To turn it back into a list of dictionaries after your aggregation, analysis, etc., use to_dict as below:

df.to_dict(orient='records')
[{'id': 1, 'name': 'alireza'}, {'id': 2, 'name': 'Mohsen'}]

You are right! We didn't do anything useful with the records, but the goal here is only to show how to turn a DataFrame back into dictionaries.

NOTE: on older versions of pandas you have to use outtype='records' rather than orient='records'.

#python #pandas #to_dict #outtype #orient #dictionary #dataframe
Allow outgoing requests through server with CSF (ConfigServer Security & Firewall)


If debugging confirms that your server cannot reach an external port that other servers can reach, add the desired port and destination IP address to /etc/csf/csf.allow in the format tcp/udp|in/out|s/d=port|s/d=ip:

tcp|out|d=1080|d=15.9.8.223
The above line is an advanced port+IP filter that allows outgoing TCP connections to port 1080 on the destination server 15.9.8.223. You can also enter just an IP address on a line to allow it through iptables. To apply the changes, run csf -r on the command line after saving the file.

NOTE: one IP address per line is mandatory.


NOTE: CIDR addressing is allowed with a dotted-quad IP (e.g. 192.168.254.0/24)


NOTE: Only list IP addresses, not domain names (they will be ignored)
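
The rule format above can be sketched in Python. This is a minimal sketch, not part of CSF: the helper name csf_allow_rule is hypothetical, and it only covers the destination-port plus destination-IP form used in the example. It rejects domain names up front, since CSF would ignore them.

```python
import ipaddress


def csf_allow_rule(protocol, direction, port, ip):
    """Build one advanced csf.allow line: protocol|direction|d=port|d=ip.

    Hypothetical helper: it only builds the text of the line and never
    touches /etc/csf/csf.allow.
    """
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    if not 1 <= int(port) <= 65535:
        raise ValueError("port must be between 1 and 65535")
    # Accepts a single address or CIDR notation; raises ValueError for
    # anything else, including domain names.
    ipaddress.ip_network(ip, strict=False)
    return f"{protocol}|{direction}|d={port}|d={ip}"


print(csf_allow_rule("tcp", "out", 1080, "15.9.8.223"))
```

The printed line still has to be saved in /etc/csf/csf.allow and applied with csf -r.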


#linux #csf #iptables
How to find documents with a specific field type?

A field such as credit may hold different types across documents: numeric in some and string in others. To find the documents where the field is stored as NumberLong(), use $type:

db.users.find({credit: {$type: "long" }})


If those documents should be deleted, use remove instead of find with the same filter to delete the documents that have the wrong type. It is not sensible to do that on a users collection, though; the example just gives you the idea.

#mongodb #mongo #type #field_type #remove #find
Get the script's own directory from within a bash script:

SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
echo "$SCRIPT_DIR"

dirname "$0" returns the directory part of the path used to invoke the script, cd changes into that directory, and pwd prints it as an absolute path, which in our case is stored in SCRIPT_DIR.
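
For comparison, the same idea can be sketched in Python. This is only an illustration of the dirname + absolute path steps; the function name script_dir is hypothetical, and sys.argv[0] plays the role of "$0".

```python
import os
import sys


def script_dir(script_path):
    """Return the absolute directory that contains script_path."""
    # abspath resolves a relative path against the current working
    # directory; dirname then drops the file name.
    return os.path.dirname(os.path.abspath(script_path))


print(script_dir(sys.argv[0]))
```

Unlike the bash one-liner, this sketch does not change the working directory of the process.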

#linux #bash #script #shell #pwd #current_directory
Since Telegram was filtered, users have been looking for alternative messaging applications. One well-known application among tech-savvy people is Slack. It is filtered too, but not in the way Telegram is: you can call the API without any proxy or SOCKS server. The app itself is filtered, but once you log in for the first time through a proxy server you no longer need the SOCKS server, because you stay logged in.

One of the Slack developer kits for Python is python-slackclient. Installation is easy as pie:

pip install slackclient

Once you have your token, all you need to do to send a message to a channel is:

from slackclient import SlackClient

slack_token = "xoxp-YOUR-TOKEN"
sc = SlackClient(slack_token)

sc.api_call(
    "chat.postMessage",
    channel="#python",
    text="Hello from Python! 🎉"
)

#python is the name of your channel. For more information, see the link below:
- http://slackapi.github.io/python-slackclient/

#python #slack #python_slackclient #im #telegram
Meltdown: the latest news on two major CPU security bugs

Two major computer processor security bugs, dubbed Meltdown and Spectre, affect nearly every device made in the last 20 years. The ramifications of these bugs for computing are still playing out, but they could lead to compromised servers for cloud platforms and other further-reaching effects.

#news #bug #meltdown #spectre #cpu
How to add authentication to MongoDB?

First you need to create an admin user, so bring up a mongo shell by typing mongo in your terminal and pressing Enter. Users are stored in the admin database, so switch to it:

use admin


Now create a user called myUserAdmin with the createUser database method:

db.createUser(
  {
    user: "myUserAdmin",
    pwd: "1234qwer",
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
  }
)

Disconnect the mongo shell.

Important: mongod must be started with the --auth argument, otherwise authentication is not enforced:

mongod --auth --port 27017 --dbpath /data/db1
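
Once authentication is enabled, clients have to send credentials. The sketch below only builds a standard MongoDB connection string for the user created above; the helper name mongo_uri and the localhost default are assumptions for illustration, and nothing here opens a connection. Credentials are percent-escaped so that characters such as @ or : in a password do not break the URI.

```python
from urllib.parse import quote_plus


def mongo_uri(user, password, host="localhost", port=27017, auth_db="admin"):
    """Build a mongodb:// URI that authenticates against auth_db.

    Hypothetical helper: host and port are assumptions, adjust them to
    your own deployment.
    """
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource={auth_db}"
    )


print(mongo_uri("myUserAdmin", "1234qwer"))
```

The resulting string can be passed to a driver or to the mongo shell as the connection target.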

#mongodb #mongo #auth #authentication #create_user
What are sharding and replication in MongoDB? What are their differences?

MongoDB has 2 concepts that may confuse even intermediate programmers! So let's break them down and explain both in depth.

1- Take a deep breath. :)

2- Replication: to replicate means to reproduce or make an exact copy of something. MongoDB replication mirrors the whole data set onto other servers. This process is used for fault tolerance. If there are 4 mongo servers and your data set is 1 terabyte, each node in the replica set will hold 1 terabyte of data.
In a replica set there is ONE primary (master) node and one or more secondary (slave) nodes. Read performance can be improved by adding more secondaries, but write performance cannot: all writes go to the primary first and are then propagated to the secondaries.

3- Sharding: sharding, on the other hand, is a completely different concept. If you have 1 terabyte of data and 4 servers, each node will hold 250 gigabytes of data. As you may have guessed, this is not fault tolerant by itself, because each part of the data resides on a separate server. Each read and write is sent to the shard that owns the corresponding part of the data, so adding more shards improves both read and write performance of the cluster. When one shard of the cluster goes down, any data on it is inaccessible. For that reason each shard should also be a replica set, although this is not required.

4- Take another deep breath, and let's get back to work.
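
The arithmetic behind the two concepts can be sketched in a few lines of Python. This is a toy model under simple assumptions: data is spread perfectly evenly across shards, and shard_for uses a plain hash of the key, which is only an illustration and not how MongoDB actually assigns chunks. Both function names are hypothetical.

```python
import hashlib


def data_per_node_gb(total_gb, nodes, mode):
    """Storage each node holds for a data set of total_gb gigabytes."""
    if mode == "replication":
        return total_gb           # every member keeps a full copy
    if mode == "sharding":
        return total_gb / nodes   # idealized even split across shards
    raise ValueError("mode must be 'replication' or 'sharding'")


def shard_for(key, shard_count):
    """Toy routing: map a key to one shard so reads and writes for that
    key always land on the same node."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


print(data_per_node_gb(1000, 4, "replication"))  # 1000
print(data_per_node_gb(1000, 4, "sharding"))     # 250.0
print(shard_for("user:42", 4))
```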

#mongodb #mongo #shard #replica #replication #sharding #cluster
Migrate a running process into tmux

reptyr is a utility for taking an existing running program and attaching it to a new terminal. Started a long-running process over ssh, but have to leave and don’t want to interrupt it? Just start a screen, use reptyr to grab it, and then kill the ssh session and head on home.

sudo apt-get install -y reptyr    # For Ubuntu users

Suspend the current foreground job using CTRL-Z.

List all the background jobs using jobs -l. This will get you the PID.

jobs -l
[1] + 16189 suspended vim foobar.rst


Here the PID is 16189.
Start a new tmux or screen session. I will be using tmux:

tmux


Reattach the background process using:

reptyr 16189

If this error appears:

Unable to attach to pid 16189: Operation not permitted
The kernel denied permission while attaching


Then type in the following command as root.

echo 0 > /proc/sys/kernel/yama/ptrace_scope
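
Before changing the setting it can help to see its current value. The sketch below reads the same file and prints what the value means according to the kernel's Yama documentation; the function name describe_ptrace_scope is hypothetical, and the file only exists on kernels built with Yama.

```python
from pathlib import Path

PTRACE_SCOPE_FILE = Path("/proc/sys/kernel/yama/ptrace_scope")

MEANINGS = {
    0: "classic: a process may attach to any process running under the same uid",
    1: "restricted: only descendants or explicitly allowed tracers may attach",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled",
}


def describe_ptrace_scope(raw):
    """Turn the raw file content (e.g. '1\\n') into (value, meaning)."""
    value = int(raw.strip())
    return value, MEANINGS.get(value, "unknown value")


try:
    print(describe_ptrace_scope(PTRACE_SCOPE_FILE.read_text()))
except OSError:
    print("Yama ptrace_scope is not available on this system")
```

A value of 1 is the usual reason for the "Operation not permitted" error shown above.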

#reptyr #tmux #screen #pid
1. List all Open Files with lsof Command

> lsof
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
init       1 root  cwd    DIR  253,0     4096      2 /
init       1 root  rtd    DIR  253,0     4096      2 /
init       1 root  txt    REG  253,0   145180 147164 /sbin/init
init       1 root  mem    REG  253,0  1889704 190149 /lib/libc-2.12.so

The FD column stands for File Descriptor. Some of its values are:
- cwd current working directory
- rtd root directory
- txt program text (code and data)
- mem memory-mapped file


To get the count of open files, pipe lsof into wc -l as follows (the count includes the header line):

lsof | wc -l


2. List User Specific Opened Files

lsof -u alireza
COMMAND  PID    USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
sshd    1838 alireza cwd   DIR  253,0     4096    2 /
sshd    1838 alireza rtd   DIR  253,0     4096    2 /
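
The outputs above can also be summarized per user. The following is a minimal sketch that parses lsof text shaped like the samples in this post; the function name open_files_per_user is hypothetical, and it assumes USER is the third column, which may not hold on systems where lsof prints an extra TID column.

```python
from collections import Counter

SAMPLE = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 253,0 4096 2 /
init 1 root rtd DIR 253,0 4096 2 /
init 1 root txt REG 253,0 145180 147164 /sbin/init
init 1 root mem REG 253,0 1889704 190149 /lib/libc-2.12.so
sshd 1838 alireza cwd DIR 253,0 4096 2 /
sshd 1838 alireza rtd DIR 253,0 4096 2 /
"""


def open_files_per_user(lsof_output):
    """Count lsof rows per user, skipping the header line."""
    counts = Counter()
    for line in lsof_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1  # third column is USER (assumption)
    return counts


print(open_files_per_user(SAMPLE))
```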

#linux #sysadmin #lsof #wc #file_descriptor
Mastering Linux Shell Scripting

#ebook #book #shell #scripting #linux #pub
Today we encountered slowness on MongoDB that affected the whole infrastructure. The problem was that some specific slow mongo queries caused all the other queries to wait. YES, we use indexes, and YES, we ran explain on those queries and saw that they use an index. To mitigate the issue, we had to kill very slow find queries until we fix the root cause.

The function below kills slow queries:

function (sec) {
    db.currentOp()['inprog'].forEach(function (query) {
        // Only consider read queries
        if (query.op !== 'query') { return; }
        // Skip queries that have been running for less than `sec` seconds
        if (query.secs_running < sec) { return; }

        print(['Killing query:', query.opid,
               'which was running:', query.secs_running, 'sec.'].join(' '));
        db.killOp(query.opid);
    });
}


We need to save this function in mongo itself and run it directly. To save the above function in MongoDB use db.system.js.save:

db.system.js.save({
    _id: "kill_slow_queries",
    value: function (sec) {
        db.currentOp()['inprog'].forEach(function (query) {
            if (query.op !== 'query') { return; }
            if (query.secs_running < sec) { return; }

            print(['Killing query:', query.opid,
                   'which was running:', query.secs_running, 'sec.'].join(' '));
            db.killOp(query.opid);
        });
    }
})


I will explain the above function parts in a different post. Now you need to load server scripts and then run it:

db.loadServerScripts()
kill_slow_queries(20)

The above call kills queries that have been running for 20 seconds or longer.

NOTE: you can create a shell script and run it periodically using crontab until you fix the slowness on your server.
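
The selection logic of kill_slow_queries can be tried without a database. The sketch below applies the same two checks to a list shaped like the inprog array; the function name slow_query_opids and the sample operations are made up for illustration, and an operation without secs_running is treated as 0 seconds here, which is an assumption of this sketch.

```python
def slow_query_opids(inprog, sec):
    """Return the opids that kill_slow_queries(sec) would pass to killOp."""
    return [
        op["opid"]
        for op in inprog
        # Same two checks as the shell function: only queries, and only
        # those that have been running for at least `sec` seconds.
        if op.get("op") == "query" and op.get("secs_running", 0) >= sec
    ]


# Hypothetical sample shaped like db.currentOp()['inprog']
sample = [
    {"opid": 101, "op": "query", "secs_running": 45},
    {"opid": 102, "op": "insert", "secs_running": 60},
    {"opid": 103, "op": "query", "secs_running": 3},
]
print(slow_query_opids(sample, 20))  # [101]
```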

#mongodb #mongo #function #kill_slow_queries #currentOp
MongoDB has a top-like utility, similar to the Linux top command, that displays how much time is spent on reads, writes and in total on every namespace (collection).


To run mongotop you just need to run:

mongotop


The output is something like below:

root@hs-1:~# mongotop
2018-01-09T13:42:42.177+0000 connected to: 127.0.0.1

ns                    total    read    write    2018-01-09T13:42:43Z
users.profile          28ms    28ms      0ms
authz.tokens            7ms     7ms      0ms
mielin.obx              3ms     3ms      0ms
conduc.contacts         1ms     1ms      0ms
admin.system.roles      0ms     0ms      0ms


By default mongotop reports every second; to increase the interval use mongotop YOUR_INTERVAL_IN_SECONDS.

If you want the result in JSON use mongotop --json.

If you want to return the result once and exit use mongotop --rowcount 1.
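
If you capture the plain-text output, finding the busiest namespace takes only a few lines. This is a minimal sketch that parses text shaped like the sample above; the function name parse_mongotop is hypothetical, and it assumes every timing is printed in ms as in that sample.

```python
SAMPLE = """\
2018-01-09T13:42:42.177+0000 connected to: 127.0.0.1

ns total read write 2018-01-09T13:42:43Z
users.profile 28ms 28ms 0ms
authz.tokens 7ms 7ms 0ms
mielin.obx 3ms 3ms 0ms
conduc.contacts 1ms 1ms 0ms
admin.system.roles 0ms 0ms 0ms
"""


def parse_mongotop(text):
    """Parse mongotop rows into dicts with millisecond integers."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: <namespace> <N>ms <N>ms <N>ms
        if len(fields) < 4 or not all(f.endswith("ms") for f in fields[1:4]):
            continue
        total, read, write = (int(f[:-2]) for f in fields[1:4])
        rows.append({"ns": fields[0], "total_ms": total,
                     "read_ms": read, "write_ms": write})
    return rows


rows = parse_mongotop(SAMPLE)
busiest = max(rows, key=lambda row: row["total_ms"])
print(busiest["ns"], busiest["total_ms"])  # users.profile 28
```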

#mongodb #mongo #mongotop #read #write
In previous posts we talked about query slowness. Here we explain the different parts of the function.

db.currentOp: this command displays the in-progress operations in MongoDB. The response is a document, so you can access its parts with expressions like db.currentOp()['inprog']. The response carries a lot of useful information such as lock status, numYields and so on.
The part we are interested in is opid, the identifier of the operation. The op field of each operation shows its type: it can be an internal database command, an insert, a query and so on. secs_running tells us how long the operation has been running, in seconds, so we can check whether a query has taken a long time.


db.killOp: killing an operation is as simple as passing its opid to killOp, as below:

db.killOp(6123213)

This is all we did in the previous posts to kill slow queries in MongoDB.

#mongodb #mongo #currentOp #killOp #opid
See live disk IO status by using iostat:

iostat -dx 1

The output has many columns. The parts I'm interested in for now are r/s, which is reads per second, and w/s, which is writes per second. To see the amount of data read and written per second, see the rkB/s and wkB/s columns respectively.

NOTE: if you don't have iostat on your Linux OS, install it on Debian with the apt-get install sysstat command.
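
To see where these numbers come from, the four columns can be derived from two samples of the kernel counters in /proc/diskstats. This is a sketch of the arithmetic only, not of how iostat is implemented; the function names and the two sample lines are made up, and the counters are in 512-byte sectors.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors of 512 bytes


def parse_diskstats_line(line):
    """Pick the read/write counters out of one /proc/diskstats line."""
    f = line.split()
    return {
        "name": f[2],
        "reads": int(f[3]),            # reads completed
        "sectors_read": int(f[5]),
        "writes": int(f[7]),           # writes completed
        "sectors_written": int(f[9]),
    }


def rates(before, after, interval_s):
    """Per-second rates between two samples taken interval_s apart."""
    def per_second(key):
        return (after[key] - before[key]) / interval_s

    return {
        "r/s": per_second("reads"),
        "w/s": per_second("writes"),
        "rkB/s": per_second("sectors_read") * SECTOR_BYTES / 1024,
        "wkB/s": per_second("sectors_written") * SECTOR_BYTES / 1024,
    }


# Hypothetical samples for sda, taken one second apart
before = parse_diskstats_line("8 0 sda 1000 0 80000 500 2000 0 160000 900 0 700 1400")
after = parse_diskstats_line("8 0 sda 1100 0 88000 550 2050 0 164000 950 0 750 1500")
print(rates(before, after, 1.0))
```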


#linux #debian #iostat #read_per_second #write_per_second #sysstat
Benchmark disk performance using hdparm & dd.

In order to get a meaningful result run the test a couple of times.


Direct read (without cache):


$ sudo hdparm -t /dev/sda2
/dev/sda2:
Timing buffered disk reads: 302 MB in 3.00 seconds = 100.58 MB/sec


And here's a cached read:


$ sudo hdparm -T /dev/sda2
/dev/sda2:
Timing cached reads: 4636 MB in 2.00 seconds = 2318.89 MB/sec

-t: Perform timings of device reads for benchmark and comparison
purposes. For meaningful results, this operation should be repeated
2-3 times on an otherwise inactive system (no other active processes)
with at least a couple of megabytes of free memory. This displays
the speed of reading through the buffer cache to the disk without
any prior caching of data. This measurement is an indication of how
fast the drive can sustain sequential data reads under Linux, without
any filesystem overhead. To ensure accurate measurements, the
buffer cache is flushed during the processing of -t using the
BLKFLSBUF ioctl.

-T: Perform timings of cache reads for benchmark and comparison purposes.
For meaningful results, this operation should be repeated 2-3
times on an otherwise inactive system (no other active processes)
with at least a couple of megabytes of free memory. This displays
the speed of reading directly from the Linux buffer cache without
disk access. This measurement is essentially an indication of the
throughput of the processor, cache, and memory of the system under
test.


You can use dd command to test your hard disk too:


$ time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm ddfile

rm ddfile removes the test file that dd created through of=ddfile. The of parameter stands for output file.
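
To turn the time reported for the dd run into a throughput figure, divide the bytes written by the elapsed seconds. The sketch below does that arithmetic for bs=8k count=250000; the function names are hypothetical and the 20 second elapsed time is a made-up example, not a measurement.

```python
def dd_total_bytes(block_size_bytes, count):
    """Bytes written by dd with the given bs and count."""
    return block_size_bytes * count


def throughput_mb_per_s(total_bytes, elapsed_s):
    """Write throughput in MB/s (1 MB = 1,000,000 bytes)."""
    return total_bytes / elapsed_s / 1_000_000


total = dd_total_bytes(8 * 1024, 250000)  # bs=8k means 8 * 1024 bytes
print(total)                              # 2048000000
print(throughput_mb_per_s(total, 20.0))   # hypothetical 20 s run
```

So the test writes about 2 GB, large enough that the trailing sync matters for the measured time.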


These are some useful and simple disk benchmarking tools.

#linux #benchmark #hdd #dd #hard_disk #hdparm