This email serves as official notification that Zayo and/or one of its providers will be performing maintenance on its network as described below. This maintenance may affect services you have with us.
1st Activity Date
27-Oct-2020 00:01 to 27-Oct-2020 05:00 ( Pacific )
27-Oct-2020 07:00 to 27-Oct-2020 12:00 ( GMT )
Reason for Maintenance: Zayo will implement maintenance to perform a software upgrade of router mpr1.lax12.us to SCBE2s
Expected Impact: Service Affecting Activity: Any Maintenance Activity directly impacting the service(s) of customers. Service(s) are expected to go down as a result of these activities.
DMIT Comment: It might impact DMIT's Internet connectivity for a while (up to 2Gbps),
but DMIT also has NTT and Cogent to keep our network reachable.
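If you want to check which upstream your traffic actually takes during the window, a quick path trace from your VM is enough; mtr is a common choice, and the target below is just an example destination:
# report 20 probes per hop, wide output, with AS numbers so the
# first external hops show whether traffic leaves via Zayo, NTT or Cogent
mtr -rwz -c 20 8.8.8.8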
/var/log/DMIT-NOC.log
Sorry for the typo:
Up to 2Gbps > up to 2 hours (depends on Zayo)
/var/log/DMIT-NOC.log
We found something wrong in HKG; investigation is underway.
Since we are installing the brand-new DL325 Gen10 AMD EPYC servers, our on-site engineer needed to move our backup switch to a more appropriate location.
When we migrated our servers from TGT to Equinix, there was not enough room for the backup switch, but we had to install it anyway, so our on-site engineer placed it in an inappropriate location.
Also, the datacenter supplied us a transceiver that Cisco does not support and installed it on the uplink of the main switch.
Here we go~ We lost the backup switch while relocating it, and then we lost the uplink of the main switch as well. Boom~
Uhm, yes, we do have an infrastructure monitoring platform, but because of that last migration we had not yet reconnected it to the new network.
There will be a high-availability transformation and upgrade for the LAX datacenter. The switches will be rebooted several times.
Expect 10~20 minutes of unstable network.
/var/log/DMIT-NOC.log
BAAM~~ These servers are ready to be installed today. All HKG VMs will be migrated to the new nodes later.
There are already 400+ VMs running well on our HKG EPYC nodes with the Ceph cluster. Your data will be live-migrated to the Ceph HA cluster and the AMD nodes. Please back up your data to guard against any unexpected event.
As expected, progress has been smooth. Still, please back up your data to guard against any unexpected event.
However, we cannot live-migrate your VM's running state from the previous-generation node to the new one, so we have to reboot your VM at the last stage. Your HKG VM may therefore face 1~10 minutes of downtime when we transfer it to an EPYC node.
We will start this weekend and finish before next Tuesday (EDT).
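If you want a quick off-site copy of your important data before the migration window, something like this works on most Linux VMs; the paths and the destination host are placeholders, so adjust them to your own setup:
# archive a few important directories and copy the archive to a machine you control
tar -czf /tmp/backup-$(date +%F).tar.gz /etc /home /var/www
scp /tmp/backup-$(date +%F).tar.gz user@backup.example.com:/backups/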
/var/log/DMIT-NOC.log
We will gradually transfer your VM from the stand-alone storage to our Ceph HA storage. The DMIT LAX cluster uses the ZFS file system, which allows us to live-migrate your disk volume to Ceph.
You should only notice some IO delay on your VM during the migration. Our system needs to take a snapshot and transfer it to the Ceph cluster, and during the transfer IO is briefly held to ensure data consistency.
As expected, progress has been smooth. Still, please back up your data to guard against any unexpected event.
This process starts this weekend and will finish by the end of this month.
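For the curious, here is a rough sketch of what "snapshot and transfer" can look like on a ZFS + Ceph stack; the pool, volume and image names are made up, it assumes qemu-img was built with RBD support and Ceph auth is already configured, and it is only an illustration, not our actual migration tooling:
# expose zvol snapshots as block devices, then take a point-in-time snapshot
zfs set snapdev=visible tank
zfs snapshot tank/vm-123-disk-0@migrate
# stream the snapshot's blocks into a Ceph RBD image
qemu-img convert -f raw -O raw /dev/zvol/tank/vm-123-disk-0@migrate rbd:vmpool/vm-123-disk-0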
/var/log/DMIT-NOC.log pinned «There are two light-impact maintenance notices for HKG and LAX.»
/var/log/DMIT-NOC.log
LAX-81403SR needs to be rebooted.
ETA: 30mins
We found that some disks in this server are not working well.
LAX Ceph migration is close to full completion.
After the Ceph migration completes, the IOPS limit of LAX VMs will be tripled and burstable IOPS will be six times the current level.
HKG is waiting for our on-site engineer to install RAM and SSDs in another 2 servers.
Once we finish that migration, the HKG IOPS limit will be doubled and the burstable IOPS will be quadrupled.
80% of HKG VMs have already been migrated to the AMD EPYC nodes and Ceph, and they are running well.
/var/log/DMIT-NOC.log
We have completed almost all tasks for LAX, but a disk cache mode change and some Ceph cluster performance tuning are still needed.
DMIT NOC will reboot some nodes / VMs on Oct 27 at 4PM (EST).
After this action, all tasks will be complete.
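For context, the "disk cache mode" is the hypervisor-side setting that controls how VM disk writes are cached. As an illustration only (the pool, image name and chosen mode below are hypothetical, not necessarily our configuration), a QEMU/KVM guest on Ceph RBD carries it in the drive definition, and changing it requires the guest to be restarted, which is why reboots are needed:
# cache=writeback lets the RBD client coalesce writes for better throughput;
# cache=none pushes every write straight through to the cluster
-drive file=rbd:vmpool/vm-123-disk-0,format=raw,if=virtio,cache=writeback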
/var/log/DMIT-NOC.log
LAX Completed.
Testing VM:
vCore: 1Core
vRAM: 1GB
vDisk: 20G via Ceph
No IOPS restrictions on the testing VM, so treat these numbers as reference only; each product has its own IOPS limit.
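If you want to run a comparable benchmark on your own VM, fio is the usual tool; the scratch-file path and sizes below are just example values:
# 30-second 4K random-read test against a 1 GiB scratch file
fio --name=randread --filename=/root/fiotest --size=1G \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
    --direct=1 --runtime=30 --time_based --group_reporting
rm /root/fiotest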
/var/log/DMIT-NOC.log
We also found another tiny issue. You can ignore it if you don't care about #IO_Latency
Please install ioping on your VM and check your disk latency:
YUM based:
yum install ioping
ioping /dev/vda1
If ioping is not found, please do:
yum install epel-release
APT based:
apt install ioping
ioping /dev/vda1
Please reinstall the system if you find the latency is over 900us and you care about it.