/var/log/DMIT-NOC.log
We found something wrong in HKG and are investigating.
Because we are installing brand-new DL325 Gen10 servers with AMD EPYC processors, our on-site engineer needed to move our backup switch to a more appropriate location.

When we migrated our servers from TGT to Equinix, there was not enough space for the backup switch, but we had to install it anyway. So that is what we did: our on-site engineer installed it in an inappropriate location.

In addition, the datacenter supplied a transceiver that Cisco does not support and installed it on the uplink of the main switch.

Here we go~ We lost the backup switch while relocating it, and the uplink of the master switch had already been lost. Boom~

Umm, yes, we do have an infrastructure monitoring platform, but since the last migration we had not yet restored the connection between the new network and the monitoring platform.
BAAM~~ These servers are ready to be installed today. All HKG VMs will be migrated to the new nodes later.
There will be a high-availability transformation and upgrade for the LAX datacenter. The switches will be rebooted several times, causing 10~20 min of unstable network.
/var/log/DMIT-NOC.log
There are already over 400 VMs running well on our HKG EPYC nodes with the Ceph cluster. Your data will be live-migrated to the Ceph HA cluster and AMD nodes. Please back up your data in case of any unexpected event.

As expected, the process is going smoothly. Still, please back up your data in case of any unexpected event.

However, we cannot live-migrate your VM's running state from the previous-generation nodes to the new ones, so we have to reboot your VM at the final stage. Your HKG VM may therefore see 1~10 min of downtime when we transfer it to the EPYC node.
We will start this weekend and finish before next Tuesday (EDT).
/var/log/DMIT-NOC.log
We will gradually transfer your VM from stand-alone storage to our Ceph HA storage. The DMIT LAX cluster uses the ZFS file system, which allows us to live-migrate your disk file to Ceph.

Your VM should experience only an I/O delay during the migration. Our system needs to take a snapshot and transfer it to the Ceph cluster, and during the transfer I/O is held to ensure data consistency.

As expected, the process is going smoothly. Still, please back up your data in case of any unexpected event.

This process starts this weekend and will finish by the end of this month.
There are two low-impact maintenance notices for HKG and LAX.
The LAX Ceph migration is close to full completion.
Once the Ceph migration is complete, the IOPS limit of LAX VMs will be tripled and the burstable IOPS will be increased to six times the current value.

HKG is waiting for our on-site engineer to install RAM and SSDs in two more servers.
Once we finish the migration, the IOPS limit will be doubled and the burstable IOPS will be quadrupled.
80% of HKG VMs have already been migrated to AMD EPYC nodes and Ceph and are running well.
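The multipliers in these notices reduce to a simple calculation. The sketch below is illustrative only: the function name `new_iops` and the sample values are hypothetical, since the notices do not state the current per-VM IOPS or burst limits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate post-migration IOPS limits from the
# multipliers in the notices. Pass in YOUR current limits; the notices
# themselves do not publish per-VM values.
# Usage: new_iops LOCATION CURRENT_LIMIT CURRENT_BURST
new_iops() {
  local location=$1 limit=$2 burst=$3
  local limit_mult burst_mult
  case "$location" in
    LAX) limit_mult=3; burst_mult=6 ;;  # limit tripled, burst x6
    HKG) limit_mult=2; burst_mult=4 ;;  # limit doubled, burst quadrupled
    *) echo "unknown location: $location" >&2; return 1 ;;
  esac
  echo "limit=$((limit * limit_mult)) burst=$((burst * burst_mult))"
}

new_iops LAX 1000 2000   # prints: limit=3000 burst=12000
```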
/var/log/DMIT-NOC.log
We have completed almost all tasks for LAX. Because the disk cache mode needs to be changed and the Ceph cluster's performance needs to be improved, DMIT NOC will reboot some nodes / VMs at Oct 27 4PM (EST).

After this action, all tasks will be complete.
/var/log/DMIT-NOC.log
We also found another small issue. You can ignore it if you don't care about #IO_Latency.

Please install ioping on your VM and test your disk as follows (ioping keeps running until you press Ctrl+C):
YUM-based:
yum install ioping
ioping /dev/vda1

If yum cannot find ioping, install the EPEL repository first, then run the install again:
yum install epel-release

APT based:
apt install ioping
ioping /dev/vda1

If the measured latency is over 900 us and this matters to you, please reinstall the system.
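The 900 us rule can be scripted once you have read the average latency from ioping's summary statistics. A minimal sketch: `needs_reinstall` is a hypothetical helper, and it assumes you pass the average already converted to microseconds (for example, 1.2 ms becomes 1200).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the average latency, given in
# microseconds, exceeds the 900 us threshold from the notice.
# awk is used because shell arithmetic cannot compare decimals like 933.6.
needs_reinstall() {
  awk -v lat="$1" 'BEGIN { exit !(lat + 0 > 900) }'
}

if needs_reinstall 933.6; then
  echo "average latency is over 900 us"
fi
```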
The recent maintenance for HKG and LAX has been completed.
Thanks for your patience.
Please read the #IO_Latency note above; it applies to both HKG and LAX.
Email copy:

Notice of Completion of High Reliability Reconstruction

DMIT NOC recently completed a reconstruction of our LAX and HKG datacenters.

- DMIT upgraded all racks in HKG and LAX to two switches in a vPC configuration. This improves redundancy and avoids single-device failures such as a transceiver or switch failure.
- DMIT switched from ZFS to a Ceph cluster. This helps avoid single-node failures, like some of the cases we had in HKG before.
- DMIT adopted AMD EPYC in both HKG and LAX. All DMIT Hyper servers are powered by 2nd Gen AMD EPYC processors.


DMIT roadmap:
- New location launch
- Move to a self-developed DMIT client area system:
-- Simple UI
-- Better task performance
-- Client self-service BGP support
-- Client self-service rDNS support
-- HA support (currently DMIT only offers High Reliability)
- etc.

Thanks to all DMIT clients for your support and trust.
DMIT HKG IP rearrangement notice:

We will assign you a new IP address if your IP is in one of the following subnets:
103.135.248.0/24
103.135.249.0/24
103.135.250.0/24
103.135.251.0/24

There are many unused IPs in these subnets. To make better use of them and to offer different types of services, we need to swap your IP to a new one. You will receive a ticket and an email with the details of the rearrangement.