/var/log/DMIT-NOC.log
The on-site migration task is taking longer than expected; we will keep you posted here.
The iLO/IPMI of the DMIT.io cluster has an issue; we are working with on-site staff to fix it before putting the servers online.
Noticed an issue on LAX EB.

DMIT is now working with CMIN2 to figure that out.
Coresite identified an issue on the cross connect; we are moving to a new interface, and CMIN2 is currently disconnected.
Switched to a new interface and observed that it is working well.

We will monitor for 24 hours.

The problem appears to have been caused by a dirty port in the MMR (meet-me room).

=======
Last reply from Coresite: “Patching was moved and cleaned to new LOA demarc. Light levels: rx from you +3.12 dBm and they are tx +7.34 dBm”

=======
Earlier messages from Coresite: “I traced down the issue and cleaned the ports and fiber ends. Can you please let me know if that fixed the problem?”
Scheduled maintenance
Location: All LAX servers.
Time: Sep 26, 2024 ~ Oct 26, 2024.
Duration:
- Estimated 60 minutes for each VM (cumulatively);
- Estimated 2 hours for each node (cumulatively);
- Excluding time spent waiting in the boot sequence.
Sequence:
1. Partial…
The maintenance is almost complete; we are verifying the status.

The CMIN2 issues of the past few days are covered by this maintenance notice; intermittent network degradation is expected during this period.

We will let you know once the status returns to normal.

Between Oct 26 ~ 31, we will remove the current network equipment and clean up legacy components. We expect no further issues, but please wait for our confirmation.
Interface configuration error; the configuration has been corrected.
There is a pending upgrade on LAX's MX Edge Router.
Expect up to 30 mins of downtime and 60 mins of network degradation.

Time: 3PM ~ 6PM PDT.

DMIT.io will also be impacted at the same time.
LAX:
Emergency reboot of one of the hypervisors.
Scheduled Maintenance:
Location: Tokyo;
Date: Nov 18 ~ Dec 18, 2024

Content:
- Network Upgrade
- Hardware Upgrade
- Storage Upgrade

Impact:
- Network jitter, instability and degradation;
- Possible SSH host key changes due to hardware migration;
- Downtime.
LAX:
Emergency reboot of one of the hypervisors to replace its NICs.
Done.
TYO: A reboot is pending within the hour; expect VM reboots and up to 1 hour of downtime.
- Extended by 30 mins.
- Extended by another 45 mins: the initial remote hands (RH) did not record the cable labels correctly, so tracing and re-cabling took a long time.
- Ready at 11:30 JST.
Observed a ToR (top-of-rack) switch issue in TYO; dispatching an engineer.
Done
- Network Upgrade in progress.
-- Adding: RETN 100G, NTT 100G, BBIX 100G;
-- Removing: Equinix IX 10G, Cogent 10G, NTT 10G;
-- ETA: within a week.

- Hardware Upgrade completed
-- 7402P > 7443P
-- 10G * 2 per Node > 100G * 2 per Node
-- 20G uplink per rack > 100G * 2 uplink per rack.

- Storage Upgrade completed
-- SATA 10G SDS Cluster > NVMe 100G*2 SDS Cluster.
While adjusting the TYO network architecture, some products experienced downtime; this has now been fixed.
A couple of tasks still need to be completed before the network returns to a stable state.