/var/log/DMIT-NOC.log
SJC Maintenance
Time: Apr 10 ~ Apr 30, 2025
Service: All VMs.
Impact: Lower I/O & network performance; a couple of reboots.
Duration: Less than 5 hrs per VM.
Description:
- Merge with LAX.T1.


HKG Maintenance
Time: Apr 16 ~ Apr 24, 2025 (Changed to Apr 13 ~ Apr 17)
Service: All VMs.
Impact: A couple of reboots.
Duration: Less than 3 hrs per VM.
Description:
- Upgrade CPUs from the 7402 to the 7003-series platform.
- Migrate from the previous rack vendor to DMIT's Equinix rack.
One HKG node experienced a major hardware error;
repair work is in progress.
An engineer has been dispatched.
All services were restored at 10 PM HKT.
The compensation has been issued.
The control panel has been restored.

The newly built node has either a CPU or a motherboard failure, which prevents it from passing POST.
DMIT has ordered a couple of new nodes.

====
The WHMCS installation and related modules currently used by DMIT carry many legacy issues, even though DMIT has rewritten most of the code.
Loss of VM configuration files is currently the main reason VMs cannot recover quickly after a node failure.
We have developed a new VM management method that lets us manage VM configuration files more securely and flexibly.
This approach will be built into the new architecture while remaining compatible with the current one, to smooth the transition.
Our main site in Los Angeles is experiencing a power outage.

Multiple incidents are happening in downtown LA.

West7Center report:
Power to the suite has been cut; power is being restored.

The issue at CoreSite LA1 has already been resolved:
A third-party vendor performing work on the street inadvertently damaged a water pipe. At this time, all building cooling systems are operating normally, and there is no indication of reduced water pressure entering the building. The Los Angeles Department of Water and Power (LADWP) has been notified, and we are currently awaiting an update on their estimated time of arrival.
You may direct any questions or concerns to the below department.
An engineer has been dispatched on-site to check the status.
The website and management systems are hosted at the Los Angeles core site. Services in other regions are not affected, but the control panel will remain offline.
On-site engineers smelled smoke; there is no visible damage in the suite or in our cage.
The datacenter engineer is working to locate the issue.
The West7Center engineer is trying to locate the root cause before restoring power.

The P+R power feed was disconnected due to a possible fire hazard when the incident occurred.
It was not a utility power outage; the power cut was a fire-prevention measure.
Current status:
- No sprinklers were triggered;
- No visible fire hazard;
- No water damage;
- No evacuation;
- Smoke was smelled in a small area only (not the entire suite);
- Power was cut only to the cabinets; ambient lighting remained on.

Based on information from other site engineers, the power was disconnected just after the fire alarm was triggered.
W7C response:
They are waiting for the UPS system vendor's engineer to arrive on-site.

The electrical engineer has arrived.

The UPS system has encountered an issue that prevents it from starting.
The W7C UPS technician will arrive in 30 minutes.
Power is back up.