/var/log/DMIT-NOC.log
Because of another surge in orders, our two Xeon Gold nodes were severely overloaded (20% above our design capacity). At the same time, DDoS attacks and high load caused the nodes to disconnect from the cluster repeatedly. As a result, tasks could not be executed;…
The cause we reported earlier was not entirely correct. After an in-depth investigation, we traced the problem to defects in PVE-Firewall. DMIT had recompiled it and removed some defective designs, but this also broke its filtering of broadcast and multicast packets. While the network rate was dropping and packet loss was rising, we captured a large number (> 100kpps) of broadcast or multicast packets sent by a guest VM. The kernel of a guest VM does not have enough buffer or performance to process these multicast packets, which causes congestion inside the guest VM. Because multicast packets are delivered to every guest VM, the high load inside the guests also causes a sudden high load on the host. (These are shared VMs, so each VM does not have a full CPU core.)

P.S.: Some special intranet broadcast packets are easily blocked in a large-scale network environment. Some packets can also cause network architecture changes (e.g. a host advertising its IP as the IGMP Snooping router).
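The broadcast/multicast distinction above can be sketched in a few lines. This is only an illustration of how Ethernet destination addresses are classified, not the logic of PVE-Firewall; the function names `classify_dst_mac` and `flood_rate_pps` and the sample addresses are hypothetical.

```python
from typing import Iterable


def classify_dst_mac(mac: str) -> str:
    """Classify an Ethernet destination MAC as broadcast, multicast, or unicast."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"
    # The least significant bit of the first octet is the group (I/G) bit.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


def flood_rate_pps(dst_macs: Iterable[str], window_seconds: float) -> float:
    """Rate of broadcast/multicast frames (packets per second) in a capture window."""
    flood = sum(1 for mac in dst_macs if classify_dst_mac(mac) != "unicast")
    return flood / window_seconds


# Hypothetical usage: destination MACs seen during a one-second capture window.
sample = ["ff:ff:ff:ff:ff:ff", "01:00:5e:00:00:01", "52:54:00:12:34:56"]
print(flood_rate_pps(sample, window_seconds=1.0))
```

A rate computed this way could be compared against a threshold such as the > 100kpps figure mentioned above to flag a guest VM that is flooding the segment.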

Our engineering team has already deployed a new beta build of the optimized PVE-Firewall to the affected nodes for testing. In our tests, it successfully blocked intranet attacks and abuse. Our engineers will upload the PVE-Firewall to our code platform and deploy it to all nodes.

However, our number of nodes still exceeds our expected number, so we will not accept new orders for the time being (upgrades are allowed). Although node resources have become sufficient because VMs were released recently, new orders will be accepted only when the new node is ready.
Scheduled maintenance:
3PM PST
Apr 7, 2020.
Window: 30min~1h

Maintenance:
1. Check the X-C quality.
2. Check the light attenuation of our backbone.
3. Replace the transceivers on both sides of the backbone.

Temporary Impact:
1. The network LH36806/7 will be inaccessible.
2. IX peering will go down.
3. Network capacity will be reduced.
Following the release and rebalancing of resources, we now have enough resources to accept new orders.

The previous packet loss was not caused by overselling. Even so, we have now placed strict limits on our resources.

The new Xeon Gold node and EPYC node will be ready next week. Between April 25 and May 10, we will launch two more EPYC nodes.
If you feel that LAX has not been fast enough recently, please run an MTR test.

If you find that your LAX-China route passes through Shanghai, please try using the UDP protocol to temporarily bypass the unknown restrictions at the China Telecom Shanghai PoP.

Our NOC is still in contact with China Telecom GNOC about this issue.
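For readers who want to check an MTR result programmatically, here is a minimal sketch. It assumes the plain-text layout printed by `mtr --report` (hop lines of the form `N.|-- host Loss% Snt Last Avg ...`), which may differ between versions; the names `Hop`, `parse_mtr_report`, and `end_to_end_loss` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Hop(NamedTuple):
    index: int
    host: str
    loss_pct: float
    avg_ms: float


# Assumed hop-line layout: "  3.|-- <host>  <Loss%>  <Snt>  <Last>  <Avg> ..."
HOP_LINE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+\d+\s+[\d.]+\s+([\d.]+)"
)


def parse_mtr_report(text: str) -> List[Hop]:
    """Extract hop number, host, loss percentage, and average RTT per hop."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if match:
            index, host, loss, avg = match.groups()
            hops.append(Hop(int(index), host, float(loss), float(avg)))
    return hops


def end_to_end_loss(hops: List[Hop]) -> float:
    """Loss at the final hop; loss seen only at intermediate hops is often
    just ICMP rate limiting on that router rather than real packet loss."""
    if not hops:
        raise ValueError("no hops parsed")
    return hops[-1].loss_pct
```

The input text could come from something like `mtr --report --no-dns <target>` run inside the VM. Looking at the hop hosts also shows whether the route passes through a particular PoP.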
Because of frequent customer-to-customer fraud through the transfer function, we will temporarily disable the transfer function in the near future. Our team will improve the transfer process as much as possible to protect the interests of both DMIT and our customers.
We are sorry about the situation below, but we have to inform all customers of the following.

A) Kernel, Speed, Network, Bandwidth:

If we find a third-party kernel or an unknown TCP acceleration program, add-on, or plug-in installed in a VM, our team will NOT provide any help with network issues. Explaining how such software damages and screws up the TCP queue of the network in your VM wastes too much of our team's time.

This involves professional network knowledge (e.g. buffers, bufferbloat, queues, transmission control, retransmission, etc.). If you do not understand it, please leave the system as it is.

If we detect such software, our team will notify you without providing any further help. A refund is acceptable if the requirements are met.

B) Guarantee:

The following is common knowledge in the industry.

DMIT never guarantees the network from a VM to China or any other location. The speed on the cart page is the maximum value the VM can reach. NO ISP gives any guarantee of bandwidth. The bandwidth commitment an ISP gives us covers the link between DMIT and its port on the switch/edge router. This includes China Telecom.

Regards,
NOC
>Due to a fault on the TPE (Trans-Pacific Express) S1S cable, there is severe network degradation on the CN2 backbone. Please wait for the CTA/CTG NOC to respond and reschedule routes. Please do not open a ticket for this issue; DMIT is not able to repair it.
Back to CN2 GIA.
The CN2 backbone has resumed. It is now carried by multiple submarine cables. The latency is fluctuating between 160ms and 185ms (in general, it should be 120~125ms). Bandwidth is no longer affected, but latency still is.

In our experience, the repair will take about 2-4 months to complete.

Regards,
NOC
Scheduled maintenance
Date: Apr 22, 2020
Time: 12pm - 3pm PST
Datacenter: LAX
Estimated duration: 1h
Max duration: 3h
Affected services:
- Reinstall;
- Backup;
- Network on some nodes;
- Control Panel;

Best regards,
NOC
Our NOC and an external expert have created a new channel to monitor the backbone of the DMIT network and some of our upstreams' backbones. https://t.me/DMIT_NOC_Monitor
If you are experiencing high packet loss, please check the Optimized CN2 IP Network. This Optimized IP Network helps you avoid network congestion at some AS4134-AS4809 and AS4837-AS4809 network exchange PoPs. Send us a ticket to request an additional Optimized IP Network; you can also order it when you create a VM.

The default network goes through CN2 GIA in both directions. The Optimized network goes through AS4134 or CN2 GT from CHN to LAX, but through CN2 GIA from LAX to CHN. For this reason, we cannot make changes to the default one. Most of our customers only need CN2 GIA in both directions.

For end users on China Telecom (AS4134) / China Unicom (AS4837), CN2 GIA does not always offer the best network quality in certain directions.