Simulate Network Faults

    NetworkChaos is a fault type in Chaos Mesh. By creating a NetworkChaos experiment, you can simulate a network fault scenario for a cluster. Currently, NetworkChaos supports the following fault types:

    • Partition: network disconnection and partition.
    • Net Emulation: poor network conditions, such as high delays, high packet loss rate, packet reordering, and so on.
    • Bandwidth: limit the communication bandwidth between nodes.

    Before creating NetworkChaos experiments, ensure the following:

    1. While a network fault is being injected, make sure that the connection between Controller Manager and Chaos Daemon keeps working. Otherwise, the NetworkChaos experiment cannot be recovered.
    2. To simulate a Net Emulation fault, make sure the NET_SCH_NETEM module is installed in the Linux kernel. On CentOS, you can install the module through the kernel-modules-extra package. Most other Linux distributions install the module by default.

    Create experiments using Chaos Dashboard

    1. Open Chaos Dashboard, and click NEW EXPERIMENT on the page to create a new experiment.

    2. In the Choose a Target area, choose NETWORK ATTACK and select a specific behavior, such as LOSS. Then fill out the specific configuration.

      (Figure: NetworkChaos Experiments)

      For details of the specific configuration fields, refer to the Field description section.

    3. Fill out the experiment information, and specify the experiment scope and the scheduled experiment duration.

    4. Submit the experiment information.

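    Create experiments using YAML files

    Delay example

    1. In the delay example, the configuration causes a latency of 10 milliseconds in the network connections of the target Pods. In addition to latency injection, Chaos Mesh also supports packet loss and packet reordering injection. For details, see the action descriptions later in this document.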
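
As a minimal sketch, a manifest matching the 10-millisecond delay described above could look like the following. The file name network-delay.yaml, the experiment name, the default namespace, and the app: web-show label are illustrative assumptions. The field layout follows the NetworkChaos v1alpha1 schema as we understand it, so check it against the CRD installed in your cluster.

```yaml
# network-delay.yaml (hypothetical file name)
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: delay-example        # illustrative name
spec:
  action: delay              # Net Emulation action described in this document
  mode: one                  # assumed: inject into one matching Pod
  selector:
    namespaces:
      - default              # assumed namespace
    labelSelectors:
      'app': 'web-show'      # assumed label of the target Pods
  delay:
    latency: '10ms'          # the 10 millisecond latency from the text
    correlation: '100'       # range [0, 100]
    jitter: '0ms'            # no variation around the latency
```

Under these assumptions, the experiment could be created with kubectl apply -f ./network-delay.yaml, in the same way as the other examples.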

    Partition example

    1. Write the experiment configuration to the network-partition.yaml file.

      This configuration blocks connections created from app1 to app2. The value of the direction field can be to, from, or both. For details, refer to the Field description section.

    2. After the configuration file is prepared, use kubectl to create the experiment:

      kubectl apply -f ./network-partition.yaml
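
The partition configuration described in step 1 above can be sketched as follows. The app1 and app2 labels and the direction value are taken from the description. The experiment name, the default namespace, the mode values, and the exact label key are assumptions made for illustration.

```yaml
# network-partition.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: partition-example    # illustrative name
spec:
  action: partition
  mode: all                  # assumed: apply to every Pod matched by the selector
  selector:
    namespaces:
      - default              # assumed namespace
    labelSelectors:
      'app': 'app1'          # source side of the partition
  direction: to              # one of: to, from, both
  target:
    mode: all                # assumed: every Pod matched by the target selector
    selector:
      namespaces:
        - default            # assumed namespace
      labelSelectors:
        'app': 'app2'        # destination side of the partition
```

With direction set to to, only traffic from the selected Pods toward the target Pods is blocked. Setting it to both would be the choice for a partition in both directions.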

    Bandwidth example

    1. Write the experiment configuration to the network-bandwidth.yaml file.

      This configuration limits the bandwidth of app1 to 1 mbps.

    2. After the configuration file is prepared, use kubectl to create the experiment:

      kubectl apply -f ./network-bandwidth.yaml
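
A manifest for the bandwidth limit described in step 1 above might look like the sketch below. The app1 label and the 1 mbps rate come from the description. The experiment name, the namespace, and the limit and buffer values are placeholders chosen for illustration, not recommendations.

```yaml
# network-bandwidth.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: bandwidth-example    # illustrative name
spec:
  action: bandwidth
  mode: all                  # assumed: apply to every Pod matched by the selector
  selector:
    namespaces:
      - default              # assumed namespace
    labelSelectors:
      'app': 'app1'          # Pods whose bandwidth is limited
  bandwidth:
    rate: '1mbps'            # tc-style rate string, assumed to match the 1 mbps in the text
    limit: 20971520          # placeholder: bytes allowed to wait in the queue
    buffer: 10000            # placeholder: bytes that can be sent in one burst
```

The rate string uses the unit names of the Linux tc tool, so it is worth confirming which unit a given suffix denotes before relying on a specific throughput.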

    For the Net Emulation and Bandwidth fault types, you can further configure the action-related parameters as described below.

    • Net Emulation type: delay, loss, duplicate, corrupt
    • Bandwidth type: bandwidth

    delay

    Setting action to delay simulates a network delay fault. You can also configure the following parameters.

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | latency | string | Indicates the network latency | No | No | 2ms |
    | correlation | string | Indicates the correlation between the current latency and the previous one. Range of value: [0, 100] | No | No | 50 |
    | jitter | string | Indicates the range of the network latency | No | No | 1ms |
    | reorder | Reorder | Indicates the status of network packet reordering | | No | |

    The correlation is applied as follows:

    1. Generate a random number whose distribution is related to the previous value. Here, rnd is the random number and corr is the correlation you configured earlier.

    2. Use this random number to determine the delay of the current packet:

      ((rnd % (2 * sigma)) + mu) - sigma

      In the above expression, sigma is jitter and mu is latency.
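
The two steps above can be written out as a short worked derivation. The blend in the first equation is an assumption: it is our reading of how netem-style correlated random numbers are usually generated, not an expression stated in this document. The names value (a fresh uniform random draw) and last_rnd (the previous result) are introduced here for illustration, and corr is taken as the configured percentage divided by 100. The second equation is the expression from step 2, shown for a jitter greater than zero.

```latex
% Step 1 (assumed form): blend a fresh draw with the previous result.
% corr = 0 gives independent values; corr = 1 repeats the previous value.
\[
  \mathit{rnd} = \mathit{value}\,(1 - \mathit{corr})
               + \mathit{last\_rnd}\cdot\mathit{corr}
\]

% Step 2 (from the text): map rnd into a window around the latency,
% with sigma = jitter > 0 and mu = latency.
\[
  \mathit{delay} = \bigl((\mathit{rnd} \bmod 2\sigma) + \mu\bigr) - \sigma
  \;\in\; [\mu - \sigma,\ \mu + \sigma)
\]

% Worked example: latency = 10 ms, jitter = 2 ms.
\[
  \mathit{delay} \in [10 - 2,\ 10 + 2)\ \mathrm{ms} = [8,\ 12)\ \mathrm{ms}
\]
```

Because rnd mod 2σ always falls in [0, 2σ), the delay stays within one jitter of the configured latency, whatever the correlation is.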

    reorder

    Setting action to reorder simulates a network packet reordering fault. You can also configure the following parameters.

    loss

    Setting action to loss simulates a packet loss fault. You can also configure the following parameters.

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | loss | string | Indicates the probability of packet loss. Range of value: [0, 100] | 0 | No | 50 |
    | correlation | string | Indicates the correlation between the probability of the current packet loss and that of the previous packet loss. Range of value: [0, 100] | 0 | No | 50 |
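
To show how the loss parameters fit into an experiment, here is a minimal sketch that uses the example values from the table above. The experiment name, the namespace, the mode, and the app1 label are assumptions for illustration. Both parameters are written as quoted strings because the table lists their type as string.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: loss-example         # illustrative name
spec:
  action: loss
  mode: all                  # assumed: apply to every Pod matched by the selector
  selector:
    namespaces:
      - default              # assumed namespace
    labelSelectors:
      'app': 'app1'          # assumed label of the target Pods
  loss:
    loss: '50'               # probability of packet loss, range [0, 100]
    correlation: '50'        # correlation with the previous packet loss, range [0, 100]
```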

    duplicate

    Setting action to duplicate simulates a packet duplication fault. You can also configure the following parameters.

    corrupt

    Setting action to corrupt simulates a packet corruption fault. You can also configure the following parameters.

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | corrupt | string | Indicates the probability of packet corruption. Range of value: [0, 100] | 0 | No | 50 |
    | correlation | string | Indicates the correlation between the probability of the current packet corruption and that of the previous packet corruption. Range of value: [0, 100] | 0 | No | 50 |
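
A corrupt experiment has the same overall shape as the other Net Emulation actions. The sketch below plugs in the example values from the table above. The experiment name, the namespace, the mode, and the app1 label are again assumptions made for illustration.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: corrupt-example      # illustrative name
spec:
  action: corrupt
  mode: all                  # assumed: apply to every Pod matched by the selector
  selector:
    namespaces:
      - default              # assumed namespace
    labelSelectors:
      'app': 'app1'          # assumed label of the target Pods
  corrupt:
    corrupt: '50'            # probability of packet corruption, range [0, 100]
    correlation: '50'        # correlation with the previous corruption, range [0, 100]
```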

    For occasional events such as reorder, loss, duplicate, and corrupt, the correlation is more complicated. For a description of the specific model, refer to NetemCLG.

    bandwidth

    Setting action to bandwidth simulates a bandwidth limit fault. You also need to configure the following parameters.