Adding a feature

    As a worked example, let's add an option for Cilium to use its ENI IPAM mode.

    We want to make this an option, so we need to add a field to CiliumNetworkingSpec:
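    The shape of the new field might look like the following sketch. The field name and json tag follow the pattern of other kOps API fields, and the `CiliumIpamEni` constant (referenced later by the validation and IAM code) is shown alongside it; treat the exact comment text as illustrative, and make the real change in the kops API packages:

```go
package main

import "fmt"

// CiliumNetworkingSpec sketch: the new Ipam field is a string rather than a
// boolean so that further modes can be added later without an API change.
type CiliumNetworkingSpec struct {
	// Ipam specifies the Cilium IP address allocation mode.
	// "crd" and "eni" are accepted; "" means "Cilium's default mode,
	// whatever it may be in the future".
	Ipam string `json:"ipam,omitempty"`
}

// A named constant for the ENI mode keeps call sites readable.
const CiliumIpamEni = "eni"

func main() {
	var spec CiliumNetworkingSpec
	fmt.Println(spec.Ipam == "") // the zero value keeps the default behavior
	spec.Ipam = CiliumIpamEni
	fmt.Println(spec.Ipam == CiliumIpamEni)
}
```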

    A few things to note here:

    • We could probably use a boolean for today’s needs, but we want to leave some flexibility, so we use a string.

    • We define a value for Cilium’s current default mode, so we leave the default “” value as meaning “default mode, whatever it may be in future”.

    So, we just need to check if Ipam is eni when determining which mode to configure.

    We will need to update both the versioned and unversioned APIs and regenerate the generated code, per the documentation on updating the API.

    Validation

    We should add some validation that the value entered is valid. We only accept eni, crd or the empty string right now.

    Validation is done in validation.go, and is fairly simple - we just add an error to a slice if something is not valid:

```go
if v.Ipam != "" {
	// "azure" is not supported by kOps
	allErrs = append(allErrs, IsValidValue(fldPath.Child("ipam"), &v.Ipam, []string{"crd", "eni"})...)

	if v.Ipam == kops.CiliumIpamEni {
		if c.CloudProvider != string(kops.CloudProviderAWS) {
			allErrs = append(allErrs, field.Forbidden(fldPath.Child("ipam"), "Cilium ENI IPAM is supported only in AWS"))
		}
		if !v.DisableMasquerade {
			allErrs = append(allErrs, field.Forbidden(fldPath.Child("disableMasquerade"), "Masquerade must be disabled when ENI IPAM is used"))
		}
	}
}
```

    Configuring Cilium

    First we add to the cilium-config ConfigMap:

```yaml
{{ with .Ipam }}
  ipam: {{ . }}
  {{ if eq . "eni" }}
  enable-endpoint-routes: "true"
  auto-create-cilium-node-resource: "true"
  blacklist-conflicting-routes: "false"
  {{ end }}
{{ end }}
```
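    For instance, with ipam set to eni, the rendered ConfigMap data would contain something like (a sketch of the rendered output, not taken from a real cluster):

```yaml
ipam: eni
enable-endpoint-routes: "true"
auto-create-cilium-node-resource: "true"
blacklist-conflicting-routes: "false"
```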

    Then we conditionally move cilium-operator to masters:

```yaml
{{ if eq .Ipam "eni" }}
nodeSelector:
  node-role.kubernetes.io/master: ""
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 300
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 300
{{ end }}
```

    After changing manifest files, remember to run `hack/update-expected.sh` to regenerate the expected output used by the integration tests.

    When Cilium is in ENI mode kubelet needs to be configured with the local IP address, so that it can distinguish it from the secondary IP address used by ENI. Kubelet is configured by nodeup, in nodeup/pkg/model/kubelet.go. That code passes the local IP address to kubelet when the UsesSecondaryIP() receiver of the NodeupModelContext returns true.

    So we modify UsesSecondaryIP() to also return true when Cilium is in ENI mode:
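    A simplified sketch of that change follows. The real receiver is `NodeupModelContext` and the real types live in the kops API packages; the structs below are stand-ins so the logic can be shown in isolation:

```go
package main

import "fmt"

// ciliumSpec and networkingSpec are simplified stand-ins for the kops API types.
type ciliumSpec struct {
	Ipam string
}

type networkingSpec struct {
	AmazonVPC bool // stands in for "the AmazonVPC CNI is configured"
	Cilium    *ciliumSpec
}

const ciliumIpamEni = "eni"

// usesSecondaryIP reports whether pods get addresses from secondary ENI IPs,
// in which case kubelet must be told the node's primary local IP explicitly.
// The new clause is the Cilium ENI check at the end.
func usesSecondaryIP(n networkingSpec) bool {
	return n.AmazonVPC ||
		(n.Cilium != nil && n.Cilium.Ipam == ciliumIpamEni)
}

func main() {
	fmt.Println(usesSecondaryIP(networkingSpec{Cilium: &ciliumSpec{Ipam: "eni"}}))
	fmt.Println(usesSecondaryIP(networkingSpec{Cilium: &ciliumSpec{Ipam: "crd"}}))
}
```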

    Configuring IAM

    When Cilium is in ENI mode, cilium-operator on the master nodes needs additional IAM permissions. The masters’ IAM permissions are built by BuildAWSPolicyMaster() in pkg/model/iam/iam_builder.go:

```go
if b.Cluster.Spec.Networking != nil && b.Cluster.Spec.Networking.Cilium != nil && b.Cluster.Spec.Networking.Cilium.Ipam == kops.CiliumIpamEni {
	addCiliumEniPermissions(p, resource, b.Cluster.Spec.IAM.Legacy)
}
```

    The new permissions themselves are added in addCiliumEniPermissions():

```go
func addCiliumEniPermissions(p *Policy, resource stringorslice.StringOrSlice, legacyIAM bool) {
	if legacyIAM {
		// Legacy IAM already grants ec2:*, so no additional permissions are needed
		return
	}

	p.Statement = append(p.Statement,
		&Statement{
			Effect: StatementEffectAllow,
			Action: stringorslice.Slice([]string{
				"ec2:DescribeSubnets",
				"ec2:AttachNetworkInterface",
				"ec2:AssignPrivateIpAddresses",
				"ec2:UnassignPrivateIpAddresses",
				"ec2:CreateNetworkInterface",
				"ec2:DescribeNetworkInterfaces",
				"ec2:DescribeVpcPeeringConnections",
				"ec2:DescribeSecurityGroups",
				"ec2:DetachNetworkInterface",
				"ec2:DeleteNetworkInterface",
				"ec2:ModifyNetworkInterfaceAttribute",
				"ec2:DescribeVpcs",
			}),
			Resource: resource,
		},
	)
}
```

    Tests

    Prior to testing this for real, it can be handy to write a few unit tests.

    We should test that validation works as we expect (in validation_test.go):

```go
func Test_Validate_Cilium(t *testing.T) {
	grid := []struct {
		Cilium         kops.CiliumNetworkingSpec
		Spec           kops.ClusterSpec
		ExpectedErrors []string
	}{
		{
			Cilium: kops.CiliumNetworkingSpec{},
		},
		{
			Cilium: kops.CiliumNetworkingSpec{
				Ipam: "crd",
			},
		},
		{
			Cilium: kops.CiliumNetworkingSpec{
				DisableMasquerade: true,
				Ipam:              "eni",
			},
			Spec: kops.ClusterSpec{
				CloudProvider: "aws",
			},
		},
		{
			Cilium: kops.CiliumNetworkingSpec{
				Ipam: "foo",
			},
			ExpectedErrors: []string{"Unsupported value::cilium.ipam"},
		},
		{
			Cilium: kops.CiliumNetworkingSpec{
				Ipam: "eni",
			},
			Spec: kops.ClusterSpec{
				CloudProvider: "aws",
			},
			ExpectedErrors: []string{"Forbidden::cilium.disableMasquerade"},
		},
		{
			Cilium: kops.CiliumNetworkingSpec{
				DisableMasquerade: true,
				Ipam:              "eni",
			},
			Spec: kops.ClusterSpec{
				CloudProvider: "gce",
			},
			ExpectedErrors: []string{"Forbidden::cilium.ipam"},
		},
	}
	for _, g := range grid {
		g.Spec.Networking = &kops.NetworkingSpec{
			Cilium: &g.Cilium,
		}
		errs := validateNetworkingCilium(&g.Spec, g.Spec.Networking.Cilium, field.NewPath("cilium"))
		testErrors(t, g.Spec, errs, g.ExpectedErrors)
	}
}
```

    If your feature touches important configuration options in config or cluster.spec, document them in cluster_spec.md.

    Testing

    To rapidly test a nodeup change, you can build it, scp it to a running machine, and run it over SSH with the output viewable locally:

```shell
make push-aws-run-amd64 TARGET=admin@<publicip>
```

    For more complete testing though, you will likely want to do a private build of nodeup and launch a cluster from scratch.

    To do this, you can repoint the nodeup source url by setting the KOPS_BASE_URL env var, and then push nodeup using:
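    A sketch of that flow, assuming an S3 bucket you control (the bucket name is a placeholder, and the make target and URL layout should be verified against your checkout of the kOps build tooling):

```shell
# Build a private kops/nodeup and upload it to your own bucket (name assumed)
export S3_BUCKET_NAME=<your-bucket>
make kops-install dev-upload UPLOAD_DEST=s3://${S3_BUCKET_NAME}

# Point cluster creation at the private build before running kops create/update
KOPS_VERSION=`.build/dist/$(go env GOOS)/$(go env GOARCH)/kops version -- --short`
export KOPS_BASE_URL=https://${S3_BUCKET_NAME}.s3.amazonaws.com/kops/${KOPS_VERSION}/
```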

    If you have changed dns-controller or kops-controller, you will want to test them as well. To do so, run the respective snippet below before creating the cluster.

    For dns-controller:

```shell
KOPS_VERSION=`.build/dist/$(go env GOOS)/$(go env GOARCH)/kops version -- --short`
export DOCKER_IMAGE_PREFIX=${USER}/
export DOCKER_REGISTRY=
make dns-controller-push
export DNSCONTROLLER_IMAGE=${DOCKER_IMAGE_PREFIX}dns-controller:${KOPS_VERSION}
```

    For kops-controller:

```shell
KOPS_VERSION=`.build/dist/$(go env GOOS)/$(go env GOARCH)/kops version -- --short`
export DOCKER_IMAGE_PREFIX=${USER}/
export DOCKER_REGISTRY=
make kops-controller-push
export KOPSCONTROLLER_IMAGE=${DOCKER_IMAGE_PREFIX}kops-controller:${KOPS_VERSION}
```

    Using the feature

    Users would simply kops edit cluster, and add a value like:

```yaml
spec:
  networking:
    cilium:
      disableMasquerade: true
      ipam: eni
```

    Then kops update cluster --yes would create the new NodeUpConfig. Because that config is included in the instance startup script, changing it requires a new LaunchTemplate version, and thus a kops rolling-update. We're working on changing settings without requiring a reboot, but this particular setting isn't the sort of thing you need to change very often.
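    End to end, the rollout looks something like the following (the cluster name is a placeholder; these are standard kops subcommands):

```shell
kops edit cluster ${CLUSTER_NAME}                  # add the cilium ipam settings
kops update cluster ${CLUSTER_NAME} --yes          # apply the new NodeUpConfig
kops rolling-update cluster ${CLUSTER_NAME} --yes  # roll nodes onto the new LaunchTemplate
```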

    • We could also create a CLI flag on create cluster. This doesn’t seem worth it in this case; this is a relatively advanced option.