Adding a feature
To add an option for Cilium to use the ENI IPAM mode.
We want to make this an option, so we need to add a field to CiliumNetworkingSpec:
A few things to note here:
We could probably use a boolean for today’s needs, but we want to leave some flexibility, so we use a string.
We define a value for Cilium’s current default mode, so we leave the default “” value as meaning “default mode, whatever it may be in future”.
So, we just need to check if Ipam
is eni
when determining which mode to configure.
We will need to update both the versioned and unversioned APIs and regenerate the generated code, per the documentation on updating the API.
Validation
We should add some validation that the value entered is valid. We only accept eni
, crd
or the empty string right now.
Validation is done in validation.go, and is fairly simple - we just add an error to a slice if something is not valid:
if v.Ipam != "" {
// "azure" not supported by kOps
allErrs = append(allErrs, IsValidValue(fldPath.Child("ipam"), &v.Ipam, []string{"crd", "eni"})...)
if v.Ipam == kops.CiliumIpamEni {
if c.CloudProvider != string(kops.CloudProviderAWS) {
allErrs = append(allErrs, field.Forbidden(fldPath.Child("ipam"), "Cilum ENI IPAM is supported only in AWS"))
}
if !v.DisableMasquerade {
allErrs = append(allErrs, field.Forbidden(fldPath.Child("disableMasquerade"), "Masquerade must be disabled when ENI IPAM is used"))
}
}
}
Configuring Cilium
First we add to the cilium-config
ConfigMap:
{{ with .Ipam }}
ipam: {{ . }}
{{ if eq . "eni" }}
enable-endpoint-routes: "true"
auto-create-cilium-node-resource: "true"
blacklist-conflicting-routes: "false"
{{ end }}
{{ end }}
Then we conditionally move cilium-operator to masters:
{{ if eq .Ipam "eni" }}
nodeSelector:
node-role.kubernetes.io/master: ""
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
{{ end }}
After changing manifest files remember to run bash hack/update-expected.sh
in order to get updated values.
When Cilium is in ENI mode kubelet
needs to be configured with the local IP address, so that it can distinguish it from the secondary IP address used by ENI. Kubelet is configured by nodeup, in nodeup/pkg/model/kubelet.go. That code passes the local IP address to kubelet
when the UsesSecondaryIP()
receiver of the NodeupModelContext
returns true.
So we modify UsesSecondaryIP()
to also return true
when Cilium is in ENI mode:
Configuring IAM
When Cilium is in ENI mode, cilium-operator
on the master nodes needs additional IAM permissions. The masters’ IAM permissions are built by BuildAWSPolicyMaster()
in pkg/model/iam/iam_builder.go:
if b.Cluster.Spec.Networking != nil && b.Cluster.Spec.Networking.Cilium != nil && b.Cluster.Spec.Networking.Cilium.Ipam == kops.CiliumIpamEni {
addCiliumEniPermissions(p, resource, b.Cluster.Spec.IAM.Legacy)
}
p.Statement = append(p.Statement,
Effect: StatementEffectAllow,
Action: stringorslice.Slice([]string{
"ec2:DescribeSubnets",
"ec2:AttachNetworkInterface",
"ec2:AssignPrivateIpAddresses",
"ec2:UnassignPrivateIpAddresses",
"ec2:CreateNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeVpcPeeringConnections",
"ec2:DescribeSecurityGroups",
"ec2:DetachNetworkInterface",
"ec2:DeleteNetworkInterface",
"ec2:ModifyNetworkInterfaceAttribute",
"ec2:DescribeVpcs",
}),
Resource: resource,
},
)
}
Tests
Prior to testing this for real, it can be handy to write a few unit tests.
We should test that validation works as we expect (in validation_test.go):
func Test_Validate_Cilium(t *testing.T) {
grid := []struct {
Cilium kops.CiliumNetworkingSpec
Spec kops.ClusterSpec
ExpectedErrors []string
}{
{
Cilium: kops.CiliumNetworkingSpec{},
},
{
Cilium: kops.CiliumNetworkingSpec{
Ipam: "crd",
},
},
{
Cilium: kops.CiliumNetworkingSpec{
DisableMasquerade: true,
Ipam: "eni",
},
Spec: kops.ClusterSpec{
CloudProvider: "aws",
},
},
{
Cilium: kops.CiliumNetworkingSpec{
DisableMasquerade: true,
Ipam: "eni",
},
Spec: kops.ClusterSpec{
CloudProvider: "aws",
},
{
Cilium: kops.CiliumNetworkingSpec{
Ipam: "foo",
},
},
{
Cilium: kops.CiliumNetworkingSpec{
Ipam: "eni",
},
Spec: kops.ClusterSpec{
CloudProvider: "aws",
},
ExpectedErrors: []string{"Forbidden::cilium.disableMasquerade"},
},
{
Cilium: kops.CiliumNetworkingSpec{
DisableMasquerade: true,
Ipam: "eni",
},
Spec: kops.ClusterSpec{
CloudProvider: "gce",
},
ExpectedErrors: []string{"Forbidden::cilium.ipam"},
},
}
for _, g := range grid {
g.Spec.Networking = &kops.NetworkingSpec{
Cilium: &g.Cilium,
}
errs := validateNetworkingCilium(&g.Spec, g.Spec.Networking.Cilium, field.NewPath("cilium"))
testErrors(t, g.Spec, errs, g.ExpectedErrors)
}
}
If your feature touches important configuration options in config
or cluster.spec
, document them in cluster_spec.md.
Testing
To rapidly test a nodeup change, you can build it, scp it to a running machine, and run it over SSH with the output viewable locally:
make push-aws-run-amd64 TARGET=admin@<publicip>
For more complete testing though, you will likely want to do a private build of nodeup and launch a cluster from scratch.
To do this, you can repoint the nodeup source url by setting the KOPS_BASE_URL
env var, and then push nodeup using:
If you have changed the dns or kOps controllers, you would want to test them as well. To do so, run the respective snippets below before creating the cluster.
For dns-controller:
KOPS_VERSION=`.build/dist/$(go env GOOS)/$(go env GOARCH)/kops version -- --short`
export DOCKER_IMAGE_PREFIX=${USER}/
export DOCKER_REGISTRY=
make dns-controller-push
export DNSCONTROLLER_IMAGE=${DOCKER_IMAGE_PREFIX}dns-controller:${KOPS_VERSION}
For kops-controller:
KOPS_VERSION=`.build/dist/$(go env GOOS)/$(go env GOARCH)/kops version -- --short`
export DOCKER_IMAGE_PREFIX=${USER}/
export DOCKER_REGISTRY=
make kops-controller-push
export KOPSCONTROLLER_IMAGE=${DOCKER_IMAGE_PREFIX}kops-controller:${KOPS_VERSION}
Using the feature
Users would simply kops edit cluster
, and add a value like:
spec:
networking:
cilium:
disableMasquerade: true
ipam: eni
Then kops update cluster --yes
would create the new NodeUpConfig, which is included in the instance startup script and thus requires a new LaunchTemplate version, and thus a kops-rolling update
. We’re working on changing settings without requiring a reboot, but likely for this particular setting it isn’t the sort of thing you need to change very often.
- We could also create a CLI flag on
create cluster
. This doesn’t seem worth it in this case; this is a relatively advanced option.