We are going to install a Rancher RKE custom cluster with a fixed number of nodes carrying the etcd and controlplane roles, and a variable number of worker nodes managed by cluster-autoscaler.

Prerequisites

These elements are required to follow this guide:

  • The Rancher server is up and running
  • You have an AWS EC2 user with proper permissions to create virtual machines, auto scaling groups, and IAM profiles and roles

1. Create a Custom Cluster

On the Rancher server, we should create a custom k8s cluster running Kubernetes v1.18.x. Be sure that the cloud_provider name is set to amazonec2. Once the cluster is created, we need to gather the following (a lookup sketch follows this list):

  • clusterID: c-xxxxx, used in the kubernetes.io/cluster/<clusterID> EC2 instance tag
  • clusterName: used in the k8s.io/cluster-autoscaler/<clusterName> EC2 instance tag
  • nodeCommand: added to the EC2 instance user_data so that new nodes join the cluster
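
A quick way to look these up is through the Rancher v3 API. The following is a minimal sketch, assuming an API key with access to the cluster; the URL, token and c-xxxxx ID are placeholders to replace with your own values:

  # Look up clusterID (c-xxxxx) and clusterName
  RANCHER_URL="https://<RANCHER_URL>"
  RANCHER_TOKEN="token-xxxxx:<secret>"
  curl -s -u "${RANCHER_TOKEN}" "${RANCHER_URL}/v3/clusters" | jq -r '.data[] | "\(.id) \(.name)"'

  # nodeCommand for the cluster, later embedded in the EC2 user_data
  curl -s -u "${RANCHER_TOKEN}" "${RANCHER_URL}/v3/clusters/c-xxxxx/clusterregistrationtokens" | jq -r '.data[0].nodeCommand'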

2. Configure the Cloud Provider

On AWS EC2, we should create a few objects to configure our system. We’ve defined three distinct groups, and their IAM profiles, to configure on AWS.

  1. Autoscaling group: Nodes that will be part of the EC2 Auto Scaling Group (ASG). The ASG will be used by cluster-autoscaler to scale up and down.

    • IAM profile: Required by the k8s nodes where cluster-autoscaler will be running (the master nodes are recommended). This profile is called K8sAutoscalerProfile. A CLI sketch for creating these IAM objects follows this list.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup",
            "autoscaling:DescribeTags",
            "ec2:DescribeLaunchTemplateVersions"
          ],
          "Resource": [
            "*"
          ]
        }
      ]
    }
  2. Master group: Nodes that will be part of the Kubernetes etcd and/or control planes. These nodes sit outside the ASG.

    • IAM profile: Required by the Kubernetes cloud_provider integration. Optionally, AWS_ACCESS_KEY and AWS_SECRET_KEY can be used instead (see using AWS credentials). This profile is called K8sMasterProfile.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeTags",
            "ec2:DescribeInstances",
            "ec2:DescribeRegions",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSubnets",
            "ec2:DescribeVolumes",
            "ec2:CreateSecurityGroup",
            "ec2:CreateTags",
            "ec2:CreateVolume",
            "ec2:ModifyInstanceAttribute",
            "ec2:ModifyVolume",
            "ec2:AttachVolume",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CreateRoute",
            "ec2:DeleteRoute",
            "ec2:DeleteSecurityGroup",
            "ec2:DeleteVolume",
            "ec2:DetachVolume",
            "ec2:RevokeSecurityGroupIngress",
            "elasticloadbalancing:AddTags",
            "elasticloadbalancing:AttachLoadBalancerToSubnets",
            "elasticloadbalancing:ApplySecurityGroupsToLoadBalancer",
            "elasticloadbalancing:CreateLoadBalancer",
            "elasticloadbalancing:CreateLoadBalancerPolicy",
            "elasticloadbalancing:CreateLoadBalancerListeners",
            "elasticloadbalancing:ConfigureHealthCheck",
            "elasticloadbalancing:DeleteLoadBalancer",
            "elasticloadbalancing:DeleteLoadBalancerListeners",
            "elasticloadbalancing:DescribeLoadBalancers",
            "elasticloadbalancing:DetachLoadBalancerFromSubnets",
            "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
            "elasticloadbalancing:ModifyLoadBalancerAttributes",
            "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
            "elasticloadbalancing:SetLoadBalancerPoliciesForBackendServer",
            "elasticloadbalancing:CreateListener",
            "elasticloadbalancing:CreateTargetGroup",
            "elasticloadbalancing:DeleteListener",
            "elasticloadbalancing:DeleteTargetGroup",
            "elasticloadbalancing:DescribeListeners",
            "elasticloadbalancing:DescribeLoadBalancerPolicies",
            "elasticloadbalancing:DescribeTargetGroups",
            "elasticloadbalancing:DescribeTargetHealth",
            "elasticloadbalancing:ModifyListener",
            "elasticloadbalancing:ModifyTargetGroup",
            "elasticloadbalancing:RegisterTargets",
            "elasticloadbalancing:SetLoadBalancerPoliciesOfListener",
            "iam:CreateServiceLinkedRole",
            "ecr:GetAuthorizationToken",
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetRepositoryPolicy",
            "ecr:DescribeRepositories",
            "ecr:ListImages",
            "ecr:BatchGetImage",
            "kms:DescribeKey"
          ],
          "Resource": [
            "*"
          ]
        }
      ]
    }
    • IAM role: K8sMasterRole: [K8sMasterProfile, K8sAutoscalerProfile]
    • Security group: K8sMasterSg (see RKE ports, custom nodes tab)
    • Tags: kubernetes.io/cluster/<clusterID>: owned
    • User data: K8sMasterUserData. Based on Ubuntu 18.04 (ami-0e11cbb34015ff725), it installs Docker and adds an etcd+controlplane node to the k8s cluster.
  3. Worker group: Nodes that will be part of the Kubernetes worker plane. These nodes will be scaled up and down by cluster-autoscaler through the ASG.

    • IAM profile: Provides the cloud_provider worker integration. This profile is called K8sWorkerProfile.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeInstances",
            "ec2:DescribeRegions",
            "ecr:GetAuthorizationToken",
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetRepositoryPolicy",
            "ecr:DescribeRepositories",
            "ecr:ListImages",
            "ecr:BatchGetImage"
          ],
          "Resource": "*"
        }
      ]
    }
    • IAM role: K8sWorkerRole: [K8sWorkerProfile]
    • Security group: K8sWorkerSg (see RKE ports, custom nodes tab)
    • Tags:
    • kubernetes.io/cluster/<clusterID>: owned
    • k8s.io/cluster-autoscaler/<clusterName>: true
    • k8s.io/cluster-autoscaler/enabled: true
    • User data: K8sWorkerUserData (Ubuntu 18.04; installs Docker and adds a worker node to the k8s cluster):
    #!/bin/bash -x
    cat <<EOF > /etc/sysctl.d/90-kubelet.conf
    vm.overcommit_memory = 1
    kernel.panic = 10
    kernel.panic_on_oops = 1
    kernel.keys.root_maxkeys = 1000000
    kernel.keys.root_maxbytes = 25000000
    EOF
    sysctl -p /etc/sysctl.d/90-kubelet.conf
    curl -sL https://releases.rancher.com/install-docker/19.03.sh | sh
    sudo usermod -aG docker ubuntu
    TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
    PRIVATE_IP=$(curl -H "X-aws-ec2-metadata-token: ${TOKEN}" -s http://169.254.169.254/latest/meta-data/local-ipv4)
    PUBLIC_IP=$(curl -H "X-aws-ec2-metadata-token: ${TOKEN}" -s http://169.254.169.254/latest/meta-data/public-ipv4)
    K8S_ROLES="--worker"
    sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:<RANCHER_VERSION> --server https://<RANCHER_URL> --token <RANCHER_TOKEN> --ca-checksum <RANCHER_CA_CHECKSUM> --address ${PUBLIC_IP} --internal-address ${PRIVATE_IP} ${K8S_ROLES}
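
The IAM objects above can be created however you prefer (console, CLI, Terraform). As a minimal AWS CLI sketch for the master side, assuming the two policy documents shown above are saved locally as k8s-autoscaler-policy.json and k8s-master-policy.json (hypothetical file names) and that ec2-trust-policy.json contains the standard trust policy allowing ec2.amazonaws.com to assume the role:

  # Create the managed policies from the JSON documents shown above
  aws iam create-policy --policy-name K8sAutoscalerProfile --policy-document file://k8s-autoscaler-policy.json
  aws iam create-policy --policy-name K8sMasterProfile --policy-document file://k8s-master-policy.json

  # Create the role and attach both policies
  aws iam create-role --role-name K8sMasterRole --assume-role-policy-document file://ec2-trust-policy.json
  aws iam attach-role-policy --role-name K8sMasterRole --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/K8sAutoscalerProfile
  aws iam attach-role-policy --role-name K8sMasterRole --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/K8sMasterProfile

  # Wrap the role in an instance profile so it can be attached to EC2 instances
  aws iam create-instance-profile --instance-profile-name K8sMasterRole
  aws iam add-role-to-instance-profile --instance-profile-name K8sMasterRole --role-name K8sMasterRole

The worker side (K8sWorkerRole with the K8sWorkerProfile policy) follows the same pattern.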

More info is available in the Cluster Autoscaler on AWS documentation.

3. Deploy Nodes

Once we’ve configured AWS, let’s create the VMs to bootstrap our cluster:

  • master (etcd+controlplane): Depending on your needs, deploy three master instances of the proper size.

    • IAM role: K8sMasterRole
    • Security group: K8sMasterSg
    • Tags:
    • kubernetes.io/cluster/<clusterID>: owned
    • User data: K8sMasterUserData
  • worker: Define an ASG on EC2 with the following settings (a CLI sketch follows this list):

    • Name: K8sWorkerAsg
    • IAM role: K8sWorkerRole
    • Security group: K8sWorkerSg
    • Tags:
    • kubernetes.io/cluster/<clusterID>: owned
    • k8s.io/cluster-autoscaler/<clusterName>: true
    • k8s.io/cluster-autoscaler/enabled: true
    • User data: K8sWorkerUserData
    • Instances:
    • minimum: 2
    • desired: 2
    • maximum: 10
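
A rough AWS CLI sketch of the ASG described above, assuming a launch template named K8sWorkerLaunchTemplate (a hypothetical name) already exists with the K8sWorkerRole instance profile, the K8sWorkerSg security group and K8sWorkerUserData:

  # Create the worker ASG that cluster-autoscaler will manage
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name K8sWorkerAsg \
    --launch-template 'LaunchTemplateName=K8sWorkerLaunchTemplate,Version=$Latest' \
    --min-size 2 --desired-capacity 2 --max-size 10 \
    --vpc-zone-identifier "<SUBNET_ID_1>,<SUBNET_ID_2>" \
    --tags 'Key=kubernetes.io/cluster/<clusterID>,Value=owned,PropagateAtLaunch=true' \
           'Key=k8s.io/cluster-autoscaler/<clusterName>,Value=true,PropagateAtLaunch=true' \
           'Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true'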

Once the VMs are deployed, you should have a Rancher custom cluster up and running with three master and two worker nodes.
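
As a quick sanity check, with the cluster's kubeconfig downloaded from the Rancher UI:

  # All five nodes should eventually report Ready: three etcd+controlplane, two worker
  kubectl get nodes -o wide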

4. Install Cluster-autoscaler

At this point, we should have a Rancher cluster up and running. We are going to install cluster-autoscaler on the master nodes, in the kube-system namespace, following the cluster-autoscaler recommendations.

Parameters

Cluster-autoscaler exposes a number of command-line parameters for fine tuning; see the upstream cluster-autoscaler documentation for the full list.
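
As an illustration only (not the author's original manifest), the container section of cluster-autoscaler-deployment.yaml could enable AWS auto-discovery against the ASG tags defined earlier, plus a few commonly tuned flags. The image tag and region are assumptions to adapt to your cluster; the scale-down values shown are the upstream defaults:

  containers:
  - name: cluster-autoscaler
    # Use the cluster-autoscaler release matching your Kubernetes minor version
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.18.3
    command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    # Discover the worker ASG through the tags set in the previous sections
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<clusterName>
    # Commonly tuned scale-down parameters
    - --scale-down-delay-after-add=10m
    - --scale-down-unneeded-time=10m
    - --scale-down-utilization-threshold=0.5
    env:
    - name: AWS_REGION
      value: <AWS_REGION>

The full manifest also needs the usual ServiceAccount and RBAC objects from the upstream cluster-autoscaler AWS example, plus a nodeSelector and tolerations so the pod lands on a master node.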

Deployment

Once the manifest file is prepared, deploy it in the Kubernetes cluster (Rancher UI can be used instead):

  kubectl -n kube-system apply -f cluster-autoscaler-deployment.yaml

Note: The cluster-autoscaler deployment can also be set up through other mechanisms, such as the cluster-autoscaler Helm chart.
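
A quick check that the deployment came up (the app=cluster-autoscaler label is an assumption; use whatever labels your manifest sets):

  # The pod should be Running on a master node
  kubectl -n kube-system get pods -l app=cluster-autoscaler -o wide
  # Its logs should show the K8sWorkerAsg node group being discovered
  kubectl -n kube-system logs -l app=cluster-autoscaler --tail=50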

Testing

At this point, we should have cluster-autoscaler up and running in our Rancher custom cluster. Cluster-autoscaler should manage the K8sWorkerAsg ASG, scaling it up and down between 2 and 10 nodes, when one of the following conditions is true:

  • There are pods that failed to run in the cluster due to insufficient resources. In this case, the cluster is scaled up.
  • There are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes. In this case, the cluster is scaled down.

We’ve prepared a test-deployment.yaml just to generate load on the Kubernetes cluster and see if cluster-autoscaler is working properly. The test deployment requests 1000m CPU and 1024Mi memory across three replicas. Adjust the requested resources and/or the replica count to be sure you exhaust the Kubernetes cluster resources:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: hello-world
    name: hello-world
  spec:
    replicas: 3
    selector:
      matchLabels:
        app: hello-world
    strategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0
      type: RollingUpdate
    template:
      metadata:
        labels:
          app: hello-world
      spec:
        containers:
        - image: rancher/hello-world
          imagePullPolicy: Always
          name: hello-world
          ports:
          - containerPort: 80
            protocol: TCP
          resources:
            limits:
              cpu: 1000m
              memory: 1024Mi
            requests:
              cpu: 1000m
              memory: 1024Mi
Once the test deployment is prepared, deploy it in the Kubernetes cluster default namespace (Rancher UI can be used instead):

  kubectl -n default apply -f test-deployment.yaml

Checking Scale

Once the Kubernetes cluster resources are exhausted, cluster-autoscaler should scale up the worker nodes so that the pods that failed to be scheduled can run. It should keep scaling up until all pods become scheduled. You should see the new nodes in the ASG and in the Kubernetes cluster. Check the logs on the cluster-autoscaler pod.
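
A couple of commands that help watch the scale-up (the aws CLI call assumes credentials with autoscaling read permissions):

  # Watch new worker nodes join the cluster
  kubectl get nodes -w
  # Watch the ASG desired capacity and instance count grow
  aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names K8sWorkerAsg \
    --query 'AutoScalingGroups[0].{Desired:DesiredCapacity,Instances:length(Instances)}'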

Once scale-up is verified, let’s check scale-down. To do so, reduce the replica count on the test deployment until you release enough Kubernetes cluster resources to trigger a scale-down. You should see nodes disappear from the ASG and from the Kubernetes cluster. Check the logs on the kube-system cluster-autoscaler pod.
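
For example, with the hello-world test deployment defined above (with default settings, cluster-autoscaler waits roughly 10 minutes before removing an unneeded node):

  # Free resources so some worker nodes become underutilized
  kubectl -n default scale deployment hello-world --replicas=1
  # After the scale-down delay, surplus nodes should be drained and removed
  kubectl get nodes -w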