EKS 데이터 플레인 오토스케일링 실습

2021-11-03

Data Engineering

Data_Engineering_TIL(20211103)

[학습자료]

“클라우드 네이티브를 위한 쿠버네티스 실전 프로젝트” 책을 읽고 정리한 내용입니다.

** 동양북스, 아이자와 고지&사토 가즈히코 지음, 박상욱 옮김

참고자료 URL : https://github.com/dybooksIT/k8s-aws-book

“IAM role을 EKS pod별로 설정하기”에 이어서 공부한 내용을 정리한 내용임

** URL : https://minman2115.github.io/DE_TIL300

[오토스케일링 기본개념]

EKS에서 Cluster Autoscaler라는 기능이 있는데 데이터 플레인 인스턴스에 대한 오토스케일링을 하는 기능이다.

Cluster Autoscaler의 스케일링 트리거 기준은 파드에 설정된 리소스 요청에 따라 판단한다. 새로운 파드를 배포하려고 할때 요청한 CPU 리소스와 메모리 크기에 여유가 없다면 그 파드는 pending 상태가 된다. 그러면 Cluster Autoscaler가 이 상황을 감지하고 요청한 파드가 동작할 수 있도록 노드를 추가한다. 다시말해서 pending 상태의 파드가 발생했을때 노드가 추가된다.

[오토스케링일 실습]

먼저 데이터노드의 IAM Role에 AutoScalingFullAccess policy 권한을 부여한다. 그런 다음에 EKS 클러스터 데이터 노드로 설정된 오토스케일링 그룹 이름을 복사한다.

복사한 오토스케일링 그룹 이름을 cluster-autoscaler.yaml 스크립트내에 지정된 # 오토스케일링 그룹 이름 설정 주석달린 밑에 부분에 넣어준다.

[ec2-user@ip-10-10-1-195 autoscaling]$ pwd
/home/ec2-user/k8s-aws-book/autoscaling

[ec2-user@ip-10-10-1-195 autoscaling]$ ll
total 12
-rw-rw-r-- 1 ec2-user ec2-user 3809 Oct 18 13:49 cluster-autoscaler.yaml
-rw-rw-r-- 1 ec2-user ec2-user 3961 Oct 18 13:49 components.yaml
-rw-rw-r-- 1 ec2-user ec2-user  272 Oct 18 13:49 horizontal-pod-autoscaler.yaml
    
[ec2-user@ip-10-10-1-195 autoscaling]$ vim cluster-autoscaler.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
- apiGroups: [""]
  resources: ["events","endpoints"]
  verbs: ["create", "patch"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["pods/status"]
  verbs: ["update"]
- apiGroups: [""]
  resources: ["endpoints"]
  resourceNames: ["cluster-autoscaler"]
  verbs: ["get","update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["watch","list","get","update"]
- apiGroups: [""]
  resources: ["pods","services","replicationcontrollers","persistentvolumeclaims","persistentvolumes"]
  verbs: ["watch","list","get"]
- apiGroups: ["extensions"]
  resources: ["replicasets","daemonsets"]
  verbs: ["watch","list","get"]
- apiGroups: ["policy"]
  resources: ["poddisruptionbudgets"]
  verbs: ["watch","list"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["watch","list","get"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["watch","list","get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["cluster-autoscaler-status"]
  verbs: ["delete","get","update"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: k8s.gcr.io/cluster-autoscaler:v1.2.2
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        # 오토스케일링 그룹 이름 설정
        - --nodes=2:5:eks-xxxxxx-yyyyy-zzzzz-ttttt-cqwddwcq1a8
        env:
        - name: AWS_REGION
          # 리전을 지정
          value: ap-northeast-2
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
        imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-bundle.crt"

EKS 클러스터에 아래와 같이 명령어를 실행하여 cluster-autoscaler를 활성화한다. 적용 후 에러가 없다면 정상적으로 설정된 것이다.

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl apply -f cluster-autoscaler.yaml
serviceaccount/cluster-autoscaler created
clusterrole.rbac.authorization.k8s.io/cluster-autoscaler created
role.rbac.authorization.k8s.io/cluster-autoscaler created
clusterrolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
rolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
deployment.apps/cluster-autoscaler created

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl logs -f deployment/cluster-autoscaler -n kube-system
...
I1103 13:19:06.272490       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I1103 13:19:06.371828       1 reflector.go:240] Listing and watching *v1beta1.StatefulSet from k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/informers/factory.go:87
...            

예제 어플리케이션의 디플로이먼트 레플리카 수를 10으로 변경하고 정말 해당 노드가 늘어나는지 확인해보자.

# 예제 어플리케이션 레플리카 수를 10으로 변경
[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl scale --replicas=10 deployment/backend-app
deployment.apps/backend-app scaled

# 그러면 아래 파드 상태와 같이 노드 수가 부족해서 몇개의 파드는 pending 상태가 된다.
[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl get pod
NAME                          READY   STATUS              RESTARTS   AGE
backend-app-7fb899969-44tsf   0/1     ContainerCreating   0          10s
backend-app-7fb899969-6kbcl   0/1     Pending             0          10s
backend-app-7fb899969-7985c   0/1     Pending             0          10s
backend-app-7fb899969-d9ltx   0/1     Pending             0          10s
backend-app-7fb899969-f752z   1/1     Running             0          6d
backend-app-7fb899969-jpbc9   1/1     Running             0          6d
backend-app-7fb899969-n4rpt   0/1     ContainerCreating   0          10s
backend-app-7fb899969-nzqv8   0/1     Pending             0          10s
backend-app-7fb899969-q7wh2   0/1     Pending             0          10s
backend-app-7fb899969-qf2kk   0/1     Pending             0          10s

# 그러면 cluster autoscaler는 이 상황을 감지하고 오토스케일링 그룹 내의 노드수를 스케일링 한다.
# 위에서 팬딩되고 있는 파드의 정보를 확인해보면 아래와 같이 파드를 트리거 걸어서 데이터 플레인을 늘리고 있는 것을 알수 있다.
[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl describe pod backend-app-7fb899969-q7wh2
Name:           backend-app-7fb899969-q7wh2
Namespace:      eks-work
Priority:       0
Node:           <none>
Labels:         app=backend-app
                pod-template-hash=7fb899969
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/backend-app-xxxxxxx
Containers:
  backend-app:
    Image:      xxxxxxxxx.dkr.ecr.ap-northeast-2.amazonaws.com/k8sbook/backend-app:1.0.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     250m
      memory:  768Mi
    Requests:
      cpu:      100m
      memory:   512Mi
    Liveness:   http-get http://:8080/health delay=30s timeout=1s period=30s #success=1 #failure=3
    Readiness:  http-get http://:8080/health delay=15s timeout=1s period=30s #success=1 #failure=3
    Environment:
      DB_URL:       <set to the key 'db-url' in secret 'db-config'>       Optional: false
      DB_USERNAME:  <set to the key 'db-username' in secret 'db-config'>  Optional: false
      DB_PASSWORD:  <set to the key 'db-password' in secret 'db-config'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-npqrk (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-npqrk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-npqrk
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  42s (x2 over 42s)  default-scheduler   0/2 nodes are available: 2 Insufficient memory.
  Normal   TriggeredScaleUp  30s                cluster-autoscaler  pod triggered scale-up: [{eks-babe49b2-6b52-773e-1620-xxxxx 2->5 (max: 5)}]

# 아래와 같이 잠시 기다리면 데이터 플레인 EC2가 2대에서 5대로 늘어나는 것을 알 수 있다.
[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl get node
NAME                                               STATUS   ROLES    AGE   VERSION
ip-192-168-0-105.ap-northeast-2.compute.internal   Ready    <none>   6d    v1.19.14-eks-dce78b
ip-192-168-2-62.ap-northeast-2.compute.internal    Ready    <none>   6d    v1.19.14-eks-dce78b

...

10 초 뒤

...

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl get node
NAME                                               STATUS     ROLES    AGE   VERSION
ip-192-168-0-105.ap-northeast-2.compute.internal   Ready      <none>   6d    v1.19.14-eks-dce78b
ip-192-168-0-254.ap-northeast-2.compute.internal   NotReady   <none>   21s   v1.19.14-eks-dce78b
ip-192-168-1-16.ap-northeast-2.compute.internal    NotReady   <none>   21s   v1.19.14-eks-dce78b
ip-192-168-1-41.ap-northeast-2.compute.internal    NotReady   <none>   21s   v1.19.14-eks-dce78b
ip-192-168-2-62.ap-northeast-2.compute.internal    Ready      <none>   6d    v1.19.14-eks-dce78b

...

10 초 뒤

...

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl get node
NAME                                               STATUS   ROLES    AGE    VERSION
ip-192-168-0-105.ap-northeast-2.compute.internal   Ready    <none>   6d     v1.19.14-eks-dce78b
ip-192-168-0-254.ap-northeast-2.compute.internal   Ready    <none>   103s   v1.19.14-eks-dce78b
ip-192-168-1-16.ap-northeast-2.compute.internal    Ready    <none>   103s   v1.19.14-eks-dce78b
ip-192-168-1-41.ap-northeast-2.compute.internal    Ready    <none>   103s   v1.19.14-eks-dce78b
ip-192-168-2-62.ap-northeast-2.compute.internal    Ready    <none>   6d     v1.19.14-eks-dce78b

아래와 같이 레플리카 수를 2로 다시 줄이면 잠시후에 데이터 플레인 수도 감소한다.

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl scale --replicas=2 deployment/backend-app
deployment.apps/backend-app scaled

...

10초 뒤

...

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl get pod
NAME                          READY   STATUS      RESTARTS   AGE
backend-app-7fb899969-f752z   1/1     Running     0          6d
backend-app-7fb899969-jpbc9   1/1     Running     0          6d1h

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl get node
NAME                                               STATUS                     ROLES    AGE   VERSION
ip-192-168-0-105.ap-northeast-2.compute.internal   Ready                      <none>   6d    v1.19.14-eks-dce78b
ip-192-168-0-254.ap-northeast-2.compute.internal   Ready,SchedulingDisabled   <none>   35m   v1.19.14-eks-dce78b
ip-192-168-1-16.ap-northeast-2.compute.internal    Ready,SchedulingDisabled   <none>   35m   v1.19.14-eks-dce78b
ip-192-168-1-41.ap-northeast-2.compute.internal    Ready,SchedulingDisabled   <none>   35m   v1.19.14-eks-dce78b
ip-192-168-2-62.ap-northeast-2.compute.internal    Ready                      <none>   6d    v1.19.14-eks-dce78b

...

3분 뒤

...

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl get node
NAME                                               STATUS   ROLES    AGE   VERSION
ip-192-168-0-105.ap-northeast-2.compute.internal   Ready    <none>   6d    v1.19.14-eks-dce78b
ip-192-168-2-62.ap-northeast-2.compute.internal    Ready    <none>   6d    v1.19.14-eks-dce78b

Cluster autoscaler 사용시 주의사항

Cluster autoscaler는 위에서 테스트 했던것처럼 requests 값으로 스케일링을 판단한다. 다시말해서 파드를 새로 배치할때만 스케일링 할지를 판단한다는 것이다. 그래서 예를 들어서 실제 EC2 노드의 CPU 사용률 등 부하가 작은데도 불구하고 파드의 requests 값이 크 경우에 파드가 스케쥴링 되지 않고 EC2 노드를 스케일링을 해버린다.

또한 requests 값은 설정값보다 작지만 limits 설정이 아예 없거나 limits가 아주 높은 값으로 설정된 경우 requests 자체에는 아직 리소스 여유가 있지만 파드가 추가적으로 배포될 경우 EC2에 과부하가 걸리는 상태가 될 수도 있다. 결론적으로 requests값과 limits 값은 상황에 따라 적절한 수로 설정해줘야 한다.

AWS 오토스케일링 기능을 이용한 예방적 오토스케일링

어떤 문제가 발생했을때 보다 어떤 문제가 발생할 조짐이 보일때 오토스케일링을 하는 것이 일반적이다. 따라서 어떤 임곗값을 넘으면 노드를 추가하는 것이 운영측면에서 바람직하다. 부하가 얼마나 발생할지, 대기시간을 얼마나 갖는지 등을 분석하여 적절한 임곗값을 설정해야 한다. 생성되는 파드를 pending 상태로 만들지 않으려면 노드의 부하가 높아지는 것을 판단할 필요가 있는데 EC2 노드의 CPU 사용률(pod_cpu_reserved_capacity)이 좋다. Container insights를 활성화했다면 자동으로 등록되기 때문에 Container insights를 사용하는 것이 좋다. 이 메트릭을 이용해서 AWS 오토스케일링을 하면 된다. 그런데 이 방법은 새로 배치될 파드가 pending 상태로 될 가능성을 줄여줄 뿐이다. 노드로 할당 가능한 CPU는 남아 있지만 실제 파드 부하가 높은 상황이라면, 여기에 파드를 배포하는 것은 EC2에 과부하를 일으킬 수 있다. 이런 상태가 되는 것을 막기 위해서 사전에 실제 부하상태를 확인하고 스케일링을 하도록 적절한 설정값을 부여하는 것이 중요하다. 예를 들어서 데이터 노드 전체 CPU 사용률이라면 AWS 표준 메트릭으로 등록되어 있어 이 메트릭을 사용할 수 있다.

 S3 전송속도 관련 참고자료 Horizontal Pod Autoscaler를 이용한 EKS 파드 오토스케일링 실습 