Hands-on EKS Pod Autoscaling with the Horizontal Pod Autoscaler

2021-11-07


Data_Engineering_TIL(20211107)

[Study Materials]

These are my notes from reading the book “클라우드 네이티브를 위한 쿠버네티스 실전 프로젝트” (Kubernetes Practical Projects for Cloud Native).

** Dongyang Books; written by Koji Aizawa & Kazuhiko Sato, translated by Sang-wook Park

참고자료 URL : https://github.com/dybooksIT/k8s-aws-book

These notes continue from my earlier write-up, “EKS Data Plane Autoscaling Practice”.

** URL : https://minman2115.github.io/DE_TIL302

[Basic Concepts]

Node autoscaling is a feature that secures enough EC2 nodes to run the desired number of pods. So how should the application itself be scaled? Kubernetes provides the Horizontal Pod Autoscaler (HPA), which, like node autoscaling, scales pods out and in according to pod resource utilization. As with AWS Auto Scaling, HPA is typically configured as a safeguard against service failures: it monitors pod resource usage and scales out when usage crosses a threshold.
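
For reference, the HPA control loop computes the desired replica count with the following formula (as documented upstream). The sample numbers below are taken from the watch output in STEP 3 of this practice:

desiredReplicas = ceil( currentReplicas × currentCPUUtilization / targetCPUUtilization )

# 2 replicas at 67% CPU against a 50% target: ceil(2 × 67 / 50) = ceil(2.68) = 3
# 3 replicas at 81% CPU against a 50% target: ceil(3 × 81 / 50) = ceil(4.86) = 5, the maxReplicas cap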

[Hands-on Practice]

STEP 1) Deploying the metrics server

For HPA to adjust the number of pods automatically, the cluster must track pod resource usage internally. HPA relies on the metrics server for this.

# Check the current state of the EKS cluster
[ec2-user@ip-10-10-1-195 ~]$ kubectl get all
NAME                              READY   STATUS      RESTARTS   AGE
pod/backend-app-7fb899969-f752z   1/1     Running     0          9d
pod/backend-app-7fb899969-jpbc9   1/1     Running     0          9d
pod/batch-app-1636258800-sg7rz    0/1     Completed   0          10m
pod/batch-app-1636259100-phdps    0/1     Completed   0          5m25s
pod/batch-app-1636259400-cmvvf    0/1     Completed   0          24s

NAME                          TYPE           CLUSTER-IP     EXTERNAL-IP                                       PORT(S)          AGE
service/backend-app-service   LoadBalancer   10.100.92.76   xxxxxx-yyyyyyy.ap-northeast-2.elb.amazonaws.com   8080:30609/TCP   19d

NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/backend-app   2/2     2            2           19d

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/backend-app-7fb899969   2         2         2       19d

NAME                             COMPLETIONS   DURATION   AGE
job.batch/batch-app-1635774300   0/1           5d14h      5d14h
job.batch/batch-app-1636258800   1/1           11s        10m
job.batch/batch-app-1636259100   1/1           11s        5m25s
job.batch/batch-app-1636259400   1/1           10s        24s

NAME                      SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/batch-app   */5 * * * *   False     0        33s             19d

# Downloaded from https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.2/components.yaml
[ec2-user@ip-10-10-1-195 ~]$ vim components.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          periodSeconds: 10
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
    
[ec2-user@ip-10-10-1-195 ~]$ ll
total 45828
-rw-rw-r--  1 ec2-user ec2-user      141 Oct 19 13:33 cloudwatch-namespace.yaml
-rw-rw-r--  1 ec2-user ec2-user     3962 Nov  7 04:47 components.yaml
-rw-rw-r--  1 ec2-user ec2-user      522 Oct 19 13:41 cwagent-configmap.yaml
-rw-rw-r--  1 ec2-user ec2-user     2560 Oct 19 13:43 cwagent-daemonset.yaml
-rw-rw-r--  1 ec2-user ec2-user     1120 Oct 19 13:35 cwagent-serviceaccount.yaml
drwxrwxr-x  2 ec2-user ec2-user       59 Oct 23 06:21 eks_log_exam
drwxrwxr-x 15 ec2-user ec2-user      315 Oct 18 13:49 k8s-aws-book
-rw-rw-r--  1 ec2-user ec2-user 46907392 Oct 18 12:16 kubectl

[ec2-user@ip-10-10-1-195 ~]$ kubectl apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
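
For reference, instead of downloading and editing components.yaml first, the same manifest can be applied straight from the release URL (assuming the client host can reach GitHub):

[ec2-user@ip-10-10-1-195 ~]$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.2/components.yaml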

# Check that the metrics API is enabled with the command below.
# If the .status.conditions[].status value is True, it has been activated successfully.
[ec2-user@ip-10-10-1-195 ~]$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiregistration.k8s.io/v1","kind":"APIService","metadata":{"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"v1beta1.metrics.k8s.io"},"spec":{"group":"metrics.k8s.io","groupPriorityMinimum":100,"insecureSkipTLSVerify":true,"service":{"name":"metrics-server","namespace":"kube-system"},"version":"v1beta1","versionPriority":100}}
  creationTimestamp: "2021-11-07T04:49:56Z"
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
  resourceVersion: "5626252"
  selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  uid: 99d17797-d1af-4561-b454-611c0615b0e0
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2021-11-07T04:50:04Z"
    message: all checks passed
    reason: Passed
    status: "True"        # 이 부분이 True 여야함
    type: Available
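
With the metrics API available, the metrics server can also be spot-checked with kubectl top. The CPU/memory values below are illustrative, not captured from my cluster:

[ec2-user@ip-10-10-1-195 ~]$ kubectl top pod -n eks-work
NAME                          CPU(cores)   MEMORY(bytes)
backend-app-7fb899969-f752z   2m           321Mi
backend-app-7fb899969-jpbc9   2m           318Mi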

STEP 2) Creating the HPA resource

Create an HPA resource so that the example application scales automatically when it comes under a certain amount of load. Move into the autoscaling directory of the sample source files and run the commands below.

[ec2-user@ip-10-10-1-195 ~]$ pwd
/home/ec2-user

[ec2-user@ip-10-10-1-195 ~]$ ll
total 45828
-rw-rw-r--  1 ec2-user ec2-user      141 Oct 19 13:33 cloudwatch-namespace.yaml
-rw-rw-r--  1 ec2-user ec2-user     3962 Nov  7 04:47 components.yaml
-rw-rw-r--  1 ec2-user ec2-user      522 Oct 19 13:41 cwagent-configmap.yaml
-rw-rw-r--  1 ec2-user ec2-user     2560 Oct 19 13:43 cwagent-daemonset.yaml
-rw-rw-r--  1 ec2-user ec2-user     1120 Oct 19 13:35 cwagent-serviceaccount.yaml
drwxrwxr-x  2 ec2-user ec2-user       59 Oct 23 06:21 eks_log_exam
drwxrwxr-x 15 ec2-user ec2-user      315 Oct 18 13:49 k8s-aws-book
-rw-rw-r--  1 ec2-user ec2-user 46907392 Oct 18 12:16 kubectl

[ec2-user@ip-10-10-1-195 ~]$ cd k8s-aws-book/

[ec2-user@ip-10-10-1-195 k8s-aws-book]$ ll
total 20
drwxrwxr-x  2 ec2-user ec2-user    98 Nov  3 13:09 autoscaling
drwxrwxr-x  5 ec2-user ec2-user   166 Oct 18 13:49 backend-app
drwxrwxr-x  7 ec2-user ec2-user   198 Oct 18 14:10 batch-app
drwxrwxr-x  4 ec2-user ec2-user    49 Oct 18 13:49 cicd
drwxrwxr-x  2 ec2-user ec2-user    85 Oct 18 13:49 column-deployment-update
drwxrwxr-x  2 ec2-user ec2-user    60 Oct 18 13:49 column-loadbalancer-https
drwxrwxr-x  2 ec2-user ec2-user    73 Oct 18 13:49 db-docker-compose
drwxrwxr-x  3 ec2-user ec2-user  4096 Oct 18 14:26 eks-env
drwxrwxr-x 13 ec2-user ec2-user   288 Oct 18 13:59 frontend-app
-rw-rw-r--  1 ec2-user ec2-user 11357 Oct 18 13:49 LICENSE
drwxrwxr-x  3 ec2-user ec2-user    37 Oct 18 13:49 readme
-rw-rw-r--  1 ec2-user ec2-user  3103 Oct 18 13:49 README.md
drwxrwxr-x  4 ec2-user ec2-user    75 Oct 18 14:10 sample-app-common
drwxrwxr-x  2 ec2-user ec2-user   181 Oct 18 13:49 security

[ec2-user@ip-10-10-1-195 k8s-aws-book]$ cd autoscaling

[ec2-user@ip-10-10-1-195 autoscaling]$ ll
total 12
-rw-rw-r-- 1 ec2-user ec2-user 3774 Nov  3 13:09 cluster-autoscaler.yaml
-rw-rw-r-- 1 ec2-user ec2-user 3961 Oct 18 13:49 components.yaml
-rw-rw-r-- 1 ec2-user ec2-user  272 Oct 18 13:49 horizontal-pod-autoscaler.yaml

[ec2-user@ip-10-10-1-195 autoscaling]$ vim horizontal-pod-autoscaler.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: backend-app
  namespace: eks-work
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-app
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50

[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl apply -f horizontal-pod-autoscaler.yaml
horizontalpodautoscaler.autoscaling/backend-app created
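
Note that targetCPUUtilizationPercentage is calculated against the CPU requests declared in the target pods' spec, so the backend-app deployment must define resources.requests.cpu for HPA to report a utilization percentage. A minimal sketch of such a block (the values are illustrative; the book's actual backend-app manifest may differ):

      containers:
      - name: backend-app
        image: <backend-app-image>
        resources:
          requests:
            cpu: 100m        # HPA utilization = actual usage / this request
            memory: 512Mi
          limits:
            cpu: 250m
            memory: 768Mi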

STEP 3) Verifying the behavior

Create a load-generating pod as shown below and use it to put load on the example application's pods.

# Run a pod to generate load
# When the prompt appears, enter the following command
# while true; do wget -q -O- http://backend-app-service.eks-work.svc.cluster.local:8080/health; done
[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl run -i --tty load-generator --image=busybox --rm -- sh
If you don't see a command prompt, try pressing enter.
/ # while true; do wget -q -O- http://backend-app-service.eks-work.svc.cluster.local:8080/health; done
{"status":"OK"}{"status":"OK"}{"status":"OK"} ..... {"status":"OK"}{"status":"OK"}{"status":"OK"} ...

# The while loop above will keep running.
# So open a new terminal, connect to the EKS client host again, and run the command below.
# Checking the HPA resource shows the pods' CPU utilization and replica count. While the load is applied,
# the replica count grows, scaling out up to the max of 5; once the load is removed, it returns to the
# minimum of 2. If the load does not build up given the replica count or resource situation, create
# additional load-generating pods.
# Watching the HPA shows the replica count changing automatically
[ec2-user@ip-10-10-1-195 ~]$ kubectl get hpa -w
NAME          REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
backend-app   Deployment/backend-app   2%/50%    2         5         2          11m
backend-app   Deployment/backend-app   45%/50%   2         5         2          11m
backend-app   Deployment/backend-app   67%/50%   2         5         2          12m
backend-app   Deployment/backend-app   67%/50%   2         5         3          12m
backend-app   Deployment/backend-app   60%/50%   2         5         3          13m
backend-app   Deployment/backend-app   44%/50%   2         5         3          14m
backend-app   Deployment/backend-app   81%/50%   2         5         3          15m
backend-app   Deployment/backend-app   81%/50%   2         5         5          16m
backend-app   Deployment/backend-app   59%/50%   2         5         5          16m
# Once it has grown to five, stop the load (ctrl + c in the shell where the while loop is running),
# keep watching the HPA resource, and the replica count drops back down to 2.
backend-app   Deployment/backend-app   2%/50%    2         5         5          17m
backend-app   Deployment/backend-app   1%/50%    2         5         5          19m
backend-app   Deployment/backend-app   2%/50%    2         5         5          20m
backend-app   Deployment/backend-app   2%/50%    2         5         5          22m
backend-app   Deployment/backend-app   2%/50%    2         5         3          22m
backend-app   Deployment/backend-app   1%/50%    2         5         3          25m
backend-app   Deployment/backend-app   1%/50%    2         5         3          27m
backend-app   Deployment/backend-app   1%/50%    2         5         2          27m
backend-app   Deployment/backend-app   2%/50%    2         5         2          29m
^C[ec2-user@ip-10-10-1-195 ~]$

# Go back to the shell where the while loop was run and type the exit command to end the session.

...

/ # exit
Session ended, resume using 'kubectl attach load-generator -c load-generator -i -t' command when the pod is running
pod "load-generator" deleted
[ec2-user@ip-10-10-1-195 autoscaling]$

By default, HPA performs scale-out at roughly 30-second intervals, while scale-in happens at most once every 5 minutes.
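
These intervals can be tuned through the behavior field if the cluster exposes the autoscaling/v2beta2 API (autoscaling/v2 from Kubernetes 1.23). A minimal sketch, assuming the same backend-app target; the shorter stabilization window is just an example value:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-app
  namespace: eks-work
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-app
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # default is 300 (5 minutes)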

STEP 4) Deleting the practice resources

# Delete the autoscaling-related resources
[ec2-user@ip-10-10-1-195 autoscaling]$ cd /home/ec2-user/k8s-aws-book

[ec2-user@ip-10-10-1-195 k8s-aws-book]$ ll
total 20
drwxrwxr-x  2 ec2-user ec2-user    98 Nov  7 05:10 autoscaling
drwxrwxr-x  5 ec2-user ec2-user   166 Oct 18 13:49 backend-app
drwxrwxr-x  7 ec2-user ec2-user   198 Oct 18 14:10 batch-app
drwxrwxr-x  4 ec2-user ec2-user    49 Oct 18 13:49 cicd
drwxrwxr-x  2 ec2-user ec2-user    85 Oct 18 13:49 column-deployment-update
drwxrwxr-x  2 ec2-user ec2-user    60 Oct 18 13:49 column-loadbalancer-https
drwxrwxr-x  2 ec2-user ec2-user    73 Oct 18 13:49 db-docker-compose
drwxrwxr-x  3 ec2-user ec2-user  4096 Oct 18 14:26 eks-env
drwxrwxr-x 13 ec2-user ec2-user   288 Oct 18 13:59 frontend-app
-rw-rw-r--  1 ec2-user ec2-user 11357 Oct 18 13:49 LICENSE
drwxrwxr-x  3 ec2-user ec2-user    37 Oct 18 13:49 readme
-rw-rw-r--  1 ec2-user ec2-user  3103 Oct 18 13:49 README.md
drwxrwxr-x  4 ec2-user ec2-user    75 Oct 18 14:10 sample-app-common
drwxrwxr-x  2 ec2-user ec2-user   181 Oct 18 13:49 security

[ec2-user@ip-10-10-1-195 k8s-aws-book]$ kubectl delete -f autoscaling
serviceaccount "cluster-autoscaler" deleted
clusterrole.rbac.authorization.k8s.io "cluster-autoscaler" deleted
role.rbac.authorization.k8s.io "cluster-autoscaler" deleted
clusterrolebinding.rbac.authorization.k8s.io "cluster-autoscaler" deleted
rolebinding.rbac.authorization.k8s.io "cluster-autoscaler" deleted
deployment.apps "cluster-autoscaler" deleted
serviceaccount "metrics-server" deleted
clusterrole.rbac.authorization.k8s.io "system:aggregated-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "system:metrics-server" deleted
rolebinding.rbac.authorization.k8s.io "metrics-server-auth-reader" deleted
clusterrolebinding.rbac.authorization.k8s.io "metrics-server:system:auth-delegator" deleted
clusterrolebinding.rbac.authorization.k8s.io "system:metrics-server" deleted
service "metrics-server" deleted
deployment.apps "metrics-server" deleted
apiservice.apiregistration.k8s.io "v1beta1.metrics.k8s.io" deleted
horizontalpodautoscaler.autoscaling "backend-app" deleted