Horizontal Pod Autoscaler를 이용한 EKS 파드 오토스케일링 실습
.
Data_Engineering_TIL(20211107)
[학습자료]
“클라우드 네이티브를 위한 쿠버네티스 실전 프로젝트” 책을 읽고 정리한 내용입니다.
** 동양북스, 아이자와 고지&사토 가즈히코 지음, 박상욱 옮김
참고자료 URL : https://github.com/dybooksIT/k8s-aws-book
“EKS 데이터 플레인 오토스케일링 실습”에 이어서 공부한 내용을 정리한 내용임
** URL : https://minman2115.github.io/DE_TIL302
[기본개념]
노드의 오토스케일링은 파드를 원하는 수만큼 띄우기 위해서 EC2 노드의 수를 충분하게 확보하는 기능이다. 그러면 어플리케이션 자체의 스케일링은 어떻게 해야 하는가. 쿠버네티스에는 Horizontal Pod Autoscaler(HPA)라는 기능이 있으며 노드와 마찬가지로 파드 리소스 사용률에 맞춰 파드의 스케일 아웃, 스케일 인 기능을 구현할 수 있다. HPA는 AWS의 오토스케일링과 마찬가지로 서비스의 문제 발생을 막는 차원에서 설정하는 것이 일반적이다. 다시말해서 파드 리소스 사용 현황을 모니터링하고 임곗값을 넘을 경우 스케일을 늘리는 구조이다.
[실습]
STEP 1) 메트릭 서버 배포
HPA로 파드 수를 자동 조절하려면 클러스터 내부에서 파드 리소스 사용 현황을 파악하고 있어야 한다. HPA는 메트릭 서버를 사용해서 사용현황을 파악한다.
# 현재 EKS 클러스터 현황 확인
[ec2-user@ip-10-10-1-195 ~]$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/backend-app-7fb899969-f752z 1/1 Running 0 9d
pod/backend-app-7fb899969-jpbc9 1/1 Running 0 9d
pod/batch-app-1636258800-sg7rz 0/1 Completed 0 10m
pod/batch-app-1636259100-phdps 0/1 Completed 0 5m25s
pod/batch-app-1636259400-cmvvf 0/1 Completed 0 24s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/backend-app-service LoadBalancer 10.100.92.76 xxxxxx-yyyyyyy.ap-northeast-2.elb.amazonaws.com 8080:3060 9/TCP 19d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/backend-app 2/2 2 2 19d
NAME DESIRED CURRENT READY AGE
replicaset.apps/backend-app-7fb899969 2 2 2 19d
NAME COMPLETIONS DURATION AGE
job.batch/batch-app-1635774300 0/1 5d14h 5d14h
job.batch/batch-app-1636258800 1/1 11s 10m
job.batch/batch-app-1636259100 1/1 11s 5m25s
job.batch/batch-app-1636259400 1/1 10s 24s
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob.batch/batch-app */5 * * * * False 0 33s 19d
# https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.2/components.yaml 에서 다운로드 받음
[ec2-user@ip-10-10-1-195 ~]$ vim components.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
rbac.authorization.k8s.io/aggregate-to-admin: "true"
rbac.authorization.k8s.io/aggregate-to-edit: "true"
rbac.authorization.k8s.io/aggregate-to-view: "true"
name: system:aggregated-metrics-reader
rules:
- apiGroups:
- metrics.k8s.io
resources:
- pods
- nodes
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
rules:
- apiGroups:
- ""
resources:
- pods
- nodes
- nodes/stats
- namespaces
- configmaps
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server-auth-reader
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:metrics-server
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
ports:
- name: https
port: 443
protocol: TCP
targetPort: https
selector:
k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: metrics-server
strategy:
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /livez
port: https
scheme: HTTPS
periodSeconds: 10
name: metrics-server
ports:
- containerPort: 4443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /readyz
port: https
scheme: HTTPS
periodSeconds: 10
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-dir
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
serviceAccountName: metrics-server
volumes:
- emptyDir: {}
name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
version: v1beta1
versionPriority: 100
[ec2-user@ip-10-10-1-195 ~]$ ll
total 45828
-rw-rw-r-- 1 ec2-user ec2-user 141 Oct 19 13:33 cloudwatch-namespace.yaml
-rw-rw-r-- 1 ec2-user ec2-user 3962 Nov 7 04:47 components.yaml
-rw-rw-r-- 1 ec2-user ec2-user 522 Oct 19 13:41 cwagent-configmap.yaml
-rw-rw-r-- 1 ec2-user ec2-user 2560 Oct 19 13:43 cwagent-daemonset.yaml
-rw-rw-r-- 1 ec2-user ec2-user 1120 Oct 19 13:35 cwagent-serviceaccount.yaml
drwxrwxr-x 2 ec2-user ec2-user 59 Oct 23 06:21 eks_log_exam
drwxrwxr-x 15 ec2-user ec2-user 315 Oct 18 13:49 k8s-aws-book
-rw-rw-r-- 1 ec2-user ec2-user 46907392 Oct 18 12:16 kubectl
[ec2-user@ip-10-10-1-195 ~]$ kubectl apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
# 메트릭 APi가 활성화 되어있는지 아래와 같은 명령어로 확인한다.
# .status.conditions[].status 값이 True로 되어 있으면 정상적으로 활성화가 된 것이다.
[ec2-user@ip-10-10-1-195 ~]$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"apiregistration.k8s.io/v1","kind":"APIService","metadata":{"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"v1beta1.metrics.k8s.io"},"spec":{"group":"metrics.k8s.io","groupPriorityMinimum":100,"insecureSkipTLSVerify":true,"service":{"name":"metrics-server","namespace":"kube-system"},"version":"v1beta1","versionPriority":100}}
creationTimestamp: "2021-11-07T04:49:56Z"
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
resourceVersion: "5626252"
selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
uid: 99d17797-d1af-4561-b454-611c0615b0e0
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
port: 443
version: v1beta1
versionPriority: 100
status:
conditions:
- lastTransitionTime: "2021-11-07T04:50:04Z"
message: all checks passed
reason: Passed
status: "True" # 이 부분이 True 여야함
type: Available
STEP 2) HPA 리소스 생성
예제 어플리케이션에 일정한 부하가 발생했을때 자동으로 스케일링 동작으로 하도록 HPA 리소스를 생성한다. 예제 소스 파일의 autoscaling 디렉토리로 이동한 후 아래의 명령어를 실행한다.
[ec2-user@ip-10-10-1-195 ~]$ pwd
/home/ec2-user
[ec2-user@ip-10-10-1-195 ~]$ ll
total 45828
-rw-rw-r-- 1 ec2-user ec2-user 141 Oct 19 13:33 cloudwatch-namespace.yaml
-rw-rw-r-- 1 ec2-user ec2-user 3962 Nov 7 04:47 components.yaml
-rw-rw-r-- 1 ec2-user ec2-user 522 Oct 19 13:41 cwagent-configmap.yaml
-rw-rw-r-- 1 ec2-user ec2-user 2560 Oct 19 13:43 cwagent-daemonset.yaml
-rw-rw-r-- 1 ec2-user ec2-user 1120 Oct 19 13:35 cwagent-serviceaccount.yaml
drwxrwxr-x 2 ec2-user ec2-user 59 Oct 23 06:21 eks_log_exam
drwxrwxr-x 15 ec2-user ec2-user 315 Oct 18 13:49 k8s-aws-book
-rw-rw-r-- 1 ec2-user ec2-user 46907392 Oct 18 12:16 kubectl
[ec2-user@ip-10-10-1-195 ~]$ cd k8s-aws-book/
[ec2-user@ip-10-10-1-195 k8s-aws-book]$ ll
total 20
drwxrwxr-x 2 ec2-user ec2-user 98 Nov 3 13:09 autoscaling
drwxrwxr-x 5 ec2-user ec2-user 166 Oct 18 13:49 backend-app
drwxrwxr-x 7 ec2-user ec2-user 198 Oct 18 14:10 batch-app
drwxrwxr-x 4 ec2-user ec2-user 49 Oct 18 13:49 cicd
drwxrwxr-x 2 ec2-user ec2-user 85 Oct 18 13:49 column-deployment-update
drwxrwxr-x 2 ec2-user ec2-user 60 Oct 18 13:49 column-loadbalancer-https
drwxrwxr-x 2 ec2-user ec2-user 73 Oct 18 13:49 db-docker-compose
drwxrwxr-x 3 ec2-user ec2-user 4096 Oct 18 14:26 eks-env
drwxrwxr-x 13 ec2-user ec2-user 288 Oct 18 13:59 frontend-app
-rw-rw-r-- 1 ec2-user ec2-user 11357 Oct 18 13:49 LICENSE
drwxrwxr-x 3 ec2-user ec2-user 37 Oct 18 13:49 readme
-rw-rw-r-- 1 ec2-user ec2-user 3103 Oct 18 13:49 README.md
drwxrwxr-x 4 ec2-user ec2-user 75 Oct 18 14:10 sample-app-common
drwxrwxr-x 2 ec2-user ec2-user 181 Oct 18 13:49 security
[ec2-user@ip-10-10-1-195 k8s-aws-book]$ cd autoscaling
[ec2-user@ip-10-10-1-195 autoscaling]$ ll
total 12
-rw-rw-r-- 1 ec2-user ec2-user 3774 Nov 3 13:09 cluster-autoscaler.yaml
-rw-rw-r-- 1 ec2-user ec2-user 3961 Oct 18 13:49 components.yaml
-rw-rw-r-- 1 ec2-user ec2-user 272 Oct 18 13:49 horizontal-pod-autoscaler.yaml
[ec2-user@ip-10-10-1-195 autoscaling]$ vim horizontal-pod-autoscaler.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: backend-app
namespace: eks-work
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: backend-app
minReplicas: 2
maxReplicas: 5
targetCPUUtilizationPercentage: 50
[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl apply -f horizontal-pod-autoscaler.yaml
horizontalpodautoscaler.autoscaling/backend-app created
STEP 3) 동작 확인하기
아래 명령어와 같이 부하를 주기 위해 파드를 생성해 예제 어플리케이션 파드에 부하를 준다.
# 부하를 주기 위한 파드 실행
# 프롬프트가 표시되면 아래 명령어 입력
# while true; do wget -q -O- http://backend-app-service.eks-work.svc.cluster.local:8080/health; done
[ec2-user@ip-10-10-1-195 autoscaling]$ kubectl run -i --tty load-generator --image=busybox --rm -- sh
If you don't see a command prompt, try pressing enter.
/ # while true; do wget -q -O- http://backend-app-service.eks-work.svc.cluster.local:8080/health; done
{"status":"OK"}{"status":"OK"}{"status":"OK"} ..... {"status":"OK"}{"status":"OK"}{"status":"OK"} ...
# 위와 같이 while 문으로 계속 실행될 것이다.
# 그래서 터미널을 새로 띄워서 EKS 클러이언트로 다시 접속한 다음 아래와 같은 명령어를 실행한다.
# HPA 리소스 상태를 확인하면 파드의 CPU 사용률과 레플리카 수를 확인할 수 있다. 부하를 위와 같이 주는 동안에는
# 레플리카 수가 증가하며 Max로 지정한 5개까지 스케일 아웃하여 부하를 줄이면 다시 1개로 돌아간다. 레플리카 수나
# 리소스 상황에 따라 부하가 발생하지 않을 경우 부하를 주기 위한 파드를 더 생성하여 부하를 주면 된다.
# HPA를 확인하면 레플리카 수가 자동으로 변화함
[ec2-user@ip-10-10-1-195 ~]$ kubectl get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
backend-app Deployment/backend-app 2%/50% 2 5 2 11m
backend-app Deployment/backend-app 45%/50% 2 5 2 11m
backend-app Deployment/backend-app 67%/50% 2 5 2 12m
backend-app Deployment/backend-app 67%/50% 2 5 3 12m
backend-app Deployment/backend-app 60%/50% 2 5 3 13m
backend-app Deployment/backend-app 44%/50% 2 5 3 14m
backend-app Deployment/backend-app 81%/50% 2 5 3 15m
backend-app Deployment/backend-app 81%/50% 2 5 5 16m
backend-app Deployment/backend-app 59%/50% 2 5 5 16m
# 다섯개까지 늘어 났다면 위에 while문 날린 쉘 프롬프트에서 부하를 중지(부하를 주기 위한 명령을 실행한 쉘에서 ctrl + c)하고
# 다시 HPA 리소스 상태를 확인하면 레플리카 수가 다시 2개로 줄어드는 것을 알 수 있다.
backend-app Deployment/backend-app 2%/50% 2 5 5 17m
backend-app Deployment/backend-app 1%/50% 2 5 5 19m
backend-app Deployment/backend-app 2%/50% 2 5 5 20m
backend-app Deployment/backend-app 2%/50% 2 5 5 22m
backend-app Deployment/backend-app 2%/50% 2 5 3 22m
backend-app Deployment/backend-app 1%/50% 2 5 3 25m
backend-app Deployment/backend-app 1%/50% 2 5 3 27m
backend-app Deployment/backend-app 1%/50% 2 5 2 27m
backend-app Deployment/backend-app 2%/50% 2 5 2 29m
^C[ec2-user@ip-10-10-1-195 ~]$
# 다시 while문 날린 쉘 프롬프트로 이동해서 exti 명령어를 실행해서 종료해준다.
...
/ # exit
Session ended, resume using 'kubectl attach load-generator -c load-generator -i -t' command when the pod is running
pod "load-generator" deleted
[ec2-user@ip-10-10-1-195 autoscaling]$
HPA에서는 기본값으로 스케일 아웃 시간은 30초, 스케일 시간은 5분에 한번 동작한다.
STEP 4) 실습 리소스 삭제
# 오토스케일링 관리 리소스 삭제
[ec2-user@ip-10-10-1-195 autoscaling]$ cd /home/ec2-user/k8s-aws-book
[ec2-user@ip-10-10-1-195 k8s-aws-book]$ ll
total 20
drwxrwxr-x 2 ec2-user ec2-user 98 Nov 7 05:10 autoscaling
drwxrwxr-x 5 ec2-user ec2-user 166 Oct 18 13:49 backend-app
drwxrwxr-x 7 ec2-user ec2-user 198 Oct 18 14:10 batch-app
drwxrwxr-x 4 ec2-user ec2-user 49 Oct 18 13:49 cicd
drwxrwxr-x 2 ec2-user ec2-user 85 Oct 18 13:49 column-deployment-update
drwxrwxr-x 2 ec2-user ec2-user 60 Oct 18 13:49 column-loadbalancer-https
drwxrwxr-x 2 ec2-user ec2-user 73 Oct 18 13:49 db-docker-compose
drwxrwxr-x 3 ec2-user ec2-user 4096 Oct 18 14:26 eks-env
drwxrwxr-x 13 ec2-user ec2-user 288 Oct 18 13:59 frontend-app
-rw-rw-r-- 1 ec2-user ec2-user 11357 Oct 18 13:49 LICENSE
drwxrwxr-x 3 ec2-user ec2-user 37 Oct 18 13:49 readme
-rw-rw-r-- 1 ec2-user ec2-user 3103 Oct 18 13:49 README.md
drwxrwxr-x 4 ec2-user ec2-user 75 Oct 18 14:10 sample-app-common
drwxrwxr-x 2 ec2-user ec2-user 181 Oct 18 13:49 security
[ec2-user@ip-10-10-1-195 k8s-aws-book]$ kubectl delete -f autoscaling
serviceaccount "cluster-autoscaler" deleted
clusterrole.rbac.authorization.k8s.io "cluster-autoscaler" deleted
role.rbac.authorization.k8s.io "cluster-autoscaler" deleted
clusterrolebinding.rbac.authorization.k8s.io "cluster-autoscaler" deleted
rolebinding.rbac.authorization.k8s.io "cluster-autoscaler" deleted
deployment.apps "cluster-autoscaler" deleted
serviceaccount "metrics-server" deleted
clusterrole.rbac.authorization.k8s.io "system:aggregated-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "system:metrics-server" deleted
rolebinding.rbac.authorization.k8s.io "metrics-server-auth-reader" deleted
clusterrolebinding.rbac.authorization.k8s.io "metrics-server:system:auth-delegator" deleted
clusterrolebinding.rbac.authorization.k8s.io "system:metrics-server" deleted
service "metrics-server" deleted
deployment.apps "metrics-server" deleted
apiservice.apiregistration.k8s.io "v1beta1.metrics.k8s.io" deleted
horizontalpodautoscaler.autoscaling "backend-app" deleted