EKS 어플리케이션 로그 관리방안 및 운영 노하우

2021-10-23

Data Engineering

Data_Engineering_TIL(20211023)

[학습자료]

“클라우드 네이티브를 위한 쿠버네티스 실전 프로젝트” 책을 읽고 정리한 내용입니다.

** 동양북스, 아이자와 고지&사토 가즈히코 지음, 박상욱 옮김

참고자료 URL : https://github.com/dybooksIT/k8s-aws-book

“Cloudwatch의 Container insights로 EKS 어플리케이션 상태 확인하기”에 이어서 공부한 내용을 정리한 내용임

** URL : https://minman2115.github.io/DE_TIL297

[EKS 어플리케이션 로그수집 기본개념]

어떤 어플리케이션이던 장애가 발생하면 원인을 파악하기 위해 다양한 로그를 확인할 것이다. EKS에서는 일반적으로 아래와 같은 구성으로 로그의 관리와 운영을 구현하게 된다.

1) 수집

Fluentd 컨테이너를 데몬셋으로 동작시키고 파드의 로그를 cloudwatch logs에 전송한다.

2) 저장

cloudwatch logs에 저장하도록 설정한다.

3) 모니터링

메트릭 디렉토리를 설정하고 cloudwatch 사용자 메트릭을 생성하여 그 메트릭의 경보를 생성한다.

4) 시각화

cloudwatch의 logs insights를 사용하여 대상로그를 분석하고 cloudwatch의 대시보드로 시각화 한다.

[ec2-user@ip-10-10-1-195 ~]$ kubectl logs -l app=backend-app
2021-10-23 14:15:48.491 [http-nio-8080-exec-7] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:16:06.228 [http-nio-8080-exec-8] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:16:18.491 [http-nio-8080-exec-3] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:16:36.228 [http-nio-8080-exec-5] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:16:48.491 [http-nio-8080-exec-9] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:17:06.228 [http-nio-8080-exec-1] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:17:18.491 [http-nio-8080-exec-4] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:17:36.228 [http-nio-8080-exec-6] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:17:48.491 [http-nio-8080-exec-9] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:18:06.228 [http-nio-8080-exec-3] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:15:36.950 [http-nio-8080-exec-1] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:15:59.757 [http-nio-8080-exec-5] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:16:06.950 [http-nio-8080-exec-7] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:16:29.757 [http-nio-8080-exec-1] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:16:36.950 [http-nio-8080-exec-2] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:16:59.757 [http-nio-8080-exec-4] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:17:06.951 [http-nio-8080-exec-6] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:17:29.757 [http-nio-8080-exec-9] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:17:36.950 [http-nio-8080-exec-1] INFO  k.s.presentation.api.HealthApi - Health GET API called.
2021-10-23 14:17:59.757 [http-nio-8080-exec-4] INFO  k.s.presentation.api.HealthApi - Health GET API called.

기존에 레거시 환경의 서버에서 어플리케이션을 배포하는 경우 어플리케이션이 출력하는 로그는 배포된 서버의 특정 디렉토리에 로그파일로 출력해서 저장하도록 설정하는 것이 일반적이다. 그러나 쿠버네티스에서 동작하는 어플리케이션 개념은 다르다. 왜냐하면 원래 어떤 호스트에서 동작 중인지 알수 없으며, 재시작 등을 하게 되면 호스트가 변경될 가능성이 있기 때문이다. 이 경우 기존방법으로는 어플리케이션 하나의 로그를 일관되게 확인하기 어렵다. 이런 특성 때문에 쿠버네티스 상의 어플리케이션 로그 출력은 표준 출력으로 구성하는 것이 좋다.

kubectl logs <pod name> 으로 파드가 표준출력한 정보를 참조할 수 있다. 또 kubectl logs -l app=backend-app과 같이 레이블을 지정하여 특정 레이블에 있는 모든 파드의 로그를 참조할 수 있다.

kubectl 명령어로 파드별로 로그를 참조하는 아키텍처는 아래와 같다.

kubectl logs -l app=backend-app 명령어로 backend-app의 모든 파드 로그를 참조할 수 있다. 이 파드들이 어떤 호스트에 존재하는지는 고려하지 않는다.

EKS에서 로그수집은 그러면 어떻게 하는가

AWS에서는 어플리케이션이 출력한 로그를 수집하는 구조로 cloudwatch logs라는 서비스를 제공한다. EC2에 Cloudwatch 에이전트를 설치하고 수집 대상로그 파일을 설정하면 AWS 로그 저장공간으로 전송하는 구조이다. 또는 아래 아키텍처와 같이 데몬셋으로 Fluentd 컨테이너를 이용해서 각각이 표준 출력한 로그를 확인하고 cloudwatch logs로 로그를 수집, 관리할 수 있다.

[실습]

위에 그림의 아키텍처를 실제로 구현해보자.

STEP 1) 아래 그림과 같이 EKS 데이터 노드의 IAM Role에 정책을 추가한다.

EKS에 배포한 데몬셋에서 cloudwatch logs에 로그를 전송하려면 데이터 플레인에 연결된 IAM role에 IAM policy를 연결해줘야 한다.

STEP 2) Cloudwatch용 네임스페이스 생성

[ec2-user@ip-10-10-1-195 eks_log_exam]$ kubectl get all
NAME                              READY   STATUS      RESTARTS   AGE
pod/backend-app-7fb899969-bv5bs   1/1     Running     0          31h
pod/backend-app-7fb899969-ct6mb   1/1     Running     1          30h
pod/batch-app-1634968800-qhfdt    0/1     Completed   0          12m
pod/batch-app-1634969100-4gqm2    0/1     Completed   0          7m58s
pod/batch-app-1634969400-d5j5d    0/1     Completed   0          2m57s

NAME                          TYPE           CLUSTER-IP     EXTERNAL-IP                                                                    PORT(S)          AGE
service/backend-app-service   LoadBalancer   10.100.92.76   xxxxxxxxx-xxxxxxxxx.ap-northeast-2.elb.amazonaws.com   8080:30609/TCP   4d16h

NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/backend-app   2/2     2            2           4d16h

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/backend-app-7fb899969   2         2         2       4d16h

NAME                             COMPLETIONS   DURATION   AGE
job.batch/batch-app-1634827800   0/1           39h        39h
job.batch/batch-app-1634968800   1/1           10s        12m
job.batch/batch-app-1634969100   1/1           10s        7m58s
job.batch/batch-app-1634969400   1/1           10s        2m57s

NAME                      SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/batch-app   */5 * * * *   False     0        3m4s            4d15h

[ec2-user@ip-10-10-1-195 eks_log_exam]$ pwd
/home/ec2-user/eks_log_exam

[ec2-user@ip-10-10-1-195 eks_log_exam]$ curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   141  100   141    0     0    398      0 --:--:-- --:--:-- --:--:--   398
                        
[ec2-user@ip-10-10-1-195 eks_log_exam]$ cat cloudwatch-namespace.yaml
# create amazon-cloudwatch namespace
apiVersion: v1
kind: Namespace
metadata:
  name: amazon-cloudwatch
  labels:
    name: amazon-cloudwatch
        
[ec2-user@ip-10-10-1-195 eks_log_exam]$ kubectl apply -f cloudwatch-namespace.yaml
namespace/amazon-cloudwatch created

STEP 3) Fluentd 컨테이너를 데몬셋으로 구동

# Fluentd 컨테이너는 클러스터 이름과 리전 이름을 컨피그맵에서 참조하는 방식이기 때문에 아래와 같은 명령어로 컨피그 맵을 먼저 생성한다.
[ec2-user@ip-10-10-1-195 eks_log_exam]$ kubectl create configmap cluster-info --from-literal=cluster.name=eks-work-cluster --from-literal=logs.region=ap-northeast-2 -n amazon-cloudwatch
configmap/cluster-info created

# Fluentd 컨테이너를 설치할 매니패스트 다운로드
[ec2-user@ip-10-10-1-195 eks_log_exam]$ curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-cotes/deployment-mode/daemonset/container-insights-monitoring/fluentd/fluentd.yaml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11157  100 11157    0     0  31516      0 --:--:-- --:--:-- --:--:-- 31516

[ec2-user@ip-10-10-1-195 eks_log_exam]$ cat fluentd.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: amazon-cloudwatch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd-role
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - pods/logs
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd-role
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: amazon-cloudwatch
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluentd-cloudwatch
data:
  fluent.conf: |
    @include containers.conf
    @include systemd.conf
    @include host.conf

    <match fluent.**>
      @type null
    </match>
  containers.conf: |
    <source>
      @type tail
      @id in_tail_container_logs
      @label @containers
      path /var/log/containers/*.log
      exclude_path ["/var/log/containers/cloudwatch-agent*", "/var/log/containers/fluentd*"]
      pos_file /var/log/fluentd-containers.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_cwagent_logs
      @label @cwagentlogs
      path /var/log/containers/cloudwatch-agent*
      pos_file /var/log/cloudwatch-agent.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_fluentd_logs
      @label @fluentdlogs
      path /var/log/containers/fluentd*
      pos_file /var/log/fluentd.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <label @fluentdlogs>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata_fluentd
        watch false
      </filter>

      <filter **>
        @type record_transformer
        @id filter_fluentd_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @containers>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata
        watch false
      </filter>

      <filter **>
        @type record_transformer
        @id filter_containers_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <filter **>
        @type concat
        key log
        multiline_start_regexp /^\S/
        separator ""
        flush_interval 5
        timeout_label @NORMAL
      </filter>

      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @cwagentlogs>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata_cwagent
        watch false
      </filter>

      <filter **>
        @type record_transformer
        @id filter_cwagent_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <filter **>
        @type concat
        key log
        multiline_start_regexp /^\d{4}[-/]\d{1,2}[-/]\d{1,2}/
        separator ""
        flush_interval 5
        timeout_label @NORMAL
      </filter>

      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @NORMAL>
      <match **>
        @type cloudwatch_logs
        @id out_cloudwatch_logs_containers
        region "#{ENV.fetch('AWS_REGION')}"
        log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/application"
        log_stream_name_key stream_name
        remove_log_stream_name_key true
        auto_create_stream true
        <buffer>
          flush_interval 5
          chunk_limit_size 2m
          queued_chunks_limit_size 32
          retry_forever true
        </buffer>
      </match>
    </label>
  systemd.conf: |
    <source>
      @type systemd
      @id in_systemd_kubelet
      @label @systemd
      filters [{ "_SYSTEMD_UNIT": "kubelet.service" }]
      <entry>
        field_map {"MESSAGE": "message", "_HOSTNAME": "hostname", "_SYSTEMD_UNIT": "systemd_unit"}
        field_map_strict true
      </entry>
      path /var/log/journal
      <storage>
        @type local
        persistent true
        path /var/log/fluentd-journald-kubelet-pos.json
      </storage>
      read_from_head true
      tag kubelet.service
    </source>

    <source>
      @type systemd
      @id in_systemd_kubeproxy
      @label @systemd
      filters [{ "_SYSTEMD_UNIT": "kubeproxy.service" }]
      <entry>
        field_map {"MESSAGE": "message", "_HOSTNAME": "hostname", "_SYSTEMD_UNIT": "systemd_unit"}
        field_map_strict true
      </entry>
      path /var/log/journal
      <storage>
        @type local
        persistent true
        path /var/log/fluentd-journald-kubeproxy-pos.json
      </storage>
      read_from_head true
      tag kubeproxy.service
    </source>

    <source>
      @type systemd
      @id in_systemd_docker
      @label @systemd
      filters [{ "_SYSTEMD_UNIT": "docker.service" }]
      <entry>
        field_map {"MESSAGE": "message", "_HOSTNAME": "hostname", "_SYSTEMD_UNIT": "systemd_unit"}
        field_map_strict true
      </entry>
      path /var/log/journal
      <storage>
        @type local
        persistent true
        path /var/log/fluentd-journald-docker-pos.json
      </storage>
      read_from_head true
      tag docker.service
    </source>

    <label @systemd>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata_systemd
        watch false
      </filter>

      <filter **>
        @type record_transformer
        @id filter_systemd_stream_transformer
        <record>
          stream_name ${tag}-${record["hostname"]}
        </record>
      </filter>

      <match **>
        @type cloudwatch_logs
        @id out_cloudwatch_logs_systemd
        region "#{ENV.fetch('AWS_REGION')}"
        log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/dataplane"
        log_stream_name_key stream_name
        auto_create_stream true
        remove_log_stream_name_key true
        <buffer>
          flush_interval 5
          chunk_limit_size 2m
          queued_chunks_limit_size 32
          retry_forever true
        </buffer>
      </match>
    </label>
  host.conf: |
    <source>
      @type tail
      @id in_tail_dmesg
      @label @hostlogs
      path /var/log/dmesg
      pos_file /var/log/dmesg.log.pos
      tag host.dmesg
      read_from_head true
      <parse>
        @type syslog
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_secure
      @label @hostlogs
      path /var/log/secure
      pos_file /var/log/secure.log.pos
      tag host.secure
      read_from_head true
      <parse>
        @type syslog
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_messages
      @label @hostlogs
      path /var/log/messages
      pos_file /var/log/messages.log.pos
      tag host.messages
      read_from_head true
      <parse>
        @type syslog
      </parse>
    </source>

    <label @hostlogs>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata_host
        watch false
      </filter>

      <filter **>
        @type record_transformer
        @id filter_containers_stream_transformer_host
        <record>
          stream_name ${tag}-${record["host"]}
        </record>
      </filter>

      <match host.**>
        @type cloudwatch_logs
        @id out_cloudwatch_logs_host_logs
        region "#{ENV.fetch('AWS_REGION')}"
        log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/host"
        log_stream_name_key stream_name
        remove_log_stream_name_key true
        auto_create_stream true
        <buffer>
          flush_interval 5
          chunk_limit_size 2m
          queued_chunks_limit_size 32
          retry_forever true
        </buffer>
      </match>
    </label>
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-cloudwatch
  namespace: amazon-cloudwatch
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-cloudwatch
  template:
    metadata:
      labels:
        k8s-app: fluentd-cloudwatch
      annotations:
        configHash: 8915de4cf9c3551a8dc74c0137a3e83569d28c71044b0359c2578d2e0461825
    spec:
      serviceAccountName: fluentd
      terminationGracePeriodSeconds: 30
      # Because the image's entrypoint requires to write on /fluentd/etc but we mount configmap there which is read-only,
      # this initContainers workaround or other is needed.
      # See https://github.com/fluent/fluentd-kubernetes-daemonset/issues/90
      initContainers:
        - name: copy-fluentd-config
          image: busybox
          command: ['sh', '-c', 'cp /config-volume/..data/* /fluentd/etc']
          volumeMounts:
            - name: config-volume
              mountPath: /config-volume
            - name: fluentdconf
              mountPath: /fluentd/etc
        - name: update-log-driver
          image: busybox
          command: ['sh','-c','']
      containers:
        - name: fluentd-cloudwatch
          image: fluent/fluentd-kubernetes-daemonset:v1.7.3-debian-cloudwatch-1.0
          env:
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef:
                  name: cluster-info
                  key: logs.region
            - name: CLUSTER_NAME
              valueFrom:
                configMapKeyRef:
                  name: cluster-info
                  key: cluster.name
            - name: CI_VERSION
              value: "k8s/1.3.8"
          resources:
            limits:
              memory: 400Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: config-volume
              mountPath: /config-volume
            - name: fluentdconf
              mountPath: /fluentd/etc
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: runlogjournal
              mountPath: /run/log/journal
              readOnly: true
            - name: dmesg
              mountPath: /var/log/dmesg
              readOnly: true
      volumes:
        - name: config-volume
          configMap:
            name: fluentd-config
        - name: fluentdconf
          emptyDir: {}
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: runlogjournal
          hostPath:
            path: /run/log/journal
        - name: dmesg
          hostPath:
            path: /var/log/dmesg

# fluentd container를 데몬셋으로 구동
[ec2-user@ip-10-10-1-195 eks_log_exam]$ kubectl apply -f fluentd.yaml
serviceaccount/fluentd created
clusterrole.rbac.authorization.k8s.io/fluentd-role created
clusterrolebinding.rbac.authorization.k8s.io/fluentd-role-binding created
configmap/fluentd-config created
daemonset.apps/fluentd-cloudwatch created

위와 같이 정상적으로 실행이 되었다면 cloudwatch logs에 아래와 같이 로그그룹이 생성된다.

/aws/containerinsights/eks-work-cluster/application

/aws/containerinsights/eks-work-cluster/host

/aws/containerinsights/eks-work-cluster/dataplane

또한 파드 각각이 출력한 로그는 application으로 끝나는 로그그룹으로 전송하게 된다.

cloudwatch logs에서는 아래 그림과 같이 로그그룹 단위로 로그 저장기간을 설정할 수 있다. application,host,dataplane 단위로 설정이 가능하다는 뜻이다.

[cloudwatch 경보를 사용한 에러 확인]

어플리케이션에서 어떤 이유로 에러가 발생하면 통지를 보내고 싶을 것이다. 이런 경우 메트릭 필터에서 에러 대상 로그를 추출하고 그것을 cloudwatch 메트릭으로 등록해 메트릭 경보를 생성하면 통지를 보낼 수 있다. 예를 들어서 ERROR라는 로그가 출력된 경우에 통지를 보내고 싶으면 아래와 같은 방법으로 할 수 있다.

1) 메트릭 필터 조건을 ‘ERROR’로 설정하고 출력 횟수를 메트릭에 등록

2) 출력 횟수 메트릭 통보 설정

(예를 들어서 5분간 1회 이상 발생하면 통지 등)

메트릭 필터도 로그그룹 단위로 설정하기 때문에 어떤 어플리케이션에서 발생한 에러인지에 대해 미리 추출조건을 포함시키는 것이 좋다. 예를 들어서 A 어플리케이션, B 어플리케이션이 있을경우 ERROR라는 문자열만 추출하면 통지 내용만으로 어떤 어플리케이션에서 에러가 발생했는지 판단할 수 없다. 이런 경우를 위해 메트릭 필터에는 어플리케이션 A, 어플리케이션 B 에러와 같이 별도로 필터와 경보를 생성할 수 있다. 이렇게 설정하면 통지할때 어떤 어플리케이션에서 발생한 에러인지 알 수 있다.

서비스별로 세분화하여 메트릭 필터를 설정한 아키텍처 예시

cloudwatch logs 이벤트 검색을 이용한 디버그 예시

예시 어플리케이션 로그는 표준출력으로 되어 있지만 cloudwatch logs의 json 형태로 저장된다. 예를들면 batch-app의 에러를 검색할때는 cloudwatch 페이지에서 아래와 같이 검색한다.

** backend-app의 ERROR를 검색하는 쿼리 예시

{ $.log != "[*" && $.kubernetes.container_name = "backend-app" && ( $.log = "*WARN*" || $.log = "*error*" )}

이벤트 검색 조건의 예시로 어플리케이션 로그 AND 어플리케이션 이름이 backend-app이 AND “WARN” 이나 “error”가 포함된다라는 조건임

엑세스 로그 확인하기

시스템을 운영할때 어느 정도의 접속이 있었는지, 어떤 어플리케이션에서 얼마만큼의 에러가 발생하고 있느지 등은 중요한 지표이다. logs insights를 이용해 아래와 같은 쿼리로 정보를 수집할 수 있다.

위에 그림은 예제 어플리케이션에 어느정도의 접속이 있었는지 확인하는 쿼리

stats count(*) as ACCESS_COUNT | filter (kubernetes.container_name = "backend-app") | filter (log not like /^\[/ and (log like /Health GET API called./ or log like /REGION GET ALL API/ or log like /LOCATION LIST BY REGION ID API/)

위에 그림은 어떤 어플리케이션에서 얼마만큼 에러가 발생하고 있는지 확인

stats count(*) as ERROR_COUNT by kubernetes.container_name as APP | filter (log not like /^\[/ and (log like /WARN/ or log like /error/))

위와 같이 쿼리한 결과를 cloudwatch 대시보드에 저장도 가능하다. 수집할 정보를 매번 하나씩 쿼리로 입력하는 것은 비효율적이기 때문에 여러 쿼리 결과를 대시보드에 저장해두면 전체적인 상황을 파악이 가능하다.

실습종료후 리소스 삭제

[ec2-user@ip-10-10-1-195 eks_log_exam]$ pwd
/home/ec2-user/eks_log_exam

[ec2-user@ip-10-10-1-195 eks_log_exam]$ ll
total 16
-rw-rw-r-- 1 ec2-user ec2-user   141 Oct 23 06:14 cloudwatch-namespace.yaml
-rw-rw-r-- 1 ec2-user ec2-user 11157 Oct 23 06:24 fluentd.yaml

# 컨피그맵 삭제
[ec2-user@ip-10-10-1-195 eks_log_exam]$ kubectl delete cm cluster-info -n amazon-cloudwatch
configmap "cluster-info" deleted

# container insights 관련 리소스 삭제 (야믈 파일을 다운로드한 디렉토리에서 실행)
[ec2-user@ip-10-10-1-195 eks_log_exam]$ kubectl delete -f .
namespace "amazon-cloudwatch" deleted
serviceaccount "fluentd" deleted
clusterrole.rbac.authorization.k8s.io "fluentd-role" deleted
clusterrolebinding.rbac.authorization.k8s.io "fluentd-role-binding" deleted
configmap "fluentd-config" deleted
daemonset.apps "fluentd-cloudwatch" deleted

 Airflow 클러스터 구성과정 요약 IAM role을 EKS pod별로 설정하기 