Configuring Airflow to store DAG logs in S3

2020-10-16


Data_Engineering_TIL(20201016)

Reference for this walkthrough: the official Airflow documentation

** URL : https://airflow.apache.org/docs/stable/howto/write-logs.html

step 0) Connect to the Airflow server, run the aws configure command, and enter your AWS credential keys to authenticate your AWS account.
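For reference, aws configure prompts for the four values shown below. The key values, region, and output format here are placeholders, not real values; substitute your own:

[ec2-user@ip-10-1-10-227 airflow]$ aws configure
AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: ap-northeast-2
Default output format [None]: json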

step 1) Edit airflow.cfg as shown below.

[ec2-user@ip-10-1-10-227 airflow]$ ls
airflow.cfg  airflow.db  airflow-webserver.pid  dags  logs  unittests.cfg

[ec2-user@ip-10-1-10-227 airflow]$ sudo vim airflow.cfg
[core]
# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Set this to True if you want to enable remote logging.
remote_logging = True

# Users must supply an Airflow connection id that provides access to the storage
# location.
remote_log_conn_id = MyS3Conn
remote_base_log_folder = s3://pms-bucket-test/airflow_log/
encrypt_s3_logs = False
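
Note that remote_log_conn_id points to an Airflow connection named MyS3Conn, which must exist before remote logging will work. A minimal way to register it from the CLI, assuming Airflow 1.10.x (in Airflow 2.x the syntax changes to: airflow connections add):

[ec2-user@ip-10-1-10-227 airflow]$ airflow connections --add --conn_id MyS3Conn --conn_type s3

With no credentials set on the connection, the S3 hook falls back to boto3's default credential chain, i.e. the keys entered with aws configure in step 0. After saving airflow.cfg and adding the connection, restart the Airflow webserver and scheduler so the new logging settings take effect.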

step 2) Trigger any DAG in Airflow, then run the command below to verify that the logs are being written to the target S3 path.
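For example, a DAG can be triggered from the CLI (a sketch assuming the Airflow 1.10.x CLI; emr_job_flow_test is the DAG id that appears in the log listing below):

[ec2-user@ip-10-1-10-227 airflow]$ airflow trigger_dag emr_job_flow_test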

[ec2-user@ip-10-1-10-227 airflow]$ aws s3 ls s3://pms-bucket-test/airflow_log/ --recursive
2020-10-16 05:22:17       2132 airflow_log/emr_job_flow_test/actual_step/2020-10-16T05:22:03.655192+00:00/1.log
2020-10-16 05:22:12       2216 airflow_log/emr_job_flow_test/create_job_flow/2020-10-16T05:22:03.655192+00:00/1.log
2020-10-16 05:22:17       2110 airflow_log/emr_job_flow_test/pre_step/2020-10-16T05:22:03.655192+00:00/1.log
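
Each task attempt is written to {remote_base_log_folder}/{dag_id}/{task_id}/{execution_date}/{try_number}.log, so the three entries above show one log per task (create_job_flow, pre_step, actual_step) for a single run of the emr_job_flow_test DAG, confirming that remote logging is working.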