EC2에 single node 형태로 airflow(2.1.4) 서버 구현하기
.
Data_Engineering_TIL(20210925)
[학습자료]
“Apache Airflow” 최정민님 깃허브 자료를 공부하고 실습한 내용입니다.
URL : https://github.com/cjungm/with-aws/tree/main/airflow/chapter_1
[학습내용]
Airflow를 다중노드로 구성할때의 아키텍처는 일반적으로 아래와 같이 구현할 수 있다.
이번에는 아래에 그림과 같이 싱글노드에 Airflow 서버를 구현해보자.
- Airflow 싱글노드 구현 실습
STEP 1) EC2 생성
먼저 t3.xlarge 이상의 적당한 사양으로 Amazon Linux2 EC2를 생성한다. EBS 볼륨도 15기가 이상 넉넉하게 잡아주자.
STEP 2) airflow 환경변수 설정
EC2를 생성하고 접속해서 아래와 같이 기본적으로 환경변수 AIRFLOW_HOME
을 설정한다.
__| __|_ )
_| ( / Amazon Linux 2 AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-2/
11 package(s) needed for security, out of 35 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-10-10-1-166 ~]$ export AIRFLOW_HOME=/home/ec2-user/airflow
STEP 3) Redis 설치 및 설정
celeryexecutor를 사용을 위해서 redis를 설치하고 설정을 해줘야 한다. celeryexecutor는 분산처리를 위해 Broker 역할을 할 수 있는 Queue service가 필요하다. (RabbitMQ, Redis, Amazon SQS등)
# yum 업데이트
[ec2-user@ip-10-10-1-166 ~]$ sudo yum update -y
# Remi 저장소 활성화
[ec2-user@ip-10-10-1-166 ~]$ sudo amazon-linux-extras install epel -y
...
Dependencies Resolved
=======================================================================================================================================
Package Arch Version Repository Size
=======================================================================================================================================
Installing:
epel-release noarch 7-11 amzn2extra-epel 15 k
Transaction Summary
=======================================================================================================================================
Install 1 Package
Total download size: 15 k
Installed size: 24 k
Downloading packages:
epel-release-7-11.noarch.rpm | 15 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : epel-release-7-11.noarch 1/1
Verifying : epel-release-7-11.noarch 1/1
Installed:
epel-release.noarch 0:7-11
Complete!
0 ansible2 available \
...
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install http://rpms.remirepo.net/enterprise/remi-release-7.rpm -y
...
=======================================================================================================================================
Package Arch Version Repository Size
=======================================================================================================================================
Installing:
remi-release noarch 7.9-2.el7.remi /remi-release-7 32 k
Transaction Summary
=======================================================================================================================================
Install 1 Package
Total size: 32 k
Installed size: 32 k
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : remi-release-7.9-2.el7.remi.noarch 1/1
Verifying : remi-release-7.9-2.el7.remi.noarch 1/1
Installed:
remi-release.noarch 0:7.9-2.el7.remi
Complete!
# Redis 패키지 yum으로 설치 및 실행
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install redis -y
...
===========================================================================================================================================================
Package Arch Version Repository Size
===========================================================================================================================================================
Installing:
redis x86_64 3.2.12-2.el7 epel 544 k
Installing for dependencies:
jemalloc x86_64 3.6.0-1.el7 epel 105 k
Transaction Summary
===========================================================================================================================================================
Install 1 Package (+1 Dependent package)
Total download size: 648 k
Installed size: 1.7 M
Downloading packages:
warning: /var/cache/yum/x86_64/2/epel/packages/jemalloc-3.6.0-1.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID xxxxxxxxxxxxxxxxxxxxxxxxx: NOKEY103 kB --:--:-- ETA
Public key for jemalloc-3.6.0-1.el7.x86_64.rpm is not installed
(1/2): jemalloc-3.6.0-1.el7.x86_64.rpm | 105 kB 00:00:02
(2/2): redis-3.2.12-2.el7.x86_64.rpm | 544 kB 00:00:01
-----------------------------------------------------------------------------------------------------------------------------------------------------------
...
Installed:
redis.x86_64 0:3.2.12-2.el7
Dependency Installed:
jemalloc.x86_64 0:3.6.0-1.el7
Complete!
[ec2-user@ip-10-10-1-166 ~]$ sudo service redis start
Redirecting to /bin/systemctl start redis.service
[ec2-user@ip-10-10-1-166 ~]$ sudo vim /etc/redis.conf
#############################################################
# 원본내용
#############################################################
...
daemonize no
...
bind 127.0.0.1
...
#############################################################
# 수정후
#############################################################
...
daemonize yes
...
bind 0.0.0.0
...
# 부팅시 자동 실행하기 위한 설정
[ec2-user@ip-10-10-1-166 ~]$ sudo chkconfig redis on
Note: Forwarding request to 'systemctl enable redis.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/redis.service to /usr/lib/systemd/system/redis.service.
[ec2-user@ip-10-10-1-166 ~]$ sudo chkconfig redis-sentinel on
Note: Forwarding request to 'systemctl enable redis-sentinel.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/redis-sentinel.service to /usr/lib/systemd/system/redis-sentinel.service.
[ec2-user@ip-10-10-1-166 ~]$ systemctl list-unit-files | grep redis
redis-sentinel.service enabled
redis.service enabled
[ec2-user@ip-10-10-1-166 ~]$ sudo service redis start
Redirecting to /bin/systemctl start redis.service
[ec2-user@ip-10-10-1-166 ~]$ sudo service redis status
Redirecting to /bin/systemctl status redis.service
● redis.service - Redis persistent key-value database
Loaded: loaded (/usr/lib/systemd/system/redis.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/redis.service.d
└─limit.conf
Active: active (running) since Sat 2021-09-25 05:45:00 UTC; 5min ago
Main PID: 7447 (redis-server)
CGroup: /system.slice/redis.service
└─7447 /usr/bin/redis-server 127.0.0.1:6379
Sep 25 05:45:00 ip-10-10-1-166.ap-northeast-2.compute.internal systemd[1]: Starting Redis persistent key-value database...
Sep 25 05:45:00 ip-10-10-1-166.ap-northeast-2.compute.internal systemd[1]: Started Redis persistent key-value database.
[ec2-user@ip-10-10-1-166 ~]$ ps -ef | grep redis
redis 7447 1 0 05:45 ? 00:00:00 /usr/bin/redis-server 127.0.0.1:6379
ec2-user 7537 2808 0 05:50 pts/0 00:00:00 grep --color=auto redis
STEP 4) MySQL 설치 및 설정
# yum 명령어로 설치시 mariadb가 설치되는데 이는 권장 DB가 아니다.
# 하단의 공식 문서에 따르면 MySQL 5.7 또는 8을 권장하고 있으므로 다른 방법으로 설치해야한다.
# https://airflow.apache.org/docs/apache-airflow/stable/installation.html#prerequisites
#[ec2-user@ip-10-10-1-166 ~]$ sudo yum install mysql -y
#
# ...
#
# Dependencies Resolved
# ===========================================================================================================================================================
# Package Arch Version Repository Size
# ===========================================================================================================================================================
# Installing:
# mariadb x86_64 1:5.5.68-1.amzn2 amzn2-core 8.8 M
# Transaction Summary
# ===========================================================================================================================================================
# Install 1 Package
# Total download size: 8.8 M
# Installed size: 49 M
# Downloading packages:
# mariadb-5.5.68-1.amzn2.x86_64.rpm | 8.8 MB 00:00:00
# Running transaction check
# Running transaction test
# Transaction test succeeded
# Running transaction
# Installing : 1:mariadb-5.5.68-1.amzn2.x86_64 1/1
# Verifying : 1:mariadb-5.5.68-1.amzn2.x86_64 1/1
# Installed:
# mariadb.x86_64 1:5.5.68-1.amzn2
# Complete!
# [ec2-user@ip-10-10-1-166 ~]$ sudo yum remove mariadb.x86_64 1:5.5.68-1.amzn2 -y
#
# ...
#
# ===========================================================================================================================================================
# Package Arch Version Repository Size
# ===========================================================================================================================================================
# Removing:
# mariadb x86_64 1:5.5.68-1.amzn2 @amzn2-core 49 M
# Transaction Summary
# ===========================================================================================================================================================
# Remove 1 Package
# Installed size: 49 M
# Downloading packages:
# Running transaction check
# Running transaction test
# Transaction test succeeded
# Running transaction
# Erasing : 1:mariadb-5.5.68-1.amzn2.x86_64 1/1
# Verifying : 1:mariadb-5.5.68-1.amzn2.x86_64 1/1
# Removed:
# mariadb.x86_64 1:5.5.68-1.amzn2
# Complete!
# MySQL 5.7 설치
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm -y
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
mysql57-community-release-el7-11.noarch.rpm | 25 kB 00:00:00
Examining /var/tmp/yum-root-WWLUGl/mysql57-community-release-el7-11.noarch.rpm: mysql57-community-release-el7-11.noarch
Marking /var/tmp/yum-root-WWLUGl/mysql57-community-release-el7-11.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package mysql57-community-release.noarch 0:el7-11 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===========================================================================================================================================================
Package Arch Version Repository Size
===========================================================================================================================================================
Installing:
mysql57-community-release noarch el7-11 /mysql57-community-release-el7-11.noarch 31 k
Transaction Summary
===========================================================================================================================================================
Install 1 Package
Total size: 31 k
Installed size: 31 k
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : mysql57-community-release-el7-11.noarch 1/1
Verifying : mysql57-community-release-el7-11.noarch 1/1
Installed:
mysql57-community-release.noarch 0:el7-11
Complete!
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install mysql-community-server -y
...
Installed:
mysql-community-libs.x86_64 0:5.7.35-1.el7 mysql-community-libs-compat.x86_64 0:5.7.35-1.el7 mysql-community-server.x86_64 0:5.7.35-1.el7
Dependency Installed:
mysql-community-client.x86_64 0:5.7.35-1.el7 mysql-community-common.x86_64 0:5.7.35-1.el7 ncurses-compat-libs.x86_64 0:6.0-8.20170212.amzn2.1.3
Replaced:
mariadb-libs.x86_64 1:5.5.68-1.amzn2
Complete!
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install mysql mysql-server mysql-libs mysql-devel -y
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
260 packages excluded due to repository priority protections
Package mysql-community-client-5.7.35-1.el7.x86_64 already installed and latest version
Package mysql-community-server-5.7.35-1.el7.x86_64 already installed and latest version
Package mysql-community-libs-5.7.35-1.el7.x86_64 already installed and latest version
Resolving Dependencies
--> Running transaction check
---> Package mysql-community-devel.x86_64 0:5.7.35-1.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===========================================================================================================================================================
Package Arch Version Repository Size
===========================================================================================================================================================
Installing:
mysql-community-devel x86_64 5.7.35-1.el7 mysql57-community 3.9 M
Transaction Summary
===========================================================================================================================================================
Install 1 Package
Total download size: 3.9 M
Installed size: 24 M
Downloading packages:
mysql-community-devel-5.7.35-1.el7.x86_64.rpm | 3.9 MB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : mysql-community-devel-5.7.35-1.el7.x86_64 1/1
Verifying : mysql-community-devel-5.7.35-1.el7.x86_64 1/1
Installed:
mysql-community-devel.x86_64 0:5.7.35-1.el7
Complete!
# MySQL 데몬 실행
[ec2-user@ip-10-10-1-166 ~]$ sudo service mysqld start
Redirecting to /bin/systemctl start mysqld.service
[ec2-user@ip-10-10-1-166 ~]$ sudo service mysqld status
Redirecting to /bin/systemctl status mysqld.service
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2021-09-25 05:57:58 UTC; 25s ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Process: 7870 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
Process: 7820 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
Main PID: 7873 (mysqld)
CGroup: /system.slice/mysqld.service
└─7873 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
Sep 25 05:57:54 ip-10-10-1-166.ap-northeast-2.compute.internal systemd[1]: Starting MySQL Server...
Sep 25 05:57:58 ip-10-10-1-166.ap-northeast-2.compute.internal systemd[1]: Started MySQL Server.
여기까지 정상적으로 실행이 된다면 MySQL에 대한 기본적인 설치가 완료되었다.
Airflow에서 MySQL을 DB로 사용하기 위해서는 Airflow가 MySQL에 Connection을 만들 수 있는 계정정보가 필요하다. 따라서 Airflow가 사용할 Database와 해당 DB에 접근할 권한이 있는 계정을 생성해야 한다. Airflow Database가 없다면 Airflow 실행에 필요한 테이블들을 만들지 못할것이다. 먼저 root 권한으로 접속해서 계정 생성 후 데이터베이스를 생성한다.
# 최초 접근 시 root User에 대한 비밀번호를 모르기 때문에 접근이 불가능하다.
#[ec2-user@ip-10-10-1-166 ~]$ mysql -u root -p
#Enter password:
#ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
# MySQL은 설치 시 root User에 대한 임시 비밀번호를 생성한다. 하단 방법에 따라 임시 비밀번호를 획득한다.
[ec2-user@ip-10-10-1-166 ~]$ sudo cat /var/log/mysqld.log | grep root
2021-09-25T05:57:55.854095Z 1 [Note] A temporary password is generated for root@localhost: xxxxxxxxxxxxx
2021-09-25T05:59:51.694673Z 2 [Note] Access denied for user 'root'@'localhost' (using password: YES)
# 위에 비번을 카피하고 다시 루트권한으로 mysql에 접근한다.
[ec2-user@ip-10-10-1-166 ~]$ mysql -u root -p
Enter password: xxxxxxxxxxxxx
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.35
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# mysql에는 비밀번호 정책이 상 중 하로 나누어져 있는데 기본 설정은 MEDIUM이다. 이를 LOW로 변경해줘야 한다.
mysql> set global validate_password_policy='LOW';
Query OK, 0 rows affected (0.00 sec)
# 루트권한 비번을 변경한다.
mysql> alter user 'root'@'localhost' identified by 'mypasswd12!';
Query OK, 0 rows affected (0.01 sec)
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SET GLOBAL explicit_defaults_for_timestamp = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
# Airflow 계정 및 Database를 생성한다.
mysql> create user 'airflow'@'localhost' identified by 'mypasswd12!';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on *.* to 'airflow'@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on airflow.* to 'airflow'@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> create database airflow;
Query OK, 1 row affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> exit;
Bye
STEP 5) Airflow 설치 및 설정
Airflow와 Airflow에서 사용할 Plugin(mysql, aws, celery, redis)을 설치하고 설정해준다.
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel python3-devel.x86_64 cyrus-sasl-devel.x86_64 -y
...
Installed:
bzip2-devel.x86_64 0:1.0.6-13.amzn2.0.3 cyrus-sasl-devel.x86_64 0:2.1.26-23.amzn2 ncurses-devel.x86_64 0:6.0-8.20170212.amzn2.1.3
openssl-devel.x86_64 1:1.0.2k-19.amzn2.0.7 python3-devel.x86_64 0:3.7.10-1.amzn2.0.1 sqlite-devel.x86_64 0:3.7.17-8.amzn2.1.1
zlib-devel.x86_64 0:1.2.7-18.amzn2
Dependency Installed:
cyrus-sasl.x86_64 0:2.1.26-23.amzn2 dwz.x86_64 0:0.11-3.amzn2.0.3 keyutils-libs-devel.x86_64 0:1.5.8-3.amzn2.0.2
krb5-devel.x86_64 0:1.15.1-37.amzn2.2.2 libcom_err-devel.x86_64 0:1.42.9-19.amzn2 libkadm5.x86_64 0:1.15.1-37.amzn2.2.2
libselinux-devel.x86_64 0:2.5-12.amzn2.0.2 libsepol-devel.x86_64 0:2.5-8.1.amzn2.0.2 libverto-devel.x86_64 0:0.2.5-4.amzn2.0.2
ncurses-c++-libs.x86_64 0:6.0-8.20170212.amzn2.1.3 pcre-devel.x86_64 0:8.32-17.amzn2.0.2 perl-srpm-macros.noarch 0:1-8.amzn2.0.1
python-rpm-macros.noarch 0:3-60.amzn2.0.1 python-srpm-macros.noarch 0:3-60.amzn2.0.1 python3-rpm-macros.noarch 0:3-60.amzn2.0.1
system-rpm-config.noarch 0:9.1.0-76.amzn2.0.10
Complete!
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install gcc -y
...
Installed:
gcc.x86_64 0:7.3.1-13.amzn2
Dependency Installed:
cpp.x86_64 0:7.3.1-13.amzn2 glibc-devel.x86_64 0:2.26-54.amzn2 glibc-headers.x86_64 0:2.26-54.amzn2
kernel-headers.x86_64 0:4.14.246-187.474.amzn2 libatomic.x86_64 0:7.3.1-13.amzn2 libcilkrts.x86_64 0:7.3.1-13.amzn2
libitm.x86_64 0:7.3.1-13.amzn2 libmpc.x86_64 0:1.0.1-3.amzn2.0.2 libmpx.x86_64 0:7.3.1-13.amzn2
libquadmath.x86_64 0:7.3.1-13.amzn2 libsanitizer.x86_64 0:7.3.1-13.amzn2 mpfr.x86_64 0:3.1.1-4.amzn2.0.2
Complete!
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install libevent-devel -y
...
Installed:
libevent-devel.x86_64 0:2.0.21-4.amzn2.0.3
Complete!
[ec2-user@ip-10-10-1-166 ~]$ sudo pip3 install wheel
...
Successfully installed wheel-0.37.0
[ec2-user@ip-10-10-1-166 ~]$ sudo pip3 install boto3 PyMySQL celery flask-bcrypt
...
Successfully installed Flask-2.0.1 Jinja2-3.0.1 MarkupSafe-2.0.1 PyMySQL-1.0.2 Werkzeug-2.0.1 amqp-5.0.6 bcrypt-3.2.0 billiard-3.6.4.0 boto3-1.18.48 botocore-1.21.48 cached-property-1.5.2 celery-5.1.2 cffi-1.14.6 click-7.1.2 click-didyoumean-0.0.3 click-plugins-1.1.1 click-repl-0.2.0 flask-bcrypt-0.7.1 importlib-metadata-4.8.1 itsdangerous-2.0.1 jmespath-0.10.0 kombu-5.1.0 prompt-toolkit-3.0.20 pycparser-2.20 python-dateutil-2.8.2 pytz-2021.1 s3transfer-0.5.0 six-1.16.0 typing-extensions-3.10.0.2 urllib3-1.26.7 vine-5.0.0 wcwidth-0.2.5 zipp-3.5.0
[ec2-user@ip-10-10-1-166 ~]$ sudo pip3 install apache-airflow-providers-sqlite
...
Successfully installed apache-airflow-providers-sqlite-2.0.1
[ec2-user@ip-10-10-1-166 ~]$ sudo pip3 install 'apache-airflow[mysql, aws, celery, redis]'
...
# 아래와 같이 디펜던시 에러가 날수도 있는데 airflow 실행에 문제가 없으면 넘어가도 무방하다.
#ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
#
#We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
#
#jinja2 3.0.1 requires MarkupSafe>=2.0, but you'll have markupsafe 1.1.1 which is incompatible.
#flask 1.1.4 requires Jinja2<3.0,>=2.10.1, but you'll have jinja2 3.0.1 which is incompatible.
#flask-appbuilder 3.3.3 requires SQLAlchemy<1.4.0, but you'll have sqlalchemy 1.4.25 which is incompatible.
...
Successfully installed Babel-2.9.1 Flask-Babel-1.0.0 Flask-JWT-Extended-3.25.1 Flask-OpenID-1.3.0 Flask-SQLAlchemy-2.5.1 Mako-1.1.5 WTForms-2.3.3 alembic-1.7.3 anyio-3.3.1 apache-airflow-2.1.4 apache-airflow-providers-amazon-2.2.0 apache-airflow-providers-celery-2.1.0 apache-airflow-providers-ftp-2.0.1 apache-airflow-providers-http-2.0.1 apache-airflow-providers-imap-2.0.1 apache-airflow-providers-mysql-2.1.1 apache-airflow-providers-redis-2.0.1 apispec-3.3.2 argcomplete-1.12.3 attrs-20.3.0 blinker-1.4 boto3-1.17.112 botocore-1.20.112 cattrs-1.5.0 certifi-2021.5.30 charset-normalizer-2.0.6 clickclick-20.10.2 colorama-0.4.4 colorlog-5.0.1 commonmark-0.9.1 croniter-1.0.15 cryptography-3.4.8 decorator-5.1.0 defusedxml-0.7.1 dill-0.3.4 dnspython-2.1.0 email-validator-1.1.3 flask-1.1.4 flask-appbuilder-3.3.3 flask-caching-1.10.1 flask-login-0.4.1 flask-wtf-0.14.3 flower-1.0.0 graphviz-0.17 greenlet-1.1.1 gunicorn-20.1.0 h11-0.12.0 httpcore-0.13.7 httpx-0.19.0 humanize-3.11.0 idna-3.2 importlib-resources-1.5.0 inflection-0.5.1 iso8601-0.1.16 isodate-0.6.0 itsdangerous-1.1.0 jsonpath-ng-1.5.3 jsonschema-3.2.0 lazy-object-proxy-1.6.0 lockfile-0.12.2 markdown-3.3.4 markupsafe-1.1.1 marshmallow-3.13.0 marshmallow-enum-1.5.1 marshmallow-oneofschema-3.0.1 marshmallow-sqlalchemy-0.23.1 mysql-connector-python-8.0.26 mysqlclient-2.0.3 numpy-1.21.2 openapi-schema-validator-0.1.5 openapi-spec-validator-0.3.1 pandas-1.3.3 pendulum-2.1.2 ply-3.11 prison-0.2.1 prometheus-client-0.11.0 protobuf-3.18.0 psutil-5.8.0 pygments-2.10.0 pyjwt-1.7.1 pyrsistent-0.18.0 python-daemon-2.3.0 python-nvd3-0.15.0 python-slugify-4.0.1 python3-openid-3.2.0 pytzdata-2020.1 pyyaml-5.4.1 redis-3.5.3 requests-2.26.0 rfc3986-1.5.0 rich-10.11.0 s3transfer-0.4.2 setproctitle-1.2.2 sniffio-1.2.0 sqlalchemy-1.4.25 sqlalchemy-jsonfield-1.0.0 sqlalchemy-utils-0.37.8 swagger-ui-bundle-0.0.9 tabulate-0.8.9 tenacity-6.2.0 termcolor-1.1.0 text-unidecode-1.3 tornado-6.1 unicodecsv-0.14.1 watchtower-1.0.6 werkzeug-1.0.1
[ec2-user@ip-10-10-1-166 airflow]$ pip3 install "SQLAlchemy==1.3.24"
...
Successfully installed SQLAlchemy-1.3.24
[ec2-user@ip-10-10-1-166 ~]$ pip3 install "apache-airflow-providers-celery==2.0.0"
...
Successfully installed amqp-2.6.1 apache-airflow-providers-celery-2.0.0 celery-4.4.7 flower-0.9.7 kombu-4.6.11 prometheus-client-0.8.0 vine-1.3.0
[ec2-user@ip-10-10-1-166 ~]$ pip3 list | grep airflow
apache-airflow 2.1.4
apache-airflow-providers-amazon 2.2.0
apache-airflow-providers-celery 2.0.0
apache-airflow-providers-ftp 2.0.1
apache-airflow-providers-http 2.0.1
apache-airflow-providers-imap 2.0.1
apache-airflow-providers-mysql 2.1.1
apache-airflow-providers-redis 2.0.1
apache-airflow-providers-sqlite 2.0.1
설치가 완료되었다면 airflow 폴더애 있는 airflow.cfg를 수정해서 앞서 설치한 redis와 mysql의 Connection을 설정해야 한다. 먼저 airflow 명령어가 동작하는지 먼저 확인한다. 만약 명령어가 작동한다면 아까 설정해둔 AIRFLOW_HOME경로에 airflow 폴더가 생성될 것이다. 그 안에 configuration 파일이 있다.
# 아래와 같이 오류가 날것이다. sqlite는 사용하지 않는다.
# [ec2-user@ip-10-10-1-166 ~]$ airflow db init
# Traceback (most recent call last):
# File "/usr/local/bin/airflow", line 5, in <module>
# from airflow.__main__ import main
# File "/usr/local/lib/python3.7/site-packages/airflow/__init__.py", line 34, in <module>
# from airflow import settings
# File "/usr/local/lib/python3.7/site-packages/airflow/settings.py", line 34, in <module>
# from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf # NOQA F401
# File "/usr/local/lib/python3.7/site-packages/airflow/configuration.py", line 1113, in <module>
# conf.validate()
# File "/usr/local/lib/python3.7/site-packages/airflow/configuration.py", line 201, in validate
# self._validate_config_dependencies()
# File "/usr/local/lib/python3.7/site-packages/airflow/configuration.py", line 242, in _validate_config_dependencies
# f"error: sqlite C library version too old (< {min_sqlite_version}). "
# airflow.exceptions.AirflowConfigException: error: sqlite C library version too old (< 3.15.0). See https://airflow.apache.org/docs/apache-airflow/2.1.4/howto/set-up-database.html#setting-up-a-sqlite-database
[ec2-user@ip-10-10-1-166 ~]$ cd /home/ec2-user/airflow
[ec2-user@ip-10-10-1-166 airflow]$ ll
total 52
-rw-rw-r-- 1 ec2-user ec2-user 42599 Sep 25 06:20 airflow.cfg
-rw-r--r-- 1 ec2-user ec2-user 4700 Sep 25 06:20 webserver_config.py
[ec2-user@ip-10-10-1-166 airflow]$ vim airflow.cfg
...
# 사용할 dag 폴더 지정
# subfolder in a code repository. This path must be absolute.
dags_folder = /home/ec2-user/airflow/dags
...
# 사용할 executor 설정
# executor = SequentialExecutor
executor = CeleryExecutor
...
# MySQL Connection 설정
# sql_alchemy_conn = sqlite:////home/airflow/airflow/airflow.db
sql_alchemy_conn = mysql+pymysql://airflow:mypasswd12!@127.0.0.1:3306/airflow
...
# 비밀번호 사용
# auth_backend = airflow.api.auth.backend.deny_all
auth_backend = airflow.api.auth.backend.basic_auth
...
# Redis Connection 설정
# broker_url = sqla+mysql://airflow:airflow@127.0.0.1:3306/airflow
broker_url = redis://localhost:6379/0
...
# result_backend = db+mysql://airflow:airflow@localhost:3306/airflow
result_backend = db+mysql://airflow:mypasswd12!@127.0.0.1:3306/airflow
...
# catchup_by_default = True
catchup_by_default = False
...
# 그런 다음에 다시 airflow db를 init했을때 아래와 같이 Done! 이 나왔으면 성공한 것이다.
[ec2-user@ip-10-10-1-166 airflow]$ airflow db init
DB: mysql+pymysql://airflow:***@127.0.0.1:3306/airflow
[2021-09-25 06:32:44,042] {db.py:702} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
...
Initialization done
# airflow cli 명령어로 어드민 계정을 생성해보자.
[ec2-user@ip-10-10-1-166 airflow]$ airflow users create -r Admin -u minman -p mypasswd12! -e minman1234@naver.com -f minsu -l mansu
[2021-09-25 06:35:33,433] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
Admin user minman created
# 그런 다음에 webserver를 구동해보자.
# 아래와 같이 airflow가 올라오고 pending이 되어 있으면 성공한 것이다.
[ec2-user@ip-10-10-1-166 airflow]$ airflow webserver
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2021-09-25 06:36:30,447] {dagbag.py:496} INFO - Filling up the DagBag from /dev/null
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Access Logformat:
=================================================================
[2021-09-25 06:36:33 +0000] [8516] [INFO] Starting gunicorn 20.1.0
[2021-09-25 06:36:33 +0000] [8516] [INFO] Listening at: http://0.0.0.0:8080 (8516)
[2021-09-25 06:36:33 +0000] [8516] [INFO] Using worker: sync
[2021-09-25 06:36:33 +0000] [8519] [INFO] Booting worker with pid: 8519
[2021-09-25 06:36:33 +0000] [8520] [INFO] Booting worker with pid: 8520
[2021-09-25 06:36:33 +0000] [8521] [INFO] Booting worker with pid: 8521
[2021-09-25 06:36:33 +0000] [8522] [INFO] Booting worker with pid: 8522
...
# 웹브라우저를 열고 http://{ec2 public ip}:8080 으로 접속했을때 아래 화면과 같이 전시되면 정상적으로 접속이 된 것이다.
# ex) http://52.78.206.51:8080/
위에 노란박스는 Airflow의 Scheduler가 동작하지 않는다는 경고문이다. 아직 Scheduler를 실행하지 않았기 때문에 발생한다. 참고로 화면에 보이는 DAGs list는 기본적으로 Airflow가 제공하는 Example 이다.
웹서버 띄운 터미널은 그대로 두고 다른 세션으로 터미널을 두개 띄워서 각각 스케쥴러와 워커도 구동해보자.
[ec2-user@ip-10-10-1-166 ~]$ airflow scheduler
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2021-09-25 06:43:27,133] {scheduler_job.py:662} INFO - Starting the scheduler
[2021-09-25 06:43:27,133] {scheduler_job.py:667} INFO - Processing each file at most -1 times
[2021-09-25 06:43:27,208] {manager.py:254} INFO - Launched DagFileProcessorManager with pid: 8436
[2021-09-25 06:43:27,209] {scheduler_job.py:1217} INFO - Resetting orphaned tasks for active dag runs
[2021-09-25 06:43:27,212] {settings.py:51} INFO - Configured default timezone Timezone('UTC')
...
[ec2-user@ip-10-10-1-166 ~]$ airflow celery worker
* Serving Flask app "airflow.utils.serve_logs" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
[2021-09-25 06:47:01,136] {_internal.py:113} INFO - * Running on http://0.0.0.0:8793/ (Press CTRL+C to quit)
-------------- celery@ip-10-10-1-166.ap-northeast-2.compute.internal v4.4.7 (cliffs)
--- ***** -----
-- ******* ---- Linux-4.14.243-185.433.amzn2.x86_64-x86_64-with-glibc2.2.5 2021-09-25 06:47:01
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: airflow.executors.celery_executor:0x7ff656f6aa90
- ** ---------- .> transport: redis://localhost:6379/0
- ** ---------- .> results: mysql://airflow:**@127.0.0.1:3306/airflow
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> default exchange=default(direct) key=default
[tasks]
. airflow.executors.celery_executor.execute_command
[2021-09-25 06:47:03,231: INFO/MainProcess] Connected to redis://localhost:6379/0
[2021-09-25 06:47:03,239: INFO/MainProcess] mingle: searching for neighbors
[2021-09-25 06:47:04,257: INFO/MainProcess] mingle: all alone
[2021-09-25 06:47:04,265: INFO/MainProcess] celery@ip-10-10-1-166.ap-northeast-2.compute.internal ready.
...
** 참고자료 : 자주 사용하는 airflow 명령어 모음
# db initialize
airflow db init
# user 생성
airflow users create -r {Role-Name} -u {User-Name} -p {Password} -e {Email} -f {First-Name} -l {Last-Name}
# dag 목록 조회
airflow list_dags
# dag error check
python3 -c "from airflow.models import DagBag; import os;d = DagBag(os.path.expanduser('~/airflow/dags'));"
# '~/airflow/dags' : 확인할 Dag들이 있는 directory
# webserver 기동
airflow webserver
# scheduler 기동
airflow scheduler
# worker 기동
airflow celery worker