EC2에 single node 형태로 airflow(2.1.4) 서버 구현하기

2021-09-25

.

Data_Engineering_TIL(20210925)

[학습자료]

“Apache Airflow” 최정민님 깃허브 자료를 공부하고 실습한 내용입니다.

URL : https://github.com/cjungm/with-aws/tree/main/airflow/chapter_1

[학습내용]

Airflow를 다중노드로 구성할때의 아키텍처는 일반적으로 아래와 같이 구현할 수 있다.

2

이번에는 아래에 그림과 같이 싱글노드에 Airflow 서버를 구현해보자.

1

  • Airflow 싱글노드 구현 실습

STEP 1) EC2 생성

먼저 t3.xlarge 이상의 적당한 사양으로 Amazon Linux2 EC2를 생성한다. EBS 볼륨도 15기가 이상 넉넉하게 잡아주자.

STEP 2) airflow 환경변수 설정

EC2를 생성하고 접속해서 아래와 같이 기본적으로 환경변수 AIRFLOW_HOME을 설정한다.

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
11 package(s) needed for security, out of 35 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-10-10-1-166 ~]$ export AIRFLOW_HOME=/home/ec2-user/airflow

STEP 3) Redis 설치 및 설정

celeryexecutor를 사용을 위해서 redis를 설치하고 설정을 해줘야 한다. celeryexecutor는 분산처리를 위해 Broker 역할을 할 수 있는 Queue service가 필요하다. (RabbitMQ, Redis, Amazon SQS등)

# yum 업데이트
[ec2-user@ip-10-10-1-166 ~]$ sudo yum update -y

# Remi 저장소 활성화
[ec2-user@ip-10-10-1-166 ~]$ sudo amazon-linux-extras install epel -y

...

Dependencies Resolved

=======================================================================================================================================
 Package                           Arch                        Version                      Repository                            Size
=======================================================================================================================================
Installing:
 epel-release                      noarch                      7-11                         amzn2extra-epel                       15 k

Transaction Summary
=======================================================================================================================================
Install  1 Package

Total download size: 15 k
Installed size: 24 k
Downloading packages:
epel-release-7-11.noarch.rpm                                                                                    |  15 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : epel-release-7-11.noarch                                                                                            1/1
  Verifying  : epel-release-7-11.noarch                                                                                            1/1

Installed:
  epel-release.noarch 0:7-11

Complete!
  0  ansible2                 available    \

...
    
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install http://rpms.remirepo.net/enterprise/remi-release-7.rpm -y
    
...

=======================================================================================================================================
 Package                         Arch                      Version                            Repository                          Size
=======================================================================================================================================
Installing:
 remi-release                    noarch                    7.9-2.el7.remi                     /remi-release-7                     32 k

Transaction Summary
=======================================================================================================================================
Install  1 Package

Total size: 32 k
Installed size: 32 k
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : remi-release-7.9-2.el7.remi.noarch                                                                                  1/1
  Verifying  : remi-release-7.9-2.el7.remi.noarch                                                                                  1/1

Installed:
  remi-release.noarch 0:7.9-2.el7.remi

Complete!

# Redis 패키지 yum으로 설치 및 실행
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install redis -y

...

===========================================================================================================================================================
 Package                              Arch                               Version                                    Repository                        Size
===========================================================================================================================================================
Installing:
 redis                                x86_64                             3.2.12-2.el7                               epel                             544 k
Installing for dependencies:
 jemalloc                             x86_64                             3.6.0-1.el7                                epel                             105 k

Transaction Summary
===========================================================================================================================================================
Install  1 Package (+1 Dependent package)

Total download size: 648 k
Installed size: 1.7 M
Downloading packages:
warning: /var/cache/yum/x86_64/2/epel/packages/jemalloc-3.6.0-1.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID xxxxxxxxxxxxxxxxxxxxxxxxx: NOKEY103 kB  --:--:-- ETA
Public key for jemalloc-3.6.0-1.el7.x86_64.rpm is not installed
(1/2): jemalloc-3.6.0-1.el7.x86_64.rpm                                                                                              | 105 kB  00:00:02
(2/2): redis-3.2.12-2.el7.x86_64.rpm                                                                                                | 544 kB  00:00:01
-----------------------------------------------------------------------------------------------------------------------------------------------------------

...

Installed:
  redis.x86_64 0:3.2.12-2.el7

Dependency Installed:
  jemalloc.x86_64 0:3.6.0-1.el7

Complete!

[ec2-user@ip-10-10-1-166 ~]$ sudo service redis start
Redirecting to /bin/systemctl start redis.service

[ec2-user@ip-10-10-1-166 ~]$ sudo vim /etc/redis.conf

#############################################################
# 원본내용
#############################################################

...

daemonize no

...
  
bind 127.0.0.1

...


#############################################################
# 수정후
#############################################################

...

daemonize yes

...

bind 0.0.0.0

...

# 부팅시 자동 실행하기 위한 설정
[ec2-user@ip-10-10-1-166 ~]$ sudo chkconfig redis on
Note: Forwarding request to 'systemctl enable redis.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/redis.service to /usr/lib/systemd/system/redis.service.

[ec2-user@ip-10-10-1-166 ~]$ sudo chkconfig redis-sentinel on
Note: Forwarding request to 'systemctl enable redis-sentinel.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/redis-sentinel.service to /usr/lib/systemd/system/redis-sentinel.service.

[ec2-user@ip-10-10-1-166 ~]$ systemctl list-unit-files | grep redis
redis-sentinel.service                        enabled
redis.service                                 enabled

[ec2-user@ip-10-10-1-166 ~]$ sudo service redis start
Redirecting to /bin/systemctl start redis.service

[ec2-user@ip-10-10-1-166 ~]$ sudo service redis status
Redirecting to /bin/systemctl status redis.service
● redis.service - Redis persistent key-value database
   Loaded: loaded (/usr/lib/systemd/system/redis.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/redis.service.d
           └─limit.conf
   Active: active (running) since Sat 2021-09-25 05:45:00 UTC; 5min ago
 Main PID: 7447 (redis-server)
   CGroup: /system.slice/redis.service
           └─7447 /usr/bin/redis-server 127.0.0.1:6379

Sep 25 05:45:00 ip-10-10-1-166.ap-northeast-2.compute.internal systemd[1]: Starting Redis persistent key-value database...
Sep 25 05:45:00 ip-10-10-1-166.ap-northeast-2.compute.internal systemd[1]: Started Redis persistent key-value database.
            
[ec2-user@ip-10-10-1-166 ~]$ ps -ef | grep redis
redis     7447     1  0 05:45 ?        00:00:00 /usr/bin/redis-server 127.0.0.1:6379
ec2-user  7537  2808  0 05:50 pts/0    00:00:00 grep --color=auto redis

STEP 4) MySQL 설치 및 설정

# yum 명령어로 설치시 mariadb가 설치되는데 이는 권장 DB가 아니다.
# 하단의 공식 문서에 따르면 MySQL 5.7 또는 8을 권장하고 있으므로 다른 방법으로 설치해야한다.
# https://airflow.apache.org/docs/apache-airflow/stable/installation.html#prerequisites
#[ec2-user@ip-10-10-1-166 ~]$ sudo yum install mysql -y
#
# ...
# 
# Dependencies Resolved

# ===========================================================================================================================================================
#  Package                           Arch                             Version                                     Repository                            Size
# ===========================================================================================================================================================
# Installing:
#  mariadb                           x86_64                           1:5.5.68-1.amzn2                            amzn2-core                           8.8 M

# Transaction Summary
# ===========================================================================================================================================================
# Install  1 Package

# Total download size: 8.8 M
# Installed size: 49 M
# Downloading packages:
# mariadb-5.5.68-1.amzn2.x86_64.rpm                                                                                                   | 8.8 MB  00:00:00
# Running transaction check
# Running transaction test
# Transaction test succeeded
# Running transaction
#   Installing : 1:mariadb-5.5.68-1.amzn2.x86_64                                                                                                         1/1
#   Verifying  : 1:mariadb-5.5.68-1.amzn2.x86_64                                                                                                         1/1

# Installed:
#   mariadb.x86_64 1:5.5.68-1.amzn2

# Complete!

# [ec2-user@ip-10-10-1-166 ~]$ sudo yum remove mariadb.x86_64 1:5.5.68-1.amzn2 -y
#
# ...
#
# ===========================================================================================================================================================
#  Package                           Arch                             Version                                    Repository                             Size
# ===========================================================================================================================================================
# Removing:
#  mariadb                           x86_64                           1:5.5.68-1.amzn2                           @amzn2-core                            49 M

# Transaction Summary
# ===========================================================================================================================================================
# Remove  1 Package

# Installed size: 49 M
# Downloading packages:
# Running transaction check
# Running transaction test
# Transaction test succeeded
# Running transaction
#   Erasing    : 1:mariadb-5.5.68-1.amzn2.x86_64                                                                                                         1/1
#   Verifying  : 1:mariadb-5.5.68-1.amzn2.x86_64                                                                                                         1/1

# Removed:
#   mariadb.x86_64 1:5.5.68-1.amzn2

# Complete!

# MySQL 5.7 설치
[ec2-user@ip-10-10-1-166 ~]$ sudo yum install https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm -y
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
mysql57-community-release-el7-11.noarch.rpm                                                                                         |  25 kB  00:00:00
Examining /var/tmp/yum-root-WWLUGl/mysql57-community-release-el7-11.noarch.rpm: mysql57-community-release-el7-11.noarch
Marking /var/tmp/yum-root-WWLUGl/mysql57-community-release-el7-11.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package mysql57-community-release.noarch 0:el7-11 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===========================================================================================================================================================
 Package                                   Arch                   Version                   Repository                                                Size
===========================================================================================================================================================
Installing:
 mysql57-community-release                 noarch                 el7-11                    /mysql57-community-release-el7-11.noarch                  31 k

Transaction Summary
===========================================================================================================================================================
Install  1 Package

Total size: 31 k
Installed size: 31 k
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : mysql57-community-release-el7-11.noarch                                                                                                 1/1
  Verifying  : mysql57-community-release-el7-11.noarch                                                                                                 1/1

Installed:
  mysql57-community-release.noarch 0:el7-11

Complete!

[ec2-user@ip-10-10-1-166 ~]$ sudo yum install mysql-community-server -y

...


Installed:
  mysql-community-libs.x86_64 0:5.7.35-1.el7      mysql-community-libs-compat.x86_64 0:5.7.35-1.el7      mysql-community-server.x86_64 0:5.7.35-1.el7

Dependency Installed:
  mysql-community-client.x86_64 0:5.7.35-1.el7    mysql-community-common.x86_64 0:5.7.35-1.el7    ncurses-compat-libs.x86_64 0:6.0-8.20170212.amzn2.1.3

Replaced:
  mariadb-libs.x86_64 1:5.5.68-1.amzn2

Complete!

[ec2-user@ip-10-10-1-166 ~]$ sudo yum install mysql mysql-server mysql-libs mysql-devel -y
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
260 packages excluded due to repository priority protections
Package mysql-community-client-5.7.35-1.el7.x86_64 already installed and latest version
Package mysql-community-server-5.7.35-1.el7.x86_64 already installed and latest version
Package mysql-community-libs-5.7.35-1.el7.x86_64 already installed and latest version
Resolving Dependencies
--> Running transaction check
---> Package mysql-community-devel.x86_64 0:5.7.35-1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===========================================================================================================================================================
 Package                                     Arch                         Version                            Repository                               Size
===========================================================================================================================================================
Installing:
 mysql-community-devel                       x86_64                       5.7.35-1.el7                       mysql57-community                       3.9 M

Transaction Summary
===========================================================================================================================================================
Install  1 Package

Total download size: 3.9 M
Installed size: 24 M
Downloading packages:
mysql-community-devel-5.7.35-1.el7.x86_64.rpm                                                                                       | 3.9 MB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : mysql-community-devel-5.7.35-1.el7.x86_64                                                                                               1/1
  Verifying  : mysql-community-devel-5.7.35-1.el7.x86_64                                                                                               1/1

Installed:
  mysql-community-devel.x86_64 0:5.7.35-1.el7

Complete!

# MySQL 데몬 실행
[ec2-user@ip-10-10-1-166 ~]$ sudo service mysqld start
Redirecting to /bin/systemctl start mysqld.service

[ec2-user@ip-10-10-1-166 ~]$ sudo service mysqld status
Redirecting to /bin/systemctl status mysqld.service
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2021-09-25 05:57:58 UTC; 25s ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
  Process: 7870 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
  Process: 7820 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
 Main PID: 7873 (mysqld)
   CGroup: /system.slice/mysqld.service
           └─7873 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid

Sep 25 05:57:54 ip-10-10-1-166.ap-northeast-2.compute.internal systemd[1]: Starting MySQL Server...
Sep 25 05:57:58 ip-10-10-1-166.ap-northeast-2.compute.internal systemd[1]: Started MySQL Server.

여기까지 정상적으로 실행이 된다면 MySQL에 대한 기본적인 설치가 완료되었다.

Airflow에서 MySQL을 DB로 사용하기 위해서는 Airflow가 MySQL에 Connection을 만들 수 있는 계정정보가 필요하다. 따라서 Airflow가 사용할 Database와 해당 DB에 접근할 권한이 있는 계정을 생성해야 한다. Airflow Database가 없다면 Airflow 실행에 필요한 테이블들을 만들지 못할것이다. 먼저 root 권한으로 접속해서 계정 생성 후 데이터베이스를 생성한다.

# 최초 접근 시 root User에 대한 비밀번호를 모르기 때문에 접근이 불가능하다.
#[ec2-user@ip-10-10-1-166 ~]$ mysql -u root -p
#Enter password:
#ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

# MySQL은 설치 시 root User에 대한 임시 비밀번호를 생성한다. 하단 방법에 따라 임시 비밀번호를 획득한다.
[ec2-user@ip-10-10-1-166 ~]$ sudo cat /var/log/mysqld.log | grep root
2021-09-25T05:57:55.854095Z 1 [Note] A temporary password is generated for root@localhost: xxxxxxxxxxxxx
2021-09-25T05:59:51.694673Z 2 [Note] Access denied for user 'root'@'localhost' (using password: YES)
        
# 위에 비번을 카피하고 다시 루트권한으로 mysql에 접근한다.
[ec2-user@ip-10-10-1-166 ~]$ mysql -u root -p
Enter password: xxxxxxxxxxxxx
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.35

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

# mysql에는 비밀번호 정책이 상 중 하로 나누어져 있는데 기본 설정은 MEDIUM이다. 이를 LOW로 변경해줘야 한다.
mysql> set global validate_password_policy='LOW';
Query OK, 0 rows affected (0.00 sec)

# 루트권한 비번을 변경한다.
mysql> alter user 'root'@'localhost' identified by 'mypasswd12!';
Query OK, 0 rows affected (0.01 sec)

mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> SET GLOBAL explicit_defaults_for_timestamp = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

# Airflow 계정 및 Database를 생성한다.
mysql> create user 'airflow'@'localhost' identified by 'mypasswd12!';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all privileges on *.* to 'airflow'@'localhost';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all privileges on airflow.* to 'airflow'@'localhost';
Query OK, 0 rows affected (0.00 sec)

mysql> create database airflow;
Query OK, 1 row affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit;
Bye

STEP 5) Airflow 설치 및 설정

Airflow와 Airflow에서 사용할 Plugin(mysql, aws, celery, redis)을 설치하고 설정해준다.

[ec2-user@ip-10-10-1-166 ~]$ sudo yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel python3-devel.x86_64 cyrus-sasl-devel.x86_64 -y

...


Installed:
  bzip2-devel.x86_64 0:1.0.6-13.amzn2.0.3           cyrus-sasl-devel.x86_64 0:2.1.26-23.amzn2        ncurses-devel.x86_64 0:6.0-8.20170212.amzn2.1.3
  openssl-devel.x86_64 1:1.0.2k-19.amzn2.0.7        python3-devel.x86_64 0:3.7.10-1.amzn2.0.1        sqlite-devel.x86_64 0:3.7.17-8.amzn2.1.1
  zlib-devel.x86_64 0:1.2.7-18.amzn2

Dependency Installed:
  cyrus-sasl.x86_64 0:2.1.26-23.amzn2                    dwz.x86_64 0:0.11-3.amzn2.0.3                  keyutils-libs-devel.x86_64 0:1.5.8-3.amzn2.0.2
  krb5-devel.x86_64 0:1.15.1-37.amzn2.2.2                libcom_err-devel.x86_64 0:1.42.9-19.amzn2      libkadm5.x86_64 0:1.15.1-37.amzn2.2.2
  libselinux-devel.x86_64 0:2.5-12.amzn2.0.2             libsepol-devel.x86_64 0:2.5-8.1.amzn2.0.2      libverto-devel.x86_64 0:0.2.5-4.amzn2.0.2
  ncurses-c++-libs.x86_64 0:6.0-8.20170212.amzn2.1.3     pcre-devel.x86_64 0:8.32-17.amzn2.0.2          perl-srpm-macros.noarch 0:1-8.amzn2.0.1
  python-rpm-macros.noarch 0:3-60.amzn2.0.1              python-srpm-macros.noarch 0:3-60.amzn2.0.1     python3-rpm-macros.noarch 0:3-60.amzn2.0.1
  system-rpm-config.noarch 0:9.1.0-76.amzn2.0.10

Complete!

[ec2-user@ip-10-10-1-166 ~]$ sudo yum install gcc -y

...


Installed:
  gcc.x86_64 0:7.3.1-13.amzn2

Dependency Installed:
  cpp.x86_64 0:7.3.1-13.amzn2                               glibc-devel.x86_64 0:2.26-54.amzn2              glibc-headers.x86_64 0:2.26-54.amzn2
  kernel-headers.x86_64 0:4.14.246-187.474.amzn2            libatomic.x86_64 0:7.3.1-13.amzn2               libcilkrts.x86_64 0:7.3.1-13.amzn2
  libitm.x86_64 0:7.3.1-13.amzn2                            libmpc.x86_64 0:1.0.1-3.amzn2.0.2               libmpx.x86_64 0:7.3.1-13.amzn2
  libquadmath.x86_64 0:7.3.1-13.amzn2                       libsanitizer.x86_64 0:7.3.1-13.amzn2            mpfr.x86_64 0:3.1.1-4.amzn2.0.2

Complete!

[ec2-user@ip-10-10-1-166 ~]$ sudo yum install libevent-devel -y

...


Installed:
  libevent-devel.x86_64 0:2.0.21-4.amzn2.0.3

Complete!

[ec2-user@ip-10-10-1-166 ~]$ sudo pip3 install wheel

...

Successfully installed wheel-0.37.0

[ec2-user@ip-10-10-1-166 ~]$ sudo pip3 install boto3 PyMySQL celery flask-bcrypt

...

Successfully installed Flask-2.0.1 Jinja2-3.0.1 MarkupSafe-2.0.1 PyMySQL-1.0.2 Werkzeug-2.0.1 amqp-5.0.6 bcrypt-3.2.0 billiard-3.6.4.0 boto3-1.18.48 botocore-1.21.48 cached-property-1.5.2 celery-5.1.2 cffi-1.14.6 click-7.1.2 click-didyoumean-0.0.3 click-plugins-1.1.1 click-repl-0.2.0 flask-bcrypt-0.7.1 importlib-metadata-4.8.1 itsdangerous-2.0.1 jmespath-0.10.0 kombu-5.1.0 prompt-toolkit-3.0.20 pycparser-2.20 python-dateutil-2.8.2 pytz-2021.1 s3transfer-0.5.0 six-1.16.0 typing-extensions-3.10.0.2 urllib3-1.26.7 vine-5.0.0 wcwidth-0.2.5 zipp-3.5.0

[ec2-user@ip-10-10-1-166 ~]$ sudo pip3 install apache-airflow-providers-sqlite

...

Successfully installed apache-airflow-providers-sqlite-2.0.1

[ec2-user@ip-10-10-1-166 ~]$ sudo pip3 install 'apache-airflow[mysql, aws, celery, redis]'

...

# 아래와 같이 디펜던시 에러가 날수도 있는데 airflow 실행에 문제가 없으면 넘어가도 무방하다.
#ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
#    
#We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
#
#jinja2 3.0.1 requires MarkupSafe>=2.0, but you'll have markupsafe 1.1.1 which is incompatible.
#flask 1.1.4 requires Jinja2<3.0,>=2.10.1, but you'll have jinja2 3.0.1 which is incompatible.
#flask-appbuilder 3.3.3 requires SQLAlchemy<1.4.0, but you'll have sqlalchemy 1.4.25 which is incompatible.

...

Successfully installed Babel-2.9.1 Flask-Babel-1.0.0 Flask-JWT-Extended-3.25.1 Flask-OpenID-1.3.0 Flask-SQLAlchemy-2.5.1 Mako-1.1.5 WTForms-2.3.3 alembic-1.7.3 anyio-3.3.1 apache-airflow-2.1.4 apache-airflow-providers-amazon-2.2.0 apache-airflow-providers-celery-2.1.0 apache-airflow-providers-ftp-2.0.1 apache-airflow-providers-http-2.0.1 apache-airflow-providers-imap-2.0.1 apache-airflow-providers-mysql-2.1.1 apache-airflow-providers-redis-2.0.1 apispec-3.3.2 argcomplete-1.12.3 attrs-20.3.0 blinker-1.4 boto3-1.17.112 botocore-1.20.112 cattrs-1.5.0 certifi-2021.5.30 charset-normalizer-2.0.6 clickclick-20.10.2 colorama-0.4.4 colorlog-5.0.1 commonmark-0.9.1 croniter-1.0.15 cryptography-3.4.8 decorator-5.1.0 defusedxml-0.7.1 dill-0.3.4 dnspython-2.1.0 email-validator-1.1.3 flask-1.1.4 flask-appbuilder-3.3.3 flask-caching-1.10.1 flask-login-0.4.1 flask-wtf-0.14.3 flower-1.0.0 graphviz-0.17 greenlet-1.1.1 gunicorn-20.1.0 h11-0.12.0 httpcore-0.13.7 httpx-0.19.0 humanize-3.11.0 idna-3.2 importlib-resources-1.5.0 inflection-0.5.1 iso8601-0.1.16 isodate-0.6.0 itsdangerous-1.1.0 jsonpath-ng-1.5.3 jsonschema-3.2.0 lazy-object-proxy-1.6.0 lockfile-0.12.2 markdown-3.3.4 markupsafe-1.1.1 marshmallow-3.13.0 marshmallow-enum-1.5.1 marshmallow-oneofschema-3.0.1 marshmallow-sqlalchemy-0.23.1 mysql-connector-python-8.0.26 mysqlclient-2.0.3 numpy-1.21.2 openapi-schema-validator-0.1.5 openapi-spec-validator-0.3.1 pandas-1.3.3 pendulum-2.1.2 ply-3.11 prison-0.2.1 prometheus-client-0.11.0 protobuf-3.18.0 psutil-5.8.0 pygments-2.10.0 pyjwt-1.7.1 pyrsistent-0.18.0 python-daemon-2.3.0 python-nvd3-0.15.0 python-slugify-4.0.1 python3-openid-3.2.0 pytzdata-2020.1 pyyaml-5.4.1 redis-3.5.3 requests-2.26.0 rfc3986-1.5.0 rich-10.11.0 s3transfer-0.4.2 setproctitle-1.2.2 sniffio-1.2.0 sqlalchemy-1.4.25 sqlalchemy-jsonfield-1.0.0 sqlalchemy-utils-0.37.8 swagger-ui-bundle-0.0.9 tabulate-0.8.9 tenacity-6.2.0 termcolor-1.1.0 text-unidecode-1.3 tornado-6.1 unicodecsv-0.14.1 watchtower-1.0.6 werkzeug-1.0.1

[ec2-user@ip-10-10-1-166 airflow]$ pip3 install "SQLAlchemy==1.3.24"

...

Successfully installed SQLAlchemy-1.3.24

[ec2-user@ip-10-10-1-166 ~]$ pip3 install "apache-airflow-providers-celery==2.0.0"

...

Successfully installed amqp-2.6.1 apache-airflow-providers-celery-2.0.0 celery-4.4.7 flower-0.9.7 kombu-4.6.11 prometheus-client-0.8.0 vine-1.3.0

[ec2-user@ip-10-10-1-166 ~]$ pip3 list | grep airflow
apache-airflow                  2.1.4
apache-airflow-providers-amazon 2.2.0
apache-airflow-providers-celery 2.0.0
apache-airflow-providers-ftp    2.0.1
apache-airflow-providers-http   2.0.1
apache-airflow-providers-imap   2.0.1
apache-airflow-providers-mysql  2.1.1
apache-airflow-providers-redis  2.0.1
apache-airflow-providers-sqlite 2.0.1

설치가 완료되었다면 airflow 폴더애 있는 airflow.cfg를 수정해서 앞서 설치한 redis와 mysql의 Connection을 설정해야 한다. 먼저 airflow 명령어가 동작하는지 먼저 확인한다. 만약 명령어가 작동한다면 아까 설정해둔 AIRFLOW_HOME경로에 airflow 폴더가 생성될 것이다. 그 안에 configuration 파일이 있다.

# 아래와 같이 오류가 날것이다. sqlite는 사용하지 않는다.
# [ec2-user@ip-10-10-1-166 ~]$ airflow db init
# Traceback (most recent call last):
#   File "/usr/local/bin/airflow", line 5, in <module>
#     from airflow.__main__ import main
#   File "/usr/local/lib/python3.7/site-packages/airflow/__init__.py", line 34, in <module>
#     from airflow import settings
#   File "/usr/local/lib/python3.7/site-packages/airflow/settings.py", line 34, in <module>
#     from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
#   File "/usr/local/lib/python3.7/site-packages/airflow/configuration.py", line 1113, in <module>
#     conf.validate()
#   File "/usr/local/lib/python3.7/site-packages/airflow/configuration.py", line 201, in validate
#     self._validate_config_dependencies()
#   File "/usr/local/lib/python3.7/site-packages/airflow/configuration.py", line 242, in _validate_config_dependencies
#     f"error: sqlite C library version too old (< {min_sqlite_version}). "
# airflow.exceptions.AirflowConfigException: error: sqlite C library version too old (< 3.15.0). See https://airflow.apache.org/docs/apache-airflow/2.1.4/howto/set-up-database.html#setting-up-a-sqlite-database

[ec2-user@ip-10-10-1-166 ~]$ cd /home/ec2-user/airflow

[ec2-user@ip-10-10-1-166 airflow]$ ll
total 52
-rw-rw-r-- 1 ec2-user ec2-user 42599 Sep 25 06:20 airflow.cfg
-rw-r--r-- 1 ec2-user ec2-user  4700 Sep 25 06:20 webserver_config.py

[ec2-user@ip-10-10-1-166 airflow]$ vim airflow.cfg

...

# 사용할 dag 폴더 지정
# subfolder in a code repository. This path must be absolute. 
dags_folder = /home/ec2-user/airflow/dags

...

# 사용할 executor 설정
# executor = SequentialExecutor
executor = CeleryExecutor

...

# MySQL Connection 설정
# sql_alchemy_conn = sqlite:////home/airflow/airflow/airflow.db
sql_alchemy_conn =  mysql+pymysql://airflow:mypasswd12!@127.0.0.1:3306/airflow
            
...

# 비밀번호 사용
# auth_backend = airflow.api.auth.backend.deny_all
auth_backend = airflow.api.auth.backend.basic_auth

...

# Redis Connection 설정
# broker_url = sqla+mysql://airflow:airflow@127.0.0.1:3306/airflow
broker_url = redis://localhost:6379/0
        
...

# result_backend = db+mysql://airflow:airflow@localhost:3306/airflow
result_backend = db+mysql://airflow:mypasswd12!@127.0.0.1:3306/airflow
            
...

# catchup_by_default = True
catchup_by_default = False

...

# 그런 다음에 다시 airflow db를 init했을때 아래와 같이 Done! 이 나왔으면 성공한 것이다. 
[ec2-user@ip-10-10-1-166 airflow]$ airflow db init
DB: mysql+pymysql://airflow:***@127.0.0.1:3306/airflow
[2021-09-25 06:32:44,042] {db.py:702} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.

...

Initialization done

# airflow cli 명령어로 어드민 계정을 생성해보자.
[ec2-user@ip-10-10-1-166 airflow]$ airflow users create -r Admin  -u minman  -p mypasswd12! -e minman1234@naver.com -f minsu -l mansu
[2021-09-25 06:35:33,433] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
Admin user minman created

# 그런 다음에 webserver를 구동해보자.
# 아래와 같이 airflow가 올라오고 pending이 되어 있으면 성공한 것이다.
[ec2-user@ip-10-10-1-166 airflow]$ airflow webserver
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2021-09-25 06:36:30,447] {dagbag.py:496} INFO - Filling up the DagBag from /dev/null
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Access Logformat:
=================================================================
[2021-09-25 06:36:33 +0000] [8516] [INFO] Starting gunicorn 20.1.0
[2021-09-25 06:36:33 +0000] [8516] [INFO] Listening at: http://0.0.0.0:8080 (8516)
[2021-09-25 06:36:33 +0000] [8516] [INFO] Using worker: sync
[2021-09-25 06:36:33 +0000] [8519] [INFO] Booting worker with pid: 8519
[2021-09-25 06:36:33 +0000] [8520] [INFO] Booting worker with pid: 8520
[2021-09-25 06:36:33 +0000] [8521] [INFO] Booting worker with pid: 8521
[2021-09-25 06:36:33 +0000] [8522] [INFO] Booting worker with pid: 8522

...

# 웹브라우저를 열고 http://{ec2 public ip}:8080 으로 접속했을때 아래 화면과 같이 전시되면 정상적으로 접속이 된 것이다.
# ex) http://52.78.206.51:8080/

3

위에 노란박스는 Airflow의 Scheduler가 동작하지 않는다는 경고문이다. 아직 Scheduler를 실행하지 않았기 때문에 발생한다. 참고로 화면에 보이는 DAGs list는 기본적으로 Airflow가 제공하는 Example 이다.

웹서버 띄운 터미널은 그대로 두고 다른 세션으로 터미널을 두개 띄워서 각각 스케쥴러와 워커도 구동해보자.

[ec2-user@ip-10-10-1-166 ~]$ airflow scheduler
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2021-09-25 06:43:27,133] {scheduler_job.py:662} INFO - Starting the scheduler
[2021-09-25 06:43:27,133] {scheduler_job.py:667} INFO - Processing each file at most -1 times
[2021-09-25 06:43:27,208] {manager.py:254} INFO - Launched DagFileProcessorManager with pid: 8436
[2021-09-25 06:43:27,209] {scheduler_job.py:1217} INFO - Resetting orphaned tasks for active dag runs
[2021-09-25 06:43:27,212] {settings.py:51} INFO - Configured default timezone Timezone('UTC')

...

[ec2-user@ip-10-10-1-166 ~]$ airflow celery worker
 * Serving Flask app "airflow.utils.serve_logs" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
[2021-09-25 06:47:01,136] {_internal.py:113} INFO -  * Running on http://0.0.0.0:8793/ (Press CTRL+C to quit)

 -------------- celery@ip-10-10-1-166.ap-northeast-2.compute.internal v4.4.7 (cliffs)
--- ***** -----
-- ******* ---- Linux-4.14.243-185.433.amzn2.x86_64-x86_64-with-glibc2.2.5 2021-09-25 06:47:01
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         airflow.executors.celery_executor:0x7ff656f6aa90
- ** ---------- .> transport:   redis://localhost:6379/0
- ** ---------- .> results:     mysql://airflow:**@127.0.0.1:3306/airflow
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> default          exchange=default(direct) key=default


[tasks]
  . airflow.executors.celery_executor.execute_command

[2021-09-25 06:47:03,231: INFO/MainProcess] Connected to redis://localhost:6379/0
[2021-09-25 06:47:03,239: INFO/MainProcess] mingle: searching for neighbors
[2021-09-25 06:47:04,257: INFO/MainProcess] mingle: all alone
[2021-09-25 06:47:04,265: INFO/MainProcess] celery@ip-10-10-1-166.ap-northeast-2.compute.internal ready.

...

** 참고자료 : 자주 사용하는 airflow 명령어 모음

# db initialize
airflow db init

# user 생성
airflow users create -r {Role-Name}  -u {User-Name}  -p {Password} -e {Email} -f {First-Name} -l {Last-Name}

# dag 목록 조회
airflow list_dags

# dag error check
python3 -c "from airflow.models import DagBag; import os;d = DagBag(os.path.expanduser('~/airflow/dags'));"
# '~/airflow/dags' : 확인할 Dag들이 있는 directory

# webserver 기동
airflow webserver

# scheduler 기동
airflow scheduler

# worker 기동
airflow celery worker