Fluentd를 이용한 EMR 애코 어플리케이션 로그수집 구현
.
Data_Engineering_TIL(20200523)
[로그수집 아키텍처]
** kinesis streams -> logstash -> elasticsearch도 가능한 아키텍처
[로그 수집대상]
- master 노드
1) /var/log/hadoop-hdfs/hadoop-hdfs-namenode*.log
2) /var/log/hadoop-mapreduce/mapred-mapred-historyserver*.log
3) /var/log/hadoop-yarn/yarn-yarn-resourcemanager*.log
4) /var/log/hadoop-yarn/yarn-yarn-timelineserver*.log
5) /var/log/hadoop-yarn/yarn-yarn-proxyserver*.log
6) /var/log/hive/hiveserver*.log
7) /var/log/hive/metastore*.log
8) /var/log/presto/server.log
9) /var/log/oozie/*.log
10) /var/log/hue/*.log
11) /var/log/zeppelin/*.log
12) /var/log/zookeeper/*.log
- core, task 노드
1) /var/log/hadoop-hdfs/hadoop-hdfs-datanode*.log
2) /var/log/hadoop-mapreduce/mapred-*.log
3) /var/log/hadoop-yarn/containers///*.log
4) /var/log/hive/*.log
5) /var/log/presto/*.log
6) /var/log/zookeeper/*.log
[EMR Bootstrap shell file 예시]
#!/bin/bash
INSTANCE_ID=`sudo curl -s http://169.254.169.254/latest/meta-data/instance-id`;
mkdir ~/.aws;
cat > ~/.aws/config << EOF
[default]
region = ap-northeast-2
EOF
EMR_TAG=`aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=aws:elasticmapreduce:instance-group-role"|perl -ne 'print "$1\n" if /\"Value\": \"(.*?)\"/'`;
sudo curl -sL https://toolbelt.treasuredata.com/sh/install-amazon2-td-agent3.sh | sh;
## 위에 설치 OS 환경은 Amazon Linux AMI 2버전이고, 만약에 1버전에 설치한다고 하면 아래와 같이 명령어를 입력한다.
## sudo curl -sL https://toolbelt.treasuredata.com/sh/install-amazon1-td-agent3.sh | sh;
## 예를 들어서 EMR 5.20 은 Amazon Linux AMI 1버전이고, EMR 6.0 버전은 Amazon Linux AMI 2버전이다.
sudo chown -R root:ec2-user /etc/td-agent;
sudo chmod 777 /etc/td-agent;
sudo /opt/td-agent/embedded/bin/gem install fluent-plugin-kinesis;
sudo mv /etc/td-agent/td-agent.conf /etc/td-agent/td-agent.org;
sudo gpasswd -a td-agent root;
sudo cat > /etc/td-agent/td-agent.conf << EOF
<source>
@type tail
path /var/log/hadoop-hdfs/hadoop-hdfs-namenode*.log
pos_file /var/log/td-agent/hadoop-hdfs-namenode.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-hdfs-namenode
</source>
<source>
@type tail
path /var/log/hadoop-mapreduce/mapred-mapred-historyserver*.log
pos_file /var/log/td-agent/mapred-mapred-historyserver.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-mapred-historyserver
</source>
<source>
@type tail
path /var/log/hadoop-yarn/yarn-yarn-resourcemanager*.log
pos_file /var/log/td-agent/yarn-yarn-resourcemanager.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-yarn-resourcemanager
</source>
<source>
@type tail
path /var/log/hadoop-yarn/yarn-yarn-timelineserver*.log
pos_file /var/log/td-agent/yarn-yarn-timelineserver.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-yarn-timelineserver
</source>
<source>
@type tail
path /var/log/hadoop-yarn/yarn-yarn-proxyserver*.log
pos_file /var/log/td-agent/yarn-yarn-proxyserver.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-yarn-proxyserver
</source>
<source>
@type tail
path /var/log/hive/hiveserver*.log
pos_file /var/log/td-agent/hiveserver.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-hiveserver
</source>
<source>
@type tail
path /var/log/hive/metastore*.log
pos_file /var/log/td-agent/hive-metastore.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-hive-metastore
</source>
<source>
@type tail
path /var/log/presto/*server*.log
pos_file /var/log/td-agent/presto-server.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-presto-server
</source>
<source>
@type tail
path /var/log/oozie/*.log
pos_file /var/log/td-agent/oozie.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-oozie
</source>
<source>
@type tail
path /var/log/hue/*.log
pos_file /var/log/td-agent/hue.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-hue
</source>
<source>
@type tail
path /var/log/zeppelin/*.log
pos_file /var/log/td-agent/zeppelin.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-zeppelin
</source>
<source>
@type tail
path /var/log/zookeeper/*.log
pos_file /var/log/td-agent/zookeeper.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-zookeeper
</source>
<source>
@type tail
path /var/log/hadoop-hdfs/hadoop-hdfs-datanode*.log
pos_file /var/log/td-agent/hadoop-hdfs-datanode.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-hdfs-datanode
</source>
<source>
@type tail
path /var/log/hadoop-mapreduce/mapred-*.log
pos_file /var/log/td-agent/hadoop-mapred.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-hadoop-mapred
</source>
<source>
@type tail
path /var/log/hadoop-yarn/containers/*/*/*.log
pos_file /var/log/td-agent/yarn-containers.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-yarn-containers.*
</source>
<source>
@type tail
path /var/log/hive/*.log
pos_file /var/log/td-agent/hive.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-hive
</source>
<source>
@type tail
path /var/log/presto/*.log
pos_file /var/log/td-agent/presto.pos
<parse>
@type none
</parse>
tag pms-EMRtest-EMR-$EMR_TAG-$INSTANCE_ID-presto
</source>
<match pms-EMRtest-EMR-**>
@type kinesis_streams
region ap-northeast-2
aws_key_id xxxxxxxxxxxxxx
aws_sec_key yyyyyyyyyyyyyyyyy
stream_name pms-kinesis-test
<inject>
tag_key agent.tag
</inject>
</match>
EOF
sudo sed -e $'s/%Y%m%d%H/%Y%m%d%H_'${INSTANCE_ID}'/g' -i /etc/td-agent/td-agent.conf;
sudo chown -R root:ec2-user /etc/td-agent;
sudo chmod 755 /etc/td-agent;
sudo service td-agent start;
[아키텍처 구현]
먼저 아래 그림과 같이 EMR을 생성하는데 생성할때 부트스트랩 파일로 위에 쉘스크립트를 반영해준다.
그리고 KInesis streams, firehose를 순차적으로 만들어준다.
firehose는 source는 kinesis streams 그리고 destination은 s3를 아래 그림과 같이 지정해준다.
- 정상적으로 로그 수집 시 EC2 내에 /var/log/td-agent/td-agent.log의 내용은 아래와 같을 것이다.
(또는 tail -f /var/log/td-agent/td-agent.log 명령어로 로그내용 끝에 쪽에만 실시간으로 로그를 확인할 수 있다. 내가 설정한 conf 파일을 확인하고 싶다면 /etc/td-agent/td-agent.conf 경로로 가면 된다.)
2020-05-23 07:12:43 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf"
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '4.0.7'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-kafka' version '0.13.0'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-kinesis' version '3.2.2'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.7.3'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.0.2'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.0'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.2.0'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-s3' version '1.3.1'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-td' version '1.1.0'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.4'
2020-05-23 07:12:43 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.2.4'
2020-05-23 07:12:43 +0000 [info]: gem 'fluentd' version '1.10.2'
2020-05-23 07:12:43 +0000 [warn]: 'time_format' specified without 'time_key', will be ignored
2020-05-23 07:12:43 +0000 [info]: using configuration file: <ROOT>
<source>
@type tail
path "/var/log/hadoop-hdfs/hadoop-hdfs-namenode*.log"
pos_file "/var/log/td-agent/hadoop-hdfs-namenode.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-hdfs-namenode"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hadoop-mapreduce/mapred-mapred-historyserver*.log"
pos_file "/var/log/td-agent/mapred-mapred-historyserver.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-mapred-historyserver"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hadoop-yarn/yarn-yarn-resourcemanager*.log"
pos_file "/var/log/td-agent/yarn-yarn-resourcemanager.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-yarn-resourcemanager"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hadoop-yarn/yarn-yarn-timelineserver*.log"
pos_file "/var/log/td-agent/yarn-yarn-timelineserver.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-yarn-timelineserver"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hadoop-yarn/yarn-yarn-proxyserver*.log"
pos_file "/var/log/td-agent/yarn-yarn-proxyserver.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-yarn-proxyserver"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hive/hiveserver*.log"
pos_file "/var/log/td-agent/hiveserver.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-hiveserver"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hive/metastore*.log"
pos_file "/var/log/td-agent/hive-metastore.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-hive-metastore"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/presto/*server*.log"
pos_file "/var/log/td-agent/presto-server.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-presto-server"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/oozie/*.log"
pos_file "/var/log/td-agent/oozie.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hue/*.log"
pos_file "/var/log/td-agent/hue.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-hue"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/zeppelin/*.log"
pos_file "/var/log/td-agent/zeppelin.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-zeppelin"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/zookeeper/*.log"
pos_file "/var/log/td-agent/zookeeper.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-zookeeper"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hadoop-hdfs/hadoop-hdfs-datanode*.log"
pos_file "/var/log/td-agent/hadoop-hdfs-datanode.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-hdfs-datanode"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hadoop-mapreduce/mapred-*.log"
pos_file "/var/log/td-agent/hadoop-mapred.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-hadoop-mapred"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hadoop-yarn/containers/*/*/*.log"
pos_file "/var/log/td-agent/yarn-containers.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-yarn-containers.*"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/hive/*.log"
pos_file "/var/log/td-agent/hive.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-hive"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/presto/*.log"
pos_file "/var/log/td-agent/presto.pos"
tag "pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-presto"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<match pms-EMRtest-EMR-**>
@type kinesis_streams
region "ap-northeast-2"
aws_key_id xxxxxx
aws_sec_key xxxxxx
stream_name "pms-kinesis-test"
<inject>
tag_key "agent.tag"
</inject>
</match>
</ROOT>
2020-05-23 07:12:43 +0000 [info]: starting fluentd-1.10.2 pid=29457 ruby="2.4.10"
2020-05-23 07:12:43 +0000 [info]: spawn command to main: cmdline=["/opt/td-agent/embedded/bin/ruby", "-Eascii-8bit:ascii-8bit", "/opt/td-agent/embedded/bin/fluentd", "--log", "/var/log/td-agent/td-agent.log", "--daemon", "/var/run/td-agent/td-agent.pid", "--under-supervisor"]
2020-05-23 07:12:44 +0000 [info]: adding match pattern="pms-EMRtest-EMR-**" type="kinesis_streams"
2020-05-23 07:12:44 +0000 [warn]: #0 'time_format' specified without 'time_key', will be ignored
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: adding source type="tail"
2020-05-23 07:12:44 +0000 [info]: #0 starting fluentd worker pid=29502 ppid=29494 worker=0
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/presto/launcher.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/presto/server.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/presto/http-request.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hive/hive-server2.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/zookeeper/zookeeper.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hue/makemigrations.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hue/access.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hue/error.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hue/migrate.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hue/collectstatic.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hue/supervisor.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hue/kt_renewer.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hue/runcpserver.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/oozie/oozie-instrumentation.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/oozie/oozie-audit.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/oozie/jetty.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/oozie/oozie-ops.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/oozie/oozie-jpa.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/oozie/oozie-error.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/oozie/derby.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/oozie/oozie.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/presto/server.log
2020-05-23 07:12:44 +0000 [info]: #0 following tail of /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-10-1-10-162.log
2020-05-23 07:12:44 +0000 [info]: #0 fluentd worker is now running worker=0
그리고 아래 그림과 같이 Kinesis streams와 firehose 의 콘솔(모니터링 메뉴)에서 로그들이 잘 들어가는 것을 확인할 수 있다.
destination인 s3에는 MASTER,TASK,CORE 노드에서 오는 로그들이 저장되는 것을 아래 그림과 같이 확인할 수 있다.
s3 로그 파일을 다운받아 열어보면 대충 내용은 아래와 같이 제이슨 형태의 로그로 구성되어 있다.
{"message":"2020-05-23 07:12:59,084 INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[ip-10-1-10-162.ap-northeast-2.compute.internal] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
{"message":"2020-05-23T07:12:53.345Z\t10.1.10.212\tGET\t/v1/info/state\tnull\tnull\t200\t0\t9\t0\t1c1e307f0f1f4e47b963899f2ac084a900000005d2\tHTTP/1.1\t0\t0\t0\tnull","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-presto"}
{"message":"2020-05-23T07:12:57.114Z\t10.1.10.162\tPUT\t/v1/announcement/i-0138caed56fa305d3\tnull\ti-0138caed56fa305d3\t202\t950\t0\t1\t1c1e307f0f1f4e47b963899f2ac084a900000005d5\tHTTP/1.1\t0\t0\t-1\tnull","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-presto"}
{"message":"2020-05-23T07:13:13.348Z\t10.1.10.212\tGET\t/v1/info/state\tnull\tnull\t200\t0\t9\t1\t1c1e307f0f1f4e47b963899f2ac084a900000005e8\tHTTP/1.1\t0\t0\t0\tnull","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-presto"}
{"message":"2020-05-23T07:13:24.843Z\t10.1.10.162\tGET\t/v1/info/state\tnull\tnull\t200\t0\t9\t1\t1c1e307f0f1f4e47b963899f2ac084a900000005f3\tHTTP/1.1\t0\t0\t0\tnull","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-presto"}
{"message":" \"gauges\" : {","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
{"message":" \"configuration.action.types\" : {","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
{"message":" \"configuration.config.dir\" : {","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
{"message":" },","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
{"message":" \"value\" : 0","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
{"message":" },","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
{"message":" },","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
{"message":" },","agent.tag":"pms-EMRtest-EMR-MASTER-i-0138caed56fa305d3-oozie"}
...