Building a Hadoop Cluster with EC2

2020-08-02


Data_Engineering_TIL(20200731)

** References

1) AWS EC2로 Hadoop Cluster 구축하기 (Building a Hadoop Cluster with AWS EC2)

URL : https://yahwang.github.io/posts/62

2) HADOOP 101: MULTI-NODE INSTALLATION USING AWS EC2

URL : https://codethief.io/ko/hadoop101/

1. Environment

  • NameNode 1 & DataNode 2 & client 1

ex)

pms-hadoop-namenode, pms-hadoop-datanode, pms-hadoop-client

  • EC2 Instance : Ubuntu 18.04 / m5.xlarge / 30GB

** Only the client uses Amazon Linux 2 / t3.micro / 8GB

  • Security Group

Type      Protocol   Port range   Source
All TCP   TCP        0 - 65535    [local PC IP address]
All TCP   TCP        0 - 65535    [ID of this security group]
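
For reference, the same two rules can be added from the AWS CLI instead of the console; a rough sketch, assuming the security group already exists (sg-xxxxxxxx and the IP below are placeholders):

# Allow all TCP from the local PC
aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol tcp --port 0-65535 \
    --cidr [local PC IP]/32

# Allow all TCP from other members of the same security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol tcp --port 0-65535 \
    --source-group sg-xxxxxxxx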

2. Installation

step 0) Prerequisites

First, create the NameNode instance with Ubuntu 18.04.

Likewise, create one client instance with Amazon Linux 2 and upload the key pair for the Hadoop cluster to it with Cyberduck.

Then connect to the client, change the key pair's permissions with the commands below, and copy the key pair to the NameNode.

[ec2-user@ip-10-1-10-247 ~]$ ls -l
-rw-rw-r-- 1 ec2-user ec2-user 1674 Jul 31 05:38 pms-hadoop-test.pem

# The private key must first be restricted to owner-only permissions before it can be used.
[ec2-user@ip-10-1-10-247 ~]$ sudo chmod 400 pms-hadoop-test.pem

## Checking with ls -l shows the mode has changed to -r--------.
[ec2-user@ip-10-1-10-247 ~]$ ll
-r-------- 1 ec2-user ec2-user 1674 Jul 31 05:38 pms-hadoop-test.pem

# Copy the private key to the namenode instance over ssh
[ec2-user@ip-10-1-10-247 ~]$ scp -i pms-hadoop-test.pem pms-hadoop-test.pem ubuntu@[namenode public IP]:~/.ssh
The authenticity of host '13.125.191.160 (13.125.191.160)' can't be established.
ECDSA key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
ECDSA key fingerprint is MD5:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '13.125.191.160' (ECDSA) to the list of known hosts.
pms-hadoop-test.pem                                                              100% 1674     2.3MB/s   00:00

# Connect to the namenode instance with the private key
[ec2-user@ip-10-1-10-247 ~]$ ssh -i pms-hadoop-test.pem ubuntu@13.125.191.160
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1023-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Fri Jul 31 05:52:22 UTC 2020

  System load:  0.0                Processes:           122
  Usage of /:   10.5% of 29.02GB   Users logged in:     1
  Memory usage: 2%                 IP address for ens5: 10.1.10.226
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


*** System restart required ***
Last login: Fri Jul 31 05:48:51 2020 from 58.151.93.17
ubuntu@ip-10-1-10-121:~$

step 1) Connect to the NameNode as above and change its hostname to namenode.

Reference : https://docs.aws.amazon.com/ko_kr/AWSEC2/latest/UserGuide/set-hostname.html

sudo hostnamectl set-hostname namenode
sudo reboot
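
On Ubuntu, cloud-init can reset the hostname on reboot, so the AWS doc above also covers persisting it; a minimal sketch, assuming the default /etc/cloud/cloud.cfg layout (append the key instead if it is missing):

# Keep cloud-init from resetting the hostname on reboot
sudo sed -i 's/^preserve_hostname: false/preserve_hostname: true/' /etc/cloud/cloud.cfg
# After the reboot, this should print namenode
hostname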

step 2) Back on the NameNode, run the commands below to update apt and install Java and Hadoop.

sudo apt update -y 
sudo apt dist-upgrade -y
sudo apt install openjdk-8-jdk -y
wget http://apache.mirror.cdnetworks.com/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz -P ~/Downloads
sudo tar zxvf ~/Downloads/hadoop-* -C /usr/local
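
Before moving on, it is worth confirming that the JDK installed and the tarball unpacked where expected; a quick check:

# Should report openjdk version 1.8.x
java -version
# Shows where the JDK actually lives (used for JAVA_HOME in step 3)
readlink -f $(which java)
# The extracted hadoop-2.9.2 directory should appear here
ls /usr/local | grep hadoop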

step 3) Add environment settings to bashrc

** Note : how to set environment variables

  • One-off method (current session only)

ex) export <NAME>=<VALUE> : export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

  • Permanent method

Write export <NAME>=<VALUE> in /etc/bash.bashrc and apply it with source /etc/bash.bashrc.

Set the bashrc environment variables permanently, as follows.

ubuntu@namenode:/etc$ sudo mv /usr/local/hadoop-* /usr/local/hadoop

ubuntu@namenode:/etc$ sudo vim /etc/bash.bashrc
# Add the following at the very bottom of the file
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Note : check the Java path with readlink -f $(which java)
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Save, then apply
ubuntu@namenode:/etc$ source /etc/bash.bashrc

# Change the owner of the Hadoop directory
ubuntu@namenode:/etc$ sudo chown -R ubuntu $HADOOP_HOME
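
If the variables took effect, the hadoop binary should now resolve from the PATH; a quick sanity check:

# Should print Hadoop 2.9.2 plus build info
hadoop version
# Should print the JDK path set above
echo $JAVA_HOME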

step 4) Basic Hadoop configuration

ubuntu@namenode:~$ cd $HADOOP_CONF_DIR

ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# JAVA_HOME is already set as an environment variable, but set it here too so Hadoop is sure to pick it up.
# Find the export JAVA_HOME line and change it as follows.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml
# Tells every node where the namenode is.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://[namenode public DNS]:9000</value>
  </property>
</configuration>

ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>[namenode public DNS]</value>
  </property>
</configuration>

ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo cp mapred-site.xml.template mapred-site.xml
ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
# The jobtracker address is a fallback for when YARN is not used
<configuration> 
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>[namenode public DNS]:54311</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
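
A malformed XML file here only surfaces later when the daemons start, so it can be worth validating the three files after editing. A sketch using xmllint, which is not installed by default (sudo apt install libxml2-utils):

# Each command prints nothing if the file is well-formed XML
xmllint --noout $HADOOP_CONF_DIR/core-site.xml
xmllint --noout $HADOOP_CONF_DIR/yarn-site.xml
xmllint --noout $HADOOP_CONF_DIR/mapred-site.xml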

step 5) To create the DataNodes, take an AMI image of the NameNode as configured so far.

Then launch the two DataNodes from that image.

Then connect to each of the two DataNodes, change its hostname to datanode1 or datanode2, and reboot; a CLI sketch follows below.
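
This can all be done from the console; for reference, a rough AWS CLI sketch (the instance ID and image name below are placeholders):

# Create an AMI from the namenode
aws ec2 create-image --instance-id i-xxxxxxxxxxxx --name pms-hadoop-base

# Then, on each datanode launched from that image
sudo hostnamectl set-hostname datanode1   # datanode2 on the second instance
sudo reboot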

step 6) Connect the client to every Hadoop cluster node over SSH

Connect to the client and run the following commands.

[ec2-user@ip-10-1-10-247 ~]$ sudo mv ~/pms-hadoop-test.pem ~/.ssh
[ec2-user@ip-10-1-10-247 ~]$ sudo vim ~/.ssh/config
Host namenode
  HostName [namenode public DNS]
  User ubuntu
  IdentityFile ~/.ssh/pms-hadoop-test.pem

Host datanode1
  HostName [datanode1 public DNS]
  User ubuntu
  IdentityFile ~/.ssh/pms-hadoop-test.pem
    
Host datanode2
  HostName [datanode2 public DNS]
  User ubuntu
  IdentityFile ~/.ssh/pms-hadoop-test.pem

# After saving, copy this file to the namenode as well.
[ec2-user@ip-10-1-10-247 ~]$ scp ~/.ssh/config namenode:~/.ssh/config
config                                                                           100%  197   542.1KB/s   00:00
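
With the config file in place, every node is reachable from the client by its alias, without spelling out the key and user each time:

ssh namenode     # instead of ssh -i pms-hadoop-test.pem ubuntu@[namenode public DNS]
ssh datanode1
ssh datanode2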

step 7) Let the NameNode access the DataNodes without a password

Connect to the NameNode and run the following commands.

ubuntu@namenode:~$ ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""
Generating public/private rsa key pair.
Your identification has been saved in /home/ubuntu/.ssh/id_rsa.
Your public key has been saved in /home/ubuntu/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:xxxxxxxxxxxxxxxxxxxxxxxx ubuntu@namenode
The key's randomart image is:
+---[RSA 2048]----+
|  xxxxxxxxxxxxx   |
+----[SHA256]-----+

ubuntu@namenode:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Register the public key on each datanode
ubuntu@namenode:~$ ssh datanode1 'cat >> ~/.ssh/authorized_keys' < ~/.ssh/id_rsa.pub
The authenticity of host '13.124.220.29 (13.124.220.29)' can't be established.
ECDSA key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '13.124.220.29' (ECDSA) to the list of known hosts.
    
ubuntu@namenode:~$ ssh datanode2 'cat >> ~/.ssh/authorized_keys' < ~/.ssh/id_rsa.pub
The authenticity of host '13.124.220.29 (13.124.220.29)' can't be established.
ECDSA key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '13.124.220.29' (ECDSA) to the list of known hosts.
    
ubuntu@namenode:~$ sudo vim /etc/hosts
127.0.0.1 localhost
[namenode private IP] namenode
[datanode1 private IP] datanode1
[datanode2 private IP] datanode2

## Verify that the namenode can now ssh to itself and to each datanode without a password, as follows

ubuntu@namenode:~/.ssh$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1032-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sun Aug  2 13:09:03 UTC 2020

  System load:  0.0                Processes:           113
  Usage of /:   10.5% of 29.02GB   Users logged in:     1
  Memory usage: 1%                 IP address for ens5: 10.1.10.19
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


Last login: Sun Aug  2 13:06:33 2020 from 13.125.215.118
ubuntu@namenode:~$ exit
logout
Connection to localhost closed.
ubuntu@namenode:~/.ssh$ ssh namenode
The authenticity of host 'ec2-13-125-235-220.ap-northeast-2.compute.amazonaws.com (10.1.10.19)' can't be established.
ECDSA key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-13-125-235-220.ap-northeast-2.compute.amazonaws.com,10.1.10.19' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1032-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sun Aug  2 13:09:15 UTC 2020

  System load:  0.0                Processes:           114
  Usage of /:   10.5% of 29.02GB   Users logged in:     1
  Memory usage: 1%                 IP address for ens5: 10.1.10.19
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


Last login: Sun Aug  2 13:09:03 2020 from 127.0.0.1
ubuntu@namenode:~$ exit
logout
Connection to ec2-13-125-235-220.ap-northeast-2.compute.amazonaws.com closed.
ubuntu@namenode:~/.ssh$ ssh datanode1
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1032-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sun Aug  2 13:09:21 UTC 2020

  System load:  0.0                Processes:           113
  Usage of /:   10.5% of 29.02GB   Users logged in:     1
  Memory usage: 1%                 IP address for ens5: 10.1.10.127
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


Last login: Sun Aug  2 13:04:19 2020 from 1.239.132.21
ubuntu@datanode1:~$ exit
logout
Connection to ec2-3-35-53-51.ap-northeast-2.compute.amazonaws.com closed.
ubuntu@namenode:~/.ssh$ ssh datanode2
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1032-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sun Aug  2 13:09:27 UTC 2020

  System load:  0.08               Processes:           124
  Usage of /:   10.5% of 29.02GB   Users logged in:     1
  Memory usage: 1%                 IP address for ens5: 10.1.10.39
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


Last login: Sun Aug  2 13:04:25 2020 from 1.239.132.21
ubuntu@datanode2:~$ exit
logout
Connection to ec2-3-35-18-245.ap-northeast-2.compute.amazonaws.com closed.

step 8) Hadoop config on the NameNode

Connect to the NameNode and set up the Hadoop config as follows.

ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo vim $HADOOP_CONF_DIR/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- directory where the namenode stores its data -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/data/hdfs/namenode</value>
  </property>
</configuration>

ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo mkdir -p $HADOOP_HOME/data/hdfs/namenode
ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo chmod 777 $HADOOP_HOME/data/hdfs/namenode

# The masters file specifies where the secondary namenode runs.
# Here it runs on the same instance as the namenode, so enter the namenode's hostname.
ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo vim $HADOOP_CONF_DIR/masters
namenode    

# The slaves file lists the datanodes.
ubuntu@namenode:/usr/local/hadoop/etc/hadoop$ sudo vim $HADOOP_CONF_DIR/slaves
# Delete localhost and list the datanode hostnames
datanode1
datanode2
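
Since start-dfs.sh will ssh into every host listed here, a quick loop over the slaves file confirms that the passwordless SSH and /etc/hosts entries from step 7 line up:

# Should print each datanode's hostname without prompting for a password
for host in $(cat $HADOOP_CONF_DIR/slaves); do
  ssh $host hostname
done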

step 9) Hadoop config on the DataNodes

Connect to each DataNode and set up the Hadoop config as follows.

# A directory for storing intermediate data must be configured.
# Without it, errors like 'exitCode=-1000. No space available in any of the local directories' can occur.
ubuntu@datanode:/usr/local/hadoop/etc/hadoop$ sudo vim $HADOOP_CONF_DIR/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>[namenode public DNS]</value>
  </property>
  <!-- add the following -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///usr/local/hadoop/yarn/local</value>
  </property>
</configuration>

ubuntu@datanode:/usr/local/hadoop/etc/hadoop$ mkdir -p /usr/local/hadoop/yarn/local
ubuntu@datanode:/usr/local/hadoop/etc/hadoop$ sudo chmod 777 /usr/local/hadoop/yarn/local
    
ubuntu@datanode:/usr/local/hadoop/etc/hadoop$ sudo vim $HADOOP_CONF_DIR/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/data/hdfs/datanode</value>
  </property>
</configuration>
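
The tutorials referenced above also create the datanode data directory explicitly, mirroring the namenode directory in step 8; assuming the same layout:

sudo mkdir -p $HADOOP_HOME/data/hdfs/datanode
sudo chmod 777 $HADOOP_HOME/data/hdfs/datanode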

step 10) Start the Hadoop cluster and verify it works

On the namenode, start Hadoop with the commands below and check that everything is running.

# Format the namenode; it should finish with a SHUTDOWN_MSG and no errors.
ubuntu@namenode:~/.ssh$ hdfs namenode -format
20/08/02 13:15:46 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = namenode/10.1.10.19
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.9.2
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar: ... (long classpath omitted)
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 826afbeae31ca687bc2f8471dc841b66ed2c6704; compiled by 'ajisaka' on 2018-11-13T12:42Z
STARTUP_MSG:   java = 1.8.0_252
************************************************************/
20/08/02 13:15:46 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
20/08/02 13:15:46 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-1fb18375-939c-45e7-95aa-2b654c02cddf
20/08/02 13:15:46 INFO namenode.FSEditLog: Edit logging is async:true
20/08/02 13:15:46 INFO namenode.FSNamesystem: KeyProvider: null
20/08/02 13:15:46 INFO namenode.FSNamesystem: fsLock is fair: true
20/08/02 13:15:46 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
20/08/02 13:15:46 INFO namenode.FSNamesystem: fsOwner             = ubuntu (auth:SIMPLE)
20/08/02 13:15:46 INFO namenode.FSNamesystem: supergroup          = supergroup
20/08/02 13:15:46 INFO namenode.FSNamesystem: isPermissionEnabled = true
20/08/02 13:15:46 INFO namenode.FSNamesystem: HA Enabled: false
20/08/02 13:15:46 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
20/08/02 13:15:46 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
20/08/02 13:15:46 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/08/02 13:15:46 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/08/02 13:15:46 INFO blockmanagement.BlockManager: The block deletion will start around 2020 Aug 02 13:15:46
20/08/02 13:15:46 INFO util.GSet: Computing capacity for map BlocksMap
20/08/02 13:15:46 INFO util.GSet: VM type       = 64-bit
20/08/02 13:15:46 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
20/08/02 13:15:46 INFO util.GSet: capacity      = 2^21 = 2097152 entries
20/08/02 13:15:46 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/08/02 13:15:46 WARN conf.Configuration: No unit for dfs.heartbeat.interval(3) assuming SECONDS
20/08/02 13:15:46 WARN conf.Configuration: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
20/08/02 13:15:46 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/08/02 13:15:46 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
20/08/02 13:15:46 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
20/08/02 13:15:46 INFO blockmanagement.BlockManager: defaultReplication         = 3
20/08/02 13:15:46 INFO blockmanagement.BlockManager: maxReplication             = 512
20/08/02 13:15:46 INFO blockmanagement.BlockManager: minReplication             = 1
20/08/02 13:15:46 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
20/08/02 13:15:46 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/08/02 13:15:46 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
20/08/02 13:15:46 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
20/08/02 13:15:46 INFO namenode.FSNamesystem: Append Enabled: true
20/08/02 13:15:46 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
20/08/02 13:15:46 INFO util.GSet: Computing capacity for map INodeMap
20/08/02 13:15:46 INFO util.GSet: VM type       = 64-bit
20/08/02 13:15:46 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
20/08/02 13:15:46 INFO util.GSet: capacity      = 2^20 = 1048576 entries
20/08/02 13:15:46 INFO namenode.FSDirectory: ACLs enabled? false
20/08/02 13:15:46 INFO namenode.FSDirectory: XAttrs enabled? true
20/08/02 13:15:46 INFO namenode.NameNode: Caching file names occurring more than 10 times
20/08/02 13:15:46 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
20/08/02 13:15:46 INFO util.GSet: Computing capacity for map cachedBlocks
20/08/02 13:15:46 INFO util.GSet: VM type       = 64-bit
20/08/02 13:15:46 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
20/08/02 13:15:46 INFO util.GSet: capacity      = 2^18 = 262144 entries
20/08/02 13:15:46 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
20/08/02 13:15:46 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
20/08/02 13:15:46 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
20/08/02 13:15:46 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/08/02 13:15:46 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/08/02 13:15:46 INFO util.GSet: Computing capacity for map NameNodeRetryCache
20/08/02 13:15:46 INFO util.GSet: VM type       = 64-bit
20/08/02 13:15:46 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
20/08/02 13:15:46 INFO util.GSet: capacity      = 2^15 = 32768 entries
20/08/02 13:15:46 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1427593057-10.1.10.19-1596374146724
20/08/02 13:15:46 INFO common.Storage: Storage directory /usr/local/hadoop/data/hdfs/namenode has been successfully formatted.
20/08/02 13:15:46 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/data/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
20/08/02 13:15:46 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/data/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 325 bytes saved in 0 seconds .
20/08/02 13:15:46 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/08/02 13:15:46 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode/10.1.10.19
************************************************************/

# Start HDFS
# Do not run this as root (e.g. sudo start-dfs.sh); that causes permission-denied errors between the nodes.
ubuntu@namenode:~$ /usr/local/hadoop/sbin/start-dfs.sh
Starting namenodes on [ec2-13-125-235-220.ap-northeast-2.compute.amazonaws.com]
ec2-13-125-235-220.ap-northeast-2.compute.amazonaws.com: starting namenode, logging to /usr/local/hadoop/logs/hadoop-ubuntu-namenode-namenode.out
datanode1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-ubuntu-datanode-datanode1.out
datanode2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-ubuntu-datanode-datanode2.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-ubuntu-secondarynamenode-namenode.out
    
# Start Hadoop YARN
# As with HDFS, do not run this as root (e.g. sudo start-yarn.sh); that causes permission-denied errors between the nodes.
ubuntu@namenode:~$ /usr/local/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-ubuntu-resourcemanager-namenode.out
datanode1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-ubuntu-nodemanager-datanode1.out
datanode2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-ubuntu-nodemanager-datanode2.out

# Start the MapReduce job history server
ubuntu@namenode:~$ /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-ubuntu-historyserver-namenode.out

# Check the Java processes as well
ubuntu@namenode:~$ jps
2656 JobHistoryServer
1942 NameNode
2200 SecondaryNameNode
2729 Jps
2366 ResourceManager



##-- Running jps on the datanodes should likewise show Java processes like the following. --##

ubuntu@datanode1:~$ jps
2018 Jps
1701 DataNode
1895 NodeManager

##-----------------------------------------------------------------------------------##


# Check the HDFS admin report
ubuntu@namenode:/usr/local/hadoop$ hdfs dfsadmin -report
Configured Capacity: 62317690880 (58.04 GB)
Present Capacity: 55728148480 (51.90 GB)
DFS Remaining: 55728099328 (51.90 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
    
-------------------------------------------------

Live datanodes (2):
    
Name: 10.1.10.127:50010 (datanode1)
Hostname: ip-10-1-10-127.ap-northeast-2.compute.internal
Decommission Status : Normal
Configured Capacity: 31158845440 (29.02 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3277991936 (3.05 GB)
DFS Remaining: 27864051712 (25.95 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.43%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Aug 02 13:23:23 UTC 2020
Last Block Report: Sun Aug 02 13:17:41 UTC 2020

Name: 10.1.10.39:50010 (datanode2)
Hostname: ip-10-1-10-39.ap-northeast-2.compute.internal
Decommission Status : Normal
Configured Capacity: 31158845440 (29.02 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3277996032 (3.05 GB)
DFS Remaining: 27864047616 (25.95 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.43%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Aug 02 13:23:24 UTC 2020
Last Block Report: Sun Aug 02 13:17:51 UTC 2020
            
## You can also confirm Hadoop is running through the HDFS web UI (:50070) and the YARN UI (:8088).
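
If a browser is not handy, the same endpoints can be probed from the client with curl; a minimal check (replace the placeholder with the namenode's public DNS):

# 200 means the web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://[namenode public DNS]:50070
curl -s -o /dev/null -w "%{http_code}\n" http://[namenode public DNS]:8088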

step 11) Test with a simple Hadoop job

Run a test on the NameNode with the following commands.

# Create a home directory
ubuntu@namenode:/usr/local/hadoop$ hdfs dfs -mkdir -p /user/ubuntu

# create random-data
ubuntu@namenode:/usr/local/hadoop$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-e*.jar teragen 500000 random-data
20/08/02 13:28:22 INFO client.RMProxy: Connecting to ResourceManager at ec2-13-125-235-220.ap-northeast-2.amazonaws.com/10.1.10.19:8032
20/08/02 13:28:23 INFO terasort.TeraGen: Generating 500000 using 2
20/08/02 13:28:23 INFO mapreduce.JobSubmitter: number of splits:2
20/08/02 13:28:23 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
20/08/02 13:28:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1596374353473_0001
20/08/02 13:28:24 INFO impl.YarnClientImpl: Submitted application application_1596374353473_0001
20/08/02 13:28:24 INFO mapreduce.Job: The url to track the job: http://ec2-13-125-235-220.ap-northeast-2.amazonaws.com:8088/proxy/application_1596374353473_0001/
20/08/02 13:28:24 INFO mapreduce.Job: Running job: job_1596374353473_0001
20/08/02 13:28:35 INFO mapreduce.Job: Job job_1596374353473_0001 running in uber mode : false
20/08/02 13:28:35 INFO mapreduce.Job:  map 0% reduce 0%
20/08/02 13:28:39 INFO mapreduce.Job:  map 50% reduce 0%
20/08/02 13:28:46 INFO mapreduce.Job:  map 100% reduce 0%
20/08/02 13:28:46 INFO mapreduce.Job: Job job_1596374353473_0001 completed successfully
20/08/02 13:28:46 INFO mapreduce.Job: Counters: 31
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=396576
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=167
                HDFS: Number of bytes written=50000000
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Job Counters
                Launched map tasks=2
                Other local map tasks=2
                Total time spent by all maps in occupied slots (ms)=10786
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=10786
                Total vcore-milliseconds taken by all map tasks=10786
                Total megabyte-milliseconds taken by all map tasks=11044864
        Map-Reduce Framework
                Map input records=500000
                Map output records=500000
                Input split bytes=167
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=117
                CPU time spent (ms)=2330
                Physical memory (bytes) snapshot=394739712
                Virtual memory (bytes) snapshot=3880120320
                Total committed heap usage (bytes)=282066944
        org.apache.hadoop.examples.terasort.TeraGen$Counters
                CHECKSUM=1074598070305752
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=50000000
            
# sort job         
ubuntu@namenode:/usr/local/hadoop$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-e*.jar terasort random-data sorted-data
20/08/02 13:28:54 INFO terasort.TeraSort: starting
20/08/02 13:28:55 INFO input.FileInputFormat: Total input files to process : 2
Spent 73ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 76ms
Sampling 2 splits of 2
Making 1 from 100000 sampled records
Computing parititions took 447ms
Spent 525ms computing partitions.
20/08/02 13:28:56 INFO client.RMProxy: Connecting to ResourceManager at ec2-13-125-235-220.ap-northeast-2.amazonaws.com/10.1.10.19:8032
20/08/02 13:28:56 INFO mapreduce.JobSubmitter: number of splits:2
20/08/02 13:28:56 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
20/08/02 13:28:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1596374353473_0002
20/08/02 13:28:56 INFO impl.YarnClientImpl: Submitted application application_1596374353473_0002
20/08/02 13:28:56 INFO mapreduce.Job: The url to track the job: http://ec2-13-125-235-220.ap-northeast-2.amazonaws.com:8088/proxy/application_1596374353473_0002/
20/08/02 13:28:56 INFO mapreduce.Job: Running job: job_1596374353473_0002
20/08/02 13:29:01 INFO mapreduce.Job: Job job_1596374353473_0002 running in uber mode : false
20/08/02 13:29:01 INFO mapreduce.Job:  map 0% reduce 0%
20/08/02 13:29:07 INFO mapreduce.Job:  map 100% reduce 0%
20/08/02 13:29:12 INFO mapreduce.Job:  map 100% reduce 100%
20/08/02 13:29:13 INFO mapreduce.Job: Job job_1596374353473_0002 completed successfully
20/08/02 13:29:14 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=52000006
                FILE: Number of bytes written=104599170
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=50000338
                HDFS: Number of bytes written=50000000
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=7650
                Total time spent by all reduces in occupied slots (ms)=2807
                Total time spent by all map tasks (ms)=7650
                Total time spent by all reduce tasks (ms)=2807
                Total vcore-milliseconds taken by all map tasks=7650
                Total vcore-milliseconds taken by all reduce tasks=2807
                Total megabyte-milliseconds taken by all map tasks=7833600
                Total megabyte-milliseconds taken by all reduce tasks=2874368
        Map-Reduce Framework
                Map input records=500000
                Map output records=500000
                Map output bytes=51000000
                Map output materialized bytes=52000012
                Input split bytes=338
                Combine input records=0
                Combine output records=0
                Reduce input groups=500000
                Reduce shuffle bytes=52000012
                Reduce input records=500000
                Reduce output records=500000
                Spilled Records=1000000
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=304
                CPU time spent (ms)=6570
                Physical memory (bytes) snapshot=904011776
                Virtual memory (bytes) snapshot=5790863360
                Total committed heap usage (bytes)=588251136
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=50000000
        File Output Format Counters
                Bytes Written=50000000
20/08/02 13:29:14 INFO terasort.TeraSort: done
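
The same examples jar also ships a teravalidate job that checks the sort output end to end; a quick follow-up (the validate-out directory name is arbitrary):

# The report should contain a checksum and no error/misorder entries
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-e*.jar teravalidate sorted-data validate-out
hdfs dfs -cat validate-out/*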

** Notes

When stopping the instances, shut down all the processes first, as follows.

stop-yarn.sh
stop-dfs.sh
mr-jobhistory-daemon.sh stop historyserver

If the datanodes fail to come up properly, rebuild as follows.

# Run the following on the namenode
rm -rf /usr/local/hadoop/data/hdfs/namenode/*
# No reboot is needed after this command

# Run the following on each datanode
rm -rf /usr/local/hadoop/data/hdfs/datanode/*
rm -rf /usr/local/hadoop/yarn/local/*
sudo reboot
# After these commands, start over from hdfs namenode -format.