EC2 설치형으로 CDH 클러스터 구현하기

2021-09-21

.

Data_Engineering_TIL(20210920)

[참고자료]

“Installed_CDP (on ec2)” 최정민님 깃허브 자료를 공부하고 실습한 내용입니다.

URL : https://github.com/cjungm/with-aws/tree/main/CDP/Installed_cdp

[실습목적]

마스터 1, 슬레이브 3으로 구성된 CDH 클러스터를 구성한다.

[실습내용 요약]

STEP 1) EC2 인스턴스 생성

STEP 2) 필요한 패키지 설치 및 기본적인 리눅스 설정

STEP 3) vm.swappiness 설정

STEP 4) /etc/rc.d/rc.local 수정

STEP 5) /etc/default/grub 수정

STEP 6) CDH 설치하기

STEP 7) CDH가 설치된 EC2 이미지를 이용한 다수의 노드 생성

STEP 8) 클러스터를 구성하는 노드들의 네트워크 설정

STEP 9) CDH Database 설치

STEP 10) CDH 클러스터 구성

[실습 상세내용]

STEP 1) EC2 인스턴스 생성

  • EC2 생성 설정값

OS : RHEL-7.9_HVM

CDH_VERSION : 7.1.4

instace Type : r5.4xlarge (16vCPU, 128GB RAM)

EBS 스토리지 볼륨크기 : 100 GB

  • EC2 생성하기

아래 그림에는 m5a.xlarge로 선택한걸로 되어있는데 r5.4xlarge를 선택하자.

1

STEP 2) 필요한 패키지 설치 및 기본적인 리눅스 설정

# yum install
[ec2-user@ip-10-10-1-141 ~]$ sudo yum update -y

[ec2-user@ip-10-10-1-141 ~]$ sudo yum install wget ntp iptables-services vim -y

# ntpd 활성화
[ec2-user@ip-10-10-1-141 ~]$ sudo systemctl start ntpd

[ec2-user@ip-10-10-1-141 ~]$ sudo systemctl enable ntpd
Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.

# ntpd 상시 활성화
[ec2-user@ip-10-10-1-141 ~]$ sudo chkconfig ntpd on
Note: Forwarding request to 'systemctl enable ntpd.service'.

# ntpd 활성화 내역 확인
[ec2-user@ip-10-10-1-141 ~]$ sudo chkconfig ntpd
Note: Forwarding request to 'systemctl is-enabled ntpd.service'.
enabled

# ntpd를 활용한 rpm download & install
[ec2-user@ip-10-10-1-141 ~]$ wget https://rpmfind.net/linux/centos/7.9.2009/os/x86_64/Packages/libtirpc-devel-0.2.4-0.16.el7.x86_64.rpm
--2021-09-20 05:30:18--  https://rpmfind.net/linux/centos/7.9.2009/os/x86_64/Packages/libtirpc-devel-0.2.4-0.16.el7.x86_64.rpm
Resolving rpmfind.net (rpmfind.net)... xxx.xxx.xxx.xxx
Connecting to rpmfind.net (rpmfind.net)|xxx.xxx.xxx.xxx|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 93208 (91K) [application/x-rpm]
Saving to: ‘libtirpc-devel-0.2.4-0.16.el7.x86_64.rpm’

100%[======================================================================================================================>] 93,208       182KB/s   in 0.5s

2021-09-20 05:30:20 (182 KB/s) - ‘libtirpc-devel-0.2.4-0.16.el7.x86_64.rpm’ saved [93208/93208]

[ec2-user@ip-10-10-1-141 ~]$ sudo yum install libtirpc-devel-0.2.4-0.16.el7.x86_64.rpm -y
Loaded plugins: amazon-id, search-disabled-repos
Examining libtirpc-devel-0.2.4-0.16.el7.x86_64.rpm: libtirpc-devel-0.2.4-0.16.el7.x86_64
Marking libtirpc-devel-0.2.4-0.16.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package libtirpc-devel.x86_64 0:0.2.4-0.16.el7 will be installed
--> Processing Dependency: libtirpc = 0.2.4-0.16.el7 for package: libtirpc-devel-0.2.4-0.16.el7.x86_64
--> Processing Dependency: libtirpc.so.1()(64bit) for package: libtirpc-devel-0.2.4-0.16.el7.x86_64
--> Running transaction check
---> Package libtirpc.x86_64 0:0.2.4-0.16.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================================================
 Package                           Arch                      Version                             Repository                                                Size
================================================================================================================================================================
Installing:
 libtirpc-devel                    x86_64                    0.2.4-0.16.el7                      /libtirpc-devel-0.2.4-0.16.el7.x86_64                    214 k
Installing for dependencies:
 libtirpc                          x86_64                    0.2.4-0.16.el7                      rhel-7-server-rhui-rpms                                   89 k

Transaction Summary
================================================================================================================================================================
Install  1 Package (+1 Dependent package)

Total size: 303 k
Total download size: 89 k
Installed size: 397 k
Downloading packages:
libtirpc-0.2.4-0.16.el7.x86_64.rpm                                                                                                       |  89 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : libtirpc-0.2.4-0.16.el7.x86_64                                                                                                               1/2
  Installing : libtirpc-devel-0.2.4-0.16.el7.x86_64                                                                                                         2/2
  Verifying  : libtirpc-0.2.4-0.16.el7.x86_64                                                                                                               1/2
  Verifying  : libtirpc-devel-0.2.4-0.16.el7.x86_64                                                                                                         2/2

Installed:
  libtirpc-devel.x86_64 0:0.2.4-0.16.el7

Dependency Installed:
  libtirpc.x86_64 0:0.2.4-0.16.el7

Complete!

# selinux disable 하기
# 아래와 같이 vim으로 직접 수정해도 되고 아니면
# 아래에 linux 명령어 수행을 통한 변경도 가능하다.
# sed -i 's/SELINUX= enforcing/SELINUX=disabled/' /etc/selinux/config
[ec2-user@ip-10-10-1-141 ~]$ sudo vim /etc/selinux/config

##################################################################################
# 원본내용
##################################################################################

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

##################################################################################
# 수정후
##################################################################################

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

# key-gen 하기
# 먼저 공개 키를 생성한다.
[ec2-user@ip-10-10-1-141 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/ec2-user/.ssh/id_dsa.
Your public key has been saved in /home/ec2-user/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:xxxxxxxxxxxxxxxxxx ec2-user@ip-10-10-1-141.ap-northeast-2.compute.internal
The key's randomart image is:
+---[DSA 1024]----+
|xxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxx|
+----[SHA256]-----+

# 그리고 공개 키 내용 변경을 진행한다.
[ec2-user@ip-10-10-1-141 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

# 마지막으로 권한 변경을 해준다.
[ec2-user@ip-10-10-1-141 ~]$ chmod 0600 ~/.ssh/authorized_keys

# 키가 잘 생성되었나 test해보기
[ec2-user@ip-10-10-1-141 ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ECDSA key fingerprint is xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Mon Sep 20 05:22:55 2021 from xxx.xx.xx.xxx
            
[ec2-user@ip-10-10-1-141 ~]$ exit
logout
Connection to localhost closed.

STEP 3) vm.swappiness 설정

저장공간의 스왑 활용도, 스와핑 활용도, 스와피니스 리눅스 커널 속성 중 하나 스왑메모리 활용 수준 조절 스왑 사용의 적극성 수준에 대한 설정

  • 설정값 설명

설정 값의 범위: 0 ~ 100 (default : 60)

vm.swappiness = 0 –> 스왑 사용안함

vm.swappiness = 1 –> 스왑 사용 최소화

vm.swappiness = 60 –> 기본값

vm.swappiness = 100 –> 적극적으로 스왑 사용

  • 아래와 같이 명령어를 실행해서 설정을 해준다.
# vm.swappiness option
[ec2-user@ip-10-10-1-141 ~]$ sysctl vm.swappiness
vm.swappiness = 30

# vm.swappiness를 1로 설정한다.
[ec2-user@ip-10-10-1-141 ~]$ sudo sysctl vm.swappiness=1
vm.swappiness = 1

[ec2-user@ip-10-10-1-141 ~]$ sudo su

[root@ip-10-10-1-141 ec2-user]# echo "vm.swappiness=1" >> /etc/sysctl.conf

[root@ip-10-10-1-141 ec2-user]# cat /etc/sysctl.conf
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
vm.swappiness=1

[root@ip-10-10-1-141 ec2-user]# exit
exit

STEP 4) /etc/rc.d/rc.local 수정

rc.local는 어떤용도냐면 부팅시 자동 실행 명령어 스크립트를 수행한다.

THP(Transparent Huge Pages)를 결론적으로 비활성화하는 설정이다.

** 참고 URL : https://allthatlinux.com/dokuwiki/doku.php?id=thp_transparent_huge_pages_%EA%B8%B0%EB%8A%A5%EA%B3%BC_%EC%84%A4%EC%A0%95_%EB%B0%A9%EB%B2%95

일반적으로 서버 부팅시마다 매번 자동 실행되길 원하는 명령어는 /etc/rc.d/rc.local에 넣어주면 된다.

[ec2-user@ip-10-10-1-141 ~]$ sudo chmod +x /etc/rc.d/rc.local

[ec2-user@ip-10-10-1-141 ~]$ sudo vim /etc/rc.d/rc.local
##################################################################################
# 원본파일
##################################################################################

#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.

touch /var/lock/subsys/local

##################################################################################
# 수정후
##################################################################################

#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.

touch /var/lock/subsys/local
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# THP 옵션이 활성화 확인 방법
# [always] madvise never  -> 출력된 결과에 [always] 에 대괄호가 되어있으면 THP가 활성화 된 상태임
# always madvise [never]  -> 출력된 결과에 [never] 에 대괄호가 되어있으면 THP가 비활성화 된 상태임 
[ec2-user@ip-10-10-1-141 ~]$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

STEP 5) /etc/default/grub 수정

부팅시 관여하는 설정파일인데 결론적으로 THP(Transparent Huge Pages)를 결론적으로 비활성화하기 위해서 수정을 하는 것이다.

[ec2-user@ip-10-10-1-141 ~]$ sudo su

[root@ip-10-10-1-141 ec2-user]# cat /etc/default/grub
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau nvme_core.io_timeout=4294967295 crashkernel=auto"
GRUB_DISABLE_RECOVERY="true"

[root@ip-10-10-1-141 ec2-user]# echo "transparent_hugepage=never" >> /etc/default/grub

[root@ip-10-10-1-141 ec2-user]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1160.42.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1160.42.2.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-1160.15.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1160.15.2.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-xxxxxxxxxxxxxxxxxxxxxxxxx
Found initrd image: /boot/initramfs-0-rescue-xxxxxxxxxxxxxxxxxxxxxxxxx.img
Found linux image: /boot/vmlinuz-0-rescue-xxxxxxxxxxxxxxxxxxxxxxxxx
Found initrd image: /boot/initramfs-0-rescue-xxxxxxxxxxxxxxxxxxxxxxxxx.img
done

[root@ip-10-10-1-141 ec2-user]# systemctl start tuned

[root@ip-10-10-1-141 ec2-user]# tuned-adm off

[root@ip-10-10-1-141 ec2-user]# tuned-adm list
Available profiles:
- balanced                    - General non-specialized tuned profile
- desktop                     - Optimize for the desktop use-case
- hpc-compute                 - Optimize for HPC compute workloads
- latency-performance         - Optimize for deterministic performance at the cost of increased power consumption
- network-latency             - Optimize for deterministic performance at the cost of increased power consumption, focused on low latency network performance
- network-throughput          - Optimize for streaming network throughput, generally only necessary on older CPUs or 40G+ networks
- powersave                   - Optimize for low power consumption
- throughput-performance      - Broadly applicable tuning that provides excellent performance across a variety of common server workloads
- virtual-guest               - Optimize for running inside a virtual guest
- virtual-host                - Optimize for running KVM guests
No current active profile.

[root@ip-10-10-1-141 ec2-user]# systemctl stop tuned

[root@ip-10-10-1-141 ec2-user]# systemctl disable tuned
Removed symlink /etc/systemd/system/multi-user.target.wants/tuned.service.

[root@ip-10-10-1-141 ec2-user]# exit
exit

STEP 6) CDH 설치하기

[ec2-user@ip-10-10-1-141 ~]$ wget https://archive.cloudera.com/cm7/7.1.4/cloudera-manager-installer.bin
--2021-09-20 06:57:37--  https://archive.cloudera.com/cm7/7.1.4/cloudera-manager-installer.bin
Resolving archive.cloudera.com (archive.cloudera.com)... xxx.xxx.xxx.xxx
Connecting to archive.cloudera.com (archive.cloudera.com)|xxx.xxx.xxx.xxx|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 830310 (811K) [application/octet-stream]
Saving to: ‘cloudera-manager-installer.bin’

100%[=========================================================================================================================>] 830,310     2.07MB/s   in 0.4s

2021-09-20 06:57:38 (2.07 MB/s) - ‘cloudera-manager-installer.bin’ saved [830310/830310]

[ec2-user@ip-10-10-1-141 ~]$ sudo chmod u+x cloudera-manager-installer.bin

[ec2-user@ip-10-10-1-141 ~]$ sudo ./cloudera-manager-installer.bin

2

# 서버 부팅 시 자동실행 설정
[ec2-user@ip-10-10-1-141 ~]$ sudo systemctl enable cloudera-scm-server.service

설치가 완료되면 웹브라우저를 열고 http://{EC2 public IP}:7180으로 접속해서 아래와 같은 화면처럼 클라우드애라 매니저에 접속이 잘 되는지 확인한다.

3

STEP 7) CDH가 설치된 EC2 이미지를 이용한 다수의 노드 생성

아래 그림과 같이 위에서 만든 EC2를 AMI 이미지로 만들고 그거를 갖고 r5.4xlarge 사양으로 3대의 노드를 새로 생성한다.

아래 그림에는 m5a.xlarge로 선택한걸로 되어있는데 r5.4xlarge를 선택하자.

4

5

그런 다음에 생성한 EC2 보안그룹에 접속해서 CDH 노드들간의 통신을 위해서 현재 사용하는 VPC의 프라이빗 아이피를 모두 허용하도록 설정한다.

6

STEP 8) 클러스터를 구성하는 노드들의 네트워크 설정

모든 노드에 접속해서 아래와 같이 명령어를 실행해준다.

[ec2-user@ip-10-10-1-91 ~]$ sudo hostnamectl set-hostname cdh-manager

[ec2-user@ip-10-10-1-91 ~]$ hostname
cdh-manager

#####################################################################################
# 각각의 노드별로 아래와 같이 호스트네임을 지정해준다.
#####################################################################################

# [ec2-user@ip-10-10-1-68 ~]$ sudo hostnamectl set-hostname cdh-node1

# [ec2-user@ip-10-10-1-68 ~]$ hostname
# cdh-node1

# [ec2-user@ip-10-10-1-108 ~]$ sudo hostnamectl set-hostname cdh-node2

# [ec2-user@ip-10-10-1-108 ~]$ hostname
# cdh-node2

# [ec2-user@ip-10-10-1-99 ~]$ sudo hostnamectl set-hostname cdh-node3

# [ec2-user@ip-10-10-1-99 ~]$ hostname
# cdh-node3

[ec2-user@ip-10-10-1-91 ~]$ sudo vim /etc/hosts

#####################################################################################
# 원본내용
#####################################################################################

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

#####################################################################################
# 수정후 내용
# 아래와 같이 각각의 노드별로 EC2 private ip와 호스트네임을 넣어준다.
#####################################################################################

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.10.1.91   cdh-manager
10.10.1.68   cdh-node1
10.10.1.108  cdh-node2
10.10.1.99   cdh-node3

[ec2-user@cdh-manager ~]$ sudo vim /etc/cloudera-scm-server/db.mgmt.properties

#####################################################################################
# 원본내용
#####################################################################################

# Auto-generated by initialize_embedded_db.sh
#
# 20210920-080352
#
# These are database credentials for databases
# created by "cloudera-scm-server-db" for
# Cloudera Manager Management Services,
# to be used during the installation wizard if
# the embedded database route is taken.
#
# The source of truth for these settings
# is the Cloudera Manager databases and
# changes made here will not be reflected
# there automatically.
#
com.cloudera.cmf.ACTIVITYMONITOR.db.type=postgresql
com.cloudera.cmf.ACTIVITYMONITOR.db.host=ip-10-10-1-91.ap-northeast-2.compute.internal:7432
com.cloudera.cmf.ACTIVITYMONITOR.db.name=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.user=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.password=S2eeTPVElY
com.cloudera.cmf.REPORTSMANAGER.db.type=postgresql
com.cloudera.cmf.REPORTSMANAGER.db.host=ip-10-10-1-91.ap-northeast-2.compute.internal:7432
com.cloudera.cmf.REPORTSMANAGER.db.name=rman
com.cloudera.cmf.REPORTSMANAGER.db.user=rman
com.cloudera.cmf.REPORTSMANAGER.db.password=zDH0Qqm0VC
com.cloudera.cmf.NAVIGATOR.db.type=postgresql
com.cloudera.cmf.NAVIGATOR.db.host=ip-10-10-1-91.ap-northeast-2.compute.internal:7432
com.cloudera.cmf.NAVIGATOR.db.name=nav
com.cloudera.cmf.NAVIGATOR.db.user=nav
com.cloudera.cmf.NAVIGATOR.db.password=fl75yEi6GC
com.cloudera.cmf.NAVIGATORMETASERVER.db.type=postgresql
com.cloudera.cmf.NAVIGATORMETASERVER.db.host=ip-10-10-1-91.ap-northeast-2.compute.internal:7432
com.cloudera.cmf.NAVIGATORMETASERVER.db.name=navms
com.cloudera.cmf.NAVIGATORMETASERVER.db.user=navms
com.cloudera.cmf.NAVIGATORMETASERVER.db.password=Sfn4vw7cKa

#####################################################################################
# 원본내용
#####################################################################################

# Auto-generated by initialize_embedded_db.sh
#
# 20210920-080352
#
# These are database credentials for databases
# created by "cloudera-scm-server-db" for
# Cloudera Manager Management Services,
# to be used during the installation wizard if
# the embedded database route is taken.
#
# The source of truth for these settings
# is the Cloudera Manager databases and
# changes made here will not be reflected
# there automatically.
#
com.cloudera.cmf.ACTIVITYMONITOR.db.type=postgresql
com.cloudera.cmf.ACTIVITYMONITOR.db.host=10.10.1.91:7432
com.cloudera.cmf.ACTIVITYMONITOR.db.name=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.user=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.password=S2eeTPVElY
com.cloudera.cmf.REPORTSMANAGER.db.type=postgresql
com.cloudera.cmf.REPORTSMANAGER.db.host=10.10.1.91:7432
com.cloudera.cmf.REPORTSMANAGER.db.name=rman
com.cloudera.cmf.REPORTSMANAGER.db.user=rman
com.cloudera.cmf.REPORTSMANAGER.db.password=zDH0Qqm0VC
com.cloudera.cmf.NAVIGATOR.db.type=postgresql
com.cloudera.cmf.NAVIGATOR.db.host=10.10.1.91:7432
com.cloudera.cmf.NAVIGATOR.db.name=nav
com.cloudera.cmf.NAVIGATOR.db.user=nav
com.cloudera.cmf.NAVIGATOR.db.password=fl75yEi6GC
com.cloudera.cmf.NAVIGATORMETASERVER.db.type=postgresql
com.cloudera.cmf.NAVIGATORMETASERVER.db.host=10.10.1.91:7432
com.cloudera.cmf.NAVIGATORMETASERVER.db.name=navms
com.cloudera.cmf.NAVIGATORMETASERVER.db.user=navms
com.cloudera.cmf.NAVIGATORMETASERVER.db.password=Sfn4vw7cKa

[ec2-user@ip-10-10-1-91 ~]$ sudo service cloudera-scm-server restart
Redirecting to /bin/systemctl restart cloudera-scm-server.service

STEP 9) CDH Database 설치

CDH Manager 및 하둡 에코 어플들이 사용할 내장디비를 설치하든 외장디비를 설치하든 디비를 설치하고 config를 아래와 같이 잡아줘야 한다.

postgresql 내장디비를 설치해서 아래와 같이 연결해보자.

마스터 노드 서버에서 아래와 같이 명령어를 실행하자

# Installing PostgreSQL Server
[ec2-user@cdh-manager ~]$ sudo yum install postgresql-server -y

...

===================================================================================================================================================================
 Package                                  Arch                          Version                               Repository                                      Size
===================================================================================================================================================================
Installing:
 postgresql-server                        x86_64                        9.2.24-7.el7_9                        rhel-7-server-rhui-rpms                        3.8 M
Installing for dependencies:
 postgresql                               x86_64                        9.2.24-7.el7_9                        rhel-7-server-rhui-rpms                        3.0 M
 postgresql-libs                          x86_64                        9.2.24-7.el7_9                        rhel-7-server-rhui-rpms                        235 k

Transaction Summary
===================================================================================================================================================================
Install  1 Package (+2 Dependent packages)

Total download size: 7.1 M
Installed size: 33 M
Downloading packages:
(1/3): postgresql-libs-9.2.24-7.el7_9.x86_64.rpm                                                                                            | 235 kB  00:00:00
(2/3): postgresql-9.2.24-7.el7_9.x86_64.rpm                                                                                                 | 3.0 MB  00:00:00
(3/3): postgresql-server-9.2.24-7.el7_9.x86_64.rpm                                                                                          | 3.8 MB  00:00:00
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                               15 MB/s | 7.1 MB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : postgresql-libs-9.2.24-7.el7_9.x86_64                                                                                                           1/3
  Installing : postgresql-9.2.24-7.el7_9.x86_64                                                                                                                2/3
  Installing : postgresql-server-9.2.24-7.el7_9.x86_64                                                                                                         3/3
warning: /var/lib/pgsql/.bash_profile created as /var/lib/pgsql/.bash_profile.rpmnew
  Verifying  : postgresql-9.2.24-7.el7_9.x86_64                                                                                                                1/3
  Verifying  : postgresql-libs-9.2.24-7.el7_9.x86_64                                                                                                           2/3
  Verifying  : postgresql-server-9.2.24-7.el7_9.x86_64                                                                                                         3/3

Installed:
  postgresql-server.x86_64 0:9.2.24-7.el7_9

Dependency Installed:
  postgresql.x86_64 0:9.2.24-7.el7_9                                            postgresql-libs.x86_64 0:9.2.24-7.el7_9

Complete!

[ec2-user@ip-10-10-1-166 ~]$ python --version
Python 2.7.5

# Installing the psycopg2 Python Package
[ec2-user@cdh-manager ~]$ sudo yum install python3-pip -y

...

===================================================================================================================================================================
 Package                                   Arch                          Version                              Repository                                      Size
===================================================================================================================================================================
Installing:
 python3-pip                               noarch                        9.0.3-8.el7                          rhel-7-server-rhui-rpms                        1.6 M
Installing for dependencies:
 python3                                   x86_64                        3.6.8-18.el7                         rhel-7-server-rhui-rpms                         70 k
 python3-libs                              x86_64                        3.6.8-18.el7                         rhel-7-server-rhui-rpms                        6.9 M
 python3-setuptools                        noarch                        39.2.0-10.el7                        rhel-7-server-rhui-rpms                        629 k

Transaction Summary
===================================================================================================================================================================
Install  1 Package (+3 Dependent packages)

Total download size: 9.3 M
Installed size: 47 M
Downloading packages:
(1/4): python3-3.6.8-18.el7.x86_64.rpm                                                                                                      |  70 kB  00:00:00
(2/4): python3-libs-3.6.8-18.el7.x86_64.rpm                                                                                                 | 6.9 MB  00:00:00
(3/4): python3-pip-9.0.3-8.el7.noarch.rpm                                                                                                   | 1.6 MB  00:00:00
(4/4): python3-setuptools-39.2.0-10.el7.noarch.rpm                                                                                          | 629 kB  00:00:00
-------------------------------------------------------------------------------------------------------------------------------------------------------------------

...                                                                                                           

Installed:
  python3-pip.noarch 0:9.0.3-8.el7

Dependency Installed:
  python3.x86_64 0:3.6.8-18.el7                   python3-libs.x86_64 0:3.6.8-18.el7                   python3-setuptools.noarch 0:39.2.0-10.el7

Complete!

[ec2-user@cdh-manager ~]$ sudo pip3 install psycopg2==2.7.5 --ignore-installed
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting psycopg2==2.7.5
  Downloading https://files.pythonhosted.org/packages/5e/d0/xxxxxxxxxx/psycopg2-2.7.5-cp36-cp36m-manylinux1_x86_64.whl (2.7MB)
    100% |████████████████████████████████| 2.7MB 450kB/s
Installing collected packages: psycopg2
Successfully installed psycopg2-2.7.5

[ec2-user@cdh-manager ~]$ sudo su

[root@cdh-manager ec2-user]# echo 'LC_ALL="en_US.UTF-8"' >> /etc/locale.conf

[root@cdh-manager ec2-user]# exit
exit

[ec2-user@cdh-manager ~]$ sudo su -l postgres -c "postgresql-setup initdb"
Initializing database ... OK

[ec2-user@cdh-manager ~]$ sudo su

[root@cdh-manager ec2-user]# cd /var/lib/pgsql/data

[root@cdh-manager data]# vim pg_hba.conf

#####################################################################################
# 원본내용
#####################################################################################

...

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            ident
# IPv6 local connections:
host    all             all             ::1/128                 ident
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     postgres                                peer
#host    replication     postgres        127.0.0.1/32            ident
#host    replication     postgres        ::1/128                 ident

#####################################################################################
# 수정후 내용
#####################################################################################

...

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host all all 127.0.0.1/32 md5
host    all             all             127.0.0.1/32            ident
# IPv6 local connections:
host    all             all             ::1/128                 ident
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     postgres                                peer
#host    replication     postgres        127.0.0.1/32            ident
#host    replication     postgres        ::1/128                 ident

[root@cdh-manager data]# exit
exit

[ec2-user@ip-10-10-1-166 ~]$ sudo vim /var/lib/pgsql/data/postgresql.conf

아래의 내용을 참고해서 설정값을 설정한다.

먼저 컨피그 파일 중간에 listen_adress는 아래와 같이 마스터 노드의 프라이빗 아이피를 기입하고, 포트도 위에 그림에서 명시되었듯이

listen_addresses = '10.10.1.91'

port = 7432

그리고 아래의 내용을 참고해서 설정값을 수정해준다.

** URL : https://docs.cloudera.com/cdp-private-cloud-base/latest/installation/topics/cdpdc-configuring-starting-postgresql-server.html

88

[ec2-user@cdh-manager ~]$ sudo systemctl enable postgresql
Created symlink from /etc/systemd/system/multi-user.target.wants/postgresql.service to /usr/lib/systemd/system/postgresql.service.

[ec2-user@cdh-manager ~]$ sudo systemctl start postgresql

[ec2-user@cdh-manager ~]$ sudo systemctl status postgresql
● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-09-20 13:32:46 UTC; 5s ago
  Process: 16425 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS)
  Process: 16418 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 16427 (postgres)
   CGroup: /system.slice/postgresql.service
           ├─16427 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432
           ├─16428 postgres: logger process
           ├─16430 postgres: checkpointer process
           ├─16431 postgres: writer process
           ├─16432 postgres: wal writer process
           ├─16433 postgres: autovacuum launcher process
           └─16434 postgres: stats collector process

Sep 20 13:32:45 cdh-manager systemd[1]: Starting PostgreSQL database server...
Sep 20 13:32:46 cdh-manager systemd[1]: Started PostgreSQL database server.

STEP 10) CDH 클러스터 구성

웹브라우저를 열고 http://{EC2 public IP}:7180으로 접속해서 아래와 같은 계정정보로 로그인한다.

ID : admin

PW : admin

3

로그인을 완료하면 아래와 같이 설치를 진행한다.

7

8

9

여기까지 완료했으면 기본적인 CDH 클러스터 구성이 완료되었다. 클러스터가 이상이 없는지 아래 그림과 같이 최종 체크하면 모든 과정이 끝나게 된다.

10