EMR jupyterhub 개발환경 백업&복원 예시

2020-12-07

.

[목표]

ASIS EMR의 jupyterhub 계정정보와 라이브러리를 백업해서 TOBE EMR에서 restore하는 쉘스크립트 예시

  • 쉘스크립트 내용요약

1) backup.sh : AS-IS EMR을 terminate 하기 직전에 s3로 정보들을 백업하는 쉘스크립트

2) restore.sh : TO-BE EMR을 띄우고, 백업한 정보를 restore할때 사용하는 쉘스크립트

  • jupterhub 개발환경 백업대상

1) EMR 마스터노드 yum, pip3 라이브러리

2) jupyterhub docker container내에서 사용하는 pip 라이브러리

3) jupyterhub 유저정보 및 작업한 file 들

  • 유의사항

개발한 스크립트 로직상 restore한 계정은 비밀번호가 초기화 되기 때문에 EMR을 새로띄우고나서 개인별로 jupyterhub 계정 비밀번호 변경필요

  • backup.sh
#!/bin/bash

####################################################################
# backup pip3 library in EMR master node
####################################################################

pip3 freeze > /home/hadoop/requirements.txt
aws s3 cp /home/hadoop/requirements.txt s3://pms-bucket-test/dev_emr_backup/

####################################################################
# backup jupyterhub user info
####################################################################

echo "c.LocalAuthenticator.create_system_users = True" | sudo tee -a /etc/jupyter/conf/jupyterhub_config.py
sudo docker restart jupyterhub

token=$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)
sleep 1s
user_list=$(curl -XGET -s -k https://$(hostname):9443/hub/api/users -H "Authorization: token $token" | jq '.[].name' | sed 's/"//g')
echo $user_list > /home/hadoop/jupyterhub_user_list.txt

aws s3 cp /home/hadoop/jupyterhub_user_list.txt s3://pms-bucket-test/dev_emr_backup/jupyterhub_user_list.txt

####################################################################
# backup jupyterhub pip library
####################################################################

sudo docker exec jupyterhub bash -c "pip freeze > jupyterhub_requirements.txt"
sudo docker cp jupyterhub:/home/jovyan/jupyterhub_requirements.txt /home/hadoop/
aws s3 cp /home/hadoop/jupyterhub_requirements.txt s3://pms-bucket-test/dev_emr_backup/

####################################################################
# backup yum library in EMR master node
####################################################################

rpm -qa > /home/hadoop/yum_list_backup.log
aws s3 cp /home/hadoop/yum_list_backup.log s3://pms-bucket-test/dev_emr_backup/

####################################################################
# backup jupyterhub files
####################################################################

aws s3 sync /mnt/var/lib/jupyter/home/ s3://pms-bucket-test/dev_jupyterhub_backup/ --exclude "*/jupyterhub.sqlite" --exclude "*/jupyterhub-proxy.pid" --exclude "*/.autovizwidget/*" --exclude "*/.ipynb_checkpoints/*" --exclude "*/.ipython/*" --exclude "*/.local/*"  --exclude "*/.sparkmagic/*" --exclude "*/.bash_logout" --exclude "*/.bashrc" --exclude "*/.profile"
  • restore.sh
#!/bin/bash

####################################################################
# restore yum library in EMR master node
####################################################################

aws s3 cp s3://pms-bucket-test/dev_emr_backup/yum_list_backup.log /home/hadoop/yum_list_backup.log
sudo yum -y install $(cat /home/hadoop/yum_list_backup.log)

####################################################################
# restore pip3 library in EMR master node
####################################################################

aws s3 cp s3://pms-bucket-test/dev_emr_backup/requirements.txt /home/hadoop/requirements.txt
sudo pip3 install $(grep -ivE "beautifulsoup4|boto|click|jmespath|joblib|lxml|mysqlclient|nltk|nose|numpy|py-dateutil|python37-sagemaker-pyspark|pytz|PyYAML|regex|six|tqdm|windmill" /home/hadoop/requirements.txt)


####################################################################
# restore jupyterhub pip library
####################################################################

aws s3 cp s3://pms-bucket-test/dev_emr_backup/jupyterhub_requirements.txt /home/hadoop/jupyterhub_requirements.txt
sudo docker cp /home/hadoop/jupyterhub_requirements.txt jupyterhub:/home/jovyan/jupyterhub_requirements.txt
sudo docker exec jupyterhub bash -c "pip install -r jupyterhub_requirements.txt"


####################################################################
# restore jupyterhub user info
####################################################################

echo "c.LocalAuthenticator.create_system_users = True" | sudo tee -a /etc/jupyter/conf/jupyterhub_config.py
echo "c.Authenticator.admin_users = {'jovyan'}" | sudo tee -a /etc/jupyter/conf/jupyterhub_config.py

aws s3 cp s3://pms-bucket-test/dev_emr_backup/jupyterhub_user_list.txt /home/hadoop/jupyterhub_user_list.txt
sed -i 's/jovyan //g' /home/hadoop/jupyterhub_user_list.txt

set -x
USERS=($( cat /home/hadoop/jupyterhub_user_list.txt ))
sleep 1s
TOKEN=$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)
sleep 1s
password=$(echo "bXlwYXNzd2Q=" | base64 -d)
# bXlwYXNzd2Q= : mypasswd

for i in "${USERS[@]}";
do 
   sudo docker exec jupyterhub useradd -m -s /bin/bash -N $i
   sudo docker exec jupyterhub bash -c "echo $i:$password | chpasswd"
done

echo $(sed -e s/' '/"','"/g /home/hadoop/jupyterhub_user_list.txt) > /home/hadoop/jupyterhub_user_list.txt
echo $(sed "s/^/'/" /home/hadoop/jupyterhub_user_list.txt) > /home/hadoop/jupyterhub_user_list.txt
sed -i "s/$/'/g" /home/hadoop/jupyterhub_user_list.txt

users=$(cat /home/hadoop/jupyterhub_user_list.txt)
echo "c.Authenticator.whitelist = {$users}" | sudo tee -a /etc/jupyter/conf/jupyterhub_config.py
sudo docker restart jupyterhub


####################################################################
# restore jupyterhub files
####################################################################

sudo aws s3 cp s3://pms-bucket-test/dev_jupyterhub_backup/ /mnt/var/lib/jupyter/home/ --recursive
sudo docker restart jupyterhub

** 아래의 URL의 첨부파일을 반드시 참고할것

https://github.com/minman2115/Data_engineering_studynotes_2020/blob/master/EMR%20jupyterhub%20%EA%B0%9C%EB%B0%9C%ED%99%98%EA%B2%BD%20%EB%B0%B1%EC%97%85%26%EB%B3%B5%EC%9B%90%20%EC%98%88%EC%8B%9C/%EC%B0%B8%EA%B3%A0%EC%9E%90%EB%A3%8C.zip