Elasticsearch에서 데이터를 백업하고 복원하는 방안

2021-10-03

.

Data_Engineering_TIL(20211003)

[참고자료]

“Elasticsearch Migration” 최정민님 깃허브 자료

URL : https://github.com/cjungm/with-aws/tree/main/ElasticSearch/migration_ES

[참고사항]

“Kaggle API를 이용한 데이터 다운로드 및 ES에 적재하기” 에 이어서 진행하는 실습내용임

URL : https://minman2115.github.io/DE_TIL292

[실습목표]

AWS에서 제공하는 managed ES 클러스터를 구성하고, EC2 설치형으로 구성한 ES 클러스터와 managed ES 클러스터에서 데이터를 백업하고 복원을 해본다.

[실습요약]

STEP 1) AWS에서 제공하는 managed ES 클러스터 생성

STEP 2) 데이터 마이그레이션하기

방법 1. EC2 설치형으로 구성한 ES 노드에 공유 File Storage에 Respository 구성하고 스냅샷 백업하는 방법

방법 2. EC2 설치형 ES에서 S3에 Respository 구성하고 ES 스냅샷 백업하는 방법

방법 3. Amazon managed ES에서 스냅샷을 백업하고 복구하는 방법

방법 4. EC2 설치형 ES에서 Elasticdump를 이용하여 데이터 이관하는 방법

[실습 상세내용]

STEP 1) AWS에서 제공하는 managed ES 클러스터 생성

AWS 콘솔에서 아래와 같이 managed ES를 생성한다.

1

2

STEP 2) 데이터 마이그레이션하기

방법 1. EC2 설치형으로 구성한 ES 노드에 공유 File Storage에 Respository 구성하고 스냅샷 백업하는 방법

먼저 EC2 설치형으로 구성한 ES 클러스터의 모든 노드에서 아래와 같이 명령어를 실행하여 노드끼리 폴더 mount할 설정을 해준다.

[ec2-user@es-coordinater ~]$ sudo yum install -y nfs-utils nfs-utils-lib
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
amzn2-core                                                                                                        | 3.7 kB  00:00:00
amzn2extra-docker                                                                                                 | 3.0 kB  00:00:00
kibana-7.x                                                                                                        | 1.3 kB  00:00:00
(1/3): amzn2-core/2/x86_64/group_gz                                                                               | 2.5 kB  00:00:00
(2/3): amzn2-core/2/x86_64/updateinfo                                                                             | 405 kB  00:00:00
(3/3): amzn2-core/2/x86_64/primary_db                                                                             |  56 MB  00:00:00
Package 1:nfs-utils-1.3.0-0.54.amzn2.0.2.x86_64 already installed and latest version
No package nfs-utils-lib available.
Nothing to do

[ec2-user@es-coordinater ~]$ sudo chkconfig nfs on
Note: Forwarding request to 'systemctl enable nfs.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.

[ec2-user@es-coordinater ~]$ sudo service rpcbind start
Redirecting to /bin/systemctl start rpcbind.service

[ec2-user@es-coordinater ~]$ sudo service nfs start
Redirecting to /bin/systemctl start nfs.service

[ec2-user@es-coordinater ~]$ sudo mkdir /mnt/snapshots

[ec2-user@es-coordinater ~]$ sudo chown elasticsearch:elasticsearch /mnt/snapshots

코디네이트 노드(또는 지정된 마스터노드)에서 아래와 같이 설정해준다.

#/mnt/snapshots  {data node 01 ip}(rw,sync,no_root_squash,no_subtree_check)
#/mnt/snapshots  {data node 02 ip}(rw,sync,no_root_squash,no_subtree_check)
#/mnt/snapshots  {data node 03 ip}(rw,sync,no_root_squash,no_subtree_check)
[ec2-user@es-coordinater ~]$ sudo vim /etc/exports
/mnt/snapshots  10.10.1.51(rw,sync,no_root_squash,no_subtree_check)
/mnt/snapshots  10.10.1.68(rw,sync,no_root_squash,no_subtree_check)
/mnt/snapshots  10.10.1.206(rw,sync,no_root_squash,no_subtree_check)

[ec2-user@es-coordinater ~]$ sudo exportfs -a

그런 다음에 코디네이트 노드(또는 지정된 마스터노드)를 제외한 모든노드에서 아래와 같이 공유폴더 mount 명령어를 실행해준다.

# sudo mount {마스터노드 아이피}:/mnt/snapshots /mnt/snapshots
[ec2-user@data01 mnt]$ sudo mount 10.10.1.226:/mnt/snapshots /mnt/snapshots

모든노드에서 아래와 같이 명령어를 실행하여 ES 차원에서 레포를 어떤거로 할지 지정해준다.

[ec2-user@es-coordinater ~]$ sudo vim /etc/elasticsearch/elasticsearch.yml

...

# 가장 하단에 아래의 컨피그 값 추가
path.repo: ["/mnt/snapshots"]
    
[ec2-user@es-coordinater ~]$ sudo systemctl restart elasticsearch.service

[ec2-user@es-coordinater ~]$ sudo systemctl status elasticsearch.service
 elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2021-10-03 16:51:51 KST; 20s ago
     Docs: https://www.elastic.co
 Main PID: 4473 (java)
   CGroup: /system.slice/elasticsearch.service
           ├─4473 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch...
           └─4675 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Oct 03 16:51:25 es-coordinater systemd[1]: Starting Elasticsearch...
Oct 03 16:51:51 es-coordinater systemd[1]: Started Elasticsearch.

코디네이트 노드(또는 마스터노드)에서 아래와 같이 ES 레포를 등록한다.

[ec2-user@es-coordinater ~]$ curl -X PUT 'http://10.10.1.226:9200/_snapshot/minman_es_repo' -H 'Content-Type: application/json' -d '{"type": "fs","settings": {"location": "/mnt/snapshots"}}' -k -u elastic
Enter host password for user 'elastic':
{"acknowledged":true}

[ec2-user@es-coordinater ~]$ cd /mnt/snapshots

# 그러면 아래와 같이 file system에 ES 레포를 등록이 완료된 것이고
# 스냅샷을 뜨게 되면 이 경로에 떨어질 것이다.
# 그러면 이 스냅샷을 s3에 백업하고 이를 또 리스토어를 할 수 있게 된다.
[ec2-user@es-coordinater snapshots]$ aws s3 sync . s3://{migration-bucket}/{path} --sse AES256

방법 2. EC2 설치형 ES에서 S3에 Respository 구성하고 ES 스냅샷 백업하는 방법

먼저 모든 노드에서 아래와 같이 명령어를 실행하여 s3 플러그인을 설치한다.

[ec2-user@es-coordinater ~]$ sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3
-> Installing repository-s3
-> Downloading repository-s3 from elastic
[=================================================] 100%  
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.net.SocketPermission * connect,resolve
* java.util.PropertyPermission es.allow_insecure_settings read,write
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed repository-s3
-> Please restart Elasticsearch to activate any plugins installed

[ec2-user@es-coordinater ~]$ sudo /usr/share/elasticsearch/bin/elasticsearch-plugin list
repository-s3

[ec2-user@es-coordinater ~]$ sudo vim /etc/elasticsearch/jvm.options

...

# 맨 밑에 아래 설정값을 추가해준다.
-Des.allow_insecure_settings=true

[ec2-user@es-coordinater ~]$ sudo systemctl restart elasticsearch.service

[ec2-user@es-coordinater ~]$ sudo systemctl status elasticsearch.service
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-10-04 12:08:44 KST; 34s ago
     Docs: https://www.elastic.co
 Main PID: 3542 (java)
   CGroup: /system.slice/elasticsearch.service
           ├─3542 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch...
           └─3747 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Oct 04 12:08:20 es-coordinater systemd[1]: Starting Elasticsearch...
Oct 04 12:08:44 es-coordinater systemd[1]: Started Elasticsearch.

코디네이트 노드(또는 지정된 마스터노드)에서 아래와 같이 명령어를 실행하여 보안 클러스터 설정을 다시 로드한다.

# curl -u elastic:mypassword1! -XPOST 'http://{코디네이트 노드 또는 지정된 master 노드의 프라이빗ip}:9200/_nodes/reload_secure_settings'
[ec2-user@es-coordinater ~]$ curl -u elastic:mypassword1! -XPOST 'http://10.10.1.226:9200/_nodes/reload_secure_settings'
{
 "_nodes":{"total":4,"successful":4,"failed":0},
 "cluster_name":"es-demo",
 "nodes":{
     "4C5M4JW9S0yvmJSKmNh2HA":{"name":"data02"},
     "_N0Ky6UjRlCoBaSBrYa9OQ":{"name":"data03"},
     "bhbcGaRFT7O2-r1_PlBTVw":{"name":"data01"},
     "kY_eX8HGRqaWLZDACNMgFA":{"name":"es-coordinater"}
 }
}

# 그리고 아래와 같은 REST API 양식을 이용해서 S3 레포지토리를 지정해준다.
#curl -u USER:PASS -H 'Content-Type: application/json' -XPUT 'http://{master_private_ip}:9200/_snapshot/{repository_name}?pretty=true' -d'
#{
#  "type": "s3",
#  "settings": {
#    "bucket": "{migration-bucket}",
#    "region": "region_name",
#    "base_path": "{path}",
#    "access_key": "{access_key}",
#    "secret_key": "{secret_key}"
#  }
#}'
[ec2-user@es-coordinater ~]$ curl -u elastic:mypassword1! -H 'Content-Type: application/json' -X PUT 'http://10.10.1.226:9200/_snapshot/minman_s3_es_repo?pretty=true' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "pms-bucket-test",
    "region": "ap-northeast-2",
    "base_path": "elastic_repo",
    "access_key": "xxxxxxxxxxxxxxxxxx",
    "secret_key": "yyyyyyyyyyyyyyyyyy"
  }
}'
{
  "acknowledged" : true
}


# curl -u USERID:PASSWD -H "content-type: application/JSON" -XPUT 'http://{master_private_ip}:9200/_snapshot/{repository_name}/{backup_key}?pretty=true&wait_for_completion=true' -d '{"indices": "{index}","ignore_unavailable": true,"include_global_state": false}'
[ec2-user@es-coordinater ~]$ curl -u elastic:mypassword1! -H "content-type: application/JSON" -XPUT 'http://10.10.1.226:9200/_snapshot/minman_s3_es_repo/minman_test_index_backup_key?pretty=true&wait_for_completion=true' -d '{"indices": "minman_test_index","ignore_unavailable": true,"include_global_state": false}'
{
  "snapshot" : {
    "snapshot" : "minman_test_index_backup_key",
    "uuid" : "schVu85XSj6WYGIjTKgSAg",
    "repository" : "minman_s3_es_repo",
    "version_id" : 7140099,
    "version" : "7.14.0",
    "indices" : [
      "minman_test_index"
    ],
    "data_streams" : [ ],
    "include_global_state" : false,
    "state" : "SUCCESS",
    "start_time" : "2021-10-04T04:01:43.168Z",
    "start_time_in_millis" : 1633320103168,
    "end_time" : "2021-10-04T04:03:15.610Z",
    "end_time_in_millis" : 1633320195610,
    "duration_in_millis" : 92442,
    "failures" : [ ],
    "shards" : {
      "total" : 1,
      "failed" : 0,
      "successful" : 1
    },
    "feature_states" : [ ]
  }
}

# Total number of objects 95 Total size 3.6 GB
[ec2-user@es-coordinater ~]$ aws s3 ls s3://pms-bucket-test/elastic_repo/
                           PRE indices/
2021-10-04 12:17:51          0
2021-10-04 13:03:16        602 index-0
2021-10-04 13:03:16          8 index.latest
2021-10-04 13:03:16        256 meta-schVu85XSj6WYGIjTKgSAg.dat
2021-10-04 13:03:16        398 snap-schVu85XSj6WYGIjTKgSAg.dat
        
# S3 백업 확인하기
# curl -u USER:PASS -XGET 'http://{master_private_ip}:9200/_snapshot?pretty'
[ec2-user@es-coordinater ~]$ curl -u elastic:mypassword1! -X GET 'http://10.10.1.226:9200/_snapshot?pretty'
{
  "minman_es_repo" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/snapshots"
    }
  },
  "minman_s3_es_repo" : {
    "type" : "s3",
    "uuid" : "7-bEBsazT6-8wcSdZ6up1Q",
    "settings" : {
      "bucket" : "pms-bucket-test",
      "region" : "ap-northeast-2",
      "base_path" : "elastic_repo"
    }
  }
}

# 특정 백업 키 확인
# curl -u USER:PASS -XGET 'http://{master_private_ip}:9200/_snapshot/{repository_name}/{backup_key}?pretty'
[ec2-user@es-coordinater ~]$ curl -u elastic:mypassword1! -X GET 'http://10.10.1.226:9200/_snapshot/minman_s3_es_repo/minman_test_index_backup_key?pretty'
{
  "snapshots" : [
    {
      "snapshot" : "minman_test_index_backup_key",
      "uuid" : "schVu85XSj6WYGIjTKgSAg",
      "repository" : "minman_s3_es_repo",
      "version_id" : 7140099,
      "version" : "7.14.0",
      "indices" : [
        "minman_test_index"
      ],
      "data_streams" : [ ],
      "include_global_state" : false,
      "state" : "SUCCESS",
      "start_time" : "2021-10-04T04:01:43.168Z",
      "start_time_in_millis" : 1633320103168,
      "end_time" : "2021-10-04T04:03:15.610Z",
      "end_time_in_millis" : 1633320195610,
      "duration_in_millis" : 92442,
      "failures" : [ ],
      "shards" : {
        "total" : 1,
        "failed" : 0,
        "successful" : 1
      },
      "feature_states" : [ ]
    }
  ]
}

# 스냅샷 상태체크
[ec2-user@es-coordinater ~]$ curl -X GET "http://es-coordinater:9200/_snapshot/minman_s3_es_repo/minman_test_index_backup_key/_status?pretty" -k -u elastic
Enter host password for user 'elastic':
{
  "snapshots" : [
    {
      "snapshot" : "minman_test_index_backup_key",
      "repository" : "minman_s3_es_repo",
      "uuid" : "schVu85XSj6WYGIjTKgSAg",
      "state" : "SUCCESS",
      "include_global_state" : false,
      "shards_stats" : {
        "initializing" : 0,
        "started" : 0,
        "finalizing" : 0,
        "done" : 1,
        "failed" : 0,
        "total" : 1
      },
      "stats" : {
        "incremental" : {
          "file_count" : 119,
          "size_in_bytes" : 3862661102
        },
        "total" : {
          "file_count" : 119,
          "size_in_bytes" : 3862661102
        },
        "start_time_in_millis" : 1633320103168,
        "time_in_millis" : 92442
      },
      "indices" : {
        "minman_test_index" : {
          "shards_stats" : {
            "initializing" : 0,
            "started" : 0,
            "finalizing" : 0,
            "done" : 1,
            "failed" : 0,
            "total" : 1
          },
          "stats" : {
            "incremental" : {
              "file_count" : 119,
              "size_in_bytes" : 3862661102
            },
            "total" : {
              "file_count" : 119,
              "size_in_bytes" : 3862661102
            },
            "start_time_in_millis" : 1633320103168,
            "time_in_millis" : 92442
          },
          "shards" : {
            "0" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 119,
                  "size_in_bytes" : 3862661102
                },
                "total" : {
                  "file_count" : 119,
                  "size_in_bytes" : 3862661102
                },
                "start_time_in_millis" : 1633320103168,
                "time_in_millis" : 92442
              }
            }
          }
        }
      }
    }
  ]
}

방법 3. Amazon managed ES에서 스냅샷을 백업하고 복구하는 방법

그런 다음에 아래 그림과 같이 Amazon managed ES에 데이터 적재를 위한 권한을 위임할 IAM 역할을 생성한다.

스냅샷 이름은 pms_es_snapshot_role이라고 하자.

3

trust relationships 양식은 아래와 같다.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ec2.amazonaws.com",
          "es.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

pms_es_snapshot_role IAM role에 아래와 같이 s3, es policy를 추가해주자.

4

policy 양식은 아래와 같다.

// S3 policy
{
  "Version": "2012-10-17",
  "Statement": [{
      "Action": [
        "s3:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::{s3_bucket_name}"
      ]
    },
    {
      "Action": [
        "s3:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::{s3_bucket_name}/*"
      ]
    }
  ]
}

// Elasticsearch policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "{TheSnapshotRole ARN}"
    },
    {
      "Effect": "Allow",
      "Action": "es:*",
      "Resource": "arn:aws:es:{region_name}:{account_id}:{domain}/{domain_name}/*"
    }
  ]
}

그런 다음에 아래와 같이 위에서 생성한 role을 매핑해준다.

Amazon ES 도메인의 Kibana 플러그인으로 이동해서 기본 메뉴에서 Security의 Roles 선택한다. 그런 다음에 manage_snapshots역할을 검색 및 선택한다. 그리고 Mapped users 탭의 Manage mapping 선택한 다음에 Backend_role에 전달할 권한이 있는 역할의 도메인 ARN 추가한다.위에서 생성한 pms_es_snapshot_role 이 ARN 의 형식을 사용해서 역할을 추가해준다.

그런 다음에 pms_es_snapshot_role에 모든 권한을 부여하기 위해 기본 메뉴에서 Security > Roles > all_access 선택해서 Mapped users 에 pms_es_snapshot_role ARN을 추가해준다.

5

매핑이 끝나면 Managed ES에 s3 레포지토리를 지정해줘야 한다.

S3 스냅샷 리포지토리를 등록하려면 Amazon ES 도메인 엔드포인트에 PUT 요청을 보내야 한다. curl이 작업은 AWS 요청 서명을 지원하지 않기 때문에 사용할 수 없다. 대신에 샘플 Python 클라이언트 Postman 등의 다른 방법으로 서명 요청을 전송해 스냅샷 리포지토리를 등록해야 한다.

아래와 같은 파이썬 스크립트 양식을 이용해서 Repository 지정을 하는 python 스크립트를 작성한다.

import boto3
import requests
from requests_aws4auth import AWS4Auth

host = '{aws Elasticsearch Domain}:443'
region = '{region_name}' # e.g. us-west-2
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Register repository

path = '_snapshot/{repository_name}' # the Elasticsearch API endpoint
url = host + path

payload = {
  "type": "s3",
  "settings": {
    "bucket": "{migration-bucket}",
    "region": "{region_name}",
    "base_path": "{s3 base prefix}",
    "role_arn": "{pms_es_snapshot_role ARN}"
  }
}

headers = {"Content-Type": "application/json"}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)

print(r.status_code)
print(r.text)

설치형 EC2의 코디네이트 노드(또는 지정된 마스터노드)에 접속해서 아래와 같이 명령어를 실행하였는데 어떤 서버에서 실행하던지 사실 상관은 없어보인다.

# 위에 python 스크립트를 실행할 서버에 접속해서 아래와 같이 위에서 언급했던 minsupark이라는 AWS 계정정보를 설정해준다.
[ec2-user@es-coordinater ~]$ aws configure
AWS Access Key ID [None]: xxxxxxxxxxxxxxxx
AWS Secret Access Key [None]: yyyyyyyyyyyyyyyyyy
Default region name [None]: ap-northeast-2
Default output format [None]: json

[ec2-user@es-coordinater ~]$ vim managed_es_s3_repo.py
import boto3
import requests
from requests_aws4auth import AWS4Auth

host = 'https://search-pms-managed-es-test-xxxxxxxxxxx.ap-northeast-2.es.amazonaws.com:443'
region = 'ap-northeast-2' # e.g. us-west-2
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Register repository

path = '/_snapshot/minman_s3_es_repo' # the Elasticsearch API endpoint
url = host + path

payload = {
  "type": "s3",
  "settings": {
    "bucket": "pms-bucket-test",
    "region": "ap-northeast-2",
    "base_path": "elastic_repo",
    "role_arn": "arn:aws:iam::xxxxxxxxxxx:role/pms_es_snapshot_role"
  }
}

headers = {"Content-Type": "application/json"}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)

print(r.status_code)
print(r.text)

[ec2-user@es-coordinater ~]$ python3 managed_es_s3_repo.py
200
{"acknowledged":true}

아래와 같은 양식으로 Snapshot을 뜨는 python 스크립트를 작성해서 활용할수도 있다.

하지만 우리는 이미 스냅샷 뜬게 있어서 그거를 사용할 것이다.

import boto3
import requests
from pprint import pprint
from requests_aws4auth import AWS4Auth

host = '{aws Elasticsearch Domain}:443'
region = '{region_name}' # e.g. us-west-2
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# take snapshot
path = '_snapshot/{repository_name}/{backup_key}'
url = host + path

r = requests.put(url, auth=awsauth)

pprint(r.status_code)
pprint(r.text)

그런 다음에 다시 명령어를 아래와 같이 실행해주면 스냅샷을 뜰 수 있다.

[ec2-user@es-coordinater ~]$ vim managed_es_check_s3_snapshot.py
import boto3
import requests
from pprint import pprint
from requests_aws4auth import AWS4Auth

host = 'https://search-pms-managed-es-test-xxxxxxxxxxxxxxxx.ap-northeast-2.es.amazonaws.com:443'
region = 'ap-northeast-2' # e.g. us-west-2
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Take snapshot
path = '/_snapshot/minman_s3_es_repo/minman_test_index_backup_key'
url = host + path

r = requests.put(url, auth=awsauth)

pprint(r.status_code)
pprint(r.text)

[ec2-user@es-coordinater ~]$ python3 managed_es_check_s3_snapshot.py
200
'{"accepted":true}'

그런 다음에 아래와 같은 양식으로 managed ES에 스냅샷을 복구하는 python script를 작성한다.

import boto3
import requests
from pprint import pprint
from requests_aws4auth import AWS4Auth

host = '{aws Elasticsearch Domain}:443'
region = '{region_name}' # e.g. us-west-2
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Restore snapshot (one index)
path = '/_snapshot/{repository_name}/{backup_key}/_restore'
url = host + path
payload = {
    "indices": "{index}", # 위에서 백업한 인덱스
    "ignore_unavailable" : "true",
    "include_global_state" : "false"
}
headers = {"Content-Type": "application/json"}
r = requests.post(url, auth=awsauth, json=payload, headers=headers)
pprint(r.text)

그런 다음에 다시 명령어를 아래와 같이 실행하여 스냅샷을 managed es로 리스토어 해준다.

[ec2-user@es-coordinater ~]$ vim restore_es_snapshot.py
import boto3
import requests
from pprint import pprint
from requests_aws4auth import AWS4Auth

host = 'https://search-pms-managed-es-test-xxxxxxxxxxxxxxxx.ap-northeast-2.es.amazonaws.com:443'
region = 'ap-northeast-2' # e.g. us-west-2
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Restore snapshot (one index)
path = '/_snapshot/minman_s3_es_repo/minman_test_index_backup_key/_restore'
url = host + path
payload = {
    "indices": "minman_test_index",
    "ignore_unavailable" : "true",
    "include_global_state" : "false"
}
headers = {"Content-Type": "application/json"}
r = requests.post(url, auth=awsauth, json=payload, headers=headers)
pprint(r.text)

[ec2-user@es-coordinater ~]$ python3 restore_es_snapshot.py

# 리스토어 후 인덱스 확인하기
# curl -u USER:PASS -H "content-type: application/JSON" -XPUT 'http://{aws Elasticsearch Domain}:443/_cat/indices?v?pretty'
[ec2-user@es-coordinater ~]$ curl -u minman:mypassword1! -H "content-type: application/JSON" -XPUT 'https://search-pms-es-managed-test-xxxxxxxxxx.ap-northeast-2.es.amazonaws.com:443/_cat/indices?v?pretty'

방법 4. EC2 설치형 ES에서 Elasticdump를 이용하여 데이터 이관하는 방법

Elasticdump라는 툴을 이용해서 데이터를 이관하는 방법도 있다.

URL : https://github.com/elasticsearch-dump/elasticsearch-dump

예를들어서 EC2 설치형 ES에서 데이터를 백업하고 싶다고 한다면 코디네이트 노드에서 아래와 같이 명령어를 실행한다.

[ec2-user@es-coordinater ~]$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.34.0/install.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13226  100 13226    0     0  45294      0 --:--:-- --:--:-- --:--:-- 45139
=> Downloading nvm as script to '/home/ec2-user/.nvm'

=> Appending nvm source string to /home/ec2-user/.bashrc
=> Appending bash_completion source string to /home/ec2-user/.bashrc
=> Close and reopen your terminal to start using nvm or run the following to use it now:

export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion

[ec2-user@es-coordinater ~]$ . ~/.nvm/nvm.sh

[ec2-user@es-coordinater ~]$ nvm install node
Downloading and installing node v16.10.0...
Downloading https://nodejs.org/dist/v16.10.0/node-v16.10.0-linux-x64.tar.xz...
################################################################################################################################################################################### 100.0%
Computing checksum with sha256sum
Checksums matched!
Now using node v16.10.0 (npm v7.24.0)
Creating default alias: default -> node (-> v16.10.0)
    
[ec2-user@es-coordinater ~]$ node -e "console.log('Running Node.js ' + process.version)"
Running Node.js v16.10.0

[ec2-user@es-coordinater ~]$ npm install elasticdump
npm WARN deprecated s3signed@0.1.0: This module is no longer maintained. It is provided as is.
npm WARN deprecated har-validator@5.1.5: this library is no longer supported
npm WARN deprecated querystring@0.2.0: The querystring API is considered Legacy. new code should use the URLSearchParams API instead.
npm WARN deprecated uuid@3.3.2: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142

added 113 packages, and audited 114 packages in 8s

5 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
npm notice
npm notice New patch version of npm available! 7.24.0 -> 7.24.1
npm notice Changelog: https://github.com/npm/cli/releases/tag/v7.24.1
npm notice Run npm install -g npm@7.24.1 to update!
npm notice

# httpAuthFile.txt 양식은 아래와 같다.
#user={username}
#password={password}
[ec2-user@es-coordinater ~]$ vim /home/ec2-user/httpAuthFile.txt
user=elastic
password=mypassword1!

# vim json_export.sh을 작성하는데 양식은 아래와 같다.
# #! /bin/sh

# /home/ec2-user/node_modules/elasticdump/bin/elasticdump\
#   --s3AccessKeyId "{ACCESS_KEY}" \
#   --s3SecretAccessKey "{SECRET_KEY}" \
#   --input=http://{Master_Private_IP}:9200/{Index_Name}\
#   --output "s3://{Bucket_Name}/{Path}/{Target_File_Name}"\
#   --limit=10000 \
#   --httpAuthFile=/home/ec2-user/httpAuthFile.txt
[ec2-user@es-coordinater ~]$ vim json_export.sh
#! /bin/sh

/home/ec2-user/node_modules/elasticdump/bin/elasticdump\
  --s3AccessKeyId "xxxxxxxxxxxxxxxxxxxx" \
  --s3SecretAccessKey "yyyyyyyyyyyyyyyyyyyyyyyyyy" \
  --input=http://10.10.1.226:9200/minman_test_index\
  --output "s3://pms-bucket-test/esdump_test_folder/esdump_test"\
  --limit=10000 \
  --httpAuthFile=/home/ec2-user/httpAuthFile.txt

# 또는 nohup /home/ec2-user/json_export.sh > dump_json.log & 명령어를 이용해서 백그라운드로도 띄울 수 있다.
[ec2-user@es-coordinater ~]$ sh json_export.sh
Mon, 04 Oct 2021 11:16:59 GMT | starting dump
Mon, 04 Oct 2021 11:16:59 GMT | got 10000 objects from source elasticsearch (offset: 0)
Mon, 04 Oct 2021 11:16:59 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:00 GMT | got 10000 objects from source elasticsearch (offset: 10000)
Mon, 04 Oct 2021 11:17:00 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:00 GMT | got 10000 objects from source elasticsearch (offset: 20000)
Mon, 04 Oct 2021 11:17:00 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:00 GMT | got 10000 objects from source elasticsearch (offset: 30000)
Mon, 04 Oct 2021 11:17:00 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:00 GMT | got 10000 objects from source elasticsearch (offset: 40000)
Mon, 04 Oct 2021 11:17:01 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:01 GMT | got 10000 objects from source elasticsearch (offset: 50000)
Mon, 04 Oct 2021 11:17:01 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:05 GMT | got 10000 objects from source elasticsearch (offset: 60000)
Mon, 04 Oct 2021 11:17:05 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:05 GMT | got 10000 objects from source elasticsearch (offset: 70000)
Mon, 04 Oct 2021 11:17:05 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:05 GMT | got 10000 objects from source elasticsearch (offset: 80000)
Mon, 04 Oct 2021 11:17:05 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:05 GMT | got 10000 objects from source elasticsearch (offset: 90000)
Mon, 04 Oct 2021 11:17:05 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:17:05 GMT | got 10000 objects from source elasticsearch (offset: 100000)

...

Mon, 04 Oct 2021 11:35:10 GMT | got 10000 objects from source elasticsearch (offset: 10910000)
Mon, 04 Oct 2021 11:35:10 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:35:10 GMT | got 10000 objects from source elasticsearch (offset: 10920000)
Mon, 04 Oct 2021 11:35:10 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:35:10 GMT | got 10000 objects from source elasticsearch (offset: 10930000)
Mon, 04 Oct 2021 11:35:11 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:35:11 GMT | got 10000 objects from source elasticsearch (offset: 10940000)
Mon, 04 Oct 2021 11:35:11 GMT | sent 10000 objects to destination s3, wrote 10000
Mon, 04 Oct 2021 11:35:11 GMT | got 6404 objects from source elasticsearch (offset: 10950000)
Mon, 04 Oct 2021 11:35:11 GMT | sent 6404 objects to destination s3, wrote 6404
Mon, 04 Oct 2021 11:35:15 GMT | got 0 objects from source elasticsearch (offset: 10956404)
Mon, 04 Oct 2021 11:35:15 GMT | Total Writes: 10956404
Mon, 04 Oct 2021 11:35:15 GMT | dump complete
Mon, 04 Oct 2021 11:35:16 GMT | Uploaded esdump_test_folder/esdump_test

[ec2-user@es-coordinater ~]$ aws s3 ls s3://pms-bucket-test/esdump_test_folder/
2021-10-04 20:17:00 5205213518 esdump_test