Hands-on practice: TensorFlow 2.0 + Keras overview from the creator of Keras

2020-02-28

Deep Learning

Deep_Learning_TIL_(20200228)

Reference material (source)

Material: TensorFlow 2.0 + Keras lecture from the creator of Keras

URL : https://colab.research.google.com/drive/1p4RhSj1FEuscyZP81ocn8IeGD_2r46fS?fbclid=IwAR0qED1D3sAQk4oEVC0IolOopC9ur3LlnUsHFLl-YyFHuPUPtUxk4YUBVa0

Environment: Google Colab

!pip install tensorflow==2.0.0

Installing collected packages: tensorboard, tensorflow-estimator, tensorflow
  Found existing installation: tensorboard 1.15.0
    Uninstalling tensorboard-1.15.0:
      Successfully uninstalled tensorboard-1.15.0
  Found existing installation: tensorflow-estimator 1.15.1
    Uninstalling tensorflow-estimator-1.15.1:
      Successfully uninstalled tensorflow-estimator-1.15.1
  Found existing installation: tensorflow 1.15.0
    Uninstalling tensorflow-1.15.0:
      Successfully uninstalled tensorflow-1.15.0
Successfully installed tensorboard-2.0.2 tensorflow-2.0.0 tensorflow-estimator-2.0.1

import tensorflow as tf
print(tf.__version__)

2.0.0

TensorFlow 2.0 + Keras: an overview for deep learning researchers

@fchollet, October 2019 (Korean translation by @chansung)

The original is TensorFlow 2.0 + Keras Overview for Deep Learning Researchers.

This document serves as an introduction, a crash course, and a quick API reference for TensorFlow 2.0.

TensorFlow and Keras were both released about four years ago (Keras in March 2015, TensorFlow in November 2015). In deep learning terms, that's quite a long time!

In the past, TensorFlow 1.x + Keras had several known issues:

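Using TensorFlow meant manipulating static computation graphs, which felt awkward and difficult to programmers used to imperative styles of coding.
While the TensorFlow API was very powerful and flexible, it lacked polish and was often confusing or difficult to use.
While Keras was very productive and easy to use, it often lacked flexibility for research use cases.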

TensorFlow 2.0 is an extensive redesign of TensorFlow and Keras that takes into account four years of user feedback and technical progress. It fixes the issues above in a big way.

It's a machine learning platform from the future.

TensorFlow 2.0 is built on the following key ideas:

Let users run their computation eagerly, as they would in Numpy. This makes TensorFlow 2.0 programming intuitive and Pythonic.
Preserve the considerable advantages of compiled graphs for performance, distribution, and deployment. This makes TensorFlow fast, scalable, and production-ready.
Adopt Keras as its high-level deep learning API, making TensorFlow approachable and highly productive.
Extend Keras into a spectrum of workflows ranging from the very high-level (easier to use, somewhat less flexible) to the very low-level (requires more expertise, but provides great flexibility).

Part 1: TensorFlow basics

Tensors

This is a constant tensor:

x = tf.constant([[5, 2], [1, 3]])
print(x)

tf.Tensor(
[[5 2]
 [1 3]], shape=(2, 2), dtype=int32)

You can get its value as a Numpy array by calling .numpy():

x.numpy()

array([[5, 2],
       [1, 3]], dtype=int32)

Much like a Numpy array, it has dtype and shape attributes:

print('dtype:', x.dtype)
print('shape:', x.shape)

dtype: <dtype: 'int32'>
shape: (2, 2)

A common way to create constant tensors is via tf.ones and tf.zeros (just like np.ones and np.zeros):

print(tf.ones(shape=(2, 1)))
print(tf.zeros(shape=(2, 1)))

tf.Tensor(
[[1.]
 [1.]], shape=(2, 1), dtype=float32)
tf.Tensor(
[[0.]
 [0.]], shape=(2, 1), dtype=float32)

Random constant tensors

This creates a tensor filled with random values drawn from a normal distribution:

tf.random.normal(shape=(2, 2), mean=0., stddev=1.)

<tf.Tensor: id=12, shape=(2, 2), dtype=float32, numpy=
array([[-1.0932361,  2.228664 ],
       [-0.8287051, -0.9424461]], dtype=float32)>

And here's an integer tensor with values drawn from a uniform distribution (maxval is exclusive, so the values lie in [0, 10)):

tf.random.uniform(shape=(2, 2), minval=0, maxval=10, dtype='int32')

<tf.Tensor: id=16, shape=(2, 2), dtype=int32, numpy=
array([[0, 3],
       [5, 1]], dtype=int32)>

Variables

Variables are special tensors used to store mutable state (such as the weights of a neural network). You create a Variable from an initial value:

initial_value = tf.random.normal(shape=(2, 2))
a = tf.Variable(initial_value)
print(a)

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[-1.4263921 ,  0.49103293],
       [-0.36253545,  1.8493237 ]], dtype=float32)>

You update the value of a Variable with the methods .assign(value), .assign_add(increment), or .assign_sub(decrement):

new_value = tf.random.normal(shape=(2, 2))
a.assign(new_value)
for i in range(2):
  for j in range(2):
    assert a[i, j] == new_value[i, j]

print(a)

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[ 1.5460566 ,  1.1849345 ],
       [ 0.24738911, -0.30803028]], dtype=float32)>

added_value = tf.random.normal(shape=(2, 2))
a.assign_add(added_value)
for i in range(2):
  for j in range(2):
    assert a[i, j] == new_value[i, j] + added_value[i, j]

print(a)

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[ 0.83296186,  0.58070505],
       [ 1.7173103 , -0.678332  ]], dtype=float32)>

Doing math in TensorFlow

You can use TensorFlow much like you would use Numpy. The main difference is that your TensorFlow code can run on GPU and TPU:

a = tf.random.normal(shape=(2, 2))
b = tf.random.normal(shape=(2, 2))

c = a + b
d = tf.square(c)
e = tf.exp(d)

print(a,'\n',b,'\n',c,'\n',d,'\n',e)

tf.Tensor(
[[-1.7188389  -0.715927  ]
 [-0.14043556  0.7175207 ]], shape=(2, 2), dtype=float32) 
 tf.Tensor(
[[-0.36652637  0.55603373]
 [ 0.89047074  0.614373  ]], shape=(2, 2), dtype=float32) 
 tf.Tensor(
[[-2.0853653  -0.15989327]
 [ 0.75003517  1.3318937 ]], shape=(2, 2), dtype=float32) 
 tf.Tensor(
[[4.348748   0.02556586]
 [0.56255275 1.7739408 ]], shape=(2, 2), dtype=float32) 
 tf.Tensor(
[[77.38154    1.0258955]
 [ 1.7551472  5.894035 ]], shape=(2, 2), dtype=float32)

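Computing gradients with `GradientTape`

There's one more big difference from Numpy: you can automatically retrieve the gradient of any differentiable expression.

Just open a GradientTape, start "watching" a tensor via tape.watch(), and compose a differentiable expression that uses this tensor as input: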

a = tf.random.normal(shape=(2, 2))
b = tf.random.normal(shape=(2, 2))

with tf.GradientTape() as tape:
  tape.watch(a)  # Start recording the history of operations applied to `a`
  c = tf.sqrt(tf.square(a) + tf.square(b))  # Do some math using `a`
  # What's the gradient of `c` with respect to `a`?
  dc_da = tape.gradient(c, a)
  print(dc_da)

tf.Tensor(
[[ 0.94430184  0.04223739]
 [-0.860221    0.6537137 ]], shape=(2, 2), dtype=float32)
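As a quick sanity check (a sketch, not part of the original notebook), the gradient returned by the tape can be compared with the closed-form derivative. For c = sqrt(a² + b²) taken elementwise, dc/da = a / c, so every entry of the gradient lies in [-1, 1]. The seed is an arbitrary assumption, used only for reproducibility:

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)  # arbitrary seed, only for reproducibility
a = tf.random.normal(shape=(2, 2))
b = tf.random.normal(shape=(2, 2))

with tf.GradientTape() as tape:
  tape.watch(a)
  c = tf.sqrt(tf.square(a) + tf.square(b))
# Gradient computed by automatic differentiation
dc_da = tape.gradient(c, a)

# Closed-form elementwise derivative of sqrt(a^2 + b^2) with respect to a
analytic = a / c

print(np.allclose(dc_da.numpy(), analytic.numpy(), atol=1e-5))
```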

Variables are watched automatically by default, so you don't need to call watch on them manually:

a = tf.Variable(a)

with tf.GradientTape() as tape:
  c = tf.sqrt(tf.square(a) + tf.square(b))
  dc_da = tape.gradient(c, a)
  print(dc_da)

tf.Tensor(
[[ 0.94430184  0.04223739]
 [-0.860221    0.6537137 ]], shape=(2, 2), dtype=float32)

You can compute higher-order derivatives by nesting tapes:

with tf.GradientTape() as outer_tape:
  with tf.GradientTape() as tape:
    c = tf.sqrt(tf.square(a) + tf.square(b))
    dc_da = tape.gradient(c, a)
  d2c_da2 = outer_tape.gradient(dc_da, a)
  print(d2c_da2)

tf.Tensor(
[[0.14804387 0.35640335]
 [0.40593112 2.9766662 ]], shape=(2, 2), dtype=float32)
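The nested-tape result can also be checked by hand. Differentiating a / sqrt(a² + b²) once more with respect to a gives b² / (a² + b²)^(3/2), which is never negative. A minimal sketch comparing the two, assuming the same elementwise setup as above (the seed and the float32 tolerances are assumptions):

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(1)  # arbitrary seed, only for reproducibility
a = tf.Variable(tf.random.normal(shape=(2, 2)))
b = tf.random.normal(shape=(2, 2))

with tf.GradientTape() as outer_tape:
  with tf.GradientTape() as tape:
    c = tf.sqrt(tf.square(a) + tf.square(b))
  # First derivative; this computation is itself recorded by outer_tape
  dc_da = tape.gradient(c, a)
# Second derivative
d2c_da2 = outer_tape.gradient(dc_da, a)

# Closed form: d^2c/da^2 = b^2 / (a^2 + b^2)^(3/2)
analytic = tf.square(b) / tf.pow(tf.square(a) + tf.square(b), 1.5)

print(np.allclose(d2c_da2.numpy(), analytic.numpy(), rtol=1e-3, atol=1e-4))
```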

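An end-to-end example: linear regression

So far you've learned that TensorFlow is a Numpy-like library that can be accelerated on GPU or TPU and that computes derivatives automatically. Time for an end-to-end example: let's implement a linear regression, the FizzBuzz of machine learning.

For the sake of demonstration, we won't use any higher-level Keras components like Layer or MeanSquaredError. Just basic ops.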

input_dim = 2
output_dim = 1
learning_rate = 0.01

# The weight matrix
w = tf.Variable(tf.random.uniform(shape=(input_dim, output_dim)))
# The bias vector
b = tf.Variable(tf.zeros(shape=(output_dim,)))

def compute_predictions(features):
  return tf.matmul(features, w) + b

def compute_loss(labels, predictions):
  return tf.reduce_mean(tf.square(labels - predictions))

def train_on_batch(x, y):
  with tf.GradientTape() as tape:
    predictions = compute_predictions(x)
    loss = compute_loss(y, predictions)
    dloss_dw, dloss_db = tape.gradient(loss, [w, b])
  w.assign_sub(learning_rate * dloss_dw)
  b.assign_sub(learning_rate * dloss_db)
  return loss
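To see what tape.gradient computes inside train_on_batch, here is a sketch that checks it against the closed-form gradients of the mean squared error. With residual r = y - (xw + b) over n samples and output_dim = 1, dL/dw = -(2/n) xᵀr and dL/db = -(2/n) Σ r. The toy batch, the seed, and the names manual_dw and manual_db are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)  # arbitrary seed
input_dim, output_dim, n = 2, 1, 8

# Hypothetical toy batch and parameters, with the same shapes as above
x = tf.random.normal(shape=(n, input_dim))
y = tf.random.normal(shape=(n, output_dim))
w = tf.Variable(tf.random.uniform(shape=(input_dim, output_dim)))
b = tf.Variable(tf.zeros(shape=(output_dim,)))

with tf.GradientTape() as tape:
  residual = y - (tf.matmul(x, w) + b)
  loss = tf.reduce_mean(tf.square(residual))
dloss_dw, dloss_db = tape.gradient(loss, [w, b])

# Closed-form MSE gradients (the mean runs over n values because output_dim == 1)
manual_dw = -2.0 / n * tf.matmul(x, residual, transpose_a=True)
manual_db = -2.0 / n * tf.reduce_sum(residual, axis=0)

print(np.allclose(dloss_dw.numpy(), manual_dw.numpy(), atol=1e-5))
print(np.allclose(dloss_db.numpy(), manual_db.numpy(), atol=1e-5))
```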

Let's generate some artificial data to try out our model:

import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline

# Prepare a dataset.
num_samples = 10000
negative_samples = np.random.multivariate_normal(
    mean=[0, 3], cov=[[1, 0.5],[0.5, 1]], size=num_samples)
positive_samples = np.random.multivariate_normal(
    mean=[3, 0], cov=[[1, 0.5],[0.5, 1]], size=num_samples)
features = np.vstack((negative_samples, positive_samples)).astype(np.float32)
labels = np.vstack((np.zeros((num_samples, 1), dtype='float32'),
                    np.ones((num_samples, 1), dtype='float32')))

plt.scatter(features[:, 0], features[:, 1], c=labels[:, 0])


Now let's train our linear regression by iterating over the data batch by batch and repeatedly calling train_on_batch:

# Shuffle the data with a single permutation so features and labels stay aligned.
# (Python's random.shuffle is not safe on 2D NumPy arrays: it can duplicate rows.)
indices = np.random.RandomState(1337).permutation(len(features))
features = features[indices]
labels = labels[indices]

# Create a tf.data.Dataset object for easy batched iteration
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=1024).batch(256)

for epoch in range(10):
  for step, (x, y) in enumerate(dataset):
    loss = train_on_batch(x, y)
  print('Epoch %d: last batch loss = %.4f' % (epoch, float(loss)))

Epoch 0: last batch loss = 0.0537
Epoch 1: last batch loss = 0.0898
Epoch 2: last batch loss = 0.0370
Epoch 3: last batch loss = 0.0332
Epoch 4: last batch loss = 0.0399
Epoch 5: last batch loss = 0.0331
Epoch 6: last batch loss = 0.0174
Epoch 7: last batch loss = 0.0242
Epoch 8: last batch loss = 0.0270
Epoch 9: last batch loss = 0.0301

Here's how well our model performs:

predictions = compute_predictions(features)
plt.scatter(features[:, 0], features[:, 1], c=predictions[:, 0] > 0.5)


Speeding it up with `tf.function`

How fast does our current code run?

import time

t0 = time.time()
for epoch in range(20):
  for step, (x, y) in enumerate(dataset):
    loss = train_on_batch(x, y)
t_end = time.time() - t0
print('Time per epoch: %.3f s' % (t_end / 20,))

Time per epoch: 0.117 s

Let's compile the training function into a static graph. Literally all we need to do is put the tf.function decorator on it:

@tf.function
def train_on_batch(x, y):
  with tf.GradientTape() as tape:
    predictions = compute_predictions(x)
    loss = compute_loss(y, predictions)
    dloss_dw, dloss_db = tape.gradient(loss, [w, b])
  w.assign_sub(learning_rate * dloss_dw)
  b.assign_sub(learning_rate * dloss_db)
  return loss

Let's time it again:

t0 = time.time()
for epoch in range(20):
  for step, (x, y) in enumerate(dataset):
    loss = train_on_batch(x, y)
t_end = time.time() - t0
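print('Time per epoch: %.3f s' % (t_end / 20,))

Time per epoch: 0.078 s

That's roughly a 33% reduction in time per epoch (from 0.117 s to 0.078 s). We used a trivially simple model here; in general, the bigger the model, the greater the speedup you can get from static graphs.

Remember: eager execution is great for debugging and for printing results line by line, but when it's time to scale, static graphs are a researcher's best friends.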
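One consequence of compiling to a graph is worth knowing when using tf.function: plain Python side effects in the function body run only while the function is being traced, and later calls with the same input signature (shape and dtype) reuse the traced graph. The sketch below, with a hypothetical scaled_sum function, makes this visible through a Python list that only grows during tracing:

```python
import tensorflow as tf

trace_log = []  # plain Python list, appended to only while tracing

@tf.function
def scaled_sum(x):
  trace_log.append('traced')  # Python side effect: runs during tracing only
  return tf.reduce_sum(x) * 2.0

r1 = scaled_sum(tf.ones((3,)))
r2 = scaled_sum(tf.zeros((3,)))   # same shape and dtype: the existing graph is reused
traces_after_same_shape = len(trace_log)
r3 = scaled_sum(tf.ones((4,)))    # new input shape: the function is traced again

print(traces_after_same_shape, len(trace_log))
print(float(r1), float(r2), float(r3))
```

This is also why the timing loop above runs the first epoch slightly slower: the first call pays the tracing cost.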

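Part 2: The Keras API

Keras is a Python API for deep learning. It has something for everyone:

If you're an engineer, Keras provides reusable blocks such as layers, metrics, and training loops to support common use cases. It provides a high-level user experience that's accessible and productive.
If you're a researcher, you may prefer not to use built-in blocks such as layers and training loops, and instead create your own. Of course, Keras allows you to do this. In this case, Keras provides templates for the blocks you write, along with standard APIs such as Layers and Metrics. This structure makes your code easy to share with others and easy to integrate into production workflows.
The same is true for library developers. TensorFlow is a large ecosystem with many different libraries. For different libraries to interoperate and share components, they need to follow an API standard. That API standard is what Keras provides.

Crucially, Keras brings high-level UX and low-level flexibility together fluently. You no longer have, on one hand, a high-level API that's easy to use but inflexible, and on the other hand, a low-level API that's flexible but only approachable by experts. Instead, you have a spectrum of workflows, from the very high-level to the very low-level, and they are all compatible with each other because they're built on the same concepts and objects.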

The spectrum of Keras workflows

The `Layer` base class

The first class you need to know is Layer. Pretty much everything in Keras derives from it.

A Layer encapsulates a state (weights) and some computation (defined in the call method).

from tensorflow.keras.layers import Layer

class Linear(Layer):
  """y = w.x + b"""

  def __init__(self, units=32, input_dim=32):
      super(Linear, self).__init__()
      w_init = tf.random_normal_initializer()
      self.w = tf.Variable(
          initial_value=w_init(shape=(input_dim, units), dtype='float32'),
          trainable=True)
      b_init = tf.zeros_initializer()
      self.b = tf.Variable(
          initial_value=b_init(shape=(units,), dtype='float32'),
          trainable=True)

  def call(self, inputs):
      return tf.matmul(inputs, self.w) + self.b

# Instantiate our layer.
linear_layer = Linear(4, 2)

A layer instance works like a function. Let's call it on some data:

y = linear_layer(tf.ones((2, 2)))
assert y.shape == (2, 4)

The Layer class takes care of tracking the weights assigned to it as attributes:

# Weights are automatically tracked under the `weights` property.
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

Note that there's also a shortcut method for creating weights: add_weight. Instead of writing:

w_init = tf.random_normal_initializer()
self.w = tf.Variable(initial_value=w_init(shape=shape, dtype='float32'))

you would typically write:

self.w = self.add_weight(shape=shape, initializer='random_normal')

It's good practice to create weights in a separate build method, which the layer calls lazily the first time it sees the shape of its inputs. This pattern lets you avoid specifying input_dim in the constructor:

class Linear(Layer):
  """y = w.x + b"""

  def __init__(self, units=32):
      super(Linear, self).__init__()
      self.units = units

  def build(self, input_shape):
      self.w = self.add_weight(shape=(input_shape[-1], self.units),
                               initializer='random_normal',
                               trainable=True)
      self.b = self.add_weight(shape=(self.units,),
                               initializer='random_normal',
                               trainable=True)

  def call(self, inputs):
      return tf.matmul(inputs, self.w) + self.b


# Instantiate our lazy layer.
linear_layer = Linear(4)

# This will call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))
assert len(linear_layer.weights) == 2
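The built-in Dense layer follows the same lazy pattern. A short sketch, assuming only the standard tf.keras.layers.Dense API: before the first call the layer has no weights, and the kernel's input dimension is inferred from the first input it sees:

```python
import tensorflow as tf

dense = tf.keras.layers.Dense(4)     # only the number of output units is given
weights_before = len(dense.weights)  # no weights exist yet

y = dense(tf.ones((2, 3)))           # first call: build() sees an input of shape (2, 3)

print(weights_before, dense.built)
print(tuple(dense.kernel.shape), tuple(dense.bias.shape), tuple(y.shape))
```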

Trainable and non-trainable weights

Weights created by layers can be either trainable or non-trainable. They're exposed in trainable_weights and non_trainable_weights respectively. Here's a layer with a non-trainable weight:

class ComputeSum(Layer):
  """입력의 합산 결과를 반환하는 Layer"""

  def __init__(self, input_dim):
      super(ComputeSum, self).__init__()
      # Create a non-trainable weight.
      self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),
                               trainable=False)

  def call(self, inputs):
      self.total.assign_add(tf.reduce_sum(inputs, axis=0))
      return self.total  

my_sum = ComputeSum(2)
x = tf.ones((2, 2))

y = my_sum(x)
print(y.numpy())  # [2. 2.]

y = my_sum(x)
print(y.numpy())  # [4. 4.]

assert my_sum.weights == [my_sum.total]
assert my_sum.non_trainable_weights == [my_sum.total]
assert my_sum.trainable_weights == []

[2. 2.]
[4. 4.]

Recursively composing layers

Layers can be recursively nested to create bigger computation blocks. Each layer tracks the weights (both trainable and non-trainable) of its sublayers.

# Let's reuse the Linear class
# with a `build` method that we defined above.

class MLP(Layer):
    """Simple stack of Linear layers."""

    def __init__(self):
        super(MLP, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(10)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.linear_2(x)
        x = tf.nn.relu(x)
        return self.linear_3(x)

mlp = MLP()

# The first call to the `mlp` object will create the weights.
y = mlp(tf.ones(shape=(3, 64)))

# Weights are recursively tracked.
assert len(mlp.weights) == 6

Built-in layers

Keras provides a wide range of built-in layers, so you don't have to implement everything yourself:

Convolution layers
Transposed convolutions
Separable convolutions
Average and max pooling
Global average and max pooling
LSTM, GRU (with built-in cuDNN acceleration)
BatchNormalization
Dropout
Attention
ConvLSTM2D
etc.

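Keras follows the principle of exposing good default configurations, so that layers work fine out of the box for most use cases if you leave keyword arguments at their default values. For instance, the LSTM layer uses an orthogonal recurrent matrix initializer by default, and it also initializes the forget gate bias to one by default.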
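Those defaults can be inspected directly. The sketch below builds an LSTM with default arguments and checks both claims. It assumes the usual Keras weight layout, in which the cell stores its gates in the order input, forget, cell, output, and exposes bias and recurrent_kernel on lstm.cell:

```python
import numpy as np
import tensorflow as tf

units = 4
lstm = tf.keras.layers.LSTM(units)   # every other argument left at its default
_ = lstm(tf.zeros((1, 5, 3)))        # (batch, timesteps, features): builds the weights

# Assumed bias layout: [input gate | forget gate | cell | output gate], `units` each
bias = lstm.cell.bias.numpy()
forget_bias = bias[units:2 * units]
other_bias = np.concatenate([bias[:units], bias[2 * units:]])

# An orthogonal (units, 4 * units) recurrent kernel has orthonormal rows: W @ W.T == I
rk = lstm.cell.recurrent_kernel.numpy()

print(forget_bias)
print(np.allclose(rk @ rk.T, np.eye(units), atol=1e-5))
```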

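The `training` argument in `call`

Some layers, in particular the BatchNormalization layer and the Dropout layer, behave differently during training and inference. For such layers, it is standard practice to expose a (boolean) training argument in the call method.

By exposing this argument in call, you enable the built-in training and evaluation loops (e.g. fit) to use the layer correctly in training and in inference.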

class Dropout(Layer):
  
  def __init__(self, rate):
    super(Dropout, self).__init__()
    self.rate = rate

  def call(self, inputs, training=None):
    if training:
      return tf.nn.dropout(inputs, rate=self.rate)
    return inputs

class MLPWithDropout(Layer):

  def __init__(self):
      super(MLPWithDropout, self).__init__()
      self.linear_1 = Linear(32)
      self.dropout = Dropout(0.5)
      self.linear_3 = Linear(10)

  def call(self, inputs, training=None):
      x = self.linear_1(inputs)
      x = tf.nn.relu(x)
      x = self.dropout(x, training=training)
      return self.linear_3(x)
    
mlp = MLPWithDropout()
y_train = mlp(tf.ones((2, 2)), training=True)
y_test = mlp(tf.ones((2, 2)), training=False)

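A more functional way of defining models

To build deep learning models, you don't have to use object-oriented programming all the time. Layers can also be composed functionally, like this (we call it the "Functional API"):

# We use an `Input` object to describe the shape and dtype of the inputs.
# This is the deep learning equivalent of *declaring a type*.
# The shape argument is per-sample; it does not include the batch size.
# The functional API focuses on defining per-sample transformations.
# The model we create will automatically batch the per-sample transformations,
# so that it can be called on batches of data.
inputs = tf.keras.Input(shape=(16,))

# We call layers on these "type" objects,
# and they return updated types (new shapes/dtypes).
x = Linear(32)(inputs) # We are reusing the Linear layer we defined earlier.
x = Dropout(0.5)(x)    # We are reusing the Dropout layer we defined earlier.
outputs = Linear(10)(x)

# A functional `Model` can be defined by specifying inputs and outputs.
# A model is itself a layer like any other.
model = tf.keras.Model(inputs, outputs)

# Functional models already have weights, before being called on any data.
# That's because we defined their input shape in advance (in `Input`).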
assert len(model.weights) == 4

# Let's call our model on some data.
y = model(tf.ones((2, 16)))
assert y.shape == (2, 10)

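The Functional API tends to be more concise than subclassing, and it provides a few other advantages (generally the same advantages that functional, typed languages provide over untyped OO development). However, it can only be used to define DAGs of layers; recursive networks should be defined as Layer subclasses instead.

Key differences between Functional models and models defined via subclassing are explained here.

You can learn more about the Functional API here.

In your research workflows, you may often find yourself mixing and matching OO models and Functional models.

For models that are simple stacks of layers, each with a single input and a single output, you can also use the Sequential class, which turns a list of layers into a Model: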

from tensorflow.keras import Sequential

model = Sequential([Linear(32), Dropout(0.5), Linear(10)])

y = model(tf.ones((2, 16)))
assert y.shape == (2, 10)

Loss classes

Keras features a wide range of built-in loss classes, like BinaryCrossentropy, CategoricalCrossentropy, KLDivergence, etc. They work like this:

bce = tf.keras.losses.BinaryCrossentropy()
y_true = [0., 0., 1., 1.]  # Targets (labels)
y_pred = [1., 1., 1., 0.]  # Predictions
loss = bce(y_true, y_pred)
print('Loss:', loss.numpy())

Loss: 11.522857

Loss class instances are stateless: the output of __call__ depends only on its inputs.
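Statelessness is easy to observe: calling the same loss object twice on the same inputs returns the same value, and that value matches the textbook binary cross-entropy formula. A sketch, with hypothetical predictions kept away from 0 and 1 so that the clipping Keras applies to probabilities does not come into play:

```python
import numpy as np
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
y_true = [0., 0., 1., 1.]
y_pred = [0.1, 0.4, 0.8, 0.6]   # hypothetical probabilities, away from 0 and 1

first = float(bce(y_true, y_pred))
second = float(bce(y_true, y_pred))   # same inputs, same output: nothing accumulates

# Textbook formula: -mean(y * log(p) + (1 - y) * log(1 - p))
yt, yp = np.array(y_true), np.array(y_pred)
manual = -np.mean(yt * np.log(yp) + (1 - yt) * np.log(1 - yp))

print(first, second, manual)
```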

Metric classes

Keras also features a wide range of built-in metric classes, such as BinaryAccuracy, AUC, FalsePositives, etc.

Unlike losses, metrics are stateful. You update their state with the update_state method, and you query the scalar metric result with result:

m = tf.keras.metrics.AUC()
m.update_state([0, 1, 1, 1], [0, 1, 0, 0])
print('Intermediate result: ', m.result().numpy())

m.update_state([1, 1, 1, 1], [0, 1, 1, 0])
print('Final result: ', m.result().numpy())

Intermediate result:  0.6666667
Final result:  0.71428573

The internal state can be cleared with metric.reset_states.

You can easily roll your own metrics by subclassing the Metric class:

Create the state variables in __init__
Update the variables given y_true and y_pred in update_state
Return the metric result in result
Clear the state in reset_states

Here's a quick implementation of a BinaryTruePositives metric as a demonstration:

class BinaryTruePositives(tf.keras.metrics.Metric):

  def __init__(self, name='binary_true_positives', **kwargs):
    super(BinaryTruePositives, self).__init__(name=name, **kwargs)
    self.true_positives = self.add_weight(name='tp', initializer='zeros')

  def update_state(self, y_true, y_pred, sample_weight=None):
    y_true = tf.cast(y_true, tf.bool)
    y_pred = tf.cast(y_pred, tf.bool)

    values = tf.logical_and(tf.equal(y_true, True), tf.equal(y_pred, True))
    values = tf.cast(values, self.dtype)
    if sample_weight is not None:
      sample_weight = tf.cast(sample_weight, self.dtype)
      sample_weight = tf.broadcast_to(sample_weight, tf.shape(values))
      values = tf.multiply(values, sample_weight)
    self.true_positives.assign_add(tf.reduce_sum(values))

  def result(self):
    return self.true_positives

  def reset_states(self):
    self.true_positives.assign(0)

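Optimizer classes & a quick end-to-end training loop

You don't normally have to define by hand how your variables are updated during gradient descent, as we did in the linear regression example. You would usually use one of the built-in Keras optimizers, like SGD, RMSprop, or Adam.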
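To make the connection with the earlier linear regression code concrete, here is a sketch showing that one step of plain SGD (no momentum) performs the same update as the manual assign_sub call, w ← w - learning_rate * gradient. The variable values and the gradient are made up for illustration:

```python
import numpy as np
import tensorflow as tf

learning_rate = 0.1
grad = tf.constant([0.5, -1.0, 2.0])   # hypothetical gradient

# Manual gradient-descent step, as in the linear regression example
w_manual = tf.Variable([1.0, 2.0, 3.0])
w_manual.assign_sub(learning_rate * grad)

# The same step delegated to a built-in optimizer (plain SGD, no momentum)
w_opt = tf.Variable([1.0, 2.0, 3.0])
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
optimizer.apply_gradients([(grad, w_opt)])

print(w_manual.numpy(), w_opt.numpy())
```

Optimizers such as Adam or RMSprop replace this single rule with their own update rules, but they are called the same way through apply_gradients.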

Here's a simple MNIST example that brings together Loss classes, Metric classes, and Optimizers:

from tensorflow.keras import layers

# Prepare a dataset.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[:].reshape(60000, 784).astype('float32') / 255
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# Instantiate a simple classification model.
model = tf.keras.Sequential([
  layers.Dense(256, activation=tf.nn.relu),
  layers.Dense(256, activation=tf.nn.relu),
  layers.Dense(10)
])

# Instantiate a logistic loss function that expects integer targets.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Instantiate an accuracy metric.
accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of the dataset.
for step, (x, y) in enumerate(dataset):

  # Open a GradientTape.
  with tf.GradientTape() as tape:

    # Forward pass.
    logits = model(x)

    # Compute the loss value for this batch.
    loss_value = loss(y, logits)

  # Get the gradients of the loss with respect to the weights.
  gradients = tape.gradient(loss_value, model.trainable_weights)

  # Update the weights of our model.
  optimizer.apply_gradients(zip(gradients, model.trainable_weights))

  # Update the running accuracy over all batches seen so far.
  accuracy.update_state(y, logits)

  # Logging.
  if step % 100 == 0:
    print('Step:', step)
    print('Loss from last step:', float(loss_value))
    print('Total running accuracy so far:', float(accuracy.result()))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
Step: 0
Loss from last step: 2.341179132461548
Total running accuracy so far: 0.03125
Step: 100
Loss from last step: 0.33624351024627686
Total running accuracy so far: 0.843440592288971
Step: 200
Loss from last step: 0.18553781509399414
Total running accuracy so far: 0.8798196315765381
Step: 300
Loss from last step: 0.3094934821128845
Total running accuracy so far: 0.8974252343177795
Step: 400
Loss from last step: 0.20692837238311768
Total running accuracy so far: 0.9101075530052185
Step: 500
Loss from last step: 0.23231974244117737
Total running accuracy so far: 0.9177582263946533
Step: 600
Loss from last step: 0.1896420568227768
Total running accuracy so far: 0.9241368770599365
Step: 700
Loss from last step: 0.11397643387317657
Total running accuracy so far: 0.9284726977348328
Step: 800
Loss from last step: 0.22063642740249634
Total running accuracy so far: 0.9317259788513184
Step: 900
Loss from last step: 0.03219543397426605
Total running accuracy so far: 0.9354883432388306

We can reuse our SparseCategoricalAccuracy metric instance to implement a test loop:

x_test = x_test[:].reshape(10000, 784).astype('float32') / 255
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(128)

accuracy.reset_states()  # This clears the internal state of the metric

for step, (x, y) in enumerate(test_dataset):
  logits = model(x)
  accuracy.update_state(y, logits)

print('Final test accuracy:', float(accuracy.result()))

Final test accuracy: 0.9593999981880188

The `add_loss` method

Sometimes you need to compute loss values on the fly during a forward pass (especially regularization losses). Keras lets you compute loss values at any time and keeps track of them recursively via the add_loss method.

Here's an example of a layer that adds a sparsity regularization loss based on the squared L2 norm of its inputs:

class ActivityRegularization(Layer):
  """활성 희소 정규화 손실(activity sparsity regularization loss)을 생성하는 Layer 입니다"""
  
  def __init__(self, rate=1e-2):
    super(ActivityRegularization, self).__init__()
    self.rate = rate
  
  def call(self, inputs):
    # 입력값에 기반하는
    # `add_loss`를 사용해서 정규화 손실을 생성합니다
    self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
    return inputs

Loss values added via add_loss can be retrieved through the .losses list property of any Layer or Model:

from tensorflow.keras import layers

class SparseMLP(Layer):
  """희소 정규화 손실을 가지는 선형 계층을 쌓아올린 Layer 입니다"""

  def __init__(self, output_dim):
      super(SparseMLP, self).__init__()
      self.dense_1 = layers.Dense(32, activation=tf.nn.relu)
      self.regularization = ActivityRegularization(1e-2)
      self.dense_2 = layers.Dense(output_dim)

  def call(self, inputs):
      x = self.dense_1(inputs)
      x = self.regularization(x)
      return self.dense_2(x)
    

mlp = SparseMLP(1)
y = mlp(tf.ones((10, 10)))

print(mlp.losses)  # A list containing a single float32 scalar

[<tf.Tensor: id=186153, shape=(), dtype=float32, numpy=0.79275525>]

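These losses are cleared by the top-level layer at the start of each forward pass; they don't accumulate. So layer.losses always contains only the losses created during the last forward pass. When writing a training loop, you would typically sum these losses and add them to your main loss before computing gradients.

# Losses correspond to the *last* forward pass.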
mlp = SparseMLP(1)
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1  # No accumulation.

# Let's demonstrate how to use these losses in a training loop.

# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# A new MLP.
mlp = SparseMLP(10)

# Loss and optimizer.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step, (x, y) in enumerate(dataset):
  with tf.GradientTape() as tape:

    # Forward pass.
    logits = mlp(x)

    # External loss value for this batch.
    loss = loss_fn(y, logits)

    # Add the losses created during the forward pass.
    loss += sum(mlp.losses)

    # Get the gradients of the loss with respect to the weights.
    gradients = tape.gradient(loss, mlp.trainable_weights)

  # Update the weights of our model.
  optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))

  # Logging.
  if step % 100 == 0:
    print(step, float(loss))

4.2266459465026855
2.2934560775756836
2.2962656021118164
2.1863455772399902
2.158012866973877
2.1282315254211426
2.0247669219970703
2.041032314300537
1.9458585977554321
1.7418930530548096

A detailed end-to-end example: a Variational AutoEncoder (VAE)

If you'd like to take a break from the basics and look at a slightly more advanced example, check out the VAE implementation here. It demonstrates everything you've learned so far:

Subclassing Layer
Recursive layer composition
Loss and Metric classes
add_loss
GradientTape

Using built-in training loops

It would be a bit silly if you had to write your own low-level training loop every time, even for simple use cases. Keras provides a built-in training loop on the Model class. To use it, either subclass Model or create a Functional or Sequential model.

To demonstrate it, let's reuse the MNIST setup from above:

# Prepare a dataset.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# Instantiate a simple classification model.
model = tf.keras.Sequential([
  layers.Dense(256, activation=tf.nn.relu),
  layers.Dense(256, activation=tf.nn.relu),
  layers.Dense(10)
])

# Instantiate a logistic loss function that expects integer targets.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Instantiate an accuracy metric.
accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()

First, call the compile method to configure the optimizer, the loss, and the metrics to monitor.

model.compile(optimizer=optimizer, loss=loss, metrics=[accuracy])

Then call the fit method, passing it the data:

model.fit(dataset, epochs=3)

Epoch 1/3
938/938 [==============================] - 7s 7ms/step - loss: 0.2177 - sparse_categorical_accuracy: 0.9361
Epoch 2/3
938/938 [==============================] - 4s 4ms/step - loss: 0.0842 - sparse_categorical_accuracy: 0.9747
Epoch 3/3
938/938 [==============================] - 4s 4ms/step - loss: 0.0564 - sparse_categorical_accuracy: 0.9821

<tensorflow.python.keras.callbacks.History at 0x7ff0b1254fd0>

That's it! Now let's test the model:

x_test = x_test.reshape(10000, 784).astype('float32') / 255
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(128)

loss, acc = model.evaluate(test_dataset)
print('loss:', loss, 'accuracy:', acc)

79/79 [==============================] - 0s 4ms/step - loss: 0.0949 - sparse_categorical_accuracy: 0.9712
loss: 0.09488680568940737 accuracy: 0.9712

You can also monitor the loss and metrics on validation data while fit is running.
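When the training data is a tf.data.Dataset, as in the example above, the validation data can be passed as a Dataset as well. This is a minimal sketch with random stand-in arrays; the sizes and the split point are arbitrary. The val_-prefixed entries in the returned History object hold the validation loss and metrics for each epoch.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in data (assumption): 320 random samples, the last 64 used for validation.
x = np.random.rand(320, 784).astype('float32')
y = np.random.randint(0, 10, size=(320,))
train_dataset = tf.data.Dataset.from_tensor_slices((x[:256], y[:256]))
train_dataset = train_dataset.shuffle(buffer_size=256).batch(64)
val_dataset = tf.data.Dataset.from_tensor_slices((x[256:], y[256:])).batch(64)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10),
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

history = model.fit(train_dataset, validation_data=val_dataset, epochs=2, verbose=0)
print(history.history['val_loss'])  # one validation loss per epoch
```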

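You can also call fit directly on NumPy arrays, so there is no need to convert the data into a Dataset. Here, the last 10,000 training samples are held out as validation data: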

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255

num_val_samples = 10000
x_val = x_train[-num_val_samples:]
y_val = y_train[-num_val_samples:]
x_train = x_train[:-num_val_samples]
y_train = y_train[:-num_val_samples]

# Instantiate a simple classification model
model = tf.keras.Sequential([
  layers.Dense(256, activation=tf.nn.relu),
  layers.Dense(256, activation=tf.nn.relu),
  layers.Dense(10)
])

# Instantiate a logistic loss that accepts integer labels
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Instantiate an accuracy metric
accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

# Instantiate an optimizer
optimizer = tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,
              loss=loss,
              metrics=[accuracy])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=3,
          batch_size=64)

Train on 50000 samples, validate on 10000 samples
Epoch 1/3
50000/50000 [==============================] - 4s 83us/sample - loss: 0.2399 - sparse_categorical_accuracy: 0.9292 - val_loss: 0.1223 - val_sparse_categorical_accuracy: 0.9632
Epoch 2/3
50000/50000 [==============================] - 4s 75us/sample - loss: 0.0951 - sparse_categorical_accuracy: 0.9704 - val_loss: 0.0872 - val_sparse_categorical_accuracy: 0.9747
Epoch 3/3
50000/50000 [==============================] - 4s 73us/sample - loss: 0.0616 - sparse_categorical_accuracy: 0.9805 - val_loss: 0.0805 - val_sparse_categorical_accuracy: 0.9755

<tensorflow.python.keras.callbacks.History at 0x7ff0afa09278>
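Instead of slicing off the validation samples by hand, you can pass fit a validation_split argument. It holds out the given fraction of the last samples of x and y, taken before any shuffling, so holding out the last 10,000 of the 60,000 MNIST samples corresponds to validation_split=1/6. The sketch below uses random stand-in arrays and validation_split=0.2 purely for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in arrays (assumption): 500 random samples instead of MNIST.
x_all = np.random.rand(500, 784).astype('float32')
y_all = np.random.randint(0, 10, size=(500,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10),
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# validation_split=0.2 holds out the last 100 of the 500 samples for validation
history = model.fit(x_all, y_all, validation_split=0.2,
                    epochs=2, batch_size=64, verbose=0)
print(sorted(history.history.keys()))
```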

Callbacks

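One simple but powerful feature of fit is that you can customize what happens during training and evaluation by using callbacks.

A callback is an object that is called at various points during training (for example, after every batch or every epoch) and performs some action.

Keras ships with several built-in callbacks. For example, ModelCheckpoint saves the model after every epoch during training (with save_best_only=True, only when the monitored validation metric improves), and EarlyStopping stops training when the validation metric stops improving.

Of course, you can also easily write your own callbacks.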
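Here is a minimal sketch of a custom callback, assuming a hypothetical LossHistory class and random stand-in data. You subclass tf.keras.callbacks.Callback and override only the hooks you need. In this sketch, on_train_batch_end records the training loss after each batch, and on_epoch_end prints the validation loss.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


class LossHistory(tf.keras.callbacks.Callback):
  """Hypothetical callback: records per-batch training loss, prints val_loss per epoch."""

  def on_train_begin(self, logs=None):
    self.batch_losses = []

  def on_train_batch_end(self, batch, logs=None):
    # logs holds the training loss (and metrics) reported for this batch.
    self.batch_losses.append(float(logs['loss']))

  def on_epoch_end(self, epoch, logs=None):
    print(f"epoch {epoch + 1}: val_loss = {float(logs['val_loss']):.4f}")


# Stand-in data (assumption): random 784-dim inputs with 10 classes.
x = np.random.rand(256, 784).astype('float32')
y = np.random.randint(0, 10, size=(256,))
x_val = np.random.rand(64, 784).astype('float32')
y_val = np.random.randint(0, 10, size=(64,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10),
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

history_cb = LossHistory()
model.fit(x, y, validation_data=(x_val, y_val), epochs=2, batch_size=64,
          callbacks=[history_cb], verbose=0)
print(len(history_cb.batch_losses))  # 256 / 64 = 4 batches per epoch, 2 epochs
```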

# Instantiate a simple classification model
model = tf.keras.Sequential([
  layers.Dense(256, activation=tf.nn.relu),
  layers.Dense(256, activation=tf.nn.relu),
  layers.Dense(10)
])

# Instantiate a logistic loss that accepts integer labels
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Instantiate an accuracy metric
accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

# Instantiate an optimizer
optimizer = tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,
              loss=loss,
              metrics=[accuracy])

# Instantiate a few callbacks
callbacks = [tf.keras.callbacks.EarlyStopping(),
             tf.keras.callbacks.ModelCheckpoint(filepath='my_model.keras',
                                                save_best_only=True)]

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=30,
          batch_size=64,
          callbacks=callbacks)

Train on 50000 samples, validate on 10000 samples
Epoch 1/30
50000/50000 [==============================] - 4s 83us/sample - loss: 0.2432 - sparse_categorical_accuracy: 0.9281 - val_loss: 0.1253 - val_sparse_categorical_accuracy: 0.9635
Epoch 2/30
50000/50000 [==============================] - 4s 72us/sample - loss: 0.0939 - sparse_categorical_accuracy: 0.9717 - val_loss: 0.0964 - val_sparse_categorical_accuracy: 0.9695
Epoch 3/30
50000/50000 [==============================] - 4s 72us/sample - loss: 0.0635 - sparse_categorical_accuracy: 0.9800 - val_loss: 0.0788 - val_sparse_categorical_accuracy: 0.9774
Epoch 4/30
50000/50000 [==============================] - 4s 70us/sample - loss: 0.0456 - sparse_categorical_accuracy: 0.9858 - val_loss: 0.0845 - val_sparse_categorical_accuracy: 0.9757

<tensorflow.python.keras.callbacks.History at 0x7ff0af546748>
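By default, EarlyStopping monitors val_loss with patience=0. That is why the run above stopped after epoch 4: val_loss rose from 0.0788 to 0.0845. Below is a sketch of a more forgiving configuration, assuming random stand-in data. With patience=2, training continues through up to two epochs without improvement. With restore_best_weights=True, the model is rolled back to the weights from its best epoch when early stopping triggers.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in data (assumption): random inputs and labels, with a separate validation set.
x = np.random.rand(512, 784).astype('float32')
y = np.random.randint(0, 10, size=(512,))
x_val = np.random.rand(128, 784).astype('float32')
y_val = np.random.randint(0, 10, size=(128,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10),
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=2, restore_best_weights=True)
history = model.fit(x, y, validation_data=(x_val, y_val), epochs=20,
                    batch_size=64, callbacks=[early_stopping], verbose=0)
print('epochs run:', len(history.history['loss']))
```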

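Parting words

I hope this guide has given you a good overview of what you can do with TensorFlow 2.0 and Keras!

Remember that TensorFlow and Keras do not represent a single workflow. They support a spectrum of workflows, each with its own trade-off between usability and flexibility. For example, using the fit method is much easier than writing a custom training loop, but fit does not offer the fine-grained control that research often requires.

So use the right tool for the job!

A core principle of Keras is “progressive disclosure of complexity”. It is very easy to get started, and you can gradually dig deeper into workflows where you implement more and more from scratch, which ultimately gives you full control.

This applies to both model definition and model training.

Model definition: a spectrum of workflows

Model training: a spectrum of workflows

What to look at next

After this guide, here are some further topics you may be interested in: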
