Implementing an MNIST image-classification MLP model with TensorFlow 2

Deep_Learning_Studynotes_(20190713)
study program : https://www.fastcampus.co.kr/data_camp_deeplearning
Environment: Google Colab GPU runtime
step 1) Importing Libraries
!pip install --upgrade tensorflow
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt
import os
print(tf.__version__)
print(keras.__version__)
2.1.0
2.2.4-tf
step 2) Enable Eager Mode
## Eager execution is the default in TF 2.x; this guard only matters on a 1.x runtime.
if tf.__version__ < '2.0.0':
    tf.enable_eager_execution()
step 3) Set Hyperparameters
learning_rate = 0.001
training_epochs = 30
batch_size = 100
n_class = 10
step 4) Choose MNIST or Fashion MNIST Data
# ## MNIST Dataset #########################################################
# mnist = keras.datasets.mnist
# class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# ##########################################################################
# Fashion MNIST Dataset #################################################
mnist = keras.datasets.fashion_mnist
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
#########################################################################
step 5) Load the Dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
8192/5148 [===============================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
step 6) Prepare the Data
n_train = train_images.shape[0]
n_test = test_images.shape[0]
train_images = train_images.astype(np.float32) / 255.
test_images = test_images.astype(np.float32) / 255.
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(buffer_size=100000).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(batch_size)
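As a quick sanity check of this preprocessing (a standalone sketch with toy data, independent of the real arrays above): `to_categorical` one-hot encodes the labels, and `from_tensor_slices` + `batch` yield `(batch, 28, 28)` image tensors and `(batch, 10)` label tensors.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

# Toy stand-ins for the real labels/images.
toy_labels = np.array([3, 0, 9, 3])
one_hot = to_categorical(toy_labels, 10)
print(one_hot.shape)              # (4, 10)
print(int(one_hot[0].argmax()))   # 3 -- argmax recovers the original label

toy_images = np.zeros((4, 28, 28), dtype=np.float32)
ds = tf.data.Dataset.from_tensor_slices((toy_images, one_hot)).batch(2)
for x, y in ds.take(1):
    print(x.shape, y.shape)       # (2, 28, 28) (2, 10)
```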
step 7) Model Design
def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    ## The input images are 28 x 28, so they are flattened into a
    ## 784-dimensional vector before being fed to the dense layers.
    model.add(keras.layers.Dense(256, activation='relu'))
    ## A fully connected layer (one MLP layer); here it has 256 units
    ## with ReLU activation.
    model.add(keras.layers.Dense(128, activation='relu'))
    ## A second dense layer stacked on top, with 128 units, again with ReLU.
    model.add(keras.layers.Dense(10, activation='softmax'))
    ## The final MLP layer has 10 units (one per class) with softmax.
    return model
model = create_model()
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 256) 200960
_________________________________________________________________
dense_1 (Dense) (None, 128) 32896
_________________________________________________________________
dense_2 (Dense) (None, 10) 1290
=================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0
_________________________________________________________________
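The parameter counts in the summary above can be checked by hand: a Dense layer has in_features × units weights plus one bias per unit (plain arithmetic, no TF needed).

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out

p1 = dense_params(784, 256)    # 200960
p2 = dense_params(256, 128)    # 32896
p3 = dense_params(128, 10)     # 1290
print(p1 + p2 + p3)            # 235146, matching "Total params"
```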
step 8) Implement the Loss Function
## Decorating a function with @tf.function compiles it into graph mode, which speeds up the computation.
@tf.function
def loss_fn(model, images, labels):
    predictions = model(images, training=True)
    loss = tf.reduce_mean(keras.losses.categorical_crossentropy(labels, predictions))
    return loss
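For one-hot labels, categorical crossentropy reduces to -Σ y·log(ŷ); a minimal numeric check (the probability values here are made up for illustration):

```python
import numpy as np
from tensorflow import keras

# Toy example: 3 classes, one-hot label on class 1.
y_true = np.array([[0., 1., 0.]])
y_pred = np.array([[0.1, 0.7, 0.2]])

tf_loss = float(keras.losses.categorical_crossentropy(y_true, y_pred)[0])
manual = float(-np.sum(y_true * np.log(y_pred)))   # -log(0.7) ~= 0.3567
print(tf_loss, manual)
```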
step 9) Implement Gradient Calculation & Weight Update
@tf.function
def train(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
step 10) Implement the Model Accuracy Calculation
@tf.function
def evaluate(model, images, labels):
    predictions = model(images, training=False)
    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return accuracy
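The evaluate function just compares argmax indices between predictions and one-hot labels; on a toy batch:

```python
import tensorflow as tf

# Three fake softmax outputs against one-hot labels for class 0; two of three are correct.
predictions = tf.constant([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = tf.constant([[1., 0.], [1., 0.], [1., 0.]])

correct = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
print(accuracy.numpy())   # 2 of 3 correct -> ~0.667
```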
step 11) Set the Optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
step 12) Run Training
# train my model
print('Learning started. It takes sometime.')
for epoch in range(training_epochs):
    avg_loss = 0.
    avg_train_acc = 0.
    avg_test_acc = 0.
    train_step = 0
    test_step = 0
    for images, labels in train_dataset:
        ## Alternative: iterate train_dataset.repeat(training_epochs) once instead.
        ## Either way, shuffle() reshuffles the training data on every pass by
        ## default (reshuffle_each_iteration=True), so each epoch sees a new order.
        train(model, images, labels)
        loss = loss_fn(model, images, labels)
        acc = evaluate(model, images, labels)
        avg_loss = avg_loss + loss
        avg_train_acc = avg_train_acc + acc
        train_step += 1
    avg_loss = avg_loss / train_step
    avg_train_acc = avg_train_acc / train_step
    for images, labels in test_dataset:
        acc = evaluate(model, images, labels)
        avg_test_acc = avg_test_acc + acc
        test_step += 1
    avg_test_acc = avg_test_acc / test_step
    print('Epoch:', '{}'.format(epoch + 1),
          'loss =', '{:.8f}'.format(avg_loss),
          'train accuracy = ', '{:.4f}'.format(avg_train_acc),
          'test accuracy = ', '{:.4f}'.format(avg_test_acc))
print('Learning Finished!')
Learning started. It takes sometime.
Epoch: 1 loss = 0.47122025 train accuracy = 0.8334 test accuracy = 0.8503
Epoch: 2 loss = 0.34015965 train accuracy = 0.8769 test accuracy = 0.8601
Epoch: 3 loss = 0.30349249 train accuracy = 0.8895 test accuracy = 0.8727
Epoch: 4 loss = 0.28034952 train accuracy = 0.8972 test accuracy = 0.8774
Epoch: 5 loss = 0.25995550 train accuracy = 0.9037 test accuracy = 0.8777
Epoch: 6 loss = 0.24598221 train accuracy = 0.9084 test accuracy = 0.8732
Epoch: 7 loss = 0.23506032 train accuracy = 0.9131 test accuracy = 0.8802
Epoch: 8 loss = 0.22246923 train accuracy = 0.9170 test accuracy = 0.8859
Epoch: 9 loss = 0.21404687 train accuracy = 0.9201 test accuracy = 0.8873
Epoch: 10 loss = 0.20306093 train accuracy = 0.9240 test accuracy = 0.8851
Epoch: 11 loss = 0.19534108 train accuracy = 0.9262 test accuracy = 0.8868
Epoch: 12 loss = 0.18795916 train accuracy = 0.9305 test accuracy = 0.8850
Epoch: 13 loss = 0.18208914 train accuracy = 0.9321 test accuracy = 0.8903
Epoch: 14 loss = 0.17431562 train accuracy = 0.9357 test accuracy = 0.8884
Epoch: 15 loss = 0.16442259 train accuracy = 0.9386 test accuracy = 0.8893
Epoch: 16 loss = 0.16249800 train accuracy = 0.9406 test accuracy = 0.8922
Epoch: 17 loss = 0.15301982 train accuracy = 0.9428 test accuracy = 0.8941
Epoch: 18 loss = 0.15008074 train accuracy = 0.9438 test accuracy = 0.8848
Epoch: 19 loss = 0.14261821 train accuracy = 0.9475 test accuracy = 0.8878
Epoch: 20 loss = 0.13615653 train accuracy = 0.9490 test accuracy = 0.8931
Epoch: 21 loss = 0.13391495 train accuracy = 0.9494 test accuracy = 0.8901
Epoch: 22 loss = 0.12801310 train accuracy = 0.9521 test accuracy = 0.8972
Epoch: 23 loss = 0.12218244 train accuracy = 0.9556 test accuracy = 0.8913
Epoch: 24 loss = 0.11631814 train accuracy = 0.9569 test accuracy = 0.8914
Epoch: 25 loss = 0.11678679 train accuracy = 0.9567 test accuracy = 0.8878
Epoch: 26 loss = 0.10753930 train accuracy = 0.9607 test accuracy = 0.8980
Epoch: 27 loss = 0.10477049 train accuracy = 0.9613 test accuracy = 0.8989
Epoch: 28 loss = 0.10153881 train accuracy = 0.9622 test accuracy = 0.8913
Epoch: 29 loss = 0.09684762 train accuracy = 0.9642 test accuracy = 0.8879
Epoch: 30 loss = 0.09497685 train accuracy = 0.9655 test accuracy = 0.8951
Learning Finished!
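For comparison (a sketch, not the method used above): the same model can also be trained with Keras' built-in compile/fit loop instead of the manual GradientTape loop. It should give comparable, though not identical, numbers.

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',   # labels are one-hot encoded
              metrics=['accuracy'])
# model.fit(train_dataset, epochs=30, validation_data=test_dataset)
```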
step 13) Test the Trained Model
def plot_image(i, predictions_array, true_label, img):
    predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)

def plot_value_array(i, predictions_array, true_label):
    predictions_array, true_label = predictions_array[i], true_label[i]
    plt.grid(False)
    #plt.xticks([])
    plt.xticks(range(n_class), class_names, rotation=90)
    plt.yticks([])
    thisplot = plt.bar(range(n_class), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
rnd_idx = np.random.randint(1, n_test//batch_size)
img_cnt = 0
for images, labels in test_dataset:
    img_cnt += 1
    if img_cnt != rnd_idx:
        continue
    predictions = model(images, training=False)
    num_rows = 5
    num_cols = 3
    num_images = num_rows*num_cols
    labels = tf.argmax(labels, axis=-1)
    plt.figure(figsize=(3*2*num_cols, 4*num_rows))
    plt.subplots_adjust(hspace=1.0)
    for i in range(num_images):
        plt.subplot(num_rows, 2*num_cols, 2*i+1)
        plot_image(i, predictions.numpy(), labels.numpy(), images.numpy())
        plt.subplot(num_rows, 2*num_cols, 2*i+2)
        plot_value_array(i, predictions.numpy(), labels.numpy())
    break
step 14) Stack more layers onto the model from step 7) and test it
def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28,28)))
    model.add(keras.layers.Dense(256, activation='relu'))
    model.add(keras.layers.Dense(256, activation='relu'))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(10, activation='softmax'))
    return model
model = create_model()
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten) (None, 784) 0
_________________________________________________________________
dense_3 (Dense) (None, 256) 200960
_________________________________________________________________
dense_4 (Dense) (None, 256) 65792
_________________________________________________________________
dense_5 (Dense) (None, 128) 32896
_________________________________________________________________
dense_6 (Dense) (None, 128) 16512
_________________________________________________________________
dense_7 (Dense) (None, 10) 1290
=================================================================
Total params: 317,450
Trainable params: 317,450
Non-trainable params: 0
_________________________________________________________________
@tf.function
def loss_fn(model, images, labels):
    predictions = model(images, training=True)
    loss = tf.reduce_mean(keras.losses.categorical_crossentropy(labels, predictions))
    return loss

@tf.function
def train(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

@tf.function
def evaluate(model, images, labels):
    predictions = model(images, training=False)
    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return accuracy
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
# train my model
print('Learning started. It takes sometime.')
for epoch in range(training_epochs):
    avg_loss = 0.
    avg_train_acc = 0.
    avg_test_acc = 0.
    train_step = 0
    test_step = 0
    for images, labels in train_dataset:
        ## shuffle() reshuffles the training data on every pass by default,
        ## so each epoch sees the batches in a new order.
        train(model, images, labels)
        loss = loss_fn(model, images, labels)
        acc = evaluate(model, images, labels)
        avg_loss = avg_loss + loss
        avg_train_acc = avg_train_acc + acc
        train_step += 1
    avg_loss = avg_loss / train_step
    avg_train_acc = avg_train_acc / train_step
    for images, labels in test_dataset:
        acc = evaluate(model, images, labels)
        avg_test_acc = avg_test_acc + acc
        test_step += 1
    avg_test_acc = avg_test_acc / test_step
    print('Epoch:', '{}'.format(epoch + 1),
          'loss =', '{:.8f}'.format(avg_loss),
          'train accuracy = ', '{:.4f}'.format(avg_train_acc),
          'test accuracy = ', '{:.4f}'.format(avg_test_acc))
print('Learning Finished!')
Learning started. It takes sometime.
Epoch: 1 loss = 0.47201630 train accuracy = 0.8306 test accuracy = 0.8460
Epoch: 2 loss = 0.33868349 train accuracy = 0.8758 test accuracy = 0.8626
Epoch: 3 loss = 0.30029640 train accuracy = 0.8892 test accuracy = 0.8626
Epoch: 4 loss = 0.27897891 train accuracy = 0.8963 test accuracy = 0.8759
Epoch: 5 loss = 0.26005241 train accuracy = 0.9031 test accuracy = 0.8824
Epoch: 6 loss = 0.24540581 train accuracy = 0.9083 test accuracy = 0.8710
Epoch: 7 loss = 0.23372877 train accuracy = 0.9128 test accuracy = 0.8733
Epoch: 8 loss = 0.22330683 train accuracy = 0.9165 test accuracy = 0.8792
Epoch: 9 loss = 0.21107051 train accuracy = 0.9198 test accuracy = 0.8870
Epoch: 10 loss = 0.20567133 train accuracy = 0.9222 test accuracy = 0.8879
Epoch: 11 loss = 0.19724540 train accuracy = 0.9249 test accuracy = 0.8895
Epoch: 12 loss = 0.18724051 train accuracy = 0.9298 test accuracy = 0.8878
Epoch: 13 loss = 0.17988257 train accuracy = 0.9326 test accuracy = 0.8914
Epoch: 14 loss = 0.17087474 train accuracy = 0.9354 test accuracy = 0.8904
Epoch: 15 loss = 0.16594066 train accuracy = 0.9370 test accuracy = 0.8866
Epoch: 16 loss = 0.15842450 train accuracy = 0.9401 test accuracy = 0.8879
Epoch: 17 loss = 0.15252811 train accuracy = 0.9426 test accuracy = 0.8909
Epoch: 18 loss = 0.14875479 train accuracy = 0.9433 test accuracy = 0.8924
Epoch: 19 loss = 0.14242834 train accuracy = 0.9466 test accuracy = 0.8945
Epoch: 20 loss = 0.13557811 train accuracy = 0.9482 test accuracy = 0.8868
Epoch: 21 loss = 0.13116112 train accuracy = 0.9496 test accuracy = 0.8910
Epoch: 22 loss = 0.12538593 train accuracy = 0.9520 test accuracy = 0.8904
Epoch: 23 loss = 0.12070082 train accuracy = 0.9536 test accuracy = 0.8915
Epoch: 24 loss = 0.11779831 train accuracy = 0.9548 test accuracy = 0.8890
Epoch: 25 loss = 0.11672459 train accuracy = 0.9548 test accuracy = 0.8920
Epoch: 26 loss = 0.10860301 train accuracy = 0.9582 test accuracy = 0.8922
Epoch: 27 loss = 0.10649157 train accuracy = 0.9597 test accuracy = 0.8925
Epoch: 28 loss = 0.10406902 train accuracy = 0.9594 test accuracy = 0.8901
Epoch: 29 loss = 0.09889799 train accuracy = 0.9625 test accuracy = 0.8964
Epoch: 30 loss = 0.09266045 train accuracy = 0.9647 test accuracy = 0.8945
Learning Finished!
rnd_idx = np.random.randint(1, n_test//batch_size)
img_cnt = 0
for images, labels in test_dataset:
    img_cnt += 1
    if img_cnt != rnd_idx:
        continue
    predictions = model(images, training=False)
    num_rows = 5
    num_cols = 3
    num_images = num_rows*num_cols
    labels = tf.argmax(labels, axis=-1)
    plt.figure(figsize=(3*2*num_cols, 4*num_rows))
    plt.subplots_adjust(hspace=1.0)
    for i in range(num_images):
        plt.subplot(num_rows, 2*num_cols, 2*i+1)
        plot_image(i, predictions.numpy(), labels.numpy(), images.numpy())
        plt.subplot(num_rows, 2*num_cols, 2*i+2)
        plot_value_array(i, predictions.numpy(), labels.numpy())
    break
As the results above show, stacking more layers made little difference compared with the original model.
When a deeper network performs the same or slightly worse, overfitting is a reasonable suspicion, so let's apply drop-out.
step 15) Build an MLP that additionally applies drop-out to the model above
learning_rate = 0.001
training_epochs = 30
batch_size = 100
n_class = 10
drop_rate = 0.3
def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28,28)))
    model.add(keras.layers.Dense(256, activation='relu'))
    model.add(keras.layers.Dropout(drop_rate))
    model.add(keras.layers.Dense(256, activation='relu'))
    model.add(keras.layers.Dropout(drop_rate))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dropout(drop_rate))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dropout(drop_rate))
    model.add(keras.layers.Dense(10, activation='softmax'))
    return model
model = create_model()
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_2 (Flatten) (None, 784) 0
_________________________________________________________________
dense_8 (Dense) (None, 256) 200960
_________________________________________________________________
dropout (Dropout) (None, 256) 0
_________________________________________________________________
dense_9 (Dense) (None, 256) 65792
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense_10 (Dense) (None, 128) 32896
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_11 (Dense) (None, 128) 16512
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_12 (Dense) (None, 10) 1290
=================================================================
Total params: 317,450
Trainable params: 317,450
Non-trainable params: 0
_________________________________________________________________
@tf.function
def loss_fn(model, images, labels):
    predictions = model(images, training=True)
    loss = tf.reduce_mean(keras.losses.categorical_crossentropy(labels, predictions))
    return loss

@tf.function
def train(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

@tf.function
def evaluate(model, images, labels):
    predictions = model(images, training=False)
    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return accuracy
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
# train my model
print('Learning started. It takes sometime.')
for epoch in range(training_epochs):
    avg_loss = 0.
    avg_train_acc = 0.
    avg_test_acc = 0.
    train_step = 0
    test_step = 0
    for images, labels in train_dataset:
        train(model, images, labels)
        loss = loss_fn(model, images, labels)
        acc = evaluate(model, images, labels)
        avg_loss = avg_loss + loss
        avg_train_acc = avg_train_acc + acc
        train_step += 1
    avg_loss = avg_loss / train_step
    avg_train_acc = avg_train_acc / train_step
    for images, labels in test_dataset:
        acc = evaluate(model, images, labels)
        avg_test_acc = avg_test_acc + acc
        test_step += 1
    avg_test_acc = avg_test_acc / test_step
    print('Epoch:', '{}'.format(epoch + 1),
          'loss =', '{:.8f}'.format(avg_loss),
          'train accuracy = ', '{:.4f}'.format(avg_train_acc),
          'test accuracy = ', '{:.4f}'.format(avg_test_acc))
print('Learning Finished!')
Learning started. It takes sometime.
Epoch: 1 loss = 0.68396634 train accuracy = 0.7945 test accuracy = 0.8314
Epoch: 2 loss = 0.45805994 train accuracy = 0.8591 test accuracy = 0.8566
Epoch: 3 loss = 0.42066765 train accuracy = 0.8699 test accuracy = 0.8560
Epoch: 4 loss = 0.39140019 train accuracy = 0.8771 test accuracy = 0.8569
Epoch: 5 loss = 0.37540531 train accuracy = 0.8844 test accuracy = 0.8679
Epoch: 6 loss = 0.36260232 train accuracy = 0.8876 test accuracy = 0.8688
Epoch: 7 loss = 0.35164994 train accuracy = 0.8912 test accuracy = 0.8550
Epoch: 8 loss = 0.33950019 train accuracy = 0.8947 test accuracy = 0.8650
Epoch: 9 loss = 0.33450663 train accuracy = 0.8967 test accuracy = 0.8739
Epoch: 10 loss = 0.32979870 train accuracy = 0.8985 test accuracy = 0.8757
Epoch: 11 loss = 0.32060406 train accuracy = 0.9019 test accuracy = 0.8697
Epoch: 12 loss = 0.31622446 train accuracy = 0.9022 test accuracy = 0.8816
Epoch: 13 loss = 0.31420413 train accuracy = 0.9052 test accuracy = 0.8807
Epoch: 14 loss = 0.30564395 train accuracy = 0.9053 test accuracy = 0.8799
Epoch: 15 loss = 0.29799530 train accuracy = 0.9081 test accuracy = 0.8847
Epoch: 16 loss = 0.29705247 train accuracy = 0.9085 test accuracy = 0.8841
Epoch: 17 loss = 0.29135558 train accuracy = 0.9094 test accuracy = 0.8805
Epoch: 18 loss = 0.29027590 train accuracy = 0.9106 test accuracy = 0.8805
Epoch: 19 loss = 0.29065055 train accuracy = 0.9121 test accuracy = 0.8802
Epoch: 20 loss = 0.28519043 train accuracy = 0.9134 test accuracy = 0.8854
Epoch: 21 loss = 0.28431398 train accuracy = 0.9147 test accuracy = 0.8850
Epoch: 22 loss = 0.27944121 train accuracy = 0.9163 test accuracy = 0.8818
Epoch: 23 loss = 0.27624452 train accuracy = 0.9162 test accuracy = 0.8874
Epoch: 24 loss = 0.27038869 train accuracy = 0.9174 test accuracy = 0.8864
Epoch: 25 loss = 0.27284542 train accuracy = 0.9169 test accuracy = 0.8866
Epoch: 26 loss = 0.27075678 train accuracy = 0.9183 test accuracy = 0.8861
Epoch: 27 loss = 0.26519039 train accuracy = 0.9202 test accuracy = 0.8871
Epoch: 28 loss = 0.26452246 train accuracy = 0.9193 test accuracy = 0.8910
Epoch: 29 loss = 0.26045755 train accuracy = 0.9208 test accuracy = 0.8873
Epoch: 30 loss = 0.26390463 train accuracy = 0.9203 test accuracy = 0.8914
Learning Finished!
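A side note on why the train accuracy above drops relative to the plain model: Dropout is only active when a layer is called with training=True; at inference (training=False) it is the identity. A minimal sketch (rate 0.5 on a vector of ones, so surviving units are scaled by 1/(1-0.5) = 2):

```python
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))

train_out = drop(x, training=True)   # each unit is either dropped to 0 or scaled to 2.0
infer_out = drop(x, training=False)  # identity: the input passes through unchanged
print(float(tf.reduce_mean(infer_out)))  # 1.0
```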
Next, let's try applying regularization.
step 16) Implement an MLP with L2 Regularization
learning_rate = 0.001
training_epochs = 30
batch_size = 100
n_class = 10
reg_weight = 0.002
## reg_weight is the regularization strength (the lambda in the L2 penalty); it is a user-chosen hyperparameter.
def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28,28)))
    model.add(keras.layers.Dense(256, activation='relu',
                                 kernel_regularizer=keras.regularizers.l2(reg_weight)))
    model.add(keras.layers.Dense(256, activation='relu',
                                 kernel_regularizer=keras.regularizers.l2(reg_weight)))
    model.add(keras.layers.Dense(128, activation='relu',
                                 kernel_regularizer=keras.regularizers.l2(reg_weight)))
    model.add(keras.layers.Dense(128, activation='relu',
                                 kernel_regularizer=keras.regularizers.l2(reg_weight)))
    model.add(keras.layers.Dense(10, activation='softmax',
                                 kernel_regularizer=keras.regularizers.l2(reg_weight)))
    return model
model = create_model()
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_3 (Flatten) (None, 784) 0
_________________________________________________________________
dense_13 (Dense) (None, 256) 200960
_________________________________________________________________
dense_14 (Dense) (None, 256) 65792
_________________________________________________________________
dense_15 (Dense) (None, 128) 32896
_________________________________________________________________
dense_16 (Dense) (None, 128) 16512
_________________________________________________________________
dense_17 (Dense) (None, 10) 1290
=================================================================
Total params: 317,450
Trainable params: 317,450
Non-trainable params: 0
_________________________________________________________________
@tf.function
def loss_fn(model, images, labels):
    predictions = model(images, training=True)
    loss = tf.reduce_mean(keras.losses.categorical_crossentropy(labels, predictions))
    ## In a custom training loop the kernel_regularizer penalties are NOT applied
    ## automatically; they are collected in model.losses and must be added here,
    ## otherwise the regularizer has no effect on the gradients.
    loss += tf.add_n(model.losses)
    return loss

@tf.function
def train(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

@tf.function
def evaluate(model, images, labels):
    predictions = model(images, training=False)
    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return accuracy
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
# train my model
print('Learning started. It takes sometime.')
for epoch in range(training_epochs):
    avg_loss = 0.
    avg_train_acc = 0.
    avg_test_acc = 0.
    train_step = 0
    test_step = 0
    for images, labels in train_dataset:
        train(model, images, labels)
        loss = loss_fn(model, images, labels)
        acc = evaluate(model, images, labels)
        avg_loss = avg_loss + loss
        avg_train_acc = avg_train_acc + acc
        train_step += 1
    avg_loss = avg_loss / train_step
    avg_train_acc = avg_train_acc / train_step
    for images, labels in test_dataset:
        acc = evaluate(model, images, labels)
        avg_test_acc = avg_test_acc + acc
        test_step += 1
    avg_test_acc = avg_test_acc / test_step
    print('Epoch:', '{}'.format(epoch + 1),
          'loss =', '{:.8f}'.format(avg_loss),
          'train accuracy = ', '{:.4f}'.format(avg_train_acc),
          'test accuracy = ', '{:.4f}'.format(avg_test_acc))
print('Learning Finished!')
Learning started. It takes sometime.
Epoch: 1 loss = 0.47432184 train accuracy = 0.8302 test accuracy = 0.8511
Epoch: 2 loss = 0.33622319 train accuracy = 0.8754 test accuracy = 0.8608
Epoch: 3 loss = 0.30235520 train accuracy = 0.8882 test accuracy = 0.8727
Epoch: 4 loss = 0.27974367 train accuracy = 0.8967 test accuracy = 0.8763
Epoch: 5 loss = 0.26026556 train accuracy = 0.9037 test accuracy = 0.8746
Epoch: 6 loss = 0.24867533 train accuracy = 0.9077 test accuracy = 0.8776
Epoch: 7 loss = 0.23672600 train accuracy = 0.9111 test accuracy = 0.8782
Epoch: 8 loss = 0.22811213 train accuracy = 0.9143 test accuracy = 0.8845
Epoch: 9 loss = 0.21527784 train accuracy = 0.9195 test accuracy = 0.8834
Epoch: 10 loss = 0.20444603 train accuracy = 0.9228 test accuracy = 0.8755
Epoch: 11 loss = 0.19815075 train accuracy = 0.9259 test accuracy = 0.8889
Epoch: 12 loss = 0.19000788 train accuracy = 0.9282 test accuracy = 0.8895
Epoch: 13 loss = 0.18227911 train accuracy = 0.9298 test accuracy = 0.8850
Epoch: 14 loss = 0.17513357 train accuracy = 0.9334 test accuracy = 0.8897
Epoch: 15 loss = 0.16624437 train accuracy = 0.9366 test accuracy = 0.8922
Epoch: 16 loss = 0.16002238 train accuracy = 0.9401 test accuracy = 0.8873
Epoch: 17 loss = 0.15527752 train accuracy = 0.9415 test accuracy = 0.8909
Epoch: 18 loss = 0.14854047 train accuracy = 0.9441 test accuracy = 0.8967
Epoch: 19 loss = 0.14315921 train accuracy = 0.9464 test accuracy = 0.8886
Epoch: 20 loss = 0.13643174 train accuracy = 0.9481 test accuracy = 0.8953
Epoch: 21 loss = 0.13209526 train accuracy = 0.9505 test accuracy = 0.8884
Epoch: 22 loss = 0.12768622 train accuracy = 0.9520 test accuracy = 0.8933
Epoch: 23 loss = 0.12261139 train accuracy = 0.9528 test accuracy = 0.8874
Epoch: 24 loss = 0.11856887 train accuracy = 0.9552 test accuracy = 0.8942
Epoch: 25 loss = 0.11253037 train accuracy = 0.9576 test accuracy = 0.8897
Epoch: 26 loss = 0.10884634 train accuracy = 0.9584 test accuracy = 0.8913
Epoch: 27 loss = 0.10621001 train accuracy = 0.9601 test accuracy = 0.8934
Epoch: 28 loss = 0.10153838 train accuracy = 0.9609 test accuracy = 0.8945
Epoch: 29 loss = 0.09865828 train accuracy = 0.9626 test accuracy = 0.8971
Epoch: 30 loss = 0.09443182 train accuracy = 0.9646 test accuracy = 0.8939
Learning Finished!
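The L2 penalties added via kernel_regularizer are collected by Keras in model.losses. A minimal sketch (toy layer sizes, unrelated to the model above) verifying that the collected penalty equals lambda * sum(W^2):

```python
import tensorflow as tf
from tensorflow import keras

reg_weight = 0.002
layer = keras.layers.Dense(4, kernel_regularizer=keras.regularizers.l2(reg_weight))
model = keras.Sequential([layer])
model.build((None, 3))                        # creates the (3, 4) kernel

penalty = tf.add_n(model.losses)              # regularization terms collected by Keras
manual = reg_weight * tf.reduce_sum(layer.kernel ** 2)
print(float(penalty), float(manual))          # the two values match
```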
step 17) Implement an MLP with Batch Normalization
learning_rate = 0.001
training_epochs = 30
batch_size = 100
n_class = 10
def create_model():
    ## With batch norm the layer ordering matters: Dense -> BatchNormalization -> activation, as below.
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28,28)))
    model.add(keras.layers.Dense(256))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.ReLU())
    model.add(keras.layers.Dense(256))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.ReLU())
    model.add(keras.layers.Dense(128))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.ReLU())
    model.add(keras.layers.Dense(128))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.ReLU())
    model.add(keras.layers.Dense(10))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Softmax())
    return model
model = create_model()
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_4 (Flatten) (None, 784) 0
_________________________________________________________________
dense_18 (Dense) (None, 256) 200960
_________________________________________________________________
batch_normalization (BatchNo (None, 256) 1024
_________________________________________________________________
re_lu (ReLU) (None, 256) 0
_________________________________________________________________
dense_19 (Dense) (None, 256) 65792
_________________________________________________________________
batch_normalization_1 (Batch (None, 256) 1024
_________________________________________________________________
re_lu_1 (ReLU) (None, 256) 0
_________________________________________________________________
dense_20 (Dense) (None, 128) 32896
_________________________________________________________________
batch_normalization_2 (Batch (None, 128) 512
_________________________________________________________________
re_lu_2 (ReLU) (None, 128) 0
_________________________________________________________________
dense_21 (Dense) (None, 128) 16512
_________________________________________________________________
batch_normalization_3 (Batch (None, 128) 512
_________________________________________________________________
re_lu_3 (ReLU) (None, 128) 0
_________________________________________________________________
dense_22 (Dense) (None, 10) 1290
_________________________________________________________________
batch_normalization_4 (Batch (None, 10) 40
_________________________________________________________________
softmax (Softmax) (None, 10) 0
=================================================================
Total params: 320,562
Trainable params: 319,006
Non-trainable params: 1,556
_________________________________________________________________
@tf.function
def loss_fn(model, images, labels):
    predictions = model(images, training=True)
    loss = tf.reduce_mean(keras.losses.categorical_crossentropy(labels, predictions))
    return loss

@tf.function
def train(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

@tf.function
def evaluate(model, images, labels):
    predictions = model(images, training=False)
    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return accuracy
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
# train my model
print('Learning started. It takes sometime.')
for epoch in range(training_epochs):
    avg_loss = 0.
    avg_train_acc = 0.
    avg_test_acc = 0.
    train_step = 0
    test_step = 0
    for images, labels in train_dataset:
        train(model, images, labels)
        loss = loss_fn(model, images, labels)
        acc = evaluate(model, images, labels)
        avg_loss = avg_loss + loss
        avg_train_acc = avg_train_acc + acc
        train_step += 1
    avg_loss = avg_loss / train_step
    avg_train_acc = avg_train_acc / train_step
    for images, labels in test_dataset:
        acc = evaluate(model, images, labels)
        avg_test_acc = avg_test_acc + acc
        test_step += 1
    avg_test_acc = avg_test_acc / test_step
    print('Epoch:', '{}'.format(epoch + 1),
          'loss =', '{:.8f}'.format(avg_loss),
          'train accuracy = ', '{:.4f}'.format(avg_train_acc),
          'test accuracy = ', '{:.4f}'.format(avg_test_acc))
print('Learning Finished!')
Learning started. It takes sometime.
Epoch: 1 loss = 0.55156994 train accuracy = 0.8252 test accuracy = 0.8509
Epoch: 2 loss = 0.37986416 train accuracy = 0.8767 test accuracy = 0.8614
Epoch: 3 loss = 0.31682464 train accuracy = 0.8916 test accuracy = 0.8752
Epoch: 4 loss = 0.28147227 train accuracy = 0.8969 test accuracy = 0.8645
Epoch: 5 loss = 0.25224206 train accuracy = 0.9054 test accuracy = 0.8838
Epoch: 6 loss = 0.23104106 train accuracy = 0.9117 test accuracy = 0.8659
Epoch: 7 loss = 0.21120189 train accuracy = 0.9144 test accuracy = 0.8645
Epoch: 8 loss = 0.19473054 train accuracy = 0.9213 test accuracy = 0.8885
Epoch: 9 loss = 0.17819290 train accuracy = 0.9240 test accuracy = 0.8848
Epoch: 10 loss = 0.16607507 train accuracy = 0.9286 test accuracy = 0.8855
Epoch: 11 loss = 0.15477753 train accuracy = 0.9323 test accuracy = 0.8769
Epoch: 12 loss = 0.14456172 train accuracy = 0.9344 test accuracy = 0.8892
Epoch: 13 loss = 0.13205849 train accuracy = 0.9386 test accuracy = 0.8671
Epoch: 14 loss = 0.12597175 train accuracy = 0.9410 test accuracy = 0.8880
Epoch: 15 loss = 0.11478428 train accuracy = 0.9475 test accuracy = 0.8888
Epoch: 16 loss = 0.10781864 train accuracy = 0.9494 test accuracy = 0.8780
Epoch: 17 loss = 0.09772268 train accuracy = 0.9522 test accuracy = 0.8872
Epoch: 18 loss = 0.09256799 train accuracy = 0.9538 test accuracy = 0.8860
Epoch: 19 loss = 0.08564570 train accuracy = 0.9552 test accuracy = 0.8902
Epoch: 20 loss = 0.08106101 train accuracy = 0.9562 test accuracy = 0.8765
Epoch: 21 loss = 0.07430840 train accuracy = 0.9611 test accuracy = 0.8822
Epoch: 22 loss = 0.07082912 train accuracy = 0.9610 test accuracy = 0.8890
Epoch: 23 loss = 0.06600729 train accuracy = 0.9624 test accuracy = 0.8886
Epoch: 24 loss = 0.06105103 train accuracy = 0.9662 test accuracy = 0.8870
Epoch: 25 loss = 0.05810046 train accuracy = 0.9654 test accuracy = 0.8860
Epoch: 26 loss = 0.05752798 train accuracy = 0.9657 test accuracy = 0.8858
Epoch: 27 loss = 0.05266721 train accuracy = 0.9685 test accuracy = 0.8797
Epoch: 28 loss = 0.04977661 train accuracy = 0.9684 test accuracy = 0.8895
Epoch: 29 loss = 0.04771885 train accuracy = 0.9691 test accuracy = 0.8820
Epoch: 30 loss = 0.04471344 train accuracy = 0.9714 test accuracy = 0.8813
Learning Finished!
With batch normalization, the loss drops much faster than in the earlier models (it tends to converge more quickly).
So it is generally worth adding batch normalization whenever you build a model.
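What speeds up convergence is that batch normalization standardizes each feature over the current batch before applying a learned scale and shift. A minimal NumPy sketch of the transform (the `batch_norm` helper and the `gamma`/`beta` defaults are illustrative, not part of the course code):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature (column) over the batch dimension,
    # then scale by gamma and shift by beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x)
# Each column of y now has (approximately) zero mean and unit variance.
```

In a Keras model this is what `keras.layers.BatchNormalization` computes per batch (with `gamma` and `beta` learned, plus running statistics for inference).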
step 18) MLP model with learning rate decay
learning_rate = 0.001
training_epochs = 30
batch_size = 100
n_class = 10
def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28,28)))
    model.add(keras.layers.Dense(256, activation='relu'))
    model.add(keras.layers.Dense(256, activation='relu'))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(10, activation='softmax'))
    return model
model = create_model()
model.summary()
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_5 (Flatten) (None, 784) 0
_________________________________________________________________
dense_23 (Dense) (None, 256) 200960
_________________________________________________________________
dense_24 (Dense) (None, 256) 65792
_________________________________________________________________
dense_25 (Dense) (None, 128) 32896
_________________________________________________________________
dense_26 (Dense) (None, 128) 16512
_________________________________________________________________
dense_27 (Dense) (None, 10) 1290
=================================================================
Total params: 317,450
Trainable params: 317,450
Non-trainable params: 0
_________________________________________________________________
@tf.function
def loss_fn(model, images, labels):
    predictions = model(images, training=True)
    loss = tf.reduce_mean(keras.losses.categorical_crossentropy(labels, predictions))
    return loss

@tf.function
def train(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

@tf.function
def evaluate(model, images, labels):
    predictions = model(images, training=False)
    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return accuracy
lr_schedule = keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=learning_rate,
                                                          decay_steps=n_train//batch_size*10,
                                                          decay_rate=0.5,
                                                          staircase=True)
optimizer = keras.optimizers.Adam(learning_rate=lr_schedule)
Note that learning rate decay does not touch the model itself: we still use the Adam optimizer, but pass it a learning rate schedule instead of a constant.
decay_steps controls how often the learning rate is reduced. A step means one weight update: one batch going through the model and updating the weights is one step.
To drop the learning rate every 10 epochs, code it as above: compute how many steps make up one epoch and multiply by 10. One epoch takes 600 steps (60,000 training examples divided by a batch size of 100).
So after 6,000 steps the learning rate is reduced by decay_rate=0.5, i.e. cut in half.
staircase=True keeps the learning rate constant for those 10 epochs and then drops it in one discrete, staircase-like step; the default is False.
With staircase=False, the rate instead decays smoothly, a little bit on every step.
It is often reported that holding the learning rate steady and then dropping it all at once (staircase) performs better, and this is one of the options commonly applied when building models.
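The schedule above can be sketched numerically. ExponentialDecay computes `lr = initial_lr * decay_rate ** (step / decay_steps)`, flooring the exponent to an integer when `staircase=True`. The helper below is an illustrative re-implementation of that formula (not the Keras code) showing the drop at the 6,000-step boundary:

```python
import math

def exponential_decay(step, initial_lr=0.001, decay_steps=6000,
                      decay_rate=0.5, staircase=True):
    # lr = initial_lr * decay_rate ** (step / decay_steps);
    # staircase=True floors the exponent, giving discrete drops.
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return initial_lr * decay_rate ** exponent

print(exponential_decay(0))       # 0.001  (epochs 1-10: rate unchanged)
print(exponential_decay(5999))    # 0.001  (still within the first 10 epochs)
print(exponential_decay(6000))    # 0.0005 (epoch 11: halved in one step)
```

With `staircase=False` the same call at step 3000 would already return a reduced rate, since the exponent 0.5 is applied directly instead of being floored to 0.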
# train my model
print('Learning started. It takes sometime.')
for epoch in range(training_epochs):
    avg_loss = 0.
    avg_train_acc = 0.
    avg_test_acc = 0.
    train_step = 0
    test_step = 0
    for images, labels in train_dataset:
        train(model, images, labels)
        loss = loss_fn(model, images, labels)
        acc = evaluate(model, images, labels)
        avg_loss = avg_loss + loss
        avg_train_acc = avg_train_acc + acc
        train_step += 1
    avg_loss = avg_loss / train_step
    avg_train_acc = avg_train_acc / train_step
    for images, labels in test_dataset:
        acc = evaluate(model, images, labels)
        avg_test_acc = avg_test_acc + acc
        test_step += 1
    avg_test_acc = avg_test_acc / test_step
    print('Epoch:', '{}'.format(epoch + 1),
          'loss =', '{:.8f}'.format(avg_loss),
          'train accuracy = ', '{:.4f}'.format(avg_train_acc),
          'test accuracy = ', '{:.4f}'.format(avg_test_acc))
print('Learning Finished!')
Learning started. It takes sometime.
Epoch: 1 loss = 0.47815746 train accuracy = 0.8287 test accuracy = 0.8438
Epoch: 2 loss = 0.33719864 train accuracy = 0.8759 test accuracy = 0.8656
Epoch: 3 loss = 0.30212611 train accuracy = 0.8877 test accuracy = 0.8709
Epoch: 4 loss = 0.27830932 train accuracy = 0.8960 test accuracy = 0.8706
Epoch: 5 loss = 0.26362100 train accuracy = 0.9019 test accuracy = 0.8759
Epoch: 6 loss = 0.24579105 train accuracy = 0.9081 test accuracy = 0.8769
Epoch: 7 loss = 0.23407455 train accuracy = 0.9118 test accuracy = 0.8761
Epoch: 8 loss = 0.22326103 train accuracy = 0.9159 test accuracy = 0.8845
Epoch: 9 loss = 0.21276423 train accuracy = 0.9194 test accuracy = 0.8842
Epoch: 10 loss = 0.20038223 train accuracy = 0.9252 test accuracy = 0.8771
Epoch: 11 loss = 0.18016288 train accuracy = 0.9312 test accuracy = 0.8936
Epoch: 12 loss = 0.16649547 train accuracy = 0.9358 test accuracy = 0.8920
Epoch: 13 loss = 0.15905622 train accuracy = 0.9394 test accuracy = 0.8919
Epoch: 14 loss = 0.15006126 train accuracy = 0.9430 test accuracy = 0.8913
Epoch: 15 loss = 0.14231375 train accuracy = 0.9447 test accuracy = 0.8930
Epoch: 16 loss = 0.13621068 train accuracy = 0.9479 test accuracy = 0.8920
Epoch: 17 loss = 0.12873869 train accuracy = 0.9514 test accuracy = 0.8955
Epoch: 18 loss = 0.12225182 train accuracy = 0.9528 test accuracy = 0.8980
Epoch: 19 loss = 0.11551828 train accuracy = 0.9554 test accuracy = 0.8955
Epoch: 20 loss = 0.10996552 train accuracy = 0.9585 test accuracy = 0.8973
Epoch: 21 loss = 0.09123135 train accuracy = 0.9648 test accuracy = 0.9001
Epoch: 22 loss = 0.08312258 train accuracy = 0.9680 test accuracy = 0.8961
Epoch: 23 loss = 0.07741531 train accuracy = 0.9706 test accuracy = 0.8970
Epoch: 24 loss = 0.07386006 train accuracy = 0.9720 test accuracy = 0.8997
Epoch: 25 loss = 0.06799655 train accuracy = 0.9743 test accuracy = 0.8974
Epoch: 26 loss = 0.06446645 train accuracy = 0.9760 test accuracy = 0.8993
Epoch: 27 loss = 0.05809086 train accuracy = 0.9780 test accuracy = 0.8986
Epoch: 28 loss = 0.05627620 train accuracy = 0.9798 test accuracy = 0.8984
Epoch: 29 loss = 0.05158266 train accuracy = 0.9818 test accuracy = 0.9000
Epoch: 30 loss = 0.04746664 train accuracy = 0.9829 test accuracy = 0.8989
Learning Finished!
With an easy dataset like this, it is common for training performance to approach 100%.
But if training performance is close to 100% while test performance falls below about 70%, you should suspect overfitting.
Working through the steps above, we combined various options to push the MLP's performance as far as we could. With careful tuning of these combinations it could probably be pushed a little higher.
Still, test performance is unlikely to reach 99% with an MLP.
With a CNN, however, it is possible to build a model whose performance approaches 99%.
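As a preview, a small CNN for this dataset might look like the sketch below. This is a hypothetical architecture, not the one from the course material; it assumes the same 28x28 inputs and 10 classes as above, so it can be trained with the same loop by swapping it in for create_model().

```python
import tensorflow as tf
from tensorflow import keras

def create_cnn():
    # Two conv/pool stages followed by a small dense head
    # (illustrative layer sizes, not tuned).
    model = keras.Sequential([
        keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
        keras.layers.Conv2D(32, 3, activation='relu', padding='same'),
        keras.layers.MaxPool2D(),
        keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
        keras.layers.MaxPool2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dense(10, activation='softmax'),
    ])
    return model
```

The Reshape layer adds the channel axis that Conv2D expects, so the (28, 28) batches produced by the existing dataset pipeline can be fed in unchanged.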