Python - PyTorch - INCLAVIC

Introduction

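Simplifies complex deep learning (DL) algorithms so the computations can be done with the PyTorch package's built-in functions
Lets you focus on building DL models
Can use a GPU to accelerate model training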

Course

You can refer to my course slides:

PyTorch Introduction

What is PyTorch

DL Framework
Main features:
- Build Neural Network
- Loss Function & Optimizers

Syntax

Tensor
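A multi-dimensional array (similar to a NumPy array)
- Element-wise addition, multiplication, division, and dot products can all be done on Tensors
- If you know NumPy well, switching over should be painless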
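A minimal sketch of the operations listed above, using two small example vectors (the values are arbitrary):

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])

added = a + b           # element-wise addition
multiplied = a * b      # element-wise multiplication
divided = b / a         # element-wise division
dot = torch.dot(a, b)   # inner product: 1*4 + 2*5 + 3*6

print(added, multiplied, divided, dot)
```

The operators behave like their NumPy counterparts: `+`, `*` and `/` act element by element, while `torch.dot` reduces two 1-D tensors to a single number.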

CUDA

Check that CUDA is available

import torch
torch.cuda.is_available()

"""
Output:

True
"""

Tensor

Create an empty Tensor (the memory is allocated but not initialized, so the values are whatever was already there)

# Create an empty 1-D tensor
x = torch.empty(5) # 1-D tensor with 5 elements
print(x)

# Create a tensor with a specified shape
x = torch.empty((1, 2)) # 2-D tensor: 1 row, 2 columns
print(x)

"""
Output:

tensor([-7.4865e+37,  3.2980e-41,  3.0829e-44,  0.0000e+00, -7.4866e+37])
tensor([[-6.2235e+37,  3.2980e-41]])
"""
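Because `torch.empty` only allocates memory, one common pattern is to write values into the tensor before reading it. A minimal sketch using the in-place `fill_` method (the shape and fill value are arbitrary):

```python
import torch

x = torch.empty((2, 3))   # contents are arbitrary until written
x.fill_(0.5)              # overwrite every element in place
print(x.shape, x)
```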

Create a Tensor filled with a given value

# Create a 2-D tensor filled with 0
x = torch.zeros((2, 3))
print(x)

# Create a 2-D tensor filled with 1
x = torch.ones((3, 2))
print(x)

"""
Output:
tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
"""
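For values other than 0 and 1, `torch.full` fills a tensor with any constant, and the `*_like` functions copy the shape and dtype of an existing tensor. A minimal sketch (7.0 is an arbitrary example value):

```python
import torch

x = torch.full((2, 3), 7.0)   # every element set to 7.0
z = torch.zeros_like(x)       # same shape and dtype as x, filled with 0
o = torch.ones_like(x)        # same shape and dtype as x, filled with 1
print(x)
```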

Create a Tensor filled with random values (torch.rand samples uniformly from [0, 1))

x = torch.rand((2, 2))
print(x)

"""
Output:

tensor([[0.3799, 0.7677],
        [0.2072, 0.4638]])
"""
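The random numbers change on every run. If you need reproducible results, one option is to fix the seed with `torch.manual_seed`; `torch.randn` is shown as well for normally distributed values. A minimal sketch (seed 0 is arbitrary):

```python
import torch

torch.manual_seed(0)        # fix the random seed
a = torch.rand((2, 2))      # uniform samples in [0, 1)

torch.manual_seed(0)        # reset to the same seed
b = torch.rand((2, 2))      # produces exactly the same numbers as a

n = torch.randn((2, 2))     # samples from a standard normal distribution
print(torch.equal(a, b))
```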

Create a tensor filled with values you specify (from a List)

x = torch.tensor([[1, 2], [3, 4]])
print(x)

"""
Output:

tensor([[1, 2],
        [3, 4]])
"""
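`torch.tensor` infers the dtype from the Python values it receives, which is why the output above has no decimal points. A minimal sketch, assuming the default float dtype has not been changed:

```python
import torch

ints = torch.tensor([[1, 2], [3, 4]])            # Python ints   -> torch.int64
floats = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # Python floats -> torch.float32
print(ints.dtype, floats.dtype)
```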

Check the Data Type

x = torch.ones((2, 2))
print(x)
print(x.dtype)

# torch.float32 corresponds to float in C++
# torch.float64 corresponds to double in C++

"""
Output:

tensor([[1., 1.],
        [1., 1.]])
torch.float32
"""

Set the Data Type

x = torch.ones((2, 2), dtype=torch.int64)
print(x.dtype)

"""
Output:

torch.int64
"""
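The dtype can also be changed after creation. A minimal sketch: `.to(dtype)` and shorthands such as `.int()` return a new tensor and leave the original untouched.

```python
import torch

x = torch.ones((2, 2))        # float32 by default
x64 = x.to(torch.float64)     # new float64 tensor
xi = x.int()                  # shorthand for x.to(torch.int32)
print(x.dtype, x64.dtype, xi.dtype)
```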

Data Size

x = torch.ones((2, 2))
print(x.size())
print(x.shape)

# A Tensor's size is the same thing as NumPy's shape
# (in PyTorch, x.size() and x.shape can even be used interchangeably)

"""
Output:

torch.Size([2, 2])
torch.Size([2, 2])
"""
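Since `torch.Size` behaves like a tuple, it can be indexed, and a few related helpers are handy when checking shapes. A minimal sketch:

```python
import torch

x = torch.ones((2, 3))
rows = x.size(0)         # size of dimension 0
cols = x.shape[1]        # shape can be indexed like a tuple
count = x.numel()        # total number of elements: 2 * 3
y = x.reshape(3, 2)      # same 6 elements arranged in a new shape
print(rows, cols, count, y.shape)
```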

Converting between Tensor and NumPy

x = torch.ones((1, 3))
print("tensor:", x)

y = x.numpy()
print("tensor to numpy:", y)

z = torch.from_numpy(y)
print("numpy to tensor:", z)

"""
Output:

tensor: tensor([[1., 1., 1.]])
tensor to numpy: [[1. 1. 1.]]
numpy to tensor: tensor([[1., 1., 1.]])
"""

# After conversion both share the same memory, so changes to one show up in the other!!!

x = torch.ones((1, 3))
y = x.numpy()

x += 1
print(x)
print(y)

# (numpy() and from_numpy() do not copy the data: the tensor and the array point to the same underlying memory)

"""
Output:

tensor([[2., 2., 2.]])
[[2. 2. 2.]]
"""
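If you need an independent copy rather than shared memory, copy explicitly. A minimal sketch contrasting the copying and sharing conversions (array values are arbitrary):

```python
import numpy as np
import torch

x = torch.ones((1, 3))
y = x.numpy().copy()            # NumPy copy: no longer shares memory with x

a = np.ones((1, 3), dtype=np.float32)
b = torch.tensor(a)             # torch.tensor always copies the data
shared = torch.from_numpy(a)    # from_numpy shares memory with a

x += 1
a += 1
print(y)       # still all 1s
print(b)       # still all 1s
print(shared)  # follows a, now all 2s
```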

Using the GPU

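# In theory, computation on a GPU is much faster than on a CPU
# but it still depends on the model. The model in this assignment is small, so the GPU and CPU may not differ noticeably; if you ever get to train a large model, you can enjoy the feeling of one run taking three days and the other three hours (?)

# This if checks whether PyTorch can use your computer's GPU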
if torch.cuda.is_available():
  device = 'cuda'
else:
  device = 'cpu'

x = torch.ones(3, device=device)
y = torch.zeros(3)
y = y.to(device) # returns a copy of y on device (the GPU if available); operations on it then run there

z = x + y
print(z)

"""
Output:

tensor([1., 1., 1.], device='cuda:0')
"""

# Arithmetic requires both tensors to be on the CPU or both on the GPU (so the error below only appears when CUDA is available)

if torch.cuda.is_available():
  device = 'cuda'
else:
  device = 'cpu'

x = torch.ones(3, device=device)
y = torch.zeros(3)
y = y.to('cpu')

# Now x is on the GPU and y is on the CPU, so they cannot be added
z = x + y
print(z)

"""
Output:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-21-65f30c7e20ec> in <cell line: 13>()
     11 
     12 # Now x is on the GPU and y is on the CPU, so they cannot be added
---> 13 z = x + y
     14 print(z)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
"""
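The fix is to move both operands to the same device before the operation. A minimal device-agnostic sketch that runs with or without a GPU:

```python
import torch

# Pick the GPU when PyTorch can use one, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(3, device=device)
y = torch.zeros(3)               # created on the CPU by default
z = x + y.to(device)             # move y to x's device before adding
print(z.device)

result = z.cpu()                 # move back to the CPU, e.g. before calling .numpy()
```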

Gradient

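When creating a value that needs to be updated:
- set requires_grad to True
- once it is on, every computation involving it automatically records a backward function (grad_fn)

# When creating a value that needs to be updated, set requires_grad to True
# Once it is on, computations automatically record a backward function (grad_fn)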

x = torch.tensor([5.], requires_grad=True)
print("x: ", x)

y = x + 2
print("y: ", y)

z = y ** 2
print("z: ", z)

"""
Output:

x:  tensor([5.], requires_grad=True)
y:  tensor([7.], grad_fn=<AddBackward0>)
z:  tensor([49.], grad_fn=<PowBackward0>)
"""

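Example:
- y = x + 2 and z = y * y
- How do we compute the gradient of z with respect to x when x = 5?

By hand, apply the chain rule: dz/dx = (dz/dy)(dy/dx) = 2y × 1 = 2(x + 2), which equals 2 × 7 = 14 at x = 5.

In code, write it like this: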

x = torch.tensor([5.], requires_grad=True)
print("x: ", x)

y = x + 2
print("y: ", y)

z = y ** 2
print("z: ", z)

# Computing the gradient means calling backward on the result
z.backward()
print("Gradient: ", x.grad)

"""
Output:

x:  tensor([5.], requires_grad=True)
y:  tensor([7.], grad_fn=<AddBackward0>)
z:  tensor([49.], grad_fn=<PowBackward0>)
Gradient:  tensor([14.])
"""
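As a sanity check, the autograd result can be compared with the chain-rule result dz/dx = 2(x + 2) and with a central finite difference. A minimal sketch; the helper `f` and the step size `h` are illustrative choices, not part of the course code:

```python
import torch

def f(x):
    # hypothetical helper reproducing y = x + 2, z = y ** 2
    y = x + 2
    return y ** 2

x = torch.tensor([5.], requires_grad=True)
f(x).backward()
autograd_grad = x.grad.item()

analytic_grad = 2 * (5.0 + 2)      # dz/dx = 2(x + 2) at x = 5

h = 1e-3
x0 = torch.tensor([5.], dtype=torch.float64)
numeric_grad = ((f(x0 + h) - f(x0 - h)) / (2 * h)).item()   # central difference

print(autograd_grad, analytic_grad, numeric_grad)
```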

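Stopping gradient tracking

How do you stop PyTorch from tracking gradients?
- When updating the neural network's weights, no gradient should be produced
There are three ways:
1. Turn it off directly by calling requires_grad_(False)
2. Use detach(), which returns a new tensor that shares the same data but has requires_grad set to False
3. with torch.no_grad(): requires_grad is left unchanged, but operations inside the block do not track gradients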

# 1. Turn it off directly with requires_grad_(False)
# requires_grad_ is a method and must be called; assigning x.requires_grad_ = False
# only replaces the method on x and leaves gradient tracking on

x = torch.ones(3, requires_grad=True)
print("x with requires_grad: ", x)

x.requires_grad_(False)
print("x without requires_grad: ", x)

"""
Output:

x with requires_grad:  tensor([1., 1., 1.], requires_grad=True)
x without requires_grad:  tensor([1., 1., 1.])
"""

# 2. Use detach(): returns a new tensor whose requires_grad is False
# It does not copy the data; it shares the same memory as the original

x = torch.ones(3, requires_grad=True)
print("x: ", x)

# y gets x's data without gradient tracking (no copy is made)
y = x.detach()
print("y: ", y)

y += 1
print(x) # x changed as well, because the memory is shared

"""
Output:

x:  tensor([1., 1., 1.], requires_grad=True)
y:  tensor([1., 1., 1.])
tensor([2., 2., 2.], requires_grad=True)
"""
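If you want a tensor that is both detached from the graph and independent in memory, one common pattern is to chain `.clone()` after `.detach()`. A minimal sketch:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.detach().clone()   # detached from the graph AND has its own memory

y += 1
print(x)   # x is unchanged
print(y)
```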

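# 3. with torch.no_grad(): requires_grad is left unchanged, but operations inside the block do not track gradients
# The most common approach, and one you will always use
# Used most often in the inference stage (running predictions with the model)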

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
  print("x: ", x)
  y = 2 * (x + 5)
  print("y: ", y)

  # x's requires_grad is not turned off,
  # but the computation that stores its result in y does not track gradients

"""
Output:

x:  tensor([1., 1., 1.], requires_grad=True)
y:  tensor([12., 12., 12.])
"""
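`torch.no_grad()` can also decorate a function, which is convenient for a prediction helper. A minimal sketch; `predict` is a hypothetical function name:

```python
import torch

x = torch.ones(3, requires_grad=True)

@torch.no_grad()          # same context manager, used as a decorator
def predict(inp):
    # hypothetical helper: nothing computed inside is tracked
    return 2 * (inp + 5)

y = predict(x)
w = 2 * (x + 5)           # outside no_grad: tracked again
print(y.requires_grad, w.requires_grad)
```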

Clearing gradients

PyTorch accumulates gradients, so you need to clear them after each use
- x.grad.zero_()

# PyTorch accumulates gradients, so you need to clear them after each use

x = torch.tensor(1.0, requires_grad=True)
y = x + 2
z = y * y

z.backward()
print("x with gradient: ", x.grad)

x.grad.zero_()
print("x with grad_zero: ", x.grad)

"""
Output:

x with gradient:  tensor(6.)
x with grad_zero:  tensor(0.)
"""
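To see the accumulation itself, run backward twice without clearing: the second gradient is added on top of the first. A minimal sketch with the same x = 1, where dz/dx = 2(x + 2) = 6 per call:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)

for step in range(2):
    z = (x + 2) ** 2      # rebuild the graph each time
    z.backward()          # adds dz/dx = 6 into x.grad
accumulated = x.grad.item()   # 6 + 6

x.grad.zero_()
z = (x + 2) ** 2
z.backward()
fresh = x.grad.item()         # 6 again after clearing

print(accumulated, fresh)
```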

Module

Model
- Build the network as a model
Optimizer
- How gradient descent is carried out
- Adam, SGD, etc.
Criterion
- Loss Function

Model

Define each layer of the model, in order
Define the order of the model's forward pass

import torch.nn as nn

# Suppose the input is a single number and the model computes f = wx + b

class Net(nn.Module):
  def __init__(self, in_channels, out_channels):
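    # model layers

    # Custom neural network: initialize the attributes inherited from the parent class (nn.Module)
    super(Net, self).__init__()

    # nn.Linear defines a linear (fully connected) layer
    # in_channels is the number of input features from the previous layer
    # out_channels is the number of output features (neurons) of this layer
    # (nn.Linear itself calls these in_features and out_features)
    # bias defaults to True
    self.fc = nn.Linear(in_channels, out_channels)

  def forward(self, x):
    # model structure
    # The output x is the input x after passing through self.fc
    x = self.fc(x)
    return x

# Declare the model
# The input is a single number and the output is also a single number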
model = Net(in_channels=1, out_channels=1)
print(model)

"""
Output:

Net(
  (fc): Linear(in_features=1, out_features=1, bias=True)
)
"""
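The learnable values that the optimizer will update can be listed from any module. A minimal sketch using a standalone `nn.Linear(1, 1)`, the same layer type as `Net.fc`, so it runs without the class above:

```python
import torch.nn as nn

layer = nn.Linear(in_features=1, out_features=1)   # same layer type as Net.fc

for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

n_params = sum(p.numel() for p in layer.parameters())
print(n_params)   # one weight + one bias
```

Parameters created by `nn.Linear` have `requires_grad=True` by default, which is why no manual flag is needed when building a model.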

Optimizer

How gradient descent is carried out
Adam, SGD, etc.
The way the weights are updated
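For plain SGD (no momentum, no weight decay), one `optimizer.step()` applies w ← w − lr × grad. A minimal sketch on a toy loss chosen only for illustration, (3w − 6)², whose gradient at w = 1 is −18:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w * 3.0 - 6.0) ** 2      # toy loss, minimum at w = 2
optimizer.zero_grad()
loss.backward()                  # d(loss)/dw = 2 * (3w - 6) * 3 = -18 at w = 1

expected = w.item() - 0.1 * w.grad.item()   # SGD rule: w <- w - lr * grad
optimizer.step()
print(w.item(), expected)        # both about 2.8
```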

Criterion

Loss Function
Mean Squared Error, Cross Entropy, Binary Cross Entropy, etc.
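A minimal sketch of what two of these criteria compute, checked against the formulas written out by hand (all values are arbitrary):

```python
import torch
import torch.nn as nn

pred = torch.tensor([[2.5], [0.0], [2.0]])
target = torch.tensor([[3.0], [-0.5], [2.0]])

mse = nn.MSELoss()(pred, target)
manual_mse = ((pred - target) ** 2).mean()   # mean of squared differences

logits = torch.tensor([[2.0, 0.5, -1.0]])    # raw scores for 3 classes, one sample
label = torch.tensor([0])                    # index of the correct class
ce = nn.CrossEntropyLoss()(logits, label)    # applies log-softmax internally
manual_ce = -torch.log_softmax(logits, dim=1)[0, 0]

print(mse.item(), ce.item())
```

Note that `nn.CrossEntropyLoss` expects raw scores (logits), not probabilities, because it applies the softmax itself.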

Model Training

Set the model to train mode
Forward
Calculate Loss
Backward
Update the weights (optimizer.step())

# Initialize X and Y as 3 x 1 matrices
X = torch.tensor([[1.], [2.], [3.]])
Y = torch.tensor([[4.], [6.], [8.]])

# Use Stochastic Gradient Descent (SGD) as the optimizer, with a learning rate of 0.01
# Use Mean Squared Error as the loss function (criterion)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
epoch = 500

# Put the model in train mode
model.train()

for e in range(epoch):
    # zip iterates over X and Y together, one (x, y) pair at a time
    for (x, y) in zip(X, Y):
        y_pred = model(x)
        loss = criterion(y_pred, y)

        optimizer.zero_grad()  # reset the gradients to zero
        loss.backward()        # backward computes the gradient of every parameter
        optimizer.step()       # one gradient descent step updates the parameters

    # Print the loss every 100 epochs (the loss of the last sample in that epoch)
    if (e + 1) % 100 == 0:
        print(f'epoch: {e + 1}, loss: {loss.item()}')

"""
Output:

epoch: 100, loss: 1.128821986640105e-05
epoch: 200, loss: 2.5122462830040604e-06
epoch: 300, loss: 5.59026375412941e-07
epoch: 400, loss: 1.2383770808810368e-07
epoch: 500, loss: 2.7535861590877175e-08
"""

Model Testing

Set the model to eval mode (evaluation mode)
Stop gradient tracking

# model.fc here reuses the layer of the model built and trained above,
# so that code must be run first
w, b = model.fc.weight.item(), model.fc.bias.item()
print(f'f(x) = {w:.4}x + {b:.4}')

# Set the model to eval mode (evaluation mode)
model.eval()

# Stop gradient tracking
with torch.no_grad():
    # See what the model outputs for the test input x = 10
    print(model(torch.tensor([10.])))

# The relationship between X and Y above is Y = 2 * X + 2,
# and the trained result f(x) = 2.0x + 1.999 is very close to it

"""
Output:

f(x) = 2.0x + 1.999
tensor([22.0016])
"""
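`model.eval()` does not by itself stop gradients; it switches layers such as Dropout and BatchNorm to their inference behavior. The linear model above has no such layers, so its output is the same in both modes. A minimal sketch with a standalone `nn.Dropout` (p = 0.5 and the seed are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(8)

dropout.train()
train_out = dropout(x)    # random elements zeroed, the rest scaled by 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)     # eval mode: dropout passes the input through unchanged

print(dropout.training, train_out, eval_out)
```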

Reference

Slides from previous years' TAs <(_ _)>
Prof. 黃貞瑛's course and slides