Liu Yang's Blog

Notes on Loop and Matrix Implementations of the SVM Loss


The SVM Loss Formula

$$L_i = \sum_{j \neq y_i} \max(0, W_j^T X_i - W_{y_i}^T X_i + \Delta)$$

If we negate $W_j^T X_i - W_{y_i}^T X_i + \Delta$, we get $-(W_j^T X_i - W_{y_i}^T X_i + \Delta) = W_{y_i}^T X_i - W_j^T X_i - \Delta$.

Clearly, we want the correct class's score to exceed every incorrect class's score by at least the margin $\Delta$. From the optimization point of view we minimize the loss, so putting the negative sign in front (and clamping at zero) is exactly the hinge loss.
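As a quick numeric sketch of the formula for a single sample (the scores below are made up for illustration, with $\Delta = 1$):

import torch

# Hypothetical scores W^T x for one sample over C = 3 classes
scores = torch.tensor([3.2, 5.1, -1.7])
y_i = 0          # assume class 0 is the correct class
delta = 1.0      # margin

# Hinge loss: sum over incorrect classes of max(0, s_j - s_{y_i} + delta)
loss_i = sum(max(0.0, (scores[j] - scores[y_i] + delta).item())
             for j in range(len(scores)) if j != y_i)
print(loss_i)  # max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9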

Naive SVM Loss

Each iteration processes one sample of shape (1, F), where F is the number of features. Compute its scores over the classes, shape (1, C), then plug each incorrect class's score and the correct class's score into the formula; whenever the result is > 0, that class contributes to the loss.

In that case, for the two columns of W involved, the coefficient of W in both terms is this input sample, so according to the sign we subtract x[i] from $dW_{y_i}$ and add x[i] to $dW_j$.

Finally, add the gradient of the regularization term.

The code is as follows:

import torch


def svm_loss_naive(
    W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float
):
    """
    Structured SVM loss function, naive implementation (with loops).
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples. When you implement the regularization over W, please DO NOT
    multiply the regularization term by 1/2 (no coefficient).
    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            # -(correct_class_score - scores[j] - 1)
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                #######################################################################
                # TODO:                                                               #
                # Compute the gradient of the SVM term of the loss function and store #
                # it on dW. (part 1) Rather than first computing the loss and then    #
                # computing the derivative, it is simple to compute the derivative    #
                # at the same time that the loss is being computed.                   #
                #######################################################################
                # Replace "pass" statement with your code
                # In the formula, the coefficient of W in both terms is x[i]
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]
                #######################################################################
                #                         END OF YOUR CODE                            #
                #######################################################################

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train

    # Add regularization to the loss.
    loss += reg * torch.sum(W * W)

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function w.r.t. the regularization term  #
    # and add it to dW. (part 2)                                                #
    #############################################################################
    dW /= num_train
    dW += reg * 2 * W
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
    return loss, dW
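A small usage sketch of the naive version on random data (the sizes and the centered-difference spot check are made up for illustration, not part of the assignment code):

torch.manual_seed(0)
N, D, C = 5, 4, 3                      # made-up sizes
X = torch.randn(N, D, dtype=torch.float64)
y = torch.randint(C, (N,))
W = torch.randn(D, C, dtype=torch.float64) * 0.01

loss, dW = svm_loss_naive(W, X, y, reg=1e-3)

# Spot-check one entry of dW against a centered numerical difference
h = 1e-6
Wp = W.clone(); Wp[0, 0] += h
Wm = W.clone(); Wm[0, 0] -= h
num_grad = (svm_loss_naive(Wp, X, y, 1e-3)[0] - svm_loss_naive(Wm, X, y, 1e-3)[0]) / (2 * h)
print(loss, dW[0, 0].item(), num_grad.item())  # analytic and numeric should agree closely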

Vectorized SVM Loss

We can compute the scores for all samples at once with (N, F) @ (F, C), pull out the correct-class scores by indexing, shape (N, 1), and then compute the margin matrix (N, C) from the formula.

Clearly, the classes that contribute to the loss are the ones whose margin is greater than 0, so we can build a mask of shape (N, C). Note that the correct class appears in every violated margin term as well, so its mask entry is minus the number of margin-violating classes.

Then the transposed input, (N, F).T, multiplied by the mask (N, C) gives (F, C); adding the gradient of the regularization term yields the final result.

One way to see it: after transposing the input, each sample is a column vector, and each column of the mask describes a linear combination of samples, namely the margin-violating contributions for that class, so the product is exactly the gradient.

This kind of "the shapes work out exactly" situation is very common in deep learning.
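A tiny worked sketch of the mask construction (two samples, three classes, made-up margin values) makes the negative count on the correct class concrete:

import torch

# Suppose these are the margins max(0, s_j - s_{y_i} + 1) for N = 2, C = 3,
# with the correct-class entries already zeroed out.
margin = torch.tensor([[0.0, 2.9, 0.0],    # sample 0, correct class 0
                       [1.3, 0.0, 0.4]])   # sample 1, correct class 1
y = torch.tensor([0, 1])
N = margin.shape[0]

mask = (margin > 0).to(margin.dtype)        # 1 where a class violated the margin
mask_sum = mask.sum(dim=1)                  # number of violating classes per sample
mask[torch.arange(N), y] -= mask_sum        # correct class gets minus that count
print(mask)
# tensor([[-1.,  1.,  0.],
#         [ 1., -2.,  1.]])
# Then dW = X.t() @ mask / N (plus the regularization gradient).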

The code is as follows:

def svm_loss_vectorized(
    W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float
):
    """
    Structured SVM loss function, vectorized implementation. When you implement
    the regularization over W, please DO NOT multiply the regularization term by
    1/2 (no coefficient). The inputs and outputs are the same as svm_loss_naive.
    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    loss = 0.0
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    # Replace "pass" statement with your code
    N = X.shape[0]
    scores = X @ W  # (N, C)
    y_correct_pred = scores[torch.arange(N), y].reshape(-1, 1)  # (N, 1)
    margin = -(y_correct_pred - scores - 1)  # (N, C), i.e. scores - correct + delta
    margin[torch.arange(N), y] = 0  # the correct class contributes no margin of its own
    margin = torch.maximum(margin, torch.tensor(0.0))  # (N, C)
    loss = torch.sum(margin) / N + reg * torch.sum(W * W)
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    mask = (margin > 0).to(X.dtype)  # 1 for every class that violated the margin
    mask_sum = torch.sum(mask, dim=1)
    # The correct class appears in every violated term, so it gets minus that count;
    # this is what pulls its score up relative to the other classes.
    mask[torch.arange(N), y] -= mask_sum
    dW = X.t() @ mask
    # Normalize and add the regularization gradient
    dW /= N
    dW += 2 * reg * W
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
    return loss, dW
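To convince ourselves the two versions agree, a short comparison sketch on random data (the sizes are arbitrary):

torch.manual_seed(0)
N, D, C = 64, 32, 10
X = torch.randn(N, D, dtype=torch.float64)
y = torch.randint(C, (N,))
W = torch.randn(D, C, dtype=torch.float64) * 0.01

loss_naive, dW_naive = svm_loss_naive(W, X, y, reg=1e-3)
loss_vec, dW_vec = svm_loss_vectorized(W, X, y, reg=1e-3)

print((loss_naive - loss_vec).abs().item())   # should be ~0
print(torch.norm(dW_naive - dW_vec).item())   # should be ~0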