Notes on Loop and Matrix Implementations of the SVM Loss
The SVM Loss Formula
$$
L_i = \sum_{j \neq y_i} \max\bigl(0,\ s_j - s_{y_i} + \Delta\bigr)
$$

If we negate $s_{y_i} - s_j - \Delta$, we get $s_j - s_{y_i} + \Delta$, which is exactly the term inside the max. Clearly, the correct class score should be higher than every misclassified score, and higher by at least a margin $\Delta$. From the optimization point of view, we ultimately want to minimize the loss, so the negative sign is placed in front, and this is what forms the hinge loss.
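As a quick sanity check of the formula, here is a minimal sketch with made-up scores for a single sample over three classes, using $\Delta = 1$ as in the code below:

```python
import torch

# Made-up scores for one sample over C = 3 classes; class 0 is the correct one.
scores = torch.tensor([3.2, 5.1, -1.7])
y_i = 0
delta = 1.0

# L_i = sum over j != y_i of max(0, s_j - s_{y_i} + delta)
loss_i = sum(max(0.0, (scores[j] - scores[y_i] + delta).item())
             for j in range(len(scores)) if j != y_i)
print(loss_i)  # max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) ≈ 2.9
```

Only class 1 violates the margin here, so it alone contributes to $L_i$.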
Naive SVM Loss
Each iteration processes one sample of shape (1, F), where F is the number of features, and computes its scores over all classes, shape (1, C). For every class, plug that class's score and the correct class's score into the formula; whenever the result is > 0, it contributes to the loss.
For the gradient, the margin $s_j - s_{y_i} + 1$ involves the weight columns of two classes, and in both terms the coefficient is the input sample itself ($s_j = W_{:,j}^\top x_i$, $s_{y_i} = W_{:,y_i}^\top x_i$). Following the signs, $dW_{:,y_i}$ subtracts $x_i$ and $dW_{:,j}$ adds $x_i$.
Finally, add the contribution of the regularization term (its gradient is $2 \cdot \text{reg} \cdot W$).
The code is as follows:
```python
import torch


def svm_loss_naive(W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples. When you implement the regularization over W, please DO NOT
    multiply the regularization term by 1/2 (no coefficient).

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c
      means that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # Compute the loss and the gradient.
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            # -(correct_class_score - scores[j] - 1)
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                # Compute the gradient at the same time as the loss: in the
                # margin, the coefficient of both weight columns is X[i].
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train

    # Add regularization to the loss.
    loss += reg * torch.sum(W * W)

    # Average the data gradient over the batch; the gradient of the
    # regularization term reg * sum(W * W) is 2 * reg * W.
    dW /= num_train
    dW += reg * 2 * W

    return loss, dW
```
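To verify that the gradient accumulated inside the loop is correct, the analytic dW can be compared against a centered finite-difference approximation. This is only a sketch: the shapes, the seed, and the step size h are arbitrary choices, and the hinge loss is not differentiable exactly at a kink, so an occasional small mismatch there is expected.

```python
import torch

torch.manual_seed(0)
N, D, C = 5, 4, 3
X = torch.randn(N, D, dtype=torch.float64)
y = torch.randint(C, (N,))
W = torch.randn(D, C, dtype=torch.float64)
reg = 0.1

loss, dW = svm_loss_naive(W, X, y, reg)

h = 1e-6
for idx in [(0, 0), (1, 2), (3, 1)]:   # spot-check a few entries of W
    W_plus, W_minus = W.clone(), W.clone()
    W_plus[idx] += h
    W_minus[idx] -= h
    num_grad = (svm_loss_naive(W_plus, X, y, reg)[0]
                - svm_loss_naive(W_minus, X, y, reg)[0]) / (2 * h)
    print(idx, float(dW[idx]), float(num_grad))  # the two numbers should agree
```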
Vectorized SVM Loss
We can compute all the scores directly with (N, F) @ (F, C), index out the correct-class scores as an (N, 1) column, and apply the formula to obtain the margins, shape (N, C).
The classes that contribute to the loss are exactly those whose margin is greater than 0, so we can build an (N, C) mask. Note that the correct class also takes part: its mask entry is the negative of the number of margin-violating classes for that sample.
Then input (N, F).T @ mask (N, C) gives an (F, C) matrix; adding the regularization term yields the final gradient.
One way to understand this: after transposing the input, every sample is a column vector, and each column of the mask describes a linear combination of samples (namely, the combination of margin-violating samples that contribute to the loss), so the product is exactly the gradient we want. This kind of "it works out exactly" structure is very common in deep learning.
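To make the mask construction concrete, here is a small sketch on a made-up margin matrix with 2 samples and 3 classes; it only illustrates that each correct-class entry ends up as minus the number of margin-violating classes for that sample:

```python
import torch

margin = torch.tensor([[0.0, 2.3, 0.0],    # sample 0: only class 1 violates the margin
                       [1.1, 0.0, 0.7]])   # sample 1: classes 0 and 2 violate it
y = torch.tensor([0, 1])                   # correct class of each sample
N = margin.shape[0]

mask = (margin > 0).to(margin.dtype)       # 1 wherever a class contributes to the loss
mask_sum = torch.sum(mask, dim=1)          # number of violating classes per sample
mask[torch.arange(N), y] -= mask_sum       # correct-class column gets -count
print(mask)
# tensor([[-1.,  1.,  0.],
#         [ 1., -2.,  1.]])
```

Multiplying X.t() of shape (F, N) with this (N, C) mask then produces, column by column, exactly the signed sums of samples that the naive double loop accumulates into dW.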
The code is as follows:
```python
def svm_loss_vectorized(W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float):
    """
    Structured SVM loss function, vectorized implementation.

    When you implement the regularization over W, please DO NOT multiply the
    regularization term by 1/2 (no coefficient). The inputs and outputs are
    the same as svm_loss_naive.

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c
      means that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    loss = 0.0
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # Vectorized loss.
    N = X.shape[0]
    scores = X @ W                                              # (N, C)
    y_correct_pred = scores[torch.arange(N), y].reshape(-1, 1)  # (N, 1)
    margin = -(y_correct_pred - scores - 1)                     # (N, C)
    margin[torch.arange(N), y] = 0                              # correct class contributes no loss
    margin = torch.maximum(margin, torch.tensor(0.0))           # (N, C)
    loss = torch.sum(margin) / N + reg * torch.sum(W * W)

    # Vectorized gradient, reusing the margins computed above.
    mask = (margin > 0).to(X.dtype)        # 1 for each margin-violating class
    mask_sum = torch.sum(mask, dim=1)      # number of violating classes per sample
    mask[torch.arange(N), y] -= mask_sum   # correct-class column gets -count
    dW = X.t() @ mask                      # (D, C)

    # Normalize and add the regularization term.
    dW /= N
    dW += 2 * reg * W

    return loss, dW
```
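Finally, assuming both functions above are defined in the same scope, the two implementations can be cross-checked on random data; the shapes, seed, and regularization strength here are arbitrary:

```python
import torch

torch.manual_seed(0)
N, D, C = 64, 128, 10
X = torch.randn(N, D, dtype=torch.float64)
y = torch.randint(C, (N,))
W = 0.001 * torch.randn(D, C, dtype=torch.float64)
reg = 0.05

loss_naive, dW_naive = svm_loss_naive(W, X, y, reg)
loss_vec, dW_vec = svm_loss_vectorized(W, X, y, reg)

print(float(loss_naive - loss_vec))            # should be ~0
print(float((dW_naive - dW_vec).abs().max()))  # should be ~0
```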