Liu Yang's Blog

Differences between matmul, dot, and multiply


multiply / *: element-wise product

np.multiply(a, b) and a * b are equivalent: both multiply the two arrays element by element.
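As a quick check of that equivalence, here is a minimal sketch. The arrays `a`, `b`, and `row` are made-up illustration values, not taken from the original post.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration.
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

prod_func = np.multiply(a, b)  # function form
prod_op = a * b                # operator form

# Both forms multiply matching positions: [[1*10, 2*20], [3*30, 4*40]].
print(np.array_equal(prod_func, prod_op))

# Shapes do not have to be identical; normal broadcasting applies.
row = np.array([10, 100])
scaled = a * row  # each row of a is multiplied element-wise by row
print(scaled)
```

Because this is ordinary broadcasting, the shapes only need to be broadcast-compatible, unlike matrix multiplication, where the inner dimensions must match.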

matmul / @: matrix multiplication

This is matrix multiplication. The exact rules, quoted from the NumPy documentation, are as follows:

The behavior depends on the arguments in the following way.
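# If both are 2-D, this is ordinary matrix multiplication
If both arguments are 2-D they are multiplied like conventional matrices.
# If either one is N-D, it is treated as a stack of matrices
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
# For example, for 3-D a and b: matmul(a, b)[0, 1, 1] == sum(a[0, 1, :] * b[0, :, 1])
# If either argument is 1-D, it is temporarily promoted to a matrix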
If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
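The rules above can be checked on small arrays. This is only a sketch: the names `m1`, `m2`, `a3`, `b3`, `v3`, and `v4` are illustrative, and the 3-D arrays follow the `arange` pattern used in the NumPy documentation example.

```python
import numpy as np

# Rule 1: 2-D @ 2-D is conventional matrix multiplication.
m1 = np.arange(6).reshape(2, 3)
m2 = np.arange(12).reshape(3, 4)
shape_2d = (m1 @ m2).shape            # (2, 4)

# Rule 2: N-D inputs are stacks of matrices in the last two axes.
a3 = np.arange(2 * 2 * 4).reshape(2, 2, 4)
b3 = np.arange(2 * 2 * 4).reshape(2, 4, 2)
stacked = np.matmul(a3, b3)           # shape (2, 2, 2)
entry = stacked[0, 1, 1]
manual = np.sum(a3[0, 1, :] * b3[0, :, 1])  # same entry, computed by hand

# Rule 3: a 1-D argument is promoted to a matrix, then the extra axis is dropped.
v3 = np.arange(3)
v4 = np.arange(4)
shape_left = (v3 @ m2).shape          # (3,) @ (3, 4) -> (4,)
shape_right = (m2 @ v4).shape         # (3, 4) @ (4,) -> (3,)

print(shape_2d, stacked.shape, entry, manual, shape_left, shape_right)
```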

dot: dot product, also called scalar product or inner product

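In the mathematical sense, the dot product is a binary operation that takes two real vectors and returns a single real scalar. np.dot generalizes this to higher dimensions: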

Dot product of two arrays. Specifically,
# If both are 1-D, return the inner product
If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
# If both are 2-D, this is matrix multiplication, i.e. @ or matmul
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
# If either one is a scalar, this is element-wise multiplication, i.e. multiply
If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.
# If a is N-D and b is 1-D, treat a as a stack of 1-D arrays and take the inner product of each with b
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
# If a is N-D and b is M-D (M >= 2), the last axis of a is contracted with the second-to-last axis of b
If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b:
`dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])`
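Each case of np.dot can be exercised on small arrays. The sketch below uses made-up values and illustrative names (`v`, `w`, `A`, `B`, `T`, `u`, `S`), none of which come from the original post.

```python
import numpy as np

# 1-D . 1-D: inner product, 1*4 + 2*5 + 3*6.
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
inner = np.dot(v, w)

# 2-D . 2-D: same as matrix multiplication.
A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
same_as_matmul = np.array_equal(np.dot(A, B), A @ B)

# Scalar operand: same as element-wise multiplication.
same_as_multiply = np.array_equal(np.dot(2, A), 2 * A)

# N-D . 1-D: sum product over the last axis of the N-D array.
T = np.arange(24).reshape(2, 3, 4)
u = np.arange(4)
nd_1d = np.dot(T, u)                     # shape (2, 3)
nd_1d_manual = (T * u).sum(axis=-1)

# N-D . M-D: last axis of T contracted with second-to-last axis of S.
S = np.arange(40).reshape(2, 4, 5)
nd_md_shape = np.dot(T, S).shape         # (2, 3) + (2, 5)

print(inner, same_as_matmul, same_as_multiply, nd_1d.shape, nd_md_shape)
```

Note how the last case keeps the leading axes of both inputs side by side rather than pairing them up.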

Comparing the last rule of dot with the second rule of @

When b is a single 2-D matrix

import numpy as np

low = 0
high = 100
a_shape = (2, 3, 4)
b_shape = (4, 5)
# Define a 3-D array of shape (2, 3, 4)
a = np.random.randint(low, high, size=a_shape)
# Define a 2-D array of shape (4, 5)
b = np.random.randint(low, high, size=b_shape)
r1, r2 = np.dot(a, b), np.matmul(a, b)
print(r1.shape, r2.shape)
print(np.array_equal(r1, r2))

Output (this is a special case in which the two results coincide):

(2, 3, 5) (2, 3, 5)
True

Now consider the case where b is a stack of matrices

low = 0
high = 100
a_shape = 2, 2, 4
b_shape = 2, 4, 2
a = np.random.randint(low, high, size=a_shape)
b = np.random.randint(low, high, size=b_shape)
r1, r2 = np.dot(a, b), np.matmul(a, b)
# For matmul, matrices at the same position in a and b are multiplied
print(np.array_equal(a[0] @ b[0], r2[0]), np.array_equal(a[1] @ b[1], r2[1]))
# Check whether the matmul and dot results are equal
print(r1.shape, r2.shape)
np.array_equal(r1, r2),np.array_equal(r1[0][0], r2[0])

Output:

True True
(2, 2, 2, 2) (2, 2, 2)
(False, False)

Analysis of the results

dot: `dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])` (expression from the official NumPy documentation)
matmul: `matmul(a, b)[i,j,m] = sum(a[i,j,:] * b[i,:,m])` (derived by hand for this 3-D case)

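Let a[i,j,:] denote one row of a; with shape (2, 2, 4) there are 4 such rows (2 matrices × 2 rows). Let b[k,:,m] denote one column of b; with shape (2, 4, 2) there are 4 such columns (2 matrices × 2 columns).
For matmul, i and k are the same index, so matrices at the same position are multiplied and the result holds two matrices:
a[0] @ b[0]
a[1] @ b[1]
For dot, i and k vary independently, so every matrix of a is multiplied with every matrix of b, giving four matrices:
a[0] @ b[0]    a[0] @ b[1]
a[1] @ b[0]    a[1] @ b[1]
i selects a matrix from a and k selects a matrix from b, so setting k equal to i recovers the matmul result: r1[i, :, i] equals r2[i].
i = 0
np.array_equal(r1[i,:,i], r2[i])
# Result: True
In particular, when b holds only a single matrix, dot gives the same result as matmul, as the first experiment above confirms.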
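The two index formulas in this analysis can also be written as np.einsum subscripts, which makes the difference between the free index k (dot) and the shared index i (matmul) explicit. This is a sketch with a seeded random generator so that it is reproducible; the einsum formulation is an illustration, not part of the original experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=(2, 2, 4))
b = rng.integers(0, 100, size=(2, 4, 2))

# dot: i and k are independent, so both survive in the output.
dot_like = np.einsum("ijx,kxm->ijkm", a, b)
# matmul: the batch index i is shared by a and b.
matmul_like = np.einsum("ijx,ixm->ijm", a, b)

dot_matches = np.array_equal(dot_like, np.dot(a, b))
matmul_matches = np.array_equal(matmul_like, np.matmul(a, b))

# Taking k == i from the dot result recovers the matmul result.
diagonal = np.stack([np.dot(a, b)[i, :, i] for i in range(a.shape[0])])
diagonal_matches = np.array_equal(diagonal, np.matmul(a, b))

print(dot_matches, matmul_matches, diagonal_matches)
```

In practice, computing the full dot result only to keep the k == i slices wastes work, so matmul is the natural choice when batched multiplication is what you want.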

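Summary

multiply / * is element-wise multiplication.

matmul / @ is matrix multiplication. For higher-dimensional inputs, the last two axes are treated as matrices and all leading axes act as batch dimensions. If the batch dimensions differ they are broadcast, and after broadcasting the matrices at the same batch position in A and B are multiplied.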
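To make the batch broadcasting concrete, here is a small sketch with mismatched batch dimensions. The shapes and the names `A`, `B`, and `out` are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(2, 1, 3, 4))  # batch dims (2, 1), matrices (3, 4)
B = rng.integers(0, 10, size=(5, 4, 2))     # batch dims (5,),   matrices (4, 2)

out = A @ B
# Batch dims (2, 1) and (5,) broadcast to (2, 5); each matrix product is (3, 2).
print(out.shape)

# After broadcasting, out[i, j] is the product of A[i, 0] and B[j].
pairwise_ok = all(
    np.array_equal(out[i, j], A[i, 0] @ B[j])
    for i in range(2)
    for j in range(5)
)
print(pairwise_ok)
```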

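dot is more complicated:

  1. If both are 1-D, it returns the inner product.
  2. If both are 2-D, it is matrix multiplication, i.e. @ or matmul.
  3. If either is a scalar, it is element-wise multiplication, i.e. multiply.
  4. If A is N-D and B is 1-D, A is treated as a stack of 1-D arrays and the inner product of each with B is computed.
  5. If A is N-D and B is M-D (M >= 2), it still performs matrix multiplication, but every matrix of A is multiplied with every matrix of B, unlike matmul, which pairs matrices at the same batch position.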

References

numpy.matmul — NumPy v2.1 Manual

numpy.dot — NumPy v2.1 Manual