The difference between matmul, dot, and multiply • Liu Yang's Blog

`multiply` and `*`: element-wise product

The two are equivalent; both perform element-wise multiplication.
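As a minimal sketch (the array values are made up for illustration and are not from the original post), the following checks that `np.multiply` and `*` give the same element-wise result, and that the usual broadcasting applies:

```python
import numpy as np

# Illustrative values only
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

elementwise_func = np.multiply(a, b)   # function form
elementwise_op = a * b                 # operator form

print(elementwise_op.tolist())                           # [[10, 40], [90, 160]]
print(np.array_equal(elementwise_func, elementwise_op))  # True

# Broadcasting: a (2, 2) array times a (2,) array scales each column
scaled = a * np.array([1, 100])
print(scaled.tolist())                                   # [[1, 200], [3, 400]]
```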

`matmul` and `@`: matrix multiplication

Both perform matrix multiplication. The specific rules are as follows.

The behavior depends on the arguments in the following way.
# If both are 2-D, it is ordinary matrix multiplication
If both arguments are 2-D they are multiplied like conventional matrices.
# If either is N-D, it is treated as a stack of matrices (the last two axes are the matrix)
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For example, matmul(a, b)[0, 1, 1] = sum(a[0, 1, :] * b[0, :, 1])
# If either argument is 1-D, it is temporarily promoted to a matrix
If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
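The rules above can be checked on small arrays. This is only a sketch: the shapes, the seed, and the variable names are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Rule 1: both 2-D -> conventional matrix product
A = rng.integers(0, 10, size=(3, 4))
B = rng.integers(0, 10, size=(4, 5))
print(np.matmul(A, B).shape)                      # (3, 5)

# Rule 2: an N-D argument is a stack of matrices; B is reused for each one
stack = rng.integers(0, 10, size=(2, 3, 4))
stacked = stack @ B
print(stacked.shape)                              # (2, 3, 5)
print(np.array_equal(stacked[1], stack[1] @ B))   # True

# Rule 3: 1-D arguments are promoted, then the added axis is removed
v3 = rng.integers(0, 10, size=3)
v4 = rng.integers(0, 10, size=4)
print((v3 @ A).shape)                             # (4,): v3 acts as a (1, 3) row
print((A @ v4).shape)                             # (3,): v4 acts as a (4, 1) column
```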

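`dot`: dot product, scalar product, or inner product

In the mathematical sense, this is a binary operation that takes two vectors over the real numbers and returns a real-valued scalar.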

Dot product of two arrays. Specifically,
# If both are 1-D, it returns the inner product
If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
# If both are 2-D, it is matrix multiplication, i.e. @ or matmul
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
# If either is a scalar, it is element-wise multiplication, i.e. multiply
If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.
# If a is N-D and b is 1-D, a is treated as a stack of 1-D arrays, and the inner product of each with b is computed
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
# If a is N-D and b is M-D (M >= 2), the last axis of a is summed against the second-to-last axis of b
If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b:
`dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])`
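Each of the five `dot` cases can be verified on small arrays. The sketch below uses values and names of my own choosing, not anything from the original post.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
M = np.arange(6).reshape(2, 3)         # [[0, 1, 2], [3, 4, 5]]

# 1-D with 1-D: inner product
print(np.dot(v, w))                                   # 32

# 2-D with 2-D: same as matmul
print(np.array_equal(np.dot(M, M.T), M @ M.T))        # True

# Scalar operand: same as multiply
print(np.array_equal(np.dot(2, M), 2 * M))            # True

# N-D with 1-D: sum product over the last axis of the N-D array
T = np.arange(24).reshape(2, 3, 4)
u = np.ones(4, dtype=int)
print(np.array_equal(np.dot(T, u), T.sum(axis=-1)))   # True

# N-D with M-D (M >= 2): last axis of a against second-to-last axis of b
S = np.arange(40).reshape(2, 4, 5)
out = np.dot(T, S)
print(out.shape)                                      # (2, 3, 2, 5)
print(out[1, 2, 0, 3] == np.sum(T[1, 2, :] * S[0, :, 3]))  # True
```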

Comparing the last rule of `dot` with the second rule of `@`

When b is a single 2-D matrix

import numpy as np

# Define a 3-D array of shape (2, 3, 4)
low = 0
high = 100
a_shape = (2, 3, 4)
b_shape = (4, 5)
a = np.random.randint(low, high, size=a_shape)
# Define a 2-D array of shape (4, 5)
b = np.random.randint(low, high, size=b_shape)
r1, r2 = np.dot(a, b), np.matmul(a, b)
print(r1.shape, r2.shape)
np.array_equal(r1, r2)

Result (this is a special case):

(2, 3, 5) (2, 3, 5)
True

Now consider the case where b is a multi-dimensional array, that is, a stack of matrices

low = 0
high = 100
a_shape = 2, 2, 4
b_shape = 2, 4, 2
a = np.random.randint(low, high, size=a_shape)
b = np.random.randint(low, high, size=b_shape)

r1, r2 = np.dot(a, b), np.matmul(a, b)
# For matmul, matrices at the same position are multiplied together
print(np.array_equal(a[0] @ b[0], r2[0]), np.array_equal(a[1] @ b[1], r2[1]))
# Check whether the matmul and dot results are equal
print(r1.shape, r2.shape)
np.array_equal(r1, r2),np.array_equal(r1[0][0], r2[0])

True True
(2, 2, 2, 2) (2, 2, 2)
(False, False)

Analysis of the results

dot: dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m]) (official NumPy expression)
matmul: matmul(a, b)[i,j,m] = sum(a[i,j,:] * b[i,:,m]) (derived by hand)

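Let a[i,j,:] denote a row (there are 2 x 2 = 4 of them) and b[k,:,m] denote a column (also 4 of them).
For matmul, i and k are the same, so matrices at the same position are multiplied, and the final result is two matrices:
    a[0] @ b[0]

    a[1] @ b[1]
For dot, i and k are independent, so each matrix of a is multiplied with both matrices of b, giving four matrices:
    a[0] @ b[0]    a[0] @ b[1]
    a[1] @ b[0]    a[1] @ b[1]
i selects a matrix of a and k selects a matrix of b, so setting k equal to i picks the matching matrix out of the dot result:
    r2[i] == r1[i, :, i]
In particular, when b contains only one matrix, dot gives the same result as matmul, as the first experiment shows.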

i = 0
np.array_equal(r1[i,:,i], r2[i])
# The result is True
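The two index formulas can also be written with `np.einsum`, which makes the difference visible in the subscripts: `dot` keeps separate batch indices `i` and `k`, while `matmul` shares a single `i`. The sketch below regenerates arrays with the same shapes as the experiment. The seed and the `*_via_einsum` names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=(2, 2, 4))
b = rng.integers(0, 100, size=(2, 4, 2))

# dot(a, b)[i, j, k, m] = sum over x of a[i, j, x] * b[k, x, m]
dot_via_einsum = np.einsum("ijx,kxm->ijkm", a, b)
# matmul(a, b)[i, j, m] = sum over x of a[i, j, x] * b[i, x, m]
matmul_via_einsum = np.einsum("ijx,ixm->ijm", a, b)

print(np.array_equal(dot_via_einsum, np.dot(a, b)))         # True
print(np.array_equal(matmul_via_einsum, np.matmul(a, b)))   # True

# The matmul result is the i == k slice of the dot result
r1, r2 = np.dot(a, b), np.matmul(a, b)
diagonal_matches = all(np.array_equal(r1[i, :, i], r2[i]) for i in range(2))
print(diagonal_matches)                                     # True
```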

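Summary

multiply and * are element-wise multiplication.

matmul and @ are matrix multiplication. For higher-dimensional inputs, the last two axes are treated as the matrix and all leading axes act as batch dimensions. If the batch dimensions of A and B differ, they are broadcast, and after broadcasting the matrices at the same batch position in A and B are multiplied.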
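To illustrate the batch broadcasting described here, the following sketch uses shapes I picked for the purpose: batch shape (3, 1) against batch shape (5,), which broadcast to (3, 5).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(3, 1, 2, 4))   # batch shape (3, 1), matrices are (2, 4)
B = rng.integers(0, 10, size=(5, 4, 2))      # batch shape (5,),   matrices are (4, 2)

C = A @ B
print(C.shape)                                   # (3, 5, 2, 2)

# After broadcasting, C[p, q] is the product of the matrices at that position
print(np.array_equal(C[2, 3], A[2, 0] @ B[3]))   # True

# Batch shapes that cannot be broadcast against each other raise an error
try:
    np.ones((2, 3, 4)) @ np.ones((3, 4, 5))
    raised = False
except ValueError:
    raised = True
print(raised)                                    # True
```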

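dot is more complicated:

If both are 1-D, it returns the inner product.
If both are 2-D, it is matrix multiplication, i.e. @ or matmul.
If either is a scalar, it is element-wise multiplication, i.e. multiply.
If A is N-D and B is 1-D, A is treated as a stack of 1-D arrays, and the inner product of each with B is computed.
If A is N-D and B is M-D (M >= 2), it still performs matrix multiplication, but every matrix of A is multiplied with every matrix of B, unlike matmul.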

References

numpy.matmul — NumPy v2.1 Manual

numpy.dot — NumPy v2.1 Manual