Thoughts on How Transpose Works and on Dimensions in SUM
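How transpose actually works
In essence, transpose changes the order in which memory is traversed. Take 16 elements with shape (2, 2, 4). The strides, measured in elements, are (8, 4, 1). Call the three axes i, j and k; three nested loops are enough to traverse them. Transpose swaps the strides: swapping axes 0 and 1 turns the strides into (4, 8, 1). The shape is swapped as well, but here axes 0 and 1 both have length 2, so it stays (2, 2, 4).
After the transpose, each step along k moves 1 element in memory, each step along j moves 8 elements, and each step along i moves 4 elements.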
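As a formula, with strides (s_0, s_1, s_2) the memory offset of element (i, j, k) is the dot product of the index with the strides. Here is a check of one element using the numbers above. Element (1, 0, 2) of the original array and element (0, 1, 2) of the transposed array land on the same offset, which is what swapping axes 0 and 1 should do:

$$
\begin{aligned}
\text{offset}(i, j, k) &= i\,s_0 + j\,s_1 + k\,s_2 \\
\text{original, } s = (8, 4, 1):\quad \text{offset}(1, 0, 2) &= 1 \cdot 8 + 0 \cdot 4 + 2 \cdot 1 = 10 \\
\text{transposed, } s = (4, 8, 1):\quad \text{offset}(0, 1, 2) &= 0 \cdot 4 + 1 \cdot 8 + 2 \cdot 1 = 10
\end{aligned}
$$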
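```python
import numpy as np

memory = np.arange(16)  # the flat buffer: 0, 1, ..., 15
shape = (2, 2, 4)
# stride = (8, 4, 1)    # original strides, in elements
stride = (4, 8, 1)      # strides after transposing axes 0 and 1
for i in range(shape[0]):                  # 2 iterations
    i_idx = i * stride[0]                  # each step along i moves 4 elements
    for j in range(shape[1]):              # 2 iterations
        j_idx = i_idx + j * stride[1]      # each step along j moves 8 elements
        for k in range(shape[2]):          # 4 iterations
            k_idx = j_idx + k * stride[2]  # each step along k moves 1 element
            print(memory[k_idx], end=' ')
        print()                            # newline after each length-4 row
```

You can also think of transpose in terms of indices. With axes (x, y, z), the transpose becomes (y, x, z). If we ignore z, each item is treated as an array of length 4, and the problem reduces to transposing a 2-D matrix, which is easier to picture.

Transposing a 2-D matrix amounts to swapping the order of the two nested for loops over i and j. If the data were stored as a 2-D matrix, i and j would be the corresponding indices. In the code above, this corresponds to swapping the strides.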
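The sketch below checks the stride arithmetic against NumPy itself. NumPy reports strides in bytes, so the sketch divides them by itemsize to get element strides. It then compares the manual three-loop walk with a.transpose(1, 0, 2). The helpers element_strides and walk are hypothetical names introduced only for this example.

```python
import numpy as np

def element_strides(arr):
    # NumPy reports strides in bytes; divide by the item size to get element steps
    return tuple(s // arr.itemsize for s in arr.strides)

def walk(memory, shape, strides):
    # Hypothetical helper: read a flat buffer with three nested loops and explicit strides
    out = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                out.append(memory[i * strides[0] + j * strides[1] + k * strides[2]])
    return np.array(out).reshape(shape)

a = np.arange(16).reshape(2, 2, 4)
t = a.transpose(1, 0, 2)       # swap axes 0 and 1

print(element_strides(a))      # (8, 4, 1)
print(element_strides(t))      # (4, 8, 1)
print(np.shares_memory(a, t))  # True: transpose only changes shape/stride metadata
print(np.array_equal(walk(np.arange(16), t.shape, element_strides(t)), t))  # True
```

No data moves during the transpose. NumPy simply reads the same buffer with the swapped strides.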
References
The following post explains this in detail with diagrams and is worth reading (it is clearer than what I wrote):
python - How does NumPy’s transpose() method permute the axes of an array? - Stack Overflow
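Practical application
Suppose we have two images stored as (N, C, H*W) = (2, 3, 4), and let a = torch.arange(1, 25).reshape(2, 3, 4).
To place the two images side by side, combining them into shape (3, 4*2), the desired result is: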
```
array([[ 1,  2,  3,  4, 13, 14, 15, 16],
       [ 5,  6,  7,  8, 17, 18, 19, 20],
       [ 9, 10, 11, 12, 21, 22, 23, 24]])
```
If we apply reshape or view directly, we get:
```
tensor([[ 1,  2,  3,  4,  5,  6,  7,  8],
        [ 9, 10, 11, 12, 13, 14, 15, 16],
        [17, 18, 19, 20, 21, 22, 23, 24]])
```
This happens because the tensor's data is stored contiguously in memory, so reshape/view simply keeps the 1 -> 24 order.
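The solution is to transpose first and then call view. However, view then fails with the following error:
```
view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces).
```
At least one dimension spans across two contiguous subspaces, so view cannot work. We can use reshape instead to get around this limitation of view.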
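For comparison, here is a NumPy sketch of the same pipeline. NumPy stands in for PyTorch here only so the example runs without torch, and it does not reproduce the view error. NumPy's reshape silently copies in the situation where PyTorch's view raises.

```python
import numpy as np

# Two "images" stored as (N, C, H*W) = (2, 3, 4), values 1..24
a = np.arange(1, 25).reshape(2, 3, 4)

# Reshaping directly just re-reads the contiguous buffer in 1..24 order
naive = a.reshape(3, 8)

# Swap N and C first: (2, 3, 4) -> (3, 2, 4), then merge the last two axes
t = a.transpose(1, 0, 2)
print(t.flags['C_CONTIGUOUS'])   # False: the transposed view is not contiguous
side_by_side = t.reshape(3, 8)   # NumPy copies here instead of raising

# Cross-check: same result as concatenating the two images along the width
expected = np.concatenate([a[0], a[1]], axis=1)
print(np.array_equal(side_by_side, expected))  # True
print(side_by_side)
```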
Definition of reshape
In short: if a view can be returned, reshape returns a view; otherwise it returns a copy.
Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.
torch.reshape — PyTorch 2.5 documentation
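In effect, when a view is not possible, reshape behaves like first calling contiguous(), which returns a tensor that is contiguous in memory (a copy), and then taking a view of that tensor.
Final code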
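NumPy's reshape follows the same rule: a view if possible, a copy otherwise. The sketch below uses NumPy rather than PyTorch, again only for portability. It uses np.shares_memory to check which case occurs, and it uses np.ascontiguousarray as a rough counterpart of contiguous().

```python
import numpy as np

a = np.arange(1, 25).reshape(2, 3, 4)   # contiguous
t = a.transpose(1, 0, 2)                # non-contiguous view, shape (3, 2, 4)

# Contiguous input: reshape can return a view that shares a's memory
print(np.shares_memory(a, a.reshape(6, 4)))   # True

# Non-contiguous input whose last two axes cannot be merged: reshape returns a copy
r = t.reshape(3, 8)
print(np.shares_memory(t, r))                 # False

# Making the data contiguous first, then reshaping, gives the same values
c = np.ascontiguousarray(t)
print(np.array_equal(c.reshape(3, 8), r))     # True
```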
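```python
import torch
a = torch.arange(1, 25).reshape(2, 3, 4)  # two images stored as (N, C, H*W)
b = torch.transpose(a, 1, 0)              # swap N and C -> shape (3, 2, 4), non-contiguous
print(b.reshape((3, 8)))
```
```
tensor([[ 1,  2,  3,  4, 13, 14, 15, 16],
        [ 5,  6,  7,  8, 17, 18, 19, 20],
        [ 9, 10, 11, 12, 21, 22, 23, 24]])
```
How SUM works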
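Each dimension can be thought of as one level of brackets. For example, take an array A of shape (2, 2, 4):
- axis=0: there are 2 matrices of shape (2, 4)
- axis=1: each (2, 4) matrix contains 2 arrays of length 4
- axis=2: each array contains 4 numbers
Summing along an axis means adding together the elements indexed by that axis.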
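A quick way to see the bracket levels is to index one axis at a time. This NumPy sketch uses the same shape as above:

```python
import numpy as np

A = np.arange(16).reshape(2, 2, 4)

print(A[0].shape)     # (2, 4): axis 0 selects one of 2 matrices of shape (2, 4)
print(A[0][0].shape)  # (4,): axis 1 selects one of 2 length-4 arrays in that matrix
print(A[0][0][0])     # 0: axis 2 selects one of the 4 numbers in that array
print(A.ndim)         # 3: one level of brackets per axis
```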
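Note that sum drops the summed axis by default (keepdims=False). The shapes below are written as if keepdims=True; without it, the length-1 axis disappears.
- np.sum(A, axis=0) -> matrix (2, 4) + matrix (2, 4) -> shape (1, 2, 4) (default: (2, 4))
- np.sum(A, axis=1) -> within each (2, 4) matrix, array + array -> shape (2, 1, 4) (default: (2, 4))
- np.sum(A, axis=2) -> sum the four elements of each array -> shape (2, 2, 1) (default: (2, 2))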
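The three cases can be checked directly in NumPy. The asserts restate each bullet as explicit element-wise additions, and the loop prints the result shape with and without keepdims:

```python
import numpy as np

A = np.arange(16).reshape(2, 2, 4)

# axis=0: add the two (2, 4) matrices element-wise
s0 = np.sum(A, axis=0)
assert np.array_equal(s0, A[0] + A[1])
# axis=1: inside each matrix, add its two length-4 arrays
s1 = np.sum(A, axis=1)
assert np.array_equal(s1, A[:, 0, :] + A[:, 1, :])
# axis=2: add the four numbers inside each array
s2 = np.sum(A, axis=2)
assert np.array_equal(s2, A[..., 0] + A[..., 1] + A[..., 2] + A[..., 3])

# Default drops the summed axis; keepdims=True keeps it with length 1
for ax in range(3):
    print(ax, np.sum(A, axis=ax).shape, np.sum(A, axis=ax, keepdims=True).shape)
# 0 (2, 4) (1, 2, 4)
# 1 (2, 4) (2, 1, 4)
# 2 (2, 2) (2, 2, 1)
```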
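More generally, when summing over several axes, e.g. axis=(1, 2), the shape goes from (2, 2, 4) to (2, 1, 1) ((2,) without keepdims). All elements within each of the two (2, 4) matrices are added up, giving one number per matrix. This is the same as performing operation 2 and then operation 3 from the list above, or operation 3 and then operation 2.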
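Here is the same equivalence for axis=(1, 2), sketched in NumPy. The two-step versions use keepdims=True so that axis 2 keeps its number after axis 1 has been summed. Without keepdims, the old axis 2 would become axis 1.

```python
import numpy as np

A = np.arange(16).reshape(2, 2, 4)

# Sum over axes 1 and 2 in one call, keeping the (2, 1, 1) shape used above
both = np.sum(A, axis=(1, 2), keepdims=True)

# The same result in two steps, in either order
one_then_two = np.sum(np.sum(A, axis=1, keepdims=True), axis=2, keepdims=True)
two_then_one = np.sum(np.sum(A, axis=2, keepdims=True), axis=1, keepdims=True)

print(both.shape, both.ravel())  # (2, 1, 1) [28 92]
print(np.array_equal(both, one_then_two), np.array_equal(both, two_then_one))  # True True
```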
If axis is a tuple of ints, a sum is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.
Reference
Other references
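关于python 高维数组transpose的实现原理以及pytorch view等的思考_transpose(1, 2)-CSDN博客 (CSDN blog post on how transpose of high-dimensional arrays is implemented in Python, with thoughts on PyTorch view)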