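This is a follow-up to this question:
How to do a sum of sums of the square of sum of sums?
There I asked for help using einsum (to get a speed increase) and got a great answer.
I also got a suggestion to use numba. I have tried both, and it seems that beyond a certain problem size numba gives a much better speed-up.
So how can this be sped up without running into memory problems?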
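Solution:
The solution below presents 3 different ways to compute the simple sum and 4 different ways to compute the sum of squares.
Sum, 3 methods: for loop, JIT-compiled for loop, einsum (none of them run into memory problems)
Sum of squares, 4 methods: for loop, JIT-compiled for loop, expanded einsum, intermediate einsum
Here the first three do not run into memory problems, but the for loop and the expanded einsum run into speed problems. That leaves the JIT solution looking like the best.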
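For reference, the two quantities being compared can be written out as below. This is only a restatement of what fun1, fun2 and the einsum strings in the code compute; the superscripts on F are notation introduced here to tell the four factor matrices apart.

```latex
S_{uvxy} = P_{xy} \sum_{k,l} F^{u}_{uk}\, F^{v}_{vl}\, F^{x}_{xk}\, F^{y}_{yl}\, B_{kl},
\qquad
I_1(u,v) = \sum_{x,y} S_{uvxy},
\qquad
I_2(u,v) = \sum_{x,y} S_{uvxy}^{2}
```

The square sits between the inner sum over k, l and the outer sum over x, y, which is why I2 cannot be collapsed into a single einsum as easily as I1: either the indices are duplicated (expanded einsum) or the 4-D array S is stored (intermediate einsum).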
import numpy as np
import time
from numba import jit

def fun1(Fu, Fv, Fx, Fy, P, B):
    # Plain sum: I1[u, v] = sum over x, y of S, where S is the inner sum over k, l
    Nu = Fu.shape[0]
    Nv = Fv.shape[0]
    Nx = Fx.shape[0]
    Ny = Fy.shape[0]
    Nk = Fu.shape[1]
    Nl = Fv.shape[1]
    I1 = np.zeros([Nu, Nv])
    for iu in range(Nu):
        for iv in range(Nv):
            for ix in range(Nx):
                for iy in range(Ny):
                    S = 0.
                    for ik in range(Nk):
                        for il in range(Nl):
                            S += Fu[iu,ik]*Fv[iv,il]*Fx[ix,ik]*Fy[iy,il]*P[ix,iy]*B[ik,il]
                    I1[iu, iv] += S
    return I1

def fun2(Fu, Fv, Fx, Fy, P, B):
    # Sum of squares: same loops as fun1, but S is squared before it is accumulated
    Nu = Fu.shape[0]
    Nv = Fv.shape[0]
    Nx = Fx.shape[0]
    Ny = Fy.shape[0]
    Nk = Fu.shape[1]
    Nl = Fv.shape[1]
    I2 = np.zeros([Nu, Nv])
    for iu in range(Nu):
        for iv in range(Nv):
            for ix in range(Nx):
                for iy in range(Ny):
                    S = 0.
                    for ik in range(Nk):
                        for il in range(Nl):
                            S += Fu[iu,ik]*Fv[iv,il]*Fx[ix,ik]*Fy[iy,il]*P[ix,iy]*B[ik,il]
                    I2[iu, iv] += S**2.
    return I2

if __name__ == '__main__':
    Nx = 30
    Ny = 40
    Nk = 50
    Nl = 60
    Nu = 70
    Nv = 8
    Fx = np.random.rand(Nx, Nk)
    Fy = np.random.rand(Ny, Nl)
    Fu = np.random.rand(Nu, Nk)
    Fv = np.random.rand(Nv, Nl)
    P = np.random.rand(Nx, Ny)
    B = np.random.rand(Nk, Nl)
    fjit1 = jit(fun1)
    fjit2 = jit(fun2)

    # For loop - becomes too slow so commented out
    # t = time.time()
    # I1 = fun1(Fu, Fv, Fx, Fy, P, B)
    # print('fun1 :', time.time() - t)

    # JIT compiled for loop - After a certain point beats einsum
    # (this first call also includes numba's compilation time)
    t = time.time()
    I1jit = fjit1(Fu, Fv, Fx, Fy, P, B)
    print('jit1 :', time.time() - t)

    # einsum great solution when no squaring is needed
    t = time.time()
    I1_ = np.einsum('uk, vl, xk, yl, xy, kl->uv', Fu, Fv, Fx, Fy, P, B)
    print('1 einsum:', time.time() - t)

    # For loop - becomes too slow so commented out
    # t = time.time()
    # I2 = fun2(Fu, Fv, Fx, Fy, P, B)
    # print('fun2 :', time.time() - t)

    # JIT compiled for loop - After a certain point beats einsum
    # (this first call also includes numba's compilation time)
    t = time.time()
    I2jit = fjit2(Fu, Fv, Fx, Fy, P, B)
    print('jit2 :', time.time() - t)

    # Expanded einsum - As the size increases becomes very very slow
    # t = time.time()
    # I2_ = np.einsum('uk,vl,xk,yl,um,vn,xm,yn,kl,mn,xy->uv', Fu,Fv,Fx,Fy,Fu,Fv,Fx,Fy,B,B,P**2)
    # print('2 einsum:', time.time() - t)

    # Intermediate einsum - As the sizes increase memory can become an issue
    # (temp has shape (Nu, Nv, Nx, Ny))
    t = time.time()
    temp = np.einsum('uk, vl, xk, yl, xy, kl->uvxy', Fu, Fv, Fx, Fy, P, B)
    I2__ = np.einsum('uvxy->uv', np.square(temp))
    print('2 einsum:', time.time() - t)

    # print('I1 == I1_ :', np.allclose(I1, I1_))
    print('I1_ == Ijit1_:', np.allclose(I1_, I1jit))
    # print('I2 == I2_ :', np.allclose(I2, I2_))
    print('I2_ == Ijit2_:', np.allclose(I2__, I2jit))
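One way to keep the intermediate einsum while bounding its memory use is to process the u axis in blocks, so that the 4-D temporary has shape (chunk, Nv, Nx, Ny) rather than (Nu, Nv, Nx, Ny). The sketch below is not part of the original answer: the function name sum_of_squares_chunked and the chunk parameter are made up for illustration, and passing optimize=True to np.einsum is an assumption about what helps here, not a measured result.

```python
import numpy as np

def sum_of_squares_chunked(Fu, Fv, Fx, Fy, P, B, chunk=16):
    """Sum over x, y of the squared inner sum, handling `chunk` rows of u at a time."""
    Nu, Nv = Fu.shape[0], Fv.shape[0]
    I2 = np.empty((Nu, Nv))
    for start in range(0, Nu, chunk):
        stop = min(start + chunk, Nu)
        # Temporary has shape (stop - start, Nv, Nx, Ny), not (Nu, Nv, Nx, Ny)
        temp = np.einsum('uk,vl,xk,yl,xy,kl->uvxy',
                         Fu[start:stop], Fv, Fx, Fy, P, B, optimize=True)
        # Square and reduce over x, y in one call, without a second 4-D array
        I2[start:stop] = np.einsum('uvxy,uvxy->uv', temp, temp)
    return I2

# Small check against the unchunked intermediate einsum (sizes chosen to be tiny)
np.random.seed(0)
Nx, Ny, Nk, Nl, Nu, Nv = 3, 4, 5, 6, 7, 2
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)
P = np.random.rand(Nx, Ny)
B = np.random.rand(Nk, Nl)

full = np.einsum('uk,vl,xk,yl,xy,kl->uvxy', Fu, Fv, Fx, Fy, P, B)
I2_ref = np.square(full).sum(axis=(2, 3))
I2_chunked = sum_of_squares_chunked(Fu, Fv, Fx, Fy, P, B, chunk=3)
print('chunked == full:', np.allclose(I2_ref, I2_chunked))
```

A smaller chunk lowers peak memory at the cost of more einsum calls, so the value would need tuning against the actual array sizes.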
Comments:
Please feel free to edit or improve this answer. Any suggestions on parallelizing this would be great.
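On the parallelization point, one possible direction is numba's prange, which splits the outermost loop across threads when the function is compiled with parallel=True. The sketch below is an untimed illustration rather than a tested speed-up: fun2_parallel is a hypothetical name, the P[ix,iy] factor has been moved out of the k, l loop (which does not change the result), and the try/except fallback exists only so that the example still runs, slowly, where numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback (assumption for portability): plain Python loops without numba
    prange = range
    def njit(*args, **kwargs):
        return lambda func: func

@njit(parallel=True)
def fun2_parallel(Fu, Fv, Fx, Fy, P, B):
    Nu, Nk = Fu.shape
    Nv, Nl = Fv.shape
    Nx = Fx.shape[0]
    Ny = Fy.shape[0]
    I2 = np.zeros((Nu, Nv))
    # Each iteration of the prange loop writes only to row iu, so threads never collide
    for iu in prange(Nu):
        for iv in range(Nv):
            acc = 0.0
            for ix in range(Nx):
                for iy in range(Ny):
                    S = 0.0
                    for ik in range(Nk):
                        for il in range(Nl):
                            S += Fu[iu, ik] * Fv[iv, il] * Fx[ix, ik] * Fy[iy, il] * B[ik, il]
                    S *= P[ix, iy]   # P does not depend on k, l
                    acc += S * S     # square of the inner sum
            I2[iu, iv] = acc
    return I2

# Small check against the intermediate einsum
np.random.seed(1)
Nx, Ny, Nk, Nl, Nu, Nv = 3, 2, 4, 3, 3, 2
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)
P = np.random.rand(Nx, Ny)
B = np.random.rand(Nk, Nl)

temp = np.einsum('uk,vl,xk,yl,xy,kl->uvxy', Fu, Fv, Fx, Fy, P, B)
I2_ref = np.square(temp).sum(axis=(2, 3))
I2_par = fun2_parallel(Fu, Fv, Fx, Fy, P, B)
print('parallel == einsum:', np.allclose(I2_ref, I2_par))
```

Like the JIT loop above, this needs no 4-D temporary, so memory use stays at the size of the inputs plus the (Nu, Nv) result.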