To V2EX members who got the new MBP: if you're interested, could you run the numpy/scipy benchmark?

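    astrophys · 276 days ago · 3245 views
    This topic was created 276 days ago; the information in it may have since developed or changed.

    Benchmark script: https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276

    I'm curious how much this generation's M1 Pro/Max improves Python scientific computing. Earlier tests by V2EX members showed the previous-generation M1, power consumption aside, roughly trading wins with an i5: https://v2ex.com/t/733777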
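
For readers without the gist at hand, here is a minimal sketch of the kind of timing such a script performs. It is not the gist's code: `time_call` and `run_benchmark` are hypothetical names, and the matrix shapes are inferred from the output lines pasted in this thread (a `size` of 4096 gives a 4096x4096 dot, vectors of length 524288, a 2048x1024 SVD, and 2048x2048 Cholesky and eigendecomposition).

```python
import time

import numpy as np


def time_call(fn, *args):
    """Return the wall-clock seconds taken by a single call of fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def run_benchmark(size=4096, seed=0):
    """Time the five operations reported in this thread (shapes are inferred)."""
    rng = np.random.default_rng(seed)
    half, quarter = size // 2, size // 4

    a = rng.random((size, size))
    b = rng.random((size, size))
    x = rng.random(size * 128)
    y = rng.random(size * 128)
    c = rng.random((half, quarter))
    d = rng.random((half, half))
    # Cholesky needs a symmetric positive definite input, so build one from d.
    spd = d @ d.T + half * np.eye(half)

    return {
        "matrix_dot": time_call(np.dot, a, b),
        "vector_dot": time_call(np.dot, x, y),
        "svd": time_call(np.linalg.svd, c),
        "cholesky": time_call(np.linalg.cholesky, spd),
        "eig": time_call(np.linalg.eig, d),
    }


if __name__ == "__main__":
    # A small size keeps this quick; the thread's numbers use 4096.
    for name, seconds in run_benchmark(size=512).items():
        print(f"{name}: {seconds:.4f} s")
```

Each operation is timed once here, so single numbers from a sketch like this are noisy; the linked BLAS/LAPACK library and thread count will dominate the results.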

    Addendum 1 · 276 days ago
    Here are the results from my 16-inch i9 with 64 GB, using MKL with 8 threads:

    Dotted two 4096x4096 matrices in 0.48 s.
    Dotted two vectors of length 524288 in 0.06 ms.
    SVD of a 2048x1024 matrix in 0.34 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
    Eigendecomposition of a 2048x2048 matrix in 3.27 s.

    This was obtained using the following Numpy configuration:
    blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/jisuoqing/Workspace/code/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/jisuoqing/Workspace/code/miniconda3/include']
    blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/jisuoqing/Workspace/code/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/jisuoqing/Workspace/code/miniconda3/include']
    lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/jisuoqing/Workspace/code/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/jisuoqing/Workspace/code/miniconda3/include']
    lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/jisuoqing/Workspace/code/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/jisuoqing/Workspace/code/miniconda3/include']
    Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
    Addendum 2 · 273 days ago
    Thanks to @Aspector for finding the concrete steps to link numpy against the Accelerate framework, along with a benchmark comparison across different libraries. See this post for details: https://www.reddit.com/r/Python/comments/qog8x3/if_you_are_using_apples_m1_macs_compiling_numpy
    33 replies · 2021-12-04 16:38:52 +08:00
    haogefeifei · #1 · 276 days ago · ❤️ 1
    For pure multi-core computation it probably has no real edge, though in actual use it falls short in nothing.

    M1:
    Dotted two 4096x4096 matrices in 0.77 s.
    Dotted two vectors of length 524288 in 0.27 ms.
    SVD of a 2048x1024 matrix in 0.90 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
    Eigendecomposition of a 2048x2048 matrix in 7.55 s.

    AMD 3700X at 4.1 GHz, in a virtual machine:
    Dotted two 4096x4096 matrices in 0.44 s.
    Dotted two vectors of length 524288 in 0.03 ms.
    SVD of a 2048x1024 matrix in 0.58 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.10 s.
    Eigendecomposition of a 2048x2048 matrix in 6.16 s.
    dejavuwind · #2 · 276 days ago
    What size should I set? I'll try it on a 10-core M1 Pro.
    dejavuwind · #3 · 276 days ago · ❤️ 1
    Ran it directly on the M1 Pro.
    Doesn't look that great.

    Dotted two 4096x4096 matrices in 0.67 s.
    Dotted two vectors of length 524288 in 0.26 ms.
    SVD of a 2048x1024 matrix in 1.04 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
    Eigendecomposition of a 2048x2048 matrix in 9.24 s.
    wilhexm · #4 · 276 days ago · ❤️ 1
    16 inch M1 Max 24 Core GPU

    Dotted two 4096x4096 matrices in 0.55 s.
    Dotted two vectors of length 524288 in 0.25 ms.
    SVD of a 2048x1024 matrix in 1.32 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
    Eigendecomposition of a 2048x2048 matrix in 6.79 s.
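    pb941129 · #5 · 276 days ago via iPhone · ❤️ 1
    I posted my 16-inch i9 scores in the earlier thread. I just reran the script on Monterey and the speed is basically the same. Judging from the M1 Pro numbers above, it seems the M1 Pro still can't do much if you use it for Python scientific computing...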
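
To put numbers on this comparison, the pasted output can be parsed and divided task by task. A minimal sketch, using the MKL-linked i9 output from the first addendum and the M1 Pro output from reply 3; `parse_results` and `slowdown` are hypothetical helper names, not part of the benchmark script.

```python
import re

# One output line looks like: "SVD of a 2048x1024 matrix in 0.34 s."
RESULT_LINE = re.compile(
    r"^\s*(?P<task>.+?) in (?P<value>\d+(?:\.\d+)?) (?P<unit>ms|s)\.\s*$"
)

I9_MKL = """\
Dotted two 4096x4096 matrices in 0.48 s.
Dotted two vectors of length 524288 in 0.06 ms.
SVD of a 2048x1024 matrix in 0.34 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 3.27 s.
"""

M1_PRO = """\
Dotted two 4096x4096 matrices in 0.67 s.
Dotted two vectors of length 524288 in 0.26 ms.
SVD of a 2048x1024 matrix in 1.04 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 9.24 s.
"""


def parse_results(text):
    """Map each task description to its reported time in seconds."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            scale = 1e-3 if match.group("unit") == "ms" else 1.0
            results[match.group("task")] = float(match.group("value")) * scale
    return results


def slowdown(candidate, baseline):
    """Per-task ratio candidate/baseline; above 1 means the candidate is slower."""
    return {
        task: candidate[task] / baseline[task]
        for task in candidate
        if task in baseline
    }


if __name__ == "__main__":
    ratios = slowdown(parse_results(M1_PRO), parse_results(I9_MKL))
    for task, ratio in ratios.items():
        print(f"{ratio:5.2f}x  {task}")
```

On these two pastes the M1 Pro run takes roughly 1.3x to 4.3x as long as the i9 run, depending on the task. The two machines use different linear algebra libraries, so this compares whole setups rather than the chips alone.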
    Aspector · #6 · 276 days ago
    Deprecated since version 1.20: The native libraries on macOS, provided by Accelerate, are not fit for use in NumPy since they have bugs that cause wrong output under easily reproducible conditions. If the vendor fixes those bugs, the library could be reinstated, but until then users compiling for themselves should use another linear algebra library or use the built-in (but slower) default, see the next section.

    Does current numpy use Accelerate? Did Apple just never deal with these bugs?
    icyalala · #7 · 276 days ago · ❤️ 1
    Not using Accelerate on M1 is like not using AVX2 on Intel.
    EyreYoung · #8 · 276 days ago · ❤️ 1
    2018 model, i7-8750 (I think that's the one)
    Dotted two 2048x2048 matrices in 0.07 s.
    Dotted two vectors of length 262144 in 0.02 ms.
    SVD of a 1024x512 matrix in 0.05 s.
    Cholesky decomposition of a 1024x1024 matrix in 0.01 s.
    Eigendecomposition of a 1024x1024 matrix in 0.63 s.


    Dotted two 4096x4096 matrices in 0.63 s.
    Dotted two vectors of length 524288 in 0.10 ms.
    SVD of a 2048x1024 matrix in 0.35 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
    Eigendecomposition of a 2048x2048 matrix in 4.15 s.
    boboliu · #9 · 276 days ago · ❤️ 1
    @Aspector #6
    @icyalala #7

    https://github.com/numpy/numpy/pull/18874

    > This pull request is to add support for Accelerate back to NumPy
    dbsquirrel · #10 · 276 days ago · ❤️ 1
    Dotted two 4096x4096 matrices in 1.85 s.
    Dotted two vectors of length 524288 in 0.24 ms.
    SVD of a 2048x1024 matrix in 0.68 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.15 s.
    Eigendecomposition of a 2048x2048 matrix in 5.75 s.

    The fans went to full blast right away. MBP 2016 (2.9 GHz i5).
    Aspector · #11 · 276 days ago via iPhone · ❤️ 1
    @boboliu So is it actually in use now or not... That commit is from this spring, so why do the M1 results in these two V2EX threads show no change?
    0Vincent0Zhang0 · #12 · 276 days ago · ❤️ 1
    Current results on an M1 Max with 64 GB:

    Dotted two 4096x4096 matrices in 0.70 s.
    Dotted two vectors of length 524288 in 0.25 ms.
    SVD of a 2048x1024 matrix in 1.99 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.10 s.
    Eigendecomposition of a 2048x2048 matrix in 10.36 s.

    Still needs optimization.
    dejavuwind · #13 · 276 days ago
    It seems to depend somewhat on the environment.
    Do the two NOT AVAILABLE entries affect the results? @astrophys @Aspector
    Dotted two 4096x4096 matrices in 0.65 s.
    Dotted two vectors of length 524288 in 0.26 ms.
    SVD of a 2048x1024 matrix in 0.93 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
    Eigendecomposition of a 2048x2048 matrix in 9.90 s.

    This was obtained using the following Numpy configuration:
    blas_mkl_info:
    NOT AVAILABLE
    blis_info:
    NOT AVAILABLE
    openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/arm64-builds/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/arm64-builds/lib']
    blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/arm64-builds/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/arm64-builds/lib']
    lapack_mkl_info:
    NOT AVAILABLE
    openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/arm64-builds/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/arm64-builds/lib']
    lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/arm64-builds/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/arm64-builds/lib']
    Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP
    not found = ASIMDDP
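
On the NOT AVAILABLE question: in the config dump pasted here, each `*_info` section is followed either by `NOT AVAILABLE` or by `key = value` settings, so the MKL and BLIS entries most likely just mean this build does not link those libraries and uses OpenBLAS instead. A small sketch that reads such pasted text and guesses the backend from `blas_opt_info`; `parse_sections` and `guess_backend` are hypothetical helpers, and the keyword matching is a heuristic for this particular output format, not a numpy API.

```python
def parse_sections(config_text):
    """Group the lines of a pasted config dump under their '*_info' headers."""
    sections = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.endswith("_info:"):
            current = line[:-1]
            sections[current] = []
        elif current is not None and (line == "NOT AVAILABLE" or " = " in line):
            sections[current].append(line)
        else:
            # Anything else (e.g. the SIMD summary) ends the current section.
            current = None
    return sections


def guess_backend(config_text):
    """Guess the BLAS backend from the blas_opt_info section (heuristic)."""
    body = " ".join(parse_sections(config_text).get("blas_opt_info", [])).lower()
    for name in ("mkl", "openblas", "accelerate"):
        if name in body:
            return name
    return "unknown"


SAMPLE = """\
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/opt/arm64-builds/lib']
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/opt/arm64-builds/lib']
lapack_mkl_info:
NOT AVAILABLE
Supported SIMD extensions in this NumPy install:
baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
"""

if __name__ == "__main__":
    print(guess_backend(SAMPLE))
```

The same helper applied to the other dumps in this thread would key off `mkl_rt` or `-Wl,Accelerate` in `blas_opt_info`.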
    astrophys · OP · #14 · 276 days ago
    Here are the results from a 2019 16-inch i9 with 64 GB:

    Dotted two 4096x4096 matrices in 0.45 s.
    Dotted two vectors of length 524288 in 0.05 ms.
    SVD of a 2048x1024 matrix in 0.29 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
    Eigendecomposition of a 2048x2048 matrix in 3.23 s.
    astrophys · OP · #15 · 276 days ago
    @dejavuwind Using MKL and multithreading will definitely be faster; the results I posted are with MKL.
    tiramice · #16 · 276 days ago
    W-2175, virtual machine, 8 cores
    Dotted two 4096x4096 matrices in 0.29 s.
    Dotted two vectors of length 524288 in 0.03 ms.
    SVD of a 2048x1024 matrix in 0.50 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.12 s.
    Eigendecomposition of a 2048x2048 matrix in 4.47 s.
    astrophys · OP · #17 · 276 days ago
    @Aspector numpy removed support for the Accelerate framework in version 1.20.0. Someone happened to ask about exactly this today: https://stackoverflow.com/questions/69848969/how-to-build-numpy-from-source-linked-to-apple-accelerate-framework#
    sharpy · #18 · 276 days ago
    16-inch i9
    Dotted two 4096x4096 matrices in 0.41 s.
    Dotted two vectors of length 524288 in 0.04 ms.
    SVD of a 2048x1024 matrix in 0.28 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
    Eigendecomposition of a 2048x2048 matrix in 2.89 s.
    volvo007 · #19 · 276 days ago
    2020 13-inch MBP, Intel, top spec

    Dotted two 4096x4096 matrices in 0.98 s.
    Dotted two vectors of length 524288 in 0.20 ms.
    SVD of a 2048x1024 matrix in 0.49 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
    Eigendecomposition of a 2048x2048 matrix in 4.16 s.
    cxxlxx · #20 · 275 days ago
    @haogefeifei Why is my 5900X so much worse than yours, whether under WSL or Windows?
    Dotted two 4096x4096 matrices in 0.39 s.
    Dotted two vectors of length 524288 in 0.14 ms.
    SVD of a 2048x1024 matrix in 1.34 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
    Eigendecomposition of a 2048x2048 matrix in 4.80 s.
    rpman · #21 · 275 days ago
    Apple silicon support is still at the patching-up stage. If you want to use it, you can find the commit and compile it yourself.
    thedrwu · #22 · 275 days ago via Android
    Locally I only need to debug and plot; the computation gets handed off to servers and supercomputers.
    astrophys · OP · #23 · 275 days ago
    @thedrwu It's precisely because I only plot locally that I only care about Python. Sometimes plotting a complex figure still needs some compute 😂
    yangbin9317 · #24 · 275 days ago
    Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz

    Dotted two 4096x4096 matrices in 0.34 s.
    Dotted two vectors of length 524288 in 0.02 ms.
    SVD of a 2048x1024 matrix in 1.03 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.61 s.
    Eigendecomposition of a 2048x2048 matrix in 9.66 s.
    20015jjw · #25 · 275 days ago
    16-core Mac Pro / 96 GB

    Dotted two 4096x4096 matrices in 0.28 s.
    Dotted two vectors of length 524288 in 0.02 ms.
    SVD of a 2048x1024 matrix in 0.56 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
    Eigendecomposition of a 2048x2048 matrix in 4.00 s.

    What I'm curious about is whether a test this small has a large margin of error...
    20015jjw · #26 · 275 days ago
    @20015jjw Ran it once more

    Dotted two 4096x4096 matrices in 0.26 s.
    Dotted two vectors of length 524288 in 0.02 ms.
    SVD of a 2048x1024 matrix in 0.50 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
    Eigendecomposition of a 2048x2048 matrix in 3.77 s.

    The third one already differs by 10%...
    The two runs were back to back, and I didn't close any of the things that were running.
    astrophys · OP · #27 · 275 days ago
    @20015jjw A 10% difference doesn't matter; the point is to see whether there are clear gaps larger than 10%.
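
One way to separate run-to-run noise from a real gap is to repeat each measurement and look at the spread. A minimal sketch using only the standard library; `repeat_timing` and `relative_difference` are hypothetical helpers, and the workload passed in is a stand-in for any one of the benchmark steps.

```python
import statistics
import time


def repeat_timing(fn, repeats=5):
    """Call fn() several times and summarise the wall-clock durations."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    median = statistics.median(samples)
    spread = (max(samples) - min(samples)) / median if median > 0 else 0.0
    return {
        "median": median,
        "min": min(samples),
        "max": max(samples),
        "relative_spread": spread,
    }


def relative_difference(a, b):
    """Difference between two timings, relative to the larger one."""
    return abs(a - b) / max(a, b)


if __name__ == "__main__":
    # Stand-in workload; replace with one benchmark step.
    summary = repeat_timing(lambda: sum(i * i for i in range(200_000)))
    print(summary)
    # The two SVD timings from the Mac Pro runs above: 0.56 s and 0.50 s.
    print(f"{relative_difference(0.56, 0.50):.1%}")
```

If the spread across repeats on one machine is already around 10%, then only differences well beyond that between machines say much.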
    MongkeMary · #28 · 275 days ago
    16-inch base-model MBP, M1 Pro, 10 cores

    Dotted two 4096x4096 matrices in 0.56 s.
    Dotted two vectors of length 524288 in 0.25 ms.
    SVD of a 2048x1024 matrix in 0.67 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
    Eigendecomposition of a 2048x2048 matrix in 6.88 s.
    MongkeMary · #29 · 275 days ago
    @astrophys Whether MKL is available really matters. For this kind of computation, OpenBLAS performance still lags behind MKL.
    astrophys · OP · #30 · 275 days ago
    @MongkeMary Right. For the M1 it comes down to whether the Accelerate framework is used.
    shinecurve · #32 · 269 days ago
    HP OMEN 7 (暗影精灵 7)
    i7-11800H

    Dotted two 4096x4096 matrices in 0.39 s.
    Dotted two vectors of length 524288 in 0.05 ms.
    SVD of a 2048x1024 matrix in 0.26 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
    Eigendecomposition of a 2048x2048 matrix in 2.57 s.

    Posting this as a reference for everyone.
    lqcc · #33 · 247 days ago · ❤️ 1
    M1 MacBook Air, with numpy compiled against the Accelerate library. The speed is decent.


    Dotted two 4096x4096 matrices in 0.60 s.
    Dotted two vectors of length 524288 in 0.11 ms.
    SVD of a 2048x1024 matrix in 0.52 s.
    Cholesky decomposition of a 2048x2048 matrix in 0.06 s.
    Eigendecomposition of a 2048x2048 matrix in 5.98 s.

    This was obtained using the following Numpy configuration:
    blas_mkl_info:
    NOT AVAILABLE
    blis_info:
    NOT AVAILABLE
    openblas_info:
    NOT AVAILABLE
    accelerate_info:
    extra_compile_args = ['-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    blas_opt_info:
    extra_compile_args = ['-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    lapack_mkl_info:
    NOT AVAILABLE
    openblas_lapack_info:
    NOT AVAILABLE
    openblas_clapack_info:
    NOT AVAILABLE
    flame_info:
    NOT AVAILABLE
    lapack_opt_info:
    extra_compile_args = ['-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP,ASIMDDP,ASIMDFHM
    not found =