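Measured with tools like AIDA64, I get L1 at 1-2 ns (3-5 clocks), L2 at 2-3.5 ns (6-10 clocks), L3 at 10-15 ns (20-50 clocks), and main memory at 30-50 ns (60-150 clocks). But some books, such as Modern Operating Systems, say L1 access is instantaneous and L2 takes two or three clock cycles, while other books give figures much closer to the measurements above.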
1
xenme 2019-01-02 13:33:19 +08:00 1
https://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory
Core i7 Xeon 5500 Series
Data Source Latency (approximate) [Pg. 22]
local L1 CACHE hit: ~4 cycles (2.1 - 1.2 ns)
local L2 CACHE hit: ~10 cycles (5.3 - 3.0 ns)
local L3 CACHE hit, line unshared: ~40 cycles (21.4 - 12.0 ns)
local L3 CACHE hit, shared line in another core: ~65 cycles (34.8 - 19.5 ns)
local L3 CACHE hit, modified in another core: ~75 cycles (40.2 - 22.5 ns)
remote L3 CACHE (Ref: Fig.1 [Pg. 5]): ~100-300 cycles (160.7 - 30.0 ns)
local DRAM: ~60 ns
remote DRAM: ~100 ns
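The nanosecond ranges in this table follow from t_ns = cycles / f_GHz. Working backwards from the quoted figures, they appear to span clock speeds of roughly 1.86 GHz (4 / 1.86 ≈ 2.1 ns) to 3.33 GHz (4 / 3.33 ≈ 1.2 ns); that frequency range is my inference, not something the source states. A minimal C sketch of the conversion, with hypothetical function names:

```c
#include <assert.h>

/* At f GHz one clock cycle lasts 1/f ns, so t_ns = cycles / f_GHz. */
double cycles_to_ns(double cycles, double freq_ghz) {
    assert(freq_ghz > 0.0);
    return cycles / freq_ghz;
}

/* Inverse: how many clock cycles a latency of `ns` nanoseconds spans. */
double ns_to_cycles(double ns, double freq_ghz) {
    assert(freq_ghz > 0.0);
    return ns * freq_ghz;
}
```

For example, ns_to_cycles(60.0, 3.33) gives about 200 cycles for the ~60 ns local DRAM figure, so a main-memory latency quoted in clocks depends heavily on the CPU frequency it was measured at.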
2
yanaraika 2019-01-02 13:41:55 +08:00 2
MemLatX64 at http://instlatx64.atw.hu/ has more precise data.
3
ryd994 2019-01-02 13:45:18 +08:00 via Android 1
It's registers that instructions access directly, not L1.
4
29EtwXn6t5wgM3fD 2019-01-02 13:57:59 +08:00 1
https://www.7-cpu.com/cpu/Cortex-A57.html
AMD Opteron A1170 (ARM Cortex-A57), 2.0 GHz, 28 nm. RAM: 16 GB. (Probably it's SoftIron Overdrive 3000 server, DDR3 RDIMM).
L1 Data cache = 32 KB, 64 B/line, 2-WAY.
L1 Instruction cache = 48 KB, 64 B/line, 3-WAY.
L2 Cache = 1 MB (per 2 cores), 64 B/line, 16-WAY.
L3 Cache = 8 MB (per 8 cores), 64 B/line, ?-WAY.
L1 Data Cache Latency = 4 cycles for simple access via pointer
L1 Data Cache Latency = 5 cycles for access with complex address calculation (size_t n, *p; n = p[n]).
L2 Cache Latency = 18 cycles
L3 Cache Latency = 60 cycles
RAM Latency = 60 cycles + 124 ns
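The `n = p[n]` pattern quoted above is the usual pointer-chasing way to measure load latency: each load's address comes from the previous load, so the CPU cannot overlap them. Below is a minimal C sketch of that technique, not 7-cpu's actual benchmark code. `make_chain` and `chase_ns` are hypothetical names; the random chain is built with Sattolo's algorithm to defeat the prefetcher, and `clock()` is used only because it is portable standard C (its resolution is coarse, so `iters` should be large). Note that `idx = p[idx]` is the indexed form, i.e. the 5-cycle variant in the list above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* xorshift64 PRNG; the state must start non-zero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Hypothetical helper: fills p[0..n) with a random single-cycle permutation
   (Sattolo's algorithm), so following i = p[i] visits every slot once
   before returning to the start, in an order the prefetcher cannot guess. */
size_t *make_chain(size_t n, uint64_t seed) {
    if (n == 0) return NULL;
    size_t *p = malloc(n * sizeof *p);
    if (p == NULL) return NULL;
    uint64_t s = seed ? seed : 1;
    for (size_t i = 0; i < n; i++) p[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&s) % i);  /* j in [0, i) */
        size_t t = p[i]; p[i] = p[j]; p[j] = t;
    }
    return p;
}

/* Hypothetical helper: average nanoseconds per dependent load when chasing
   a chain that occupies `bytes` bytes. Returns -1.0 if allocation fails. */
double chase_ns(size_t bytes, size_t iters) {
    size_t n = bytes / sizeof(size_t);
    assert(n >= 2 && iters > 0);
    size_t *p = make_chain(n, 0x9E3779B97F4A7C15ULL);
    if (p == NULL) return -1.0;
    size_t idx = 0;
    for (size_t k = 0; k < n; k++) idx = p[idx];  /* warm-up pass */
    clock_t t0 = clock();
    for (size_t k = 0; k < iters; k++) idx = p[idx];  /* n = p[n] */
    clock_t t1 = clock();
    volatile size_t sink = idx;  /* keep the chain's result live */
    (void)sink;
    free(p);
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

Calling `chase_ns` for working sets from a few KB up to tens of MB, and multiplying the result by the clock in GHz, should show plateaus that step up as the set outgrows L1 (32 KB on this chip), L2 and L3. Turbo Boost, power management and TLB misses at large sizes all shift the numbers, so treat the output as a rough estimate.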
5
29EtwXn6t5wgM3fD 2019-01-02 14:00:07 +08:00 1
https://www.7-cpu.com/cpu/Skylake_X.html
Intel i7-7820X (Skylake X), 8 cores, 4.3 GHz (Turbo Boost), Mesh 2.4 GHz, 14 nm. RAM: 4x 8 GB DDR4-3400 16-18-18-36.
L1 Data cache = 32 KB, 64 B/line, 8-WAY
L1 Instruction cache = 32 KB, 64 B/line, 8-WAY.
L2 cache = 1024 KB, 64 B/line, 16-WAY
L3 cache = 11 MB, 64 B/line, 11-WAY
L1 Data Cache Latency = 4 cycles for simple access via pointer
L1 Data Cache Latency = 5 cycles for access with complex address calculation (size_t n, *p; n = p[n]).
L2 Cache Latency = 14 cycles
L3 Cache Latency = 68 cycles (3.6 GHz)
L3 Cache Latency = 79 cycles (4.3 GHz) (77-81 cycles for different cores)
RAM Latency = 79 cycles + 50 ns

So on both ARM and x86, an L1 hit takes 4 or 5 clock cycles.
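Reading the RAM figures as a core-clock part plus a fixed memory-side part (my interpretation of the "cycles + ns" notation, not something 7-cpu spells out here), the totals work out to 79 / 4.3 + 50 ≈ 68.4 ns on the i7-7820X and 60 / 2.0 + 124 = 154 ns on the Cortex-A57. The same arithmetic shows that the shared 4-cycle L1 hit is about 0.93 ns at 4.3 GHz but 2 ns at 2.0 GHz, so equal cycle counts do not mean equal times. A minimal C sketch, with a hypothetical function name:

```c
#include <assert.h>

/* Assumed model: the cycle part scales with the core clock, the ns part
   does not. Total latency in ns at freq_ghz GHz. */
double mixed_latency_ns(double cycles, double extra_ns, double freq_ghz) {
    assert(freq_ghz > 0.0);
    return cycles / freq_ghz + extra_ns;
}
```

With `extra_ns` set to 0 the function reduces to a plain cycles-to-ns conversion, which is how the L1 comparison above was computed.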