[Repost] Notes on installing the memory bandwidth benchmark STREAM


Test environment:

CPU: Kunpeng 920, 8 cores
MEM: 16 GB
Storage: 200 GB
OS: openEuler 20.03 (LTS-SP3)

1 Server resource testing tool: STREAM

1.1 Compiling and installing STREAM

Build from source

Download the source:

wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c

Compile (stream.c is a single source file, so there is nothing to extract):

gcc -O3 -fopenmp -DN=2000000 -DNTIMES=10 stream.c -o stream

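Parameter notes:

-O3:
    Selects optimization level 3, the highest of GCC's numbered -O levels.
-fopenmp:
    Enables OpenMP so the benchmark runs multithreaded on multiprocessor systems, which gets closer to the machine's actual peak memory bandwidth. With OpenMP enabled, the program by default runs one thread per CPU hardware thread.
-DN=2000000:
    Sets the size (number of elements) of the test arrays a[], b[], and c[]. This value strongly affects the results. In version 5.9 the default is 2000000. In version 5.10 of stream.c the parameter is named -DSTREAM_ARRAY_SIZE and the default is 10000000; a 5.10 build that is given -DN ignores it, prints a warning, and falls back to the default, as the sample output in section 1.2 shows. Note: the arrays must be much larger than the CPU's last-level cache (usually the L3 cache); otherwise the test measures cache throughput rather than memory throughput.
    Recommended formula: {last-level cache size in MB} × 1024 × 1024 × 4.1 × number of CPU sockets / 8, truncated to an integer.
    Explanation: the stream.c source recommends arrays at least 4 times the size of the last-level cache, and each array element is a double, which is 8 bytes. The formula is therefore: last-level cache (in bytes) × 4.1 × number of CPU sockets / 8.
    Example: on a dual-socket machine with a 32 MB last-level cache, the value is 32 × 1024 × 1024 × 4.1 × 2 / 8 ≈ 34393292.
-DNTIMES=10:
    Number of times each kernel is executed; the best result among these runs is reported.
stream.c:
    The source file to compile.
stream:
    Name of the output executable.
Other options:
-mtune=native -march=native:
    Optimize for the instruction set of the local CPU. native is appropriate here because the build machine is also the machine that runs the benchmark. For more CPU-specific optimizations, see the compiler documentation.
-mcmodel=medium:
    Required when a single array is larger than 2 GB.
-DOFFSET=4096:
    Array offset; it can usually be left undefined.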
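
The recommended array-size formula can be evaluated directly in the shell. The following is a minimal sketch, not part of STREAM: `stream_array_size` is a hypothetical helper name, and the factor 4.1 / 8 is rewritten as 41 / 80 so that integer arithmetic suffices and the result is truncated.

```shell
# stream_array_size is a hypothetical helper, not part of STREAM.
# It applies the rule of thumb: cache bytes x 4.1 x sockets / 8 bytes per double.
stream_array_size() {
    cache_mb=$1   # size of the last-level cache in MB
    sockets=$2    # number of CPU sockets
    # 4.1 / 8 is written as 41 / 80 so shell integer arithmetic can be used;
    # integer division truncates the result.
    echo $(( cache_mb * 1024 * 1024 * 41 * sockets / 80 ))
}

stream_array_size 32 2   # dual-socket, 32 MB last-level cache: prints 34393292
```

The printed value is what would be passed to the compiler as the array size, under whichever macro name the downloaded stream.c version uses.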

1.2 Verifying and running STREAM

Running

Set the number of threads to X:

export OMP_NUM_THREADS=X

Run the compiled executable (stream) from the directory that contains it:

./stream
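
To see how bandwidth changes with the thread count, the two commands above can be wrapped in a loop. This is a minimal sketch under two assumptions: `sweep_threads` is a hypothetical helper name, and the result table contains a line beginning with `Triad:` whose second field is the best rate in MB/s, which is the layout STREAM 5.10 prints.

```shell
# sweep_threads is a hypothetical helper, not part of STREAM.
# Usage: sweep_threads <stream-binary> <thread-count>...
# Prints one line per run: thread count, a tab, and the Triad best rate.
sweep_threads() {
    bin=$1
    shift
    for t in "$@"; do
        # OMP_NUM_THREADS is set only for this single invocation.
        # Assumes the binary prints a "Triad:" line with the rate in field 2.
        rate=$(OMP_NUM_THREADS=$t "$bin" | awk '$1 == "Triad:" { print $2 }')
        printf '%s\t%s\n' "$t" "$rate"
    done
}

# Example (assumes ./stream exists in the current directory):
# sweep_threads ./stream 1 2 4 8
```

Triad is used here only as an illustration; the same awk pattern works for the Copy, Scale, and Add rows.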

Output:

[root@controller ~]# export OMP_NUM_THREADS=4
[root@controller ~]# ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
*****  WARNING: ******
      It appears that you set the preprocessor variable N when compiling this code.
      This version of the code uses the preprocesor variable STREAM_ARRAY_SIZE to control the array size
      Reverting to default value of STREAM_ARRAY_SIZE=10000000
*****  WARNING: ******
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 4240 microseconds.
   (= 4240 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           31305.2     0.005203     0.005111     0.005255
Scale:          36232.0     0.004490     0.004416     0.004747
Add:            36457.7     0.006733     0.006583     0.007083
Triad:          36933.9     0.006661     0.006498     0.006960
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
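
The warning in the output above shows that this 5.10 build ignored -DN and ran with the default of 10000000 elements. One way to avoid that is to choose the macro name from the source file before compiling. The sketch below is an assumption-laden illustration: `size_macro_name` is a hypothetical helper, and it assumes that only stream.c versions that honor STREAM_ARRAY_SIZE mention that identifier.

```shell
# size_macro_name is a hypothetical helper, not part of STREAM.
# It prints the array-size macro name that the given stream.c appears to use.
size_macro_name() {
    # Assumption: the identifier STREAM_ARRAY_SIZE appears only in versions
    # of stream.c that use it (5.10); older versions (5.9) use N.
    if grep -q 'STREAM_ARRAY_SIZE' "$1"; then
        echo STREAM_ARRAY_SIZE
    else
        echo N
    fi
}

# Example (assumes stream.c is in the current directory; 34393292 is the
# value from the dual-socket, 32 MB cache example in the parameter notes):
# gcc -O3 -fopenmp -D"$(size_macro_name stream.c)"=34393292 -DNTIMES=10 stream.c -o stream
```

After recompiling this way, the "Array size" line in the output should report the requested number of elements rather than the default.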

1.3 Other resources for STREAM

STREAM source code

http://www.cs.virginia.edu/stream/FTP/Code/

Git repository

https://github.com/jeffhammond/STREAM

