I. Overview

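In embedded development, designing a product or a newly approved project usually requires evaluating the performance of its key chips. This article summarizes the tools used to evaluate the performance of Linux-based products. All tests were run on an aarch64 (arm64) platform whose board root filesystem is Debian.
The tools are listed below: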

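Name | Purpose | Git source
lmbench | Bandwidth and latency measurement | https://github.com/redrose2100/lmbench.git
stream | Memory bandwidth test (bytes transferred per second) | https://github.com/jeffhammond/STREAM.git
unixbench | Tests basic UNIX system performance; the result is not determined by CPU, memory, or disk alone but also depends on the hardware, OS version, and compiler | https://github.com/kdlucas/byte-unixbench.git
cyclictest and stress-ng | Real-time latency test (cyclictest) and stress tool (stress-ng) | stress tool: https://github.com/ColinIanKing/stress-ng.git ; test tool: git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git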

II. stream

1. Build
Change the Makefile to the following:

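# A CC exported in the environment (e.g. a cross-compiler) is used as-is.
CC ?= gcc
# -mcmodel=large (which requires -fno-PIC) allows static arrays beyond the
# default code model's size limit; -fopenmp enables multi-threading;
# the array size and the number of repetitions are set with -D.
CFLAGS = -O3 -fno-PIC -mcmodel=large -fopenmp -DSTREAM_ARRAY_SIZE=200000000 -DNTIMES=30

.PHONY: all clean
all: stream
clean:
	rm -f stream *.o

# Recipe lines must start with a tab character.
stream: stream.c
	$(CC) $(CFLAGS) stream.c -o stream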

Then select the cross-compiler on the build host and run make:

export CC=aarch64-linux-gnu-gcc
make
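
The value passed with -DSTREAM_ARRAY_SIZE decides how much RAM the benchmark needs, because STREAM allocates three arrays of 8-byte elements. A minimal sketch for checking a candidate size before building; the helper name stream_mem_mib is hypothetical and not part of STREAM:

```shell
# Hypothetical helper: MiB needed by STREAM's three arrays of 8-byte
# elements for a given STREAM_ARRAY_SIZE.
stream_mem_mib() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 8 * 3 / (1024 * 1024) }'
}

stream_mem_mib 200000000   # prints 4577.6
```

If the result is larger than the memory available on the board (for example the MemAvailable value in /proc/meminfo), a smaller array size is probably needed.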
2. Copy the compiled stream binary to the embedded board
3. Run the test
Single-threaded:
export OMP_NUM_THREADS=1
./stream > stream-result-1thread.txt

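Multi-threaded, using 8 threads as an example. The CPU here has 8 cores; with one hardware thread per core, the thread count can go up to 8: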
export OMP_NUM_THREADS=8
export GOMP_CPU_AFFINITY=0-7
./stream > stream-result-8thread.txt
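
The thread count and affinity range can also be derived from the board instead of being hard-coded. The following is a sketch under two assumptions: the CPUs are numbered 0 to N-1 and are all online, and nproc from GNU coreutils is installed. The helper name affinity_range is hypothetical.

```shell
# Hypothetical helper: build a GOMP_CPU_AFFINITY range covering CPUs 0..N-1.
affinity_range() {
    if [ "$1" -le 1 ]; then
        echo 0
    else
        echo "0-$(( $1 - 1 ))"
    fi
}

# "nproc --all" counts the installed CPUs and ignores an OMP_NUM_THREADS
# value exported earlier (plain "nproc" would honor it).
threads=$(nproc --all)
export OMP_NUM_THREADS="$threads"
export GOMP_CPU_AFFINITY="$(affinity_range "$threads")"
echo "threads=$OMP_NUM_THREADS affinity=$GOMP_CPU_AFFINITY"
# ./stream > "stream-result-${threads}thread.txt"
```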
4. Runtime problems
The following error may be reported:

./stream: error while loading shared libraries: libgomp.so.1: cannot open shared object file: No such file or directory

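On Debian, install the missing library on the board and run the test again: dpkg -i libgomp1_8.3.0-6_arm64.deb
On Buildroot, enable the option that builds this package when building the root filesystem.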
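
Missing shared libraries like this one can be spotted before running the benchmark by listing the binary's dynamic dependencies. A small sketch, assuming ldd is available on the board; the helper name missing_libs is hypothetical:

```shell
# Hypothetical helper: read ldd output on stdin and print the names of
# libraries that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage on the board:
#   ldd ./stream | missing_libs
```

An empty result suggests that all required libraries, including libgomp.so.1, were found.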
5. Results and interpretation
Single-threaded:

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 200000000 (elements), Offset = 0 (elements)
Memory per array = 1525.9 MiB (= 1.5 GiB).
Total memory required = 4577.6 MiB (= 4.5 GiB).
Each kernel will be executed 30 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 347944 microseconds.
   (= 347944 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           11397.9     0.280894     0.280754     0.281033
Scale:          10245.3     0.312539     0.312339     0.313669
Add:             8855.1     0.542250     0.542060     0.542685
Triad:           8857.6     0.542100     0.541906     0.542925
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
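
The Best Rate column follows directly from the Min time column: STREAM counts the bytes each kernel moves (two arrays for Copy and Scale, three for Add and Triad, 8 bytes per element) and divides by the best time, with 1 MB taken as 10^6 bytes. A minimal sketch that reproduces the figures above; the helper name stream_rate is hypothetical:

```shell
# Hypothetical helper: MB/s for a kernel that touches <arrays> arrays of
# <elements> 8-byte elements in <min_time> seconds (1 MB = 10^6 bytes).
stream_rate() {
    awk -v n="$1" -v k="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", n * 8 * k / t / 1e6 }'
}

stream_rate 200000000 2 0.280754   # Copy: prints 11397.9
stream_rate 200000000 3 0.542060   # Add:  prints 8855.1
```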

Multi-threaded:

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 200000000 (elements), Offset = 0 (elements)
Memory per array = 1525.9 MiB (= 1.5 GiB).
Total memory required = 4577.6 MiB (= 4.5 GiB).
Each kernel will be executed 30 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 339367 microseconds.
   (= 339367 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           14378.2     0.223113     0.222559     0.223810
Scale:          12578.0     0.257082     0.254413     0.260384
Add:            10312.8     0.468002     0.465440     0.470596
Triad:           8938.6     0.542479     0.536994     0.548937
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

Notes
Focus on the bandwidth (Best Rate, MB/s) and the timings in the following four rows:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           14378.2     0.223113     0.222559     0.223810
Scale:          12578.0     0.257082     0.254413     0.260384
Add:            10312.8     0.468002     0.465440     0.470596
Triad:           8938.6     0.542479     0.536994     0.548937
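
To compare two runs, the four rows can be extracted from the saved result files and divided kernel by kernel. This is a sketch that assumes both files contain the standard STREAM result table; the helper name stream_compare is hypothetical:

```shell
# Hypothetical helper: print, per kernel, the Best Rate of the second
# result file divided by the Best Rate of the first one.
stream_compare() {
    awk '/^(Copy|Scale|Add|Triad):/ {
             if (FNR == NR) base[$1] = $2      # first file: baseline rates
             else printf "%s %.2f\n", $1, $2 / base[$1]
         }' "$1" "$2"
}

# Usage:
#   stream_compare stream-result-1thread.txt stream-result-8thread.txt
```

For the two runs shown in this section the ratios are about 1.26 for Copy, 1.23 for Scale, 1.16 for Add, and 1.01 for Triad. One plausible reading is that a single core already comes close to saturating the memory bandwidth on this board.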
