第二天·在Linux中安装Linpack并测试
实验环境
系统环境
Red Hat Enterprise Linux 7
硬件环境
lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
座: 4
NUMA 节点: 1
厂商 ID: GenuineIntel
CPU 系列: 6
型号: 94
型号名称: Intel Core Processor (Skylake)
步进: 3
CPU MHz: 1696.072
BogoMIPS: 3392.14
超管理器厂商: KVM
虚拟化类型: 完全
L1d 缓存: 32K
L1i 缓存: 32K
L2 缓存: 4096K
L3 缓存: 16384K
NUMA 节点0 CPU: 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good
nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel Core Processor (Skylake)
stepping : 3
microcode : 0x1
cpu MHz : 1696.072
cache size : 16384 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good
nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx
smap xsaveopt arat
bogomips : 3392.14
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel Core Processor (Skylake)
stepping : 3
microcode : 0x1
cpu MHz : 1696.072
cache size : 16384 KB
physical id : 1
siblings : 1
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good
nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat
bogomips : 3392.14
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel Core Processor (Skylake)
stepping : 3
microcode : 0x1
cpu MHz : 1696.072
cache size : 16384 KB
physical id : 2
siblings : 1
core id : 0
cpu cores : 1
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl
xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase
tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat
bogomips : 3392.14
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel Core Processor (Skylake)
stepping : 3
microcode : 0x1
cpu MHz : 1696.072
cache size : 16384 KB
physical id : 3
siblings : 1
core id : 0
cpu cores : 1
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl
xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase
tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat
bogomips : 3392.14
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
free -m
total used free shared buff/cache available
Mem: 3950 87 3687 16 175 3637
Swap: 0 0 0
dmidecode -t memory
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
Handle 0x1000, DMI type 16, 23 bytes
Physical Memory Array
Location: Other
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 4 GB
Error Information Handle: Not Provided
Number Of Devices: 1
Handle 0x1100, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: 4096 MB
Form Factor: DIMM
Set: None
Locator: DIMM 0
Bank Locator: Not Specified
Type: RAM
Type Detail: Other
Speed: Unknown
Manufacturer: Red Hat
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
安装过程
安装MPICH2
安装依赖
yum install gcc gcc-c++ gcc-gfortran openssl-devel -y
从http://www.mpich.org/static/downloads/1.1.1p1/mpich2-1.1.1p1.tar.gz下载文件,
tar zxf mpich2-1.1.1p1.tar.gz
cd mpich2-1.1.1p1
./configure --prefix=/root/linpack/mpi --with-pm=smpd --enable-f77
make
make install
上述步骤完成后,编辑~/.bashrc
,增加一行:
PATH="$PATH:/root/linpack/mpi/bin"
添加MPICH可执行文件目录的环境变量。
执行
smpd -s
启动服务
在/etc/hosts
中127.0.0.1
字段后添加cu009.novalocal
,执行
cd examples
mpiexec -n 4
结果如下:
[root@cu009 examples]# mpiexec -n 4 ./cpi
Process 1 of 4 is on cu009.novalocal
Process 0 of 4 is on cu009.novalocal
Process 3 of 4 is on cu009.novalocal
Process 2 of 4 is on cu009.novalocal
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000286
安装GotoBLAS
yum install patch -y #安装依赖
tar zxf GotoBLAS2-1.13.tar.gz
cd GotoBLAS2
vim Makefile.rule
将第10、17、20、27、41,即
TARGET = PENRYN
CC = gcc
FC = gfortran
BINARY=64
NUM_THREADS = 24
这几行的注释删去,保存。执行
./quickbuild.64bit
若出现如下错误:
/usr/bin/ld: 找不到 -l-l
collect2: 错误:ld 返回 1
make[1]: *** [../libgoto2_penryn-r1.13.so] 错误 1
make[1]: 离开目录“/root/linpack/GotoBLAS2/exports”
make: *** [shared] 错误 2
再执行一次make
就能成功。
安装HPL
tar zxvf hpl-2.0.tar.gz
mv phl-2.0 hpl
cd hpl
cp setup/Make.Linux_PII_FBLAS .
vim Make.Linux_PII_FBLAS
修改Make.Linux_PII_FLABS,将配置中的各项用下列值替换:
TOPdir = /root/linpack/hpl
MPdir = /root/linpack/mpi
LAdir = /root/linpack/GotoBLAS2
LAlib = $(LAdir)/libgoto2.a $(LAdir)/libgoto2.so
CC = /root/linpack/mpi/bin/mpicc
LINKER = /root/linpack/mpi/bin/mpif77
最后,执行
make arch=Linux_PII_FBLAS
编译成功后,就可以开始跑分。执行
cd bin/Linux_PII_FBLAS
mpiexec -np 4 ./xhpl
其中,HPL.dat
的配置如下:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
8 device out (6=stdout,7=stderr,file)
4 # of problems sizes (N)
29 30 34 35 Ns
4 # of NBs
1 2 3 4 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
3 # of process grids (P x Q)
2 1 4 Ps
2 4 1 Qs
16.0 threshold
3 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
2 # of recursive stopping criterium
2 4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
3 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
0 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
输出结果(随机截取十组)如下:
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2R2 35 4 4 1 0.00 1.258e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0258957 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2R4 35 4 4 1 0.00 1.519e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0164438 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00C2L2 35 4 4 1 0.00 1.109e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0258957 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00C2L4 35 4 4 1 0.00 1.220e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0199397 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00C2C2 35 4 4 1 0.00 1.235e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0258957 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00C2C4 35 4 4 1 0.00 1.230e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0199397 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00C2R2 35 4 4 1 0.00 1.335e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0258957 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00C2R4 35 4 4 1 0.00 1.526e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0164438 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00R2L2 35 4 4 1 0.00 1.137e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0258957 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00R2L4 35 4 4 1 0.00 1.236e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0199397 ...... PASSED
================================================================================
查看完整跑分数据报告,请前往https://lenconda.top/asc/hpl.txt
参考
[1]. Linpack安装过程