ContextSwitch: Learning and Usage


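Overview

GitHub hosts a simple tool for measuring the cost of system calls and context switches:
contextswitch.
After downloading it, just run make to perform a basic test.

Note that on some ARM environments the compiler does not accept the
-mno-avx
option (it is an x86-specific flag), so it has to be removed from the build flags first.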
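The flag removal mentioned above can be sketched as a small shell helper. This is only an illustration: it assumes the flag appears literally as -mno-avx in the Makefile, and the helper name strip_avx_flag and the output file Makefile.arm are made up for this example.

```shell
# Print a copy of the given Makefile with the x86-only -mno-avx flag removed.
# Assumption: the flag appears literally as "-mno-avx"; a longer flag such as
# "-mno-avx2" is not handled by this simple pattern.
strip_avx_flag() {
  sed 's/[[:space:]]*-mno-avx//g' "$1"
}
```

A possible use on an ARM machine would be: strip_avx_flag Makefile > Makefile.arm && make -f Makefile.arm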

Official documentation and description

Little micro-benchmarks to assess the performance overhead of context
switching.

timesyscall: Benchmarks the overhead of a system call.
timectxsw:   Benchmarks the overhead of context switching between 2 processes.
timetctxsw:  Benchmarks the overhead of context switching between 2 threads.
timectxswws: Benchmarks the overhead of context switching between 2 processes
             using a working set of the size specified in argument.
timetctxsw2: Benchmarks the overhead of context switching between 2 threads,
             by using a sched_yield() method.
             If you do taskset -a 1, all threads should be scheduled on the
             same processor, so you are really doing thread context switch.
             Then to be sure that you are really doing it, just do:
               strace -ff -tt -v taskset -a 1 ./timetctxsw2
             Now why is sched_yield() enough for testing? Because it places
             the current thread at the end of the ready queue. So the next
             ready thread will be scheduled.
             I also added sched_setscheduler(SCHED_FIFO) to get the best
             performance.
From: https://github.com/tsuna/contextswitch       

Script description

runbench() {
  $* ./timesyscall
  $* ./timectxsw
  $* ./timetctxsw
  $* ./timetctxsw2
}
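Each group of tests measures, in order:

1. The time of a system call.
2. The time of a context switch between 2 processes.
3. The time of a switch between two threads in the same process.
4. The thread switch time using the sched_yield() method (I am not very familiar with this one).

There are three groups in total:
The first group sets no CPU affinity.
The second group binds to CPUs, but on two cores.
The third group binds everything to the same CPU core.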
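As a rough sketch of how the three groups could be driven from the runbench idea above: the taskset arguments below are assumptions for illustration and are not copied from the repository script. The DRY_RUN variable and the run_all_groups name are also made up here, so that the commands can be inspected without running the benchmarks.

```shell
# Sketch of the three benchmark groups. The taskset arguments are assumptions
# for illustration, not taken from the repository script.

runbench() {
  # "$@" is an optional command prefix, e.g. "taskset -c 0".
  # When DRY_RUN is set, each command is printed instead of executed.
  for bench in timesyscall timectxsw timetctxsw timetctxsw2; do
    ${DRY_RUN:+echo} "$@" "./$bench"
  done
}

run_all_groups() {
  echo '-- No CPU affinity --'
  runbench
  echo '-- With CPU affinity --'
  # Assumed: allow CPUs 0 and 1 (whether these are separate physical cores
  # depends on the machine's topology).
  runbench taskset -c 0,1
  echo '-- With CPU affinity to CPU 0 --'
  # Assumed: pin everything to CPU 0.
  runbench taskset -c 0
}
```

Running DRY_RUN=1 run_all_groups prints the twelve commands under their three group headers. Without DRY_RUN the benchmarks themselves are executed from the current directory.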

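Notes on the test results

Across all of my test environments:
1. The AMD 9T34 is the undisputed number one (55.3 ns per system call, versus 1184.2 ns on the E5-2620).
2. The same hardware behaves quite differently under different operating systems, so comparisons must be made on the same operating system.
3. Among the domestic CPUs, the ranking matches the SPECjvm and SPEC CPU results exactly: Phytium < Hygon < Kunpeng < Alibaba Yitian.
   Alibaba Yitian is the undisputed leader.
4. CPUs from ten years ago really are worse than current ones. Upgrading brings better performance and higher speed.
5. CPU pinning is very useful and is worth tuning.
6. Coroutines and lightweight threads are the future. Only with them will performance be good.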
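Each result line in the raw output has the form "<count> <what> in <total>ns (<average>ns/<unit>)", where the average is simply the total divided by the count, for example 1963917388 / 2000000 ≈ 982.0 ns per process context switch. To compare machines side by side, the logs can be reduced to one row per measurement. The following is a minimal sketch; the summarize name and the tab-separated output layout are choices made for this example.

```shell
# Read benchmark output on stdin and print one tab-separated row per result:
#   <group>  <measurement>  <ns per operation, recomputed as total / count>
# Lines that are not results (e.g. "sched_setscheduler(): Operation not
# permitted") are skipped. Note that timetctxsw and timetctxsw2 both print
# "thread context switches", so the two rows share a label and differ by order.
summarize() {
  awk '
    /^-- / { section = $0; gsub(/^-- | --$/, "", section); next }
    / in [0-9]+ns / {
      for (i = 2; i < NF; i++) if ($i == "in") break
      total = $(i + 1); sub(/ns$/, "", total)
      name = $2
      for (j = 3; j < i; j++) name = name " " $j
      printf "%s\t%s\t%.1f\n", section, name, total / $1
    }'
}
```

For instance, saving one machine's output to a file and running summarize < that file gives rows that can be pasted into a spreadsheet.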

Result chart 1 (image not available)

Result chart 2 (image not available)


E5-2620 2.0GHz

2 physical CPUs, 6 cores/CPU, 2 hardware threads/core = 24 hw threads total
-- No CPU affinity --
10000000 system calls in 11841646290ns (1184.2ns/syscall)
2000000 process context switches in 6039748545ns (3019.9ns/ctxsw)
2000000  thread context switches in 6745297188ns (3372.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 755823488ns (377.9ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 14343751134ns (1434.4ns/syscall)
2000000 process context switches in 16353343542ns (8176.7ns/ctxsw)
2000000  thread context switches in 13617487377ns (6808.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 2363107269ns (1181.6ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 11929472188ns (1192.9ns/syscall)
2000000 process context switches in 6915983386ns (3458.0ns/ctxsw)
2000000  thread context switches in 6837489882ns (3418.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 795652256ns (397.8ns/ctxsw)

Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz, 云海OS virtual machine

1 physical CPUs, 8 cores/CPU, 1 hardware threads/core = 8 hw threads total
-- No CPU affinity --
10000000 system calls in 2841917410ns (284.2ns/syscall)
2000000 process context switches in 7404178178ns (3702.1ns/ctxsw)
2000000  thread context switches in 7502081647ns (3751.0ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 222130514ns (111.1ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2835862084ns (283.6ns/syscall)
2000000 process context switches in 4990890087ns (2495.4ns/ctxsw)
2000000  thread context switches in 4311646652ns (2155.8ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 870608240ns (435.3ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 2844931708ns (284.5ns/syscall)
2000000 process context switches in 7601947691ns (3801.0ns/ctxsw)
2000000  thread context switches in 7914561498ns (3957.3ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 247057805ns (123.5ns/ctxsw)

Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz, 云海OS physical machine

2 physical CPUs, 12 cores/CPU, 2 hardware threads/core = 48 hw threads total
-- No CPU affinity --
10000000 system calls in 5769760409ns (577.0ns/syscall)
2000000 process context switches in 7245677219ns (3622.8ns/ctxsw)
2000000  thread context switches in 7069213271ns (3534.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 475086926ns (237.5ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 5762431985ns (576.2ns/syscall)
2000000 process context switches in 8692364627ns (4346.2ns/ctxsw)
2000000  thread context switches in 6572286258ns (3286.1ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 1304249661ns (652.1ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 5774310295ns (577.4ns/syscall)
2000000 process context switches in 6869635514ns (3434.8ns/ctxsw)
2000000  thread context switches in 6927117249ns (3463.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 473255745ns (236.6ns/ctxsw)

Phytium S2500, physical machine, NFSV3

2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 3838470070ns (383.8ns/syscall)
2000000 process context switches in 10913991269ns (5457.0ns/ctxsw)
2000000  thread context switches in 10987973614ns (5494.0ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 354962539ns (177.5ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 3851009222ns (385.1ns/syscall)
2000000 process context switches in 10500204985ns (5250.1ns/ctxsw)
2000000  thread context switches in 8605107251ns (4302.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 1694906366ns (847.5ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 3871134715ns (387.1ns/syscall)
2000000 process context switches in 8211223439ns (4105.6ns/ctxsw)
2000000  thread context switches in 8915611368ns (4457.8ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 362941497ns (181.5ns/ctxsw)

Phytium S2500, physical machine, Galaxy Kylin V10

model name : HUAWEI,Kunpeng 920
2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 1104251960ns (110.4ns/syscall)
2000000 process context switches in 5502095280ns (2751.0ns/ctxsw)
2000000  thread context switches in 5057680610ns (2528.8ns/ctxsw)
2000000  thread context switches in 159336010ns (79.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 1104213220ns (110.4ns/syscall)
2000000 process context switches in 3157105260ns (1578.6ns/ctxsw)
2000000  thread context switches in 2749304460ns (1374.7ns/ctxsw)
2000000  thread context switches in 520588690ns (260.3ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1104361790ns (110.4ns/syscall)
2000000 process context switches in 2554260900ns (1277.1ns/ctxsw)
2000000  thread context switches in 2501093900ns (1250.5ns/ctxsw)
2000000  thread context switches in 159835540ns (79.9ns/ctxsw)

Phytium S2500, KVM virtual machine

-- No CPU affinity --
10000000 system calls in 2016128780ns (201.6ns/syscall)
2000000 process context switches in 20813179318ns (10406.6ns/ctxsw)
2000000  thread context switches in 21270077053ns (10635.0ns/ctxsw)
2000000  thread context switches in 283497350ns (141.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2003773606ns (200.4ns/syscall)
2000000 process context switches in 7149973534ns (3575.0ns/ctxsw)
2000000  thread context switches in 6041671015ns (3020.8ns/ctxsw)
2000000  thread context switches in 1184706267ns (592.4ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1996452026ns (199.6ns/syscall)
2000000 process context switches in 20093433102ns (10046.7ns/ctxsw)
2000000  thread context switches in 20838253803ns (10419.1ns/ctxsw)
2000000  thread context switches in 284723964ns (142.4ns/ctxsw)

Hygon machine

model name : Hygon C86 7285 32-core Processor
pgrep: cannot allocate 4611686018427387903 bytes
2 physical CPUs, 32 cores/CPU, 2 hardware threads/core = 128 hw threads total
-- No CPU affinity --
10000000 system calls in 1188373575ns (118.8ns/syscall)
2000000 process context switches in 7182741168ns (3591.4ns/ctxsw)
2000000  thread context switches in 5057264353ns (2528.6ns/ctxsw)
2000000  thread context switches in 218741918ns (109.4ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 1199538092ns (120.0ns/syscall)
2000000 process context switches in 4926579090ns (2463.3ns/ctxsw)
2000000  thread context switches in 4116607893ns (2058.3ns/ctxsw)
2000000  thread context switches in 877003690ns (438.5ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1207213049ns (120.7ns/syscall)
2000000 process context switches in 4803238321ns (2401.6ns/ctxsw)
2000000  thread context switches in 5033478360ns (2516.7ns/ctxsw)
2000000  thread context switches in 218102516ns (109.1ns/ctxsw)

Kunpeng machine

2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 1628256836ns (162.8ns/syscall)
2000000 process context switches in 3567828849ns (1783.9ns/ctxsw)
2000000  thread context switches in 3366796751ns (1683.4ns/ctxsw)
2000000  thread context switches in 208056729ns (104.0ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 3957162873ns (395.7ns/syscall)
2000000 process context switches in 66176473553ns (33088.2ns/ctxsw)
2000000  thread context switches in 64858764678ns (32429.4ns/ctxsw)
2000000  thread context switches in 9224336984ns (4612.2ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1658580824ns (165.9ns/syscall)
2000000 process context switches in 4162672768ns (2081.3ns/ctxsw)
2000000  thread context switches in 3930988507ns (1965.5ns/ctxsw)
2000000  thread context switches in 206905930ns (103.5ns/ctxsw)

Intel 8369HB 3.3GHz

-- No CPU affinity --
10000000 system calls in 2039800553ns (204.0ns/syscall)
2000000 process context switches in 3484116193ns (1742.1ns/ctxsw)
2000000  thread context switches in 3504345370ns (1752.2ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 163336302ns (81.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2042749498ns (204.3ns/syscall)
2000000 process context switches in 3512477901ns (1756.2ns/ctxsw)
2000000  thread context switches in 3037479215ns (1518.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 589604636ns (294.8ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 2037861063ns (203.8ns/syscall)
2000000 process context switches in 3543912186ns (1772.0ns/ctxsw)
2000000  thread context switches in 3575216872ns (1787.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 164079529ns (82.0ns/ctxsw)

Alibaba Yitian 710

1 physical CPUs, 8 cores/CPU, 1 hardware threads/core = 8 hw threads total
-- No CPU affinity --
10000000 system calls in 672626352ns (67.3ns/syscall)
2000000 process context switches in 3586487130ns (1793.2ns/ctxsw)
2000000  thread context switches in 3228362627ns (1614.2ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 102817391ns (51.4ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 672290182ns (67.2ns/syscall)
2000000 process context switches in 1990312435ns (995.2ns/ctxsw)
2000000  thread context switches in 1682598464ns (841.3ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 328222163ns (164.1ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 672409838ns (67.2ns/syscall)
2000000 process context switches in 3347526340ns (1673.8ns/ctxsw)
2000000  thread context switches in 3100110717ns (1550.1ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000  thread context switches in 102631615ns (51.3ns/ctxsw)

AMD 9T34

model name : AMD EPYC 9T34 64-Core Processor
1 physical CPUs, 8 cores/CPU, 2 hardware threads/core = 16 hw threads total
-- No CPU affinity --
10000000 system calls in 553414290ns (55.3ns/syscall)
2000000 process context switches in 1963917388ns (982.0ns/ctxsw)
2000000  thread context switches in 2131473467ns (1065.7ns/ctxsw)
2000000  thread context switches in 115396178ns (57.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 554322086ns (55.4ns/syscall)
2000000 process context switches in 2730693871ns (1365.3ns/ctxsw)
2000000  thread context switches in 2559121196ns (1279.6ns/ctxsw)
2000000  thread context switches in 550724648ns (275.4ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 553295602ns (55.3ns/syscall)
2000000 process context switches in 2011838005ns (1005.9ns/ctxsw)
2000000  thread context switches in 2027328701ns (1013.7ns/ctxsw)
2000000  thread context switches in 114914625ns (57.5ns/ctxsw)
