Comparing the speed of 2 cores vs 1 multi-threaded core

My colleague at Flashgrid, Mikhail Velikikh, ran a very interesting test, and I want to share it with you.

He tested two Azure VMs:

DS2_v2 – 2 cores with 1 thread each
D2s_v5 – 1 core with 2 threads

As you can see, the total number of threads is the same, and both VMs run on an Intel Xeon Platinum 8370C. The test showed that the VM with 2 cores processes more events per second than the VM with 1 core and 2 threads: 4282.14 versus 3131.14, roughly 37% more.

The commands used for this test are the following:

# curl --noproxy '*' -H Metadata:true -s -f "http://169.254.169.254/metadata/instance/compute?api-version=2017-12-01" | jq -r '.vmSize'

# lscpu

# sysbench --threads=2 cpu run

The curl and lscpu commands just show the current VM size and CPU info; sysbench runs the actual CPU benchmark with two threads.
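If you only want the headline number from the sysbench report, a small filter like the one below may be enough. This is a minimal sketch, not part of the original test: the function name events_per_second is my own, and it assumes the report contains an "events per second:" line like the one sysbench 1.0.20 prints in the outputs shown in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original test): read a sysbench cpu
# report on standard input and print only the "events per second" value.
events_per_second() {
    # The value is the last whitespace-separated field on the matching line.
    awk '/events per second:/ { print $NF }'
}

# Example usage (requires sysbench to be installed):
#   sysbench --threads=2 cpu run | events_per_second
```

Piping through a filter keeps the benchmark command itself unchanged, so the numbers stay comparable with the runs above.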

The output for DS2_v2 (processed 4282.14 events per second):

+ curl --noproxy '*' -H Metadata:true -s -f 'http://169.254.169.254/metadata/instance/compute?api-version=2017-12-01'
+ jq -r .vmSize
Standard_DS2_v2
+ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           106
Model name:                      Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Stepping:                        6
CPU MHz:                         2793.436
BogoMIPS:                        5586.87
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       96 KiB
L1i cache:                       64 KiB
L2 cache:                        2.5 MiB
L3 cache:                        48 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
+ sysbench --threads=2 cpu run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  4282.14

General statistics:
    total time:                          10.0003s
    total number of events:              42832

Latency (ms):
         min:                                    0.46
         avg:                                    0.47
         max:                                    1.38
         95th percentile:                        0.47
         sum:                                19988.62

Threads fairness:
    events (avg/stddev):           21416.0000/28.00
    execution time (avg/stddev):   9.9943/0.00

The output for D2s_v5 (processed 3131.14 events per second):

+ curl --noproxy '*' -H Metadata:true -s -f 'http://169.254.169.254/metadata/instance/compute?api-version=2017-12-01'
+ jq -r .vmSize
Standard_D2s_v5
+ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 57 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              2
Core(s) per socket:              1
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           106
Model name:                      Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Stepping:                        6
CPU MHz:                         2800.000
CPU max MHz:                     2800.0000
CPU min MHz:                     800.0000
BogoMIPS:                        5586.87
Virtualization:                  VT-x
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       48 KiB
L1i cache:                       32 KiB
L2 cache:                        1.3 MiB
L3 cache:                        48 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm arch_capabilities
+ sysbench --threads=2 cpu run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  3131.14

General statistics:
    total time:                          10.0005s
    total number of events:              31318

Latency (ms):
         min:                                    0.34
         avg:                                    0.64
         max:                                    1.59
         95th percentile:                        0.64
         sum:                                19996.86

Threads fairness:
    events (avg/stddev):           15659.0000/22.00
    execution time (avg/stddev):   9.9984/0.00
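To put the two results side by side, the relative difference can be worked out with a bit of awk. This is only a sketch of the arithmetic: the helper name pct_faster is hypothetical, and the two inputs are the events-per-second figures reported by sysbench above.

```shell
#!/bin/sh
# Hypothetical helper: print by how many percent rate $1 exceeds rate $2.
pct_faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# DS2_v2 (2 cores) vs D2s_v5 (1 core, 2 threads), events per second.
pct_faster 4282.14 3131.14   # prints 36.8
```

So in this single run, the VM with two physical cores handled roughly 37% more events per second than the VM with one core and two threads.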
