The world is moving to UTF8, and MySQL 8.0 now has utf8mb4 as its default charset. To be honest, though, I was pretty surprised by how sensitive the whole "charset" topic can be: you may easily hit a huge performance overhead just by using "odd" config settings around your client/server charset and collation. To avoid any potential charset mismatch between client and server, MySQL has long had an excellent option: "skip-character-set-client-handshake", which forces any client connection to be "aligned" with the server settings ! (for more details see the ref. manual : https://dev.mysql.com/doc/refman/8.0/en/server-options.html#option_mysqld_character-set-client-handshake). This option is NOT set by default (to leave you the freedom to choose the charsets used on the client and server sides). However, in my opinion, it's still better to align clients with the server settings to avoid any potential client misconfig..
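
To make the "mismatch" idea a bit more concrete, here is a minimal Python sketch that compares the client-side session settings with the server defaults. It assumes you have already collected the values (for example from SHOW VARIABLES LIKE 'character_set%' and SHOW VARIABLES LIKE 'collation%') into a plain dict. The helper name find_charset_mismatches and the sample values are hypothetical; only the MySQL variable names are real.

```python
# Hypothetical helper (not part of MySQL or of any client library):
# pairs of (session variable, server variable) that should agree
# when clients are aligned with the server settings.
PAIRS = [
    ("character_set_client", "character_set_server"),
    ("character_set_connection", "character_set_server"),
    ("character_set_results", "character_set_server"),
    ("collation_connection", "collation_server"),
]


def find_charset_mismatches(variables):
    """Return (session_name, session_value, server_name, server_value)
    for every pair whose values differ in the given dict."""
    mismatches = []
    for session_name, server_name in PAIRS:
        session_value = variables.get(session_name)
        server_value = variables.get(server_name)
        if session_value != server_value:
            mismatches.append(
                (session_name, session_value, server_name, server_value)
            )
    return mismatches


if __name__ == "__main__":
    # Made-up sample: a latin1 client talking to a utf8mb4 server.
    sample = {
        "character_set_client": "latin1",
        "character_set_connection": "latin1",
        "character_set_results": "latin1",
        "character_set_server": "utf8mb4",
        "collation_connection": "latin1_swedish_ci",
        "collation_server": "utf8mb4_0900_ai_ci",
    }
    for row in find_charset_mismatches(sample):
        print("%s=%s differs from %s=%s" % row)
```

With "skip-character-set-client-handshake" enabled, the expectation is that such a check returns an empty list for every client connection.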

Also, if you wish to use UTF8, please use "utf8mb4": first of all, it's the most complete charset for any kind of characters (and probably the only one which makes sense as of today), and second, its related code was further improved in MySQL 8.0 for better efficiency. How much more efficient ? Let's see from the following test results.

But first of all, the related config setup :
character_set_server=utf8mb4
collation_server=utf8mb4_0900_ai_ci
skip-character-set-client-handshake
sort_buffer_size=512K

NOTE: make sure to use a bigger sort buffer for UTF8

The results were obtained on the same 2S Skylake server as in the previously published RO tests with latin1, and with the same test workloads (for latin1 you just need to change the config to character_set_server=latin1 and collation_server=latin1_swedish_ci).
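
As a small illustration of how the two setups differ, here is a Python sketch that renders both config variants from one place, so that only the charset and collation lines change between runs. The function name render_charset_config is hypothetical, and the sketch assumes the other two lines (handshake option and sort buffer) stay the same for latin1, which is how I read the note above.

```python
# Charset-specific settings named in the post; the values are taken as-is.
CHARSET_SETTINGS = {
    "utf8mb4": {
        "character_set_server": "utf8mb4",
        "collation_server": "utf8mb4_0900_ai_ci",
    },
    "latin1": {
        "character_set_server": "latin1",
        "collation_server": "latin1_swedish_ci",
    },
}

# Assumption: these two lines are shared by both variants.
COMMON_LINES = [
    "skip-character-set-client-handshake",
    "sort_buffer_size=512K",
]


def render_charset_config(charset):
    """Return the config fragment for the given charset as one string."""
    settings = CHARSET_SETTINGS[charset]
    lines = ["%s=%s" % (name, value) for name, value in settings.items()]
    return "\n".join(lines + COMMON_LINES)


if __name__ == "__main__":
    for charset in CHARSET_SETTINGS:
        print("# " + charset)
        print(render_charset_config(charset))
        print()
```

Generating both fragments from the same template is just one way to make sure nothing else differs by accident between the UTF8 and latin1 runs.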

So far, here we are :

Sysbench OLTP_RO 10Mx8-tables UTF8mb4 on 2S 48cores-HT Skylake

Comments :
  • MySQL 8.0 is doing up to 40% better than 5.7
  • MariaDB 10.3.5 is trying to follow, but is not there yet..


Sysbench RO Point-Selects 10Mx8-tables UTF8mb4 on 2S 48cores-HT Skylake

Comments :
  • the point-selects workload is much less sensitive to UTF8
  • 8.0 and 5.7 are getting the highest QPS due to the RO fixes in 5.7
  • MariaDB 10.3.5 is going higher than before since the adoption of the InnoDB 5.7 code
  • 5.6 is slower than others because it's 5.6, and has no 5.7 improvements ;-))


Sysbench RO Distinct-Ranges 10Mx8-tables UTF8mb4 on 2S 48cores-HT Skylake

Comments :
  • MySQL 8.0 is doing 30% better than 5.7
  • MariaDB is doing so badly here simply because it was already doing badly in the previous latin1 tests..

Instead of summary :
  • a gentle reminder to PeterZ that MySQL is not "InnoDB only" ;-))
  • if you're doing "benchmarKeting", it's very easy to be "the best" by comparing everyone with UTF8 and hiding all other regressions and bottlenecks.. ;-))
  • so, I hope it's obvious why all my following benchmark results will be published with "latin1" only..

Thank You for using MySQL !

Rgds,
-Dimitri