MySQL on ZFS – we should all have started doing this a decade ago

I was fortunate enough to be exposed to it early on. One thing I have been asking myself for quite a few years now is: "How did we ever manage before ZFS?" Was everything really as painful and expensive before ZFS as it currently feels when I have to get anything done without ZFS? And why does anyone still use anything other than ZFS, a decade after it became available on the most popular server platforms (Solaris, FreeBSD, Linux)? This is at least as applicable to running MySQL on ZFS as it is to running anything else on ZFS. If you are only discovering it and switching to it today, the question you will ask yourself will probably be more along the lines of: "Why have I not done this before?"

Importance of Storage Alignment

Before we get into the ZFS-specific technical part, let's first look at the storage layer and how we can optimise the interaction between the storage subsystem and MySQL / MariaDB. Specifically, let's look at the InnoDB storage engine and the replication logs, as these are by far the most ubiquitously used components. The most important aspect here is the page size of the various files. InnoDB defaults to a 16KB page size for tablespaces, 8KB for the transaction log, and binary logs don't have a fixed page size. In order to simplify this, we can adjust the transaction log write size to 16KB (the maximum is equal to the InnoDB page size in use) using the innodb_log_write_ahead_size option. That reduces the requirement to 16KB blocks for the InnoDB tablespaces and logs, and no fixed block size for the replication logs. Even traditional file systems can be optimized for a 16KB application page size (e.g. ext4's stride and stripe-width mkfs parameters), but unfortunately systems engineers who are sufficiently familiar with both databases and storage are relatively rare.

With that out of the way: at the time of pool creation, the first thing you should consider is your underlying disk's physical sector size. Most modern disks have 4KB hardware sectors, even if they emulate 512-byte sectors. The underlying page size on SSDs can be bigger, but this is heavily abstracted away, and most SSDs are optimised for 4KB writes since this is the most commonly used block size on most file systems. ZFS will try to auto-detect this, but it is usually a good idea to explicitly instruct it to create a pool with 4KB sectors using the ashift parameter: zpool create -o ashift=12 tank

"Wait, I thought you said 4KB, so why 12?" I hear you ask. The ashift parameter is the exponent, rather than the size; for 8KB sectors it would be ashift=13, because 2^13 = 8192. Since InnoDB does its own prefetching, we can disable ZFS's own prefetching (since it is redundant in this specific usage) by setting the kernel module parameter zfs_prefetch_disable=1. This is especially important in environments where disk I/O is heavily constrained and provisioning more IOPS is expensive.

ZFS Tuning and Optimisation

At this stage, we need to consider the actual ZFS configuration parameters for the file system that will store our InnoDB tablespaces and logs. The parameters we normally recommend are the following:

Atime=off: It means the same thing it means on all other file systems.

Compression=lz4: LZ4 compression is very fast. So fast, in fact, that if you have a modern processor, you are likely to be able to decompress the data much faster than your SSD can read the compressed data. This means simultaneously having approximately double the disk throughput while consuming only half of the storage space.

Logbias=throughput: From the Solaris documentation: "If logbias is set to throughput, ZFS does not use the pool's separate log devices. Instead, ZFS optimizes synchronous operations for global pool throughput and efficient use of resources. For most configurations, the default value is recommended. Using the logbias=throughput value might improve performance for writing database files."

Primarycache=metadata: Since InnoDB already does its own caching via the Buffer Pool, there is no benefit to also caching the data in the page cache, or in the case of ZFS, the ARC. Since O_DIRECT doesn't disable caching on ZFS, we can disable caching of data in the ARC by setting primarycache to only cache metadata (the default is "all").

Recordsize=16k: As discussed above, InnoDB uses 16KB pages. Using this parameter, we tell ZFS to limit the block size to 16KB. This prevents multiple pages from being written in one larger block, as that would necessitate reading the entire larger block for future updates to any one page in that block. For replication logs (binary/relay), we should create a separate file system on the same pool without specifying the recordsize option (or specifying it to be the default 128KB).
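On the database side, the write-ahead-size adjustment discussed earlier lives in the MySQL/MariaDB configuration. A minimal fragment, with values following the discussion above (listing innodb_page_size explicitly is my own addition for clarity; 16KB is already the default):

```ini
[mysqld]
# Align redo log writes with the 16KB InnoDB page size, and hence
# with the 16KB ZFS recordsize of the data file system.
innodb_log_write_ahead_size = 16384
# InnoDB's default page size, stated explicitly for clarity.
innodb_page_size = 16384
```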
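Since the ashift-as-exponent point trips people up, the mapping from exponent to sector size can be sanity-checked with a line of shell arithmetic (purely illustrative, not part of any ZFS tooling):

```shell
# ashift is an exponent: sector size in bytes = 2^ashift.
for ashift in 9 12 13; do
  printf 'ashift=%d -> %d-byte sectors\n' "$ashift" "$((1 << ashift))"
done
# prints:
# ashift=9 -> 512-byte sectors
# ashift=12 -> 4096-byte sectors
# ashift=13 -> 8192-byte sectors
```

So ashift=9 corresponds to legacy 512-byte sectors, ashift=12 to the 4KB physical sectors of most modern disks.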
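Pulling the pool- and dataset-level recommendations together, a provisioning sketch might look like the following. Treat it as a configuration fragment rather than a turnkey script: the device path /dev/sdb and the pool/dataset names (tank, tank/mysql, tank/mysql-logs) are placeholders of my own, not from the article.

```shell
# Create the pool with 4KB sectors: ashift=12 because 2^12 = 4096.
# (/dev/sdb and all pool/dataset names are placeholders.)
zpool create -o ashift=12 tank /dev/sdb

# Dataset for InnoDB tablespaces and redo logs: 16KB records to match
# the InnoDB page size, ARC caching of metadata only, logbias tuned
# for throughput, fast LZ4 compression, and no atime updates.
zfs create -o recordsize=16k -o primarycache=metadata \
    -o logbias=throughput -o compression=lz4 -o atime=off tank/mysql

# Separate dataset for binary/relay logs: leave recordsize at the
# default 128KB, since these files have no fixed page size.
zfs create -o compression=lz4 -o atime=off tank/mysql-logs

# InnoDB does its own prefetching, so disable ZFS prefetch.
echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
```

Note that the zfs_prefetch_disable setting applies module-wide (to every pool on the host), whereas the dataset properties above are scoped to the individual file systems.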