This is a follow-up to: High speed network writes with large capacity storage.

I have a pool with a single raid-z2 vdev of 6 drives, all Exos X18 CMR drives. Using fio and manual tests I know that the array can sustain around 800 MB/s of sequential writes on average; this is fine and in line with the expected performance of this array.

I'm running an up-to-date Arch system with zfs 2.1.2-1. The machine is a Ryzen GE (4C/8T, 3.8 GHz boost) with 32G of ECC RAM, an NVMe boot/system drive, and 2x 10 Gbps Ethernet ports (Intel X550-T2).

My use case is a video archive of mostly large (~30G), write-once, read-once, compressed video. I've disabled atime and set recordsize=1M, compression=off, and dedup=off: the data is actually incompressible (testing showed worse performance with compression=lz4 than with it off, despite what the internet said), and there is no duplicate data by design.

This pool is shared over the network via Samba. I've tuned my network and Samba to the point where transferring from NVMe NTFS on a Windows machine to NVMe ext4 reaches 1 GB/s, i.e. reasonably close to saturating the 10 Gbps link with 9K jumbo frames.

I want to be able to transfer one whole 30G video archive at 1 GB/s to the raid-z2 array, which can only sustain 800 MB/s of sequential writes. My plan is to use the RAM-based dirty pages to absorb the spillover and let it flush to disk after the transfer is "completed" on the client side. I figured that all I would need is (1024-800)*30 ~= 7G of dirty pages in RAM, which can get flushed out to disk over ~10 seconds after the transfer completes. I understand the data-integrity implications of this, and the risk is acceptable, as I can always transfer the file again later (for up to a month) in case a power loss causes the file to be lost or incomplete.

However, I cannot get ZFS to behave in the way I expect. I've edited my /etc/modprobe.d/zfs.conf file like so:

options zfs zfs_dirty_data_max_max=25769803776

I have run the appropriate mkinitcpio -P command to refresh my initramfs and confirmed that the settings were applied after a reboot:

# arc_summary | grep dirty_data
Options zfs zfs_delay_min_dirty_percent=80
Options zfs zfs_dirty_data_max_percent=50
Options zfs zfs_dirty_data_max_max_percent=50

I set the max dirty pages to 24G, which is way more than the 7G that I need, and hold off on delaying writes until 80% of this is used. As far as I understand, the pool should be able to absorb 19G into RAM before it starts to push back on writes from the client (Samba) with latency.

However, what I observe when writing from the Windows client is that after around 16 seconds at ~1 GB/s write speed, the write performance falls off a cliff (iostat still shows the disks working hard to flush the data), which I can only assume is the pushback mechanism of the ZFS write throttle. This makes no sense: even if nothing were flushed out during those 16 seconds, the throttle should only have kicked in about 3 seconds later. In addition, it falls off once again at the end, see picture.

I've tried adjusting zfs_dirty_data_sync_percent to start writing out earlier, because the dirty page buffer is so much larger than the default, and I've also tried adjusting the active I/O scaling with zfs_vdev_async_write_active_dirty_percent to kick in earlier as well, to get the writes up to speed faster with the large dirty buffer. Both of these just moved the position of the cliff slightly, but nowhere near what I expected.

Have I misunderstood how the write throttling delay works?

Retesting ZFS with recordsize set correctly

Yes, I know, I'm literally chasing a couple of seconds and will never recoup the effort spent in achieving this. We believe it's important to test things the way they come out of the box: the shipping defaults should be sane defaults, and that's a good place for everyone to start from. While the ZFS defaults are reasonably sane, fio doesn't interact with disks in quite the same way most users normally do. Most user interaction with storage can be characterized by reading and writing files in their entirety, and that's not what fio does.
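The spillover sizing in the post (a 30G file arriving at ~1 GB/s while the array drains at ~800 MB/s) can be checked with quick arithmetic. A minimal sketch; the rates are the figures stated in the post, rounded to MiB/s:

```python
# Back-of-the-envelope check of the dirty-data sizing described in the post.
transfer_gib = 30      # size of one video archive, GiB
ingress = 1024         # client write speed, MiB/s (~1 GiB/s over Samba)
drain = 800            # sustained sequential write speed of the array, MiB/s

net_fill = ingress - drain                     # dirty data accumulates at 224 MiB/s
transfer_time = transfer_gib * 1024 / ingress  # transfer lasts ~30 s
spillover_mib = net_fill * transfer_time       # ~6720 MiB, i.e. the "~7G" figure
flush_time = spillover_mib / drain             # ~8.4 s to flush afterwards

print(f"spillover ~{spillover_mib:.0f} MiB, flushed in ~{flush_time:.1f} s")
```

This reproduces both numbers in the plan: roughly 7G of dirty data, drained in under 10 seconds once the client-side transfer reports completion.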
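For reference, the tunables quoted in the post work out as follows. A simple unit conversion, assuming (as the post does) that delaying starts once dirty data reaches zfs_delay_min_dirty_percent of the dirty-data limit:

```python
# Unit check of the values from /etc/modprobe.d/zfs.conf.
zfs_dirty_data_max_max = 25769803776     # bytes, as set in zfs.conf
gib = zfs_dirty_data_max_max / 2**30     # exactly 24.0 GiB, the "24G" in the post

# Delay is expected to begin at zfs_delay_min_dirty_percent of the limit:
delay_min_dirty_percent = 80
threshold_gib = gib * delay_min_dirty_percent / 100   # 19.2 GiB of headroom

# Even with zero flushing, filling 19.2 GiB at ~1 GiB/s takes ~19 s,
# hence the surprise that the cliff shows up at ~16 s.
print(gib, threshold_gib)
```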
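On the shape of the throttle itself: per the OpenZFS module-parameters documentation, once dirty data passes zfs_delay_min_dirty_percent of zfs_dirty_data_max, each write is delayed by roughly zfs_delay_scale * (dirty - min) / (max - dirty) nanoseconds, a curve that starts near zero at the knee and grows without bound as dirty data approaches the limit. A sketch of that curve; the formula and the 500000 ns default for zfs_delay_scale come from the OpenZFS docs, not from this post, and the real per-transaction behavior lives in dmu_tx.c:

```python
def zfs_write_delay_ns(dirty, dirty_max, min_percent=80, delay_scale=500_000):
    """Approximate per-operation delay from the ZFS write throttle.

    Mirrors the curve documented for OpenZFS: no delay below
    min_percent of dirty_max, then delay_scale * (dirty - min) /
    (dirty_max - dirty) nanoseconds, diverging as dirty -> dirty_max.
    """
    min_dirty = dirty_max * min_percent / 100
    if dirty <= min_dirty:
        return 0.0
    if dirty >= dirty_max:
        return float("inf")
    return delay_scale * (dirty - min_dirty) / (dirty_max - dirty)

GiB = 2**30
# With zfs_dirty_data_max = 24 GiB and the 80% knee from the post:
for frac in (0.79, 0.85, 0.95, 0.99):
    print(f"{frac:.0%} dirty -> {zfs_write_delay_ns(frac * 24 * GiB, 24 * GiB):.0f} ns")
```

The point of the curve is that pushback is not a hard wall at the knee: delays ramp up smoothly, so a gradual slowdown slightly before the expected fill time is consistent with how the throttle is documented to work.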