The point of performance comparisons and benchmark articles seems to be purely sensational. By far the majority of these appear to have little value other than attracting less educated readers to the websites that publish them.
In a recent article Michael Larabel of Phoronix reports on the relative performance of various file systems under Linux, specifically comparing the traditional Linux file systems to the new (not yet quite available) native ZFS module. According to the article ZFS performs slower than the other Linux file systems in most of the tests, but I have a number of issues with both how the testing was done and how the article was written.
Solaris 11 Express should have been included in the test, and the results for OpenIndiana should be shown for all tests. It is also crucial that the report include other system metrics, such as CPU utilization during the test runs.
I also have some even more serious gripes. In particular, the blanket statement that some unspecified “subset” of the tests was performed on both a standard SATA hard drive and the SSD drive, and that the results were “proportionally” the same, does not make sense: some tests are more sensitive to seek latency than others, and some file systems hide these latencies better than others.
Another serious gripe is that there is no feature comparison. More complex software has more work to do, and one would expect some trade-offs.
Even worse: two of ZFS’s strengths were eliminated by the way the testing was done. Firstly, when ZFS is given a “whole disk”, as recommended in the ZFS best practices (as opposed to being given just a partition), it will safely enable the disk’s write cache. It only does this if it knows that there are no other file systems on the disk, i.e. when ZFS is in control of the whole disk. Secondly, ZFS manages many disks very efficiently, particularly where space allocation is concerned: ZFS performance doesn’t come into its own on a single-disk system!
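To make the difference concrete, here is a minimal sketch of the two setups, using hypothetical pool and device names and the standard zpool command (I am assuming the Linux port exposes the same command-line behaviour as upstream ZFS):

    # Whole-disk pool: ZFS controls the device and can safely enable its write cache
    zpool create tank /dev/sdb

    # Partition-based pool: ZFS cannot assume it owns the disk, so the write cache is left alone
    zpool create tank /dev/sdb1

    # Multi-disk pool: space allocation across many drives is where ZFS really shines
    zpool create tank /dev/sdb /dev/sdc /dev/sdd /dev/sde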
Importantly, and especially so since this is very much a beta version of a port of a mature and stable product, we need to understand which of ZFS’s features are present, different and/or missing compared to the mature product. For example, one of ZFS’s biggest performance inhibitors under FUSE is that it is limited to a single-threaded ioctl (Ed: Apparently this is fixed in ZFS for Linux 0.6.9, but I am unable to tell whether this is the version Phoronix tested); another is that it does not have low-level access to the disk devices. The KQ Infotech website lists some missing features; particularly interesting is the missing Linux async I/O support. Furthermore, the KQ Infotech FAQ states that Direct I/O falls back to buffered read/write functions and that missing kernel APIs are being emulated through the "Solaris Porting Layer".
A quick search highlights some serious known issues, such as the bug where cached data is duplicated between the Linux VFS cache and the ZFS ARC, which heavily impacts performance.
More information about missing features can be found on the LLNL issue tracker page.
If nothing else, the article should mention that the Linux native ZFS module has known severe performance issues and incomplete features! For example, the way in which Linux allocates and manages virtual address space is inefficient (don’t take my word for it, see this and this), requiring expensive workarounds.
Besides all of this, my real, main gripe is about this kind of article in general. The common practice of testing everything with “default installation settings” implies that nothing else needs to be done; however, when you want the absolute best possible performance out of something, you need to tune it for the specific workload and conditions. In the case of the article in question, the statement reads “All file-systems were tested with their default mount options”, and no other information is given, such as whether the disk was partitioned, whether the different subject file systems were mounted at the same time, which disk the system was booted from, and whether the operating system was running with part of the disk hosting the tested file system mounted as its root. We don’t even know whether the author read the ZFS Best Practices Guide.
It can be argued that the average person will not tune the system, or in this case the file system, for one specific workload because their workstation should be an all-round performer, but you should still comply with the vendors’ best-practice recommendations, especially if performance is one of your main criteria.
I don’t know whether using defaults is ever acceptable in this kind of article. My issue stems from how these articles are written in a way that suggests that performance is the only, or at least the most important, factor in choosing an operating system, file system, graphics card, CPU, or whatever the subject is. If that were true, then at the very least the system should be tuned to make the most of each of the subject candidates, whether these are hardware or software parts being tested and compared to one another. This tuning is often done by disabling features and configuring the relevant options, and to get it right you would usually need a performance expert on that piece of software or hardware to optimize it for each test.
Specific hardware (or software) often favors one or the other of the entrants. An optimized, feature-poor system will outperform a complex, feature-rich system on limited hardware. Making the best use of the available hardware might mean different implementation choices when optimizing for performance rather than for functionality or reliability. ZFS in particular comes into its own, both in terms of features and performance, when it has a lot of hardware underneath it: RAM, disk controllers, and as many dedicated disk drives as possible. The other file systems have likely reached their performance limit on the limited hardware on which the testing was done. Linux is particularly aimed at the non-NUMA, single-core, single-hard-drive, single-user environment. Solaris, and ZFS, was developed in a company where single-user workstations were an almost non-existent target, the real target of course being the large servers of the big corporates.
As documented in the ZFS Evil Tuning Guide, many tuning options exist. One could turn off ZFS checksum calculations, limit the ARC cache size, set the SSD as a cache or log device for the SATA disk, or set the pool to cache only metadata, to mention a few. Looking at the hardware available in the Phoronix article, the choices would depend on the specific test: in one test you might stripe across the SATA disk and the SSD, in another you might choose to mirror across the two.
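As a rough sketch of what some of that tuning could look like (hypothetical pool, dataset and device names; on the Linux port the ARC limit is, as far as I can tell, a module parameter rather than an /etc/system setting):

    # Skip checksum calculations on the test dataset (trades away data integrity for speed)
    zfs set checksum=off tank/test

    # Cache only metadata in the ARC, rather than both data and metadata
    zfs set primarycache=metadata tank/test

    # Use the SSD as a read cache (L2ARC) for the SATA pool...
    zpool add tank cache /dev/sdc
    # ...or, alternatively, as a separate intent log device
    zpool add tank log /dev/sdc

    # Limit the ARC to 1 GB via the module parameter (ZFS on Linux)
    echo "options zfs zfs_arc_max=1073741824" >> /etc/modprobe.d/zfs.conf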
The other file system candidates might have different recommendations in terms of how to optimize for performance.
I realize that the functionality would be affected by such tuning, but the article doesn’t look at functionality, or even usability for that matter. ZFS provides good reliability and data integrity, but only in its default configuration, with data checksumming turned on. The data protection levels and usable space in each test might be different, but that again is a function of which features are used; it is not the subject of the article, and is not even mentioned anywhere in it.
As a case in point for the argument about functionality, one needs to consider all that ZFS is doing in addition to being a POSIX-compliant file system. It replaces the volume manager. It adds data integrity checking through checksumming. It automatically manages space allocation, including space for file systems, metadata, snapshots, and ZVOLs (virtual block devices created out of a ZFS pool). Usage can be controlled by means of a complete set of reservation and quota options. Changing settings, such as turning on encryption, the number of copies of data to be kept, whether to do checksumming, and so on, is dynamic. There is much more, as Google will tell you.
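For a sense of what that administration model looks like in practice, here is a small sketch using hypothetical pool and dataset names:

    # File systems are cheap and carved out of the pool on demand
    zfs create tank/home

    # Reservations and quotas control how much of the pool a dataset may use
    zfs set reservation=2G tank/home
    zfs set quota=10G tank/home

    # Properties such as the number of data copies or the checksum algorithm are changed on the fly
    zfs set copies=2 tank/home
    zfs set checksum=sha256 tank/home

    # Snapshots and ZVOLs (virtual block devices) come out of the same pool
    zfs snapshot tank/home@before-benchmark
    zfs create -V 10G tank/vol1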
And just to add insult to injury, the article pits XFS against ZFS while ignoring the many severe reliability issues present with XFS, such as the often-reported data corruption under heavy load and severe file system corruption when losing power.
I would really like to see a performance competition one day. The details of how the testing will be done would be given out in advance to allow the teams to research it. Each team is then given the same budget from which to buy and build their own system to enter into the competition. Their performance experts then set up and build the systems, install the software, and tune it for the tests on the specific hardware they have available. One team might buy a system with more CPUs while another might buy a system with more disks and SCSI controllers, but the test is fair (barring my observation about how feature-poor systems will always perform better on a low-budget system) because each team solves the same problem with the same budget. The teams submit their finished systems to the judges, who run the performance test scripts, and publish their configuration details in a how-to guide. To eliminate cheating, an independent group then tries to duplicate each team’s test results using the guide.
I think this would make a fun event for a LAN party – any sponsors interested?