Linux File Systems

1.   Introduction
2.   Why I chose XFS
2.1   Benchmarks
2.2   My observations
2.3   My conclusions
3.   Creating XFS
3.1   For < 1TB XFS file system
3.2   For > 1TB XFS filesystem
4.   Mounting XFS
4.1   atime, relatime and noatime
4.2   async and nobarrier
5.   Install XFS tools
6.   De-fragmenting XFS
6.1   De-fragment a file system
6.2   De-fragment a file
7.   References
7.1   XFS References
7.2   Performance Tuning XFS References

Introduction

I have recently bought new storage arrays for my workstation at home. I chose to do a fresh install of Ubuntu Karmic 9.10 at the same time. Initially, I set up my new arrays using Ext4 but then decided to do some research as to which file system best fits my requirements.

In the end I chose XFS as the file system for my disk arrays and subsequently migrated all my file systems to XFS on my workstation and laptop after reading some favourable boot up benchmarks.

I have not chosen XFS for performance alone. Indeed, some benchmarks show that JFS and Ext4 can outperform XFS for some file operations. The following explains why I chose XFS in preference to other popular file systems.

Why I chose XFS

My workstation at home has two 6TB disk arrays and a 1TB root file system. My disk arrays contain my photo, music and video libraries which I stream via UPnP/DLNA and DAAP. The video files can be 2GB to 30GB in size. I also do a good deal of HD video encoding, processing and editing. My root partition contains many virtual machine images of which several are running at any given time.

My work laptop has a 250GB root file system and also contains many virtual machine images of which one is usually running.

XFS is designed with large file systems and large file handling in mind. It seems a sensible choice for those reasons alone.

Benchmarks

Here are some benchmarks.

My observations

I've run my own (quick and dirty) synthetic benchmarks and found XFS performance to be comparable to, or better than, that of Ext4. However, in real-world day-to-day operations I have observed significant performance gains when using XFS versus Ext4. The most obvious is running rsync backups/restores. XFS is blisteringly quick, not just for the file sync but also for sending the incremental file list, which is near instantaneous for a 6TB volume with 1.5TB of data. My workstation boot time also feels much, much quicker. I know this is subjective, but when booting from XFS my entire session is loaded and ready to use instantly after logging in via GDM. When running Ext4 I have to wait for applications and applets to load.

When performing file system intensive operations such as large file copies, backups or uncompressing large archives on Ext4, my system responsiveness drops off significantly. For example, Firefox will often fade to grey and become unusable. Since migrating to XFS I've never experienced this, even when the file system is under heavy load.

I have two 64GB USB 2.0 pen drives that I use for transporting data to and from work. I work with flight data from aircraft crash recorders, so I frequently fill a pen drive with data and copy the data off when I get home for analysis. I have used Ext3 on one pen drive and XFS (without the tuning) on the other. I've noticed that the Ext3 pen drive's performance drops off pretty quickly after I've filled and emptied it a few times, whereas the XFS pen drive retains its full transfer speed after the same number of fill and empty cycles.

The read speed of XFS and Ext3 after ten fill and empty cycles:

If I format the Ext3 pen drive again, full performance is restored but it slowly degrades as the pen drive is used. I put this down to Ext3 fragmentation. XFS does fragment, but nowhere near as quickly or with such a huge impact on performance. XFS also has online de-fragmentation tools for files or the whole file system, which is something Ext3 and Ext4 lack. This performance loss of Ext3 is something synthetic benchmarks just won't catch.

JFS seems to provide a nice balance for many different workloads. You can use it with various I/O patterns and the performance stays roughly the same. ReiserFS tends to be far less generic in performance, but it does perform well with directories full of small files. XFS (when un-tuned) can be relatively slow under these circumstances and even when tuned can't outperform ReiserFS in this scenario. While JFS isn't as good as ReiserFS in this scenario, it doesn't suffer to the extreme of an un-tuned XFS. JFS tends to be light on CPU resources, which may be good for laptop or netbook use.

My conclusions

So, for now at least, I will be migrating my workstation, laptop and disk arrays exclusively to XFS. I may experiment with JFS on my netbook but I may also just go to XFS on that as well.

It will be interesting to see how Ext4, btrfs and NILFS develop, but they are all a little too experimental for me at this stage and Ext4 seems to be a stop gap solution until btrfs has matured. So I'll be sticking with XFS for pretty much everything for a while yet and I suggest you give it a chance too.

Creating XFS

Currently the defaults used when creating an XFS file system are not optimal. Therefore, if you want to use a tuned XFS root file system, you can't simply use the graphical partitioning tool from the Ubuntu LiveCD installer.

However, it is very easy to manually create the tuned XFS file systems. Simply boot the Ubuntu Live CD, then start a new shell.

Most of the performance tuning information available on the 'net is dated and doesn't reflect the XFS defaults in modern Linux kernels. Therefore the following is all that is required to create a tuned XFS file system on Ubuntu Karmic 9.10.

 mkfs.xfs -l lazy-count=1 -L VolumeName <dev> 

lazy-count=1 will soon be adopted as the default and is recommended by the XFS developers. However, lazy-count is a mkfs option because it changes the on-disk format slightly, and older kernels do not understand the new format. Hence mkfs sets a superblock feature bit to prevent the file system from being mounted on kernels that don't understand it.

If you are not running Ubuntu Karmic 9.10 and are not sure what XFS defaults might be on your system and want to tune XFS then use the following.

For < 1TB XFS file system

 mkfs.xfs -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=4 -L VolumeName <dev>

For > 1TB XFS filesystem

 mkfs.xfs -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=6 -L VolumeName <dev>
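The two variants above differ only in the allocation group count. As a rough sketch, the choice could be wrapped in a small helper that prints (but does not run) the matching command. The function name 'xfs_mkfs_cmd', the size-in-GB argument, and the device and label in the example are all hypothetical. Treating exactly 1TB (1024GB here) as the smaller case is also an assumption, since the text only covers below and above 1TB.

```shell
#!/bin/sh
# Hypothetical helper: print the tuned mkfs.xfs command from this page for a
# device of a given size. It only prints the command; nothing is formatted.
# usage: xfs_mkfs_cmd <dev> <label> <size_in_GB>
xfs_mkfs_cmd() {
    dev=$1
    label=$2
    size_gb=$3
    # Assumption: 1TB is taken as 1024GB; anything up to that uses agcount=4.
    if [ "$size_gb" -gt 1024 ]; then
        agcount=6
    else
        agcount=4
    fi
    printf 'mkfs.xfs -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=%s -L %s %s\n' \
        "$agcount" "$label" "$dev"
}

# Example: a 6TB array (device name and label are placeholders)
xfs_mkfs_cmd /dev/sdb1 Media 6144
```

Review the printed command before running it, because mkfs.xfs destroys any data on the target device.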

Once you have created all your tuned XFS file systems start the Ubuntu installer from the Live CD. When the disk partitioning section comes round choose:

Now 'Change' each XFS file system telling the partitioner where to mount each XFS file system. But ensure that you do not tick 'Format the Partition:', thereby ensuring your tuned XFS file systems are left intact.

When you see this message, just click Continue.

 The file system on /dev/sda1 assigned to /boot has not been marked for
 formatting.  Directories containing system files (/etc, /lib, /usr, 
 /var, ...) that already exist under any defined mountpoint will be deleted
 during the install.
 
 Please ensure that you have backed up any critical data before installing.

Mounting XFS

Further performance optimisations can be gained by specifying some additional mount options for your XFS file systems.

To manually mount an XFS file system with optimal mount options, use the following:

 mount -t xfs -o noatime,osyncisosync,logbsize=256k,logbufs=8 <dev> <mtpt> 

The '/etc/fstab' entries I use look something like this.

 UUID=xxxxxxxxxxx...x <mtpt> xfs noatime,osyncisosync,logbsize=256k,logbufs=8 0 2

The 'logbsize' and 'logbufs' options address the often cited limitation of XFS when handling lots of small files and large numbers of file deletions. The above assumes you don't require 'atime'. Not using 'atime' provides a significant performance benefit.
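When several XFS file systems share the same options, the fstab entries can be generated rather than typed by hand. The following is a minimal sketch only. The function name 'xfs_fstab_line' is hypothetical and the UUID in the example is made up; on a real system the UUID would come from something like 'blkid <dev>'.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line using the mount options above.
# usage: xfs_fstab_line <uuid> <mtpt>
xfs_fstab_line() {
    printf 'UUID=%s %s xfs noatime,osyncisosync,logbsize=256k,logbufs=8 0 2\n' "$1" "$2"
}

# Example with a made-up UUID and mount point
xfs_fstab_line 01234567-89ab-cdef-0123-456789abcdef /srv
```

The output is meant to be reviewed and pasted into '/etc/fstab', not appended blindly.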

atime, relatime and noatime

Every time a file is accessed (read or written), the default for most file systems is to update the metadata associated with that file with a new access time. Thus, even read operations incur an overhead associated with a write to the file system. This can lead to a significant degradation in performance in some usage scenarios. Adding 'noatime' to the options in the fstab line for any file system stops this from happening.

One may also specify a 'relatime' option which updates the atime if the previous atime is older than the mtime or ctime. In terms of performance, this will not be as fast as the 'noatime' mount option, but is useful if using applications that need to know when files were last read (like mutt).

As access time is of little importance in most scenarios, this alteration has been widely touted as a fast and easy way to get a performance boost. Even Linus Torvalds seems to be a proponent of this optimization.

NOTE! It would appear that Ubuntu Karmic 9.10 defaults to 'relatime'; at least that's what I've observed when issuing 'cat /proc/mounts'. I don't know if this is a distribution default or a kernel default.

Access time is not the same as the last-modified time. Disabling access time will still enable you to see when files were last modified by a write operation.
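To see which access time behaviour is actually in effect for a mount point, the option list in '/proc/mounts' can be inspected. The sketch below is one way to do that. The function name 'atime_mode' is hypothetical, the sample line is illustrative rather than captured from a real system, and reporting "atime" when no access time option is listed is an assumption about how the defaults are displayed.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mounts-style lines on stdin and report which
# access-time option is in effect for one mount point.
# usage: atime_mode <mtpt> < /proc/mounts
atime_mode() {
    awk -v mtpt="$1" '
        $2 == mtpt {
            mode = "atime"               # assumption: nothing listed means plain atime
            n = split($4, opts, ",")     # field 4 is the comma-separated option list
            for (i = 1; i <= n; i++)
                if (opts[i] == "noatime" || opts[i] == "relatime" || opts[i] == "strictatime")
                    mode = opts[i]
        }
        END { if (mode != "") print mode }
    '
}

# Example using an illustrative line instead of the live /proc/mounts
printf '/dev/sda3 / xfs rw,relatime 0 0\n' | atime_mode /
```

If a mount point appears more than once in the input, the last matching line wins. Nothing is printed for a mount point that is not listed.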

async and nobarrier

If you really want to go for all out performance you can also provide 'async' and 'nobarrier' mount options. But you really need to understand and accept the potential issues with using these options.

Read the following to understand what write barriers are and if you are prepared to disable them to gain performance.

Install XFS tools

XFS is available as a kernel module in Ubuntu and also available from the Live CDs. Once Ubuntu is installed you can install the XFS tools as follows.

 aptitude install xfsdump xfsprogs

De-fragmenting XFS

XFS has two utilities to manage fragmentation: 'xfs_db' to report it and 'xfs_fsr' to reduce it.

De-fragment a file system

To check the health of an XFS file system use the 'xfs_db' command to gather some information. In the example below '/dev/sda1' is mounted as '/boot' and '/dev/sda3' is mounted as '/root'.

 sudo xfs_db -c frag -r /dev/sda1
 actual 162, ideal 162, fragmentation factor 0.00%

 sudo xfs_db -c frag -r /dev/sda3
 actual 2288833, ideal 254504, fragmentation factor 88.88%

The closer the fragmentation factor is to 0% the better. Unsurprisingly '/boot' is not fragmented. However '/root' is very fragmented, probably because of the amount of data that goes through my home directory.
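The figures above are consistent with the fragmentation factor being (actual - ideal) / actual, expressed as a percentage. The following minimal sketch recomputes it from the two extent counts. The function name 'frag_factor' is hypothetical, and the formula is inferred from the sample output on this page rather than quoted from the XFS documentation.

```shell
#!/bin/sh
# Hypothetical helper: recompute the fragmentation factor from the "actual"
# and "ideal" extent counts printed by `xfs_db -c frag`.
# usage: frag_factor <actual> <ideal>
frag_factor() {
    awk -v actual="$1" -v ideal="$2" 'BEGIN {
        if (actual == 0) { print "0.00"; exit }   # avoid dividing by zero
        printf "%.2f\n", (actual - ideal) * 100 / actual
    }'
}

frag_factor 2288833 254504   # the /dev/sda3 counts from the example above
frag_factor 162 162          # the /dev/sda1 counts
```

Read this way, a factor of 88.88% means that roughly nine in ten of the extents in use would disappear if every file were laid out ideally.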

De-fragmenting XFS file systems can be done on a live running system, but it is a good idea to schedule this for a time when the partition will be used less.

The file system reorganiser for XFS is 'xfs_fsr'. Typically, I instruct 'xfs_fsr' to reorganise '/dev/sda3' with a timeout (-t) of 6 hours, which is specified in seconds (60 * 60 * 6 = 21600). But for the purposes of this example I used a timeout of 15 minutes.

 sudo xfs_fsr -v -t 900 /dev/sda3

The output will look something like this.

 / start inode=0
 ino=145565
 extents before:2 after:1 DONE ino=145565
 ino=145662
 extents before:2 after:1 DONE ino=145662
 ino=600148
 extents before:2 after:1 DONE ino=600148
 ino=1127295
 extents before:82794 after:1 DONE ino=1127295
 ino=1127243
 extents before:2 after:1 DONE ino=1127243
 ino=1382852
 extents before:50869 after:1 DONE ino=1382852
 ino=1422636

When the defrag is finished, check how effective the file system reorganisation was.

 sudo xfs_db -c frag -r /dev/sda3
 actual 2155648, ideal 254512, fragmentation factor 88.19%

As you can see, de-fragmenting for 15 minutes doesn't improve things greatly, which is why it needs to be run for several hours or more.

Manually de-fragmenting the file system is simple enough, but a better solution would be to schedule a cron job to run periodically.
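As a sketch of what such a cron job might look like, the helper below prints an '/etc/cron.d' style entry with the timeout converted from hours to seconds. The function name 'fsr_cron_line' is hypothetical, and the schedule (01:00 every Saturday), the root user and the '/usr/sbin/xfs_fsr' path are all assumptions to adjust for your own system.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/cron.d style entry that runs xfs_fsr
# against one device with a timeout given in hours.
# usage: fsr_cron_line <hours> <dev>
fsr_cron_line() {
    timeout=$(( $1 * 60 * 60 ))   # xfs_fsr -t expects seconds
    # Assumptions: 01:00 every Saturday, run as root, xfs_fsr in /usr/sbin.
    printf '0 1 * * 6 root /usr/sbin/xfs_fsr -t %s %s\n' "$timeout" "$2"
}

# Example: the 6 hour timeout mentioned above
fsr_cron_line 6 /dev/sda3
```

The printed line could be saved to a file under '/etc/cron.d' once the path to 'xfs_fsr' has been confirmed with 'which xfs_fsr'.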

De-fragment a file

It is also possible to de-fragment a single file. To determine if a file is in need of de-fragmenting run the following...

  xfs_bmap -v /srv/A320/PGQAR.DAT | wc -l

This will output a number showing roughly how many extents the file is using. The count is slightly high because 'xfs_bmap -v' also prints two header lines.

  95280

This number should be close to 1. So in the example above, I have a very fragmented file.

 sudo xfs_fsr -v /srv/A320/PGQAR.DAT

This will output something like the following.

 /srv/A320/PGQAR.DAT
 extents before:95278 after:1 DONE /srv/A320/PGQAR.DAT

My file is now de-fragmented. I use the method above to target de-fragmentation where I know files reside that are most likely to be fragmented, rather than de-fragmenting the whole file system.
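For a more exact extent count than 'wc -l' gives, the header lines of 'xfs_bmap -v' can be skipped. The following is a minimal sketch under two assumptions: the output starts with two header lines (the file name and the column titles), and holes in sparse files are shown with the word "hole" in the third column. The function name 'count_extents' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the extents in `xfs_bmap -v` output read from
# stdin, skipping the two header lines and any rows reported as holes.
# usage: xfs_bmap -v <file> | count_extents
count_extents() {
    awk 'NR > 2 && $3 != "hole" { n++ } END { print n + 0 }'
}
```

For a file without holes this should give a figure two lower than the 'wc -l' approach, which lines up with the 95280 lines and 95278 extents reported in the example above.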

References

XFS References

Performance Tuning XFS References

$Id: LinuxFileSystems,v 1.8 2010/02/10 16:45:32 martin Exp $
