August 22, 2005
Everything begins with choice - Chapter 1: Filesystems
Hi, the name's Costin and I am a Linux user.
On second thought, I'm not sure "user" is the right term. In the Windows world, calling somebody a "user" is the same as calling them stupid. But to be honest, I'm not a sysadmin either, nor a developer. Oh well, I guess I'm a Linux user after all.
I use both Linux and Windows and, when time allows, other operating systems such as the recently announced MacOSx86, or Darwin for PCs. I've used Solaris in the past, on PCs and SPARCs, as well as a handful of other operating systems: AIX, HP-UX, VMS, Unicos and Ultrix, to name just a few. Maybe not the average Linux/Unix user, but I'm no addict either.
Recently, I've been doing a lot of work with Fedora Core 4 on AMD64. This mostly involves downloading large amounts of mail, processing it, and then storing a huge amount of text in an SQL database. Detecting new viruses is of course the main purpose here, as you might have suspected. With such disk-intensive applications, one of the most important factors is the type of file system.
In my case, the setup involves storing about 2 million files of ~16K on average (e-mails), as well as a few very large files of 6GB+ (virtual machine images). Both areas see heavy access, with new files being added and older files being deleted. Most of the time, the large files grow even larger. So, what to use?
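A rough way to get a feel for the small-file half of this workload is to time the creation and deletion of a batch of mail-sized files on the partition you want to try. The sketch below is only an illustration: the function name, the 100-subdirectory layout and the file naming are made up for the example, and since it never calls fsync it mostly measures metadata handling and the page cache rather than raw disk speed.

```python
import os
import tempfile
import time


def small_file_churn(root, count=1000, size=16 * 1024):
    """Create `count` files of `size` bytes under `root`, then delete them.

    Returns (create_seconds, delete_seconds).
    """
    payload = b"x" * size
    paths = []

    start = time.perf_counter()
    for i in range(count):
        # Hypothetical layout: spread the files over up to 100 subdirectories.
        subdir = os.path.join(root, "%02d" % (i % 100))
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "msg%07d.eml" % i)
        with open(path, "wb") as handle:
            handle.write(payload)
        paths.append(path)
    create_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    delete_seconds = time.perf_counter() - start

    return create_seconds, delete_seconds


if __name__ == "__main__":
    # Uses the system temp directory; point `dir=` at a mount on the
    # file system under test to compare ext3, XFS, JFS and so on.
    with tempfile.TemporaryDirectory() as scratch:
        created, deleted = small_file_churn(scratch)
        print("create: %.3fs, delete: %.3fs" % (created, deleted))
```

Run it with the same `count` and `size` on each candidate partition; the absolute numbers mean little, only the comparison between file systems on the same machine does.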
In the Linux/Unix world, the choice of file systems is pretty much the following:
- ext2 - older, reliable file system but with no journalling
- ext3 - ext2 on steroids
- ReiserFS 3 - blindingly fast file system, directly supported by the new kernels
- ReiserFS 4 - even faster, but still new
- XFS - developed by SGI, for their supercomputing purposes
- JFS - IBM's journalling file system, which came to Linux from AIX and OS/2, now open source
- FAT32 - a relic from the Microsoft world
- NTFS - the WinNT file system, supported for writing by recent Linux kernels (sic)
Of these, only some support journalling, a very desirable technique which not only avoids long consistency checks during reboot but also increases overall reliability. So there go ext2 and FAT32, which, despite their high compatibility, would be too much of a nuisance during the unavoidable crashes.
Yes, ext3 is good, stable and reliable, but way too slow for the type of computing I'm doing in the lab.
What to say about NTFS? Well, it _is_ possible to use it in Linux, but write support is limited and the whole NTFS in Linux project is currently in a deep state of sleep, so there goes NTFS as well.
I had been using ReiserFS 3 quite happily for a while before I started to hit some problems. In my case, they were related to security contexts under SELinux. One evening I spent about 2 hours trying to figure out why Squid had stopped being able to access parts of its cache on one of my ReiserFS 3(.6) partitions. It turned out that, for some unknown reason, the security contexts on the Squid cache folder simply disappeared after a while. Moving the cache to an ext3 partition solved the problem. Strange. Besides, ReiserFS is very CPU intensive - I'm already trying to squeeze the most out of the CPU for other things. No doubt the file system is very fast, possibly the fastest of the batch, and the algorithms built into it are marvellous. On the other hand, it doesn't seem to work well with security contexts and SELinux, at least for me. Maybe I'm doing something wrong, maybe I'm not doing something I should be doing. I don't know; it just doesn't want to play like the nice kid it is supposed to be.
How about ReiserFS 4? Again, YMMV. When heavily used on my AMD64 machine, the machine hangs every now and then. Indeed, the CPU is getting _very_ hot, and it is not out of the question that it is getting even hotter because of the intensive ReiserFS 4 computations, but the system just doesn't crash with ext3. And, as will be seen later, it doesn't crash with XFS or JFS either. So, there goes ReiserFS 4, despite having the wonderful feature of knowing how to stick even more data in the unused slack space between the end of a file and the physical end of a cluster.
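To see why that slack space matters with millions of small files, here is a small worked calculation. It is a sketch under assumptions that are not from my setup: a block (cluster) size of 4096 bytes, and a file system that always rounds each file up to a whole number of blocks.

```python
def slack_bytes(file_size, block_size=4096):
    """Bytes left unused in the last block of a file of `file_size` bytes."""
    # (-n) % b is the distance from n up to the next multiple of b.
    return (-file_size) % block_size


def total_slack(file_sizes, block_size=4096):
    """Sum of the per-file slack for an iterable of file sizes."""
    return sum(slack_bytes(size, block_size) for size in file_sizes)


if __name__ == "__main__":
    # A 16500-byte mail needs 5 blocks of 4096 bytes (20480 bytes in total).
    print(slack_bytes(16500))  # 3980
    # A file that is an exact multiple of the block size wastes nothing.
    print(slack_bytes(16384))  # 0
```

If the leftover bytes were spread evenly, each file would waste about half a block, so 2 million files would lose roughly 2,000,000 x 2048 bytes, or about 3.8 GiB, to slack at this assumed block size. That is the space a file system which packs file tails together can win back.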
So this leaves us with JFS and XFS. I've tested both, and I must say both performed very well. No crashes. No security context problems. Speed. Especially speed!
So, which one to use? Well, your choice - I'm using both.
Right now, I'm especially inclined to praise XFS, but JFS isn't bad either. Both come close to the disk's native transfer speed, and the CPU load is acceptable. However, of the two, only XFS has defragmentation tools available on Linux, and defragmentation is essential in my case.
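If you want a number to compare against your disk's rated transfer speed, a crude sequential write test is enough for a first impression. This is a minimal sketch, not the method used for the results above: the function name and the 64 MB default are arbitrary, and it only measures one large sequential write that is forced to disk with fsync.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(path, total_mb=64, chunk_mb=1):
    """Write about `total_mb` MiB to `path` in chunks and return MiB/s."""
    # Random data, generated once, so the result is not flattered by
    # any compression or zero detection further down the stack.
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = total_mb // chunk_mb

    start = time.perf_counter()
    with open(path, "wb") as handle:
        for _ in range(chunks):
            handle.write(chunk)
        handle.flush()
        # Force the data out of the page cache before stopping the clock.
        os.fsync(handle.fileno())
    elapsed = time.perf_counter() - start

    return (chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Point `dir=` at a mount on the file system under test.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "big.img")
        print("%.1f MiB/s" % sequential_write_mb_per_s(target))
```

A single run on an otherwise busy machine is noisy, so repeat it a few times and use a file larger than the machine's RAM if you want to rule out caching effects.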
If you're doing some heavy Linux computing and so far you've thought that ext3 is enough, it doesn't hurt to try XFS or JFS.
Oh, and one more thing - I'm storing my entire virus collection on an XFS partition. (It is on a machine with no net connection and encrypted disks, so don't bother.) There's an awful lot of files in there, small and big, and lots of directories. Before switching to XFS, I had all sorts of problems with NTFS; don't even think about FAT32. XFS handles the job admirably.
Finally, the most wonderful conclusion that you can draw from the above is not really about XFS being as good as JFS and better than ReiserFS. No - it is about _choice_. You have the choice to try and see which one suits your purposes best.
I wish I had this choice in Windows as well.
Anybody out there porting XFS and JFS to Windows?
Posted by Costin Raiu at August 22, 2005 8:49 PM
Comments
Interesting choice!
Good for you.
I recently had to reinstall my Linux because the FS just ... well ... I don't know what happened, but it didn't work anymore.
It was ReiserFS on SuSE 9.0 (up to date).
And it is the 3rd time it has done so, apparently for no reason.
I tried to remember what I had done recently, but the last thing was only checking out a very large number of small files (source files) from CVS.
About 200 MB in size, and about 20000 files.
I don't know what to say :(
Other colleagues have reported exactly the same problem to me.
What do you think, Costin?
Can it be caused by ReiserFS?
Posted by: Sorin Mustaca at September 1, 2005 10:50 AM
It may not be a bug in ReiserFS itself, but maybe an incompatibility between ReiserFS and something else. For instance, as I was saying in the post, ReiserFS just doesn't like SELinux on any of the machines I've tried. It keeps losing permissions and contexts, it uses lots of CPU power and, overall, it generates too many headaches for somebody who doesn't have time to spend on debugging such issues.
Since I started using XFS, I've never had any of the above-mentioned issues. No errors, no problems, no data loss - it just works as expected, and it is at least as fast as ReiserFS.
If you have the disk space to try XFS, go ahead - it might just solve all the problems!
Posted by: Costin Raiu at September 1, 2005 11:36 AM