Using the ext3 filesystem in 2.4 kernels

Introduction

This document is a brief description of how to get up and running with the ext3
journalling filesystem on 2.4 kernels.

ext3 was written by Dr Stephen C. Tweedie for 2.2 kernels.

The filesystem was ported to 2.4 kernels by Peter Braam, Andreas Dilger, Andrew
Morton and, of course, Stephen Tweedie. Ted Ts'o supports the all-important
e2fsprogs utilities, as well as providing ext3 feature work and design advice.
Alexander Viro has contributed to ext3's directory searching code.

Please send any comments on this document to Andrew Morton.

Please send any queries, questions or bug reports on this software to the
ext3-users mailing list.  Instructions for subscribing to this list are at
https://listman.redhat.com/mailman/listinfo/ext3-users/

Status

Across July, ext3 development has slowed as we head toward a 1.0 release.  As
of kernel 2.4.7, the ext3 patch is quite stable and performs well.  Testing has
been on x86 SMP.  Please send any success or failure reports for other
architectures to the ext3-users list.

One outstanding problem is disk quotas.  There are several known sources of
deadlocks in the 2.4.7 quota code, and ext3 adds one more source.  The quota
code in the -ac kernels is very different, and once that gets merged into
Linus' tree we shall continue development and testing of quota code for ext3.  
This is not to say that ext3+quotas crashes all over the place - but if you
push the filesystem hard enough for long enough, the code will lock up and you
will need to reboot to reestablish operation on the affected filesystem.   We
only test quota code against the -ac kernels - this is supported and works
well.

Installation

 1. Download the latest kernel patch from
    http://www.zip.com.au/~akpm/linux/ext3/
 2. cd /usr/src/linux
 3. gunzip < ~/ext3-2.4-0.x.y.patch.gz | patch -p1
 4. make menuconfig

    Under the filesystems menu, select ext3.  Please also select "JBD debugging
    support", as it will produce useful diagnostics if something goes wrong. 
    You shouldn't normally select "Buffer head tracing" - it uses a lot of
    memory.  However if you do see `assertion failures' from ext3, please see
    if you can reproduce them with buffer tracing enabled before reporting them
    - that will provide much useful information.

    The filesystem may be compiled into the kernel or built as a module. 
    Building it into the kernel can simplify the gathering of diagnostic
    information if something fails.
     
 5. Build and install the kernel.

Other software

You will need the latest util-linux package from
http://www.kernel.org/pub/linux/utils/util-linux/.  The changes in mount are
described below.

You will need to download version 1.25 or later of e2fsprogs from
http://e2fsprogs.sourceforge.net/.

Converting ext2 filesystems

An ext2 filesystem may be converted to ext3 by creating a journal file on it.
To do this, run

    tune2fs -j /dev/hdXX

on the target filesystem (which may be mounted).  The filesystem is now ext3
capable: it can be mounted as type ext3.  To do so, change the filesystem type
in /etc/fstab and then unmount and remount the filesystem.  To mount the
root filesystem ext3, the easiest thing is probably to just reboot.
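
As a rough sketch of the /etc/fstab change, assuming a conventional
whitespace-separated fstab, the shell function below rewrites the type field
of one device's entry from ext2 to ext3.  The name fstab_to_ext3 is a
hypothetical helper for illustration, not part of util-linux or e2fsprogs.

```shell
# fstab_to_ext3 DEVICE   (reads an fstab on stdin, writes the result to stdout)
# Hypothetical helper: change the type field (3rd column) of DEVICE's entry
# from ext2 to ext3.  All other lines, including comments, pass through
# unchanged.  Note that awk rebuilds the edited line with single spaces
# between fields, so column alignment on that line is lost.
fstab_to_ext3() {
    awk -v dev="$1" '
        $1 == dev && $3 == "ext2" { $3 = "ext3" }
        { print }
    '
}
```

It writes to stdout so that the result can be inspected before it replaces
the real file, for example:

    fstab_to_ext3 /dev/hdXX < /etc/fstab > /tmp/fstab.new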

Creating new ext3 filesystems

Simply run

    mke2fs -j /dev/hdXX

to create a new ext3 filesystem on that device.

Switching between ext2 and ext3

ext3 filesystems may still be mounted by ext2 as long as they have been cleanly
unmounted.  ext2 will refuse to mount ext3 filesystems which have not been
cleanly shut down, because there is live data still in the journal which ext2
does not know how to deal with.

The e2fsck application from e2fsprogs can perform journal replay, so running

    e2fsck -fy /dev/hdXX

on a damaged ext3 filesystem will repair it, allowing ext2 to mount it.

ext3 software will refuse to mount an ext2 filesystem - at present there must
be a journal file on the filesystem.
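
To find out in advance whether ext2 would refuse a filesystem, one option is
to look for the needs_recovery flag in the feature list printed by
dumpe2fs -h.  The following is a minimal sketch; the function name is made up
for illustration, and it assumes the feature list appears on a line starting
with "Filesystem features:".

```shell
# Reads "dumpe2fs -h" output on stdin.
# Exit status 0 if the feature list contains needs_recovery, non-zero
# otherwise (including when no feature line is present at all).
needs_recovery() {
    grep '^Filesystem features:' | grep -qw 'needs_recovery'
}
```

Typical use would be along the lines of:

    dumpe2fs -h /dev/hdXX 2>/dev/null | needs_recovery && echo "needs e2fsck"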
 

LILO options for the root filesystem

If your root filesystem is ext3, an ext3-capable kernel will, by default, mount
it using ext3.  This may be overridden via the following LILO option:

    LILO: linux rootfstype=ext2

You may provide mount options to the root filesystem via LILO using the
rootflags option.  For example:

    LILO: linux rootflags=data=journal

Non-LILO bootloaders

The LILO bootloader doesn't know about filesystems - it uses a pre-prepared
list of blocks to locate and load the operating system image into memory. 
However other (smarter?) software such as SILO (SPARC) and yaboot (built on
Open Firmware) (PPC) have filesystem drivers in them, and they know how to
directly open and load an ext2 file.

This can be a problem if the boot filesystem is ext3, and it has suffered an
unclean shutdown.  When ext3 is in this state it is not compatible with ext2 -
it needs recovery to be performed.  This incompatibility is recorded in the
filesystem's superblock, and a fully ext2-compatible bootloader implementation
should complain and refuse to open files on the filesystem.  This is, of
course, not what we want to happen.  The system won't boot!

Versions of yaboot prior to 1.3.5 will refuse to boot from a "needs recovery"
filesystem.  Version 1.3.5 and later support ext3 via libext2fs.

SILO also has the correct compatibility checks, and booting from a "needs
recovery" ext3 filesystem will cause SILO to complain about "too many
symlinks", or something else inappropriate.  To avoid this serious problem you
will need to ensure that your boot filesystem is of type ext2, not ext3 (or
patch SILO to defeat the compatibility checks?)

Making things seamless

One problem with switching back and forth between ext2-only and ext3-enhanced
kernels is the need to tell the kernel what sort of filesystem to mount all
your devices with.  This usually involves playing games with /etc/fstab.

The latest version of mount recognises ext3 and can automatically choose the
ext3 filesystem type.  The version of fsck in e2fsprogs-1.23 and later can also
do this if the fstype is auto.   Here's the state of play:

  * If mount is not told the target fstype, and it detects ext3, it will try
    ext3 and then ext2.
  * If mount is told fstype auto then it will detect ext3 and will try ext3 and
    then ext2.
  * If fsck is told fstype auto then it will autodetect the type of the
    filesystem and run the appropriate checker (which is fsck.ext2 for both
    ext2 and ext3).

It is recommended that you use the latest stable version of e2fsprogs, and
that you use fstype auto in /etc/fstab.

NOTE: You must be using e2fsprogs-1.23 or later for fstype auto to work
correctly!

NOTE: If you are using a recent Red Hat distribution and if you have built your
own util-linux from the official tarball you may have problems with mount
failing to mount filesystems.  This is because Red Hat have added the "-O"
option to their version of mount.  This option is used in their
/etc/rc.d/rc.sysinit, and this causes the standard mount to fail with
"unrecognised option -O".

The fix is to edit /etc/rc.d/rc.sysinit and remove any instances of
"-O no_netdev".

NOTE: Using filesystem type auto for the root filesystem confuses /bin/df, and
causes it to not print out information for the root filesystem.  Fix: always
specify the root filesystem as ext3 in /etc/fstab.
 

Filesystem check intervals

A feature of e2fsck is that it will regularly force a check of a filesystem
even if the filesystem is marked clean.  Typically, this happens on every
twentieth mount or every 180 days, whichever comes first.

This still happens with ext3, and is quite possibly not what you want to happen
- one of the reasons you chose ext3 was to avoid the downtime which is caused
by a long fsck.

So it is a good idea to turn this feature off for ext3.  To disable the
checking, use the command

    tune2fs -i 0 -c 0 /dev/hdXX

NOTE: this means that it is your responsibility to periodically schedule
downtime for the manual checking of disks.  In many Linux distributions this is
most easily done by creating a file called /forcefsck and rebooting.
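
To confirm that the periodic checks really are off, the superblock settings
can be read back with tune2fs -l.  The sketch below assumes that the listing
contains "Maximum mount count:" and "Check interval:" lines (the exact
wording may differ between e2fsprogs versions) and treats a mount count of 0
or -1 as disabled.  The function name is hypothetical.

```shell
# Reads "tune2fs -l" output on stdin.
# Exit status 0 only if both fields were found, the maximum mount count is
# 0 or negative, and the check interval is 0.
periodic_fsck_disabled() {
    awk -F': *' '
        /^Maximum mount count:/ { count = $2 + 0; seen_count = 1 }
        /^Check interval:/      { interval = $2 + 0; seen_interval = 1 }
        END {
            ok = seen_count && seen_interval && count <= 0 && interval == 0
            exit(ok ? 0 : 1)
        }
    '
}
```

For example:

    tune2fs -l /dev/hdXX | periodic_fsck_disabled && echo "periodic fsck is off"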

External journals

As of version 0.9.5, ext3 supports the placement of the journal on a separate
device.  It is intended that this be a magnetic disk, or an NVRAM device.  
NVRAM devices may be simulated by using Andrew Tridgell's trivial RAM disk
driver trd, which is in the ext3 CVS repository.

You will need a very recent e2fsprogs: version 1.23-WIP-0727 or later.

To install trd:

    mknod /dev/trd b 240 0
    insmod trd.o trd_size=50000        (For a 50 megabyte device)

To create an external journal:

    mke2fs -O journal_dev /dev/trd

To create an ext3 filesystem on /dev/hda5 which uses the external journal
device:

    mke2fs -J device=/dev/trd /dev/hda5
    mount /dev/hda5 /mnt/some/place -t ext3

NOTE!  You'll need to specify the filesystem type (-t ext3) because the
automatic filesystem type detection in mount(8) doesn't recognise ext3 with
external journals.

Of course you may use a partition on a real disk as the external journal device
- just replace /dev/trd above with /dev/hdXX.

NOTE: a RAM disk driver simulating an NVRAM device should ONLY be used for
testing.  Using one otherwise loses the benefits of journalling at recovery
time (you will always get a full fsck of the filesystem), and in fact you will
lose data and have an increased chance of filesystem corruption after a crash.
 

A HOWTO

Rajesh Fowkar has prepared an ext3 installation HOWTO.  It is available at
http://www.symonds.net/~rajesh/howto/ext3/index.html.