[Imap-uw] Long term archiving strategy?

Mark Crispin mrc at CAC.Washington.EDU
Mon Sep 3 14:25:04 PDT 2007


On Mon, 3 Sep 2007, "Ing.BcA. Ivan Doležal" wrote:
>   I was wondering if anyone had some strategy for server-side strategy/hack 
> of mail offloading/archiving with UW IMAPD.

Most UW imapd sites use the same strategy for archiving as they use for 
their files.  That is, they treat mailboxes as simply a type of file, and 
they use whatever backup/archive procedure they have in place for other 
files.

A general issue with archiving from "fast expensive disks" to "some slow 
cheap medium" is that the cost of disks has plummeted, whereas the cost of 
alternative media has not.  What's more, the performance of disks (and, 
in recent years, of solid state file store) has continuously improved.

In my (admittedly biased) opinion, tape storage is well into its waning 
years, and will be extinct in the not-too-distant future.  Twenty years 
ago, optical disks seemed to be the up-and-coming replacement, but they 
haven't kept up with disk storage; a full backup of a relatively modest 
250GB file store needs more than 50 single-layer (4.7GB) DVDs!  Meanwhile, 
these days we have enterprises dealing with multiple TB or PB.  I wouldn't 
be at all surprised to hear about EB, ZB, or even YB data stores in my 
lifetime.

With all this said, a lot of thought was given to archiving in IMAP's 
infancy.  The whole idea of tagging commands and responses was based upon 
the idea that in a large mail store a FETCH or SEARCH may have to wait for 
a read from an optical disk, or even for an operator to mount the 
appropriate tape; thus it would be necessary for IMAP to permit commands 
to be executed out of order.  That model of the world never came to pass, 
and all the out-of-order mechanisms of IMAP turned out to accomplish 
nothing but added complexity.
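
To make the tagging idea concrete, here is a minimal sketch in Python of 
how a client can match completion responses to commands by tag when the 
server finishes them out of order.  The transcript, the tag values, and 
the helper name completion_order are all made up for illustration; this 
is not code from UW imapd or any real client.

```python
import re

# Hypothetical transcript: the client has sent three tagged commands
# without waiting for the earlier ones to complete.
pending = {
    "A001": "FETCH 1 RFC822.SIZE",
    "A002": "SEARCH UNSEEN",
    "A003": "NOOP",
}

# Hypothetical server output.  Untagged data lines start with "*"; each
# tagged line completes exactly one command, in whatever order the
# server happens to finish them.
server_lines = [
    "* SEARCH 4 9",
    "A002 OK SEARCH completed",
    "A003 OK NOOP completed",
    "* 1 FETCH (RFC822.SIZE 2048)",
    "A001 OK FETCH completed",
]

# A tagged completion line: tag, a space, then OK, NO, or BAD.
TAGGED = re.compile(r"^(?P<tag>[A-Za-z0-9]+) (?P<status>OK|NO|BAD)\b")


def completion_order(pending, lines):
    """Return the tags in the order their completion responses arrive."""
    outstanding = dict(pending)
    order = []
    for line in lines:
        match = TAGGED.match(line)
        if match and match.group("tag") in outstanding:
            order.append(match.group("tag"))
            del outstanding[match.group("tag")]
    return order


order = completion_order(pending, server_lines)
print(order)
```

In this made-up exchange the slow FETCH (imagine it waiting on a tape 
mount) completes last, and the tag is the only thing that tells the 
client which command each completion belongs to.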

Nonetheless, UW imapd doesn't preclude the use of an external archiving 
solution.  For example, on the old TOPS-20 systems, an FDB (what UNIX 
calls an "inode") could have archive status, pointing to a particular 
archive store (which in those days meant a labelled 9-track tape).  A file 
with archive status could have its data pages removed from the disk, and 
if an application attempted to access the file it would be blocked and the 
operator would get a message to mount tape such-and-such.  Once the tape 
was mounted, the operating system would read the tape and restore the file 
data pages, and when all this was done the application would continue as 
if nothing had happened other than the file open taking a *long* time...

I've often thought about creating something like this for UNIX, but as 
noted above the world changed the cost/benefit dynamics and the project 
never got off the ground.

One thing that *has* been accomplished, however, is the new mix format in 
UW imapd.  One of its primary goals was to reduce the burden on general 
purpose backup systems.  The problem with flat files is that a change to a 
single flag of a single message in a 1.5GB mailbox would require the 
entire 1.5GB mailbox file to be backed up on an incremental dump tape.  As 
users' mailbox files grew huge, we discovered that incremental dumps were 
no longer significantly smaller than full dumps, and our backup system 
could no longer complete that much backup in a single day.

Mix remedies this problem by causing most old messages to wind up in files 
that never change; thus only a full backup needs to save those files, and 
the incremental backups are much smaller.
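
The effect on a file-level incremental dump can be sketched with a little 
arithmetic in Python.  This is only an illustration: the file names and 
sizes below are hypothetical and do not describe the actual on-disk 
layout of mix, and the model assumes a backup system that copies every 
changed file in full.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    name: str
    size_bytes: int
    changed_since_last_dump: bool


def incremental_dump_size(files):
    """Bytes a file-level incremental dump copies: each changed file, in full."""
    return sum(f.size_bytes for f in files if f.changed_since_last_dump)


MB = 1_000_000

# One flat mailbox file: changing a single flag marks the whole file changed.
flat = [StoredFile("mailbox", 1500 * MB, True)]

# Hypothetical segmented layout: flags live in a small file that changes,
# while old message text sits in data files that no longer change.
segmented = [
    StoredFile("status", 1 * MB, True),
    StoredFile("data-0001", 500 * MB, False),
    StoredFile("data-0002", 500 * MB, False),
    StoredFile("data-0003", 499 * MB, False),
]

flat_bytes = incremental_dump_size(flat)
segmented_bytes = incremental_dump_size(segmented)
print(flat_bytes // MB, "MB vs", segmented_bytes // MB, "MB")
```

Under these assumed numbers the same one-flag change costs the 
incremental dump 1500MB in the flat case and 1MB in the segmented case; 
the unchanged data files are picked up only by the next full dump.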

While steering away from the Scylla of flat files, mix also avoids the 
Charybdis of one-file/one-message, which results in huge directories and 
the performance problems familiar to anyone who has large news spools 
with 5 or 6 digit article counts for a single newsgroup.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

