[Imap-uw] Long term archiving strategy?
Mark Crispin
mrc at CAC.Washington.EDU
Mon Sep 3 14:25:04 PDT 2007
On Mon, 3 Sep 2007, "Ing.BcA. Ivan Doležal" wrote:
> I was wondering if anyone had a server-side strategy/hack for mail
> offloading/archiving with UW IMAPD.
Most UW imapd sites use the same strategy for archiving as they use for
their files. That is, they treat mailboxes as simply a type of file, and
they use whatever backup/archive procedure they have in place for other
files.
A general issue with archiving from "fast expensive disks" to "some slow
cheap medium" is that the cost of disks has plummeted, whereas the cost of
alternative media has not. What's more, the performance of disks (and,
in recent years, of solid state file store) has continuously improved.
In my (admittedly biased) opinion, tape storage is well into its waning
years, and will be extinct in the not-too-distant future. Twenty years
ago, optical disks seemed to be the up-and-coming replacement, but they
haven't kept up with disk storage; a full backup of a relatively modest
250GB file store needs more than 50 single-layer DVDs! And these days, we
have enterprises dealing with multiple TB or PB. I wouldn't be at all
surprised to hear about EB, ZB, or even YB data stores in my lifetime.
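As a rough check on the DVD figure, here is a minimal sketch. It assumes
4.7GB single-layer (or 8.5GB dual-layer) discs that are filled completely;
both are assumptions, and partly filled discs would raise the count.

```python
import math

def discs_needed(store_gb, disc_gb=4.7):
    """Number of discs needed to hold store_gb, assuming every disc is
    filled to its nominal capacity (an idealization)."""
    return math.ceil(store_gb / disc_gb)

single_layer = discs_needed(250)       # 54 discs at 4.7GB each
dual_layer = discs_needed(250, 8.5)    # 30 discs at 8.5GB each
print(single_layer, dual_layer)
```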
With all this said, a lot of thought was given to archiving in IMAP's
infancy. The whole idea of tagging commands and responses was based upon
the idea that in a large mail store a FETCH or SEARCH may have to wait for
a read from an optical disk, or even for an operator to mount the
appropriate tape; thus it would be necessary for IMAP to permit commands
to be executed out of order. That model of the world never came to pass,
and all the out-of-order mechanisms of IMAP turned out to accomplish
nothing but added complexity.
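To illustrate what tagging buys, here is a small sketch of how a client
can pair completion responses with commands when the server finishes them
out of order. The tags, commands, and response lines are made up for
illustration, and match_completions is a hypothetical helper, not part of
any IMAP library.

```python
def match_completions(pending, response_lines):
    """Pair each tagged completion response with the command that was
    sent under the same tag. Untagged ("*") and continuation ("+")
    lines carry data, not completions, so they are skipped here."""
    results = {}
    for line in response_lines:
        tag, _, rest = line.partition(" ")
        if tag in ("*", "+"):
            continue
        if tag in pending:
            status = rest.split(" ", 1)[0]   # OK, NO, or BAD
            results[tag] = (pending[tag], status)
    return results

# Two commands in flight at once; the second one completes first.
pending = {"A001": "SEARCH FROM smith", "A002": "FETCH 1 FLAGS"}
responses = [
    "* 1 FETCH (FLAGS (\\Seen))",
    "A002 OK FETCH completed",
    "* SEARCH 2 84 882",
    "A001 OK SEARCH completed",
]
results = match_completions(pending, responses)
for tag, (command, status) in results.items():
    print(tag, status, command)
```

Because each completion carries its tag, the client does not depend on
responses arriving in the order the commands were sent.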
Nonetheless, UW imapd doesn't preclude the use of an external archiving
solution. For example, on the old TOPS-20 systems, an FDB (what UNIX
calls an "inode") could have archive status, pointing to a particular
archive store (which in those days meant a labelled 9-track tape). A file
with archive status could have its data pages removed from the disk, and
if an application attempted to access the file it would be blocked and the
operator would get a message to mount tape such-and-such. Once the tape
was mounted, the operating system would read the tape and restore the file
data pages, and when all this was done the application would continue as
if nothing had happened other than the file open taking a *long* time...
I've often thought about creating something like this for UNIX, but as
noted above the world changed the cost/benefit dynamics and the project
never got off the ground.
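The archive-status behavior described above can be modelled in a few
lines. This is only a toy sketch of the idea, not TOPS-20 code: the class,
the mount_tape callback (standing in for the operator and the tape drive),
and the file and tape names are all hypothetical.

```python
class ArchivedFile:
    """A file whose data pages may have been removed from disk."""
    def __init__(self, name, tape_label, pages=None):
        self.name = name
        self.tape_label = tape_label   # where the archived copy lives
        self.pages = pages             # None means the data is offline

    @property
    def online(self):
        return self.pages is not None

def open_file(f, mount_tape):
    """Return the file data. If the data pages are offline, block in
    mount_tape until they have been restored from the archive."""
    if not f.online:
        f.pages = mount_tape(f.tape_label, f.name)
    return f.pages

mount_requests = []

def fake_operator(label, name):
    # Stand-in for "operator, please mount tape such-and-such".
    mount_requests.append(label)
    return b"mail data"

f = ArchivedFile("MAIL.TXT", "TAPE-0042")
first = open_file(f, fake_operator)    # triggers a mount request
second = open_file(f, fake_operator)   # data is back on disk, no mount
print(mount_requests)
```

The application calling open_file sees nothing unusual apart from the
delay on the first open.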
One thing that *has* been accomplished, however, is the new mix format in
UW imapd. One of its primary goals was to reduce the burden on general
purpose backup systems. The problem with flat files is that a change to a
single flag of a single message in a 1.5GB mailbox would require the
entire 1.5GB mailbox file to be backed up on an incremental dump tape. As
users' mailbox files began to become huge, we discovered that incremental
dumps were no longer significantly smaller than full dumps; and our backup
system was no longer able to do that much backup in a single day.
Mix remedies this problem by causing most old messages to wind up in files
that never change; thus only full backups need to save those files, and
incremental backups are much smaller.
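The effect on incremental dumps can be sketched as follows. This assumes
a backup system that selects whole files by modification time; the file
names, sizes, and timestamps are invented for illustration and are not
the actual mix file layout.

```python
MB = 1000 ** 2

def incremental_bytes(files, last_backup):
    """Total size of the files an incremental dump would copy: every
    file modified after last_backup, copied whole."""
    return sum(size for size, mtime in files.values() if mtime > last_backup)

last_backup = 100    # time of the previous dump (arbitrary units)
flag_change = 200    # one flag on one message changes after that dump

# Flat file: the flag lives inside the single large mailbox file.
flat = {"mailbox": (1500 * MB, flag_change)}

# Split layout: old messages sit in data files that no longer change;
# only a small status file is rewritten when the flag changes.
split = {
    "data-1": (500 * MB, 10),
    "data-2": (500 * MB, 20),
    "data-3": (499 * MB, 30),
    "status": (1 * MB, flag_change),
}

flat_dump = incremental_bytes(flat, last_backup)
split_dump = incremental_bytes(split, last_backup)
print(flat_dump // MB, "MB versus", split_dump // MB, "MB")
```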
While steering away from the Scylla of flat files, mix also avoids the
Charybdis of one-file-per-message, which results in huge directories and
the performance problems familiar to anyone who runs large news spools
with 5- or 6-digit article counts in a single newsgroup.
-- Mark --
http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.