From: andreas.koenig@anima.de (Andreas J. Koenig)
Subject: Re: CPAN is getting too big
To: Elaine -HFB- Ashton <elaine@chaos.wustl.edu>
Cc: packrats@history.perl.org
Date: 04 Feb 2000 17:03:59 +0100

>>>>> On Mon, 10 Jan 2000 12:37:14 -0600, Elaine -HFB- Ashton <elaine@chaos.wustl.edu> said:

 > Andreas J. Koenig [andreas.koenig@anima.de] quoth:
 > *> > Ah, gotcha. How much disk space would it need at the outset? I could put
 > *>
 > *>Less than 2 GB.

 > I can handle that. :)

Turns out to be much less:

~ftp/pub/backpan% du -sk *
1126602 authors
12      lost+found
1506    modules
235437  ports
2724    src

The bad news: ports/ and src/ are only random, not systematic,
collections.

The obvious: modules/ is bogus in that context. It is a symlink tree
that tries to help point-and-click browsing, but it has no meaning for
an archaeological collection.

The grain of salt: missing files. The backup system had a few hiccups
over time, and a few files may be missing if they had a short life on
CPAN. The critical periods are June 1999, December 1999, and January
2000.

I have deleted all symlinks and all CHECKSUMS files in the authors/
tree as well as the authors/0* files. I believe the really interesting
part is only the authors/id/ directory.
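
A minimal sketch of how such a cleanup could be scripted, assuming a
POSIX shell and find. The function name prune_authors_tree is
hypothetical, and this is not necessarily how the tree was actually
pruned:

```shell
# Hypothetical helper: prune a copy of the authors/ tree in place.
# Usage: prune_authors_tree /path/to/backpan/authors
prune_authors_tree() {
    tree=$1
    # Remove every symlink anywhere below the tree (find does not
    # follow symlinks by default, so only the links themselves go).
    find "$tree" -type l -exec rm -f {} +
    # Remove the per-directory CHECKSUMS files.
    find "$tree" -type f -name CHECKSUMS -exec rm -f {} +
    # Remove the 0* files at the top level of the tree only.
    for f in "$tree"/0*; do
        if [ -f "$f" ]; then
            rm -f "$f"
        fi
    done
    return 0
}
```

Everything else, including the distribution files under id/, is left
untouched.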

I'm now running the following cronjob to keep the tree up to date:

    rsync --exclude CHECKSUMS -vrptgx /home/ftp/pub/PAUSE/authors/id/ /home/ftp/pub/backpan/authors/id/

Note the absence of -l; I believe symlinks are no good in there.
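
For readers less familiar with the flags, here is a hedged sketch that
wraps the same rsync call in a shell function. The function name
sync_backpan and its two arguments are illustrative only; the flags are
the ones from the command above:

```shell
# Illustrative wrapper around the rsync call; the name and the argument
# handling are assumptions, not part of the original setup.
# Usage: sync_backpan /home/ftp/pub/PAUSE/authors/id/ \
#                     /home/ftp/pub/backpan/authors/id/
sync_backpan() {
    src=$1   # trailing slash: copy the directory's contents
    dst=$2
    # -v verbose, -r recurse, -p keep permissions, -t keep mtimes,
    # -g keep group, -x do not cross filesystem boundaries.
    # No -l, so symlinks are skipped rather than copied.
    # No --delete, so a file that disappears from the source
    # stays in the destination.
    rsync --exclude CHECKSUMS -vrptgx "$src" "$dst"
}
```

Because --delete is not given, each run only adds or updates files, so
the destination accumulates everything that was ever seen in the source.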

I do not have access to a good backup system anymore, so I'd be glad
if you would not rely on me and would take over the further
maintenance.

The whole thing is available at

    ftp://pause.perl.org/pub/backpan
    rsync: pause.perl.org/backpan

Let me know if I can help sort things out.
Hope you like it,
-- 
andreas

