NAME
    File::CRBackup - Cp+rsync-based filesystem backup with history levels
    and hardlinks

VERSION
    version 0.02

SYNOPSIS
    In daily-backup script:

     #!/usr/bin/perl
     use File::CRBackup qw(backup);
     use Log::Any::App;
     backup(
         source    => '/path/to/mydata',
         target    => '/backup/mydata',
         histories => [7, 4, 3],         # 7 days, 4 weeks, 3 months
     );

DESCRIPTION
    This module utilizes two mature, dependable Unix command-line utilities,
    cp and rsync, to create a filesystem backup system. Some characteristics
    of this backup system:

    *   Supports backup histories and history levels

        For example, you can create 7 level-1 backup histories (equal to 7
        days' worth of history if you run the backup once daily), 4 level-2
        backup histories (roughly 4 weeks) and 3 level-3 backup histories
        (roughly 3 months). The number of levels and the number of histories
        per level are customizable.

    *   Backups are not compressed/archived ("tar"-ed)

        They are just verbatim copies (produced by "cp -a" or "rsync -a") of
        the source directory. The upside of this is ease of cherry-picking
        (taking/restoring individual files from backup). The downside is
        lack of compression and the backup not being a single archive file.

        This is because rsync needs two real directory trees to compare.
        Perhaps when rsync supports a tar virtual filesystem in the future...

    *   Hardlinks are used between backup histories to save disk space

        This way, we can maintain several backup histories without wasting
        too much space duplicating data when there are not a lot of
        differences among them.

    *   High performance

        Rsync and cp are implemented in C and have been optimized for a long
        time. rm is also used instead of the pure-Perl implementation,
        File::Path::remove_tree.

    *   Unix-specific

        There are ports of cp, rm, and rsync on Windows, but this module
        hasn't been tested on those platforms.

    This module uses the Log::Any logging framework.

HOW IT WORKS
  First-time backup
    First, we lock the target directory to prevent other backup processes
    from interfering:

     mkdir -p TARGET
     flock    TARGET/.lock

    Then we copy the source to a temporary directory:

     cp -a    SRC            TARGET/.tmp

    If the copy finishes successfully, we rename the temporary directory to
    the final directory 'current':

     rename   TARGET/.tmp    TARGET/current
     touch    TARGET/.current.timestamp

    If the copy fails midway, TARGET/.tmp is left behind, and the next
    backup run rsyncs into it instead of starting over (which is more
    efficient):

     rsync    SRC            TARGET/.tmp

    Finally, we release the lock:

     unlock   TARGET/.lock
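
    The steps above can be sketched in Perl roughly as follows. This is an
    illustration under stated assumptions, not the module's actual code:
    the function name first_backup is hypothetical, the lock is taken in
    non-blocking mode, and the rsync options (-a --delete) are assumed,
    since the text only says "rsync".

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Path qw(make_path);

# Hypothetical helper illustrating the first-time backup sequence.
sub first_backup {
    my ($src, $target) = @_;
    make_path($target);                                  # mkdir -p TARGET

    # Hold an exclusive lock for the whole run; fail fast if it is taken.
    open my $lock, '>>', "$target/.lock" or die "open lock: $!";
    flock($lock, LOCK_EX | LOCK_NB) or die "another backup is running\n";

    my $tmp = "$target/.tmp";
    my @cmd = (-d $tmp)
        ? ('rsync', '-a', '--delete', "$src/", "$tmp/")  # resume (assumed options)
        : ('cp', '-a', $src, $tmp);                      # fresh copy
    system(@cmd) == 0 or die "@cmd failed: $?\n";

    # Only a complete copy ever becomes 'current'.
    rename $tmp, "$target/current" or die "rename: $!";

    # touch TARGET/.current.timestamp
    my $stamp = "$target/.current.timestamp";
    open my $fh, '>>', $stamp or die "touch: $!";
    close $fh;
    utime undef, undef, $stamp;

    close $lock;                                         # releases the lock
    return "$target/current";
}
```

    Because the rename happens only after cp or rsync exits successfully, a
    reader of TARGET/current never sees a half-finished copy.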

  Subsequent backups (after TARGET/current exists)
    First, we lock the target directory to prevent other backup processes
    from interfering:

     flock    TARGET/.lock

    Then we copy 'current' to a temporary directory, using hardlinks when
    possible:

     cp -la   TARGET/current TARGET/.tmp

    Then we rsync the source into the temporary directory:

     rsync    SRC            TARGET/.tmp

    If rsync finishes successfully, we rename target directories:

     rename   TARGET/current TARGET/hist.<timestamp>
     rename   TARGET/.tmp    TARGET/current
     touch    TARGET/.current.timestamp

    If rsync fails in the middle, TARGET/.tmp will be lying around and the
    next backup process will just continue the rsync process.
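
    The hardlink copy followed by rsync works because rsync, in its default
    (non --inplace) mode, writes changed data to a temporary file and
    renames it over the old name, leaving the other hardlink untouched. The
    sketch below imitates that for two files in pure Perl, without calling
    cp or rsync; the file names and helper subs are made up for
    illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub write_file { my ($p, $s) = @_; open my $fh, '>', $p or die "$p: $!"; print $fh $s; close $fh or die "$p: $!"; }
sub read_file  { my ($p) = @_; open my $fh, '<', $p or die "$p: $!"; local $/; my $s = <$fh>; return $s; }
sub inode      { my ($p) = @_; return (stat $p)[1]; }

my $t = tempdir(CLEANUP => 1);                 # stands in for TARGET
mkdir "$t/$_" or die "mkdir: $!" for qw(current .tmp);
write_file("$t/current/changed.txt",   "old\n");
write_file("$t/current/unchanged.txt", "same\n");

# Like `cp -la`: each file in .tmp is a second name for the same inode.
for my $f (qw(changed.txt unchanged.txt)) {
    link "$t/current/$f", "$t/.tmp/$f" or die "link: $!";
}

# Like rsync updating a changed file: write a new file, rename it into place.
write_file("$t/.tmp/.changed.txt.new", "new\n");
rename "$t/.tmp/.changed.txt.new", "$t/.tmp/changed.txt" or die "rename: $!";

printf "old snapshot: %s", read_file("$t/current/changed.txt");
printf "new snapshot: %s", read_file("$t/.tmp/changed.txt");
printf "unchanged file still shared: %s\n",
    inode("$t/current/unchanged.txt") == inode("$t/.tmp/unchanged.txt") ? "yes" : "no";
```

    Only the changed file consumes new disk space; the unchanged file
    remains a single inode with two names.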

  Maintenance of histories/history levels
    TARGET/hist.* are level-1 backup histories. Each backup run will produce
    a new history:

     TARGET/hist.<timestamp1>
     TARGET/hist.<timestamp2> # produced by the next backup
     TARGET/hist.<timestamp3> # and the next ...
     TARGET/hist.<timestamp4> # and so on ...
     TARGET/hist.<timestamp5>
     ...

    You can specify the number of histories (or the number of days) to keep
    at each level. When the number of histories exceeds the limit, the
    oldest ones are deleted, or one of them is promoted to the next level if
    a higher level is specified.

    For example, with histories being set to [7, 4, 3], after
    TARGET/hist.<timestamp8> is created, TARGET/hist.<timestamp1> will be
    promoted to level 2:

     rename TARGET/hist.<timestamp1> TARGET/hist2.<timestamp1>

    TARGET/hist2.* directories are level-2 backup histories. After a while,
    they will also accumulate:

     TARGET/hist2.<timestamp1>
     TARGET/hist2.<timestamp8>
     TARGET/hist2.<timestamp15>
     TARGET/hist2.<timestamp22>

    When TARGET/hist2.<timestamp29> arrives, TARGET/hist2.<timestamp1> will
    be promoted to level 3: TARGET/hist3.<timestamp1>. After a while,
    level-3 backup histories too will accumulate:

     TARGET/hist3.<timestamp1>
     TARGET/hist3.<timestamp29>
     TARGET/hist3.<timestamp57>

    Finally, TARGET/hist3.<timestamp1> will be deleted after
    TARGET/hist3.<timestamp85> comes along.
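
    The rotation described above can be simulated with integer timestamps,
    one backup per tick. The text does not state how the module decides
    whether an overflowing history is promoted or deleted, so the rule used
    here is an assumption chosen to reproduce the example: a history is
    promoted only if it is at least as many ticks newer than the newest
    entry at the next level as the product of the lower-level limits (7 for
    level 2, 28 for level 3). The sub names are hypothetical.

```perl
use strict;
use warnings;

# $levels: [[ts, ...], ...] per level, oldest first. $limits: e.g. [7, 4, 3].
sub rotate {
    my ($levels, $limits) = @_;
    my $spacing = 1;
    for my $i (0 .. $#$limits) {
        $spacing *= $limits->[$i];              # ASSUMED promotion interval
        my $cur = $levels->[$i];
        while (@$cur > $limits->[$i]) {
            my $oldest = shift @$cur;
            next if $i == $#$limits;            # highest level: just delete
            my $next = $levels->[$i + 1];
            if (!@$next || $oldest - $next->[-1] >= $spacing) {
                push @$next, $oldest;           # promote to the next level
            }                                   # otherwise it is deleted
        }
    }
}

sub simulate {
    my ($runs, $limits) = @_;
    my @levels = map { [] } @$limits;
    for my $ts (1 .. $runs) {
        push @{ $levels[0] }, $ts;              # each run adds a level-1 history
        rotate(\@levels, $limits);
    }
    return \@levels;
}

my $levels = simulate(35, [7, 4, 3]);
printf "level %d: %s\n", $_ + 1, join(", ", @{ $levels->[$_] }) for 0 .. $#$levels;
```

    Running simulate() for more ticks shows level 3 filling up and its
    oldest entry eventually being dropped, as in the example above.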

FUNCTIONS
    None of the functions are exported by default.

  backup(%args)
    Arguments (those marked with "*" are required):

    *   source* => PATH or [PATH, ...]

    *   target* => PATH

    *   histories* => [NUM, ...]

        Specifies the number of backup histories to keep for level 1, 2, and
        so on. If a number is negative, it specifies the number of days to
        keep instead (regardless of the number of histories).

    *   extra_dir => BOOL

        If set to 1, then backup(source => '/a', target => '/backup/a') will
        create another 'a' directory, i.e. /backup/a/current/a. Otherwise,
        the contents of a/ will be copied directly under /backup/a/current/.

        Always set to 1 if there is more than one source; defaults to 0 if
        source is a single directory. You can set this to 1 so that the
        behaviour with a single source is the same as the behaviour with
        several sources.

    *   backup => BOOL (default 1)

        Whether to do the backup or not. If backup=1 and rotate=0, the
        function only creates a new backup without rotating histories.

    *   rotate => BOOL (default 1)

        Whether to rotate histories or not (which is done after the backup).
        If backup=0 and rotate=1, the function only rotates histories.

    *   extra_cp_opts => ARRAYREF (default none)

        Extra options to pass to cp command when doing backup. Note that the
        options will be shell quoted.

    *   extra_rsync_opts => ARRAYREF (default none)

        Extra options to pass to rsync command when doing backup. Note that
        the options will be shell quoted, so you should pass it unquoted,
        e.g. ['--exclude', '/Program Files'].
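
    As an illustration of how these arguments combine, the sketch below
    runs the backup and the rotation as two separate calls. The paths are
    placeholders, and the RUN_BACKUP environment variable is a hypothetical
    guard added so that the script does nothing unless explicitly enabled
    and File::CRBackup is installed.

```perl
use strict;
use warnings;

# Arguments shared by both calls; all names come from the list above.
my %common = (
    source           => ['/path/to/mydata', '/path/to/otherdata'],
    target           => '/backup/mydata',
    histories        => [7, 4, 3],
    # Pass options unquoted; the module shell-quotes them itself.
    extra_rsync_opts => ['--exclude', '/Program Files'],
);

my @steps = (
    { %common, backup => 1, rotate => 0 },   # create a new backup only
    { %common, backup => 0, rotate => 1 },   # rotate histories only
);

# Hypothetical guard: run only when asked to and when the module is present.
if ($ENV{RUN_BACKUP} && eval { require File::CRBackup; 1 }) {
    File::CRBackup::backup(%$_) for @steps;
}
```

    With two sources, extra_dir is always 1, so each source gets its own
    subdirectory under the 'current' directory.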

HISTORY
    This module came out of the Spanel hosting control panel project. We
    needed a daily backup system for shared hosting accounts that supports
    histories and cherry-picking. At first we used rdiff-backup, but it
    turned out not to be very robust: the script chose to exit on many kinds
    of non-fatal errors instead of ignoring them and continuing the backup.
    It was also very slow: on a server with hundreds of accounts and
    millions of files, the backup process often took 12 hours or more. After
    evaluating several other solutions, we realized that nothing beats the
    raw performance of rsync/cp. Thus we designed a simple backup system
    based on them.

TODO
    * Allow ionice etc instead of just nice -n19

SEE ALSO
    File::Backup

    File::Rotate::Backup

AUTHOR
    Steven Haryanto <stevenharyanto@gmail.com>

COPYRIGHT AND LICENSE
    This software is copyright (c) 2011 by Steven Haryanto.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.

