SYNOPSIS

    See bencher CLI.

DESCRIPTION

    Bencher is a benchmark framework. Its main feature is permuting a
    list of Perl codes with a list of arguments into benchmark items,
    and then benchmarking them. You can run only some of the items, as
    well as filter which codes and arguments to use. You can also
    permute multiple perls and multiple module versions.

TERMINOLOGY

    Scenario. A hash data structure that lists participants, datasets, and
    other things. The bencher CLI can accept a scenario from a module
    (under Bencher::Scenario::* namespace), a script, or from command-line
    option. See "SCENARIO".

    Participant. What to run or benchmark. Usually a Perl code or code
    template, or a command or command template. See "participants".

    Dataset. Arguments or parameters to permute with a participant. See
    "datasets".

    (Benchmark) item. A participant that has been permuted with a
    dataset into code ready to run. Usually a scenario does not contain
    items directly, but only participants and datasets, and lets
    Bencher permute them into items.

SCENARIO

    The core data structure that you need to prepare is the scenario. It is
    a DefHash (i.e. just a regular Perl hash). The two most important keys
    of this hash are: participants and datasets.

    An example scenario (from Bencher::Scenario::Example):

     package Bencher::Scenario::Example;
     our $scenario = {
         participants => [
             {fcall_template => q[Text::Wrap::wrap('', '', <text>)]},
         ],
         datasets => [
             { name=>"foobar x100",   args => {text=>"foobar " x 100} },
             { name=>"foobar x1000",  args => {text=>"foobar " x 1000} },
             { name=>"foobar x10000", args => {text=>"foobar " x 10000} },
         ],
     };
     1;
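
    To run this scenario with the bencher CLI (assuming the
    Bencher::Scenario::Example module is installed):

     % bencher -m Example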

 participants

    participants (array) lists Perl codes (or external commands) that we
    want to benchmark.

  Specifying participant's code

    There are several kinds of code you can specify:

    First, you can just specify module (str, a Perl module name). This
    is useful when running the scenario in module startup mode (see
    "Running benchmark in module startup mode"); it also instructs
    Bencher to load the module. When not in module startup mode, there
    is no code in this participant to run.

    You can also specify modules (an array of Perl module names) if you
    want to benchmark several modules together. Similarly, this is only
    useful for running in module startup mode.

    You can specify code (a coderef) which contains the code to
    benchmark. However, the point of Bencher is to use fcall_template
    or at least code_template so the code can easily be permuted with
    datasets (see below). So you should only specify code when you
    cannot express the code as an fcall_template or code_template.

    You can specify fcall_template, and this is the recommended way
    whenever possible. It is a string containing a function call code,
    in the form of:

     MODULENAME::FUNCTIONNAME(ARG, ...)

    or

     CLASSNAME->FUNCTIONNAME(ARG, ...)

    For example:

     Text::Levenshtein::fastdistance(<word1>, <word2>)

    Another example:

     Module::CoreList->is_core(<module>)

    It can be used to benchmark a function call or a method call. From
    this format, Bencher can easily extract the module name, so the
    user can also run the scenario in module startup mode.

    By using a template, Bencher can generate actual code from this
    template by combining it with datasets. The words enclosed in <...>
    will be replaced with actual arguments specified in "datasets". The
    arguments are automatically encoded as Perl values, so it's safe to
    use arrayrefs or complex structures as argument values (however,
    you can use <...:raw> to avoid this automatic encoding).
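
    For illustration, given the fastdistance template above and a
    dataset with args {word1=>"foo", word2=>"bar"}, the generated code
    would conceptually be (a sketch of the idea, not the exact source
    Bencher produces):

     sub { Text::Levenshtein::fastdistance("foo", "bar") }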

    Aside from fcall_template, you can also use code_template (a string
    containing arbitrary code), in the cases where the code you want to
    benchmark cannot be expressed as a simple function/method call, for
    example (taken from Bencher::Scenario::ComparisonOps):

     participants => [
         {name=>'1k-numeq'      , code_template=>'my $val =     1; for (1..1000) { if ($val ==     1) {} if ($val ==     2) {} }'},
         {name=>'1k-streq-len1' , code_template=>'my $val = "a"  ; for (1..1000) { if ($val eq "a"  ) {} if ($val eq "b"  ) {} }'},
         {name=>'1k-streq-len3' , code_template=>'my $val = "foo"; for (1..1000) { if ($val eq "foo") {} if ($val eq "bar") {} }'},
         {name=>'1k-streq-len10', code_template=>'my $val = "abcdefghij"; for (1..1000) { if ($val eq "abcdefghij") {} if ($val eq "klmnopqrst") {} }'},
     ],

    Like in fcall_template, words enclosed in <...> will be replaced with
    actual data. When generating actual code, Bencher will enclose the code
    template with sub { .. }.
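
    For illustration, a code template that uses a placeholder, such as
    (a hypothetical fragment):

     code_template => 'my @sorted = sort { $a <=> $b } @{<array>}',

    combined with dataset args {array=>[3,1,2]} would conceptually be
    turned into:

     sub { my @sorted = sort { $a <=> $b } @{[3,1,2]} }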

    Or, if you are benchmarking external commands, you can specify
    cmdline (array of str, or str) or cmdline_template (array of str,
    or str), or perl_cmdline or perl_cmdline_template, instead. An
    array cmdline will be executed without a shell, while the string
    version will use a shell. The perl_cmdline* properties are the same
    as cmdline*, except that the implicit first argument/prefix is
    perl. When the cmdline template is filled with the arguments, the
    values will automatically be shell-escaped (unless you use the
    <...:raw> syntax).
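
    For illustration (a hypothetical scenario fragment; wc is just an
    example command):

     participants => [
         # array form: executed without a shell
         {name=>'wc-array' , cmdline_template=>['wc', '-l', '<filename>']},
         # string form: executed via a shell
         {name=>'wc-string', cmdline_template=>'wc -l <filename>'},
     ],
     datasets => [
         {args => {filename=>'/usr/share/dict/words'}},
     ],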

    When using code template, code will be generated and eval-ed in the
    main package.

  Specifying participant's name

    By default, Bencher will attempt to figure out a name for a
    participant (a sequence number starting from 1, a module name, a
    module name followed by a function name, etc). You can also specify
    a name for a participant explicitly so you can refer to it more
    easily later, e.g.:

     participants => [
         {name=>'pp', fcall_template=>'List::MoreUtils::PP::uniq(@{<array>})'},
         {name=>'xs', fcall_template=>'List::MoreUtils::XS::uniq(@{<array>})'},
     ],

  List of properties for a participant

    This is a reference section.

      * name (str)

      From DefHash.

      * summary (str)

      From DefHash.

      * description (str)

      From DefHash.

      * tags (array of str)

      From DefHash. Define tag(s) for this participant. Can be used to
      include/exclude groups of participants having the same tags.

      * module (str)

      * modules (array of str)

      * function (str)

      * fcall_template (str)

      * code_template (str)

      * code (code)

      * cmdline (str|array of str)

      * cmdline_template (str|array of str)

      * perl_cmdline (str|array of str)

      * perl_cmdline_template (str|array of str)

      * result_is_list (bool, default 0)

      This is useful when dumping an item's code, so Bencher will use
      list context when receiving the result.

      * include_by_default (bool, default 1)

      Can be set to false if you want to exclude the participant by
      default when running benchmarks, unless the participant is
      explicitly included, e.g. using the --include-participant
      command-line option.
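
    For illustration, a participant that is tagged and excluded by
    default (a hypothetical fragment; the function name is made up):

     participants => [
         {name=>'slow-pp', tags=>['pp','slow'], include_by_default=>0,
          fcall_template=>'My::Mod::PP::func(<arg>)'},
     ],

    It can then be enabled for a particular run, e.g. with
    --include-participant slow-pp.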

 datasets

    datasets (array) lists the function inputs (or command-line
    arguments). You can also name each dataset, to be able to refer to
    it more easily.

    The known properties for a dataset:

      * name (str)

      From DefHash.

      * summary (str)

      From DefHash.

      * description (str)

      From DefHash.

      * tags (array of str)

      From DefHash. Define tag(s) for this dataset. Can be used to
      include/exclude groups of datasets having the same tags.

      * args (hash)

      Example:

       {filename=>"ujang.txt", size=>10}

      You can supply multiple argument values by adding @ suffix to the
      argument name. You then supply an array for the values, example:

       {filename=>"ujang.txt", 'size@'=>[10, 100, 1000]}

      This means, for each participant mentioning size, three benchmark
      items will be generated, one for each value of size.

      Aside from an array, you can also use a hash for the multiple
      values. This has the nice effect of displaying nicer names (the
      hash keys) for the argument values, e.g.:

       {filename=>"ujang.txt", 'size@'=>{"1M"=>1024**2, "1G"=>1024**3, "1T"=>1024**4}}

      * argv (array)

      * include_by_default (bool, default 1)

      Can be set to false if you want to exclude the dataset by
      default when running benchmarks, unless the dataset is
      explicitly included.

      * include_participant_tags (array of str)

      Only include participants having all these tags.

      * exclude_participant_tags (array of str)

      Exclude participants having any of these tags.
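
    For illustration, a dataset that is only permuted with
    participants tagged xs (a hypothetical fragment):

     datasets => [
         {name=>'long-text', args=>{text=>"foobar " x 1000},
          include_participant_tags=>['xs']},
     ],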

  Other properties

    Other known scenario properties (keys):

      * name

      From DefHash, scenario name (usually short and one word).

      * summary

      From DefHash, a one-line plaintext summary.

      * description (str)

      From DefHash, longer description in Markdown.

      * module_startup (bool)

      * default_precision (float, between=>[0,1])

      Precision to pass to Benchmark::Dumb. Default is 0. Can be
      overridden via --precision (CLI).

      * include_result_size (bool)

      Show the size of the item code's return value. Size is measured
      using Devel::Size. The measurement is done once per item, when
      the item's code is run for testing.

      * include_process_size (bool)

      Include some memory statistics in each item's result. This
      currently only works on Linux, because the measurement is done
      by reading /proc/PID/smaps. Also, since this is per-process
      information, to get it each item's code will be dumped (using
      B::Deparse) into a temporary file, then run (once per item,
      after the item's benchmark is completed) using a new perl
      interpreter process. This is done to get a measurement on a
      clean process that does not load Bencher itself or the other
      items. This also means that not all code will work: all the
      caveats in "MULTIPLE PERLS AND MULTIPLE MODULE VERSIONS" apply.
      In short, no outside data will be available to the code.

      Also, this information normally does not make sense for external
      command participants, because what is measured is the memory
      statistics of the perl process itself, not the external command's
      processes.

      * capture_stdout (bool)

      Useful for silencing a command/code that writes to stdout. Note
      that output capturing might affect timings if your benchmark
      code produces a lot of output. See also: capture_stderr.

      * capture_stderr (bool)

      Useful for silencing a command/code that writes to stderr. Note
      that output capturing might affect timings if your benchmark
      code produces a lot of output. See also: capture_stdout.

      * extra_modules (array of str)

      You can specify extra modules to load here before benchmarking.
      The modules and their versions will be listed in the result
      metadata under func.module_versions, for extra information.
      Examples of modules to list here are those that contain/produce
      the datasets being benchmarked, because the data might differ
      from version to version.

      * on_failure (str, "skip"|"die")

      For a command participant, failure means non-zero exit code. For a
      Perl-code participant, failure means Perl code dies or (if expected
      result is specified) the result is not equal to the expected result.

      The default is "die". When set to "skip", Bencher will first run
      each item's code once before benchmarking, trapping command
      failure/Perl exceptions; if that happens, the item will be
      skipped.

      Can be overridden in the CLI with the --on-failure option.

      * on_result_failure (str, "skip"|"die"|"warn")

      This is like on_failure, except that it specifically refers to
      the failure of an item's result not being equal to the expected
      result.

      The default is the value of on_failure.

      There is an extra choice of "warn" for this type of failure,
      which is to print a warning to STDERR and continue.

      Can be overridden in the CLI with the --on-result-failure
      option.

      * before_parse_scenario (code)

      If specified, then this code will be called before parsing the
      scenario. The code will be given a hash argument with the
      following keys: hook_name (str, set to before_parse_scenario),
      scenario (hash, unparsed scenario), stash (hash, which you can
      use to pass data between hooks).

      * after_parse_scenario (code)

      If specified, then this code will be called after parsing scenario.
      Code will be given hash argument with the following keys: hook_name,
      scenario (hash, parsed scenario), stash.

      * before_list_datasets (code)

      If specified, then this code will be called before enumerating
      datasets from scenario. Code will be given hash argument with the
      following keys: hook_name, scenario, stash.

      You can use this hook to, e.g., generate datasets dynamically
      (see the sketch after this list).

      * before_list_participants (code)

      If specified, then this code will be called before enumerating
      participants from scenario. Code will be given hash argument with the
      following keys: hook_name, scenario, stash.

      You can use this hook to, e.g.: generate participants dynamically.

      * before_gen_items (code)

      If specified, then this code will be called before generating items.
      Code will be given hash argument with the following keys: hook_name,
      scenario, stash.

      You can use this hook to, e.g.: modify datasets/participants before
      being permuted into items.

      * before_bench (code)

      If specified, then this code will be called before starting the
      benchmark. Code will be given hash argument with the following keys:
      hook_name, scenario, stash.

      * after_bench (code)

      If specified, then this code will be called after completing
      benchmark. Code will be given hash argument with the following keys:
      hook_name, scenario, stash, result (array, enveloped result).

      You can use this hook to, e.g.: do some custom
      formatting/modification to the result.

      * before_return (code)

      If specified, then this code will be called before
      displaying/returning the result. Code will be given hash argument
      with the following keys: hook_name, scenario, stash, result.

      You can use this hook to, e.g.: modify the result in some way.
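
    For illustration, a before_list_datasets hook that generates
    datasets dynamically (a hypothetical scenario; it assumes that
    mutating the scenario hash passed to the hook is how the generated
    datasets are picked up):

     our $scenario = {
         participants => [
             {fcall_template => q[Text::Wrap::wrap('', '', <text>)]},
         ],
         datasets => [],
         before_list_datasets => sub {
             my %args = @_;   # keys: hook_name, scenario, stash
             # ASSUMPTION: adding datasets to the scenario here makes
             # them visible when Bencher enumerates datasets
             push @{ $args{scenario}{datasets} },
                 map { +{name=>"foobar x".10**$_,
                         args=>{text=>"foobar " x 10**$_}} } 2..4;
         },
     };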

USING THE BENCHER COMMAND-LINE TOOL

 Running benchmark

 Running benchmark in module startup mode

    Module startup mode can be activated either by specifying
    --module-startup option from the command-line, or by setting
    module_startup property to true in the scenario.

    In this mode, instead of running each participant's code, module name
    will be extracted from each participant and this will be benchmarked
    instead:

     perl -MModule1 -e1
     perl -MModule2 -e1
     ...
     perl -e1 ;# the baseline, for comparison

    Basically, this mode tries to measure the startup overhead of each
    module in isolation.

    A module name can be extracted from a participant if the
    participant specifies module, modules, or fcall_template. When a
    participant does not contain any module name, it will be skipped.
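
    For example, to benchmark the startup overhead of the module(s) in
    the example scenario (assuming the scenario module is installed):

     % bencher -m Example --module-startup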

MULTIPLE PERLS AND MULTIPLE MODULE VERSIONS

    Bencher can be instructed to run benchmark items against multiple perl
    installations, as well as multiple versions of a module.

    Bencher uses perlbrew to get the list of available perl installations,
    so you need to install perlbrew and brew some perls first.

    To run against multiple versions of a module, specify the module name
    in --multimodver then add one or more library include paths using -I.
    The include paths need to contain different versions of the module.

    Caveats. Here is how benchmarking against multiple perls and
    module versions currently works. Bencher first prepares a new
    scenario based on the input scenario, where the benchmark items
    have already been permuted and the code templates have been
    converted into actual Perl code (coderefs). The new scenario,
    along with the Perl code in it, is dumped using Data::Dmp (which
    can deparse code) into a temporary file. A new Bencher process is
    then started using the appropriate perl interpreter; it runs the
    scenario and returns the result as JSON. The original Bencher
    process then collects and combines the per-interpreter results
    into the final result.

    Due to this way of working, there are some caveats. First, code
    that contains closures won't work properly, because the original
    variables that the code closes over are no longer available in the
    new process. Also, some scenarios prepare data in a hook like
    before_bench or before_gen_items. This also won't work, because
    the new scenario that gets dumped into the temporary file
    currently has all the hooks stripped first.

    So in principle, to enable a benchmark item to be run against
    multiple perls or module versions, make the code self-sufficient.
    Do not depend on outside variables; depend only on the variables
    in the dataset.
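
    For illustration (a hypothetical fragment; load_data and process
    are made-up functions), this participant closes over a lexical and
    will break when run in a separate process:

     my $data = load_data();               # lives only in the original process
     participants => [
         {code => sub { process($data) }}, # closure over $data won't survive
     ],

    A self-sufficient version moves the data into the dataset, so it
    gets encoded into the generated code:

     participants => [
         {code_template => 'process(<data>)'},
     ],
     datasets => [
         {name=>'small', args=>{data=>[1..100]}},
     ],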

SEE ALSO

    bencher

    BenchmarkAnything. There is a lot of overlap between the goals of
    Bencher and this project. I hope to reuse or interoperate with
    parts of BenchmarkAnything, e.g. storing Bencher results in a
    BenchmarkAnything storage backend, sending Bencher results to a
    BenchmarkAnything HTTP server, and so on.

    Benchmark, Benchmark::Dumb (Dumbbench)

    Bencher::Scenario::* for examples of scenarios.

