# doc-cache created by Octave 7.3.0
# name: cache
# type: cell
# rows: 3
# columns: 153
# name: <cell-element>
# type: sq_string
# elements: 1
# length: 20
ConfusionMatrixChart


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 932
 -- statistics: P = ConfusionMatrixChart ()
     Create object P, a Confusion Matrix Chart object.

     "DiagonalColor"
          The color of the patches on the diagonal, default is [0.0,
          0.4471, 0.7412].

     "OffDiagonalColor"
          The color of the patches off the diagonal, default is [0.851,
          0.3255, 0.098].

     "GridVisible"
          Available values: on (default), off.

     "Normalization"
          Available values: absolute (default), column-normalized,
          row-normalized, total-normalized.

     "ColumnSummary"
          Available values: off (default), absolute,
          column-normalized, total-normalized.

     "RowSummary"
          Available values: off (default), absolute, row-normalized,
          total-normalized.

     MATLAB compatibility: the following properties are not
     implemented: FontColor, PositionConstraint, InnerPosition, Layout.
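     A minimal usage sketch (assuming the statistics package and its
     confusionchart function are available; the labels below are
     invented for illustration):

```octave
% confusionchart builds a ConfusionMatrixChart object from true and
% predicted class labels; properties can then be adjusted on the object.
pkg load statistics;                         % assumption: package installed
cm = confusionchart ([1 2 1 2], [1 2 2 2]);  % true vs. predicted labels
cm.Normalization = "row-normalized";         % switch display normalization
```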

     See also: confusionchart.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 49
Create object P, a Confusion Matrix Chart object.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
adtest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4624
 -- statistics: H = adtest (X)
 -- statistics: H = adtest (X, NAME, VALUE)
 -- statistics: [H, PVAL] = adtest (...)
 -- statistics: [H, PVAL, ADSTAT, CV] = adtest (...)

     Anderson-Darling goodness-of-fit hypothesis test.

     'H = adtest (X)' returns a test decision for the null hypothesis
     that the data in vector X is from a population with a normal
     distribution, using the Anderson-Darling test.  The alternative
     hypothesis is that x is not from a population with a normal
     distribution.  The result H is 1 if the test rejects the null
     hypothesis at the 5% significance level, or 0 otherwise.

     'H = adtest (X, NAME, VALUE)' returns a test decision for the
     Anderson-Darling test with additional options specified by one or
     more Name-Value pair arguments.  For example, you can specify a
     null distribution other than normal, or select an alternative
     method for calculating the p-value, such as a Monte Carlo
     simulation.

     The following parameters can be passed as Name-Value pair
     arguments.

     Name               Description
     --------------------------------------------------------------------------
     "Distribution"     The distribution being tested for.  It tests whether
                        X could have come from the specified distribution.
                        There are two choices available for passing
                        distribution parameters:

        * One of the following char strings: "norm", "exp", "ev",
          "logn", "weibull", for defining either the 'normal',
          'exponential', 'extreme value', lognormal, or 'Weibull'
          distribution family, accordingly.  In this case, X is tested
          against a composite hypothesis for the specified distribution
          family and the required distribution parameters are estimated
          from the data in X.  The default is "norm".

        * A cell array defining a distribution in which the first cell
          contains a char string with the distribution name, as
          mentioned above, and the consecutive cells containing all
          specified parameters of the null distribution.  In this case,
          X is tested against a simple hypothesis.

     "Alpha"            Significance level alpha for the test.  Any scalar
                        numeric value between 0 and 1.  The default is 0.05
                        corresponding to the 5% significance level.
                        
     "MCTol"            Monte-Carlo standard error for the p-value, PVAL,
                        which must be a positive scalar value.  In this
                        case, an approximation for the p-value is
                        computed directly, using Monte-Carlo simulations.
                        
     "Asymptotic"       Method for calculating the p-value of the
                        Anderson-Darling test, which can be either true or
                        false logical value.  If you specify 'true', adtest
                        estimates the p-value using the limiting
                        distribution of the Anderson-Darling test statistic.
                        If you specify 'false', adtest calculates the
                        p-value based on an analytical formula.  For sample
                        sizes greater than 120, the limiting distribution
                        estimate is likely to be more accurate than the
                        small sample size approximation method.

        * If you specify a distribution family with unknown parameters
          for the distribution Name-Value pair (i.e.  composite
          distribution hypothesis test), the "Asymptotic" option must be
          false.
        * If you use MCTol to calculate the p-value using a Monte Carlo
          simulation, the "Asymptotic" option must be false.

     '[H, PVAL] = adtest (...)' also returns the p-value, PVAL, of the
     Anderson-Darling test, using any of the input arguments from the
     previous syntaxes.

     '[H, PVAL, ADSTAT, CV] = adtest (...)' also returns the test
     statistic, ADSTAT, and the critical value, CV, for the
     Anderson-Darling test.

     The Anderson-Darling test statistic belongs to the family of
     Quadratic Empirical Distribution Function statistics, which are
     based on the weighted sum of the difference [Fn(x)-F(x)]^2 over the
     ordered sample values X1 < X2 < ... < Xn, where F is the
     hypothesized continuous distribution and Fn is the empirical CDF
     based on the data sample with n sample points.
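     A brief usage sketch (the data here are randomly generated for
     illustration):

```octave
pkg load statistics;            % assumption: statistics package installed
x = randn (100, 1);             % sample from a standard normal

% Composite null: normality with parameters estimated from the data.
[h, pval] = adtest (x);

% Simple null: fully specified N(0,1) passed as a cell array.
h2 = adtest (x, "Distribution", {"norm", 0, 1});
```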

     See also: kstest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 49
Anderson-Darling goodness-of-fit hypothesis test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
anova1


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3022
 -- statistics: P = anova1 (X)
 -- statistics: P = anova1 (X, GROUP)
 -- statistics: P = anova1 (X, GROUP, DISPLAYOPT)
 -- statistics: P = anova1 (X, GROUP, DISPLAYOPT, VARTYPE)
 -- statistics: [P, ATAB] = anova1 (X, ...)
 -- statistics: [P, ATAB, STATS] = anova1 (X, ...)

     Perform a one-way analysis of variance (ANOVA) for comparing the
     means of two or more groups of data under the null hypothesis that
     the groups are drawn from distributions with the same mean.  For
     planned contrasts and/or diagnostic plots, use anovan instead.

     anova1 can take up to three input arguments:

        * X contains the data and it can either be a vector or matrix.
          If X is a matrix, then each column is treated as a separate
          group.  If X is a vector, then the GROUP argument is
          mandatory.

        * GROUP contains the names for each group.  If X is a matrix,
          then GROUP can either be a cell array of strings or a
          character array, with one row per column of X.  If you want to
          omit this argument, enter an empty array ([]).  If X is a
          vector, then GROUP must be a vector of the same length, or a
          string array or cell array of strings with one row for each
          element of X.  X values corresponding to the same value of
          GROUP are placed in the same group.

        * DISPLAYOPT is an optional parameter for displaying the groups
          contained in the data in a boxplot.  If omitted, it is 'on' by
          default.  If group names are defined in GROUP, these are used
          to identify the groups in the boxplot.  Use 'off' to omit
          displaying this figure.

        * VARTYPE is an optional parameter used to indicate whether
          the groups can be assumed to come from populations with equal
          variance.  When vartype is "equal" the variances are assumed
          to be equal (this is the default).  When vartype is "unequal"
          the population variances are not assumed to be equal and
          Welch's ANOVA test is used instead.

     anova1 can return up to three output arguments:

        * P is the p-value of the null hypothesis that all group means
          are equal.

        * ATAB is a cell array containing the results in a standard
          ANOVA table.

        * STATS is a structure containing statistics useful for
          performing a multiple comparison of means with the MULTCOMPARE
          function.

     If anova1 is called without any output arguments, then it prints
     the results in a one-way ANOVA table to the standard output.  It is
     also printed when DISPLAYOPT is 'on'.

     Examples:

          x = meshgrid (1:6);
          x = x + normrnd (0, 1, 6, 6);
          anova1 (x, [], 'off');
          [p, atab] = anova1(x);

          x = ones (50, 4) .* [-2, 0, 1, 5];
          x = x + normrnd (0, 2, 50, 4);
          groups = {"A", "B", "C", "D"};
          anova1 (x, groups);

     See also: anova2, anovan, multcompare.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a one-way analysis of variance (ANOVA) for comparing the means
of two...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
anova2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2499
 -- statistics: P = anova2 (X, REPS)
 -- statistics: P = anova2 (X, REPS, DISPLAYOPT)
 -- statistics: P = anova2 (X, REPS, DISPLAYOPT, MODEL)
 -- statistics: [P, ATAB] = anova2 (...)
 -- statistics: [P, ATAB, STATS] = anova2 (...)

     Perform a two-way factorial (crossed) or nested analysis of
     variance (ANOVA) for balanced designs.  For unbalanced factorial
     designs, diagnostic plots and/or planned contrasts, use anovan
     instead.

     anova2 requires two input arguments with an optional third and
     fourth:

        * X contains the data and it must be a matrix of at least two
          columns and two rows.

        * REPS is the number of replicates for each combination of
          factor groups.

        * DISPLAYOPT is an optional parameter for displaying the ANOVA
          table, when it is 'on' (default) and suppressing the display
          when it is 'off'.

        * MODEL is an optional parameter to specify the model type as
          either:

             * "interaction" or "full" (default): compute both main
               effects and their interaction

             * "linear": compute both main effects without an
               interaction.  When REPS > 1 the test is suitable for a
               balanced randomized block design.  When REPS == 1, the
               test becomes a One-way Repeated Measures (RM)-ANOVA with
               Greenhouse-Geisser correction to the column factor
               degrees of freedom to make the test robust to violations
                of sphericity.

             * "nested": treat the row factor as nested within columns.
               Note that the row factor is considered a random factor in
               the calculation of the statistics.

     anova2 returns up to three output arguments:

        * P is the p-value of the null hypothesis that all group means
          are equal.

        * ATAB is a cell array containing the results in a standard
          ANOVA table.

        * STATS is a structure containing statistics useful for
          performing a multiple comparison of means with the MULTCOMPARE
          function.

     If anova2 is called without any output arguments, then it prints
     the results in a two-way ANOVA table to the standard output as if
     DISPLAYOPT is 'on'.

     Examples:

          load popcorn;
          anova2 (popcorn, 3);

          [p, anovatab, stats] = anova2 (popcorn, 3, "off");
          disp (p);

     See also: anova1, anovan, multcompare.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Performs two-way factorial (crossed) or a nested analysis of variance
(ANOVA)...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
anovan


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8129
 -- statistics: P = anovan (Y, GROUP)
 -- statistics: P = anovan (Y, GROUP, NAME, VALUE)
 -- statistics: [P, ATAB] = anovan (...)
 -- statistics: [P, ATAB, STATS] = anovan (...)
 -- statistics: [P, ATAB, STATS, TERMS] = anovan (...)

     Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA)
     to evaluate the effect of one or more categorical or continuous
     predictors (i.e.  independent variables) on a continuous outcome
     (i.e.  dependent variable).  The algorithms used make anovan
     suitable for balanced or unbalanced factorial (crossed) designs.
     By default, anovan treats all factors as fixed.  Examples of
     function usage can be found by entering the command 'demo anovan'.

     Data is a single vector Y with groups specified by a corresponding
     matrix or cell array of group labels GROUP, where each column of
     GROUP has the same number of rows as Y.  For example, if Y =
     [1.1;1.2]; GROUP = [1,2,1; 1,5,2]; then observation 1.1 was
     measured under conditions 1,2,1 and observation 1.2 was measured
     under conditions 1,5,2.  If the GROUP provided is empty, then the
     linear model is fit with just the intercept (no predictors).

     anovan can take a number of optional parameters as name-value
     pairs.

     '[...] = anovan (Y, GROUP, "continuous", CONTINUOUS)'

        * CONTINUOUS is a vector of indices indicating which of the
          columns (i.e.  factors) in GROUP should be treated as
          continuous predictors rather than as categorical predictors.
          The relationship between continuous predictors and the outcome
          should be linear.

     '[...] = anovan (Y, GROUP, "random", RANDOM)'

        * RANDOM is a vector of indices indicating which of the columns
          (i.e.  factors) in GROUP should be treated as random effects
          rather than fixed effects.  Octave anovan provides only basic
          support for random effects.  Specifically, since all
          F-statistics in anovan are calculated using the mean-squared
          error (MSE), any interaction terms containing a random effect
          are dropped from the model term definitions and their
          associated variance is pooled with the residual, unexplained
          variance making up the MSE. In effect, the model then fitted
          equates to a linear mixed model with random intercept(s).
          Variable names for random factors are appended with a '
          symbol.

     '[...] = anovan (Y, GROUP, "model", MODELTYPE)'

        * MODELTYPE can be specified as one of the following:

             * "linear" (default) : compute N main effects with no
               interactions.

             * "interaction" : compute N effects and N*(N-1) two-factor
               interactions

             * "full" : compute the N main effects and interactions at
               all levels

             * a scalar integer : representing the maximum interaction
               order

             * a matrix of term definitions : each row is a term and
               each column is a factor

               -- Example:
               A two-way ANOVA with interaction would be: [1 0; 0 1; 1 1]

     '[...] = anovan (Y, GROUP, "sstype", SSTYPE)'

        * SSTYPE can be specified as one of the following:

             * 1 : Type I sequential sums-of-squares.

             * 2 or "h" : Type II partially sequential (or hierarchical)
               sums-of-squares

             * 3 (default) : Type III partial, constrained or marginal
               sums-of-squares

     '[...] = anovan (Y, GROUP, "varnames", VARNAMES)'

        * VARNAMES must be a cell array of strings with each element
          containing a factor name for each column of GROUP. By default
          (if not passed as an optional argument), VARNAMES are
          "X1","X2","X3", etc.

     '[...] = anovan (Y, GROUP, "alpha", ALPHA)'

        * ALPHA must be a scalar value between 0 and 1 requesting
          100*(1-alpha)% confidence bounds for the regression
          coefficients returned in STATS.coeffs (default 0.05 for 95%
          confidence)

     '[...] = anovan (Y, GROUP, "display", DISPOPT)'

        * DISPOPT can be either "on" (default) or "off" and controls the
          display of the model formula, table of model parameters and
          the ANOVA table.  The F-statistic and p-values are formatted
          in APA-style.  To avoid p-hacking, the table of model
          parameters is only displayed if we set planned contrasts (see
          below).

     '[...] = anovan (Y, GROUP, "contrasts", CONTRASTS)'

        * CONTRASTS can be specified as one of the following:

             * A string corresponding to one of the built-in contrasts
               listed below:

                  * "simple" or "anova" (default): Simple (ANOVA)
                    contrasts.  (The first level appearing in the GROUP
                    column is the reference level)

                  * "poly": Polynomial contrasts for trend analysis.

                  * "helmert": Helmert contrasts.

                  * "effect": Deviation effect coding.  (The first level
                    appearing in the GROUP column is omitted).

                  * "treatment": Treatment contrast (or dummy) coding.
                    (The first level appearing in the GROUP column is
                    the reference level).  These contrasts are not
                    compatible with SSTYPE 3.

             * A matrix containing a custom contrast coding scheme (i.e.
               the generalized inverse of contrast weights).  Rows in
               the contrast matrices correspond to factor levels in the
               order that they first appear in the GROUP column.  The
               matrix must contain one column fewer than the number of
               factor levels.

          If the anovan model contains more than one factor and a
          built-in contrast coding scheme was specified, then those
          contrasts are applied to all factors.  To specify different
          contrasts for different factors in the model, CONTRASTS should
          be a cell array with the same number of cells as there are
          columns in GROUP.  Each cell should define contrasts for the
          respective column in GROUP by one of the methods described
          above.  If cells are left empty, then the default contrasts
          are applied.  Contrasts for cells corresponding to continuous
          factors are ignored.

     '[...] = anovan (Y, GROUP, "weights", WEIGHTS)'

        * WEIGHTS is an optional vector of weights to be used when
          fitting the linear model.  Weighted least squares (WLS) is
          used with weights (that is, minimizing sum (weights .*
          residuals.^2)); otherwise ordinary least squares (OLS) is used
          (default is empty for OLS).

     anovan can return up to four output arguments:

     P = anovan (...) returns a vector of p-values, one for each term.

     [P, ATAB] = anovan (...) returns a cell array containing the ANOVA
     table.

     [P, ATAB, STATS] = anovan (...) returns a structure containing
     additional statistics, including degrees of freedom and effect
     sizes for each term in the linear model, the design matrix, the
     variance-covariance matrix, (weighted) model residuals, and the
     mean squared error.  The columns of STATS.coeffs (from
     left-to-right) report the model coefficients, standard errors,
     lower and upper 100*(1-alpha)% confidence interval bounds,
     t-statistics, and p-values relating to the contrasts.  The number
     appended to each term name in STATS.coeffnames corresponds to the
     column number in the relevant contrast matrix for that factor.  The
     STATS structure can be used as input for multcompare.  The STATS
     structure is also recognised by the functions bootcoeff and bootemm
     from the statistics-bootstrap package.

     [P, ATAB, STATS, TERMS] = anovan (...) returns the model term
     definitions.
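     A short usage sketch (the data and factor names below are invented
     for illustration):

```octave
pkg load statistics;                  % assumption: package installed
y  = [9; 8; 10; 12; 6; 7; 5; 6];      % continuous outcome
g1 = {"a"; "a"; "b"; "b"; "a"; "a"; "b"; "b"};   % categorical factor 1
g2 = [1; 2; 1; 2; 1; 2; 1; 2];                   % categorical factor 2

% Fit both main effects plus their interaction, suppressing the display.
[p, atab, stats] = anovan (y, {g1, g2}, "model", "interaction", ...
                           "varnames", {"G1", "G2"}, "display", "off");
```

With an "interaction" model and two factors, p contains three p-values:
one per main effect and one for the G1:G2 interaction.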

     See also: anova1, anova2, multcompare, fitlm.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to
evaluat...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 13
bartlett_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2045
 -- statistics: H = bartlett_test (X)
 -- statistics: H = bartlett_test (X, GROUP)
 -- statistics: H = bartlett_test (X, ALPHA)
 -- statistics: H = bartlett_test (X, GROUP, ALPHA)
 -- statistics: [H, PVAL] = bartlett_test (...)
 -- statistics: [H, PVAL, CHISQ] = bartlett_test (...)
 -- statistics: [H, PVAL, CHISQ, DF] = bartlett_test (...)

     Perform a Bartlett test for the homogeneity of variances.

     Under the null hypothesis of equal variances, the test statistic
     CHISQ approximately follows a chi-square distribution with DF
     degrees of freedom.

     The p-value (1 minus the CDF of this distribution at CHISQ) is
     returned in PVAL.  H = 1 if the null hypothesis is rejected at the
     significance level of ALPHA.  Otherwise H = 0.

     Input Arguments:

        * X contains the data and it can either be a vector or matrix.
          If X is a matrix, then each column is treated as a separate
          group.  If X is a vector, then the GROUP argument is
          mandatory.  NaN values are omitted.

        * GROUP contains the names for each group.  If X is a vector,
          then GROUP must be a vector of the same length, or a string
          array or cell array of strings with one row for each element
          of X.  X values corresponding to the same value of GROUP are
          placed in the same group.  If X is a matrix, then GROUP can
          either be a cell array of strings or a character array, with
          one row per column of X in the same way it is used in 'anova1'
          function.  If X is a matrix, then GROUP can be omitted either
          by entering an empty array ([]) or by passing only ALPHA as a
          second argument (if required to change its default value).

        * ALPHA is the statistical significance value at which the null
          hypothesis is rejected.  Its default value is 0.05 and it can
          be passed either as a second argument (when GROUP is omitted)
          or as a third argument.
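     A minimal usage sketch (random data for illustration):

```octave
pkg load statistics;                       % assumption: package installed
% Three groups as matrix columns with deliberately different spreads.
x = [randn(30,1), 2*randn(30,1), 0.5*randn(30,1)];
[h, pval, chisq, df] = bartlett_test (x);
```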

     See also: levene_test, vartest2, vartestn.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Perform a Bartlett test for the homogeneity of variances.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
barttest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1085
 -- statistics: NDIM = barttest (X)
 -- statistics: NDIM = barttest (X, ALPHA)
 -- statistics: [NDIM, PVAL] = barttest (X, ALPHA)
 -- statistics: [NDIM, PVAL, CHISQ] = barttest (X, ALPHA)

     Bartlett's test of sphericity for correlation.

     It compares an observed correlation matrix to the identity matrix
     in order to check if there is a certain redundancy between the
     variables that we can summarize with a small number of factors.  A
     statistically significant test shows that the variables (columns)
     in X are correlated, thus it makes sense to perform some
     dimensionality reduction of the data in X.

     'NDIM = barttest (X, ALPHA)' returns the number of dimensions
     necessary to explain the nonrandom variation in the data matrix X
     at the ALPHA significance level.  ALPHA is an optional input
     argument and, when not provided, it is 0.05 by default.

     '[NDIM, PVAL, CHISQ] = barttest (...)' also returns the
     significance values PVAL for the hypothesis test for each dimension
     as well as the associated chi^2 values in CHISQ.
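     A minimal usage sketch (random, deliberately redundant columns for
     illustration):

```octave
pkg load statistics;                       % assumption: package installed
x = randn (100, 2);
x = [x, x(:,1) + 0.1*randn(100,1)];        % third column nearly redundant
[ndim, pval, chisq] = barttest (x, 0.05);
```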


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Bartlett's test of sphericity for correlation.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
betastat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1034
 -- statistics: [M, V] = betastat (A, B)

     Compute mean and variance of the beta distribution.

     Arguments
     ---------

        * A is the first parameter of the beta distribution.  A must be
          positive

        * B is the second parameter of the beta distribution.  B must be
          positive
     A and B must be of common size or one of them must be scalar.

     Return values
     -------------

        * M is the mean of the beta distribution

        * V is the variance of the beta distribution

     Examples
     --------

          a = 1:6;
          b = 1:0.2:2;
          [m, v] = betastat (a, b)

          [m, v] = betastat (a, 1.5)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 51
Compute mean and variance of the beta distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
binostat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1098
 -- statistics: [M, V] = binostat (N, P)

     Compute mean and variance of the binomial distribution.

     Arguments
     ---------

        * N is the first parameter of the binomial distribution.  The
          elements of N must be natural numbers

        * P is the second parameter of the binomial distribution.  The
          elements of P must be probabilities
     N and P must be of common size or one of them must be scalar.

     Return values
     -------------

        * M is the mean of the binomial distribution

        * V is the variance of the binomial distribution

     Examples
     --------

          n = 1:6;
          p = 0:0.2:1;
          [m, v] = binostat (n, p)

          [m, v] = binostat (n, 0.5)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 55
Compute mean and variance of the binomial distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
binotest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1170
 -- statistics: [H, PVAL, CI] = binotest (POS, N, P0)
 -- statistics: [H, PVAL, CI] = binotest (POS, N, P0, NAME, VALUE)

     Test for probability P of a binomial sample.

     Perform a test of the null hypothesis P == P0 for a sample of size
     N with POS positive results.

     Name-Value pair arguments can be used to set various options.
     "alpha" can be used to specify the significance level of the test
     (the default value is 0.05).  The option "tail", can be used to
     select the desired alternative hypotheses.  If the value is "both"
     (default) the null is tested against the two-sided alternative 'P
     != P0'.  The value of PVAL is determined by adding the
     probabilities of all events as or less likely than the observed
     number POS of positive events.  If the value of "tail" is "right"
     the one-sided alternative 'P > P0' is considered.  Similarly for
     "left", the one-sided alternative 'P < P0' is considered.

     If H is 0 the null hypothesis is accepted, if it is 1 the null
     hypothesis is rejected.  The p-value of the test is returned in
     PVAL.  A 100(1-alpha)% confidence interval is returned in CI.
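     A minimal usage sketch:

```octave
pkg load statistics;                       % assumption: package installed
% 52 positive results out of 100 trials, null hypothesis P0 = 0.5.
[h, pval, ci] = binotest (52, 100, 0.5);

% One-sided alternative P > 0.5 at a 1% significance level.
[h2, pval2] = binotest (52, 100, 0.5, "tail", "right", "alpha", 0.01);
```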


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 43
Test for probability P of a binomial sample



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
boxplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8334
 -- statistics: S = boxplot (DATA)
 -- statistics: S = boxplot (DATA, GROUP)
 -- statistics: S = boxplot (DATA, NOTCHED, SYMBOL, ORIENTATION,
          WHISKER, ...)
 -- statistics: S = boxplot (DATA, GROUP, NOTCHED, SYMBOL, ORIENTATION,
          WHISKER, ...)
 -- statistics: S = boxplot (DATA, OPTIONS)
 -- statistics: S = boxplot (DATA, GROUP, OPTIONS, ...)
 -- statistics: [..., H] = boxplot (DATA, ...)

     Produce a box plot.

     A box plot is a graphical display that simultaneously describes
     several important features of a data set, such as center, spread,
     departure from symmetry, and identification of observations that
     lie unusually far from the bulk of the data.

     Input arguments (case-insensitive) recognized by boxplot are:

        * DATA is a matrix with one column for each data set, or a cell
          vector with one cell for each data set.  Each cell must
          contain a numerical row or column vector (NaN and NA are
          ignored) and not a nested vector of cells.

        * NOTCHED = 1 produces a notched-box plot.  Notches represent a
          robust estimate of the uncertainty about the median.

          NOTCHED = 0 (default) produces a rectangular box plot.

          NOTCHED within the interval (0,1) produces a notch of the
          specified depth.  Notched values outside (0,1) are amusing if
          not exactly impractical.

        * SYMBOL sets the symbol for the outlier values.  The default
          symbol for points that lie outside 3 times the interquartile
          range is 'o'; the default symbol for points between 1.5 and 3
          times the interquartile range is '+'.
          Alternative SYMBOL settings:

          SYMBOL = '.': points between 1.5 and 3 times the IQR are
          marked with '.'  and points outside 3 times IQR with 'o'.

          SYMBOL = ['x','*']: points between 1.5 and 3 times the IQR are
          marked with 'x' and points outside 3 times IQR with '*'.

        * ORIENTATION = 0 plots the boxes horizontally.
          ORIENTATION = 1 plots the boxes vertically (default).
          Alternatively, orientation can be passed as a string, e.g.,
          'vertical' or 'horizontal'.

        * WHISKER defines the length of the whiskers as a function of
          the IQR (default = 1.5).  If WHISKER = 0 then 'boxplot'
          displays all data values outside the box using the plotting
          symbol for points that lie outside 3 times the IQR.

        * GROUP may be passed as an optional argument only in the second
          position after DATA.  GROUP contains a numerical vector
          defining separate categories, each plotted in a different box,
          for each set of DATA values that share the same GROUP value or
          values.  With the formalism (DATA, GROUP), both must be
          vectors of the same length.

        * OPTIONS are additional paired arguments passed with the
          formalism (Name, Value) that provide extra functionality as
          listed below.  OPTIONS can be passed at any order after the
          initial arguments and are case-insensitive.

          'Notch'        'on'           Notched by 0.25 of the boxes width.
                         'off'          Produces a straight box.
                         scalar         Proportional width of the notch.
                                        
          'Symbol'       '.'            Defines only the symbol for outliers
                                        between 1.5 and 3 IQR.
                         ['x','*']      2nd character defines outliers > 3 IQR
                                        
          'Orientation'  'vertical'     Default value, can also be defined with
                                        numerical 1.
                         'horizontal'   Can also be defined with numerical 0.
                                        
          'Whisker'      scalar         Multiplier of IQR (default is 1.5).
                                        
          'OutlierTags'  'on' or 1      Plot the vector index of the outlier
                                        value next to its point.
                         'off' or 0     No tags are plotted (default value).
                                        
          'Sample_IDs'   'cell'         A cell vector with one cell for each data
                                        set containing a nested cell vector with
                                        each sample's ID (should be a string).
                                        If this option is passed, then all
                                        outliers are tagged with their respective
                                        sample's ID string instead of their
                                        vector's index.
                                        
          'BoxWidth'     'proportional' Create boxes with their width
                                        proportional to the number of samples in
                                        their respective dataset (default value).
                         'fixed'        Make all boxes with equal width.
                                        
          'Widths'       scalar         Scaling factor for box widths (default
                                        value is 0.4).
                                        
          'CapWidths'    scalar         Scaling factor for whisker cap widths
                                        (default value is 1, which results in
                                        a cap half-length of 'Widths'/8)
                                        
          'BoxStyle'     'outline'      Draw boxes as outlines (default value).
                         'filled'       Fill boxes with a color (outlines are
                                        still plotted).
                                        
          'Positions'    vector         Numerical vector that defines the
                                        position of each data set.  It must have
                                        the same length as the number of groups.
                                        This vector merely defines the points
                                        along the group axis, which by default
                                        is [1:number of groups].
                                        
          'Labels'       cell           A cell vector of strings containing the
                                        names of each group.  By default each
                                        group is labeled numerically according to
                                        its order in the data set.
                                        
          'Colors'       character      If just one character or 1x3 vector of
                         string or      RGB values, specify the fill color of all
                         Nx3            boxes when BoxStyle = 'filled'.  If a
                         numerical      character string or Nx3 matrix is
                         matrix         entered, box #1's fill color corresponds
                                        to the first character or first matrix
                                        row, and the next boxes' fill colors
                                        correspond to the next characters or
                                        rows.  If the char string or Nx3 array is
                                        exhausted the color selection wraps
                                        around.

     Supplemental arguments not described above (...) are concatenated
     and passed to the plot() function.
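
     For instance, the following call (with illustrative values)
     combines several of the options described above:

          boxplot (randn (30, 2), "Notch", "on", "Symbol", ['x','*'], ...
                   "OutlierTags", "on", "BoxStyle", "filled", ...
                   "Labels", {"A", "B"});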

     The returned matrix S has one column for each data set as follows:

     1       Minimum
     2       1st quartile
     3       2nd quartile (median)
     4       3rd quartile
     5       Maximum
     6       Lower confidence limit for median
     7       Upper confidence limit for median

     The returned structure H contains handles to the plot elements,
     allowing customization of the visualization using set/get
     functions.

     Example

          boxplot ({randn(10,1)*5+140, randn(13,1)*8+135});
          set (gca (), "xtick", [1 2], "xticklabel", {"girls", "boys"});
          title ("Grade 3 heights");
          axis ([0,3]);


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 19
Produce a box plot.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
canoncorr


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 534
 -- statistics: [A, B, R, U, V, STATS] = canoncorr (X, Y)

     Canonical correlation analysis

     Given X (size K*M) and Y (size K*N), returns the projection
     matrices of canonical coefficients A (size M*D, where D is the
     smallest of M and N) and B (size N*D); the canonical correlations
     R (1*D, arranged in decreasing order); the canonical variables U,
     V (both K*D, with orthonormal columns); and STATS, a structure
     containing results from Bartlett's chi-square and Rao's F tests of
     significance.
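
     For example, with random data (values are illustrative only):

          X = randn (50, 3);
          Y = randn (50, 2);
          [A, B, R] = canoncorr (X, Y);   # R holds min (3, 2) = 2 correlations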

     See also: princomp.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 30
Canonical correlation analysis



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3
cdf


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3710
 -- statistics: RETVAL = cdf (NAME, X, ...)

     Return the cumulative distribution function of NAME, evaluated at
     X.

     This is a wrapper around the various NAMEcdf and NAME_cdf
     functions.  See the help of the individual functions for the
     meaning of the arguments after X.  Supported functions and the
     corresponding numbers of additional arguments are:

       function               alternative                      args
     -------------------------------------------------------------------------
       "bbs"                  "Birnbaum-Saunders"              3
       "beta"                                                  2
       "bino"                 "binomial"                       2
       "bino"                 "binomial"                       3 include
                                                               'upper'
       "burr"                 "Burr"                           2
       "cauchy"               "Cauchy"                         2
       "chi2"                 "chi-square"                     1
       "copula"               "Copula family"                  2
       "copula"               "Copula family"                  3 include nu
       "discrete"             "univariate discrete"            2
       "empirical"            "univariate empirical"           1
       "exp"                  "exponential"                    1
       "f"                                                     2
       "gam"                  "gamma"                          2
       "geo"                  "geometric"                      1
       "gev"                  "generalized extreme value"      3
       "gp"                   "generalized Pareto"             3
       "hyge"                 "hypergeometric"                 3
       "jsu"                  "Johnson SU"                     2
       "ks"                   "Kolmogorov-Smirnov"             1
       "laplace"              "Laplace"                        0
       "logistic"                                              0
       "logn"                 "lognormal"                      0 defaults:
                                                               mu=0,sigma=1
       "logn"                 "lognormal"                      2
       "mvn"                  "multivariate normal"            2
       "mvn"                  "multivariate normal"            3 include
                                                               low limit a
       "mvt"                  "multivariate Student"           2
       "mvt"                  "multivariate Student"           3 include
                                                               low limit a
       "naka"                 "Nakagami"                       2
       "nbin"                 "negative binomial"              2
       "norm"                 "normal"                         0 defaults:
                                                               mu=0,sigma=1
       "norm"                 "normal"                         2
       "poiss"                "Poisson"                        1
       "rayl"                 "Rayleigh"                       1
       "stdnormal"            "standard normal"                0
       "t"                                                     1
       "tri"                  "triangular"                     3
       "unid"                 "uniform discrete"               1
       "unif"                 "uniform"                        0 defaults:
                                                               a=0,b=1
       "unif"                 "uniform"                        2
       "wbl"                  "Weibull"                        2
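
     For example, the following calls dispatch to the corresponding
     distribution functions:

          cdf ("norm", 0, 0, 1)     # same as normcdf (0, 0, 1), i.e. 0.5
          cdf ("exp", 1, 2)         # same as expcdf (1, 2)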

     See also: pdf, rnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 64
Return the cumulative distribution function of NAME, evaluated at X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
cdfcalc


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 800
 -- statistics: [YCDF, XCDF, N, EMSG, EID] = cdfcalc (X)

     Calculate an empirical cumulative distribution function.

     '[YCDF, XCDF] = cdfcalc (X)' calculates an empirical cumulative
     distribution function (CDF) of the observations in the data sample
     vector X.  X may be a row or column vector, and represents a random
     sample of observations from some underlying distribution.  On
     return XCDF is the set of X values at which the CDF increases.  At
     XCDF(i), the function increases from YCDF(i) to YCDF(i+1).

     '[YCDF, XCDF, N] = cdfcalc (X)' also returns N, the sample size.

     '[YCDF, XCDF, N, EMSG, EID] = cdfcalc (X)' also returns an error
     message and error id if X is not a vector or if it contains no
     values other than NaN.
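
     For example (a small illustrative sample):

          x = [2 1 3 2];
          [ycdf, xcdf, n] = cdfcalc (x)
          # xcdf lists the distinct values 1, 2, 3 and n = 4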

     See also: cdfplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 56
Calculate an empirical cumulative distribution function.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
cdfplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1223
 -- statistics: HCDF = cdfplot (X)
 -- statistics: [HCDF, STATS] = cdfplot (X)

     Display an empirical cumulative distribution function.

     'HCDF = cdfplot (X)' plots an empirical cumulative distribution
     function (CDF) of the observations in the data sample vector X.  X
     may be a row or column vector, and represents a random sample of
     observations from some underlying distribution.

     'cdfplot' plots F(x), the empirical (or sample) CDF versus the
     observations in X.  The empirical CDF, F(x), is defined as follows:

     F(x) = (Number of observations <= x) / (Total number of
     observations)

     for all values in the sample vector X.  NaNs are ignored.  HCDF is
     the handle of the empirical CDF curve (a graphics 'line' object).

     '[HCDF, STATS] = cdfplot (X)' also returns a structure with the
     following fields as a statistical summary.

          STATS.min              minimum value of X
          STATS.max              maximum value of X
          STATS.mean             sample mean of X
          STATS.median           sample median (50th percentile) of X
          STATS.std              sample standard deviation of X
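
     A minimal example (random data, so output values will vary):

          x = randn (100, 1);
          [hcdf, stats] = cdfplot (x);
          stats.median    # sample median of x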

     See also: qqplot, cdfcalc.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Display an empirical cumulative distribution function.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
chi2gof


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4752
 -- statistics: H = chi2gof (X)
 -- statistics: [H, P] = chi2gof (X)
 -- statistics: [H, P, STATS] = chi2gof (X)
 -- statistics: [...] = chi2gof (X, NAME1, VALUE1, ...)

     Chi-square goodness-of-fit test.

     'chi2gof' performs a chi-square goodness-of-fit test for discrete
     or continuous distributions.  The test is performed by grouping the
     data into bins, calculating the observed and expected counts for
     those bins, and computing the chi-square test statistic
     SUM((O-E).^2./E), where O is the observed counts and E is the
     expected counts.  This test statistic has an approximate chi-square
     distribution when the counts are sufficiently large.

     Bins in either tail with an expected count less than 5 are pooled
     with neighboring bins until the count in each extreme bin is at
     least 5.  If bins remain in the interior with counts less than 5,
     CHI2GOF displays a warning.  In that case, you should use fewer
     bins, or provide bin centers or bin edges, to increase the
     expected counts in all bins.

     'H = chi2gof (X)' performs a chi-square goodness-of-fit test that
     the data in the vector X are a random sample from a normal
     distribution with mean and variance estimated from X.  The result
     is H = 0 if the null hypothesis (that X is a random sample from a
     normal distribution) cannot be rejected at the 5% significance
     level, or H = 1 if the null hypothesis can be rejected at the 5%
     level.  'chi2gof' uses by default 10 bins ('nbins'), and compares
     the test statistic to a chi-square distribution with 'nbins' - 3
     degrees of freedom, to take into account that two parameters were
     estimated.

     '[H, P] = chi2gof (X)' also returns the p-value P, which is the
     probability of observing the given result, or one more extreme, by
     chance if the null hypothesis is true.  If there are not enough
     degrees of freedom to carry out the test, P is NaN.

     '[H, P, STATS] = chi2gof (X)' also returns a STATS structure with
     the following fields:

          "chi2stat"             Chi-square statistic
          "df"                   Degrees of freedom
          "binedges"             Vector of bin edges after pooling
          "O"                    Observed count in each bin
          "E"                    Expected count in each bin

     '[...] = chi2gof (X, NAME1, VALUE1, ...)' specifies optional
     argument name/value pairs chosen from the following list.

          Name           Value
     ---------------------------------------------------------------------------
          "nbins"        The number of bins to use.  Default is 10.
          "binctrs"      A vector of bin centers.
          "binedges"     A vector of bin edges.
          "cdf"          A fully specified cumulative distribution function
                         or a function handle.  Alternatively, a cell array
                         may be provided whose first element is a function
                         handle and whose later elements are parameter
                         values, one per cell.  The function must take X
                         values as its first argument, and other parameters
                         as later arguments.
          "expected"     A vector with one element per bin specifying the
                         expected counts for each bin.
          "nparams"      The number of estimated parameters; used to adjust
                         the degrees of freedom to be 'nbins' - 1 -
                         'nparams', where 'nbins' is the number of bins.
          "emin"         The minimum allowed expected value for a bin; any
                         bin in either tail having an expected value less
                         than this amount is pooled with a neighboring bin.
                         Use the value 0 to prevent pooling.  Default is 5.
          "frequency"    A vector of the same length as X containing the
                         frequency of the corresponding X values.
          "alpha"        An ALPHA value such that the hypothesis is rejected
                         if P < ALPHA. Default is ALPHA = 0.05.

     You should specify either "cdf" or "expected" parameters, but not
     both.  If your "cdf" input contains extra parameters, these are
     accounted for automatically and there is no need to specify
     "nparams".  If your "expected" input depends on estimated
     parameters, you should use the "nparams" parameter to ensure that
     the degrees of freedom for the test is correct.
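
     For example, to test a random sample for normality (illustrative
     calls; in the second one the fully specified "cdf" makes
     "nparams" unnecessary):

          x = randn (100, 1);
          [h, p] = chi2gof (x)
          [h, p] = chi2gof (x, "cdf", @(x) normcdf (x, 0, 1))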


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 32
Chi-square goodness-of-fit test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
chi2stat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 840
 -- statistics: [M, V] = chi2stat (N)

     Compute mean and variance of the chi-square distribution.

     Arguments
     ---------

        * N is the parameter of the chi-square distribution.  The
          elements of N must be positive

     Return values
     -------------

        * M is the mean of the chi-square distribution

        * V is the variance of the chi-square distribution

     Example
     -------

          n = 1:6;
          [m, v] = chi2stat (n)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Compute mean and variance of the chi-square distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
chi2test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4415
 -- statistics: PVAL = chi2test (X)
 -- statistics: [PVAL, CHISQ] = chi2test (X)
 -- statistics: [PVAL, CHISQ, DF] = chi2test (X)
 -- statistics: [PVAL, CHISQ, DF, E] = chi2test (X)
 -- statistics: [...] = chi2test (X, NAME, VALUE)

     Perform a chi-squared test (for independence or homogeneity).

     For 2-way contingency tables, 'chi2test' performs a chi-squared
     test for independence or homogeneity, according to the sampling
     scheme and related question.  Independence means that the two
     variables forming the 2-way table are not associated, hence one
     cannot be predicted from the other.  Homogeneity refers to the
     concept of similarity: the samples all come from the same
     distribution.

     Both tests are computationally identical and will produce the same
     result.  Nevertheless, they answer different questions.
     Consider two variables, one for gender and another for smoking.  To
     test independence (whether gender and smoking is associated), we
     would randomly sample from the general population and break them
     down into categories in the table.  To test homogeneity (whether
     men and women share the same smoking habits), we would sample
     individuals from within each gender, and then measure their smoking
     habits (e.g.  smokers vs non-smokers).

     When 'chi2test' is called without any output arguments, it will
     print the result in the terminal including p-value, chi^2
     statistic, and degrees of freedom.  Otherwise it can return the
     following output arguments:

          PVAL    the p-value of the relevant test.
          CHISQ   the chi^2 statistic of the relevant test.
          DF      the degrees of freedom of the relevant test.
          E       the EXPECTED values of the original contingency table.

     Unlike MATLAB, in GNU Octave 'chi2test' also supports 3-way tables,
     which involve three categorical variables (each in a different
     dimension of X).  In its simplest form, '[...] = chi2test (X)'
     will test for mutual independence among the three variables.
     Alternatively, when called in the form '[...] = chi2test (X, NAME,
     VALUE)', it can perform the following tests:

     NAME           VALUE   Description
     --------------------------------------------------------------------------
     "mutual"       []      Mutual independence.  All variables are
                            independent from each other, (A, B, C). Value
                            must be an empty matrix.
     "joint"        scalar  Joint independence.  Two variables are jointly
                            independent of the third, (AB, C). The scalar
                            value corresponds to the dimension of the
                            independent variable (i.e.  3 for C).
     "marginal"     scalar  Marginal independence.  Two variables are
                            independent if you ignore the third, (A, C). The
                            scalar value corresponds to the dimension of the
                            variable to be ignored (i.e.  2 for B).
     "conditional"  scalar  Conditional independence.  Two variables are
                            independent given the third, (AC, BC). The
                            scalar value corresponds to the dimension of the
                            variable that forms the conditional dependence
                            (i.e.  3 for C).
     "homogeneous"  []      Homogeneous associations.  Conditional (partial)
                            odds-ratios do not depend on the value of the
                            third variable, (AB, AC, BC). Value must be an
                            empty matrix.

     When testing for homogeneous associations in 3-way tables, the
     iterative proportional fitting procedure is used.  For small
     samples it is better to use the Cochran-Mantel-Haenszel Test.
     K-way tables for k > 3 are supported only for testing mutual
     independence.  Similar to 2-way tables, no optional parameters are
     required for k > 3 multi-way tables.

     'chi2test' produces a warning if any cell of a 2x2 table has an
     expected frequency less than 5 or if more than 20% of the cells in
     larger 2-way tables have expected frequencies less than 5 or any
     cell with expected frequency less than 1.  In such cases, use
     'fishertest'.
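
     A minimal example with a hypothetical 2x2 table (gender by
     smoking status):

          x = [50 30; 40 60];
          [pval, chisq, df] = chi2test (x)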

     See also: fishertest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 61
Perform a chi-squared test (for independence or homogeneity).



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
cholcov


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1389
 -- statistics: T = cholcov (SIGMA)
 -- statistics: [T, P] = cholcov (SIGMA)
 -- statistics: [...] = cholcov (SIGMA, FLAG)

     Cholesky-like decomposition for covariance matrix.

     'T = cholcov (SIGMA)' computes matrix T such that SIGMA = T' T.
     SIGMA must be square, symmetric, and positive semi-definite.

     If SIGMA is positive definite, then T is the square, upper
     triangular Cholesky factor.  If SIGMA is not positive definite, T
     is computed with an eigenvalue decomposition of SIGMA, but in this
     case T is not necessarily triangular or square.  Any eigenvectors
     whose corresponding eigenvalue is close to zero (within a
     tolerance) are omitted.  If any remaining eigenvalues are negative,
     T is empty.

     The tolerance is calculated as '10 * eps (max (abs (diag
     (sigma))))'.

     '[T, P] = cholcov (SIGMA)' returns in P the number of negative
     eigenvalues of SIGMA.  If P > 0, then T is empty, whereas if P = 0,
     SIGMA is positive semi-definite.

     If SIGMA is not square and symmetric, P is NaN and T is empty.

     '[T, P] = cholcov (SIGMA, 0)' returns P = 0 if SIGMA is positive
     definite, in which case T is the Cholesky factor.  If SIGMA is not
     positive definite, P is a positive integer and T is empty.

     '[...] = cholcov (SIGMA, 1)' is equivalent to '[...] = cholcov
     (SIGMA)'.
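
     For example (a small positive definite matrix):

          sigma = [2 1; 1 2];
          t = cholcov (sigma);
          norm (t' * t - sigma)    # close to zero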

     See also: chol.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 50
Cholesky-like decomposition for covariance matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
cl_multinom


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2952
 -- statistics: CL = cl_multinom (X, N, B, CALCULATION_TYPE)

     Confidence level of multinomial proportions.  Returns the
     confidence level of multinomial parameters estimated as p = x /
     sum(x) with a predefined confidence interval B.  A finite
     population is also considered.

     This function calculates the level of confidence at which the
     samples represent the true distribution, given a predefined
     tolerance (confidence interval).  This is the inverse of the
     typical exercise, in which we want the confidence interval given
     the confidence level (and the estimated parameters of the
     underlying distribution).  But once we accept (say, at elections)
     a standard predefined maximal acceptable error rate (e.g.
     B=0.02) in the estimation, and we just want to know how sure we
     can be that the measured proportions are the same as in the
     entire population (i.e.  the expected value and mean of the
     samples are roughly the same), we need to use this function.

     Arguments
     ---------

        * X : int vector : sample frequencies of the bins.
        * N : int : Population size that was sampled by X.  If N <
          sum(X), an infinite population is assumed.
        * B : real, vector : confidence interval.

          If a vector, it should be the size of X, containing the
          confidence interval for each cell.  If a scalar, each cell
          will have the same value of B unless it is zero or -1.  If
          the value is 0, B = 0.02 is assumed, which is the standard
          choice at elections; otherwise it is calculated in a way
          that an alteration of one sample in a cell defines the
          confidence interval.
        * CALCULATION_TYPE : string : (optional) method to use, one
          of "bromaghin" (default; do not change it unless you have a
          good reason to do so), "cochran", or "agresti_cull".  The
          latter is not exactly the solution in the reference given
          below, but an adjustment of the solutions above.

     Returns
     -------

     Confidence level.

     Example
     -------

     'CL = cl_multinom ([27;43;19;11], 10000, 0.05)' returns a
     confidence level of 0.69.

     References
     ----------

     "bromaghin" calculation type (default) is based on the article
     Jeffrey F. Bromaghin, "Sample Size Determination for Interval
     Estimation of Multinomial Probabilities", The American Statistician
     vol 47, 1993, pp 203-206.

     "cochran" calculation type is based on the article Robert T.
     Tortora, "A Note on Sample Size Estimation for Multinomial
     Populations", The American Statistician, Vol 32, 1978, pp 100-102.

     "agresti_cull" calculation type is based on the article A. Agresti
     and B.A. Coull, "Approximate is better than \"exact\" for interval
     estimation of binomial proportions", The American Statistician,
     Vol. 52, 1998, pp 119-126, in which the result of Quesenberry,
     Hurst, and Goodman is combined.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 41
Confidence level of multinomial proportions.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
cluster


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1199
 -- statistics: T = cluster (Z, "Cutoff", C)
 -- statistics: T = cluster (Z, "Cutoff", C, "Depth", D)
 -- statistics: T = cluster (Z, "Cutoff", C, "Criterion", CRITERION)
 -- statistics: T = cluster (Z, "MaxClust", N)

     Define clusters from an agglomerative hierarchical cluster tree.

     Given a hierarchical cluster tree Z generated by the 'linkage'
     function, 'cluster' defines clusters, using a threshold value C to
     identify new clusters ('Cutoff') or according to a maximum number
     of desired clusters N ('MaxClust').

     CRITERION is used to choose the criterion for defining clusters,
     which can be either "inconsistent" (default) or "distance".  When
     using "inconsistent", 'cluster' compares the threshold value C to
     the inconsistency coefficient of each link; when using "distance",
     'cluster' compares the threshold value C to the height of each
     link.  D is the depth used to evaluate the inconsistency
     coefficient, its default value is 2.

     'cluster' uses "distance" as the criterion for defining new
     clusters when the 'MaxClust' method is used.
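
     A minimal example (random data; 'linkage' and 'pdist' are assumed
     to be available from this package):

          X = [randn(10,2); randn(10,2)+5];
          Z = linkage (pdist (X));
          T = cluster (Z, "MaxClust", 2);   # cluster index for each row of X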

     See also: clusterdata, dendrogram, inconsistent, kmeans, linkage,
     pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 64
Define clusters from an agglomerative hierarchical cluster tree.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
clusterdata


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 834
 -- statistics: T = clusterdata (X, CUTOFF)
 -- statistics: T = clusterdata (X, NAME, VALUE)

     Wrapper function for 'linkage' and 'cluster'.

     If CUTOFF is used, then 'clusterdata' calls 'linkage' and 'cluster'
     with default values, using CUTOFF as a threshold value for
     'cluster'.  If CUTOFF is an integer greater than or equal to 2,
     then CUTOFF is interpreted as the maximum number of clusters
     desired and the "MaxClust" option is used for 'cluster'.

     If CUTOFF is not used, then 'clusterdata' expects a list of paired
     arguments.  In that case you must specify either the "Cutoff" or
     "MaxClust" option for 'cluster'.  The method and metric used by
     'linkage' are defined through the "linkage" and "distance"
     arguments.
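
     For example, using an integer CUTOFF to request two clusters:

          X = [randn(10,2); randn(10,2)+5];
          T = clusterdata (X, 2);    # CUTOFF >= 2, so "MaxClust" is used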

     See also: cluster, dendrogram, inconsistent, kmeans, linkage,
     pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 45
Wrapper function for 'linkage' and 'cluster'.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
cmdscale


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2698
 -- statistics: Y = cmdscale (D)
 -- statistics: [Y, E] = cmdscale (D)

     Classical multidimensional scaling of a matrix.

     Takes an N by N distance (or difference, similarity, or
     dissimilarity) matrix D.  Returns Y, a matrix of N points with
     coordinates in P dimensional space which approximate those
     distances (or differences, similarities, or dissimilarities).  Also
     returns the eigenvalues E of 'B = -1/2 * J * (D.^2) * J', where 'J
     = eye(N) - ones(N,N)/N'.  P, the number of columns of Y, is equal
     to the number of positive real eigenvalues of B.

     D can be a full or sparse matrix or a vector of length 'N*(N-1)/2'
     containing the upper triangular elements (like the output of the
     'pdist' function).  It must be symmetric with non-negative entries
     whose values are further restricted by the type of matrix being
     represented:

     * If D is either a distance, dissimilarity, or difference matrix,
     then it must have zero entries along the main diagonal.  In this
     case the points Y equal or approximate the distances given by D.

     * If D is a similarity matrix, the elements must all be less than
     or equal to one, with ones along the main diagonal.  In this
     case the points Y equal or approximate the distances given by 'D =
     sqrt(ones(N,N)-D)'.

     D is a Euclidean matrix if and only if B is positive semi-definite.
     When this is the case, then Y is an exact representation of the
     distances given in D.  If D is non-Euclidean, Y only approximates
     the distance given in D.  The approximation used by 'cmdscale'
     minimizes the statistical loss function known as STRAIN.

     The returned Y is an N by P matrix showing possible coordinates of
     the points in P dimensional space ('P < N').  The columns
     correspond to the positive eigenvalues of B in descending order.  A
     translation, rotation, or reflection of the coordinates given by Y
     will satisfy the same distance matrix up to the limits of machine
     precision.

     For any 'K <= P', if the largest K positive eigenvalues of B are
     significantly greater in absolute magnitude than its other
     eigenvalues, the first K columns of Y provide a K-dimensional
     reduction of Y which approximates the distances given by D.  The
     optional return E can be used to consider various values of K, or
     to evaluate the accuracy of specific dimension reductions (e.g., 'K
     = 2').
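
     As a sketch, four collinear points can be recovered (up to
     translation and reflection) from their distance matrix:

```octave
% Distance matrix of points at 1, 2, 3, 4 on a line
D = squareform (pdist ([1; 2; 3; 4]));
[Y, E] = cmdscale (D);
% Only one eigenvalue of B is positive, so Y has a single column
```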

     Reference: Ingwer Borg and Patrick J.F. Groenen (2005), Modern
     Multidimensional Scaling, Second Edition, Springer, ISBN:
     978-0-387-25150-9 (Print) 978-0-387-28981-6 (Online)

     See also: pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 47
Classical multidimensional scaling of a matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
combnk


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 89
 -- statistics: C = combnk (DATA, K)

     Return all combinations of K elements in DATA.
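
     For example:

```octave
C = combnk (1:4, 2);   % the 6 pairs that can be chosen from 1, 2, 3, 4
```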


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Return all combinations of K elements in DATA.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 14
confusionchart


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1861
 -- statistics: confusionchart (TRUELABELS, PREDICTEDLABELS)
 -- statistics: confusionchart (M)
 -- statistics: confusionchart (M, CLASSLABELS)
 -- statistics: confusionchart (PARENT, ...)
 -- statistics: confusionchart (..., PROP, VAL, ...)
 -- statistics: CM = confusionchart (...)

     Display a chart of a confusion matrix.

     The two vectors of values TRUELABELS and PREDICTEDLABELS, which are
     used to compute the confusion matrix, must be defined with the same
     format as the inputs of 'confusionmat'.  Otherwise a confusion
     matrix M as computed by 'confusionmat' can be given.

     CLASSLABELS is an array of labels, i.e., the list of the class
     names.

     If the first argument is a handle to a 'figure' or to a 'uipanel',
     then the confusion matrix chart is displayed inside that object.

     Optional property/value pairs are passed directly to the underlying
     objects, e.g.  "xlabel", "ylabel", "title", "fontname", "fontsize"
     etc.
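
     A minimal sketch, with made-up labels:

```octave
trueLabels      = [1 1 2 2 3];
predictedLabels = [1 2 2 2 3];
cm = confusionchart (trueLabels, predictedLabels, "title", "Example");
```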

     The optional return value CM is a 'ConfusionMatrixChart' object.
     Specific properties of a 'ConfusionMatrixChart' object are:
        * "DiagonalColor" The color of the patches on the diagonal,
          default is [0.0, 0.4471, 0.7412].

        * "OffDiagonalColor" The color of the patches off the diagonal,
          default is [0.851, 0.3255, 0.098].

        * "GridVisible" Available values: on (default), off.

        * "Normalization" Available values: absolute (default),
          column-normalized, row-normalized, total-normalized.

        * "ColumnSummary" Available values: off (default), absolute,
          column-normalized, total-normalized.

        * "RowSummary" Available values: off (default), absolute,
          row-normalized, total-normalized.

     Run 'demo confusionchart' to see some examples.

     See also: confusionmat, sortClasses.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Display a chart of a confusion matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 12
confusionmat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1437
 -- statistics: C = confusionmat (GROUP, GROUPHAT)
 -- statistics: C = confusionmat (GROUP, GROUPHAT, "Order", GROUPORDER)
 -- statistics: [C, ORDER] = confusionmat (GROUP, GROUPHAT)

     Compute a confusion matrix for classification problems.

     'confusionmat' returns the confusion matrix C for the group of
     actual values GROUP and the group of predicted values GROUPHAT.
     The row indices of the confusion matrix represent actual values,
     while the column indices represent predicted values.  The indices
     are the same for both actual and predicted values, so the confusion
     matrix is a square matrix.  Each element of the matrix represents
     the number of matches between a given actual value (row index) and
     a given predicted value (column index), hence correct matches lie
     on the main diagonal of the matrix.  The order of the rows and
     columns is returned in ORDER.

     GROUP and GROUPHAT must have the same number of observations and
     the same data type.  Valid data types are numeric vectors, logical
     vectors, character arrays, string arrays (not implemented yet),
     and cell arrays of strings.

     The order of the rows and columns can be specified by setting the
     GROUPORDER variable.  The data type of GROUPORDER must be the same
     as that of GROUP and GROUPHAT.
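
     For example, with made-up numeric labels:

```octave
group    = [1 1 2 2 3];    % actual values
grouphat = [1 2 2 2 3];    % predicted values
[C, order] = confusionmat (group, grouphat);
% C(1,2) counts the actual-1 observation that was predicted as 2
```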

     MATLAB compatibility: Octave does not yet implement string arrays
     or categorical vectors.

     See also: crosstab.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Compute a confusion matrix for classification problems.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
cophenet


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1106
 -- statistics: [C, D] = cophenet (Z, Y)

     Compute the cophenetic correlation coefficient.

     The cophenetic correlation coefficient C of a hierarchical cluster
     tree Z is the linear correlation coefficient between the cophenetic
     distances D and the Euclidean distances Y.

     It is a measure of the similarity between the distances of the
     leaves, as seen in the tree, and the distances of the original data
     points, which were used to build the tree.  The closer the
     coefficient is to 1, the more accurately the tree represents the
     distances between the original data points.

     Z is a hierarchical cluster tree, as the output of 'linkage'.  Y
     is a vector of Euclidean distances, as the output of 'pdist'.

     The optional output D is a vector of cophenetic distances, in the
     same lower triangular format as Y.  The cophenetic distance between
     two data points is the height of the lowest common node of the
     tree.
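
     A typical call chains 'pdist', 'linkage' and 'cophenet' (random
     data used here as a stand-in):

```octave
X = rand (10, 2);
Y = pdist (X);                 % pairwise Euclidean distances
Z = linkage (Y, "average");    % hierarchical cluster tree
[c, d] = cophenet (Z, Y);      % c close to 1 means a faithful tree
```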

     See also: cluster, dendrogram, inconsistent, linkage, pdist,
     squareform.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 47
Compute the cophenetic correlation coefficient.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
cor_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1399
 -- statistics: T = cor_test (X, Y)
 -- statistics: T = cor_test (X, Y, ALT, METHOD)

     Test whether two samples X and Y come from uncorrelated
     populations.

     The optional argument string ALT describes the alternative
     hypothesis, and can be "!=" or "<>" (nonzero), ">" (greater than
     0), or "<" (less than 0).  The default is the two-sided case.

     The optional argument string METHOD specifies which correlation
     coefficient to use for testing.  If METHOD is "pearson" (default),
     the (usual) Pearson's product moment correlation coefficient is
     used.  In this case, the data should come from a bivariate normal
     distribution.  Otherwise, the other two methods offer nonparametric
     alternatives.  If METHOD is "kendall", then Kendall's rank
     correlation tau is used.  If METHOD is "spearman", then Spearman's
     rank correlation rho is used.  Only the first character is
     necessary.

     The output T is a structure with the following elements:

        * pval The p-value of the test.

        * stat The value of the test statistic.

        * dist The distribution of the test statistic.

        * params The parameters of the null distribution of the test
          statistic.

        * alternative The alternative hypothesis.

        * method The method used for testing.

     If no output argument is given, the p-value is displayed.
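
     For example, testing for a positive correlation (the data here is
     synthetic):

```octave
x = randn (30, 1);
y = x + 0.5 * randn (30, 1);
t = cor_test (x, y, ">", "pearson");
disp (t.pval)    % a small p-value suggests positive correlation
```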


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 68
Test whether two samples X and Y come from uncorrelated populations.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
crosstab


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 562
 -- statistics: T = crosstab (X1, X2)
 -- statistics: T = crosstab (X1, ..., XN)
 -- statistics: [T, CHISQ, P, LABELS] = crosstab (...)

     Create a cross-tabulation (contingency table) T from data vectors.

     The inputs X1, X2, ...  XN must be vectors of equal length with a
     data type of numeric, logical, char, or string (cell array).

     As additional return values 'crosstab' returns the chi-square
     statistics CHISQ, its p-value P and a cell array LABELS, containing
     the labels of each input argument.
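
     For example:

```octave
x1 = [1 1 2 2 2];
x2 = [1 2 1 2 2];
[t, chisq, p, labels] = crosstab (x1, x2);
% t is the 2-by-2 contingency table of x1 versus x2
```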

     See also: grp2idx, tabulate.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 66
Create a cross-tabulation (contingency table) T from data vectors.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
crossval


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2155
 -- statistics: RESULTS = crossval (F, X, Y)
 -- statistics: RESULTS = crossval (F, X, Y, NAME, VALUE)

     Perform cross validation on given data.

     F should be a function that takes 4 inputs XTRAIN, YTRAIN, XTEST,
     YTEST, fits a model based on XTRAIN, YTRAIN, applies the fitted
     model to XTEST, and returns a goodness of fit measure based on
     comparing the predicted and actual YTEST.  'crossval' returns an
     array containing the values returned by F for every
     cross-validation fold or resampling applied to the given data.

     X should be an N by M matrix of predictor values.

     Y should be an N by 1 vector of predicand values.

     Optional arguments may include name-value pairs as follows:

     "KFold"
          Divide set into K equal-size subsets, using each one
          successively for validation.

     "HoldOut"
          Divide set into two subsets, training and validation.  If the
          value K is a fraction, that is the fraction of values put in
          the validation subset (by default K=0.1); if it is a positive
          integer, that is the number of values in the validation
          subset.

     "LeaveOut"
          Leave-one-out partition (each element is placed in its own
          subset).  The value is ignored.

     "Partition"
          The value should be a CVPARTITION object.

     "Given"
          The value should be an N by 1 vector specifying in which
          partition to put each element.

     "stratify"
          The value should be an N by 1 vector containing class
          designations for the elements, in which case the "KFold" and
          "HoldOut" partitionings attempt to ensure each partition
          represents the classes proportionately.

     "mcreps"
          The value should be a positive integer specifying the number
          of times to resample based on different partitionings.
          Currently only works with the partition type "HoldOut".

     Only one of "KFold", "HoldOut", "LeaveOut", "Given", "Partition"
     should be specified.  If none is specified, the default is "KFold"
     with K = 10.
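
     As a sketch, 5-fold cross validation of a least-squares fit,
     scoring each fold by mean squared error on the held-out data:

```octave
X = randn (100, 2);
y = X * [1; 2] + 0.1 * randn (100, 1);
f = @(xtr, ytr, xte, yte) mean ((yte - xte * (xtr \ ytr)) .^ 2);
results = crossval (f, X, y, "KFold", 5);   % one MSE value per fold
```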

     See also: cvpartition.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 39
Perform cross validation on given data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
datasample


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1164
 -- statistics: Y = datasample (DATA, K)
 -- statistics: Y = datasample (DATA, K, DIM)
 -- statistics: Y = datasample (..., NAME, VALUE)
 -- statistics: [Y, IDCS] = datasample (...)

     Randomly sample data.

     Return K observations randomly sampled from DATA.  DATA can be a
     vector or a matrix of any data.  When DATA is a matrix or an
     n-dimensional array, the samples are the subarrays of dimension
     n - 1, taken along dimension DIM.  The default value for DIM is 1,
     that is, the rows when sampling a matrix.

     Output Y is the returned sampled data.  Optional output IDCS is the
     vector of the indices to build Y from DATA.

     Additional options are set through pairs of parameter name and
     value.  Available parameters are:

     'Replace'
          a logical value that can be 'true' (default) or 'false': when
          set to 'true', 'datasample' returns data sampled with
          replacement.

     'Weights'
          a vector of positive numbers that sets the probability of each
          element.  It must have the same size as DATA along dimension
          DIM.
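
     For example:

```octave
y = datasample (1:10, 3);                    % 3 values, with replacement
y = datasample (1:10, 3, "Replace", false);  % 3 distinct values
```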

     See also: rand, randi, randperm, randsample.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 21
Randomly sample data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
dcov


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 351
 -- statistics: [DCOR, DCOV, DVARX, DVARY] = dcov (X, Y, INDEX=1)

     Distance correlation, covariance and correlation statistics.

     It returns the distance correlation (DCOR) and the distance
     covariance (DCOV) between X and Y, the distance variance of X
     (DVARX), and the distance variance of Y (DVARY).
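
     For example, distance correlation detects the nonlinear dependence
     below, which ordinary linear correlation would largely miss:

```octave
x = randn (100, 1);
y = x .^ 2;    % depends on x, yet nearly uncorrelated with it
[dcor, dc, dvarx, dvary] = dcov (x, y);
```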

     See also: corr, cov.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 60
Distance correlation, covariance and correlation statistics.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
dendrogram


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2292
 -- statistics: dendrogram (TREE)
 -- statistics: dendrogram (TREE, P)
 -- statistics: dendrogram (TREE, PROP, VAL)
 -- statistics: dendrogram (TREE, P, PROP, VAL )
 -- statistics: H = dendrogram (...)
 -- statistics: [H, T, PERM] = dendrogram (...)

     Plot a dendrogram of a hierarchical binary cluster tree.

     Given TREE, a hierarchical binary cluster tree as the output of
     'linkage', plot a dendrogram of the tree.  The number of leaves
     shown by the dendrogram plot is limited to P.  The default value
     for P is 30.  Set P to 0 to plot all leaves.

     The optional outputs are H, T and PERM:
        * H is a handle to the lines of the plot.

        * T is the vector with the numbers assigned to each leaf.  Each
          element of T is a leaf of TREE and its value is the number
          shown in the plot.  When the dendrogram plot is collapsed,
          that is when the number of shown leaves P is smaller than the
          total number of leaves, a single leaf of the plot can
          represent more than one leaf of TREE: in that case multiple
          elements of T share the same value, that is the same leaf of
          the plot.  When the dendrogram plot is not collapsed, each
          leaf of the plot is the leaf of TREE with the same number.

        * PERM is the vector of the leaf numbers in the order they
          appear in the plot.

     Additional input properties can be specified by pairs of properties
     and values.  Known properties are:
        * "Reorder" Reorder the leaves of the dendrogram plot using a
          numerical vector of size N, the number of leaves.  When P is
          smaller than N, the reordering cannot break the P groups of
          leaves.

        * "Orientation" Change the orientation of the plot.  Available
          values: top (default), bottom, left, right.

        * "CheckCrossing" Check if the lines of a reordered dendrogram
          cross each other.  Available values: true (default), false.

        * "ColorThreshold" Not implemented.

        * "Labels" Use a char, string or cellstr array of size N to set
          the label for each leaf; the label is displayed only for nodes
          with just one leaf.
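
     A minimal sketch:

```octave
X = rand (20, 2);
Z = linkage (pdist (X), "average");
[h, t, perm] = dendrogram (Z);   % 20 leaves < 30, so all are shown
```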

     See also: cluster, clusterdata, cophenet, inconsistent, linkage,
     pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 56
Plot a dendrogram of a hierarchical binary cluster tree.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
ecdf


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2331
 -- statistics: [F, X] = ecdf (Y)
 -- statistics: [F, X, FLO, FUP] = ecdf (Y)
 -- statistics: ecdf (...)
 -- statistics: ecdf (AX, ...)
 -- statistics: [...] = ecdf (Y, NAME, VALUE, ...)
 -- statistics: [...] = ecdf (AX, Y, NAME, VALUE, ...)

     Empirical (Kaplan-Meier) cumulative distribution function.

     '[F, X] = ecdf (Y)' calculates the Kaplan-Meier estimate of the
     cumulative distribution function (cdf), also known as the empirical
     cdf.  Y is a vector of data values.  F is a vector of values of the
     empirical cdf evaluated at X.

     '[F, X, FLO, FUP] = ecdf (Y)' also returns lower and upper
     confidence bounds for the cdf.  These bounds are calculated using
     Greenwood's formula, and are not simultaneous confidence bounds.

     'ecdf (...)' without output arguments produces a plot of the
     empirical cdf.

     'ecdf (AX, ...)' plots into existing axes AX.

     '[...] = ecdf (Y, NAME, VALUE, ...)' specifies additional parameter
     name/value pairs chosen from the following:

     NAME           VALUE
     --------------------------------------------------------------------------
     "censoring"    A boolean vector of the same size as Y that is 1 for
                    observations that are right-censored and 0 for
                    observations that are observed exactly.  Default is all
                    observations observed exactly.
                    
     "frequency"    A vector of the same size as Y containing non-negative
                    integer counts.  The jth element of this vector gives
                    the number of times the jth element of Y was observed.
                    Default is 1 observation per Y element.
                    
     "alpha"        A value ALPHA between 0 and 1 specifying the
                    significance level.  Default is 0.05 for 5%
                    significance.
                    
     "function"     The type of function returned as the F output argument,
                    chosen from "cdf" (the default), "survivor", or
                    "cumulative hazard".
                    
     "bounds"       Either "on" to include bounds or "off" (the default) to
                    omit them.  Used only for plotting.
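
     A minimal sketch, using a synthetic exponential sample:

```octave
y = exprnd (1, 100, 1);    % made-up data
[f, x] = ecdf (y);         % empirical cdf evaluated at the sorted data
stairs (x, f);
```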

     Type 'demo ecdf' to see examples of usage.

     See also: cdfplot, ecdfhist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 58
Empirical (Kaplan-Meier) cumulative distribution function.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 12
evalclusters


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4518
 -- statistics: EVA = evalclusters (X, CLUST, CRITERION)
 -- statistics: EVA = evalclusters (..., Name, Value)

     Create a clustering evaluation object to find the optimal number of
     clusters.

     'evalclusters' creates a clustering evaluation object to evaluate
     the optimal number of clusters for data X, using criterion
     CRITERION.  The input data X is a matrix with 'n' observations of
     'p' variables.  The evaluation criterion CRITERION is one of the
     following:
     'CalinskiHarabasz'
          to create a 'CalinskiHarabaszEvaluation' object.

     'DaviesBouldin'
          to create a 'DaviesBouldinEvaluation' object.

     'gap'
          to create a 'GapEvaluation' object.

     'silhouette'
          to create a 'SilhouetteEvaluation' object.

     The clustering algorithm CLUST is one of the following:
     'kmeans'
          to cluster the data using 'kmeans' with 'EmptyAction' set to
          'singleton' and 'Replicates' set to 5.

     'linkage'
          to cluster the data using 'clusterdata' with 'linkage' set to
          'Ward'.

     'gmdistribution'
          to cluster the data using 'fitgmdist' with 'SharedCov' set to
          'true' and 'Replicates' set to 5.

     If the CRITERION is 'CalinskiHarabasz', 'DaviesBouldin', or
     'silhouette', CLUST can also be a function handle to a function of
     the form 'c = clust(x, k)', where X is the input data, K the number
     of clusters to evaluate and C the clustering result.  The
     clustering result can be either an array of size 'n' with 'k'
     different integer values, or a matrix of size 'n' by 'k' with a
     likelihood value assigned to each one of the 'n' observations for
     each one of the K clusters.  In the latter case, each observation
     is assigned to the cluster with the highest value.  If the CRITERION
     is 'CalinskiHarabasz', 'DaviesBouldin', or 'silhouette', CLUST can
     also be a matrix of size 'n' by 'k', where 'k' is the number of
     proposed clustering solutions, so that each column of CLUST is a
     clustering solution.

     In addition to the obligatory X, CLUST and CRITERION inputs there
     are a number of optional arguments, specified as pairs of 'Name'
     and 'Value' options.  The known 'Name' arguments are:
     'KList'
          a vector of positive integer numbers, that is the cluster
          sizes to evaluate.  This option is necessary, unless CLUST is
          a matrix of proposed clustering solutions.

     'Distance'
          a distance metric as accepted by the chosen CLUST.  It can be
          the name of the distance metric as a string or a function
          handle.  When CRITERION is 'silhouette', it can be a vector as
          created by function 'pdist'.  Valid distance metric strings
          are: 'sqEuclidean' (default), 'Euclidean', 'cityblock',
          'cosine', 'correlation', 'Hamming', 'Jaccard'.  Only used by
          'silhouette' and 'gap' evaluation.

     'ClusterPriors'
          the prior probabilities of each cluster, which can be either
          'empirical' (default), or 'equal'.  When 'empirical' the
          silhouette value is the average of the silhouette values of
          all points; when 'equal' the silhouette value is the average
          of the average silhouette value of each cluster.  Only used by
          'silhouette' evaluation.

     'B'
          the number of reference datasets generated from the reference
          distribution.  Only used by 'gap' evaluation.

     'ReferenceDistribution'
          the reference distribution used to create the reference data.
          It can be 'PCA' (default) for a distribution based on the
          principal components of X, or 'uniform' for a uniform
          distribution based on the range of the observed data.  'PCA'
          is currently not implemented.  Only used by 'gap' evaluation.

     'SearchMethod'
          the method for selecting the optimal value with a 'gap'
          evaluation.  It can be either 'globalMaxSE' (default) for
          selecting the smallest number of clusters which is inside the
          standard error of the maximum gap value, or 'firstMaxSE' for
          selecting the first number of clusters which is inside the
          standard error of the following cluster number.  Only used by
          'gap' evaluation.

     Output EVA is a clustering evaluation object.
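
     For example, evaluating 1 to 5 k-means clusters on synthetic data
     (assuming the evaluation object exposes an 'OptimalK' property):

```octave
X = [randn(50, 2); randn(50, 2) + 4];
eva = evalclusters (X, "kmeans", "CalinskiHarabasz", "KList", 1:5);
% eva.OptimalK suggests the best number of clusters
```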

     See also: CalinskiHarabaszEvaluation, DaviesBouldinEvaluation,
     GapEvaluation, SilhouetteEvaluation.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 77
Create a clustering evaluation object to find the optimal number of
clusters.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
evfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1488
 -- statistics: PARAMHAT = evfit (DATA)
 -- statistics: [PARAMHAT, PARAMCI] = evfit (DATA)
 -- statistics: [PARAMHAT, PARAMCI] = evfit (DATA, ALPHA)
 -- statistics: [...] = evfit (DATA, ALPHA, CENSOR)
 -- statistics: [...] = evfit (DATA, ALPHA, CENSOR, FREQ)
 -- statistics: [...] = evfit (DATA, ALPHA, CENSOR, FREQ, OPTIONS)

     Estimate parameters and confidence intervals for extreme value
     data.

     'PARAMHAT = evfit (DATA)' returns maximum likelihood estimates of
     the parameters of the type 1 extreme value distribution (also known
     as the Gumbel distribution) given in DATA.  PARAMHAT(1) is the
     location parameter, mu, and PARAMHAT(2) is the scale parameter,
     sigma.

     '[PARAMHAT, PARAMCI] = evfit (DATA)' returns the 95% confidence
     intervals for the parameter estimates.

     '[...] = evfit (DATA, ALPHA)' returns 100(1-ALPHA) percent
     confidence intervals for the parameter estimates.

     '[...] = evfit (DATA, ALPHA, CENSOR)' accepts a boolean vector of
     the same size as DATA with 1 for observations that are
     right-censored and 0 for observations that are observed exactly.

     '[...] = evfit (DATA, ALPHA, CENSOR, FREQ)' accepts a frequency
     vector of the same size as DATA.  FREQ typically contains integer
     frequencies for the corresponding elements in DATA, but may contain
     any non-integer non-negative values.

     '[...] = evfit (..., OPTIONS)'
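
     For example, recovering known parameters from a synthetic sample:

```octave
data = evrnd (3, 2, 1000, 1);          % mu = 3, sigma = 2
[paramhat, paramci] = evfit (data);    % estimates should be near [3, 2]
```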

     See also: evcdf, evinv, evpdf, evrnd, evlike, evstat.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 68
Estimate parameters and confidence intervals for extreme value data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
evlike


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1411
 -- statistics: NLOGL = evlike (PARAMS, DATA)
 -- statistics: [NLOGL, AVAR] = evlike (PARAMS, DATA)
 -- statistics: [...] = evlike (PARAMS, DATA, CENSOR)
 -- statistics: [...] = evlike (PARAMS, DATA, CENSOR, FREQ)

     Negative log-likelihood for the extreme value distribution.

     Input Arguments
     ---------------

        * PARAMS is a two-element vector containing the mu and sigma
          parameters of the type 1 extreme value distribution (also
          known as the Gumbel distribution) at which the negative of the
          log-likelihood is evaluated.
        * DATA is the vector of given values.
        * CENSOR is a boolean vector of the same size as DATA with 1 for
          observations that are right-censored and 0 for observations
          that are observed exactly.
        * FREQ is a vector of the same size as DATA that contains
          integer frequencies for the corresponding elements in DATA,
          but may contain any non-integer non-negative values.  Pass in
          [] for CENSOR to use its default value.

     Return Values
     -------------

        * NLOGL is the negative log-likelihood.
        * AVAR is the inverse of the Fisher information matrix.  The
          Fisher information matrix is the second derivative of the
          negative log likelihood with respect to the parameter value.
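
     For example:

```octave
data = evrnd (0, 1, 500, 1);
% negative log-likelihood evaluated at the true parameters
[nlogl, avar] = evlike ([0, 1], data);
```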

     See also: evcdf, evinv, evpdf, evrnd, evfit, evstat.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 59
Negative log-likelihood for the extreme value distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
evstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 825
 -- statistics: [M, V] = evstat (MU, SIGMA)

     Mean and variance of the extreme value distribution.

     '[M, V] = evstat (MU, SIGMA)' returns the mean and variance of the
     type 1 extreme value distribution with location parameter MU and
     scale parameter SIGMA.  The sizes of M and V are the common size of
     MU and SIGMA.  A scalar input functions as a constant matrix of the
     same size as the other inputs.

     The type 1 extreme value distribution is also known as the Gumbel
     distribution.  The version used here is suitable for modeling
     minima; the mirror image of this distribution can be used to model
     maxima by negating X.  If Y has a Weibull distribution, then 'X =
     log (Y)' has the type 1 extreme value distribution.
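
     For example:

```octave
[m, v] = evstat (0, 1)
% m is about -0.5772 (minus the Euler-Mascheroni constant)
% v is pi^2 / 6, about 1.6449
```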

     See also: evcdf, evinv, evpdf, evrnd, evfit, evlike.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 52
Mean and variance of the extreme value distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
expfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2212
 -- statistics: MU = expfit (S)
 -- statistics: [MU, CI] = expfit (S)
 -- statistics: [MU, CI] = expfit (S, ALPHA)
 -- statistics: ... = expfit (S, ALPHA, C)
 -- statistics: ... = expfit (S, ALPHA, C, F)

     Estimate the mean of the exponential probability distribution
     function from which sample data S has been taken.  S is expected to
     be a non-negative vector.  If S is an array, the mean will be
     computed for each column of S.  If any elements of S are NaN, that
     vector's mean will be returned as NaN.

     If the optional output variable CI is requested, 'expfit' will also
     return the confidence interval bounds for the estimate as a two
     element column vector.  If S is an array, each column of data will
     have a confidence interval returned as a two row array.

     The optional scalar input ALPHA can be used to define the (1-ALPHA)
     confidence interval to be applied to all estimates as a value
     between 0 and 1.  The default is 0.05, resulting in a 0.95 or 95%
     CI.  Any invalid value for ALPHA will return NaN for both CI
     bounds.

     The optional input C is a logical or numeric array of zeros and
     ones the same size as S, used to right-censor individual elements
     of S.  A value of 1 indicates the data should be censored from the
     mean estimation.  Any nonzero values in C are treated as a 1.

     The optional input F is a numeric array the same size as S, used to
     specify occurrence frequencies for the elements in S.  Values of F
     need not be integers.  Any NaN elements in the frequency array will
     produce a NaN output for MU.

     Options can be skipped by using [] to revert to the default.
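
     For example:

```octave
s = exprnd (5, 1000, 1);    % synthetic sample with mean 5
[mu, ci] = expfit (s);      % mu should be close to 5
```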

     Matlab incompatibility: Matlab's 'expfit' produces unpredictable
     results for some cases with higher dimensions (specifically 1 x m x
     n x ...  arrays).  Octave's implementation allows for n-D arrays,
     consistently performing calculations on individual column vectors.
     Additionally, C and F can be used with arrays of any size, whereas
     Matlab only allows their use when S is a vector.

     See also: expcdf, expinv, explike, exppdf, exprnd, expstat.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Estimate the mean of the exponential probability distribution function
from w...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
explike


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 734
 -- statistics: [NLOGL, AVAR] = explike (PARAM, DATA)

     Compute the negative log-likelihood of data under the exponential
     distribution with given parameter value.

     Arguments
     ---------

        * PARAM is a scalar containing the scale parameter of the
          exponential distribution (equal to its mean).
        * DATA is the vector of given values.

     Return values
     -------------

        * NLOGL is the negative log-likelihood.
        * AVAR is the inverse of the Fisher information matrix.  (The
          Fisher information matrix is the second derivative of the
          negative log likelihood with respect to the parameter value.)
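
     A minimal sketch, assuming the statistics package is loaded:

```octave
% Hedged sketch: evaluate the exponential negative log-likelihood.
pkg load statistics

data = [1 2 3 4];
mu = mean (data);                 % scale parameter, equal to the mean

[nlogl, avar] = explike (mu, data);
% Analytically, nlogl = n*log(mu) + sum(data)/mu = 4*log(2.5) + 4.
```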

     See also: expcdf, expfit, expinv, exppdf, exprnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute the negative log-likelihood of data under the exponential
distributio...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
expstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 907
 -- statistics: [M, V] = expstat (L)

     Compute mean and variance of the exponential distribution.

     Arguments
     ---------

        * L is the parameter of the exponential distribution.  The
          elements of L must be positive

     Return values
     -------------

        * M is the mean of the exponential distribution

        * V is the variance of the exponential distribution

     Example
     -------

          l = 1:6;
          [m, v] = expstat (l)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.

     See also: expcdf, expfit, expinv, explike, exppdf, exprnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 58
Compute mean and variance of the exponential distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
ff2n


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 460
 -- statistics: DFF2 = ff2n (N)

     Two-level full factorial design

     'DFF2 = ff2n (N)' gives factor settings dFF2 for a two-level full
     factorial design with n factors.  DFF2 is m-by-n, where m is the
     number of treatments in the full-factorial design.  Each row of
     DFF2 corresponds to a single treatment.  Each column contains the
     settings for a single factor, with values of 0 and 1 for the two
     levels.
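
     For example, assuming the statistics package is loaded:

```octave
% ff2n enumerates all 2^N treatments of an N-factor two-level design.
pkg load statistics

dff2 = ff2n (3);
% dff2 is 2^3-by-3: each row is one treatment, with factor levels 0/1.
disp (size (dff2))
```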

     See also: fullfact.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 31
Two-level full factorial design



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
fillmissing


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8094
 -- statistics: B = fillmissing (A, 'constant', V)
 -- statistics: B = fillmissing (A, METHOD)
 -- statistics: B = fillmissing (A, MOVE_METHOD, WINDOW_SIZE)
 -- statistics: B = fillmissing (A, FILL_FUNCTION, WINDOW_SIZE)
 -- statistics: B = fillmissing (..., DIM)
 -- statistics: B = fillmissing (..., PROPERTYNAME, PROPERTYVALUE)
 -- statistics: [B, IDX] = fillmissing (...)

     Replace missing entries of array A either with values in V or as
     determined by other specified methods.  'missing' values are
     determined by the data type of A as identified by the function
     ismissing, currently defined as:

        * 'NaN': 'single', 'double'

        * '' '' (white space): 'char'

        * '{''}' (white space in cell): string cells.

     A can be a numeric scalar or array, a character vector or array, or
     a cell array of character vectors (a.k.a.  string cells).

     V can be a scalar or an array containing values for replacing the
     missing values in A, with a data type compatible for insertion into
     A.  V must either be a scalar or contain a number of elements equal
     to the number of elements orthogonal to the operating dimension.
     E.g., if 'size(A)' = [3 5 4], operating along 'dim' = 2 requires V
     to contain either 1 or 3x4=12 elements.

     If requested, the optional output IDX will contain a logical array
     the same shape as A indicating with 1's which locations in A were
     filled.

     Alternate Input Arguments and Values:
        * METHOD - replace missing values with:

          'next'
          'previous'
          'nearest'
               next, previous, or nearest non-missing value (nearest
               defaults to next when equidistant as determined by
               'SamplePoints'.)

          'linear'
               linear interpolation of neighboring, non-missing values

          'spline'
               piecewise cubic spline interpolation of neighboring,
               non-missing values

          'pchip'
               'shape preserving' piecewise cubic spline interpolation
               of neighboring, non-missing values

        * MOVE_METHOD - moving window calculated replacement values:

          'movmean'
          'movmedian'
               moving average or median using a window determined by
               WINDOW_SIZE.  WINDOW_SIZE must be either a positive
               scalar value or a two element positive vector of sizes
               '[NB, NA]' measured in the same units as 'SamplePoints'.
               For scalar values, the window is centered on the missing
               element and includes all data points within a distance of
               half of WINDOW_SIZE on either side of the window center
               point.  Note that for compatibility, when using a scalar
               value, the backward window limit is inclusive and the
               forward limit is exclusive.  If a two-element WINDOW_SIZE
               vector is specified, the window includes all points
               within a distance of NB backward and NA forward from the
               current element at the window center (both limits
               inclusive).

        * FILL_FUNCTION - custom method specified as a function handle.
          The supplied fill function must accept three inputs in the
          following order for each missing gap in the data:
          A_VALUES -
               elements of A within the window on either side of the gap
               as determined by WINDOW_SIZE.  (Note these elements can
               include missing values from other nearby gaps.)
          A_LOCS -
               locations of the reference data, A_VALUES, in terms of
               the default or specified 'SamplePoints'.
          GAP_LOCS -
               location of the gap data points that need to be filled in
               terms of the default or specified 'SamplePoints'.

          The supplied function must return a scalar or vector with the
          same number of elements as GAP_LOCS.  The required WINDOW_SIZE
          parameter follows similar rules as for the moving average and
          median methods described above, with the two exceptions that
          (1) each gap is processed as a single element, rather than gap
          elements being processed individually, and (2) the window
          extended on either side of the gap has inclusive endpoints
          regardless of how WINDOW_SIZE is specified.

        * DIM - specify a dimension for vector operation (default =
          first non-singleton dimension)

        * PROPERTYNAME-PROPERTYVALUE pairs
          'SamplePoints'
               PROPERTYVALUE is a vector of sample point values
               representing the sorted and unique x-axis values of the
               data in A.  If unspecified, the default is assumed to be
               the vector [1 : SIZE (A, DIM)].  The values in
               'SamplePoints' will affect methods and properties that
               rely on the effective distance between data points in A,
               such as interpolants and moving window functions where
               the WINDOW_SIZE specified for moving window functions is
               measured relative to the 'SamplePoints'.

          'EndValues'
               Apply a separate handling method for missing values at
               the front or back of the array.  PROPERTYVALUE can be:
                  * A constant scalar or array with the same shape
                     requirements as V.
                  * 'none' - Do not fill end gap values.
                  * 'extrap' - Use the same procedure as METHOD to fill
                    the end gap values.
                  * Any valid METHOD listed above except for 'movmean',
                    'movmedian', and 'fill_function'.  Those methods can
                    only be applied to end gap values with 'extrap'.

          'MissingLocations'
               PROPERTYVALUE must be a logical array the same size as A
               indicating locations of known missing data with a value
               of 'true'.  (cannot be combined with MaxGap)

          'MaxGap'
               PROPERTYVALUE is a numeric scalar indicating the maximum
               gap length to fill, and assumes the same distance scale
               as the sample points.  Gap length is calculated by the
               difference in locations of the sample points on either
               side of the gap, and gaps larger than MaxGap are ignored
               by FILLMISSING.  (cannot be combined with
               MissingLocations)

     Compatibility Notes:
        * Numerical and logical inputs for A and V may be specified in
          any combination.  The output will be the same class as A, with
          V converted to that data type for filling.  Only 'single' and
          'double' have defined 'missing' values, so unless the
          'MissingLocations' option is used to identify missing elements
          of logical or other numeric data types, the output for those
          types will always be 'B = A' with 'IDX = false(size(A))'.
        * All interpolation methods can be individually applied to
          'EndValues'.
        * MATLAB's FILL_FUNCTION method currently has several
          inconsistencies with the other methods (tested against version
          2022a), and Octave's implementation has chosen the following
          consistent behavior over compatibility: (1) a column full of
          missing data is considered part of 'EndValues', (2) such
          columns are then excluded from FILL_FUNCTION processing
          because the moving window is always empty, and (3) operations
          in dimensions higher than 2 perform identically to operations
          in dims 1 and 2, most notably on vectors.
        * Method "makima" is not yet implemented in 'interp1', which is
          used by 'fillmissing'.  Attempting to call this method will
          produce an error until the method is implemented in 'interp1'.
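
     A brief sketch of the basic fill methods on a vector with interior
     gaps (assumes the statistics package is loaded):

```octave
% Hedged sketch: constant and interpolated filling of NaN gaps.
pkg load statistics

A = [1 NaN 3 NaN NaN 6];

B1 = fillmissing (A, "constant", 0);     % -> [1 0 3 0 0 6]
[B2, idx] = fillmissing (A, "linear");   % -> [1 2 3 4 5 6]
% idx is logical, true wherever an element of A was filled.
```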

     See also: ismissing, rmmissing, standardizeMissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Replace missing entries of array A either with values in V or as
determined b...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
fitgmdist


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2894
 -- statistics: GMDIST = fitgmdist (DATA, K, PARAM1, VALUE1, ...)

     Fit a Gaussian mixture model with K components to DATA.  Each row
     of DATA is a data sample.  Each column is a variable.

     Optional parameters are:
        * 'start': initialization conditions.  Possible values are:
             * 'randSample' (default) takes means uniformly from rows of
               data
             * 'plus' use k-means++ to initialize means
             * 'cluster' Performs an initial clustering with 10% of the
               data
             * vector A vector whose length is the number of rows in
               data, and whose values (from 1 to k) specify the
               component each row is initially allocated to.  The mean,
               variance and weight of each component are calculated
               from that allocation
             * structure A structure with fields mu, Sigma, and
               ComponentProportion
          For 'randSample', 'plus' and 'cluster', the initial variance
          of each component is the variance of the entire data sample.

        * 'Replicates' Number of random restarts to perform

        * 'RegularizationValue'
        * 'Regularize' A small number added to the diagonal entries of
          the covariance to prevent singular covariances

        * 'SharedCovariance'
        * 'SharedCov' (logical) True if all components must share the
          same variance, to reduce the number of free parameters

        * 'CovarianceType'
        * 'CovType' (string).  Possible values are:
             * 'full' (default) Allow arbitrary covariance matrices
             * 'diagonal' Force covariances to be diagonal, to reduce
               the number of free parameters.

        * 'Options' A structure with all of the following fields:
             * 'MaxIter' Maximum number of EM iterations (default 100)
             * 'TolFun' Threshold increase in likelihood to terminate EM
               (default 1e-6)
             * 'Display'
                  * 'off' (default): display nothing
                  * 'final': display the number of iterations and
                    likelihood once execution completes
                  * 'iter': display the above after each iteration
        * 'Weight' A column vector or n-by-2 matrix.  The first column
          consists of non-negative weights given to the samples.  If
          these are all integers, this is equivalent to specifying
          WEIGHT(i) copies of row i of DATA, but potentially faster.

          If a row of DATA is used to represent samples that are similar
          but not identical, then the second column of WEIGHT indicates
          the variance of those original samples.  Specifically, in the
          EM algorithm, the contribution of row i towards the variance
          is set to at least WEIGHT(i,2), to prevent spurious components
          with zero variance.
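
     A minimal sketch on well-separated 1-D data, assuming the
     statistics package is loaded and that the fitted object exposes its
     component means as 'gm.mu':

```octave
% Hedged sketch: fit a 2-component mixture to two obvious clusters.
pkg load statistics

randn ("seed", 7);                        % reproducible sample
data = [randn(200, 1); randn(200, 1) + 8];

gm = fitgmdist (data, 2, "Replicates", 3);
% gm.mu should hold two component means, one near 0 and one near 8.
```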

     See also: gmdistribution, kmeans.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 55
Fit a Gaussian mixture model with K components to DATA.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
fitlm


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4130
 -- statistics: TAB = fitlm (X, Y)
 -- statistics: TAB = fitlm (X, Y, NAME, VALUE)
 -- statistics: TAB = fitlm (X, Y, MODELSPEC)
 -- statistics: TAB = fitlm (X, Y, MODELSPEC, NAME, VALUE)
 -- statistics: [TAB] = fitlm (...)
 -- statistics: [TAB, STATS] = fitlm (...)

     Regress the continuous outcome (i.e.  dependent variable) Y on
     continuous or categorical predictors (i.e.  independent variables)
     X by minimizing the sum-of-squared residuals.  Unless requested
     otherwise, fitlm prints the model formula, the regression
     coefficients (i.e.  parameters/contrasts) and an ANOVA table.  Note
     that unlike anovan, fitlm treats all factors as continuous by
     default.

     X must be a column major matrix or cell array consisting of the
     predictors.  By default, there is a constant term in the model,
     unless you explicitly remove it, so do not include a column of 1s
     in X. Y must be a column vector corresponding to the outcome
     variable.  MODELSPEC can be specified as one of the following:

        * "constant" : model contains only a constant (intercept) term.

        * "linear" (default) : model contains an intercept and linear
          term for each predictor.

        * "interactions" : model contains an intercept, linear term for
          each predictor and all products of pairs of distinct
          predictors.

        * "full" : model contains an intercept, linear term for each
          predictor and all combinations of the predictors.

        * a matrix of term definitions : a t-by-(N+1) matrix specifying
          terms in a model, where t is the number of terms, N is the
          number of predictor variables, and +1 accounts for the outcome
          variable.  The outcome variable is the last column in the
          terms matrix and must be a column of zeros.  An intercept must
          be specified in the first row of the terms matrix and must be
          a row of zeros.

     fitlm can take a number of optional parameters as name-value pairs.

     '[...] = fitlm (..., "CategoricalVars", CATEGORICAL)'

        * CATEGORICAL is a vector of indices indicating which of the
          columns (i.e.  variables) in X should be treated as
          categorical predictors rather than as continuous predictors.

     fitlm also accepts optional anovan parameters as name-value pairs
     (except for the "model" parameter).  The accepted parameter names
     from anovan and their default values in fitlm are:

        * CONTRASTS : "treatment"

        * SSTYPE: 2

        * ALPHA: 0.05

        * DISPLAY: "on"

        * WEIGHTS: [] (empty)

        * RANDOM: [] (empty)

        * CONTINUOUS: [1:N]

        * VARNAMES: [] (empty)

     Type 'help anovan' to find out more about what these options do.

     fitlm can return up to two output arguments:

     [TAB] = fitlm (...) returns a cell array containing a table of
     model parameters.

     [TAB, STATS] = fitlm (...) returns a structure containing
     additional statistics, including degrees of freedom and effect
     sizes for each term in the linear model, the design matrix, the
     variance-covariance matrix, (weighted) model residuals, and the
     mean squared error.  The columns of STATS.coeffs (from
     left-to-right) report the model coefficients, standard errors,
     lower and upper 100*(1-alpha)% confidence interval bounds,
     t-statistics, and p-values relating to the contrasts.  The number
     appended to each term name in STATS.coeffnames corresponds to the
     column number in the relevant contrast matrix for that factor.  The
     STATS structure can be used as input for multcompare.  The STATS
     structure is recognised by the functions bootcoeff and bootemm from
     the statistics-bootstrap package.  Note that if the model contains
     a continuous variable and you wish to use the STATS output as input
     to multcompare, then the model needs to be refit with the
     "contrast" parameter set to a sum-to-zero contrast coding scheme,
     e.g. "simple".
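
     A minimal sketch with a known linear relationship (assumes the
     statistics package is loaded; the "display" option is passed
     through as an anovan-style name-value pair):

```octave
% Hedged sketch: regress y on a single continuous predictor.
pkg load statistics

X = (1:20)';
y = 3 * X + 2;                             % intercept 2, slope 3

[tab, stats] = fitlm (X, y, "display", "off");
% tab is a cell-array table of model parameters; stats.coeffs holds
% the estimates, which should recover the intercept 2 and slope 3.
```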

     See also: anovan, multcompare.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Regress the continuous outcome (i.e.  dependent variable) Y on
continuous or ...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
friedman


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1714
 -- statistics: P = friedman (X)
 -- statistics: P = friedman (X, REPS)
 -- statistics: P = friedman (X, REPS, DISPLAYOPT)
 -- statistics: [P, ATAB] = friedman (...)
 -- statistics: [P, ATAB, STATS] = friedman (...)

     Performs the nonparametric Friedman's test to compare column
     effects in a two-way layout.  friedman tests the null hypothesis
     that the column effects are all the same against the alternative
     that they are not all the same.

     friedman requires one to three input arguments:

        * X contains the data and it must be a matrix of at least two
          columns and two rows.
        * REPS is the number of replicates for each combination of
          factor groups.  If not provided, no replicates are assumed.
        * DISPLAYOPT is an optional parameter for displaying the
          Friedman's ANOVA table, when it is 'on' (default) and
          suppressing the display when it is 'off'.

     friedman returns up to three output arguments:

        * P is the p-value of the null hypothesis that all group means
          are equal.
        * ATAB is a cell array containing the results in a Friedman's
          ANOVA table.
        * STATS is a structure containing statistics useful for
          performing a multiple comparison of medians with the
          MULTCOMPARE function.

     If friedman is called without any output arguments, then it prints
     the results in a one-way ANOVA table to the standard output as if
     DISPLAYOPT is 'on'.

     Examples:

          load popcorn;
          friedman (popcorn, 3);

          [p, anovatab, stats] = friedman (popcorn, 3, "off");
          disp (p);

     See also: anova2, kruskalwallis, multcompare.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Performs the nonparametric Friedman's test to compare column effects in
a two...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
fstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1163
 -- statistics: [MN, V] = fstat (M, N)

     Compute mean and variance of the F distribution.

     Arguments
     ---------

        * M is the first parameter of the F distribution.  The elements
          of M must be positive

        * N is the second parameter of the F distribution.  The elements
          of N must be positive
     M and N must be of common size or one of them must be scalar

     Return values
     -------------

        * MN is the mean of the F distribution.  The mean is undefined
          for N not greater than 2

        * V is the variance of the F distribution.  The variance is
          undefined for N not greater than 4

     Examples
     --------

          m = 1:6;
          n = 5:10;
          [mn, v] = fstat (m, n)

          [mn, v] = fstat (m, 5)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 48
Compute mean and variance of the F distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
fullfact


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 334
 -- statistics: fullfact (N)

     Full factorial design.

     If N is a scalar, return the full factorial design with N binary
     choices, 0 and 1.

     If N is a vector, return the full factorial design with ordinal
     choices 1 through N_I for each factor I.

     Values in N must be positive integers.
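
     For example, assuming the statistics package is loaded:

```octave
% fullfact enumerates every combination of the given factor levels.
pkg load statistics

d = fullfact ([2 3]);
% d is 6-by-2: all pairs of levels 1:2 (factor 1) and 1:3 (factor 2).
disp (size (d))
```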

     See also: ff2n.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 22
Full factorial design.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
gamfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 712
 -- statistics: MLE = gamfit (DATA)

     Calculate gamma distribution parameters.

     Find the maximum likelihood estimate parameters of the Gamma
     distribution of DATA.  MLE is a two element vector with shape
     parameter A and scale B.

     This function works by minimizing the value of gamlike for the
     vector R. Just about any minimization function will work; all it
     has to do is minimize over one variable.  Although the gamma
     distribution has two parameters, their product is the mean of the
     data.  So a helper function for the search takes one parameter,
     calculates the other and then returns the value of gamlike.
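
     A brief sketch on simulated data, assuming the statistics package
     is loaded:

```octave
% Hedged sketch: recover gamma parameters from a simulated sample.
pkg load statistics

rand ("seed", 1);  randn ("seed", 1);     % reproducible sample
data = gamrnd (2, 3, 1000, 1);            % shape a = 2, scale b = 3

mle = gamfit (data);
% mle(1) estimates the shape A, mle(2) the scale B.
```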

     See also: gamcdf, gampdf, gaminv, gamrnd, gamlike.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 40
Calculate gamma distribution parameters.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
gamlike


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 269
 -- statistics: NLOGL = gamlike (PARAMS, R)

     Calculates the negative log-likelihood function for the Gamma
     distribution over vector R, with the given parameters A and B in a
     2-element vector PARAMS.
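
     A minimal sketch checking the result against the closed-form
     expression for the gamma negative log-likelihood (assumes the
     statistics package is loaded and the shape/scale parameterization):

```octave
% Hedged sketch: evaluate gamlike for shape a and scale b.
pkg load statistics

r = [1 2 3 4];
a = 2;  b = 1.5;                  % shape and scale

nlogl = gamlike ([a, b], r);
% Analytically this equals
% n*(gammaln(a) + a*log(b)) - (a-1)*sum(log(r)) + sum(r)/b.
```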

     See also: gamcdf, gampdf, gaminv, gamrnd, gamfit.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Calculates the negative log-likelihood function for the Gamma
distribution ov...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
gamstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1036
 -- statistics: [M, V] = gamstat (A, B)

     Compute mean and variance of the gamma distribution.

     Arguments
     ---------

        * A is the first parameter of the gamma distribution.  A must be
          positive

        * B is the second parameter of the gamma distribution.  B must
          be positive
     A and B must be of common size or one of them must be scalar

     Return values
     -------------

        * M is the mean of the gamma distribution

        * V is the variance of the gamma distribution

     Examples
     --------

          a = 1:6;
          b = 1:0.2:2;
          [m, v] = gamstat (a, b)

          [m, v] = gamstat (a, 1.5)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 52
Compute mean and variance of the gamma distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
geomean


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1865
 -- statistics: M = geomean (X)
 -- statistics: M = geomean (X, "all")
 -- statistics: M = geomean (X, DIM)
 -- statistics: M = geomean (X, VECDIM)
 -- statistics: M = geomean (..., NANFLAG)

     Compute the geometric mean of X.

        * If X is a vector, then 'geomean(X)' returns the geometric mean
          of the elements in X defined as

               geomean (X) = PROD_i X(i) ^ (1/N)

          where N is the length of the X vector.

        * If X is a matrix, then 'geomean(X)' returns a row vector with
          the geometric mean of each column in X.

        * If X is a multidimensional array, then 'geomean(X)' operates
          along the first nonsingleton dimension of X.

        * If X contains any negative values, then 'geomean(X)' returns
          complex values.

     'geomean(X, "all")' returns the geometric mean of all the elements
     in X.

     'geomean(X, DIM)' returns the geometric mean along the operating
     dimension DIM of X.

     'geomean(X, VECDIM)' returns the geometric mean over the dimensions
     specified in the vector VECDIM.  For example, if X is a 2-by-3-by-4
     array, then 'geomean(X, [1 2])' returns a 1-by-4 array.  Each
     element of the output array is the geometric mean of the elements
     on the corresponding page of X.  NOTE! VECDIM MUST index at least
     N-2 dimensions of X, where 'N = length (size (X))' and N < 8.  If
     VECDIM indexes all dimensions of X, then it is equivalent to
     'geomean(X, "all")'.

     'geomean(..., NANFLAG)' specifies whether to exclude NaN values
     from the calculation, using any of the input argument combinations
     in previous syntaxes.  By default, geomean includes NaN values in
     the calculation (NANFLAG has the value "includenan").  To exclude
     NaN values, set the value of NANFLAG to "omitnan".
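
     For example, assuming the statistics package is loaded:

```octave
% The geometric mean of [1 4 16] is (1*4*16)^(1/3) = 4.
pkg load statistics

geomean ([1 4 16])                  % -> 4
geomean ([1 NaN 4 16], "omitnan")   % excludes the NaN, also 4
geomean (magic (4))                 % row vector: one mean per column
```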

     See also: harmmean, mean.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 32
Compute the geometric mean of X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
geostat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 851
 -- statistics: [M, V] = geostat (P)

     Compute mean and variance of the geometric distribution.

     Arguments
     ---------

        * P is the rate parameter of the geometric distribution.  The
          elements of P must be probabilities

     Return values
     -------------

        * M is the mean of the geometric distribution

        * V is the variance of the geometric distribution

     Example
     -------

          p = 1 ./ (1:6);
          [m, v] = geostat (p)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 56
Compute mean and variance of the geometric distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
gevfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2689
 -- statistics: [PARAMHAT, PARAMCI] = gevfit (DATA)
 -- statistics: [PARAMHAT, PARAMCI] = gevfit (DATA, PARAMGUESS)
 -- statistics: [PARAMHAT, PARAMCI] = gevfit (DATA, ALPHA)
 -- statistics: [PARAMHAT, PARAMCI] = gevfit (..., OPTIONS)

     Find the maximum likelihood estimator PARAMHAT of the generalized
     extreme value (GEV) distribution to fit DATA.

     Arguments
     ---------

        * DATA is the vector of given values.
        * PARAMGUESS is an initial guess for the maximum likelihood
          parameter vector.  If not given, this defaults to K_0=0 and
          SIGMA, MU determined by matching the data mean and standard
          deviation to their expected values.
        * ALPHA returns 100(1-ALPHA) percent confidence intervals for
          the parameter estimates.  Pass in [] for ALPHA to use the
          default values.
        * OPTIONS a structure that specifies control parameters for the
          iterative algorithm used to compute ML estimates.  The
          structure must contain the following fields:

          'Display' = "off"; 'MaxFunEvals' = 400; 'MaxIter' = 200;
          'TolFun' = 1e-6; 'TolX' = 1e-6.

          If not provided, the aforementioned values are used by
          default.

     Return values
     -------------

        * PARAMHAT is the 3-parameter maximum-likelihood parameter
          vector '[K_0, SIGMA, MU]', where K_0 is the shape parameter of
          the GEV distribution, SIGMA is the scale parameter of the GEV
          distribution, and MU is the location parameter of the GEV
          distribution.
        * PARAMCI has the approximate 95% confidence intervals of the
          parameter values based on the Fisher information matrix at the
          maximum-likelihood position.

     When K < 0, the GEV is the type III extreme value distribution.
     When K > 0, the GEV distribution is the type II, or Frechet,
     extreme value distribution.  If W has a Weibull distribution as
     computed by the WBLFIT function, then -W has a type III extreme
     value distribution and 1/W has a type II extreme value
     distribution.  In the limit as K approaches 0, the GEV is the
     mirror image of the type I extreme value distribution as computed
     by the EVFIT function.

     The mean of the GEV distribution is not finite when K >= 1, and the
     variance is not finite when K >= 1/2.  The GEV distribution is
     defined for K*(X-MU)/SIGMA > -1.

     Examples
     --------

          data = 1:50;
          [pfit, pci] = gevfit (data);
          p1 = gevcdf (data, pfit(1), pfit(2), pfit(3));
          plot (data, p1)

     See also: gevcdf, gevinv, gevlike, gevpdf, gevrnd, gevstat.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Find the maximum likelihood estimator PARAMHAT of the generalized
extreme val...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
gevfit_lmom


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1193
 -- statistics: [PARAMHAT, PARAMCI] = gevfit_lmom (DATA)

     Find an estimator (PARAMHAT) of the generalized extreme value (GEV)
     distribution fitting DATA using the method of L-moments.

     Arguments
     ---------

        * DATA is the vector of given values.

     Return values
     -------------

        * PARAMHAT is the 3-parameter vector [K; SIGMA; MU] estimated by
          the method of L-moments, where K is the shape parameter of the GEV
          distribution, SIGMA is the scale parameter of the GEV
          distribution, and MU is the location parameter of the GEV
          distribution.
        * PARAMCI has the approximate 95% confidence intervals of the
          parameter values (currently not implemented).

     Examples
     --------

          data = gevrnd (0.1, 1, 0, 100, 1);
          [pfit, pci] = gevfit_lmom (data);
          p1 = gevcdf (data, pfit(1), pfit(2), pfit(3));
          [f, x] = ecdf (data);
          plot (data, p1, 's', x, f)

     See also: gevfit.

     References
     ----------

       1. Ailliot, P.; Thompson, C. & Thomson, P. Mixed methods for
          fitting the GEV distribution, Water Resources Research, 2011,
          47, W05551


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Find an estimator (PARAMHAT) of the generalized extreme value (GEV)
distribut...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
gevlike


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1470
 -- statistics: [NLOGL, GRAD, ACOV] = gevlike (PARAMS, DATA)

     Compute the negative log-likelihood of data under the generalized
     extreme value (GEV) distribution with given parameter values.

     Arguments
     ---------

        * PARAMS is the 3-parameter vector [K, SIGMA, MU], where K is
          the shape parameter of the GEV distribution, SIGMA is the
          scale parameter of the GEV distribution, and MU is the
          location parameter of the GEV distribution.
        * DATA is the vector of given values.

     Return values
     -------------

        * NLOGL is the negative log-likelihood.
        * GRAD is the 3 by 1 gradient vector, which is the first
          derivative of the negative log likelihood with respect to the
          parameter values.
        * ACOV is the 3 by 3 inverse of the Fisher information matrix,
          which is the second derivative of the negative log likelihood
          with respect to the parameter values.

     Examples
     --------

          x = -5:-1;
          k = -0.2;
          sigma = 0.3;
          mu = 0.5;
          [L, ~, C] = gevlike ([k sigma mu], x);

     References
     ----------

       1. Rolf-Dieter Reiss and Michael Thomas.  'Statistical Analysis
          of Extreme Values with Applications to Insurance, Finance,
          Hydrology and Other Fields'.  Chapter 1, pages 16-17,
          Springer, 2007.

     See also: gevcdf, gevfit, gevinv, gevpdf, gevrnd, gevstat.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute the negative log-likelihood of data under the generalized
extreme val...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
gevstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 723
 -- statistics: [M, V] = gevstat (K, SIGMA, MU)

     Compute the mean and variance of the generalized extreme value
     distribution.

     Arguments
     ---------

        * K is the shape parameter of the GEV distribution.  (Also
          denoted gamma or xi.)
        * SIGMA is the scale parameter of the GEV distribution.  The
          elements of SIGMA must be positive.
        * MU is the location parameter of the GEV distribution.
     The inputs must be of common size, or some of them must be scalar.

     Return values
     -------------

        * M is the mean of the GEV distribution

        * V is the variance of the GEV distribution
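
     As a minimal sketch (parameter values are illustrative), the
     finiteness conditions on the moments can be checked directly:

```octave
% Mean and variance of the GEV distribution for several shape values.
% k < 1 keeps the mean finite; k < 1/2 keeps the variance finite.
k = [-0.3, 0, 0.3];
sigma = 1;
mu = 0;
[m, v] = gevstat (k, sigma, mu)
```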

     See also: gevcdf, gevfit, gevinv, gevlike, gevpdf, gevrnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 76
Compute the mean and variance of the generalized extreme value
distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 14
gmdistribution


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1202
 -- statistics: GMDIST = gmdistribution (MU, SIGMA)
 -- statistics: GMDIST = gmdistribution (MU, SIGMA, P)
 -- statistics: GMDIST = gmdistribution (MU, SIGMA, P, EXTRA)

     Create an object of the gmdistribution class which represents a
     Gaussian mixture model with k components of n-dimensional
     Gaussians.

     Input MU is a k-by-n matrix specifying the n-dimensional mean of
     each of the k components of the distribution.

     Input SIGMA is an array that specifies the variances of the
     distributions, in one of four forms depending on its dimension.
        * n-by-n-by-k: Slice SIGMA(:,:,i) is the variance of the i'th
          component
        * 1-by-n-by-k: Slice diag(SIGMA(1,:,i)) is the variance of the
          i'th component
        * n-by-n: SIGMA is the variance of every component
        * 1-by-n: diag(SIGMA) is the variance of every component

     If P is specified, it is a vector of length k specifying the
     proportion of each component.  If it is omitted or empty, each
     component has an equal proportion.
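
     The argument forms above can be sketched as follows (the means,
     covariances, and proportions are illustrative):

```octave
% A two-component mixture (k = 2) of 2-dimensional Gaussians (n = 2).
mu = [0, 0; 3, 3];                     % k-by-n matrix of component means
sigma = cat (3, eye (2), 2 * eye (2)); % n-by-n-by-k covariance array
p = [0.7, 0.3];                        % component proportions
gm = gmdistribution (mu, sigma, p);
```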

     Input EXTRA is used by fitgmdist to indicate the parameters of the
     fitting process.

     See also: fitgmdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Create an object of the gmdistribution class which represents a Gaussian
mixt...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
gpfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2128
 -- statistics: PARAMHAT = gpfit (X)
 -- statistics: [PARAMHAT, PARAMCI] = gpfit (X)
 -- statistics: [...] = gpfit (X, ALPHA)
 -- statistics: [...] = gpfit (X, ALPHA, OPTIONS)

     Parameter estimates and confidence intervals for generalized Pareto
     data.

     'PARAMHAT = gpfit (X)' returns maximum likelihood estimates of the
     parameters of the two-parameter generalized Pareto distribution
     given the data in X.  PARAMHAT(1) is the SHAPE parameter and
     PARAMHAT(2) is the SCALE parameter.  'gpfit' does not fit a
     LOCATION parameter.

     '[PARAMHAT, PARAMCI] = gpfit (X)' returns 95% confidence intervals
     for the parameter estimates.

     '[...] = gpfit (X, ALPHA)' returns 100*(1 - ALPHA) percent
     confidence intervals for the parameter estimates.

     Pass in [] for ALPHA to use the default values.

     '[...] = gpfit (X, ALPHA, OPTIONS)' specifies control parameters
     for the iterative algorithm used to compute ML estimates with the
     'fminsearch' function.  OPTIONS is a structure with the following
     fields {default values}:
        * 'Display' {"off"}
        * 'MaxFunEvals' {400}
        * 'MaxIter' {200}
        * 'TolBnd' {1.0e-6}
        * 'TolFun' {1.0e-6}
        * 'TolX' {1.0e-6}

     Other functions for the generalized Pareto, such as 'gpcdf', allow
     a LOCATION parameter.  However, 'gpfit' does not estimate LOCATION:
     it must be assumed known and subtracted from X before calling
     'gpfit'.
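
     A minimal sketch (the sample size and parameter values are
     illustrative):

```octave
% Fit shape and scale to generalized Pareto samples; LOCATION is
% assumed known (0 here) and has already been subtracted from the data.
x = gprnd (0.2, 1, 0, 1000, 1);   % shape 0.2, scale 1, location 0
[paramhat, paramci] = gpfit (x);  % paramhat = [shape, scale]
```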

     When SHAPE = 0 and LOCATION = 0, the generalized Pareto
     distribution is equivalent to the exponential distribution.  When
     'SHAPE > 0' and 'LOCATION = SCALE / SHAPE', the generalized Pareto
     distribution is equivalent to the Pareto distribution.  The mean of
     the generalized Pareto distribution is not finite when 'SHAPE >=
     1', and the variance is not finite when 'SHAPE >= 1/2'.  When
     'SHAPE >= 0', the generalized Pareto distribution has positive
     density for 'X > LOCATION', or, when 'SHAPE < 0', for '0 <= (X -
     LOCATION) / SCALE <= -1 / SHAPE'.

     See also: gpcdf, gpinv, gppdf, gprnd, gplike, gpstat.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 73
Parameter estimates and confidence intervals for generalized Pareto
data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
gplike


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1460
 -- statistics: NLOGL = gplike (PARAMS, DATA)
 -- statistics: [NLOGL, AVAR] = gplike (PARAMS, DATA)

     Negative log-likelihood for the generalized Pareto distribution.

     'NLOGL = gplike (PARAMS, DATA)' returns the negative of the
     log-likelihood for the two-parameter generalized Pareto
     distribution, evaluated at 'PARAMS(1) = SHAPE' and 'PARAMS(2) =
     SCALE' given DATA.  'gplike' does not allow a LOCATION parameter.
     NLOGL is a scalar.

     '[NLOGL, AVAR] = gplike (PARAMS, DATA)' additionally returns the
     inverse of Fisher's information matrix, AVAR.  If the input
     parameter values in PARAMS are the maximum likelihood estimates,
     the diagonal elements of AVAR are their asymptotic variances.
     AVAR is based on the observed Fisher's information, not the
     expected information.
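
     As a sketch (the sample and parameter values are illustrative):

```octave
% Negative log-likelihood and asymptotic covariance at candidate
% parameters [shape, scale] for generalized Pareto data.
data = gprnd (0.1, 1, 0, 500, 1);
[nlogl, avar] = gplike ([0.1, 1], data);
```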

     When SHAPE = 0 and LOCATION = 0, the generalized Pareto
     distribution is equivalent to the exponential distribution.  When
     'SHAPE > 0' and 'LOCATION = SCALE / SHAPE', the generalized Pareto
     distribution is equivalent to the Pareto distribution.  The mean of
     the generalized Pareto distribution is not finite when 'SHAPE >=
     1', and the variance is not finite when 'SHAPE >= 1/2'.  When
     'SHAPE >= 0', the generalized Pareto distribution has positive
     density for 'X > LOCATION', or, when 'SHAPE < 0', for '0 <= (X -
     LOCATION) / SCALE <= -1 / SHAPE'.

     See also: gpcdf, gpinv, gppdf, gprnd, gpfit, gpstat.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 64
Negative log-likelihood for the generalized Pareto distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
gpstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 987
 -- statistics: [M, V] = gpstat (SHAPE, SCALE, LOCATION)

     Mean and variance of the generalized Pareto distribution.

     '[M, V] = gpstat (SHAPE, SCALE, LOCATION)' returns the mean and
     variance of the generalized Pareto distribution with SHAPE, SCALE,
     and LOCATION parameters.

     The default value for LOCATION is 0.

     When SHAPE = 0 and LOCATION = 0, the generalized Pareto
     distribution is equivalent to the exponential distribution.  When
     'SHAPE > 0' and 'LOCATION = SCALE / SHAPE', the generalized Pareto
     distribution is equivalent to the Pareto distribution.  The mean of
     the generalized Pareto distribution is not finite when 'SHAPE >=
     1', and the variance is not finite when 'SHAPE >= 1/2'.  When
     'SHAPE >= 0', the generalized Pareto distribution has positive
     density for 'X > LOCATION', or, when 'SHAPE < 0', for '0 <= (X -
     LOCATION) / SCALE <= -1 / SHAPE'.
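
     A minimal sketch (parameter values are illustrative):

```octave
% shape < 1 keeps the mean finite; shape < 1/2 keeps the variance finite.
[m, v] = gpstat (0.2, 1, 0)
```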

     See also: gpcdf, gpinv, gppdf, gprnd, gpfit, gplike.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Mean and variance of the generalized Pareto distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
grp2idx


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 453
 -- statistics: [G, GN, GL] = grp2idx (S)

     Get index for group variables.

     For the grouping variable S, return the group indices G into the
     group lists GN and GL.  GN holds a string representation of the
     groups, while GL holds their actual values.  The group indices are
     allocated in order of appearance in S.

     NaNs and empty strings in S appear as NaN in G and are not present
     in either GN or GL.
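
     A minimal sketch (the grouping values are illustrative):

```octave
s = {"a", "b", "a", "c", "b"};
[g, gn, gl] = grp2idx (s);
% g contains the indices 1, 2, 1, 3, 2 (allocated in order of
% appearance in s); gn holds the names "a", "b", "c" and gl the
% corresponding original values.
```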

     See also: grpstats.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 30
Get index for group variables.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
grpstats


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2215
 -- statistics: MEAN = grpstats (X)
 -- statistics: MEAN = grpstats (X, GROUP)
 -- statistics: [A, B, ...] = grpstats (X, GROUP, WHICHSTATS)
 -- statistics: [A, B, ...] = grpstats (X, GROUP, WHICHSTATS, ALPHA)

     Summary statistics by group.  'grpstats' computes groupwise summary
     statistics, for data in a matrix X.  'grpstats' treats NaNs as
     missing values, and removes them.

     'MEAN = grpstats (X, GROUP)', when X is a matrix of observations,
     returns the means of each column of X by GROUP.  GROUP is a
     grouping variable defined as a categorical variable, numeric,
     string array, or cell array of strings.  GROUP can be [] or omitted
     to compute the mean of the entire sample without grouping.

     '[A, B, ...] = grpstats (X, GROUP, WHICHSTATS)', for a numeric
     matrix X, returns the statistics specified by WHICHSTATS, as
     separate arrays A, B, ....  WHICHSTATS can be a single function
     name, or a cell array containing multiple function names.  The
     number of output arguments must match the number of function names
     in WHICHSTATS.  Names in WHICHSTATS can be chosen from among the
     following:

          "mean"         mean
          "median"       median
          "sem"          standard error of the mean
          "std"          standard deviation
          "var"          variance
          "min"          minimum value
          "max"          maximum value
          "range"        maximum - minimum
          "numel"        count, or number of elements
          "meanci"       95% confidence interval for the mean
          "predci"       95% prediction interval for a new observation
          "gname"        group name

     '[...] = grpstats (X, GROUP, WHICHSTATS, ALPHA)' specifies the
     confidence level as 100(1-ALPHA)% for the "meanci" and "predci"
     options.  Default value for ALPHA is 0.05.

     Examples:

          load carsmall;
          [m,p,g] = grpstats (Weight, Model_Year, {"mean", "predci", "gname"})
          n = length(m);
          errorbar((1:n)',m,p(:,2)-m)
          set (gca, "xtick", 1:n, "xticklabel", g);
          title ("95% prediction intervals for mean weight by year")

     See also: grp2idx.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Summary statistics by group.  'grpstats' computes groupwise summary
statistic...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
gscatter


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1787
 -- statistics: gscatter (X, Y, G)
 -- statistics: gscatter (X, Y, G, CLR, SYM, SIZ)
 -- statistics: gscatter (..., DOLEG, XNAM, YNAM)
 -- statistics: H = gscatter (...)

     Draw a scatter plot with grouped data.

     'gscatter' is a utility function to draw a scatter plot of X and Y,
     according to the groups defined by G.  Inputs X and Y are numeric
     vectors of the same size, while G is either a vector of the same
     size as X or a character matrix with the same number of rows as the
     length of X.  As a vector, G can be numeric, logical, a character
     array, a string array (not implemented), a cell string, or a cell
     array.

     A number of optional inputs change the appearance of the plot:
        * "CLR" defines the color for each group; if not enough colors
          are defined by "CLR", 'gscatter' cycles through the specified
          colors.  Colors can be defined as named colors, as rgb
          triplets or as indices for the current 'colormap'.  The
          default value is a different color for each group, according
          to the current 'colormap'.

        * "SYM" is a char array of symbols for each group; if not enough
          symbols are defined by "SYM", 'gscatter' cycles through the
          specified symbols.

        * "SIZ" is a numeric array of sizes for each group; if not
          enough sizes are defined by "SIZ", 'gscatter' cycles through
          the specified sizes.

        * "DOLEG" is a boolean value to show the legend; it can be
          either on (default) or off.

        * "XNAM" is a character array, the name for the x axis.

        * "YNAM" is a character array, the name for the y axis.

     Output H is an array of graphics handles to the 'line' object of
     each group.
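
     A minimal sketch of the optional inputs (data and styling values
     are illustrative):

```octave
% Two groups with explicit colors, symbols, and sizes; legend on.
x = [randn(50, 1); randn(50, 1) + 3];
y = [randn(50, 1); randn(50, 1) + 3];
g = [ones(50, 1); 2 * ones(50, 1)];
h = gscatter (x, y, g, "br", "ox", 8, "on", "x value", "y value");
```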

     See also: scatter.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Draw a scatter plot with grouped data.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
harmmean


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2001
 -- statistics: M = harmmean (X)
 -- statistics: M = harmmean (X, "all")
 -- statistics: M = harmmean (X, DIM)
 -- statistics: M = harmmean (X, VECDIM)
 -- statistics: M = harmmean (..., NANFLAG)

     Compute the harmonic mean of X.

        * If X is a vector, then 'harmmean(X)' returns the harmonic mean
          of the elements in X defined as

               harmmean (X) = N / SUM_i X(i)^-1

          where N is the length of the X vector.

        * If X is a matrix, then 'harmmean(X)' returns a row vector
          with the harmonic mean of each column in X.

        * If X is a multidimensional array, then 'harmmean(X)' operates
          along the first nonsingleton dimension of X.

        * If X contains any negative values, then 'harmmean(X)' returns
          an error.

     'harmmean(X, "all")' returns the harmonic mean of all the elements
     in X.  If X contains any 0, then the returned value is 0.

     'harmmean(X, DIM)' returns the harmonic mean along the operating
     dimension DIM of X.  Calculating the harmonic mean of any subarray
     containing any 0 will return 0.

     'harmmean(X, VECDIM)' returns the harmonic mean over the dimensions
     specified in the vector VECDIM.  For example, if X is a 2-by-3-by-4
     array, then 'harmmean(X, [1 2])' returns a 1-by-4 array.  Each
     element of the output array is the harmonic mean of the elements on
     the corresponding page of X.  NOTE! VECDIM MUST index at least N-2
     dimensions of X, where 'N = length (size (X))' and N < 8.  If
     VECDIM indexes all dimensions of X, then it is equivalent to
     'harmmean(X, "all")'.

     'harmmean(..., NANFLAG)' specifies whether to exclude NaN values
     from the calculation, using any of the input argument combinations
     in previous syntaxes.  By default, harmmean includes NaN values in
     the calculation (NANFLAG has the value "includenan").  To exclude
     NaN values, set the value of NANFLAG to "omitnan".
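
     The syntaxes above can be sketched as follows (input values are
     illustrative):

```octave
x = [1, 2, 4];
harmmean (x)                      % 3 / (1/1 + 1/2 + 1/4) = 12/7
A = [1, 2; 4, 8];
harmmean (A)                      % harmonic mean of each column
harmmean ([1, NaN, 3], "omitnan") % NaN excluded from the calculation
```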

     See also: geomean, mean.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 31
Compute the harmonic mean of X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
hist3


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2133
 -- statistics: hist3 (X)
 -- statistics: hist3 (X, NBINS)
 -- statistics: hist3 (X, "Nbins", NBINS)
 -- statistics: hist3 (X, CENTERS)
 -- statistics: hist3 (X, "Ctrs", CENTERS)
 -- statistics: hist3 (X, "Edges", EDGES)
 -- statistics: [N, C] = hist3 (...)
 -- statistics: hist3 (..., PROP, VAL, ...)
 -- statistics: hist3 (HAX, ...)

     Produce bivariate (2D) histogram counts or plots.

     The elements to produce the histogram are taken from the Nx2 matrix
     X.  Any row with NaN values is ignored.  The bins can be configured
     in 3 different ways: by number, centers, or edges of the bins:

     Number of bins (default)
          Produces equally spaced bins between the minimum and maximum
          values of X.  Defined as a 2 element vector, NBINS, one for
          each dimension.  Defaults to '[10 10]'.

     Center of bins
          Defined as a cell array of 2 monotonically increasing vectors,
          CENTERS.  The width of each bin is determined from the
          adjacent values in the vector, with the initial and final bins
          extending to Infinity.

     Edge of bins
          Defined as a cell array of 2 monotonically increasing vectors,
          EDGES.  'N(i,j)' contains the number of elements in X for
          which:

               EDGES{1}(i) <= X(:,1) < EDGES{1}(i+1)
               EDGES{2}(j) <= X(:,2) < EDGES{2}(j+1)

          The consequence of this definition is that values outside the
          initial and final edge values are ignored, and that the final
          bin counts only the elements exactly equal to the final edge.

     The return values, N and C, are the bin counts and centers,
     respectively.  These are especially useful to produce intensity
     maps:

          [counts, centers] = hist3 (data);
          imagesc (centers{1}, centers{2}, counts)

     If there is no output argument, or if the axes graphics handle HAX
     is defined, the function will plot a 3 dimensional bar graph.  Any
     extra property/value pairs are passed directly to the underlying
     surface object.

     See also: hist, histc, lookup, mesh.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 49
Produce bivariate (2D) histogram counts or plots.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
histfit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 525
 -- statistics: histfit (X, NBINS)
 -- statistics: H = histfit (X, NBINS)

     Plot histogram with superimposed fitted normal density.

     'histfit (X, NBINS)' plots a histogram of the values in the vector
     X using NBINS bars in the histogram.  With one input argument,
     NBINS is set to the square root of the number of elements in X.

     'H = histfit (X, NBINS)' returns the bins and fitted line handles
     of the plot in H.

     Example

          histfit (randn (100, 1))

     See also: bar, hist, pareto.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 55
Plot histogram with superimposed fitted normal density.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
hmmestimate


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4473
 -- statistics: [TRANSPROBEST, OUTPROBEST] = hmmestimate (SEQUENCE,
          STATES)
 -- statistics: [...] = hmmestimate (..., "statenames", STATENAMES)
 -- statistics: [...] = hmmestimate (..., "symbols", SYMBOLS)
 -- statistics: [...] = hmmestimate (..., "pseudotransitions",
          PSEUDOTRANSITIONS)
 -- statistics: [...] = hmmestimate (..., "pseudoemissions",
          PSEUDOEMISSIONS)

     Estimation of a hidden Markov model for a given sequence.

     Estimate the matrix of transition probabilities and the matrix of
     output probabilities of a given sequence of outputs and states
     generated by a hidden Markov model.  The model assumes that the
     generation starts in state '1' at step '0' but does not include
     step '0' in the generated states and sequence.

     Arguments
     ---------

        * SEQUENCE is a vector of a sequence of given outputs.  The
          outputs must be integers ranging from '1' to the number of
          outputs of the hidden Markov model.

        * STATES is a vector of the same length as SEQUENCE of given
          states.  The states must be integers ranging from '1' to the
          number of states of the hidden Markov model.

     Return values
     -------------

        * TRANSPROBEST is the matrix of the estimated transition
          probabilities of the states.  'transprobest(i, j)' is the
          estimated probability of a transition to state 'j' given state
          'i'.

        * OUTPROBEST is the matrix of the estimated output
          probabilities.  'outprobest(i, j)' is the estimated
          probability of generating output 'j' given state 'i'.

     If '"symbols"' is specified, then SEQUENCE is expected to be a
     sequence of the elements of SYMBOLS instead of integers.  SYMBOLS
     can be a cell array.

     If '"statenames"' is specified, then STATES is expected to be a
     sequence of the elements of STATENAMES instead of integers.
     STATENAMES can be a cell array.

     If '"pseudotransitions"' is specified then the integer matrix
     PSEUDOTRANSITIONS is used as an initial number of counted
     transitions.  'pseudotransitions(i, j)' is the initial number of
     counted transitions from state 'i' to state 'j'.  TRANSPROBEST will
     have the same size as PSEUDOTRANSITIONS.  Use this if you have
     transitions that are very unlikely to occur.

     If '"pseudoemissions"' is specified then the integer matrix
     PSEUDOEMISSIONS is used as an initial number of counted outputs.
     'pseudoemissions(i, j)' is the initial number of counted outputs
     'j' given state 'i'.  If '"pseudotransitions"' is also specified
     then the number of rows of PSEUDOEMISSIONS must be the same as the
     number of rows of PSEUDOTRANSITIONS.  OUTPROBEST will have the same
     size as PSEUDOEMISSIONS.  Use this if you have outputs or states
     that are very unlikely to occur.

     Examples
     --------

          transprob = [0.8, 0.2; 0.4, 0.6];
          outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
          [sequence, states] = hmmgenerate (25, transprob, outprob);
          [transprobest, outprobest] = hmmestimate (sequence, states)

          symbols = {"A", "B", "C"};
          statenames = {"One", "Two"};
          [sequence, states] = hmmgenerate (25, transprob, outprob, ...
                                            "symbols", symbols, ...
                                            "statenames", statenames);
          [transprobest, outprobest] = hmmestimate (sequence, states, ...
                                            "symbols", symbols, ...
                                            "statenames", statenames)

          pseudotransitions = [8, 2; 4, 6];
          pseudoemissions = [2, 4, 4; 7, 2, 1];
          [sequence, states] = hmmgenerate (25, transprob, outprob);
          [transprobest, outprobest] = hmmestimate (sequence, states, ...
                                       "pseudotransitions", pseudotransitions, ...
                                       "pseudoemissions", pseudoemissions)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Lawrence R. Rabiner.  A Tutorial on Hidden Markov Models and
          Selected Applications in Speech Recognition.  'Proceedings of
          the IEEE', 77(2), pages 257-286, February 1989.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Estimation of a hidden Markov model for a given sequence.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
hmmgenerate


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2608
 -- statistics: [SEQUENCE, STATES] = hmmgenerate (LEN, TRANSPROB,
          OUTPROB)
 -- statistics: [...] = hmmgenerate (..., "symbols", SYMBOLS)
 -- statistics: [...] = hmmgenerate (..., "statenames", STATENAMES)

     Output sequence and hidden states of a hidden Markov model.

     Generate an output sequence and hidden states of a hidden Markov
     model.  The model starts in state '1' at step '0' but will not
     include step '0' in the generated states and sequence.

     Arguments
     ---------

        * LEN is the number of steps to generate.  SEQUENCE and STATES
          will have LEN entries each.

        * TRANSPROB is the matrix of transition probabilities of the
          states.  'transprob(i, j)' is the probability of a transition
          to state 'j' given state 'i'.

        * OUTPROB is the matrix of output probabilities.  'outprob(i,
          j)' is the probability of generating output 'j' given state
          'i'.

     Return values
     -------------

        * SEQUENCE is a vector of length LEN of the generated outputs.
          The outputs are integers ranging from '1' to 'columns
          (outprob)'.

        * STATES is a vector of length LEN of the generated hidden
          states.  The states are integers ranging from '1' to 'columns
          (transprob)'.

     If '"symbols"' is specified, then the elements of SYMBOLS are used
     for the output sequence instead of integers ranging from '1' to
     'columns (outprob)'.  SYMBOLS can be a cell array.

     If '"statenames"' is specified, then the elements of STATENAMES are
     used for the states instead of integers ranging from '1' to
     'columns (transprob)'.  STATENAMES can be a cell array.

     Examples
     --------

          transprob = [0.8, 0.2; 0.4, 0.6];
          outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
          [sequence, states] = hmmgenerate (25, transprob, outprob)

          symbols = {"A", "B", "C"};
          statenames = {"One", "Two"};
          [sequence, states] = hmmgenerate (25, transprob, outprob, ...
                                            "symbols", symbols, ...
                                            "statenames", statenames)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Lawrence R. Rabiner.  A Tutorial on Hidden Markov Models and
          Selected Applications in Speech Recognition.  'Proceedings of
          the IEEE', 77(2), pages 257-286, February 1989.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 59
Output sequence and hidden states of a hidden Markov model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
hmmviterbi


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2687
 -- statistics: VPATH = hmmviterbi (SEQUENCE, TRANSPROB, OUTPROB)
 -- statistics: VPATH = hmmviterbi (..., "symbols", SYMBOLS)
 -- statistics: VPATH = hmmviterbi (..., "statenames", STATENAMES)

     Viterbi path of a hidden Markov model.

     Use the Viterbi algorithm to find the Viterbi path of a hidden
     Markov model given a sequence of outputs.  The model assumes that
     the generation starts in state '1' at step '0' but does not include
     step '0' in the generated states and sequence.

     Arguments
     ---------

        * SEQUENCE is the vector of length LEN of given outputs.  The
          outputs must be integers ranging from '1' to 'columns
          (outprob)'.

        * TRANSPROB is the matrix of transition probabilities of the
          states.  'transprob(i, j)' is the probability of a transition
          to state 'j' given state 'i'.

        * OUTPROB is the matrix of output probabilities.  'outprob(i,
          j)' is the probability of generating output 'j' given state
          'i'.

     Return values
     -------------

        * VPATH is the vector of the same length as SEQUENCE of the
          estimated hidden states.  The states are integers ranging from
          '1' to 'columns (transprob)'.

     If '"symbols"' is specified, then SEQUENCE is expected to be a
     sequence of the elements of SYMBOLS instead of integers ranging
     from '1' to 'columns (outprob)'.  SYMBOLS can be a cell array.

     If '"statenames"' is specified, then the elements of STATENAMES are
     used for the states in VPATH instead of integers ranging from '1'
     to 'columns (transprob)'.  STATENAMES can be a cell array.

     Examples
     --------

          transprob = [0.8, 0.2; 0.4, 0.6];
          outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
          [sequence, states] = hmmgenerate (25, transprob, outprob);
          vpath = hmmviterbi (sequence, transprob, outprob);

          symbols = {"A", "B", "C"};
          statenames = {"One", "Two"};
          [sequence, states] = hmmgenerate (25, transprob, outprob, ...
                               "symbols", symbols, "statenames", statenames);
          vpath = hmmviterbi (sequence, transprob, outprob, ...
                  "symbols", symbols, "statenames", statenames);

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Lawrence R. Rabiner.  A Tutorial on Hidden Markov Models and
          Selected Applications in Speech Recognition.  'Proceedings of
          the IEEE', 77(2), pages 257-286, February 1989.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Viterbi path of a hidden Markov model.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
hotelling_t2test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1848
 -- statistics: [H, PVAL, STATS] = hotelling_t2test (X)
 -- statistics: [...] = hotelling_t2test (X, M)
 -- statistics: [...] = hotelling_t2test (X, Y)
 -- statistics: [...] = hotelling_t2test (X, M, NAME, VALUE)
 -- statistics: [...] = hotelling_t2test (X, Y, NAME, VALUE)

     Compute Hotelling's T^2 ("T-squared") test for a single sample or
     two dependent samples (paired-samples).

     For a sample X from a multivariate normal distribution with unknown
     mean and covariance matrix, test the null hypothesis that 'mean (X)
     == M'.

     For two dependent samples X and Y from multivariate normal
     distributions with unknown means and covariance matrices, test the
     null hypothesis that 'mean (X - Y) == 0'.

     hotelling_t2test treats NaNs as missing values, and ignores the
     corresponding rows.

     Name-Value pair arguments can be used to set statistical
     significance.  "alpha" can be used to specify the significance
     level of the test (the default value is 0.05).

     If H is 1 the null hypothesis is rejected, meaning that the tested
     sample does not come from a multivariate distribution with mean M,
     or in case of two dependent samples that they do not come from the
     same multivariate distribution.  If H is 0, then the null
     hypothesis cannot be rejected at the chosen significance level.

     The p-value of the test is returned in PVAL.

     STATS is a structure containing the value of the Hotelling's T^2
     test statistic in the field "Tsq", and the degrees of freedom of
     the F distribution in the fields "df1" and "df2".  Under the null
     hypothesis, (n-p) T^2 / (p(n-1)) has an F distribution with p and
     n-p degrees of freedom, where n and p are the numbers of samples
     and variables, respectively.

     See also: hotelling_t2test2.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute Hotelling's T^2 ("T-squared") test for a single sample or two
depende...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 17
hotelling_t2test2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1548
 -- statistics: [H, PVAL, STATS] = hotelling_t2test2 (X, Y)
 -- statistics: [...] = hotelling_t2test2 (X, Y, NAME, VALUE)

     Compute Hotelling's T^2 ("T-squared") test for two independent
     samples.

     For two samples X and Y from multivariate normal distributions
     with the same number of variables (columns), unknown means and
     unknown equal covariance matrices, test the null hypothesis
     'mean (X) == mean (Y)'.

     hotelling_t2test2 treats NaNs as missing values, and ignores the
     corresponding rows for each sample independently.

     Name-Value pair arguments can be used to set statistical
     significance.  "alpha" can be used to specify the significance
     level of the test (the default value is 0.05).

     If H is 1 the null hypothesis is rejected, meaning that the tested
     samples do not come from the same multivariate distribution.  If H
     is 0, then the null hypothesis cannot be rejected at the chosen
     significance level.

     The p-value of the test is returned in PVAL.

     STATS is a structure containing the value of the Hotelling's T^2
     test statistic in the field "Tsq", and the degrees of freedom of
     the F distribution in the fields "df1" and "df2".  Under the null
     hypothesis,

          (n_x+n_y-p-1) T^2 / (p(n_x+n_y-2))

     has an F distribution with p and n_x+n_y-p-1 degrees of freedom,
     where n_x and n_y are the sample sizes and p is the number of
     variables.
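
     For example, comparing two independent samples with different
     means (the sample sizes need not be equal):

          x = randn (40, 3);
          y = randn (45, 3) + 1;
          [h, pval, stats] = hotelling_t2test2 (x, y)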

     See also: hotelling_t2test.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 71
Compute Hotelling's T^2 ("T-squared") test for two independent samples.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
hygestat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1331
 -- statistics: [MN, V] = hygestat (T, M, N)

     Compute mean and variance of the hypergeometric distribution.

     Arguments
     ---------

        * T is the total size of the population of the hypergeometric
          distribution.  The elements of T must be positive natural
          numbers

        * M is the number of marked items of the hypergeometric
          distribution.  The elements of M must be natural numbers

        * N is the size of the drawn sample of the hypergeometric
          distribution.  The elements of N must be positive natural
          numbers
     T, M, and N must be of common size or scalars.

     Return values
     -------------

        * MN is the mean of the hypergeometric distribution

        * V is the variance of the hypergeometric distribution

     Examples
     --------

          t = 4:9;
          m = 0:5;
          n = 1:6;
          [mn, v] = hygestat (t, m, n)

          [mn, v] = hygestat (t, m, 2)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 61
Compute mean and variance of the hypergeometric distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 12
inconsistent


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1046
 -- statistics: Y = inconsistent (Z)
 -- statistics: Y = inconsistent (Z, D)

     Compute the inconsistency coefficient for each link of a
     hierarchical cluster tree.

     Given a hierarchical cluster tree Z generated by the 'linkage'
     function, 'inconsistent' computes the inconsistency coefficient for
     each link of the tree, using all the links down to the D-th level
     below that link.

     The default depth D is 2, which means that only two levels are
     considered: the level of the computed link and the level below
     that.

     Each row of Y corresponds to the row of same index of Z.  The
     columns of Y are respectively: the mean of the heights of the links
     used for the calculation, the standard deviation of the heights of
     those links, the number of links used, the inconsistency
     coefficient.
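
     For example, assuming the 'pdist' and 'linkage' functions from
     this package, the coefficients for a tree built over two well
     separated groups can be computed as:

          X = [randn(10, 2); randn(10, 2) + 5];
          Z = linkage (pdist (X), "average");
          Y = inconsistent (Z)       # default depth D = 2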

     *Reference* Jain, A., and R. Dubes.  Algorithms for Clustering
     Data.  Upper Saddle River, NJ: Prentice-Hall, 1988.

     See also: cluster, clusterdata, dendrogram, linkage, pdist, squareform.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute the inconsistency coefficient for each link of a hierarchical
cluster...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
ismissing


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1247
 -- statistics: TF = ismissing (A)
 -- statistics: TF = ismissing (A, INDICATOR)

     Find missing data in a numeric or string array.

     Given an input numeric data array, char array, or array of cell
     strings A, 'ismissing' returns a logical array TF with the same
     dimensions as A, where 'true' values match missing values in the
     input data.

     The optional input INDICATOR is an array of values that represent
     missing values in the input data.  The values which represent
     missing data by default depend on the data type of A:

        * 'NaN': 'single', 'double'.

        * ' ' (white space): 'char'.

        * '{''}': string cells.

     Note: logical and numeric data types may be used in any combination
     for A and INDICATOR.  A and the indicator values will be compared
     as type double, and the output will have the same class as A.  Data
     types other than those specified above have no defined 'missing'
     value.  As such, the TF output for those inputs will always be
     'false(size(A))'.  The exception to this is that INDICATOR can be
     specified for logical and numeric inputs to designate values that
     will register as 'missing'.
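
     For example:

          ismissing ([1, NaN, 3])          # -> logical [0 1 0]
          ismissing ({"a", "", "c"})       # empty string counts as missing
          ismissing ([1, 2, -99], -99)     # -99 used as missing indicator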

     See also: rmmissing, standardizeMissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 47
Find missing data in a numeric or string array.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
jackknife


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2009
 -- statistics: JACKSTAT = jackknife (E, X)
 -- statistics: JACKSTAT = jackknife (E, X, ...)

     Compute jackknife estimates of a parameter taking one or more given
     samples as parameters.

     In particular, E is the estimator to be jackknifed as a function
     name, handle, or inline function, and X is the sample for which the
     estimate is to be taken.  The I-th entry of JACKSTAT will contain
     the value of the estimator on the sample X with its I-th row
     omitted.

          jackstat (I) = E(X([1 : I - 1, I + 1 : length(X)]))

     Depending on the number of samples to be used, the estimator must
     have the appropriate form:
        * If only one sample is used, then the estimator need not be
          concerned with cell arrays, for example jackknifing the
          standard deviation of a sample can be performed with 'JACKSTAT
          = jackknife (@std, rand (100, 1))'.
        * If, however, more than one sample is to be used, the samples
          must all be of equal size, and the estimator must address them
          as elements of a cell-array, in which they are aggregated in
          their order of appearance:

          JACKSTAT = jackknife (@(x) std(x{1})/var(x{2}),
          rand (100, 1), randn (100, 1))

     If a theoretical value P for the parameter is known and N is the
     sample size, define

     'T = N * E(X) - (N - 1) * mean(JACKSTAT)'

     and

     'V = sumsq(N * E(X) - (N - 1) * JACKSTAT - T) / (N * (N - 1))'

     then

     '(T-P)/sqrt(V)' should follow a t-distribution with N-1 degrees of
     freedom.

     Jackknifing is a well-known method to reduce bias.  Further details
     can be found in:

     References
     ----------

       1. Rupert G. Miller.  The jackknife - a review.  Biometrika
          (1974), 61(1):1-15.  doi:10.1093/biomet/61.1.1
       2. Rupert G. Miller.  Jackknifing Variances.  Ann.  Math.
          Statist.  (1968), Volume 39, Number 2, 567-582.
          doi:10.1214/aoms/1177698418


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Compute jackknife estimates of a parameter taking one or more given
samples a...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
kmeans


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5114
 -- statistics: IDX = kmeans (DATA, K)
 -- statistics: [IDX, CENTERS] = kmeans (DATA, K)
 -- statistics: [IDX, CENTERS, SUMD] = kmeans (DATA, K)
 -- statistics: [IDX, CENTERS, SUMD, DIST] = kmeans (DATA, K)
 -- statistics: [...] = kmeans (DATA, K, PARAM1, VALUE1, ...)
 -- statistics: [...] = kmeans (DATA, [], "start", START, ...)

     Perform a K-means clustering of the NxD table DATA.

     If parameter "start" is specified, then K may be empty in which
     case K is set to the number of rows of START.

     The outputs are:

     'IDX'
          An Nx1 vector whose Ith element is the class to which row I of
          DATA is assigned.

     'CENTERS'
          A KxD array whose Ith row is the centroid of cluster I.

     'SUMD'
          A Kx1 vector whose Ith entry is the sum of the distances from
          samples in cluster I to centroid I.

     'DIST'
          An NxK matrix whose IJth element is the distance from sample I
          to centroid J.

     The following parameters may be placed in any order.  Each
     parameter must be followed by its value.

     'START'
          The initialization method for the centroids.
          'plus'
               (Default) The k-means++ algorithm.
          'sample'
               A subset of K rows from DATA, sampled uniformly without
               replacement.
          'cluster'
               Perform a pilot clustering on 10% of the rows of DATA.
          'uniform'
               Each component of each centroid is drawn uniformly from
               the interval between the maximum and minimum values of
               that component within DATA.  This performs poorly and is
               implemented only for Matlab compatibility.
          'A'
               A KxDxR matrix, where R is the number of replicates.

     'REPLICATES'
          A positive integer specifying the number of independent
          clusterings to perform.  The output values are the values for
          the best clustering, i.e., the one with the smallest value of
          SUMD.  If START is numeric, then REPLICATES defaults to (and
          must equal) the size of the third dimension of START.
          Otherwise it defaults to 1.

     'MAXITER'
          The maximum number of iterations to perform for each
          replicate.  If the maximum change of any centroid is less than
          0.001, then the replicate terminates even if MAXITER
          iterations have not occurred.  The default is 100.

     'DISTANCE'
          The distance measure used for partitioning and calculating
          centroids.

          'sqeuclidean'
               The squared Euclidean distance, i.e., the sum of the
               squares of the differences between corresponding
               components.  In this case, the centroid is the arithmetic
               mean of all samples in its cluster.  This is the only
               distance for which this algorithm is truly "k-means".

          'cityblock'
               The sum metric, or L1 distance, i.e., the sum of the
               absolute differences between corresponding components.
               In this case, the centroid is the median of all samples
               in its cluster.  This gives the k-medians algorithm.

          'cosine'
               One minus the cosine of the included angle between points
               (treated as vectors).  Each centroid is the mean of the
               points in that cluster, after normalizing those points to
               unit Euclidean length.

          'correlation'
               One minus the sample correlation between points (treated
               as sequences of values).  Each centroid is the
               component-wise mean of the points in that cluster, after
               centering and normalizing those points to zero mean and
               unit standard deviation.

          'hamming'
               The number of components in which the sample and the
               centroid differ.  In this case, the centroid is the
               median of all samples in its cluster.  Unlike Matlab,
               Octave allows non-logical DATA.

     'EMPTYACTION'
          What to do when a centroid is not the closest to any data
          sample.

          'error'
               Throw an error.
          'singleton'
               (Default) Select the row of DATA that has the highest
               error and use that as the new centroid.
          'drop'
               Remove the centroid, and continue computation with one
                fewer centroid.  The dimensions of the outputs CENTERS
                and DIST are unchanged, with values for omitted
                centroids replaced by NA.

     'DISPLAY'
          Display a text summary.
          'off'
               (Default) Display no summary.
          'final'
               Display a summary for each clustering operation.
          'iter'
               Display a summary for each iteration of a clustering
               operation.

     Example:

     [~,c] = kmeans (rand(10, 3), 2, "emptyaction", "singleton");

     See also: linkage.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 51
Perform a K-means clustering of the NxD table DATA.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 13
kruskalwallis


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2559
 -- statistics: P = kruskalwallis (X)
 -- statistics: P = kruskalwallis (X, GROUP)
 -- statistics: P = kruskalwallis (X, GROUP, DISPLAYOPT)
 -- statistics: [P, TBL] = kruskalwallis (X, ...)
 -- statistics: [P, TBL, STATS] = kruskalwallis (X, ...)

     Perform a Kruskal-Wallis test, the non-parametric alternative of a
     one-way analysis of variance (ANOVA), for comparing the means of
     two or more groups of data under the null hypothesis that the
     groups are drawn from the same population, i.e.  the group means
     are equal.

     kruskalwallis can take up to three input arguments:

        * X contains the data and it can either be a vector or matrix.
          If X is a matrix, then each column is treated as a separate
          group.  If X is a vector, then the GROUP argument is
          mandatory.
        * GROUP contains the names for each group.  If X is a matrix,
          then GROUP can either be a cell array of strings or a
          character array, with one row per column of X.  If you want to
          omit this argument, enter an empty array ([]).  If X is a
          vector, then GROUP must be a vector of the same length, or a
          string array or cell array of strings with one row for each
          element of X.  X values corresponding to the same value of
          GROUP are placed in the same group.
        * DISPLAYOPT is an optional parameter for displaying the groups
          contained in the data in a boxplot.  If omitted, it is 'on' by
          default.  If group names are defined in GROUP, these are used
          to identify the groups in the boxplot.  Use 'off' to omit
          displaying this figure.

     kruskalwallis can return up to three output arguments:

        * P is the p-value of the null hypothesis that all group means
          are equal.
        * TBL is a cell array containing the results in a standard ANOVA
          table.
        * STATS is a structure containing statistics useful for
          performing a multiple comparison of means with the MULTCOMPARE
          function.

     If kruskalwallis is called without any output arguments, then it
     prints the results in a one-way ANOVA table to the standard output.
     It is also printed when DISPLAYOPT is 'on'.

     Examples:

          x = meshgrid (1:6);
          x = x + normrnd (0, 1, 6, 6);
          [p, atab] = kruskalwallis(x);

          x = ones (50, 4) .* [-2, 0, 1, 5];
          x = x + normrnd (0, 2, 50, 4);
          group = {"A", "B", "C", "D"};
          kruskalwallis (x, group);


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a Kruskal-Wallis test, the non-parametric alternative of a
one-way an...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
kstest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3714
 -- statistics: H = kstest (X)
 -- statistics: H = kstest (X, NAME, VALUE)
 -- statistics: [H, P] = kstest (...)
 -- statistics: [H, P, KSSTAT, CV] = kstest (...)

     Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis
     test.

     'H = kstest (X)' performs a Kolmogorov-Smirnov (K-S) test to
     determine if a random sample X could have come from a standard
     normal distribution.  H indicates the results of the null
     hypothesis test.

        * H = 0 => Do not reject the null hypothesis at the 5%
          significance level.
        * H = 1 => Reject the null hypothesis at the 5% significance
          level.

     X is a vector representing a random sample from some unknown
     distribution with a cumulative distribution function F(X). Missing
     values declared as NaNs in X are ignored.

     'H = kstest (X, NAME, VALUE)' returns a test decision for a
     single-sample K-S test with additional options specified by one or
     more name-value pair arguments as shown below.

     "alpha"        A value ALPHA between 0 and 1 specifying the
                    significance level.  Default is 0.05 for 5%
                    significance.
                    
     "CDF"          CDF is the c.d.f.  under the null hypothesis.  It can be
                    specified either as a function handle or a function
                    name of an existing cdf function, or as a two-column
                    matrix.  If not provided, the default is the standard
                    normal, N(0,1).
                    
     "tail"         A string indicating the type of test:

        "unequal"      "F(X) not equal to CDF(X)" (two-sided) (Default)
                       
        "larger"       "F(X) > CDF(X)" (one-sided)
                       
        "smaller"      "CDF(X) < F(X)" (one-sided)

     Let S(X) be the empirical c.d.f.  estimated from the sample vector
     X, F(X) be the corresponding true (but unknown) population c.d.f.,
     and CDF be the known input c.d.f.  specified under the null
     hypothesis.  For 'tail' = "unequal", "larger", and "smaller", the
     test statistics are max|S(X) - CDF(X)|, max[S(X) - CDF(X)], and
     max[CDF(X) - S(X)], respectively.

     '[H, P] = kstest (...)' also returns the asymptotic p-value P.

     '[H, P, KSSTAT] = kstest (...)' also returns the K-S test
     statistic KSSTAT defined above for the test type indicated by the
     "tail" option.

     In the matrix version of CDF, column 1 contains the x-axis data and
     column 2 the corresponding y-axis c.d.f.  data.  Since the K-S test
     statistic will occur at one of the observations in X, the
     calculation is most efficient when CDF is only specified at the
     observations in X.  When column 1 of CDF represents x-axis points
     independent of X, CDF is linearly interpolated at the observations
     found in the vector X.  In this case, the interval along the x-axis
     (the column 1 spread of CDF) must span the observations in X for
     successful interpolation.

     The decision to reject the null hypothesis is based on comparing
     the p-value P with the "alpha" value, not by comparing the
     statistic KSSTAT with the critical value CV.  CV is computed
     separately using an approximate formula or by interpolation using
     Miller's approximation table.  The formula and table cover the
     range 0.01 <= "alpha" <= 0.2 for two-sided tests and 0.005 <=
     "alpha" <= 0.1 for one-sided tests.  CV is returned as NaN if
     "alpha" is outside this range.  Since CV is approximate, a
     comparison of KSSTAT with CV may occasionally lead to a different
     conclusion than a comparison of P with "alpha".
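
     For example, a sample drawn with standard deviation 2 should fail
     against the default N(0,1) null, while the standardized sample
     should pass:

          x = 2 * randn (100, 1);
          h = kstest (x)                          # likely 1
          h = kstest ((x - mean (x)) / std (x))   # likely 0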

     See also: kstest2, cdfplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 71
Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
kstest2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2241
 -- statistics: H = kstest2 (X1, X2)
 -- statistics: H = kstest2 (X1, X2, NAME, VALUE)
 -- statistics: [H, P] = kstest2 (...)
 -- statistics: [H, P, KS2STAT] = kstest2 (...)

     Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.

     'H = kstest2 (X1, X2)' returns a test decision for the null
     hypothesis that the data in vectors X1 and X2 are from the same
     continuous distribution, using the two-sample Kolmogorov-Smirnov
     test.  The alternative hypothesis is that X1 and X2 are from
     different continuous distributions.  The result H is 1 if the test
     rejects the null hypothesis at the 5% significance level, and 0
     otherwise.

     'H = kstest2 (X1, X2, NAME, VALUE)' returns a test decision for a
     two-sample Kolmogorov-Smirnov test with additional options
     specified by one or more name-value pair arguments as shown below.

     "alpha"        A value ALPHA between 0 and 1 specifying the
                    significance level.  Default is 0.05 for 5%
                    significance.
                    
     "tail"         A string indicating the type of test:

        "unequal"      "F(X1) not equal to F(X2)" (two-sided) [Default]
                       
        "larger"       "F(X1) > F(X2)" (one-sided)
                       
        "smaller"      "F(X1) < F(X2)" (one-sided)

     The two-sided test uses the maximum absolute difference between the
     cdfs of the distributions of the two data vectors.  The test
     statistic is 'D* = max(|F1(x) - F2(x)|)', where F1(x) is the
     proportion of X1 values less than or equal to x and F2(x) is the
     proportion of X2 values less than or equal to x.  The one-sided
     test uses the actual value of the difference between the cdfs of
     the distributions of the two data vectors rather than the absolute
     value.  The test statistic is 'D* = max(F1(x) - F2(x))' or 'D* =
     max(F2(x) - F1(x))' for 'tail' = "larger" or "smaller",
     respectively.

     '[H, P] = kstest2 (...)' also returns the asymptotic p-value P.

     '[H, P, KS2STAT] = kstest2 (...)' also returns the
     Kolmogorov-Smirnov test statistic KS2STAT defined above for the
     test type indicated by 'tail'.
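
     For example, comparing two samples whose distributions differ by a
     location shift:

          x1 = randn (100, 1);
          x2 = randn (120, 1) + 0.5;
          [h, p] = kstest2 (x1, x2)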

     See also: kstest, cdfplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 62
Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
levene_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2678
 -- statistics: H = levene_test (X)
 -- statistics: H = levene_test (X, GROUP)
 -- statistics: H = levene_test (X, ALPHA)
 -- statistics: H = levene_test (X, TESTTYPE)
 -- statistics: H = levene_test (X, GROUP, ALPHA)
 -- statistics: H = levene_test (X, GROUP, TESTTYPE)
 -- statistics: H = levene_test (X, GROUP, ALPHA, TESTTYPE)
 -- statistics: [H, PVAL] = levene_test (...)
 -- statistics: [H, PVAL, W] = levene_test (...)
 -- statistics: [H, PVAL, W, DF] = levene_test (...)

     Perform a Levene's test for the homogeneity of variances.

     Under the null hypothesis of equal variances, the test statistic W
     approximately follows an F distribution, with the degrees of
     freedom returned in DF as the vector [k-1, N-k].

     The p-value (1 minus the CDF of this distribution at W) is returned
     in PVAL.  H = 1 if the null hypothesis is rejected at the
     significance level of ALPHA.  Otherwise H = 0.

     Input Arguments:

        * X contains the data and it can either be a vector or matrix.
          If X is a matrix, then each column is treated as a separate
          group.  If X is a vector, then the GROUP argument is
          mandatory.  NaN values are omitted.

        * GROUP contains the names for each group.  If X is a vector,
          then GROUP must be a vector of the same length, or a string
          array or cell array of strings with one row for each element
          of X.  X values corresponding to the same value of GROUP are
          placed in the same group.  If X is a matrix, then GROUP can
          either be a cell array of strings or a character array, with
          one row per column of X in the same way it is used in 'anova1'
          function.  If X is a matrix, then GROUP can be omitted either
          by entering an empty array ([]) or by passing only ALPHA as a
          second argument (if required to change its default value).

        * ALPHA is the statistical significance value at which the null
          hypothesis is rejected.  Its default value is 0.05 and it can
          be passed either as a second argument (when GROUP is omitted)
          or as a third argument.

        * TESTTYPE is a string determining the type of Levene's test.
          By default it is set to "absolute", but the user can also
          parse "quadratic" in order to perform Levene's Quadratic test
          for equal variances or "median" in order to to perform the
          Brown-Forsythe's test.  These options determine how the Z_ij
          values are computed.  If an invalid name is parsed for
          TESTTYPE, then the Levene's Absolute test is performed.
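
     For example, two groups with clearly different variances:

          x = [randn(30, 1); 3 * randn(30, 1)];
          group = [ones(30, 1); 2 * ones(30, 1)];
          [h, pval, w, df] = levene_test (x, group)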

     See also: bartlett_test, vartest2, vartestn.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 57
Perform a Levene's test for the homogeneity of variances.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
linkage


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2998
 -- statistics: Y = linkage (D)
 -- statistics: Y = linkage (D, METHOD)
 -- statistics: Y = linkage (X)
 -- statistics: Y = linkage (X, METHOD)
 -- statistics: Y = linkage (X, METHOD, METRIC)
 -- statistics: Y = linkage (X, METHOD, ARGLIST)

     Produce a hierarchical clustering dendrogram.

     D is the dissimilarity matrix relative to n observations, formatted
     as a (n-1)*n/2x1 vector as produced by 'pdist'.  Alternatively, X
     contains data formatted for input to 'pdist', METRIC is a metric
     for 'pdist' and ARGLIST is a cell array containing arguments that
     are passed to 'pdist'.

     'linkage' starts by putting each observation into a singleton
     cluster and numbering those from 1 to n.  Then it merges two
     clusters, chosen according to METHOD, to create a new cluster
     numbered n+1, and so on until all observations are grouped into a
     single cluster numbered 2n-1.  Row k of the (n-1)x3 output matrix
     relates to cluster n+k: the first two columns are the numbers of
     the two component clusters and column 3 contains their distance.

     METHOD defines the way the distance between two clusters is
     computed and how they are recomputed when two clusters are merged:

     '"single" (default)'
          Distance between two clusters is the minimum distance between
          two elements belonging each to one cluster.  Produces a
          cluster tree known as minimum spanning tree.

     '"complete"'
          Furthest distance between two elements belonging each to one
          cluster.

     '"average"'
          Unweighted pair group method with averaging (UPGMA). The mean
          distance between all pairs of elements each belonging to one
          cluster.

     '"weighted"'
          Weighted pair group method with averaging (WPGMA). When two
          clusters A and B are joined together, the new distance to a
          cluster C is the mean between distances A-C and B-C.

     '"centroid"'
          Unweighted Pair-Group Method using Centroids (UPGMC). Assumes
          Euclidean metric.  The distance between cluster centroids,
          each centroid being the center of mass of a cluster.

     '"median"'
          Weighted pair-group method using centroids (WPGMC). Assumes
          Euclidean metric.  Distance between cluster centroids.  When
          two clusters are joined together, the new centroid is the
          midpoint between the joined centroids.

     '"ward"'
          Ward's sum of squared deviations about the group mean (ESS).
          Also known as minimum variance or inner squared distance.
          Assumes Euclidean metric.  How much the moment of inertia of
          the merged cluster exceeds the sum of those of the individual
          clusters.

     *Reference* Ward, J. H. Hierarchical Grouping to Optimize an
     Objective Function J. Am.  Statist.  Assoc.  1963, 58, 236-244,
     <http://iv.slis.indiana.edu/sw/data/ward.pdf>.
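
     For example, assuming the 'cluster' function from this package is
     available for cutting the resulting tree:

          X = [randn(10, 2); randn(10, 2) + 4];
          Y = linkage (X, "ward");            # X is a data matrix
          T = cluster (Y, "maxclust", 2);     # assign 2 clusters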

     See also: pdist, squareform.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 44
Produce a hierarchical clustering dendrogram.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 19
logistic_regression


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2316
 -- statistics: [INTERCEPT, SLOPE, DEV, DL, D2L, P, STATS] =
          logistic_regression (Y, X, PRINT, INTERCEPT, SLOPE)

     Perform ordinal logistic regression.

     Suppose Y takes values in k ordered categories, and let 'P_i (X)'
     be the cumulative probability that Y falls in one of the first i
     categories given the covariate X.  Then

          [INTERCEPT, SLOPE] = logistic_regression (Y, X)

     fits the model

          logit (P_i (X)) = X * SLOPE + INTERCEPT_i,   i = 1 ... k-1

     The number of ordinal categories, k, is taken to be the number of
     distinct values of 'round (Y)'.  If k equals 2, Y is binary and the
     model is ordinary logistic regression.  The matrix X is assumed to
     have full column rank.

     Given Y only, 'INTERCEPT = logistic_regression (Y)' fits the model
     with baseline logit odds only.

     The full form is

          [INTERCEPT, SLOPE, DEV, DL, D2L, P, STATS]
             = logistic_regression (Y, X, PRINT, INTERCEPT, SLOPE)

     in which all output arguments and all input arguments except Y are
     optional.

     Setting PRINT to 1 requests summary information about the fitted
     model to be displayed.  Setting PRINT to 2 requests information
     about convergence at each iteration.  Other values request no
     information to be displayed.  The input arguments INTERCEPT and
     SLOPE give initial estimates for INTERCEPT and SLOPE.

     The returned value DEV holds minus twice the log-likelihood.

     The returned values DL and D2L are the vector of first and the
     matrix of second derivatives of the log-likelihood with respect to
     INTERCEPT and SLOPE.

     P holds estimates for the conditional distribution of Y given X.

     STATS returns a structure that contains the following fields:
        * "intercept": intercept coefficients
        * "slope": slope coefficients
        * "dfe": degrees of freedom for error
        * "coeff": regression coefficients (intercepts and slopes)
        * "covb": estimated covariance matrix for coefficients (coeff)
        * "coeffcorr": correlation matrix for coeff
        * "se": standard errors of the coeff
        * "s": theoretical dispersion parameter
        * "z": z statistics for coeff
        * "pval": p-values for coeff
        * "resid": raw residuals
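
     As a minimal sketch (hypothetical data; assumes three ordered
     categories scored 1-3):

```matlab
% Sketch: ordinal logistic regression on made-up data
x = (1:9)';                    % single covariate
y = [1 1 1 2 2 2 3 3 3]';      % three ordered response categories
[intercept, slope] = logistic_regression (y, x);
```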


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 36
Perform ordinal logistic regression.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
logit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 180
 -- statistics: Y = logit (P)

     Compute the logit for each value of P.

     The logit is defined as

          logit (P) = log (P / (1-P))
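
     For example:

```matlab
p = [0.1 0.5 0.9];
y = logit (p)       % log (p ./ (1 - p)) = [-2.1972  0  2.1972]
```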

     See also: probit, logistic_cdf.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 37
Compute the logit for each value of P.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
lognstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1074
 -- statistics: [M, V] = lognstat (MU, SIGMA)

     Compute mean and variance of the lognormal distribution.

     Arguments
     ---------

        * MU is the first parameter of the lognormal distribution

        * SIGMA is the second parameter of the lognormal distribution.
          SIGMA must be positive or zero.

     MU and SIGMA must be of common size or one of them must be scalar.

     Return values
     -------------

        * M is the mean of the lognormal distribution

        * V is the variance of the lognormal distribution

     Examples
     --------

          mu = 0:0.2:1;
          sigma = 0.2:0.2:1.2;
          [m, v] = lognstat (mu, sigma)

          [m, v] = lognstat (0, sigma)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 56
Compute mean and variance of the lognormal distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
mahal


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 493
 -- statistics: D = mahal (Y, X)

     Mahalanobis' D-square distance.

     Return the Mahalanobis' D-square distance of the points in Y from
     the distribution implied by points X.

     Specifically, it uses a Cholesky decomposition to set

           answer(i) = (Y(i,:) - mean (X)) * inv (A) * (Y(i,:)-mean (X))'

     where A is the covariance of X.

     The data X and Y must have the same number of components (columns),
     but may have a different number of observations (rows).
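
     For example, a minimal sketch with a random reference sample:

```matlab
% Sketch: squared Mahalanobis distance of query points from sample X
X = randn (100, 2);        % reference sample, 2 variables
Y = [0 0; 5 5];            % two query points
D = mahal (Y, X);          % D(2) should greatly exceed D(1)
```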


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 31
Mahalanobis' D-square distance.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
manova1


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3111
 -- statistics: D = manova1 (X, GROUP)
 -- statistics: D = manova1 (X, GROUP, ALPHA)
 -- statistics: [D, P] = manova1 (...)
 -- statistics: [D, P, STATS] = manova1 (...)

     One-way multivariate analysis of variance (MANOVA).

     'D = manova1 (X, GROUP, ALPHA)' performs a one-way MANOVA for
     comparing the mean vectors of two or more groups of multivariate
     data.

     X is a matrix with each row representing a multivariate
     observation, and each column representing a variable.

     GROUP is a numeric vector, string array, or cell array of strings
     with the same number of rows as X.  X values are in the same group
     if they correspond to the same value of GROUP.

     ALPHA is the scalar significance level and is 0.05 by default.

     D is an estimate of the dimension of the group means.  It is the
     smallest dimension such that a test of the hypothesis that the
     means lie on a space of that dimension is not rejected.  If D = 0
     for example, we cannot reject the hypothesis that the means are the
     same.  If D = 1, we reject the hypothesis that the means are the
     same but we cannot reject the hypothesis that they lie on a line.

     '[D, P] = manova1 (...)' returns P, a vector of p-values for
     testing the null hypothesis that the mean vectors of the groups lie
     on various dimensions.  P(1) is the p-value for a test of dimension
     0, P(2) for dimension 1, etc.

     '[D, P, STATS] = manova1 (...)' returns a STATS structure with the
     following fields:

          "W"            within-group sum of squares and products matrix
          "B"            between-group sum of squares and products matrix
          "T"            total sum of squares and products matrix
          "dfW"          degrees of freedom for WSSP matrix
          "dfB"          degrees of freedom for BSSP matrix
          "dfT"          degrees of freedom for TSSP matrix
          "lambda"       value of Wilks' lambda (the test statistic)
          "chisq"        transformation of lambda to a chi-square
                         distribution
          "chisqdf"      degrees of freedom for chisq
          "eigenval"     eigenvalues of (WSSP^-1) * BSSP
          "eigenvec"     eigenvectors of (WSSP^-1) * BSSP; these are the
                         coefficients for canonical variables, and they are
                         scaled so the within-group variance of C is 1
          "canon"        canonical variables, equal to XC*eigenvec, where XC
                         is X with columns centered by subtracting their
                         means
          "mdist"        Mahalanobis distance from each point to its group
                         mean
          "gmdist"       Mahalanobis distances between each pair of group
                         means
          "gnames"       Group names

     The canonical variables C have the property that C(:,1) is the
     linear combination of the X columns that has the maximum separation
     between groups, C(:,2) has the maximum separation subject to it
     being orthogonal to C(:,1), and so on.
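
     A minimal sketch with two artificial groups:

```matlab
% Sketch: one-way MANOVA on two made-up groups
X = [randn(10, 2); randn(10, 2) + 3];       % multivariate observations
group = [ones(10, 1); 2 * ones(10, 1)];     % group labels
[d, p] = manova1 (X, group);                % d estimates the dimension
```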


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 51
One-way multivariate analysis of variance (MANOVA).



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 13
manovacluster


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 954
 -- statistics: manovacluster (STATS)
 -- statistics: manovacluster (STATS, METHOD)
 -- statistics: H = manovacluster (STATS)
 -- statistics: H = manovacluster (STATS, METHOD)

     Cluster group means using manova1 output.

     'manovacluster (STATS)' draws a dendrogram showing the clustering
     of group means, calculated using the output STATS structure from
     'manova1' and applying the single linkage algorithm.  See the
     'dendrogram' function for more information about the figure.

     'manovacluster (STATS, METHOD)' uses the METHOD algorithm in place
     of single linkage.  The available methods are:

          "single"       -- nearest distance
          "complete"     -- furthest distance
          "average"      -- average distance
          "centroid"     -- center of mass distance
          "ward"         -- inner squared distance

     'H = manovacluster (...)' returns a vector of line handles.

     See also: manova1.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 41
Cluster group means using manova1 output.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 12
mcnemar_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 580
 -- statistics: [PVAL, CHISQ, DF] = mcnemar_test (X)

     Perform a McNemar's test.

     For a square contingency table X of data cross-classified on the
     row and column variables, McNemar's test can be used for testing
     the null hypothesis of symmetry of the classification
     probabilities.

     Under the null, CHISQ is approximately distributed as chisquare
     with DF degrees of freedom.

     The p-value (1 minus the CDF of this distribution at CHISQ) is
     returned in PVAL.

     If no output argument is given, the p-value of the test is
     displayed.
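
     For example, with hypothetical paired before/after outcomes:

```matlab
% Sketch: McNemar's test on a hypothetical 2x2 contingency table
X = [59 6; 16 80];                     % paired cross-classification
[pval, chisq, df] = mcnemar_test (X);  % test symmetry of off-diagonals
```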


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 25
Perform a McNemar's test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
mhsample


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3388
 -- statistics: [SMPL, ACCEPT] = mhsample (START, NSAMPLES, PROPERTY,
          VALUE, ...)

     Draws NSAMPLES samples from a target stationary distribution PDF
     using the Metropolis-Hastings algorithm.

     Inputs:

        * START is a NCHAIN by DIM matrix of starting points for each
          Markov chain.  Each row is the starting point of a different
          chain and each column corresponds to a different dimension.

        * NSAMPLES is the number of samples, the length of each Markov
          chain.

     Some property-value pairs can or must be specified; they are:

     (Required) One of:

        * "pdf" PDF: a function handle of the target stationary
          distribution to be sampled.  The function should accept
          different locations in each row and each column corresponds to
          a different dimension.

          or

        * "logpdf" LOGPDF: a function handle of the log of the target
          stationary distribution to be sampled.  The function should
          accept different locations in each row and each column
          corresponds to a different dimension.

     If the optional argument SYMMETRIC is set to false (the default),
     one of:

        * "proppdf" PROPPDF: a function handle of the proposal
          distribution that is sampled from with PROPRND to give the
          next point in the chain.  The function should accept two
          inputs, the random variable and the current location; each
          input should accept different locations in each row, with
          each column corresponding to a different dimension.

          or

        * "logproppdf" LOGPROPPDF: the log of "proppdf".

     The following property/value pairs may be needed depending on the
     desired output:

        * "proprnd" PROPRND: (Required) a function handle which
          generates random numbers from PROPPDF.  The function should
          accept different locations in each row and each column
          corresponds to a different dimension corresponding with the
          current location.

        * "symmetric" SYMMETRIC: true or false based on whether PROPPDF
          is a symmetric distribution.  If true, PROPPDF (or LOGPROPPDF)
          need not be specified.  The default is false.

        * "burnin" BURNIN: the number of points to discard at the
          beginning.  The default is 0.

        * "thin" THIN: omits THIN-1 of every THIN points in the
          generated Markov chain.  The default is 1.

        * "nchain" NCHAIN: the number of Markov chains to generate.  The
          default is 1.

     Outputs:

        * SMPL: a NSAMPLES x DIM x NCHAIN tensor of random values drawn
          from PDF, where the rows are different random values, the
          columns correspond to the dimensions of PDF, and the third
          dimension corresponds to different Markov chains.

        * ACCEPT is a vector of the acceptance rate for each chain.

     Example: Sampling from a normal distribution

          start = 1;
          nsamples = 1e3;
          pdf = @(x) exp (-.5 * x .^ 2) / (pi ^ .5 * 2 ^ .5);
          proppdf = @(x,y) 1 / 6;
          proprnd = @(x) 6 * (rand (size (x)) - .5) + x;
          [smpl, accept] = mhsample (start, nsamples, "pdf", pdf, "proppdf", ...
          proppdf, "proprnd", proprnd, "thin", 4);
          histfit (smpl);

     See also: rand, slicesample.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Draws NSAMPLES samples from a target stationary distribution PDF using
Metrop...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 15
monotone_smooth


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1798
 -- statistics: YY = monotone_smooth (X, Y, H)

     Produce a smooth monotone increasing approximation to a sampled
     functional dependence.

     A kernel method is used (an Epanechnikov smoothing kernel is
     applied to y(x); this is integrated to yield the monotone
     increasing form.  See Reference 1 for details.)

     Arguments
     ---------

        * X is a vector of values of the independent variable.

        * Y is a vector of values of the dependent variable, of the same
          size as X.  For best performance, it is recommended that the Y
          already be fairly smooth, e.g.  by applying a kernel smoothing
          to the original values if they are noisy.

        * H is the kernel bandwidth to use.  If H is not given, a
          "reasonable" value is computed.

     Return values
     -------------

        * YY is the vector of smooth monotone increasing function values
          at X.

     Examples
     --------

          x = 0:0.1:10;
          # typically non-monotonic due to the added noise
          y = (x .^ 2) + 3 * randn (size (x));
          # crudely smoothed via moving average, still typically non-monotonic
          ys = ([y(1) y(1:(end-1))] + y + [y(2:end) y(end)]) / 3;
          yy = monotone_smooth (x, ys);  # yy is monotone increasing in x
          plot (x, y, '+', x, ys, x, yy)

     References
     ----------

       1. Holger Dette, Natalie Neumeyer and Kay F. Pilz (2006), A
          simple nonparametric estimator of a strictly monotone
          regression function, 'Bernoulli', 12:469-490
       2. Regine Scheder (2007), R Package 'monoProc', Version 1.0-6,
          <http://cran.r-project.org/web/packages/monoProc/monoProc.pdf>
          (The implementation here is based on the monoProc function
          mono.1d)


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Produce a smooth monotone increasing approximation to a sampled
functional de...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
multcompare


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7706
 -- statistics: C = multcompare (STATS)
 -- statistics: C = multcompare (STATS, "name", VALUE)
 -- statistics: [C, M] = multcompare (...)
 -- statistics: [C, M, H] = multcompare (...)
 -- statistics: [C, M, H, GNAMES] = multcompare (...)
 -- statistics: PADJ = multcompare (P)
 -- statistics: PADJ = multcompare (P, "ctype", CTYPE)

     Perform posthoc multiple comparison tests or p-value adjustments to
     control the family-wise error rate (FWER) or false discovery rate
     (FDR).

     'C = multcompare (STATS)' performs a multiple comparison using a
     STATS structure that is obtained as output from any of the
     following functions: anova1, anova2, anovan, kruskalwallis, and
     friedman.  The return value C is a matrix with one row per
     comparison and eight columns.  Columns 1-2 are the indices of the
     two samples being compared.  Columns 3-5 are a lower bound,
     estimate, and upper bound for their difference, where the bounds
     are for 95% confidence intervals.  Columns 6-8 are the multiplicity
     adjusted p-values for each individual comparison, the test
     statistic, and the degrees of freedom.  All tests by multcompare
     are two-tailed.

     multcompare can take a number of optional parameters as name-value
     pairs.

     '[...] = multcompare (STATS, "alpha", ALPHA)'

        * ALPHA sets the significance level of null hypothesis
          significance tests to ALPHA, and the central coverage of
          two-sided confidence intervals to 100*(1-ALPHA)%.  (Default
          ALPHA is 0.05).

     '[...] = multcompare (STATS, "ControlGroup", REF)'

        * REF is the index of the control group to limit comparisons to.
          The index must be a positive integer scalar value.  For each
          dimension (d) listed in DIM, multcompare uses
          STATS.grpnames{d}(idx) as the control group.  (Default is
          empty, i.e.  [], for full pairwise comparisons)

     '[...] = multcompare (STATS, "ctype", CTYPE)'

        * CTYPE is the type of comparison test to use.  In order of
          increasing power, the choices are: "bonferroni", "scheffe",
          "mvt", "holm" (default), "hochberg", "fdr", or "lsd".  The
          first five methods control the family-wise error rate.  The
          "fdr" method controls false discovery rate (by the original
          Benjamini-Hochberg step-up procedure).  The final method,
          "lsd" (or "none"), makes no attempt to control the Type 1
          error rate of multiple comparisons.  The coverage of
          confidence intervals are only corrected for multiple
          comparisons in the cases where CTYPE is "bonferroni",
          "scheffe" or "mvt", which control the Type 1 error rate for
          simultaneous inference.

          The "mvt" method uses the multivariate t distribution to
          assess the probability or critical value of the maximum
          statistic across the tests, thereby accounting for
          correlations among comparisons in the control of the
          family-wise error rate with simultaneous inference.  In the
          case of pairwise comparisons, it simulates Tukey's (or the
          Games-Howell) test, in the case of comparisons with a single
          control group, it simulates Dunnett's test.  CTYPE values
          "tukey-kramer" and "hsd" are recognised but set the value of
          CTYPE and REF to "mvt" and empty respectively.  A CTYPE value
          "dunnett" is recognised but sets the value of CTYPE to "mvt",
          and if REF is empty, sets REF to 1.  Since the algorithm uses
          a Monte Carlo method (of 1e+06 random samples), you can expect
          the results to fluctuate slightly with each call to
          multcompare and the calculations may be slow to complete for a
          large number of comparisons.  If the parallel package is
          installed and loaded, multcompare will automatically
          accelerate computations by parallel processing.  Note that
          p-values calculated by the "mvt" are truncated at 1e-06.

     '[...] = multcompare (STATS, "df", DF)'

        * DF is an optional scalar value to set the number of degrees of
          freedom in the calculation of p-values for the multiple
          comparison tests.  By default, this value is extracted from
          the STATS structure of the ANOVA test, but setting DF may be
          necessary to approximate Satterthwaite correction if anovan
          was performed using weights.

     '[...] = multcompare (STATS, "dim", DIM)'

        * DIM is a vector specifying the dimension or dimensions over
          which the estimated marginal means are to be calculated.  Used
          only if STATS comes from anovan.  The value [1 3], for
          example, computes the estimated marginal mean for each
          combination of the first and third predictor values.  The
          default is to compute over the first dimension (i.e.  1).  If
          the specified dimension is, or includes, a continuous factor
          then multcompare will return an error.

     '[...] = multcompare (STATS, "estimate", ESTIMATE)'

        * ESTIMATE is a string specifying the estimates to be compared
          when computing multiple comparisons after anova2; this
          argument is ignored by anovan and anova1.  Accepted values for
          ESTIMATE are either "column" (default) to compare column
          means, or "row" to compare row means.  If the model type in
          anova2 was "linear" or "nested" then only "column" is accepted
          for ESTIMATE since the row factor is assumed to be a random
          effect.

     '[...] = multcompare (STATS, "display", DISPLAY)'

        * DISPLAY is either "on" (the default): to display a table and
          graph of the comparisons (e.g.  difference between means),
          their 100*(1-ALPHA)% intervals and multiplicity adjusted
          p-values in APA style; or "off": to omit the table and graph.
          On the graph, markers and error bars colored red have
          multiplicity adjusted p-values < ALPHA, otherwise the markers
          and error bars are blue.

     '[...] = multcompare (STATS, "seed", SEED)'

        * SEED is a scalar value used to initialize the random number
          generator so that CTYPE "mvt" produces reproducible results.

     '[C, M, H, GNAMES] = multcompare (...)' returns additional outputs.
     M is a matrix where columns 1-2 are the estimated marginal means
     and their standard errors, and columns 3-4 are lower and upper
     bounds of the confidence intervals for the means; the critical
     value of the test statistic is scaled by a factor of 2^(-0.5)
     before multiplying by the standard errors of the group means so
     that the intervals overlap when the difference in means becomes
     significant at approximately the level ALPHA.  When ALPHA is 0.05,
     this corresponds to confidence intervals with 83.4% central
     coverage.  H is a handle to the figure containing the graph.
     GNAMES is a cell array with one row for each group, containing the
     names of the groups.

     'PADJ = multcompare (P)' calculates and returns adjusted p-values
     (PADJ) using the Holm step-down Bonferroni procedure to control the
     family-wise error rate.

     'PADJ = multcompare (P, "ctype", CTYPE)' calculates and returns
     adjusted p-values (PADJ) computed using the method CTYPE.  In order
     of increasing power, CTYPE for p-value adjustment can be either
     "bonferroni", "holm" (default), "hochberg", or "fdr".  See above
     for further information about the CTYPE methods.
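
     A minimal sketch following a one-way ANOVA (group effects made up
     for illustration; assumes anova1 returns its STATS structure as its
     third output):

```matlab
% Sketch: Holm-adjusted pairwise comparisons after anova1
y = [randn(8, 1); randn(8, 1) + 1; randn(8, 1) + 2];
g = [ones(8, 1); 2 * ones(8, 1); 3 * ones(8, 1)];
[p, tbl, stats] = anova1 (y, g);            % STATS feeds multcompare
c = multcompare (stats, "ctype", "holm");   % one row per comparison
```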

     See also: anova1, anova2, anovan, kruskalwallis, friedman, fitlm.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform posthoc multiple comparison tests or p-value adjustments to
control t...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
nanmax


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 372
 -- statistics: [V, IDX] = nanmax (X)
 -- statistics: [V, IDX] = nanmax (X, Y)

     Find the maximal element while ignoring NaN values.

     'nanmax' is identical to the 'max' function except that NaN values
     are ignored.  If all values in a column are NaN, the maximum is
     returned as NaN rather than [].
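
     For example:

```matlab
x = [2 NaN 5; NaN 1 NaN];
v = nanmax (x)        % column maxima ignoring NaNs: [2 1 5]
```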

     See also: max, nansum, nanmin, nanmean, nanmedian.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 51
Find the maximal element while ignoring NaN values.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
nanmin


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 372
 -- statistics: [V, IDX] = nanmin (X)
 -- statistics: [V, IDX] = nanmin (X, Y)

     Find the minimal element while ignoring NaN values.

     'nanmin' is identical to the 'min' function except that NaN values
     are ignored.  If all values in a column are NaN, the minimum is
     returned as NaN rather than [].

     See also: min, nansum, nanmax, nanmean, nanmedian.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 51
Find the minimal element while ignoring NaN values.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
nansum


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 522
 -- statistics: s = nansum (X)
 -- statistics: s = nansum (X, DIM)
 -- statistics: s = nansum (..., "native")
 -- statistics: s = nansum (..., "double")
 -- statistics: s = nansum (..., "extra")
     Compute the sum while ignoring NaN values.

     'nansum' is identical to the 'sum' function except that NaN values
     are treated as 0 and so ignored.  If all values are NaN, the sum is
     returned as 0.

     See help text of 'sum' for details on the options.
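
     For example:

```matlab
x = [1 NaN 3; NaN NaN 4];
s = nansum (x)        % NaNs treated as 0: [1 0 7]
```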

     See also: sum, nanmin, nanmax, nanmean, nanmedian.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 42
Compute the sum while ignoring NaN values.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
nbinstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1147
 -- statistics: [M, V] = nbinstat (N, P)

     Compute mean and variance of the negative binomial distribution.

     Arguments
     ---------

        * N is the first parameter of the negative binomial
          distribution.  The elements of N must be natural numbers.

        * P is the second parameter of the negative binomial
          distribution.  The elements of P must be probabilities.

     N and P must be of common size or one of them must be scalar.

     Return values
     -------------

        * M is the mean of the negative binomial distribution

        * V is the variance of the negative binomial distribution

     Examples
     --------

          n = 1:4;
          p = 0.2:0.2:0.8;
          [m, v] = nbinstat (n, p)

          [m, v] = nbinstat (n, 0.5)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 64
Compute mean and variance of the negative binomial distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
ncfstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 537
 -- statistics: [M, V] = ncfstat (DF1, DF2, DELTA)

     Mean and variance for the noncentral F distribution.

     '[M, V] = ncfstat (DF1, DF2, DELTA)' returns the mean and variance
     of the noncentral F distribution with DF1 and DF2 degrees of
     freedom and noncentrality parameter DELTA.

     The size of M and V is the common size of the input arguments.
     Scalar input arguments DF1, DF2, and DELTA are regarded as constant
     matrices of the same size as the other input.
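
     For example (parameter values chosen arbitrarily):

```matlab
% Sketch: moments of a noncentral F distribution
df1 = 5; df2 = 10; delta = 2;
[m, v] = ncfstat (df1, df2, delta);   % scalar mean and variance
```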

     See also: ncfcdf, ncfinv, ncfpdf, ncfrnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 52
Mean and variance for the noncentral F distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
nctstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 509
 -- statistics: [M, V] = nctstat (DF, DELTA)

     Mean and variance for the noncentral T distribution.

     '[M, V] = nctstat (DF, DELTA)' returns the mean and variance of the
     noncentral T distribution with DF degrees of freedom and
     noncentrality parameter DELTA.

     The size of M and V is the common size of the input arguments.
     Scalar input arguments DF and DELTA are regarded as constant
     matrices of the same size as the other input.

     See also: nctcdf, nctinv, nctpdf, nctrnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 52
Mean and variance for the noncentral T distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
ncx2stat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 533
 -- statistics: [M, V] = ncx2stat (DF, DELTA)

     Mean and variance for the noncentral chi-square distribution.

     '[M, V] = ncx2stat (DF, DELTA)' returns the mean and variance of
     the noncentral chi-square distribution with DF degrees of freedom
     and noncentrality parameter DELTA.

     The size of M and V is the common size of the input arguments.
     Scalar input arguments DF and DELTA are regarded as constant
     matrices of the same size as the other input.

     See also: ncx2cdf, ncx2inv, ncx2pdf, ncx2rnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 61
Mean and variance for the noncentral chi-square distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 22
normalise_distribution


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2073
 -- statistics: NORMALISED = normalise_distribution (DATA)
 -- statistics: NORMALISED = normalise_distribution (DATA, DISTRIBUTION)
 -- statistics: NORMALISED = normalise_distribution (DATA, DISTRIBUTION,
          DIMENSION)

     Transform a set of data so as to be N(0,1) distributed according to
     an idea by van Albada and Robinson.

     This is achieved by first passing it through its own cumulative
     distribution function (CDF) in order to get a uniform distribution,
     and then mapping the uniform to a normal distribution.

     The data must be passed as a vector or matrix in DATA.  If the CDF
     is unknown, then [] can be passed in DISTRIBUTION, and in this case
     the empirical CDF will be used.  Otherwise, if the CDFs for all
     data are known, they can be passed in DISTRIBUTION, either in the
     form of a single function name as a string, or a single function
     handle, or a cell array consisting of either all function names as
     strings, or all function handles.  In the latter case, the number
     of CDFs passed must match the number of rows, or columns
     respectively, to normalise.  If the data are passed as a matrix,
     then the transformation will operate either along the first
     non-singleton dimension, or along DIMENSION if present.

     Notes: The empirical CDF will map any two data sets that have the
     same size, with ties in the same places after sorting, to some
     permutation of the same normalised data:
          normalise_distribution([1 2 2 3 4])
          => -1.28  0.00  0.00  0.52  1.28

          normalise_distribution([1 10 100 10 1000])
          => -1.28  0.00  0.52  0.00  1.28

     Original source: S.J. van Albada, P.A. Robinson "Transformation of
     arbitrary distributions to the normal distribution with application
     to EEG test-retest reliability" Journal of Neuroscience Methods,
     Volume 161, Issue 2, 15 April 2007, Pages 205-211 ISSN 0165-0270,
     10.1016/j.jneumeth.2006.11.004.
     (http://www.sciencedirect.com/science/article/pii/S0165027006005668)


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Transform a set of data so as to be N(0,1) distributed according to an
idea b...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
normlike


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1462
 -- statistics: NLOGL = normlike (PARAMS, DATA)
 -- statistics: [NLOGL, AVAR] = normlike (PARAMS, DATA)
 -- statistics: [...] = normlike (PARAMS, DATA, CENSOR)
 -- statistics: [...] = normlike (PARAMS, DATA, CENSOR, FREQ)

     Negative log-likelihood for the normal distribution.

     'NLOGL = normlike (PARAMS, DATA)' returns the negative of the
     log-likelihood for the normal distribution, evaluated at parameters
     PARAMS(1) = mean and PARAMS(2) = standard deviation, given DATA.
     NLOGL is a scalar.

     '[NLOGL, AVAR] = normlike (PARAMS, DATA)' returns the inverse of
     Fisher's information matrix, AVAR.  If the input parameter values
     in PARAMS are the maximum likelihood estimates, the diagonal
     elements of AVAR are their asymptotic variances.  AVAR is based on
     the observed Fisher's information, not the expected information.

     '[...] = normlike (PARAMS, DATA, CENSOR)' accepts a boolean vector
     of the same size as DATA that is 1 for observations that are
     right-censored and 0 for observations that are observed exactly.

     '[...] = normlike (PARAMS, DATA, CENSOR, FREQ)' accepts a frequency
     vector of the same size as DATA.  FREQ typically contains integer
     frequencies for the corresponding elements in DATA, but it may
     contain any non-integer non-negative values.  Pass in [] for CENSOR
     to use its default value.
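
     As an illustrative sketch (the data values are arbitrary), the
     value returned by 'normlike' equals the negative sum of the
     log-densities of the observations:

```octave
% Negative log-likelihood of a small sample under a normal model
% with mu = 0 and sigma = 1, passed as PARAMS = [mu, sigma].
data = [-1 0 1];
nlogl = normlike ([0, 1], data);
% The same value computed directly from the density:
nlogl_direct = -sum (log (normpdf (data, 0, 1)));
```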

     See also: normcdf, norminv, normpdf, normrnd, normfit, normstat.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 52
Negative log-likelihood for the normal distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
normplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 766
 -- Function File: normplot (X)
 -- Function File: normplot (AX, X)
 -- Function File: H = normplot (...)

     Produce normal probability plot of the data in X.  If X is a
     matrix, 'normplot' plots the data for each column.  NaN values are
     ignored.

     'H = normplot (AX, X)' takes an axes handle AX in addition to the
     data in X and uses those axes for plotting.  The handle of an
     existing plot can be obtained with 'gca'.

     The line joining the 1st and 3rd quartiles is drawn solid whereas
     its extensions to both ends are dotted.  If the underlying distribution
     is normal, the points will cluster around the solid part of the
     line.  Other distribution types will introduce curvature in the
     plot.

     See also: cdfplot, wblplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 49
Produce normal probability plot of the data in X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
normstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1006
 -- statistics: [MN, V] = normstat (M, S)

     Compute mean and variance of the normal distribution.

     Arguments
     ---------

        * M is the mean of the normal distribution

        * S is the standard deviation of the normal distribution.  S
          must be positive.

     M and S must be of common size or one of them must be scalar.

     Return values
     -------------

        * MN is the mean of the normal distribution

        * V is the variance of the normal distribution

     Examples
     --------

          m = 1:6;
          s = 0:0.2:1;
          [mn, v] = normstat (m, s)

          [mn, v] = normstat (0, s)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 53
Compute mean and variance of the normal distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
optimalleaforder


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1297
 -- statistics: LEAFORDER = optimalleaforder (TREE, D)
 -- statistics: LEAFORDER = optimalleaforder (..., NAME, VALUE)

     Compute the optimal leaf ordering of a hierarchical binary cluster
     tree.

     The optimal leaf ordering of a tree is the ordering which minimizes
     the sum of the distances between each leaf and its adjacent leaves,
     without altering the structure of the tree, that is without
     redefining the clusters of the tree.

     Required inputs:
        * TREE: a hierarchical cluster tree TREE generated by the
          'linkage' function.

        * D: a matrix of distances as computed by 'pdist'.

     Optional inputs can be the following property/value pairs:
        * property 'Criteria' at the moment can only have the value
          'adjacent', for minimizing the distances between leaves.

        * property 'Transformation' can have one of the values 'linear',
          'inverse' or a handle to a custom function which computes S
          the similarity matrix.

     optimalleaforder's output LEAFORDER is the optimal leaf ordering.

     *Reference* Bar-Joseph, Z., Gifford, D.K., and Jaakkola, T.S. Fast
     optimal leaf ordering for hierarchical clustering.  Bioinformatics
     vol.  17 suppl.  1, 2001.

See also: dendrogram, linkage, pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 72
Compute the optimal leaf ordering of a hierarchical binary cluster tree.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3
pca


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3241
 -- statistics: COEFF = pca(X)
 -- statistics: COEFF = pca(X, Name, Value)
 -- statistics: [COEFF, SCORE, LATENT] = pca(...)
 -- statistics: [COEFF, SCORE, LATENT, TSQUARED] = pca(...)
 -- statistics: [COEFF, SCORE, LATENT, TSQUARED, EXPLAINED, MU] =
          pca(...)

     Performs a principal component analysis on a data matrix X

     A principal component analysis of a data matrix of 'n' observations
     in a 'p'-dimensional space returns a 'p'-by-'p' transformation
     matrix, to perform a change of basis on the data.  The first
     component of the new basis is the direction that maximizes the
     variance of the projected data.

     Input argument:
        * X : a 'n'-by-'p' data matrix

     Pair arguments:
        * 'Algorithm' : the algorithm to use, it can be either 'eig',
          for eigenvalue decomposition, or 'svd' (default), for singular
          value decomposition
        * 'Centered' : boolean indicator for centering the observation
          data, it is 'true' by default
        * 'Economy' : boolean indicator for the economy size output, it
          is 'true' by default; 'pca' returns only the elements of
          LATENT that are not necessarily zero, and the corresponding
          columns of COEFF and SCORE, that is, when 'n <= p', only the
          first 'n - 1'
        * 'NumComponents' : the number of components 'k' to return, if
          'k < p', then only the first 'k' columns of COEFF and SCORE
          are returned
        * 'Rows' : the action to take with missing values.  It can be:
          'complete' (default), missing values are removed before
          computation; 'pairwise' (only with algorithm 'eig'), the
          covariance of rows with missing data is computed using the
          available data, but the covariance matrix could be not
          positive definite, which triggers the termination of 'pca';
          'all', missing values are not allowed and 'pca' terminates
          with an error if there are any
        * 'Weights' : observation weights, it is a vector of positive
          values of length 'n'
        * 'VariableWeights' : variable weights, it can be either a
          vector of positive values of length 'p' or the string
          'variance' to use the sample variance as weights

     Return values:
        * COEFF : the principal component coefficients, a 'p'-by-'p'
          transformation matrix
        * SCORE : the principal component scores, the representation of
          X in the principal component space
        * LATENT : the principal component variances, i.e., the
          eigenvalues of the covariance matrix of X
        * TSQUARED : Hotelling's T-squared Statistic for each
          observation in X
        * EXPLAINED : the percentage of the variance explained by each
          principal component
        * MU : the estimated mean of each variable of X, it is zero if
          the data are not centered

     Matlab compatibility note: the alternating least square method
     'als' and associated options 'Coeff0', 'Score0', and 'Options' are
     not yet implemented
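
     A short sketch of a typical call (the data matrix is an arbitrary
     illustration):

```octave
% PCA of 5 observations in 3 variables.
X = [2 4 1; 3 5 2; 4 7 2; 5 8 3; 6 10 3];
[coeff, score, latent] = pca (X);
% latent holds the variances of the scores in decreasing order;
% the percentage of variance explained by each component is:
explained = 100 * latent / sum (latent);
```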

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition,
          Springer, 2002


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 58
Performs a principal component analysis on a data matrix X



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
pcacov


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 635
 -- statistics: COEFF = pcacov(X)
 -- statistics: [COEFF,LATENT] = pcacov(X)
 -- statistics: [COEFF,LATENT,EXPLAINED] = pcacov(X)

     Perform principal component analysis on the nxn covariance matrix X

        * COEFF : a nxn matrix with columns containing the principal
          component coefficients
        * LATENT : a vector containing the principal component variances
        * EXPLAINED : a vector containing the percentage of the total
          variance explained by each principal component
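
     For instance (with an arbitrary illustrative data set), a
     covariance matrix can be computed first and then passed to
     'pcacov':

```octave
% PCA starting from a covariance matrix rather than raw data.
S = cov ([2 4 1; 3 5 2; 4 7 2; 5 8 3; 6 10 3]);
[coeff, latent, explained] = pcacov (S);
```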

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition,
          Springer, 2002


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 67
Perform principal component analysis on the nxn covariance matrix X



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
pcares


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 471
 -- statistics: [RESIDUALS, RECONSTRUCTED] = pcares (X, NDIM)

     Calculate residuals from principal component analysis

        * X : N x P Matrix with N observations and P variables, the
          variables will be mean centered
        * NDIM : Is a scalar indicating the number of principal
          components to use and should be <= P
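
     A minimal sketch (arbitrary illustrative data):

```octave
% Residuals of X after retaining only the first principal component.
X = [2 4 1; 3 5 2; 4 7 2; 5 8 3; 6 10 3];
[residuals, reconstructed] = pcares (X, 1);
```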

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition,
          Springer, 2002


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 52
Calculate residuals from principal component analysis



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3
pdf


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4274
 -- statistics: RETVAL = pdf (NAME, X, ...)

     Return probability density function of NAME function for value X.

     This is a wrapper around various NAMEpdf and NAME_pdf functions.
     See the individual functions help to learn the signification of the
     arguments after X.  Supported functions and corresponding number of
     additional arguments are:

       function               alternative                      args
     -------------------------------------------------------------------------
       "bbs"                  "Birnbaum-Saunders"              3
       "beta"                                                  2
       "bino"                 "binomial"                       2
       "burr"                 "Burr"                           3
       "cauchy"               "Cauchy"                         0 defaults:
                                                               loc=0,scale=1
       "cauchy"               "Cauchy"                         2
       "chi2"                 "chi-square"                     1
       "copula"               "Copula family"                  2
       "discrete"             "univariate discrete"            2
       "empirical"            "univariate empirical"           2
       "exp"                  "exponential"                    1
       "f"                                                     2
       "gam"                  "gamma"                          2
       "geo"                  "geometric"                      1
       "gev"                  "generalized extreme value"      3
       "gp"                   "generalized Pareto"             3
       "hyge"                 "hypergeometric"                 3
       "iwish"                "inverse Wishart"                2
       "iwish"                "inverse Wishart"                3 set
                                                               log_y=true
       "jsu"                  "Johnson SU"                     2
       "laplace"              "Laplace"                        0
       "logistic"                                              0
       "logn"                 "lognormal"                      0 defaults:
                                                               mu=0,sigma=1
       "logn"                 "lognormal"                      2
       "mn"                   "multinomial"                    1
       "mvn"                  "multivariate normal"            0 defaults:
                                                               mu=0,sigma=1
       "mvn"                  "multivariate normal"            1 defaults:
                                                               sigma=1
       "mvn"                  "multivariate normal"            2
       "mvt"                  "multivariate Student"           2
       "naka"                 "Nakagami"                       2
       "nbin"                 "negative binomial"              2
       "norm"                 "normal"                         2
       "poiss"                "Poisson"                        1
       "rayl"                 "Rayleigh"                       1
       "stdnormal"            "standard normal"                0
       "t"                                                     1
       "tri"                  "triangular"                     3
       "unid"                 "uniform discrete"               1
       "unif"                 "uniform"                        0 defaults:
                                                               a=0,b=1
       "unif"                 "uniform"                        2
       "vm"                   "Von Mises"                      2
       "wbl"                  "Weibull"                        0 defaults:
                                                               scale=1,shape=1
       "wbl"                  "Weibull"                        1 defaults:
                                                               shape=1
       "wbl"                  "Weibull"                        2
       "wish"                 "Wishart"                        2
       "wish"                 "Wishart"                        3 set
                                                               log_y=true
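
     For example, the wrapper call 'pdf ("norm", X, MU, SIGMA)' gives
     the same densities as calling 'normpdf' directly:

```octave
x = linspace (-2, 2, 5);
y1 = pdf ("norm", x, 0, 1);   % via the wrapper
y2 = normpdf (x, 0, 1);       % direct call
```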

     See also: cdf, rnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 65
Return probability density function of NAME function for value X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
pdist


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2694
 -- statistics: Y = pdist (X)
 -- statistics: Y = pdist (X, METRIC)
 -- statistics: Y = pdist (X, METRIC, METRICARG, ...)

     Return the distance between any two rows in X.

     X is the NxD matrix representing N row vectors of size D.

     The output is a dissimilarity matrix formatted as a row vector Y,
     (n-1)*n/2 long, where the distances are in the order [(1, 2) (1, 3)
     ... (2, 3) ... (n-1, n)].  You can use the 'squareform' function to
     display the distances between the vectors arranged into an NxN
     matrix.

     'metric' is an optional argument specifying how the distance is
     computed.  It can be any of the following ones, defaulting to
     "euclidean", or a user defined function that takes two arguments X
     and Y plus any number of optional arguments, where X is a row
     vector and Y is a matrix having the same number of columns as
     X.  'metric' returns a column vector where row I is the distance
     between X and row I of Y.  Any additional arguments after the
     'metric' are passed as metric (X, Y, METRICARG1, METRICARG2 ...).

     Predefined distance functions are:

     '"euclidean"'
          Euclidean distance (default).

     '"squaredeuclidean"'
          Squared Euclidean distance.  It omits the square root from the
          calculation of the Euclidean distance.  It does not satisfy
          the triangle inequality.

     '"seuclidean"'
          Standardized Euclidean distance.  Each coordinate in the sum
          of squares is inverse weighted by the sample variance of that
          coordinate.

     '"mahalanobis"'
          Mahalanobis distance: see the function mahalanobis.

     '"cityblock"'
          City Block metric, aka Manhattan distance.

     '"minkowski"'
          Minkowski metric.  Accepts a numeric parameter P: for P=1 this
          is the same as the cityblock metric, with P=2 (default) it is
          equal to the euclidean metric.

     '"cosine"'
          One minus the cosine of the included angle between rows, seen
          as vectors.

     '"correlation"'
          One minus the sample correlation between points (treated as
          sequences of values).

     '"spearman"'
          One minus the sample Spearman's rank correlation between
          observations, treated as sequences of values.

     '"hamming"'
          Hamming distance: the fraction of coordinates that differ.

     '"jaccard"'
          One minus the Jaccard coefficient: the fraction of nonzero
          coordinates that differ.

     '"chebychev"'
          Chebychev distance: the maximum coordinate difference.
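
     For example, for three points in the plane:

```octave
% pdist returns the pairwise distances in the order
% (1,2), (1,3), (2,3).
X = [0 0; 3 4; 6 8];
Y = pdist (X);          % => [5 10 5]
D = squareform (Y);     % 3x3 symmetric distance matrix
```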

     See also: linkage, mahalanobis, squareform, pdist2.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Return the distance between any two rows in X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
pdist2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1737
 -- statistics: D = pdist2 (X, Y)
 -- statistics: D = pdist2 (X, Y, METRIC)

     Compute pairwise distance between two sets of vectors.

     Let X be an MxP matrix representing m points in P-dimensional space
     and Y be an NxP matrix representing another set of points in the
     same space.  This function computes the M-by-N distance matrix D
     where 'D(i,j)' is the distance between 'X(i,:)' and 'Y(j,:)'.

     The optional argument METRIC can be used to select different
     distances:

     "euclidean" (default)

     "sqeuclidean"
          Compute the squared euclidean distance, i.e., the euclidean
          distance without the final square root.  This is ideal when the
          interest is on the order of the euclidean distances rather
          than the actual distance value because it performs
          significantly faster while preserving the order.

     "chisq"
          The chi-squared distance between two vectors is defined as:
          'd(x, y) = sum ((xi-yi)^2 / (xi+yi)) / 2'.  The chi-squared
          distance is useful when comparing histograms.

     "cosine"
          Distance is defined as the cosine of the angle between two
          vectors.

     "emd"
          Earth Mover's Distance (EMD) between positive vectors
          (histograms).  Note for 1D, with all histograms having equal
          weight, there is a simple closed form for the calculation of
          the EMD. The EMD between histograms X and Y is given by 'sum
          (abs (cdf (x) - cdf (y)))', where 'cdf' is the cumulative
          distribution function (computed simply by 'cumsum').

     "L1"
          The L1 distance between two vectors is defined as: 'sum (abs
          (x-y))'
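
     A small illustration:

```octave
% Distances from each row of X to the single point Y.
X = [0 0; 1 1];
Y = [3 4];
D = pdist2 (X, Y);      % 2x1 matrix: D(i,1) = distance X(i,:) -> Y
```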

     See also: pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Compute pairwise distance between two sets of vectors.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
plsregress


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 560
 -- statistics: [XLOAD, YLOAD, XSCORE, YSCORE, COEF, FITTED] =
          plsregress(X, Y, NCOMP)

     Calculate partial least squares regression using SIMPLS algorithm.

        * X: Matrix of observations
        * Y: Is a vector or matrix of responses
        * NCOMP: number of components used for modelling
        * X and Y will be mean centered to improve accuracy
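
     A usage sketch with synthetic data (the response is an arbitrary
     linear combination of two predictors plus noise):

```octave
X = randn (20, 5);
Y = X(:,1) - 2 * X(:,3) + 0.1 * randn (20, 1);
% Fit a PLS model with 2 components.
[xload, yload, xscore, yscore, coef, fitted] = plsregress (X, Y, 2);
```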

     References
     ----------

       1. SIMPLS: An alternative approach to partial least squares
          regression.  Chemometrics and Intelligent Laboratory Systems
          (1993)


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 66
Calculate partial least squares regression using SIMPLS algorithm.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
poisstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 916
 -- statistics: [M, V] = poisstat (LAMBDA)

     Compute mean and variance of the Poisson distribution.

     Arguments
     ---------

        * LAMBDA is the parameter of the Poisson distribution.  The
          elements of LAMBDA must be positive

     Return values
     -------------

        * M is the mean of the Poisson distribution

        * V is the variance of the Poisson distribution

     Example
     -------

          lambda = 1 ./ (1:6);
          [m, v] = poisstat (lambda)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.

     See also: poisscdf, poissinv, poisspdf, poissrnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Compute mean and variance of the Poisson distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
ppplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 959
 -- statistics: ppplot (X, DIST)
 -- statistics: ppplot (X, DIST, PARAMS)
 -- statistics: [P, Y] = ppplot (X, DIST, PARAMS)

     Perform a PP-plot (probability plot).

     If F is the CDF of the distribution DIST with parameters PARAMS and
     X a sample vector of length N, the PP-plot graphs ordinate Y(I) = F
     (I-th largest element of X) versus abscissa P(I) = (I - 0.5)/N.  If
     the sample comes from F, the pairs will approximately follow a
     straight line.

     The default for DIST is the standard normal distribution.

     The optional argument PARAMS contains a list of parameters of DIST.

     For example, for a probability plot of the uniform distribution on
     [2,4] and X, use

          ppplot (x, "uniform", 2, 4)

     DIST can be any string for which a function DIST_CDF that
     calculates the CDF of distribution DIST exists.

     If no output is requested then the data are plotted immediately.

     See also: qqplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 37
Perform a PP-plot (probability plot).



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
princomp


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1236
 -- statistics: COEFF = princomp (X)
 -- statistics: [COEFF, SCORE] = princomp (X)
 -- statistics: [COEFF, SCORE, LATENT] = princomp (X)
 -- statistics: [COEFF, SCORE, LATENT, TSQUARE] = princomp (X)
 -- statistics: [...] = princomp (X, "econ")

     Performs a principal component analysis on a NxP data matrix X

        * COEFF : returns the principal component coefficients
        * SCORE : returns the principal component scores, the
          representation of X in the principal component space
        * LATENT : returns the principal component variances, i.e., the
          eigenvalues of the covariance matrix X.
        * TSQUARE : returns Hotelling's T-squared Statistic for each
          observation in X
        * [...]  = princomp(X,'econ') returns only the elements of
          latent that are not necessarily zero, and the corresponding
          columns of COEFF and SCORE, that is, when n <= p, only the
          first n-1.  This can be significantly faster when p is much
          larger than n.  In this case the svd will be applied on the
          transpose of the data matrix X

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition,
          Springer, 2002


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 62
Performs a principal component analysis on a NxP data matrix X



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
probit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 184
 -- statistics: Y = probit (P)

     Probit transformation

     Return the probit (the quantile of the standard normal
     distribution) for each element of P.
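
     For example:

```octave
probit (0.5)                    % => 0
p = probit ([0.025, 0.975]);    % approximately [-1.96, 1.96]
```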

     See also: logit.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 21
Probit transformation



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
prop_test_2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 886
 -- statistics: [PVAL, Z] = prop_test_2 (X1, N1, X2, N2, ALT)

     Compare two proportions

     If X1 and N1 are the counts of successes and trials in one sample,
     and X2 and N2 those in a second one, test the null hypothesis that
     the success probabilities P1 and P2 are the same.

     Under the null, the test statistic Z approximately follows a
     standard normal distribution.

     With the optional argument string ALT, the alternative of interest
     can be selected.  If ALT is "!=" or "<>", the null is tested
     against the two-sided alternative P1 != P2.  If ALT is ">", the
     one-sided alternative P1 > P2 is used.  Similarly for "<", the
     one-sided alternative P1 < P2 is used.  The default is the
     two-sided case.

     The p-value of the test is returned in PVAL.

     If no output argument is given, the p-value of the test is
     displayed.
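
     For example, to test whether 45 successes in 100 trials and 30
     successes in 100 trials are consistent with a common success
     probability (the counts are illustrative):

```octave
[pval, z] = prop_test_2 (45, 100, 30, 100);
```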


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 23
Compare two proportions



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
qqplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1200
 -- statistics: [Q, S] = qqplot (X)
 -- statistics: [Q, S] = qqplot (X, Y)
 -- statistics: [Q, S] = qqplot (X, DIST)
 -- statistics: [Q, S] = qqplot (X, Y, PARAMS)
 -- statistics: qqplot (...)

     Perform a QQ-plot (quantile plot).

     If F is the CDF of the distribution DIST with parameters PARAMS and
     G its inverse, and X a sample vector of length N, the QQ-plot
     graphs ordinate S(I) = I-th largest element of X versus abscissa
     Q(I) = G((I - 0.5)/N).

     If the sample comes from F, except for a transformation of location
     and scale, the pairs will approximately follow a straight line.

     If the second argument is a vector Y the empirical CDF of Y is used
     as DIST.

     The default for DIST is the standard normal distribution.  The
     optional argument PARAMS contains a list of parameters of DIST.
     For example, for a quantile plot of the uniform distribution on
     [2,4] and X, use

          qqplot (x, "unif", 2, 4)

     DIST can be any string for which a function DISTINV or DIST_INV
     exists that calculates the inverse CDF of distribution DIST.

     If no output arguments are given, the data are plotted directly.

     See also: ppplot.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 34
Perform a QQ-plot (quantile plot).



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
qrandn


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 500
 -- statistics: Z = qrandn (Q, R,C)
 -- statistics: Z = qrandn (Q, [R,C])

     Returns random deviates drawn from a q-Gaussian distribution.

     Parameter Q characterizes the q-Gaussian distribution.  The result
     has the size indicated by R and C.

     Reference: W. Thistleton, J. A. Marsh, K. Nelson, C. Tsallis (2006)
     "Generalized Box-Muller method for generating q-Gaussian random
     deviates" arXiv:cond-mat/0605570
     http://arxiv.org/abs/cond-mat/0605570
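
     A minimal sketch:

```octave
% Draw 1000 q-Gaussian deviates with q = 1.5; as q approaches 1
% the distribution approaches the standard normal.
z = qrandn (1.5, 1000, 1);
```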

     See also: rand, randn.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 61
Returns random deviates drawn from a q-Gaussian distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
random


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3126
 -- statistics: R = random(NAME, ARG1)
 -- statistics: R = random(NAME, ARG1, ARG2)
 -- statistics: R = random(NAME, ARG1, ARG2, ARG3)
 -- statistics: R = random(NAME, ..., S1, ...)

     Generates pseudo-random numbers from a given one-, two-, or
     three-parameter distribution.

     The variable NAME must be a string that names the distribution from
     which to sample.  If this distribution is a one-parameter
     distribution ARG1 should be supplied, if it is a two-parameter
     distribution ARG2 must also be supplied, and if it is a
     three-parameter distribution ARG3 must also be present.  Any
     arguments following the distribution parameters will determine the
     size of the result.

     As an example, the following code generates a 10 by 20 matrix
     containing random numbers from a normal distribution with mean 5
     and standard deviation 2.
          R = random("normal", 5, 2, [10, 20]);

     The variable NAME can be one of the following strings

     "beta"
     "beta distribution"
          Samples are drawn from the Beta distribution.
     "bino"
     "binomial"
     "binomial distribution"
          Samples are drawn from the Binomial distribution.
     "chi2"
     "chi-square"
     "chi-square distribution"
          Samples are drawn from the Chi-Square distribution.
     "exp"
     "exponential"
     "exponential distribution"
          Samples are drawn from the Exponential distribution.
     "f"
     "f distribution"
          Samples are drawn from the F distribution.
     "gam"
     "gamma"
     "gamma distribution"
          Samples are drawn from the Gamma distribution.
     "geo"
     "geometric"
     "geometric distribution"
          Samples are drawn from the Geometric distribution.
     "hyge"
     "hypergeometric"
     "hypergeometric distribution"
          Samples are drawn from the Hypergeometric distribution.
     "logn"
     "lognormal"
     "lognormal distribution"
          Samples are drawn from the Log-Normal distribution.
     "nbin"
     "negative binomial"
     "negative binomial distribution"
          Samples are drawn from the Negative Binomial distribution.
     "norm"
     "normal"
     "normal distribution"
          Samples are drawn from the Normal distribution.
     "poiss"
     "poisson"
     "poisson distribution"
          Samples are drawn from the Poisson distribution.
     "rayl"
     "rayleigh"
     "rayleigh distribution"
          Samples are drawn from the Rayleigh distribution.
     "t"
     "t distribution"
          Samples are drawn from the T distribution.
     "unif"
     "uniform"
     "uniform distribution"
          Samples are drawn from the Uniform distribution.
     "unid"
     "discrete uniform"
     "discrete uniform distribution"
          Samples are drawn from the Uniform Discrete distribution.
     "wbl"
     "weibull"
     "weibull distribution"
          Samples are drawn from the Weibull distribution.

     See also: rand, betarnd, binornd, chi2rnd, exprnd, frnd, gamrnd,
     geornd, hygernd, lognrnd, nbinrnd, normrnd, poissrnd, raylrnd,
     trnd, unifrnd, unidrnd, wblrnd.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Generates pseudo-random numbers from a given one-, two-, or
three-parameter d...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
randsample


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 679
 -- statistics: Y = randsample (V, K)
 -- statistics: Y = randsample (V, K, REPLACEMENT=false)
 -- statistics: Y = randsample (V, K, REPLACEMENT=false, [W=[]])

     Sample elements from a vector.

     Returns K random elements from a vector V with N elements, sampled
     without or with REPLACEMENT.

     If V is a scalar, samples from 1:V.

     If a weight vector W of the same size as V is specified, the
     probability of each element being sampled is proportional to W.
     Unlike Matlab's function of the same name, this can be done for
     sampling with or without replacement.

     Randomization is performed using rand().
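
     Example (a minimal sketch; the weights are illustrative):

          v = 1:5;
          w = [1, 1, 1, 1, 10];             # element 5 is 10 times more likely
          y = randsample (v, 3, false, w)   # 3 elements, without replacement
          z = randsample (10, 4, true)      # 4 draws from 1:10, with replacement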

     See also: datasample, randperm.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 30
Sample elements from a vector.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
ranksum


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3053
 -- statistics: P = ranksum (X, Y)
 -- statistics: P = ranksum (X, Y, ALPHA)
 -- statistics: P = ranksum (X, Y, ALPHA, NAME, VALUE)
 -- statistics: P = ranksum (X, Y, NAME, VALUE)
 -- statistics: [P, H] = ranksum (X, Y, ...)
 -- statistics: [P, H, STATS] = ranksum (X, Y, ...)

     Wilcoxon rank sum test for equal medians.  This test is equivalent
     to a Mann-Whitney U-test.

     'P = ranksum (X, Y)' returns the p-value of a two-sided Wilcoxon
     rank sum test.  It tests the null hypothesis that two independent
     samples, in the vectors X and Y, come from continuous distributions
     with equal medians, against the alternative hypothesis that they
     are not.  X and Y can have different lengths and the test assumes
     that they are independent.

     'ranksum' treats NaN in X, Y as missing values.  The two-sided
     p-value is computed by doubling the most significant one-sided
     value.

     '[P, H] = ranksum (X, Y)' also returns the result of the hypothesis
     test with 'H = 1' indicating a rejection of the null hypothesis at
     the default alpha = 0.05 significance level, and 'H = 0' indicating
     a failure to reject the null hypothesis at the same significance
     level.

     '[P, H, STATS] = ranksum (X, Y)' also returns the structure STATS
     with information about the test statistic.  It contains the field
     'ranksum' with the value of the rank sum test statistic and if
     computed with the "approximate" method it also contains the value
     of the z-statistic in the field 'zval'.

     '[...] = ranksum (X, Y, ALPHA)' or alternatively '[...] = ranksum
     (X, Y, "alpha", ALPHA)' returns the result of the hypothesis test
     performed at the significance level ALPHA.

     '[...] = ranksum (X, Y, "method", M)' defines the computation
     method of the p-value specified in M, which can be "exact",
     "approximate", or "oldexact".  M must be a single string.  When
     "method" is unspecified, the default is: "exact" when 'min (length
     (X), length (Y)) < 10' and 'length (X) + length (Y) < 10',
     otherwise the "approximate" method is used.

        * "exact" method uses full enumeration for small total sample
          size (< 10), otherwise the network algorithm is used for
          larger samples.
        * "approximate" uses normal approximation method for computing
          the p-value.
        * "oldexact" uses full enumeration for any sample size.  Note,
          that this option can lead to out of memory error for large
          samples.  Use with caution!

     '[...] = ranksum (X, Y, "tail", TAIL)' defines the type of test,
     which can be "both", "right", or "left".  TAIL must be a single
     string.

        * "both" - "medians are not equal" (two-tailed test, default)
        * "right" - "median of X is greater than median of Y"
          (right-tailed test)
        * "left" - "median of X is less than median of Y" (left-tailed
          test)

     Note: the rank sum statistic is based on the smaller sample of
     vectors X and Y.
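
     Example (a minimal sketch with made-up data):

          x = [1.1, 2.3, 1.9, 2.8, 1.5, 2.2];
          y = [3.2, 4.1, 2.9, 3.8, 4.5, 3.6];
          [p, h, stats] = ranksum (x, y)
          # h = 1 would indicate rejection of equal medians at alpha = 0.05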


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 41
Wilcoxon rank sum test for equal medians.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
raylstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 855
 -- statistics: [M, V] = raylstat (SIGMA)

     Compute mean and variance of the Rayleigh distribution.

     Arguments
     ---------

        * SIGMA is the parameter of the Rayleigh distribution.  The
          elements of SIGMA must be positive.

     Return values
     -------------

        * M is the mean of the Rayleigh distribution.

        * V is the variance of the Rayleigh distribution.

     Example
     -------

          sigma = 1:6;
          [m, v] = raylstat (sigma)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 55
Compute mean and variance of the Rayleigh distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
regress


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1317
 -- statistics: [B, BINT, R, RINT, STATS] = regress (Y, X, [ALPHA])

     Multiple Linear Regression using Least Squares Fit of Y on X with
     the model 'y = X * beta + e'.

     Here,

        * 'y' is a column vector of observed values
        * 'X' is a matrix of regressors, with the first column filled
          with the constant value 1
        * 'beta' is a column vector of regression parameters
        * 'e' is a column vector of random errors

     Arguments are

        * Y is the 'y' in the model
        * X is the 'X' in the model
        * ALPHA is the significance level used to calculate the
          confidence intervals BINT and RINT (see 'Return values'
          below).  If not specified, ALPHA defaults to 0.05

     Return values are

        * B is the 'beta' in the model
        * BINT is the confidence interval for B
        * R is a column vector of residuals
        * RINT is the confidence interval for R
        * STATS is a row vector containing:

             * The R^2 statistic
             * The F statistic
             * The p value for the full model
             * The estimated error variance

     R and RINT can be passed to 'rcoplot' to visualize the residual
     intervals and identify outliers.

     NaN values in Y and X are removed before calculation begins.
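
     Example (a minimal sketch; note the explicit column of ones for
     the intercept):

          x = (1:10)';
          y = 2 + 3 * x + 0.1 * randn (10, 1);
          X = [ones(size (x)), x];          # first column must be 1
          [b, bint, r, rint, stats] = regress (y, X);
          b          # estimates of the intercept and slope
          stats(1)   # R^2 of the fit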


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Multiple Linear Regression using Least Squares Fit of Y on X with the
model '...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
regress_gp


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1120
 -- statistics: [M, K] = regress_gp (X, Y, SP)
 -- statistics: [M, K, YI, DY] = regress_gp (..., XI)

     Linear scalar regression using gaussian processes.

     It estimates the model Y = X'*m for X in R^D and Y in R. The
     information about errors of the predictions
     (interpolation/extrapolation) is given by the covariance matrix K.
     If D==1 the inputs must be column vectors, if D>1 then X is n-by-D,
     with n the number of data points.  SP defines the prior covariance
     of M, it should be a (D+1)-by-(D+1) positive definite matrix, if it
     is empty, the default is 'Sp = 100*eye(size(x,2)+1)'.

     If XI inputs are provided, the model is evaluated and returned in
     YI.  The estimation of the variation of YI are given in DY.

     Run 'demo regress_gp' to see some examples.

     The function is a direct implementation of the formulae in pages
     11-12 of 'Gaussian Processes for Machine Learning', Carl Edward
     Rasmussen and Christopher K. I. Williams, The MIT Press, 2006,
     ISBN 0-262-18253-X, available online at
     <http://gaussianprocess.org/gpml/>.
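
     A minimal 1-D sketch (synthetic data; an empty SP selects the
     default prior):

          x = linspace (0, 1, 20)';
          y = 2 * x + 0.05 * randn (20, 1);
          xi = linspace (0, 1, 50)';
          [m, K, yi, dy] = regress_gp (x, y, [], xi);
          plot (x, y, "o", xi, yi, "-");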

     See also: regress.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 50
Linear scalar regression using gaussian processes.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
regression_ftest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2392
 -- statistics: [H, PVAL, STATS] = regression_ftest (Y, X, FM)
 -- statistics: [...] = regression_ftest (Y, X, FM, RM)
 -- statistics: [...] = regression_ftest (Y, X, FM, RM, NAME, VALUE)
 -- statistics: [...] = regression_ftest (Y, X, FM, [], NAME, VALUE)

     F-test for General Linear Regression Analysis

     Perform a general linear regression F test for the null hypothesis
     that the full model of the form y = b_0 + b_1 * x_1 + b_2 * x_2 +
     ... + b_n * x_n + e, where n is the number of variables in X, does
     not perform better than a reduced model, such as y = b'_0 + b'_1 *
     x_1 + b'_2 * x_2 + ... + b'_k * x_k + e, where k < n and it
     corresponds to the first k variables in X.  The response
     (dependent) variable Y and the explanatory (independent) variables
     X must not contain any missing values (NaNs).

     The full model, FM, must be a vector of length equal to the columns
     of X, in which case the constant term b_0 is assumed 0, or equal to
     the columns of X plus one, in which case the first element is the
     constant b_0.

     The reduced model, RM, must include the constant term and a subset
     of the variables (columns) in X.  If RM is not given, then a
     constant term b'_0 is assumed equal to the constant term, b_0, of
     the full model or 0, if the full model, FM, does not have a
     constant term.  RM must be a vector or a scalar if only a constant
     term is passed into the function.

     Name-Value pair arguments can be used to set statistical
     significance.  "alpha" can be used to specify the significance
     level of the test (the default value is 0.05).  To pass optional
     Name-Value pairs without a reduced model, pass the reduced model
     argument as an empty variable.

     If H is 1 the null hypothesis is rejected, meaning that the full
     model explains the variance better than the restricted model.  If H
     is 0, it can be assumed that the full model does NOT explain the
     variance any better than the restricted model.

     The p-value (1 minus the CDF of this distribution at F) is returned
     in PVAL.

     Under the null, the test statistic F follows an F distribution with
     'df1' and 'df2' degrees of freedom, which are returned as fields in
     the STATS structure along with the test's F-statistic, 'fstat'.

     See also: regress, regression_ttest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 45
F-test for General Linear Regression Analysis



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
regression_ttest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 863
 -- statistics: [PVAL, T, DF] = regression_ttest (Y, X, RR, R, ALT)

     Perform a linear regression t-test for the null hypothesis 'RR * B
     = R' in a classical normal regression model 'Y = X * B + E'.

     Under the null, the test statistic T follows a T distribution with
     DF degrees of freedom.

     If R is omitted, a value of 0 is assumed.

     With the optional argument string ALT, the alternative of interest
     can be selected.  If ALT is "!=" or "<>", the null is tested
     against the two-sided alternative 'RR * B != R'.  If ALT is ">",
     the one-sided alternative 'RR * B > R' is used.  Similarly for "<",
     the one-sided alternative 'RR * B < R' is used.  The default is the
     two-sided case.

     The p-value of the test is returned in PVAL.

     If no output argument is given, the p-value of the test is
     displayed.
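
     Example (a minimal sketch; RR selects the slope coefficient):

          x = (1:20)';
          X = [ones(20, 1), x];
          y = X * [1; 0.5] + randn (20, 1);
          RR = [0, 1];                     # test the second coefficient
          [pval, t, df] = regression_ttest (y, X, RR)  # R omitted, so R = 0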


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a linear regression t-test for the null hypothesis 'RR * B = R'
in a ...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
rmmissing


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1481
 -- statistics: R = rmmissing (A)
 -- statistics: R = rmmissing (A, DIM)
 -- statistics: R = rmmissing (..., NAME, VALUE)
 -- statistics: [R TF] = rmmissing (...)

     Remove missing or incomplete data from an array.

     Given an input vector or matrix (2-D array) A, remove missing data
     from a vector or missing rows or columns from a matrix.  A can be a
     numeric array, char array, or an array of cell strings.  R returns
     the array after removal of missing data.

     The values which represent missing data depend on the data type of
     A:

        * 'NaN': 'single', 'double'.

        * '' '' (white space): 'char'.

        * '{''}': string cells.

     Choose to remove rows (default) or columns by setting optional
     input DIM:

        * '1': rows.

        * '2': columns.

     Note: data types with no default 'missing' value will always result
     in 'R == A' and a TF output of 'false(size(A))'.

     Additional optional parameters are set by NAME-VALUE pairs.  These
     are:

        * 'MinNumMissing': minimum number of missing values to remove an
          entry, row or column, defined as a positive integer number.
          E.g.: if 'MinNumMissing' is set to '2', remove the row of a
          numeric matrix only if it includes 2 or more NaN.

     Optional return value TF is a logical array where 'true' values
     represent removed entries, rows or columns from the original data
     A.
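
     Example (a minimal sketch):

          A = [1, NaN, 3; 4, 5, 6; NaN, NaN, 9];
          R = rmmissing (A)                       # rows 1 and 3 are removed
          [R2, tf] = rmmissing (A, 2)             # remove columns instead
          R3 = rmmissing (A, "MinNumMissing", 2)  # only row 3 has 2 NaNs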

See also: ismissing, standardizeMissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 48
Remove missing or incomplete data from an array.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
run_test


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 349
 -- statistics: [PVAL, CHISQ] = run_test (X)

     Perform a chi-square test with 6 degrees of freedom based on the
     upward runs in the columns of X.

     'run_test' can be used to decide whether X contains independent
     data.

     The p-value of the test is returned in PVAL.

     If no output argument is given, the p-value is displayed.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a chi-square test with 6 degrees of freedom based on the upward
runs ...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
runstest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1123
 -- statistics: [H, P, STATS] = runstest (X, V)

     Runs test for detecting serial correlation in the vector X.

     Arguments
     ---------

        * X is the vector of given values.
        * V is the value to subtract from X to get runs (defaults to
          'median(x)')

     Return values
     -------------

        * H is true if serial correlation is detected at the 95%
          confidence level (two-tailed), false otherwise.
        * P is the probability of obtaining a test statistic of the
          magnitude found under the null hypothesis of no serial
          correlation.
        * STATS is the structure containing as fields the number of runs
          NRUNS; the numbers of positive and negative values of 'x - v',
          N1 and N0; and the test statistic Z.

     Note: the large-sample normal approximation is used to find H and
     P.  This is accurate if N1, N0 are both greater than 10.
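
     Example (a minimal sketch; note the sample is small, so the
     normal approximation is rough):

          x = [0.5, 1.2, 0.3, 1.5, 0.2, 1.7, 0.4, 1.1, 0.6, 1.3];
          [h, p, stats] = runstest (x)     # test around median (x)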

     Reference: NIST Engineering Statistics Handbook, 1.3.5.13.  Runs
     Test for Detecting Non-randomness,
     http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 59
Runs test for detecting serial correlation in the vector X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
sampsizepwr


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6163
 -- statistics: N = sampsizepwr (TESTTYPE, PARAMS, P1)
 -- statistics: N = sampsizepwr (TESTTYPE, PARAMS, P1, POWER)
 -- statistics: POWER = sampsizepwr (TESTTYPE, PARAMS, P1, [], N)
 -- statistics: P1 = sampsizepwr (TESTTYPE, PARAMS, [], POWER, N)
 -- statistics: [N1, N2] = sampsizepwr ("t2", PARAMS, P1, POWER)
 -- statistics: [...] = sampsizepwr (TESTTYPE, PARAMS, P1, POWER, N,
          NAME, VALUE)

     Sample size and power calculation for hypothesis test.

     'sampsizepwr' computes the sample size, power, or alternative
     parameter value for a hypothesis test, given the other two values.
     For example, you can compute the sample size required to obtain a
     particular power for a hypothesis test, given the parameter value
     of the alternative hypothesis.

     'N = sampsizepwr (TESTTYPE, PARAMS, P1)' returns the sample size N
     required for a two-sided test of the specified type to have a power
     (probability of rejecting the null hypothesis when the alternative
     is true) of 0.90 when the significance level (probability of
     rejecting the null hypothesis when the null hypothesis is true) is
     0.05.  PARAMS specifies the parameter values under the null
     hypothesis.  P1 specifies the value of the single parameter being
     tested under the alternative hypothesis.  For the two-sample
     t-test, N is the value of the equal sample size for both samples,
     PARAMS specifies the parameter values of the first sample under the
     null and alternative hypotheses, and P1 specifies the value of the
     single parameter from the other sample under the alternative
     hypothesis.

     The following TESTTYPE values are available:

          "z"     one-sample z-test for normally distributed data with known
                  standard deviation.  PARAMS is a two-element vector [MU0
                  SIGMA0] of the mean and standard deviation, respectively,
                  under the null hypothesis.  P1 is the value of the mean
                  under the alternative hypothesis.
          "t"     one-sample t-test or paired t-test for normally distributed
                  data with unknown standard deviation.  PARAMS is a
                  two-element vector [MU0 SIGMA0] of the mean and standard
                  deviation, respectively, under the null hypothesis.  P1 is
                  the value of the mean under the alternative hypothesis.
          "t2"    two-sample pooled t-test (test for equal means) for
                  normally distributed data with equal unknown standard
                  deviations.  PARAMS is a two-element vector [MU0 SIGMA0] of
                  the mean and standard deviation of the first sample under
                  the null and alternative hypotheses.  P1 is the mean of
                  the second sample under the alternative hypothesis.
          "var"   chi-square test of variance for normally distributed data.
                  PARAMS is the variance under the null hypothesis.  P1 is
                  the variance under the alternative hypothesis.
          "p"     test of the P parameter (success probability) for a
                  binomial distribution.  PARAMS is the value of P under the
                  null hypothesis.  P1 is the value of P under the
                  alternative hypothesis.
          "r"     test of the correlation coefficient parameter for
                  significance.  PARAMS is the value of r under the null
                  hypothesis.  P1 is the value of r under the alternative
                  hypothesis.

     The "p" test for the binomial distribution is a discrete test for
     which increasing the sample size does not always increase the
     power.  For N values larger than 200, there may be values smaller
     than the returned N value that also produce the desired power.

     'N = sampsizepwr (TESTTYPE, PARAMS, P1, POWER)' returns the sample
     size N such that the power is POWER for the parameter value P1.
     For the two-sample t-test, N is the equal sample size of both
     samples.

     '[N1, N2] = sampsizepwr ("t2", PARAMS, P1, POWER)' returns the
     sample sizes N1 and N2 for the two samples.  These values are the
     same unless the "ratio" parameter, 'RATIO = N2 / N1', is set to a
     value other than the default (See the name/value pair definition of
     ratio below).

     'POWER = sampsizepwr (TESTTYPE, PARAMS, P1, [], N)' returns the
     power achieved for a sample size of N when the true parameter value
     is P1.  For the two-sample t-test, N is the smaller one of the two
     sample sizes.

     'P1 = sampsizepwr (TESTTYPE, PARAMS, [], POWER, N)' returns the
     parameter value detectable with the specified sample size N and
     power POWER.  For the two-sample t-test, N is the smaller one of
     the two sample sizes.  When computing P1 for the "p" test, if no
     alternative can be rejected for a given PARAMS, N and POWER value,
     the function displays a warning message and returns NaN.

     '[...] = sampsizepwr (..., N, NAME, VALUE)' specifies one or more
     of the following NAME / VALUE pairs:

          "alpha" significance level of the test (default is 0.05)
          "tail"  the type of test which can be:
             "both"      two-sided test for an alternative P1 not equal to
                         PARAMS
             "right"     one-sided test for an alternative P1 larger than
                         PARAMS
             "left"      one-sided test for an alternative P1 smaller than
                         PARAMS
          "ratio" desired ratio N2 / N2 of the larger sample size N2 to the
                  smaller sample size N1.  Used only for the two-sample
                  t-test.  The value of 'RATIO' is greater than or equal to 1
                  (default is 1).

     'sampsizepwr' computes the sample size, power, or alternative
     hypothesis value given values for the other two.  Specify one of
     these as [] to compute it.  The remaining parameters (and ALPHA,
     RATIO) can be scalars or arrays of the same size.
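
     Example (a minimal sketch for the one-sample t-test):

          # sample size for 90% power: null mean 100, sigma 10, true mean 105
          n = sampsizepwr ("t", [100, 10], 105)
          # power achieved with a fixed sample size of 20
          pw = sampsizepwr ("t", [100, 10], 105, [], 20)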

     See also: vartest, ttest, ttest2, ztest, binocdf.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Sample size and power calculation for hypothesis test.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 9
sigma_pts


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1110
 -- statistics: PTS = sigma_pts (N)
 -- statistics: PTS = sigma_pts (N, M)
 -- statistics: PTS = sigma_pts (N, M, K)
 -- statistics: PTS = sigma_pts (N, M, K, L)

     Calculates 2*N+1 sigma points in N dimensions.

     Sigma points are used in the unscented transform to estimate the
     result of applying a given nonlinear transformation to a
     probability distribution that is characterized only in terms of a
     finite set of statistics.

     If only the dimension N is given, the resulting points have zero
     mean and identity covariance matrix.  If the mean M or the
     covariance matrix K are given, then the resulting points will have
     those statistics.  The factor L scales the points away from the
     mean, which is useful for tuning the accuracy of the unscented
     transform.

     There is no unique way of computing sigma points; this function
     implements the algorithm described in section 2.6 "The New Filter"
     pages 40-41 of

     Uhlmann, Jeffrey (1995).  "Dynamic Map Building and Localization:
     New Theoretical Foundations".  Ph.D. thesis.  University of Oxford.
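
     Example (a minimal sketch; the mean and covariance are
     illustrative):

          m = [1, 2];
          K = [1, 0.5; 0.5, 2];
          pts = sigma_pts (2, m, K);   # 2*2+1 = 5 points in 2 dimensions
          mean (pts)                   # close to m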


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 46
Calculates 2*N+1 sigma points in N dimensions.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
signtest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1749
 -- statistics: [PVAL, H, STATS] = signtest (X)
 -- statistics: [PVAL, H, STATS] = signtest (X, M)
 -- statistics: [PVAL, H, STATS] = signtest (X, Y)
 -- statistics: [PVAL, H, STATS] = signtest (X, Y, NAME, VALUE)

     Test for median.

     Perform a signtest of the null hypothesis that X is from a
     distribution that has a zero median.  X must be a vector.

     If the second argument M is a scalar, the null hypothesis is that X
     has median M.

     If the second argument Y is a vector, the null hypothesis is that
     the distribution of 'X - Y' has zero median.

     The argument "alpha" can be used to specify the significance level
     of the test (the default value is 0.05).  The string argument
     "tail", can be used to select the desired alternative hypotheses.
     If "tail" is "both" (default) the null is tested against the
     two-sided alternative 'median (X) != M'.  If "tail" is "right" the
     one-sided alternative 'median (X) > M' is considered.  Similarly
     for "left", the one-sided alternative 'median (X) < M' is
     considered.

     When "method" is "exact" the p-value is computed using an exact
     method.  When "method" is "approximate" a normal approximation is
     used for the test statistic.  When "method" is not defined as an
     optional input argument, then for 'length (X) < 100' the "exact"
     method is used; otherwise the "approximate" method is used.

     The p-value of the test is returned in PVAL.  If H is 0 the null
     hypothesis is not rejected; if it is 1 the null hypothesis is
     rejected.  STATS is a structure containing the value of the test
     statistic (SIGN) and the value of the z statistic (ZVAL), the
     latter only computed when the "method" is "approximate".
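
     Example (a minimal sketch with made-up paired data):

          x = [4.1, 5.2, 6.3, 4.8, 5.9, 5.1];
          y = [3.9, 4.8, 6.5, 4.2, 5.5, 4.7];
          [pval, h, stats] = signtest (x, y)   # median of x - y == 0 ?
          [pval2, h2] = signtest (x, 5, "tail", "right")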


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 16
Test for median.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
silhouette


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1908
 -- statistics: silhouette (X, CLUST)
 -- statistics: [SI, H] = silhouette (X, CLUST)
 -- statistics: [SI, H] = silhouette (..., METRIC, METRICARG)

     Compute the silhouette values of clustered data and show them on a
     plot.

     X is a n-by-p matrix of n data points in a p-dimensional space.
     Each datapoint is assigned to a cluster using CLUST, a vector of n
     elements, one cluster assignment for each data point.

     Each silhouette value of SI, a vector of size n, is a measure of
     the likelihood that a data point is accurately classified to the
     right cluster.  Defining "a" as the mean distance between a point
     and the other points from its cluster, and "b" as the mean distance
     between that point and the points from other clusters, the
     silhouette value of the i-th point is:

              bi - ai
     Si =  ------------
            max(ai,bi)

     Each element of SI ranges from -1, minimum likelihood of a correct
     classification, to 1, maximum likelihood.

     Optional input value METRIC is the metric used to compute the
     distances between data points.  Since 'silhouette' uses 'pdist' to
     compute these distances, METRIC is quite similar to the option
     METRIC of pdist and it can be:
        * A known distance metric defined as a string: Euclidean,
          sqEuclidean (default), cityblock, cosine, correlation,
          Hamming, Jaccard.

        * A vector such as those created by 'pdist'.  In this case X is
          ignored.

        * A function handle that is passed to 'pdist' with METRICARG as
          optional inputs.

     Optional return value H is a handle to the silhouette plot.
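
     Example (a minimal sketch using 'kmeans' on synthetic data):

          X = [randn(20, 2); randn(20, 2) + 4];
          clust = kmeans (X, 2);
          [si, h] = silhouette (X, clust);
          mean (si)    # values near 1 suggest well-separated clusters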

     *Reference* Peter J. Rousseeuw, Silhouettes: a Graphical Aid to the
     Interpretation and Validation of Cluster Analysis.  1987.
     doi:10.1016/0377-0427(87)90125-7

See also: dendrogram, evalcluster, kmeans, linkage, pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 72
Compute the silhouette values of clustered data and show them on a plot.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
slicesample


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2093
 -- statistics: [SMPL, NEVAL] = slicesample (START, NSAMPLES, PROPERTY,
          VALUE, ...)

     Draws NSAMPLES samples from a target stationary distribution PDF
     using slice sampling of Radford M. Neal.

     Input:
        * START is a 1 by DIM vector of the starting point of the Markov
          chain.  Each column corresponds to a different dimension.

        * NSAMPLES is the number of samples, the length of the Markov
          chain.

     Next, several property-value pairs can or must be specified, they
     are:

     (Required properties) One of:

        * "PDF": the value is a function handle of the target stationary
          distribution to be sampled.  The function should accept
          different locations in each row and each column corresponds to
          a different dimension.

          or

        * LOGPDF: the value is a function handle of the log of the
          target stationary distribution to be sampled.  The function
          should accept different locations in each row and each column
          corresponds to a different dimension.

     The following input property/value pairs may be needed depending
     on the desired output:

        * "burnin" BURNIN the number of points to discard at the
          beginning, the default is 0.

        * "thin" THIN omitts M-1 of every M points in the generated
          Markov chain.  The default is 1.

        * "width" WIDTH the maximum Manhattan distance between two
          samples.  The default is 10.

     Outputs:

        * SMPL is a NSAMPLES by DIM matrix of random values drawn from
          PDF where the rows are different random values, the columns
          correspond to the dimensions of PDF.

        * NEVAL is the number of function evaluations per sample.

     Example: sampling from a normal distribution

          start = 1;
          nsamples = 1e3;
          pdf = @(x) exp (-.5 * x .^ 2) / (pi ^ .5 * 2 ^ .5);
          [smpl, accept] = slicesample (start, nsamples, "pdf", pdf, "thin", 4);
          histfit (smpl);

     See also: rand, mhsample, randsample.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Draws NSAMPLES samples from a target stationary distribution PDF using
slice ...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 10
squareform


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1139
 -- statistics: Z = squareform (Y)
 -- statistics: Y = squareform (Z)
 -- statistics: Z = squareform (Y, "tovector")
 -- statistics: Y = squareform (Z, "tomatrix")

     Interchange between distance matrix and distance vector formats.

     Converts between a hollow (diagonal filled with zeros), square,
     symmetric matrix and a vector containing its lower triangular
     part.

     Its target application is the conversion of the vector returned by
     'pdist' into a distance matrix.  It performs the opposite operation
     if input is a matrix.

     If X is a vector, its number of elements must fit into the
     triangular part of a matrix (main diagonal excluded).  In other
     words, 'numel (X) = N * (N - 1) / 2' for some integer N.  The
     resulting matrix will be N by N.

     If X is a distance matrix, it must be square and the diagonal
     entries of X must all be zeros.  'squareform' will generate a
     warning if X is not symmetric.

     The second argument specifies the output type, which is needed
     when the input has a single element (and the conversion is
     therefore ambiguous).  Otherwise it defaults to "tomatrix".
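
     Example (a sketch of the round trip for three points):

          y = [1, 2, 3];          # pdist order: (1,2), (1,3), (2,3)
          z = squareform (y)
          # z = [0 1 2; 1 0 3; 2 3 0]
          y2 = squareform (z)     # recovers [1, 2, 3]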

     See also: pdist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 64
Interchange between distance matrix and distance vector formats.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 18
standardizeMissing


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1190
 -- statistics: B = standardizeMissing (A, INDICATOR)

     Replace data values specified by INDICATOR in A by the standard
     'missing' data value for that data type.

     A can be a numeric scalar or array, a character vector or array, or
     a cell array of character vectors (a.k.a.  string cells).

     INDICATOR can be a scalar or an array containing values to be
     replaced by the 'missing' value for the class of A, and should have
     a data type matching A.

     'missing' values are defined as:

        * 'NaN': 'single', 'double'

        * ' ' (white space): 'char'

        * '{''}' (empty string in cell): string cells.

     Compatibility Notes:
        * Octave's implementation of 'standardizeMissing' does not
          restrict INDICATOR of type 'char' to row vectors.

        * All numerical and logical inputs for A and INDICATOR may be
          specified in any combination.  The output will be the same
          class as A, with the INDICATOR converted to that data type for
          comparison.  Only 'single' and 'double' have defined 'missing'
          values, so A of other data types will always output B = A.
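
     Example (a sentinel value replaced by the 'missing' value NaN):

          standardizeMissing ([1, 2, -99, 4], -99)
          # => [1, 2, NaN, 4]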

     See also: ismissing, rmmissing.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Replace data values specified by INDICATOR in A by the standard
'missing' dat...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 11
stepwisefit


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1528
 -- statistics: [X_USE, B, BINT, R, RINT, STATS] = stepwisefit (Y, X,
          PENTER = 0.05, PREMOVE = 0.1, METHOD = "corr")

     Linear regression with stepwise variable selection.

     Arguments
     ---------

        * Y is an N by 1 vector of data to fit.
        * X is an N by K matrix containing the values of K potential
          predictors.  No constant term should be included (one will
          always be added to the regression automatically).
        * PENTER is the maximum p-value to enter a new variable into the
          regression (default: 0.05).
        * PREMOVE is the minimum p-value to remove a variable from the
          regression (default: 0.1).
        * METHOD sets how predictors are selected at each step, either
          based on their correlation with the residuals ("corr",
          default) or on the p values of their regression coefficients
          when they are successively added ("p").

     Return values
     -------------

        * X_USE contains the indices of the predictors included in the
          final regression model.  The predictors are listed in the
          order they were added, so typically the first ones listed are
          the most significant.
        * B, BINT, R, RINT, STATS are the results of '[b, bint, r, rint,
          stats] = regress(y, [ones(size(y)) X(:, X_use)], penter);'
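
     Example (a minimal sketch with synthetic data, in which only the
     first two predictors carry signal):

          n = 100;
          X = randn (n, 4);
          y = 2 * X(:, 1) - X(:, 2) + 0.5 * randn (n, 1);
          [X_use, b] = stepwisefit (y, X)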

     References
     ----------

       1. N. R. Draper and H. Smith (1966).  'Applied Regression
          Analysis'.  Wiley.  Chapter 6.

     See also: regress.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 51
Linear regression with stepwise variable selection.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
tabulate


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1882
 -- statistics: TABLE = tabulate (DATA, EDGES)

     Compute a frequency table.

     For vector data, the function counts the number of values in data
     that fall between the elements in the edges vector (which must
     contain monotonically non-decreasing values).  TABLE is a matrix.
     The first column of TABLE is the bin number, the second is the
     number of instances in each class (absolute frequency).  The third
     column contains the percentage of each value (relative frequency)
     and the fourth column contains the cumulative frequency.

     If EDGES is omitted, each class has unit width.  If EDGES is a
     scalar, it specifies the number of classes.  Otherwise, EDGES is a
     vector defining the edges of each bin.  TABLE(K, 2) counts the
     value DATA (I) if EDGES (K) <= DATA (I) < EDGES (K+1).  The last
     bin counts DATA (I) if EDGES (K) <= DATA (I) <= EDGES (K+1).
     Values outside the range of EDGES are not counted.  Use -inf and
     inf in EDGES to include all values.  With no output arguments,
     tabulate prints a formatted table in the command window.

     Example

          sphere_radius = [1:0.05:2.5];
          tabulate (sphere_radius)

     Tabulate returns 2 bins: the first counts the spheres with radius
     between 1 mm (included) and 2 mm (excluded), and the second counts
     the spheres with radius between 2 and 3 mm.

          tabulate (sphere_radius, 10)

     Tabulate returns ten bins.

          tabulate (sphere_radius, [1, 1.5, 2, 2.5])

     Tabulate returns three bins: the first counts the spheres with
     radius between 1 mm (included) and 1.5 mm (excluded), the second
     those with radius between 1.5 mm (included) and 2 mm (excluded),
     and the third those with radius between 2 and 2.5 mm.

          table = tabulate (sphere_radius);
          bar (table (:, 1), table (:, 2))

     draws a histogram of the absolute frequencies.

     See also: bar, pareto.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 26
Compute a frequency table.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
tiedrank


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 981
 -- statistics: [R, TIEADJ] = tiedrank (X)
 -- statistics: [R, TIEADJ] = tiedrank (X, TIEFLAG)
 -- statistics: [R, TIEADJ] = tiedrank (X, TIEFLAG, BIDIR)

     '[R, TIEADJ] = tiedrank (X)' computes the ranks of the values in
     vector X.  If any values in X are tied, 'tiedrank' computes their
     average rank.  The return value TIEADJ is an adjustment for ties
     required by the nonparametric tests 'signrank' and 'ranksum', and
     for the computation of Spearman's rank correlation.

     '[R, TIEADJ] = tiedrank (X, 1)' computes the ranks of the values in
     the vector X.  TIEADJ is a vector of three adjustments for ties
     required in the computation of Kendall's tau.  'tiedrank (X, 0)' is
     the same as 'tiedrank (X)'.

     '[R, TIEADJ] = tiedrank (X, 0, 1)' computes the ranks from each
     end, so that the smallest and largest values get rank 1, the next
     smallest and largest get rank 2, etc.  These ranks are used in the
     Ansari-Bradley test.
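
     Example (tied values share their average rank):

          tiedrank ([10, 20, 20, 30])
          # => [1, 2.5, 2.5, 4]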


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 74
'[R, TIEADJ] = tiedrank (X)' computes the ranks of the values in vector
X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
trimmean


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 383
 -- statistics: A = trimmean (X, P)

     Compute the trimmed mean.

     The trimmed mean of X is defined as the mean of X excluding the
     highest and lowest P percent of the data.

     For example

          mean ([-inf, 1:9, inf])

     is NaN, while

          trimmean ([-inf, 1:9, inf], 10)

     excludes the infinite values, which make the result 5.

     See also: mean.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 25
Compute the trimmed mean.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
tstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 838
 -- statistics: [M, V] = tstat (N)

     Compute mean and variance of the t (Student) distribution.

     Arguments
     ---------

        * N is the parameter of the t (Student) distribution.  The
          elements of N must be positive

     Return values
     -------------

        * M is the mean of the t (Student) distribution

        * V is the variance of the t (Student) distribution

     Example
     -------

          n = 3:8;
          [m, v] = tstat (n)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 58
Compute mean and variance of the t (Student) distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
ttest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2482
 -- statistics: [H, PVAL, CI, STATS] = ttest (X)
 -- statistics: [H, PVAL, CI, STATS] = ttest (X, M)
 -- statistics: [H, PVAL, CI, STATS] = ttest (X, Y)
 -- statistics: [H, PVAL, CI, STATS] = ttest (X, M, NAME, VALUE)
 -- statistics: [H, PVAL, CI, STATS] = ttest (X, Y, NAME, VALUE)

     Test for mean of a normal sample with unknown variance.

     Perform a t-test of the null hypothesis 'mean (X) == M' for a
     sample X from a normal distribution with unknown mean and unknown
     standard deviation.  Under the null, the test statistic T has a
     Student's t distribution.  The default value of M is 0.

     If the second argument Y is a vector, a paired-t test of the
     hypothesis 'mean (X) == mean (Y)' is performed.  If X and Y are
     vectors, they must have the same size and dimensions.

     X (and Y) can also be matrices.  For matrices, ttest performs
     separate t-tests along each column, and returns a vector of
     results.  X and Y must have the same number of columns.  The Type I
     error rate of the resulting vector of PVAL can be controlled by
     entering PVAL as input to the function multcompare.

     ttest treats NaNs as missing values, and ignores them.

     Name-Value pair arguments can be used to set various options.
     "alpha" can be used to specify the significance level of the test
     (the default value is 0.05).  "tail", can be used to select the
     desired alternative hypotheses.  If the value is "both" (default)
     the null is tested against the two-sided alternative 'mean (X) !=
     M'.  If it is "right" the one-sided alternative 'mean (X) > M' is
     considered.  Similarly for "left", the one-sided alternative 'mean
     (X) < M' is considered.  When argument X is a matrix, "dim" can be
     used to select the dimension over which to perform the test.  (The
     default is the first non-singleton dimension).

     If H is 1 the null hypothesis is rejected, meaning that the sample
     mean differs significantly from M (or, in the paired case, that
     the means of X and Y differ).  If H is 0 the null hypothesis
     cannot be rejected at the given significance level.  The p-value
     of the test is returned in PVAL.  A 100(1-alpha)% confidence
     interval is returned in CI.

     STATS is a structure containing the value of the test statistic
     (TSTAT), the degrees of freedom (DF) and the sample's standard
     deviation (SD).
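
     Example (a sketch testing whether a sample mean differs from 5):

          x = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0];
          [h, pval, ci] = ttest (x, 5)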

     See also: hotelling_ttest, ttest2, hotelling_ttest2.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 55
Test for mean of a normal sample with unknown variance.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
ttest2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1998
 -- statistics: [H, PVAL, CI, STATS] = ttest2 (X, Y)
 -- statistics: [H, PVAL, CI, STATS] = ttest2 (X, Y, NAME, VALUE)

     Perform a t-test to compare the means of two groups of data under
     the null hypothesis that the groups are drawn from distributions
     with the same mean.

     X and Y can be vectors or matrices.  For matrices, ttest2 performs
     separate t-tests along each column, and returns a vector of
     results.  X and Y must have the same number of columns.  The Type I
     error rate of the resulting vector of PVAL can be controlled by
     entering PVAL as input to the function multcompare.

     ttest2 treats NaNs as missing values, and ignores them.

     For a nested t-test, use anova2.

     The argument "alpha" can be used to specify the significance level
     of the test (the default value is 0.05).  The string argument
     "tail", can be used to select the desired alternative hypotheses.
     If "tail" is "both" (default) the null is tested against the
     two-sided alternative 'mean (X) != mean (Y)'.  If "tail" is
     "right" the one-sided alternative 'mean (X) > mean (Y)' is
     considered.  Similarly for "left", the one-sided alternative
     'mean (X) < mean (Y)' is considered.

     When "vartype" is "equal" the variances are assumed to be equal
     (this is the default).  When "vartype" is "unequal" the variances
     are not assumed equal.

     When argument X and Y are matrices the "dim" argument can be used
     to select the dimension over which to perform the test.  (The
     default is the first non-singleton dimension.)

     If H is 0 the null hypothesis cannot be rejected; if it is 1 the
     null hypothesis is rejected.  The p-value of the test is returned
     in
     PVAL.  A 100(1-alpha)% confidence interval is returned in CI.
     STATS is a structure containing the value of the test statistic
     (TSTAT), the degrees of freedom (DF) and the sample standard
     deviation (SD).
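
     Example (a sketch comparing two independent samples):

          x = randn (50, 1);
          y = randn (60, 1) + 0.5;       # shifted mean
          [h, pval] = ttest2 (x, y)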

     See also: hotelling_ttest2, anova1, hotelling_ttest, ttest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 80
Perform a t-test to compare the means of two groups of data under the
null hy...



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
unidstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 880
 -- statistics: [M, V] = unidstat (N)

     Compute mean and variance of the discrete uniform distribution.

     Arguments
     ---------

        * N is the parameter of the discrete uniform distribution.  The
          elements of N must be positive natural numbers

     Return values
     -------------

        * M is the mean of the discrete uniform distribution

        * V is the variance of the discrete uniform distribution

     Example
     -------

          n = 1:6;
          [m, v] = unidstat (n)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 63
Compute mean and variance of the discrete uniform distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
unifstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1091
 -- statistics: [M, V] = unifstat (A, B)

     Compute mean and variance of the continuous uniform distribution.

     Arguments
     ---------

        * A is the first parameter of the continuous uniform
          distribution

        * B is the second parameter of the continuous uniform
          distribution
     A and B must be of common size or one of them must be scalar and A
     must be less than B

     Return values
     -------------

        * M is the mean of the continuous uniform distribution

        * V is the variance of the continuous uniform distribution

     Examples
     --------

          a = 1:6;
          b = 2:2:12;
          [m, v] = unifstat (a, b)

          [m, v] = unifstat (a, 10)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 65
Compute mean and variance of the continuous uniform distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
vartest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2280
 -- statistics: H = vartest (X, V)
 -- statistics: H = vartest (X, V, NAME, VALUE)
 -- statistics: [H, PVAL] = vartest (...)
 -- statistics: [H, PVAL, CI] = vartest (...)
 -- statistics: [H, PVAL, CI, STATS] = vartest (...)

     One-sample test of variance.

     'H = vartest (X, V)' performs a chi-square test of the hypothesis
     that the data in the vector X come from a normal distribution with
     variance V, against the alternative that X comes from a normal
     distribution with a different variance.  The result is H = 0 if the
     null hypothesis ("variance is V") cannot be rejected at the 5%
     significance level, or H = 1 if the null hypothesis can be rejected
     at the 5% level.

     X may also be a matrix or an N-D array.  For matrices, 'vartest'
     performs separate tests along each column of X, and returns a
     vector of results.  For N-D arrays, 'vartest' works along the first
     non-singleton dimension of X.  V must be a scalar.

     'vartest' treats NaNs as missing values, and ignores them.

     '[H, PVAL] = vartest (...)' returns the p-value.  That is the
     probability of observing the given result, or one more extreme, by
     chance if the null hypothesis is true.

     '[H, PVAL, CI] = vartest (...)' returns a 100 * (1 - ALPHA)%
     confidence interval for the true variance.

     '[H, PVAL, CI, STATS] = vartest (...)' returns a structure with the
     following fields:

          "chisqstat"    the value of the test statistic
          "df"           the degrees of freedom of the test

     '[...] = vartest (..., NAME, VALUE), ...' specifies one or more of
     the following name/value pairs:

          Name           Value
     ---------------------------------------------------------------------------
          "alpha"        the significance level.  Default is 0.05.
                         
          "dim"          dimension to work along a matrix or an N-D array.
                         
          "tail"         a string specifying the alternative hypothesis:
             "both"      "variance is not V" (two-tailed, default)
             "left"      "variance is less than V" (left-tailed)
             "right"     "variance is greater than V" (right-tailed)
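
     Example (a sketch testing a hypothesized variance of 4):

          x = 2 * randn (100, 1);        # population variance is 4
          [h, pval] = vartest (x, 4)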

     See also: ttest, ztest, kstest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 28
One-sample test of variance.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
vartest2


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2477
 -- statistics: H = vartest2 (X, Y)
 -- statistics: H = vartest2 (X, Y, NAME, VALUE)
 -- statistics: [H, PVAL] = vartest2 (...)
 -- statistics: [H, PVAL, CI] = vartest2 (...)
 -- statistics: [H, PVAL, CI, STATS] = vartest2 (...)

     Two-sample F test for equal variances.

     'H = vartest2 (X, Y)' performs an F test of the hypothesis that the
     independent data in vectors X and Y come from normal distributions
     with equal variance, against the alternative that they come from
     normal distributions with different variances.  The result is H = 0
     if the null hypothesis ("variances are equal") cannot be rejected
     at
     the 5% significance level, or H = 1 if the null hypothesis can be
     rejected at the 5% level.

     X and Y may also be matrices or N-D arrays.  For matrices,
     'vartest2' performs separate tests along each column and returns a
     vector of results.  For N-D arrays, 'vartest2' works along the
     first non-singleton dimension and X and Y must have the same size
     along all the remaining dimensions.

     'vartest2' treats NaNs as missing values, and ignores them.

     '[H, PVAL] = vartest2 (...)' returns the p-value.  That is the
     probability of observing the given result, or one more extreme, by
     chance if the null hypothesis is true.

     '[H, PVAL, CI] = vartest2 (...)' returns a 100 * (1 - ALPHA)%
     confidence interval for the true ratio var(X)/var(Y).

     '[H, PVAL, CI, STATS] = vartest2 (...)' returns a structure with
     the following fields:

          "fstat"        the value of the test statistic
          "df1"          the numerator degrees of freedom of the test
          "df2"          the denominator degrees of freedom of the test

     '[...] = vartest2 (..., NAME, VALUE), ...' specifies one or more
     of
     the following name/value pairs:

          Name           Value
     ---------------------------------------------------------------------------
          "alpha"        the significance level.  Default is 0.05.
                         
          "dim"          dimension to work along a matrix or an N-D array.
                         
          "tail"         a string specifying the alternative hypothesis:
             "both"      "variances are not equal" (two-tailed, default)
             "left"      "variance of X is less than variance of Y"
                         (left-tailed)
             "right"     "variance of X is greater than variance of Y"
                         (right-tailed)

     See also: ttest2, kstest2, bartlett_test, levene_test.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 38
Two-sample F test for equal variances.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 8
vartestn


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 3283
 -- statistics: vartestn (X)
 -- statistics: vartestn (X, GROUP)
 -- statistics: vartestn (..., NAME, VALUE)
 -- statistics: P = vartestn (...)
 -- statistics: [P, STATS] = vartestn (...)
 -- statistics: [P, STATS] = vartestn (..., NAME, VALUE)

     Test for equal variances across multiple groups.

     'H = vartestn (X)' performs Bartlett's test for equal variances for
     the columns of the matrix X.  This is a test of the null hypothesis
     that the columns of X come from normal distributions with the same
     variance, against the alternative that they come from normal
     distributions with different variances.  The result is displayed in
     a summary table of statistics as well as a box plot of the groups.

     'vartestn (X, GROUP)' requires a vector X, and a GROUP argument
     that is a categorical variable, vector, string array, or cell array
     of strings with one row for each element of X.  Values of X
     corresponding to the same value of GROUP are placed in the same
     group.

     'vartestn' treats NaNs as missing values, and ignores them.

     'P = vartestn (...)' returns the probability of observing the given
     result, or one more extreme, by chance under the null hypothesis
     that all groups have equal variances.  Small values of P cast doubt
     on the validity of the null hypothesis.

     '[P, STATS] = vartestn (...)' returns a structure with the
     following fields:

          "chistat"      - the value of the test statistic
          "df"           - the degrees of freedom of the test

     '[P, STATS] = vartestn (..., NAME, VALUE)' specifies one or more of
     the following NAME/VALUE pairs:

     "display"      "on" to display a boxplot and table, or "off" to
                    omit these displays.  Default "on".
                    
     "testtype"     One of the following strings to control the type of test
                    to perform:

        "Bartlett"         Bartlett's test (default).
                           
        "LeveneQuadratic"  Levene's test computed by performing anova on the
                           squared deviations of the data values from their
                           group means.
                           
        "LeveneAbsolute"   Levene's test computed by performing anova on the
                           absolute deviations of the data values from their
                           group means.
                           
        "BrownForsythe"    Brown-Forsythe test computed by performing anova
                           on the absolute deviations of the data values from
                           the group medians.
                           
        "OBrien"           O'Brien's modification of Levene's test with
                           W=0.5.

     The classical 'Bartlett' test is sensitive to the assumption that
     the distribution in each group is normal.  The other test types are
     more robust to non-normal distributions, especially ones prone to
     outliers.  For these tests, the STATS output structure has a field
     named "fstat" containing the test statistic, and "df1" and "df2"
     containing its numerator and denominator degrees of freedom.
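
     Example (a sketch using a robust test with displays turned off):

          X = randn (30, 3);
          p = vartestn (X, "testtype", "LeveneAbsolute", "display", "off")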

     See also: vartest, vartest2, anova1, bartlett_test, levene_test.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 48
Test for equal variances across multiple groups.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 6
violin


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2663
 -- statistics: violin (X)
 -- statistics: H = violin (X)
 -- statistics: H = violin (..., PROPERTY, VALUE, ...)
 -- statistics: H = violin (HAX, ...)
 -- statistics: H = violin (..., "horizontal")

     Produce a Violin plot of the data X.

     The input data X can be an N-by-m array containing N observations
     of m variables.  It can also be a cell with m elements, for the
     case in which the variables are not uniformly sampled.

     The following PROPERTY can be set using PROPERTY/VALUE pairs
     (default values in parenthesis).  The value of the property can be
     a scalar indicating that it applies to all the variables in the
     data.  It can also be a cell/array, indicating the property for
     each variable.  In this case it should have m columns (as many as
     variables).

     Color
          ("y") Indicates the filling color of the violins.

     Nbins
          (50) Internally, the function calls 'hist' to compute the
          histogram of the data.  This property indicates how many bins
          to use.  See 'help hist' for more details.

     SmoothFactor
          (4) The function performs simple kernel density estimation
          and automatically finds the bandwidth of the kernel function
          that best approximates the histogram using optimization
          ('sqp').  The result is in general very noisy.  To smooth the
          result the bandwidth is multiplied by the value of this
          property.  The higher the value the smoother the violins,
          but values too high might remove features from the data
          distribution.

     Bandwidth
          (NA) If this property is given a value other than NA, it sets
          the bandwidth of the kernel function.  No optimization is
          performed and the property SmoothFactor is ignored.

     Width
          (0.5) Sets the maximum width of the violins.  Violins are
          centered at integer axis values.  The distance between the
          middle axes of two violins is 1.  Setting a value higher than
          1 in this property will cause the violins to overlap.

     If the string "Horizontal" is among the input arguments, the violin
     plot is rendered along the x axis with the variables in the y axis.

     The returned structure H has handles to the plot elements, allowing
     customization of the visualization using set/get functions.

     Example:

          title ("Grade 3 heights");
          axis ([0,3]);
          set (gca, "xtick", 1:2, "xticklabel", {"girls"; "boys"});
          h = violin ({randn(100,1)*5+140, randn(130,1)*8+135}, "Nbins", 10);
          set (h.violin, "linewidth", 2)

     See also: boxplot, hist.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 36
Produce a Violin plot of the data X.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
wblplot


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1748
 -- statistics: wblplot (DATA, ...)
 -- statistics: HANDLE = wblplot (DATA, ...)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR, FREQ)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR, FREQ, CONFINT)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR, FREQ, CONFINT,
          FANCYGRID)
 -- statistics: [HANDLE, PARAM] = wblplot (DATA, CENSOR, FREQ, CONFINT,
          FANCYGRID, SHOWLEGEND)

     Plot a column vector DATA on a Weibull probability plot using rank
     regression.

     CENSOR: optional parameter is a column vector of same size as DATA
     with 1 for right censored data and 0 for exact observation.  Pass
     [] when no censor data are available.

     FREQ: optional vector same size as DATA with the number of
     occurrences for corresponding data.  Pass [] when no frequency
     data are available.

     CONFINT: optional confidence limits for plotting upper and lower
     confidence bands using beta binomial confidence bounds.  If a
     single value a is given, the limits LOW = a and HIGH = 1 - a are
     used.  Pass [] if confidence bounds are not requested.

     FANCYGRID: optional parameter which if set to anything but 1 will
     turn off the fancy gridlines.

     SHOWLEGEND: optional parameter that when set to zero (0) turns off
     the legend.

     If one output argument is given, a HANDLE for the data marker and
     plotlines is returned, which can be used for further modification
     of line and marker style.

     If a second output argument is specified, a PARAM vector with
     scale, shape and correlation factor is returned.
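
     Example (a sketch, assuming 'wblrnd' is available to generate
     Weibull-distributed data):

          data = wblrnd (3, 2, 100, 1);
          [h, param] = wblplot (data);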

     See also: normplot, wblpdf.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 78
Plot a column vector DATA on a Weibull probability plot using rank
regression.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 7
wblstat


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 1094
 -- statistics: [M, V] = wblstat (SCALE, SHAPE)

     Compute mean and variance of the Weibull distribution.

     Arguments
     ---------

        * SCALE is the scale parameter of the Weibull distribution.
          SCALE must be positive

        * SHAPE is the shape parameter of the Weibull distribution.
          SHAPE must be positive
     SCALE and SHAPE must be of common size or one of them must be
     scalar

     Return values
     -------------

        * M is the mean of the Weibull distribution

        * V is the variance of the Weibull distribution

     Examples
     --------

          scale = 3:8;
          shape = 1:6;
          [m, v] = wblstat (scale, shape)

          [m, v] = wblstat (6, shape)

     References
     ----------

       1. Wendy L. Martinez and Angel R. Martinez.  'Computational
          Statistics Handbook with MATLAB'. Appendix E, pages 547-557,
          Chapman & Hall/CRC, 2001.

       2. Athanasios Papoulis.  'Probability, Random Variables, and
          Stochastic Processes'.  McGraw-Hill, New York, second edition,
          1984.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 54
Compute mean and variance of the Weibull distribution.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 4
x2fx


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2698
 -- statistics: [D, MODEL, TERMSTART, TERMEND] = x2fx (X)
 -- statistics: [D, MODEL, TERMSTART, TERMEND] = x2fx (X, MODEL)
 -- statistics: [D, MODEL, TERMSTART, TERMEND] = x2fx (X, MODEL, CATEG)
 -- statistics: [D, MODEL, TERMSTART, TERMEND] = x2fx (X, MODEL, CATEG,
          CATLEVELS)

     Convert predictors to design matrix.

     'D = x2fx (X, MODEL)' converts a matrix of predictors X to a design
     matrix D for regression analysis.  Distinct predictor variables
     should appear in different columns of X.

     The optional input MODEL controls the regression model.  By
     default, 'x2fx' returns the design matrix for a linear additive
     model with a constant term.  MODEL can be any one of the following
     strings:

          "linear"       Constant and linear terms (the default)
          "interaction"  Constant, linear, and interaction terms
          "quadratic"    Constant, linear, interaction, and squared terms
          "purequadratic"Constant, linear, and squared terms

     If X has n columns, the order of the columns of D for a full
     quadratic model is:

        * The constant term.
        * The linear terms (the columns of X, in order 1,2,...,n).
        * The interaction terms (pairwise products of columns of X, in
          order (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n)).
        * The squared terms (in the order 1,2,...,n).

     Other models use a subset of these terms, in the same order.

     Alternatively, MODEL can be a matrix specifying polynomial terms of
     arbitrary order.  In this case, MODEL should have one column for
     each column in X and one row for each term in the model.  The
     entries in any row of MODEL are powers for the corresponding
     columns of X.
     For example, if X has columns X1, X2, and X3, then a row [0 1 2] in
     MODEL would specify the term (X1.^0).*(X2.^1).*(X3.^2).  A row of
     all zeros in MODEL specifies a constant term, which you can omit.
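
     For example, given a predictor matrix X with two columns (values
     here are purely illustrative), the "interaction" model can be
     written explicitly as a term matrix:

          X = [1 4; 2 5; 3 6];
          model = [0 0; 1 0; 0 1; 1 1];  # constant, X1, X2, X1.*X2
          D = x2fx (X, model);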

     'D = x2fx (X, MODEL, CATEG)' treats columns with numbers listed in
     the vector CATEG as categorical variables.  Terms involving
     categorical variables produce dummy variable columns in D.  Dummy
     variables are computed under the assumption that possible
     categorical levels are completely enumerated by the unique values
     that appear in the corresponding column of X.

     'D = x2fx (X, MODEL, CATEG, CATLEVELS)' accepts a vector CATLEVELS
     the same length as CATEG, specifying the number of levels in each
     categorical variable.  In this case, values in the corresponding
     column of X must be integers in the range from 1 to the specified
     number of levels.  Not all of the levels need to appear in X.
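
     Example (illustrative values; column 2 is treated as categorical):

          X = [1 10; 2 20; 3 10];
          D = x2fx (X, "quadratic");
          # columns of D: constant, X1, X2, X1.*X2, X1.^2, X2.^2

          X = [1 1; 2 1; 3 2; 4 2];
          D = x2fx (X, "linear", 2);
          # the categorical column produces dummy variable columns in D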


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 36
Convert predictors to design matrix.



# name: <cell-element>
# type: sq_string
# elements: 1
# length: 5
ztest


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 2142
 -- statistics: H = ztest (X, V, SIGMA)
 -- statistics: H = ztest (X, V, SIGMA, NAME, VALUE)
 -- statistics: [H, PVAL] = ztest (...)
 -- statistics: [H, PVAL, CI] = ztest (...)
 -- statistics: [H, PVAL, CI, ZVALUE] = ztest (...)

     One-sample Z-test.

     'H = ztest (X, V, SIGMA)' performs a Z-test of the hypothesis that
     the data in the vector X come from a normal distribution with mean
     V and standard deviation SIGMA, against the alternative that X
     comes from a normal distribution with a different mean.  The
     result is H = 0 if the null hypothesis ("mean is V") cannot be
     rejected at the 5% significance level, or H = 1 if the null
     hypothesis can be rejected at the 5% level.

     X may also be a matrix or an N-D array.  For matrices, 'ztest'
     performs separate tests along each column of X, and returns a
     vector of results.  For N-D arrays, 'ztest' works along the first
     non-singleton dimension of X.  V and SIGMA must be scalars.

     'ztest' treats NaNs as missing values, and ignores them.

     '[H, PVAL] = ztest (...)' returns the p-value.  That is the
     probability of observing the given result, or one more extreme, by
     chance if the null hypothesis is true.

     '[H, PVAL, CI] = ztest (...)' returns a 100 * (1 - ALPHA)%
     confidence interval for the true mean.

     '[H, PVAL, CI, ZVALUE] = ztest (...)' returns the value of the
     test statistic.

     '[...] = ztest (..., NAME, VALUE)' specifies one or more of
     the following name/value pairs:

          Name           Value
     ---------------------------------------------------------------------------
          "alpha"        the significance level.  Default is 0.05.
                         
          "dim"          dimension to work along a matrix or an N-D array.
                         
          "tail"         a string specifying the alternative hypothesis:
             "both"      "variance is not V" (two-tailed, default)
             "left"      "variance is less than V" (left-tailed)
             "right"     "variance is greater than V" (right-tailed)

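     Example (synthetic data for illustration):

          x = 0.5 + randn (100, 1);    # sample whose true mean is 0.5
          [h, pval] = ztest (x, 0, 1)  # H0: mean 0, known sigma 1
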
     See also: ttest, vartest, signtest, kstest.


# name: <cell-element>
# type: sq_string
# elements: 1
# length: 18
One-sample Z-test.





