statsmodels.nonparametric.kernel_regression.KernelReg

class statsmodels.nonparametric.kernel_regression.KernelReg(endog, exog, var_type, reg_type='ll', bw='cv_ls', ckertype='gaussian', okertype='wangryzin', ukertype='aitchisonaitken', defaults=None)[source]

Nonparametric kernel regression class.

Calculates the conditional mean E[y|X], where y = g(X) + e. The “local constant” type of regression provided here is also known as Nadaraya-Watson kernel regression; “local linear” is an extension of it that suffers less from bias at the edges of the support. Specifying a custom kernel works only with “local linear” kernel regression. For example, a custom tricube kernel yields LOESS regression.
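
The edge-bias difference between the two estimators can be illustrated with a minimal sketch. The data (a noiseless straight line) and the fixed bandwidth of 0.2 are assumptions chosen for illustration and do not come from the statsmodels documentation. On such data the local constant fit is pulled toward the interior at the boundary, while a local linear fit should reproduce the line up to numerical precision.

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

# Noiseless linear data on [0, 1]; illustrative values only.
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x

# A fixed, hand-picked bandwidth keeps the comparison deterministic
# and skips the data-driven bandwidth search.
bandwidth = [0.2]

local_constant = KernelReg(endog=y, exog=x, var_type="c", reg_type="lc", bw=bandwidth)
local_linear = KernelReg(endog=y, exog=x, var_type="c", reg_type="ll", bw=bandwidth)

# fit() with no argument evaluates the estimator at the training points.
mean_lc, _ = local_constant.fit()
mean_ll, _ = local_linear.fit()

# Absolute error at the left edge of the support, where the true value is y[0].
edge_error_lc = abs(mean_lc[0] - y[0])
edge_error_ll = abs(mean_ll[0] - y[0])
print(f"local constant edge error: {edge_error_lc:.4f}")
print(f"local linear edge error:   {edge_error_ll:.4f}")
```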

Parameters:
  • endog (array_like) – This is the dependent variable.

  • exog (array_like) – The training data for the independent variable(s). Each element in the list is a separate variable.

  • var_type (str) –

    The type of the variables, one character per variable:

    • c: continuous

    • u: unordered (discrete)

    • o: ordered (discrete)

  • reg_type ({'lc', 'll'}, optional) – Type of regression estimator. ‘lc’ means local constant and ‘ll’ means local linear estimator. Default is ‘ll’.

  • bw (str or array_like, optional) – Either a user-specified bandwidth or the method for bandwidth selection. If a string, valid values are ‘cv_ls’ (least-squares cross-validation) and ‘aic’ (AIC Hurvich bandwidth estimation). Default is ‘cv_ls’. A user-specified bandwidth must have as many entries as there are variables.

  • ckertype (str, optional) – The kernel used for the continuous variables.

  • okertype (str, optional) – The kernel used for the ordered discrete variables.

  • ukertype (str, optional) – The kernel used for the unordered discrete variables.

  • defaults (EstimatorSettings instance, optional) – The default values for the efficient bandwidth estimation.
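
As a rough sketch of how these parameters fit together, the example below combines one continuous and one unordered discrete regressor, so var_type has one character per variable and bw has one entry per variable. The simulated data, the variable names (x, group) and the bandwidth values are hypothetical and were picked only for illustration.

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

# Simulated data (hypothetical): y depends on a continuous x and a binary group.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, np.pi, n)      # continuous regressor
group = rng.integers(0, 2, n)       # unordered discrete regressor, levels 0 and 1
y = np.sin(x) + 1.5 * group + rng.normal(scale=0.2, size=n)

model = KernelReg(
    endog=y,
    exog=[x, group],    # each element of the list is a separate variable
    var_type="cu",      # 'c' for x, 'u' for group
    reg_type="lc",      # local constant (Nadaraya-Watson) estimator
    bw=[0.3, 0.1],      # hand-picked bandwidths, one entry per variable
)

# Evaluate at x = pi/2 for each group level; rows are points, columns are variables.
points = np.array([[np.pi / 2, 0.0], [np.pi / 2, 1.0]])
mean_at_points, _ = model.fit(points)
print(mean_at_points)
```

Passing an array for bw skips bandwidth selection entirely, which keeps the sketch fast; leaving bw at its default would run least-squares cross-validation instead.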

bw

The bandwidth parameters.

Type:

array_like
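
A short sketch of inspecting this attribute, under the assumption of simulated one-dimensional data: one model selects its bandwidth with ‘cv_ls’, the other is given a fixed value. The numbers are illustrative, and the selected bandwidth depends on the data.

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

# Simulated data (hypothetical): a noisy sine curve.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(scale=0.2, size=60)

# Bandwidth chosen by least-squares cross-validation (the default method).
model_cv = KernelReg(endog=y, exog=x, var_type="c", bw="cv_ls")

# Bandwidth supplied by the user; one entry because there is one variable.
model_fixed = KernelReg(endog=y, exog=x, var_type="c", bw=[0.5])

print("cv_ls bandwidth:", model_cv.bw)
print("fixed bandwidth:", model_fixed.bw)
```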

Methods

aic_hurvich(bw[, func])

Computes the AIC Hurvich criterion for the estimation of the bandwidth.

cv_loo(bw, func)

The cross-validation function with leave-one-out estimator.

fit([data_predict])

Returns the mean and marginal effects at the data_predict points.

loo_likelihood()

r_squared()

Returns the R-Squared for the nonparametric regression.

sig_test(var_pos[, nboot, nested_res, pivot])

Significance test for the variables in the regression.
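
The following sketch shows fit and r_squared on simulated data. The data, the evaluation grid and the fixed bandwidth are assumptions made for illustration. sig_test is left out because it relies on bootstrap resampling and can take noticeably longer to run.

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

# Simulated data (hypothetical): a noisy sine curve.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, 100)
y = np.sin(x) + rng.normal(scale=0.2, size=100)

model = KernelReg(endog=y, exog=x, var_type="c", reg_type="ll", bw=[0.5])

# Evaluate the conditional mean and marginal effects on a small grid of new points.
grid = np.linspace(1.0, 9.0, 5)
mean, mfx = model.fit(grid)
print("mean shape:", mean.shape)    # one value per prediction point
print("mfx shape:", mfx.shape)      # one row per point, one column per variable

# Goodness of fit computed on the training data.
r2 = model.r_squared()
print("R-squared:", r2)
```

Calling fit() without arguments evaluates the estimator at the training points instead of a separate grid.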