Jul 15, 2012

Understanding Advanced Statistics

Understanding advanced statistics without a mathematics background can be confusing, so I am going to try to ferret out simple explanations and document them here.

Degrees of Freedom

Degrees of freedom (df) depend on the sample size as well as the number of variables in the model. The df is defined as the number of independent components minus the number of parameters estimated:
df = Sample size - Number of parameters to be estimated
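To make the formula concrete, here is a minimal sketch in Python. The function name `degrees_of_freedom` and the sample sizes are my own illustrative choices, and I am assuming a simple regression line that estimates two parameters (an intercept and a slope).

```python
def degrees_of_freedom(sample_size, n_params):
    """df = sample size - number of parameters to be estimated."""
    return sample_size - n_params


# Assumption: a straight-line regression estimates 2 parameters
# (intercept and slope).
N_PARAMS_LINE = 2

df_large = degrees_of_freedom(30, N_PARAMS_LINE)   # plenty of information left over
df_single = degrees_of_freedom(1, N_PARAMS_LINE)   # a single observation

print(df_large, df_single)
```

With a single observation the result drops below zero, which matches the idea that there is no information left over once the parameters have been estimated.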

The degree of freedom is the number of pieces of "useful information".

For example, with a sample size of 1, there is no way a meaningful regression line can be estimated. The df in this context is zero or even less. The data have no freedom to vary, and no parameter estimation or inference is possible from a dataset with a sample size of just 1. When the sample size is 2, there can only be one regression line. There is one df for estimation. No alternative models can be explored, so the freedom to study the association further is limited. The data have limited freedom to vary.

Perfect Fit

A perfect fit occurs when the estimated regression line goes through all the data points and there are no residuals. It is also called an unrealistic fit. Such fits are unrealistic because they cannot be generalized. They may be true for just one dataset, your dataset, and you cannot possibly make any generalized inference from a perfectly fitted model.
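A small sketch of a perfect fit, assuming NumPy is available. The data points are made up for illustration: a line fitted to two observations leaves no residuals, and adding a single extra observation is enough to break the perfect fit.

```python
import numpy as np

# Hypothetical data: two observations, fitted with a straight line.
x_two = np.array([1.0, 3.0])
y_two = np.array([2.0, 5.0])
slope_two, intercept_two = np.polyfit(x_two, y_two, deg=1)
residuals_two = y_two - (slope_two * x_two + intercept_two)

# The same two observations plus one more that is not on that line.
x_three = np.array([1.0, 2.0, 3.0])
y_three = np.array([2.0, 4.5, 5.0])
slope_three, intercept_three = np.polyfit(x_three, y_three, deg=1)
residuals_three = y_three - (slope_three * x_three + intercept_three)

print("residuals with 2 points:", residuals_two)
print("residuals with 3 points:", residuals_three)
```

The two-point line reproduces its own data exactly (up to floating-point rounding), but that says nothing about how well it would describe any other dataset.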

Overfitting

When the number of parameters to be estimated is higher than the number of observations in the sample, the model has been overfit. An overfit model lacks df.

A larger sample size means more data points, more residuals, and therefore more df.
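Here is a rough sketch of what running out of df looks like, assuming NumPy is available. The five data points are invented and roughly follow y = 2x + 1, which is my assumption for judging the predictions. The helper `fit_and_report` is a hypothetical name. A straight line keeps 3 df, while a degree-4 polynomial uses as many parameters as there are observations and is left with zero df.

```python
import numpy as np

# Hypothetical data that roughly follow y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.2, 4.9, 7.1, 8.8])


def fit_and_report(degree):
    """Fit a polynomial of the given degree and summarize the fit."""
    n_params = degree + 1                      # coefficients to estimate
    coefs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coefs, x)
    return {
        "df": len(x) - n_params,               # sample size - parameters
        "sse": float(np.sum(residuals ** 2)),  # sum of squared residuals
        "pred_at_5": float(np.polyval(coefs, 5.0)),
    }


line = fit_and_report(1)     # 2 parameters, 3 df left
quartic = fit_and_report(4)  # 5 parameters, 0 df left

print(line)
print(quartic)
```

The degree-4 fit has essentially no residuals on the sample, yet its prediction at x = 5 lands much further from the assumed trend value of 11 than the straight line does.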


df in Non-traditional regression methods

Ridge regression, linear smoothers and smoothing splines are not based on ordinary least squares, and thus df defined in terms of dimensionality is not applicable to these models.
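One commonly used substitute for such models is the "effective" df, taken as the trace of the matrix that maps the observed responses to the fitted values. The sketch below, assuming NumPy is available, computes it for ridge regression. The function name `effective_df`, the random design matrix and the penalty values are all illustrative assumptions.

```python
import numpy as np


def effective_df(X, penalty):
    """Trace of the ridge hat matrix X (X'X + penalty * I)^-1 X'."""
    n_features = X.shape[1]
    gram = X.T @ X + penalty * np.eye(n_features)
    hat = X @ np.linalg.solve(gram, X.T)
    return float(np.trace(hat))


# Hypothetical design matrix: 20 observations, 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

for penalty in (0.0, 1.0, 10.0):
    print(penalty, effective_df(X, penalty))
```

With no penalty the effective df equals the number of predictors, as in ordinary least squares, and it shrinks as the penalty grows, so it is generally not a whole number.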

...