Trying to resolve an unending confusion over correct practice:
Once upon a time, there were two distinct entities.
Significance Testing
- Fisher's evidential p-value (for significance testing): a data-dependent random quantity, belonging to the realm of statistical inference. Fisher specified only the null hypothesis. Given that the null hypothesis is true, p is the probability of encountering an outcome of the observed magnitude or larger. It is a measure of evidence against H0. The tail area that defines the p-value is known ONLY after the outcome is observed.
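To make this concrete, here is a minimal sketch (the coin-tossing data and the scipy-based computation are my own illustrative choices, not from the post): a coin lands heads 60 times in 100 tosses, and we compute the probability of so extreme an outcome under the null hypothesis of a fair coin.

```python
# Fisher's evidential p-value, sketched with made-up data: 60 heads in
# 100 tosses, H0: the coin is fair (p = 0.5). The p-value is the
# probability, under H0, of an outcome of the observed magnitude or
# larger -- a tail area that can be computed only after the data are in.
from scipy.stats import binom

n, observed = 100, 60
p_value = binom.sf(observed - 1, n, 0.5)  # P(X >= 60 | H0), upper tail
print(f"one-sided p-value = {p_value:.4f}")  # ~0.028
```

Had the coin shown 55 heads instead, the same procedure would have returned a different tail area: the p-value is a property of the data, not of the test's design.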
Hypothesis Testing
- The Neyman-Pearson Type I error rate, or α (for hypothesis testing): must be set before experimenting. Here statistical testing is a mechanism for making decisions and guiding behaviour. Neyman and Pearson specified both the null and the alternative hypothesis, and the design fixes both error rates in advance, as the sketch below illustrates. Two kinds of mistake occur when choosing between the two:
  - Type I error: the null hypothesis is true, but you rejected it. You found an association where there wasn't any.
  - Type II error: the null hypothesis is false, but you accepted it. You failed to find the existing association.
- The Neyman-Pearson theory is about error control and is NOT concerned with gathering evidence. NP theory does not apply to an individual study, and the p-value plays no role in it. Freedman et al. (2007) explain how the origin of the nearly universal acceptance of the 5% cutoff point for significant findings is tied to the abridged form in which the chi-square table was originally published. Before computers and calculators could easily give quick approximations to the chi-square distribution, tables were used routinely. Because there is a different chi-square distribution for every possible number of degrees of freedom, the tables could not give many points for any one distribution. They typically included values at the 1%, 5%, and a few other levels, encouraging the practice of checking whether the chi-square statistic calculated from one's data exceeded the cutoffs in the table. In Neyman and Pearson's original formulation of hypothesis testing, the alpha level was supposed to be determined from contextual considerations, especially the costs of Type I and Type II errors. This thoughtful aspect of their theory was rapidly lost when the theory entered common scientific use.
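A minimal sketch of that design (the one-sided z-test, effect size, sample size, and α below are my own illustrative choices, not from the post): once both hypotheses and α are fixed, the rejection region and both error rates are determined before any data arrive.

```python
# Neyman-Pearson design, sketched with made-up numbers: H0: mu = 0 vs
# H1: mu = 0.5, known sigma = 1, n = 25, one-sided alpha fixed in advance.
# The rejection region and BOTH error rates follow from the design alone.
from scipy.stats import norm

alpha = 0.05                              # Type I error rate, set before the experiment
mu0, mu1, sigma, n = 0.0, 0.5, 1.0, 25    # null and alternative are both specified
se = sigma / n ** 0.5

crit = mu0 + norm.ppf(1 - alpha) * se     # reject H0 when the sample mean exceeds this
beta = norm.cdf(crit, loc=mu1, scale=se)  # Type II error: accepting H0 although H1 is true
print(f"reject H0 if mean > {crit:.3f}; beta = {beta:.3f}, power = {1 - beta:.3f}")
# beta ~ 0.196, power ~ 0.804 -- computed before a single observation is made
```

Note that no p-value appears anywhere: the output of the procedure is a rejection region and a binary decision, not a measure of evidence.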
What is valid practice, then? When you calculate the p-value, reject H0 when p < α, and accept it otherwise, you can only claim that with ongoing sampling you will falsely reject a true null in 100*α% of experiments. The specific p-value itself is not relevant and should not be reported: you can say whether or not a result fell in the rejection region, but not where it fell. That is BECAUSE α is the probability of a SET of potential outcomes that may fall anywhere in the tail area of the distribution under the null hypothesis, and we cannot know ahead of time which of those outcomes will occur. The simulation below illustrates the long-run guarantee.
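A quick simulation of that claim (a sketch assuming a one-sample t-test with H0 true; the sample size, α, and number of repetitions are illustrative): across repeated experiments, the rule "reject when p < α" falsely rejects the true null in about 100*α% of cases, wherever the individual p-values happen to fall inside the rejection region.

```python
# Long-run Type I error control: simulate many experiments in which H0
# (mean = 0) is TRUE and apply the rule "reject when p < alpha". The
# false-rejection rate converges to alpha -- a property of the procedure,
# not of any single study's p-value.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
alpha, n_experiments = 0.05, 10_000

rejections = 0
for _ in range(n_experiments):
    sample = rng.normal(loc=0.0, scale=1.0, size=30)  # data generated under H0
    if ttest_1samp(sample, popmean=0.0).pvalue < alpha:
        rejections += 1

print(f"false-rejection rate = {rejections / n_experiments:.3f}")  # ~0.05
```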
- MOST common scenario: if the measure of evidence is the point of interest, don't report error probabilities. Report EXACT p-values, and keep in mind that p-values can exaggerate the evidence against H0 (see the sketch after this list).
- LESS common scenario (e.g., quality-control experiments): if the error probabilities are the point of interest, don't report p-values.
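One way to see the exaggeration claim, borrowed from the Bayesian calibration literature rather than from the sources listed below (the minimum Bayes factor bound of Sellke, Bayarri and Berger, 2001): for p < 1/e, no alternative hypothesis can make the data favour H1 over H0 by more than a factor of 1/(-e*p*ln p), so even p = 0.05 amounts to at most roughly 2.5:1 odds against H0.

```python
# Minimum Bayes factor bound (Sellke, Bayarri & Berger, 2001): for
# p < 1/e, the Bayes factor in favour of H0 is at least -e * p * ln(p).
# The implied odds against H0 are far milder than the p-value suggests.
import math

for p in (0.05, 0.01, 0.001):
    min_bf = -math.e * p * math.log(p)  # lower bound on BF(H0 : H1)
    print(f"p = {p:<6} min Bayes factor = {min_bf:.3f} "
          f"(at most {1 / min_bf:.1f} : 1 against H0)")
# p = 0.05 -> at most ~2.5:1 against H0; p = 0.01 -> ~8:1
```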
References
- Hubbard R, Armstrong JS. Why we don't really know what statistical significance means: implications for educators. Journal of Marketing Education. 2006;28(2):114-120. Available from: http://dx.doi.org/10.1177/0273475306288399
- Goodman SN. p values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. American Journal of Epidemiology. 1993;137(5). Available from: http://view.ncbi.nlm.nih.gov/pubmed/8465801
- Biau DJJ, Jolles BM, Porcher R. P value and the theory of hypothesis testing: an explanation for new researchers. Clinical Orthopaedics and Related Research. 2010;468(3):885-892. Available from: http://dx.doi.org/10.1007/s11999-009-1164-4
- Johansson T. Hail the impossible: p-values, evidence, and likelihood. Scandinavian Journal of Psychology. 2011;52(2):113-125. Available from: http://dx.doi.org/10.1111/j.1467-9450.2010.00852.x