Practical Assessment, Research & Evaluation, Vol 18, No 10 Page 9
De Winter; t-test with extremely small Ns
References
Bem, D. J., Utts, J., & Johnson, W. O. (2011). Must
psychologists change the way they analyze their data?
Journal of Personality and Social Psychology, 101, 716–719.
Blair, R. C., Higgins, J. J., & Smitley, W. D. (1980). On the
relative power of the U and t tests. British Journal of
Mathematical and Statistical Psychology, 33, 114–120.
Box, J. F. (1987). Guinness, Gosset, Fisher, and small
samples. Statistical Science, 2, 45–52.
Bridge, P. D., & Sawilowsky, S. S. (1999). Increasing
physicians’ awareness of the impact of statistics on
research outcomes: comparative power of the t-test and
Wilcoxon rank-sum test in small samples applied
research. Journal of Clinical Epidemiology, 52, 229–235.
Campbell, M. J., Julious, S. A., & Altman, D. G. (1995).
Estimating sample sizes for binary, ordered categorical,
and continuous outcomes in two group comparisons.
BMJ: British Medical Journal, 311, 1145–1148.
Cohen, J. (1970). Approximate power and sample size
determination for common one-sample and two-
sample hypothesis tests. Educational and Psychological
Measurement, 30, 811–831.
Conover, W. J., & Iman, R. L. (1981). Rank transformations
as a bridge between parametric and nonparametric
statistics. The American Statistician, 35, 124–129.
De Winter, J. C. F., & Dodou, D. (2010). Five-point Likert
items: t test versus Mann-Whitney-Wilcoxon. Practical
Assessment, Research & Evaluation, 15, 11.
Elliott, A. C., & Woodward, W. A. (2007). Comparing one or
two means using the t-test. Statistical Analysis Quick
Reference Guidebook. SAGE.
Fay, M. P., & Proschan, M. A. (2010). Wilcoxon-Mann-
Whitney or t-test? On assumptions for hypothesis tests
and multiple interpretations of decision rules. Statistics
Surveys, 4, 1–39.
Fitts, D. A. (2010). Improved stopping rules for the design
of efficient small-sample experiments in biomedical and
biobehavioral research. Behavior Research Methods, 42, 3–22.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size
estimates: Current use, calculations, and interpretation.
Journal of Experimental Psychology: General, 141, 2–18.
Harlow, H. F., Dodsworth, R. O., & Harlow, M. K. (1965).
Total social isolation in monkeys. Proceedings of the
National Academy of Sciences of the United States of America,
54, 90–97.
Hesterberg, T., Moore, D. S., Monaghan, S., Clipson, A., &
Epstein, R. (2005). Bootstrap methods and permutation
tests. Introduction to the Practice of Statistics, 5, 1–70.
Ingre, M. (in press). Why small low-powered studies are
worse than large high-powered studies and how to
protect against “trivial” findings in research: Comment
on Friston (2012). NeuroImage.
Ioannidis, J. P. (2005). Why most published research
findings are false. PLOS Medicine, 2, e124.
Ioannidis, J. P. (2008). Why most discovered true
associations are inflated. Epidemiology, 19, 640–648.
Ioannidis, J. P. (2013). Mega-trials for blockbusters. JAMA,
309, 239–240.
Janušonis, S. (2009). Comparing two small samples with an
unstable, treatment-independent baseline. Journal of
Neuroscience Methods, 179, 173–178.
Lehmann, E. L. (2012). “Student” and small-sample theory. In
Selected Works of E. L. Lehmann (pp. 1001–1008).
Springer US.
Ludbrook, J., & Dudley, H. (1998). Why permutation tests
are superior to t and F tests in biomedical research. The
American Statistician, 52, 127–132.
Mathes, E. W., King, C. A., Miller, J. K., & Reed, R. M.
(2002). An evolutionary perspective on the interaction
of age and sex differences in short-term sexual
strategies. Psychological Reports, 90, 949–956.
Mesholam, R. I., Moberg, P. J., Mahr, R. N., & Doty, R. L.
(1998). Olfaction in neurodegenerative disease: a meta-
analysis of olfactory functioning in Alzheimer’s and
Parkinson’s diseases. Archives of Neurology, 55, 84–90.
Mudge, J. F., Baker, L. F., Edge, C. B., & Houlahan, J. E.
(2012). Setting an optimal α that minimizes errors in
null hypothesis significance tests. PLOS One, 7, e32734.
OECD. (2010). PISA 2009 results: What students know and
can do – Student performance in reading, mathematics
and science (Volume I).
http://dx.doi.org/10.1787/9789264091450-en
Pashler, H., & Harris, C. R. (2012). Is the replicability crisis
overblown? Three arguments examined. Perspectives on
Psychological Science, 7, 531–536.
Posten, H. O. (1982). Two-sample Wilcoxon power over the
Pearson system and comparison with the t-test. Journal
of Statistical Computation and Simulation, 16, 1–18.
Ramsey, P. H. (1980). Exact Type 1 error rates for
robustness of Student’s t test with unequal variances.
Journal of Educational and Behavioral Statistics, 5, 337–349.