The reader-friendliness of effect sizes

Effect size has long been the standard measurement used in educational research. This common metric allows for comparison across studies, between programmes, and so on. It’s a tricky statistic, though, because its implications are not necessarily understood by the typical consumer of research. For example, saying that a programme has an effect size of +0.13 is likely to be less meaningful to the layperson than saying that the programme yielded a gain of one month’s learning.

In an effort to make effect sizes more reader-friendly, writers sometimes translate effect sizes into terms easier to understand, most often into units of time, such as days/years of learning. Yet research statisticians warn that what is gained in understandability may be lost in accuracy.
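To see why the time translation is fragile, consider how it is typically computed: the effect size is divided by an estimate of how much students grow in one year, expressed in the same standard-deviation units. That annual-growth figure varies considerably by age, subject and test, which is one source of the inaccuracy the statisticians warn about. Here is a minimal sketch of the arithmetic; the function name, the nine-month school year, and both annual-growth values are purely illustrative assumptions, not figures from the studies cited here:

```python
def months_of_learning(effect_size: float, annual_growth_sd: float) -> float:
    """Convert a standardised effect size into months of learning, assuming a
    school year of roughly nine months and that annual_growth_sd is the
    typical one-year gain in standard-deviation units (this varies widely
    by grade level, subject and test)."""
    return 9.0 * effect_size / annual_growth_sd

# The same effect size of +0.13 reads very differently depending on the
# (illustrative) annual-growth figure chosen:
print(months_of_learning(0.13, 0.40))  # ~2.9 months of learning
print(months_of_learning(0.13, 1.17))  # ~1.0 month of learning
```

The instability is the point: the translation depends on a conversion factor that the reader rarely sees, so identical effect sizes can be reported as very different amounts of "time".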

In an article appearing on Educational Researcher’s Online First site, RAND’s Matthew Baird and John Pane compared the “years of time” translation with three other reader-friendly measures: benchmarking against similar groups in other studies, percentile growth, and the probability of meeting a certain threshold. They rated which of these were the most and least accurate reflections of effect sizes. Specifically, Baird and Pane used data from a 2017 evaluation of personalised learning that reported detailed assessment procedures, data structure and methods of analysis. They applied this information to assess whether each reader-friendly translation satisfied six properties they deemed necessary for it to accurately reflect the effect size it restates.

Results showed that the units-of-time translation was in fact the least accurate, while the percentile-gains option yielded the best results.
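For readers curious about the mechanics, the percentile-growth translation follows directly from the definition of a standardised effect size. Under the common assumption that outcomes are approximately normally distributed with equal variance, a student at a given percentile of the control distribution is expected to move to the percentile given by the normal CDF after shifting by the effect size. The sketch below illustrates that general idea (the function name and the +0.13 example are our own illustration, not necessarily the exact computation Baird and Pane evaluate):

```python
from scipy.stats import norm

def percentile_gain(effect_size: float, baseline_percentile: float = 50.0) -> float:
    """Expected percentile of a control-group student who starts at
    baseline_percentile, after a treatment with the given standardised
    effect size, assuming approximately normal outcome distributions
    with equal variance in both groups."""
    baseline_z = norm.ppf(baseline_percentile / 100.0)  # z-score of the baseline student
    return 100.0 * norm.cdf(baseline_z + effect_size)   # percentile after the shift

# A median (50th percentile) student and an effect size of +0.13:
print(percentile_gain(0.13))  # ~55.2, i.e., a gain of about 5 percentile points
```

Unlike the time translation, this mapping needs no external conversion factor, which is one plausible reason it fares better on accuracy.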

Source: Translating standardized effects of education programs into more interpretable metrics (2019), Educational Researcher. DOI: 10.3102/0013189X19848729

This study is discussed further in a blogpost by Robert Slavin.

How much is enough?

There have now been many controlled studies of preventive mental health interventions for young people. For these studies to be useful, practitioners need to know whether the effects shown for a particular intervention are modest, moderate, or large.

Emily Tanner-Smith and colleagues summarised more than 400 mean effect size estimates from 74 meta-analyses that synthesised findings from many trials. All the trials were of programmes aimed at preventing problematic behaviour or emotional problems in young people aged 5-18. The results, published in Prevention Science, indicate that, with few exceptions, the median effect sizes on the various outcomes fell within the range of +0.07 to +0.16. The authors advise that these figures indicate the level of improvement that has been achieved to date, and can serve as benchmarks for assessing the value of new findings.

The report also points out that prevention programmes yielded larger effects on knowledge than on actual behaviour. Providing information to increase knowledge (e.g., about the risks of drug use) is an important component of many programmes, but knowledge does not always correlate strongly with actual behaviour.

Source: Empirically Based Mean Effect Size Distributions for Universal Prevention Programs Targeting School-Aged Youth: A Review of Meta-Analyses (August 2018), Prevention Science.