Effect size has long been the standard measurement used in educational research. This common metric allows comparison across studies and between programmes. It is a tricky statistic, though, because its implications are not necessarily understood by the typical consumer of research. For example, saying that a programme has an effect size of +0.13 is likely to be less meaningful to the layperson than saying that the programme yielded a gain of one month's learning.
In an effort to make effect sizes more reader-friendly, writers sometimes translate effect sizes into terms easier to understand, most often into units of time, such as days/years of learning. Yet research statisticians warn that what is gained in understandability may be lost in accuracy.
In an article appearing on Educational Researcher's Online First site, RAND's Matthew Baird and John Pane compared the "years of time" translation with three other reader-friendly measures, rating which were the most and least accurate reflections of effect sizes: benchmarking against similar groups in other studies, percentile growth, and the probability of meeting a given threshold. Specifically, Baird and Pane used data from a 2017 evaluation of personalised learning that reported detailed assessment procedures, data structure and methods of analysis, applying this information to test whether each reader-friendly translation satisfied six properties they deemed necessary for the translation to accurately reflect the underlying effect size.
Results showed that the units-of-time translation was in fact the least accurate, while the percentile-gains option yielded the best results.
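To illustrate the general idea behind the percentile translation (not Baird and Pane's exact procedure), a standardised effect size can be converted into percentile points under the common assumption that outcomes are normally distributed: an effect size d moves a median control-group student to the Φ(d) quantile of the distribution. A minimal sketch, using the +0.13 example from the text:

```python
# Hedged sketch: converting a standardised effect size (e.g. Cohen's d)
# into a percentile-gain statement, ASSUMING normally distributed outcomes.
# This is a textbook illustration, not the authors' specific method.
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def percentile_gain(effect_size: float) -> float:
    """Percentile points gained by a median (50th-percentile) control
    student if shifted up by `effect_size` standard deviations."""
    return 100.0 * normal_cdf(effect_size) - 50.0


if __name__ == "__main__":
    d = 0.13  # the example effect size mentioned above
    print(f"Effect size {d:+.2f} ~ a gain of about "
          f"{percentile_gain(d):.1f} percentile points")
```

Under this assumption, an effect size of +0.13 corresponds to moving a median student to roughly the 55th percentile, a statement many readers find easier to interpret than the raw standardised value.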
Source: Translating standardized effects of education programs into more interpretable metrics (2019), Educational Researcher DOI: 10.3102/0013189X19848729
This study is discussed further in this blogpost by Robert Slavin.