The pitfalls of scaling up educational interventions
Written by Jacob Strauss and edited by Dr Jayne Pickering. Jacob is a PhD student at Loughborough University. Please see here for more information about Jacob and his work.
How does education research transition to practice? The usual approach is something like this:
Phase 1: start with a small-scale study
Phase 2: repeat phase 1 using a much larger sample
Phase 3: communicate research findings to schools, policymakers, and other educational professionals.
Phase 1 is riddled with problems. Many interventions fail: sometimes the theory is weak, sometimes the methodological design is unsound. That may feel obvious; if we already knew the best possible ways to do everything, then we wouldn’t need research at all. What is perhaps less obvious is that much promising research also collapses at phase 2.
There are many examples of interventions which failed to scale up. The Parent Academy, a programme designed to equip toddlers’ parents with skills to support their children’s learning, initially showed outstanding promise. The Education Endowment Foundation (EEF) spent nearly a million pounds implementing the Parent Academy, but the initiative failed miserably. The Collaborative Reading Strategy, a programme designed to increase reading comprehension, failed to reproduce at large scale the benefits observed in initial trials. Project CRISS, a professional development programme for teachers, showed promising results in the initial research stages that were later overturned in a larger study. And the benefits of smaller classes found in the famous class-size-reduction study, Project STAR, failed to replicate in the large-scale and expensive Program Challenge and Basic Education Programme.
In principle, scaling up seems like an easy, almost trivial, task: simply take an intervention with proven success at a small scale and apply it at a larger one. The reality is starkly different.
Each of the above examples illustrates some manifestation of the “scaling effect”: the net change in a treatment effect that results from scaling, encompassing both positive and negative changes. Many researchers have attempted to build models and theoretical frameworks that capture the key factors behind the decline in efficacy of programmes at scale. For this post, I have combined these models into a single summary (below), which provides an overview of how a programme’s scalability is under threat at each stage of the knowledge-creation process.
Threats to scalability
Innovation
The Innovation Myth. Innovations are not always useful to schools. Whether a program is innovative is irrelevant; first and foremost it must be effective.
Sampling
Researcher Choice / Bias. Researchers may select a sample that benefits most from the program to boost its measured effects.
Homogenous Sampling. Data collection from a homogenous sample limits the study’s applicability to other groups.
Selection Bias. Those willing to participate in research may not be representative of the wider target population.
Non-Random Attrition. Participants who drop out of a study may differ systematically from those who remain, and the measured treatment effect will not reflect them.
Data collection
Hawthorne Effect. The alteration of behaviour by participants due to their awareness of being observed.
John Henry Effect. The alteration of behaviour by those in a control group due to their awareness of being in a control group.
Analysis
Confounding. Unmeasured individual- and school-level influences on learning can distort estimates of a program’s effectiveness.
Low Statistical Power. Underpowered studies cannot guarantee an acceptable likelihood of detecting genuine differences in outcomes attributable to the program (see the sketch after this list).
Policy implementation
Diseconomies of Scale. The cost per participant might increase as a program is scaled up, making it expensive to maintain.
Overgeneralising. Overgeneralising a program’s applicability to a wide variety of situations and populations will distort the program’s effectiveness.
Practice
Poor Dissemination. Major breakdowns in going to scale come from failing to disseminate findings in a way that communicates effectively with educators.
Program Drift. Individuals implementing the program may make unplanned changes to it to fit their own context.
Incorrect Delivery / Dosage. The program may be incorrectly applied, delivered or dosed.
The Learn Effect Myth. It is not the program per se that generates effects; it is the activities students perform with it.
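To make the low-statistical-power threat concrete, here is a minimal sketch in Python (my own illustration, not part of the summary above) of the back-of-the-envelope sample-size arithmetic for a two-arm trial; the effect sizes, significance level and target power are illustrative assumptions.

```python
# Approximate sample size per arm for a two-sided, two-sample comparison,
# using the standard normal-approximation formula n = 2(z_a + z_b)^2 / d^2.
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group(0.5))  # a 'medium' standardised effect needs roughly 63 pupils per arm
print(n_per_group(0.2))  # a small effect, more typical at scale, needs roughly 393 per arm
```

The point of the arithmetic is simply that the modest effect sizes typically seen at scale demand samples far larger than most small pilot studies recruit, which is one reason promising phase 1 results can evaporate in phase 2.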
Al-Ubaydli et al. (2019) offer advice to scholars, policymakers, and practitioners on the actions each can take to prevent things going wrong at scale. Everyone has a part to play in the transition of research to practice. I will give a brief overview of one important issue: the representativeness of the situation.
Representativeness can refer to the sample: an intervention may work for one demographic but not another. Representativeness may also refer to the research context. Characteristics of research, such as a high level of control and a high level of support for participants, vanish as a programme is scaled up. Contextual idiosyncrasies, such as the efficacy of the teacher, the classroom culture, or the in-class support from teaching assistants, are often overlooked or unaccounted for when scaling up interventions. A potential solution is for researchers to use technology to standardise as much as possible and to conduct educational research as naturalistically as possible by setting up ecologically valid conditions.
Clarke and Dede (2009) describe 37 contextual variables that could influence the efficacy of a technology-based intervention. These variables were spread across five categories: (i) student variables, such as their access to technology or absentee record; (ii) teacher variables, such as their pedagogical beliefs or their prior professional development related to technology in classrooms; (iii) technology infrastructure conditions, such as the reliability of the equipment or its location in the school; (iv) school/class variables, such as the type of class schedule or the length of lessons; and finally, (v) administrative variables, such as the level of support from the school’s administrators.
Clarke and Dede developed a ‘scalability index’ identifying which variables statistically interacted with the treatment and were therefore conditions for success. By identifying key features of the intervention’s context, they were able to give policymakers a detailed picture of the types of schools the computer game would be suitable for, and of the additional requirements needed for the programme to scale up.
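As a rough illustration of the statistical idea behind such an index, the sketch below (my own, not Clarke and Dede's actual procedure) fits a regression that includes a treatment-by-context interaction term; the data file and variable names are hypothetical.

```python
# Testing whether a contextual variable moderates the treatment effect:
# a significant interaction term suggests that variable is a condition for success.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trial_data.csv")  # hypothetical dataset: one row per student

# Outcome regressed on treatment, a contextual variable, their interaction,
# and a pre-test covariate.
model = smf.ols(
    "post_test ~ treated * reliable_tech_access + pre_test", data=df
).fit()
print(model.summary())
```

Contexts in which the estimated treatment effect is small or negative once the interaction is accounted for would be flagged as poor candidates for scale-up without additional support.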
Scaling up educational interventions is extremely complicated, and this post barely scratches the surface of everything that could cause a scaling effect. The representativeness of the situation in which a study is conducted is only one of the many ways the scaling effect can manifest, but it is often overlooked by researchers, policymakers and school leaders. The context in which research is conducted is sometimes, counterproductively, the one most conducive to positive results: research programmes are carefully monitored to ensure they are implemented properly, participants might change their normal behaviour because they know they are being observed, and the organisational culture of the participating schools and classrooms might be instrumental to the programme’s success. Before deciding whether to adopt an evidence-based practice, it is important to ask not only whether the sample is representative of the individuals who will be affected by the practice, but also whether the research context is representative of the organisation adopting it. And if in doubt, contact the researchers behind the original study and ask for advice on how to implement their programme.
Centre for Mathematical Cognition
We write mostly about mathematics education, numerical cognition and general academic life. Our centre’s research is wide-ranging, so there is something for everyone: teachers, researchers and general interest. This blog is managed by Dr Bethany Woollacott, a research associate at the CMC, who edits and typesets all posts. Please email b.woollacott@lboro.ac.uk if you have any feedback or if you would like information about being a guest contributor. We hope you enjoy our blog!