Putting testing to the test

In response to the lack of evidence in the US debate over whether pupils are being over-tested, the Council of the Great City Schools has conducted a detailed study of testing. It examined test practices at primary and secondary level in 66 of the largest urban school districts during the 2014-15 school year.

The authors found that the average pupil took eight standardised tests a year. Grade 8 (Year 9) pupils were tested the most, spending an average of 4.22 days a year being tested. Yet there was no correlation between the amount of time spent on testing and maths and reading achievement.

The study also revealed a number of problems with testing: states reported having to wait two to four months for school-level test results, meaning the data could not usefully guide teaching; test results were used in ways they were not intended to be (eg, to judge an individual staff member’s performance); the tests themselves were not an accurate measure of content knowledge; and pupils were tested in the same subject more than once for different reasons.

A survey of parents revealed that they support testing that accurately reflects their child’s performance in school, and that they do not support more difficult tests.

Source: Student testing in America’s great city schools: An inventory and preliminary analysis (2015), Council of the Great City Schools.

No evidence of “gaming” in Reception Baseline trial

From 2016, children starting school in England will be given an initial assessment, called the Reception Baseline assessment. This will be used as the starting point from which their progress through school is measured. The Department for Education (DfE) has published new research, including a Randomised Controlled Trial (RCT), which investigated how schools’ behaviour changed in response to this accountability reform.

The RCT was carried out in autumn 2014 and explored whether schools’ perceptions of the purpose of the assessment led to differences in pupils’ early attainment – in particular, whether there was any evidence of “gaming”, ie, schools depressing pupils’ baseline results in order to show greater progress later on.

A sample of 153 schools (5,368 eligible pupils) was randomly allocated to two groups: one group was told that the assessment would be used for accountability purposes (the Accountability Group), and the other that it would be used only as a teaching and learning aid (the Teaching and Learning Group).

The mean score in the Accountability Group was 2.7 marks (4.2%) lower than in the Teaching and Learning Group, and this reduction was seen in both subject areas making up the test – maths and reading. However, once the clustering of pupils within schools was taken into account, the difference was no longer statistically significant.

The report concludes that the trial found no strong evidence that framing the Reception Baseline assessment as an accountability measure as opposed to a teaching and learning aid resulted in a reduction in test results.

Source: Reception Baseline Research: Results of a Randomised Controlled Trial. Research Brief (2015), Department for Education.

Monitoring inspections help head teachers to focus

A new report from Durham University forms part of a comparative study to measure the impact of school inspections on teaching and learning in eight European countries.

This report describes the results from three years of data collection in England, from January 2011 to December 2013. Each year, head teachers in primary and secondary schools were asked to complete an online survey covering: educational quality and change capacity in their school, and changes in both; inspection activities in the school; the school’s acceptance and use of feedback; the extent to which inspection standards set expectations and promote self-evaluation; and the choice/voice/exit of stakeholders in response to inspection reports. The survey results were used to create a number of scales, such as capacity building, school effectiveness, setting expectations, and accepting feedback.

The authors found that, in the first two years of data collection, schools that received both their main inspection and an extra monitoring inspection scored higher on average on all the scales used than schools that received only a main inspection. In the third year, this was true on almost all scales. A number of these differences (particularly on scales where schools commented on their improvement activities compared with the previous year) were large and statistically significant in the first year of data collection.

Source: Years 1, 2 and 3 Principal Survey Data Analysis: England (2014), Centre for Evaluation & Monitoring, Durham University.

Test results don’t show how effective teachers are

A new study has looked at the link between instructional alignment (how teaching is aligned with standards and assessments), value-added measures of teacher effectiveness, and composite measures of teacher effectiveness using multiple measures.

The issue is important as, in the US and around the world, there is growing emphasis on measuring teacher effectiveness and rewarding effective teachers. The study looked at 324 teachers of fourth and eighth grade (Year 5 and Year 9) mathematics and English language arts in five US states. The teachers completed the Survey of Enacted Curriculum to measure their instructional alignment. This was then compared with value-added measures (taken from state assessments and two supplementary assessments) and with composite measures of teacher effectiveness (using Framework for Teaching observation scores, which are widely used by states).

The results showed modest evidence of a relationship between instructional alignment and value-added measures, although this disappeared when controlling for pedagogical quality. The one significant relationship was an interaction: the association between instructional alignment and value-added measures was more positive when pedagogy was of high quality. There was no association between instructional alignment and the composite measures of teacher effectiveness.

These results suggest that the tests used for calculating value-added measures are not able to detect differences in the content or quality of classroom teaching.

Source: Instructional Alignment as a Measure of Teaching Quality (2014), Education Evaluation and Policy Analysis, online first, May 2014.

Good schools make a difference

A new article, published online in Urban Education, looks at the impact of family, school, and neighbourhood contextual characteristics on the outcomes of children growing up in poverty. Using data on 424 children from seven schools in deprived areas of Chicago, the authors examined four school performance outcomes: children’s maths and reading levels, grade repetition, and behavioural problems. They conclude that the study confirms the impact of poverty and other adversities on a child’s school achievement and behaviour.

They found negative associations at the family level; for example, household size and household adversity were significantly associated with the increased probability of repeating a grade, and children not living with their fathers were more likely to repeat a grade or have behavioural problems. There were also negative associations at a community level; for example, low neighbourhood education levels were negatively associated with children’s maths and reading scores.

However, children enrolled in high-performing schools had higher reading and maths scores than those from mid/low-performing schools. The authors suggest that interventions aiming to improve the quality of schools may mitigate the negative effects of individual and neighbourhood disadvantages on children’s school performance.

Source: School and Behavioral Outcomes Among Inner City Children: Five-Year Follow-Up (2013), Urban Education.

High-stakes tests may damage teaching quality

A new article published in the American Educational Research Journal has found that the quality of instructional support (ie, teaching methods and classroom organisation) is lower when teachers are under the greatest pressure to increase test performance.

The authors used two years of observation data from a cohort of US pupils who were first graders (Year 2) during the 2007–08 school year. A total of 348 observations took place in 23 classrooms in eight selected schools, when the children were in second grade and third grade (Years 3 and 4).

Using the Classroom Assessment Scoring System (CLASS), the researchers found that in the months leading up to high-stakes testing in Year 4, teachers in these classrooms offered lower levels of instructional support than Year 3 teachers who were not experiencing the same level of accountability pressure. However, observations after the tests revealed the quality of instructional support was indistinguishable between Years 3 and 4.

The authors suggest that accountability policies do not necessarily need to have negative consequences for classroom quality, but could be designed to improve it by including relevant measures.

Source: Pressures of the Season: An Examination of Classroom Quality and High-Stakes Accountability (2013), American Educational Research Journal, 50(5).