Increased accountability for schools and educators, and more recently for teacher preparation programs, has led to a proliferation of new measures to document effectiveness. There have been several efforts to examine the validity and reliability of these measures, some of which have heightened awareness of their limitations. As such, there is an ongoing need to ensure that these tools demonstrate important measurement properties. For example, findings from data collected as part of the Measures of Effective Teaching (MET) project led to a set of suggested principles for using measures of effective teaching, including a focus on ensuring the reliability and validity of those measures (Bill & Melinda Gates Foundation, 2013).
The Regional Educational Laboratory for the Central States (REL Central) at Marzano Research worked with the Kansas State Department of Education (KSDE) to develop a tool for analyzing data from a candidate performance assessment. The tool, and how it is used to monitor and improve the reliability of scores, was presented on Friday, September 30, 2016, at the fall conference of the Council for the Accreditation of Educator Preparation (CAEPCon) in Washington, DC. Nikk Nelson, Licensure Consultant from the Kansas State Department of Education, and two REL Central researchers from RMC Research provided this presentation to over 100 audience members.
The scorer feedback reports draw on more than 6,500 Kansas Performance Teaching Portfolio (KPTP) scores collected over five academic years. The reports provide average overall scores and average scores for each KPTP task for each academic year from 2010-11 through 2014-15 and for the aggregated five-year period. Standard deviations are also presented to show score variability. Scores are presented separately for an individual scorer and for all scorers at each point in time. Line charts summarize an individual scorer's ratings relative to state average ratings at each point in time and for the five-year period.
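The core statistics in such a feedback report (per-year means and standard deviations for one scorer alongside all scorers, plus an aggregate across years) can be sketched as follows. This is an illustrative sketch only: the record layout, the sample scores, and the `feedback_report` and `summarize` helpers are hypothetical, not KSDE's actual tool, and real KPTP data would also carry per-task scores.

```python
from statistics import mean, stdev
from collections import defaultdict

# Hypothetical records: (academic_year, scorer_id, overall_score).
records = [
    ("2010-11", "S1", 16.0), ("2010-11", "S1", 18.0),
    ("2010-11", "S2", 20.0), ("2010-11", "S2", 14.0),
    ("2011-12", "S1", 17.0), ("2011-12", "S2", 19.0),
    ("2011-12", "S2", 21.0),
]

def summarize(scores):
    """Return (mean, sample SD); SD is None with fewer than two scores."""
    return mean(scores), (stdev(scores) if len(scores) > 1 else None)

def feedback_report(records, scorer_id):
    """Per-year mean/SD for one scorer next to the all-scorer mean/SD."""
    by_year_all = defaultdict(list)
    by_year_scorer = defaultdict(list)
    for year, sid, score in records:
        by_year_all[year].append(score)
        if sid == scorer_id:
            by_year_scorer[year].append(score)
    report = {}
    for year in sorted(by_year_all):
        report[year] = {
            "scorer": summarize(by_year_scorer[year]) if by_year_scorer[year] else None,
            "all": summarize(by_year_all[year]),
        }
    # Aggregate row across all years, mirroring the five-year summary.
    report["all_years"] = {
        "scorer": summarize([s for _, sid, s in records if sid == scorer_id]),
        "all": summarize([s for _, _, s in records]),
    }
    return report

rpt = feedback_report(records, "S1")
```

A line chart comparing scorer S1 to the state average would then plot, for each year, `rpt[year]["scorer"][0]` against `rpt[year]["all"][0]`.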
Participants may apply what they learned in this presentation by: (1) exploring approaches to improve the reliability of their own candidate assessment tools; and (2) collecting data to support ongoing monitoring of score reliability, inform revisions to data collection tools, provide feedback to scorers, and inform scorer training.