Inter-rater reliability (IRR) analyses were conducted on four teacher videos, scored with a newly created teacher evaluation instrument, as a means of establishing the instrument's reliability. Raters included 42 principals and assistant principals in a southern US school district. The videos spanned the teacher quality spectrum, and the IRR findings varied across these levels. Key findings suggest that while the overall IRR coefficient may be adequate to assess the validity of a classroom observation instrument, the overall coefficient may be unstable across the various teacher performance levels. Findings also strongly suggest that raters are much more likely to agree when they observe high-quality teaching than when they observe low-quality teaching.
Rights and Access Note
This is original work, not published in any other journal.
Zepeda, S. J., & Jimenez, A. M. (2019). Teacher Evaluation and Reliability: Additional Insights Gathered from Inter-rater Reliability Analyses. Journal of Educational Supervision, 2(2). https://doi.org/10.31045/jes.2.2.2