We had gotten hold of the 2005-2006 audit of the portfolios, which showed that a startlingly high percentage of them were graded higher than they should have been. Why is this significant? Because it calls into question the argument that Kentucky students are doing as well as proponents of the 1990 education reforms say they are.
But on his blog Kentucky School News and Comment, Richard Day responds to my post. His first argument against my statements was that my characterization of the audit as "closely held" was inaccurate:
Cothran's suggestion that the audit reports are "closely held" seems to lack justification. KSN&C asked for and received the current 2006-2007 report from KDE without objection or inquiry within an hour this morning. By law, KDE must audit portfolios annually to monitor the degree of inter-rater reliability and the report is public record.
Okay, several things: First, I didn't say that KDE was unwilling to release the report when asked, or that anyone had to pry it out of their grip by force. Second, I wasn't talking about the 2006-2007 report, but the 2005-2006 report. So I am still a bit mystified as to what was wrong with my characterization of the report. I chose the expression very carefully: I said "closely held" because very few people outside of KDE were aware of its existence until we outed it. No legislator was made aware of it and it was absent from KDE's website.
The question isn't whether the audit was closely held: the question is why it was closely held. Here we have been arguing over the fate of the CATS test for weeks and KDE has a report which bears directly on the issue and we're the ones to end up having to make it public. What gives?
But then Day gets directly to the issue of reliability, admitting it is a problem, during the course of which he quotes KDE apologist Lisa Gross:
KDE spokeswoman Lisa Gross told KSN&C,
As students become more refined writers, it is sometimes more difficult for two scorers to agree completely on their ability levels. The complexity of their work can cause a score to be "on the edge," meaning that another scorer could consider the work to be of slightly higher or slightly lower quality...What the overall audit results tell us is that we need to enhance scorer training and ensure that everyone is comfortable with the analytic model.
The results, and human nature, might also imply that close calls get bumped up. The only way to even approach reliability is through carefully described process.
And how long has KDE had to do this? Sixteen years. So what was the hold up? And why is it so easy to "bump up" the portfolio scores? Could it possibly be because THEY ARE COMPLETELY SUBJECTIVE!!!
Then Day argues that the process of grading has been changed, implying that this has solved the problem with the scoring. But has it been solved? One would think that, if it had been, the results would be appreciably different in later audits. And, indeed, if you were to look at the figures Day posts on his blog, you would get that impression. Day posts the pages from the 2006-2007 audit, which was released the same week as we released the 2005-2006 audit.
Lo and behold! The agreement between the original portfolio scores and the audited scores looks much higher! In the new 2006-2007 charts there are now figures of 90%+ where in the 2005-2006 charts there were figures as low as 8.89%. They must be doing a better job of scoring! You would think so, anyway, given the way it is presented.
Day uncritically posts the pages of the new audit without pointing out that, in an apparent attempt to hide the fact that the new audit shows that the problem is just as bad as it was the year before, the new audit changed the way it reported the figures. Whereas the 2005-2006 audit reported the percentage of portfolios that received the same ranking (Novice, Apprentice, Proficient, Distinguished) by the original grader and the auditor, the new audit reports portfolios that received the same grade or the next grade above or below it.
In other words, if one of the 2005-2006 portfolios received a "distinguished" ranking by the original grader but only a "proficient" grade by the auditor (one full step down), it was counted as having been scored differently. But in the 2006-2007 audit, the same situation was not counted as having been scored differently: it was considered to have been close enough. In the original audit, a portfolio was considered incorrectly graded if it was scored differently. In the new audit, it has to have been so far off as to be practically an act of incompetence to count as being scored incorrectly.
Look at the headings on the charts. On the 2005-2006 audit, the heading says "Percent Agreement." But on the 2006-2007 charts it says "Percentage Agreement Exact and Adjacent." Heck, why don't they just say, "Percentage Agreement Exact and Inexact"? It would be no less accurate a characterization.
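To see how much difference the change in reporting can make, here is a minimal sketch in Python. The ten score pairs are made up purely for illustration and are not taken from either audit, and the function names are my own. The point is only to show how the same set of scores yields two very different "agreement" figures depending on which definition you use.

```python
# The four portfolio rankings, in order from lowest to highest.
LEVELS = ["Novice", "Apprentice", "Proficient", "Distinguished"]


def exact_agreement(pairs):
    """Percent of portfolios given the same ranking by grader and auditor."""
    matches = sum(1 for original, audited in pairs if original == audited)
    return 100.0 * matches / len(pairs)


def exact_and_adjacent_agreement(pairs):
    """Percent given the same ranking OR one ranking above or below."""
    close = sum(
        1
        for original, audited in pairs
        if abs(LEVELS.index(original) - LEVELS.index(audited)) <= 1
    )
    return 100.0 * close / len(pairs)


# HYPOTHETICAL data: (original grader's ranking, auditor's ranking).
# These pairs are invented for illustration, not drawn from any KDE report.
pairs = [
    ("Distinguished", "Proficient"),
    ("Distinguished", "Proficient"),
    ("Proficient", "Apprentice"),
    ("Proficient", "Proficient"),
    ("Apprentice", "Apprentice"),
    ("Apprentice", "Novice"),
    ("Novice", "Novice"),
    ("Distinguished", "Apprentice"),
    ("Proficient", "Proficient"),
    ("Distinguished", "Distinguished"),
]

print(exact_agreement(pairs))               # 50.0
print(exact_and_adjacent_agreement(pairs))  # 90.0
```

In this made-up sample, half the portfolios were scored differently by the auditor, yet the "exact and adjacent" figure comes out at 90 percent. Only the one portfolio that was off by two full rankings counts against it.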
Now this is a fairly transparent attempt by education bureaucrats to make it look like there has been improvement where, in fact, there has been none. If you look at the actual numbers, they are about as bad as they were the year before. Fortunately, Mark Hebert of WHAS-TV in Louisville reported the numbers correctly and made the same observation.
What is ironic about this is that what KDE is doing is trying to hide the fact that portfolio scores (and therefore CATS scores themselves) have been inflated. And how do they accomplish this little exercise in deceit? By inflating scores!
I have dealt with KDE cant and deception for 18 years now, so it doesn't surprise me a bit. I lost any respect I had for these people a long time ago. But what does surprise me is that there are still people out there who fall for it.
Now Richard Day is a fair and honest guy, and I'm assuming he just didn't notice the fudged numbers, or didn't see it as necessary to note the difference between the way the two audits reported the numbers. But I can't resist throwing back the words he threw at me in his post, and pointing out that any belief that the scoring has improved "lacks justification."
Now Richard knows why he got that new report with so little coaxing.