
Tuesday, April 28, 2009

Time is up on Kentucky's testing plan

Susan Weston at the Prichard Committee blog points out the following language from Senate Bill 1:
Within thirty days of the effective date of this Act, the Kentucky Department of Education in collaboration with the Council on Postsecondary Education shall plan and implement a comprehensive process for revising the academic content standards in reading, language arts including writing, mathematics, science, social studies, arts and humanities, and practical living skills and career studies. The revision process shall include a graduated time table to ensure that all revisions are completed to allow as much time as possible for teachers to adjust their instruction before new assessments are administered.
Susan interprets this as meaning we should have word soon on the plan. I interpret it (call me a literalist) to mean that we should already have word. The bill became effective, as Susan points out, on March 25. Thirty days after March 25 is April 24. April 24 was last Friday.
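For the literal-minded, the date arithmetic checks out. A quick sketch of my own (the effective date is the one noted above):

    from datetime import date, timedelta

    effective = date(2009, 3, 25)              # SB 1's effective date
    deadline = effective + timedelta(days=30)  # "within thirty days"
    print(deadline)                            # 2009-04-24 -- last Friday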

So where is it?

Tuesday, February 17, 2009

The death of the CATS test

Like other felines, CATS seems to have had nine lives. Since 1992, when the KIRIS tests made their debut, the testing system has proved the most controversial aspect of Kentucky's education reform. Time after time the test has taken hits for its inaccuracy, unreliability, and unmanageability--not to mention its sheer intrusiveness in the education process.

If you want to know what it has been like for those of us who have tried to stop the nonsense all these years, just watch the scene in Star Wars where they try to attack the Death Star: the thing is just so big and seemingly invulnerable that every shot just bounces off.

I'd love to say that those of us in the Rebellion delivered the final blow by finding a vent somewhere, using the Force, and getting the shot down the right hole, but in reality the test just plain petered out.

Was it Thomas Kuhn who said that intellectual revolutions come about not because one theory is refuted by another, but because the advocates of the reigning theory simply die off?

That may, in fact, be the situation with the CATS tests: those who swore the blood oath in 1990 to defend every aspect of the reform act to the death just faded away. How many legislators who actually voted for KERA are left? And isn't it an irony that one of the few left is the one who is bringing the test down?

This is the way the test ends:
This is the way the test ends:
This is the way the test ends:
Not with a bang, but with a whimper.

Tuesday, January 13, 2009

No one knows how much CATS costs, committee finds

Oh dear. While some of us were thinking that the problem with CATS was that it costs too much, all of a sudden we find out that that's not the problem. The problem is we don't know how much it costs. Turns out that our wonderful education bureaucrats have no idea how much the monstrous state education testing system is setting us back because of poor accounting.

All the state auditors could determine is that it costs at least $18.6 million, a higher figure than has been reported before. But there is no way, given the state's poor accounting, to know the total cost of the tests, because no figures are available to determine how much local school districts are spending, an amount that is likely to be very high.

"There isn't a mechanism to be able to determine the cost at the local level for the assessment testing," Brian Lykins, director of special audits in the auditor's office, told the Louisville Courier-Journal.

This comes at a bad time for supporters of the tests, since President of the Senate David Williams has announced that he would like to see the test eliminated.

Stay tuned on this issue...

Monday, June 30, 2008

Discussion on state education testing now online

My televised discussion/debate on education testing with several other figures in Kentucky education, which took place last Monday, June 23, is now online at Kentucky Educational Television (KET). You can access it directly by clicking here. Guests on the show were:
  • Sharron Oxendine, president of the Kentucky Education Association
  • Lu Young, superintendent of Jessamine County Schools
  • Tim Decker, an art teacher at Russell Middle School
  • Martin Cothran, senior policy analyst with The Family Foundation of Kentucky

Saturday, March 22, 2008

KDE fudges numbers in new audit report on writing portfolio

Well, the Kentucky Department of Education strikes again. Just as we publicly released the 2005-2006 Writing Portfolio Audit Report (because they didn't), they released the 2006-2007 audit, which makes it look like there has been improvement in the scoring of portfolios, which are a significant part of the way schools in Kentucky are assessed and held accountable.

We had gotten hold of the 2005-2006 audit of the portfolios, which showed that a startlingly high percentage of them were graded higher than they should have been. Why is this significant? Because it calls into question the argument that Kentucky students are doing as well as proponents of the 1990 education reforms say they are.

But on his blog Kentucky School News and Comment, Richard Day responds to my post. His first argument against my statements was that my characterization of the audit as "closely held" was inaccurate:
Cothran's suggestion that the audit reports are "closely held" seems to lack justification. KSN&C asked for and received the current 2006-2007 report from KDE without objection or inquiry within an hour this morning. By law, KDE must audit portfolios annually to monitor the degree of inter-rater reliability and the report is public record.
Okay, several things: First, I didn't say that KDE was unwilling to release the report when asked, or that anyone had to pry it out of their grip by force. Secondly, I wasn't talking about the 2006-2007 report, but the 2005-2006 report. So I am still a bit mystified as to what was wrong with my characterization of the report. I chose the expression very carefully: I said "closely held" because very few people outside of KDE were aware of its existence until we outed it. No legislator was made aware of it and it was absent from their website.

The question isn't whether the audit was closely held: the question is why it was closely held. Here we have been arguing over the fate of the CATS test for weeks and KDE has a report which bears directly on the issue and we're the ones to end up having to make it public. What gives?

But then Day gets directly to the issue of reliability, admitting it is a problem, and in the course of doing so he quotes KDE apologist Lisa Gross:

KDE spokeswoman Lisa Gross told KSN&C,

As students become more refined writers, it is sometimes more difficult for two scorers to agree completely on their ability levels. The complexity of their work can cause a score to be "on the edge," meaning that another scorer could consider the work to be of slightly higher or slightly lower quality...What the overall audit results tell us is that we need to enhance scorer training and ensure that everyone is comfortable with the analytic model.

The results, and human nature, might also imply that close calls get bumped up. The only way to even approach reliability is through carefully described process.
And how long has KDE had to do this? Sixteen years. So what was the hold up? And why is it so easy to "bump up" the portfolio scores? Could it possibly be because THEY ARE COMPLETELY SUBJECTIVE!!!

Then Day argues that the process of grading has been changed, implying that this has solved the problem with the scoring. But has it been solved? One would think that, if it had been, the results would be appreciably different in later audits. And, indeed, if you were to look at the figures Day posts on his blog, you would get that impression. Day posts the pages from the 2006-2007 audit, which was released the same week we released the 2005-2006 audit.

Lo and behold! The agreement between the original portfolio scores and the audited scores looks much higher! In the new 2006-2007 charts there are now figures of 90%+ where in the 2005-2006 audit there were figures as low as 8.89%. They must be doing a better job of scoring! You would think so, anyway, given the way it is presented.

Think again.

Day uncritically posts the pages of the new audit without pointing out that the new audit changed the way it reported the figures, in an apparent attempt to hide the fact that the problem is just as bad as it was the year before. Whereas the 2005-2006 audit reported the percentage of portfolios that received the same ranking (Novice, Apprentice, Proficient, Distinguished) from both the original grader and the auditor, the new audit reports the percentage of portfolios that received the same grade or the next grade above or below it.

In other words, if one of the 2005-2006 portfolios received a "distinguished" ranking by the original grader but only a "proficient" grade by the auditor (one full step down), it was counted as having been scored differently. But in the 2006-2007 audit, the same situation was not counted as having been scored differently: it was considered to have been close enough. In the original audit, a portfolio was considered incorrectly graded if it was scored differently. In the new audit, it has to have been so far off as to be practically an act of incompetence to count as being scored incorrectly.
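To see how much the switch from "exact" to "exact and adjacent" agreement can inflate the numbers, here is a minimal sketch with made-up scores (hypothetical portfolios for illustration only; this is not KDE's data):

    LEVELS = ["Novice", "Apprentice", "Proficient", "Distinguished"]
    RANK = {level: i for i, level in enumerate(LEVELS)}

    # (original grader's score, auditor's score) for five hypothetical portfolios
    pairs = [
        ("Distinguished", "Proficient"),  # one full step down on audit
        ("Proficient", "Proficient"),     # exact match
        ("Proficient", "Apprentice"),     # one full step down on audit
        ("Apprentice", "Apprentice"),     # exact match
        ("Distinguished", "Apprentice"),  # two full steps down on audit
    ]

    exact = sum(1 for a, b in pairs if a == b)
    adjacent = sum(1 for a, b in pairs if abs(RANK[a] - RANK[b]) <= 1)

    print(f"Percent agreement (exact):              {100 * exact / len(pairs):.0f}%")
    print(f"Percent agreement (exact and adjacent): {100 * adjacent / len(pairs):.0f}%")
    # Same scores, but "agreement" jumps from 40% to 80% under the new definition.

Nothing about the grading changes; only the definition of agreement does.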

Look at the headings on the charts. On the 2005-2006 Audit, it says "Percent Agreement." But on the 2006-2007 charts it says "Percentage Agreement Exact and Adjacent." Heck, why don't they just say, "Percentage Agreement Exact and Inexact"? It would be no less accurate a characterization.

Now this is a fairly transparent attempt by education bureaucrats to make it look like there has been improvement where, in fact, there has been none. If you look at the actual numbers, they are about as bad as they were the year before. Fortunately, Mark Hebert of WHAS-TV in Louisville reported the numbers correctly, observing that they were just as bad as the year before.

What is ironic about this is that what KDE is doing is trying to hide the fact that portfolio scores (and therefore CATS scores themselves) have been inflated. And how do they accomplish this little exercise in deceit? By inflating scores!

I have dealt with KDE cant and deception for 18 years now, so it doesn't surprise me a bit. I lost any respect I had for these people a long time ago. But what does surprise me is that there are still people out there who fall for it.

Now Richard Day is a fair and honest guy, and I'm assuming he just didn't notice the fudged numbers, or didn't see it as necessary to note the difference between the way the two audits reported them. But I can't resist throwing back the words he threw at me in his post and pointing out that any belief that the scoring has improved "lacks justification."

Now Richard knows why he got that new report with so little coaxing.

Tuesday, March 18, 2008

CATS portfolio scores overstated, says audit

For Immediate Release
Contact: Martin Cothran
March 17, 2008
Phone: 859-329-1919

LEXINGTON, KY—A closely held portfolio audit conducted by state education bureaucrats calls an important aspect of the state CATS tests into question, according to a family advocacy group supporting proposed changes in state education testing. “This report is a blow to the idea that the CATS tests are a reliable indicator of how our students are performing,” said Martin Cothran, senior policy analyst for The Family Foundation of Kentucky. “The portfolios are a key element of the state CATS tests. If this audit says what it looks like it says, then we’ve got serious problems that lawmakers cannot afford to ignore.”

The Family Foundation is supporting Senate Bill 1, which would replace the CATS testing system with a more objective, easier-to-administer-and-grade multiple choice test that would give reliable scores for individual students.

The audit, conducted by the Kentucky Department of Education (KDE), shows that writing portfolio grades given by CATS graders in 2005-2006 were dramatically higher than they should have been and that the higher the grade given, the less likely it was to be an accurate grade. Some agreement rates between how the portfolios were actually graded and how they should have been graded were lower than 20 percent.

KDE Portfolio Audit Agreement Rates [1]

            Novice             Apprentice         Proficient         Distinguished
Grade 4     100% (24/24)       89.42% (186/208)   64.97% (230/354)   28.74% (25/87)
Grade 7     98.89% (178/180)   83.45% (237/284)   42.01% (92/219)    8.89% (4/45)
Grade 12    96.88% (31/32)     70.11% (190/271)   38.92% (65/167)    18.92% (7/37)
Total       98.73% (233/236)   80.34% (613/763)   52.3% (387/740)    21.3% (36/169)

“The audit seems to suggest,” said Cothran, “that about 75 percent of the portfolios ranked as ‘distinguished’ in 2005-2006 were graded too high, and almost half of the portfolios rated ‘proficient’ were given higher grades than they deserved. If this audit is an accurate picture of how portfolios are being graded, then the problems with CATS are even worse than some of us thought they were.”

Out of the 169 portfolios that were originally scored distinguished, only 36 were scored distinguished after the audit. Out of the 740 portfolios that were originally scored proficient, only 387 scored proficient after the audit. “One of the justifications for doing this kind of testing under KERA in the first place was to avoid the ‘Lake Wobegon Effect’ of norm-referenced testing,” said Cothran, “which occurs when a majority of students score above average. But findings like this seem to indicate that the Lake Wobegon Effect is alive and well right here in Kentucky.”

The Family Foundation supports the use of portfolios for instructional rather than assessment purposes.

###


[1] “Kentucky Commonwealth Accountability Testing System 2005-2006 Writing Portfolio Audit Report,” Appendix J, p. 24.
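For anyone who wants to check the percentages quoted in the release against the audit table, here is a short sketch using the counts from the table's Total row (my own check, added for convenience):

    # Agreement rate = portfolios keeping their score on audit, divided by
    # portfolios originally given that score (Total row of the audit table).
    totals = {
        "Novice": (233, 236),
        "Apprentice": (613, 763),
        "Proficient": (387, 740),
        "Distinguished": (36, 169),
    }

    for level, (agreed, original) in totals.items():
        rate = 100 * agreed / original
        print(f"{level:13s} {rate:5.1f}% agreement ({100 - rate:.1f}% did not hold up)")
    # Distinguished comes out to 21.3% agreement -- i.e., roughly three quarters
    # of "distinguished" portfolios lost that ranking on audit, as the release says.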



Friday, March 07, 2008

Press Release: Senate approves testing change

FOR IMMEDIATE RELEASE
March 7, 2008
Contact: Martin Cothran

Phone: 859-329-1919

“It’s time to put CATS to sleep.”
Martin Cothran

Family group praises Senate vote to replace CATS

LEXINGTON, KY—“We are pleased to see that lawmakers are finally heeding years of calls for changing the state’s testing system,” said Martin Cothran, senior policy analyst for The Family Foundation. “The CATS test has had 10 years to prove itself—18 if you count its prior incarnation as KIRIS—and it has yet to do so. How many generations of Kentucky’s children are we going to hold hostage to the reputations of those who have tied their political fortunes to the success of KERA? It’s time to put CATS to sleep.”

Senate Bill 1 was passed by the Kentucky State Senate in a 22-15 vote. The bill would replace the CATS testing system with a more objective, easier-to-administer-and-grade multiple choice test that would give reliable scores for individual students. CATS currently includes “open response” questions and portfolios that have been criticized as subjective and unreliable.

“This bill would give our testing system five things it doesn’t have now,” said Cothran. “It would be objective, easy to administer, easy to grade, reliable on an individual level, and would give us quicker feedback on how our students—and schools—are doing.”

Cothran was one of the most vocal critics of the testing system, and his criticisms led to some of the changes in KIRIS that resulted in the CATS test. He also served on the Assessment and Accountability Subcommittee of the Governor's Task Force on Education Reform under former Gov. Paul Patton. He was the author of the minority report for the subcommittee.

###

Question for KEA: Where do your members stand on changing the CATS test?

Sharron Oxendine says her teachers union, the KEA, supports raising taxes. Interestingly, the KEA also opposes Senate Bill 1, which would replace the CATS test with something that is actually valid and reliable.

Now I've got a question for Sharron: Are you telling Kentuckians that there are more members of your organization who support raising taxes than there are members who are in favor of changing the CATS test?

Here's your chance to demonstrate how representative you are of your members. Let's see the numbers, huh?

Thursday, March 06, 2008

Prichard Committee touting flawed critique of SB 1

Georgetown College's Center for Advanced Study of Assessment (CASA), it turns out, is not so expert after all. In a recent report being touted by the Prichard Committee, Skip Kifer, Ben Oldham, and Tom Guskey criticized Senate Bill 1, which would replace the CATS test with tests that are actually objective, reliable, and useful. But according to another testing expert, several of their criticisms get basic things wrong about the CATS test, calling the report's credibility into question.

George Cunningham, an emeritus professor from the University of Louisville, the author of numerous books on educational testing and a nationally recognized measurement expert, points out that the CASA report made fundamental errors in describing the CATS tests and what SB1 would do.

Is CATS a "standards-based" test?
The CASA report asserted that the CATS test is "criterion-referenced" or "standards-based," and that SB1 would replace it with a "norm-referenced" test:
The new legislation, while not requiring an off-the-shelf set of tests, appears to favor such an approach by requiring norm-referenced tests for individual students rather than the criterion-referenced or standards-based ones which historically the Commonwealth has used to measure school outcomes. (p. 7)

"The authors are confused," says Cunningham, "or perhaps just dated in their use of measurement terminology." "The criticism of SB1 tests that they will be norm-referenced is nonsensical because the current test, CATS, is also norm-referenced."

Oops. It might be a good idea, folks, before we start defending the CATS test to know what kind of test it is.

He points out that to say that CATS is somehow "standards-based" is misleading, and that it is only standards-based in the same sense that all tests are standards-based:
The term “criterion-referenced” has lost its meaning. At one time it referred to the process of reporting results on an objective-by-objective basis and it was closely associated with mastery learning. Outside of special education, it would be difficult to find examples of this sort of criterion-referenced testing. Certainly, neither KIRIS nor CATS was ever criterion-referenced in this sense. Because the term apparently focus-groups well, a more modern usage of the term has emerged.
Ouch.

A "criterion-referenced" test is one that sets forth certain objective criteria and the score depends upon how a student meets those criteria. If a student, say, gets 6 out of 10 questions right, and 60 percent is a D on a predetermined grading scale, then the student gets a "D". A "norm-referenced" test is like test graded on a curve. If a student gets the same 6 out of 10, but the average in the class is a 6 out of 10, then the student gets a "C".

Cunningham's point is that neither KIRIS (the CATS before 1998) nor CATS (the KIRIS after 1998) nor the tests proposed by SB1 is "criterion-referenced." They're all norm-referenced. Of course, Bob Sexton and the Prichard Committee have been spreading this disinformation for years, despite the fact that it has been pointed out publicly a number of times. In fact, I pointed it out in an opinion piece in the Herald-Leader after the CATS test was first implemented.

Can multiple-choice tests measure complex knowledge and skills?
The CASA report repeats the completely unfounded assertion that multiple choice tests have some problem measuring advanced knowledge and skills:
The major strength of multiple-choice items in an assessment is that they are efficient. That is, in a relatively short amount of time, it is possible to get information about an array of knowledge and skills. Their strength is not in measuring complex skills and knowledge.
Wrong again, Cunningham points out. "High quality, reliable and valid, off-the-shelf, standardized achievement tests are available to assess reading and math," he says, "...These available tests also do a good job of assessing high level thinking skills." In fact, Cunningham apparently considers the error bad enough to call CASA's credentials into question:
It is a little surprising to read a statement like this written by members of an organization that claims to focus on the advanced study of assessment. A more nuanced discussion about test type and high level thinking might be expected...It is axiomatic in educational measurement, that high level thinking is measured well by multiple-choice items. The authors should know this.
That's about as strong as academic takedowns get. Once again: multiple choice tests can accurately and reliably measure high level thinking skills. In fact, it's done all the time. Just repeating a discredited view that they can't doesn't make it true.

I should point out here that I have questions concerning how well writing skills can be assessed using any system of measurement. Only another competent writer can assess competent writing. But that is not what is at issue here.

Are multiple choice tests less reliable for assessing schools?
The CASA report argues that the CATS test is a better measure of school performance than the more objective tests proposed by SB 1:
SB 1 changes the fundamental purpose of the assessment from emphasizing school outcomes to measuring individual student achievements. This, of course, has consequences. The most important one is whether the new emphasis and assessment is a better measure of what Kentucky wants its schools to do ... The assessment envisaged by SB 1 would take, by design, a substantially narrower sample of the domain of desirable outcomes. (p. 8)
Well, not so fast. Says Cunningham, "There is no reason that test scores cannot be valid for both individual students and schools. Actually, the validity of school scores is dependent on the validity of individual students."
Kifer, Oldham, and Guskey acknowledge that matrix sampling renders individual student scores unusable, but they claim that it makes the school scores better. They assert that the SB 1 test sacrifices the validity of the school scores to get individual scores. While it is true that it is possible to include more open-ended items if multiple forms are used, by using a multiple-choice format even more items can be included, more than enough to compensate for the broader coverage from matrix sampling.
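To make the matrix-sampling trade-off concrete, here is a hypothetical simulation (the item counts are invented for illustration; they are not the actual CATS design):

    import random

    random.seed(1)
    POOL = list(range(60))       # 60 items spanning the whole content domain
    ITEMS_PER_STUDENT = 12       # each student answers only a 12-item block
    STUDENTS = 30

    covered = set()
    for _ in range(STUDENTS):
        covered.update(random.sample(POOL, ITEMS_PER_STUDENT))

    print(f"Items covered across the school:      {len(covered)}/{len(POOL)}")
    print(f"Items behind any one student's score: {ITEMS_PER_STUDENT}/{len(POOL)}")
    # The school-level sample is broad, but each individual score rests on a
    # thin slice of the domain -- which is why matrix-sampled scores are not
    # reported for individual students.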
One wonders if the Prichard Committee had a role in getting this self-serving report produced in the first place, or whether they were attracted by its misinformative nature after the fact and simply saw another opportunity to serve up disinformation. We do know that Helen Mountjoy, the Governor's education secretary, requested the report, and that Mountjoy has long been a blind apologist for the state's flawed testing system. She has worked hand in glove with the Prichard Committee to oppose attempts to address the flaws in the tests. In any case, one wonders why there are those who still consider these people reliable sources of information.

Wednesday, March 05, 2008

Time to Put CATS to Sleep

Richard Day over at Kentucky School News and Comment offers coverage of Gov. Steve Beshear's press conference today attacking Senate Bill 1, which would replace the controversial CATS test, saying, "The governor said the proposal has multiple flaws, and called on lawmakers to reject it."

Uh, wait a minute. Isn't that the exact argument being used against the CATS tests in the first place?

Bad choice of words, no doubt. But it does point up the incredible double standard going on here. Why are flaws in a bill an argument against the bill, but flaws in CATS--which have been pointed out, documented, argued over, fussed about, bemoaned, and, of course, swept under the rug--are not considered an argument against CATS?

Let's just cover briefly several qualities a test should have that CATS doesn't have:

  • Objectivity
  • Accuracy on an individual student level
  • Reliability
  • Ability to receive scores back in a reasonable amount of time
  • Promotion of basic skills

Now if you were told that a test you were considering didn't have these qualities, what in the world would possess you to use it? And how could you justify spending millions of dollars and countless man hours on the part of teachers and administrators to administer it?

No one has a good answer to this question, yet we are still spending way too much money on the test, and there are still people willing to risk their credibility to defend it.

Go figure.