Tuesday, March 18, 2008

CATS portfolio scores overstated, says audit

For Immediate Release
Contact: Martin Cothran
March 17, 2008
Phone: 859-329-1919

LEXINGTON, KY—A closely held portfolio audit conducted by state education bureaucrats calls an important aspect of the state CATS tests into question, according to a family advocacy group supporting proposed changes in state education testing. “This report is a blow to the idea that the CATS tests are a reliable indicator of how our students are performing,” said Martin Cothran, senior policy analyst for The Family Foundation of Kentucky. “The portfolios are a key element of the state CATS tests. If this audit says what it looks like it says, then we’ve got serious problems that lawmakers cannot afford to ignore.”

The Family Foundation is supporting Senate Bill 1, which would replace the CATS testing system with a more objective multiple-choice test that would be easier to administer and grade and that would give reliable scores for individual students.

The audit, conducted by the Kentucky Department of Education (KDE), shows that writing portfolio grades given by CATS graders in 2005-2006 were dramatically higher than they should have been and that the higher the grade given, the less likely it was to be an accurate grade. Some agreement rates between how the portfolios were actually graded and how they should have been graded were lower than 20 percent.

KDE Portfolio Audit Agreement Rates [1]

              Novice             Apprentice         Proficient         Distinguished
Grade 4       100% (24/24)       89.42% (186/208)   64.97% (230/354)   28.74% (25/87)
Grade 7       98.89% (178/180)   83.45% (237/284)   42.01% (92/219)    8.89% (4/45)
Grade 12      96.88% (31/32)     70.11% (190/271)   38.92% (65/167)    18.92% (7/37)
Total         98.73% (233/236)   80.34% (613/763)   52.3% (387/740)    21.3% (36/169)

“The audit seems to suggest,” said Cothran, “that about 75 percent of the portfolios ranked as ‘distinguished’ in 2005-2006 were graded too high, and almost half of portfolios rated ‘proficient’ were given higher grades than they deserved. If this audit is an accurate picture of how portfolios are being graded, then the problems with CATS are even worse than some of us thought they were.”

Out of the 169 portfolios that were originally scored distinguished, only 36 were scored distinguished after the audit. Out of 740 portfolios that were originally scored proficient, only 387 scored proficient after the audit. “One of the justifications for doing this kind of testing under KERA in the first place was to avoid the ‘Lake Wobegon Effect’ of norm-referenced testing,” said Cothran, “which occurs when a majority of students score above average. But findings like this seem to indicate that the Lake Wobegon Effect is alive and well right here in Kentucky.”
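The overstatement rates Cothran cites follow directly from the Total row of the audit table above. A quick sketch of the arithmetic (the counts are the confirmed and originally awarded totals from the table; the dictionary layout is just for illustration):

```python
# Agreement counts from the KDE audit Total row: (confirmed, originally awarded).
audit = {
    "Novice": (233, 236),
    "Apprentice": (613, 763),
    "Proficient": (387, 740),
    "Distinguished": (36, 169),
}

for label, (confirmed, original) in audit.items():
    overstated = original - confirmed
    pct = 100 * overstated / original
    print(f"{label}: {overstated} of {original} overstated ({pct:.1f}%)")
```

For the distinguished category this gives 133 of 169 portfolios overstated, about 78.7 percent, consistent with Cothran's "about 75 percent"; for proficient it gives 353 of 740, about 47.7 percent, his "almost half."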

The Family Foundation supports the use of portfolios for instructional rather than assessment purposes.

###


[1] "Kentucky Commonwealth Accountability Testing System 2005-2006 Writing Portfolio Audit Report," Appendix J, p. 24.


4 comments:

Richard Day said...

Hey Martin.

On what basis do you say, "closely held?" No press release... or something more?

I'll follow up on this one at KSN&C.

...soon.

Richard

~
Kentucky School News and Commentary
http://theprincipal.blogspot.com

One Brow said...

By the way, what makes the audit more accurate than the initial scores, if the test is as subjective as is claimed?

Anonymous said...

The real issue is this. The KERA test, and then the CATS test, has never been a statistically valid or reliable test. I have never seen a study published which states the test measures what it's supposed to measure, and does so in a reliable manner.

That is why no principal or teacher has ever lost their job because of low performance on the test. KDE knows that in such a lawsuit they would lose, big time.

Oh, and how many dollars have been spent on this unreliable and invalid testing system in KY?

One Brow said...

The real issue is this. The KERA test, and then the CATS test, has never been a statistically valid or reliable test. I have never seen a study published which states the test measures what it's supposed to measure, and does so in a reliable manner.

How does that differentiate it from other sorts of testing? Whenever I read about any sort of testing, there seems to be controversy on whether the test actually measures what it claims to measure.