Guest Post: Thoughts on NYS Scale Score Changes

The text below was shared by a colleague on the DATAG listserv. It's a worthwhile read for a calm and rational approach to the (pending) uproar about the changing scale scores.

I’ve been working with many teachers who are discouraged at the news that SED will raise cut points this year and thrust schools back into embarrassing situations because groups won’t make AYP. As we read the rhetoric from newspapers and politicians who hammer public schools with glee, it is easy to feel like teachers are the whipping boys and girls for so many of society’s problems. The move to value-added assessment and changes in teacher accountability lend credence to this interpretation, since increasing teacher accountability is being sold nationwide as the key to school improvement. If we accept the tone of the US DOE, longer tests are offered as a means to produce better data sets so we can identify our weak teachers and terminate them before they do more harm. Yes, I’m simplifying, and I’m reminded that DATAG began as a group dedicated to making appropriate use of data to inform professional development and school improvement without playing a blame game. We have to reengage in that effort every year.
My work with DATAG and with local teachers takes quite a different tack, one that is more positive and recognizes how much success New York has experienced in improving student achievement since NCLB began. I want to reframe the dialog around testing and raising cut points: it’s a reflection of our success, not evidence that we have been ineffective or settled for low performance all along.
First, let’s review that our standards are criterion-referenced, not norm-referenced. That difference is important here. We have, however, treated our tests as norm-referenced in the ways we have allowed outsiders (and ourselves) to report them. The development of our standards came with extensive work by experts and by teachers who carefully described what they thought a student should do at a given grade level and subject, based on our standards documents. So, we define the kinds of achievement in ELA, for example, that a student in grade 8 should be able to demonstrate by the testing date in a given year. When we had a passing rate of 50% in the early years (forgive me for not looking up the statewide level 3 proficiency rate when this began, but it was dismal), we were embarrassed and we worked very hard to improve that rate. And, so that we were comparing students on tests of the same difficulty level from year to year, SED and the test developers worked hard to create tests that held the proficiency scale score at 650 for each succeeding year. That process was designed to hold the proficiency level constant even as the tests changed each year. What that grade 8 student had to know to reach 650 was judged to be equivalent from year to year. Folks who understand the IRT models (like our DATAG colleague and frequent presenter Kathy Feller) help us to understand that a 650 in 2004 is the same as a 650 in 2010.
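
For readers who like to see the distinction in numbers, here is a minimal sketch of criterion-referenced versus norm-referenced reporting. The 650 cut score comes from the text above; the student scores, the 15-point gain, the raised cut of 665, and the function names are all made up for illustration and are not actual NYS data or SED methodology.

```python
from statistics import mean

CUT_SCORE = 650  # proficiency scale score named in the post


def proficiency_rate(scores, cut=CUT_SCORE):
    """Criterion-referenced view: share of students at or above a fixed cut."""
    return sum(s >= cut for s in scores) / len(scores)


def share_above_group_mean(scores):
    """Norm-referenced view: share of students above their own group's average."""
    avg = mean(scores)
    return sum(s > avg for s in scores) / len(scores)


# Hypothetical scale scores for eight students (illustration only).
early_year = [610, 625, 640, 648, 652, 660, 671, 690]
# Assume every student in a later cohort scores 15 points higher.
later_year = [s + 15 for s in early_year]

print(proficiency_rate(early_year))          # fixed cut, early cohort
print(proficiency_rate(later_year))          # fixed cut, later cohort
print(share_above_group_mean(early_year))    # relative standing, early cohort
print(share_above_group_mean(later_year))    # relative standing, later cohort
print(proficiency_rate(later_year, cut=665)) # hypothetical raised cut
```

With these invented numbers, the rate against the fixed cut rises from half to three quarters of the group, while the share above the group average does not move. Applying the hypothetical higher cut to the stronger cohort brings the rate back to one half, even though no student scored lower.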

Today, we see the proficiency rates of students and schools rising every year. Since we have different questions on the test every year, we can’t cheat to produce this improvement. Have our tests tended to concentrate on some standards more than others? Yes, but that is because we have too darn many standards and some are more important than others, on which most of us can agree. The point I want to make is that as NY educators, we have done a good job of increasing student success in becoming proficient on the skills we developed, and at the level that we agreed was appropriate for each grade level. We should be proud of that, and there should be regular statements to that effect in recognition of our success.

Let me use a very simple analogy. Think of your gym classes and teaching students to jump over a bar. In grade 3 we want kids to jump over a bar that is 18” high. In grade 4 we raise it to 22”, and by grade 8 kids are jumping over a bar that is 40” high. When we began, our kids were not used to being asked to jump, so many of them couldn’t get to 40” by grade 8. Today, most do, since we’ve done well in getting them in shape. Now, we’ve decided that our initial, well-designed 40” grade 8 target is too low to ensure that students are ready for high school and the real world, so we raise the bar to 44”. That will cause some kids to fail, but it sets the whole bar higher and we have to readjust.

Think sports. When I was in middle school the world high jump record was 7 feet. New methods of jumping and better coaching have pushed the record to 8 feet 0.46 inches. So if you set a goal in 1956 to be the best in the world, you targeted 7 feet, and if everyone you coached reached it you would be the best coach in the world. A decision to use 7 feet as your target would not be good enough today because 7 feet is not world-class.
Back to our cut points. Our leaders have decided that our goals are no longer world-class. When we set them, they were among the hardest in the nation. The national norm equivalent score for level 4 when we tested only grades 4 and 8 was the 96th to 98th percentile, which CTB acknowledged if you reached the right folks to talk to. We knew normed scores for each question because the questions came from nationally normed samples. [Note: norm-sampled is not the same as norm-referenced. Norm-sampled means the questions were tried out on students across the country before appearing on the final test. Norm-referenced refers to how scores are reported.] So, a level 4 student was better than 96-98% of students in the country.

And the level 3 kid had to hit the 68th percentile, which is roughly half a standard deviation above average. We no longer have national norms available for our questions so we can’t make those statements today. But while many states reduced their test difficulty or their cut point to improve NCLB passing rates, New York has not. We have held to a 650 cut point for proficiency and we improved year after year. This was a significant success, but now it has been decided that our targets, though higher than most states’, should be raised.
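
As a rough check on how percentiles translate into standard-deviation units, here is a short sketch using Python's standard library. It assumes national scores are approximately normally distributed, which the post does not state, and the helper names are illustrative only.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
# Assumption: scores are roughly normal; real test scores may not be.
standard_normal = NormalDist()


def percentile_to_z(percentile):
    """Standard deviations above the mean for a given percentile (0-100)."""
    return standard_normal.inv_cdf(percentile / 100)


def z_to_percentile(z):
    """Percentile rank (0-100) for a score z standard deviations above the mean."""
    return standard_normal.cdf(z) * 100


for p in (68, 96, 98):
    print(f"{p}th percentile is about {percentile_to_z(p):.2f} SD above the mean")

print(f"1 SD above the mean is about the {z_to_percentile(1.0):.0f}th percentile")
```

Under that normal-curve assumption, the 68th percentile sits a little under half a standard deviation above the mean, and one full standard deviation above the mean corresponds to roughly the 84th percentile.
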
I wish that the dialog over raising cut points had focused on the original high standards and on our progress toward reaching them. Were we all praised for the progress we have made, we could more enthusiastically accept the idea that the world around us keeps raising its standards and we want to stay ahead of that rising tide. Let’s applaud our progress and reset our standards so that we are encouraged to continue our improvement.

This kind of dialog is not impossible to achieve. I hope leaders who roll this out can be more cognizant of our successes as they raise our cutpoints. We should explain this as a next step arising from our success, not because of deficiency, bad teaching, low standards, etc. After all, we do the same with our students—as they succeed, we raise the bar on what we expect from them because we know they are ready for greater challenges. That’s how I want SED to position these changes, and that’s how I would like to see these changes portrayed in the media by those who lead us in the coming years.

Dr. Brian Preston
Lower Hudson Regional Information Center
Elmsford, NY


Mrs. Tenkely said...

That is the key: the bar keeps being raised. This should be a good thing; it means that the previous goal has been met and the standards made more rigorous as a result.

Theresa G said...

Excellent point, Mrs. Tenkely! And one that seems to get lost in all of the hype about the change!
While I think the change could have been implemented differently and with a greater respect for the hard work that many, many teachers have done, it is time to move forward!