Pick an item, any item - the reveal

Yesterday, I wrote a quick run-through of item p-value in which I ignored a bunch of stuff about item analysis in order to focus on the big idea of predicting item difficulty. 20 anonymous people - of unknown age, teaching experience, and background - answered a quick poll I put up about two of the released NYS items. 20 adults (I assume - and based on their responses to the optional question, it's a reasonable assumption) picked which item they thought was harder. And 17 of them - about the size of an elementary school PLC, or 85% - picked Item 1. Reasons for picking Item 1 included:
  • To answer question 1 you need to go off and look up the contents of the 4 referenced paragraphs and then keep them in mind while evaluating the most plausible answer. This would put more strain on the working memory. The content needed to answer question 2 is right there in the question, easily visible.
  • Question 1 requires students to go back into the reading selection; that is not required for Question 2.
  • "Predicts the action" is awkward phrasing, question requires students to flip back into the story to reread. Also, question number 2 is a more familiar format
All reasonable. All made by (again, I assume) well-educated adults using the evidence in front of them to draw a conclusion. And yet, the reason we need student item data...

Item 1: 49% of NYS 4th graders who took the test got this question correct.
Item 2: 25% of NYS 4th graders who took the test got this question correct.
Does this mean that the 20 adults don't understand teaching, students, or education? Not by a mile. It's a reminder that when it comes to assessment - especially multiple-choice item design - adults reading items see difficulty differently than the test taker does. Think one of the released items is especially hard? Check its p-value using the released charts. See the small blue number in the top left? That's the item's code. Look for that code on the p-value chart. If you're in a school in NY wondering how to make connections to your students, look at your students' p-values in the reports released by the RIC and start conversations about the implications. SED has released guidance on how to analyze the data (written by my co-blogger, Theresa), and educators across the state are writing about how to use state assessment data to inform conversations.
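
If the chart lookup feels abstract, the statistic underneath it is simple. Here's a minimal sketch (mine, not SED's, with made-up scores) of what an item p-value actually is: the share of test takers who answered the item correctly.

```python
# Minimal sketch of an item p-value: the proportion of students who
# answered an item correctly. The scores below are hypothetical,
# invented to illustrate the calculation - not real NYS data.

scored_responses = {
    "item_1": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],  # 1 = correct, 0 = incorrect
    "item_2": [0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
}

def p_value(scores):
    """Share of students who got the item right (0.0 to 1.0)."""
    return sum(scores) / len(scores)

for item, scores in scored_responses.items():
    # A lower p-value means fewer students answered correctly,
    # i.e. the item was harder for this group of test takers.
    print(f"{item}: p = {p_value(scores):.2f}")
```

Run it and you get p = 0.70 for the first item and p = 0.30 for the second - the same kind of numbers the released charts report as percentages.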

State ed testing takes 300 minutes, 1%, 2 weeks - however you choose to present the numbers - of a child's year. It is the LEAST important assessment children will engage in over the course of a year. What's to be gained - or lost - by framing the LEAST important thing students do as a way to advance our agendas? How does it help alleviate students' (and parents') stress when we give the LEAST important thing in the education landscape the most attention?

Disclaimer: As you read these posts, please know, gentle reader, that I am an advocate of performance-based, portfolio, and authentic assessment. I love roses but have committed to the science of the teaching profession, which means working to ensure we're describing the daisies correctly. So the usual disclaimer - I am not defending NCLB, VAM, or APPR. I'm not even defending the NYS assessments. It's my hunch that we're making it harder to fix the big picture when we neglect to accurately define the parts of the whole.
