When this question leapt off the page for me at a recent Communities for Learning session, it seemed that three days' worth of thinking had found a home. This post has been in draft form for a while, but I have decided that examining the research and practice around this essential question will be one focus for me in the upcoming year. Since this post captures my initial thinking around the topic, it is not heavy on research but merely my attempt to capture the "problem."
Consider these scenarios that recently presented themselves in my work:
District A has adopted a new series and carved out a 90-minute literacy block. Teachers are struggling with the use of the block and, despite having an onsite "coach," do not seem to be making good use of the time. In a planning meeting to discuss the development of the teachers, someone suggested starting by coaching those teachers who were closest to the ideal so that they could lead their colleagues. The building principal was not sure there were many teachers who were "close" and was concerned about how the coached teachers might be perceived by their peers.
District B has slowly been acquiring new technology resources for teachers to use, and the building principal has been committed to providing teachers with interactive whiteboards. As teachers see this equipment being used in the school, some are ready to embrace the technology (and the learning curve) and try out some lessons. The building principal asks the early users to showcase how they use the technology for their peers at a faculty meeting - none feel they have the expertise to do so, and the computer teacher demos something instead.
In the same district, one new (untenured) teacher has slowly been integrating the technology even though she has not been given one of the interactive whiteboards. She researches sites on the Internet, is taking a graduate course on media literacy, brings her class to the school lab weekly, and has integrated quite a bit of technology. She even selected a technology-based lesson for her observation with the building principal. She was not asked to present at the faculty meeting and has recently been passed over for the installation in her classroom of a new whiteboard purchased by the PTA.
In District C, a consultant has asked teachers who have been engaged in a long-term professional learning opportunity around discourse to share an instance where they took a risk and were successful. In the reflection around this question, teachers struggled to think of answers where they had been successful.
In each of these examples, I am certain that the district wants to foster teacher leadership and that there are teacher leaders available - yet they have not been tapped. What conditions must be in place in a school system for teacher leadership to be developed, and more importantly, to thrive in a sustained way? What dispositions must teacher leaders exhibit to be effective? In short, the essential question is what does it take to truly develop teacher leadership?
As I work to frame this question and my research better - I would appreciate any warm/cool feedback on the identification of the problem and question!!
Usually, the most common reason for giving a practice math test is to identify students’ weaknesses. Hopefully the first post showed why it’s so critical to determine the weakness you’re talking about: familiarity with format, time, etc. If you’re worried about the math there are particular ways to approach the practice test.
First, ignore how the student did on the assessment overall. It sounds counter-intuitive, but there is a rational reason for it, I promise. How your students did on particular items matters more than how they did overall. There are a couple of reasons for this:
NYS Assessments are based on a criterion-referenced model. Typically, when you give a student 25 questions, you mark the correct responses, determine the fraction of correct responses over the total, and come up with a score. Generally, we talk about these scores as percentages. Due to the complexity of the NYS assessments and the fact that they focus on performance in relation to a standard or criterion, scores are NOT reported this way. In fact, the number of raw points needed to demonstrate mastery shifts from year to year depending on the standard-setting process.
It’s not the real deal. Regardless of the conditions we create, students know it’s not the real deal. Their performance may be inflated or deflated for that very reason and may not reflect their true performance.
NYS Test Design procedures. NYS follows a particular test design model that requires the test to include items of varying difficulty. I’m sure you’ve noticed, looking through the test, that some questions “feel” easier than others. This isn’t a coincidence. Items are strategically chosen for the assessment to reflect a range of difficulty based on how students performed on them during field testing. It doesn’t make sense to include 25 questions on Book 1 that were missed by most students during field testing. So the test designers include items with a variety of difficulty – a few hard, a few easy, and most middle of the road. This concept of item difficulty is called “p-value” – most simply put, the percent of students who responded correctly to a question. In shorthand, we say items with high p-values are easy, while items with low p-values are hard for the particular group of students under discussion. So two districts side by side may have different p-values on the same item. We need a neutral standard or benchmark to act as judge and jury around item difficulty. That's where the state data come in.
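Since a p-value is just the percent of students who answered an item correctly, the arithmetic can be sketched in a few lines. The response data below are invented for illustration, not actual NYS results:

```python
def p_value(responses):
    """Fraction of students who answered the item correctly.

    responses: list of 1 (correct) / 0 (incorrect), one entry per student.
    """
    return sum(responses) / len(responses)

# Ten hypothetical students' responses to one item: 1 = correct, 0 = incorrect
item_responses = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

print(p_value(item_responses))  # 0.3 -> a low p-value, i.e. a hard item
```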
A great deal of data about NYS tests is made public every year – including p-values. These data can tell us which questions are easy and which are hard. It’s not a secret, and it requires only a smidge of background to use correctly. P-values are provided at a couple of levels. The one that is most important for our purposes here is Low Level 3. In this example, let’s talk about fifth grade. My mental model around scale scores and p-values is to picture a giant swimming pool filled with every fifth grader in the state of New York who took the state assessment last year. Floating above their heads are their scale scores. Students from the Bronx to Buffalo, from Long Island to Lake Placid. Students with and without disabilities. Levels 1, 2, 3 and 4.
I can look at how ALL the students did on items, but included in the mix are students who really struggled and students who did really well. (We assume most questions were hard for students at Level 1, while most were easy for students at Level 4.) So I, as the data lifeguard, blow my whistle and call out every child who scored Level 1 or 2. Same for the Level 4’s. Left in the pool are my Level 3’s – every child who met the standard. Because I want the data to be as clean and precise as possible, I’m going to boot out every child who scored above the minimum standard – which in fifth grade in 2008 was 650. Left in the pool I have a few thousand students – all of whom scored exactly at the minimum standard, AKA scale score 650. For each question, I can look at how many of these students got it right and compare (or benchmark) my students to their performance. The graph below shows you what that looks like:
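The "data lifeguard" filtering above can be sketched in code. The student records and field names here are all invented for illustration; only the 650 cut score comes from the discussion above:

```python
# Hypothetical student records: scale score plus whether each
# student got item 7 right (1) or wrong (0).
students = [
    {"scale_score": 720, "item_7": 1},  # Level 4 - whistled out of the pool
    {"scale_score": 650, "item_7": 0},  # exactly at the minimum standard
    {"scale_score": 650, "item_7": 1},
    {"scale_score": 650, "item_7": 0},
    {"scale_score": 650, "item_7": 0},
    {"scale_score": 598, "item_7": 0},  # Level 1/2 - whistled out of the pool
]

# Keep only students at exactly the minimum Level 3 cut score (650 in 2008).
low_level_3 = [s for s in students if s["scale_score"] == 650]

# Benchmark p-value for item 7 among the just-proficient students.
p = sum(s["item_7"] for s in low_level_3) / len(low_level_3)
print(f"{p:.0%} of just-proficient students answered item 7 correctly")
```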
Out of ALL of the students who scored 650, only 18% of the students got question 7 correct. In other words, that was a hard question. My gut isn't telling me that. My students aren't telling me that. Students from across NYS are telling me that. Take a look and see how your students did on it. Odds are, they didn't do very well. It's not because you didn't teach it or they just weren't listening. It could be because the wording tripped them up - just like 82% of all students who scored a 650. The question is below:
Students are likely to pick A because it practically screams "PICK ME!" at them. Your students may know fractions inside out and sideways. Picking A and not C is an issue of testing sophistication, not mathematics. When reviewing similar problems with students, as much as possible, give them "PICK ME!" choices so they can learn what they look like and how to avoid their siren song.
However, before assuming it's a strength or weakness, look for other evidence that the students understand the concept. Formative assessment can really come in handy here. You can pose a similar question and ask students to respond on their way out the door. This time though, ask:
Anne has completed 87% of the race. What fraction represents that portion of the race she has NOT finished?
If students get the math, they should pick A. If they pick C, it's probably a testing issue. They slid past the NOT. Anyone who picks B or D may have a problem with fractions in general. How did they do on question 15 which taps a similar understanding? (I use Tinkerplots to answer these questions. It's one of my favorite data toys.) The students will form themselves into like-needed groups, depending on what the other instructional evidence shows.
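The sorting into like-needed groups described above can be sketched as a small script. The student names and answers are invented; the interpretation of each choice (A = gets the math, C = testing issue, B or D = possible trouble with fractions) follows the paragraph above:

```python
# Hypothetical exit-ticket responses to the "87% of the race" question.
answers = {"Ava": "A", "Ben": "C", "Cam": "B", "Dee": "A", "Eli": "D"}

groups = {"gets_it": [], "testing_issue": [], "fraction_trouble": []}
for name, choice in answers.items():
    if choice == "A":        # correct: understands the math
        groups["gets_it"].append(name)
    elif choice == "C":      # slid past the NOT: testing sophistication issue
        groups["testing_issue"].append(name)
    else:                    # B or D: may struggle with fractions in general
        groups["fraction_trouble"].append(name)

print(groups)
```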
So - if you're going to give the test to identify weaknesses:
- Consider how your students do on easy questions (high Low Level 3 p-values) versus hard questions (low Low Level 3 p-values).
- Be aware of what wrong answers students give as that's often more interesting than what they got right.
- Consult other evidence (formative and summative) before confirming the students have a mathematical weakness.