You say Tomato, I say Tomahto ... it's [almost] 2015, why are we still talking about this?

We all like to think we're open-minded; that we arrived at a reasonable, logical and right conclusion after careful consideration of the facts, perspectives, and various opinions. Sometimes we get the benefit of cardiac assessment ("it just feels right") or a gut response to an issue. We weigh the evidence, reach a conclusion, and can rest comfortably in our superiority over those who haven't reached the clearly obvious conclusions we have. Ok... so that last bit may or may not be true, depending on your approach to critical thinking and awareness of cognitive biases.

Here's an interesting experiment. Below are two Kindergarten standards. One is from the Common Core Learning Standards (NYS's version of the CCSS) and one is from the so-called "Lost Standards" (the version NYS was working on when Common Core came along).

Standards A
Ask and answer questions in order to:
• seek help,
• get information,
• or clarify something that is not understood.

Standards B
• Ask and answer questions about classroom activities
• Request help when needed
• Know when and how to ask permission

My bias is that standards are the least important part of ensuring a quality public education for all students. I'd co-sign this post by Kathleen Porter-Magee on standards and curriculum if it were a petition. So when I make the claim that the text on both sides of the table is basically saying the same thing, it's informed by my bias and my hunch that the difference between any two sets of standards isn't really all that big. In my opinion, the biggest changes between the old NYS ELA standards and the CCLS (besides the six shifts) were the introduction of coding, shared language between grade levels, and the explicit inclusion of culture and choice in the language of the standards.

Someone with a different bias, perhaps that the Common Core are "developmentally inappropriate," will likely see the text in the two columns as different. They would likely make the counterclaim that one is more "appropriate" than the other.

All of that said, here's my question: Does it matter? It's 2015. How will the problems created by the Common Core be fixed by dropping them and going back to standards that NYS walked away from in 2009?

This isn't about being right or being wrong about Common Core or which set of standards is better. This post is about humility and hubris. I've been writing this for a while now: while walking through airports, driving home from programs, falling asleep at night, in between designing programs and reading assessment research. And I'm still struggling to find the right words. Usually, though not always, I find myself mentally composing this post after scrolling through Twitter and watching the absolute confidence with which a large number of educators speak about a particular issue. Mostly white male educators. Mostly about Common Core. I thought perhaps it was a function of the 140-character limit, but the language of their blogs is often the same. Just for fun, I've "pushed back" (which I've been told is "trollish" and a "bad habit"), and in most instances I get a response that one might classify as doubling down. I've reflected on why I feel compelled to comment and poke. It's partially because I'm fascinated by how we engage via social media. It's partially because I advocate for process assessments (asking students about HOW they think) and logical discourse. It's partially because I'm annoyed. I'm annoyed by hubris. I'm annoyed by the number of white male educators who write and post their thoughts on issues rather than boosting the voices of women and men of color who have been writing about the given issue for months, even years. Maybe I'm a little jealous of the sheer hubris that some exhibit as they write and post about an issue, wrapped in a toasty blanket of confidence that they are absolutely, incontrovertibly right.

Mostly though, mostly it's because I'm angry. I'm angry that at the end of 2014, following months in which Black Americans had to say - aloud - "my life matters", people are still having conversations that feel like they should have been resolved in 2010. I'm angry at the data below.

I'm angry that someone claimed (with a seemingly straight face) that replacing one set of standards with another, basically identical set will reduce misbehavior among Black preschoolers and, therefore, reduce the suspension rate. I'm angry that a number larger than 1 of middle-aged, white men wrote long-form essays on the impact of Common Core without citing or even referring to the lived experiences of classroom teachers or current college students. I'm angry that many of those who are anti-test (seriously folks, take Jose Vilson's advice and advocate for the "Whole Child") aren't offering alternatives to annual testing other than "Opt Out." It's 2014, not 2009. How about we move on from the standards and onto pedagogy, quality curriculum, equity, and cultural literacy?

How powerful would it be if, instead of continuing in 2015 the same conversations that have been going on since 2009, we started or joined new ones? The ones about race and culture and whose voices we trust and the role of public education and the tension between the learner and schooling? How about instead of tweeting "this is the truth," we ask "what makes you think that's the truth?" What if we asked more than told? Questioned more than pontificated? Reflected more than bumper sticker-ed? But eh... whadda I know? I'm just a troll.

Edit: I wrote this in December 2014. I was going to remove the above few paragraphs, as they repeat what has been said much more clearly and concisely by other authors such as #educolor members and equity researchers. I elected to keep them in an effort to be an honest author. Now that Cuomo has announced his panel, my frustration has flared up again. Instead of talking about the problems and flaws of APPR, we're going to be playing a game of wordsmithing to create "NYS Standards" that basically say the same thing as the CCSS but are juuuust different enough to require new curriculum and assessment design. I, for one, am fairly confident that's not the most effective use of our time, and not the thing that should be at the top of our "to do" list.

Broadening Our Concept of the Whole Educator

Attending to the social and emotional needs of individual educators is an important component of a healthy classroom, school, and public education system. At the same time, ensuring a healthy system and a safe space for all children requires that all adults in the system go through the difficult work of uncovering and confronting assumptions, biases, and mental models. This work is confounded by the shared human trait of egocentrism. It is impossible for any human – regardless of age, culture, or experience – to truly “get” where another human is coming from. We are all limited in that regard. Yet, it is this very shared trait that allows us to create dynamic, diverse communities in which each person is seen and treated as a whole and rich being with their own perspective, passions, interests, and experiences. The creation of those communities, though, requires that we acknowledge our shared failings and work to openly address them. In the absence of these conversations, we may end up unintentionally causing harm to students. For example:

Last Year
A high school teacher created an assessment task for a complex Common Core standard and selected a series of texts for students to use to support or refute a claim. When all was said and done, 13% of the teacher’s students wrote papers making the claim, supported by evidence the teacher provided, that the Holocaust didn’t happen.

It’s not uncommon for school communities to say, in the wake of similar instances, “the teacher was unaware of any [Jewish] students in his/her class.” In other communities, parents of color have shared that their children only see texts by authors of color during Black History Month, or that the role of women is mostly addressed during the month of March. Questions for discussion that events like this might provoke include: What are the implications when a teacher’s or a school’s approach to what they put in front of children is informed by the absence of a demographic group within the school walls or by particular events on the calendar? What do teachers need to know, be able to do, and value in order to select the kinds of texts that will lead to a more accurate, fair-minded understanding of historical events and trends?

Six Months Ago
A transgender student was told that the student would have to use the restroom the school principal and teachers thought was appropriate for the student, not the restroom the student would have preferred to use.

In assuming a seemingly neutral stance, the school ended up denying an aspect of the student’s personal identity and may have created shame or guilt in a young person struggling to protect her own emotional well-being. A similar defense around school policies (“It’s part of the dress code”) was used when a Navajo student was told to cut his hair (kept long for religious reasons) before returning to school. Questions for discussion that events like this might provoke include: What are the implications when our inherent inability to see through another’s eyes informs the policies and practices of school? Is empathy enough?

This Month
A teacher was suspended when, after the topic of Michael Brown’s death came up in class, a lesson involved having students act out Michael’s death. The re-enactment included researching the number of times the young man was shot and having students play the role of the officer who shot him.

In a powerful piece called “Facing Race Issues in the Classroom: How to Connect with Students”, an educator reminds the reader, “We may not be able to prevent everything, but we can control how we react to things.” Questions for discussion that events like this might provoke include: What are the implications when, having the best of intentions in reacting to student questions about a real and important issue, educators design tasks that in hindsight are clearly poorly informed? What steps can and should be taken before putting such activities in front of students?

In each instance, the educators involved were doing what they thought was best. Each of these teachers is also likely a family member, with friends and hobbies and pets. Based on the demographics of the American public education teaching force, the probability is high that these educators were white and female. It is also probable that, if asked, each woman involved would deny her actions were racist or biased[1].

Attending to the whole educator requires that we revisit who we think we are and how that fits into the larger social narrative and structures we exist, teach, and learn within. Learning more about how to confront our own biases and “blind spots” is the responsibility of all adults, but given the impact that educators have, it’s crucial work for members of the profession. It’s equally important that, as we do the work of broadening our conception of who we are as whole educators, we don’t infringe upon others’ emotional safety. It seems like a natural step to reach out to faculty members of color to discuss race or to Lesbian, Gay, Bisexual, Trans* or Queer (LGBTQ) colleagues around issues they may experience, but social justice advocates identify that small move as “othering.” Othering is where we use our frame of reference to define someone else’s identity. Instead of Ms. Jones being defined as a grade 5 Science teacher or however she self-identifies, she’s approached as Ms. Jones, a woman of color, based on how she’s seen by her white colleagues. Although the intentions (seeking to understand) may be noble, constantly being “othered” can take its toll. [2]

The students involved in the “Student Six Tips” in the “Facing Race” article shared that as a result of their teachers’ intentional work, they felt safer and found greater success: “The teachers treat us like peers and we respect that.” In a recent Twitter chat around LGBT issues, a teacher reported that a subtle shift in her language (from “husband and wife” to the more neutral “spouse”) allowed one of her LGBT students to feel safe enough to confide in her about the emotional and social challenges he was facing as a gay youth.

The work and heavy lifting of expanding the boundaries of the whole educator to include social justice and equity considerations has to be done by each individual. This work is not optional and should not happen only in response to events like those listed. It needs to happen now, without hesitation, and without fear of saying the wrong thing, a common, shared fear. The hardest part may very well be admitting that for the majority of educators, discussions about race, gender, or sexuality are often event-based rather than a part of professional, reflective conversations. In order for us to answer the essential question, “How do we become a more just society?”, we have to start exploring the boundaries of our identities while seeking to understand others’, even if, and especially if, the classrooms, faculty rooms, and media we see on a daily basis reflect mostly faces and experiences that look like our own. This work is important. The work must start now.

In addition to the resources linked in this column or referenced in footnotes, educators may find the following resources useful to inform their reflective practice.

Race in education and the classroom
• Quick reads: 5 Ways to teach about Ferguson; “We cannot be color-blind”
• Medium length read: Race, Antiracism, and the Subversion of Dominant Thinking by George J. Sefa Dei
• Long read: This is not a Test by Jose Vilson

LGBT issues in education and the classroom
• Quick read: #LBGTeach
• Medium length read: GLSEN school resource guide
• Long read: Raising My Rainbow by Lori Duron

Learning more about fairness in assessment design
• Quick read: “Identify and Eliminate Assessment Bias” (video) by James Popham
• Medium length read: Regents Exams Item Criteria Checklist
• Long read: Gender bias and fairness by Ruth Axman Childs

[1] Ta-Nehisi Coates’s article “The Good, Racist People” and the TEDx talk by Jay Smooth are two great resources on the important distinction between racist actions and “being a racist.”
[2] The blog maintained by Jose Vilson speaks to his experiences as an educator, parent, and man of color, and frequently makes connections to larger social issues.

Originally published in the NY ASCD newsletter (September 2014) 

Pick an item, any item - the reveal

Yesterday, I wrote a quick run-through of item p-values in which I ignored a bunch of stuff about item analysis in order to focus on the big idea of predicting item difficulty. Twenty anonymous people, of unknown age, teaching experience, and background, answered a quick poll I put up about two items from the released NYS items. Twenty adults (I assume, and based on their responses to the optional question, it's a reasonable assumption) picked which item they thought was harder. And 17 adults, about the size of an elementary school PLC, or 85% of them, picked Item 1. Reasons for picking Item 1 included:
  • To answer question 1 you need to go off and look up the contents of the 4 referenced paragraphs and then keep them in mind while evaluating the most plausible answer. This would put more strain on the working memory. The content needed to answer question 2 is right there in the question, easily visible.
  • Question 1 requires students to go back into the reading selection; that is not required for Question 2.
  • "Predicts the action" is awkward phrasing, question requires students to flip back into the story to reread. Also, question number 2 is a more familiar format
All reasonable. All made by (again, I assume) well-educated adults using the evidence in front of them to draw a conclusion. And yet, the reason we need student item data...

49% of NYS 4th graders who took the test got this question correct. 
25% of NYS 4th graders who took the test got this question correct. 
Does this mean that the 20 adults don't understand teaching, students, or education? Not by a mile. It's a reminder that when it comes to assessment, especially multiple choice item design, adults reading an item perceive its difficulty differently than test takers do. Think one of the released items is especially hard? Check the p-values by using the released charts. See the small blue number in the top left? That's the item's code. Look for that code on the p-value chart. If you're in a school in NY wondering how to make connections to your students, look at your students' p-values in the reports released by the RIC and start to have conversations about the implications. SED has released guidance on how to analyze the data, and educators across the state (including my co-blogger, Theresa) are writing about how to use state assessment data to inform conversations.

State ed testing takes 300 minutes, 1%, 2 weeks (however you choose to present the numbers) of a child's year. It is the LEAST important assessment children will engage in over the course of a year. What's to be gained, or lost, by framing the LEAST important thing students do as a way to advance our agendas? How does it help alleviate students' (and parents') stress when we give the LEAST important thing in the education landscape the most attention?
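For readers who want to check the "1%" framing themselves, here's a quick back-of-the-envelope sketch in Python. The 180-day year and the 6.5-hour instructional day are my assumptions, not official NYS figures, so swap in your own district's numbers.

```python
# Back-of-the-envelope: what share of the school year do state tests take?
# ASSUMPTIONS (not official NYS figures): 180 school days, 6.5 instructional hours per day.
TEST_MINUTES = 300      # figure from the post
SCHOOL_DAYS = 180       # assumed
HOURS_PER_DAY = 6.5     # assumed

def share_of_year(test_minutes, days, hours_per_day):
    """Return test time as a percentage of total instructional minutes."""
    total_minutes = days * hours_per_day * 60
    return 100 * test_minutes / total_minutes

pct = share_of_year(TEST_MINUTES, SCHOOL_DAYS, HOURS_PER_DAY)
print(f"{pct:.2f}% of instructional time")
```

Under these assumptions, 300 minutes works out to well under 1% of instructional time (about 0.43%).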

Disclaimer: And as you read these posts, please know, gentle reader, that I am an advocate of performance-based, portfolio, and authentic assessment. I love roses but have committed to the science of the teaching profession which means working to ensure we're describing the daisies correctly. So the usual disclaimer - I am not defending NCLB, VAM, APPR. I'm not even defending the NYS assessments. It's my hunch that we're making it harder to fix the big picture when we neglect to accurately define the parts of the whole.

Pick an item, any item

So we're going to play a little game in this post. But first, let me set the stage.

While lurking on a Twitter exchange about race, education, and schools, I saw a great reminder from Bill Fitzgerald scroll by. In effect [and apologies, Bill, if I've summarized incorrectly], it's worth engaging around important topics even when it's clear the discourse isn't going anywhere, because you never know who might be listening, seeking to learn. To revisit an earlier post, I decided not to worry about the manager of the nursery, and to consider instead the walkers just out for a Sunday stroll who may overhear the discussion about the daisies.

I am going to make one claim here in this post and one claim only: when adults look at multiple choice items, we see them differently than students do. Experience, background knowledge, expertise, confirmation bias, 20 years of living - a wide variety of things influence how we read an item. Any teacher who's seen students ace a "hard" item or tank on an "easy" one will know that it's not until students actually take the items that we get a real sense of the item's difficulty.

Item design is a science - and an art. Objectivity plays a large role. BUT:
One cannot determine which item is more difficult simply by reading the questions. One can recognize the name in the second question more readily than that in the first. But saying that the first question is more difficult than the second, simply because the name in the second question is easily recognized, would be to compute the difficulty of the item using an intrinsic characteristic. This method determines the difficulty of the item in a much more subjective manner than that of a p value.
Source: Basic Concepts in Item Design
This is why there's field testing. It's why we should field test classroom tests and why states have to field test items for their large-scale tests. Test designers (teachers or Pearson writers) do their level best, but we need certain statistics (available only after students have actually taken and responded to an item) to reach conclusions about an item's quality. The most common statistic we can use is what's known as a p-value. This value is the proportion of students who answered the item correctly (the percent correct, expressed as a decimal): the higher the p-value, the more students got the item correct, and the easier the item was for the group of students who took the test. There are guidelines around p-values but, generally speaking, "p-values should range between 0.30 and 0.90." There's a lot more to unpack around item difficulty, but we'll just leave this here.
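To make the p-value idea concrete, here's a minimal Python sketch. It computes p-values from a small, made-up matrix of scored responses (1 = correct, 0 = incorrect) and flags items that fall outside the 0.30 to 0.90 guideline quoted above. The response data and the function names are hypothetical, for illustration only. A real item analysis looks at much more than this one number.

```python
# Hypothetical scored responses: one row per student, one column per item.
# 1 = answered correctly, 0 = answered incorrectly. Made-up data for illustration.
responses = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]

def p_values(matrix):
    """Return each item's p-value: the proportion of students who answered it correctly."""
    n_students = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[i] for row in matrix) / n_students for i in range(n_items)]

def flag_items(pvals, low=0.30, high=0.90):
    """Return the (0-based) indices of items whose p-value falls outside [low, high]."""
    return [i for i, p in enumerate(pvals) if p < low or p > high]

pvals = p_values(responses)
print([round(p, 2) for p in pvals])  # [0.8, 0.2, 0.4, 1.0]
print(flag_items(pvals))             # [1, 3]
```

In this toy data, the item at index 1 (p = 0.2) was hard for this group. The item at index 3 (p = 1.0) was so easy that it tells us nothing about differences between students. Neither conclusion could have been reached just by reading the items.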

In the absence of these p-values, our observations about the difficulty of an item are just that: observations. Hambleton and Jirka (2006), two psychometricians/unicorns, reviewed the literature around estimating item difficulty and found studies where highly qualified, well-experienced teachers were inconsistent when it came to accurately estimating how students would do on an item. "No one in any studies we have read has been successful at getting judges to estimate item difficulty well." Pretty compelling evidence that we need to temper our opinions with supporting evidence from students who, you know, actually took the assessments.

So now onto the game. Let's pick an item, any item. How about Item 132040149_1 from the released Grade 4 Assessment items?

Now, in order for this game to work, you have to play along. Click the link above to read the Pecos Bill story and do your best to answer the question. You may look at it and conclude that it involves "required skills out of the reach for many young children," or that the steps needed to answer this question are too many and too complicated for 4th graders. Now consider the question below, also from the fourth grade test:

Which one would you expect to be harder for the students? The top one or the bottom one? What's informing your decision making? What evidence are you using? What percent of students do you think got the top item correct? How about the bottom one?

Hit me up here or on Twitter and share your thinking. I'll follow up with the answer in an upcoming blog, provided I get through my rose-tending to do list.

The reveal is here.

Chasing Down Pineapple Chasers

Imagine you're strolling with a loved one in a local park one bright Sunday morning. You and your companion pass a cluster of flowers and you overhear another stroller say, "Look at those gross weeds. They should be pulled out, they ruined this entire garden." You look where he's pointing and see the happiest, cheeriest, sunniest, albeit ugliest, bunch of daisies you've ever seen. You look at his face and recognize the speaker as a highly regarded and respected nursery owner. What do you do? What would Emily Post say? What would Freud advise? You look over at your loved one, panic clear on your face. If your loved one is like mine, s/he smiles, squeezes your hand and asks, "is it worth it?"

You decide it's not. No damage done. Who cares that the nursery owner confused a rare variety of daisies with weeds? But then you look over and see a group you recognize from the local gardening club, nodding along. "Bad weeds," you hear one mutter. "Terrible things. They should be yanked." Another pulls out a garbage bag and covers up the bunch of daisies. "No one should have to see these weeds," she says, and you hear the passion and conviction in her voice. It practically vibrates with anger.

The nursery owner is consulting a gardening book and reading aloud the problems he sees with the weeds, and your stomach drops as you recognize he's misreading some of the information. "They're going to strangle the whole garden," you hear him mutter, and you start to twitch, knowing that the daisies actually attract a particular species of butterfly that helps pollinate a different section of the park.

You know because you're a botanist. You spend your professional life studying plants. While your work actually involves roses, you had to study daisies in order to better understand the species you grow. There is the real possibility, you admit, that you're wrong. The longer you stand there and the louder the group gets, the more convinced you are that you must be off-base; daisies aren't THAT necessary, and it would be great to use the space for more of your roses.

So, you say nothing. The moment passes. The group is unified by their hate for those damn not-really-weeds. Not much you can say. So you walk on with your companion, working hard not to give your loved one yet another lecture on the importance of ugly daisies in a well-balanced ecosystem. On your way out of the park, you hear members of the gardening group telling incoming strollers how the owner of the nursery had, just that morning, published a piece in a national gardening newsletter "setting the record straight" on those nasty weeds.

What's the role of expertise in conversations like this? Do you, with expertise in flowers, though you way prefer roses to daisies, speak up? Does your obligation to speak up change based on the size of the crowd? Is it changed by knowing the nursery owner isn't fond of you, and has even publicly called you "uninformed?" In truth, last time you spoke up, you had a middle school flashback to being told "your opinion doesn't matter because you're not tall/ short/ athletic/ musical/ smart enough/ right-handed enough" to comment. Even worse, the last time someone spoke up, it seemed some members of the gardening group became even more insistent and vocal about calling the much maligned daisies "weeds."

Help me out here, gentle readers. If you were the botanist, what would you do? If you were the nursery owner, would you want the botanist to speak up? Is there a right time and place to speak up? Is it worth it?

Chasing Pineapples – Part 1

In my column for NY ASCD, I considered the role of assessment literacy in education. Here on my blog, I want to poke at the idea a bit more in depth. And since this is my (“our,” actually: Theresa was a much more consistent writer than I was when we first started, and her reflections and thinking can be found throughout the archive) blog, I’m going to draw a line between assessment literacy and the Common Core Learning Standards.

I have a favorite Common Core Learning Standard. I realize that’s a bit like saying I have a favorite letter in the alphabet, but there you go.

CCSS.ELA-LITERACY.W.11-12.1: Write arguments to support claims in an analysis of substantive topics or texts, using valid reasoning and relevant and sufficient evidence. (NYS added an additional sentence to this standard when the state adopted CCLS: Explore and inquire into areas of interest to formulate an argument.)

Making the promise to NY students that we will do everything in our power to help them develop their ability to understand arguments, logic, evidence, and claims, in my humble opinion, is long overdue. In truth, I’m jealous that my teachers weren't working towards this goal. I learned how to write a really solid 5 paragraph essay in HS English and it wasn't until I was paying for my education that I was introduced to the rules of arguments and logical fallacies. Since I missed the chance during my formative years to explore this approach to discourse and discussion, I try to practice it as much as I can as an adult.

It’s my hunch that assessment illiteracy is having a dramatically negative impact on how we talk about public education. More to the point, I suspect that the same quirk that makes us fear Ebola more than texting while driving is what leads us to discuss and debate the state assessments with more energy, passion, and time than the assessments students see on a regular, daily basis. My claim: when viewed as a data-collection tool mandated by politicians with a 10,000-foot perspective, the tests are benign. Their flaws and challenges are amplified when we connect them to other parts of the system, or when we view them through the same lens we use for assessments designed by those with a 10-foot perspective on student learning. When we chase the flaws in a test that takes less than 1% of the year, we end up chasing pineapples.

In the tradition of well-supported arguments, I want to focus on patterns more than individuals and on a narrow, specific claim rather than a bigger narrative. (In other words, I’m not defending the tests, NYSED, Pearson, Bill Gates, APPR, or anything else.) The pattern across Twitter and in Facebook groups is a call for NYSED to release the NCLB-mandated tests so that the public (including parents) can judge their appropriateness, quality, length, use of copyright, or whichever variable the person asking for the release wants to investigate. I absolutely support the Spencerport teachers’ desire to see the entire test, but a voice in the back of my head keeps asking, “Why? So what? What criteria are you using to determine if the test is any good?” Last year, NYSED released 25% of the items and a few bloggers shared their opinions about the quality of the items, but I haven't been able to find any analysis of the released items against specific criteria. This is not to say such analyses don't exist, just that they escaped my Google searches. This year, NYSED released 50% of the items, and the response has been that NYSED should release ALL of the items. Which, I suspect, is what NYSED wants to be able to do, but funding issues are preventing it from happening. I've been watching Twitter, hoping to see critical reviews of the released 50%, but instead there’s been lots of opining. Lots of “I” statements, not a lot of evidence-based claims. This, I suspect, is a side effect of assessment illiteracy across the board. We just aren't any good as a field, much less as a collection of citizens, at assessing the quality of measurement tools.

So, what makes an assessment good? What makes it bad? Given rules of quality test design as outlined in the APA Testing Standards, why are we willing to accept the strength of the speaker’s opinion as the determining factor of quality? Is the issue of quality in large scale assessments a matter of opinion? I suspect anyone who has taken any large scale test (from the driver's test to the SAT) hopes that’s not the case. I know that numerous organizations, including the National Council on Measurement in Education, work to establish explicit criteria. The USDOE is instituting a peer review process for state assessments to ensure quality. PARCC is being as transparent as possible, including bringing together 400 educators and teachers to provide feedback. All of these groups use specific criteria to assess the quality of large scale assessments. Yet, in the media, social and traditional, one person's opinion about "bad" or "hard" items is treated as if it's the truth.

So, my confusion remains: if members of groups who do assessment for a living spend years establishing and sharing measures of quality for large scale tests, what tools will the public use to assess their quality? How can the general public use “cardiac evaluation” (I know it because I feel it; not my phrase, I totally cribbed it from someone else) when the vast majority of classroom teachers receive little or no training during teacher prep in how to assess and evaluate assessments? When it comes to state assessments, is it more about chasing pineapples (making claims about the tests’ quality) than actually catching them (supporting claims with evidence)?

 And as I often do, I end up asking myself why it matters. If a parent says "I think this item is too hard/developmentally inappropriate/unfair" should that be enough to say that it is? How much of the science of the education profession actually belongs to members of the profession? 

Two roads diverged... and I blundered straight ahead

I am passionately, openly, and sometimes foolishly, in love with authentic assessment and portfolio assessment. I have seen, working as a teacher and now working alongside teachers, the power that relevant and meaningful work holds for students. . . One of the amazing things I get to do for a living is help schools design performance-based assessments that ask students to do something with what they have learned, not just recall facts and provide a right answer. My job does not depend on the success or failure of state tests. I have no stake in testing itself, beyond that of a taxpayer and an educator privileged to work with teachers and schools. So my passionate belief in the craft of the teaching profession comes from my professional experience in classrooms and schools. I believe, adamantly, in using both the science of learning and the art of instruction to provide a quality public education to all of New York’s students.

I wrote the above paragraph last spring in a column on standardized tests for Chalkbeat. (I can't remember if I picked the post title. In hindsight, I think "A Primer on Standardized Tests" would have been better.) My passion for performance-based assessment hasn't lessened. My commitment to supporting teachers as they design meaningful tasks for the 99% of the school year that isn't devoted to state tests hasn't changed. What has changed is the degree of assessment illiteracy I'm willing to tolerate in conversations about assessment design, whether of the large-scale standardized variety or the day-to-day classroom flavor. My understanding of measurement has deepened and expanded as I work to better understand how we take this messy, amazing thing called learning and capture evidence of it through artifacts in order to make decisions at the student, teacher, school, district, state, and national levels.

I sometimes joke that I feel a bit like a butcher who chimes in on vegetarians' conversations. If I am so passionate about performance-based assessment, why do I care what word a columnist uses to describe student performance on state tests? If I would rather see students designing portfolios that tell the story of their learning than taking yet another multiple-choice test, why does it matter so much to me that educators understand the basic rules of large-scale standardized testing? For me, it's about what it means to be members of a profession. I don't expect parents to get how p-values work, nor do I expect reporters and columnists to have a deep understanding of psychometric jargon. I'm frustrated and disconcerted when columnists and authors who are also members of the education profession ignore the science of their profession in order to prove a point. And I've elected to speak up when I see that happening. I've chosen to speak up when a vegetarian is wandering around my kitchen complaining about how unhealthy cow bacon is.

This is a space Theresa and I created a few years ago to ramble, babble, reflect, and wrestle, and I want to dust it off to have conversations that go beyond 280-character blurts. I've been perseverating on some of the same ideas since Day 1 of our blog. I've also revised some of the claims I'm willing to make. On my "things I want to write about" list are topics such as the NYS annotated items, issues around bias in teacher-designed assessments, challenges to some commonly held beliefs about assessment design, and gender-related issues in education.

So - I'm an assessment nerd. Ask me (almost) anything. 

Increase Our Assessment Literacy

There are 11 million licensed drivers in NY State. Unless they took a Driver’s Ed course, these motor vehicle operators (5% of whom are between 16 and 19 years old) willingly subjected themselves, sometimes gleefully, to a high-stakes standardized test. Some do it more than once. Again and again, they go back to make sure they pass a standardized test that will have a dramatic impact on their personal freedom and daily lives. Despite the significant impact of this test, there really aren't any public cries about the quality, validity, or reliability of the NYS Driver’s Test. In fact, the technical documents supporting the design and psychometrics of the permit test don't appear to be publicly available. This lack of interest in the quality of the test may be because it’s short (20 questions) or because it’s followed by a road test scored by an assessor trained in the rules and basics of the road who gives the test taker immediate feedback on any errors. In either case, we accept the presence of a standardized test as a part of the transition to responsible adulthood.

Americans have an odd relationship with standardized tests. We expect that the service providers we interact with, from cosmetologists to real estate agents, are duly licensed to do their jobs. We require that doctors, lawyers, teachers, and others who are members of a board or a profession meet certification criteria. In almost all of these cases, the license or the certification is awarded only after the candidate passes one or more standardized tests. There are likely a variety of reasons why we're comfortable with some standardized tests and not others: the age of the test taker, the degree to which the test taker wants or seeks to take the test, the degree to which we believe the test measures something important, the test taker’s ability to prepare for the test, how the results will be used, or a fear of test-taking. Some of the discomfort may come from the fact that our field suffers from what Popham (2004) calls “assessment illiteracy.” He goes so far as to claim that “the vast majority of educators reside in blissful ignorance” when it comes to understanding the design and nature of standardized tests. In 2004, he got 5 million hits from Google when he searched for “educational assessment.” In 2014, the same search returns 159 million, and the need for assessment literacy has never been higher.

While it is impossible to explain the complexity of large-scale assessment in a single column, I’d like to invite readers to invest time in their own assessment literacy. Several available resources can give a NY educator a better understanding of standardized test design and a deeper understanding of what the NYS standardized tests are and are not.

The best starting point for learning about standardized testing is a document referenced in the APPR documentation. The 1999 Standards for Educational and Psychological Testing was published by the American Psychological Association (APA), the American Educational Research Association (AERA), and the National Council on Measurement in Education (NCME) and provides the foundation for testing. Each section of the text defines a psychometric concept (e.g., validity, reliability, fairness) and sets out the limits of that concept. While it was not written to explain how testing works, it is the official source for understanding testing concepts. I find myself going back to Popham’s ASCD book, “The Truth about Testing,” an overview of standardized testing that is free of the “noise” created by recent policy like APPR and RttT. Finally, membership in NCME costs $70 a year and provides access to numerous field-friendly and scholarly texts on standardized tests. A more practical document for improving New York State-specific assessment literacy is the NY Testing Program Technical Reports. Like the Testing Standards, each section of the technical report explains a psychometric concept and presents the statistics for that concept from the test being reported.

Social media is awash with claims that the NYS Tests are unfair, invalid, or unreliable. These technical reports provide evidence for the veracity, or lack thereof, of those claims. For example, several groups are claiming the tests are too long. A statistic known as “speededness” provides details about how many students left items blank at the end of the test, giving us an actual report of how many students finished or ran out of time. Additionally, NYSED has released items from the 2012-2013 and 2013-2014 assessments that include explicit, annotated alignment between the items’ demands and the CCLS. A thoughtful read of these resources can empower educators who wish to make claims against the misuse of standardized tests.
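To make the idea of speededness a little more concrete, here is a rough sketch under a simplified, assumed definition: a student "did not reach" an item if it falls inside the unbroken run of blanks at the end of their answer sheet. A blank in the middle of the test is treated as a skipped item, not evidence of running out of time. The function names and sample data are hypothetical, and the NYS technical reports may compute the statistic differently.

```python
from typing import List, Optional, Sequence

# None marks a blank (omitted) item; any string is an answer the student gave.
Response = Optional[str]


def trailing_blanks(responses: Sequence[Response]) -> int:
    """Count the consecutive blank items at the very end of one student's test."""
    count = 0
    for r in reversed(responses):
        if r is None:
            count += 1
        else:
            break
    return count


def pct_not_finished(students: Sequence[Sequence[Response]]) -> float:
    """Percent of students who left at least the final item blank."""
    if not students:
        return 0.0
    not_finished = sum(1 for s in students if trailing_blanks(s) > 0)
    return 100.0 * not_finished / len(students)


def not_reached_rates(students: Sequence[Sequence[Response]]) -> List[float]:
    """For each item position, the fraction of students who never reached it."""
    n_items = len(students[0])
    counts = [0] * n_items
    for s in students:
        t = trailing_blanks(s)
        # Only the trailing run of blanks counts as "not reached".
        for i in range(n_items - t, n_items):
            counts[i] += 1
    return [c / len(students) for c in counts]


# Hypothetical answer sheets for a four-item test.
sample = [
    ["A", "B", "C", "D"],   # finished
    ["A", "B", None, None], # ran out of time after item 2
    ["A", None, "C", "D"],  # skipped item 2 but finished
    ["B", "C", "D", None],  # did not reach the last item
]

print(pct_not_finished(sample))   # 50.0
print(not_reached_rates(sample))  # [0.0, 0.0, 0.25, 0.5]
```

A not-reached rate that climbs sharply over the last few items would support the "too long" claim; a flat, low rate would undercut it.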

The use of standardized tests as the primary means to ascribe growth and attainment to students is highly problematic and has been documented extensively (Berliner and Glass, 2014). Seeking out information about standardized tests to improve assessment literacy isn’t a concession or an endorsement of over-testing or bad policy. It is important for educators to deepen their understanding of these tests by looking at SED annotations and reports, studies from the field of psychometrics, and long-form analysis rather than relying primarily on social media or impressionistic observations.

I named my column in NY ASCD, where this post was originally published, “Pushing at the Boundaries of Assessment” because we wanted to carve out space to poke at our understanding of what it means to capture evidence of student learning. It’s challenging, though, to push at boundaries if we don’t know where they are. The boundaries of standardized tests are defined by and for our profession. Members of our profession have an obligation to separate myth from truth, hyperbole from fact. It is by learning and understanding the research and thinking behind the tests that we can lead with knowledge and be prepared to answer the difficult questions. It is good to question and push at the boundaries. It is responsible to be well informed, even if it isn't our own particular area of expertise.

Is it worth it?

Is it worth it? One simple question. Set aside for a moment your statistical software and your textbooks with checklists on quality assessment design. Ignore the blog posts with titles like “4 Questions You Must Answer Before You Give Another Test!” or “10 MUST Ask Questions About Assessment.” Forget state tests, mandates, and district requirements; look at the assessments within your control and ask one simple question: Is it worth it?

To be worthy, a task has to have value and meaning to the child engaged in the work. When it comes to the boundaries of assessment, we often reflect on how those boundaries affect adults. From our vantage point, we’re comfortable talking about how much time teachers spend scoring or wrestling with assessment design. We talk about the value of an assessment’s results for data analysis by adults. We work to make explicit the meaning of an assessment as it relates to a school’s mission or vision. Consider this an invitation to reflect on the worth of the tasks we ask students to do, from their perspective.

Much of what students do in school every day is found only in school settings. A history professor on Twitter offers a cash bounty for anyone who can find a five-paragraph essay “in the wild.” There are sites devoted to the stretches teachers make to create “real world” problems that are anything but. Typically, students submit tasks that will be read only by their teachers and never leave the classroom. They’re often given checklists for how to complete a project, resulting in a project that looks exactly like their classmates’ projects. Asking “Is it worth it?” about assessment is more important than finding a definitive answer. For those asking the question, the answer lies in the context in which teaching and learning occur. In many of these contexts, the curriculum is overwhelmingly prescriptive and there is little autonomy or choice for students, and sometimes for teachers. Yet taking a beat to consider the worth of the tests, tasks, quizzes, projects, worksheets, and activities we ask students to complete is not only sensible, it’s a humbling experience.

Attending to worth is a way to start building the foundation for the kind of system where students spend their days constructing knowledge in a way that is individual, powerful, meaningful, and relevant to them. Identifying opportunities to increase worth is a small move we can make to give students more space to find themselves within the system. Worth can be increased through curriculum moves by asking, wrestling with, and answering essential questions such as “Can we revile a thinker, but revere their thoughts?”, “Is war inevitable?”, and “Can one person change the world?” It can also be increased by asking students to identify an authentic audience for their work and then mailing, sending, or presenting their work to that audience, rather than just handing it in to the teacher. Worth can be increased by offering choice, true choice, around an assessment. Consider asking students, “I’m looking for evidence you’ve learned about or mastered this skill, standard, or concept. How do you want to show me your learning?” By asking students to attend to their own culture, to create something, to go beyond the task, we can answer the question “Why do I need to do this?” before it’s even asked.

Originally published on