Larry Ainsworth - Day Two (aka Better Late Than Never)

Life interrupted work and blogging for me this past week. You never realize how much you are missed until folks say "Hey - where is Day 2 of that workshop?" or send you volumes of emails wondering why you didn't respond to one!!

But I digress!! Larry Ainsworth on Day 2 was just as great as Day 1 but absolutely more of a work session. For folks with a strong background in item writing, rubrics, and the like - it was light on content and heavy on work. For others - it was heavy on content. I am still surprised at the number of educators who don't know what a rubric is!!

Since our Power Social Studies team (my name, not theirs) had already unwrapped our standard and our fearless leader had already culled all the questions that applied to that standard from previous NYS Assessments - we were ready to roll. We thought...

Turns out that the questions that NYS asks on this standard didn't exactly match the content and skills we had unwrapped!!

Now - before there is tremendous shock and outrage - as we talked it through, we realized it was due in part to the data format. When we selected a Power Standard to unwrap - we selected an upcoming unit for our lead teacher and one that we knew kids had a hard time understanding. It was a standard that is covered frequently on the NYS Assessments (so it met the power standard requirement) but the standard alone (from the core curriculum) did not give us sufficient information to do a good job of unwrapping it!

Here is how we modified the process:

1. After unwrapping the standard for content/skills, we looked at all the test questions that "matched" that standard.
2. We developed a list of "key vocabulary" (which included not only terms but events and people).
3. We redefined the "skills" we had unwrapped to include any from the questions that were not already there (e.g. recognizing point of view in a political cartoon).
4. We created a secondary set of "essential questions" for the unit - that had an answer and were not quite as broad as the ones we created initially (so should probably be called "guiding" questions).

As we moved on to creating our "assessments" we used the old questions for multiple choice (no sense reinventing the wheel) but developed our own DBQ to answer one of our larger essential questions.

The template that guided our work was helpful - but as I said, we took some liberties with it. Particularly in Social Studies - I think we will need to do this in order to get around the breadth vs. depth issues.

Our end product isn't posted for the world to see yet - Erie 1 BOCES is trying to work out some copyright issues - once it is there, I'll be able to share more. And maybe once I have had more than 3 hours sleep - I'll post more!

Common Formative Assessments - Part 1

Attending a two-day conference with Larry Ainsworth on developing common formative assessments. The room is packed with 160 teachers and leaders from the region - it is a pretty diverse group. Larry Ainsworth does a remarkable job in working with the crowd and in reminding the leaders in the group that they need to participate as well.

The structure of the day is such that we have an input session, then work in small groups, input session, work in small groups. The goal at the end of two days is that we will have "unwrapped" standards for an upcoming topic/unit and used a template to create some formative assessments.

Some take-aways from Day 1:

- Evaluating your current assessment system can be a simple, but eye-opening exercise. After listing all the assessments we use, we then ranked and coded them to indicate those that make the greatest impact on instruction and student learning, those that aligned to "power standards," those that emphasize literacy/numeracy and others. The purpose? Are we getting the biggest bang for our buck when it comes to assessment?

- We are over testing and under assessing.

- The revised Bloom's is back!! The final cognitive piece of "create" is critical - being able to assess the cognitive load of standards/state assessments is critical to ensure alignment of our formative assessments. (Why ask a "remember" level question if the standards/assessment require "analysis"?)

- If the standards are asking for "lower level" Blooms and our expectations are higher - raise them up! Just notate them as "teacher added" so that someone who reads them afterwards understands. (This happened after it was observed by more than one group that the NYS Social Studies standards are fairly low level.)

- Under this model - essential questions and "big ideas" are not identical to the UbD model. They apply more to the individual units or learning goals - not quite as lofty.

- "Big Ideas" are those things that at the end of the learning activities, we would be happy that kids could articulate in their own language. "Essential Questions" are the questions that would get us the "Big Idea" answer.

Tomorrow - we write assessments!

Information Revolution

While I have been using technology for the past year and a half to work with teachers, Jenn is more intuitively techie than I. Often - I wait to see what Jenn has discovered and then ride her wake to make it applicable to what I do.

I count it as a major triumph that we led a Skype revolution this past week - getting virtually all of our C4L Fellows ready to go and my one colleague who doesn't necessarily have her laptop glued to her fingertips at all times seeing its power.

But in working with teachers on using technology, I bring up the three roadblocks that I continue to encounter:

1. Lack of technology comfort - note, I didn't say knowledge. We know how to do these things - the new technology tools are extremely logical. It is our comfort level with the tools, clicking on a button with confidence that we will not change the course of history, that seems to be a bigger issue.

2. Lack of school support - I can't tell you how many trainings I am asked to do where I have to email my list of links a week in advance to get them "temporarily" unblocked for the training, only to have the teachers then not be able to access the tools they were trained on a mere 24 hours later. If you want the training, trust the teachers!!

3. Lack of transparency - in other words, we don't want to publish our work, our questions, our classrooms. While we will talk about it with colleagues in the abstract, putting ourselves "out there" will make us transparent - and open to comment, critique, criticism. Who wants that?

We do!! Jenn showed in the previous post the tools that our students are using and asked (begged? pleaded? implored?) you to try at least one. Did you? Why not? If her post didn't compel you to do so - maybe this video will:

How will you find information in the future if you can't use these tools?

A reminder why we do this. . .

After two days of technology frustration, I wasn't ready to throw out my laptop, but I was ready to give it a stern talking to. Then I saw this. And I remembered why it is imperative that educators ride the wave with the students - not drown in their wake. I ask you to watch this and then commit to mastering one thing that is mentioned in the video. Exploring one idea or tool or thread that will make one of the students in this video bump into you in the sea that is the Web 2.0 and say "welcome, glad you're here." Please.

Blogging DATAG meeting October 5, 2007

Before leaving for DATAG, this posting on G-Town Talks came up on my feed and I was immediately inspired. One of the other attendees at DATAG also uses Skype and I would have liked to explore Skyping during a meeting as a means of processing the content, but it was hard enough to follow David, much less type up notes and follow a Skype conversation.

Below are the notes I took during David Abrams' speech this morning. Any misquotes or misunderstandings are my fault.

There are more people in attendance today than I think I’ve ever seen here. David Abrams is our first speaker – as the Assistant Commissioner for all things testing related, he’s the top of the food chain.

So – before we get started. . . I’m wondering:
Will the rest of the state adopt the model being used by NYC? (3 part report cards based on value-added, parent surveys, and school walk-throughs)
When are they going to pitch NYStart out the window and start again? (Note: 10 mins into his presentation, David said it was no longer his cross to bear. That answers that question.)

Introduction from Brian Preston – yup, it’s the largest attendance. More than the summer conference. Our theme is “state assessment and future models”. Sexy!

Background on David Abrams – Original member of DATAG (from his time in Albany). He was a high school English teacher and I think this shows in his presentation style.

Recognition from David for the work of DATAG (wow - he talks really fast), especially during the changes in testing programs. Apparently, he gets letters from across the state that lead him to believe that not everyone understands the system and the reason for the changes. (Good plug for DIGs - getting people involved in local groups with a more relaxed environment.)

For the irrelevant portion of the program – David advised us all to join DATAG so we can “get on the listserv and bitch”. There hasn’t been a lot of that lately but point taken.

It’s official – there is a next generation of the accountability system and it is in development.

David’s PowerPoint is pulled together from the Commissioner’s PowerPoints and will highlight the data points that David finds interesting. Starting with English. There’s been movement over the last few years in a positive direction. Some discussion about the area of Reading as its own focus (as opposed to literary or reading in the content areas). The idea of students as independent readers emerges in middle school. David is interested in exploring the concept of a middle level literacy profile. Getting those struggling middle level readers what they need. The reality is that there are very few reading teachers at the middle school or high school level.

Comment about tests being built from two primary languages – primary verbal language and primary math language – but I’m not sure what he meant by that. He’s used the phrase not happening "at scale" twice now. This issue of discrepancy of data use and understanding across the state is apparently an issue.

Ah – the testing policy for the ELLs. As a data person, David wasn’t freaked out by the changes in the ELL policy. He wanted the ELA data as an additional data point. He shared that he was asked to present at hearings in DC but didn’t want to share what stance he took. His previous statement leads me to believe that he supported testing of ELLs in English after one year, rather than 3.

Native born students – came into system in K or grade 1. District should be getting these kids at grade level by Grade 3.

Newly arrived – some discussion about the role of literacy in Language 1 and the impact on education. Clearly, David has spent a lot of time thinking about this issue. All of his comments make sense in the large scale sense of the testing program. He has the ultimate view of aggregate data.

Students with Interrupted Formal Education – missed the point he was making here but he used the words heterogeneous and homogeneous about six times in the same sentence.

Test was designed to “get in and get out” – tests have 20-30 MC items but everyone knows “that one item can really make a difference for Level 4’s”. There is a need to refine the tails of the test in order to get better data for the Level 1’s and Level 4’s.

Discussion about Students with Disabilities. Raised the point that diagnosis patterns are not consistent across districts. Mindset about testing SWD has changed but more work is needed around who is being identified and who is not.

Interesting political point – David just referred to the reauthorization of Title 1 “which was called No Child Left Behind”. He said he’s trying to break the habit of calling it NCLB. Hmm. . .

So – to summarize the first half an hour. David is a proponent of longer tests that get better data and are more specific at the tails. He’s aware of issues for SWD and ELLs.

Another use of the phrase “at scale” – take a drink of Diet Pepsi.

Just got confirmation the state tests are not designed to tell Mrs. Jones at Apple School how her kids are doing. It’s to assess how New York State schools are doing in implementing the NYS Learning Standards. It is always powerful to hear him say that and I wish he’d said it a little slower.

HS Math. Ok – this year is a baseline of math standards – all together now, “at scale”. The number of students at Level 1 in grade 8 is scary and has major implications for HS programs. Schools should really take a look at the Grade 9 math program. Every HS principal should be looking at how students are being taught math in Grade 9.

70% of LEP population in NYS speaks Spanish. Wow. Students can take the math test in their dominant language so there are minimal ELL issues, but there are still gaps between Hispanic/Black and White/Asian students. Expects we’ll move into incremental movements on the math test.

Continued issue of item density at the tails. The break between Low 2 and High 1 is narrow. Need to identify who is “floating” right above standard cut points. Commissioner won’t talk about Standard Error because it will “break the brains of the press” (ha!) but districts need to be aware of those issues. He just flew through an example of a thermometer in the sun or in the shade but I lost the context, sorry.
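To make the "floating right above the cut" point concrete, here is a minimal sketch of why standard error matters near cut scores. The cut score and SEM values are invented for illustration - actual NYS scale scores and standard errors vary by test:

```python
# Hypothetical example of why standard error matters near cut scores.
# CUT_SCORE and SEM are made-up numbers, not actual NYS values.
CUT_SCORE = 650          # cut between two performance levels (invented)
SEM = 6                  # standard error of measurement (invented)

def score_band(observed):
    """Return the +/- 1 SEM band around an observed scale score."""
    return (observed - SEM, observed + SEM)

def floating_near_cut(observed):
    """True if the cut score falls inside the student's score band -
    meaning the student's 'true' level is genuinely uncertain."""
    low, high = score_band(observed)
    return low <= CUT_SCORE <= high

print(floating_near_cut(653))  # True: the band 647-659 contains the cut
print(floating_near_cut(670))  # False: comfortably above the band
```

A student scoring 653 looks like they are above standard, but with the cut inside their score band, they could land on either side on a retest - exactly the kids districts need to identify.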

Just got confirmation that David supports formative assessment and multiple measures. He said: some districts are buying formative assessment programs and using them as a summative program. This is bad. (I’m paraphrasing)

Review of standard setting process for Integrated Algebra test – I think there was some really important stuff there about pre-equating, post-equating, open-ended and closed responses but I wasn’t able to catch all of it and when I re-read what I did write, it made no sense. So, I’ll infer that David wants a dense test, not a long one, and wants to make sure it’s done right. He just challenged people to prove that the Regents don’t test higher order skills – he can prove that they do. Any takers?

He said “fat data set” when describing data collected during the standard setting process. I'm totally stealing that.

Sample test for the Integrated Algebra will be out by Halloween. He would recommend that every High School in New York State pull together all math teachers and break apart the sampler. What is the range of difficulty? What are the standards? DO NOT DROP THE SAMPLER ON A KID’S DESK AND SAY “TAKE IT.” Teach the curriculum. The value of the sampler is for the instructional staff. Have meetings with Grade 8 math teachers. Don’t look at it in isolation. Consider the curriculum, consider the core standards document.

David has been arguing with SED about test design, psychometrics, and protecting the integrity of the test for the students. He confessed that he is scared by the fact that he is extracting measurement from children. Live human children. His job is to protect the individual rights of the individual students. Aww!

The most stable data set for setting a standard is the operational data. David has spent a long time and asked a lot of people about the best way to do standard setting. Unlike 3-8, when you’re standard setting six tests at a time, David is confident that they can do standard setting within one week for the math test. The entire testing program has been audited and peer reviewed by the USDOE. Arguments that the test is not aligned or unfair do not hold water. Conversion chart should be up by the morning of June 26th.

David confessed that he can’t sit still. Not a surprise.

He went to Pearson and gave them the “hairy death eye” on behalf of NYS. He reviewed their scoring procedures, saw their scanning center, and met with everyone who will be working on NYS assessments. He came out of the trip with several ideas on how to make things go smoothly in June.

He’s drafting a logistics memo that will be released by November. It will say “here are the rules, here’s what will happen, here’s what everything will look like.” This is to cut off (he actually used the phrase CYA) anyone who says they didn’t get notification in time.

Will do a full formal dry run in the Spring to vet any IT problems. Wants to make sure that files are accessible and can be viewed by all BOCES. Don’t worry about structure and format, as he’s given his word that the file formats will be compatible.

Information about the new tests will be posted at:

Accountability Update: Status versus Growth

Status Model: takes a snapshot of a subgroup’s or school’s level of student proficiency at one point in time and often compares that proficiency level with an established target. New cohort each time.

Growth: Variation on status. Take a snapshot at each consecutive year and compare to previous year. Lots of different approaches. Lots of discussion and issues of standard error.
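To make the distinction concrete, here is a minimal sketch of the two models using made-up proficiency numbers and an invented target (my own illustration, not anything from the presentation):

```python
# Hypothetical illustration of status vs. growth models.
# All data and targets are invented for the example.

# Percent proficient for Grade 4 at one school, by year.
# Note: under a status model, each year is a *different* cohort.
grade4_pct_proficient = {2005: 58, 2006: 62, 2007: 65}
STATUS_TARGET = 60  # established proficiency target (invented)

def status_model(year):
    """Snapshot: did this year's cohort meet the fixed target?"""
    return grade4_pct_proficient[year] >= STATUS_TARGET

def growth_model(year):
    """Variation on status: compare this year's snapshot to last year's."""
    return grade4_pct_proficient[year] - grade4_pct_proficient[year - 1]

print(status_model(2005))   # False: 58 is below the 60 target
print(status_model(2007))   # True: 65 meets the target
print(growth_model(2007))   # 3 points of growth over 2006
```

The 2005 cohort "fails" under status even though the school gains ground every year - which is exactly the tension between the two models.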

Real issue is tension between governance and school improvement.

David likes the New York Yankees. Dislikes Red Sox.
Likes Lobster. Dislikes Liver.
These do not impact his judgment.
He does not like or dislike value- or growth-added. His job is to find out what works best and which is the most sound.

The goal is to build from status to growth. What are they doing? He won’t tell us. But he will tell us what he’s exploring.

Growth is not allowed. NYS is not in the model because we started a year too late. Other states started a year before NYS.

David has been meeting with USDOE around the NYS model. He shared the name of several places that do models – it’s all about growth NOT value-added.

There is tension in the next generation because people want the system to inform decisions at the school level. “The best way to inform school decisions is through a multiple-measure system of assessment.” However, this system needs a standard “spine”.

David is scared by Margaret Spellings. I wonder if that’s a like or dislike?

NYS is researching if we can build a full vertical scale. No time for discussion today. His PowerPoint at this point summarizes most of his monologue. Note that the new model will start in 08-09. I wonder if this will change after the federal election?

Transparent does not mean easily understandable. A system this complex cannot be easily understood. It is rocket science. I think that was a little shout-out to the Geeks in the room.

OK – here’s a question. He said growth NOT value, but the slide he just showed said the design is destined to align to value-added in 2010-11. I asked my question aloud – not sure I know the difference between the two. What I got from his response was that one is related to large-scale accountability, one isn’t. Social Studies was used as an example but I’m not sure how it fits into my question. I’m hoping the next session will explain the difference between value-added and growth at a slower pace.

David will forward four references to the listserv about large-scale assessment. Thung at Michigan State is a recommended author as well as some folks at UCLA.

Lunch break and then value-added.