Tuesday, August 3, 2010

Measurement, Incentives, and Educational Reform

In my last post, I talked about how the way we assess the quality and effectiveness of education comes to determine the overall quality and effectiveness of our entire education system. I made that argument principally with regard to colleges and universities, but in fact, the relationship between assessments and outcomes is one of the most important questions in public education in America today, and it is a discussion many layers deep. In this post, I will peel back another layer of that debate, and disassemble the assumptions behind the arguments in my last post.

If you missed the last post, my point was this: in a system as massive as that of American higher education, educational quality can be maintained only to the extent that educational quality is measured. In other words, you only get what you test for. Why? Because if we do not measure educational quality, then we cannot know where and when it exists; if we do not know where and when it exists, then we cannot reward anyone (teachers, administrators, or staff) for producing it, nor punish them for failing to do so; and if there are no incentives to produce it, then why should anyone care about educational quality? Now, of course, there is another reason why anyone would care, but I'll get to that a little later.

As obsessively mechanistic as the above argument may sound, it constitutes a fundamental assumption behind contemporary American education policy. What's more, there's evidence that that assumption is at least partly correct. The seminal education act of the past couple decades, No Child Left Behind, nationalized what had been a growing movement at the state level towards high-stakes testing. The principle behind this movement is that you test kids regularly, and you reward or punish teachers and administrators based on students' performance. Why? Because, if you don't, teachers and administrators won't bother educating anyone.

When that law was passed, anecdotes of teachers reading the newspaper at their desks instead of teaching were used as evidence for that view. Concrete data on the extent of teacher-slacking is difficult to obtain, of course, since teachers try not to get caught doing it. There's no doubt, however, that America's schools were in a sorry state before the law was passed, and there's some evidence that educational achievement has increased significantly in the years since.[1] Naturally, there are those who question the evidence.[2] I'm in no position, currently, to analyze the statistical work and pass any judgment on it, and even if I were, why should you trust my opinion?

What's more interesting and less ambiguous is the impact of NCLB on instruction: suddenly, everyone's teaching to the test.[3] So now, what you test for is precisely what your educators focus on, and anything that's not tested probably won't get taught. We have here a self-fulfilling prophecy: we believe that educators will only educate to the extent that we measure and incentivize them, so we create a law that measures and incentivizes them so thoroughly and so specifically that they have no time to teach anything besides what we measure. Now our original assumption is true. But was it true to begin with?

In the words of Andre 3000, we-we-we-well, yes and no. No, it wasn't true everywhere, it wasn't true for every educator, it may not have been wholly true for any of them. We know this, because before the days of high-stakes testing—i.e. in the absence of any formal, nationwide or statewide assessments and incentives—some kids still got educated. But, yes, it happened to some degree. It must have, because the world has plenty of greedy, lazy, or simply self-satisfied and ineffectual people, and god knows some of them ended up in education. Sure, most people who work in education get into it, at least initially, because they care about kids, social justice, civic responsibility and so on—but over time, some burn out; some get cushy jobs as district superintendents; some find that their own economic needs come to outweigh their higher ideals; some care about certain types of kids more than others; some don't even realize that they've stopped trying. I believe that this last type is the most common—I've seen these people, I've worked with them.

What we know for certain is that not all teachers and administrators are equal. Some schools are safe, supportive, educational communities while others, in the same neighborhoods, are war zones; some classrooms are orderly, efficient learning environments, while others, in the same school, full of demographically equivalent kids, are chaos. Clearly, immutable differences of talent and personality play a big role in separating good teachers or administrators from bad ones, but different amounts of effort, different styles of school- and classroom-management, and different instructional practices have a big impact as well. The anecdotal evidence is consistent in showing that the best teachers and principals are also the hardest working; what's more, they tend to adhere to a recognizable set of best practices, which various observers have sought, with some success, to identify.[4] Studying these best practices is humbling, and less devoted teachers, or less talented ones (even those who are committed to their jobs and believe in the project of education), are often too proud to undertake it or simply unaware that such practices exist.

So, consider the federal government's position. We have an education system that's chronically underperforming and a teaching force of mixed but improvable quality, and we want to make things better. There are three possible approaches.

We can throw money at the problem. We can raise per-pupil spending with absolutely no strings attached, which rarely accomplishes anything, or we can direct that money at some clearly defined target: we can up teacher salaries, for example, or reduce class sizes. But the evidence is that raising teacher salaries has little impact if you don't enact policies to ensure that you get better teachers, and reducing class sizes doesn't do much if instructional practices remain the same.[5] Free breakfast programs can have a positive impact on learning and behavior, but a school crippled by gang violence, high school illiteracy, and absenteeism isn't going to suddenly turn around if you give all the kids free breakfast. More money's not enough: you have to use the money in intelligent ways.

So, instead, you can mandate specific policies. You can create more stringent requirements for teacher certification, you can mandate a national curriculum, you can standardize teacher training and instructional methods, etc.—but few would advocate for such uniformity in our education system, especially in a country as diverse as America, where different regions have different demographics and different needs. Moreover, if we did institute such policies, we'd have to spend a fortune monitoring schools to ensure that the policies were properly carried out and a second fortune studying the impact of the policies to see if they were doing any good.

What we need, clearly, is a way to ensure that money that goes into education is used effectively and innovatively to improve educational outcomes, without specifically mandating how that money is used. Well, here's a brilliant, elegant idea: pay attention to the outcomes. Don't worry about how educators are using public money, worry about the results they're getting. Test the kids—after all, that's who this is all about in the end—and see if they're learning anything. That way, educators will have the freedom to innovate and adapt according to the specific circumstances in which they are working, but will also be held accountable for the success of their work. It's beautiful. It's simple. It might even work... sort of.

So, we see that government, in trying to systematically improve education, has no better recourse than to measure educational outcomes and put stakes on them—if you've got a better idea, please write in!—but we also see that doing so leads to a system where all anyone cares about is state tests, and all kids do is cram for them.[6] So, we come at last to a question that I've been aiming to address: what do we do? Answering that question will force us to peel back yet another layer of this debate, however, and so it will wait for a future post…

[1] No Child Left Behind Act Is Working. Department of Education, 2006. Retrieved 6/7/07.

[2] Linda Perlstein, Tested, Henry Holt & Company, 2007. To be honest, I pulled this and the preceding reference off the Wikipedia page for No Child Left Behind. I'm not interested in doing a literature review of the impact of NCLB. Clearly, there's going to be a lot of debate on that topic, and the conclusion is not that important to the point I'm making in this post. 'Nuff said.

[3] That's a statement that I'm not going to bother backing up with statistical data. Anyone with any awareness of public issues in this country has been hearing, for seven years, about teachers and administrators complaining that they have no choice but to teach to the test; I've been one of those teachers, I've taught to the test, I've complained about it. When you're constantly inundated with such anecdotal evidence, to wait until sociologists develop some quantitative measure of the degree to which teachers teach to the test, then wait again for them to conduct a nationwide study, then wait again for other researchers to find flaws in the study and to conduct a study with different parameters and obtain different results... well, that's just crazy. To insist on such "scientific" evidence is to close one's eyes to reality.

[4] Best practices no doubt differ from one educational context to another. Because of the current focus on urban education and the direness of that project, which drives educators to more rigorous self-analysis, the best practices in that context are more thoroughly documented than those in other educational contexts. For a good analysis of effective instructional and management techniques in urban education, see Doug Lemov's Taxonomy of Best Teaching Practices.

[5] Heckman (2000) writes "there is a growing consensus indicating that within current ranges in most developed economies, measured inputs such as class size and spending per pupil have little, if any, effect on the future earnings of students."
     See also:
Hanushek, E. (1998). The evidence on class size. Occasional Paper Number 98-1. W. A. Wallis Center, University of Rochester.
Card, D. & Krueger, A. (1996). School resources and student outcomes: An overview of the literature and new evidence from North and South Carolina. Journal of Economic Perspectives, 10, 31–50.

[6] Actually, as far as I can work out, that's only happening at poorer schools. Wealthier children, who receive more educational support from their parents, are so far ahead of state curricula that the schools serving them don't really need to worry about state tests and can focus on other educational goals. The vast majority of those schools, however, orient their curriculums instead toward helping kids achieve high scores on the SAT and AP tests.
     In fact, though, that leads to a vastly better education, for two reasons. First, the SAT is a better test than most state exams, and the APs are very good tests, consisting largely of open-response problems. Thus, they test deeper knowledge and motivate deeper teaching. Second, because such tests occur only at the end of high school, higher-performing schools are able to take a long view and develop a curriculum that will build a strong foundation of knowledge and skills, rather than scramble, year after year, to get kids through a state curriculum for which they lack the foundational knowledge.


  1. You write: "Because if we do not measure educational quality, then we cannot know where and when it exists..."

    Does this mean that in, say, classical Athens or 19th Century England or any part of American history up until, probably, WWII -- where, presumably, no one was even thinking about measuring educational quality -- they didn't know where and when it existed?

  2. Certainly, educational quality was measured in 19th Century England. Under the recitation system of education then in vogue, classes consisted largely of students standing up and reciting, verbatim or in paraphrase, sections from the preceding night's reading or from material presented by the professor during the preceding class. These recitations were assessments, in that they told the professor how well the student had learned the preceding day's material, and they were frequent and lengthy. If anything, I'd say the recitation system was excessively weighted towards assessment over instruction. I don’t know what happened in Ancient Greece, but I’m sure there were tests.

    What there may not have been in 19th Century England (and surely was not in Ancient Greece) is any kind of standardized, nationwide assessment. I’m not sure when standardized testing began in America, but by the mid-1930s elite universities had begun using the SAT as a meritocratic means of identifying students deserving scholarships. This constituted a nationally standardized, if not universally administered, exam. According to Wikipedia, though, China had a system of standardized assessments called the Imperial Examinations, which were used pretty much continuously from 605 AD until 1905 AD, in order to make the selection of officials for the imperial bureaucracy more (you guessed it) meritocratic.

    The thing to remember in all of this is that when you measure the student, you’re inevitably measuring the teacher and the school as well. A measure of student achievement is also a measure of educational quality.

  3. "Well, here's a brilliant, elegant idea: pay attention to the outcomes."

    I can't speak for pre-college education, but "outcomes assessment" has been a buzzword in higher education for some years. I believe it has entered the protocols of accrediting organizations, such as the Middle States group responsible for accrediting colleges. Colleges must document what they are doing to assess outcomes. At Queens College, CUNY, departments must write explanations of their outcomes assessment policy as part of the official self-study in the accreditation process.

  4. Fascinating. I hadn't heard about this. It may signify the beginning of a more rigorous approach to measuring educational quality at the college level. Of course, with rigor comes rigidity.

    What's your take, JB? Do you find that these outcomes assessments get in the way of your instruction? Do they even affect you? Are you involved in designing the assessments?

    If you have any links to info on this self-study, go ahead and post them. I'd like to read up on it.