Measuring Up 2000: The State-by-State Report Card for Higher Education

GRADING STUDENT LEARNING:
BETTER LUCK NEXT TIME
By Peter Ewell



Imagine receiving a report card that contained letter grades for your child's conduct and preparation for school but nothing about performance in English or Math. Policymakers and the public will face a similar situation in this inaugural issue of Measuring Up. The National Center's decision to award all states an Incomplete in the area of student learning is wise for many reasons. But including this "subject" in the first place highlights a significant gap in our ability as a nation to say something meaningful about what or how much our students learn in college.

Why Fifty Incompletes?

Peter Ewell is Senior Associate at the National Center for Higher Education Management Systems.

The decision not to award a letter grade in student learning is the right one because there are no common benchmarks that would allow meaningful state-to-state comparisons. This is not to say that individual states know nothing about student learning at higher levels within their own borders. Nor does it mean that there has been no interest in academic achievement at the national level.

Since about the mid-1980s, states have been seriously concerned about examining what students learn in college. But how the states act on this concern varies substantially. Most approach the task indirectly by asking each public college and university to administer a locally designed or locally chosen assessment and to report on what they find. Often this is done in loose partnership with regional accrediting bodies, all of which require colleges and universities to undertake some kind of student assessment.

Fewer than ten states administer a common test to large numbers of college students, and these states do so for different reasons. Some, like Florida (and to some extent, Texas and South Dakota), want to ensure that all students have the necessary knowledge and skills to progress through the system. Others, like Tennessee (and to some extent, Missouri and Wisconsin), want to collect data in order to take stock of, and sometimes to reward, institutional performance. Still others examine students in only one area of study (for example, Georgia, in writing) or test students in only one postsecondary sector (for example, community and technical colleges in West Virginia or state universities in California).

Such variations in scope and purpose mean that states employ very different methods when they assess college students, if they do so at all. And because states can mandate testing only in the public colleges and universities that they operate, they have no authority to assess students enrolled at private institutions. This range of variation doesn't mean that the states are doing nothing—though some are, in truth, doing nothing—but that the many different things they may be doing cannot be aggregated or compared.

We have seen periodic national interest in establishing common benchmarks for collegiate learning. The National Education Goals, which were proposed in 1990, included a provision that graduates of U.S. colleges in 2000 "will have increased markedly in their ability to think critically, communicate effectively, and solve problems." A four-year effort to design a national assessment system to measure these outcomes produced many good ideas and even a few workable prototypes. But consensus was hard to achieve about whether and how to proceed with what would have been an expensive and controversial enterprise, and the U.S. Department of Education's proposal to undertake the project was never funded.

A few existing data sources provide some basis for estimating collegiate proficiency. For example, the National Adult Literacy Survey (NALS) conducted in 1992 can be used to estimate some higher-level literacy skills for college graduates. Aggregated results of the 1992 NALS, in fact, are included in the Educational Benefits grade in Measuring Up 2000. Also, large numbers of college students take statewide or nationwide examinations (professional licensure and professional school admissions tests, tests for aspiring K-12 teachers, and graduate school admissions exams). None of these, of course, were designed as national benchmarking tools, and the students who take them are neither broadly representative nor comparable across states. But this has not stopped some commentators from trying—with appropriate caveats—to make sense of them.

An Incomplete grade for all states is thus entirely appropriate, given the information we have. Some states are doing some things, and much of the "homework" needed to provide a national benchmark has already been completed. A prominent, but inadequate, grade might spur further action.


Why Is This Hard?

National data on academic achievement have been available for K-12 students for many years through assessments like the National Assessment of Educational Progress (NAEP), and most states use standardized achievement tests to examine primary and secondary school students, either for benchmarking purposes or to certify progress. Why haven't we done the same for "grade 16"? There are at least four reasons.

First, there is relatively little consensus about what the core outcomes of a college education ought to be. Institutions of higher education differ by design in both clientele and mission, and policymakers are aware of and largely support such differences. True, all colleges and universities offer "general education" courses, which are supposed to be teaching similar skills and knowledge. But except for having acquired communications skills and a basic set of quantitative abilities, graduates in different majors at different institutions arguably ought to look different. Coming to an agreement about performance standards for core generic abilities like "critical thinking" and "problem-solving" is thus a formidable task for both educators and policymakers.

Second, performance on any college-level exit assessment depends a lot on the abilities that students had when they arrived on campus. This means that "outcomes measures" for many institutions say more about how selective their admissions policy is than about what students learn while attending them. Admittedly, this difficulty is less troublesome when we look at state-level outcomes. But in most states, debates about whether to assess college students' achievement have focused on measuring institution-level performance—sometimes for high-stakes purposes like performance funding. As might be expected, many colleges and universities have little interest in going down that path.

A third challenge is how to create assessment instruments that can measure the abilities that constitute successful performance for college graduates. Educators and employers agree that the requisite abilities are too complex to be measured by multiple-choice tests. An appropriate assessment would require students to write extensively, solve open-ended problems, and perform real-life tasks. We now know a lot about how to create these kinds of tests, but commercial test-makers have been understandably reluctant to do so until there is a demand for them—and the first two difficulties have up to now restricted demand. As a result, the inventory of standardized tests designed for large-scale assessments of the outcomes of higher education is quite limited. Perhaps more strikingly in the light of a burgeoning "assessment movement" in higher education, most of these tests are over ten years old. This means that adequate assessments need to be created largely from scratch, at considerable expense, and no state has yet been willing to foot the bill.

Finally, the few states that have statewide testing programs find it hard to create conditions under which students will do their best. This problem has also been encountered in K-12, but a more generalized culture of student compliance tends to mitigate its effects. Motivating young adults and older returning students to show up for an examination that does not affect their coursework—let alone motivating them to try hard when they do—is not an easy task.

What Might Be Done?

Taken together, these obstacles have proven formidable enough to deter most states from directly assessing student learning. And even if these obstacles were overcome, states would have few incentives to cooperate in creating assessments that would allow meaningful state-to-state comparisons.

One thing not to do is to reward states for doing mindless testing using old and inadequate instruments. States have good reasons for choosing different paths in assessing student learning, and their testing programs have quite different goals. Moreover, states are unlikely (and largely unable) to enforce any assessment requirement that would involve testing students who attend private institutions—a substantial portion of the enrollment in some states.

Experience drawn from the widespread practice of standardized exit testing in K-12 suggests that there are serious side effects to an ill-considered common-testing approach: restricted access, "dumbed down" curricula, and teaching to the test. For these good reasons (and some bad ones as well), colleges and universities strongly resist proposals for exit testing. When public institutions are simply avoiding responsibility, states need to take firm and direct action. (Indeed, the next edition of Measuring Up might examine how well states are doing in this matter.) But the inescapable conclusion is that national benchmarks for student learning are not going to result from state-level efforts any time soon.

National initiatives appear both more appropriate and more promising. For example, if all goes well, a new National Assessment of Adult Literacy will be administered in 2002. A proposal is now on the table to administer the survey to samples of college sophomores and seniors in addition to the general population, and the tasks used to assess these students will reflect authentic college-level abilities. This assessment is likely to yield valid results for selected states, and these can provide a starting point for an analysis of student learning. Collecting data through a national initiative will also prevent inappropriate comparisons among different kinds of institutions, because the small sample sizes preclude the compilation of institution-level results. Comparing states, on the other hand, appears both feasible and justifiable, because every state ought to have an appropriate mix of institutions within its borders. If all goes well, the results will be available for a future edition of Measuring Up.


Meanwhile, we must resume work on a responsible national assessment initiative for higher education. We know a lot more now than we did in the early 1990s about how to create task-based assessments that reflect the complexity of college-level work, and we can use new technologies to create highly interactive and challenging assessments. More than ever, such an assessment initiative is a matter of political will rather than technical ability.

But a national assessment would solve only half our problem. To make real progress, institutions and faculty must reassert responsibility for the integrity of the degrees that they award. In our diverse system of higher education, we will always rely on local assessments to certify student learning. But we can attain greater uniformity in local standards by establishing clear benchmarks for achievement in key subject areas and by periodically examining typical examples of student work. This is what other countries do routinely, through national qualifications exams and external examiners.

National benchmarks and aligned local standards of achievement in core competencies are within our reach and can play an important role in improving performance. If we want to, we can make progress on both in time for the next edition of Measuring Up.
