JAC Volume 12 Issue 2

Editor:
Gary A. Olson

Evaluating Writing Programs: Paradigms, Problems, Possibilities

Susan H. McLeod

Those of us involved in writing programs, both departmental and campus-wide, need to know about program evaluation. This need stems not only from our wish for information about how to improve our programs but also from the fact that others (administrators, legislators, funding agencies) keep asking us, rather insistently, how well these programs are working. In recent years the nationwide movement toward measuring educational outcomes in higher education (see Boyer et al.) has increased the pressure to evaluate individual programs in order to demonstrate their effectiveness. Because of their high visibility (on some campuses, writing courses are the only courses all students must take) and because of their connection to campus-wide initiatives (general education reform, writing across the curriculum), writing programs are among the first to feel that pressure.

But evaluating writing programs, like evaluating writing itself, is not an easy task. Those who have written about program evaluation show at some length just how difficult it can be.1 Many programs are multifaceted, involving staff and curriculum development, articulation of courses, and some administrative scaffolding; the sheer complexity of these programs is one reason that the art of evaluating them remains, as Witte and Faigley say, in its infancy (66). Further, what Fulwiler says about WAC programs is true of writing programs in general: they are institution-specific, so an evaluation design that works well for one program may not fit a program at another institution where the configuration is quite different (62). But a major obstacle to writing program evaluation lies in the evaluation methods themselves, and in the way those methods are understood both by the research community and by those who want to examine the results of program evaluation (there is always a gap between the present state of research and the practice of putting that research to use). What I should like to do here is describe two methodological approaches to evaluation (quantitative and qualitative) that grow out of two different research paradigms, suggest the problems these methods present, and discuss how an awareness of these problems can help us work with researchers in designing useful evaluations of our writing programs.

Assessing, Evaluating, and Testing

First, let me define terms. The words assessment, evaluation, and testing are often used synonymously by those who write about or insist upon program review. I should like to make some distinctions, however, since I will be discussing program evaluation in an institutional context. Assessment as it is now generally understood in academic institutions involves data gathering of all sorts, and it serves two purposes: improvement of institutional functioning, and accountability to the institution's various constituencies. Many universities now have offices of institutional research, where busy employees create databases to track enrollment trends and retention rates (up or down this year?), calculate student and faculty FTEs and student/faculty ratios, and provide answers to arcane but important questions (of all first-year students who identified themselves as ESL students, how many passed freshman composition during the Fall semester?). Other parts of the institution are continually assessing issues related to planning and management, budget, personnel, physical facilities, and academic units. Assessment, in other words, is and should be an ongoing part of any university. Program assessment activities ask such questions as, "Where are we now (as opposed to where we were)? Where do we seem to be headed?" In institutional terms, evaluation is part (but not all) of assessment; it involves judging worth and making decisions based on the data generated by assessment activities. Personnel are evaluated regularly (often for merit raises, promotion, or tenure), as are programs and academic units (often for accreditation or reallocation of resources). Program evaluation activities ask such questions as, "How good is this? How could it be improved? Should it be continued/modified/eliminated?"2 Testing is only one form of evaluation. It asks such questions as, "How much does this person (or population) know of this material? How well can he or she apply certain concepts or demonstrate particular skills?" It is almost always part of our evaluation of student learning in specific classes, but it is not always a part of program evaluation. As those who write about program evaluation warn, testing should never be the only instrument used to evaluate programs that have multiple goals and objectives (as most programs do). These three terms may be thought of as nested boxes, assessment being the largest and most inclusive, and testing being the smallest and most specific.

Quantitative and Qualitative Methods

Evaluation methodology was born within the research tradition of the social sciences, which was modeled on the positivist tradition of the natural sciences. This tradition—or paradigm, in the Kuhnian sense of a system of shared beliefs or a world view—rested on certain assumptions. Carol Berkenkotter sums up these assumptions as follows:
Social scientists, like physical scientists, are detached from their objects of study.

Investigations of social phenomena can therefore be conducted in a value-neutral fashion, the researcher eliminating all personal bias and preconceptions and employing language that expresses objectivity.

Social science, like the physical sciences, is nomothetic—that is, it is possible to extrapolate from social scientists' data social laws that apply across numerous contexts. (71)
The methodology that has come to be identified with these positivist assumptions is experimental and quantitative, involving careful research designs with controls for variables and statistical formulae for making sense of the data generated; the now-familiar experimental article, the rhetoric of which Charles Bazerman analyzed, not only discusses the outcomes of the experiment but also displays significant data in tables and charts.

While not everyone who uses these methods would subscribe to the positivist epistemology that gave birth to them (see, for example, Linda Flower's eloquent discussion of this issue), the methods themselves are still very much alive and very much a part of evaluation studies; writing program evaluations are no exception. Witte and Faigley describe several such studies: at the University of Northern Iowa, a comparison between the performance of students who had taken composition and students who had not; at the University of California at San Diego, a comparison of the results of different instructional methods in four writing programs within the university; at Miami University, a comparison of the performance of students in a sentence-combining curriculum with that of students enrolled in a "traditional" composition course; and at the University of Texas (the authors' own study), a comparison of the performance of students in a "sentence expansion" curriculum with that of students in a "traditional" course. Unfortunately, these evaluations yielded little in the way of useful information.3 As Witte and Faigley say, "Evaluation studies, including our own, which were based on the quantitative model have yielded few major insights concerning the teaching of writing or the operation of writing programs" (38). Quantitative methods by themselves cannot do justice to programs that involve not only students but also faculty, curriculum, and administrative structures.

There has been, however, a paradigm shift in evaluation studies, as described by Lincoln and Guba in Naturalistic Inquiry and by Guba and Lincoln in the more recent Fourth Generation Evaluation. The new paradigm, which has been described not only as post-positivist but also as "hermeneutical" (Hesse), "constructivist" (Guba and Lincoln), and "naturalist" (Lincoln and Guba), calls into question all the basic assumptions of quantitative methodology, proposing instead qualitative (that is, interpretive or ethnographic) methods. Lincoln and Guba contrast the two paradigms4 as follows:

Axiom 1: The nature of reality (ontology)

Positivist version: There is a single tangible reality "out there" fragmentable into independent variables and processes, any of which can be studied independently of the others; inquiry can converge onto that reality until, finally, it can be predicted and controlled.

Naturalist version: There are multiple constructed realities that can be studied holistically; inquiry into these multiple realities will inevitably diverge (each inquiry raises more questions than it answers) so that prediction and control are unlikely outcomes although some level of understanding . . . can be achieved.

Axiom 2: The relationship of knower to known (epistemology)

Positivist version: The inquirer and the object of inquiry are independent; the knower and the known constitute a discrete dualism.

Naturalist version: The inquirer and the "object" of inquiry interact to influence one another; knower and known are inseparable.

Axiom 3: The possibility of generalization

Positivist version: The aim of inquiry is to develop a nomothetic [law-based] body of knowledge in the form of generalizations that are truth statements free from both time and context (they will hold anywhere and at any time).

Naturalist version: The aim of inquiry is to develop an idiographic [case-specific] body of knowledge in the form of "working hypotheses" that describe the individual case.

Axiom 4: The possibility of causal linkages

Positivist version: Every action can be explained as the result (effect) of a real cause that precedes the effect temporally (or is at least simultaneous with it).

Naturalist version: All entities are in a state of mutual simultaneous shaping so that it is impossible to distinguish causes from effects.

Axiom 5: The role of values in inquiry (axiology)

Positivist version: Inquiry is value-free and can be guaranteed to be so by virtue of the objective methodology employed.

Naturalist version: Inquiry is value-bound. (37-38)

This paradigm shift in the evaluation community is good news for writing program evaluation. The assertions that reality is complex and diverse, that causality is mutual rather than direct, that evaluation, like any inquiry, cannot be value-free (indeed, that we should make value-oriented discourse the center of our concerns in evaluation; see Schwandt)—all these are assertions with which many of us in the composition community would agree. The qualitative methods associated with this paradigm are ones with which we are more comfortable, ones which promise to provide a more workable evaluation design for our multifaceted programs.

Qualitative Evaluation

What would constitute such an evaluation is suggested by several researchers. Witte and Faigley tell us that a program evaluation needs to do at least two things: specify the components of writing program evaluation (such as the cultural and social context in which the program exists, the institutional context, the program structure and administration, the content or curriculum of the program, and the instruction involved), and specify the interactions among these components, which will allow us to examine the effects of the program, intended or unintended. Effects may be observable during the program, may appear as "outcomes" at its end, or may be long-range effects evident only through a longitudinal study. They can be seen through many different lenses, including various kinds of data (written products, attitudes, teaching methods) (39-41). The evaluator examines the components of the program, its effects, and the questions both generate (for example, is the program meeting the needs of the institution, and in what way?). The evaluator also looks at the goals and objectives of the program and asks how well they have been achieved, never forgetting that unintended effects may be just as important as goals achieved (57-63).

Lincoln and Guba are more specific about how to proceed. They list various elements of a naturalistic inquiry, such as creating the focus of the evaluation, deciding where and from whom data will be collected, determining successive phases of the inquiry, determining instrumentation, planning data collection and recording modes, devising data analysis procedures, planning logistics, and preparing for the "trustworthiness" of the study, the latter being the naturalist's answer to validity, reliability, and objectivity. Such a study would rely heavily on human instrumentation (such as interviews and observation) rather than pencil-and-paper instruments; its focus would not be laid out ahead of time but would emerge as the study proceeds; and the data gathered would include not just written products but also the results of interviews and field notes. Data analysis would be interpretive rather than strictly statistical, in the tradition of the anthropological field study (226-49; 332-33).5

Yet, as Lincoln and Guba point out, it is paradoxical to discuss evaluation design in qualitative inquiry, since the design is emergent. The focus of the study may change, the instrumentation is refined as one goes along, the data analysis is open-ended and inductive, the timing cannot be predicted, and the expected end products cannot be specified (224-25). The methodology is messy and labor-intensive and suffers from a perceived lack of rigor among those used to quantitative evaluations.6 And given the methodology, the budget for a naturalistic evaluation is virtually unspecifiable. Lincoln and Guba comment that rather than deal with the question "What will it cost to carry out these tasks?" the funder must ask "What am I willing to spend?" (225). Imagine saying to one of your own administrators, "I'm not really sure how much this will cost, Dean Tightfist. How much are you willing to spend?" You would, at best, be laughed out of the office; responsible administrators want to know ahead of time precisely how much an evaluation will cost them, what activities they are paying for, and what kind of bang to expect for their buck. Moreover, many university administrators above the level of dean, especially at research institutions, are scientists by training and are used to quantitative methods. They deal with numbers and statistics as part of their jobs as administrators, they know how to interpret such data, and they expect that sort of information from the rest of us to support our requests and justify the existence of our programs. They would find qualitative data exotic at best.7

This latter problem is perhaps the most serious difficulty of all for those of us who must deal not only with administrators but also with legislators and boards of regents who control the purse strings and who do not have the time or inclination to wade through the complexities of qualitative data—indeed, who sometimes want statistics and hard data but do not even understand this information when we give it to them. Ed White gives us a glimpse of this hard reality in a wonderful anecdote in Developing Successful College Writing Programs. White was testifying before the California Senate Committee on Finance as the expert witness, trying to persuade the legislature to fund a program for students who scored in the low range of the state-wide English Placement Test. The chair of the committee, whom White describes as a "venerable but notoriously shrewd sheep rancher from the central valley," asked White a question about the placement test: how many students flunked it? White continues,

I was ready for the question and had prepared a careful response. "I can't give you a simple number, Senator," I said. "The test is designed to give a profile of student skills, not just a pass or fail. We report a set of six separate scores to each campus for each student; the campus then analyzes the scores and places the student appropriately in whatever curriculum the campus has in place for entering students." I sat back, feeling that I had started out just fine.
    But the senator was frowning, and when he spoke he was no longer friendly. "Just like a professor!" he barked at no one in particular. "You ask a simple question and all you get back is a bunch of gobbledygook!" (193)
White was rescued by an administrator he describes as the dean-of-getting-money-from-the-legislature, who sprang into action, telling the committee that fully half of those who took the test failed it. How did he know? Why, that was the number that scored below the fiftieth percentile. In other words, White tells us, he told the committee chair that the lower half of the scores was the lower half of the scores. "'Thank you,' said the now genial sheep rancher. 'It's a relief to have somebody at that table who can give a straight answer when we need one'" (194). The writing program was funded. White is not, of course, advocating that we lie to our legislators—only that we need to know our audiences and how to present information to them when we discuss evaluation.

So there is a paradox here, a fundamental contradiction between the paradigm of evaluation which seems best suited to writing programs (post-positivist and qualitative) and the kind that will yield the information administrators, legislators, and funding agencies usually understand and want from us (positivist and quantitative). But this contradiction need not hinder our evaluation efforts if we understand that these two paradigms of evaluation are not ones we must subscribe to with theological fervor, clinging to one and rejecting the other. They are instead different ways of looking at the world, different stances, different lenses through which we may examine phenomena. They need not be incompatible, as many researchers have pointed out (see, for example, Howe). In evaluation, we need to ask not which paradigm is "correct," but which view of the data is appropriate for a particular purpose. Lincoln points to an analogy in mathematics:

The axioms that make up Euclidian geometry have served us well for several millennia here on earth—where it is useful to have triangles with interior angles of 180°, and where all lines are straight (or at least where we might pretend they are straight). Turn those axioms on their heads, however, and you have what appears to be nonsense. Who could use triangles whose interior angles only approached 180° as the triangles got smaller? What if the shortest distance between two points were not a straight line, but a curved one, or several thousand of them? What could one do with a geometry with such axioms?
    The quick answer is this: You can put people on the moon with such a geometry—which is called Lobatchevskian—and you can bring them back home again. The point is fit. Euclid's is the axiomatic set of choice in some instances and other geometries are the sets of choice when you have other kinds of problems to solve. (32)
As long as we are aware of the paradigms, we can choose our methods carefully and wisely, according to how well they fit what we and others need to know in order to make decisions about our programs.

We can do this by designing what Michael Patton calls "utilization-focused evaluation." One designs and organizes such an evaluation by first identifying the relevant decision makers and then asking, "What information do these people need and what will they do with it?" The answers to these questions then shape the evaluation: what paradigm to use, what data to collect, what research methods to use, and how to present the results (Utilization passim). This sounds familiar to those of us who teach writing (know your audience and purpose before you begin), and in designing program evaluation the advice to start with such questions is just as easy, and just as difficult, to follow as it is in writing a paper. Furthermore, this advice about where to start assumes at least a passing knowledge of certain fundamentals: goals clarification, design alternatives (matching research design to program design), questionnaire construction, interview techniques, and methods of data collection, analysis, and presentation. It also assumes, as in some writing situations, a certain familiarity with various models so that one can adapt them to one's own use.8

Such a utilization-focused design for program evaluation is particularly appropriate for writing programs in that it is situational and contextual, applicable to the uniqueness of each program. Depending on what the decision makers (including ourselves) need, we might or might not want to interview students and faculty, conduct case studies of selected students and faculty, examine expressive and Likert-scale evaluations of classes and of staff workshops or training sessions, conduct writing attitude surveys of both students and faculty, collect and track the use of assignments and syllabi, or set up experimental (or quasi-experimental) matched classes to study classroom practices. An evaluation focused on the users of the data and the purposes the data will serve recognizes that there is no one paradigm, sacred methodology, or magic design for evaluation, but rather a variety of creative possibilities. For that reason it is a challenging task and, to some, a daunting one. But as those engaged in evaluating their writing programs know, it is a task that can, and often must, be done.

Washington State University
Pullman, Washington

NOTES

1. For example, see Davis, Scriven, and Thomas; Fulwiler; White, Developing and Teaching; Witte and Faigley; Young and Fulwiler.
2. For a more extensive definition of evaluation, see Patton, Practical 33-37. Also see the definitions of these three terms in Scriven.
3. For extensive critiques of each of these studies, see Witte and Faigley 8-38. Also see White's discussion of this evaluation research tradition in "Language."
4. Few researchers in evaluation theory would accept the extreme versions of the two positions described here; these researchers, however, are not the ones who are asking us how our programs work. I believe it is useful to examine the extremes in order to understand why non-researchers often regard qualitative research methods as "non-scientific." I would argue that the popular understanding of "scientific" methods is shaped by the positivist paradigm as described here.
5. For a discussion of such an evaluation in writing-across-the-curriculum programs, see Fulwiler. See Guba for a discussion of methodology.
6. See Skrtic 206-16 for a case study in such difficulties.
7. This fact was brought home to me recently when an administrator I know, a social scientist whose administrative duties have prevented him from keeping up with research in his field, recommended against internal funding for a study using qualitative methods, telling the review committee that the methodology was "suspect." The study was later funded at a much higher level by an outside agency.
8. For sensible advice on the fundamentals of evaluation, see the works by Patton. House provides a taxonomy of models. For advice on how to write up the results of an evaluation, see Morris and Fitz-Gibbon. A good example of a recent evaluation design for a writing-across-the-curriculum program based on multiple measures like these is in White, Developing 207-08. The Council of Writing Program Administrators' Consultant-Evaluator program is an excellent example of utilization-focused evaluation; for a description of this program, see Lindemann and the article by the WPA Board of Consultant Evaluators.

Works Cited

Bazerman, Charles. "Codifying the Social Scientific Style: The APA Publication Manual as a Behaviorist Rhetoric." Shaping Written Knowledge: The Genre and Activity of the Experimental Article in Science. Madison: U of Wisconsin P, 1988. 257-77.
Berkenkotter, Carol. "The Legacy of Positivism in Empirical Composition Research." Journal of Advanced Composition 9 (1989): 69-82.
Boyer, Carol M., et al. "Assessment and Outcomes Measurement: A View from the States." AAHE Bulletin (March 1987): 8-12.
Davis, Barbara Gross, Michael Scriven, and Susan Thomas. The Evaluation of Composition Instruction. 1981. New York: Teachers College P, 1987.
Erickson, Frederick. "Qualitative Methods in Research on Teaching." Handbook of Research on Teaching. Ed. Merlin C. Wittrock. 3rd ed. New York: Macmillan, 1986.
Flower, Linda. "Cognition, Context, and Theory Building." College Composition and Communication 40 (1989): 282-311.
Fulwiler, Toby. "Evaluating Writing Across the Curriculum Programs." Strengthening Programs for Writing Across the Curriculum. Ed. Susan H. McLeod. San Francisco: Jossey, 1988. 61-75.
Guba, Egon G. Toward a Methodology of Naturalistic Inquiry in Educational Evaluation. Los Angeles: UCLA Center for the Study of Evaluation, 1978.
Guba, Egon G., and Yvonna S. Lincoln. Fourth Generation Evaluation. Newbury Park, CA: Sage, 1989.
Hesse, Mary. Revolutions and Reconstructions in the Philosophy of Science. Bloomington: Indiana UP, 1980.
Hillocks, George. Research on Written Composition: New Directions in Teaching. Urbana: NCRE/ERIC, 1986.
House, Ernest R. "Assumptions Underlying Evaluation Models." Educational Researcher 7.3 (1978): 4-12.
Howe, Kenneth. "Against the Quantitative-Qualitative Incompatibility Thesis, or Dogmas Die Hard." Educational Researcher 17.8 (1988): 10-16.
Lincoln, Yvonna S., ed. Organizational Theory and Inquiry: The Paradigm Revolution. Beverly Hills, CA: Sage, 1985.
Lincoln, Yvonna S., and Egon G. Guba. Naturalistic Inquiry. Beverly Hills, CA: Sage, 1985.
Lindemann, Erika. "Evaluating Writing Programs: What an Outside Evaluator Looks For." WPA: Writing Program Administration 3 (1979): 17-24.
Morris, Lynn L., and Carol T. Fitz-Gibbon. How to Present an Evaluation Report. Beverly Hills, CA: Sage, 1978.
Patton, Michael Q. Practical Evaluation. Beverly Hills, CA: Sage, 1982.
—. Utilization-Focused Evaluation. Beverly Hills, CA: Sage, 1978.
Schwandt, Thomas A. "Recapturing Moral Discourse in Evaluation." Educational Researcher 18.8 (1989): 11-16.
Scriven, Michael. Evaluation Thesaurus. 4th ed. Newbury Park, CA: Sage, 1991.
Skrtic, Thomas M. "Doing Naturalistic Research into Educational Organizations." Organizational Theory and Inquiry. Ed. Yvonna S. Lincoln. Beverly Hills, CA: Sage, 1985. 185-220.
White, Edward M. Developing Successful College Writing Programs. San Francisco: Jossey, 1989.
—. "Language and Reality in Writing Assessment." College Composition and Communication 41 (1990): 187-200.
—. Teaching and Assessing Writing. San Francisco: Jossey, 1985.
Witte, Stephen P., and Lester Faigley. Evaluating College Writing Programs. Carbondale: Southern Illinois UP, 1983.
WPA Board of Consultant Evaluators. "Writing Program Evaluation: An Outline for Self-Study." WPA: Writing Program Administration 4 (1980): 23-28.
Young, Art, and Toby Fulwiler. Writing Across the Disciplines: Research Into Practice. Upper Montclair, NJ: Boynton, 1986.
 
   
Copyright 2006 by ATAC