JAC Volume 12 Issue 2
Evaluating Writing Programs: Paradigms, Problems, Possibilities

Susan H. McLeod

Those of us involved in writing programs, both
departmental and campus-wide, need to know about program evaluation.
This need stems not only from the fact that we want information about
how to improve our programs, but also from the fact that others (administrators,
legislators, funding agencies) keep asking us rather insistently how
well these programs are working. In recent years the nation-wide movement
toward measuring educational outcomes in higher education (see Boyer
et al.) has increased the pressure to evaluate individual programs
in order to demonstrate their effectiveness. Because of their high
visibility (on some campuses, writing courses are the only courses
all students must take) and because of their connection to campus-wide
initiatives (general education reform, writing across the curriculum),
writing programs are among the first to feel that pressure.
But evaluating writing programs, like evaluating writing itself, is not an easy task. Those who have written about program evaluation show at some length just how difficult it can be.1 Many programs are multifaceted, involving staff and curriculum development, articulation of courses, and some administrative scaffolding; the sheer complexity of these programs is one reason that the art of evaluating them remains, as Faigley and Witte say, in its infancy (66). Further, what Fulwiler says about WAC programs is true of writing programs in general: they are institution-specific; an evaluation design that works well for one program may not fit one at another institution where the program configuration is quite different (62). But a major obstacle in the path of writing program evaluation has to do with evaluation methods themselves, as both the research community and those who want to examine the results of program evaluation understand those methods (since there is always a gap between the present state of research and actually putting that research into practice). What I should like to do here is describe two methodological approaches to evaluation (quantitative and qualitative) that grow out of two different research paradigms, suggest the problems these methods present, and discuss the ways in which an awareness of these problems will help us understand more clearly how to work with researchers in designing useful evaluations of our writing programs.

Assessing, Evaluating, and Testing

First, let me define terms. The words assessment,
evaluation, and testing are often used synonymously
by those who write about or insist upon program review. I should like
to make some distinctions, however, since I will be discussing program
evaluation in an institutional context. Assessment as it
is now generally understood in academic institutions involves data
gathering of all sorts, and it serves two purposes: improvement of
institutional functioning, and accountability to the institution's
various constituencies. Many universities now have offices of institutional
research, where busy employees create data bases to track enrollment
trends and retention rates (up or down this year?), calculate student
and faculty FTE's and student/faculty ratios, and provide answers
to arcane but important questions (of all first-year students who
identified themselves as ESL students, how many passed freshman composition
during the Fall semester?). Other parts of the institution are continually
assessing issues related to planning and management, budget, personnel,
physical facilities, and academic units. Assessment, in other words,
is and should be an ongoing part of any university. Program assessment
activities ask such questions as, "Where are we now (as opposed to
where we were)? Where do we seem to be headed?" In institutional terms,
evaluation is part (but not all) of assessment; it involves
judging worth and making decisions based on the data generated by
assessment activities. Personnel are evaluated regularly (often for
merit raises, promotion, or tenure), as are programs and academic
units (often for accreditation or reallocation of resources). Program
evaluation activities ask such questions as, "How good is this? How
could it be improved? Should it be continued/modified/eliminated?"2
Testing is only one form of evaluation. It asks such questions
as, "How much does this person (or population) know of this material?
How well can he or she apply certain concepts or demonstrate particular
skills?" It is almost always part of our evaluation of student learning
in specific classes, but it is not always a part of program evaluation.
As those who write about program evaluation warn, testing should never
be the only instrument used to evaluate programs that have multiple
goals and objectives (as most programs do). These three terms may
be thought of as nested boxes, assessment being the largest and most
inclusive, and testing being the smallest and most specific.
Quantitative and Qualitative Methods

Evaluation methodology was born within the research
tradition of the social sciences, which was modeled on the positivist
tradition of the natural sciences. This tradition—or paradigm, in
the Kuhnian sense of a system of shared beliefs or a world view—rested
on certain assumptions. Carol Berkenkotter sums up these assumptions
as follows:
Social scientists, like physical scientists, are
detached from their objects of study.
Investigations of social phenomena can therefore
be conducted in a value-neutral fashion, the researcher eliminating
all personal bias and preconceptions and employing language that expresses
objectivity.
Social science, like the physical sciences, is nomothetic—that
is, it is possible to extrapolate from social scientists' data social
laws that apply across numerous contexts. (71)
The methodology that has come to be identified
with these positivist assumptions is experimental and quantitative,
involving careful research designs with controls for variables and
statistical formulae for making sense of the data generated; the now-familiar
experimental article, the rhetoric of which Charles Bazerman analyzed,
not only discusses the outcomes of the experiment but also displays
significant data in tables and charts.
While not everyone who uses these methods would subscribe to the positivist epistemology that gave birth to them (for example, see Linda Flower's eloquent discussion of this issue), the methods themselves are still very much alive and are also very much a part of evaluation studies; writing program evaluations are no exception. Witte and Faigley describe several such studies: one at Northern Iowa University, which compared the performance of students who had taken composition with that of students who had not; one at the University of California at San Diego, which compared the results of different instructional methods in four writing programs within the university; one at Miami University, which compared the performance of students in a sentence-combining curriculum with that of students enrolled in a "traditional" composition course; and one at the University of Texas (the authors' own study), which compared the performance of students in a "sentence expansion" curriculum with that of students in a "traditional" course. Unfortunately, these evaluations yielded little in the way of useful information.3 As Witte and Faigley say, "Evaluation studies, including our own, which were based on the quantitative model have yielded few major insights concerning the teaching of writing or the operation of writing programs" (38). Quantitative methods by themselves cannot do justice to programs that involve not only students but also faculty, curriculum, and administrative structures.

There has been, however, a paradigm shift in evaluation studies, as described by Lincoln and Guba in two works, Naturalistic Inquiry and the more recent Fourth Generation Evaluation. The new paradigm, which has been described not only as post-positivist but also "hermeneutical" (Hesse), "constructivist" (Guba and Lincoln), and "naturalist" (Lincoln and Guba), calls into question all the basic assumptions of quantitative methodology, proposing instead qualitative (that is, interpretative or ethnographic) methods.
Lincoln and Guba contrast the two paradigms4 as follows:

Axiom 1: The nature of reality (ontology)
Positivist version: There is a single tangible reality "out there" fragmentable into independent variables and processes, any of which can be studied independently of the others; inquiry can converge onto that reality until, finally, it can be predicted and controlled.
Naturalist version: There are multiple constructed realities that can be studied holistically; inquiry into these multiple realities will inevitably diverge (each inquiry raises more questions than it answers) so that prediction and control are unlikely outcomes although some level of understanding . . . can be achieved.

Axiom 2: The relationship of knower to known (epistemology)
Positivist version: The inquirer and the object of inquiry are independent; the knower and the known constitute a discrete dualism.
Naturalist version: The inquirer and the "object" of inquiry interact to influence one another; knower and known are inseparable.

Axiom 3: The possibility of generalization
Positivist version: The aim of inquiry is to develop a nomothetic [law-based] body of knowledge in the form of generalizations that are truth statements free from both time and context (they will hold anywhere and at any time).
Naturalist version: The aim of inquiry is to develop an ideographic [based on symbol] body of knowledge in the form of "working hypotheses" that describe the individual case.

Axiom 4: The possibility of causal linkages
Positivist version: Every action can be explained as the result (effect) of a real cause that precedes the effect temporally (or is at least simultaneous with it).
Naturalist version: All entities are in a state of mutual simultaneous shaping so that it is impossible to distinguish causes from effects.

Axiom 5: The role of values in inquiry (axiology)
Positivist version: Inquiry is value-free and can be guaranteed to be so by virtue of the objective methodology employed.
Naturalist version: Inquiry is value-bound. (37-38)

This paradigm shift in the evaluation community is good news for writing program evaluation. The assertions that reality is complex and diverse, that causality is mutual rather than direct, that evaluation, like any inquiry, cannot be value-free (indeed, that we should make value-oriented discourse the center of our concerns in evaluation; see Schwandt)—all these are assertions with which many of us in the composition community would agree. The qualitative methods associated with this paradigm are ones with which we are more comfortable, ones which promise to provide a more workable evaluation design for our multifaceted programs.

Qualitative Evaluation

What would constitute such an evaluation is suggested
by several researchers. Witte and Faigley tell us that a program evaluation
needs to do at least two things: specify the components of writing
program evaluation (such as the cultural and social context in which
the program exists, the institutional context, the program structure
and administration, the content or curriculum of the program, and
the instruction involved), and the interactions among these components,
which will allow us to examine the effects of the program (intended
or unintended). Effects may be observable during the program, "outcomes"
evident at the end of a program, or long-range effects evident only
after a longitudinal study. They can be seen through many different
lenses, including various kinds of data (written products, attitudes,
teaching methods) (39-41). The evaluator looks at the components of
the program and its effects, and the questions generated by both.
(For example, is the program meeting the needs of the institution?
In what way?) The evaluator also looks at the goals and objectives
of the program and asks how well they have been achieved, never forgetting
that unintended effects may be just as important as goals achieved
(57-63).
Lincoln and Guba are more specific about how to proceed. They list various elements of a naturalistic inquiry, such as creating the focus of the evaluation, deciding where and from whom data will be collected, determining successive phases of the inquiry, determining instrumentation, planning data collection and recording modes, devising data analysis procedures, planning logistics, and preparing for "trustworthiness" of the study—the latter being the naturalist's answer to validity, reliability, and objectivity. Such a study would rely heavily on human instrumentation (such as interviews and observation) rather than pencil and paper instruments; its focus would not be laid out ahead of time, but would be emergent; data gathered would not be just written products, but also results of interviews and field notes. Data analysis would not be strictly statistical but interpretive, in the tradition of the anthropological field study (226-49; 332-33).5

Yet, as Lincoln and Guba point out, it is paradoxical to discuss evaluation design in qualitative inquiry, since the design is emergent. The focus of the study may change, the instrumentation is refined as one goes along, the data analysis is open-ended and inductive, the timing cannot be predicted, and the expected end products cannot be specified (224-25). The methodology is messy and labor-intensive and suffers from a perceived lack of rigor among those used to quantitative evaluations.6 And given the methodology, the budget for a naturalistic evaluation is virtually unspecifiable. Lincoln and Guba comment that rather than deal with the question "What will it cost to carry out these tasks?" the funder must ask "What am I willing to spend?" (225). Imagine saying to one of your own administrators, "I'm not really sure how much this will cost, Dean Tightfist. How much are you willing to spend?"
You would, at best, be laughed out of the office; responsible administrators want to know ahead of time precisely how much an evaluation will cost them, what activities they are paying for, and what kind of bang to expect for their buck. Moreover, many university administrators above the level of dean, especially at research institutions, are scientists by training and are used to quantitative methods. They deal with numbers and statistics as part of their jobs as administrators, they know how to interpret such data, and they expect that sort of information from the rest of us to support our requests and justify the existence of our programs. They would find qualitative data exotic at best.7

This latter problem is perhaps the most serious difficulty of all for those of us who must deal not only with administrators but also with legislators and boards of regents who control the purse strings and who do not have the time or inclination to wade through the complexities of qualitative data—indeed, who sometimes want statistics and hard data but do not even understand this information when we give it to them. Ed White gives us a glimpse of this hard reality in a wonderful anecdote in Developing Successful College Writing Programs. White was testifying before the California Senate Committee on Finance as the expert witness, trying to persuade the legislature to fund a program for students who scored in the low range of the state-wide English Placement Test. The chair of the committee, whom White describes as a "venerable but notoriously shrewd sheep rancher from the central valley," asked White a question about the placement test: how many students flunked it? White continues,

I was ready for the question and had prepared a careful
response. "I can't give you a simple number, Senator," I said. "The
test is designed to give a profile of student skills, not just a pass
or fail. We report a set of six separate scores to each campus for
each student; the campus then analyzes the scores and places the student
appropriately in whatever curriculum the campus has in place for entering
students." I sat back, feeling that I had started out just fine.
But the senator was frowning,
and when he spoke he was no longer friendly. "Just like a professor!"
he barked at no one in particular. "You ask a simple question and
all you get back is a bunch of gobbledygook!" (193)
White was rescued by an administrator he describes
as the dean-of-getting-money-from-the-legislature, who sprang into
action, telling the committee that fully half of those who took the
test failed it. How did he know? Why, that was the number that scored
below the fiftieth percentile. In other words, White tells us, he
told the committee chair that the lower half of the scores was the
lower half of the scores. "'Thank you,' said the now genial sheep
rancher. 'It's a relief to have somebody at that table who can give
a straight answer when we need one'" (194). The writing program was
funded. White is not, of course, advocating that we lie to our legislators—only
that we need to know our audiences and how to present information
to them when we discuss evaluation.
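
The dean's arithmetic can be checked with a short sketch. The scores below are invented purely for illustration (they are not White's data, and the function name is hypothetical); the point is only that "half scored below the fiftieth percentile" holds for any cohort, strong or weak, and so says nothing about how well students write.

```python
from statistics import median


def share_below_median(scores):
    """Return the fraction of scores strictly below the median (50th percentile)."""
    cutoff = median(scores)
    below = sum(1 for s in scores if s < cutoff)
    return below / len(scores)


# Hypothetical placement-test scores for two very different cohorts.
weak_cohort = [31, 35, 38, 40, 42, 47, 49, 52, 55, 58]
strong_cohort = [81, 85, 88, 90, 92, 93, 94, 96, 97, 99]

print(share_below_median(weak_cohort))    # 0.5
print(share_below_median(strong_cohort))  # 0.5
```

Both cohorts "fail" at the same rate under this definition, which is why the figure satisfied the committee without conveying any information about the test.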
So there is a paradox here, a fundamental contradiction between the paradigm of evaluation which seems best suited to writing programs (post-positivist and qualitative) and the kind that will yield the information administrators, legislators, and funding agencies usually understand and want from us (positivist and quantitative). But this contradiction need not hinder our evaluation efforts if we understand that these two paradigms of evaluation are not ones we must subscribe to with theological fervor, clinging to one and rejecting the other. They are instead different ways of looking at the world, different stances, different lenses through which we may examine phenomena. They need not be incompatible, as many researchers have pointed out (see, for example, Howe). In evaluation, we need to ask not which paradigm is "correct," but which view of the data is appropriate for a particular purpose. Lincoln points to an analogy in mathematics:

The axioms that make up Euclidian geometry have served
us well for several millennia here on earth—where it is useful to
have triangles with interior angles of 180°, and where all lines are
straight (or at least where we might pretend they are straight). Turn
those axioms on their heads, however, and you have what appears to
be nonsense. Who could use triangles whose interior angles only approached
180° as the triangles got smaller? What if the shortest distance between
two points were not a straight line, but a curved one, or several
thousand of them? What could one do with a geometry with such axioms?
The quick answer is this: You can put people on the moon with such a geometry—which is called Lobatchevskian—and you can bring them back home again. The point is fit. Euclid's is the axiomatic set of choice in some instances and other geometries are the sets of choice when you have other kinds of problems to solve. (32)

As long as we are aware of the paradigms, we
can choose our methods carefully and wisely, according to how well
they fit what we and others need to know in order to make decisions
about our programs.
We can do this by designing what Michael Patton calls "utilization-focused evaluation." One designs and organizes such an evaluation by first identifying the relevant decision makers and then asking, "What information do these people need and what will they do with it?" The answers to these questions then shape the evaluation: what paradigm to use, what data to collect, what research methods to use, and how to present the results (Utilization passim). This sounds familiar to those of us who teach writing (know your audience and purpose before you begin), and in designing program evaluation the advice to start with such questions is just as easy and as difficult as it is in writing a paper. Furthermore, this advice about where to start assumes at least a passing knowledge of certain fundamentals: of such things as goals clarification, design alternatives (matching research design to program design), questionnaire construction, interview techniques, methods of data collection, analysis, and presentation. It also assumes, as in some writing situations, a certain familiarity with various models so that one can adapt models of evaluation to one's own use.8

Such a utilization-focused design for program evaluation is particularly appropriate for writing programs in that it is situational and contextual, applicable to the uniqueness of each program. According to what the decision-makers (including ourselves) need, we might or might not want to interview students and faculty, conduct case studies of selected students and faculty, examine expressive and Likert Scale evaluations of classes and of staff workshops or training sessions, conduct writing attitude surveys of both students and faculty, collect and track the use of assignments and syllabi, or set up experimental (or quasi-experimental) matched classes to study classroom practices.
An evaluation focusing on the user of and the purpose for the data recognizes that there is no one particular paradigm, sacred methodology, or magic design for evaluation, but a variety of creative possibilities. For that reason it is a challenging and to some a daunting task. But as those engaged in evaluation of their writing programs know, it is a task that can—and often must—be done.

Washington State University

Notes

1 For example, see Davis, Scriven, and Thomas;
Fulwiler; White, Developing and Teaching; Witte
and Faigley; Young and Fulwiler.
2 For a more extensive definition of evaluation,
see Patton, Practical 33-37. Also see the definitions of
these three terms in Scriven.
3 For extensive critiques of each of these
studies, see Witte and Faigley 8-38. Also see White's discussion of
this evaluation research tradition in "Language."
4 Few researchers in evaluation theory would
accept the extreme versions of the two positions described here; these
researchers, however, are not the ones who are asking us how our programs
work. I believe it is useful to examine the extremes in order to understand
why it is that non-researchers often react to qualitative research
methods as "non-scientific." I would argue that the popular understanding
of "scientific" methods is shaped by the positivist paradigm as described
here.
5 For a discussion of such an evaluation
in writing-across-the-curriculum programs, see Fulwiler. See Guba
for a discussion of methodology.
6 See Skrtic 206-16 for a case study in
such difficulties.
7 This fact was brought home to me recently
when an administrator I know, a social scientist whose administrative
duties have prevented him from keeping up with research in his field,
recommended against internal funding for a study using qualitative
methods, telling the review committee that the methodology was "suspect."
The study was later funded at a much higher level by an outside agency.
8 For sensible advice on the fundamentals
of evaluation, see the works by Patton. House provides a taxonomy
of models. For advice on how to write up the results of an evaluation,
see Morris and Fitz-Gibbon. A good example of a recent evaluation
design for a writing-across-the-curriculum program based on multiple
measures like these is in White, Developing 207-08. The
Council of Writing Program Administrators' Consultant-Evaluator program
is an excellent example of utilization-focused evaluation; for a description
of this program, see Lindemann and the article by the WPA Board of
Consultant-Evaluators.
Works Cited

Bazerman, Charles. "Codifying the Social Scientific
Style: The APA Publication Manual as a Behaviorist Rhetoric." Shaping
Written Knowledge: The Genre and Activity of the Experimental Article
in Science. Madison: U of Wisconsin P, 1988. 257-77.
Berkenkotter, Carol. "The Legacy of Positivism in Empirical
Composition Research." Journal of Advanced Composition 9
(1989): 69-82.
Boyer, Carol M., et al. "Assessment and Outcomes Measurement:
A View from the States." AAHE Bulletin (March 1987): 8-12.
Davis, Barbara Gross, Michael Scriven, and Susan Thomas.
The Evaluation of Composition Instruction. 1981. New York:
Teachers' College P, 1987.
Erickson, Frederick. "Qualitative Methods in Research
on Teaching." Handbook of Research on Teaching. Ed. Merlin
C. Wittrock. 3rd ed. New York: Macmillan, 1986.
Flower, Linda. "Cognition, Context, and Theory Building."
College Composition and Communication 40 (1989): 282-311.
Fulwiler, Toby. "Evaluating Writing Across the Curriculum
Programs." Strengthening Programs for Writing Across the Curriculum.
Ed. Susan H. McLeod. San Francisco: Jossey, 1988. 61-75.
Guba, Egon G. Toward a Methodology of Naturalistic
Inquiry in Educational Evaluation. Los Angeles: UCLA Center for
the Study of Evaluation, 1978.
Guba, Egon G., and Yvonna S. Lincoln. Fourth Generation
Evaluation. Newbury Park, CA: Sage, 1989.
Hesse, Mary. Revolutions and Reconstructions in
the Philosophy of Science. Bloomington: Indiana UP, 1980.
Hillocks, George. Research on Written Composition:
New Directions in Teaching. Urbana: NCRE/ERIC, 1986.
House, Ernest R. "Assumptions Underlying Evaluation
Models." Educational Researcher 7.3 (1978): 4-12.
Howe, Kenneth. "Against the Quantitative-Qualitative
Incompatibility Thesis, or Dogmas Die Hard." Educational Researcher
17.8 (1988): 10-16.
Lincoln, Yvonna S., ed. Organizational Theory and
Inquiry: The Paradigm Revolution. Beverly Hills, CA: Sage, 1985.
Lincoln, Yvonna S., and Egon G. Guba. Naturalistic
Inquiry. Beverly Hills, CA: Sage, 1985.
Lindemann, Erika. "Evaluating Writing Programs: What
An Outside Evaluator Looks For." WPA: Writing Program Administration
3 (1979): 17-24.
Morris, Lynn L., and Carol T. Fitz-Gibbon. How
to Present an Evaluation Report. Beverly Hills: Sage, 1978.
Patton, Michael Q. Practical Evaluation. Beverly
Hills: Sage, 1982.
—. Utilization-Focused Evaluation. Beverly
Hills: Sage, 1978.
Schwandt, Thomas A. "Recapturing Moral Discourse in
Evaluation." Educational Researcher 18.8 (1989): 11-16.
Scriven, Michael. Evaluation Thesaurus. 4th
ed. Newbury Park, CA: Sage, 1991.
Skrtic, Thomas M. "Doing Naturalistic Research into
Educational Organizations." Organizational Theory and Inquiry.
Ed. Yvonna S. Lincoln. Beverly Hills, CA: Sage, 1985. 185-220.
White, Edward M. Developing Successful College
Writing Programs. San Francisco: Jossey, 1989.
—. "Language and Reality in Writing Assessment." College
Composition and Communication 41 (1990): 187-200.
—. Teaching and Assessing Writing. San Francisco:
Jossey, 1985.
Witte, Stephen P., and Lester Faigley. Evaluating
College Writing Programs. Carbondale: Southern Illinois UP, 1983.
WPA Board of Consultant Evaluators. "Writing Program
Evaluation: An Outline for Self-Study." WPA: Writing Program Administration
4 (1980): 23-28.
Young, Art, and Toby Fulwiler. Writing Across the
Disciplines: Research Into Practice. Upper Montclair, NJ: Boynton,
1986.