JAC Volume 7

Editor:
Gary A. Olson

Readability: Reading/Writing Tools for Measurement

Alice S. Horning

In a recent and comprehensive review of readability research, George Klare composes a picture of the current state of our knowledge of readability. He makes an important distinction between research on predictions of readability and research on production factors. This distinction is useful because it leads to new ways of analyzing the problem. Klare says that there are certain measurable factors known to be useful for predicting readability (683, 703-4). Among these are counts of content words, syllables, and the like, all things countable by various readability formulas. There are also certain features of text production that figure prominently in readability (cohesion and propositional density, for example), which are either already measurable or soon to be measurable by computer. Thus, while neither prediction variables alone nor production variables alone provide a complete picture, taken together the two provide a much improved analysis of readability. Such combining has proven helpful in analyzing data that resist other analysis, such as the material in a study to be discussed below. Also, such combining yields an analysis that taps both text factors and reader factors as a way to get at “real” readability. Since the resulting insights are pertinent to the problem of measuring and teaching readable writing, they can be of use to writing teachers as well.
Formula Readability

Readability begins with text features known to be helpful for predicting the difficulty of material for a certain group of readers. Readability formulas have been analyzed, criticized, reformulated and reviewed entirely too often (Klare; Davison and Kantor). However, the formulas do work, in a limited and specific way. For instance, some formulas count numbers of concrete words and numbers of syllables and use them in an equation that produces a readability score. Melissa Holland notes that “more concrete words are easier to understand and remember independently of length and frequency” (6), and the ratio of content words to function words also has an impact on the comprehensibility of a text (9). Various formulas tap these factors to make useful predictions based on specific counts of content words and syllables in relation to overall words, and counts of sentence length.

These counts are now available by computer, as demonstrated by Michael Schuyler’s summary of nine of the most popular formulas. Several derive a count of content words, using differing strategies. In the Dale-Chall formula, a list of 3000 words is provided, words found to be familiar to fourth graders in a test done by Dale and Chall (Schuyler 564). The Fog formula counts three-syllable words, and the Automated Readability Index counts word length in characters. All of these are attempts to get a numerical measure of the content words in a passage. The reason for this approach is that research shows that content word counts have something to do with the difficulty of a passage. However, content level is one, but not the best or only, measure of readability.
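As a rough illustration of how such counts become scores, the following sketch computes the Fog index and the Automated Readability Index from raw counts. The constants are those of the commonly published versions of the two formulas and may differ from those in Schuyler’s program; the function names are hypothetical.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Fog index: 0.4 * (average sentence length + percent of words with 3+ syllables)."""
    average_sentence_length = words / sentences
    percent_complex = 100.0 * complex_words / words
    return 0.4 * (average_sentence_length + percent_complex)


def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI: based on characters per word and words per sentence."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


if __name__ == "__main__":
    # Illustrative counts only, not taken from the passages in this study.
    print(gunning_fog(words=100, sentences=5, complex_words=10))
    print(automated_readability_index(characters=500, words=100, sentences=5))
```

Both formulas take nothing but surface counts as input, which is why they are easy to automate and also why they capture only part of what makes a passage difficult.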

Other formulas make use of syllable counts. The Flesch formula uses a count of syllables per hundred words (Schuyler 567), and Kincaid’s revision of the Flesch formula uses a ratio of syllables to total number of words. It is important to note that one of the flaws in the formulas, as is often pointed out, is that the classic formula counts numbers of words and numbers of sentences, and the number of syllables may not correlate with either of these two factors. That is, some formulas differ from one another in terms of what they count and in terms of the readability levels they assign to a given passage.
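The two syllable-based formulas mentioned here can be sketched as follows, using the commonly published constants for Flesch Reading Ease and the Flesch-Kincaid grade level. This is a minimal sketch with hypothetical function names, not a reproduction of Schuyler’s program.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate U.S. school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Illustrative counts only: 100 words, 5 sentences, 150 syllables.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
```

Because the two formulas weight sentence length and syllables per word differently, they can assign different levels to the same passage, which is the kind of disagreement among formulas noted above.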

There are two important points to be made about the readability factors discussed so far. First, they tell part but not all of the readability story, and second, all the factors listed here are measurable by computer. Readability is more complex than counts of sentences or syllables can suggest. But we can make use of these computer-countable elements by combining them with other factors known to make a difference in readability.


Propositional Analysis

Discourse analysis, an area known to be pertinent to readability, has made great progress in the last ten years or so in the development of text analysis systems that examine factors in the text and certain aspects of reader activity during the process of deriving meaning from print. Of these systems, propositional analysis and cohesion analysis currently provide the most useful insights. Tierney and Mosenthal’s meticulous review of these two systems of text analysis explains how each works and what its weaknesses are.

The first type of text analysis reviewed by Tierney and Mosenthal is propositional analysis, chiefly found in the work of Walter Kintsch. Kintsch’s system involves breaking down each sentence of a passage into its idea units, called propositions. These propositions reflect the knowledge of the writer and are built around verbs, a feature supported by the newly developed theory of Residual Grammar (Binkert). Comprehension occurs as a reader processes the content of a passage in the form of propositions and fits them into a general sense of the meaning of the text, or its macrostructure. The system works well on all types of texts, and can also work on passages longer than one sentence. However, propositional analysis is also extremely difficult to use, despite Turner and Greene’s procedures manual and an easier, simplified version worked out by Susan Bovair and David Kieras. Moreover, Tierney and Mosenthal point out that the rules for creating a macrostructure are not strongly supported empirically and require much further study.

Still, Kintsch has turned up some interesting findings in his studies. Kintsch takes reading time and recall as the two salient measures of readability. Even his earliest studies show that more propositions require more reading time (with length held constant), and that certain types of propositions are easier to recall than others. Also, repetition of arguments (elements within propositions) has an impact on readability, as does the number of different arguments found in the propositional analysis of a passage. Although arguments in propositional analysis and content words counted by formulas are not necessarily the same thing, both examine the substance of the passage.

More interesting for those concerned with the measure of readability is the development of a computer program which counts propositions and compares lists of them. This program, written by David Kieras at the University of Michigan, is described in Britton and Black’s Understanding Expository Text. The program does not perform basic propositional analysis, but can take a list of propositions prepared by a researcher and compare it to another list. In Kieras’ studies, he has compiled a list of propositions by asking subjects to write down a brief summary of what they recall from a passage and dividing this summary into its propositional components. The program compares the two proposition lists and provides information about them. Since empirical data show that propositions have a clear relationship to readability as measured by reading time and recall, the work of Kintsch, supplemented by Kieras and Bovair, provides an important analysis of this propositional component of readability.


Cohesion Analysis

Cohesion analysis provides another angle on readability. Cohesion, as Tierney and Mosenthal point out, is not a system of content analysis. Instead, cohesion is the feature of text that relates sentences and allows us to judge that a string of sentences (paragraph, chapter, book) forms a unified text. Cohesion analysis, originally proposed in detail by M.A.K. Halliday and R. Hasan, examines a passage for five types of cohesive ties: reference, substitution, ellipsis, conjunction, and lexical cohesion. As with propositional analysis, cohesive ties have been found empirically to be related to readability. More cohesive ties increase reading speed and improve recall in at least one study (Irwin). For writers, cohesive ties are relatively easy to add and can make a real difference to a text. The data on the passages in the study discussed below reveal the important difference cohesion can make.

While all this research on discourse analysis seems promising and yields extremely interesting data on readability, the systems are relatively new, only partly tested, and highly subjective in use. Thus, the data discussed below are necessarily preliminary because it is very hard to be sure that one’s own analysis is a correct one. Even experienced researchers like Kieras agree that it is hard to know if a particular analysis is correct. Cohesion analysis is a little better in this context than propositional analysis because it has been fully explicated by Halliday and Hasan in Cohesion in English. But even so, it is difficult to feel confident about data generated in such an analysis, and even more difficult to verify an analysis by using a second scorer, because the systems of analysis are complex and not easily learned. With propositional analysis, even using the Bovair and Kieras simplified system, judgments are highly subjective and, again, difficult to verify.

Still, Tierney and Mosenthal rightly point out that cohesion studies provide a useful descriptive analysis of text and supplement content studies. Moreover, both propositional analysis and cohesion analysis provide numerical data, albeit subjective and preliminary, that can be fitted together with the more objective, if superficial, counts of syllables, content words and so on, producing a more complete analysis of a text.


Applications to Readability Study

I have been looking at these various approaches to text analysis in order to resolve particular problems with texts I have been examining for some time. In a study I completed in 1982, I found that I could significantly improve readability as measured by Cloze scores by adding redundancy to texts written by professional writers. At least, in two of the three passages I used, my additions improved the Cloze scores. Since all three passages were treated the same way, I could not initially account for why my changes failed to improve the Cloze scores in the third case.

In this earlier study, I prepared as Cloze tests three passages on three different topics: the origins of male dominance in society, the effect of television on sports, and evaluating the performance of governments. In Cloze testing, a passage is prepared with words systematically deleted (in my study, I chose a one-in-six deletion formula, following a pattern in a similar study), and readers are asked to fill in the blanks with the words they think have been omitted. I prepared four versions of each passage: the original, a version with only syntactic redundancy added, a version with only semantic redundancy added, and a version with both types of redundancy added. Syntactic redundancy was operationally defined as rearranging sentences into more predictable subject-verb-object patterns, changing passives to active voice, and the like. Semantic redundancy was operationally defined as adding examples or defining phrases, and similar changes (Horning, “On Defining”). The passages were read by a total of 240 first-year college students, and scored using an approximate-synonyms-acceptable scoring system. The results showed that on the first and third passages (male dominance and government topics), the addition of both types of redundancy produced significantly higher Cloze scores. In the other passage (on the effects of television on sports), adding redundancy made no difference. I have been using the various approaches to readability discussed thus far to help account for my outcome.
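The mechanics of a Cloze test can be sketched in a few lines. The sketch below is a simplification under stated assumptions: it blanks every sixth word regardless of part of speech, treats punctuation as part of the word it is attached to, and takes acceptable synonyms from a hand-built dictionary. The function names and the synonym table are hypothetical and do not reproduce the materials used in the study.

```python
def make_cloze(text: str, n: int = 6, blank: str = "_____"):
    """Replace every n-th word with a blank; return the test passage and the answer key."""
    passage, answers = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            answers.append(word)
            passage.append(blank)
        else:
            passage.append(word)
    return " ".join(passage), answers


def score_cloze(responses, answers, synonyms=None) -> float:
    """Percentage of blanks filled with the deleted word or a listed synonym."""
    if not answers:
        raise ValueError("the answer key is empty")
    synonyms = synonyms or {}
    correct = 0
    for response, answer in zip(responses, answers):
        accepted = {answer.lower()}
        accepted.update(s.lower() for s in synonyms.get(answer.lower(), ()))
        if response.lower() in accepted:
            correct += 1
    return 100.0 * correct / len(answers)


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten eleven twelve"
    passage, key = make_cloze(sample)
    print(passage)
    print(score_cloze(["six", "dozen"], key, {"twelve": ["dozen"]}))
```

Scoring with a synonym table corresponds to the approximate-synonyms-acceptable system; passing no table gives exact-word scoring.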


New Factors

I have re-examined my data, using both propositional analysis and cohesion analysis, and in addition, have carried out other detailed studies using computerized readability measures and recently developed writing aids. Taken all together, these factors reveal that there are real differences among the passages, differences not captured by holding length, Dale-Chall level, and other superficial features constant. This finding supports the point that readability is much more complex than formulas alone suggest. My findings from both computer-based analysis and from propositional and cohesion analysis account for why adding redundancy did not improve the Cloze scores in the third case. Furthermore, the results show that it is complex, but possible, to take reader-text interaction into account when analyzing the readability level of a passage. In what follows, I hope to show why I obtained the results that I did, and to suggest that readability can be defined and measured in terms of redundancy and other factors now countable by computer. These findings may have important relevance to the teaching of writing.

For the propositional analysis of the original version of each passage, I have made use of the Bovair and Kieras system discussed above. This propositional analysis isolates and lists the main idea units in a passage according to a set of rules. The analysis operates chiefly at the surface level of text; by staying close to the surface, one has a greater chance of producing an accurate propositional analysis of the passage. In my analysis, I have stayed at this superficial level and have produced an interesting outcome, given the results of my study. What I have found is that the number of propositions is somewhat higher in the passage in which adding redundancy did not help (the sports passage, referred to below as the 200-series) than it was in the other two passages. And, as noted above, Kintsch’s studies show clearly that more propositions increase reading time and lower recall on a passage. (See Table 1.)

Table 1

Proposition Levels in the Passages
Form  Proposition Count
101   88
201   120
301   111



While I cannot say that these differences are significant, and while I would insist that these findings are merely preliminary because they have not been confirmed by a second scorer, these differences do suggest that, in terms of proposition count, the 200-series passage in its original form was more difficult than either of the other two.

There are other measures of difficulty, notably the cohesion analysis discussed above. My cohesion analysis follows the system developed by Halliday and Hasan. I have not made use of their detailed coding scheme, but I have conducted a more generalized analysis of the number of cohesive ties in each of the passages. In this case, because of the similarities in the passages due to the nature of the changes I made in adding redundancy, I was able to carry out a cohesion analysis on all versions of all three passages. Several interesting, if again preliminary, findings turn up here, as presented in Table 2:

Table 2

Cohesive Ties in Passages
Form  Cloze Percentage  Cohesive Ties  In Sentence  Across Sentence
101   72                40             14           26
102   57                49             22           27
103   70                45             19           26
104   81                57             31           26
Mean                    48
201   50                40             4            36
202   44                37             7            30
203   41                43             7            36
204   40                40             10           30
Mean                    40
301   32                54             14           40
302   34                49             9            40
303   27                55             17           38
304   49                50             12           38
Mean                    52

There are several points to be made about these figures. Overall, the mean numbers of cohesive ties suggest that the aberrant 200-series passages are less cohesive than the other two passages. And, as noted above, the Irwin study shows that fewer cohesive ties yield longer reading times and lower recall scores, again suggesting that the 200-series is more difficult in a general way than either of the other two. The other salient finding from the cohesion analysis is that the number of cohesive ties within sentences generally increases with the addition of redundancy, but the number of ties across sentences, described by Halliday and Hasan as the major type of cohesion, does not change dramatically with the addition of redundancy.
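The series means can be recomputed directly from the figures in Table 2. The sketch below copies the tie counts from the table and averages them by series; the names TABLE_2 and series_means are hypothetical, and the breakdown of the means into in-sentence and across-sentence ties is derived here from the table rather than reported separately.

```python
from statistics import mean

# (form, Cloze percentage, cohesive ties, in-sentence ties, across-sentence ties),
# copied from Table 2.
TABLE_2 = [
    (101, 72, 40, 14, 26), (102, 57, 49, 22, 27), (103, 70, 45, 19, 26), (104, 81, 57, 31, 26),
    (201, 50, 40, 4, 36), (202, 44, 37, 7, 30), (203, 41, 43, 7, 36), (204, 40, 40, 10, 30),
    (301, 32, 54, 14, 40), (302, 34, 49, 9, 40), (303, 27, 55, 17, 38), (304, 49, 50, 12, 38),
]


def series_means(rows):
    """Mean (total, in-sentence, across-sentence) ties for each series of four forms."""
    by_series = {}
    for form, _cloze, total, within, across in rows:
        series = form // 100 * 100  # 101-104 -> 100, 201-204 -> 200, and so on
        by_series.setdefault(series, []).append((total, within, across))
    return {
        series: tuple(mean(column) for column in zip(*values))
        for series, values in by_series.items()
    }


if __name__ == "__main__":
    for series, (total, within, across) in series_means(TABLE_2).items():
        print(series, total, within, across)
```

The totals come out at 47.75, 40, and 52 for the three series, matching the rounded means in the table.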

A further point has come to light in my studies. A report by Shanahan et al. shows that Cloze tests may not be sensitive to cross-sentence constraints in text. In their study, Shanahan and his colleagues describe three separate attempts to find Cloze tests sensitive to comprehension across sentences. Different readers were given (1) standard Cloze passages, (2) passages with sentences in scrambled order, and (3) passages in which a single sentence from the standard passage was put into a wholly different context. Prior knowledge, time on task, and formula readability level were all controlled. Shanahan’s results suggest that Cloze tests are not sensitive to cohesive ties. However, my data may reflect a greater sensitivity because of my Cloze methodology. In my earlier study, I used a one-word-in-six deletion formula, deleting nouns and main verbs exclusively. This procedure allowed me to delete the same words in all versions of each passage. Because I was using Cloze in this way, my task may have been more sensitive to cohesion levels. In addition, other factors to which Cloze is sensitive also turn up differences in the 200-series.


Computer-Based Findings

In terms of features known to have an impact on readability as measured by reading time and recall (i.e., not Cloze), the 200-series passage is different from and more difficult than either of the other two. Several other measures support this finding in a more objective fashion. I have made use of the computerized readability program written by Michael Schuyler and described above. In addition, I have had my passages analyzed by the WRITER’S WORKBENCH program developed by Bell Labs and in use at Colorado State University. I have analyzed the passages using the SENSIBLE SPELLER program for Apple computers, the HOMER program for writing assistance, and, finally, I have done statistical analyses of the results of these various computer-generated counts of a variety of features in the texts. The findings presented below are more objective than those discussed already, and again bear upon the outcome of the original study and help to account for what happened.

Michael Schuyler’s computerized readability program makes counts of syllables, words, sentences and three-syllable words, and uses these figures to calculate nine popular readability formulas. The raw numbers generated by the program are of considerable interest, and the syllable count is particularly important in this data. To count syllables, the program counts the number of characters in the passage and divides by a constant of 3.1127, which provides a highly accurate count of the number of syllables. The results of this count are presented in Table 3. Klare’s discussion of prediction data suggests clearly that the number of syllables is related to the difficulty or complexity of the passage (710). Here, the mean numbers of syllables in the passages show considerable variation. And, once again, the 200-series, where I was unable to improve readability, turns out to have the largest number of syllables, suggesting that it is simply a much more complex passage.

Table 3

Mean Syllable Counts Derived from Readability Program
Series  Syllables (Mean)
100     455.63
200     523.10
300     503.66
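The character-based syllable estimate described above can be sketched as follows. The divisor 3.1127 is the constant reported for Schuyler’s program; the decision to count only alphabetic characters is an assumption made for this sketch, since the exact definition of a character in that program is not given here, and the function name is hypothetical.

```python
def estimate_syllables(text: str, chars_per_syllable: float = 3.1127) -> float:
    """Estimate the syllable count as (number of letters) / 3.1127."""
    # Assumption: spaces, digits, and punctuation are not counted as characters.
    letters = sum(1 for ch in text if ch.isalpha())
    return letters / chars_per_syllable


if __name__ == "__main__":
    print(estimate_syllables("Readability is more complex than counts can suggest."))
```

The estimate is a fractional number rather than a whole count, which is consistent with the decimal means reported in Table 3.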

Yet another measure of text difficulty comes from counts of content words, a task easily done by a computer. This task can be accomplished in several different ways, and in my case, the two approaches that I applied turned up results consistent with the findings I have presented so far. In the WRITER’S WORKBENCH program, the computer generates a count of content words and a percentage figure representing the percentage of words in the passage that are content as opposed to function words. The count of content words represents all nouns, verbs, adjectives, and non -ly adverbs in the passage. The results of the content word count are presented in Table 4:

Table 4

Mean Content Words and Percentages by WRITER’S WORKBENCH
Series  Mean Number of Content Words  Mean Percent of Content
100     156                           52.2
200     185.8                         62.07
300     161.5                         53.77

Here again, the 200-series turns out to have a higher count of content words and a higher percentage of content words than either of the other two, both of which are at about the same level.

Yet another approach to content can be taken by using SENSIBLE SPELLER, a program designed to correct spelling errors in texts. SENSIBLE SPELLER counts the number of unique words in a passage. These counts, while not counts of content words per se, are counts of the number of different words in the passage, and would presumably omit articles, many function words, and others that recur normally in written text.

Table 5

Mean Counts of Unique Words Derived from SENSIBLE SPELLER Program
Series  Number of Words  Number of Unique Words
100     293              154
200     301.7            173.25
300     298              148.5
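A unique-word count of this kind is easy to approximate. The sketch below counts running words and distinct words after lowercasing; the tokenizing rule (runs of letters and apostrophes) is an assumption of this sketch and is not taken from SENSIBLE SPELLER, and the function name is hypothetical.

```python
import re


def word_and_unique_counts(text: str):
    """Return (number of running words, number of distinct words), ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(words), len(set(words))


if __name__ == "__main__":
    print(word_and_unique_counts("The cat saw the other cat."))
```

Each recurring word is counted once, so frequently repeated function words contribute little to the unique-word total.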

Here again, the 200-series turns up a larger number of unique words, and like the content words, this finding suggests that the 200-series passage was generally more complex for readers than either of the other two.

These data have been examined statistically to determine whether or not it is reasonable to make predictions based on the results from these small samples. Here, perhaps, is where the most interesting outcome appears. I tried correlating these various counts with one another and with my Cloze score results. I turned up very strong correlations of content words to syllable counts in the 100- and 300-series, both strong and significant in the statistical sense. In the 200-series, a weak, negative, and nonsignificant correlation appears.

Table 6

Spearman Correlation Coefficient: Content Words to Syllable Count
Series  Spearman Correlation  Significance
100     1.0                   p < .001
200     -.26                  p < .37
300     .89                   p < .05

Logically, it seems that the numbers of syllables and the numbers of content words should correlate, particularly since research shows that both are factors in text difficulty (Samuels and Eisenberg). Here, the figures suggest that the 200-series is a highly unusual passage in which the expected correlation fails to appear.
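For readers unfamiliar with the statistic, a Spearman coefficient is a Pearson correlation computed on ranks rather than raw values. The sketch below shows the calculation in plain Python. The per-version counts in the example are invented for illustration, since the individual figures behind Table 6 are not listed here, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical content-word and syllable counts for four versions of one passage.
    content_words = [150, 160, 155, 170]
    syllables = [440, 470, 450, 480]
    print(spearman(content_words, syllables))
```

A coefficient near 1.0 means that versions with more content words also have more syllables, in the same order; a coefficient near zero or below, as in the 200-series, means that the two counts do not rise together.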

Finally, the HOMER program from California counts prepositions, “to be” verbs, nominalizations, and “woolly words.” None of these counts is particularly informative with the exception of the nominalization figures. This program counts nominalizations, as these forms have been found by researchers to require more processing in the brain, and thus to be more difficult, where difficulty is measured by reading time (Klare 716). Like the other results, these numbers again suggest a difference in the 200-series:

Table 7

Mean Nominalization Counts Derived from HOMER Program
Series  Number of “Shun” Words
100     11
200     14
300     8.5

As before, the 200-series turns up higher in nominalizations, suggesting again that the 200-series was inherently more difficult than either of the other two passages.
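A count of “shun” words can be approximated by matching word endings. The sketch below is an assumption about how such a count might work, not a description of HOMER’s actual rule: it counts words ending in -tion or -sion (with an optional plural -s), so it will also count words such as “nation” that are not true nominalizations. The names are hypothetical.

```python
import re

# Words ending in -tion or -sion, optionally pluralized.
SHUN_WORD = re.compile(r"\b[a-z]+(?:tion|sion)s?\b", re.IGNORECASE)


def count_shun_words(text: str) -> int:
    """Count words that end in a 'shun' suffix."""
    return len(SHUN_WORD.findall(text))


if __name__ == "__main__":
    print(count_shun_words("The evaluation of competition needs a decision."))
```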
Directions for Further Research

On a number of measures, both subjective and objective, the passage that did not respond to added redundancy is different from and more difficult than the other two passages studied. As these results have come in, I have become more and more convinced that the 200-series passage was so much different and so much more difficult that my changes made no difference. It seems that the changes I made, based on my hypothesis that adding redundancy would improve readability, should have worked no matter what the nature of the passage was. Since all the passages were treated the same way, it seems as though my changes still should have worked.

However, rethinking suggests why this reasoning is incorrect. Redundancy, as I defined it (“On Defining”), affects the text syntactically and semantically, tapping factors not clearly measured by any of the features reviewed thus far. That is, redundancy does not alter nominalizations, content words, syllable counts, proposition counts, or cohesive ties across sentences to any great degree. Thus, when I add redundancy, I do not reduce any of these factors known to contribute to text difficulty. Redundancy must, based on my results, increase reading ease, but its effects are mitigated to varying degrees by difficulty factors. All these factors, both subjective and objective, which contribute to difficulty in some way, have turned out to be at much higher levels in the passage where adding redundancy did not improve Cloze scores.

Eventually, we may reach a complete understanding of the nature of text difficulty, from both the predictive factors summarized here, and the productive factors like cohesion and proposition density that are in the control of writers. Basic research of this kind will ultimately lead to more specific kinds of pedagogical strategies for teaching writing, although such practical application seems far off at this juncture.

The research findings reported here point research in several specific directions. First, knowing that the factors measured here (cohesion, propositions, syllable and content-word counts) play a role in the inherent difficulty of text demonstrates that additional text features can and should be analyzed to make a predictive measure of readability. Second, these factors are now known to be significant to readability because they tap into the important interaction between text and reader. Third, production factors such as redundancy, which can be altered in a text, must also be measured and, when other difficulty factors do not interfere, can significantly enhance the readability of text. Thus, while basic research continues, teachers of writing can keep their hopes high that new theoretical findings will soon provide strong evidence for classroom methodology that will teach students how best to produce readable writing.

Oakland University
Rochester, Michigan

Works Cited

Binkert, Peter J. Generative Grammar without Transformations. Berlin: Mouton, 1984.
Bovair, Susan, and David E. Kieras. “A Guide to Propositional Analysis for Research on Technical Prose.” Understanding Expository Text: A Theoretical and Practical Handbook for Analyzing Expository Text. Ed. Bruce K. Britton and John B. Black. Hillsdale, NJ: Erlbaum, 1985. 315-62.
Davison, Alice, and Robert Kantor. “On the Failure of Readability Formulas to Define Readable Texts: A Case Study from Adaptations.” Reading Research Quarterly 17 (1982): 187-206.
Halliday, Michael, and Ruqaiya Hasan. Cohesion in English. London: Longman, 1976.
Holland, Melissa. Psycholinguistic Alternatives to Readability Formulas. Document Design Project Technical Report #12, 1981. ERIC ED 214 370.
Horning, Alice. “On Defining Redundancy in Language: Case Notes.” Journal of Reading 22 (1979): 312-20.
---. “Redundancy and Text Difficulty: A Study.” New England Reading Association Journal 17 (1982): 18-24.
Irwin, Judith W. “The Effect of Linguistic Cohesion on Prose Comprehension.” Journal of Reading Behavior 12 (1980): 325-32.
Kintsch, Walter. The Representation of Meaning in Memory. Hillsdale, NJ: Erlbaum, 1974.
Klare, George R. “Readability.” Handbook of Reading Research. Ed. P. David Pearson, et al. New York: Longman, 1984. 681-744.
Samuels, S. Jay, and M. Eisenberg. “A Framework for Understanding the Reading Process.” Neuropsychological and Cognitive Processes in Reading. Ed. Francis Pirozzolo and Merlin Wittrock. New York: Academic Press, 1981. 31-67.
Schuyler, Michael. “A Readability Formula for Use on Microcomputers.” Journal of Reading 26 (1982): 560-91.
Shanahan, Timothy, Michael Kamil, and A. W. Tobin. “Cloze as a Measure of Intersentential Comprehension.” Reading Research Quarterly 17 (1982): 229-55.
Tierney, Robert, and J. Mosenthal. “Discourse Comprehension and Production: Analyzing Text Structure and Cohesion.” Reader Meets Author: Bridging the Gap. Ed. Judith A. Langer and M. Trika Smith-Burke. Newark, DE: International Reading Association, 1982. 55-104.
Turner, A., and E. Greene. The Construction and Use of a Propositional Text Base. Boulder, CO: University of Colorado Institute for the Study of Intellectual Behavior, 1977.
I wish to acknowledge the assistance of Professor Kate Kiefer of Colorado State University, in running the passages on WRITER’S WORKBENCH. In addition, the work reported here was completed under the auspices of an Oakland University Faculty Research Fellowship, Spring/Summer, 1985.

 
   
Copyright 2006 by ATAC