
Improving Tabular Displays, with NAEP Tables as Examples and Inspirations
Author(s): Howard Wainer
Source: Journal of Educational and Behavioral Statistics, Vol. 22, No. 1 (Spring, 1997), pp. 1-30
Published by: American Educational Research Association and American Statistical Association
Stable URL: http://www.jstor.org/stable/1165236
Accessed: 25/10/2010 17:43

Journal of Educational and Behavioral Statistics Spring 1997, Vol. 22, No. 1, pp. 1-30

Improving Tabular Displays, With NAEP Tables as Examples and Inspirations


Howard Wainer
Educational Testing Service
Keywords: graphical comprehension, NAEP, tabular displays

The modern world is rich with data; an inability to effectively utilize these data is a real handicap. One common mode of data communication is the printed data table. In this article we provide four guidelines the use of which can make tables more effective and evocative data displays. We use the National Assessment of Educational Progress both to provide inspiration for the development of these guidelines and to illustrate their operation. We also discuss a theoretical structure to aid in the development of test items to tap students' proficiency in extracting information from tables.

In his 1786 atlas of England and Wales, William Playfair wrote of the increasing complexity of modern life. He pointed out that when life was simpler and data were less abundant, an understanding of economic structure was both more difficult to formulate and less important for success. But by the end of the 18th century, this was no longer true. Statistical offices had been established and had begun to collect data on which political and commercial leaders could base their decisions. Yet the complexity of these data precluded their easy access by any but the most diligent. Playfair's genius was in surmounting this difficulty through his marvelous invention of statistical graphs and charts.

The complexity of life within 18th-century Britain and the massiveness of its available data were but trifles in comparison to today's complex network of data sources and topics. These data are being transformed into graphic forms at a breathless pace. Today we need to display summaries of huge amounts of information clearly and accurately. Computing equipment, software, and electronic networks provide the means to summarize information and disseminate results. What we lack is a broad understanding of how best to do it. In this article we examine the data table as a communicative display and suggest four steps which, if followed, can allow tables to communicate better. We focus principally on the table because we heartily subscribe to the notion that although accurate data can help us to understand the world, they can help only if they are properly interpreted.

There can be no assurance of a proper interpretation of the data, however, unless the arrangement of the data on the printed page is clear, logical, complete, and properly focused. . . . Incidentally, it is our conviction, tested in experience, that language flows more easily and logically from the pen of him whose tabulated data reflect careful and precise thinking. (Walker & Durost, 1936, p. iii)

I have two goals in this article. The primary one is to provide concrete guidelines for constructing more communicative tables. My second goal is more limited. Because tables are used so often to communicate information, it is generally felt that the ability to understand tables is an important skill for a literate citizen. It is not uncommon to find tables used as stimuli within many tests of basic reading or mathematical skills. Thus my second goal is to illustrate how improved tabular presentation immediately lends itself to expanding the range of test questions that can be asked. For reasons that will become clearer by the end of this article, we have chosen to intertwine a general discussion of tabular display with a discussion of the use of tables within the National Assessment of Educational Progress (NAEP)¹, both as stimuli in test items and as communicative media in published reports.

To build any effective display we must have a firm notion of purpose. We cannot know what the best answers are unless we know what the questions are. Thus we must first understand what questions will be asked of the data; any discussion of data display in the abstract is pointless. To aid in understanding the intended purposes for tabular displays, we shall look at the sorts of test questions that NAEP pairs with data tables. It does not seem far-fetched to assume that the sorts of questions NAEP experts believe children ought to be able to answer from tables are the same sorts that everyone else should be facile with. This exercise is what naturally led us to the second goal, oftentimes intermingled in this report with the first, of expanding the range of test questions that can be posed when data are well displayed. The advice offered here comes from many sources, filtered through me.
To the extent that I add anything to the wisdom of those who have preceded me on this path, it is in the attention devoted to depicting error.

Tabular Presentation
Getting information from a table is like extracting sunlight from a cucumber. (Farquhar & Farquhar, 1891, p. 55)

The disdain shown by the two 19th-century economists quoted above reflected a minority opinion at that time. As commonly prepared, tables, spoken of so disparagingly by the Farquhars, remain to a large extent worthy of contempt.

Before we explore ways to improve a tabular display, it is wise to be explicit about the likely audience and goals of the display. In this report we examine tables within NAEP that are aimed at three separate audiences: children, the lay public, and education professionals. While it might seem that this diversity of audience and associated goals ought to yield quite different structures for their displays, it appears that the requirements of their shared cognitive and perceptual apparatus dominate their differences in age, training, and interests. The sets of rules for table construction that emerge for the three groups are virtually identical in general structure and vary only because of constraints imposed by the increasing complexity of the data themselves.

Why are tables used to display data? All data displays, including tables, are used for one or more of four purposes:
(1) Exploration. Data can contain answers to questions that may or may not be explicit in the viewer's mind. Data exploration answers explicit questions while posing questions previously unthought of.
(2) Communication. Once the data are explored, they can be displayed to convey what has been discovered to a broader audience.
(3) Storage. Data are expensive to gather; once they have been gathered, it is usually imprudent to lose them. In the past they have been stored for future use in various sorts of data displays.
(4) Decoration. Data displays are often used to enliven a presentation. Indeed, conversations with reporters on the use of graphics invariably center around how to locate a display to attract the eye of the reader.

A principal tenet of effective data display is that before designing a display one must establish a hierarchy of purpose and not try to do too much. A display aimed at communication should not try to serve an archival purpose as well, since the rules governing these two purposes are often antithetical.

The initial collection, and hence display, of most data sets begins with a data table. Thus any discussion of display should start with the table as the most basic construction. Rules for table construction are often misguided, aimed at the use of a table for data storage rather than data exploration or communication. The computer revolution of the past 30 years has obviated the need for archiving data in printed tables, but rules for table preparation have not been revised apace with this change in purpose.
Modern data storage is accomplished well on magnetic disks or tapes, optical disks, and other mechanical devices. Paper and print are meant for human eyes and human minds.

Helen Walker and Walter Durost (1936) provided a careful description of guidelines for the construction of statistical tables. Ehrenberg (1977) amplifies some of these rules to allow tables to become a still more effective multivariate display. Among his rules are (a) rounding heavily, (b) ordering and grouping the rows and columns by some aspect of the data, (c) framing the display with suitable summary statistics, and (d) spacing to aid perception. More recent work on effective tabular presentation (Clark, 1987; Wainer, 1992, 1993) elaborates and illustrates these simple rules for designing effective tables. We shall begin this discussion with a more detailed statement and justification of these four rules of effective tabular display within the context of tabular displays in NAEP test items. When this is complete, we shall go on to do the same thing for larger tables used principally for communication in NAEP reports, although they are often expected to serve archival purposes as well.

Tables as Part of NAEP Items


Example 1: 1992 12th Grade Math, Questions 3 and 4

This example shows how rounding table entries makes a difference. The original table on which Questions 3 and 4 were based is:

POPULATIONS OF DETROIT AND LOS ANGELES
1920-1970

                 City
Year    Detroit   Los Angeles
1920    950,000       500,000
1930  1,500,000     1,050,000
1940  1,800,000     1,500,000
1950  1,900,000     2,000,000
1960  1,700,000     2,500,000
1970  1,500,000     2,800,000

The two questions (omitting the alternatives offered) were:
3. How many more people were living in Los Angeles in 1960 than in 1940?
4. What was the first year listed in which the population of Los Angeles was greater than the population of Detroit?
If we round to two digits (the nearest hundred thousand) and tidy up the display a bit, we get:

Populations for Two Cities
Years: 1920-1970
Units: Millions

Year          1920  1930  1940  1950  1960  1970
Detroit        1.0   1.5   1.8   1.9   1.7   1.5
Los Angeles    0.5   1.1   1.5   2.0   2.5   2.8
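The rounding step above can be reproduced in a few lines of Python. This is an illustrative sketch, not part of the original article; the dictionary layout and the helper name `to_millions` are assumptions of the sketch. It rounds half up to the nearest 100,000 with integer arithmetic, because 0.95 cannot be stored exactly as a binary float and `round(0.95, 1)` returns 0.9 on typical IEEE 754 platforms, which would disagree with the 1.0 shown for Detroit in 1920.

```python
# Populations from the original table (Example 1), keyed by city and year.
POPULATIONS = {
    "Detroit": {1920: 950_000, 1930: 1_500_000, 1940: 1_800_000,
                1950: 1_900_000, 1960: 1_700_000, 1970: 1_500_000},
    "Los Angeles": {1920: 500_000, 1930: 1_050_000, 1940: 1_500_000,
                    1950: 2_000_000, 1960: 2_500_000, 1970: 2_800_000},
}


def to_millions(n: int) -> float:
    """Round n half up to the nearest 100,000 and express it in millions.

    Integer arithmetic avoids binary floating-point surprises at ties
    such as 950,000 (0.95 million).
    """
    return ((n + 50_000) // 100_000) / 10


rounded = {city: {year: to_millions(n) for year, n in by_year.items()}
           for city, by_year in POPULATIONS.items()}

# Question 3: how many more people lived in Los Angeles in 1960 than in 1940?
q3 = rounded["Los Angeles"][1960] - rounded["Los Angeles"][1940]

# Question 4: first listed year in which Los Angeles outnumbered Detroit.
q4 = next(year for year in sorted(rounded["Los Angeles"])
          if rounded["Los Angeles"][year] > rounded["Detroit"][year])

years = sorted(rounded["Detroit"])
print("Year        " + "".join(f"{y:>6}" for y in years))
for city, by_year in rounded.items():
    print(f"{city:<12}" + "".join(f"{by_year[y]:>6.1f}" for y in years))
print(f"Q3: {q3:.1f} million; Q4: {q4}")
```

With the entries rounded, both answers are a single subtraction or a single left-to-right scan of one row pair.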


The answer to Question 3 is clearly 1 million, and the answer to Question 4 is 1950. It awaits empirical verification whether this is easier than before revision, but my intuition (and my twelve-year-old son) certainly suggests so. Why did I suggest rounding to two digits? Let us explore this in a discussion of the first rule of table construction:

Rule I. Round a lot! This is for three reasons:
• Humans cannot understand more than two digits very easily.
• We can almost never justify more than two digits of accuracy statistically.
• We almost never care about accuracy of more than two digits.

Let us take each of these reasons separately.

Understanding. Consider the statement "This year's school budget is $27,329,681." Who can comprehend or remember that? If we remember anything, it is almost surely the translation "This year's school budget is about 27 million dollars."

Statistical justification. The standard error of most statistics is proportional to 1 over the square root of the sample size.² God did this, and there is nothing we can do to change it. Thus suppose we would like to report a correlation as .25. If we don't want to report something that is inaccurate, we must be sure that the second digit is reasonably likely to be 5 and not 6 or 4. To accomplish this we need the standard error to be less than .005. But since the standard error is proportional to \(1/\sqrt{n}\), the obvious algebra (\(1/\sqrt{n} = .005\), therefore \(\sqrt{n} = 1/.005 = 200\)) yields the inexorable conclusion that a sample size of the order of \(200^2\), or 40,000, is required to justify the presentation of more than a two-digit correlation. A similar argument can be made for most other statistics.

Who cares? I recently saw a table of average life expectancies that proudly reported the mean life expectancy of a male at birth in Australia to be 67.14 years. What does the 4 in the hundredths digit mean? Each unit in the hundredths digit of this overzealous reportage represents about 4 days. What purpose is served in knowing a life expectancy to this accuracy?
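The arithmetic behind the statistical justification and the two "who cares" examples can be checked with a short Python sketch. It assumes, as the algebra in the text does, that the standard error equals exactly \(1/\sqrt{n}\) (a proportionality constant of 1); `required_n` and `round_sig` are hypothetical helper names, not functions from any library.

```python
import math


def required_n(target_se: float) -> float:
    """Sample size at which SE = 1/sqrt(n) falls to target_se (assumes a constant of 1)."""
    return (1.0 / target_se) ** 2


def round_sig(x: float, digits: int = 2) -> float:
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(x)))
    return float(round(x, digits - 1 - magnitude))


# A two-digit correlation such as .25 needs SE < .005.
print(f"n for SE = .005: {required_n(0.005):,.0f}")

# "This year's school budget is $27,329,681" -> about 27 million.
print(f"budget: {round_sig(27_329_681):,.0f}")

# One unit in the hundredths digit of a life expectancy, expressed in days.
print(f"0.01 year = {0.01 * 365.25:.2f} days")
```

The same `required_n` calculation, applied to a standard error of .000000005, gives the \(4 \times 10^{16}\) cases discussed for the court equation below.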
For most communicative (not archival) purposes, 67 would have been enough. The effect of too many digits is sufficiently pernicious that I would like to emphasize the importance of rounding with another short example. The following equation is taken from State Court Caseload Statistics: Annual Report, 1976 (Court Statistics Project, 1976):

\[ \ln(\mathrm{DIAC}) = -.10729131 + 1.00716993 \times \ln(\mathrm{FIAC}), \]

where DIAC is the annual number of case dispositions, and FIAC is the annual number of case filings. This is obviously the result of a regression analysis with an overgenerous output format. Using the standard error justification for rounding, we see that to justify the eight digits shown we would need a standard error of the order of .000000005, or a sample size of the order of \(4 \times 10^{16}\). This is a very large number of cases; the population of China doesn't put a dent in it. The actual n is the number of states, which allows one digit of accuracy at most. If we round to one digit and transform out of the log metric, we arrive at the more statistically defensible equation \(\mathrm{DIAC} = .9\,\mathrm{FIAC}\). This can be translated into English as "There are about 90% as many dispositions as filings." Obviously the equation that is more defensible statistically is also much easier to understand.

My colleague Al Biderman, who knows more about courts than I do, suggested that we needed to round further, to the nearest integer (\(\mathrm{DIAC} = \mathrm{FIAC}\)), and so a more correct statement would be "There are about as many dispositions as filings." A minute's thought about the court process reminds one that it is a pipeline with filings at one end and dispositions at the other. They must equal one another, and any variation in annual statistics reflects only the vagaries of the calendar. The sort of numerical sophistry demonstrated in the first equation can give statisticians a bad name.³
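Transforming the fitted equation out of the log metric is a one-line calculation: \(\ln(\mathrm{DIAC}) = a + b\ln(\mathrm{FIAC})\) is the same as \(\mathrm{DIAC} = e^{a}\,\mathrm{FIAC}^{b}\). The minimal Python check below uses the two coefficients exactly as printed in the report; the variable names are illustrative.

```python
import math

a = -0.10729131   # intercept as printed in the 1976 report
b = 1.00716993    # slope on ln(FIAC) as printed

# ln(DIAC) = a + b*ln(FIAC)  <=>  DIAC = exp(a) * FIAC**b
multiplier = math.exp(a)

print(f"as printed: DIAC = {multiplier:.8f} * FIAC^{b:.8f}")
print(f"one digit:  DIAC = {multiplier:.1f} * FIAC^{b:.0f}")
```

Rounded to one digit, the multiplier is .9 and the exponent is 1, which is the \(\mathrm{DIAC} = .9\,\mathrm{FIAC}\) form given in the text.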
Example 2: 1990 8th and 12th Grade Science Assessment

TABLE 1
Table paired with Items 21 and 22 on the 1990 8th and 12th grade science assessment

                    Battery Life in Hours
Battery Brands      Cassette Player  Radio  Flashlight  Portable Computer
Constant Charge                   5     19          10                  3
PowerBat                          7     24          13                  5
Servo-Cell                        4     21          12                  2
Never Die                         8     28          16                  6
Electro-Blaster                  10     26          15                  4

Any redesign task must first try to develop an understanding of purpose. The presentation of the data set in Table 1 must have been intended to help the reader answer such questions as:
(1) What is the general level (in hours) of battery life for the brands chosen?
(2) How do the brands differ with respect to their battery life expectancies? What's the best one? The worst?
(3) What kinds of equipment use up batteries most quickly? Least quickly?


(4) Are there any unusual interactions between equipment and battery brand?
These are obviously parallel to the questions that are ordinarily addressed in the analysis of any multifactorial table: overall level, row, column, and interaction effects. By characterizing the information in the table in this way we are able to lay out explicitly the kinds of questions that might be asked about these data in an effort to determine the extent to which students can understand data presented in a table. In fact, there were three questions that followed this table, but only one asked about the data, and it was parallel to Question 2:
21. On the basis of the information in the table, which brand do you think is the best all-purpose battery? (Assume all batteries cost the same.)
The next question asked how the student made this determination:
22. Briefly explain how you used the information in the table to make your decision.
Before going further, I invite you to read Table 1 carefully and see to what extent you can answer the four questions. But don't peek ahead!

The entries in this table are already rounded, so we can go directly to the second rule of table construction:

Rule II. Order the rows and columns in a way that makes sense. Alphabetical order is almost never the best way to go. Three useful ways to order the data are:
• by size. Often we look most carefully at what is on top and less carefully further down. Put the biggest thing first. Also, ordering by some aspect of the data often reflects ordering by some hidden variable that can be inferred.
• naturally. Time is ordered from the past to the future. Showing data in that order melds well with what the viewer might expect. This is always a good idea.
• according to interest. If we are especially interested in comparing a particular set of rows or columns, put them adjacent to one another.

Table 2 is a redone version of Table 1 in which batteries (rows) are ordered by battery life in a radio, with the longest-lasting battery first.
Types of equipment (columns) are ordered by how quickly they use up batteries, least voracious first. From this we see that by ordering by radio use we have also ordered for flashlights. There is some minor shuffling within the cassette player and computer columns. Now that the table is ordered, answering NAEP Question 21 is easy, as are most other main-effect questions. We can improve matters still further by remembering the third rule.

TABLE 2
First revision of battery life table (rows and columns ordered, extraneous lines removed)

                    Battery Life in Hours
Battery Brands      Radio  Flashlight  Cassette Player  Portable Computer
Never Die              28          16                8                  6
Electro-Blaster        26          15               10                  4
PowerBat               24          13                7                  5
Servo-Cell             21          12                4                  2
Constant Charge        19          10                5                  3
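Rule II can be applied mechanically. The Python sketch below takes the Table 1 entries, orders the rows by battery life in a radio (longest first) and the columns by total battery life across brands (least voracious equipment first), and prints the result, which reproduces the layout of Table 2. The nested-dictionary layout is an illustrative choice, not something prescribed by the article.

```python
# Table 1 entries: hours of battery life by brand and equipment.
LIFE = {
    "Constant Charge": {"Cassette Player": 5, "Radio": 19, "Flashlight": 10, "Portable Computer": 3},
    "PowerBat":        {"Cassette Player": 7, "Radio": 24, "Flashlight": 13, "Portable Computer": 5},
    "Servo-Cell":      {"Cassette Player": 4, "Radio": 21, "Flashlight": 12, "Portable Computer": 2},
    "Never Die":       {"Cassette Player": 8, "Radio": 28, "Flashlight": 16, "Portable Computer": 6},
    "Electro-Blaster": {"Cassette Player": 10, "Radio": 26, "Flashlight": 15, "Portable Computer": 4},
}

# Rows: order brands by life in a radio, longest-lasting first.
row_order = sorted(LIFE, key=lambda brand: LIFE[brand]["Radio"], reverse=True)

# Columns: order equipment by total (equivalently, mean) life across brands,
# so the least battery-hungry equipment comes first.
equipment = list(LIFE["Never Die"])
col_order = sorted(equipment, key=lambda e: sum(LIFE[b][e] for b in LIFE), reverse=True)

print(f"{'Battery Brands':<16}" + "".join(f"{e:>19}" for e in col_order))
for brand in row_order:
    print(f"{brand:<16}" + "".join(f"{LIFE[brand][e]:>19}" for e in col_order))
```

Ordering columns by a column summary rather than by a single row is what lets the flashlight, cassette, and computer columns fall into place with only minor shuffling.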

Rule III. ALL is different and important. Summaries of rows and columns are important as a standard for comparison; they provide a measure of usualness. What summary we use to characterize ALL depends on the purpose: sometimes a sum or a mean is suitable, more often a median. But whatever is chosen, it should be visually different from the individual entries and set spatially apart. The summaries (means) surrounding Table 3 make the row and column effects explicit. Now we not only see that the Never Die battery is the best all around, but we have a measure of how much better it is. We also see that a computer uses batteries about 6 times as fast as a radio.

Can we go further? Sure. To see how requires that we consider what distinguishes a table from a graph. A graph uses space to convey information; a table uses a specific iconic representation. We have made tables more understandable by using space, making a table more like a graph. We can improve tables further by making them more graphical still. A semigraphical display like the stem-and-leaf diagram (Tukey, 1977) is merely a table in which the entries are not only ordered but are also spaced according to the size of the gaps between adjacent rows or columns. This is the idea behind the fourth rule.

TABLE 3
Second revision of battery life table (row and column means shown and emphasized)

                    Battery Life in Hours
Battery Brands      Radio  Flashlight  Cassette Player  Portable Computer   Battery Averages
Never Die              28          16                8                  6                 15
Electro-Blaster        26          15               10                  4                 14
PowerBat               24          13                7                  5                 12
Servo-Cell             21          12                4                  2                 10
Constant Charge        19          10                5                  3                  9
--------------------------------------------------------------------------------------------
Usage Averages         24          13                7                  4                 12
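The margins of Table 3 are ordinary row and column means. A Python sketch using the Table 2 entries follows; the names `BRANDS`, `EQUIPMENT`, `HOURS`, and `half_up` are illustrative. One detail worth noting: Python's built-in `round` rounds halves to even, so the Never Die average of 14.5 would come out as 14 rather than the 15 shown in the table, and the sketch therefore rounds halves up explicitly.

```python
import math

BRANDS = ["Never Die", "Electro-Blaster", "PowerBat", "Servo-Cell", "Constant Charge"]
EQUIPMENT = ["Radio", "Flashlight", "Cassette Player", "Portable Computer"]
HOURS = [            # rows follow BRANDS, columns follow EQUIPMENT (Table 2)
    [28, 16, 8, 6],
    [26, 15, 10, 4],
    [24, 13, 7, 5],
    [21, 12, 4, 2],
    [19, 10, 5, 3],
]


def half_up(x: float) -> int:
    """Round to the nearest integer, sending .5 upward (built-in round() goes to even)."""
    return math.floor(x + 0.5)


row_means = [sum(row) / len(row) for row in HOURS]
col_means = [sum(col) / len(col) for col in zip(*HOURS)]
grand_mean = sum(map(sum, HOURS)) / (len(BRANDS) * len(EQUIPMENT))

# Entries first, then a spatially separated margin column for the battery averages.
print(f"{'':<16}" + "".join(f"{e:>18}" for e in EQUIPMENT) + "  | Average")
for brand, row, mean in zip(BRANDS, HOURS, row_means):
    print(f"{brand:<16}" + "".join(f"{v:>18}" for v in row) + f"  | {half_up(mean):>7}")
print("-" * 99)
print(f"{'Usage averages':<16}" + "".join(f"{half_up(m):>18}" for m in col_means)
      + f"  | {half_up(grand_mean):>7}")

# How much faster does a computer drain a battery than a radio?
print(f"radio/computer ratio: {col_means[0] / col_means[3]:.1f}")
```

The unrounded radio and computer column means are 23.6 and 4.0 hours, a ratio of 5.9, which is the "about 6 times" in the text.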

Rule IV. Add spacing to aid perception. If there is a clustering among rows or columns, space them so that they look clustered. To put this notion into practice, consider the next version of Table 1, shown as Table 4. The rows have been spaced according to what appear to be significant gaps (Wainer & Schacht, 1978), and we see that batteries fall into two groups: three relatively strong batteries and two weaker ones. This yields a table that is about as good as we can do. Now we can see that a battery lasts about twice as long in a radio as in a flashlight, and about twice as long in a flashlight as in a cassette player. Moreover, we see clearly that the three best batteries yield about 50% more life than the two worst.

TABLE 4
Third revision of battery life table (rows spaced to accentuate battery clusters)

                    Battery Life in Hours
Battery Brands      Radio  Flashlight  Cassette Player  Portable Computer   Battery Averages
Never Die              28          16                8                  6                 15
Electro-Blaster        26          15               10                  4                 14
PowerBat               24          13                7                  5                 12

Servo-Cell             21          12                4                  2                 10
Constant Charge        19          10                5                  3                  9
--------------------------------------------------------------------------------------------
Usage Averages         24          13                7                  4                 12

This brings us to an interesting issue. NAEP Questions 21 and 22 could be answered trivially if the table were transformed as we have done in Table 4. Should we transform the table? The way in which we have structured the table is not based on the particular questions that were asked, but rather on general rules for all tables. We would have done it in exactly the same way had we not seen the questions. This transformation merely follows a set of rules that characterizes good practice. The original table was flawed in that it didn't conform to standards of good practice. Basing a characterization of an examinee's ability to understand a data display on a question paired with a flawed display is akin to characterizing someone's ability to read by asking questions about a passage full of spelling and grammatical errors whose sentences were ordered haphazardly. What are we really testing? One might say that we are examining whether or not someone can understand what is de facto "out there." I have some sympathy with this view, but what is the relationship between the ability to understand illiterate prose and the ability to understand proper prose? If we measure the former, do we know anything more about the latter? Yet how often do we encounter well-made displays in the everyday world? Should we be testing what is, or what should be?

A more practical problem is that if a display is properly constructed, most commonly asked questions are easily answered. That is the nature of graphics and human information processing ability. It is harder to ask nontrivial questions of a well-constructed table. This is not an isolated issue; I will discuss it further in the conclusion of this article. While we cannot hope to resolve these issues here, I would like to add one vote toward testing literacy with prose that is correctly composed and testing numeracy with data displays that conform to accepted standards of good practice. If we do otherwise we may be able to connect our tests with common practice, but is that what we wish to do? In the concluding section of this article I will discuss the kinds of questions that can be constructed and suggest a theoretical structure that will aid in future tests of this sort.
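Returning to Rule IV, the grouping of batteries in Table 4 can be approximated by splitting the ordered battery averages at their widest gap. The Python sketch below is a deliberately simple stand-in: it is not the gap-weighting procedure of Wainer and Schacht (1978), and `split_at_widest_gap` is a hypothetical helper. It works from the unrounded battery averages behind Table 3 and also computes the ratio of the two groups' mean lives, which the text describes as about 50% more life for the strong group.

```python
BRANDS = ["Never Die", "Electro-Blaster", "PowerBat", "Servo-Cell", "Constant Charge"]
HOURS = [[28, 16, 8, 6], [26, 15, 10, 4], [24, 13, 7, 5], [21, 12, 4, 2], [19, 10, 5, 3]]


def split_at_widest_gap(values: list[float]) -> int:
    """Return i such that the widest gap lies between values[i - 1] and values[i].

    values must already be sorted from largest to smallest.
    """
    gaps = [values[i - 1] - values[i] for i in range(1, len(values))]
    widest = max(range(len(gaps)), key=lambda k: gaps[k])
    return widest + 1


averages = [sum(row) / len(row) for row in HOURS]   # 14.5, 13.75, 12.25, 9.75, 9.25
order = sorted(range(len(BRANDS)), key=lambda i: averages[i], reverse=True)
sorted_avgs = [averages[i] for i in order]

cut = split_at_widest_gap(sorted_avgs)
strong = [BRANDS[i] for i in order[:cut]]
weak = [BRANDS[i] for i in order[cut:]]

strong_mean = sum(sorted_avgs[:cut]) / cut
weak_mean = sum(sorted_avgs[cut:]) / (len(sorted_avgs) - cut)

print("strong:", ", ".join(strong))
print("weak:  ", ", ".join(weak))
print(f"strong/weak life ratio: {strong_mean / weak_mean:.2f}")
```

A single largest-gap split only ever yields two groups; a procedure that judges which gaps are unusually large, as Wainer and Schacht do, would be needed for tables with more than one natural break.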
Example 3: 1992 4th, 8th, and 12th Grade Math Assessment

Original Table
Ten Students' Test Scores
Student  Score
A           88
B           65
C           91
D           36
E           72
F           57
G           50
H           85
I           62
J           48

Revised Table
Ten Students' Test Scores
Student  Score
C           91
A           88
H           85
E           72
B           65
I           62
F           57
G           50
J           48
D           36
Mean        65

Question 9, associated with the above table, is as follows:
9. The table above shows the scores of 10 students on a final examination. What is the range of these scores? (then four options)
To answer this question, one needs to know that the range is the difference between the largest and the smallest entries, find them, and then subtract them. A properly prepared table, which orders the rows by the data rather than by some arbitrary letter, removes the need for the second step.⁴ Also, where there are data gaps (invisible in the original table), introducing spaces provides the opportunity to ask other, deeper questions about the structure of these data.
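The difference the ordering makes for Question 9 can be sketched in Python: once the rows are sorted, the range is simply the first entry minus the last, and the differences between adjacent scores show where spacing could be introduced. The scores are those in the original table; the variable names are illustrative.

```python
SCORES = {"A": 88, "B": 65, "C": 91, "D": 36, "E": 72,
          "F": 57, "G": 50, "H": 85, "I": 62, "J": 48}

# Revised table: rows ordered by score, highest first.
ordered = sorted(SCORES.items(), key=lambda item: item[1], reverse=True)

# With ordered rows there is no search step: range = top entry - bottom entry.
score_range = ordered[0][1] - ordered[-1][1]
mean = sum(SCORES.values()) / len(SCORES)

# Differences between adjacent ordered scores; large ones mark natural gaps.
gaps = [hi[1] - lo[1] for hi, lo in zip(ordered, ordered[1:])]

for student, score in ordered:
    print(f"{student}  {score:>3}")
print(f"Mean {mean:>4.0f}")
print("range:", score_range)
print("adjacent gaps:", gaps)
```

The two largest adjacent differences (13 points between H and E, 12 points between J and D) are the kind of gap that spacing would make visible and that a deeper test question could ask about.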

Big Tables in NAEP Reports

NAEP reports are often mother lodes of information, but sometimes it takes a considerable amount of effort to mine that information. One reason that such effort is required is the format of the data presentation. It appears that saving space is sometimes viewed as a more important goal than effective communication. Let us examine a single large table from one major NAEP report and see how the application of the aforesaid four rules can increase its comprehensibility. The table chosen shares enough of its characteristics with other tables to allow one example to be broadly generalizable.
Example 4: Table 2.12 From Data Compendium for the NAEP 1992 Mathematics Assessment of the Nation and the States

This table, reproduced as Table 5, shows the average mathematics performance of eighth-grade examinees from all jurisdictions participating in the 1992 state mathematics assessment as a function of parents' education. Also included is the percentage of examinees in each state whose parents' education is at each of the designated levels. As is customary, the standard errors of all figures are presented in parentheses.

Before we attempt to revise this table, it is wise to consider its likely purpose. Why would anyone want to see data like these? What sorts of questions would such data answer? How easily could the reader of this table answer the same sorts of questions that were asked of children in the assessment? How hard is it to answer a question analogous to Question 21 about the best all-purpose battery (What is the best performing state?)? Or one analogous to Question 9 about the range of scores among 10 children (What is the range of performances among the 41 participating states?)? Any redesign should allow such obvious questions to be answered easily.

More generally, for this table, as with most two-way displays, the questions that can be answered are based on the factors presented, to wit:
(1) How did the children in each of the jurisdictions perform in math? Which states did the best? Which the worst? How much variation is there among the states? How does my state compare with others like it? With the nation as a whole? What is the clustering among the states?
(2) What is the relationship between parental education and children's math performance?
(3) Does parental education have the same effect in all jurisdictions?
In addition, there are questions parallel to these dealing with the percentage of children at each parental education level.
(4) How well educated are the parents of these children in each of the jurisdictions? Which states have the best educated parents? Which the worst? How much variation is there among the states? How does my
state compare with others like it? With the nation as a whole? What is the clustering among the states?
(5) Which level of parental education is most common? Which is least? How much parental education is "typical"?
(6) Does the distribution of parental education have the same shape in all jurisdictions?
After answering the above questions, we would like to be able to know which of the differences we observe are possible artifacts of sampling fluctuation and which represent real differences in the populations of interest.

TABLE 5
Original Table 2.12 from Data Compendium for the NAEP 1992 Mathematics Assessment of the Nation and the States (p. 83): Average mathematics proficiency by parents' highest level of education

Grade 8 - 1992
Columns, left to right: Graduated College | Some Education After High School | Graduated High School | Did Not Finish High School | I Don't Know. Each cell gives the percentage of students (SE) followed by their average proficiency (SE).

PUBLIC SCHOOLS     College grad        Some after HS       HS grad             Did not finish HS   I don't know
NATION             40 (1.4) 279 (1.4)  18 (0.6) 270 (1.2)  25 (0.8) 256 (1.4)   8 (0.6) 248 (1.8)   9 (0.5) 251 (1.7)
Northeast          38 (3.1) 282 (4.2)  18 (1.1) 267 (3.0)  26 (2.2) 259 (4.2)   8 (0.9) 246 (4.2)  10 (1.2) 250 (3.3)
Southeast          35 (1.9) 270 (1.9)  17 (0.8) 263 (2.0)  28 (1.4) 249 (1.9)  12 (1.6) 246 (4.2)   8 (1.0) 248 (4.3)
Central            42 (2.7) 283 (2.9)  20 (1.4) 273 (1.6)  26 (1.7) 264 (2.3)   4 (0.7)   ** (**)   7 (0.8) 258 (3.8)
West               43 (2.9) 279 (2.6)  18 (1.2) 274 (2.6)  19 (1.5) 252 (2.9)   9 (1.1) 248 (2.4)  11 (0.9) 248 (2.9)
STATES (41 states and the District of Columbia, Alabama through Wyoming) and TERRITORIES (Guam, Virgin Islands): rows not legible in this copy

The percentages for parents' highest level of education may not add to 100 percent because some students responded "I don't know." >> The value for 1992 was significantly higher than the value for 1990 at about the 95 percent certainty level. << The value for 1992 was significantly lower than the value for 1990 at about the 95 percent certainty level. These notations indicate statistical significance from a multiple comparison procedure based on the 37 jurisdictions participating in both 1992 and 1990. If looking at only one state, then > and < also indicate differences that are significant. Statistically significant differences between 1990 and 1992 for the state comparison samples for the nation and regions are not indicated.


Answers to all of these questions lie within the bounds of Table 5, but how easily can they be extracted? Can we ease the pain of this extraction through a change in the design of the table? Let us begin the real work of the redesign by asking why one would want to include the percentages in each educational category in the same table as the mathematics proficiency, as opposed to placing them in their own table on a facing page. The major reason is that the percentages are important in calculating state means. Such means are given in other tables, but it would seem good practice (remember Rule III) to include them here. Once they are calculated, they provide a sensible variable on which to order the states (rather than the alphabet; Rule II). Once this ordering has been accomplished we can see apparent gaps in the states' performance. A natural visual metaphor for these data gaps is to include matching physical gaps.⁵ The resulting table of mean proficiencies by state is shown as Table 6. The percentage distributions of parental education elided from this table have been set aside for a parallel table; we shall return to these data shortly. We have also moved the District of Columbia into the section of nonstates that also includes Guam and the Virgin Islands. All ordering is done within table section. Note that the key summaries are in boldface type. For ease of manipulation the standard errors have temporarily been removed from their parentheses. They will shortly be removed altogether.

Table 6 allows us to answer some of the questions phrased initially quite easily, especially those dealing with the relative performance of the states (Question 1). The usual finding of Midwestern states having the highest average performance and the Southern states the lowest is seen immediately. Moreover, we see that there is a 37-point difference between the highest states and the lowest. Interpreting 37 points is helped by remembering that there is an average increase of 12 NAEP points per year between fourth and eighth grade in math. Thus, the 37-point difference can be interpreted as corresponding to about a three-year difference in average performance between the best and worst performing states (37/12 ≈ 3.1). This increases to more than four years when one's gaze shifts to the three "other jurisdictions." The gaps depicted help keep our eyes from blurring while examining such a large table, and they also provide rough groupings that may be suggestive of explanatory hypotheses. Note that the data about the various sections of the country at the top of Table 5 have been removed entirely. This was done because they were the products of a different survey and have larger standard errors than the individual states from which those sections are composed. Their inclusion just added an unnecessary source of confusion.

Examining the average proficiency for the nation at each education level reveals the unsurprising result that children whose parents are better educated score higher in mathematics. In addition, it appears that children who don't know their parents' education perform slightly better than children whose parents did not finish high school. This is suggestive of a somewhat heterogeneous grouping in parental education.
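The ordering and gapping described above can be sketched in a few lines of code. This is a minimal sketch, not the procedure actually used to build Table 6: it assumes each gap is weighted by i(n - i), one reading of the "inverse logistic weights" mentioned in the Software section, takes a square root, scales by the midmean of the weighted gaps, and flags anything above an assumed cutoff of 2.25. The function names are hypothetical.

```python
from statistics import mean


def midmean(values):
    """Mean of the middle half of the sorted values."""
    s = sorted(values)
    q = len(s) // 4
    return mean(s[q:len(s) - q])


def find_gaps(rows, cutoff=2.25):
    """Order (label, value) rows from high to low and return the labels of
    the rows that sit just above an unusually wide gap.

    Assumptions (not taken from the article): weight i * (n - i),
    square-root transform, midmean scaling, and the 2.25 cutoff.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    n = len(ordered)
    weighted = []
    for i in range(1, n):
        gap = ordered[i - 1][1] - ordered[i][1]
        # Gaps near the middle of an ordered sample are naturally small and
        # gaps in the tails naturally large; i * (n - i) evens this out.
        weighted.append((i * (n - i) * gap) ** 0.5)
    scale = midmean(weighted)
    # If scale is 0 (most gaps are ties), every nonzero gap is flagged.
    return [ordered[i - 1][0]
            for i, y in enumerate(weighted, start=1)
            if y > cutoff * scale]


# Average proficiencies of the eight highest-scoring states in Table 6.
TOP_EIGHT = [
    ("Iowa", 283), ("North Dakota", 283), ("Minnesota", 282), ("Maine", 278),
    ("Wisconsin", 278), ("New Hampshire", 278), ("Nebraska", 277), ("Idaho", 274),
]

print(find_gaps(TOP_EIGHT))  # ['Minnesota']
```

Here the only flagged gap falls between Minnesota (282) and Maine (278). Because the published means are rounded to whole points, many adjacent gaps are exact ties; run on all 41 states, the same rule flags several more gaps, so the cutoff would need tuning before it could stand in for an editor's judgment.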

TABLE 6
Reformatted version of Table 5 in which standard errors are in separately labeled columns, categories of parental education are separated, average state performance is shown, and rows are ordered and spaced by average performance

                        Graduated   Some Educ.  Graduated   Did Not     I Don't
                        College     After HS    High School Finish HS   Know        Average
PUBLIC SCHOOLS          Mean  se    Mean  se    Mean  se    Mean  se    Mean  se    Mean  se
Nation                  279   1.4   270   1.2   256   1.4   248   1.8   251   1.7   267   1.4
States
Iowa                    291   1.2   285   1.5   273   1.3   262   2.4   266   2.8   283   1.4
North Dakota            289   1.1   283   1.9   271   1.7   259   4.5   272   2.8   283   1.5
Minnesota               290   1.0   284   1.8   270   1.8   256   4.2   268   3.0   282   1.6
Maine                   288   1.4   281   1.5   267   1.1   259   2.7   266   2.6   278   1.5
Wisconsin               287   1.4   280   1.5   267   0.9   259   2.5   262   2.1   278   1.4
New Hampshire           287   1.8   282   1.5   270   1.9   254   3.4   255   4.0   278   2.0
Nebraska                287   1.2   280   1.6   267   1.7   247   3.3   256   3.8   277   1.6
Idaho                   281   0.9   278   1.3   268   1.4   254   2.3   254   2.8   274   1.3
Wyoming                 281   0.9   278   1.7   266   1.1   258   3.3   260   2.2   274   1.3
Utah                    280   1.0   278   1.2   258   1.8   254   3.2   258   2.7   274   1.3
Connecticut             288   1.0   272   1.8   260   1.8   245   3.3   251   2.4   273   1.6
Colorado                282   1.3   276   1.6   260   1.5   250   2.4   252   2.6   272   1.6
Massachusetts           284   1.3   272   1.8   261   1.4   248   3.2   248   2.6   272   1.6
New Jersey              283   1.8   275   2.1   259   2.5   253   3.8   250   3.9   271   2.3
Pennsylvania            282   1.6   274   1.9   262   1.6   252   2.8   252   3.8   271   1.9
Missouri                280   1.7   275   1.5   264   1.6   254   2.4   252   2.9   271   1.8
Indiana                 283   1.5   275   1.9   260   1.6   250   2.6   249   3.3   269   1.8
Ohio                    279   1.8   272   1.6   260   2.3   243   2.6   249   4.5   268   2.1
Oklahoma                277   1.5   272   1.9   257   1.7   254   2.9   251   4.3   267   1.9
Virginia                282   1.5   270   1.6   252   1.5   248   2.1   251   2.5   267   1.7
Michigan                277   2.2   271   2.0   257   1.7   249   2.0   248   3.0   267   2.1
New York                277   1.9   271   2.4   256   2.5   243   4.2   240   3.8   265   2.5
Rhode Island            276   1.1   271   1.5   256   1.6   244   2.1   239   2.5   265   1.5
Arizona                 277   1.5   270   1.5   256   1.6   245   2.5   248   2.7   264   1.8
Maryland                278   1.8   266   1.9   250   1.8   240   3.7   245   3.8   264   2.1
Texas                   281   2.1   272   1.6   253   1.6   247   1.0   244   2.4   264   1.8
Delaware                274   1.3   268   2.3   251   1.7   248   4.0   248   3.4   262   1.9
Kentucky                278   1.6   267   1.6   254   1.6   246   1.7   242   2.8   261   1.7
California              275   2.0   266   2.1   251   2.1   241   2.2   240   2.9   260   2.2
South Carolina          272   1.5   268   1.7   248   1.4   248   2.1   247   3.0   260   1.7
Florida                 268   1.9   266   1.9   251   1.8   244   2.7   244   3.2   259   2.1
Georgia                 271   2.1   264   1.7   250   1.3   244   2.2   245   2.6   259   1.8
New Mexico              272   1.4   264   1.4   249   1.4   244   1.9   245   2.0   259   1.5
Tennessee               267   2.1   265   1.8   251   1.6   245   2.0   243   3.6   258   2.0
West Virginia           270   1.5   269   1.4   251   1.2   244   1.8   239   2.3   258   1.5
North Carolina          271   1.4   265   1.6   246   1.7   240   2.3   240   3.6   258   1.7
Hawaii                  267   1.5   266   1.9   246   1.8   242   3.5   246   2.1   257   1.9
Arkansas                264   1.9   264   1.7   248   1.6   246   2.4   245   2.7   256   1.9
Alabama                 261   2.5   258   2.0   244   1.8   239   2.0   237   2.9   251   2.2
Louisiana               256   2.5   259   1.8   242   1.6   237   2.4   236   3.7   249   2.2
Mississippi             254   1.6   256   2.0   239   1.6   234   1.8   231   2.8   246   1.8
Other Jurisdictions
Guam                    246   1.9   244   2.4   229   1.9   224   2.5   226   2.0   235   2.0
District of Columbia    244   1.7   240   1.9   224   1.6   225   3.2   229   2.2   234   1.9
Virgin Islands          224   2.0   232   2.4   221   1.9   219   2.4   217   1.4   222   1.9


A small plot of mean math performance against parents' education (Figure 1), with a rough reference line drawn in, makes the quantitative aspect of this relationship clearer and provides a reasonable answer to Question 2. Scanning down the first column of the table shows that the higher-scoring states also tend to have a greater proportion of children coming from homes with a parent who was a college graduate. But even among just these children (conditioning on parents' education), there is still a 37-point difference between the highest- and lowest-scoring states. This is part of an answer to the third kind of question, although more complete answers can be built by constructing graphs like Figure 1 for individual states. Such a graph, shown as Figure 2, contradicts the hypothesis that differences in states' overall performance are due to differences in parents' education. Aside from being mildly startling in its own right, this result reduces still further the need to include the percentage of children in each parental education category within this table.
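The argument behind Figure 2 amounts to differencing two states' means within each category of parental education. The sketch below copies the Iowa, New Jersey, and Mississippi rows of Table 6 (the three states plotted in Figure 2); the function name is hypothetical.

```python
CATEGORIES = [
    "Graduated college",
    "Some education after high school",
    "Graduated high school",
    "Did not finish high school",
    "I don't know",
]

# Mean mathematics proficiency by parents' education, copied from Table 6.
MEANS = {
    "Iowa": [291, 285, 273, 262, 266],
    "New Jersey": [283, 275, 259, 253, 250],
    "Mississippi": [254, 256, 239, 234, 231],
}


def within_category_gaps(high, low, table=MEANS):
    """Difference between two states' means inside each parental-education category."""
    return {cat: h - l for cat, h, l in zip(CATEGORIES, table[high], table[low])}


for cat, diff in within_category_gaps("Iowa", "Mississippi").items():
    print(f"{cat:<34} {diff:>3}")
```

Iowa's advantage over Mississippi is 37 points among children of college graduates and never falls below 28 points in any category, which is consistent with the reading that the state differences are not simply a matter of parental education.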

270
0

'0

260

250

240

Graduated College

Some Education After High School

High School Graduate

I Don't Know

High School Dropout

Parents' Education
FIGURE 1. A plot showing children's mathematics proficiency and their parents' education. A reference line drawn in shows roughly the relationship between the (mostly) ordered categories of parental education and children's performance.

[Figure 2: mathematics proficiency (vertical axis, about 220 to 300) plotted against parents' education (horizontal axis: Graduated College, Some Education After High School, High School Graduate, I Don't Know, High School Dropout) for Iowa, New Jersey, and Mississippi]
FIGURE 2. A comparison of the performance of 8th graders in mathematics in Iowa, New Jersey, and Mississippi, shown as a function of their parents' education. The difference in the performance of children in mathematics by state is not solely due to differences in parental education.

What About the Standard Errors?

Questions about the statistical significance of these observed differences can be answered after doing a little arithmetic on the standard errors included within the table. A natural question to ask is why that arithmetic hasn't already been done by the generators of the table. One possible answer is that there are too many plausible questions of statistical significance that might be asked to calculate all of the possible error terms. But, playing devil's advocate, couldn't some conservative error term be calculated that would save all of the clutter introduced by the many columns of standard errors? The answer to this, simply put, is yes. The next version of this table (shown as Table 7) segregates the standard errors into a separate table and substitutes instead (for quick and dirty significance judgments) three estimates of the standard error of the difference between any two entries in that column. The first is an upper bound on the standard error of the difference.

This is obtained by multiplying the largest value of the standard error in that column by √2. The second entry, labeled 40 Bonferroni, is the first entry multiplied by 3.2. This is obtained from the Bonferroni inequality and based on the idea that a user is interested in comparing his/her own state with each of the 40 others; 3.2 is, to one decimal, the two-sided normal critical value for α = .05/40. This holds the familywise error rate for that family of tests to .05 or less. The last entry, labeled 820 Bonferroni, is the first entry multiplied by 4.0 (the corresponding critical value for α = .05/820) and controls the familywise error rate for someone who compares each state with all others. It is likely that this last estimate is unnecessary, since anyone expecting to make that many comparisons will almost surely want the tighter error bounds constructed from the individual standard errors and perhaps use more powerful procedures for multiple comparisons (e.g., Benjamini & Hochberg, 1995).⁶ A table augmented with these error summaries, but relieved of the burden of individual accompanying standard errors, is not only a good deal clearer to look at but, for most prospective users, a good deal easier to use for making inferences about the statistical significance of observed differences.

Next, while there was no good reason to combine mathematics achievement and percentage of children in each category into the same table, these percentage distributions are important in their own right. It was just that their presentation was clearer after they were separated into two tables. To examine this, consider the two variables, shown as Tables 7 and 8. Table 7 contains just mean mathematics proficiency; Table 8 just the distribution of children across levels of parental education. It appears that the benefits associated with housing both of these variables within the same table are too few to offset the increases in perceptual complexity that accrue by mixing them. It seems, however, worthwhile to keep them contiguous. Thus we would recommend placing them on facing pages.
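The three error terms in Table 7 can be reproduced, up to rounding, with a few lines of code. This is a minimal sketch assuming independent estimates and two-sided tests at α = .05; `error_terms` is a hypothetical helper. The comparison counts follow from 41 states: one state against each of the other 40, or all 41 × 40 / 2 = 820 pairs.

```python
import math
from statistics import NormalDist


def error_terms(standard_errors, n_units=41, alpha=0.05):
    """Conservative error terms for comparing two entries in one table column.

    Returns (max std error of diff, one-vs-rest Bonferroni, all-pairs Bonferroni).
    Assumes independent estimates and two-sided tests.
    """
    # For independent estimates, sqrt(se_a**2 + se_b**2) <= sqrt(2) * max(se).
    se_diff = max(standard_errors) * math.sqrt(2)
    z = NormalDist().inv_cdf
    one_vs_rest = n_units - 1                  # 40 comparisons for 41 states
    all_pairs = n_units * (n_units - 1) // 2   # 820 comparisons for 41 states
    z_one = z(1 - alpha / (2 * one_vs_rest))   # about 3.23; the text uses 3.2
    z_all = z(1 - alpha / (2 * all_pairs))     # about 4.01; the text uses 4.0
    return se_diff, se_diff * z_one, se_diff * z_all


# A column whose largest standard error is 2.5, like Graduated College in Table 6.
se_diff, b40, b820 = error_terms([1.2, 1.1, 1.0, 2.5])
print(f"{se_diff:.1f} {b40:.1f} {b820:.1f}")  # 3.5 11.4 14.2
```

The first entry rounds to 3.5, as printed in Table 7. Because the exact normal quantiles are about 3.23 and 4.01 rather than 3.2 and 4.0, the Bonferroni entries come out slightly above the printed 11.3 and 14.0; either version serves for quick and dirty judgments.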
Note that the states in Table 8 are ordered by the state means from Table 7. This facilitates comparisons between the two tables. It also raises the interesting question of whether the increased ease of comprehension yielded by ordering a table by its contents is more than offset by the increased difficulty in making comparisons across tables ordered in different ways. This issue will be discussed further at the end of this section. On both tables we have highlighted unusual entries by putting them in boldface type and boxing them in. Entries that are unusually large are also shaded. Entries that are unusually small are boxed but unshaded. We have also appended a positive or negative sign as a further reminder of the direction of the entry's variation. Thus in Table 7 we see that the average score of children whose parents had only some post-high school education was unusually high in West Virginia. Similarly, Nebraska's and Connecticut's children of high school dropouts scored unusually poorly. The determination of which entries were unusual was made by fitting a simple additive model to the data and examining the residuals. Those residuals that stuck out excessively (more than 2 times the square root of the mean of the squared residuals) were then highlighted. Table 7 goes about as far as

TABLE 7
Revision of Table 6 with individual standard errors replaced by conservative estimates, unusual entries highlighted, and a state locator index inserted

                        Graduated   Some Educ.  Graduated   Did Not     I Don't
PUBLIC SCHOOLS          College     After HS    High School Finish HS   Know        Mean
Nation                  279         270         256         248         251         267
States
 1 Iowa                 291         285         273         262         266         283
 2 North Dakota         289         283         271         259         272         283
 3 Minnesota            290         284         270         256         268         282
 4 Maine                288         281         267         259         266         278
 5 Wisconsin            287         282         270         254         255         278
 6 New Hampshire        287         280         267         259         262         278
 7 Nebraska             287         280         267         247-        256         277
 8 Idaho                281         278         268         254         254         274
 9 Wyoming              281         278         266         258         260         274
10 Utah                 280         278         258         254         258         274
11 Connecticut          288         272         260         245-        251         273
12 Colorado             282         276         260         250         252         272
13 Massachusetts        284         272         261         248         248         272
14 New Jersey           283         275         259         253         250         271
15 Pennsylvania         282         274         262         252         252         271
16 Missouri             280         275         264         254         252         271
17 Indiana              283         275         260         250         249         269
18 Ohio                 279         272         260         243         249         268
19 Oklahoma             277         272         257         254         251         267
20 Virginia             282         270         252         248         251         267
21 Michigan             277         271         257         249         248         267
22 New York             277         271         256         243         240         265
23 Rhode Island         276         271         256         244         239         265
24 Arizona              277         270         256         245         248         264
25 Maryland             278         266         250         240         245         264
26 Texas                281         272         253         247         244         264
27 Delaware             274         268         251         248         248         262
28 Kentucky             278         267         254         246         242         261
29 California           275         266         251         241         240         260
30 South Carolina       272         268         248         248         247         260
31 Florida              268         266         251         244         244         259
32 Georgia              271         264         250         244         245         259
33 New Mexico           272         264         249         244         245         259
34 Tennessee            267         265         251         245         243         258
35 West Virginia        270         269+        251         244         239         258
36 North Carolina       271         265         246         240         240         258
37 Hawaii               267         266         246         242         246         257
38 Arkansas             264         264         248         246+        245         256
39 Alabama              261         258         244         239         237         251
40 Louisiana            256         259         242         237         236         249
41 Mississippi          254         256         239         234         231         246
Other Jurisdictions
42 Guam                 246         244         229         224         226         235
43 District of Columbia 244         240         224         225         229         234
44 Virgin Islands       224         232         221         219         217         222
Error terms for comparisons
Max std error of diff   3.5         3.4         3.5         6.4         6.4         3.5
40 Bonferroni           11.3        11.0        11.3        20.7        20.7        11.3
820 Bonferroni          14.0        13.6        14.0        25.6        25.6        14.0

Entries marked + are unusually large (boxed and shaded); entries marked - are unusually small (boxed, unshaded).

TABLE 8
A parallel of Table 7 including instead percentage distribution of parental education

                        Graduated   Some Educ.  Graduated   Did Not     I Don't
PUBLIC SCHOOLS          College     After HS    High School Finish HS   Know
Nation                  40          18          25
States
 1 Iowa                 44          21          25          4           5
 2 North Dakota         64+         18          19          3           5
 3 Minnesota            48          21          22          3           7
 4 Maine                40          22          26          6           5
 5 Wisconsin            38          24          28          5           6
 6 New Hampshire        46          17          24          6           7
 7 Nebraska             46          20          24          4           6
 8 Idaho                48          20          19          7           6
 9 Wyoming              42          22          23          5           7
10 Utah                 53+         22          15                      7
11 Connecticut          47          16          22          6           9
12 Colorado             46          19          21          6           7
13 Massachusetts        48          17          21          7           7
14 New Jersey           45          18          23          7           8
15 Pennsylvania         39          19          30          7           5
16 Missouri             36          22          29          8           6
17 Indiana              33          21          32          8           6
18 Ohio                 37          19          32          7           5
19 Oklahoma             39          21          26          8           6
20 Virginia             41          18          24          9           8
21 Michigan             38          23          26          6           7
22 New York             44          18          23          6           10
23 Rhode Island         43          18          22          8           8
24 Arizona              36          22          21          10          12
25 Maryland             44          18          25          6           7
26 Texas                34          18          21          16          11
27 Delaware             39          18          30          6           8
28 Kentucky             28          19          32          15          6
29 California           39          18          17          10          16
30 South Carolina       37          16          31          9           7
31 Florida              39          19          24          8           10
32 Georgia              35          18          30          11          6
33 New Mexico           34          20          26          11          10
34 Tennessee            33          21          29          12          5
35 West Virginia        29          18          33          13          7
36 North Carolina       36          20          27          10          6
37 Hawaii               38          15          25          6           16
38 Arkansas             30          20          31          11          8
39 Alabama              33          18          29          13          7
40 Louisiana            32          20          30          10          7
41 Mississippi          36          16          29          13          7
Other Jurisdictions
42 Guam                 28          13          27          10          22
43 District of Columbia 32          17          29          9           12
44 Virgin Islands       23          11          29          14          24

Error terms for comparisons (max std error of diff / 40 Bonferroni / 820 Bonferroni), one set per column [column assignment unclear]: 2.5 / 8.1 / 10.0; 3.4 / 11.0 / 13.6; 2.1 / 6.8 / 8.4; 1.4 / 4.5 / 5.6; 1.7 / 5.5 / 6.8

Entries marked + are unusually large (boxed and shaded).


we might expect in displaying the results to answer all of the questions about achievement scores phrased earlier. Last, the individual standard errors that were previously housed in the original table have been combined and piled into two tables of standard errors matching Tables 7 and 8. These are available from the author at hwainer@ets.org. I believe that these will be so rarely consulted that it isn't worth using up extra pages here. Future experience will inform this judgment, and I am prepared to change the format if I am wrong.

Thus we have found that separating variables that are only tangentially related into separate tables yields increased comprehensibility. Once the separation is completed, the tables should be structured according to the four rules specified earlier. The questions posed at the beginning of this section, which characterize the most plausible reasons why anyone would want to see these data, are all answered more easily from these revised tables.

What about order? Clearly, if we wish to compare data values on different variables from the same set of states, it is often helpful if those data are ordered in the same way in those different tables. This is currently accomplished by ordering all tables alphabetically. Is this a good idea? I think that there are several alternatives. The most attractive one to me is to order each table as an independent entity, to be looked at and understood on its own. Secondary analyses that require combining information from several tables should be done from a different data source than the table; almost surely it should be some electronic database that would allow easy subsequent manipulations. But if we are to think of the tables as the first available archive, there may be an argument for ordering all tables on a similar topic in the same way, so that various pieces of information about a particular state can be picked out easily. If so, alphabetical ordering is only one possibility among many.
Is it the best one? Alphabetical ordering has only one thing going for it: It makes locating a specific state easier.⁷ Its principal drawback is that it usually obscures the structure that the table was constructed to inform us about. If a set of tables like those that grew out of Table 5 is constructed and ordered by overall performance (instead of alphabetically), we have made finding a particular state a bit more difficult.⁸ I believe that this is a small cost in comparison to the gain in comprehensibility. But even this can be ameliorated through the inclusion of a "locator table." All we need to do is number the jurisdictions in the table sequentially from 1 to 44, as was done in the first column of Tables 7 and 8, and then have a small, alphabetically ordered locator index table (Table 9) that connects alphabetically ordered state names to row numbers in the empirically ordered tables.

Compound Tables

Table 7 is a rectangular array showing a single dependent variable, mean mathematics proficiency, as a function of two independent variables, parents'


TABLE 9
Alphabetically ordered locator index table of the states in Tables 7 and 8, to be used in case of an emergency loss of any particular jurisdiction

State                   Position
Alabama                 39
Arizona                 24
Arkansas                38
California              29
Colorado                12
Connecticut             11
Delaware                27
Florida                 31
Georgia                 32
Hawaii                  37
Idaho                   8
Indiana                 17
Iowa                    1
Kentucky                28
Louisiana               40
Maine                   4
Maryland                25
Massachusetts           13
Michigan                21
Minnesota               3
Mississippi             41
Missouri                16
Nebraska                7
New Hampshire           5
New Jersey              14
New Mexico              33
New York                22
North Carolina          36
North Dakota            2
Ohio                    18
Oklahoma                19
Pennsylvania            15
Rhode Island            23
South Carolina          30
Tennessee               34
Texas                   26
Utah                    10
Virginia                20
West Virginia           35
Wisconsin               6
Wyoming                 9
Other Jurisdictions
District of Columbia    43
Guam                    42
Virgin Islands          44

education and geographic location. As we have seen, when properly designed such a table can be a clear and evocative communicator of information. Unfortunately, clear design is too commonly abandoned in favor of compound tables when multivariate or multilevel data are to be displayed. Such tables are very hard to understand. In fact, in a survey of education policymakers, Hambleton and Slater (1995) found that such compound tables were the most frequently misunderstood of any data display in NAEP executive summaries. In one such display, parallel in structure to Table 10, more than half of the education professionals answered a simple data extraction question incorrectly. To illustrate the negative effects of a compound table, consider Table 10, a somewhat tidied-up version of Table 2.3 from the 1992 NAEP Reading

[Table 10: compound table reporting percentages of students, average reading proficiency, and percentages at each reading proficiency level by region and grade; a somewhat tidied-up version of Table 2.3 from the 1992 NAEP Reading Assessment]

Assessment. This display can be improved by breaking it up into smaller displays. For example, the average proficiencies are best shown as a two-way table by themselves (see Table 11). This table shows quantitatively the modest size of the region effects and the much larger grade effects. There are no large interactions, and so no entries are boxed in. Of course, an even more evocative image could be obtained by subtracting out the grade means and then plotting the residuals. Such a display would make clear just how different the Southeast is. Similar tables containing the percentage of children at each of the NAEP reading levels could also be constructed.

Why is it often easier to understand several simple displays than one compound one? To understand a display involves two distinct phases of perception (Bertin, 1973/1983), which are characterized by two questions: What are the components of data that are being reported? What are the relations among them? The first phase is easy if the horizontal and vertical components are unitary, for example, grade level versus region. It becomes more difficult if they are not, for example, level of proficiency, average proficiency, and percent versus region and grade. The second phase of perception is addressed by the four rules, but it is made more difficult if the first phase is complex. In this report we have suggested that unless the simultaneous presentation of multidimensional information is critical to understanding, comprehension is aided by keeping the components of data simple and presenting tables paired. This was illustrated earlier when we separated the percentage distributions and the achievement scores into separate tables.
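The additive fit used to decide which entries to box (described above for Table 7) can be sketched with the twelve cells of Table 11. This is a minimal sketch under stated assumptions: the fit uses ordinary means rather than a resistant method, and by default the root mean squared residual averages over the (r - 1)(c - 1) residual degrees of freedom; the function names are hypothetical. Margins computed from the twelve cells need not match the printed ones exactly.

```python
import math
from statistics import mean

REGIONS = ["Northeast", "Central", "West", "Southeast"]
GRADES = [4, 8, 12]
# Average reading proficiency from Table 11: one row per region, one column per grade.
TABLE11 = [
    [223, 263, 293],
    [221, 264, 294],
    [215, 260, 292],
    [214, 254, 284],
]


def additive_fit(table):
    """Fit cell = grand mean + row effect + column effect, using means."""
    grand = mean(v for row in table for v in row)
    row_eff = [mean(row) - grand for row in table]
    col_eff = [mean(col) - grand for col in zip(*table)]
    resid = [[v - (grand + r + c) for v, c in zip(row, col_eff)]
             for row, r in zip(table, row_eff)]
    return grand, row_eff, col_eff, resid


def unusual_cells(resid, k=2.0, use_df=True):
    """(row, col) positions whose residual exceeds k times the RMS residual."""
    n_rows, n_cols = len(resid), len(resid[0])
    ss = sum(e * e for row in resid for e in row)
    # Assumption: divide by the (r-1)(c-1) residual degrees of freedom;
    # use_df=False divides by the number of cells instead.
    denom = (n_rows - 1) * (n_cols - 1) if use_df else n_rows * n_cols
    rms = math.sqrt(ss / denom)
    return [(i, j) for i in range(n_rows) for j in range(n_cols)
            if abs(resid[i][j]) > k * rms]


grand, row_eff, col_eff, resid = additive_fit(TABLE11)
print(unusual_cells(resid))                # []
print(unusual_cells(resid, use_df=False))  # [(2, 0)], i.e. West, grade 4
```

With the degrees-of-freedom divisor nothing is flagged, matching the unboxed Table 11. Averaging over all twelve cells instead lowers the threshold to about 2.45, and the West, grade 4 residual of -2.5 just crosses it, so for a table this small the choice of divisor matters.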

Discussion and Conclusions


Just as there are no good or bad tests, neither are there good or poor graphs nor good or poor tables. Their value depends on the uses to which they are put. Some constructions answer the questions one is entitled to ask, and others do not. By making the hierarchy of possible questions explicit, we emphasize the fact that one cannot look at a graph or table as one looks at a painting or a traffic signal. One does not passively read a graph; one queries it. And one must know how to ask useful questions.

What are the questions that can be asked? To some extent we explored this in the second section. In general they are the same questions that would be asked of data from any factorial experiment: What are the row effects? What are the column effects? What are the interactions? How do the rows and columns group as functions of these effects?

The goal of effective display is to ease the viewer's task in answering these questions. We have found that wisely ordering, rounding, summarizing, and spacing go a long way toward accomplishing this. In addition, we must confront the likely use of a table head-on before including various mixtures of variables into it. Adding extra stuff always affects comprehensibility, and we must make the triage decision between saving space by combining two or more tables into one and communicating clearly. It has been our experience that breaking up complex displays sensibly often communicates more efficiently than a large compound table.

TABLE 11
Reformatted version of Table 10 in which only regional proficiencies by grade level are included. It is arranged to emphasize the two-way structure of the data. It is bordered by means and main effects

Average Proficiency
Region          Grade 4   Grade 8   Grade 12   Regional Means   Regional Effects
Northeast       223       263       293        260              4
Central         221       264       294        260              4
West            215       260       292        256              0
Southeast       214       254       284        251              -5
Grade Means     216       260       291        256
Grade Effects   -40       4         35
Measuring Numeracy

Earlierwe showedthatif tablesthatare used as stimuliwithina test item areprepared the questions associatedwiththemareusuallyreduced properly, in difficulty,often dramatically. This does not mean that the practiceof askingsuch questionsoughtto be discontinued, any morethanwe advocate to use constructed tables make to suchquestions less trivial. continuing poorly The test's usefulnessas a learninginstrument wouldbe enhancedif it served as a model for how tables ought to be prepared as well as illustrating the of information available from well tables. depth readily prepared Well prepared tables will also allow us to constructquestionsthatprobe the deepstructure of thedatain a way thatis too difficultwithpoorlyprepared tables.Whatare such questionslike? To answerthis we need a little theory. this theory,we will use the battery life item fromthe 1990 And, to illustrate ScienceAssessmentintroduced earlierandreproduced hereas Table12.This is identicalto Table4, shown earlier,except thatfour unusualentrieshave beenindicated by boxingthemin. A shadedbox witha positivesign indicates a higherthanexpectedentry;an unshaded box with a negativesign means a lower thanexpectedentry. datapresented in a table (1977) calls the abilityto understand Ehrenberg This but we shall use it in "numeracy." termmay have broader application, this narrowcontextfor the nonce. How can we measuresomeone'sproficiency in understanding quantitative thatare presentedin a tabular phenomena way (an individual's numeracy)? to do exactly this; Obviouslythereare NAEP test items writtenthatpurport the items describedearlier are some typical examples. We can do better
24

Tabular Displays Improving TABLE12 Revisionof Table4 with unusualentrieshighlighted Battery Life in Hours Cassette Portable Battery Battery Brands Radio Flashlight Player Computer Averages 16 8 6 NeverDiel 28 + 15

Electro-Blaster26
PowerBat Servo-Cell 24 21 24

15
13 12 13

10
7 4 7

[
F

45 2

14
12 10

Constant Charge19
Usageaverages

10

3+
4

12

with the guidance of a formal theory of graphic communication (Wainer, 1980, 1992). Rudimentsof a Theoryof Numeracy Fundamentalto the measurementof numeracy is the broader issue of what kinds of questions tables can be used to answer. My revisions of Bertin's (1973/1983) three levels of questions are:

"*Elementary-level questions involve data extraction, for example, How "*Intermediate-level questions involve trends seen in parts of the data, for
example, How much longer is a battery likely to last in a radio than in a portable computer? "*Overall-level questions involve the deep structure of the data being presented in their totality, usually comparing trends and seeing groupings, for example, Which two appliances show the same pattern of battery usage?, or, Which brands of batteries show the same pattern of battery life? They are often used in combination; for example, Zabell (1976) referred to their use in the detection of outliers-unusual data points. To accomplish this objective we need a sense of what is usual (i.e., a trend at the intermediate level), and then we look for points that do not conform to this trend (the elementary level). Such questions are hard to answer from a raw table such as Table I but are trivial in Table 11, where such interactions (this time from an additive model) are highlighted. Note that although these levels of questions involve an increasingly broad understanding of the data, they do not necessarily imply an increase in the empirical difficulty of the questions.9 Reading a table at the intermediate level is clearly different from reading a table at the elementary level; a concept of trend requires the notion of 25 long does a Servo-Cell last in a cassette player?

Wainer

butinstead fourdecreasIf thecolumnswerenotfourappliances connectivity. ing levels of parentaleducation(as in Table 5), the idea of an increasing trendsamongdifferentstates trendwould be more meaningful.Comparing likewise requiresan additionalnotion of connectivity,but this time across is charactervariable the dependent (NAEPmathscores).Thisconnectedness costs of mixing ized by a commonvariableand emphasizesthe inferential variablesin the same display. togetherdifferentdependent I hope that this brief introduction conveys a sense of how this formal testsof numeracy, andto understand structure can makeit easierto construct of numeracy we are measuring. Of course,to ask betterwhichcharacteristic dataof sufficientrichnessto support them, questionsat higherlevels requires to showthrough. as well as tablesclearenoughforthequantitative phenomena or overall-levelquestions It is much more difficultto answerintermediatefromTable1 thanfromTable12. It is also easierto see trends,anddeviations from them, with a differentdisplayformataltogether(see Figure3). Once again we see thatthe formatwe choose mustbe baseduponour purposein
[Figure 3: values for five batteries (Never Die, PowerBat, Electro-Blaster, Constant Charge, Servo-Cell) plotted across four appliances (Radio, Flashlight, Cassette Player, Portable Computer); vertical scale 0 to 30.]

FIGURE 3. A graph that emphasizes the large differences in battery life span among possible usages compared to the somewhat smaller differences among batteries.


constructing the display. While elementary-level questions are best answered with a table, intermediate- and overall-level questions may be easier with a graph. However, as we have demonstrated, well prepared tables can be useful at higher levels.

My experience is that test items associated with tables tend to be questions of the first kind, although often they are compounded through the use of tabular complexity. This is not an isolated practice confined to the measurement of numeracy. In the testing of verbal reasoning it is common practice to make a reasoning question more difficult simply by using more arcane vocabulary. This practice stems from the fact that it is almost impossible to write questions that are more difficult than the questioner is smart. When we try to test the upper reaches of reasoning ability, we must find item writers who are more clever still. Of course, when we record a certain level of performance by an examinee on a table-based item, we can only infer a lower bound on someone's numeracy;¹⁰ a better table of the same data ought to make the item easier. Similarly, a more numerate audience makes a table appear more efficacious.
Software

There is an enormous wealth of software available to make tables. I have found that the versatility of spreadsheet programs is especially useful. All tables in this article were prepared using Microsoft's EXCEL™ on a Macintosh computer. Such software allows pretty complete control of fonts, type sizes, borders, and shading. Ordering rows is trivial; ordering columns requires a little work. Transforming data from a spreadsheet table into the graph of your choice is easily accomplished within EXCEL™ for most common graphic forms. For more esoteric formats the data are easily moved into special-purpose programs. The real power of spreadsheets emerges when calculations of some complexity are required. This lifts the spreadsheet from being merely handy to being essential for preparing tables. The identification of unusual data points in a two-way table requires calculating row and column effects and then subtracting them out. Determining which gaps in a univariate data string are likely to be worth emphasizing requires ordering the data, calculating the gaps as well as a vector of inverse logistic weights, and combining and summarizing them. All of these tasks can be done on the fly within a spreadsheet. Specially designed table software does not always measure up in this regard. Before one can use this or any software on NAEP data, one must first extract those data from rather complex NAEP data files and transform them into a format acceptable to a spreadsheet. This formerly onerous task has been eased considerably through the development of some special-purpose software (NAEPEX) that is distributed with the NAEP Secondary-Use Data products. NAEPEX allows the user to define, extract, and analyze subsets of NAEP data in a relatively painless manner. Further details about NAEPEX are contained in its user guide (Rogers, 1995).
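The two spreadsheet calculations described above can also be sketched outside a spreadsheet. The Python below is a minimal sketch under stated assumptions, not the procedure used to build the tables in this article. Row and column effects are estimated by means (a resistant fit such as Tukey's median polish would use medians instead). The "inverse logistic weights" are assumed to be i(n-i)/n, the reciprocal of the expected spacing between the i-th and (i+1)-th order statistics of a standard logistic sample, so that gaps in the tails count for less than gaps of the same size in the middle. The function names and the default cutoff of 2.25 are hypothetical; Wainer and Schacht (1978) describe the original gapping procedure, whose details may differ.

```python
from statistics import mean, median


def additive_fit(table):
    """Fit cell = grand + row_effect + col_effect using means.

    Returns (grand, row_effects, col_effects, residuals); large residuals
    mark cells that the additive model describes poorly.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    row_effects = [mean(row) - grand for row in table]
    col_effects = [mean(table[i][j] for i in range(n_rows)) - grand
                   for j in range(n_cols)]
    # Subtract the fitted row and column effects out of every cell.
    residuals = [[table[i][j] - grand - row_effects[i] - col_effects[j]
                  for j in range(n_cols)]
                 for i in range(n_rows)]
    return grand, row_effects, col_effects, residuals


def standardized_gaps(values):
    """Return (lower, upper, z) for each gap between successive sorted values.

    Assumption: each gap is multiplied by i*(n-i)/n (1-based i), the inverse
    of the expected logistic spacing, and the weighted gaps are divided by
    their median so that a typical gap has z close to 1.
    """
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three values")
    weighted = []
    for i in range(1, n):  # gap between order statistics i and i+1 (1-based)
        gap = x[i] - x[i - 1]
        weighted.append((x[i - 1], x[i], gap * i * (n - i) / n))
    scale = median(w for _, _, w in weighted)
    if scale == 0:
        raise ValueError("median weighted gap is zero (too many ties)")
    return [(lo, hi, w / scale) for lo, hi, w in weighted]


def large_gaps(values, threshold=2.25):
    """Keep gaps whose standardized size exceeds a (hypothetical) cutoff."""
    return [(lo, hi) for lo, hi, z in standardized_gaps(values) if z > threshold]


if __name__ == "__main__":
    print(additive_fit([[1, 2], [3, 10]])[3])  # [[1.5, -1.5], [-1.5, 1.5]]
    print(large_gaps([1, 2, 3, 4, 5, 20]))     # [(5, 20)]
```

Because means are pulled toward an aberrant cell, its departure is shared with the other residuals in its row and column: in the small example all four residuals are ±1.5. The gap from 5 to 20 is flagged even though it sits in the tail, where its weight is smallest.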

Summing Up
Tables are used for many purposes within NAEP: as stimuli in test items, as containers to archive data, and as a communicative medium. Believing that the archival purpose is anachronistic, we focused our attention on rules for building tables to facilitate their efficacy as communicative devices. We found that the same four rules apply to the simplest tables used as stimuli within the assessment and to the most complex tables aimed at scientists. While the rules are objective and as such can be applied through a completely automatic procedure, human judgment and wisdom are still required. Before applying the rules, one must decide on the most likely prospective uses for the data in the table and include only those data that facilitate those uses. Of foremost importance is the notion that we are typically not looking at a table to simply extract a number. To become involved in a problem and to understand it is to shift from extracting individual entries to understanding quantitative phenomena. The construction of efficacious data displays aims to promote this transition, allowing the reader a graceful change from spectator to participant.

Notes

This work was sponsored by the National Center for Education Statistics through Contract Number R999B40013 to the Educational Testing Service, Howard Wainer, Principal Investigator. Although I am pleased to express my gratitude for this support, I must reexpress the usual caveat that all opinions expressed here are those of the author and do not necessarily reflect the views of either NCES or the U.S. Government.
I am delighted to be able to thank Jeremy Finn for his critical and constructive comments on this work as it developed. Of course, he shouldn't be held responsible for what has resulted from his good advice. I would also like to thank Brent Bridgeman, John Mazzeo, Keith Reid-Green, Linda Steinberg, and an unusually helpful associate editor and two sharp-eyed, but anonymous, referees for their thoughtful comments on an earlier draft. Last, my gratitude to John Tukey for his helpful suggestions on the choice of an error term for large tables. This article was abstracted from a considerably longer technical report (Wainer, 1995); readers interested in receiving a copy of that report can request it from Martha Thompson at Educational Testing Service (mthompson@ets.org).

¹ NAEP is a congressionally mandated survey of the educational achievement of American students and of changes in that achievement across time. This survey has been operational for nearly 25 years and utilizes technically sophisticated sampling and assessment methodology. The results of NAEP are made available to both the professional and lay public continuously and are cited with increasing frequency as evidence in public debates about educational topics. NAEP's results are complex, consisting, as they do, of (a) outcomes on achievement tests of complex character on a variety of subjects; (b) attitude and behavioral information from the children, teachers, and others associated with the children's schooling; and (c) detailed demographic information about the children who took the assessment instruments. These data are reported in a variety of ways that vary with the character of the data, their prospective audience, and the purposes of the data.

² One can easily construct pathological exceptions (e.g., a sample mean from a Cauchy distribution), but for most normal situations this is a pretty good general rule.

³ I sometimes hear from colleagues that my ideas about rounding are too radical, and that such extreme rounding would be "OK if we knew that a particular result was final. But our final results may be used by someone else as intermediate in further calculations. Too-early rounding would result in unnecessary propagation of error." Keep in mind that tables are for communication, not archiving. Round the numbers and, if you must, insert a footnote proclaiming that the unrounded details are available from the author. Then sit back and wait for the deluge of requests.

⁴ Of course, teachers' grade books are usually alphabetical and so yield tables like the original. But I suspect many teachers (myself included) now use electronic grade books which are alphabetized for ease of data entry and have a second version for retrieval. This discussion is about retrieval.

⁵ These gaps were determined to be largish through consideration of both their size and their location. A big gap in the tails is not as unlikely as one of similar size in the middle. In this instance we used inverse logistic weights on the gaps to adjust for location (Wainer & Schacht, 1978).

⁶ Choosing the maximum may be too conservative for many users. Two alternatives may be considered. The first is shrinking the maximum inward based upon the stability of the estimates of the standard error. In this instance the standard errors are based on about 30 degrees of freedom. This would suggest some modest shrinkage. If the degrees of freedom were 3 or 300, quite different decisions would be reached. The second alternative is replacing MAX(se) with a more average figure, for example,

$$\sqrt{\sum_{k=1}^{n} \mathrm{se}_k^2 / n}.$$

This second alternative seems especially attractive in this instance, since the distribution of standard errors across states is not too far from the null distribution expected from a chi-square variable with 30 degrees of freedom. The issues surrounding the best choice of error term are a bit afield from our purpose, and so we shall be content to raise them and leave their resolution to other accounts.

⁷ Although not that easy. I have discovered, to my chagrin, that the two-letter state abbreviations do not yield the same alphabetic ordering as the full state names.

⁸ An especially difficult task is finding out that the state you are looking for did not participate in the assessment.

⁹ Although one small empirical study among 3rd-, 4th-, and 5th-grade children (Wainer, 1980) showed that, on average, item difficulty increased with level and graphicacy increased with age.

¹⁰ It is like trying to decide on Mozart's worth as a composer on the basis of a performance of his works by Spike Jones on the washboard.

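The two error terms compared in note 6, and the ordering problem in note 7, are easy to check numerically. The Python below is an illustrative sketch only: the function names and the standard errors used in the example are hypothetical, and just four states are listed. Because a root mean square can never exceed the largest value it averages, replacing MAX(se) with it always gives a less conservative criterion.

```python
from math import sqrt


def max_se(standard_errors):
    """Conservative error term: the largest standard error in the table."""
    return max(standard_errors)


def rms_se(standard_errors):
    """'More average' error term from note 6: sqrt(sum(se_k**2) / n)."""
    n = len(standard_errors)
    return sqrt(sum(se * se for se in standard_errors) / n)


# Note 7: ordering by two-letter postal code differs from ordering by name.
POSTAL_CODES = {"Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR"}


def order_by_name(codes):
    """States sorted alphabetically by full name."""
    return sorted(codes)


def order_by_code(codes):
    """States sorted alphabetically by their postal abbreviation."""
    return sorted(codes, key=codes.get)


if __name__ == "__main__":
    ses = [3.0, 4.0]  # hypothetical standard errors
    print(max_se(ses), round(rms_se(ses), 3))  # 4.0 3.536
    print(order_by_name(POSTAL_CODES))  # ['Alabama', 'Alaska', 'Arizona', 'Arkansas']
    print(order_by_code(POSTAL_CODES))  # ['Alaska', 'Alabama', 'Arkansas', 'Arizona']
```

Even among the first four states alphabetically, sorting by abbreviation swaps both pairs, because AK precedes AL and AR precedes AZ.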
References
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289-300.

Bertin, J. (1983). Semiology of graphics (W. Berg & H. Wainer, Trans.). Madison: University of Wisconsin Press. (Original work published 1973)

Clark, N. (1987). Tables and graphics as a form of exposition. Scholarly Publishing, 10(1), 24-42.

Court Statistics Project. (1976). State court caseload statistics: Annual report, 1976. Williamsburg, VA: National Center for State Courts.

Ehrenberg, A. S. C. (1977). Rudiments of numeracy. Journal of the Royal Statistical Society, Series A, 140, 277-297.

Farquhar, A. B., & Farquhar, H. (1891). Economic and industrial delusions: A discourse of the case for protection. New York: Putnam.

Hambleton, R. K., & Slater, S. C. (1995). Are NAEP executive summary reports understandable to policy-makers and educators? (Research report). Amherst: University of Massachusetts.

Playfair, W. (1786). The commercial and political atlas. London: Corry.

Rogers, A. (1995). NAEPEX: NAEP data extraction program user guide. Princeton, NJ: Educational Testing Service.

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

Wainer, H. (1980). A test of graphicacy in children. Applied Psychological Measurement, 4, 331-340.

Wainer, H. (1992). Understanding graphs and tables. Educational Researcher, 21(1), 14-23.

Wainer, H. (1993). Tabular presentation. Chance, 6, 52-56.

Wainer, H. (1995). A study of display methods for NAEP results: I. Tables (Tech. Rep. No. 95-1). Princeton, NJ: Educational Testing Service.

Wainer, H., & Schacht, S. (1978). Gapping. Psychometrika, 43, 203-212.

Walker, H. M., & Durost, W. N. (1936). Statistical tables: Their structure and use. New York: Bureau of Publications, Teachers College, Columbia University.

Zabell, S. (1976). Arbuthnot, Heberden and the Bills of Mortality (Tech. Rep. No. 40). Chicago: The University of Chicago, Department of Statistics.

Author
HOWARD WAINER is Principal Research Scientist, Educational Testing Service, Princeton, NJ 08541; hwainer@ets.org. He specializes in statistical graphics and psychometrics.

Received March 6, 1995
Revision received August 28, 1995
Accepted October 26, 1995
