The Disparity of Colorado's ACT Scores
by Ari Armstrong, August 22, 2003
[Author's note: A relatively short article below is followed by a more detailed discussion between Kimberly Swygert and me. I would like to thank Linda Seebach for putting me in touch with Swygert, whose excellent comments add considerably to a good understanding of the ACT test. I also want to thank Ms. Swygert for granting me permission to include her comments here.]
One bit of good news is that juniors taking the ACT in Colorado scored better than they would guessing randomly at all but a handful of schools. The overwhelming majority of students scored considerably better, and many scored quite well indeed.
Here are the results from a few schools: Adams City High School 13.7, Aspen 20.4, Boulder 21.5, Cherry Creek 23.5, George Washington 19.4, North 14.3, and Grand Junction 18.9. The top school in the state (in terms of ACT scores) is D'Evelyn, a rigorous alternative school in Jefferson County, with 25.6.
Scores increased state-wide from 18.8 last year to 19 now. This is a couple of points behind the national average, but the state of Colorado contracts with ACT to provide a test to every junior in the government school system, so a greater percentage of students who aren't as well prepared for the test end up taking it. [More recent scores from a national test date show Colorado students averaged 20.1, compared to a 20.8 national average.]
The ACT consists of four sub-tests: English, mathematics, reading, and science reasoning. The raw score on each of these tests is converted to a scaled score between 1 and 36, with 1 being the lowest and 36 being the highest. The ACT does not deduct points for wrong answers (as does the competing SAT), so it's to a student's advantage to fill in an answer for every question, even if that means guessing.
Thus, we can figure a score a student is likely to get by guessing randomly. Using a slightly different conversion table from an earlier ACT, guessing randomly would (on average) produce the following scores: English 10, math 14, reading 12, and science reasoning 14. This results in a composite score of 12.5. Thus, we should consider a score of (around) 12.5 to be the low mark.
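Using the section sizes and answer-choice counts given later in this discussion (75 English questions with four choices, 60 math with five, and 40 each for reading and science reasoning with four choices), the chance-level figures above can be sketched; the scaled values are the ones the article takes from the 2002 conversion table:

```python
# Expected raw score from random guessing on each ACT section:
# E[correct] = (number of questions) / (number of answer choices).
sections = {          # (questions, answer choices)
    "English": (75, 4),
    "Math": (60, 5),
    "Reading": (40, 4),
    "Science": (40, 4),
}

for name, (questions, choices) in sections.items():
    expected_raw = questions / choices
    print(f"{name}: expected raw score by guessing = {expected_raw:.2f}")

# Scaled scores the article reports for those chance-level raw scores,
# per the 2002 conversion table (table values vary test to test).
scaled = {"English": 10, "Math": 14, "Reading": 12, "Science": 14}

# The composite is the average of the four scaled scores.
composite = sum(scaled.values()) / len(scaled)
print(f"Composite from guessing: {composite}")  # 12.5
```

The scaled values are looked up from a published table, not computed; only the raw expectations and the composite average are arithmetic.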
Obviously, going from a 1 to a 12 is a fairly easy task. You don't have to read or solve any of the questions: you just have to bubble in answers randomly. Going from a 12 to a 24 is considerably more difficult.
So how did students at a few small schools score below what's obtainable by guessing randomly? They must not have finished the test. Perhaps they didn't know random guesses helped their score.
North High School in Denver was "most improved," from 10.6 to 14.3, but the results are little better than random guessing. To earn an even score of 14, a student need answer correctly the first 10 English questions out of 75 and guess on the rest, fill in the entire math section randomly, answer correctly the first 4 reading questions out of 40 and guess on the rest, and fill in the entire science reasoning section randomly. (This will vary a bit from test to test.) Those 14 problems are 6.5% of the 215 total.
How does this differ from, say, D'Evelyn? To score an even 25 on each of the sections, a student would have to solve 50 English questions correctly and guess on the rest, 35 math questions, 23 reading questions, and 25 science reasoning questions. Thus, the student must answer correctly (not counting correct guesses) 133 problems out of the total of 215, or 62% of the total. (The higher a student scores, the less significant is guessing.) A more meaningful comparison between North and D'Evelyn, then, shows students need to work about 6.5% of the test correctly to score a 14, and about 62% of the test for a 25.
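The arithmetic behind the North/D'Evelyn comparison can be checked directly from the per-section counts quoted above:

```python
# Questions a student must actually solve (not counting lucky guesses)
# per section, as given in the article, out of 215 total items.
TOTAL_ITEMS = 215

solved_for_14 = {"English": 10, "Math": 0, "Reading": 4, "Science": 0}
solved_for_25 = {"English": 50, "Math": 35, "Reading": 23, "Science": 25}

for label, solved in [("14", solved_for_14), ("25", solved_for_25)]:
    n = sum(solved.values())
    print(f"Score of {label}: {n} items solved = {n / TOTAL_ITEMS:.1%} of the test")

# Ratio of problems solved: 133 / 14, the "nearly ten times" figure below.
ratio = sum(solved_for_25.values()) / sum(solved_for_14.values())
print(f"Ratio of problems solved: {ratio:.1f}")  # about 9.5
```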
Of course, students at D'Evelyn may be better prepared specifically for the ACT. It is probably the case that students at North actually answered more questions correctly (not counting correct guesses) than indicated, but, because they didn't know the ins and outs of the test, they failed to guess randomly on the rest.
Still, the difference between a 14 and a 25 is dramatic. Assuming the students are guessing randomly on the problems they can't solve, those scoring a 25 are solving nearly ten times as many problems as are those scoring a 14. If we drop the assumption that students guess on the problems they don't solve (i.e., if we assume they use a defective strategy), students scoring a 25 are still solving around 2.5 times as many problems. This disparity is a cause for concern.
[Author's note: In my writing below, I quote from the article above, and I also include some passages that I originally cut. Some passages were modified from the time Swygert saw them to the time I finished the article above, so I include the original language.]
Armstrong: Here are the results from a few schools...
Swygert: Is this a mean composite score across all students in the school? Does this include special education students? Do the population sizes of these schools vary considerably? Those are all things I might want to know before interpreting these scores.
Armstrong: The Rocky Mountain News reports scores increased state-wide from 18.8 last year to 19 now. This is a couple points behind the national average, but the state of Colorado contracts with ACT to provide a test to every junior in the government school system, so a greater percent of students take the test who aren't as well prepared for it.
Swygert: It's a virtual given that some students are not prepared, some are not capable, and some will just not care. I would expect the schools that have the smallest percentage of the senior class planning to enroll in college to have the lowest ACT mean composite scores. The fact that the ACT is required of all juniors is indeed a fact that should be highlighted in any discussion of Colorado's ACT scores.
Armstrong: There is, of course, a correlation between ACT results and the quality of schools. However, it would be a mistake to conclude the schools are primarily responsible for ACT results. The primary indicator of student success is the involvement of the parents. Well-educated parents who spend time with their children and purposely help them develop intellectually will tend to raise better-educated children, regardless of where their children go to school (or even if they go to school). Such parents are more likely to move to the best school districts and be the most involved with their children's schools. That said, of course great schools can compensate for other disadvantages, and bad schools can discourage even the brightest students. And let us not forget free will. Some students fail despite their privilege, while other students succeed even though all the odds are stacked against them.
Swygert: This part seems related to ED Hirsch's criticism of public schools, and the fact that bad public schools widen the gap between rich and poor students. Rich students will do well even if in bad schools, because they receive extra instruction, guidance, and example from home. But poor students are often absolutely reliant on their public schools to be of good quality, because that's where most if not all of their academic guidance is going to come from. I'm glad the author here focuses on involvement of parents rather than on SES, because while SES can be important, parental involvement will improve education within all socio-economic levels.
Armstrong: [T]here are some peculiarities with the ACT that must be understood in order to grasp what the results mean. For instance, the difference between North's 14.3 and Boulder's 21.5 is greater even than what the 7.2-point difference might indicate. While simply dividing the scores indicates the score for North is 67% what it is in Boulder, this figure says nothing about the relative significance of the scores.
Swygert: The basic premise here is correct, but the use of the word "peculiarities" to describe it is not. Most large-scale high-stakes standardized tests use this same format, which is that number-right scores are converted to scaled scores. The ACT scale is one of the smallest that is used, but it is, for example, also true that the difference between a 300 and 400 on the SAT is not the same as the difference between a 500 and a 600, in terms of the number of items that must be answered correctly to achieve the 100-point gain in scaled scores. The conditional SEMs for the SAT are different from those for the ACT, but virtually every large-scale high-stakes test is scaled in order to adjust for varying difficulties of the same test across different administrations, and so it is in no way "peculiar" that the ACT converts raw scores to scaled scores, or that the number-right needed for a one-point gain in ACT scaled score differs across the scaled score range. However, it is indeed crucial to be sure that readers understand that NO scaled scores can simply be divided into one another in order to compare schools.
Armstrong Replies: I wasn't trying to use the word "peculiarities" with a negative connotation. Of course various tests convert "raw" scores to a scaled score, but the ACT's scale is (to my knowledge) unique. And, relative to the older SAT, the ACT is significantly different in that it doesn't deduct a partial point for wrong answers.
Armstrong: [T]he ACT does not deduct points for wrong answers (as does the competing SAT), so it's to a student's advantage to fill in every blank, even if that means guessing.
Swygert: True, and useful to point this out.
Armstrong: I called ACT and spoke with several people, but nobody there was willing or able to give me the score conversion sheet for the particular test in question. However, the conversion tables are similar from test to test, so I looked up one for a test from 2002 (coded 0255C).
Swygert: Not surprising, because the conversion methods and the conversion tables for recent tests are going to be closely guarded. My guess is that information would not be released without threat of lawsuit. However, once conversion tables are published, they are fair game, and so the author here is correct to go look at old ones. This doesn't mean that the current tables are the same, so that caveat must be added.
Armstrong: Using this conversion table, guessing randomly would (on average) produce the following scores: English 10, math 14, reading 12, and science reasoning 14. This results in a composite score of 12.5. Thus, we should consider a score of (around) 12.5 to be the low mark.
Swygert: I don't know if I'd call it a low mark; I'd call it performing at chance level. From a layperson's point of view, obviously, it's a low score, but test items are often written with attractive distracters, so that an examinee who is not guessing but is in fact using bad judgment to solve all the items could go for the distracters and end up with a score even lower than chance level. However, I think it's probably okay here to call performing at chance level the "low mark". Also, it would help things later on if you would provide here the number of items in each section, and also remind the reader of whether the ACT items have four or five options, so that they can do the math for calculating chance performance on X number of items themselves.
Armstrong Replies: The English test consists of 75 questions, and each question lists four answer choices. The math test consists of 60 questions, and each question lists five answer choices. Both the reading and science reasoning tests consist of 40 questions, and each question lists four answer choices. The "raw" scores on each section are converted to a scaled score with a key specific to the test.
You're correct that the concept of "attractive distracters" is an important one for understanding how these tests work. I was treating the "ideal" score possible when students use optimal strategies. Of course, many students don't do that, which is why I added the caveat at the end. This does further illustrate the importance of specific training for the test -- students who are aware of "attractive distracters" will be better able to avoid them.
Armstrong: North High School in Denver was "most improved," from 10.6 to 14.3, but this seems not terribly impressive given the results are little better than random guessing...
Swygert: This is a good point, but if we believe that a score above 12 is meaningful, meaning it reflects responses not due to chance, we do have to conclude that a score increase is a score increase, wherever achieved. So it's not as big a percentile jump to go from 10 to 14, but it can still be a meaningful increase, and not necessarily due just to chance.
Armstrong: So how did students at a few (very small) schools score below what's obtainable by guessing randomly? They must not have finished the test. This might have been because they had no interest in doing a good job on it, or perhaps they didn't know random guesses helped their score.
Swygert: Or they were trying to solve items that had attractive distracters. They could indeed have run out of time, or they could have indeed not known that to fill in all remaining bubbles very quickly could only help them.
Armstrong: While an ACT score is roughly related to a student's academic ability, it would be a mistake to imagine too strong a link. Surely some students hated the experience and just doodled, and some of these students might be quite book smart. The students at the worst-scoring schools have had the toughest knocks in life. The ACT tests only developed ability, not latent ability. Finally, the ACT tests only a very narrow range of skills, and students who don't enjoy studying are often great at other things.
Swygert: These four criticisms don't seem to be related to one another and aren't really relevant. I'll address each one in turn:
1. "Surely some students hated the experience and just doodled, and some of these students might be quite book smart." If students who don't wish to take the ACT are forced to take it and waste time by bubbling in at random or doodling, that does indeed indicate that their particular scores are not valid, but it does not negate the assumption that, had they tried their best, their ACT score would have indeed been a good indicator of their academic achievement. Simply because scores can be invalidated in this way does not mean the test itself is invalid.
2. "The students at the worst-scoring schools have had the toughest knocks in life." I'm not sure what that has to do with anything. If having a hard life means one has not been able to learn basic math (because one attended a school with a bad teacher, or had no money for books, or whatever), the ACT still measures whether or not one knows basic math. We can argue about *why* a kid might not have learned basic math, or whether a low score should be interpreted in light of life situations, but that doesn't mean the ACT isn't actually measuring basic math knowledge. We need to separate out here whether or not ACT score is related to academic ability (which it is) from whether a kid's academic ability score should be judged independently of their situation (which it might not be).
3. "The ACT tests only developed ability, not latent ability." Yes, that's the stated purpose of the test. It's unclear why this means that the ACT is not strongly linked to *academic* ability (and how is that being defined?). Actually, the test is measuring some amount of both developed and latent abilities. My guess is that a kid with natural intelligence who didn't try all that hard and a kid who isn't naturally bright but worked a great deal are both going to get high ACT scores, and it's up to college admissions officers to tease out latent from accomplished and decide which is more important for college performance.
4. "The ACT tests only a very narrow range of skills, and students who don't enjoy studying are often great at other things" Yes, but what the author is concerned about is whether the ACT is strongly related to academic abilities, and what the ACT measures are basic academic skills. Yes, someone who tanks on the ACT might be a great musician or painter or creative person, but does that mean they should be admitted to college? The ACT is meant to predict first year college grades, and no matter what a kid is good at, if they don't have core reading and mathematics skills down, we could argue that they are not academically developed, and might not do well in college.
Armstrong Replies: I deleted this whole matter from the abbreviated version.
Here's why I included it originally. It seems clear the ACT crew is not very keen on publicizing the fact that a "guess" score is around a 12. Similarly, on the SAT, the absolute lowest score possible is a 400 -- and this is true even if a student gets a negative raw score because of "attractive distracters."
Why do both tests in some sense give the worst-performing students an inflated scaled score? After all, in school the most common grading method is simply to note the percent correct out of the total.
In trying to give readers an accurate view of what the scores actually mean, I thought it important to note that too much can be made of these tests. Yes, it's a useful test with *some* correlation to college success, but on the other hand it doesn't say much about the person taking the test. Just as the ACT and SAT in some sense protect low-performing students with inflated scores, I suppose I was also trying to offer an apology for those students.
Armstrong: We've seen that, to score a 14, a student need only answer correctly (not counting correct guesses) 14 problems, out of a total of 215, which translates to 6.5% of the total. How does this differ from, say, D'Evelyn? Let's round down and shoot for an even score of 25 on each of the sections. A student would have to solve 50 English questions correctly and guess on the rest, 35 math questions, 23 reading questions, and 25 science reasoning questions. Thus, the student must answer correctly (not counting correct guesses) 133 problems out of the total of 215, or 62% of the total. (The higher a student scores, the less significant is guessing.) A more meaningful comparison between North and D'Evelyn, then, shows students need to work about 6.5% of the test correctly to score a 14, and about 62% of the test for a 25.
Swygert: The wording in this part is confusing. It would be better to say that to get a 14, a student need only know how to solve 14 of the total 215 items on the exam, 10 from English and 4 from reading, and then be able to guess randomly on all remaining items. This would bring their total number of items correct up to (insert number here) out of 215. On the other hand, to get a 25, a student would need to know how to solve 133 items, and then guess randomly on all remaining, for a total of (insert number here) out of 215. Your comparison of 14 items to 133 is indeed the correct way to go about comparing these scores, though.
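The expected totals Swygert describes (items solved plus lucky guesses) can be sketched from the question and answer-choice counts given elsewhere in this discussion; these are expected values under random guessing, not actual reported figures:

```python
# Question counts and answer-choice counts per ACT section,
# as stated earlier in this discussion.
sections = {          # (questions, answer choices)
    "English": (75, 4),
    "Math": (60, 5),
    "Reading": (40, 4),
    "Science": (40, 4),
}

def expected_total(solved):
    """Expected total items correct when a student solves the given
    number of items per section and guesses randomly on the rest."""
    total = 0.0
    for name, (questions, choices) in sections.items():
        remaining = questions - solved[name]
        total += solved[name] + remaining / choices
    return total

score_14 = {"English": 10, "Math": 0, "Reading": 4, "Science": 0}
score_25 = {"English": 50, "Math": 35, "Reading": 23, "Science": 25}

print(expected_total(score_14), "of 215")  # 61.25 expected
print(expected_total(score_25), "of 215")  # 152.25 expected
```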
Armstrong: There is at least one important caveat. Probably students at D'Evelyn are better prepared specifically for the ACT. It is probably the case that students at North actually answered more questions correctly (not counting correct guesses) than indicated, but, because they didn't know the ins and outs of the test, they failed to guess randomly on the rest. On the other hand, it is also true that some questions are much harder than others, so the more problems a student solves, the harder the rest of the test becomes (assuming a wise strategy).
Swygert: I'm not sure if "knowing the ins and outs of the test" has as much to do with it as the fact that guessing at random is not as easy as it sounds. Chances are the students at least glanced at the items they did not know and tended to guess answers that looked right to them, which could have been attractive distracters. But even if you know that answering randomly might help you, that doesn't mean that it's easy to answer in a truly random fashion.
Armstrong Replies: First, "knowing the ins and outs of the test" includes knowing about "attractive distracters" -- though I didn't make that point explicitly. My experience is that the major error students make is to rush through the test (either the ACT or SAT) and attempt to answer too many questions. Students who follow my advice avoid or reduce this error, and thus tend to avoid distracters.
Second, it is true that guessing randomly will produce a variety of scores -- some students will be especially lucky and some students will be especially unlucky. This is one of those points I decided to leave out. However, the scores will conform to a bell curve, and they will average out for a group.
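The point that individual guessers vary while a group averages out can be illustrated with a quick simulation over the ACT's 215 items (section sizes and choice counts as given earlier in the discussion):

```python
import random

# Simulate pure random guessing: 75 English, 40 reading, and 40 science
# questions with 4 choices each, plus 60 math questions with 5 choices.
# Individual raw totals vary, but the group mean sits near the expected
# value of 75/4 + 60/5 + 40/4 + 40/4 = 50.75 raw items correct.
random.seed(0)  # fixed seed so the run is repeatable

def guess_raw_total():
    correct = 0
    for questions, choices in [(75, 4), (60, 5), (40, 4), (40, 4)]:
        correct += sum(1 for _ in range(questions)
                       if random.randrange(choices) == 0)
    return correct

scores = [guess_raw_total() for _ in range(10_000)]
mean = sum(scores) / len(scores)
print(f"Mean raw total from guessing: {mean:.2f}")  # near 50.75
```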
Swygert: It's also completely unclear to me what "it is also true that some questions are much harder than others, so the more problems a student solves, the harder the rest of the test becomes" actually means. Do you mean that students who work on items and solve them correctly are more likely to run out of time for subsequent items?
Armstrong Replies: What I mean is that a wise test-taking strategy implies the student begins with the easiest problems on the test and works up to the most difficult problems. This strategy can be applied (in different ways) to every section on the ACT and SAT. Of course, the strategy cannot be perfectly applied, as it's sometimes difficult to distinguish between easy and difficult problems at the outset.
Armstrong: Still, the difference between a 14 and a 25 is dramatic...
Swygert: This is simply another way of explaining the percentiles. We know that 14 (8th %) is very, very different from 25 (82nd %). What more evidence do we need than that?
Armstrong Replies: The percentiles alone don't quite paint the picture, as they are strictly relative. It is not immediately obvious how great a span a difference in percentile indicates. If all students are very close academically, a small difference in ability will result in a huge difference in percentiles.
Armstrong: And I think that disparity should bother all of us.
Swygert: True. This does indicate some extreme inter-school variability in ACT performance, which can be explained by many things, but it is likely that some schools are not teaching kids the core academic skills that they need to perform well on this test.