Does taking tests help students learn science?

A New York Times article, based upon a recent Purdue study by Jeffrey D. Karpicke and Janell R. Blunt, shows that students who take tests learn science better than students who use study aids such as concept maps and other learning devices.

Purdue study abstract:

Educators rely heavily on learning activities that encourage elaborative studying, while activities that require students to practice retrieving and reconstructing knowledge are used less frequently. Here, we show that practicing retrieval produces greater gains in meaningful learning than elaborative studying with concept mapping. The advantage of retrieval practice generalized across texts identical to those commonly found in science education. The advantage of retrieval practice was observed with test questions that assessed comprehension and required students to make inferences. The advantage of retrieval practice occurred even when the criterial test involved creating concept maps. Our findings support the theory that retrieval practice enhances learning by retrieval-specific mechanisms rather than by elaborative study processes. Retrieval practice is an effective tool to promote conceptual learning about science.

On the surface this appears to be a significant finding that supports testing as a method for education. However, it is important to remember that there are two well-established forms of assessment, formative and summative. Summative results show what a student has learned; formative assessments show what a student needs to learn. Furthermore, these results can be reported as normative or criterion-referenced scores. Normative scores rank all the students taking the test in order of score, while criterion-referenced tests compare scores to an established standard for each construct being measured. A study like this must be very careful to avoid confounding criterion-referenced and normative reports of the results. Just because one group of students does better than another group (a normative result) on an inferential test doesn’t mean they know their science (a criterion-referenced result).

In a related article, Purdue psychology professor Jeffrey Karpicke, the lead investigator for the study that appears today in the journal Science, is quoted: “Our view is that learning is not about studying or getting knowledge ‘in memory.’ Learning is about retrieving. So it is important to make retrieval practice an integral part of the learning process.”

It would make sense then that for tasks related to retrieval, it would be better to learn through taking tests, a kind of formative assessment. Groups such as Kaplan have made small fortunes from this observation. However, is retrieval really the goal? What criterion do we want to reference? Do we want students of science to be exceptional at retrieving information, or do we want them to think like a scientist? It strikes me that this study might say more about being a student than being a scientist or even a student of science. Perhaps it says that practicing tests will make students better at taking tests. Is this the criterion we want to reference?

Science itself makes use of testing to study nature. Observations lead to questions that become studies with tests to answer the questions. Feedback from these tests creates new questions, and the process of learning more about nature continues. If a scientist asks questions about molecules, the tests are conducted with instruments that observe molecules. Since testing and feedback form the basis for science, it would make sense that testing would increase understanding. But what sorts of things do scientists test? Scientists don’t depend upon retrieval when testing hypotheses; they depend upon instrumentation and results. How does the above study improve student performance on hypothesizing and interpreting results?

As a researcher in the School of Education at UC Davis, I find that these kinds of studies bother me. Am I the only one?

6 comments to Does taking tests help students learn science?

  • William Powers

    You clearly know more and have thought more about this than I have.  You raise important questions regarding learning.  When we say our goal is to “learn” science, what do we mean?  Science is taught for more than one reason.  Not all science test takers intend to be, or are being trained to be, scientists.  For such science students, what is the goal?  It need not be the ability to do science.  Art students do not necessarily study art for the purpose of creating art.  This raises the general question of to what extent the study of a subject is intended to train us to produce that which is studied.  To take another example, I often wear two hats when I study physics, one as a physicist and one as a philosopher.  The perspectives of the two are different.  The physicist wants to understand how a particular piece of physics is integrated into what they already know of physics, and then how to apply that physics to novel physics problems.  This task is, at first cut, not much different from that of an auto mechanic.  The philosopher, however, is interested in what goes on largely unsaid.  They will also want to integrate that largely unspoken underpinning into already familiar philosophical traditions and constructs.  In both cases, the study requires an appropriate understanding of the new material in itself and additional integration into a prior understanding.  The philosopher may not be so much interested in the “mechanics” of the new physics as in broader categories.  Yet it seems that in neither case is the objective of the physicist or the philosopher to produce that which they study.  To produce that which is studied would be to produce new theory, whereas the purpose of the physicist is quite often not to do that, but rather to reproduce that which is studied.
    Of course, the philosopher is not so much interested in the physics itself, but rather in using the new physics to study something else, e.g., how physics is done, theory acceptance, or metaphysical presumptions.
    The student and the physicist who studies physics, while not immediately concerned with learning how to create a new physics, will learn something of this in the study of physics.  They will learn something of the kinds of mathematical forms that are generally employed in its application to a physical study of the world.  Still, it ought to be clear from science education that this is definitely not the objective of science teaching.  What is generally presented to the student in science classes at all levels is a mature, developed, and accepted science.  In the process what is consciously left out is the messy procedure of its maturation, mistaken paths, and failures.  It is for this reason that many of us would recommend the study of the history of science for all science students.  The scientist for the most part only encounters the task and travail of science creation after they have been judged scientists.
    Given, then, that the goal of science education is not to train us in doing science, but rather to enable us to reproduce science, that is, to be good mechanics, then outcome-based testing makes sense.  But if the purpose of science teaching is to create scientists, testing would still be valuable, but it would be a different kind of testing.
    I hope that I’ve addressed to some extent the issues you raise here.

  • Scot Sutherland

    You make a great point.  Put another way, there are consumers, practitioners, and luminaries in every field.  Tests given to consumers by practitioners are designed to determine something about the condition of the consumer on some scale of performance.  In the world of practitioners there are tests that help sort out who can do what kind of practice.  Luminaries provide new ideas that move the field forward with frameworks that inform the way tests are made.  Most of us take on multiple roles, spending more time as consumers than practitioners and upon rare occasion taking the role of luminary.  We also play these roles simultaneously in different fields.  A philosopher can be a philosopher of science or of history.  The reporting of these tests is perhaps more important than the test itself because it provides feedback that leads to decisions and actions.
    As I look at the methodology used in this study, it bothers me that tests were given to determine if tests help students learn how to take tests.  A test and a concept map are different forms of interaction with a system of ideas.  One is more flexible and changing while the other is quite constrained.  It strikes me that using an instrument quite different from either form of interaction might provide a fairer picture of the learning that took place: performance-based tests reenacting experiments, project-based tests where the product is evaluated via a rubric, or perhaps an essay or a more formal oral exam.
    I have yet to find the study in a peer-reviewed journal, but I have found a few citations of the article in non-peer-reviewed journals.  I expect that most of my colleagues would push back with the same sorts of questions we have presented.  I find this to be one of the most frustrating things about social science.  I wonder whether articles like this would be reported as important in other scientific fields.

  • William Powers

    Perhaps you need to say more about why “these kinds of studies bother” you, and say it in a language that we can all understand.  Remember we (at least I) are (am) not education experts.  You say that it bothers you that one would use tests to determine if tests help students learn how to take tests.  That does seem to be a problem.  If we answer in the affirmative, then it seems that taking more tests to test whether tests help students learn how to take tests would improve their performance on those tests.
    If I were interested in trying to determine whether there was a statistical advantage to taking more tests, I would create two groups, one in which a final was administered at the end of a semester and that was the only test given, and another in which there were frequent tests throughout the semester and the same final given at the end.  The problem with this kind of classical scheme is that the frequent tests may be doing more than simply helping students learn how to take my tests.  For example, they would also tend to keep the average student more up-to-date in their studies, which would likely improve final scores.  You could probably find a way around this (e.g., sampling the group to see whether they were keeping up).  One might also try having the control group take the same number of tests, but tests considerably different from the final.
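    A two-group comparison like this can be sketched numerically.  Everything in the sketch below is invented purely for illustration — the group sizes, score distributions, and effect size are hypothetical, not data from the Purdue study:

```python
import random
import statistics

random.seed(0)

# Hypothetical final-exam scores (0-100) for two groups of 30 students:
# one group took only the final, the other took frequent tests all semester.
# Means and spreads here are made up for illustration.
final_only = [random.gauss(72, 10) for _ in range(30)]
frequent_tests = [random.gauss(78, 10) for _ in range(30)]

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (mean_b - mean_a) / se

t = welch_t(final_only, frequent_tests)
print(f"final-only mean:    {statistics.mean(final_only):.1f}")
print(f"frequent-test mean: {statistics.mean(frequent_tests):.1f}")
print(f"Welch t statistic:  {t:.2f}")
```

    A large t statistic would suggest the frequent-test group scored higher on the final, but, as noted above, it could not by itself separate the effect of the testing from the confound of simply keeping students up-to-date.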
    Are you sure that it’s the taking of tests to improve a student’s ability to take tests that is of interest here?  I had thought that the researchers were talking about using a certain kind of test to teach the material, so that the student would “learn” the material better if they took these tests.  Then the question becomes, how do you measure this effect?  Well, I guess we have to administer a test.  If it’s the same kind of test that they’ve been giving, then we have to wonder whether the results are being affected by the improved learning of how to take these kinds of tests.
    It seems to me that tests of all kinds help assimilate new material.  They force us to employ and handle what is being learned.  Homework does the same thing.  Ought we to treat homework much differently than tests?  You can grade homework.  The setting is, of course, generally different.  Tests also elaborate our experience with the material, although there are obviously other ways to elaborate the learning experience.
    Well, I could go on and say something about science learning, but I’ll stop.  I’ve babbled enough for now.
    thanks, bill

  • Scot Sutherland

    We must construct tests based upon theories of knowing (epistemic models) in a particular field and it bothers me when these theories are not made explicit in studies like these, especially because in our field we have so many different competing theories.
    So when the claim is made that students learn better by taking tests, we need to know from what theoretical perspective they make this statement.  I don’t find that to be explicitly given in the articles I’ve read.  Psychometrics and learning science are very young in the sense that we have between five and seven competing theoretical perspectives.
    Hopefully that clarifies my concerns a bit better.
    Thanks for asking,

  • Scot Sutherland

    These kinds of studies bother me because 1) they don’t seem to have been vetted by peer review; 2) they fail to acknowledge the difficulty of separating the ability to take a test from knowledge of the construct being tested (validity); and 3) they don’t acknowledge the epistemological assumptions they are making (what it means to know something in or about physics, for instance).
    One study found that the results on one physics test correlated highly with tests of mathematics and showed little correlation with tests of geophysics, whereas the results of another physics test correlated highly with tests on geophysics but not mathematics; a third physics test’s results correlated mildly with both mathematics and geophysics.  Expert physicists did better on the first and third tests than they did on the second.  Cross-correlating tests with contrasting tests that have overlapping underlying concepts reveals something about the validity of the test.  The first and second tests were short answer and the last test was essay.  It is not that any of these tests could not be helpful for learning, but what is being learned is hard to determine because we don’t have direct access to the mind.  All of the comparison tests were multiple choice, raising the question of whether the correlations were due to the type of test or the content being tested.
    Psychometricians use human raters to determine validity (are we measuring the concept we want to measure) and statistics to determine reliability (does this test measure the concept accurately across the entire population).  The raters are instructed to examine validity based upon a series of questions that are based upon a particular theory of knowing.
    We must construct tests based upon theories of knowing (epistemic models) in a particular field and it bothers me when these theories are not made explicit in studies like these.  Peer review of my own work always tends to force me to be explicit about the theoretical assumptions I’m making, especially because in this field we have so many different competing theories.
    So when the claim is made that students learn better by taking tests, we need to know from what theoretical perspective they make this statement.  I don’t find that to be explicitly given in the articles I’ve read.  Psychometrics and learning science are very young in the sense that we have between five and seven competing theoretical perspectives.
    Hopefully that clarifies my concerns a bit better.
    Thanks for asking,

  • David Wallace

    “Do we want students of science to be exceptional at retrieving information, or do we want them to think like a scientist?”
    I would say some of each; plus, a scientist needs to know how and where to look up information that is not committed to memory.  Certain basic information in a discipline needs to be quickly available to the scientist, or else formulating a hypothesis or generating results becomes very slow and tedious.
    Unfortunately some professors find it easier simply to put the class examples on exams rather than expect students to develop answers from the subject’s first principles.  Students who simply memorize the examples tend to do much better than those who actually can solve a problem from scratch.  However, once they get to a work environment they quickly become redundant.


January 2011
