What are test techniques? Quite simply they are means of eliciting behavior from candidates which will tell us about their language abilities. What we need are techniques which:
1. will elicit behavior which is a reliable and valid indicator of the ability in which we are interested;
2. will elicit behavior which can be reliably scored;
3. are as economical of time and effort as possible;
4. will have a beneficial backwash effect.
MULTIPLE CHOICE
Multiple choice items take many forms, but their basic structure is as follows.
There is a stem:
Enid has been here ________ half an hour.
and a number of options, one of which is correct, the others being distractors:
A. during
B. for
C. while
D. since
It is the candidate’s task to identify the correct or most appropriate option (in this case B).
Perhaps the most obvious advantage of multiple choice, referred to earlier in the book, is that scoring can be perfectly reliable. Scoring should also be rapid and economical. A further considerable advantage is that, since in order to respond the candidate has only to make a mark on the paper, it is possible to include more items than would otherwise be possible in a given period of time.
The advantages of the multiple choice technique were so highly regarded at one time that it almost seemed that it was the only way to test. While many laymen have always been skeptical of what could be achieved through multiple choice testing, it is only fairly recently that the technique’s limitations have been more generally recognized by professional testers. The difficulties with multiple choice are as follows.
THE TECHNIQUE TESTS ONLY RECOGNITION KNOWLEDGE
If there is a lack of fit between at least some candidates’ productive and receptive skills, then performance on a multiple choice test may give a quite inaccurate picture of those candidates’ ability. A multiple choice grammar test score, for example, may be a poor indicator of someone’s ability to use grammatical structures. The person who can identify the correct response in the item above may not be able to produce the correct form when speaking or writing. This is in part a question of construct validity; whether or not grammatical knowledge of the kind that can be demonstrated in a multiple choice test underlies the productive use of grammar. Even if it does, there is still a gap to be bridged between knowledge and use; if use is what we are interested in, that gap will mean that test scores are at best giving incomplete information.
GUESSING MAY HAVE A CONSIDERABLE BUT UNKNOWABLE EFFECT ON TEST SCORES
The chance of guessing the correct answer in a three-option multiple choice item is one in three, or roughly thirty-three per cent. On average we would expect someone to score 33 on a 100-item test purely by guesswork. We would expect some people to score fewer than that by guessing, others to score more. The trouble is that we can never know what part of any particular individual’s score has come about through guessing. Attempts are sometimes made to estimate the contribution of guessing by assuming that all incorrect responses are the result of guessing, and by further assuming that the individual has had average luck in guessing. Scores are then reduced by the number of points the individual is estimated to have obtained by guessing. However, neither assumption is necessarily correct, and we cannot know that the revised score is the same as (or very close to) the one an individual would have obtained without guessing. While other testing methods may also involve guessing, we would normally expect the effect to be much less, since candidates will usually not have a restricted number of responses presented to them (with the information that one of them is correct).
THE TECHNIQUE SEVERELY RESTRICTS WHAT CAN BE TESTED
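The correction described above can be sketched in a few lines of Python. This is only an illustration of the usual "rights minus a fraction of wrongs" adjustment under the two assumptions just named (every wrong answer was a blind guess, and luck was average); the function names are hypothetical and are not taken from any testing package.

```python
def expected_chance_score(num_items: int, num_options: int) -> float:
    """Average score expected from blind guessing on every item."""
    return num_items / num_options


def corrected_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Score after subtracting the points estimated to come from guessing.

    Assumption: every wrong answer was a blind guess made with average luck,
    so each (num_options - 1) wrong guesses imply one lucky correct guess.
    Omitted items are neither right nor wrong and are not penalized.
    """
    if num_options < 2:
        raise ValueError("an item needs at least two options")
    return num_right - num_wrong / (num_options - 1)


# A candidate with 60 right and 40 wrong on a 100-item, three-option test
print(round(expected_chance_score(100, 3), 1))  # chance-level score
print(corrected_score(60, 40, 3))               # score after the correction
```

On these figures the candidate's 60 is reduced to 40. The reduction is only as trustworthy as the two assumptions behind it, which is exactly the weakness noted in the text.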
The basic problem here is that multiple choice items require distractors, and distractors are not always available. In a grammar test, it may not be possible to find three or four plausible alternatives to the correct structure. The result is that command of what may be an important structure is simply not tested. An example would be the distinction in English between the past tense and the present perfect. Certainly for learners at a certain level of ability, in a given linguistic context, there are no other alternatives that are likely to distract. The argument that this must be a difficulty for any item that attempts to test for this distinction is difficult to sustain, since other items that do not overtly present a choice may elicit the candidate’s usual behavior, without the candidate resorting to guessing.
IT IS VERY DIFFICULT TO WRITE SUCCESSFUL ITEMS
A further problem with multiple choice is that, even where items are possible, good ones are extremely difficult to write. Professional test writers reckon to have to write many more items than they actually need for a test, and it is only after pre-testing and statistical analysis of performance on the items that they can recognize the ones that are usable. It is my experience that multiple choice tests that are produced for use within institutions are often shot through with faults. Common amongst these are: more than one correct answer; no correct answer; clues in the options as to which is correct (for example, the correct option may be different in length from the others); ineffective distractors. The amount of work and expertise needed to prepare good multiple choice tests is so great that, even if one ignored other problems associated with the technique, one would not wish to recommend it for regular achievement testing (where the same test is not used repeatedly) within institutions.
BACKWASH MAY BE HARMFUL
It should hardly be necessary to point out that where a test which is important to students is multiple choice in nature, there is a danger that practice for the test will have a harmful effect on learning and teaching. Practice at multiple choice items (especially when, as happens, as much attention is paid to improving one’s educated guessing as to the content of the items) will not usually be the best way for students to improve their command of a language.
CHEATING MAY BE FACILITATED
The fact that the responses on a multiple choice test (a, b, c, d) are so simple makes them easy to communicate to other candidates nonverbally. Some defense against this is to have at least two versions of the test, the only difference between them being the order in which the options are presented.
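Producing versions that differ only in option order is easy to automate. The following is a minimal sketch, assuming each item is held as a list of option texts plus the text of the correct option; `make_version` and its parameters are hypothetical names chosen for illustration.

```python
import random
import string


def make_version(options: list[str], correct: str, seed: int) -> tuple[list[str], str]:
    """Return the options in a shuffled order and the letter of the correct one."""
    rng = random.Random(seed)   # seeded so that each version can be reproduced
    shuffled = options[:]       # copy, so the master item is left untouched
    rng.shuffle(shuffled)
    key = string.ascii_uppercase[shuffled.index(correct)]
    return shuffled, key


# The item from the start of the chapter, in two versions
options = ["during", "for", "while", "since"]
for seed in (1, 2):
    shuffled, key = make_version(options, "for", seed)
    lines = [f"{letter}. {text}" for letter, text in zip(string.ascii_uppercase, shuffled)]
    print("\n".join(lines), f"(key: {key})", sep="\n")
```

Two different seeds can occasionally yield the same order for an item, so the resulting versions should be compared before use, and a separate answer key must be kept for each version.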
All in all, the multiple choice technique is best suited to relatively infrequent testing of large numbers of candidates. This is not to say that there should be no multiple choice items in tests produced regularly within institutions. In setting a reading comprehension test, for example, there may be certain tasks that lend themselves very readily to the multiple choice format, with obvious distractors presenting themselves in the text. There are real-life tasks (say, a shop assistant identifying which one of four dresses a customer is describing) which are essentially multiple choice. The simulation in a test of such a situation would seem to be perfectly appropriate. What the reader is being urged to avoid is the excessive, indiscriminate, and potentially harmful use of the technique.
CLOZE, C-TEST, AND DICTATION: MEASURING OVERALL ABILITY
The three techniques that are to be discussed in the remainder of this chapter have in common the fact that they seem to offer economical ways of measuring overall ability in a language. The cloze technique has in addition been recommended as a means of measuring reading ability.
One way of measuring overall ability would of course be to measure a variety of separate abilities and then to combine scores. This would hardly be economical if we simply wanted to use test results for making decisions which were not of critical importance.
VARIETIES OF CLOZE PROCEDURE
Cloze test
In its original form, the cloze procedure involves deleting a number of words in a passage, leaving blanks, and requiring the person taking the test to attempt to replace the original words. After a short unmutilated ‘lead-in’, it is usually about every seventh word which is deleted. The following example, which the reader might wish to attempt, was used in research into cloze in the United States (put only one word in each space).
Example:
What is a college?
Confusion exists concerning the real purposes, aims, and goals of a college. What are these? What should a college be?
Some believe that the chief function 1. _______ even a liberal arts college is 2. ______ vocational one. I feel that the 3. _________ function of a college, while important, 4. ________ nonetheless secondary. Others profess that the 5. _________ purpose of a college is to 6. _________ paragons of moral, mental, and spiritual 7. ________ – Bernard McFaddens with halos. If they 8. __________ that the college should include students 9. ________ the highest moral, ethical, and religious 10. _______ by precept and example, I 11. _______ willing to accept the thesis. …
I 12. ____________ in attention to both social amenities 13. __________ regulations, but I prefer to see 14. ___________ colleges get down to more basic 15. ___________ and ethical considerations instead of standing in loco parentis 16. ___________ four years when 17. ___________ student is attempting in his youthful 18. ___________ awkward ways, to grow up. It 19. ___________ been said that it was not 20. ___________ duty to prolong adolescences. …
(Oller and Conrad 1971)
Some of the blanks you will have completed with confidence and ease. Others, even if you are a native speaker of English, you will have found difficult, perhaps impossible. In some cases you may have supplied a word which, though different from the original, you may think just as good or even better.
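The mechanical part of the procedure, deleting every seventh word after an intact lead-in, can be sketched as follows. This is an illustrative Python sketch only: `make_cloze` is a hypothetical helper, the blank width is arbitrary, and only trailing punctuation is kept outside the blank.

```python
def make_cloze(lead_in: str, passage: str, every: int = 7) -> tuple[str, list[str]]:
    """Blank out every `every`-th word of `passage`, leaving `lead_in` intact.

    Returns the test text and the list of deleted words (the answer key).
    """
    words = passage.split()
    answers: list[str] = []
    for i in range(every - 1, len(words), every):
        core = words[i].rstrip(".,;:!?")   # keep trailing punctuation visible
        trailing = words[i][len(core):]
        if not core:                       # token was punctuation only
            continue
        answers.append(core)
        words[i] = f"{len(answers)}. ________{trailing}"
    return f"{lead_in} {' '.join(words)}".strip(), answers


text, key = make_cloze(
    "Confusion exists concerning the real purposes of a college.",
    "Some believe that the chief function of even a liberal arts college "
    "is a vocational one.",
)
print(text)
print(key)
```

Keeping the deleted words as a key supports exact-word scoring; deciding whether a different word that fits the gap should also be accepted remains a matter for the scorer, not the script.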
The cloze procedure seemed very attractive. Cloze tests were easy to construct, administer and score. Reports of early research seemed to suggest that it mattered little which passage was chosen or which words were deleted; the result would be a reliable and valid test of candidates’ underlying language abilities.
Unfortunately, cloze could not deliver all that was promised on its behalf. For one thing, as we saw above, even if some underlying ability is being measured through the procedure, it is not possible to predict accurately from this what people’s ability is with respect to the variety of separate skills (speaking, writing, etc.) in which we are usually interested. Further, it turned out that different passages gave different results, as did the deletion of different sets of words in the same passage. Another matter for concern was the fact that intelligent and educated native speakers varied quite considerably in their ability to predict the missing words. What is more, some of them did less well than many non-native speakers. The validity of the procedure, even as a very general measure of overall ability, was thus brought into question.
The C-test
The C-test is a variety of cloze. Instead of whole words, it is the second half of every second word which is deleted.
Example:
There are usually five men in the crew of a fire engine. One o_____ them dri_____ the eng_____. The lea_____ sits bes_____ the dri_____. The ot_____ firemen s_____ inside t_____ cab o_____ the f_____ engine. T_____ leader h_____ usually be_____ in t_____ fight diff_____ sorts o_____ fires. S_____, when t_____ firemen arr_____ at a fire, it is always the leader who decides how to fight a fire. He tells each fireman what to do.
The advantages of the C-test over the cloze test are that only exact scoring is necessary and that shorter passages are possible. Compared with the cloze test, it also takes little space and not so much time to complete.
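The deletion rule behind the C-test example can be sketched in Python. The details below are assumptions made for illustration rather than a fixed standard: `make_c_test` is a hypothetical helper, one-letter words are left intact and not counted, and for words with an odd number of letters the larger half is deleted (so "other" becomes "ot"). The intact opening sentence is expected to be added by the caller.

```python
import re


def make_c_test(passage: str, blank: str = "_____") -> tuple[str, list[str]]:
    """Delete the second half of every second word in `passage`.

    Returns the mutilated text and the list of full words (the answer key).
    """
    out: list[str] = []
    answers: list[str] = []
    count = 0
    for token in passage.split():
        match = re.match(r"^([A-Za-z]+)(.*)$", token)
        if not match or len(match.group(1)) < 2:
            out.append(token)            # numbers, one-letter words: untouched
            continue
        word, rest = match.group(1), match.group(2)
        count += 1
        if count % 2 == 0:               # every second countable word
            keep = len(word) // 2        # assumption: keep the smaller half
            answers.append(word)
            out.append(word[:keep] + blank + rest)
        else:
            out.append(token)
    return " ".join(out), answers


text, key = make_c_test("One of them drives the engine.")
print(text)  # One o_____ them dri_____ the eng_____.
print(key)   # ['of', 'drives', 'engine']
```

Because each gap retains the first part of the word, there is normally only one acceptable completion, which is what makes exact scoring workable.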
Dictation
Dictation tests give results similar to those obtained from cloze tests. In predicting overall ability they have the advantage of involving listening ability. Certainly they are easy to create and administer. But they are certainly not easy to score. Because of this scoring problem, partial dictation is often used. In this, part of what is dictated is already printed on the candidate’s answer sheet. The candidate has simply to fill in the gaps.
Like cloze, dictation is a useful technique where estimates of overall ability are needed. When administering it, it is usual to begin by reading the entire passage straight through. Then the passage is read out sentence by sentence, not too slowly, one after the other. Enough time should be given to the candidates to write down what they have heard.