Whether we realize it or not, we test every day in virtually every cognitive effort we make. When we read a book, listen to the news on TV, or prepare a meal, we are testing hypotheses and making judgments. Anytime we “try” something (a new recipe, a different tennis racquet, a new pair of shoes) we are testing. We are formulating a judgment about something on the basis of a sample of behavior. The foreign language learner is testing his newly acquired forms of language almost every time he speaks. He devises hypotheses about how the language forms are structured and how certain functions are expressed in forms. On the basis of the feedback he receives, he makes judgments and decisions. Language teachers also test, informally and intuitively, in every contact with learners. As a learner speaks or writes or indicates either aural or reading comprehension, the teacher makes a judgment about the performance and from that judgment infers certain competence on the part of the learner. Classroom-oriented informal testing is an everyday and very common activity in which teachers engage almost intuitively.
1.1   TESTING AND TEACHING
A language test which seeks to find out what candidates can do with language provides a focus for purposeful, everyday communication activities. Such a test will have a more useful effect on the learning of a particular language than a mechanical test of structure. In the past, even good tests of grammar, translation or language manipulation had a negative and even harmful effect on teaching. A good communicative test of language, however, should have a much more positive effect on learning and teaching and should generally result in improved learning habits.
1.2        WHAT IS A TEST?
A test, in plain words, is a method of measuring a person’s ability or knowledge in a given domain. The definition captures the essential components of a test. A test is first a method. It is a set of techniques, procedures, and items that constitute an instrument of some sort that requires performance or activity on the part of the test-taker (and sometimes on the part of the tester as well). The method may be intuitive and informal, as in the case of a holistic impression of someone’s authenticity of pronunciation. Or it may be quite explicit and structured, as in a multiple-choice technique in which correct responses have already been specified by some “objective” means.
Next, a test has the purpose of measuring. Some measurements are rather broad and inexact, while others are quantified in mathematically precise terms. The difference between formal and informal assessment exists to a great degree in the nature of the quantification of data.
A test measures a person’s ability or knowledge. Care must be taken in any test to understand who the test-takers are. What is their previous experience and background? Is the test appropriate for them? How are scores to be interpreted for individuals?
Finally, a test measures a given domain. In the case of a proficiency test, even though the actual performance on the test involves only a sampling of skills, that domain is overall proficiency in a language. Other tests may have more specific criteria. A test of pronunciation might well be a test only of a particular phonemic minimal pair in a language. One of the biggest obstacles to overcome in constructing adequate tests is to measure the desired criterion and not inadvertently include other factors.
1.3        WHY TEST?
The function indicated in the preceding paragraph provides one of the answers to the question: why test? But it must be emphasized that the evaluation of student performance for purposes of comparison or selection is only one of the functions of a test. Furthermore, a good classroom test will also help to locate the precise areas of difficulty encountered by the class or by the individual student. Just as it is necessary for a doctor first to diagnose the patient’s illness, so it is equally necessary for the teacher to diagnose the student’s weaknesses and difficulties. Unless the teacher is able to identify and analyze the errors a student makes in handling the target language, he or she will be in no position to give the student appropriate help.
The test should also enable the teacher to ascertain which parts of the language program have been found difficult by the class. In this way the teacher can evaluate the effectiveness of the syllabus as well as the methods and materials he or she is using. The test results may indicate, for example, certain areas of the language syllabus which have not taken sufficient account of foreign learner difficulties or which, for some reason, have been glossed over. A test which sets out to measure students’ performance as fairly as possible without in any way setting traps for them can be effectively used to motivate them.
1.4        WHAT SHOULD BE TESTED AND TO WHAT STANDARD?
Before a test is constructed, it is important to question the standards which are being set. What standards should be demanded of learners of a foreign language? For example, should foreign language learners after a certain number of months or years be expected to communicate with the same ease and fluency as native speakers?
Examinations in the written language have in the past set artificial standards even for native speakers and have often demanded skills similar to those acquired by the great English essayists and critics. In imitating first language examinations, foreign language examinations have proved far more unrealistic in their expectations of the performances of foreign learners, who have been required to rewrite some of the greatest literary masterpieces in their own words or to write original essays in language beyond their capacity.
1.5        TESTING THE LANGUAGE SKILLS
Four major skills in communicating through language are often broadly defined as listening, speaking, reading and writing. In many situations where English is taught, these skills should be used to perform as many genuinely communicative tasks as possible. Where this is the case, it is important for the test writer to concentrate on those types of test items which appear directly relevant to the ability to use language for real-life communication, especially in oral interaction. Thus, questions which test the ability to understand and respond appropriately to polite requests, advice, instructions, etc. would be preferred to tests of reading aloud or telling stories. In the written section of a test, questions requiring students to write letters, memos, reports and messages would be used in place of many of the more traditional compositions used in the past. In listening and reading tests, questions in which students show their ability to extract specific information of a practical nature would be preferred to questions testing the comprehension of unimportant and irrelevant details.
            Ways of assessing performance in the four major skills may take the form of tests of:
          listening (auditory) comprehension, in which short utterances, dialogues, talks and lectures are given to the testees;
          speaking ability, usually in the form of an interview, a picture description, role play, or a problem-solving task involving pair work or group work;
          reading comprehension, in which questions are set to test the students’ ability to understand the gist of a text and to extract key information on specific points in the text; and
          writing ability, usually in the form of letters, reports, memos, messages, instructions, and accounts of past events, etc.
     It is the test constructor’s task to assess the relative importance of these skills at the various levels and to devise an accurate means of measuring the student’s success in developing these skills.
1.6        TESTING LANGUAGE AREAS
In an attempt to isolate the language areas learnt, a considerable number of tests include sections on:
          grammar and usage;
          vocabulary (concerned with word meanings, word formation and collocations);
          phonology (concerned with phonemes, stress and intonation).
    
1.6.1        Test of grammar and usage
  These tests measure students’ ability to recognize appropriate grammatical forms and to manipulate structures.
            Although it (1) ……. quite warm now, (2) ……. will change later today. By tomorrow morning, it (3) ……. much colder and there may even be a little snow …. (etc.)
(1) A. seems          B. will seem          C. seemed               D. had seemed
(2) A. weather        B. the weather        C. a weather            D. some weather
(3) A. is             B. will go to be      C. is going to be       D. would be
 Note that this particular type of question is called a multiple-choice item. The term multiple-choice item is used because the students are required to select the correct answer from a choice of several answers. The word item is used in preference  to the word question because the latter word suggests the interrogative form; many test items are, in fact, written in the form of statements.
1.6.2   Test of vocabulary      
A test of vocabulary measures students’ knowledge of the meaning of certain words as well as the patterns and collocations in which they occur. Such a test may test their active vocabulary (the words they should be able to use in speaking and in writing) or their passive vocabulary (the words they should be able to recognize and understand when they are listening to someone or when they are reading). Obviously, in this kind of test the method used to select the vocabulary items (= sampling) is of the utmost importance.
         
In the following item students are instructed to circle the letter at the side of the word which best completes the sentence.
Did you …….. that book from the school library?
A. beg               B. borrow         C. hire              D. lend             E. ask
     
      In another common type of vocabulary test students are given a passage to read and required to replace certain words listed at the end of the passage with their equivalents in the passage.
1.6.3        Test of Phonology
      Test items designed to test phonology might attempt to assess the following skills: ability to recognize and pronounce the significant sound contrasts of a language, ability to recognize and use the stress patterns of a language, and ability to hear and produce the melody or patterns of the tunes of a language (i.e. the rise and fall of the voice).
     In the following item, students are required to indicate which of the three sentences they hear are the same:
            Spoken:
            Just look at that large ship over there.
            Just look at that large sheep over there.
            Just look at that large ship over there.
        
      Although this item, which used to be popular in certain tests, is now very rarely included as a separate item in public examinations, it is sometimes appropriate for inclusion in a class progress or achievement test at an elementary level. Successful performance in this field, however, should not be regarded as necessarily indicating an ability to speak.
1.7   RECOGNITION AND PRODUCTION   
     Methods of testing the recognition of correct words and forms of language often take the following form in tests:
Choose the correct answer and write A, B, C, or D.
I’ve been standing here ……… half an hour.
A. since             B. during           C. while                        D. for
This multiple-choice test item tests students’ ability to recognize the correct form: this ability is obviously not quite the same as the ability to produce and use the correct form in real-life situations. However, this type of item has the advantage of being easy to examine statistically.
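As a rough illustration of why such items are easy to examine statistically, the short sketch below computes the proportion of students answering correctly (often called the facility value) and a tally of the options chosen. The response data, the function names and the use of Python are all assumptions made for illustration; they do not come from any real test administration.

```python
from collections import Counter

# Hypothetical responses from ten students to the "half an hour" item.
# These data are invented purely for illustration.
responses = ["D", "A", "D", "D", "B", "D", "A", "D", "C", "D"]
answer_key = "D"  # option D ("for") is the correct answer


def facility_value(responses, key):
    """Return the proportion of students who chose the correct option."""
    correct = sum(1 for r in responses if r == key)
    return correct / len(responses)


def option_counts(responses, options="ABCD"):
    """Return how many students chose each option (zero if none did)."""
    tally = Counter(responses)
    return {opt: tally.get(opt, 0) for opt in options}


print(facility_value(responses, answer_key))
print(option_counts(responses))
```

A tally of this kind also shows which wrong options attracted students, which is much harder to obtain from open-ended production items.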
     If the four choices were omitted, the item would come closer to being a test of production:    
  Complete each blank with the correct word.
  I’ve been standing here …….. half an hour.
Students would then be required to produce the correct answer (=for). In many cases, there would only be one possible correct answer, but production items do not always guarantee that students will deal with the specific matter the examiner had in mind (as most recognition items do). 
     A good language test may contain either recognition-type items or production-type items, or a combination of both. Each type has its unique functions, and these will be treated in detail later.
1.8   AVOIDING TRAPS FOR THE STUDENTS
A good test should never be constructed in such a way as to trap the students into giving an incorrect answer. When techniques of error analysis are used, the setting of deliberate traps or pitfalls for unwary students should be avoided. Many testers, themselves, are caught out by constructing test items which succeed only in trapping the more able students. Care should be taken to avoid trapping students by including grammatical and vocabulary items which have never been taught.
     In the following example, students have to select the correct answer (C), but the whole item is constructed so as to trap them into making choice B or D. When this item actually appeared in a test, it was found that the more proficient students, in fact, chose B and D, as they had developed the correct habit of associating the tense forms have seen and have been seeing with since and for.
When I met Tim yesterday, it was the first time I ……….  him since Christmas.
A. saw              B. have seen      C. had seen       D. have been seeing 
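One way to detect an item that traps the more able students is to compare how an upper-scoring group and a lower-scoring group answered it. The sketch below computes a simple discrimination index under that assumption; the groups, their responses and the function name are hypothetical and are not taken from the test in which the item actually appeared.

```python
def discrimination_index(upper, lower, key):
    """Return (correct in upper group - correct in lower group) / group size.

    A negative value means the weaker students outperformed the stronger
    ones on this item, which is what a trap item tends to produce.
    """
    if not upper or len(upper) != len(lower):
        raise ValueError("groups must be non-empty and of equal size")
    upper_correct = sum(1 for r in upper if r == key)
    lower_correct = sum(1 for r in lower if r == key)
    return (upper_correct - lower_correct) / len(upper)


# Hypothetical responses to the "since Christmas" item (correct answer: C).
upper_group = ["B", "D", "B", "C", "D"]  # five highest scorers on the whole test
lower_group = ["C", "A", "C", "C", "B"]  # five lowest scorers on the whole test

print(discrimination_index(upper_group, lower_group, "C"))
```

With these invented data the index is negative, signalling that the item should be revised or discarded.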
      To summarize, all tests should be constructed primarily with the intention of finding out what students know, not of trapping them. By attempting to construct effective language tests, the teacher can gain a deeper insight into the language he or she is testing and the language learning process involved.
1.9   KINDS OF TEST AND TESTING
We use tests to obtain information. The information that we hope to obtain will of course vary from situation to situation. It is possible, nevertheless, to categorize tests according to a small number of kinds of information being sought. This categorization will prove useful both in deciding whether an existing test is suitable for a particular purpose and in writing appropriate new tests where these are necessary. The four types of test which we will discuss in the following sections are: proficiency tests, achievement tests, diagnostic tests, and placement tests.
1.9.1        Proficiency tests
Proficiency tests are designed to measure people’s ability in a language regardless of any  training they may have had in that language. The content of a proficiency test, therefore, is not based on the content or objectives of language courses which people taking the test may have followed.
     In the case of some proficiency tests, ‘proficient’ means having sufficient command of the language for a particular purpose. An example of this would be a test designed to discover whether someone can function successfully as a United Nations translator. Another example would be a test used to determine whether a student’s English is good enough to follow a course of study at a British university. Such a test may even take into account the level and kind of English needed to follow courses in particular subject areas.
Despite differences in content and level of difficulty, all proficiency tests have in common the fact that they are not based on courses that candidates may have previously taken.
1.9.2        Achievement tests   
      Most teachers are unlikely to be responsible for proficiency tests. It is much more probable that they will be involved in the preparation and use of achievement tests. In contrast to proficiency tests, achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives. They are of two kinds: final achievement tests and progress achievement tests.
      Final achievement tests are those administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or by members of teaching institutions. Clearly the content of these tests must be related to the courses with which they are concerned, but the nature of this relationship is a matter of disagreement amongst language testers.
      Progress achievement tests, as their name suggests, are intended to measure the progress that students are making. Since ‘progress’ is towards the achievement of course objectives, these tests too should relate to objectives. But how? One way of measuring progress would be repeatedly to administer final achievement tests, the (hopefully) increasing scores indicating the progress made. This is not feasible, particularly in the early stages of a course. The alternative is to establish a series of well-defined short-term objectives. These should make a clear progression towards the final achievement tests based on course objectives.
1.9.3        Diagnostic tests
     Diagnostic tests are used to identify students’ strengths and weaknesses. They are intended primarily to ascertain what further teaching is necessary. At the level of broad language skills this is reasonably straightforward. We can be fairly confident of our ability to create tests that will tell us that a student is particularly weak in, say, speaking as opposed to reading in a language. Indeed existing proficiency tests may often prove adequate for this purpose.
     We may be able to go further, analyzing samples of a student’s performance in writing or speaking in order to create profiles of the student’s ability with respect to such categories as ‘grammatical accuracy’ or ‘linguistic appropriacy.’
1.9.4        Placement tests
     Placement tests, as their name suggests, are intended to provide information which will help to place students at the stage (or in the part) of the teaching program most appropriate to their abilities. Typically they are used to assign students to classes at different levels.
     Placement tests can be bought, but this is not to be recommended  unless the institution concerned is quite sure that the test being considered suits its particular teaching program. No one placement test will work for every institution, and the initial assumption about any test that is commercially available must be that it will not work well.
1.9.5        Direct versus indirect testing
Testing is said to be direct when it requires the candidate to perform precisely the skill which we wish to measure. If we want to know how well candidates can write compositions, we get them to write compositions. If we want to know how well they pronounce a language, we get them to speak. The tasks, and the texts which are used, should be as authentic as possible. The fact that candidates are aware that they are in a test situation means that the tasks cannot be really authentic. Nevertheless the effort is made to make them as realistic as possible.
     Direct testing is easier to carry out when it is intended to measure the productive skills of speaking and writing. The very acts of speaking and writing provide us with information about the candidate’s ability. With listening and reading, however, it is necessary to get candidates not only to listen or read but also to demonstrate that they have done this successfully. The tester has to devise methods of eliciting such evidence accurately and without the method interfering with the performance of the skills in which he or she is interested.
      Direct testing has a number of attractions. First, provided that we are clear about just what abilities we want to assess, it is relatively straightforward to create the conditions which will elicit the behavior on which to base our judgements. Secondly, at least in the case of the productive skills, the assessment and interpretation of students’ performance is also quite straightforward. Thirdly, since practice for the test involves practice of the skills that we wish to foster, there is likely to be a helpful backwash effect.
     Indirect testing attempts to measure the abilities which underlie the skills in which we are interested. One section of the TOEFL, for example, was developed as an indirect measure of writing ability. It contains items of the following kind:
At first the old woman seemed unwilling to accept anything that was offered her by my friend and I.
     Here the candidate has to identify which of the underlined elements is erroneous or inappropriate in formal standard English. While the ability to respond to such items has been shown to be related statistically to the ability to write compositions (though the strength of the relationship was not particularly great), it is clearly not the same thing.
     The main problem with indirect tests is that the relationship between performance on them and performance of the skills in which we are usually more interested tends to be rather weak and uncertain in nature.
1.9.6  Discrete point versus integrative testing
Discrete point testing refers to the testing of one element at a time, item by item. This might involve, for example, a series of items each testing a particular grammatical structure. Integrative testing, by contrast, requires the candidate to combine many language elements in the completion of a task. This might involve writing a composition, making notes while listening to a lecture, taking a dictation, or completing a cloze passage. Clearly this distinction is not unrelated to that between indirect and direct testing. Discrete point tests will almost always be indirect, while integrative tests will tend to be direct. However, some integrative testing methods, such as the cloze procedure, are indirect.
1.9.7   Communicative language testing
Much has been written in recent years about ‘communicative language testing’. Discussions have centered on the desirability of measuring the ability to take part in acts of communication (including reading and listening) and on the best way to do this. It is assumed in this book that it is usually communicative ability which we want to test. As a result, what I believe to be the most significant points made in discussions of communicative testing are to be found throughout. A recapitulation under a separate heading would therefore be redundant.