In fact, each IPA symbol is shorthand for a whole range of properties, and those properties explain how the particular segment being symbolized is pronounced; unpacking the black box for each sound reveals not a jumble, but an internal structure, and understanding that structure allows us to make comparisons with other sounds. When we know that [k], for instance, is a voiceless velar plosive, we can start to see what properties it shares with other sounds which might also be voiceless, or velar, or plosives; we can also see how it differs from other sounds which are not voiceless, or velar, or plosives. Furthermore, we shall see what properties different allophones of the same phoneme share, which might allow them to be regarded as ‘the same’ by speakers of English: that is, we can work out what particular phonetic features speakers of English tend to ignore, and which they are aware of. Since this may be very different for speakers of other languages, unpacking IPA notation in this way also allows cross-linguistic comparisons to be made. In this chapter, we shall therefore consider a very basic set of phonetic features which enable us to describe the articulation of the consonants of English, and to assess their differences and similarities.
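As a rough illustration of what "unpacking" a symbol means, the sketch below stores a few consonants as bundles of three features and reports which features two sounds share. The names `Consonant`, `FEATURES` and `shared_features` are hypothetical, chosen for this example only, and the feature labels are simply the ones used in this chapter.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    """A three-term label of the kind used for [k]: voicing, place, manner."""
    voicing: str
    place: str
    manner: str


# A small, illustrative subset of English consonants (not a full inventory).
FEATURES = {
    "k": Consonant("voiceless", "velar", "plosive"),
    "g": Consonant("voiced", "velar", "plosive"),
    "p": Consonant("voiceless", "bilabial", "plosive"),
    "s": Consonant("voiceless", "alveolar", "fricative"),
    "ŋ": Consonant("voiced", "velar", "nasal"),
}


def shared_features(a: str, b: str) -> set[str]:
    """Return the names of the features on which two symbols agree."""
    fa, fb = FEATURES[a], FEATURES[b]
    return {name for name in Consonant._fields
            if getattr(fa, name) == getattr(fb, name)}


print(shared_features("k", "g"))  # place and manner are shared; voicing differs
print(shared_features("k", "s"))  # only voicing is shared
```

Comparing [k] with [g], [p], [s] and [ŋ] in this way picks out exactly the groupings mentioned above: other velars, other plosives, and other voiceless sounds.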


A biologist looking at some particular creature wants to know various things about it, to work out where it should be placed in conventional biological classification. Some properties are visible and therefore easy to work out, such as how many legs it has or whether it has fur, feathers or scales. In other cases, closer observation will be needed: tooth shape cannot usually be checked from a distance. Still other properties are behavioural, and our biologist might need to observe her creature over a longer period of time to figure out whether it lays eggs or bears live young, or what it eats.

2.1  Voiced or voiceless?

A major division among speech sounds which is relevant for all languages is the dichotomy of voiced and voiceless. If you put your fingers on your ‘Adam’s apple’ or ‘voicebox’ (technically the larynx), and produce a very long [zzzzzzz], you should feel vibration; this shows that [z] is a voiced sound. On the other hand, if you make a very long [sssssss], you will not feel the same sort of activity: [s] is a voiceless sound. Consequently, although English has the minimal pairs tip – dip, latter – ladder, bit – bid for /t/ versus /d/, [d] is only voiced throughout its production in ladder, where it is medial and surrounded by voiced vowels. Word-initially, we are more likely to identify /t/ in tip by its aspiration, and /d/ in dip by lack of aspiration, than rely on voicing.

Voicelessness and voicing are the two main settings of phonation, or states of the glottis: for English at least, the only other relevant case, and again one which is used paralinguistically, is whisper. In whisper phonation, the vocal folds are close together but not closed; the reduced size of the glottis allows air to pass, but with some turbulence which is heard as the characteristic hiss of whisper.

2.2  Oral or nasal?

The next major issue is where the pulmonic egressive airstream used in English goes. For most sounds, air passes from the lungs, up through a long tube composed of the trachea, or windpipe; the larynx; and the pharynx, which opens out into the back of the oral cavity. The air passes the various articulators in the mouth, and exits at the lips; all these vocal organs are shown in Figure 3.1. However, for three English sounds, air passes through the nasal cavity instead.

The key to whether air can flow through the nose is the velum, or soft palate, which you can identify by curling the tip of your tongue up and running it back along the roof of your mouth until you feel the hard, bony palate giving way to something squashier. For oral sounds, the velum is raised and pushed against the back wall of the pharynx, cutting off access to the nose. However, for [m], [n] and [ŋ] in ram, ran and rang, the velum is lowered, so that air moving up from the lungs must flow through the nose. If you produce a long [s], you will be able to feel that air is passing only through your mouth; conversely, if you hum a long [m], you will notice that air continues to flow through your nose while your lips are pressed together, with that closure being released only at the end of the [m]. When someone suffering from a cold tells you ‘I’ve got a cold id by dose’ instead of ‘I’ve got a cold in my nose’, she is failing to produce [n] and [m] because soft tissue swelling blocks air access to the nose and perforce makes all sounds temporarily oral.

Nasal sounds, like [m] and [n], are produced with air only passing through the nasal cavity for at least part of their production. On the other hand, nasalised sounds, like the vowel in can, preceding a nasal consonant, as opposed to the vowel in cat, which precedes an oral one, are characterised by airflow through both nose and mouth simultaneously.
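The three-way distinction between oral, nasal and nasalised sounds depends on just two yes/no settings: whether the velum is lowered, and whether air can also escape through the mouth. The following is a minimal sketch under that simplification; the function name `airflow_type` and its parameters are hypothetical, and the blocked-nose situation of a cold is deliberately not modelled.

```python
def airflow_type(velum_lowered: bool, oral_passage_open: bool) -> str:
    """Classify a sound by the route the airstream takes.

    velum_lowered     -- True if air has access to the nasal cavity
    oral_passage_open -- True if air can also escape through the mouth
    """
    if not velum_lowered:
        # Velum raised against the pharynx wall: no access to the nose.
        return "oral"
    # Velum lowered: air passes through the nose, and possibly the mouth too.
    return "nasalised" if oral_passage_open else "nasal"


print(airflow_type(velum_lowered=False, oral_passage_open=True))   # [s]: oral
print(airflow_type(velum_lowered=True, oral_passage_open=False))   # [m]: nasal
print(airflow_type(velum_lowered=True, oral_passage_open=True))    # vowel of can: nasalised
```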

2.3  What is the manner of articulation?

To produce any consonant, an active articulator, usually located somewhere along the base of the vocal tract, moves towards a passive articulator, somewhere along the top. Where those articulators are determines the consonant’s place of articulation, as we shall see in the next section. How close the active and passive articulators get determines the manner of articulation. There are three main manners of articulation, and one subsidiary case which in a sense is intermediate between the first two.

  1. STOPS

If the active and passive articulators actually touch, stopping airflow through the oral cavity completely for a brief period, the sound articulated is a stop. If you put your lips together to produce [p] pea, and hold them in that position, you will feel the build-up of air which is then released when you move from the stop to the following vowel. Further back in the vocal tract, [t] tea and [k] key are also stop sounds. More accurately, all these are plosives, the term for oral stops produced on a pulmonic egressive airstream, just as clicks are stops produced on a velaric ingressive airstream, for instance. Plosives may be voiceless, like [p], [t] and [k], or voiced, like their equivalents [b], [d] and [g].

Since the definition of a stop involves the complete, transient obstruction of the oral cavity, it also includes nasal sounds, where airflow continues through the nose. English [m], [n] and [ŋ] are therefore nasal stops, although they are typically referred to simply as nasals, as there are no distinctive English nasals involving other manners of articulation. All these nasals are also voiced.

Finally, some varieties of English also have subtypes of stops known as taps or trills. While a plosive is characterised by a complete obstruction of oral airflow, followed generally by release of that airflow, a tap is a very quick, ballistic movement where the active articulator strikes a glancing blow against the passive one; interruption of the airstream is real, but extremely brief. Many Scots speakers have a tapped allophone [ɾ] of the phoneme /r/ between vowels, as in arrow, very; many American speakers have a similar tap as a realisation of /t/ in butter, water. Trills are repeated taps, where the active articulator vibrates against the passive one. Trilled [r] is now rather uncommon for speakers of English, although attempts at imitating Scots often involve furious rolling of [r]s.


During the production of a fricative, the active and passive articulators are brought close together, but not near enough to totally block the oral cavity. This close approximation of the articulators means the air coming from the lungs has to squeeze through a narrow gap at high speed, creating turbulence, or local audible friction, which is heard as hissing for a voiceless fricative, and buzzing for a voiced one. English [f] and [s], the initial sounds of five and size, are voiceless fricatives, while [v] and [z], the final sounds of the same two words, are voiced.

The subclass of affricates consists of sounds which start as stops and end up as fricatives; but as we shall see in Chapter 5, they behave as single, complex sounds rather than sequences. Stops generally involve quick release of their complete articulatory closure; but if this release is slow, or delayed, the articulators will pass through a stage of close approximation appropriate for a fricative. The two relevant sounds for English are [tʃ], at the beginning and end of church, and its voiced equivalent [ʤ], found at the beginning and end of judge. If you pronounce these words extremely slowly, you should be able to identify the stop and fricative phases.


It is relatively easy to recognize a stop or fricative, and to diagnose the articulators involved, since these are either touching or so close that their location can be felt. In approximants, on the other hand, the active and passive articulator never become sufficiently close to create audible friction. Instead, the open approximation of the articulators alters the shape of the oral cavity, and leads to the production of a particular sound quality.

There are four approximant consonant phonemes in English: /j/ yes, /w/ wet, /r/ red (although as we have seen, /r/ may have a tapped allophone for some speakers) and /l/ let. All these approximants are voiced.
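The manners described in this section follow from how close the articulators come, plus two further settings for stops (a lowered velum, or a delayed release). The sketch below encodes that decision procedure; `Stricture` and `manner` are hypothetical names for this example, taps and trills are left out for brevity, and "complete closure" is used here as a label for what the text calls complete obstruction.

```python
from enum import Enum


class Stricture(Enum):
    """How close the active articulator gets to the passive one."""
    COMPLETE_CLOSURE = 1      # articulators touch; oral airflow stops
    CLOSE_APPROXIMATION = 2   # narrow gap; audible friction
    OPEN_APPROXIMATION = 3    # wider gap; no audible friction


def manner(stricture: Stricture,
           velum_lowered: bool = False,
           delayed_release: bool = False) -> str:
    """Map a degree of stricture to a manner of articulation."""
    if stricture is Stricture.COMPLETE_CLOSURE:
        if velum_lowered:
            return "nasal"        # oral closure, air continues through the nose
        if delayed_release:
            return "affricate"    # stop released slowly through a fricative phase
        return "plosive"
    if stricture is Stricture.CLOSE_APPROXIMATION:
        return "fricative"
    return "approximant"


print(manner(Stricture.COMPLETE_CLOSURE))                        # e.g. [p]
print(manner(Stricture.COMPLETE_CLOSURE, velum_lowered=True))    # e.g. [m]
print(manner(Stricture.COMPLETE_CLOSURE, delayed_release=True))  # e.g. [tʃ]
print(manner(Stricture.CLOSE_APPROXIMATION))                     # e.g. [s]
print(manner(Stricture.OPEN_APPROXIMATION))                      # e.g. [j]
```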

2.4  What is the place of articulation?

As we have seen, the location of the active and passive articulators determines the place of articulation for a consonant. In English, consonants are produced at eight places of articulation. Since we have now covered all the other articulatory parameters required to describe consonants, introducing and defining these places will allow us to build up a complete consonant phoneme system for English. In the tables below, the phoneme or allophone in question is initial in the example word, unless another part of that word is bold-face.


For a bilabial sound, the active articulator is the bottom lip, and the passive articulator is the top lip.

/p/      pie       voiceless bilabial plosive

/b/      by         voiced bilabial plosive

/m/     my        voiced bilabial nasal

There is at least one further English phoneme which to an extent fits under this heading: this is the approximant /w/ in wet. In producing [w], the lips are certainly approximated, though not enough to cause friction or obstruct the airflow; but you should be able to feel that the back of your tongue is also bunched up.


For labio-dental sounds, the active articulator is again the bottom lip, but this time it moves up to the top front teeth. Note that these sounds are labio-dental, while /w/ and its voiceless counterpart /ʍ/ (which some speakers have in words like which) are labial-velar, because in the first case, articulation takes place only at a single location, while in the second, there are two separate, simultaneous articulations.

/f/       fat        voiceless labio-dental fricative

/v/      vat       voiced labio-dental fricative


In most English sounds, and most speech sounds in general, the active articulator is part of the tongue; to avoid confusion, places of articulation where the tongue is involved are therefore generally called after the passive articulator. For the two dental fricatives, it follows that the passive articulator is the top front teeth; the active articulator is the tip of the tongue. The tongue itself is conventionally divided into the tip (the very front); the blade (just behind the tip, and lying opposite the alveolar ridge); the front (just behind the blade, and lying opposite the hard palate); the back (behind the front, and lying opposite the velum); and the root (right at the base, lying opposite the wall of the pharynx).

/θ/      thigh     voiceless dental fricative

/ð/      thy       voiced dental fricative


Alveolar sounds are produced by the tip or blade of the tongue moving up towards the alveolar ridge, the bony protrusion you can feel if you curl your tongue back just behind your top front teeth.

/t/      tie       voiceless alveolar plosive

/d/      die       voiced alveolar plosive

/n/      nigh      voiced alveolar nasal

/s/      sip       voiceless alveolar fricative

/z/      zip       voiced alveolar fricative

/r/      rip       voiced alveolar central approximant

/l/      lip       voiced alveolar lateral approximant

The symbol /r/ is used for the phoneme here and throughout the book, primarily because it is typographically convenient; but different realizations of /r/ are found throughout the English-speaking world, and as we have seen, [r] itself, the voiced alveolar trill, is rather rare. The tapped realization, [ɾ], is also alveolar; but another even more common pronunciation is not. This is the voiced retroflex approximant, [ɹ], which is produced with the tip of the tongue curled back slightly behind the alveolar ridge; this is the most common realization of /r/ for speakers of Southern Standard British English and General American.


If you move your tongue tip back behind the alveolar ridge, you will feel the hard palate, which then, moving further back again, becomes the soft palate, or velum. Postalveolar sounds are produced with the blade of the tongue as the active articulator, and the adjoining parts of the alveolar ridge and the hard palate as the passive one. They include two fricatives, and the affricates introduced in the last section.

/ʃ/      ship      voiceless postalveolar fricative

/ʒ/      beige     voiced postalveolar fricative

/tʃ/     chunk     voiceless postalveolar affricate

/dʒ/     junk      voiced postalveolar affricate


Palatals are produced by the front of the tongue, which moves up towards the hard palate. We have so far encountered two palatal sounds: the approximant /j/ in yes, and the voiceless palatal stop [c] in kitchen. Recall, however, that [c] is the allophone of /k/ found before certain vowels; velar [k] appears elsewhere. Since we are constructing a phoneme system here, these allophones are not included in the list.

/j/ yes              voiced palatal approximant

  1. VELAR

For velar sounds, the back of the tongue approximates to the soft palate. As with other points of contact, several types of sound can be made here. In English there are four consonants made in the velar region: the plosives /k, g/, the nasal /ŋ/ and the voiced semi-vowel /w/, as in woo.

/k/ cot              voiceless velar plosive

/g/ got            voiced velar plosive

/ŋ/ rang         voiced velar nasal

/w/ woo           voiced semi-vowel


Glottal sounds are in the minority in articulatory terms, since they do not involve the tongue: instead, the articulators are the vocal folds, which constitute a place of articulation as well as having a crucial role in voicing. English has two glottal sounds. The first is allophonic, namely the glottal stop, [ʔ], which appears as an intervocalic realization of /t/ in many accents, as in butter. The glottal stop is technically voiceless, though in fact it could hardly be anything else, since when the vocal folds are pressed together to completely obstruct the airstream, as must be the case for a stop sound, air cannot simultaneously be passing through to cause vibration. The second, the voiceless glottal fricative [h], is a phoneme in its own right.

/h/ high           voiceless glottal fricative
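With all the parameters in place, the phonemes listed in this section can be collected into one table and queried. The sketch below is one possible encoding, not a definitive one: the names `INVENTORY` and `voicing_pairs` are hypothetical, and /w/ is entered as a labial-velar approximant, following the double articulation described earlier, rather than under velar alone.

```python
# (symbol, voicing, place, manner) for each consonant phoneme listed above.
INVENTORY = [
    ("p", "voiceless", "bilabial", "plosive"),
    ("b", "voiced", "bilabial", "plosive"),
    ("m", "voiced", "bilabial", "nasal"),
    ("f", "voiceless", "labio-dental", "fricative"),
    ("v", "voiced", "labio-dental", "fricative"),
    ("θ", "voiceless", "dental", "fricative"),
    ("ð", "voiced", "dental", "fricative"),
    ("t", "voiceless", "alveolar", "plosive"),
    ("d", "voiced", "alveolar", "plosive"),
    ("n", "voiced", "alveolar", "nasal"),
    ("s", "voiceless", "alveolar", "fricative"),
    ("z", "voiced", "alveolar", "fricative"),
    ("r", "voiced", "alveolar", "central approximant"),
    ("l", "voiced", "alveolar", "lateral approximant"),
    ("ʃ", "voiceless", "postalveolar", "fricative"),
    ("ʒ", "voiced", "postalveolar", "fricative"),
    ("tʃ", "voiceless", "postalveolar", "affricate"),
    ("dʒ", "voiced", "postalveolar", "affricate"),
    ("j", "voiced", "palatal", "approximant"),
    ("k", "voiceless", "velar", "plosive"),
    ("g", "voiced", "velar", "plosive"),
    ("ŋ", "voiced", "velar", "nasal"),
    ("w", "voiced", "labial-velar", "approximant"),  # assumption: double articulation
    ("h", "voiceless", "glottal", "fricative"),
]


def voicing_pairs(inventory):
    """Return (voiceless, voiced) pairs that share both place and manner."""
    by_slot = {}
    for symbol, voicing, place, manner in inventory:
        by_slot.setdefault((place, manner), {})[voicing] = symbol
    return sorted(
        (slot["voiceless"], slot["voiced"])
        for slot in by_slot.values()
        if "voiceless" in slot and "voiced" in slot
    )


for voiceless, voiced in voicing_pairs(INVENTORY):
    print(f"/{voiceless}/ pairs with /{voiced}/")
```

The pairs this returns are the plosives, fricatives and affricates that differ only in voicing; the nasals and approximants, which are all voiced, and /h/, which has no voiced partner in the table, are left unpaired.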