Essentially all the sounds that we hear are the result of vibrations in the air around us. For practical purposes we limit our study of speech to the sounds that are made when air is expelled from the lungs and is modified in various ways as it moves upward and out of the body. The vocal cords in the larynx provide the basic vibration in the air stream, which is further modified above the larynx in the vocal tract. There are two kinds of modification: the vocal tract can be shaped in different ways so that air vibrates in different patterns of resonance, or the air stream can be obstructed, wholly or partly, in different places.

  1. THE HUMAN VOICE

Human speech is very much like the playing of a wind instrument. Different speech sounds, in any language, are made by moving a column of air through part of the upper body and creating various kinds of vibration and noise as the air moves. It is possible to use air that is drawn into the body from outside (try to say ‘Yes’ while inhaling). A more familiar way of using ingressive air for sound-production is to produce a click, such as the tongue-tip noise which we represent as tsk-tsk, or the clucking sound that is sometimes used in getting a horse to move. To produce such clicks we create a vacuum in the mouth, then open suddenly so that air rushes in. Another way of producing an air stream is to gather a quantity of air in the throat and then eject it all at once. In almost all of our speaking, however, we use a column of air which moves up from the lungs and out the mouth or the nose or both together, and we modify the air in its passage. Everything that we say about speech sounds from here on will assume the use of egressive lung air.


All of the vocal organs have other functions – breathing, sucking, chewing, swallowing. The lungs expand and contract to bring in air or let it out. Air expelled from the lungs travels up the trachea, or windpipe. At the top of the trachea is a structure of cartilage known as the larynx, or voicebox. The primary vibration needed for speech is produced in the larynx by the vocal cords, which are described in the next section. Above the larynx are three interconnected areas, the pharynx, the nasal cavity, and the mouth (or oral cavity) which serve as resonance chambers. The three together are called the vocal tract.

 
We can summarize the functions of these vocal organs this way: The lungs supply the basic force which is needed (initiation). The vocal cords furnish the basic vibration (phonation). The three resonance chambers are maneuvered in various ways to produce sounds of different quality (articulation). In general, variations in the force with which air is expelled from the lungs result in differences of intensity or loudness in speech sounds. Variations in the frequency of vibration of the vocal cords are responsible for variations in pitch, or tone. Variations in the shape of the vocal tract are related to different speech sounds. In theory, then, loudness, pitch, and articulation are separate from one another. In reality, these things go together. In an English utterance the parts which are more prominent than others are generally louder, higher in pitch, and have certain special articulatory characteristics. Most notably, the prominent parts have greater duration.
 
  1. THE VOCAL CORDS

Within the larynx are two elastic bands of tissue, the vocal cords (or vocal bands, or vocal folds). These extend from the thyroid (‘shield-shaped’) cartilage in front – which, in the male, protrudes as the so-called Adam’s apple – to the arytenoid (‘ladle-shaped’) cartilages in the back. The vocal cords are joined together at the thyroid cartilage but are attached separately to the two arytenoid cartilages, which can rotate and thus move the vocal cords apart or together. The opening between the cords is called the glottis. For ordinary breathing the glottis is wide open. For various purposes we can close the glottis, that is, bring the vocal cords together. We do this automatically when we swallow, so that food goes down the esophagus to the stomach and not into the trachea. The glottis also closes in order to lock air into the lungs so that our bodies can exert greater force, as in lifting a heavy object.

When they are neither fully open nor completely closed, the vocal cords are somewhat tense, and then the pressure of the outgoing air causes them to vibrate. The vibration is not audible; speech is produced above the larynx. As we produce a stream of speech, the vocal cords are sometimes vibrating, sometimes not vibrating. Speech sounds produced while the vocal cords are vibrating are voiced; those made without vibration are unvoiced, or voiceless. If the cords are vibrating, they can be stretched to different degrees of tension, so that they vibrate at different frequencies, producing different pitches in the sounds articulated above the vocal cords. Speech has melody – different melodies or intonation patterns – as a result of these different frequencies of vibration.

How can you tell if a speech sound is voiced or voiceless? There are three good tests. Take the sound [z] as at the end of the word buzz and the sound [s] as at the end of the word hiss. Make a long [z-z-z] and a long [s-s-s] as you apply the tests.

(1) Put your thumb and fingers on your throat, on each side of the Adam’s apple. You should feel vibration as you say one of these sounds, but not the other. (2) Cover your ears with your hands while making the two sounds. You should hear greater noise while making the sound that uses vibration in the larynx. (3) Try to sing up and down the scale while making each of these sounds. Singing means changing pitch, which means changing the frequency of vocal cord vibration. If the cords are vibrating, you can sing; if not, you can’t.

By this time you should have decided – if you didn’t already know – that [z] is voiced and [s] is voiceless. Try the same three tests with the [f] sound of fife and the [v] sound of valve. These tests work well with fricative sounds like [s z f v] but they do not work well with stops such as those at the beginning and end of pop and bob.

  1. THE VOCAL TRACT

Air which leaves the larynx goes through the pharynx and then out through the nasal cavity or the oral cavity or both at the same time. Some speech sounds result from obstructing the flow of air somewhere in the pharynx or oral cavity. The obstruction may be partial, as in articulating the sounds [f v s z], or it may be complete, as in making the first and last sounds of pop, bob, tot, dead. Speech sounds which result from complete or partial obstruction are called obstruents; they are essentially noise, the result of aperiodic vibrations. Other, more musical speech sounds are called sonorants, or resonants; examples are the vowel sounds of the four words above and the consonants of mill and run. In the articulation of a sonorant the basic glottal vibration produced in the larynx is modified in the vocal tract. By changing the shape of parts of the vocal tract we create different resonance chambers so that different parts of the basic glottal vibration are strengthened while other parts are weakened.
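The difference between periodic and aperiodic vibration can be made concrete with two toy signals. The sketch below is only an illustration, not a model of real speech: a pure tone stands in for a sonorant and random noise stands in for an obstruent such as [s]. The sample rate, the frequency, and the function name periodicity are all assumptions chosen for the example. The function compares a signal with a copy of itself shifted by one cycle; a score near 1 means the pattern repeats, a score near 0 means it does not.

```python
import math
import random

def periodicity(samples, lag):
    """Normalized autocorrelation of `samples` at `lag`.

    Near 1.0: the signal repeats itself every `lag` samples (periodic).
    Near 0.0: no regular repetition at that lag (aperiodic).
    """
    n = len(samples) - lag
    num = sum(samples[i] * samples[i + lag] for i in range(n))
    energy_a = sum(samples[i] ** 2 for i in range(n))
    energy_b = sum(samples[i + lag] ** 2 for i in range(n))
    den = math.sqrt(energy_a * energy_b)
    return num / den if den else 0.0

SAMPLE_RATE = 8000          # samples per second (assumed for the example)
FREQ = 100                  # vibration frequency in Hz (assumed)
LAG = SAMPLE_RATE // FREQ   # number of samples in one full cycle

# Periodic signal: a pure tone, a stand-in for a sonorant.
tone = [math.sin(2 * math.pi * FREQ * t / SAMPLE_RATE) for t in range(800)]

# Aperiodic signal: random noise, a stand-in for an obstruent.
rng = random.Random(0)      # fixed seed so the example is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]

tone_score = periodicity(tone, LAG)
noise_score = periodicity(noise, LAG)
print(f"tone:  {tone_score:.2f}")
print(f"noise: {noise_score:.2f}")
```

The tone scores very close to 1 and the noise scores close to 0. Real speech sounds are far more complex than either signal, but the contrast is the same in kind.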

The shape of the pharynx can be modified only slightly, either by raising the larynx or by retracting the root of the tongue – either of which makes the pharynx smaller. Changes in the shape of the pharynx have only minor importance in the production of English speech sounds.

The nasal cavity cannot be varied at all. It serves as a resonance chamber only if air is allowed to flow through it; speech sounds made with air not going through the nasal cavity have no nasal resonance. At the junction of the pharynx and the mouth there is a sort of trap door, the velic, the very end of the velum. The velic is lowered to permit the entry of air into the nasal cavity or raised to prevent it. Pronounce the word mum and prolong the final [m] sound. You are making a sound with egressive lung air, since it can be prolonged, and yet your lips are closed. Where is air escaping? Holding a finger in front of the nostrils will give the answer. And then, is [m] voiced or voiceless?

Of the three resonance chambers the mouth is the one which can be varied most. The jaw and tongue can be lowered and raised. The tongue may touch the roof of the mouth, stopping the flow of air (complete obstruction), or it may be positioned close to the roof of the mouth so that air moves through a narrow opening with friction or turbulence (partial obstruction). In a similar way the lower lip may cause complete or partial obstruction at the upper lip or upper teeth. The tongue, but not the lip, moves back and forth to cause obstruction, full or partial, in various parts of the mouth. Without any obstruction the tongue can assume different heights, producing resonance chambers of different size and shape. Moreover, the tongue can be curled in at the sides, or back at the tip; it can acquire a groove along the center line or instead maintain a flat surface. The lips can be stretched or rounded. All these ways of shaping the vocal tract are used in articulating speech sounds. In fact, all the sounds we shall study require some action or state in the mouth, with or without concomitant use of larynx, pharynx, and nasal cavity.

  1. KINDS OF SPEECH SOUNDS

Each syllable which the child utters is likely to consist of two parts, closure and opening. The air stream is obstructed and then it flows freely; there is a consonant and then a vowel. Later, depending on what the child is imitating, some syllables will consist of closure, opening, and closure – consonant, vowel, consonant. The degree of closure and the amount of opening vary freely at first, and so does the place where closure is made, but eventually the child learns to control these manipulations and begins to sound like other members of the language community. Controlling the manipulations means learning to make different kinds of closure and opening and to make them in different parts of the vocal tract.

Different kinds of speech sounds, different manners of articulating, are different ways of manipulating the air stream. We recognize six kinds of speech sounds: vowels, glides, nasals, liquids, fricatives, and stops. Vowels and stops (the latter also called plosives) are completely different. Vowels are produced by allowing the air to flow freely; stops are made by complete obstruction of the air stream. Other kinds of speech sounds have some of the characteristics, or features, of vowels and some of the features of stops. Stops and vowels differ from each other in four features:

  1. A vowel is resonant, the result of periodic waves; when a vowel is articulated, particles of air vibrate in regular, repetitive patterns. A stop is essentially an instant of silence. If air particles are vibrating at all, there is no regular pattern. To express this difference we say that vowels are [+ sonorant] and stops are [− sonorant].
  2. A vowel is the center or peak of its syllable, more prominent than what precedes or follows in the syllable. When two or more adjoining syllables differ in loudness or pitch, the difference is in their respective vowels. In a two-syllable word like baby the first syllable is more prominent than the second, and that difference is due to the comparative prominence of their vowels. If the voice rises in saying ‘Baby?’ or instead falls and produces ‘Baby!’ – that is, whether the vocal cords increase or decrease their frequency of vibration – the change occurs in the vowels. Stops have no role in relative prominence or change of pitch. Thus we say that vowels are [+ syllabic] and stops are [− syllabic].
  3. When a vowel is articulated, air comes continuously out of the mouth. The nature of a stop is that air is stopped – prevented from escaping. We say that vowels are [+ continuant] and stops are [− continuant].
  4. When a stop is articulated, either the lower lip or some part of the tongue is in contact with some other part of the mouth – the upper lip or some part of the roof of the mouth. When a vowel is articulated, there is no interruption of the air stream. This distinction is captured with a feature [consonantal]. When there is some interruption of the breath stream, as there is for stops, the segment is [+ consonantal]. Vowels are [− consonantal].

The other four classes of speech sounds, fricatives, nasals, liquids, and glides, are partly like stops and vowels but of course are also different from them and from one another. The four features, [sonorant], [syllabic], [continuant] and [consonantal], describe their similarities and differences.

Fricatives are segments like the [f v s z] of feel, veal, seal, zeal, respectively. They are articulated by squeezing the outgoing air stream between an articulator (the lower lip or some part of the tongue) and a point of articulation (the upper lip or some part of the roof of the mouth) so that turbulence or friction – rubbing – results. Fricatives are like stops in three features but differ in one. Like stops, they are the result of aperiodic vibration, therefore [− sonorant]; they require some interruption of the air stream, so they are [+ consonantal]; they are not typically the peaks of syllables and so are designated [− syllabic]. (There are marginal exceptions to the last statement: a hiss [s-s-s] and the interjection that we write pst! have a fricative as the peak of a syllable, but there are no English words with such syllables). Finally, unlike stops, fricatives are [+ continuant] since air is flowing continuously out of the mouth. Say cup and see if you can prolong the final sound; say cuff and hold the last sound as long as you can.

Nasals are segments like the [m] of mitt and the [n] of knit, sounds made by stopping the flow of air somewhere in the mouth but letting it exit through the nose. Nasals are musical – [+ sonorant] – as every singer and teacher of singing knows. Since the air stream is interrupted in the mouth, they are [+ consonantal]. Since air does not escape through the mouth, they are [− continuant]. (This is a matter of definition; [+ continuant] is defined to mean ‘with air flowing out the mouth’; actually a nasal can be prolonged because air is flowing continuously through another exit. Say come and make the last sound continue as long as you have breath.) Last, we classify nasals as both plus and minus syllabic – [± syllabic]. They are usually not the peak of a syllable, but they can be, as in the word kitten.

Liquids include the [l] of lead and the [r] of read. In their articulation the tongue is raised, partly impeding the flow of air, but the tongue is shaped in such a way that air flows around it, creating particular patterns of vibration. Because of the impedance liquids are classed as [+ consonantal]; because of the periodic vibration they are [+ sonorant]; because air flows freely they are [+ continuant]. Finally, like nasals, they are [± syllabic] – usually not the peak of a syllable but sometimes the peak, as in metal and manner.

Glides include the [j] of yet and the [w] of wet, for example. Glides are like vowels except in one feature. Slow down the pronunciation of yet and wet until each word becomes two syllables, the first word beginning with a vowel like that of tea, the second word beginning with a vowel like that of too. A glide is like a vowel except that it does not have the prominence of a vowel and does not act as the peak of a syllable. Glides, then, are [− syllabic] but in other respects are like vowels: [+ sonorant], [+ continuant], [− consonantal].

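Four features have been used to define six classes of speech sounds. The following chart summarizes them, with abbreviations of the feature names that will be used hereafter:

                 syl     cons    cont    son
    Vowels        +       −       +       +
    Glides        −       −       +       +
    Liquids       ±       +       +       +
    Nasals        ±       +       −       +
    Fricatives    −       +       +       −
    Stops         −       +       −       −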

Naturally each class is defined by a different cluster of pluses and minuses.
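The feature values given in the preceding paragraphs can also be written down as data and checked mechanically. The following is a minimal sketch, not part of the analysis itself: the names CLASSES, differing_features, and all_distinct are invented for the illustration, and the symbol "±" simply records that liquids and nasals may be either plus or minus syllabic.

```python
FEATURES = ("syl", "cons", "cont", "son")

# Feature values as stated in the text; "±" means "may be either".
CLASSES = {
    "vowel":     {"syl": "+", "cons": "-", "cont": "+", "son": "+"},
    "glide":     {"syl": "-", "cons": "-", "cont": "+", "son": "+"},
    "liquid":    {"syl": "±", "cons": "+", "cont": "+", "son": "+"},
    "nasal":     {"syl": "±", "cons": "+", "cont": "-", "son": "+"},
    "fricative": {"syl": "-", "cons": "+", "cont": "+", "son": "-"},
    "stop":      {"syl": "-", "cons": "+", "cont": "-", "son": "-"},
}

def differing_features(a, b):
    """List the features whose values differ between classes a and b."""
    return [f for f in FEATURES if CLASSES[a][f] != CLASSES[b][f]]

def all_distinct():
    """True if no two classes share the same cluster of values."""
    clusters = [tuple(spec[f] for f in FEATURES) for spec in CLASSES.values()]
    return len(set(clusters)) == len(clusters)

print(differing_features("vowel", "stop"))      # all four features
print(differing_features("fricative", "stop"))  # only 'cont'
print(differing_features("glide", "vowel"))     # only 'syl'
print(all_distinct())
```

The comparisons agree with the prose: vowels and stops differ in all four features, fricatives differ from stops only in [continuant], and glides differ from vowels only in [syllabic].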

We have introduced the four features with emphasis on the six manners of articulation that they define. Let’s recapitulate with emphasis on the features themselves.

A speech sound is [+ syllabic] if it is the most prominent segment of a syllable, the principal carrier of stress and pitch. Vowels are [+ syl] always, nasals and liquids may be [+ syl] but are more often [− syl], and other segments are [− syl].

A speech sound is [+ consonantal] if its articulation requires interruption of the breath stream, accomplished with the lower lip or some part of the tongue. Liquids, nasals, fricatives, and stops are [+ cons], vowels and glides are [− cons].

A speech sound is [+ continuant] if it is articulated with air flowing continuously out of the mouth. Vowels, glides, liquids, and fricatives are [+ cont], nasals and stops (which might be called ‘nasal stops’ and ‘oral stops’, respectively) are [− cont].

A speech sound is [+ sonorant] if its quality depends on the regular patterns of vibration of air particles within the vocal tract, so that some part of the vocal tract acts as a resonance chamber. Vowels, glides, liquids, and nasals are [+ son], stops and fricatives are [− son].
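Looking at the features themselves, rather than at the classes, suggests a simple exercise: given one or more feature values, which classes of sounds fit? The sketch below is one hypothetical way to answer that question from the four definitions just given. The names PLUS, MINUS, and classes_matching are assumptions made for the example. Since nasals and liquids may be either plus or minus syllabic, they are listed under both values of that feature.

```python
ALL_CLASSES = {"vowel", "glide", "liquid", "nasal", "fricative", "stop"}

# Classes that can carry the plus value of each feature, as stated in the text.
PLUS = {
    "syl":  {"vowel", "nasal", "liquid"},
    "cons": {"liquid", "nasal", "fricative", "stop"},
    "cont": {"vowel", "glide", "liquid", "fricative"},
    "son":  {"vowel", "glide", "liquid", "nasal"},
}

# Classes that can carry the minus value: everything not in PLUS ...
MINUS = {feature: ALL_CLASSES - plus for feature, plus in PLUS.items()}
# ... plus nasals and liquids, which are more often [- syl] than [+ syl].
MINUS["syl"] |= {"nasal", "liquid"}

def classes_matching(**spec):
    """Return the classes compatible with every value in spec.

    Example: classes_matching(son="+", cons="+")
    """
    result = set(ALL_CLASSES)
    for feature, value in spec.items():
        if value == "+":
            result &= PLUS[feature]
        elif value == "-":
            result &= MINUS[feature]
        else:
            raise ValueError(f"value must be '+' or '-', got {value!r}")
    return result

print(sorted(classes_matching(son="-")))             # fricative, stop
print(sorted(classes_matching(son="+", cons="+")))   # liquid, nasal
print(sorted(classes_matching(syl="-", cons="-")))   # glide
```

A single feature value picks out a broad group of sounds; adding a second value narrows the group, and the glides, for instance, are the only class that is both [− syl] and [− cons].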