Chinese Pronunciation

The language and its writing systems

Mandarin is the standard national language of China; it is also known as Standard Modern Chinese, or just Chinese. Its Chinese name is Putonghua. This presentation tells you how to pronounce it - in other words, it tells you how the speech-sound system works; it doesn't tell you how to read Chinese. There are two reasons for this. The first is that, if you want to start learning a language, knowing how to make the sounds it's built on is a much more productive use of your time than trying to read it. The second is that learning to read Chinese is a long and difficult task, longer and more difficult than learning to read any western foreign language. So let us look briefly at the two Chinese writing-systems before moving on.

One system uses what are called 'characters' - an example is shown on the first of the slides that accompany this Introduction. The three signs shown there are the signs for pu-tong-hua, one for each syllable making up the name of the Chinese language. There is no phonetic information in any of these signs - they are not made up of letters, and they tell you nothing about how they are pronounced. So to learn to read Chinese, you have to learn a minimum of 3,000 such characters. The other system is called Pinyin; it uses the western alphabet and is roughly phonetic. However, since it's not fully phonetic - I deal with that question in more detail below - I'm using in this teaching material a third system of writing, the International Phonetic Alphabet, or IPA.

The International Phonetic Alphabet is an alphabet that provides an agreed symbol for every different sound that is found in human speech. There are not as many as you might think - about 165 symbols in all. The same symbol is used for a given sound in whatever language it is found in, so if you know some of the symbols you've got a head-start in learning how to pronounce a language. Most two-language dictionaries now give the pronunciation of each word using IPA.

The 34 speech-sounds of Mandarin are shown (in IPA) on the second slide.


Most people know that Chinese is spoken with a kind of sing-song melody that makes it sound quite different from western languages. This is because Chinese is what is called a 'tone language': the melody that the speaker gives to a syllable determines its meaning. Chinese has five different tones or melodies, so each syllable can have five different meanings according to which tone is used. An example frequently given is the syllable ma, which can mean mother, horse, scold, hemp or that was a question, according to tone.

The five tones are shown at the top of the second slide, where each IPA symbol a is followed by a second symbol giving a sort of picture of the melody for that tone.

• Tone 1 is a high, level tone; you need to sing it rather than say it, because in English we have an ingrained habit of letting the pitch fall towards the end of a syllable, and you have to counteract that.

• Tone 2 is a high and rising tone. Raise your eyebrows without saying anything, then raise them again while saying ah. The pitch of your voice will rise automatically. This is Tone 2.

• For Tone 3 you have to nod, then pause, then raise your head (all silently). Then say ah while doing that. The tone falls when you nod, then rises when you raise your head. Many Chinese speakers make a noticeably long pause in the middle of syllables with this tone, which gives them time to make tea, steam rice, or resolve the latest Party leadership struggle.

• Tone 4, which drops sharply, can be got by stamping your foot.

(You don't of course actually make these physical movements when speaking. That's just for practice, and is more fun.)

• Tone 5 is a bit tricky, and varies a lot according to what has gone before (which is why some commentators don't mention it). For the moment, it doesn't matter much what you do with it as long as you don't do anything definite - just throw that syllable away. In the rest of this material, the absence of a tone-marker means that that syllable has Tone 5.
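For readers who like to see a convention written down explicitly, the tone markers can be put into a small lookup table. This is only an illustrative sketch in Python: the names TONE_MARKERS and mark_tone are invented for the purpose, and the markers are simply the ones that appear in the IPA transcriptions in this material.

```python
# Tone markers as they appear in the IPA transcriptions in this material.
# Tone 5 has no marker. The names used here are illustrative only.
TONE_MARKERS = {
    1: "˥",      # high and level
    2: "˦˥",     # high and rising
    3: "˧˨˧",    # falls, then rises
    4: "˥˩",     # drops sharply
    5: "",       # no marker at all
}


def mark_tone(syllable: str, tone: int) -> str:
    """Return an IPA syllable followed by the marker for its tone."""
    if tone not in TONE_MARKERS:
        raise ValueError(f"tones are numbered 1 to 5, got {tone}")
    return syllable + TONE_MARKERS[tone]


# The syllable ma in each of the five tones.
for tone in range(1, 6):
    print(tone, mark_tone("ma", tone))
```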

The tones themselves are not hard, but stringing them together in even a short utterance can be difficult. The major problem - which may take a lot of practice to eradicate - is that English has a strong melody, but that English melody is used quite differently from Chinese tones, and messes the sentence up when English speakers speak Chinese. If I say He's not here in an ordinary unemotional voice, the melody on here normally drops sharply; but if I say it as an angry question, then the melody rises. This use of melody to convey meaning is automatic for English speakers. In Chinese, however, each syllable has its own tone, which never changes, whether the speaker is angry or bored or asking a question or making a statement, and whether the word comes at the beginning or end of the phrase and is stressed or not; if you change the tone, you're saying a different word. Getting rid of the English melodic habits and replacing them with Chinese tones is hard. It makes it a bit easier if you think of each word as its own separate unit, having no connection with the word before or after - try to produce each syllable in isolation.

There are two sets of exceptions to this rule that the tone of a syllable never changes. One is the fifth tone, as mentioned above. The other is what is called 'tone sandhi'. This refers to the fact that where two successive syllables within the same phrase have the same tone, one of them will change to another tone. ('Sandhi' is a linguist's term for changes that happen to sounds because they're in contact with neighbouring sounds.) The two most important manifestations of this are:

• The sequence Tone 3 - Tone 3 changes to the sequence Tone 2 - Tone 3. So, for example, ni˧˨˧ xau˧˨˧ changes to ni˦˥ xau˧˨˧.

• The sequence Tone 4 - Tone 4 changes to the sequence Tone 1 - Tone 4. So, for example, pu˥˩ ʧu˥˩ changes to pu˥ ʧu˥˩.

Tone sandhi is not taught in Chinese schools, so if you ask native speakers about it they may not recognise it. But they do it just the same.
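The two sandhi rules above are mechanical enough to be written as a few lines of Python. This is a sketch only: the function name apply_sandhi is invented, tones are given as plain numbers, and the treatment of runs of three or more identical tones (each pair is checked against the original tones, left to right) is my assumption, since the rules above describe pairs only.

```python
def apply_sandhi(tones: list[int]) -> list[int]:
    """Apply the two sandhi rules to a phrase given as a list of tone numbers.

    Tone 3 - Tone 3 becomes Tone 2 - Tone 3.
    Tone 4 - Tone 4 becomes Tone 1 - Tone 4.
    """
    result = list(tones)
    for i in range(len(tones) - 1):
        pair = (tones[i], tones[i + 1])
        if pair == (3, 3):
            result[i] = 2   # first of two third tones rises instead
        elif pair == (4, 4):
            result[i] = 1   # first of two fourth tones stays high
    return result


print(apply_sandhi([3, 3]))   # the ni xau example
print(apply_sandhi([4, 4]))   # the pu ʧu example
```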


The place to start with the consonants is the seven boring sounds in the middle of the second slide, on the right. Four of these are the same as the English sounds normally denoted by that letter: m n f l. The other three symbols may be unfamiliar, but they too denote sounds that are also used in English: ŋ is the sound of English ng, ɹ is an ordinary English r, and x is the sound found at the end of the Scottish English word loch, and twice in the middle of Auchtermuchty.

Next, look at the six easy symbols on the left. They fall into three pairs. In each pair, the upper symbol stands by itself, whereas the lower symbol has a superscript ʰ after it. The ʰ denotes that there is a strong puff of breath after the consonant. You can hear the difference between the two sounds in, for example, a typical Scottish pronunciation of Perth, which has almost no puff of breath after the p, and the Southern English pronunciation, where the p has a much stronger puff of breath; you can also hear it in the difference between the French pronunciation of a word like père and the Southern English pronunciation of a word like pair. The same difference separates t from tʰ and k from kʰ.

Finally, among the consonants, we need to look at the Nine Horrors. These are not as complicated as they appear at first. The first row shows three sibilants, which differ only in the position of the tongue: for s, the tip of the tongue is placed against the gum-ridge (or 'alveolus'); for ɕ, the middle of the tongue is placed against the back curve of the gum-ridge; and for ʃ, the blade of the tongue is placed against the back of the gum-ridge, while the body of the tongue is drawn down and back. (These tongue positions are shown on the ninth slide, the one with all the diagrams of a red tongue on it.) So s sounds the same as English s, ɕ is the consonant found at the end of Scottish dreich or at the beginning of many Scottish pronunciations of the word huge, and ʃ is something like English sh but without the rounding and pushing forward of the lips that usually accompanies it.

The sounds in the first row are distinguished by tongue position; those in the second row have the same tongue positions as those in the first row, but have a sort of t-sound immediately before the sibilant (and are therefore called 'affricates'). In other words, whereas for s (for example) the tongue approaches the alveolar ridge closely enough to make a hissing sound but does not actually touch it, for ʦ the tongue touches the alveolar ridge before making the hissing sound, and similarly with the other two sibilants on that row. There is a blob on the ninth slide to indicate this.

The sounds in the third row are the same as those in the second row, but with the same additional puff of breath as those produced in the breathy set pʰ tʰ kʰ.
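The way the consonants are built up from a few basic sounds can be made concrete in a short Python sketch. The names below are invented for illustration. The sketch shows that the breathy stops and the second and third rows of the Nine Horrors are all derived from simpler symbols, and that seven plus six plus nine gives twenty-two consonants in all.

```python
# The seven boring sounds, taken as they stand.
BORING = ["m", "n", "f", "l", "ŋ", "ɹ", "x"]

# The six easy sounds: three plain stops, each with a breathy partner.
PLAIN_STOPS = ["p", "t", "k"]
ASPIRATION = "ʰ"

# The Nine Horrors: three sibilants, each with an affricate built on it.
SIBILANTS = ["s", "ɕ", "ʃ"]
AFFRICATE_OF = {"s": "ʦ", "ɕ": "ʨ", "ʃ": "ʧ"}


def easy_six():
    """Plain stops followed by their aspirated partners."""
    return PLAIN_STOPS + [stop + ASPIRATION for stop in PLAIN_STOPS]


def nine_horrors():
    """Three rows: sibilants, affricates, aspirated affricates."""
    row1 = list(SIBILANTS)
    row2 = [AFFRICATE_OF[s] for s in row1]          # add the t-sound
    row3 = [aff + ASPIRATION for aff in row2]       # add the puff of breath
    return [row1, row2, row3]


def all_consonants():
    horrors = [sound for row in nine_horrors() for sound in row]
    return BORING + easy_six() + horrors


for row in nine_horrors():
    print(" ".join(row))
print(len(all_consonants()), "consonants")
```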


The vowels are more difficult, and are shown on the vowel chart on the eleventh slide. Vowels in any language are made by tensing and raising the appropriate part of the tongue, and a vowel chart shows which part of the tongue is raised and tense: the front of the tongue is on the left of the chart. Virtually all the vowels of Chinese are also found in either English or French; those that are not are found in other European languages.

The 'close' row

i represents an ee sound like i in machine or in French vite.

y is the sound of u in French tu or vu, or the sound of u in a strongly Scottish pronunciation of huge or cute.

ɨ is made by pushing the centre of the tongue (rather than the front or back of the tongue) high up to the roof of the mouth; many Chinese speakers pull back the corners of the lower lip (but not the upper lip) when producing it. The same sound is found in Turkish (letter i without a dot) and Polish (letter y).

u is the vowel-sound in moon in a southern English accent.

Below the 'close' row

ʊ is the sound of southern English good, foot.

The 'close-mid' row

e is the same as French e-acute; it's like the sound in southern English led but more tense, or like the sound of Scottish laid but shorter.

ɤ is made by tensing the back of the tongue and raising it a little, while keeping the lips straight. It is found in the first syllable of Scottish Gaelic duine ('man').

Below the 'close-mid' row

ə is a sound found in many languages, the 'murmur vowel' in the first syllable of English along, about, or in the first three words of French je ne le crois pas ('I don't think so').

The 'open-mid' row

ɛ is again like the sound of southern English led, but this time slacker.

œ is ɛ made with rounded lips, and like the French eu in neuf.

ɔ represents the vowel of English cot.

The 'open' row

a is like Scottish a, a sound half-way between southern English Sam and southern English psalm.
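The rows of the vowel chart described above can be summarised as a small table in Python. This is just a sketch: the names VOWEL_CHART and row_of are invented, and within each row the vowels are given in the order in which they are listed above.

```python
# The twelve vowels, row by row, from the top of the chart to the bottom.
VOWEL_CHART = [
    ("close", ["i", "y", "ɨ", "u"]),
    ("below close", ["ʊ"]),
    ("close-mid", ["e", "ɤ"]),
    ("below close-mid", ["ə"]),
    ("open-mid", ["ɛ", "œ", "ɔ"]),
    ("open", ["a"]),
]


def row_of(vowel: str) -> str:
    """Return the name of the chart row that contains the given vowel."""
    for row_name, vowels in VOWEL_CHART:
        if vowel in vowels:
            return row_name
    raise ValueError(f"{vowel!r} is not one of the twelve vowels")


print(row_of("ɤ"))
print(sum(len(vowels) for _, vowels in VOWEL_CHART), "vowels")
```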


The final topic we need to consider is the make-up of a Chinese syllable. The traditional description is that every Chinese syllable consists of two elements, an 'initial', of which there are twenty-three, and a 'final', of which there are thirty-six. However, an analysis based on the traditional concepts of western phonology is more illuminating, and is presented on the tenth slide. According to this analysis, every Chinese syllable consists of a consonant (optional), an on-glide (optional), a vowel (not optional), and an off-glide (optional). The sound ŋ is treated as an off-glide for this purpose, since it can't occur at the beginning of a syllable. There are restrictions on which sounds can be present in a given position. In first position any consonant is permitted, and in third position any vowel, but the only possible on-glides are i y u, and the only possible off-glides are i u n ŋ.
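The four-slot analysis above can be tried out with a small Python sketch that takes a syllable written in IPA (without its tone marker) and sorts its sounds into the four positions. The function names are invented for illustration. I have left ŋ out of the first-position consonants because, as noted above, it cannot begin a syllable. Where more than one division is possible, the sketch simply returns the first one it finds, which is my assumption rather than part of the analysis.

```python
from itertools import product

# First-position consonants (ŋ is left out: it only occurs as an off-glide).
CONSONANTS = {
    "m", "n", "f", "l", "ɹ", "x",
    "p", "t", "k", "pʰ", "tʰ", "kʰ",
    "s", "ɕ", "ʃ", "ʦ", "ʨ", "ʧ", "ʦʰ", "ʨʰ", "ʧʰ",
}
VOWELS = set("iyɨuʊeɤəɛœɔa")
ON_GLIDES = {"i", "y", "u"}
OFF_GLIDES = {"i", "u", "n", "ŋ"}


def segments(syllable):
    """Split an IPA string into sounds, keeping ʰ with its consonant."""
    segs = []
    for ch in syllable:
        if ch == "ʰ" and segs:
            segs[-1] += ch
        else:
            segs.append(ch)
    return segs


def parse_syllable(syllable):
    """Return the four slots of a syllable, or None if it does not fit."""
    segs = segments(syllable)
    # Try every combination of the three optional slots being present.
    for has_c, has_on, has_off in product((True, False), repeat=3):
        if len(segs) != 1 + has_c + has_on + has_off:
            continue
        rest = iter(segs)
        c = next(rest) if has_c else None
        on = next(rest) if has_on else None
        v = next(rest)
        off = next(rest) if has_off else None
        if ((c is None or c in CONSONANTS)
                and (on is None or on in ON_GLIDES)
                and v in VOWELS
                and (off is None or off in OFF_GLIDES)):
            return {"consonant": c, "on_glide": on,
                    "vowel": v, "off_glide": off}
    return None


print(parse_syllable("tʰiɛn"))
print(parse_syllable("ŋa"))   # None: ŋ cannot begin a syllable
```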


Everything that I have described so far has been presented using the International Phonetic Alphabet, but you will also want to read Pinyin (the official system for writing Chinese in the western alphabet), since teaching materials use it extensively. However, the correspondence between Pinyin and the sounds of the language as I've laid them out above is a bit rough and ready. The matching of consonants is manageable:

• The 'six easy' consonant sounds, p t k pʰ tʰ kʰ in IPA, are represented in Pinyin by b d g p t k - in other words, the unaspirated sounds are represented by the letters b d g, and the letters p t k move over to represent the aspirated sounds. It wrong-foots you, but at least it's clear.

• Four of the 'seven boring' sounds use the same letters in Pinyin as they do in IPA (and the same as English): m n f l. The other three sounds, ŋ x ɹ, are represented in Pinyin by ng h r, which are plausible English letters for these sounds. It's worth noting that the sound x is sometimes pronounced more weakly by Chinese speakers, like an English h.

• The Nine Horrors are more difficult. Their alveolar, alveolo-palatal and post-alveolar speech-sounds correspond as follows:

                          IPA                                  Pinyin
                          Alveolar  Alv-pal  Post-alveolar     Alveolar  Alv-pal  Post-alveolar
  Fricative               s         ɕ        ʃ                 s         x        sh
  Affricate, unaspirated  ʦ         ʨ        ʧ                 z         j        zh
  Affricate, aspirated    ʦʰ        ʨʰ       ʧʰ                c         q        ch
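The consonant correspondences listed above can be collected into a single lookup table. The Python sketch below is illustrative only, and its names are invented. It finds the consonant at the start of a lower-case Pinyin syllable, trying the two-letter spellings zh ch sh before the single letters, and leaves the rest of the syllable untouched. The spelling ng is left out because that sound never begins a syllable.

```python
# Pinyin spelling of each syllable-initial consonant, with its IPA symbol.
PINYIN_TO_IPA = {
    # the six easy sounds
    "b": "p", "d": "t", "g": "k", "p": "pʰ", "t": "tʰ", "k": "kʰ",
    # the boring sounds that can begin a syllable
    "m": "m", "n": "n", "f": "f", "l": "l", "h": "x", "r": "ɹ",
    # the Nine Horrors
    "s": "s", "x": "ɕ", "sh": "ʃ",
    "z": "ʦ", "j": "ʨ", "zh": "ʧ",
    "c": "ʦʰ", "q": "ʨʰ", "ch": "ʧʰ",
}


def split_initial(syllable: str):
    """Return (IPA consonant, remaining Pinyin letters) for one syllable.

    The consonant is an empty string if the syllable begins with a vowel.
    """
    for length in (2, 1):               # try zh, ch, sh before z, c, s
        head = syllable[:length]
        if head in PINYIN_TO_IPA:
            return PINYIN_TO_IPA[head], syllable[length:]
    return "", syllable


print(split_initial("zhu"))
print(split_initial("tian"))
```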

In Pinyin the tones can be written in one of two ways (though they are often omitted). One way is to put a tone-number after the syllable, with no tone-number for the fifth tone: fang2 zi (faŋ˦˥ ʦɨ, house). Another is to use an accent on the vowel: fáng zi. The first of these methods is more keyboard-friendly.
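Since the tone-number method is the keyboard-friendly one, it can be handy to convert accented Pinyin into it. The Python sketch below is one way of doing so, with invented names. It relies on the usual Pinyin convention, which I have not set out above, that a macron marks Tone 1, an acute accent Tone 2, a caron (inverted circumflex) Tone 3 and a grave accent Tone 4. A syllable with no accent is left without a number, as for the fifth tone.

```python
import unicodedata

# Combining accents and the tones they mark (standard Pinyin convention).
TONE_OF_ACCENT = {
    "\u0304": 1,   # macron
    "\u0301": 2,   # acute
    "\u030C": 3,   # caron
    "\u0300": 4,   # grave
}


def accent_to_number(syllable: str) -> str:
    """Rewrite one accented Pinyin syllable with a trailing tone-number."""
    tone = None
    kept = []
    # Decompose so that each accent becomes a separate combining character.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_OF_ACCENT:
            tone = TONE_OF_ACCENT[ch]
        else:
            kept.append(ch)
    # Recompose, so that the two dots of u-diaeresis are kept on the letter.
    base = unicodedata.normalize("NFC", "".join(kept))
    return base if tone is None else f"{base}{tone}"


print(" ".join(accent_to_number(s) for s in "fáng zi".split()))
```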

The vowels in Pinyin, by contrast, are a bit of a mess. One difficulty arises from the fact that commentators do not agree on how many different vowel sounds there are, and which ones are different from others; the twelve that I give above are boiled down from my intuitive examination of recordings, rather than the result of the standard procedure that linguists carry out to determine just which sounds truly differ from which (known as a 'phonemic analysis'). The five vowel letters a e i o u are obviously not enough to represent twelve vowel-sounds, and so there are a number of cases where the Pinyin letter represents a different sound according to the consonant which comes before or after it. (There is also a sixth vowel letter, u-diaeresis ü, but many texts ignore it and use u instead.) A few examples:

• The Pinyin letter i is pronounced as ɨ when it follows the letters s z c sh zh ch (i.e. the two outer columns in the table of Nine Horrors), but as i when it follows the letters x j q (i.e. the middle column in the Nine Horrors). This produces ʦɨ for the second syllable of fang2 zi, but ʨiŋ for the second syllable of Beijing.

• The letter o is normally pronounced ɔ, but before ng it is pronounced ʊ, giving kʊŋ as the first syllable of gong zuo kʊŋ˥ ʦuɔ˥˩ have job.

• The letter a is normally pronounced a, but between i and n it is pronounced ɛ, giving tʰiɛn as the first syllable of Tianjin.
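Adjustments of this kind are easy to express as rules that look at the neighbouring letters. The Python sketch below covers only the letters i and o from the examples above, and its names are invented. Treating i as plain i after every consonant not listed in the first example is my assumption, since the examples only mention the Nine Horrors.

```python
# Pinyin consonants after which the letter i stands for the sound ɨ.
I_IS_CENTRAL_AFTER = {"s", "z", "c", "sh", "zh", "ch"}


def vowel_sound(letter: str, preceding: str = "", following: str = "") -> str:
    """Return the IPA vowel for a Pinyin vowel letter in its context.

    preceding is the Pinyin spelling of the consonant before the vowel;
    following is whatever is written after it (for example "ng").
    """
    if letter == "i":
        # ɨ after the two outer columns of the Nine Horrors, otherwise i
        # (assumed here for all other consonants, not only x j q).
        return "ɨ" if preceding in I_IS_CENTRAL_AFTER else "i"
    if letter == "o":
        # ʊ before ng, otherwise ɔ.
        return "ʊ" if following == "ng" else "ɔ"
    raise ValueError(f"no rule given here for the letter {letter!r}")


print(vowel_sound("i", preceding="z"))     # second syllable of fang2 zi
print(vowel_sound("i", preceding="j"))     # second syllable of Beijing
print(vowel_sound("o", following="ng"))    # first syllable of gong zuo
```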

Further study

Pronouncing Pinyin requires the speaker to make a great number of such adjustments, and this is the reason that I decided to use IPA (and I haven't managed to identify all the adjustments). For further study, however, here is a website that gives the pronunciation of every possible Chinese syllable. It's from that website that I gathered most of the observations that I note above.

Audio sample

You can hear an audio sample, with IPA transcription, here. The text is spoken twice, by two different speakers. The second speaker makes greater use of creaky voice than the first, and her u˧˨˧ sound (for the numeral 5) is close to o˧˨˧.