In the context of this study, which involves university students in Hong Kong, Morley's assertion that students need "instruction that will give them communicative empowerment - effective language use that will help them not just to survive, but to succeed" (Morley 1991: 489; my italics) is particularly apt. To succeed on a personal level, and to contribute to Hong Kong's continued success on the world stage, many of our students will need spoken English of a standard high enough for them to feel at ease in demanding business or academic situations involving participants from around the world.
To equip them for these tasks, the teacher of English pronunciation in Hong Kong must address the many facets of pronunciation. It is now widely accepted that pronunciation teaching involves attention not just to the segmental (phonemic) level but also to the suprasegmental level, which includes those features that span more than one phoneme and operate at sentence, discourse or language level.
Brown states that "[w]riters are now convinced of the importance of suprasegmentals in pronunciation" and argues for more attention to intonation in the classroom (Brown 1995: 172). Jones and Evans (1995) argue that pronunciation teaching should begin at the level of voice quality, and point to the characteristic differences in voice settings between languages, while Leather, in his influential state-of-the-art paper on the subject, notes that "[w]hen L2 has lexical tones [...] the arguments for prosodic training seem all the more compelling" (Leather 1983: 200). Cantonese learners of English do indeed face great difficulties with intonation, both in physiological/phonetic terms, that is, the physical problem of producing authentic-sounding intonation, and in phonological/semantic terms, that is, deciding how to use intonation meaningfully in their speech.
A number of researchers and teachers have used speech analysis programs to teach the suprasegmental features of English pronunciation (e.g. Molholt 1988). Leather welcomes the use of computer technology for this purpose (Leather 1983: 211), while Morley refers positively to the imaginative uses to which computers might be put (Morley 1991: 511). Others are currently researching further pedagogical applications of visual displays of speech (Lambacher 1996).
The expansion of higher education in Hong Kong and elsewhere has in recent years led to greater emphasis on autonomous self-access work and on students' self-monitoring skills. This paper reports on a pilot self-access programme in which students used a fundamental frequency analyser to obtain visual feedback on their intonation patterns. In the current educational climate, projects such as this, which place in students' hands tools to help them monitor and improve their own performance, are particularly valuable.
After discussing the acoustic correlates of pitch, intonation and prominence, and giving a brief overview of the pedagogical model of intonation used, this paper presents examples of the type of feedback generated, illustrating a number of features of students' pronunciation that can be examined, and reports on areas of pronunciation that may benefit, before concluding with suggestions regarding the introduction of the CSL-Pitch in a Self-Access Centre.
Prominence is what we hear when a word "stands out" from those around it: compare, for example, the prominent word "I" in "I am", a possible answer to "Who's coming?", with the prominent word "am" in "I am", answering perhaps "You're not coming, are you?"
The primary physiological cause of both pitch and prominence in speech is the varying rate of vibration of the vocal folds, whose acoustic correlate is fundamental frequency (F0). The relationship between pitch and fundamental frequency is non-linear: the frequency difference listeners need before they judge one tone to be twice as high as another is much greater at high absolute frequencies than at low ones. However, as speech F0 values are relatively low, that is, below 500 Hz, pitch can for practical purposes be equated with F0 (Cruttenden 1986: 4); indeed, the vertical axis of the CSL-Pitch display is labelled "Pitch".
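The non-linearity can be illustrated with the mel scale, a perceptual pitch scale that is not used in the original study. The sketch below assumes the common approximation m = 2595 log10(1 + f/700), one of several mel formulas, and asks which frequency would be heard as "twice as high" as a given tone. The function names are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Frequency in Hz to mels (2595 * log10(1 + f/700) approximation)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frequency_for_doubled_pitch(f_hz: float) -> float:
    """Frequency that, on the mel model, sounds twice as high as f_hz."""
    return mel_to_hz(2.0 * hz_to_mel(f_hz))


for f in (100, 200, 500, 2000):
    doubled = frequency_for_doubled_pitch(f)
    print(f"{f:>5} Hz -> {doubled:7.0f} Hz (ratio {doubled / f:.2f})")
```

On this formula the required frequency ratio works out to exactly 2 + f/700: about 2.1 at 100 Hz but nearly 4.9 at 2000 Hz, which is why treating pitch as F0 is much safer in the speech range than at higher frequencies.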
Other factors are involved in prominence and intonation, including duration and loudness, loudness being the (again non-linear) perceptual correlate of the acoustic feature amplitude. While these factors are relevant, they are generally recognised to be secondary in importance to fundamental frequency (Cruttenden 1986: 2). Although the CSL-Pitch can also display energy levels and a number of other parameters, these were therefore not the focus of the project and were not discussed, except occasionally for duration where it was an important factor.
An example will make this clear. Consider this sentence, taken from a government announcement on local television:
"If we want a world that's safe for everyone, we can't do it alone"
In this example, the first clause is spoken with a "referring" fall-rise tone and the second with a "proclaiming" tone. The implication is that the idea of "a world that's safe for everyone" is already familiar to the listener, while the focus, the new idea, is contained in the second half of the sentence, "we can't do it alone".
The following table gives the possible tone choices in interactive discourse as identified by the model and the notation used in this paper to indicate them.
Table 1 The functions and notation used for the tones in direct orientation
| Function | Role of speaker | Notation (Brazil 1985) | Realised as |
| Referring to common ground | Non-dominant | r | Fall-rise tone |
| Referring to common ground | Dominant | r+ | Rising tone |
| Proclaiming new information | Non-dominant | p | Falling tone |
| Proclaiming new information | Dominant | p+ | Rise-fall tone |
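Because Brazil deliberately leaves the phonetic realisations loose, any automatic mapping from an F0 contour to the codes in Table 1 can only be a rough guide. The following sketch, which is not part of the CSL-Pitch or of Brazil's model, labels the F0 values over a tonic segment by their overall shape and returns the code of the typical realisation. The 1-semitone threshold and the function names are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence

# Typical realisations from Table 1, keyed by contour shape (illustrative only).
SHAPE_TO_CODE = {
    "fall-rise": "r",
    "rise": "r+",
    "fall": "p",
    "rise-fall": "p+",
}


def st(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz up to f_hz, in semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)


def classify_tone(f0_hz: Sequence[Optional[float]], threshold_st: float = 1.0) -> Optional[str]:
    """Return a Table 1 tone code for the F0 values of a tonic segment, or None.

    Unvoiced frames (None or 0) are ignored; movements smaller than
    threshold_st semitones are treated as level.
    """
    voiced = [f for f in f0_hz if f]
    if len(voiced) < 3:
        return None
    start, end = voiced[0], voiced[-1]
    low, high = min(voiced), max(voiced)

    if st(start, low) >= threshold_st and st(end, low) >= threshold_st:
        shape = "fall-rise"            # dips below both end points
    elif st(high, start) >= threshold_st and st(high, end) >= threshold_st:
        shape = "rise-fall"            # peaks above both end points
    elif st(end, start) >= threshold_st:
        shape = "rise"
    elif st(start, end) >= threshold_st:
        shape = "fall"
    else:
        return None                    # level: no clear movement
    return SHAPE_TO_CODE[shape]


print(classify_tone([220, 200, 170, 150]))   # p
print(classify_tone([200, 160, 150, 190]))   # r
```

A contour with more than one turning point is simply labelled by its first matching rule, so the output is best treated as a prompt for the teacher's ear rather than a verdict.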
It is important to note that Brazil arrives at his tonal categories from an initial hypothesis about the significance of tones in English discourse: "before proceeding to detailed phonetic specification we need to know how many meaningful oppositions there are..." (Brazil 1985: 14). Only then does he relate this set of meaningful oppositions to their phonetic realisations, because that is "the only research procedure available" (loc. cit.).
While this approach yields a more teachable system than attitudinal approaches to intonation, the fact that the proposed tonal categories are based both on discoursal significance and on phonetic realisations leaves the teacher with a problem at the phonetic level. Brazil is careful to emphasise that his descriptions of the phonetic realisations of the tones are not intended to be phonetically precise. Rather, they are a convenient shorthand by which typical realisations can be labelled, whilst recognising the wide range of non-significant variation that can occur within each category. For this reason he puts "prevaricating quotation marks" around all phonetic descriptions (Brazil 1985: 15) and maintains a "mental separation between the meaningful distinctions the speaker makes and the physical events whereby his decisions are manifest" (Brazil 1985: 22). His referring tones are thus characterised both by their function of referring to the changing common ground in the "interpenetrating biographies" of speaker and hearer as the discourse progresses, and by their typically rising or fall-rising pitch.
Table 2 The notation used in this paper
| Item | Notation |
| Tone unit boundaries | // text // |
| Prominence | Upper case |
| Tonic syllable | Underlining |
| Tones | Codes as in Table 1 |
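Since the notation in Table 2 is regular, transcriptions written in it can be processed mechanically, for example to pair each tone unit with the matching stretch of a pitch display. The sketch below is a hypothetical parser, not a tool used in the study. Underlining of the tonic syllable cannot be represented in plain text, so it is ignored, and a word is assumed to be prominent when it contains at least two consecutive capital letters; this heuristic would miss a prominent one-letter word such as "I".

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

VALID_TONE_CODES = {"r", "r+", "p", "p+"}
PROMINENT = re.compile(r"[A-Z]{2,}")   # matches "aFRAID", "MEEting", "WELL"


@dataclass
class ToneUnit:
    tone: Optional[str]                # None when no tone code is written
    text: str
    prominent: List[str] = field(default_factory=list)


def parse_tone_units(notation: str) -> List[ToneUnit]:
    """Split a '// r I'm aFRAID // p ...' transcription into tone units."""
    units = []
    for chunk in notation.split("//"):
        tokens = chunk.split()
        if not tokens:
            continue                   # text before the first or after the last //
        tone = tokens[0] if tokens[0] in VALID_TONE_CODES else None
        words = tokens[1:] if tone else tokens
        prominent = [w for w in words if PROMINENT.search(w)]
        units.append(ToneUnit(tone, " ".join(words), prominent))
    return units


for unit in parse_tone_units("// r I'm aFRAID // p I have to go to a MEEting // r on WEDnesday //"):
    print(unit.tone, unit.prominent)
```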
To begin a comparison, the student first uses the mouse to select one of the two view windows, then opens a file containing the model speaker's recorded voice. A pitch contour is displayed, which "walks" across the screen in synchrony with the playback. Students can play this file back as many times as they wish to familiarise themselves with the changing pitch pattern and, when ready, can begin recording ("capturing") their own voice.
During capture, the CSL-Pitch analyses the F0 levels and produces an instant display which again "walks" across the screen as the student speaks. After capture, students can play back both the model and their own attempts, seeing each pitch contour in synchrony with the spoken playback, and can repeat this as many times as required so that auditory and visual feedback reinforce each other. Students thus receive immediate feedback on unsatisfactory suprasegmental features of their speech.
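The CSL-Pitch's own F0 algorithm is not described here. As a rough illustration of what a pitch analyser does during capture, the sketch below uses a different, textbook technique, short-time autocorrelation: each short frame of the signal is compared with delayed copies of itself, and the delay (lag) at which it best matches itself is taken as one pitch period. The 80-500 Hz search range, the 40 ms frame, the 10 ms step and the voicing threshold are all assumptions; practical analysers add many refinements.

```python
import math
from typing import List, Optional, Sequence


def estimate_f0(frame: Sequence[float], sample_rate: int,
                f0_min: float = 80.0, f0_max: float = 500.0,
                voicing_threshold: float = 0.3) -> Optional[float]:
    """Estimate F0 (Hz) of one frame by autocorrelation; None if unvoiced."""
    n = len(frame)
    if n == 0:
        return None
    mean = sum(frame) / n
    x = [s - mean for s in frame]                    # remove any DC offset
    energy = sum(s * s for s in x)                   # autocorrelation at lag 0
    if energy == 0.0:
        return None                                  # silence
    lag_min = max(1, int(sample_rate / f0_max))      # shortest period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)  # longest period searched
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Weak self-similarity means no clear periodicity: treat as unvoiced.
    if best_lag is None or best_r / energy < voicing_threshold:
        return None
    return sample_rate / best_lag


def pitch_track(signal: Sequence[float], sample_rate: int,
                frame_ms: float = 40.0, hop_ms: float = 10.0) -> List[Optional[float]]:
    """One F0 estimate per hop, giving a contour that can be drawn frame by frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [estimate_f0(signal[start:start + frame_len], sample_rate)
            for start in range(0, len(signal) - frame_len + 1, hop)]


# A synthetic 200 Hz tone sampled at 8 kHz stands in for a recorded voice.
tone = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(1600)]
print(pitch_track(tone, 8000)[:3])
```

Computing one value per 10 ms step is what allows a display to "walk" across the screen while the student is still speaking.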
In addition, various statistical values for the student's and the model's speech can be shown on screen or printed out. These can be used to draw attention to global and local features of the student's speech.
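The statistics the CSL-Pitch actually reports are not listed here, so the following is only a hypothetical sketch of the kind of summary such a print-out might contain, computed from a pitch track in which unvoiced frames are marked None (or 0). The 10 ms frame step and the choice of measures are assumptions.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def contour_stats(f0_hz: Sequence[Optional[float]], hop_s: float = 0.01) -> Dict[str, float]:
    """Summary measures over the voiced frames of a pitch track."""
    voiced = [f for f in f0_hz if f]
    if not voiced:
        raise ValueError("the pitch track contains no voiced frames")
    low, high = min(voiced), max(voiced)
    return {
        "min_hz": low,
        "max_hz": high,
        "mean_hz": statistics.fmean(voiced),
        "sd_hz": statistics.pstdev(voiced),
        "range_st": 12.0 * math.log2(high / low),   # pitch range in semitones
        "voiced_s": len(voiced) * hop_s,            # total voiced time
    }


print(contour_stats([None, 100.0, 200.0, 0, 150.0]))
```

Global measures such as the range and the standard deviation summarise how lively an utterance is overall, while local features, such as a single peak, still have to be read from the contour itself.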
Figure 2: Target utterance (TU): // r I'm aFRAID // p I have to go to a MEEting // r on WEDnesday //

This figure shows the bouncy intonation common to much student speech, in which each high peak reaches much the same level. It will be seen that the uniquely high F0 peak on the word "meeting" as spoken by the model is not replicated successfully by the student, whose pitch rises on each content word. This lack of variation in F0 leads to the perception that no prominence has been given to any particular word and thus to the impression that the speaker has not reacted to or is not aware of the context of interaction.
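The "bouncy" pattern can be made concrete by comparing the heights of the F0 peaks. In the sketch below, which is not a CSL-Pitch feature, local maxima are picked from a pitch track and the highest peak is compared with the next highest in semitones. A model utterance with one uniquely prominent word gives a large gap, whereas a contour whose peaks all reach much the same level gives a gap close to zero. The example contours are invented for illustration and are not data from the study.

```python
import math
from typing import List, Optional, Sequence


def local_peaks(f0_hz: Sequence[Optional[float]]) -> List[float]:
    """F0 values at local maxima; unvoiced frames (None or 0) count as 0 Hz."""
    values = [f or 0.0 for f in f0_hz]
    return [values[i] for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] >= values[i + 1]]


def top_peak_gap_st(f0_hz: Sequence[Optional[float]]) -> Optional[float]:
    """Semitones between the highest and second-highest peak, or None."""
    peaks = sorted(local_peaks(f0_hz), reverse=True)
    if len(peaks) < 2:
        return None
    return 12.0 * math.log2(peaks[0] / peaks[1])


# Invented contours (Hz, one value per frame), for illustration only.
model = [150, 200, 160, 180, 150, 320, 170, 190, 140]
student = [150, 200, 160, 205, 150, 210, 170, 198, 140]
print(round(top_peak_gap_st(model), 1), round(top_peak_gap_st(student), 1))  # 8.1 0.4
```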
In addition, the word "Wednesday", spoken by the model with a clear fall-rise tone, is spoken by the student with a fall. This is a very common feature of real student discourse and again gives the impression that the speaker has not reacted appropriately to the situation, in which "Wednesday" is clearly common ground between speaker and hearer and should therefore carry a referring tone.
Finally, although not a matter of suprasegmental features, it will be noted from the phonetic transcription that the learner makes a number of errors at the segmental level. Using the editing features of the CSL-Pitch, students can easily highlight such errors.
It is this kind of speech, in which segmental and suprasegmental inaccuracy combine, that leads to poor performance in interactive situations.
Figure 3: TU: // p WELL // there's no point in WORRying about it // r what's DONE // p is DONE //

A related problem is seen in Figure 3, where the student again fails to produce enough F0 variation, this time resulting in a rendering of the idiomatic phrase "What's done is done" that is not incorrect but conveys a different meaning. The student's low falling tones give an impression of fatalism, of the utter hopelessness of the situation, whereas the model's higher, livelier intonation suggests optimism, of "putting the past behind one". Again, there are a number of mispronunciations at the segmental level which could hamper communication.
Figure 4: TU: Why don't we go on Wednesday then?

In this example, the student's vocal range, from 81 Hz to a peak of 194 Hz at the cursor, is clearly insufficient to bring out the prominence on "Wednesday". In contrast, the model's pitch range is much broader, from a low of 105 Hz to a peak of 384 Hz at the cursor. Speech like the student's sounds bored and fails to convey the lively enthusiasm intended in the example.
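Ranges like these are often compared in semitones, a convention not used in the original study. Because pitch intervals correspond to frequency ratios rather than differences, a range from f_min to f_max spans 12 log2(f_max/f_min) semitones (12 semitones per doubling of frequency). For the Figure 4 values this gives:

```latex
\begin{aligned}
R_{\mathrm{st}} &= 12\,\log_2\frac{f_{\max}}{f_{\min}} \\
R_{\mathrm{student}} &= 12\,\log_2\frac{194\ \mathrm{Hz}}{81\ \mathrm{Hz}} \approx 12 \times 1.260 \approx 15.1\ \text{semitones} \\
R_{\mathrm{model}} &= 12\,\log_2\frac{384\ \mathrm{Hz}}{105\ \mathrm{Hz}} \approx 12 \times 1.871 \approx 22.4\ \text{semitones}
\end{aligned}
```

On this measure the model's range is about 7.3 semitones wider than the student's, more than half an octave.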
Negative impressions that a learner's speech may give, such as boredom, rudeness or a failure to react appropriately to the situation, are a primary reason for working on suprasegmental features of pronunciation. Such impressions are more subtle and pernicious than shortcomings at the segmental level, because they interfere, at a subconscious level, with the listener's cultural and social expectations.
Figure 5: TU: // what DID you wear // p ANyway //

Just such a subtle effect is shown by the contour of the student's utterance in Figure 5. Here, although the F0 reaches a high enough peak, the pitch rises extremely sharply on both "what" and "did"; in the model's version only "what" has a comparably sharp rise. This sharp upward contour is characteristic of the Cantonese pronunciation of monosyllabic words ending in a stop consonant and could, in an utterance such as this, give a listener an unintended impression of brusqueness or rudeness. It is therefore an area of sociolinguistic concern that deserves the attention of both students and teacher. Again, without a facility such as the CSL-Pitch it is extremely hard to isolate such features.
In contrast, Figure 6 illustrates a strikingly successful student rendering of the utterance "Why don't we go on Wednesday then?", with a high pitch peak on the stressed syllable of "Wednesday" which, at 373 Hz, is almost the same as that of the model (385 Hz) and follows an almost identical contour. This example shows the positive feedback the CSL-Pitch provides for satisfactory suprasegmental pronunciation.
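Judgements such as "follows an almost identical contour" could be quantified. The sketch below is one possible measure, assumed here rather than taken from the study: it discards unvoiced frames, stretches both voiced contours onto the same number of points by linear interpolation, and reports the root-mean-square difference in semitones. Because it simply stretches the voiced frames, it ignores timing differences within the utterance; a dynamic time warping alignment would be a more careful alternative. The function names and the 50-point default are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def resample(values: Sequence[float], n: int) -> List[float]:
    """Linearly interpolate values onto n equally spaced points (n >= 2)."""
    if not values:
        raise ValueError("no voiced frames to compare")
    if len(values) == 1:
        return [float(values[0])] * n
    out = []
    for k in range(n):
        pos = k * (len(values) - 1) / (n - 1)
        i = min(int(pos), len(values) - 2)
        frac = pos - i
        out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return out


def contour_distance_st(student: Sequence[Optional[float]],
                        model: Sequence[Optional[float]], n: int = 50) -> float:
    """RMS difference in semitones between two time-normalised F0 contours."""
    s = resample([f for f in student if f], n)
    m = resample([f for f in model if f], n)
    diffs = [12.0 * math.log2(a / b) for a, b in zip(s, m)]
    return math.sqrt(sum(d * d for d in diffs) / n)


print(contour_distance_st([200.0, 400.0, 300.0], [100.0, 200.0, 150.0]))
```

The same semitone arithmetic puts the two Figure 6 peaks, 373 Hz and 385 Hz, only about 0.55 semitones apart.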
Figure 6: TU: Why don't we go on Wednesday then?

Figure 7: TU: // I MANaged to answer all the QUESTions //

Figure 7 also shows satisfactory prominence on the word "managed", with a high pitch peak and a very similar contour, although the pitch contour on the tonic syllable does not match that of the model.
Figure 8: TU: // I RANG the BELL //

This example shows good control of F0 and duration. The falling tone, as illustrated here, generally causes students fewer difficulties than the other tones do, and so provides a good introduction in the early stages of intonation practice, while students are experimenting with the CSL-Pitch and learning to control their own voices.
Figure 9: TU: // r what's DONE // p is DONE //

Notice the student's maximum pitch peak, at 297 Hz, as opposed to that of the model at more than twice that height. Most commonly, students' maximum pitch reaches only about half that of the model.
Figure 10: TU: // p I MEAN //

In this example, the student takes 30% longer to say the words "I mean" than does the model and also makes a slightly exaggerated fall-rise tone instead of a falling tone.
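A figure such as "30% longer" can be read off a pitch track by timing the voiced stretch of each recording. The sketch below assumes a fixed 10 ms frame step and measures from the first to the last voiced frame, so internal unvoiced gaps (for example, stop closures) are included. It is an illustration under those assumptions, not the way the CSL-Pitch measures duration, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def voiced_duration_s(f0_hz: Sequence[Optional[float]], hop_s: float = 0.01) -> float:
    """Time from the first to the last voiced frame (None or 0 = unvoiced)."""
    voiced_idx = [i for i, f in enumerate(f0_hz) if f]
    if not voiced_idx:
        return 0.0
    return (voiced_idx[-1] - voiced_idx[0] + 1) * hop_s


def relative_duration(student: Sequence[Optional[float]],
                      model: Sequence[Optional[float]], hop_s: float = 0.01) -> float:
    """How much longer (+) or shorter (-) the student is, as a fraction of the model."""
    model_s = voiced_duration_s(model, hop_s)
    if model_s == 0.0:
        raise ValueError("the model recording has no voiced frames")
    return voiced_duration_s(student, hop_s) / model_s - 1.0


model = [None] * 2 + [120.0] * 10 + [None] * 3   # 10 voiced frames = 0.10 s
student = [None] + [110.0] * 13                  # 13 voiced frames = 0.13 s
print(f"{relative_duration(student, model):+.0%}")  # +30%
```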
Figure 11: TU: // r COULD I have a WORD with him //

Figure 11, above, shows a more noticeable failure to match the intonation contour, resulting in a very exaggerated fall-rise tone and greatly exaggerated duration on "word".
Figure 12: TU: // r It SHOULDn't be LONG now //

Figure 12 shows an exaggerated fall-rise tone exacerbated by undue duration on the words "long now".
These print-outs are thus examples of the features of student pronunciation which can be examined with much greater certainty than is possible without such equipment.
Bibliography
Bolton, Kingsley and Kwok, Helen (1990) "The dynamics of the Hong Kong accent: Social identity and sociolinguistic description" in Journal of Asian Pacific Communication Vol. 1, 147-172
Brazil, D. (1985) The Communicative Value of Intonation in English Discourse Analysis Monograph No. 8, English Language Research, University of Birmingham
Brown, Adam (1995) "Minimal pairs: minimal importance?" in ELT Journal Vol. 49/2, April 1995, 169-175
Coulthard, M. (1992) "The significance of intonation in discourse" in Coulthard, M. (ed.) Advances in Spoken Discourse Analysis Routledge, London
Cruttenden, A. (1986) Intonation Cambridge University Press, Cambridge
Fry, D.B. (1979) The Physics of Speech Cambridge University Press, Cambridge
Jones, R.H. and Evans, S. (1995) "Teaching pronunciation through voice quality" in ELT Journal Vol. 49/3, 244-251
Lambacher, S.G. (1996) Spectrograph Analysis as a Tool in Developing L2 Pronunciation Skills Feb. 28, 1996 http://www.u-aizu.ac.jp/~steeve/speakout2.html (accessed July 1, 1996)
Leather, Jonathan (1983) "Second-language pronunciation learning and teaching" in Language Teaching Vol. 16, No. 3, July 1983, 198-219
Molholt, G. (1988) "Computer-assisted instruction in pronunciation for Chinese speakers of American English" in TESOL Quarterly Vol. 22, No. 1, March 1988, 91-111
Morley, Joan (1991) "The pronunciation component in teaching English to speakers of other languages" in TESOL Quarterly Vol. 25, No. 3, 481-520
Throughout this paper the example utterances are from Brazil 1994 and Bradford 1988.