web-sls 01.01 22.02.01

 

WEB-SLS

The European Journal of Language and Speech

 


Acoustic Analysis of Singleton and Geminate Affricates in Italian

by

Simone Faluschi and Maria-Gabriella Di Benedetto

INFOCOM Department, University of Rome 'La Sapienza'

via Eudossiana, 18, 00184 Roma
simone.fal@tiscalinet.it
gaby@acts.ing.uniroma1.it

 


Abstract

The aim of this study was to analyze gemination of affricate consonants in Italian. Disyllabic VCV (vowel-consonant-vowel) and VCCV (vowel-consonant-consonant-vowel) words were considered, where C was one of the four Italian affricate consonants [ʧ, ʤ, ʦ, ʣ], in singleton and geminate forms, while V was one of the three Italian point vowels [a, i, u]. Time-related, frequency-related and energy-related parameters were estimated; these parameters were measured at different sampling times in order to observe their variation within an utterance. Results showed that the pre-consonant vowel and consonant durations differed significantly between singleton and geminate utterances. Frequency and energy parameters did not show differences as large as the durational parameters, although they were significantly different in the two groups. While the time-domain results agree with previous studies on gemination of Italian consonants (stops, liquids, fricatives and nasals), the significant frequency and energy differences between singleton and geminate groups seem to be peculiar to the class of affricates.

 

Introduction

In Italian, minimal pairs of words that differ only in consonant gemination are common. The two words of such a pair are distinguished orthographically by a double grapheme for the geminate consonant (for example, micia (pussy-cat) and miccia (fuse)). A problem that is still unsolved is the identification of the acoustic correlates of singletons vs. geminates and their perceptual verification. To address this question for Italian, several analyses were performed within the GEMMA project, in progress at the INFOCOM Department of the University of Rome 'La Sapienza', on the following classes of consonants: stops, liquids, fricatives and nasals. Results on stop consonants were reported in Rossetti (1993, 1994) and Esposito and Di Benedetto (1999). Liquids were analyzed in Argiolas et al. (1995), while the analysis of fricatives was reported in Giovanardi (1998) and Giovanardi and Di Benedetto (1998). Finally, results on nasal consonants were reported in Mattei (1999) and Mattei and Di Benedetto (2000).

As already pointed out by Mattei and Di Benedetto (2000), gemination has been analyzed in other languages as well (Shrotriya et al., 1995; Blumstein et al., 1998; Rochet and Rochet, 1995; Cohn et al., 1999; Abramson, 1999; Local and Simpson, 1999; Arvaniti, 1999; Louali and Maddieson, 1999). However, these studies do not analyze affricate consonants. The only one that refers to this class of consonants is Abramson (1999); however, the author points out that a perceptual pre-test yielded too high an error rate, and the affricates were therefore discarded.

The paper is organized as follows: the speech materials and measurements are described in Section 1; the acoustic analyses carried out on these materials are reported in Section 2; the results are discussed in Section 3, which also contains the conclusions and directions for future work.

 

1 Speech materials and measurements

In Italian, several disyllabic words form minimal pairs that can be distinguished solely by the gemination of one consonant. Native speakers naturally produce both members of such minimal pairs, which differ only in the presence or absence of consonant gemination. This consideration led to the creation of the GEMMA database, formed by a set of vowel-consonant-vowel disyllabic words (the singleton case) and vowel-consonant-consonant-vowel disyllabic words (the geminate case), intended to serve as the basis for studying gemination in all the consonants of Italian that can be geminated. The words in the database covered the entire set of Italian consonants that appear in both singleton and geminate forms. Affricate consonants were studied in the context of the three Italian point vowels [a, i, u], a subset of the Italian vowels [a, e, ɛ, i, o, ɔ, u]. Words were pronounced by six native speakers with no speech defects and no dialectal accent, three men and three women aged between twenty-four and fifty. In order to keep full control over the parameters, words were pronounced in isolation and not in carrier sentences; in this way, possible influences of factors such as intonation or sentence-internal stress were eliminated. Each word was pronounced three times by each speaker. This procedure was motivated by the need for data with well-controlled parameters, as a starting point for future work in more realistic settings. The material analyzed in the present study therefore consisted of 3 words (one per vowel) for each of the four affricate consonants [ʧ, ʤ, ʦ, ʣ], in both singleton and geminate form, each pronounced by 6 speakers in three repetitions, for a total of 3×4×2×6×3 = 432 utterances (216 singletons and 216 geminates). Words were written on cards, which the operator presented to the speaker; the cards were shuffled after each recording session.
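The corpus size follows from a full factorial design. The Python sketch below simply enumerates the design cells to check the count; the speaker labels are hypothetical placeholders, and each tuple stands for a design cell rather than for an actual word of Table I.

```python
from itertools import product

# Factors of the recording design described in the text.
# Speaker labels are hypothetical placeholders (three men, three women).
VOWELS = ["a", "i", "u"]                 # Italian point vowels
AFFRICATES = ["ʧ", "ʤ", "ʦ", "ʣ"]        # the four Italian affricates
FORMS = ["singleton", "geminate"]
SPEAKERS = ["M1", "M2", "M3", "F1", "F2", "F3"]
REPETITIONS = [1, 2, 3]

def enumerate_utterances():
    """Return one tuple per recorded utterance of the full factorial design."""
    return list(product(VOWELS, AFFRICATES, FORMS, SPEAKERS, REPETITIONS))

utterances = enumerate_utterances()
n_singleton = sum(1 for u in utterances if u[2] == "singleton")
n_geminate = len(utterances) - n_singleton
print(len(utterances), n_singleton, n_geminate)  # 432 216 216
```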

All the utterances were produced and recorded at the Speech Laboratory of the INFOCOM Department at the University of Rome 'La Sapienza' (Italy), using high-quality equipment in a sound-treated room. The recording procedure was supervised by an acoustically trained operator. When there was an evident pronunciation mistake, or if the pronunciation was judged unnatural, the speaker was asked to repeat the word. There were three recording sessions, one for each of the three repetitions.

The set of analysed words is reported in Table I. Geminate utterances are indicated by a double grapheme of the consonant.

After recording, the utterances were digitized using the UNICE software produced by VECSYS (Vecsys, 1989). Speech signals were low-pass filtered at 5 kHz, sampled at 10 kHz and quantized at 16 bits per sample. The resulting signals were stored on a PC. UNICE was then used to compute spectrograms, DFT (Discrete Fourier Transform) spectra and LPC (Linear Predictive Coding) spectra. For this processing, a pre-emphasis filter (α = 0.95) and a 256-sample Hamming window were applied to the signal.
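A minimal sketch of the preprocessing applied before spectral analysis, assuming the standard first-order pre-emphasis y[n] = x[n] - αx[n-1] and the symmetric Hamming window; how UNICE implements these steps internally is not documented here, and the function names are hypothetical. At 10 kHz, a 256-sample window spans 25.6 ms.

```python
import math

FS = 10_000    # sampling rate (Hz), as stated in the text
ALPHA = 0.95   # pre-emphasis coefficient
N_WIN = 256    # analysis window length (samples)

def pre_emphasize(x, alpha=ALPHA):
    """First-order pre-emphasis: y[0] = x[0], y[n] = x[n] - alpha * x[n-1]."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def hamming(n_win=N_WIN):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_win - 1)) for n in range(n_win)]

def analysis_frame(x, center, n_win=N_WIN, alpha=ALPHA):
    """Pre-emphasized, Hamming-windowed frame of n_win samples centered on `center`."""
    start = center - n_win // 2
    if start < 0 or start + n_win > len(x):
        raise ValueError("window extends beyond the signal")
    y = pre_emphasize(x, alpha)   # filter the whole signal, then cut the frame
    w = hamming(n_win)
    return [y[start + k] * w[k] for k in range(n_win)]

print(1000 * N_WIN / FS)  # window duration in ms: 25.6
```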

The GEMMA project relies on a standard set of measured parameters (Giovanardi and Di Benedetto, 1998; Mattei and Di Benedetto, 2000). Since affricate consonants consist of an occlusive phase followed by a fricative phase, this set of analyzed parameters was extended. The following parameters were used in the present study:

1. duration of the pre-consonant vowel, indicated as V1d. The vowel onset was identified by the appearance of a glottal pulse followed by other regular glottal pulses. In those cases in which a glottal excitation was visible before regular vowel voicing, the vowel onset was taken as the beginning of regular vowel voicing, and the initial glottal excitation was discarded. The vowel offset was identified by examining both the waveform and the spectrogram

2. duration of the occlusive part of the consonant, indicated as C1d

3. duration of the fricative part of the consonant, indicated as C2d

4. duration of the consonant, indicated as Cd (= C1d + C2d)

5. duration of the vowel following the consonant, indicated as V2d. The V2 onset was identified by visual inspection of both the waveform and the spectrogram. The V2 offset was identified as the temporal sampling point where the glottal pulses disappeared

6. duration of the whole utterance, indicated as Utd

7. total energy of V1, where Xi is sample i, and t1 and t2 are the temporal sampling points of vowel onset and vowel offset, respectively:

$$EtotV1 = \sum_{i=t_1}^{t_2} X_i^2$$

8. average power of V1, i.e. the total energy of V1 divided by the number of samples between t1 and t2:

$$PmV1 = \frac{1}{t_2 - t_1 + 1} \sum_{i=t_1}^{t_2} X_i^2$$

9. total energy of the occlusive phase of C, indicated as EtotC1 and computed as for V1, with t1 and t2 corresponding to V1 offset (C1 onset) and C2 onset (C1 offset), respectively

10. average power of the occlusive phase of C, indicated as PmC1 and computed as for the average power of V1

11. total energy of the fricative phase of C, indicated as EtotC2 and computed as for V1, with t1 and t2 corresponding to C1 offset (C2 onset) and C2 offset, respectively

12. average power of the fricative phase of C, indicated as PmC2 and computed as for the average power of V1

13. total energy of the whole consonant, indicated as EtotC and computed as described above, with t1 and t2 corresponding to C1 onset and C2 offset, respectively

14. average power of the whole consonant, indicated as PmC

15. instantaneous energy at V1 center, indicated as EiV1, computed in a window of 256 samples centered on the V1 center

16. instantaneous energy at the V1-C1 transition, indicated as EiV1-C1, computed in a window of 256 samples centered on V1 offset

17. instantaneous energy at C1 center, indicated as EiC1 and computed as EiV1

18. instantaneous energy at the C1-C2 transition, indicated as EiC1-C2, computed in a window of 256 samples centered on C1 offset (C2 onset)

19. instantaneous energy at C2 center, indicated as EiC2 and computed as EiV1

20. instantaneous energy at C2 offset, indicated as EiC2off, computed in the 256-sample window immediately preceding V2 onset

21. F0, A0, F1, A1, F2, A2, F3, A3 at V1 center (where F0 is the fundamental frequency, F1, F2 and F3 are the formant frequencies, and A0, A1, A2 and A3 are the corresponding amplitudes)

22. F0, A0, F1, A1, F2, A2, F3, A3 at V1 offset

23. F0, A0, F1, A1, F2, A2, F3, A3 at the transition from V1 to C

24. F0 and A0 at the onset of voiced consonants

25. F0 and A0 at the center of the occlusive phase of voiced consonants

26. F0 and A0 at the center of the fricative phase of voiced consonants

27. F0 and A0 at the offset of voiced consonants

28. F0, A0, F1, A1, F2, A2, F3, A3 at V2 onset

29. F0, A0, F1, A1, F2, A2, F3, A3 at V2 center
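The duration and energy parameters listed above can be sketched as follows, assuming that total energy is the sum of squared samples over [t1, t2] (both ends included), that average power normalizes it by the number of samples, and that an instantaneous energy is the total energy of a 256-sample window. The boundary values and the placeholder signal in the usage lines are hypothetical, and the function names are illustrative only.

```python
FS = 10_000  # sampling rate (Hz), as stated in the text
N_WIN = 256  # window length for instantaneous energies (samples)

def total_energy(x, t1, t2):
    """Etot: sum of X_i**2 for i = t1..t2 (both ends included)."""
    if t1 < 0 or t2 >= len(x) or t1 > t2:
        raise ValueError("invalid segment boundaries")
    return sum(v * v for v in x[t1:t2 + 1])

def average_power(x, t1, t2):
    """Pm: total energy divided by the number of samples in [t1, t2] (assumed)."""
    return total_energy(x, t1, t2) / (t2 - t1 + 1)

def instantaneous_energy(x, center, n_win=N_WIN):
    """Ei: energy of the n_win-sample window centered on `center` (e.g. EiV1, EiC1-C2)."""
    start = center - n_win // 2
    return total_energy(x, start, start + n_win - 1)

def energy_before(x, onset, n_win=N_WIN):
    """Energy of the n_win samples immediately preceding `onset` (e.g. EiC2off)."""
    return total_energy(x, onset - n_win, onset - 1)

def duration_ms(t1, t2, fs=FS):
    """Duration in ms between two sampling points."""
    return 1000.0 * (t2 - t1) / fs

# Usage with hypothetical boundaries (sample indices) and a placeholder signal
x = [0.1] * 4000
v1_on, c1_on, c2_on, v2_on = 200, 1500, 2300, 3300   # C1 onset = V1 offset, C2 offset = V2 onset
v1d = duration_ms(v1_on, c1_on)
c1d, c2d = duration_ms(c1_on, c2_on), duration_ms(c2_on, v2_on)
cd = c1d + c2d
etot_c = total_energy(x, c1_on, v2_on - 1)     # whole consonant
pm_c = average_power(x, c1_on, v2_on - 1)
ei_c1_c2 = instantaneous_energy(x, c2_on)      # window centered on the C1-C2 transition
ei_c2_off = energy_before(x, v2_on)
```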

 

Figure 1 shows the sampling points selected for the computation of the acoustic parameters and the frames in which the frequency parameters were estimated.

Time and energy domain parameters were not normalized since the comparisons were between values obtained by the same speakers who pronounced the singleton and geminate utterances under the same conditions. Moreover, normalization would have obscured potentially relevant energy information.


2 Results of acoustic analysis

In this section, the time-, frequency- and energy-related parameters are reported. The following statistical tests were performed: univariate and multivariate ANOVA, the maximum a posteriori classification test, the maximum likelihood classification test and the Spearman rank correlation coefficient rs. Results in the time, frequency and energy domains are reported separately.

2.1 Results in the time domain

Table II reports the average values of V1d, C1d, C2d, V2d and Utd, and the corresponding standard deviations, for each of the four consonants coarticulated with the three vowels. As can be seen in Table II, the parameters that show relevant differences between singleton and geminate utterances are V1d, C1d and C2d (recall that C1 and C2 together form the affricate). There is a general tendency to shorten the first vowel and to lengthen the consonant (in both its occlusive and fricative phases). By contrast, there are no relevant variations in the duration of the second vowel. Statistical analyses confirm these observations and are reported in Table III, where values that differ significantly (p<0.05) between singletons and geminates are shown in bold. A strongly significant variation is found for V1d, C1d and C2d, except for [ʦ] coarticulated with [i]. Utd increases significantly in almost all geminate utterances, whereas V2d does not vary significantly between singletons and geminates, except for [ʤ] coarticulated with [a].
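For the univariate comparisons, a one-way ANOVA with two groups (singleton vs. geminate) reduces to the F statistic sketched below. This minimal sketch returns only F and its degrees of freedom; the p-value would then come from the F distribution (for example via `scipy.stats.f_oneway`, which returns both the statistic and the p-value). The input durations are hypothetical.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical V1d values (ms) for one consonant-vowel cell
singleton_v1d = [150.0, 162.0, 148.0, 155.0, 158.0]
geminate_v1d = [112.0, 118.0, 109.0, 121.0, 115.0]
f_stat, df_b, df_w = one_way_anova_f(singleton_v1d, geminate_v1d)
```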

A Spearman rank correlation test was carried out to verify whether the relations between phoneme durations, in particular between V1d and C1d and between V1d and C2d, are due to gemination. The results were as follows:

· when the singleton and geminate groups are merged, the following values of rs are obtained: V1d vs C1d, rs = -0.471; V1d vs C2d, rs = -0.474;

· when the two groups are kept separate, the following values of rs are obtained: V1d vs C1d, rs = -0.271 for singletons and not significant for geminates; V1d vs C2d, rs = -0.432 for singletons and rs = -0.321 for geminates.

Therefore, the correlation between V1d and C1d can be attributed to gemination: it weakens (singletons) or disappears (geminates) within each group, so the pooled value mainly reflects the joint shift of the two durations between groups. The same can be said for the correlation between V1d and C2d, although to a lesser extent, since a moderate correlation persists within each group.
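The pooled vs. within-group comparison above can be reproduced with a rank correlation computed as the Pearson correlation of the ranks, with tied values given their average rank. A self-contained sketch; the duration lists in the usage lines are hypothetical, and `scipy.stats.spearmanr` offers an equivalent library routine.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical V1d and C1d values (ms): pooled vs. per group
v1d_s, c1d_s = [150, 160, 148, 155], [80, 75, 85, 78]
v1d_g, c1d_g = [112, 118, 110, 115], [130, 128, 135, 125]
rs_pooled = spearman_rs(v1d_s + v1d_g, c1d_s + c1d_g)
rs_singletons = spearman_rs(v1d_s, c1d_s)
rs_geminates = spearman_rs(v1d_g, c1d_g)
```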

On the basis of the time-domain results, the parameters V1d, C1d, C2d, Cd, Cd/Utd, Cd/V1d, C1d/V1d and C2d/V1d were used to classify utterances with the Maximum Likelihood Criterion (MLC) (Dillon and Goldstein, 1984). Results are reported in Table IV. The error percentages are: Cd/Utd: 19.9%; Cd/V1d: 16.7%; C1d/V1d: 13.9%; C2d/V1d: 30.3%; V1d: 23.4%; C1d: 16.9%; C2d: 36.3%; Cd: 17.6%. The best overall result is obtained with C1d/V1d, and results are even better for specific consonants (for example, with C1d/V1d there are no errors for [ʤ], and with Cd there is only one error (error percentage 4.63%) for [ʧ]).
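A minimal sketch of single-parameter maximum likelihood classification, assuming that each class (singleton, geminate) is modelled by a univariate Gaussian with its own mean and variance, that both classes are treated as equally likely (so no prior enters the decision), and that the error percentage is computed on the same data used to fit the models. The exact formulation used by the authors follows Dillon and Goldstein (1984) and may differ; the training values below are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gaussian(values):
    """Maximum-likelihood estimates (mean, variance) of a univariate Gaussian."""
    var = pvariance(values)
    if var == 0:
        raise ValueError("zero variance: cannot fit a Gaussian")
    return mean(values), var

def log_likelihood(x, mu, var):
    """Log of the Gaussian density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def ml_classify(x, models):
    """Return the class label whose model gives x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))

def error_percentage(samples, models):
    """samples: list of (parameter value, true label) pairs."""
    errors = sum(1 for x, label in samples if ml_classify(x, models) != label)
    return 100.0 * errors / len(samples)

# Hypothetical C1d/V1d ratios for the two classes
train = {"singleton": [0.5, 0.6, 0.55, 0.65], "geminate": [1.1, 1.2, 1.0, 1.3]}
models = {label: fit_gaussian(vals) for label, vals in train.items()}
samples = [(v, label) for label, vals in train.items() for v in vals]
err = error_percentage(samples, models)
```

Fitting on one part of the data and testing on another would give a less optimistic error estimate than this resubstitution setup.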

2.2 Results in the frequency domain

Table V reports the average values and standard deviations of the frequency parameters, computed over all utterances. All individual measurements can be found in Faluschi (2000). A multivariate ANOVA indicates no statistically significant differences between singletons and geminates, except for slight changes in the fundamental frequency (F0) measured in very specific frames and in the formant amplitudes. F0 is about 14 Hz and 12 Hz higher in the geminate form (+9% and +8%) in the V1 offset and V1-to-C transition frames, respectively. No significant variations were observed in F1, F2 and F3. The formant amplitudes A1, A2 and A3 are significantly higher (by 1-3 dB) in the geminate form in the V1 center, V1 offset, V1-to-C transition and V2 onset frames. Maximum Likelihood classification based on the statistically significant frequency parameters yields only about 64% correct classifications, a rather low rate.

2.3 Results in the energy domain

Table VI reports the energy-based parameters. A multivariate ANOVA was performed to determine whether the averages differ significantly; significantly different values are shown in bold. The general tendency is to pronounce geminate words with more emphasis than singleton ones, which is consistent with the higher formant amplitudes found in geminate utterances in the frequency-domain analysis. Maximum Likelihood classification based on the energy parameters yields a poor average of 61.7% correct classifications.


3 Discussion and conclusion

Affricate consonants were analyzed in the present study. Because these consonants consist of a stop phase followed by a fricative phase, each consonant had to be split into two parts: C1, the stop phase, and C2, the fricative phase. This distinction had not been necessary in previous studies of other Italian consonants.

Overall, the results of the present study on affricates confirm the observations of the above studies on the other classes of Italian consonants: there is a significant increase of consonant duration in geminates and a corresponding reduction of the pre-consonant vowel duration. These findings are also consistent with previous studies on the perception of stops (Esposito and Di Benedetto, 1999) and nasals (Mattei and Di Benedetto, 2000).

In the time domain, there is evidence of a strong connection between gemination and phoneme durations. In particular, as in previous gemination studies on Italian, moving from singleton to geminate utterances produced an increase of consonant duration (in both the C1 stop phase and the C2 fricative phase) and a decrease of pre-consonant vowel duration. There are no significant variations in the duration of the second vowel (V2). The total utterance duration (Utd) also increases in the geminate form, although less markedly than V1d, C1d and C2d change. This suggests that a partial compensation takes place between V1 and C durations, which tends to keep the rhythm of the word almost unchanged.

In the frequency domain, F0 increases from singleton to geminate in only two frames, by an average of about 13 Hz, which is perceptually relevant. The amplitudes A1, A2 and A3 of the three measured formants are about 2 dB higher on average in geminate utterances than in singleton ones, in every analysis frame except V2 center.

As for energy and power, geminates tend to be produced with more emphasis than singletons.

For affricates, the average difference between singletons and geminates is 38 ms in V1d (≈ -25% for geminates), 51 ms in C1d (≈ +62% for geminates) and 27 ms in C2d (≈ +28% for geminates), while for the whole consonant, Cd = C1d + C2d, the difference is 78 ms (≈ +44% for geminates). For fricatives (Giovanardi, 1998), the average difference was 49 ms in V1d (≈ -28% for geminates) and 98 ms in Cd (≈ +73% for geminates). For stops (Esposito and Di Benedetto, 1999), the V1d difference was 43 ms (≈ -26% for geminates) and the stop closure difference was 92 ms (+101% for geminates). Finally, for nasals (Mattei and Di Benedetto, 2000), the average difference was 59 ms in V1d (≈ -32% for geminates) and 121 ms in Cd (≈ +134% for geminates).
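The absolute and relative differences reported above also fix the approximate singleton means, and they show how much of the consonant lengthening is offset by the shortening of V1 (the partial compensation discussed earlier). A small sketch using only the rounded figures quoted in this paragraph, so the derived values are approximate; for stops the consonant figure is the closure duration.

```python
def implied_singleton_ms(diff_ms, rel_change):
    """Singleton mean implied by an absolute difference and a relative change
    (rel_change is the geminate-vs-singleton change, e.g. -0.25 for -25%)."""
    return diff_ms / abs(rel_change)

# Affricate figures quoted in the text: (absolute difference in ms, relative change)
AFFRICATE_DIFFS = {"V1d": (38, -0.25), "C1d": (51, 0.62), "C2d": (27, 0.28), "Cd": (78, 0.44)}

# (V1d shortening, consonant lengthening) in ms for each consonant class, as quoted
CLASS_DIFFS = {"affricates": (38, 78), "fricatives": (49, 98), "stops": (43, 92), "nasals": (59, 121)}

for name, (diff, rel) in AFFRICATE_DIFFS.items():
    singleton = implied_singleton_ms(diff, rel)
    geminate = singleton + diff if rel > 0 else singleton - diff
    print(f"{name}: singleton ~{singleton:.0f} ms, geminate ~{geminate:.0f} ms")

for name, (dv1, dc) in CLASS_DIFFS.items():
    # Fraction of the consonant lengthening offset by the V1 shortening
    print(f"{name}: V1 shortening / C lengthening = {dv1 / dc:.2f}")
```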

The most straightforward comparisons that can be made with previous studies are between the C1 occlusive phase of affricates and stop consonants (Esposito and Di Benedetto, 1999), and between the C2 fricative phase of affricates and fricative consonants (Giovanardi, 1998).

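There is evidence that the duration differences between singletons and geminates are less marked for affricates than for the other Italian consonant classes: for example, consonant duration increases by about 44% for affricates, against about 73% for fricatives, 101% for stop closures and 134% for nasals.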

In previous studies, singleton/geminate classification based on the Maximum Likelihood Criterion using the Cd and Cd/V1d parameters gave the following results:

1) 0.47% of errors in nasals for both parameters

2) 12 % of errors in fricatives for both parameters

3) 4% and 8% of errors in stops for Cd and Cd/V1d, respectively

As shown in Table IV, the error percentages for affricates (17.6% for Cd and 16.7% for Cd/V1d) are higher than those obtained for the other classes of consonants.

Finally, an additional similarity among nasals, fricatives and stops was found: the Spearman rank correlation coefficient between Cd and V1d ranged from -0.71 to -0.78 for all these consonants, whereas for affricates its absolute value is lower.

As a general conclusion, the most relevant outcomes of the present work can be summarized as follows.

The general tendency to shorten the pre-consonant vowel and to lengthen the consonant in geminate utterances, observed for stops, fricatives and nasals in previous studies, is confirmed for affricates.

In previous studies, no significant variations in the energy of utterances were observed, except for the total energy of the consonant in nasals, which was higher in geminates; this effect was attributed to the longer duration of the consonant, since consonant power was constant. For affricates, both energy and power tend to be higher in geminate utterances (even if the differences amount to only a few dB). This effect seems peculiar to the affricate class.

Finally, studies are in progress on synthesized utterances of affricate consonants obtained with the articulatory synthesizer HLsyn (HLsyn Reference Manual, 1997). These utterances will be used in a perceptual experiment designed to test the hypothesis, supported by the present results, that phoneme duration is the most important cue for distinguishing a singleton from a geminate utterance. Two synthesized examples illustrate this: the singleton utterance of [ʧ] with the vowel [a], and the corresponding geminate utterance:

aʧa

atʧa

 


References

Abramson A. S. (1999) "Fundamental frequency as a cue to word-initial consonant length: Pattani Malay" ICPhS99 San Francisco pp 591-594.

Argiolas F., Macrì F., Di Benedetto M.G. (1995) "Acoustic analysis of Italian [r] and [l]", Journal of the Acoustical Society of America, 97(5), pt. 2, p. 3418.

Arvaniti A. (1999) "Effects of speaking rate on the timing of single and geminate sonorants" ICPhS99 San Francisco pp 599-602.

Blumstein S.E., Pickett E., Burton M. (1998) "Effects of speaking rate on singleton/geminate consonant contrast in Italian", unpublished manuscript.

Canepari L. (1979) "Introduzione alla fonetica" Piccola Biblioteca Einaudi.

Cohn A.C., Ham W.H., Podesva R. J.(1999) "The phonetic realization of singleton-geminate contrasts in three languages of Indonesia" ICPhS99 San Francisco pp 587-590.

Esposito A., Di Benedetto M.G. (1999) "Acoustic and perceptual study of gemination in Italian stops", Journal of the Acoustical Society of America (details to be completed).

Faluschi S. (2000) "Analisi Acustica e Sintesi delle consonanti affricate singole e geminate in Italiano" Degree Thesis in Electrical Engineering, Rome University "La Sapienza".

Giovanardi M. (1998) "Analisi Acustica e Sintesi delle consonanti fricative singole e geminate in Italiano" Degree Thesis in Electrical Engineering, Rome University "La Sapienza".

Giovanardi M. and Di Benedetto M.G. (1998) "Acoustic analysis of singleton and geminate fricatives in Italian", European Student Journal of Language and Speech.

HLsyn Reference Manual (1997)

Local J. and Simpson A.P. (1999) "Phonetic implementation of geminates in Malayalam nouns" ICPhS99 San Francisco pp 592-595.

Louali N. and Maddieson I. (1999) "Phonological contrast and phonetic realization: the case of Berber stops" ICPhS99 San Francisco pp 603-606.

Mattei M. and Di Benedetto M.G. (2000) "Acoustic analysis of singleton and geminate nasals in Italian", European Student Journal of Language and Speech.

Mattei M. (1999) "Analisi Acustica delle consonanti nasali singole e geminate in Italiano" Degree Thesis in Electrical Engineering, Rome University "La Sapienza".

Muljacic Z. (1972) "Fonologia della Lingua Italiana", Il Mulino, Bologna.

Rochet L.B. and Rochet A.P. (1995) "The perception of the single-geminate consonant contrast by native speakers of Italian and Anglophones", Proceedings of ICPhS95, pp. 616-619.

Rossetti R. (1993) "Caratteristiche acustiche del fenomeno di geminazione nelle consonanti occlusive Italiane: applicazione all'adattamento automatico di pronunce straniere" Degree Thesis in Electrical Engineering, Rome University "La Sapienza".

Rossetti R. (1994) "Gemination of Italian stops", Journal of the Acoustical Society of America, 95, 2pSP25, pp.2874.

Shrotriya N., Siva Sarma A.S., Verma R., Agrawal S.S. (1995) "Acoustic and perceptual characteristics of geminate Hindi stop consonants", Proceedings of ICPhS95, pp. 132-135.

Vecsys (1989). The Unice User Manual (Vecsys - Chemin du Chene rond - 91570 Bièvres, France).