WEB-SLS The European Journal of Language and Speech
Acoustic Analysis of Singleton and Geminate Affricates in Italian
by
Simone Faluschi and Maria-Gabriella Di Benedetto
INFOCOM Department, University of Rome 'La Sapienza'
via Eudossiana, 18, 00184 Roma
simone.fal@tiscalinet.it
gaby@acts.ing.uniroma1.it
The aim of this study was to analyze gemination of affricate consonants in Italian. Disyllabic VCV (vowel-consonant-vowel) and VCCV (vowel-consonant-consonant-vowel) words were considered, where C was one of the four Italian affricate consonants [ʧ, ʤ, ʦ, ʣ], in singleton and geminate forms, while V was one of the three Italian point vowels [a, i, u]. Time-related, frequency-related, and energy-related parameters were estimated. These parameters were measured at different sampling times, in order to observe their variation within an utterance. Results showed that the pre-consonant vowel and consonant durations varied significantly between singleton and geminate utterances. Frequency and energy parameters did not present differences as large as durational parameters, although they were significantly different in the two groups. While results in the time domain are in agreement with previous studies on Italian gemination of consonants (stops, liquids, fricatives and nasals), the significant frequency and energy variations between singleton and geminate groups seem to be peculiar to the class of affricates.
In Italian, minimal pairs formed by words which are distinguished only by consonant gemination are common. Words belonging to such minimal pairs are orthographically distinguished by a double grapheme for the geminate consonant (for example: micia (pussy-cat) and miccia (fuse)). A problem which is still unsolved regards the identification of the acoustic correlates of singleton vs. geminate consonants, and their perceptual verification. In order to answer this question for Italian, several analyses were performed, within the GEMMA project in progress at the INFOCOM Department of the University 'La Sapienza' in Rome, on the following classes of consonants: stops, liquids, fricatives and nasals. Results on stop consonants were reported in Rossetti (1993, 1994) and Esposito and Di Benedetto (1999). Liquids were analyzed in Argiolas et al. (1995), while the analysis of fricatives was reported in Giovanardi (1998) and Giovanardi and Di Benedetto (1998). Finally, results on nasal consonants were reported in Mattei (1999) and Mattei and Di Benedetto (2000).
As already pointed out in (Mattei and Di Benedetto, 2000), gemination has been analyzed in other languages as well: (Shrotriya, 1995), (Blumstein et al., 1998), (Rochet and Rochet, 1995), (Cohn et al., 1999), (Abramson, 1999), (Local and Simpson, 1999), (Arvaniti, 1999), (Louali and Maddieson, 1999). However, the above studies do not analyze affricate consonants. The only study referring to this particular class of consonants is (Abramson, 1999); the author points out, however, that a perceptual pre-test yielded too high a percentage of mistakes, and the affricates were therefore discarded.
The paper is organized as follows: the speech materials and measurements are described in Section 1; acoustic analyses carried out on the above speech materials are reported in Section 2; in Section 3, the results of the acoustic analyses are discussed. Section 3 also includes the conclusions and the indications for future work.
In Italian, several disyllabic words form minimal pairs which can be distinguished on the sole basis of the gemination of one consonant. Native speakers naturally produce the disyllabic words of minimal pairs identified by the presence or absence of consonant gemination. This consideration led to the creation of the GEMMA database, formed by a set of vowel-consonant-vowel disyllabic words (the singleton case) and vowel-consonant-consonant-vowel disyllabic words (the geminate case), which serves as the basis for studying gemination in all possible geminated consonantal forms of Italian. The words in the database include the entire set of Italian consonants which appear in both singleton and geminated forms. Affricate consonants were studied when coarticulated with the three Italian point vowels [a, i, u], which are a subset of the Italian vowels [a, e, ɛ, i, o, ɔ, u]. Words were pronounced by six native speakers without pronunciation defects or dialectal inflections, three men and three women aged between twenty-four and fifty. In order to keep full control of the parameters used, words were pronounced in isolation and not in carrier sentences. In this way, elements such as intonation or internal stress, which could have had an influence, were eliminated. Each word was pronounced three times by each speaker. Our procedure in building the database was motivated by the strong need for data with well-controlled parameters, as a starting point for more realistic future settings. The words analyzed in the present study were therefore 3 (one per vowel) for each of the four affricate consonants [ʧ, ʤ, ʦ, ʣ] and for each of their geminate versions, pronounced by 6 speakers in three repetitions each, leading to a total of 3x4x2x6x3=432 utterances (216 singletons and 216 geminates). Words were written on cards which were presented to the speaker by the operator. Cards were shuffled after each recording session.
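The utterance count above can be checked by enumerating the design factors. The following is only an illustrative sketch: the variable names and the numeric speaker and repetition labels are assumptions made here, not identifiers from the GEMMA database.

```python
from itertools import product

# Design factors as described in the text (labels are illustrative).
VOWELS = ["a", "i", "u"]                 # three point vowels
AFFRICATES = ["ʧ", "ʤ", "ʦ", "ʣ"]        # four affricate consonants
FORMS = ["singleton", "geminate"]        # VCV vs. VCCV
SPEAKERS = range(1, 7)                   # six speakers (hypothetical ids)
REPETITIONS = range(1, 4)                # three repetitions

# One tuple per recorded utterance.
utterances = list(product(VOWELS, AFFRICATES, FORMS, SPEAKERS, REPETITIONS))

total = len(utterances)
n_singletons = sum(1 for u in utterances if u[2] == "singleton")
n_geminates = total - n_singletons

print(total, n_singletons, n_geminates)  # prints: 432 216 216
```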
All the utterances were produced and recorded at the Speech Laboratory of the INFOCOM Department at the University of Rome 'La Sapienza' (Italy). High-quality equipment was used for this experiment, and the recording took place in a sound-treated room. The operator who supervised the recording procedure was acoustically trained. When there was an evident pronunciation mistake, or if the pronunciation was judged unnatural, the speaker was asked to repeat the word. There were three recording sessions, one for each of the three repetitions.
The set of analysed words is reported in Table I. Geminate utterances are indicated by a double grapheme of the consonant.
After recording, the utterances were digitized using the UNICE software produced by VECSYS (Vecsys, 1989). Speech signals were filtered at 5 kHz and sampled at 10 kHz, and each sample was represented using 16 bits. The resulting signals were stored on a PC. The analysis then performed with UNICE included spectrograms, DFT (Discrete Fourier Transform) spectra and LPC (Linear Predictive Coding) spectra. For this processing, a pre-emphasis filter (with α=0.95) was applied to the signal and a Hamming window of 256 samples was used.
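As an illustration of the processing chain just described, the sketch below applies a first-order pre-emphasis filter with α=0.95 and a 256-sample Hamming window before a DFT. It is not the UNICE implementation: the filter form y[n] = x[n] - α·x[n-1], the function names and the 1 kHz test tone are assumptions made for this example.

```python
import cmath
import math

ALPHA = 0.95       # pre-emphasis coefficient, as stated in the text
FRAME_LEN = 256    # Hamming window length in samples, as stated in the text
FS = 10_000        # sampling rate in Hz, as stated in the text


def pre_emphasize(x, alpha=ALPHA):
    """Assumed first-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n_points=FRAME_LEN):
    """Standard Hamming window coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def dft_magnitude_db(frame):
    """Magnitude spectrum in dB of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[m] * cmath.exp(-2j * math.pi * k * m / n)
                  for m in range(n))
        spectrum.append(20 * math.log10(abs(acc) + 1e-12))
    return spectrum


# Hypothetical test signal: one frame of a 1 kHz sinusoid.
signal = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(FRAME_LEN)]
spectrum = dft_magnitude_db(pre_emphasize(signal))
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * FS / FRAME_LEN  # bin spacing is FS / FRAME_LEN, about 39 Hz
```

With a 256-sample frame at 10 kHz, each frame spans 25.6 ms and the spectral peak of the test tone falls within one bin of 1 kHz.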
The GEMMA project considers the measurement of standard parameters (Giovanardi and Di Benedetto, 1998) and (Mattei and Di Benedetto, 1999). Since the affricate consonants are characterized by having an occlusive and a fricative phase, it was decided to enlarge the set of analyzed parameters. The following parameters were used in the present study:
1. duration of the pre-consonant vowel, indicated as V1d. The vowel onset was identified by the appearance of a glottal pulse followed by other regular glottal pulses. In those cases in which a glottal excitation was visible before regular vowel voicing, the vowel onset was taken as the beginning of regular vowel voicing, and the initial glottal excitation was discarded. The vowel offset was identified by examination of both the waveform and the spectrogram
2. duration of the occlusive part of the consonant, indicated as C1d
3. duration of the fricative part of the consonant, indicated as C2d
4. duration of the consonant, indicated as Cd (= C1d + C2d)
5. duration of the vowel following the consonant, indicated as V2d. The V2 onset was identified by visual inspection of both the waveform and the spectrogram. The V2 offset was identified as the temporal sampling point where the glottal pulse disappeared
6. duration of the whole utterance, indicated as Utd
7. total energy of V1, where Xi is the sample i, and t1 and t2 are the temporal sampling points of vowel onset and vowel offset, respectively
[equation image not preserved]
8. average power of V1
[equation image not preserved]
9. total energy of the occlusive phase of C, indicated as EtotC1 and computed as for V1, with t1 and t2 corresponding to V1 offset (C1 onset) and C2 onset (C1 offset), respectively
10. average power of the occlusive phase of C, indicated as PmC1 and computed as for the average power of V1
11. total energy of the fricative phase of C, indicated as EtotC2 and computed as for V1, with t1 and t2 corresponding to C1 offset (C2 onset) and C2 offset, respectively
12. average power of the fricative phase of C, indicated as PmC2 and computed as for the average power of V1
13. total energy of the whole consonant, indicated as EtotC and computed as described above
14. average power of the whole consonant, indicated as PmC
15. instantaneous energy at V1 center, indicated as EiV1, computed in a window of 256 samples centered on V1
16. instantaneous energy at the V1-C1 transition, indicated as EiV1-C1: the window of 256 samples is centered on V1 offset
17. instantaneous energy at C1 center, indicated as EiC1 and computed as EiV1
18. instantaneous energy at the C1-C2 transition, indicated as EiC1-C2: the window of 256 samples is centered on C1 offset (C2 onset)
19. instantaneous energy at C2 center, indicated as EiC2 and computed as EiV1
20. instantaneous energy at C2 offset, indicated as EiC2off, computed right before the first 256 samples of V2 onset
21. F0, A0, F1, A1, F2, A2, F3, A3 at V1 center (where F1, F2 and F3 are the formants and A1, A2 and A3 their amplitudes)
22. F0, A0, F1, A1, F2, A2, F3, A3 at V1 offset
23. F0, A0, F1, A1, F2, A2, F3, A3 at the transition from V1 to C
24. F0 and A0 at the onset of voiced consonants
25. F0 and A0 at the centre of the occlusive phase of voiced consonants
26. F0 and A0 at the centre of the fricative phase of voiced consonants
27. F0 and A0 at the offset of voiced consonants
28. F0, A0, F1, A1, F2, A2, F3, A3 at V2 onset
29. F0, A0, F1, A1, F2, A2, F3, A3 at V2 center
The sampling points selected for the computation of the acoustic parameters, and the frames where frequency parameters were estimated, are indicated in Figure 1.
Time and energy domain parameters were not normalized since the comparisons were between values obtained by the same speakers who pronounced the singleton and geminate utterances under the same conditions. Moreover, normalization would have obscured potentially relevant energy information.
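The energy-related parameters listed above could be computed along the following lines. This is a sketch under stated assumptions: the exact equations are not available here, so total energy is assumed to be the usual sum of squared samples between t1 and t2, average power is assumed to be that energy divided by the number of samples, and the function names, the half-open index convention and the toy signal are all hypothetical.

```python
FS = 10_000    # sampling rate in Hz, as stated in the text
WINDOW = 256   # analysis window length in samples, as stated in the text


def total_energy(x, t1, t2):
    """Assumed definition: sum of squared samples x[t1] .. x[t2 - 1]."""
    return sum(s * s for s in x[t1:t2])


def average_power(x, t1, t2):
    """Assumed definition: total energy divided by the number of samples."""
    return total_energy(x, t1, t2) / (t2 - t1)


def instantaneous_energy(x, center, window=WINDOW):
    """Energy in a window of `window` samples centered on `center`."""
    start = max(0, center - window // 2)
    return total_energy(x, start, min(len(x), start + window))


def duration_ms(t1, t2, fs=FS):
    """Segment duration in milliseconds from two sample indices."""
    return 1000.0 * (t2 - t1) / fs


# Hypothetical example: constant unit-amplitude signal, V1 from sample 100 to 300.
x = [1.0] * 1000
v1_onset, v1_offset = 100, 300
etot_v1 = total_energy(x, v1_onset, v1_offset)                 # 200.0
pm_v1 = average_power(x, v1_onset, v1_offset)                  # 1.0
ei_v1 = instantaneous_energy(x, (v1_onset + v1_offset) // 2)   # 256.0
v1d = duration_ms(v1_onset, v1_offset)                         # 20.0 ms
```

The same functions apply to C1, C2 and the whole consonant by changing the segment boundaries t1 and t2.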
In this section, time, frequency and energy related parameters are reported. The following statistical tests were performed: univariate and multivariate ANOVA, the maximum a-posteriori classification test, the maximum likelihood classification test and the Spearman Rank Correlation Coefficient rs. Results in the time, frequency and energy domains are reported separately.
2.1 Results in the time domain
Table II reports the average values of V1d, C1d, C2d, V2d and Utd for each of the four consonants coarticulated with the three vowels, and the corresponding standard deviations. As can be seen in Table II, the parameters which present relevant differences between singleton and geminate utterances are V1d, C1d and C2d (as a reminder, C1 and C2 together form the affricate). A general tendency to shorten the first vowel and to lengthen the consonant (in both the occlusive and the fricative phase) is present. By contrast, there are no relevant variations in the second vowel duration. Statistical analyses of the data confirm the above observations and are reported in Table III. The values which vary significantly (p<0.05) between singletons and geminates are indicated in bold characters. A strongly significant variation is found for V1d, C1d and C2d, except for [ʦ] coarticulated with [i]. The parameter Utd increases significantly in almost all geminate utterances. Moreover, the second vowel duration V2d does not vary significantly between singletons and geminates, except for [ʤ] coarticulated with [a].
A Spearman Rank correlation test was carried out in order to verify whether the relation between phoneme durations, in particular between V1d and C1d and between V1d and C2d, was based on gemination. The results were as follows:
· when the singleton and geminate groups are merged, the following values of rs are obtained: V1d vs C1d rs = -0.471; V1d vs C2d rs = -0.474
· when the two utterance groups are kept separate, the following rs values are obtained: V1d vs C1d: rs = -0.271 for singletons and rs not significant for geminates; V1d vs C2d: rs = -0.432 for singletons and rs = -0.321 for geminates.
Since the correlation between V1d and C1d is considerably weaker within each group than in the merged data, it can be attributed to the presence of gemination. The same can be said for the correlation between V1d and C2d, although not with the same strength.
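The comparison between merged and separate groups can be illustrated with a small self-contained computation of the Spearman coefficient (Pearson correlation of average ranks). The duration values below are invented for illustration and are not taken from the GEMMA data; they are chosen so that the two groups differ in their means while durations within each group vary together.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical durations in ms (not measured values).
v1d_single, c1d_single = [150, 160, 145, 155], [80, 85, 78, 82]
v1d_gem, c1d_gem = [110, 120, 105, 115], [130, 140, 125, 135]

rs_single = spearman_rs(v1d_single, c1d_single)
rs_gem = spearman_rs(v1d_gem, c1d_gem)
rs_merged = spearman_rs(v1d_single + v1d_gem, c1d_single + c1d_gem)
```

In this toy case the merged coefficient is negative (about -0.52) even though each group alone gives +1, which shows how a correlation in the merged data can arise from the difference between groups rather than from a relation within each group.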
On the basis of the results on time-domain quantities, it was decided to use the parameters V1d, C1d, C2d, Cd, Cd/Utd, Cd/V1d, C1d/V1d and C2d/V1d to perform utterance classification based on time parameters using the Maximum Likelihood Criterion (MLC) (Dillon and Goldstein, 1984). Results are reported in Table IV. The error percentages are: on Cd/Utd: 19.9%, on Cd/V1d: 16.7%, on C1d/V1d: 13.9%, on C2d/V1d: 30.3%, on V1d: 23.4%, on C1d: 16.9%, on C2d: 36.3%, on Cd: 17.6%. The best overall result is obtained using C1d/V1d, but if specific consonants are considered the result is even better (for example, with C1d/V1d there are no mistakes for [ʤ] and with Cd there is only one mistake (percentage error 4.63%) for [ʧ]).
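A single-parameter maximum likelihood classification of this kind can be sketched as follows. The text does not specify the class-conditional model, so Gaussian densities fitted to each group are assumed here; the function names and the C1d/V1d values are hypothetical and serve only to show the mechanics.

```python
import math


def fit_gaussian(samples):
    """Maximum likelihood estimates of mean and variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var


def log_likelihood(x, mean, var):
    """Log of the Gaussian density N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def classify(x, models):
    """Return the class label whose model gives x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))


def error_rate(data, models):
    """Fraction of (value, label) pairs that are misclassified."""
    errors = sum(1 for x, label in data if classify(x, models) != label)
    return errors / len(data)


# Hypothetical C1d/V1d ratios (not measured values).
singleton = [0.50, 0.55, 0.60, 0.45, 0.52]
geminate = [1.10, 1.25, 0.95, 1.30, 1.15]

models = {
    "singleton": fit_gaussian(singleton),
    "geminate": fit_gaussian(geminate),
}
data = [(x, "singleton") for x in singleton] + [(x, "geminate") for x in geminate]
err = error_rate(data, models)
```

With well-separated toy values the error rate is zero; overlapping distributions, as reported for C2d, would produce a higher error percentage.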
2.2 Results in the frequency domain
Table V reports the average frequency parameter values and their standard deviations, obtained by averaging over all utterances. All individual measured values can be found in (Faluschi, 2000). Statistical analysis, using a multivariate ANOVA, indicates that there are no statistically significant differences between frequency measurements, except for slight changes in the fundamental frequency (F0) when it is measured in very specific frames. F0 is about 14 Hz and 12 Hz higher in the geminate form (+9% and +8%) in the V1 offset and V1 to C transition frames, respectively. No significant variations were observed in F1, F2 and F3. Formant amplitudes A1, A2 and A3 are significantly higher in the geminate form (1-3 dB) in the V1 center, V1 offset, V1 to C transition and V2 onset frames. A Maximum Likelihood classification based on the statistically significant frequency parameters yields about 64% correct classifications, an average which is quite low.
2.3 Results in the energy domain
Table VI reports the data on the energy-based parameters. A multivariate ANOVA test was performed in order to determine whether statistically significant differences between averages exist. Bold characters indicate significantly different values. The general tendency is to pronounce geminate words with more emphasis than singleton ones. This is consistent with the frequency-domain analysis above, which showed a higher amplitude of formants in geminate utterances. A Maximum Likelihood classification based on the energy parameters yields a poor average of 61.7% correct classifications.
Affricate consonants were analyzed in the present study. These consonants have peculiar and very distinctive characteristics that made it necessary to split the consonant itself into two parts: the first, named C1, indicates the stop phase, while the second, named C2, indicates the fricative phase. This distinction had not been necessary in previous studies of other Italian consonants.
Overall, results of the present study on affricates confirm the observations of the above studies on the other classes of Italian consonants, i.e. there is a significant increase of consonant duration in geminates, and a corresponding reduction of the pre-consonant vowel length. These results are consistent with previous studies concerning the perception of stops (Esposito and Di Benedetto, 1999) and nasals (Mattei and Di Benedetto, 2000).
In the time domain, there is evidence for a strong connection between gemination and phoneme durations. In particular, as in previous gemination studies on Italian, an increase of consonant duration (in both the C1 stop phase and the C2 fricative phase) and a decrease of pre-consonant vowel duration were observed when moving from singleton to geminate utterances. There are no significant variations in second vowel (V2) duration. The total duration of the utterance (Utd) also increases in the geminate form, although not as significantly as V1, C1 and C2 vary. This observation suggests that a compensation effect (even if incomplete) takes place between V1 and C durations. This compensation effect tends to keep the phonetic rhythm almost unchanged.
In the frequency domain, F0 increases when passing from singleton to geminate in only two frames, by an average of about 13 Hz, which is perceptually relevant. The amplitudes of the three measured formants, A1, A2 and A3, were also measured in each analysis frame. Except in the V2 center frame, these three amplitudes are on average about 2 dB higher in geminate utterances than in singleton ones.
As far as the energy and power of pronunciations are concerned, we see the tendency to give more emphasis to geminates compared to singletons.
For affricates, it was found that the average difference between singletons and geminates in V1d is 38 ms (≈ -25% for geminates), in C1d 51 ms (≈ +62% for geminates) and in C2d 27 ms (≈ +28% for geminates), while for the whole consonant, Cd = C1d + C2d, the difference is 78 ms (≈ +44% for geminates). For fricatives (Giovanardi, 1998), the average V1d difference was 49 ms (≈ -28% for geminates) and the Cd difference 98 ms (≈ +73% for geminates). As regards stops (Esposito and Di Benedetto, 1999), the V1d difference was 43 ms (≈ -26% for geminates) and the difference in stop closure duration 92 ms (+101% for geminates). Finally, for nasals (Mattei and Di Benedetto, 1999), the average V1d difference was 59 ms (≈ -32% for geminates) and the Cd difference 121 ms (≈ +134% for geminates).
The straightforward comparisons that can be made with previous studies are between the C1 occlusive phase of the affricate consonants and the stop consonants (Esposito and Di Benedetto, 1999), and between the C2 fricative phase of the affricate consonants and the fricatives (Giovanardi, 1998).
There is evidence that, for affricates, differences between singleton and geminate durations are less sharp and clear than in other Italian consonants.
The singleton/geminate classification based on the Maximum Likelihood Criterion using the Cd and Cd/V1d parameters led to the following results in previous studies:
1) 0.47% of errors in nasals for both parameters
2) 12% of errors in fricatives for both parameters
3) 4% and 8% of errors in stops for Cd and Cd/V1d, respectively
Table IV shows that the error percentages for affricates are higher than those obtained for the other classes of consonants.
Finally, an additional similarity between nasals, fricatives, and stops was found; the Spearman Rank correlation coefficient between Cd and V1d was for all these consonants equal to values from -0.71 to -0.78, while for affricates this value is lower.
As a general conclusion, the most relevant outcomes of the present work can be summarized as follows.
The general tendency to shorten the pre-consonant vowel and to lengthen the consonant in geminate utterances, observed for stops, fricatives and nasals in previous studies, is confirmed for affricates.
In previous studies, significant variations in the energies of utterances were not observed, except for the total energy of the consonant in nasals, which was higher in the geminates; this effect was explained by the longer duration of the consonant, since consonant power was constant. For affricates, the tendency is to emphasize both energy and power of the geminate utterance (even if differences are limited to a few dB). This effect seems peculiar to the affricate class.
Finally, we wish to point out that studies are in progress concerning synthesized utterances of affricate consonants obtained using the articulatory synthesizer HLsyn (HLsyn Reference Manual, 1997); the project will pilot a perceptual experiment in order to support the idea, suggested by the evidence in this study, that the duration of phonemes is the most important factor in distinguishing a singleton utterance from a geminate one. In order to illustrate this, we insert below two link-buttons: the first connects to the synthesized singleton utterance of [ʧ] coarticulated with the vowel [a], while the second connects to the corresponding synthesized geminate utterance.
References
Abramson A.S. (1999). "Fundamental frequency as a cue to word-initial consonant length: Pattani Malay", ICPhS99, San Francisco, pp. 591-594.
Argiolas F., Macrì F., Di Benedetto M.G. (1995). "Acoustic analysis of Italian [r] and [l]", Journal of the Acoustical Society of America, 97, no. 5, pt. 2, pp. 3418.
Arvaniti A. (1999). "Effects of speaking rate on the timing of single and geminate sonorants", ICPhS99, San Francisco, pp. 599-602.
Blumstein S.E., Pickett E., Burton M. (1998). "Effects of speaking rate on Singleton/Geminate consonant contrast in Italian", unpublished manuscript.
Canepari L. (1979). "Introduzione alla fonetica", Piccola Biblioteca Einaudi.
Cohn A.C., Ham W.H., Podesva R.J. (1999). "The phonetic realization of singleton-geminate contrasts in three languages of Indonesia", ICPhS99, San Francisco, pp. 587-590.
Esposito A., Di Benedetto M.G. (1999). "Acoustic and Perceptual Study of Gemination in Italian Stops", Journal of the Acoustical Society of America (to be completed).
Faluschi S. (2000). "Analisi Acustica e Sintesi delle consonanti affricate singole e geminate in Italiano", Degree Thesis in Electrical Engineering, Rome University "La Sapienza".
Giovanardi M. (1998). "Analisi Acustica e Sintesi delle consonanti fricative singole e geminate in Italiano", Degree Thesis in Electrical Engineering, Rome University "La Sapienza".
Giovanardi M., Di Benedetto M.G. (1998). "Acoustic analysis of singleton and geminate fricatives in Italian", European Student Journal of Language and Speech.
HLsyn Reference Manual (1997).
Local J., Simpson A.P. (1999). "Phonetic implementation of geminates in Malayalam nouns", ICPhS99, San Francisco, pp. 592-595.
Louali N., Maddieson I. (1999). "Phonological contrast and phonetic realization: the case of Berber stops", ICPhS99, San Francisco, pp. 603-606.
Mattei M. (1999). "Analisi Acustica delle consonanti nasali singole e geminate in Italiano", Degree Thesis in Electrical Engineering, Rome University "La Sapienza".
Mattei M., Di Benedetto M.G. (2000). "Acoustic analysis of singleton and geminate nasals in Italian", European Student Journal of Language and Speech.
Muljacic Z. (1972). "Fonologia della Lingua Italiana", Il Mulino, Bologna.
Rochet L.B., Rochet A.P. (1995). "The perception of the single-geminate consonant contrast by native speakers of Italian and Anglophones", in Proceedings of ICPhS95, pp. 616-619.
Rossetti R. (1993). "Caratteristiche acustiche del fenomeno di geminazione nelle consonanti occlusive Italiane: applicazione all'adattamento automatico di pronunce straniere", Degree Thesis in Electrical Engineering, Rome University "La Sapienza".
Rossetti R. (1994). "Gemination of Italian stops", Journal of the Acoustical Society of America, 95, 2pSP25, pp. 2874.
Shrotriya N., Siva Sarma A.S., Verma R., Agrawal S.S. (1995). "Acoustic and perceptual characteristics of geminate Hindi stop consonants", in Proceedings of ICPhS9, pp. 132-135.
Vecsys (1989). The Unice User Manual (Vecsys - Chemin du Chene rond - 91570 Bièvres, France).