In NGMN scenarios, speech quality is expected to vary during a call as a result of changing connection conditions, network handover and/or codec changeover. To quantify quality, the entire length of a call therefore has to be considered. Standard listening-only tests, which use speech samples of 4-8 s length (ITU-T Rec. P.800, 1996), are not suitable for this purpose. Conversational tests, on the other hand, are comparable to normal telephone usage and thus ecologically valid, but they direct the user's attention to the content of the conversation; in such a situation, users are generally less analytic in their judgments, and subtle perceptual differences may get blurred.
As a compromise, we opted for a twofold test protocol: (a) We simulated conversations of 60 s length by concatenating 5 meaningful speech segments alternating with pauses. These were played back to the test participants, who answered content-related questions during the pauses and gave an overall quality judgment at the end of the simulated conversation. This approach was developed by Berger and is now recommended for call-quality measurement in ETSI Technical Report 102 506 v. 1.1.1 (2006). (b) The constituent segments of the simulated conversations, together with some additional segments of approx. 6 s length, were presented to the participants in a standard listening-only context, with an overall quality rating requested after each sample. We carried out two tests of the first type (Tests 1a and 2a) and two corresponding tests of the second type (Tests 1b and 2b). The following subsections describe the test conditions, set-up and participant group in more detail.
Test Conditions
Test 1a concentrated on WB/NB transitions and the effects of packet loss on perceived quality. It contained 2 conditions with pure NB and WB calls, 4 conditions in which packet loss increased continuously until the middle (3rd segment) of the call, after which the call either switched to a loss-free network with a different codec or did not switch, and 4 conditions in which NB→WB or WB→NB transitions occurred at the beginning (2nd segment) or at the end (4th segment) of the call. We consider packet loss rates of 10-20% to be realistic thresholds by which a handover should be executed at the latest. Table 1 summarizes the conditions. The corresponding Test 1b contained all segments of the simulated conversations, plus additional samples with similar degradations that also addressed Flash-OFDM networks. The resulting list of 26 segments for this test is presented in Table 3.
Table 1: Conditions of Test 1a.

No. | Network(s) | Codec(s) | Ppl per segment (%)
---|---|---|---|
1 | H | G.711 | 0 |
2 | W | G.722.2 | 0 |
3 | H | G.711 | 0,10,20,10,10 |
4 | W | G.722.2 | 0,10,20,10,10 |
5 | H→W mid. | G.711→G.722.2 | 0,10,20,0,0 |
6 | W→H mid. | G.722.2→G.711 | 0,10,20,0,0 |
7 | H→W beg. | G.711→G.722.2 | 0 |
8 | H→W end | G.711→G.722.2 | 0 |
9 | W→H beg. | G.722.2→G.711 | 0 |
10 | W→H end | G.722.2→G.711 | 0 |
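The per-segment packet-loss column can be read as a compact specification of each 5-segment simulated conversation. The following is a minimal sketch of that reading, not the authors' tooling: the `Condition` class and the `expand_ppl` helper are hypothetical names, and the rule that a single value such as "0" applies to all five segments is an assumption inferred from the table layout.

```python
from dataclasses import dataclass

N_SEGMENTS = 5  # each simulated conversation concatenates 5 speech segments


@dataclass(frozen=True)
class Condition:
    """One row of the Test 1a condition table (hypothetical container)."""
    number: int
    network: str
    codec: str
    ppl: str  # packet-loss cell exactly as written in the table


def expand_ppl(spec: str, n_segments: int = N_SEGMENTS) -> list[int]:
    """Return one packet-loss percentage per segment.

    Assumption: a single value (e.g. "0") applies to every segment;
    otherwise the cell must list exactly one value per segment.
    """
    values = [int(v) for v in spec.split(",")]
    if len(values) == 1:
        return values * n_segments
    if len(values) != n_segments:
        raise ValueError(f"expected 1 or {n_segments} values, got {len(values)}")
    return values


# Three rows copied from the Test 1a table for illustration.
TEST_1A_SAMPLE = [
    Condition(1, "H", "G.711", "0"),
    Condition(3, "H", "G.711", "0,10,20,10,10"),
    Condition(5, "H→W mid.", "G.711→G.722.2", "0,10,20,0,0"),
]

for cond in TEST_1A_SAMPLE:
    print(cond.number, cond.network, expand_ppl(cond.ppl))
```

Under this reading, conditions 3 and 5 share the same loss pattern for the first three segments and differ only after the midpoint, which is what isolates the effect of switching.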
Test 2a was designed to take a closer look at the most interesting findings of the first test. It focused on the switching position within a simulated conversation, as well as on additional packet loss rates (see Table 2). The corresponding Test 2b with short samples also included different network loads and high packet-loss-rate scenarios for limited WLAN networks (27 conditions overall).
Table 2: Conditions of Test 2a.

No. | Network(s) | Codec(s) | Ppl per segment (%)
---|---|---|---|
1 | W | G.722.2 | 0 |
2 | H | G.711 | 0 |
3 | H→W beg. | G.711→G.722.2 | 0 |
4 | H→W mid. | G.711→G.722.2 | 0 |
5 | W→H mid. | G.722.2→G.711 | 0,3,3,0,0 |
6 | W→H mid. | G.722.2→G.711 | 0,5,5,0,0 |
7 | W→H mid. | G.722.2→G.711 | 0,10,10,0,0 |
8 | W→H→W | G.722.2→G.711 | 0 |
9 | H→W→H→W | G.711→G.722.2 | 0 |
10 | W | G.722.2 | 0,0/5,5,5/0,0 |

Table 3: Segments of Test 1b.

No. | Network(s) | Codec(s) | Ppl per network (%)
---|---|---|---|
1 | F→H | G.722.2 | 0 |
2 | F→H | G.722.2 | 10,0 |
3 | F→H(a.) | G.722.2→G.711 | 0 |
4 | F→H(b.) | G.722.2→G.711 | 0 |
5 | H | G.711 | 0 |
6 | H | G.711 | 10,0 |
7 | H | G.711 | 20,0 |
8 | H | G.711→G.722.2 | 0 |
9 | H | G.722.2→G.711 | 0 |
10 | H→F | G.711 | 0 |
11 | H→F | G.711 | 10,0 |
12 | H→F(a.) | G.711→G.722.2 | 0 |
13 | H→F(b.) | G.711→G.722.2 | 0 |
14 | H→W | G.711 | 0 |
15 | H→W | G.711 | 10,0 |
16 | H→W(a.) | G.711→G.722.2 | 0 |
17 | H→W(b.) | G.711→G.722.2 | 0 |
18 | H→W(b.) | G.711→G.722.2 | 20,0 |
19 | W | G.722.2 | 0 |
20 | W | G.722.2 | 10,0 |
21 | W | G.722.2 | 20,0 |
22 | W→H | G.722.2 | 0 |
23 | W→H | G.722.2 | 10,0 |
24 | W→H(a.) | G.722.2→G.711 | 0 |
25 | W→H(b.) | G.722.2→G.711 | 0 |
26 | W→H(b.) | G.722.2→G.711 | 20,0 |
Test Setup
Tests 1a/b and 2a/b were carried out at distinct points in time, with different participant groups. Test participants were invited to a sound-insulated laboratory, were instructed about the purpose of the test, and listened to the samples in three sessions of approx. 25 min. each (2 sessions for part a, 1 session for part b). Speech samples were presented over a Sennheiser HMD 410 headset at a comfortable listening level, with a background level below 35 dB(A) (ITU-T Rec. P.800, 1996). At the end of each simulated conversation of part a, as well as after each sample of part b, participants rated the overall quality on a 5-point absolute category scale, with 5 corresponding to "excellent" and 1 to "bad" quality. The test set-up and scale mainly followed the requirements given in ITU-T Rec. P.800 and ETSI Technical Report 102 506 v. 1.1.1. In total, 13 participants took part in Test 1a, 24 in Test 1b, 14 in Test 2a, and 17 in Test 2b. They were recruited from the normal telephone-user population, did not report any hearing impairment, and received a voucher in return for their effort.
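Ratings collected on a 5-point absolute category scale are commonly summarised per condition as a mean opinion score with a confidence interval. The text does not state how the ratings were aggregated, so the following is only a sketch of one common approach. The function name `mean_opinion_score` is hypothetical, the example ratings are invented for illustration and are not data from these tests, and the interval uses a normal approximation, which is rough for groups as small as 13 participants.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_opinion_score(ratings, confidence=0.95):
    """Return (mean rating, half-width of the confidence interval).

    Ratings must be integers from 1 ("bad") to 5 ("excellent").
    Assumption: normal approximation for the interval; a t-based
    interval would be wider for small participant groups.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return mean(ratings), half_width


# Invented ratings from 13 hypothetical participants for one condition.
example_ratings = [4, 4, 3, 5, 4, 3, 4, 4, 5, 3, 4, 4, 3]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Because the participant groups differ in size between tests (13 to 24), reporting the interval alongside the mean helps when comparing conditions across tests.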