Estimation of Quality Judgments



Auditory tests like the ones described in the previous sections are expensive and time-consuming. We therefore checked whether the available quality prediction models also provide valid estimates for NGMN handover situations. Two types of models were checked: a parametric model, which estimates quality from the parameters describing the network conditions, and a signal-based model, which estimates quality as a perceptually weighted distance between the input and output signals of the network under test.
As the parametric model, we used an extended version of the E-model. It is based on the original algorithm described in ITU-T Rec. G.107 for narrowband (NB) networks and has been modified to take into account wideband (WB) transmission (by linearly extending the underlying transmission rating scale from 100 to 129), WB speech codecs (by defining codec-specific equipment impairment factors), and packet loss. These modifications are recommended in the most recent update of ITU-T Rec. G.107, Amendment 1. Because the E-model does not yet consider NB/WB transitions, for the samples in which such transitions occur we calculated two separate scores, one for each bandwidth, averaged the underlying transmission ratings, and transformed this average back to the MOS scale. As the signal-based model, we used the WB extension of the PESQ model described in ITU-T Rec. P.862.2.
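The rating-to-MOS conversion and the averaging applied to samples with NB/WB transitions can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study. The NB R-to-MOS mapping and the packet-loss formula for the effective equipment impairment factor Ie,eff follow ITU-T Rec. G.107. The following are assumptions made for illustration: dividing an extended-scale (0 to 129) rating by 1.29 before applying the NB mapping, giving both ratings equal weight in the average, and the helper names r_to_mos_wb, ie_eff and mos_for_transition (including the ceiling parameter). The WB constants for the packet-loss formula should be taken from G.107 Amendment 1.

```python
def r_to_mos_nb(r: float) -> float:
    """Classic ITU-T G.107 mapping from transmission rating R (0..100) to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def r_to_mos_wb(r_wb: float) -> float:
    """Assumption: rescale the extended 0..129 rating to 0..100, then apply the NB mapping."""
    return r_to_mos_nb(r_wb / 1.29)


def ie_eff(ie: float, ppl: float, bpl: float,
           burst_r: float = 1.0, ceiling: float = 95.0) -> float:
    """Effective equipment impairment under packet loss, as in G.107:
    Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl).
    ppl is the packet-loss percentage, bpl the codec's packet-loss robustness.
    ceiling=95 is the NB constant; its WB counterpart is an assumption here."""
    return ie + (ceiling - ie) * ppl / (ppl / burst_r + bpl)


def mos_for_transition(r_nb_ext: float, r_wb: float) -> float:
    """Hypothetical helper for samples with an NB/WB transition.
    Both ratings are assumed to be on the extended 0..129 scale;
    they are averaged with equal weights and converted back to MOS."""
    return r_to_mos_wb((r_nb_ext + r_wb) / 2)
```

Averaging on the rating scale rather than on the MOS scale keeps the combination linear in the additive impairment model, and the result is only converted to MOS once at the end, which matches the order of steps described above.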
It should be emphasized that neither type of prediction model has yet been validated for the scenarios investigated here. In addition, both estimate the instantaneous speech quality of short samples, not of entire conversations. We therefore applied the predictions only to Tests 1b and 2b. The parametric E-model uses packet loss as an input parameter, which was not manipulated in a controlled way in Test 2b; for this test, therefore, only the signal-based model can be used. We compared the model estimations with the auditory test results in terms of MOS and calculated the Pearson correlation r and the root-mean-squared error σ for each test. The results are given in Table 1.
Table 1: Pearson correlation (r) and root-mean-squared error (σ) between auditory and estimated MOS

Test    WB E-model        WB-PESQ
        r       σ         r       σ
1b      0.58    1.44      0.96    0.44
2b      n.a.    n.a.      0.89    0.70
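Both figures of merit in Table 1 are standard statistics. A minimal Python sketch of how they can be computed from per-condition auditory and estimated MOS values is given below. The function names are hypothetical, and σ is computed here as the plain root-mean-squared error over N conditions (dividing by N); the source does not state whether a degrees-of-freedom correction was applied.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def rmse(auditory: Sequence[float], estimated: Sequence[float]) -> float:
    """Root-mean-squared error between auditory and estimated MOS (divides by N)."""
    n = len(auditory)
    if n != len(estimated) or n == 0:
        raise ValueError("need two equally long, non-empty sequences")
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(auditory, estimated)) / n)
```

Note that r only measures how well the estimates follow the ranking and spacing of the auditory scores, whereas σ also penalizes a constant offset; this is why both are reported.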
The parametric E-model uses only general information on the network condition (average packet-loss percentage Ppl, codec type) as input; consequently, its predictions are not accurate (r = 0.58, σ = 1.44 for Test 1b). In contrast, WB-PESQ is able to detect the speech-signal degradations caused by the network handover, codec changeover, and packet loss. For Test 1b, this leads to very good prediction accuracy (r = 0.96), even better than the value of r = 0.93 obtained for in-scope data (Rix et al., 2006). The prediction accuracy is slightly lower for Test 2b, which contains conditions with G.722.2 coding at 12.65 kbit/s; WB-PESQ has been shown to have greater difficulty predicting the effects of this codec mode than of the 23.05 kbit/s mode used in Tests 1a/b.
