Research Case Study

Impact: New techniques pioneered to measure speech and video over the internet


Essex researchers’ pioneering techniques for measuring the speech and video quality of media transmitted over the internet have improved streaming services globally.

A new time-alignment method for measuring quality, developed in collaboration with Psytechnics, was incorporated into Microsoft’s Skype for Business product, and by 2016 was being used to measure the voice quality for about 100 million users globally, thereby improving the quality of teleconferencing.

The same time-alignment technique was critical to a new video quality measurement standard, which has been used worldwide since 2008 to improve video delivery services and is still in use today.

The challenge

The advent of the internet as a mainstream medium has led to a revolution in digital media. The transmission of mass media over the internet rather than over traditional broadcast TV and telephone systems, a transition that started around the year 2000, is now commonplace.

In internet streaming, data is split into individual packets that may be subject to delays. This was a problem for existing measurement systems that relied upon the received signal being exactly aligned with the original.

Previous techniques to measure speech quality required either adding disturbing signals during the communication or measuring the communication system before or after the actual conversation. This meant it was not possible to measure the quality of a conversation while the conversation was in progress – something essential for internet streaming.

Completely new methods were therefore needed to measure the quality of speech and video transmitted over the internet.

What we did

Essex researchers investigated the problem of time-alignment in measuring the quality of media transmitted over the internet.

Our researchers addressed the issue by pioneering practical statistical methods to align the media, allowing speech and video quality to be measured in the new packet-based transmission medium of the internet. The technique builds a histogram of audio or video events and uses it to compare across earlier and later frames to find the optimal alignment point.
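
The idea of histogram-based alignment can be illustrated with a minimal Python sketch. This is not the Essex or Psytechnics implementation, whose details are not given here. It assumes each frame is summarised by a single feature value (for example, its energy), that each active received frame "votes" for the frame offset at which it best matches the original, and that the most frequent offset is taken as the alignment point. The function name `estimate_delay` and the parameters `max_shift` and `min_activity` are hypothetical.

```python
from collections import Counter


def estimate_delay(reference, degraded, max_shift, min_activity=0.5):
    """Estimate the frame delay of `degraded` relative to `reference`.

    Both inputs are sequences of per-frame feature values (an assumption
    made for illustration). A positive result means the degraded signal
    lags the reference by that many frames.
    """
    votes = Counter()  # histogram: frame offset -> number of votes
    for j, value in enumerate(degraded):
        if value < min_activity:
            continue  # ignore silent or empty frames; they match anything
        # Search reference frames up to max_shift earlier or later.
        lo = max(0, j - max_shift)
        hi = min(len(reference), j + max_shift + 1)
        if lo >= hi:
            continue
        best = min(range(lo, hi), key=lambda i: abs(reference[i] - value))
        votes[j - best] += 1
    if not votes:
        raise ValueError("no active frames to align")
    delay, _count = votes.most_common(1)[0]  # peak of the histogram
    return delay, votes


# Usage: a synthetic feature track delayed by three frames.
reference = [(i * 37) % 101 + 1 for i in range(60)]
degraded = [0] * 3 + reference[:-3]
delay, votes = estimate_delay(reference, degraded, max_shift=5)
print(delay)  # 3
```

Taking the peak of the histogram, rather than trusting any single frame, makes the estimate tolerant of individual frames that are corrupted in transit. In this sketch, a delay that changes partway through a stream would show up as more than one peak.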

In collaboration with Psytechnics and BT, the Essex team developed a non-intrusive technique for measuring speech quality by modelling the human vocal tract and comparing the measured speech with this model to discriminate between real speech and errors in the transmission.

This means that speech quality can be determined simply by monitoring the transmitted voice data while the conversation is in progress, without disturbing it.

What we changed

The work at the University of Essex on the measurement of speech and video quality has had a global impact on speech and video streaming services.

The time-alignment work, in collaboration with Psytechnics, has become an essential part of the systems used by industry to measure and improve internet speech communication quality and reliability. The findings of this research were incorporated into Microsoft’s Skype for Business teleconferencing product (also called Lync) with more than 100 million users. Without such a component, Microsoft would not have been able to determine the speech quality of its product.

This work was used by Psytechnics as an essential component of their measurement products and for the Psytechnics contribution to the ITU-T standard for video quality measurement J.247, which continues to be a standard method for measuring the quality of encoded video transmitted over packet media such as the internet.

This objective measurement system, using the time-alignment process produced by the University of Essex, replaces time-consuming subjective measurements (using large panels of users), thus significantly reducing costs to industry.

A new non-intrusive speech measurement system, developed by Essex in collaboration with Psytechnics, formed the basis of the first practical implementation of a method to measure speech quality without having access to the original signal. The work led directly to the ITU standard P.563 and formed a key product for Netscout that contributed $500 million in annual revenue to the company by 2018.

Through this product, Essex research was used to monitor networks of service providers, government agencies, large financial institutions and other enterprises across the globe to improve the quality of speech communication.