Protocol Stacks: Transmission Quality in VoIP Networks

QoS - Quality of delivery of multimedia information
Jitter - Signal transmission delay and its variation
Echo Cancellation in Packet Networks

As VoIP technology developed, it became necessary to handle the initiation, maintenance, and termination of multimedia connections in a composite network, where one or more parts run on PSTN protocols while the others use IP.

The two parts differ considerably. The PSTN was built for analog voice transmission, which tolerates errors but is sensitive to delay and jitter. IP-based networks were built for data transmission, which tolerates delay but is sensitive to errors.

So, it was necessary to adapt the IP part of the VoIP network to carry standard phone signals, using something like a VoIP gateway. Moreover, VoIP networks must also deliver video with good quality. In short, VoIP networks needed to ensure Quality of Service (QoS) for multimedia information.

The quality requirements for packet data delivery vary by application type. Some applications tolerate packet loss and can recover small parts of data from received packets. This is typical for most multimedia apps, especially audio and video. Generally, packet loss shouldn't exceed 1%. However, for compressed audio and video, even a 1% loss is unacceptable due to high sensitivity.

In this post, QoS refers to the mechanisms that ensure the required delivery quality for a specific multimedia application. VoIP phone connections serve as the example.

VoIP quality issues were covered in earlier posts, which discussed two key protocols that work side by side: the Real-Time Transport Protocol (RTP) for multimedia transport services, and the RTP Control Protocol (RTCP) for data flow management and congestion control.

It should be mentioned that there's no precise measure for voice or TV signal quality: it depends on human perception, which makes it subjective by nature. Statistics come to the rescue. For example, the Mean Opinion Score (MOS) assesses speech quality based on numerous expert trials, rating the result from 1 to 5. An average MOS of 4 indicates good quality, while anything below 3.5 means poor quality.

Pic. 1 shows the verbal quality ratings for phone signals that correspond to different MOS ranges. Quality can also be evaluated using the R factor, expressed here as a percentage (the R scale runs from 0 to 100). An R over 93% means good phone signal quality. Users notice quality drops when R is below 70%.
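The two scales can be related numerically. The sketch below uses the R-to-MOS conversion from the ITU-T G.107 E-model, which is not cited in this post and is brought in here only as an illustration. The function name `r_to_mos` is made up for this example, and R is treated as a plain number from 0 to 100.

```python
def r_to_mos(r: float) -> float:
    """Estimate MOS from an R factor using the E-model conversion formula.

    Assumption: r is on the 0..100 scale (what the text calls percent).
    """
    if r <= 0:
        return 1.0   # floor of the estimated MOS range
    if r >= 100:
        return 4.5   # ceiling of the estimated MOS range
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


if __name__ == "__main__":
    for r in (50, 70, 80, 93):
        print(f"R = {r:>3} -> estimated MOS {r_to_mos(r):.2f}")
```

With this formula, R = 70 lands at a MOS of roughly 3.6, close to the 3.5 boundary mentioned above, and R = 93 lands at roughly 4.4.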

If MOS and R values consistently drop, detailed connection testing is conducted to find the cause. Note that phone connection quality depends on user environment. For VoIP, MOS provides an integrated assessment of factors affecting speech quality, like delay, jitter, packet loss, echo, and room conditions.

Packet delay (latency) depends on many factors, including signal processing methods, network equipment properties, and the transmission medium.

To achieve satisfactory quality, VoIP packet delay should not exceed 150 ms. Satellite sections with up to 500 ms delay are sometimes present in VoIP connections, but many users either don't notice or adapt well.

VoIP network delay has four components:

Propagation delay due to the finite signal speed in the transmission medium (copper, fiber, satellite, etc.).
Handling delay for packet formation, compression, switching, etc.
Serialization delay, the time needed to clock a packet's bits onto the link at an interface, usually much less than the first two components.
Queuing delay, the one specific to packet networks, caused by network overload.
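As a rough illustration, the four components can be added up and compared against the 150 ms target. This is a minimal sketch: the class and function names are hypothetical, and the sample figures (3000 km of fiber at about 200,000 km/s, a 200-byte packet on a 1 Mbit/s link, 30 ms of handling, 10 ms of queuing) are assumptions picked for the example, not measurements.

```python
from dataclasses import dataclass

MAX_DELAY_MS = 150.0  # target from the text


def propagation_delay_ms(distance_km: float,
                         speed_km_per_s: float = 200_000.0) -> float:
    """Time for the signal to travel the distance (default speed: assumed fiber)."""
    return distance_km / speed_km_per_s * 1000.0


def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to clock all bits of one packet onto a link of the given rate."""
    return packet_bytes * 8 / link_bps * 1000.0


@dataclass
class DelayBudget:
    propagation_ms: float
    handling_ms: float
    serialization_ms: float
    queuing_ms: float

    def total_ms(self) -> float:
        return (self.propagation_ms + self.handling_ms
                + self.serialization_ms + self.queuing_ms)

    def within_target(self, limit_ms: float = MAX_DELAY_MS) -> bool:
        return self.total_ms() <= limit_ms


budget = DelayBudget(
    propagation_ms=propagation_delay_ms(3000),                # assumed route length
    handling_ms=30.0,                                         # assumed codec/packetization cost
    serialization_ms=serialization_delay_ms(200, 1_000_000),  # assumed packet and link
    queuing_ms=10.0,                                          # assumed, varies with load
)

if __name__ == "__main__":
    print(f"total: {budget.total_ms():.1f} ms, "
          f"within target: {budget.within_target()}")
```

Only the queuing term changes with load, which is why it is the one to watch on a congested network.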

Jitter in VoIP results from variation between the intervals at which packets are sent and the intervals at which they arrive. If two nodes aren't connected to the same switch, packet delay can vary. Heavier load on the connection raises jitter. While not critical for file downloads or web surfing, jitter matters a great deal for streaming video and IP telephony. Special memory buffers are used to combat jitter, but only up to a point, since larger buffers increase overall packet delay. For instance, increasing the buffer size to 300 ms can significantly degrade VoIP quality.
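One common way to put a number on jitter is the running interarrival jitter estimate that RTP receivers report through RTCP (defined in RFC 3550). The sketch below follows that formula but works on timestamps in milliseconds rather than RTP timestamp units; the function name and the sample values are assumptions made for the example.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Running jitter estimate in the style of RFC 3550.

    send_ms[i] and recv_ms[i] are the send and arrival times of packet i.
    For each consecutive pair, D is the change in transit time, and the
    estimate moves 1/16 of the way toward |D|.
    """
    if len(send_ms) != len(recv_ms):
        raise ValueError("send_ms and recv_ms must have the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    sent = [0, 20, 40, 60, 80]          # packets sent every 20 ms
    received = [50, 70, 95, 112, 130]   # arrivals with varying delay
    print(f"jitter estimate: {interarrival_jitter(sent, received):.3f} ms")
```

A constant delay, however large, yields zero jitter here, which matches the point that delay and jitter are different quantities.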

Thus, while delay and jitter are fundamentally different, they're interconnected. Acceptable jitter ranges from 100 to 150 ms.

Jitter compensation buffers can be static or dynamic, with dynamic buffers adjusting capacity based on recent jitter analysis.
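A dynamic buffer can be sketched as follows. This is not a production algorithm: the class name, the sliding window, the headroom multiplier, and the 20 to 150 ms limits are all assumptions chosen for illustration. The idea is only that the depth follows recent jitter and is capped so the buffer itself does not add too much delay.

```python
from collections import deque


class DynamicJitterBuffer:
    """Adjusts buffer depth from the mean of recent delay variations."""

    def __init__(self, min_ms=20.0, max_ms=150.0, window=50, headroom=2.0):
        self.min_ms = min_ms          # never shrink below this depth
        self.max_ms = max_ms          # cap, so buffering adds bounded delay
        self.headroom = headroom      # safety multiplier over mean jitter
        self.samples = deque(maxlen=window)  # recent |delay variation| values
        self.depth_ms = min_ms

    def observe(self, delay_variation_ms: float) -> float:
        """Record one delay-variation sample and return the new depth."""
        self.samples.append(abs(delay_variation_ms))
        mean = sum(self.samples) / len(self.samples)
        target = self.headroom * mean
        self.depth_ms = min(self.max_ms, max(self.min_ms, target))
        return self.depth_ms


if __name__ == "__main__":
    buf = DynamicJitterBuffer()
    for variation in (3, 5, 40, 60, 55, 8, 4):
        print(f"variation {variation:>3} ms -> depth {buf.observe(variation):.1f} ms")
```

A static buffer corresponds to skipping `observe` and keeping `depth_ms` fixed.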

Packet loss can degrade VoIP connection quality. Packet drops often occur during overloads, leading to missing data fragments, connection disruptions, and other issues. Furthermore, retransmitting lost packets can worsen the problem.

VoIP connection time can be divided into periods of single packet loss and multiple neighboring packet losses. Single packet loss (gap) usually doesn't affect connection quality since existing methods can conceal it. Gaps are described with their density, packet loss percentage over a period, and average duration.

Multiple losses, involving many adjacent packets, typically occur in high-density areas where packets are sent in bursts, severely degrading VoIP quality. This is measured by the percentage of packet bursts and burst density, i.e., the average percentage of lost packets in a burst.
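The split between isolated losses and bursts can be computed from a per-packet received/lost record. The sketch below is a simplified take on the idea, loosely inspired by the burst and gap metrics in RTCP XR (RFC 3611). It is not that RFC's reference algorithm. The function name is hypothetical, and the threshold `gmin` (losses separated by fewer than `gmin` received packets are grouped into one burst) defaults to 16 as an assumption.

```python
def burst_gap_stats(received, gmin=16):
    """Summarize loss in a list of booleans (True = packet received).

    Losses closer together than gmin received packets form one group.
    A group with two or more losses is a burst; a group with exactly
    one loss is an isolated (gap) loss.
    """
    if not received:
        raise ValueError("received must not be empty")
    lost_idx = [i for i, ok in enumerate(received) if not ok]

    groups = []
    for i in lost_idx:
        # i - previous_loss - 1 = received packets between the two losses
        if groups and i - groups[-1][-1] - 1 < gmin:
            groups[-1].append(i)
        else:
            groups.append([i])

    bursts = [g for g in groups if len(g) > 1]
    isolated = [g for g in groups if len(g) == 1]
    burst_len = sum(g[-1] - g[0] + 1 for g in bursts)  # packets spanned by bursts
    burst_lost = sum(len(g) for g in bursts)
    gap_len = len(received) - burst_len

    return {
        "loss_pct": 100.0 * len(lost_idx) / len(received),
        "burst_pct": 100.0 * burst_len / len(received),
        "burst_density_pct": 100.0 * burst_lost / burst_len if burst_len else 0.0,
        "gap_density_pct": 100.0 * len(isolated) / gap_len if gap_len else 0.0,
    }


if __name__ == "__main__":
    trace = [True] * 20 + [False] + [True] * 20 + [False, True, False, False] + [True] * 20
    for name, value in burst_gap_stats(trace).items():
        print(f"{name}: {value:.2f}")
```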

Various mechanisms (see Pic. 2) are used for recovery. If a packet isn't received within the expected time (this time can change dynamically), it's considered lost and replaced by the last successfully received packet. Since losing a packet means losing just 20 ms of voice signal, users won't notice the replacement. This is called the "concealment strategy." For example, the G.729 codec, widely used in VoIP, tolerates up to 5% packet loss per session. Note that this strategy is only effective for single packet loss.
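The replacement step can be sketched in a few lines. This is a minimal illustration, not how any particular codec implements concealment: the function name, the use of `None` to mark a lost frame, the `silence` filler, and the `max_repeats` limit are all assumptions. The limit reflects the note that repetition only works for single losses.

```python
def conceal_losses(frames, silence, max_repeats=1):
    """Replace lost frames (None) with the last good frame.

    After max_repeats consecutive repetitions, or when no good frame
    has been seen yet, fall back to the provided silence frame.
    """
    out = []
    last_good = None
    repeats = 0
    for frame in frames:
        if frame is not None:
            last_good = frame
            repeats = 0
            out.append(frame)
        elif last_good is not None and repeats < max_repeats:
            repeats += 1
            out.append(last_good)   # repeat the previous 20 ms of audio
        else:
            out.append(silence)     # longer loss: repetition would be audible
    return out


if __name__ == "__main__":
    stream = [b"f1", None, b"f3", None, None, b"f6"]
    print(conceal_losses(stream, silence=b"--"))
```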

Echo often significantly degrades call quality. It is characterized by its loudness and its delay: the higher these are, the more annoying the effect. An echo delay of up to about 25 ms is acceptable.

Modern packet networks embed echo cancellers in low-bit-rate voice codecs. The main parameter is the echo tail delay, with typical values being 16, 24, 32, 64, and 128 ms. It's crucial to configure the echo canceller delay accurately during initial VoIP setup. Incorrect settings cause users to hear their own echo.
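Choosing the setting can be illustrated as picking the smallest supported tail that still covers the echo path delay. The sketch below assumes the echo path delay has already been measured; the function name is hypothetical, and the list of tail values is the one given above.

```python
import bisect

ECHO_TAILS_MS = (16, 24, 32, 64, 128)  # tail values listed in the text


def pick_echo_tail(echo_path_delay_ms: float):
    """Return the smallest tail covering the delay, or None if none does."""
    i = bisect.bisect_left(ECHO_TAILS_MS, echo_path_delay_ms)
    return ECHO_TAILS_MS[i] if i < len(ECHO_TAILS_MS) else None


if __name__ == "__main__":
    for delay in (10, 20, 32, 90, 200):
        print(f"echo path {delay:>3} ms -> tail {pick_echo_tail(delay)}")
```

A result of `None` means the echo path is longer than any available tail, so the canceller cannot cover it with these settings.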
