January 09, 2025

Application of the Technical Advantages of H.264 in H.323 Systems

This article focuses on H.323 systems, which are suited to providing multimedia services over IP networks. H.264 is a new video codec standard proposed by the JVT to achieve a higher video compression ratio, better image quality and good network adaptability. In practice, H.264 needs a lower bit rate for the same image quality. Its built-in resistance to packet loss and bit errors, together with its network adaptability, makes it well suited to IP transmission. H.264 is therefore expected to become the preferred video standard in H.323 systems.

An H.323 system places the following three main requirements on the video codec standard:

(1) Some IP network access methods, such as xDSL, provide only limited bandwidth. Once the bandwidth used by audio and data is subtracted, even less remains for video. The video codec therefore needs a high compression ratio so that better image quality can be obtained at a fixed bit rate.

(2) Resistance to packet loss and bit errors, so that the codec can adapt to various network environments, including wireless networks with heavy packet loss and high error rates.

(3) Good network adaptability, so that the video stream can be transmitted conveniently over the network.

Second, three technical advantages that make H.264 suitable for H.323 systems

H.264 was designed with the various requirements of multimedia communication for video encoding and decoding in mind, and it builds on the research results of earlier video standards, so it has clear advantages. The following sections relate the H.323 system requirements for video codec technology to three advantages of H.264.

1. Compression ratio and image quality

Improvements to traditional intra prediction, inter prediction, transform coding and entropy coding give H.264 higher coding efficiency and better image quality than previous standards.

(1) Variable block size: The block size can be selected flexibly during inter prediction. H.264 divides a macroblock (MB) in one of four modes: 16×16, 16×8, 8×16 and 8×8. In 8×8 mode, each 8×8 sub-macroblock can be divided further using the three sub-macroblock partition modes 8×4, 4×8 and 4×4. Moving objects can therefore be partitioned more accurately, which reduces the prediction error and improves coding efficiency. Intra prediction generally uses two luma prediction modes, Intra_4×4 and Intra_16×16. Intra_4×4 suits areas of the image with rich detail, while Intra_16×16 is better suited to coarse areas with little detail.

(2) High-precision motion estimation: In H.264, motion-compensated prediction of the luma signal has 1/4-pixel accuracy. If the motion vector points to an integer pixel position in the reference picture, the predicted value is the reference sample at that position. Otherwise, a 6-tap FIR filter is used to interpolate the value at 1/2-pixel positions, and the value at 1/4-pixel positions is obtained by averaging the values at the neighboring integer and 1/2-pixel positions. High-precision motion estimation thus further reduces the inter-frame prediction error.

(3) Multi-reference frame motion estimation: Each M×N luma block undergoes motion-compensated prediction, which yields a motion vector and a reference picture index, and each sub-macroblock partition within a sub-macroblock can have its own motion vector. The reference picture is selected at the sub-macroblock level, so all sub-macroblock partitions within one sub-macroblock use the same reference picture for prediction, while different sub-macroblocks of the same slice may select different reference pictures. This is multi-reference frame motion estimation.

(4) More flexible reference picture selection: The reference picture may even be a picture coded with bidirectional prediction, which allows the picture that best matches the current picture to be chosen as the reference, thereby reducing the prediction error.

(5) Weighted prediction: The encoder may weight the motion-compensated prediction value by a coefficient, which can improve image quality in certain scenes.

(6) Deblocking filter inside the motion compensation loop: To remove the blocking artifacts introduced by prediction and transform, H.264 also uses a deblocking filter. The difference is that the H.264 deblocking filter sits inside the motion compensation loop, so the deblocked picture can be used for motion-compensated prediction of other pictures, which further improves prediction accuracy.
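
The sub-pixel interpolation described under high-precision motion estimation can be sketched in one dimension as follows. This is a simplified illustration, not the full two-dimensional procedure: it assumes 8-bit samples, the 6-tap coefficients (1, -5, 20, 20, -5, 1) with division by 32, and edge samples repeated at the picture border. The function names are hypothetical.

```python
# Assumed 6-tap filter coefficients; the weighted sum is divided by 32.
TAPS = (1, -5, 20, 20, -5, 1)


def clip255(value):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(row, x):
    """Half-pixel sample between row[x] and row[x + 1] (horizontal only)."""
    last = len(row) - 1
    # Three integer samples on each side; indices are clamped at the edges.
    window = [row[min(max(x + k, 0), last)] for k in range(-2, 4)]
    acc = sum(tap * sample for tap, sample in zip(TAPS, window))
    return clip255((acc + 16) >> 5)  # round, then divide by 32


def quarter_pel(row, x):
    """Quarter-pixel sample: rounded average of the integer sample row[x]
    and the half-pixel sample to its right."""
    return (row[x] + half_pel(row, x) + 1) >> 1


edge = [0, 0, 0, 100, 100, 100]
print(half_pel(edge, 2), quarter_pel(edge, 2))
```

On a sharp edge, the half-pixel sample lands midway between the two levels and the quarter-pixel sample lands a quarter of the way, which is the finer motion granularity the text refers to.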

2. Resistance to packet loss and bit errors

Key technologies such as parameter sets, slices, FMO and redundant slices can greatly improve the system's resistance to packet loss and bit errors.

(1) Parameter sets: Parameter sets and their flexible transmission greatly reduce the chance of errors caused by the loss of key header information. To ensure that a parameter set reliably reaches the decoder, the same parameter set may be sent several times, or several parameter sets may be retransmitted.

(2) Use of slices: A picture can be divided into one or several slices. Dividing the picture into multiple slices greatly reduces the visible spatial impact when one slice cannot be decoded properly, and each slice also provides a resynchronization point.

(3) PAFF and MBAFF: In interlaced video there is a large scanning interval between the two fields, so the spatial correlation between two adjacent rows of a frame is lower than with progressive scanning. In that case, encoding the two fields separately saves bits. A frame can be encoded in one of three modes: the two fields are encoded together as one frame; the two fields are encoded separately; or the two fields are combined into one frame, with the difference that each pair of vertically adjacent macroblocks in the frame is encoded as a macroblock pair. Choosing between the first two modes is called PAFF coding. Field mode is more efficient for moving regions, while frame mode is more efficient for static regions because their adjacent rows are strongly correlated. When a picture contains both moving and static regions, field mode can be used for the moving regions and frame mode for the static regions at the MB level, which is called MBAFF.

(4) FMO: FMO can further improve the error recovery capability of slices. Through the use of slice groups, FMO changes the way a picture is divided into slices and macroblocks. The macroblock-to-slice-group map defines which slice group each macroblock belongs to. With FMO, H.264 defines seven macroblock scan modes.

(5) Intra prediction: H.264 draws on the experience of previous video coding standards in intra prediction. Notably, an IDR picture in H.264 invalidates the reference picture buffer, so pictures decoded after it no longer reference pictures that precede it. The IDR picture therefore provides a good resynchronization point. On channels with severe packet loss and bit errors, IDR pictures can be transmitted from time to time to further improve the resistance of H.264 to errors and packet loss.

(6) Redundant pictures: To make the H.264 decoder more robust against data loss, redundant pictures may be transmitted. When the primary picture is lost, the original picture can be reconstructed from the redundant picture.

(7) Data partitioning: Because information such as motion vectors and macroblock types is more important than other information, H.264 introduces the concept of data partitioning, in which related syntax elements of a slice are placed in the same partition. H.264 has three different types of data partition, and the three types are transmitted separately. If the second or third type is lost, the error recovery tools can still use the information in the first type of partition to recover the lost information appropriately.

(8) Multi-reference frame motion estimation: Multi-reference frame motion estimation improves not only coding efficiency but also error recovery. In an H.323 system, when the encoder learns through RTCP that a reference picture has been lost, it can select a picture that the decoder has received correctly as the reference picture.

(9) Preventing spatial error propagation: To prevent errors from spreading spatially, the decoder side can specify that macroblocks in a P slice or B slice do not use adjacent non-intra-coded macroblocks as references for intra prediction.
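
The macroblock-to-slice-group map described under FMO can be illustrated with a small sketch. It assumes a dispersed, checkerboard-like pattern with two slice groups; the mapping formula and function names here are illustrative assumptions, not taken from this article. The point it shows is that when one slice group is lost, every lost macroblock still has correctly received neighbors available for concealment.

```python
def dispersed_slice_group_map(width_mbs, height_mbs, num_groups=2):
    """Assign each macroblock (row-major order) to a slice group.

    Assumed dispersed pattern; with two groups it yields a checkerboard.
    """
    mapping = []
    for mb_addr in range(width_mbs * height_mbs):
        x = mb_addr % width_mbs
        y = mb_addr // width_mbs
        mapping.append((x + (y * num_groups) // 2) % num_groups)
    return mapping


def surviving_neighbours(mapping, width_mbs, mb_addr, lost_group):
    """Count the 4-connected neighbours of mb_addr outside the lost group."""
    height_mbs = len(mapping) // width_mbs
    x, y = mb_addr % width_mbs, mb_addr // width_mbs
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width_mbs and 0 <= ny < height_mbs:
            if mapping[ny * width_mbs + nx] != lost_group:
                count += 1
    return count


mb_map = dispersed_slice_group_map(4, 3)
for row in range(3):
    print(mb_map[row * 4:(row + 1) * 4])
# Macroblock 5 belongs to group 0; if group 0 is lost, count its intact neighbours.
print(surviving_neighbours(mb_map, 4, 5, lost_group=0))
```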

3. Network adaptability

To accommodate a variety of network environments and applications, H.264 defines the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The VCL performs video encoding and decoding, including motion-compensated prediction, transform coding and entropy coding. The NAL formats the VCL video data appropriately and encapsulates it.

(1) NAL units: Video data is encapsulated in NAL units (NALUs), each an integer number of bytes long, whose first byte indicates the type of data in the unit. H.264 defines two encapsulation formats. A packet-switched network (such as an H.323 system) can encapsulate NALUs using the RTP encapsulation format. Other systems may require NALUs to be transmitted as a sequential bitstream. For this purpose, H.264 defines a transport mechanism in bitstream format, which places a start_code_prefix before each NALU so that the NAL unit boundaries can be determined.

(2) Parameter sets: In earlier video coding standards, header information such as GOB, GOP and picture headers is crucial, and the loss of a packet containing such information often makes the pictures that depend on it undecodable. For this reason, H.264 moves information that rarely changes and that applies to a large number of VCL NALUs into parameter sets. There are two types of parameter set: the sequence parameter set and the picture parameter set. To accommodate multiple network environments, parameter sets can be transmitted in-band or out-of-band.
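
The byte-stream format and the NAL unit header described above can be sketched as follows. This is a minimal illustration that assumes the three-byte start code prefix 0x000001 and the one-byte header layout of 1 + 2 + 5 bits; it ignores emulation prevention bytes inside the payload, and the function names are hypothetical.

```python
START_CODE = b"\x00\x00\x01"

# A few well-known nal_unit_type values, for readable output only.
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    7: "sequence parameter set",
    8: "picture parameter set",
}


def split_byte_stream(stream: bytes):
    """Split a byte stream into NAL units at each start code prefix."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Trailing zero bytes belong to the next start code or to padding.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def parse_nal_header(nalu: bytes):
    """Decode the first byte of a NAL unit."""
    first = nalu[0]
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x03,
        "nal_unit_type": first & 0x1F,
    }


stream = (b"\x00\x00\x00\x01\x67\x42"   # sequence parameter set
          b"\x00\x00\x01\x68\xce"       # picture parameter set
          b"\x00\x00\x01\x65\x88")      # IDR slice
for nalu in split_byte_stream(stream):
    header = parse_nal_header(nalu)
    print(header, NAL_TYPE_NAMES.get(header["nal_unit_type"], "other"))
```

In an RTP-based system such as H.323 the start codes are not needed, because packet boundaries already delimit the NAL units.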

Third, implementing H.264 in H.323 systems

Because H.264 is a new video coding standard, applying it in an H.323 system raises some problems, such as how to define an entity's H.264 capability during H.245 capability negotiation. The H.323 standard therefore has to be supplemented and modified as necessary. To this end, ITU-T has developed the H.241 standard. This article describes only the modifications related to H.323.

First, it is necessary to specify how H.264 capabilities are defined during H.245 capability negotiation. The H.264 capability set is a list of one or more H.264 capabilities. Each H.264 capability includes two mandatory parameters, Profile and Level, and several optional parameters such as CustomMaxMBPS and CustomMaxFS. In H.264, the Profile defines the coding tools and algorithms used to generate the bitstream, and the Level sets requirements on certain key parameters. The H.264 capability is carried in the GenericCapability structure, whose CapabilityIdentifier is of type standard with the value 0.0.8.241.0.0.1, which identifies the H.264 capability. MaxBitRate defines the maximum bit rate. The Collapsing field contains the H.264 capability parameters. Its first entry is Profile: the ParameterIdentifier is of type standard with the value 41, which identifies the Profile, and the ParameterValue is of type booleanArray, whose value identifies the profile and can be 64, 32 or 16, representing the Baseline, Main and Extended profiles respectively. Its second entry is Level: the ParameterIdentifier is of type standard with the value 42, which identifies the Level, and the ParameterValue is of type unsignedMin, whose value identifies one of the 15 levels defined in Annex A of H.264. The other parameters appear as optional entries.
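
As a rough illustration of the structure just described, the sketch below models an H.264 GenericCapability as a plain Python dictionary. It is not an ASN.1 encoder and does not use any H.245 library. The identifier values (0.0.8.241.0.0.1, 41, 42) and the profile values (64, 32, 16) come from the text above; the dictionary layout, the function names, and the level and bit rate numbers in the usage lines are assumptions for illustration only.

```python
H264_CAPABILITY_OID = "0.0.8.241.0.0.1"
PARAM_PROFILE = 41
PARAM_LEVEL = 42
PROFILE_VALUES = {"Baseline": 64, "Main": 32, "Extended": 16}


def build_h264_capability(profile, level_value, max_bit_rate):
    """Return a dictionary mirroring the fields named in the text."""
    return {
        "capabilityIdentifier": {"standard": H264_CAPABILITY_OID},
        "maxBitRate": max_bit_rate,
        "collapsing": [
            {"parameterIdentifier": {"standard": PARAM_PROFILE},
             "parameterValue": {"booleanArray": PROFILE_VALUES[profile]}},
            {"parameterIdentifier": {"standard": PARAM_LEVEL},
             "parameterValue": {"unsignedMin": level_value}},
        ],
    }


def profile_of(capability):
    """Look up the profile name from the Profile entry of the capability."""
    for entry in capability["collapsing"]:
        if entry["parameterIdentifier"]["standard"] == PARAM_PROFILE:
            value = entry["parameterValue"]["booleanArray"]
            for name, bit in PROFILE_VALUES.items():
                if value == bit:
                    return name
    return None


# level_value and max_bit_rate below are placeholder numbers, not real codes.
cap = build_h264_capability("Baseline", level_value=29, max_bit_rate=3840)
print(profile_of(cap))
```

Two endpoints negotiating capabilities would each advertise one or more such entries and settle on a profile and level that both support.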

Second, because pictures are organized differently in H.264 than in traditional standards, some of the original H.245 signaling, such as videoFastUpdateGOB in MiscellaneousCommand, does not apply to H.264. H.241 therefore redefines several signaling messages to provide the corresponding functions.

Finally, the RTP encapsulation of H.264 follows RFC 3550, and no fixed value is specified for the payload type (PT) field.
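
To make the role of the PT field concrete, the sketch below packs the 12-byte fixed RTP header of RFC 3550 in front of a single NAL unit. Carrying exactly one NAL unit per packet and using payload type 96 are assumptions made for this example; in a real call the payload type would be agreed through signaling. The function name is hypothetical.

```python
import struct


def rtp_packet(payload: bytes, payload_type: int, seq: int,
               timestamp: int, ssrc: int, marker: bool = False) -> bytes:
    """Prefix payload with a fixed RTP header (no CSRC list, no extension)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type is a 7-bit field")
    byte0 = 2 << 6  # version 2, padding = 0, extension = 0, CSRC count = 0
    byte1 = (int(marker) << 7) | payload_type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


nalu = b"\x65\x88"  # toy IDR slice NAL unit
packet = rtp_packet(nalu, payload_type=96, seq=1, timestamp=90000,
                    ssrc=0x1234, marker=True)
print(len(packet), packet[:2].hex())
```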

Fourth, conclusion

As a new international standard, H.264 has succeeded in terms of coding efficiency, image quality, network adaptability and error resistance. However, with the rapid development of terminals and networks, the demands placed on video codecs keep rising, so H.264 is still being improved and developed to meet new requirements. Current research on H.264 focuses on further reducing codec delay, optimizing the algorithms and further improving image quality. More and more video conferencing systems now use H.264 for encoding and decoding, and most of them have achieved interoperability on the Baseline Profile. With the continuing improvement of H.264 itself and the growing popularity of video communication, H.264 is expected to be applied ever more widely.
