January 10, 2025

Design and Implementation of AVS-M Real-Time Encoder

AVS-M is Part 7 of the AVS standard, a video codec standard designed specifically for mobile communication environments. The purpose of this project is to verify the performance of AVS-M in a realistic application environment, thereby demonstrating the practical value of the standard and contributing to its industrialization in China. Mobile communication terminals are characterized by low bandwidth, a high bit error rate, and weak computing power. Currently, MPEG-4 SP is the prevailing standard for video encoding and decoding in this setting; possible international upgrade paths include H.264 and VC-1. This project demonstrates the practical value of AVS-M by comparing the actual performance of AVS-M and H.264 in the same test environment. To stay close to real-world use, the encoder implemented in this project must capture audio and video in real time, encode them in real time, and output the bitstream in real time over Ethernet as an MPEG-2 TS stream.

VLC and x264 are two open source projects released under the GPL. VLC is a streaming media platform with a plug-in architecture. x264 is an H.264 encoding library optimized for x86 platforms. To obtain verification results as quickly as possible, the project takes VLC and x264 as the starting point of the design. VLC implements real-time audio and video capture, H.264 encoding, MPEG-2 TS multiplexing, and Ethernet output as plug-ins, which matches the overall needs of the project. The AVS-M standard is derived from the H.264 standard; the two are similar in structure and equivalent in function. Developing the AVS-M encoder on top of an H.264 code base speeds up development, and sharing one code tree makes it easier to compare the actual performance of AVS-M and H.264. To better match the actual use environment, this project uses AAC+ as the audio coding standard. VLC itself supports only AAC+ decoding, not encoding. Here, the 3GPP project 26410-700 is adopted as the AAC+ implementation, and AAC+ audio encoding is added to VLC as a plug-in.

VLC supports not only the capture, encoding, multiplexing, and Ethernet transmission of audio and video data, but also the Ethernet reception, demultiplexing, decoding, and playback of the stream. To verify the actual coding quality of the encoder, this project also uses VLC as the receiver of the bitstream and judges the encoder's performance by watching the real-time playback. The version of VLC that supports AVS-M decoding is the result of another project and is not described in detail in this paper.

Encoder

Both audio and video encoding are computationally intensive, so real-time encoding requires a powerful computing platform. Here, a Dell PowerEdge 2950 server serves as the hardware platform of the encoder. The PowerEdge 2950 features an Intel Xeon 5160 (Woodcrest) 3.0 GHz dual-core CPU, 1 GB of DDR2 memory, a SATA II hard drive, two built-in Broadcom BCM5708C NetXtreme II Gigabit Ethernet controllers, and two PCI-X expansion slots for adding peripheral interfaces. The operating system is Red Hat Enterprise Linux 4 (32-bit).

The PowerEdge 2950 has no audio/video capture interface, so one must be added with a capture card. Here, an Osprey 230 capture card is used as the real-time audio and video capture interface. It uses the PCI-X interface, supports the PAL/NTSC/SECAM video standards, and can capture one channel of standard-definition video and two-channel audio in real time. The Ethernet output uses the Gigabit Ethernet interface built into the PowerEdge 2950. The overall block diagram of the encoder is shown below:

The entire encoding process is as follows. The PAL/NTSC/SECAM video signal enters the Osprey 230 capture card through the Composite or S-Video interface, and the audio enters through the two-channel audio interface. The Osprey 230 is driven by the Video4Linux2 and OSS drivers; VLC controls the capture card through these two interfaces, reads the audio and video data in real time, and sends them to the AVS-M encoder and the AAC+ encoder respectively. The resulting encoded streams are sent to the MPEG-2 TS multiplexer, and the multiplexed TS stream is sent out through the Ethernet interface in UDP unicast or multicast mode.
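
The last step of this pipeline, carrying a TS stream in UDP datagrams, can be illustrated with a small Python sketch. It is not VLC's code. The 188-byte packet size and the 0x47 sync byte come from the MPEG-2 TS format; grouping 7 packets per datagram is a common convention assumed here, and the function name `ts_to_datagrams` is hypothetical.

```python
TS_PACKET_SIZE = 188       # fixed MPEG-2 TS packet size in bytes
SYNC_BYTE = 0x47           # first byte of every TS packet
PACKETS_PER_DATAGRAM = 7   # assumed grouping: 7 * 188 = 1316 bytes per datagram


def ts_to_datagrams(ts_stream: bytes) -> list:
    """Split a TS byte stream into UDP payloads of whole TS packets."""
    if len(ts_stream) % TS_PACKET_SIZE != 0:
        raise ValueError("stream length is not a multiple of 188 bytes")

    packets = [ts_stream[i:i + TS_PACKET_SIZE]
               for i in range(0, len(ts_stream), TS_PACKET_SIZE)]
    for packet in packets:
        if packet[0] != SYNC_BYTE:
            raise ValueError("TS packet does not start with sync byte 0x47")

    # Each datagram carries up to PACKETS_PER_DATAGRAM complete packets.
    return [b"".join(packets[i:i + PACKETS_PER_DATAGRAM])
            for i in range(0, len(packets), PACKETS_PER_DATAGRAM)]
```

Each returned payload could then be passed to a UDP socket's `sendto` with a unicast or multicast destination address.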

1 code library

The development of a code library supporting the AVS-M standard is the focus of this project. The x264 code base is modified according to the similarities and differences between AVS-M and H.264. The guiding principle is to leave the original H.264 encoding function unchanged and to add the AVS-M encoding function alongside it. To support both standards, a runtime switch is used, so that the encoding library supports both H.264 and AVS-M and can switch between them dynamically. The following are the parts in which the two standards differ, as encountered during development.

a) NAL layer

AVS-M is similar to H.264 in that the basic unit of the bitstream is the NAL unit. Each NAL unit can contain syntax structures such as the sequence header, picture header, and slice. The difference is that in H.264, to avoid confusion with the start code, a 0x03 byte is inserted before the 0x01 whenever 0x000001 appears inside a NAL unit. AVS-M does not use this mechanism, so the module that inserts 0x03 must be bypassed when implementing AVS-M.
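
The difference can be sketched in Python as follows. This is a minimal illustration, not x264's code: the function name `write_nal_payload` and the `standard` runtime switch are hypothetical. The H.264 branch implements that standard's general emulation-prevention rule (a 0x03 is inserted after two zero bytes whenever the next byte is 0x00 to 0x03), of which the 0x000001 case described above is one instance.

```python
def write_nal_payload(payload: bytes, standard: str) -> bytes:
    """Return the payload bytes as they would be written into a NAL unit.

    `standard` is a hypothetical runtime switch: "h264" or "avsm".
    """
    if standard == "avsm":
        # Per the text above, the 0x03 insertion step is skipped for AVS-M.
        return bytes(payload)

    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just written
    for byte in payload:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)  # emulation-prevention byte
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)
```

Keeping both paths behind one switch mirrors the principle stated earlier: the H.264 behaviour is left untouched and the AVS-M behaviour is added next to it.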

b) Semantics above the slice layer

AVS-M has a sequence parameter set and a picture parameter set corresponding to those of H.264. In addition, AVS-M adds a picture header, which marks the boundary of each frame of picture data clearly and simplifies the implementation of the decoder. When implementing AVS-M, picture header support must be added accordingly.

c) intra prediction

In luma intra prediction, AVS-M and H.264 both have 9 modes, but the modes are ordered differently, as shown in Figure 1.

In the implementation, we used a mapping table to translate between the two orderings, which keeps the code changes to a minimum. Of course, the details of intra prediction that differ between the standards must also be modified according to the AVS-M standard.
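
A mapping table of this kind might look like the Python sketch below. The H.264 mode numbering is the one defined by that standard. The AVS-M ordering is a placeholder invented for illustration only, since Figure 1 is not reproduced here, and the sketch assumes both standards use the same nine prediction directions.

```python
# H.264 Intra4x4 luma prediction modes; the list index is the H.264 mode number.
H264_MODES = [
    "vertical", "horizontal", "dc",
    "diag_down_left", "diag_down_right",
    "vertical_right", "horizontal_down",
    "vertical_left", "horizontal_up",
]

# PLACEHOLDER ordering for AVS-M, NOT taken from the standard.
# Replace it with the real ordering from the AVS-M specification.
AVSM_MODES_PLACEHOLDER = [
    "diag_down_left", "vertical", "horizontal", "dc",
    "diag_down_right", "vertical_left", "horizontal_down",
    "vertical_right", "horizontal_up",
]

# Lookup tables in both directions, indexed by mode number.
H264_TO_AVSM = [AVSM_MODES_PLACEHOLDER.index(name) for name in H264_MODES]
AVSM_TO_H264 = [H264_MODES.index(name) for name in AVSM_MODES_PLACEHOLDER]
```

With such tables the encoder can keep its internal H.264 numbering and translate only when a mode number is written to or read from the AVS-M bitstream.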

In addition, AVS-M has only 4x4 intra prediction modes, whereas H.264 also has 16x16 and 8x8 modes, so these two unused modes have to be turned off. For chrominance, AVS-M has no "plane" prediction mode, so it should also be disabled in x264. Finally, note that if an intra-predicted macroblock occurs in an inter-predicted frame (P-frame), the intra prediction mode contributed by an adjacent inter-predicted block is defined as unavailable (-1) in AVS-M, whereas it is defined as DC prediction mode (2) in H.264.
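
The last point can be written as a small Python helper. This is a sketch of the rule as stated above only; the function name, the `standard` switch, and the constant names are hypothetical, and other cases (such as neighbours outside the picture) are left out.

```python
DC_PRED = 2       # H.264 mode number for DC prediction
UNAVAILABLE = -1  # value used by AVS-M, per the text above


def neighbour_mode_for_prediction(neighbour_is_inter: bool,
                                  neighbour_mode: int,
                                  standard: str) -> int:
    """Mode that a neighbouring block contributes to intra mode prediction.

    `standard` is a hypothetical runtime switch: "h264" or "avsm".
    """
    if not neighbour_is_inter:
        # An intra-coded neighbour contributes its own prediction mode.
        return neighbour_mode
    # An inter-coded neighbour is treated differently by the two standards.
    return UNAVAILABLE if standard == "avsm" else DC_PRED
```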

d) motion vector prediction

In AVS-M, the motion vector of the current block is predicted from the motion vectors at its lower-left, upper, and upper-right corners, whereas H.264 uses the motion vectors at the upper-left, upper, and upper-right corners, as shown in Figure 2.

In addition, the calculation method of the motion vector predictor is slightly different.
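
For reference, the general case of the H.264 predictor is a component-wise median of the three neighbouring motion vectors, sketched below in Python. The special cases of H.264 (16x8 and 8x16 partitions, unavailable neighbours) are omitted, and the AVS-M calculation, which the text says differs slightly, is not reproduced. The function names are hypothetical.

```python
def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return max(min(a, b), min(max(a, b), c))


def median_mv_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of three (x, y) motion vectors."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))
```

Only the choice of the three neighbouring vectors and the combining rule would need to change behind a per-standard switch; the code that consumes the predictor can stay the same.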

e) Fractional pixel interpolation

In both AVS-M and H.264, sample values at half-pixel precision are obtained by interpolation, although the two standards use different filters. The most important point is that when both the horizontal and vertical positions are at quarter-pixel precision, AVS-M uses the "star" method, while H.264 uses the "diamond" method, as shown in Figure 3.

In AVS-M, e, g, p, and r are calculated using the following formulas.

e = (F + j + 1) >> 1
g = (G + j + 1) >> 1
p = (N + j + 1) >> 1
r = (O + j + 1) >> 1

In H.264, e, g, p, and r are calculated using the following formulas.

e = (b + h + 1) >> 1
g = (b + m + 1) >> 1
p = (h + t + 1) >> 1
r = (m + t + 1) >> 1
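
The two sets of formulas can be placed side by side in Python. The sample labels (F, G, N, O, j, b, h, m, t) are those of the formulas above and Figure 3; the function names and the dictionary-based interface are hypothetical.

```python
def quarter_pel_avsm(s: dict) -> dict:
    """AVS-M: average the centre half-pel sample j with each corner sample."""
    return {
        "e": (s["F"] + s["j"] + 1) >> 1,
        "g": (s["G"] + s["j"] + 1) >> 1,
        "p": (s["N"] + s["j"] + 1) >> 1,
        "r": (s["O"] + s["j"] + 1) >> 1,
    }


def quarter_pel_h264(s: dict) -> dict:
    """H.264: average the two nearest half-pel samples on the edges."""
    return {
        "e": (s["b"] + s["h"] + 1) >> 1,
        "g": (s["b"] + s["m"] + 1) >> 1,
        "p": (s["h"] + s["t"] + 1) >> 1,
        "r": (s["m"] + s["t"] + 1) >> 1,
    }
```

Both functions take the same neighbourhood of samples, so an encoder could select between them with the runtime switch without changing how the samples are fetched.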

f) transform and inverse transform

AVS-M and H.264 use similar integer DCT transforms; the main difference concerns chrominance. In AVS-M, chrominance uses the same transform as luminance, whereas in H.264 the DC coefficients of chrominance are transformed a second time.
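
The extra H.264 step can be illustrated with the 2x2 Hadamard transform that H.264 applies to the four chroma DC coefficients of a macroblock in 4:2:0 video. The Python sketch below shows only this transform (without quantization or scaling); the function name is hypothetical. For AVS-M this step would simply be skipped.

```python
def hadamard_2x2(dc):
    """2x2 Hadamard transform H * dc * H with H = [[1, 1], [1, -1]].

    `dc` is a 2x2 list holding the DC coefficients of the four
    4x4 chroma blocks of one chroma component.
    """
    (a, b), (c, d) = dc
    return [
        [a + b + c + d, a - b + c - d],
        [a + b - c - d, a - b - c + d],
    ]
```

Applying the function twice returns the input scaled by 4, which is why the same butterfly can serve for both the forward and inverse directions once scaling is handled elsewhere.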

1.1.2.1 Quantization and inverse quantization

AVS-M and H.264 use similar quantization methods, which rely on look-up tables, multiplications, and shifts to avoid division. Note that in AVS-M the quantization parameter must be mapped once to obtain the quantization parameter for chrominance.
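
The idea of replacing division by a table look-up, a multiplication, and a shift can be sketched in Python as follows. The step sizes, the 15-bit shift, and all names are illustrative assumptions; they are not the tables of either standard.

```python
SHIFT = 15  # assumed fixed-point precision of the scale factors


def make_scale_table(step_sizes):
    """Precompute round(2**SHIFT / step) for every quantization parameter."""
    return [round((1 << SHIFT) / step) for step in step_sizes]


def quantize(coef: int, qp: int, scale_table) -> int:
    """Approximate round(coef / step) using multiply, add, and shift only."""
    magnitude = (abs(coef) * scale_table[qp] + (1 << (SHIFT - 1))) >> SHIFT
    return -magnitude if coef < 0 else magnitude


def dequantize(level: int, qp: int, step_sizes) -> int:
    """Inverse quantization: scale the level back by the step size."""
    return level * step_sizes[qp]
```

The division happens once, when the table is built; the per-coefficient path contains no division at all.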

g) variable length coding

AVS-M uses context-based multi-order Exp-Golomb codes, while H.264 uses its own CAVLC or CABAC coding. Note that AVS-M adds a "direct" mode to intra prediction, in which all 4x4 blocks use the predicted mode. Therefore, during variable length coding we must first check whether the "direct" mode is used and then process the macroblock accordingly.
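
As an illustration, and assuming the variable length code mentioned above is built from k-th order Exp-Golomb codes, a code word can be generated as in the Python sketch below. The context-based selection of the order and the mapping of syntax elements to code numbers, both defined by the AVS-M standard, are not reproduced. The function name is hypothetical.

```python
def exp_golomb(value: int, k: int = 0) -> str:
    """k-th order Exp-Golomb code word of a non-negative integer, as a bit string."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    x = value + (1 << k)
    bits = bin(x)[2:]  # binary representation without the "0b" prefix
    # Prefix of zeros so that the decoder can recover the code word length.
    return "0" * (len(bits) - k - 1) + bits
```

With `k = 0` this is the ordinary Exp-Golomb code also used for many H.264 syntax elements; larger orders spend fewer bits on large values at the cost of more bits on small ones.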

h) loop filtering

Both AVS-M and H.264 have loop filtering, which can significantly reduce blocking artifacts and improve visual quality. The specific implementations differ; in general, the AVS-M filter is simpler than that of H.264.

i) Debugging

During debugging, we used a comparison method. The predicted values and residuals of each frame are written to a file on the encoder side; the bitstream is then decoded by a standard decoder, and the predicted values and residuals are compared as decoding proceeds, so that the erroneous macroblock can be located and debugged. Comparing the reconstructed picture at the encoder with the output picture at the decoder in this way ensures the correctness of the encoder.
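
The comparison step can be sketched in Python as follows. The data layout is an assumption made for illustration: each dump is a list of frames, and each frame is a list of per-macroblock records (for example, tuples of predicted values and residuals). The function name is hypothetical.

```python
def first_mismatch(encoder_dump, decoder_dump):
    """Return (frame_index, macroblock_index) of the first differing record.

    Returns None when every compared record is identical.
    """
    for frame_idx, (enc_frame, dec_frame) in enumerate(
            zip(encoder_dump, decoder_dump)):
        for mb_idx, (enc_mb, dec_mb) in enumerate(zip(enc_frame, dec_frame)):
            if enc_mb != dec_mb:
                return frame_idx, mb_idx
    return None
```

Stopping at the first mismatch matters because later macroblocks are predicted from earlier ones, so an early error usually makes everything after it differ as well.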

optimization

Video coding requires a lot of computing resources, and it is difficult to meet real-time encoding requirements without optimizing for the target platform. The hardware platform used in this project is Intel's Xeon series, which provides accelerated instruction sets such as MMX, SSE, and SSE2. x264 itself is already optimized for the MMX and SSE instruction sets. Given the similarity between AVS-M and H.264, most of the optimizations for H.264 should also apply to AVS-M. Because the optimized and unoptimized encoders should generate bit-identical bitstreams for the same input, during development we fed the same input to both versions and checked whether their output bitstreams were bit-identical, in order to determine which optimization modules can be shared between AVS-M and H.264.

For the comparison itself, a bisection method is used to speed things up. First, half of the optimization modules are disabled, and the bitstream generated with the remaining modules enabled is compared with the bitstream generated with all optimization modules turned off. If they are equal, the currently enabled optimization modules can be shared by AVS-M and H.264. If they are not equal, the range is narrowed further until the status of each optimization module is determined. This comparison showed that only four optimization modules cannot be shared; all others can. Two of the four can be fixed by modifying the C code, and the other two require changes to the MMX/SSE assembly code.
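
The bisection (dichotomy) procedure described above can be sketched in Python as a recursive search. The sketch assumes that optimization modules affect the bitstream independently of each other. `stream_matches` is a hypothetical callback that encodes the test input with only the given modules enabled and reports whether the output is bit-identical to the unoptimized reference.

```python
def find_unshareable(modules, stream_matches):
    """Return the modules that change the bitstream when enabled.

    `modules` is a list of module names; `stream_matches(enabled)` returns
    True if encoding with only `enabled` optimizations turned on produces
    the same bitstream as the unoptimized encoder.
    """
    if not modules or stream_matches(modules):
        return []             # the whole group can be shared
    if len(modules) == 1:
        return list(modules)  # isolated a module that cannot be shared
    mid = len(modules) // 2
    return (find_unshareable(modules[:mid], stream_matches)
            + find_unshareable(modules[mid:], stream_matches))
```

When only a few modules are at fault, as with the four found in this project, most groups pass on the first comparison and far fewer encoder runs are needed than when testing every module individually.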

test

The comparative analysis of audio and video coding quality rests mainly on two aspects: objective indicators and subjective impressions. Many comparison tests of AVS-M and H.264 based on objective indicators (PSNR) already exist, and there is no need to repeat them. This article instead focuses on a comparison of viewers' subjective impressions under real-time encoding conditions. The test is mainly based on whether the viewer perceives obvious distortion when watching audio and video processed by the real-time encoder. The survey covers four aspects: the clarity and coherence of the video, and the clarity and coherence of the audio. Each aspect is scored according to the viewer's actual experience. The scoring criteria are as follows:

The comparison test uses the following settings: a video frame rate of 25 fps, fixed bit rate control, the loop filter turned off, a GOP length of 15, and the Baseline profile for H.264; the audio uses a 48000 Hz sampling rate, two channels, the HE-AAC encoding format, and a bit rate of 52 kbps. The test results are recorded in the table below (the four letters stand for "video clarity", "video coherence", "audio clarity", and "audio coherence").

Summary of this article

Comparing the test results shows that AVS-M performs close to H.264 at low bit rates (32-512 kbps) and low resolutions (SQCIF to CIF), but overall it still lags behind the H.264 standard. Considering the limited computing power of mobile terminals and the computational-complexity advantage of AVS-M over H.264, we can be confident that AVS-M will have a place in the future of mobile communications.
