January 09, 2025

Song Li: Interpretation of the current status and future trends of codecs

"It's a good teacher, not a good one, but not pure." - Song Li.

Song Li opened the interview with self-deprecating humor, which made what could have been a dry technical interview lively. Compared with the previous interviews, this one carries more information, and one cannot help admiring the rigor of academia. This is the sixth interview in the "Next Generation Codec" series. We invited Song Li, a researcher and doctoral supervisor at Shanghai Jiao Tong University, to give a comprehensive interpretation of the current status and future trends of codecs.

LiveVideoStack: Please briefly introduce yourself and your current main line of work. Which technologies or areas are you interested in?

Song Li: Thank you for the invitation. I am currently a researcher and doctoral supervisor at Shanghai Jiao Tong University. My research areas are video coding, image processing, and computer vision.

I have wide-ranging interests and am curious about all kinds of new things. As far as video is concerned, I follow the technologies across the whole chain, from production through distribution to consumption. My recent interests are:

Production: acquisition, processing, and synthesis technologies for new video content such as UHD/HDR and 360 VR; AI-based video processing; Video over IP; cloud media production systems;

Distribution/delivery: new coding standards and technologies, hybrid cloud + edge media processing architectures, and low-latency transport protocols and technologies (CMAF, WebRTC, HTTP/2);

Video consumption (endpoint): new video player/terminal software that supports cross-screen sync, dynamic rendering (VR), strong interaction (AR), and AI;

Video experience evaluation (QoE), video content protection (DRM), and so on, for example the combination of blockchain and media services.

About the codec

LiveVideoStack: What makes a good codec? Video quality, bit rate, algorithm complexity, robustness to data loss or errors, and so on?

Song Li: To judge whether something is good or bad, we first need a measure. Broadly speaking, a codec's measure is a function of multiple dimensions, including compression efficiency (bit rate and quality), algorithm complexity (time and space complexity), implementation complexity (platform support, the availability of engineers on the market), and system indicators (parallelism, scalability, latency, error resilience). Because these dimensions constrain one another, actual research and development can only prioritize the main dimensions and sacrifice others to reach a balance. For example, encoders such as JPEG 2000 and TICO, used in video content production, emphasize low complexity, and their compression ratio is not as high as that of HEVC and H.264 in the distribution field. Likewise, H.264 encoders are divided into different profiles or types: the bit rate of a live encoder is usually 1.5 to 2 times that of an offline encoder, sacrificing compression efficiency in order to reduce latency.
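
As a rough illustration of the live-versus-offline trade-off just described, the sketch below applies the 1.5x to 2x rule of thumb to a hypothetical offline bit rate. The function name and the 4000 kbps figure are illustrative assumptions, not values from the interview.

```python
def live_bitrate_range_kbps(offline_kbps, low=1.5, high=2.0):
    """Return the (min, max) live bit rate implied by the 1.5x-2x rule of thumb."""
    if offline_kbps <= 0:
        raise ValueError("offline bit rate must be positive")
    return offline_kbps * low, offline_kbps * high


# Hypothetical example: an offline encode that needs 4000 kbps for a given quality.
low_kbps, high_kbps = live_bitrate_range_kbps(4000)
print(f"offline 4000 kbps -> live roughly {low_kbps:.0f} to {high_kbps:.0f} kbps")
```

The extra bit rate is the price paid for giving up latency-hungry optimizations; the multipliers are only the speaker's rule of thumb and will vary by content and encoder.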

LiveVideoStack: What are the industry's main research directions in codec development? What are the difficulties? Are there typical or particularly promising application scenarios?

Song Li: Codec research and development involves roughly three types of people, in the spirit of the saying "eating from the bowl, eyeing the pot, and thinking about the field."

The first type is video coding engineers in industry. They focus mainly on codec optimization, solving for low cost and high efficiency. Representatives include Netflix, YouTube, Twitch, Hulu, iQiyi, Tencent, and Youku Tudou. I recently heard some of their talks, covering content-adaptive ABR, narrowband HD, multi-stream joint optimization, and so on, all centered on lowering bit rate, raising efficiency, and saving money for the company's operations.

The second type is R&D staff in standards bodies or industry alliances, working in camps such as MPEG, AVS, and AOM. Video coding is a bit like wireless communication: the industry's standards organizations move forward in step and update to a new generation of standard every 8 to 10 years. The basic technical logic behind this is RD (rate-distortion) efficiency, the doubling law, and Moore's Law (which offsets the growth in complexity). Two camps are currently the liveliest: ITU/MPEG's FVC and AOM/Google's AV1, each a pool of coding technologies. The former has officially issued its Call for Proposals (CfP); next February the submitted schemes will be evaluated and the new reference platform 1.0 determined. JEM, the earlier algorithm exploration software, already shows an RD efficiency gain of more than 30% (the subjective gain is higher), so there is reason to be confident that the next-generation video coding standard H.266 will arrive in 2020, although codec complexity is currently high (about 10 times). AV1 is advancing faster and has already been closed; the final spec will be officially released by the end of this year. In terms of performance, AV1 is mainly compared with its predecessor VP9: in tests by Netflix and Google, RD efficiency improved by more than 30%. AV1 keeps decoding complexity better under control, at only a little over 2 times that of VP9. The near-term focus of the two camps should therefore differ. FVC will mainly focus on proposing and optimizing coding algorithm modules and on taking part in the standards battle; AV1, starting next year, should focus on platform design and optimization, reducing encoding complexity, and so on.

The third type is the hardware implementation teams for the second type's standards, together with explorers of disruptive technologies. In the past, each generation's industry cycle for video coding basically advanced in the order of technology, standard, software, and hardware. Hardware generally has to wait until the standard is set and deployment and demand reach a certain scale. Most hardware teams are therefore still iterating on the versions deployed in today's market and taking a wait-and-see attitude toward new standards. The Internet camp, however, iterates quickly. While AV1 software is being developed, hardware reference designs and IP cores are progressing basically in step, lagging by a few months to half a year. Hardware AV1 encoders are therefore expected to be released in the second half of 2018.

Another group, in academia, is trying to use new technologies such as artificial intelligence and neural computing models to reinvent the coding architecture. The basic structure of video coding (waveform-based hybrid coding) has not changed much in the past few decades; progress has come mainly from continually replacing and improving the three major components of prediction, transform, and entropy coding (and their parts). Recently, academia has proposed work that uses deep learning to improve HEVC coding, but most of it focuses on in-loop filtering or post-processing. These are module-level improvements, with no structural breakthrough and no end-to-end learning capability. Experts in the FVC and AV1 camps generally believe that, in the short term, deep learning still cannot beat the many hand-tuned coding modes (operators) and will have to keep lying low for a while.

As for application scenarios, I personally pay the most attention to two. One is real-time video services (such as online claw machines, online teaching, and video conferencing). The other is high-resolution immersive VR video services, which push video coding to be higher, faster, and stronger. Although VR cooled off in 2017, there is an undercurrent: the pace of technology has not slowed, and all the major players are building up. I expect big opportunities in the next two years.

LiveVideoStack: People have started studying H.266, AV1, and the domestic AVS2. What are their characteristics, and what are their respective application scenarios?

Song Li: First, H.266, or more precisely FVC (Future Video Coding). It will not officially count as H.266 until the standard is released through the ITU in 2020. It is the successor to HEVC/H.265 and is currently in the CfP phase. A number of potential technologies were unveiled in the earlier study and CfE (Call for Evidence) phases. Representative tools include:

Block structure: quadtree plus binary tree (QTBT) block structure and larger coding units (CTU up to 256x256, TU up to 64x64);

Intra prediction: 65 directions, multi-tap interpolation, advanced boundary filtering, cross-component linear model prediction (CCLM), position-dependent prediction combination (PDPC), adaptive reference sample smoothing, etc.;

Inter prediction: overlapped block motion compensation (OBMC), affine motion compensation, bidirectional optical flow, local illumination compensation (LIC), higher-precision MV, locally adaptive MV resolution, advanced MV prediction, sub-PU level MV prediction, decoder-side MV refinement (DMVR), etc.;

Transform: DCT/DST multi-core transforms, mode-dependent non-separable secondary transform, signal-dependent transform, etc.;

In-loop filtering: bilateral filter (BLF), adaptive loop filter (ALF), content-adaptive clipping, etc.;

Entropy coding: context model selection for transform coefficients, multi-hypothesis probability estimation, better context model initialization, etc.

Note that the official contest has not yet begun. Some of the modules above may be replaced or dropped in favor of new ones, and others will be further refined and optimized. As for applications, it may be too early to say, but large-resolution video services such as UHD/HDR and VR should be the main outlet.
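
To make the quadtree plus binary tree (QTBT) idea in the list above more concrete, here is a minimal sketch that enumerates the block shapes such a partitioning can reach. It is a simplification under stated assumptions: the function name and the default limits (`ctu_size`, `min_qt_size`, `max_bt_depth`, `min_bt_size`) are illustrative and are not taken from the JEM configuration.

```python
def qtbt_block_shapes(ctu_size=128, min_qt_size=16, max_bt_depth=3, min_bt_size=4):
    """Enumerate (width, height) block shapes reachable under a simplified QTBT.

    Quadtree stage: a square block may split into four squares down to min_qt_size.
    Binary stage: any quadtree node may then be halved horizontally or vertically,
    up to max_bt_depth times. No quadtree split is allowed below a binary split.
    """
    shapes = set()

    def binary(width, height, depth):
        shapes.add((width, height))
        if depth == max_bt_depth:
            return
        if width // 2 >= min_bt_size:
            binary(width // 2, height, depth + 1)   # vertical split: left/right halves
        if height // 2 >= min_bt_size:
            binary(width, height // 2, depth + 1)   # horizontal split: top/bottom halves

    size = ctu_size
    while size >= min_qt_size:
        binary(size, size, 0)  # each quadtree node can act as a binary-tree root
        size //= 2
    return shapes


shapes = qtbt_block_shapes()
print(sorted(shapes, reverse=True))
```

Compared with a pure quadtree, which only yields square blocks, the binary stage adds rectangular shapes such as 32x8, which is where the extra flexibility comes from.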

For AV1, I have not done a comprehensive, in-depth analysis. The following information comes from the Google team's articles and talks this year. Representative tools include:

Coding structure: superblocks, with sizes up to 128x128;

Intra prediction: 65 directional predictions, top-right boundary extension, Alt mode for smooth regions, chroma prediction from luma, reference pixel smoothing;

Inter prediction: 8 rectangles + demarcation, OBMC, extended reference frames (ERF), dual MC interpolation filtering (the horizontal and vertical filters can differ), global motion compensation, new MV prediction methods, etc.;

Transform: extended mode (the horizontal and vertical transforms can be combined from different types of transform kernels), recursive mode (transform units are split adaptively according to residual characteristics), merge mode (a transform unit covers multiple prediction units), rectangular transforms, etc.;

In-loop filtering: Directional Deringing Filter (DDF), Conditional Low Pass Filter (CLPF), Wiener filter, etc.;

Entropy coding: non-binary multi-symbol entropy coding, adaptive context, new identity set, etc.

Note that their code has iterated quickly over the past year, and some module-level technologies have been replaced or merged. On the application side, AV1 should compete with HEVC for the market, especially in IPTV and OTT.
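
The "extended mode" transform in the list above, where the horizontal and vertical directions use different kernels, can be sketched as a separable 2D transform. This is only an illustration: it pairs an orthonormal DCT-II with an orthonormal DST-VII as stand-in kernels, the 4x4 residual values are made up, and the exact kernels and integer arithmetic used by AV1 are not reproduced here.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis, one basis function per row."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def dst7_matrix(n):
    """Orthonormal DST-VII basis, one basis function per row."""
    return [[math.sqrt(4 / (2 * n + 1))
             * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_2d(block, vertical, horizontal):
    """Apply `vertical` along columns and `horizontal` along rows: V * X * H^T."""
    return matmul(matmul(vertical, block), transpose(horizontal))


def inverse_2d(coeffs, vertical, horizontal):
    """Invert forward_2d, relying on both kernels being orthonormal: V^T * Y * H."""
    return matmul(matmul(transpose(vertical), coeffs), horizontal)


# Made-up 4x4 residual block, larger toward the top-left corner.
residual = [[5, 3, 1, 0], [4, 2, 1, 0], [2, 1, 0, 0], [1, 0, 0, 0]]
V, H = dst7_matrix(4), dct2_matrix(4)  # different kernels per direction
coeffs = forward_2d(residual, V, H)
restored = inverse_2d(coeffs, V, H)
```

Because each direction is handled by its own matrix, an encoder can try several kernel pairs per block and keep whichever compacts the residual energy best.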

As for AVS2, most of its module technologies are broadly aligned with HEVC. Its signature technology is background frame prediction, introduced for video surveillance scenarios. It performs outstandingly with stationary cameras, significantly better than HEVC (a 40% to 50% gain). For the technical details of AVS2, the "Intelligent Media" WeChat official account of Peking University's research institute has a series of technical posts, so I will not repeat them here. On the application side, the State Administration of Press, Publication, Radio, Film and Television issued a document some time ago explicitly requiring 4K ultra-high-definition TV terminals to support AVS2, which will drive AVS2 deployment.

LiveVideoStack: When developing or optimizing a codec, do you take the related patents and their costs into account?

Song Li: Because I do not work on commercial encoders, my thoughts here are for reference only. There are roughly three types of companies that develop encoders: those that make money from codecs, those that develop codecs for their own use, and those that sell hardware platforms and give encoders away.

The first type is professional encoder companies or teams, such as Harmonic, ATEME, Envivio (acquired by Ericsson), and Elemental (acquired by Amazon). They are not large and their market valuations are not high, so patents are not much of a concern: whether it is MPEG LA or HEVC Advance, nobody expects to squeeze much out of them. These companies therefore set their pace of development and optimization mainly by market demand and maturity. Naturally, HEVC is their main product target.

The second type is mainly Internet players, such as YouTube, Netflix, and in China iQiyi, Tencent, and Youku Tudou. Most of these teams optimize on top of open-source software such as ffmpeg/x264/x265/vp9. Such companies are either so powerful that they call the shots and use their own or free codecs, like YouTube and Netflix, or they pursue fast deployment and are not short of money. They mostly control the terminals themselves. They currently use H.264 and also H.265, and it is not clear whether patent fees have been paid (or whether anyone has come to collect).

The third type is represented by Intel, NVIDIA, ARM, and similar companies. These platform-level giants do not care about patent fees. To consolidate their standing in the industry, they give encoders to partners for free through their SDKs. However, because they have a broad range to cover, their initial performance optimization is usually modest, which leaves room for the first two types of companies or engineers. Over time, though, performance gradually catches up. For example, the new version of the Intel Media SDK has optimized H.265 quite well.

LiveVideoStack: It is generally believed that hardware codecs do not match software codecs in quality. Is there a solution that combines the massive processing power of hardware codecs with high image quality?

Song Li: When hardware encoders come up, the default scenario in my mind is live broadcasting fed directly from the source (a camera, or decoded video delivered through a contribution encoder). In this application, constrained by latency and fast linear editing, the time window left to the encoder is short. Hardware encoders also have limited compute and buffer resources, so complex coding optimizations such as multipass and lookahead are disabled, and compression efficiency is therefore limited. Software encoders are used more for on-demand or delayed live streaming and can be deployed on an elastic cloud. If the parallel and scalable design is good, advanced and complex coding modes can be turned on, the compression capability can be fully exploited, and compression efficiency is naturally better.

In addition, when we talk about image quality, we usually mean the subjective impression of the reconstructed video, not the PSNR (or SSIM, etc.) that we typically use when optimizing an encoder. For a particular video frame, whichever metric is used, a change in bit rate or QP is directly reflected in the relative change in image quality. The difficulty is that across videos with different content, people's perception of image quality is very implicit, and the scale from bad to good is uneven and nonlinear. This leaves a lot of freedom for encoder optimization, and it is where veteran coding engineers show their skill. Narrowband HD, per-title encoding, perceptual coding, visual optimization, content adaptation, and so on all work in this space. We often say that coding is both tech and art: there are many know-how tricks, and veterans with a craftsman's spirit tend to do better. For example, we ran a comparative test of a foreign commercial encoder against x265 and HM. Judged by PSNR alone, the commercial encoder does not beat HM, or even x265, but in real scenes such as sports events and song-and-dance galas, it has clear advantages in subjective aspects such as uniformity and consistency (for example, low flicker between pictures).
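
For reference, PSNR, the objective metric mentioned above, is a simple function of the mean squared error between the original and reconstructed frames. Below is a minimal sketch, assuming 8-bit frames represented as nested lists of pixel values; the function name and the sample pixels are illustrative.

```python
import math


def psnr(reference, reconstructed, max_value=255):
    """PSNR in dB between two equally sized frames given as lists of pixel rows."""
    squared_error, count = 0, 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for ref_pixel, rec_pixel in zip(ref_row, rec_row):
            squared_error += (ref_pixel - rec_pixel) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up 2x2 frame and a reconstruction that is off by one in two pixels.
original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 100]]
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

The computation looks at one frame at a time and treats every pixel error equally, which is one reason it cannot capture effects such as flicker between pictures.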

In addition, image quality depends not only on the codec itself but also on pre-processing and post-processing, whose effect on image quality can even be greater than that of coding optimization itself.

LiveVideoStack: What are the prospects for FPGA/ASIC encoding and decoding?

Song Li: I take the real purpose of the question to be the career prospects of hardware codec engineers versus software codec engineers. First of all, video is the so-called biggest of big data, and storing, accessing, and exchanging it is inseparable from compression and decompression. The codec technology (and the engineers) behind it is therefore needed and will not be left out in the cold in the short term. From another angle, precisely because codecs are ubiquitous, high-frequency, and foundational, they are logically placed in the application layer but show a very clear trend of sinking down the stack or being hardened into silicon. Decoding goes without saying: players mostly call the hardware decoder built into the chip. On the encoding side the situation is somewhat better. NTT, Socionext, and Fujitsu are the main makers of dedicated encoding chips, while most Internet video service providers still use cloud + software encoding. However, as mentioned earlier, platform vendors are moving in from below, with Intel and NVIDIA as representatives. Intel in particular is the main platform for cloud computing, and its new CPU+FPGA computing platform is worth looking forward to, even though for now it is mainly aimed at competing with NVIDIA in AI. For video, it is not hard to guess that the Media SDK will gradually support the FPGA level. In this sense, the prospects for FPGA/ASIC encoding and decoding are still good and suit people who cultivate deep fundamentals, but the number of jobs is small and concentrated in a few companies. By comparison, software codec work has a low threshold and a wide entrance, but it changes fast.

LiveVideoStack: What advice do you have for graduate students, or engineers moving over from other R&D fields, who are about to learn codec and multimedia development?

Song Li: Let me indulge in playing the teacher and speak at the level of principle (my usual routine for fooling students). I call it being both "red" and "expert," or the T-shaped learning model. "Red" refers to the horizontal dimension and emphasizes breadth of knowledge; "expert" refers to the vertical dimension and emphasizes technical depth. In an era of borderless technology and accelerating innovation, on the one hand you need to accumulate in your own field and build core competitiveness. Video coding spans architecture, algorithms, and platforms, with many different layers. Few people are proficient in every aspect, and this is especially true for newcomers. Trying to cover everything usually produces shallow results; it is better to play to your own strengths and specialize in one skill. On the other hand, you cannot just keep your head down pulling the cart without looking up at the road. While building expertise, you must also broaden your related knowledge in various ways and understand where the industry is heading. For example, to optimize a coding algorithm, you must keep an eye on new platforms (changes in the layer below), new architectures (changes in the layer above), new applications (changes on the left), and new channels (changes on the right). I emphasize this because every field is making progress, and the neighbors are not standing still. If you do not pay attention, you may be bypassed or cut off. For example, in the past two years deep learning has achieved good results in image super-resolution. When encoding 4K video, you can downsample it to HD, encode and decode it, and then super-resolve it back to 4K in post-processing. In previous years this may have been effective only in low bit rate applications, but now the advantage may extend to medium bit rates. This is much more troublesome than doing RDO optimization directly on 4K. Therefore, you must see yourself, see heaven and earth, and see all living beings.
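
A quick back-of-the-envelope calculation shows why the downsample-then-super-resolve route mentioned above is attractive. The sketch assumes "4K" means 3840x2160 and "HD" means 1920x1080; the interview does not give exact resolutions, and the function name is illustrative.

```python
def pixels_per_frame(width, height):
    """Number of luma samples in one frame."""
    return width * height


uhd_pixels = pixels_per_frame(3840, 2160)  # assumed 4K UHD resolution
hd_pixels = pixels_per_frame(1920, 1080)   # assumed HD resolution
ratio = uhd_pixels / hd_pixels

print(f"4K frame: {uhd_pixels} pixels, HD frame: {hd_pixels} pixels, ratio: {ratio:.0f}x")
```

The encoder handles a quarter of the samples per frame, and the job of restoring detail is shifted to the super-resolution step after decoding.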

LiveVideoStack: Can you recommend some books and materials for systematically learning codec and multimedia development?

Song Li: I will not go into specific materials; the experts in the previous interviews have already given quite specific ones. Let me talk at the level of method, summed up in eight characters:

Study diligently: read papers, spend time in the community, read posts;

Train hard: the multimedia field has no shortage of reference code, from JM/HM/JEM to ffmpeg, x264/x265, VLC, and more; spend time practicing your swordsmanship;

Think more: this is like cultivating inner strength. You need to connect knowledge and skills organically to form your own system and toolbox, ready to plug and play;

Chat often: when you meet an expert, do not let the chance slip by. Be thick-skinned and talk to the masters. It is the fastest way to improve.

In short, "Never stop learning, learning from everyone."
