January 10, 2025

Analysis of OCR Recognition Technology Based on FPGA Heterogeneous Acceleration

OCR is widely used in scenarios such as general-purpose text recognition. Compared with CPU and GPU implementations, OCR based on FPGA heterogeneous acceleration offers lower latency and lower cost. We designed a collaborative heterogeneous acceleration architecture built from multiple FPGA chips that can adapt quickly as the OCR models used by services change. Overall detection and recognition performance reaches 130% of that of a P4 GPU, while processing latency is only 1/10 that of the P4 and 1/30 that of a CPU.

1. Character Recognition Technology - OCR

OCR, broadly speaking, is a technique for detecting and recognizing characters in images. It is widely used in scenarios such as general-purpose text recognition, e-books, automatic information collection, and license recognition. General-scene OCR does not need to be customized for particular scenes and can recognize text in pictures of any scene, which makes it a challenging research area within artificial intelligence.

Giving AI a "High-Power, Small Heart": OCR Heterogeneous Acceleration

General OCR consists of two key stages: text detection and text recognition. The detection model determines where text appears in the picture and draws bounding boxes around those regions. The recognition model takes the detected text boxes as input and identifies the characters inside them.

In recent years, deep learning has been increasingly applied to modeling sequential data such as audio, video, and natural language. Improving sequence learning through end-to-end deep learning has become a hot research topic. The basic idea is to combine a CNN with an RNN: the CNN extracts image features with strong representational power, and the RNN's sequence modeling is introduced into text detection to add context information to candidate text regions, which effectively improves text detection performance. The hybrid CNN+RNN network has taken text string recognition to a new level.

Figure 1: CRNN network structure


* The figure above is quoted from "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition".

Take the widely used CRNN model as an example. It combines a DCNN with an RNN. It can be trained directly on sequence labels without detailed annotation, and it has far fewer parameters than a standard DCNN model. CRNN also strictly preserves the order between image features and the recognized content sequence, and it is good at recognizing text sequences that are hard to segment.

The architecture consists of three parts:

1) Convolutional layers, which extract a feature sequence from the input image and compress the image while preserving spatial order. This is equivalent to cutting the image into slices along the horizontal direction, with each slice corresponding to one feature vector.

2) Recurrent layers, which predict the label distribution of each frame. A two-layer bidirectional LSTM further learns context features and yields the character class corresponding to each slice.

3) Transcription layer, which uses CTC and the forward-backward algorithm to solve for the optimal label sequence.
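
To make the transcription step more concrete, here is a minimal Python sketch of CTC best-path (greedy) decoding. It is a simplification and not the method described above: it skips the forward-backward computation and simply takes the most likely label per frame, merges repeated labels, and drops blanks. The function name, the blank index of 0, and the toy alphabet are assumptions made for illustration.

```python
from typing import Sequence

BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs[t][k] is the probability of label k at frame t, where
    label 0 is the blank and label k >= 1 maps to alphabet[k - 1].
    """
    # Pick the most likely label index for each frame (each horizontal slice).
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    decoded = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when the label changes and is not the blank.
        if label != previous and label != BLANK:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


# Toy per-frame distributions over (blank, 'a', 'b') for seven frames.
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.1, 0.8],    # b (repeat, merged)
    [0.8, 0.1, 0.1],    # blank
]
text = ctc_greedy_decode(frames, "ab")
```

The blank frame between the two runs of "a" is what lets CTC output a doubled character, which is why blanks are removed only after repeats are merged.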

2. OCR Acceleration Architecture

Relying on the programmability, high performance, and high communication bandwidth of FPGAs, we designed a heterogeneous acceleration architecture with multiple FPGA chips. Each chip is deeply customized for one type of model, and acceleration of the whole hybrid model is achieved through load balancing and pipelining across the chips.

Figure 2: OCR Acceleration Hardware Architecture


FPGA 0 is configured as a general-purpose CNN acceleration architecture.

FPGA 1 is configured as a general-purpose LSTM acceleration architecture.

The small amount of FC computation runs on the CPU to preserve model flexibility.

The FPGAs and the server CPU exchange data over PCIe Gen3, and load balancing is controlled by the CPU.

The FPGAs exchange data with each other over the lightweight Aurora protocol, with nanosecond-level latency, similar to sharing memory between boards.

Future platform upgrades will support parallel multitask scheduling across servers.
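
The pipelining between chips can be illustrated with a small software analogy. The Python sketch below is not the hardware design: threads stand in for FPGA 0, FPGA 1, and the CPU, bounded queues stand in for the inter-chip links, and the three stage functions are hypothetical placeholders that only transform numbers. It shows how each stage can start on the next image while later stages are still processing the previous one.

```python
import queue
import threading
from typing import Callable, Iterable, List

_STOP = object()  # sentinel that tells each stage to shut down


def run_stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Repeatedly take one item from inbox, process it, and pass it on."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(items: Iterable, stages: List[Callable]) -> list:
    """Run items through the stages concurrently, preserving input order."""
    # Bounded queues model limited buffering on the links between stages.
    queues = [queue.Queue(maxsize=4) for _ in range(len(stages) + 1)]

    def feed() -> None:
        for item in items:
            queues[0].put(item)
        queues[0].put(_STOP)

    # The feeder runs in its own thread so the caller can drain results
    # while upstream queues are full.
    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical placeholder stages; the real work would run on the accelerators.
def cnn_stage(image: List[int]) -> List[int]:      # stand-in for FPGA 0
    return [2 * x for x in image]


def lstm_stage(features: List[int]) -> List[int]:  # stand-in for FPGA 1
    return [sum(features[: i + 1]) for i in range(len(features))]


def cpu_fc_stage(sequence: List[int]) -> int:      # stand-in for the CPU FC step
    return max(sequence)


outputs = run_pipeline([[1, 2], [3], [0, 5, 1]], [cnn_stage, lstm_stage, cpu_fc_stage])
```

Because every stage is a single worker reading from a FIFO queue, results come out in the same order the inputs went in, and a slow stage applies backpressure to the stages before it once its input queue fills.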

The underlying architecture is deeply optimized for specific deep learning models, so that architecture-level optimization fully exploits the heterogeneous acceleration devices and achieves the maximum computational gain.
