Realization of MPEG-4 real-time encoder on ADSP-BF533

Design scheme of MPEG-4 real-time encoder on ADSP-BF533

MPEG-4 video coding can deliver high-quality video over a narrow bandwidth and saves a great deal of storage space, but its coding complexity is high. There are currently three kinds of implementation: software running on a general-purpose PC, a dedicated ASIC, and a general-purpose DSP chip.

Compared with the other two, the general-purpose DSP implementation has the following advantages: strong computing performance; good upgradeability, because, as on a PC, the DSP's functions are implemented in software, so it can be upgraded quickly and new functions can be added to keep pace with technological development and market changes; and low cost, low power consumption, and a wide range of applications.

System hardware design

The processing core of the encoding system is the ADSP-BF533 (hereinafter BF533), which collects video data through its multi-function parallel peripheral interface (PPI). The PPI has DMA support and can transfer data at high speed without core intervention. When a transfer completes, it automatically raises a DMA interrupt to the core.

The video capture part uses OmniVision's CIF-level color CMOS image sensor OV6630, which has a maximum resolution of 352 × 288 and a capture rate of up to 60 fps. The chip converts raw RGB data to 4:2:2 YUV format in on-chip hardware, so the user does not need to write a complex RGB-to-YUV conversion routine. This greatly reduces the amount of code and suits MPEG-4 encoding.

The OV6630 is set to output video data in 4:2:2 YUV format, which the BF533 PPI can receive directly. The two can be connected seamlessly without intermediate circuits.

Because a large number of analog cameras are still in wide use, the video A/D converter ADV7183 is also added to the system. This chip converts PAL analog TV signals to 4:2:2 digital video in the ITU-R BT.656 format. The output port of the ADV7183 is connected to the PPI of the BF533. The user can therefore choose freely between the system's own CMOS image sensor and an external PAL analog camera as the video source.

Because on-chip storage in the BF533 is limited and the volume of captured and encoded video data is large, a Hynix HY57V56162 SDRAM (4 MB, with four internal banks) serves as the large-capacity off-chip dynamic memory. To store the boot code, a 1 MB PSD4256G6V flash memory chip serves as the off-chip non-volatile memory. The system hardware structure is shown in Figure 1.

Encoder design and implementation

Memory space allocation

The BF533 uses a unified 32-bit, 4 GB address space that includes the on-chip L1 high-speed SRAM, the SDRAM space, and the asynchronous memory space.

On-chip L1 SRAM includes 64 KB of instruction SRAM, 16 KB of instruction cache/SRAM, 32 KB of data SRAM, 32 KB of data cache/SRAM, and 4 KB of scratchpad memory.

The L1 SRAM runs at the core clock frequency, so the core can access it at high bandwidth. It is the fastest of all the memories, but its capacity is limited, so only the most critical code and data should be placed in L1 SRAM. In addition, both the instruction cache and the data cache are enabled, which greatly improves the efficiency of off-chip memory accesses. The caches are configured through the IMEM_CONTROL, DMEM_CONTROL, and CPLB registers.

On-chip storage in the BF533 is limited and the original video data is large: a CIF frame in 4:2:2 format occupies 202752 bytes, and with analog PAL input each frame occupies 829440 bytes, so the frames can only be placed in SDRAM. The encoded video data is likewise stored in SDRAM. In addition, after the boot loader finishes, the most critical code executes from on-chip L1 instruction SRAM, but most of the program code still has to execute from SDRAM. Because of the way SDRAM is read and written, if the next access falls on a page other than the currently active page (a page miss), the SDRAM must first close the current page and then open the new one, which lowers the SDRAM read and write rate. In this system, SDRAM stores several kinds of data, and both the core and the DMA access it frequently. The SDRAM space should therefore be allocated carefully to minimize page misses.
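The two frame sizes quoted above follow from the 4:2:2 layout, in which each pair of pixels shares one U and one V sample, giving an average of 2 bytes per pixel. A minimal C sketch of that arithmetic, assuming 8-bit samples (the function name is illustrative, not from the original project):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one frame of 4:2:2 YUV video with 8-bit samples.
 * Every horizontal pixel pair carries Y0, U, Y1, V = 4 bytes,
 * i.e. 2 bytes per pixel on average. */
static uint32_t yuv422_frame_bytes(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0);   /* chroma is shared by horizontal pixel pairs */
    return width * height * 2u;
}
```

Calling yuv422_frame_bytes(352, 288) gives 202752 for CIF, and yuv422_frame_bytes(720, 576) gives 829440 for a full PAL frame.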

The BF533's SDRAM controller (SDC) can keep one page active in each internal bank (I-Bank) of the SDRAM, and switching among these four I-Banks incurs no delay. Mapping different data and code to different I-Banks therefore minimizes page misses and improves SDRAM access efficiency.

Real-time encoding requires video acquisition and compression to proceed concurrently, so a ping-pong buffering scheme is adopted: two video frame receive buffers, BUF1 and BUF2, are set up and filled by a chained BF533 DMA transfer. When one buffer has been filled, the core encodes it with MPEG-4 (constructing the reference frame at the same time) while the DMA starts filling the other buffer. Because BUF1, BUF2, the program code, and the reference frames are located in different I-Banks, SDRAM page switches are less likely and SDRAM is accessed efficiently.
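The ping-pong hand-off can be sketched in portable C as below. This is only a host-side model under stated assumptions: the type and function names are hypothetical, the DMA-complete interrupt is represented by an ordinary function call, and the placement of each buffer in its own I-Bank (which on the real system is a linker and memory-map matter) is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 202752u   /* one CIF 4:2:2 frame */

/* Hypothetical ping-pong state: two capture buffers plus bookkeeping. */
typedef struct {
    uint8_t buf[2][FRAME_BYTES];
    int     fill_idx;    /* buffer the (simulated) DMA is filling */
    int     ready_idx;   /* buffer ready for encoding, or -1 if none */
} pingpong_t;

static void pingpong_init(pingpong_t *pp)
{
    assert(pp != NULL);
    memset(pp, 0, sizeof *pp);
    pp->fill_idx  = 0;
    pp->ready_idx = -1;
}

/* Stand-in for the DMA-complete interrupt handler: the buffer that was
 * just filled becomes ready, and filling moves to the other buffer. */
static void pingpong_on_dma_done(pingpong_t *pp)
{
    pp->ready_idx = pp->fill_idx;
    pp->fill_idx ^= 1;
}

/* Called from the encoder loop: returns the frame to encode, or NULL
 * if no new frame has arrived since the previous call. */
static const uint8_t *pingpong_take_ready(pingpong_t *pp)
{
    if (pp->ready_idx < 0)
        return NULL;
    const uint8_t *frame = pp->buf[pp->ready_idx];
    pp->ready_idx = -1;
    return frame;
}
```

The scheme only keeps up if encoding one frame takes less time than capturing the next; otherwise the DMA would overwrite a buffer that is still being encoded.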

MPEG-4 program flow

MPEG-4 encoding operates on macroblocks, each of which contains four 8 × 8 luma sub-blocks and two 8 × 8 chroma sub-blocks. The encoder mainly handles I-frames and P-frames; compared with I-frame encoding, P-frame encoding adds motion estimation and motion compensation modules.


Code writing and optimization

The BF533 supports the high-level languages C and C++, but high-level code executes less efficiently. To achieve maximum execution efficiency, the MPEG-4 encoder is written entirely in assembly language.

For I-frame coding, the main computations are the forward discrete cosine transform (FDCT) and the inverse discrete cosine transform (IDCT). The optimized code provided by ADI is used here. It is based on Chen's fast DCT algorithm and makes heavy use of BF533-specific parallel instructions, so an 8 × 8 DCT takes only 293 clock cycles.

P-frame coding is more complicated than I-frame coding. Its most time-consuming part is motion estimation, that is, searching the reference frame for the position that best matches the macroblock or sub-block currently being coded.

The SAD (sum of absolute differences) criterion requires no multiplication or division and is simple to implement, so it is selected as the matching criterion for motion estimation.
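As a plain-C reference for the SAD criterion, the sketch below compares one 16 × 16 luma macroblock with a candidate block. It is a scalar illustration only, not the assembly routine used in the encoder; the function name and the stride parameter (bytes per luma row) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD between a 16x16 block of the current frame and a candidate block
 * of the reference frame. Both pointers address the top-left pixel of
 * their block; stride is the length of one luma row in bytes. */
static uint32_t sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    assert(stride >= 16);
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)ref[x]);
        cur += stride;   /* advance both blocks to the next row */
        ref += stride;
    }
    return sad;
}
```

Only subtraction, absolute value, and accumulation appear in the inner loop, which is why the criterion maps well onto a DSP.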

The motion estimation search uses the diamond search algorithm, which is simple, robust, and efficient, with a search accuracy of half a pixel.
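A common form of diamond search repeats a large diamond pattern until the best match stays at the centre, then refines once with a small diamond. The C sketch below follows that textbook form at integer-pixel accuracy only; the half-pixel refinement mentioned above is omitted, and all names, the frame layout (one luma plane of w × h bytes), and the search range parameter are assumptions for illustration rather than the encoder's actual code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define MB 16   /* macroblock size in pixels */

typedef struct { int dx, dy; } mv_t;

/* SAD of the macroblock at (cx, cy) in cur against (cx+dx, cy+dy) in ref.
 * Returns UINT_MAX when the candidate falls outside the frame. */
static unsigned block_sad(const uint8_t *cur, const uint8_t *ref,
                          int w, int h, int cx, int cy, int dx, int dy)
{
    int rx = cx + dx, ry = cy + dy;
    if (rx < 0 || ry < 0 || rx + MB > w || ry + MB > h)
        return UINT_MAX;
    unsigned sad = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            sad += (unsigned)abs((int)cur[(cy + y) * w + cx + x] -
                                 (int)ref[(ry + y) * w + rx + x]);
    return sad;
}

/* Integer-pixel diamond search limited to +/- range pixels. */
static mv_t diamond_search(const uint8_t *cur, const uint8_t *ref,
                           int w, int h, int cx, int cy, int range)
{
    static const mv_t large[8] = { {0,-2}, {1,-1}, {2,0}, {1,1},
                                   {0,2}, {-1,1}, {-2,0}, {-1,-1} };
    static const mv_t small[4] = { {0,-1}, {1,0}, {0,1}, {-1,0} };
    assert(cx >= 0 && cy >= 0 && cx + MB <= w && cy + MB <= h);

    mv_t best = {0, 0};
    unsigned best_sad = block_sad(cur, ref, w, h, cx, cy, 0, 0);

    /* Large diamond: recentre on the best point until the centre wins. */
    for (;;) {
        mv_t centre = best;
        for (int i = 0; i < 8; i++) {
            int dx = centre.dx + large[i].dx, dy = centre.dy + large[i].dy;
            if (abs(dx) > range || abs(dy) > range)
                continue;
            unsigned s = block_sad(cur, ref, w, h, cx, cy, dx, dy);
            if (s < best_sad) { best_sad = s; best.dx = dx; best.dy = dy; }
        }
        if (best.dx == centre.dx && best.dy == centre.dy)
            break;
    }

    /* Small diamond: one final refinement around the converged centre. */
    mv_t centre = best;
    for (int i = 0; i < 4; i++) {
        int dx = centre.dx + small[i].dx, dy = centre.dy + small[i].dy;
        if (abs(dx) > range || abs(dy) > range)
            continue;
        unsigned s = block_sad(cur, ref, w, h, cx, cy, dx, dy);
        if (s < best_sad) { best_sad = s; best.dx = dx; best.dy = dy; }
    }
    return best;
}
```

Because the search only ever moves to a strictly lower SAD, it terminates, but like any fast search it can settle on a local minimum rather than the global best match.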

The Blackfin assembly instruction set has an instruction specifically for SAD calculation, SAA (src_reg_0, src_reg_1); one instruction computes the SAD of 4 bytes at once. In addition, other video-specific instructions are used throughout, such as BYTEPACK (packs 4 bytes into a 32-bit register), BYTEUNPACK (the inverse of BYTEPACK), and BYTEOP16M (subtracts 4 bytes), which significantly improves code efficiency.

The BF533 integrated development environment, VisualDSP++, also provides a profiling function that can be used to evaluate code performance, find execution bottlenecks, and optimize them specifically. The main optimization methods are:

1. Minimize branch and conditional instructions, because they disrupt the BF533 pipeline and cost additional clock cycles. Likewise, some simple subroutines are rewritten as macros to avoid the stack operations and parameter passing of a subroutine call.

2. Use parallel instructions. The BF533 is not a superscalar DSP, but it can still issue up to 3 instructions in parallel, for example: saa (r1:0, r3:2) || r0 = [i0++] || r2 = [i1++]; This parallel instruction completes a 4-byte SAD calculation in one clock cycle and at the same time loads r0 and r2 for the next calculation.

3. Quantization in the DCT/IDCT stage and some other modules involve division, which takes many clock cycles on the BF533. The division is therefore replaced by multiplication by the reciprocal of the quantization factor, combined with a shift operation, which computes the quotient efficiently with essentially no loss of accuracy.
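The third method can be illustrated with a fixed-point reciprocal. The C sketch below is one possible formulation, not the encoder's actual routine: it assumes a 16-bit fractional reciprocal, a quantization factor between 1 and 31, and coefficient magnitudes up to 2047, and under those assumptions the multiply-and-shift result equals truncating integer division. The function names and RECIP_SHIFT are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_SHIFT 16   /* assumed fixed-point precision of the reciprocal */

/* Precompute ceil(2^16 / q) once per quantization factor. Rounding up
 * keeps the later multiply-and-shift from undershooting exact multiples. */
static uint32_t quant_reciprocal(uint32_t q)
{
    assert(q >= 1 && q <= 31);
    return ((1u << RECIP_SHIFT) + q - 1) / q;
}

/* Approximate coeff / q (truncating toward zero) with one multiply
 * and one shift instead of a division. */
static int32_t quantize(int32_t coeff, uint32_t recip)
{
    assert(coeff >= -2047 && coeff <= 2047);
    int32_t  sign = coeff < 0 ? -1 : 1;
    uint32_t mag  = (uint32_t)(coeff < 0 ? -coeff : coeff);
    return sign * (int32_t)((mag * recip) >> RECIP_SHIFT);
}
```

For example, with a quantization factor of 8 the stored reciprocal is 8192, and quantize(100, 8192) returns 12, the same as 100 / 8 in integer arithmetic. The reciprocals can be kept in a small table so that the per-coefficient cost is a single multiply and shift.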

After these optimizations, CIF-level real-time MPEG-4 SP encoding was successfully implemented on the BF533. Video from the CIF-level CMOS sensor OV6630 can be encoded directly in real time. With an external PAL analog camera, however, the video digitized by the ADV7183 has a resolution of 720 × 576, and the performance of the BF533 does not yet allow real-time encoding at this resolution. The image is therefore first reduced to 352 × 288 and then MPEG-4 encoded.

Experimental results

The BF533 core clock (CCLK) is set to 594 MHz and the system clock (SCLK) to 118.8 MHz. The CIF-level standard test sequence flower (25 fps, 75 frames in total) is used to verify the system.

Figures 3 and 4 show the reconstructed images after I-frame coding and P-frame coding, respectively. The I-frame compression ratio is 11.5:1, with a reconstructed-image signal-to-noise ratio of 33.43 dB. The P-frame compression ratio reaches 65.7:1, with a signal-to-noise ratio of 32.65 dB. Timed accurately with the BF533's on-chip real-time clock (RTC), encoding the entire 75-frame sequence takes 2.27 s, an average encoding rate of 33 fps, which meets the real-time encoding requirement.
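To put these figures in perspective, the clock rate and frame rate can be turned into a rough cycle budget per macroblock. The C sketch below does only that arithmetic; the helper names are illustrative, and the result ignores memory stalls and interrupt overhead, so it is an upper bound on usable cycles rather than a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16x16 macroblocks in a frame whose sides are multiples of 16. */
static uint32_t macroblocks_per_frame(uint32_t width, uint32_t height)
{
    assert(width % 16 == 0 && height % 16 == 0);
    return (width / 16) * (height / 16);
}

/* Core clock cycles available per macroblock at a given frame rate. */
static uint32_t cycles_per_macroblock(uint32_t cclk_hz, uint32_t fps,
                                      uint32_t mbs)
{
    assert(fps > 0 && mbs > 0);
    return cclk_hz / fps / mbs;
}
```

A CIF frame has 396 macroblocks, so at 594 MHz and the 25 fps of the test sequence the budget is 60000 cycles per macroblock; sustaining the measured 33 fps corresponds to about 45000 cycles per macroblock on average.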

Conclusion

This article introduces the implementation of CIF-level MPEG-4 SP real-time encoding on the BF533 DSP chip. The encoder can flexibly choose between two video capture sources: the system's own CMOS sensor and a user-supplied PAL analog camera. The system can be used in IP video telephony, traffic monitoring, supermarket surveillance, smart community security, and other fields, and has strong practical value.
