Smart DDR PHY training technology combining software and hardware

Introduction

DDR interface rates keep climbing, each product generation pushes the limits of the technology, and the requirements on DDR PHY training grow correspondingly stricter. This article looks at the challenges of DDR PHY training from the perspective of Xin Yaohui, a leading-edge IP company, introduces the main flow and advantages of Xin Yaohui's DDR PHY training, and explains how Xin Yaohui solves the problems that arise in DDR PHY training.

Introduction to DDR PHY training

High reliability is one of the key quality and performance requirements for a system-on-chip (SoC). Part of what makes an SoC complex is that every IP block has a critical influence on it. In Xin Yaohui's long experience serving customers, access to DDR SDRAM is a common requirement in customer SoC designs, which makes the DDR PHY a critical piece of IP: whether it works stably and reliably determines the quality and reliability of the entire SoC.

JEDEC (the Solid State Technology Association), the standards organization that defines the DDR protocols, does not require dynamic random access memory (DRAM) to be able to adjust the delay of its input and output signals, so the DDR PHY usually takes on delay adjustment in both the input and output directions. This adjustment process is called training. Training ensures that the signals the DDR PHY outputs meet the JEDEC standard: the DDR PHY adjusts the delay lines at its transmitter so that the DRAM device can correctly sample the control and data signals at its receiver. Correspondingly, by adjusting the delay lines of its own receiver, the DDR PHY can correctly sample the signals output by the DRAM device. As a result, the DDR interface works stably and reliably in both the read and write directions.

However, as the operating frequency of DDR increases, the accuracy and precision requirements of DDR PHY training also increase. The accuracy and precision of training determine whether the DDR system can work stably and reliably at higher frequencies.

Challenges faced by DDR PHY training

There are many types of DDR training, and none of them can be allowed to produce a wrong result. At the same time, the training sequences defined by JEDEC are fairly simple. If only these default sequences are used, the training result may not be the optimal value for real operation.

At present, most DDR PHYs use hardware training. If the hardware algorithm has a flaw, training goes wrong, the DDR interface cannot work normally and stably, and the entire SoC fails. Hardware training also has difficulty supporting complex training sequences and algorithms, so it cannot reach the optimal training result.

Xin Yaohui's DDR PHY uses firmware-based training that combines software and hardware, breaking out of the fixed mindset of the hardware training mode described above.

Advantages of Xin Yaohui's DDR PHY in training

Solve the problem of write leveling

Write leveling measures the difference in routing delay between the command path and the data path in a fly-by topology. The DDR PHY compensates for this difference on the data path so that the data path and command path delays end up matched.

In practice, the delay on the command path exceeds the delay on the data (DQ) path. Defining path difference = command path delay - data path delay, the path difference is generally between 0 and 5 clock cycles. It can be divided into an integer part and a fractional part (in units of 0.5 clock cycles).

According to the write leveling requirements of the JEDEC standards (such as JESD79-4C), in write leveling mode the DRAM samples CK on the DQS edge sent by the DDR PHY and returns the sampled value to the DDR PHY on DQ.

Through this training, the DDR PHY can determine the fractional part of the delay difference between the command and data paths, but it has no way to train the integer part: delaying DQS by one clock cycle more or one clock cycle less yields the same sampled CK value.

To work around this, the approximate path difference is usually estimated from the board layout, and the resulting integer part is written directly into a DDR PHY register. This is not a major problem when the frequency is relatively low and mass-production consistency is relatively good. In mass production, however, if the variation between platforms exceeds one clock cycle (468 ps at the highest LPDDR4 frequency), directly configuring the integer part no longer works, and some chips will inevitably fail to work properly.

Xin Yaohui uses firmware training together with a special adjustment method during DDR write operations to determine both the integer and fractional parts of the path difference, so customers do not need to estimate the path difference range from the layout design.
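The ambiguity in write leveling can be illustrated with a small model. The sketch below is only an illustration under stated assumptions, not Xin Yaohui's firmware: CK is treated as an ideal 50% duty-cycle clock, all delays are expressed in clock cycles, and the names sample_ck and fractional_delay are hypothetical. It sweeps the DQS delay across one clock cycle and reports where the sampled CK value changes from 0 to 1.

```python
def sample_ck(dqs_delay, ck_arrival, period=1.0):
    """CK level (0 or 1) seen by a DQS rising edge launched with dqs_delay.

    Assumption: CK is an ideal clock that reaches the DRAM ck_arrival cycles
    after launch and is high during the first half of every period.
    """
    phase = (dqs_delay - ck_arrival) % period
    return 1 if phase < period / 2 else 0


def fractional_delay(ck_arrival, step=1 / 64):
    """Sweep DQS delay over one clock cycle and return the 0 -> 1 transition."""
    prev = sample_ck(0.0, ck_arrival)
    for n in range(1, 65):
        cur = sample_ck(n * step, ck_arrival)
        if prev == 0 and cur == 1:
            return n * step
        prev = cur
    return None


# Three different true path differences, all with the same fractional part.
for true_skew in (0.25, 1.25, 2.25):
    print(true_skew, fractional_delay(true_skew))
```

All three cases report 0.25: the sweep recovers the fractional part, while the integer part (0, 1, or 2 cycles) is invisible to it because the sampled CK value repeats every clock cycle.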

Filter out the high-impedance state of DQS during training

During a read operation, the DQS signal is in a high-impedance state before the preamble, and the preamble portion of DQS has not yet reached its most stable state. The read DQS gate signal therefore has to be trained so that it filters out the high-impedance state and the preamble and passes exactly the valid DQS of the entire read burst. This is read DQS gate training.

Xin Yaohui uses a specific method to eliminate the interference of unstable DQS during training, and uses the read DQS gate signal to locate the first DQS rising edge that corresponds to read burst data, which gives the gate position.

Delay DQS to improve the accuracy of read DQ training

DDR PHYs generally do not include this training because the JEDEC standards do not require it, but in practical applications it matters a great deal.

The skew between read DQS and read DQ is tDQSQ, which ranges from 0 to 0.18 UI (about 0 to 42 ps at high frequencies). Read training delays DQS to find the left and right edges of the DQ window and then places DQS at the center of the window. Because of DQS-DQ delay skew inside the DDR PHY, package pad delay skew, and PCB trace skew, the tDQSQ seen inside the DDR PHY may be negative (DQS is delayed more than DQ inside the DDR PHY) even though the tDQSQ at the DRAM output is positive (DQ is delayed more than DQS).

In this case, DQS already falls inside the DQ window when the DQS delay is 0. The PHY searches for the left and right edges of the DQ window by increasing the DQS delay from 0, so the window it finds is inevitably smaller than the actual window. After read training, the DQS sampling point is not at the center of DQ but to the right of it, and the final read margin is reduced.

Xin Yaohui uses a specific method to place every DQ window to the right of DQS. Read training can then search the complete DQ window, which improves the accuracy of read training and the read performance of DDR.
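The effect of a truncated search can be shown with a toy model. The following is a minimal sketch, assuming delays are counted in integer delay-line taps and that the DQ window is a single contiguous range; the function search_window and the tap values are hypothetical. The article does not describe how the DQ window is moved to the right of DQS, so the sketch simply models that step as a common 16-tap offset added to DQ.

```python
def search_window(window_start, window_end, max_delay=127):
    """Sweep DQS delay from 0 upward; return (left, right, center) in taps.

    The true DQ window is [window_start, window_end] relative to zero DQS
    delay. A negative start means DQS is already inside the window at tap 0.
    """
    passing = [d for d in range(max_delay + 1) if window_start <= d <= window_end]
    left, right = passing[0], passing[-1]
    return left, right, (left + right) // 2


# True window is -10..30 taps, so its true center is tap 10.
print(search_window(-10, 30))   # search starts inside the window

# Assumed fix for illustration: shift DQ by 16 taps so the whole window
# lies to the right of DQS, then search again.
DQ_SHIFT = 16
left, right, center = search_window(-10 + DQ_SHIFT, 30 + DQ_SHIFT)
print(left, right, center, center - DQ_SHIFT)
```

In the first search the left edge is clipped at tap 0, so the window appears to be 0..30 and the chosen center is tap 15, five taps to the right of the true center. After the shift the full 40-tap window is found, and the center maps back to tap 10.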

Use firmware training to obtain an optimized value for the read data eye

Read data eye training delays read DQS so that it sits in the middle of the DQ window. The biggest problem today is that the read sequences the JEDEC standards define for read data eye training are quite simple. For DDR4, for example, the defined sequence is a fixed 01010101 pattern. Because of inter-symbol interference and signal reflections on high-speed signals, the DQ window differs from one read sequence to another, so a simple fixed sequence does not cover real usage well and the training result is not the optimal value in actual operation.

Xin Yaohui's firmware training can use different patterns, such as PRBS patterns and specially designed frequency-sweep patterns. Such patterns reflect the characteristics of the data channel better because they contain high-, mid-, and low-frequency content and exercise effects such as the inter-symbol interference caused by long runs of 0s and 1s. They produce better training results and a reliable value that covers real operating conditions.
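As an illustration of what a PRBS training pattern looks like, the sketch below generates PRBS7 with the widely used polynomial x^7 + x^6 + 1. The article does not say which PRBS order or seed Xin Yaohui's firmware uses, so the choice of PRBS7, the seed, and the function name prbs7 are assumptions made for the example.

```python
def prbs7(seed=0x7F, length=127):
    """Generate `length` bits of PRBS7 (polynomial x^7 + x^6 + 1).

    The 7-bit LFSR state must be non-zero; the sequence repeats every 127 bits.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(length):
        # Feedback is the XOR of the two taps (bits 6 and 5 of the state).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


pattern = prbs7()
print("".join(map(str, pattern[:32])))
print("ones:", sum(pattern), "zeros:", len(pattern) - sum(pattern))
```

Unlike a fixed 01010101 sequence, which only toggles at the highest frequency, one period of this sequence mixes fast toggling with runs of consecutive identical bits, which is why such patterns exercise inter-symbol interference more thoroughly.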

Optimized reference voltage (Vref) and address line (CA) delay in two-dimensional training mode

Address line training was introduced in LPDDR3: the DRAM feeds the sampled address signals back to the DDR PHY over the data path, and the DDR PHY uses this feedback to adjust the address line delay. LPDDR4 adds training of the address line reference voltage, so it is necessary not only to adjust the address line delay but also to find an optimal reference voltage. Traditional hardware training struggles with this two-dimensional search, because a hardware algorithm cannot be made too complicated.

Xin Yaohui uses the firmware's two-dimensional training mode to map out a complete two-dimensional plot with address line delay on the horizontal axis and reference voltage on the vertical axis, and from it obtains a better reference voltage and the corresponding address line delay.
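A two-dimensional sweep of this kind can be sketched in a few lines. This is a simplified illustration, not the actual firmware algorithm: the callback passes(delay, vref) is a hypothetical stand-in for running one training test at a given setting, the diamond-shaped synthetic eye is invented test data, and choosing the Vref row with the widest passing delay window is just one simple selection rule among several possible ones.

```python
def train_2d(passes, delays, vrefs):
    """Sweep every (delay, vref) point; return (best, eye).

    best is (window_width, vref, center_delay) for the vref whose longest
    contiguous run of passing delays is widest; eye maps vref -> pass/fail row.
    """
    best = None
    eye = {}
    for vref in vrefs:
        row = [passes(d, vref) for d in delays]
        eye[vref] = row
        run_start = run_len = cur_start = cur_len = 0
        for i, ok in enumerate(row):
            if ok:
                if cur_len == 0:
                    cur_start = i
                cur_len += 1
                if cur_len > run_len:
                    run_start, run_len = cur_start, cur_len
            else:
                cur_len = 0
        if run_len and (best is None or run_len > best[0]):
            best = (run_len, vref, delays[run_start + run_len // 2])
    return best, eye


def synthetic_eye(delay, vref):
    """Invented diamond-shaped eye centered at delay 20, vref 30."""
    return abs(delay - 20) / 10 + abs(vref - 30) / 8 < 1


best, eye = train_2d(synthetic_eye, list(range(41)), list(range(20, 41)))
print(best)
for vref in sorted(eye, reverse=True):  # vref on the vertical axis
    print(f"{vref:3d} " + "".join("#" if ok else "." for ok in eye[vref]))
```

For this synthetic eye the sweep selects Vref 30 with the delay centered at tap 20, and the printed map shows the whole eye with delay on the horizontal axis and Vref on the vertical axis. Because the sweep is written in software, a different selection rule can be substituted without changing hardware.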

Optimized DQ reference voltage and DQ delay in two-dimensional training mode

The DQ reference voltage was introduced in the JEDEC DDR4 standard, but the standard neither explains nor supports how to train it. As a result, most DDR PHYs do not support DDR4 DQ reference voltage training and can only be configured with a fixed reference voltage.

The JEDEC LPDDR4 standard adds protocol support for write DQS-DQ training (adjusting the phase of write DQ relative to write DQS) and for DQ reference voltage training.

With firmware, Xin Yaohui supports DQ reference voltage training for DDR4 as well as write DQS-DQ training and DQ reference voltage training for LPDDR4. It also uses the firmware's two-dimensional training mode to map out a complete two-dimensional plot with DQ delay on the horizontal axis and DQ reference voltage on the vertical axis, and finds the better DQ reference voltage and corresponding DQ delay within it.

Summary

As process nodes advance and DRAM technology evolves, DDR operating frequencies keep rising and the training requirements for DRAM devices keep growing. At the same time, the analog circuits inside the DDR PHY (FFE, DFE, and so on) also need various kinds of high-precision training as the frequency increases. Xin Yaohui's intelligent training method, which combines software and hardware, supports both the complex training that DRAM devices require and the training of the DDR PHY's internal analog circuits. Through continuous optimization of its training algorithms, Xin Yaohui continues to push the rate limit of each DDR generation.

Going forward, Xin Yaohui will continue to treat the delivery of high-performance interface IP and high-quality design services as its responsibility, and will work with chip design companies to launch better products and support the development of China's chip industry.