How to reduce the failure rate of high-speed optical modules in data centers


5G, big data, artificial intelligence and other technologies place ever higher demands on data processing and network bandwidth, and data centers must keep increasing their bandwidth to keep up. There is therefore an urgent need to raise network bandwidth in data centers today, especially in Internet data centers. The most direct way to do so is to increase the bandwidth of each port, from 40G to 100G, from 100G to 200G, or even higher, which raises the bandwidth of the data center as a whole. Experts have predicted that most 400GbE deployments will begin in 2019. 400GbE switches will be used as spine or core switches in hyperscale data centers, and as spine or backbone switches in private and public cloud data centers. 100G has only become popular in the past three years, yet the move to 400G is already necessary: network bandwidth is growing faster and faster.

On the one hand, data centers have strong demand for high-speed modules; on the other hand, these modules fail often. Moving from 1G and 10G to 40G, 100G and even 200G, the failure rate rises noticeably. Of course, high-speed modules are far more complex to manufacture than low-speed ones. A 40G optical module, for example, is essentially four 10G channels bound together. It amounts to four 10G links working at once, and if any one of them has a problem, the whole 40G module can no longer be used, so its failure rate is naturally higher than that of a 10G module. The module also has to coordinate four optical paths, which further raises the chance of error. 100G is even more so: some 100G modules bind ten 10G channels, and others use newer optical technology, both of which increase the likelihood of errors. Higher rates are even less mature. 400G, for example, is still largely a laboratory technology; when it reaches the market in 2019 there will probably be a small peak in failures, although volumes will not be large at first. As the technology keeps improving, I believe it will become as stable as today's ordinary modules. Using a 200G module now feels much like getting a 1G GBIC module 20 years ago: it is inevitable that a new product will have a higher failure rate in the short term.
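The 40G and 100G examples rest on a series-reliability argument: if a module is unusable as soon as any one of its lanes fails, the lanes behave like components in series. The sketch below assumes lanes fail independently and with the same probability, which real modules need not satisfy (lanes share a housing, driver electronics and temperature), and the per-lane figure of 1% is purely illustrative, not a measured value. The function name is hypothetical.

```python
def module_failure_probability(lane_failure_prob: float, lanes: int) -> float:
    """Probability that a multi-lane module fails, assuming (hypothetically)
    that lanes fail independently and any single lane failure kills the module."""
    if not 0.0 <= lane_failure_prob <= 1.0:
        raise ValueError("lane_failure_prob must be between 0 and 1")
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    # The module keeps working only if every one of its lanes keeps working.
    survive_all = (1.0 - lane_failure_prob) ** lanes
    return 1.0 - survive_all


if __name__ == "__main__":
    p = 0.01  # illustrative per-lane failure probability over some period
    for name, lanes in [("10G (1 lane)", 1), ("40G (4 x 10G)", 4), ("100G (10 x 10G)", 10)]:
        print(f"{name}: {module_failure_probability(p, lanes):.4f}")
```

With the same illustrative 1% per lane, the single-lane module fails with probability 0.0100, the four-lane module with about 0.0394 and the ten-lane module with about 0.0956, which is the "failure rate is of course higher" effect described above, before even counting the extra risk of coordinating several optical paths.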

Fortunately, optical module faults have limited impact on services. Links in a data center are redundant, so if the optical module on one link fails, traffic can move to other links. CRC errors can be spotted immediately through the network management system and the module replaced early, so optical module failures rarely have a major impact on the business. In rare cases, an optical module can cause a device port to fail or even hang the entire device, but this is mostly caused by poor implementation on the device side and seldom happens. Most optical modules are loosely coupled with the devices they plug into: they are connected, but have no deeper dependency on each other. So although high-speed optical modules fail more and more often in use, the impact on the business is limited and usually attracts little attention. A faulty module is simply replaced when found, and because high-speed optical modules come with long warranty periods, replacement is essentially free and the loss is small.
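The redundancy argument in the paragraph above can be put into numbers: if traffic survives as long as at least one of several parallel links is up, the service is lost only when all of them fail at the same time. The sketch below assumes link failures are independent, which is optimistic (links can share a line card, power supply or fiber path), and the 4% figure in the usage note is illustrative only. The function name is hypothetical.

```python
from math import prod
from typing import List


def service_outage_probability(link_failure_probs: List[float]) -> float:
    """Probability that every redundant (parallel) link is down at once,
    assuming link failures are independent of each other."""
    if not link_failure_probs:
        raise ValueError("at least one link is required")
    for p in link_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    # Service is lost only if all parallel links fail together.
    return prod(link_failure_probs)


if __name__ == "__main__":
    print(service_outage_probability([0.04]))        # single link
    print(service_outage_probability([0.04, 0.04]))  # two redundant links
```

Even if each link's module fails with an illustrative 4% probability, two independent redundant links bring the chance of losing the service to about 0.16%, which is why module faults seldom reach the business.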

Most optical module faults show up as a port that will not come up, a module that is not recognized, or CRC errors on the port. These faults can involve the device side, the module itself or the link quality; for bit errors and ports that fail to come up in particular, locating the fault often requires software-level analysis. Some are compatibility problems: neither side is defective, but the two have never been debugged and adapted to each other, so they cannot work together. This is quite common, which is why many network equipment vendors publish a list of qualified optical modules and require customers to use modules from that list to ensure stable operation.

When a fault occurs, the best method is still swap testing: change the link fiber, change the module and change the port, and use this series of tests to determine whether the problem lies in the optical module, the link or the device port. Fortunately, most faults of this kind are reproducible; the hard ones are those that come and go. For example, when a port shows CRC errors, the module is pulled out and replaced with a new one and the fault disappears, but when the original module is put back the fault does not return, so it is hard to tell whether the module was to blame. This happens often in practice.
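The swap-testing procedure can be sketched as a small decision routine. This is a minimal illustration under stated assumptions: a single faulty component, components swapped one at a time in the order the text gives (fiber, module, port), and a record format (`SwapTestResult`) and function (`locate_fault`) that are hypothetical, not part of any vendor tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SwapTestResult:
    """One step of a swap test (hypothetical record format)."""
    swapped: str          # component replaced in this step: "fiber", "module" or "port"
    fault_persists: bool  # True if the port was still down / still logging CRC errors


def locate_fault(results: List[SwapTestResult],
                 original_restored_fault_returns: Optional[bool] = None) -> str:
    """Return a verdict from swap steps performed one at a time, in order.

    original_restored_fault_returns records what happened when the original
    part was put back after the fault cleared: True (fault came back),
    False (fault did not come back) or None (not yet tried).
    """
    for step in results:
        if step.fault_persists:
            continue  # fault survived this swap, so try the next component
        # The fault cleared right after this swap, so this component is the suspect.
        if original_restored_fault_returns is None:
            return f"suspect {step.swapped} (not yet confirmed)"
        if original_restored_fault_returns:
            return f"{step.swapped} confirmed faulty"
        # Fault cleared but did not come back with the original part: intermittent.
        return f"inconclusive: fault cleared after swapping {step.swapped} but did not return"
    return "fault persists after all swaps: check compatibility and device configuration"
```

A fiber swap that leaves the CRC errors in place followed by a module swap that clears them yields "suspect module (not yet confirmed)". If the original module is then reinserted and the errors do not come back, the routine reports an inconclusive result, which is exactly the hard case described above.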

How can the failure rate of optical modules be reduced? First, pay attention at the source. Higher-bandwidth modules should not be rushed to market: they need thorough testing, the equipment that uses them must be ready, and the underlying technology must mature before a new module can enter the market smoothly. The goal should not simply be higher speed. Network equipment now supports many ports, so where 400G is not yet ready, four bundled 100G links can also meet the requirement.

Second, be careful when introducing high-speed optical modules. Network equipment vendors and data center customers should adopt them cautiously, test them rigorously and firmly screen out defective products. Competition in the high-speed module market is fierce: every supplier hopes to seize the opportunity presented by new high-speed modules, but quality and price vary widely. Equipment vendors and data center customers therefore need to step up their evaluation, and the higher the module rate, the more complex the verification.

Third, an optical module is a highly integrated device, and its exposed fiber interface and internal components are relatively fragile. Handle modules gently, wear clean gloves and keep dust out; this also lowers the failure rate in use. Unused optical modules should be fitted with dust caps and kept in their bags.

Fourth, avoid limit conditions as far as possible, for example running a 100G module close to its rated limits for long periods, or using a module rated for 200 meters at the full 200-meter distance. Operating at these limits wears an optical module out faster. It is much like people: in an air-conditioned room at 24 to 26 °C, people work efficiently; outdoors at 35 °C they cannot concentrate for long and their efficiency drops sharply; above 40 °C they can barely work at all. Providing a comfortable environment for optical modules can effectively extend their service life.
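The fourth recommendation, keeping modules away from their limits, can be turned into a routine check against datasheet ratings and readings such as the module's digital diagnostics. This is a minimal sketch: the `ModuleSpec` fields, the `margin_warnings` function, the 80% reach threshold and the 10 °C temperature headroom are all hypothetical choices for illustration, and real limits should come from the module's datasheet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleSpec:
    """Hypothetical subset of a module's datasheet ratings."""
    rated_reach_m: float    # e.g. 200 for a module rated for 200 m
    max_case_temp_c: float  # upper case-temperature limit from the datasheet


def margin_warnings(spec: ModuleSpec, link_length_m: float, case_temp_c: float,
                    reach_fraction: float = 0.8,
                    temp_headroom_c: float = 10.0) -> List[str]:
    """List the ways a module is operating at or near its limits.

    reach_fraction and temp_headroom_c are illustrative thresholds, not standards.
    """
    warnings = []
    if link_length_m > spec.rated_reach_m:
        warnings.append("link exceeds rated reach")
    elif link_length_m > reach_fraction * spec.rated_reach_m:
        warnings.append("link is close to rated reach")
    if case_temp_c > spec.max_case_temp_c:
        warnings.append("case temperature above rated maximum")
    elif case_temp_c > spec.max_case_temp_c - temp_headroom_c:
        warnings.append("case temperature close to rated maximum")
    return warnings
```

For a module rated for 200 m with an illustrative 70 °C case limit, a 100 m link at 40 °C produces no warnings, while running the same module over the full 200 m is flagged as "close to rated reach", matching the example in the text.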

As data volumes grow, the bandwidth demands of data centers keep rising, and introducing higher-speed optical modules has become the only way forward, which makes quality control essential. If new high-speed modules repeatedly run into problems in the market, they will be eliminated. Of course, every new technology goes through a maturing process, and high-speed optical modules are no exception: continued innovation is needed to solve problems, improve module quality and reduce the probability of failure. High-speed optical modules are the profit engine of module manufacturers and the battleground every module maker has to win.
