Flash memory is useful, but it doesn’t last forever. Every write and erase operation wears the memory cells a little, shortening the lifetime of the flash and harming reliability. While this is a threat to all embedded devices that store and handle important data, it’s a particular concern in sophisticated automotive systems. In a connected vehicle, poor flash lifetime can lead to corrupted data, potentially resulting in significant costs, critical system failures, and risks to end users.

As a result, it’s extremely important to calculate the lifetime of the flash in a vehicle so that failures can be avoided. Calculating the lifetime of the overall design can be a complicated task, though. It’s even trickier without an understanding of how the file system interacts with the media.

Let’s take a look at why that process is so complex, as well as some of the factors impacting the lifetime of an automotive system. Finally, we’ll look at what solutions are available for making the process easier.

NAND flash is the go-to memory choice for the vehicle

Automotive embedded devices now use flash media extensively in their design. Instead of single sensors that use NOR flash to store occasional updates, designs now use NAND flash to store both log files and more frequent updates. These devices are increasingly joining automotive storage clusters, with shared storage that must handle multiple use cases. These domain controllers can be tasked with additional logging and centralized analysis of gathered data before offloading to the cloud.

The NAND-based media has also evolved from earlier designs. It must provide a low-cost solution that also provides the lifetime required by manufacturers. SLC NAND flash with 100,000 program/erase (P/E) cycles has been replaced by storage with multiple bits per cell and a much shorter lifetime.

Write amplification and fragmentation – a duo that’s damaging flash lifetime

Data from users, applications, and system updates is managed by the file system. Sensors generate a certain amount of data, but because of the operational limitations of flash media, the amount actually written to the media ends up being a multiple of what the sensors produced. This is known as write amplification, and it can shorten the lifetime of the flash.
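
To make the idea concrete, here is a minimal sketch of how a write amplification factor can be expressed: the bytes physically programmed to the flash divided by the bytes the host asked to store. The function name and the byte counts are hypothetical and chosen only for illustration; real figures have to be measured on the actual design.

```python
def write_amplification_factor(host_bytes: int, flash_bytes: int) -> float:
    """Ratio of bytes programmed to the flash versus bytes the host requested."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


GIB = 1024 ** 3

# Hypothetical measurement: sensors hand 2 GiB to the file system,
# while the media reports 5 GiB actually programmed.
waf = write_amplification_factor(2 * GIB, 5 * GIB)
print(f"WAF = {waf:.2f}")
```

A factor of 1.0 would mean that every byte from the sensors costs exactly one byte of flash wear; anything above that consumes program/erase cycles faster than the raw data volume suggests.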

Each file of data must also have metadata: the file name, permissions, and date and time information. How the gathered data is written can also add to write amplification, especially when data must be committed immediately for system integrity. Fragmentation also has a negative impact. A file system by nature is designed to store files in a contiguous way across the storage media. However, as the file system ages and there are fewer contiguous regions in which to store files, the files start getting broken up into several pieces. When that happens, the file system must allocate additional metadata or expand allocation tables. Metadata writes flushed immediately to the media (for system integrity) result in a very large write amplification factor. An additional design penalty shows up in read performance, because read-ahead buffers become useless at every fragment boundary.

Based on these estimates and calculations, a designer must solve the equation for the required media lifetime. Ten years is not uncommon in the automotive industry. Recently, this estimate was shown to be insufficient for sensors and devices that were generating too many log files. Presuming these are also factored into the initial design, there is only one question left for the flash media vendor.

How many write cycles is enough?

When we talk to our customers, we find they have recently spoken with flash media vendors who give them a number like “1 million write cycles.” Without context, that phrase does not say enough about how the media will cope with the customer’s use cases. How can this be translated into an expected lifetime?

Each NAND flash media design has an expected maximum number of program and erase cycles. This maximum applies to each erase block or write page of the flash media. In a perfect world with no write amplification, the raw maximum amount of data that can be written is the maximum P/E cycles multiplied by the number of erase blocks and by the size of each erase block.
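
The multiplication described above can be sketched in a few lines. The part geometry used here (3,000 P/E cycles, 16,384 erase blocks of 512 KiB) is a made-up example, not a figure from any specific vendor, and the calculation assumes perfect wear leveling and no write amplification.

```python
def raw_endurance_bytes(pe_cycles: int, erase_block_count: int,
                        erase_block_bytes: int) -> int:
    """Ideal upper bound on bytes written: every block used for every cycle."""
    return pe_cycles * erase_block_count * erase_block_bytes


# Hypothetical part: 16,384 erase blocks of 512 KiB each (8 GiB in total),
# rated for 3,000 program/erase cycles per block.
total_bytes = raw_endurance_bytes(3_000, 16_384, 512 * 1024)
print(f"Raw endurance: {total_bytes / 1e12:.1f} TB")
```

For this hypothetical part, the ideal ceiling works out to roughly 25.8 TB. Any real workload will land below it once write amplification is taken into account.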

The automotive community and SSD vendors both measure storage endurance in terabytes written (TBW). This is defined by the standards group JEDEC as the “number of terabytes that may be written to the media over its lifetime” (JEDEC standard JESD218). Factoring in the information mentioned above (write amplification factor or WAF, and write volume workload), designers can define a requirement for the design. When problems occurred, one automaker’s recent design replaced an 8 GB eMMC with a 64 GB part, theoretically resulting in 8 times as much life.
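
As a rough sketch of how these pieces combine, the following estimates lifetime in years from capacity, P/E cycles, daily host write volume, and WAF. All of the numbers are hypothetical, and the model deliberately assumes ideal wear leveling and that both parts share the same P/E rating and the same WAF, which is exactly the assumption behind the "8 times as much life" figure.

```python
def lifetime_years(capacity_bytes: int, pe_cycles: int,
                   host_bytes_per_day: float, waf: float) -> float:
    """Years until the ideal raw endurance is consumed by amplified writes."""
    raw_endurance = capacity_bytes * pe_cycles      # ideal bytes the media can absorb
    flash_bytes_per_day = host_bytes_per_day * waf  # wear actually caused per day
    return raw_endurance / flash_bytes_per_day / 365.0


GB = 10 ** 9

# Hypothetical workload: 10 GB of host writes per day at a WAF of 4.0,
# on parts rated for 3,000 P/E cycles.
small = lifetime_years(8 * GB, 3_000, 10 * GB, 4.0)
large = lifetime_years(64 * GB, 3_000, 10 * GB, 4.0)

print(f"8 GB part:  {small:.1f} years")
print(f"64 GB part: {large:.1f} years ({large / small:.0f}x)")
```

The 8x gain only holds while the other inputs stay fixed. If the larger part has fewer rated P/E cycles, or the workload or WAF changes, the ratio shrinks accordingly.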

This goes some distance toward repairing situations where the workload is greater than originally planned for. More frequent security updates (especially over-the-air) and increased logging of sensors and situations are a big factor in these unplanned or at least uncertain write increases. An even bigger factor can be the raw data from DVR and other cameras in modern vehicle designs.

I hate to be the bearer of bad tidings, but just increasing the capacity is not sufficient. Bigger is not necessarily better when you factor in the other changes that come with it.

Larger media, larger flash challenges

For larger media, flash calculations get a bit more difficult, for several reasons. Besides the additional cost of the new part, larger media tends to have larger write page sizes. File system block sizes also expand on larger media. All the previous write amplification factor calculations must be redone for the larger media, and WAF often grows considerably.
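
One way to see why WAF grows with page size is to look at a small record that is committed on its own. The sketch below is a simplified model, assuming each record is flushed immediately and occupies whole write pages; the 512-byte record and the 4 KiB and 16 KiB page sizes are illustrative values, and metadata and garbage collection overhead are ignored.

```python
def page_padding_waf(record_bytes: int, page_bytes: int) -> float:
    """WAF from rounding one immediately committed record up to whole pages."""
    pages = -(-record_bytes // page_bytes)  # ceiling division
    return pages * page_bytes / record_bytes


# Hypothetical 512-byte log record on media with two different page sizes.
for page_bytes in (4096, 16384):
    waf = page_padding_waf(512, page_bytes)
    print(f"{page_bytes:>6}-byte page: WAF = {waf:.0f}")
```

In this model the same 512-byte record costs four times as much wear on the 16 KiB page as on the 4 KiB page, so part of the extra endurance gained from a larger part can be given back by its larger write page.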

Teams which find they have additional space available may add to the amount of data logging they perform. As desktop hard drives and memory capacities have grown, desktop operating systems and applications have never gotten smaller or written less data.

Like your car, your smartphone, and your laptop, the performance of flash media can also drop as it approaches the maximum lifetime. One reason for this is the increase in correctable bit errors. While no data is lost, dealing with bit errors properly takes additional time and can reduce performance. The physical processes of programming the media can also take longer as NAND flash approaches the maximum program and erase cycles. The goal of every design is for that reduction to happen in year eight or nine, not year two.

Getting the numbers right with optimized flash testing

As we’ve seen so far, determining the lifetime of automotive flash is vitally important for ensuring a safe and secure car. But it’s also a complex challenge.

Tuxera’s team of experts can help your team navigate these waters. We have decades of experience in helping customers write their specs and requirements to ensure the right things are being measured, and measured in a consistent and repeatable way. We have a flash testing service that can use your workload and devices to measure the write amplification of the design at the media level, and how all of that will affect the resulting lifetime. Tuxera system architects can examine your design to help your team understand the potential pitfalls.

Finally, our quality-assured automotive file systems like Tuxera Reliance™ Velocity and Tuxera FlashFX® Tera were written to work best on flash media, providing high reliability, fail-safety, the lowest write amplification – and the highest resistance to fragmentation possible.

Involving Tuxera in your next project will result in the best performance over the longest lifetime for your media – generating customer satisfaction instead of recalls.

Automotive Tier-1s and OEMs, learn more about how we can help you optimize your flash lifetime and performance.