NAND vs NOR flash memory: an embedded developer’s guide to choosing

Careful consideration of the differing capabilities of NAND and NOR flash memory, the specific needs of the use case, and the best storage management software help to improve embedded device performance and lifetime.

Consumer electronics, cars, and other devices use large amounts of embedded flash memory for non-volatile storage – more, in fact, than ever before.[1],[2],[3]

One important decision in designing such devices involves the kind of flash memory to use: NAND or NOR? These are the two major types of flash memory used today. They differ in their circuitry, and each was originally named for the behavior of a discrete logic gate – NAND or NOR.

In this article, we’ll explore which of these flash technologies could be the right choice for you – and steps you can take when making that decision.

1 – Start with data size and usage  

When deciding between these two flash memory types, one of the first questions to ask is: how much data needs to be stored?

NOR flash memory has traditionally been used to store relatively small amounts of executable code for embedded computing devices, like simple wearables or small IoT devices. NOR is well suited to use for code storage because of its reliability, fast read operations, and random access capabilities. Because code can be directly executed in place, NOR is ideal for storing firmware, boot code, operating systems, and other data that changes infrequently.

NAND flash memory, on the other hand, has become the preferred format for storing larger quantities of data on just about any sophisticated device or system – such as your smartphone or car infotainment system. Higher density, lower cost, faster write and erase times, and a longer read-write life expectancy – these traits make NAND especially well suited for consumer applications. In such use cases, large files of sequential data need to be loaded into memory quickly and replaced with new files repeatedly.

However, data storage size remains just one factor when selecting flash memory. The choice between NAND and NOR flash may not be a simple one for the complex embedded devices of today and the future. While ever-larger media files are driving increased demand for inexpensive NAND, powerful new operating systems and intricate applications running on fast processors call for the kind of quick-executing code NOR can support. A modern embedded device often combines a tremendous need for storage with a demanding set of application performance requirements. In some cases, an optimal design might call for both types of flash memory in the same device.

2 – Consider the performance strengths of each technology

Whichever type of flash is used in a device, there are certain performance bottlenecks – due to the design characteristics of the technologies – that need to be mitigated. For example:

  • NOR is fast to read current data but markedly slower to erase it and write new data.
  • NAND is fast to erase and write, but slow to read non-sequential data through its serial interface.
  • NAND is also prone to single-bit errors, requiring rigorous algorithms for error detection and correction.

3 – Choose the right storage management software

Well-designed software strategies can be very effective in increasing the performance and reliability of flash hardware. The goals of flash memory management software include:

  • avoiding loss of data
  • improving effective performance
  • maximizing media lifespan

I’ll dive a little deeper into these aspects below.

Avoiding loss of data

Perhaps the most important goal in managing flash memory is to ensure that no data is ever lost as a result of an interrupted operation or the failure of a memory block. There are several ways that flash management software can achieve this aim. Rewrite operations, for example, can be managed in such a way that new data is written and verified before the old data is deleted, so that no power loss or other interruption can result in the loss of both old and new data.
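To make this concrete, here is a minimal sketch of such a power-fail-safe rewrite at the block level. The flash_* primitives, the mapping table, and the block size are hypothetical placeholders for whatever a given flash manager provides, not the API of any specific product:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCK_SIZE 4096u

/* Hypothetical low-level primitives supplied by a flash driver. */
bool flash_write(uint32_t block, const uint8_t *data, size_t len);
bool flash_read(uint32_t block, uint8_t *data, size_t len);
bool flash_erase(uint32_t block);

/* Hypothetical logical-to-physical mapping kept by the flash manager. */
extern uint32_t g_map[];   /* logical block -> physical block */

/*
 * Power-fail-safe rewrite: the new copy is written and verified in a spare
 * physical block BEFORE the mapping switches over and the old block is
 * erased. If power is lost at any point, either the old or the new data
 * is still intact on the media.
 */
bool safe_rewrite(uint32_t logical, uint32_t spare_block,
                  const uint8_t *new_data, size_t len)
{
    uint8_t verify[MAX_BLOCK_SIZE];
    uint32_t old_block = g_map[logical];

    if (len > MAX_BLOCK_SIZE || !flash_write(spare_block, new_data, len))
        return false;                    /* old data still intact */

    if (!flash_read(spare_block, verify, len) ||
        memcmp(verify, new_data, len) != 0)
        return false;                    /* verification failed, keep old data */

    g_map[logical] = spare_block;        /* commit: switch to the new copy */
    (void)flash_erase(old_block);        /* only now is the old copy erased */
    return true;
}
```

In a real flash manager the mapping update itself must also be persisted atomically, but the ordering shown – new data written and verified before the old copy is erased – is the essential safeguard.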

Bad block management is another important safeguard, preventing data from being written to memory blocks that have failed. Software can check for the bad blocks marked at the factory, as is typical with NAND, and avoid writing to those blocks from the beginning. When blocks go bad over time, they can be identified and managed so that they are no longer used. Finally, as the end of media life nears, good memory management software can implement a graceful end-of-life strategy – for example, placing the entire flash unit in a read-only state once the number of block failures exceeds a predefined threshold – so that further data loss is avoided.
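For illustration, a simplified factory bad-block scan might look like the sketch below. Raw NAND parts commonly flag factory-bad blocks with a non-0xFF marker byte in the spare (OOB) area of a block’s first page; the nand_read_oob() helper and the table layout here are assumptions made for the example, not any particular driver’s interface:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS        1024u
#define BAD_BLOCK_MARKER  0u      /* byte offset of the marker in the OOB area */

/* Hypothetical driver call: read the spare (OOB) bytes of a block's first page. */
bool nand_read_oob(uint32_t block, uint8_t *oob, size_t len);

static bool g_bad_block[NUM_BLOCKS];

/* Scan every block once at first start-up and remember which ones are bad. */
void scan_factory_bad_blocks(void)
{
    for (uint32_t b = 0; b < NUM_BLOCKS; b++) {
        uint8_t oob[16];

        if (!nand_read_oob(b, oob, sizeof(oob)) ||
            oob[BAD_BLOCK_MARKER] != 0xFF) {
            g_bad_block[b] = true;        /* never write to this block */
        }
    }
}

/* Block allocation must always consult the table before handing out a block. */
bool block_is_usable(uint32_t b)
{
    return (b < NUM_BLOCKS) && !g_bad_block[b];
}
```

Blocks that fail later in life are added to the same table, typically when an erase or program operation reports an error.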

Improving effective performance

Flash hardware needs to be able to handle the growing performance requirements of contemporary data-intensive devices. Two ways media management software can improve performance are through background compaction and multithreading.

Compaction reclaims space by identifying blocks that have obsolete data that can be erased, copying any valid data to a new location, and then erasing the blocks to make them available for reuse. Such compaction increases the amount of usable space on the media and improves write performance. Compaction may also help to defragment non-contiguous data for improved performance on read operations. The space recovery is particularly valuable for the more costly NOR memory, while the defragmentation benefits the slower-reading NAND.
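A highly simplified compaction pass might look like the following sketch: pick the block with the most obsolete pages, relocate its remaining valid pages to a spare block, then erase it for reuse. The page-level helpers are hypothetical stand-ins for a flash manager’s internals:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGES_PER_BLOCK 64u

/* Hypothetical helpers maintained by the flash manager. */
bool     page_is_valid(uint32_t block, uint32_t page);   /* false if obsolete */
bool     copy_page(uint32_t src_blk, uint32_t src_pg,
                   uint32_t dst_blk, uint32_t dst_pg);
bool     flash_erase(uint32_t block);
uint32_t obsolete_page_count(uint32_t block);

/* Pick the victim that frees the most space for the least copying. */
uint32_t pick_victim(uint32_t num_blocks)
{
    uint32_t best = 0, best_count = 0;

    for (uint32_t b = 0; b < num_blocks; b++) {
        uint32_t n = obsolete_page_count(b);
        if (n > best_count) {
            best = b;
            best_count = n;
        }
    }
    return best;
}

/* Compact one block: relocate its valid pages, then erase it for reuse. */
bool compact_block(uint32_t victim, uint32_t spare)
{
    uint32_t dst = 0;

    for (uint32_t pg = 0; pg < PAGES_PER_BLOCK; pg++) {
        if (page_is_valid(victim, pg)) {
            if (!copy_page(victim, pg, spare, dst++))
                return false;     /* victim left untouched on failure */
        }
    }
    return flash_erase(victim);   /* reclaims every obsolete page at once */
}
```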

Compaction is best performed in the background during idle time; otherwise it can interfere with critical operations and degrade performance. This is where a multithreading system becomes important. By allowing high-priority read requests to interrupt low-priority maintenance operations, a multithreading system can reduce read latency by orders of magnitude compared to a single-threaded solution.

Maximizing media lifespan

Flash lifetime is an important part of data reliability. When some blocks of memory contain fixed content – such as binary code – the remaining blocks experience increased demand for erase and write operations, which leads to earlier flash failure. Wear leveling algorithms can prevent overuse of individual memory blocks and avoid a “stalemate” scenario in which a small region of memory becomes locked in a pattern of repeated writing and compaction. Wear leveling software can monitor block usage to identify high-use areas and low-use areas containing static data, then swap the static data into the high-use areas. The software can also balance write operations across all available blocks by choosing the optimal location for each write operation.
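A bare-bones version of such a policy can be sketched as follows: new writes go to the least-worn free block, and when the spread between the most-worn and least-worn blocks grows too large, static data is migrated out of the cold block so that block can rejoin the rotation. The counters and helpers below are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS      1024u
#define WEAR_THRESHOLD  100u   /* maximum allowed spread in erase counts */

static uint32_t g_erase_count[NUM_BLOCKS];   /* persisted by the flash manager */

/* Hypothetical helpers. */
bool block_is_free(uint32_t block);
void migrate_block(uint32_t from, uint32_t to);   /* moves static data */

/* Dynamic wear leveling: always write to the least-worn free block. */
uint32_t pick_write_block(void)
{
    uint32_t best = UINT32_MAX, best_count = UINT32_MAX;

    for (uint32_t b = 0; b < NUM_BLOCKS; b++) {
        if (block_is_free(b) && g_erase_count[b] < best_count) {
            best = b;
            best_count = g_erase_count[b];
        }
    }
    return best;   /* UINT32_MAX means no free block was found */
}

/* Static wear leveling: when the wear spread exceeds the threshold, move the
 * cold (static) data into the most-worn block so the barely-used block can
 * start absorbing new writes. */
void rebalance_static_data(void)
{
    uint32_t hot = 0, cold = 0;

    for (uint32_t b = 1; b < NUM_BLOCKS; b++) {
        if (g_erase_count[b] > g_erase_count[hot])  hot = b;
        if (g_erase_count[b] < g_erase_count[cold]) cold = b;
    }
    if (g_erase_count[hot] - g_erase_count[cold] > WEAR_THRESHOLD)
        migrate_block(cold, hot);
}
```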

A premium quality flash manager – like Tuxera FlashFX Tera® – successfully balances the above trifecta of features, safeguarding the memory from data loss while boosting performance and achieving the longest lifetime possible.

Final thoughts

With the amount of data being entrusted to flash memory, embedded device manufacturers must make sure they select the optimal kind of flash for their needs. Getting it wrong means degraded performance and a shorter-than-expected flash lifetime. The decision between NAND and NOR memory may be more complex than it first appears – and will ultimately depend on both the technical and pricing requirements of the device being built.

However, by carefully considering the differing capabilities of NAND and NOR along with the specific needs of the use case, more effective flash memory decisions can be made.

Regardless of the type or combination of flash that’s finally chosen, it’s prudent to include memory management software (like FlashFX Tera) to prevent data loss while improving the performance and maximizing the lifespan of the memory.


Embedded device manufacturers – let’s keep your flash memory optimally managed and reliable.

CONTACT US

References

[1] “Flash memory market revenues worldwide from 2013 to 2021,” Statista.

[2] “Semiconductor market size worldwide from 1987 to 2020,” WSTS, Semiconductor Market Forecast Fall 2019.

[3] “The Digitization of the World From Edge to Core,” IDC and Seagate.


Ensuring efficiency in embedded IoT with the help of file systems

The right file systems can help safeguard IoT data and save the developer precious project time.

Today’s embedded hardware is highly advanced. As many with experience developing IoT solutions know, progress has been rapid, with flash memory and RAM allocations offered by the latest 32-bit microcontrollers at a level unheard of in the embedded space just a short time ago – and with CPUs often clocked at speeds once reserved for desktop PCs.

As a result, it may be easy to assume that an efficient use of CPU resources (including memory and clock cycles) is at most a minor concern.

But while embedded hardware has been getting increasingly advanced, the device software needs to be able to meet the growing demands of the hardware and use cases too. It is therefore important – perhaps now more than ever – for developers to ensure that their software runs with the highest efficiency, and that their own project time is spent in an efficient manner.

One of the ways to achieve that is with the help of file systems.

Defining and understanding efficiency in IoT

Let’s take a step back. In order to understand how a file system can help boost IoT efficiency, it is first critical to examine what IoT efficiency really means.

There is more than one way to look at efficiency in IoT. For years, efficiency has been a major area of interest for developers searching for kernel, file system, and other software modules. Often, the rationale used to justify the adoption of such modules is that with the right software solutions, developers can save time and resources by focusing on writing application code instead of wading through massive amounts of infrastructure code. Overall, this perspective treats efficiency mainly as a question of using resources more optimally. The end result? Ideally, a more efficient project.

However, it would be a mistake to look at IoT efficiency as purely a question of resource usage. That is just one aspect of file system efficiency. Other, less obvious aspects play a role, such as:

  • How intuitive an embedded software module is to use and integrate into a project.
  • How well-documented the interface of such a module is. With poor documentation, a developer can lose significant time resolving an issue that could simply be the case of misused functions.
  • How reliable the module is. Even code that is documented with clarity, precision, and in a comprehensive way can cause costly development time loss if that code functions unreliably. Safeguarding the data integrity of the IoT device is vital to ensuring resource efficiency, by avoiding costly data failure and device malfunction. When evaluating software solutions, developers should therefore seek documented proof of reliability, such as certifications or test results.

How the file system fits into the efficiency puzzle: reliability & customizability

We mentioned file systems earlier, but what role do they play in the IoT developer’s toolbox?

File systems can be a powerful enabler of increased efficiency. One of the ways they can do that is by helping to enable both reliability and customizability on the device – thanks to smart design features that provide a power fail-safe environment for the data, along with extensive configuration options for the developer.

While some real-time operating systems (RTOS) provide a FAT-like file system (which includes code to perform I/O with a standard media format, including folders and files), generally this isn’t very customizable, and it rarely protects from data loss during a power failure.

Reliability

A file system like Tuxera’s Reliance Edge™ is a better choice for reliability, providing a power fail-safe environment through the use of transaction points – saving development time as well as resources. The file system’s transactional architecture ensures complete metadata and file data integrity. Dynamic Transaction Point™ technology gives developers complete compile-time and run-time control. That reliability is proven with accompanying source code for a variety of different tests, allowing application developers to confirm the file system is running reliably in a particular development environment.

Customizability

In addition to reliability, Reliance Edge provides customization of storage options, further saving the developer precious project time. In the minimum use case, referred to as “File System Essentials”, no folders or even file names are used. Data is stored into numbered inodes. The count of these locations is determined at compile time, but the sizes are not predetermined. One “file” can contain more data than the others, and the media is only full when the total size of the “files” reaches the threshold. Files can also be truncated, read and written to freely. What all of those features achieve is ultimate control for the developer, and a file system that can fit a given use case precisely as needed – establishing greater resource efficiency.
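As a purely illustrative sketch of that style of storage – the function names below are invented for the example and do not reproduce the actual Reliance Edge interface – an application might address its fixed, numbered files by index and commit changes at a transaction point:

```c
#include <stdint.h>

/* Hypothetical API in the spirit of a minimal "essentials" configuration;
 * a real file system such as Reliance Edge uses different names and
 * signatures for the equivalent operations. */
int fse_mount(void);
int fse_write(uint32_t file_index, uint64_t offset,
              const void *buf, uint32_t len);
int fse_read(uint32_t file_index, uint64_t offset,
             void *buf, uint32_t len);
int fse_transact(void);          /* commit a transaction point */

#define LOG_FILE     0u          /* file indices fixed at compile time */
#define CONFIG_FILE  1u

int store_sample(const void *sample, uint32_t len, uint64_t log_offset)
{
    /* Append the sample to the numbered "log" file... */
    if (fse_write(LOG_FILE, log_offset, sample, len) < 0)
        return -1;

    /* ...and commit. Until this call completes, the previous transaction
     * point remains the valid on-media state, so an interruption can lose
     * only the uncommitted sample, never the data already stored. */
    return fse_transact();
}
```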

Whitepaper: Maximizing efficiency in IoT projects

For software to effectively meet the needs of today’s advanced IoT hardware, developers would be wise to ensure the highest efficiency in both their software and their own work time. That includes developing a more holistic and comprehensive view of what IoT efficiency is, in addition to making use of tools like a real-time kernel or a transactional file system.

This joint whitepaper by Tuxera and Weston Embedded (formerly Micrium) examines those concepts, and focuses on IoT efficiency within off-the-shelf software components. You’ll learn about:

  • The importance of efficiency in modern IoT projects.
  • Why efficiency is not purely a matter of resource usage.
  • The effect of a real-time kernel on IoT resource efficiency.
  • How a transactional file system improves resource efficiency.
  • Resource efficiency examined through a real-world example of an IoT medical device.

Download the whitepaper here.

Final thoughts

In the realm of IoT, inefficient resource usage can be costly for developers and vendors. It can potentially lead to constrained product design, wasted development hours, as well as a slower time-to-market – not to mention additional production costs.

A key step in achieving improved efficiency is understanding each layer of the efficiency puzzle, including the importance of intuitive software modules, well-documented code, and software reliability.

A further important part of efficiency is the reliability of the device, as lost or corrupted data can be disastrous from an efficiency standpoint, setting a project back significantly in terms of both time-to-market and resources spent. File systems specifically designed for safeguarding IoT data can supercharge efficiency – allowing developers to focus fully on their next product innovation.


Find out more about our reliable and high-performing IoT solutions

CONTACT US

ACID: the acronym for bulletproofing embedded device updates

System updates that were once occasional are now regularly required for devices of all kinds. If your device is connected to the internet, frequent patches are needed to keep up with security, feature enhancements, and bug fixes. Each patch is a test of special code the designer wrote to install updates – and also a chance for failure.

One failure during a software update can leave a device completely useless – basically a brick. Software update failures can lead to the loss of critically important data. In addition, they can cause costly, irreparable damage to a company’s reputation. When these dangers are combined with the increasing frequency of updates and criticality of data, planning for a failsafe update process can be considered a requirement. This involves more testing of the updates themselves, including the impact of power failure during the update, and a better method for installing software changes.

There’s one guiding tool that will help embedded developers design safer updates, and it’s called ACID.

What is ACID?

The ACID acronym is a collection of properties designed to improve data fail-safety. ACID is used by database vendors, who know how important it is to perform an operation exactly once. A mortgage payment, for example, must be withdrawn from one account and deposited with the loan company in a single atomic operation.

According to ACID, a procedure should be atomic, consistent, isolated, and durable. Those same requirements can be used to help bulletproof a system update.

Atomicity – For an atomic update, transactions are all or nothing. This means that all of the data is written (or modified) at once, with no partial operations. This is the case whether the system update involves a single file, or multiple.

Consistency – Only valid data is saved. Consistency is applied to the individual files, mainly by requiring that they be valid – in other words, no download errors or media bit flips are allowed. In addition, changes should be validated after they are applied, perhaps by performing a cyclic redundancy check of the resulting files, reverting to the previous state upon any failure.

Isolation – Transactions that are isolated do not affect each other. In contrast, performing an update while the embedded system is still writing other data to the media increases the likelihood of failure.

Durability – It’s important to make sure that written data is not lost. Any update process should therefore have the ability to recover from interruption. Most of the time errors in updates are due to a power failure – but a media error, failed cyclic redundancy check, or other exception could be to blame.
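Putting the four properties together, here is a minimal sketch of an update installation using standard POSIX calls. The file_crc32() helper is assumed to exist (for instance built on zlib’s crc32()), and error handling is kept deliberately short:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed helper: CRC-32 of a file's contents (e.g. built on zlib's crc32()). */
uint32_t file_crc32(const char *path);

/*
 * Install a staged image as the live one:
 *  - Atomic:     rename() swaps old for new in a single step.
 *  - Consistent: the CRC is checked before the swap; a bad image is rejected.
 *  - Isolated:   nothing else touches the staged file while it is prepared.
 *  - Durable:    fsync() on the file and its directory persists the change.
 */
int install_update(const char *staged, const char *target,
                   const char *dir, uint32_t expected_crc)
{
    if (file_crc32(staged) != expected_crc)
        return -1;                   /* corrupt download: keep the old image */

    int fd = open(staged, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);              /* make sure the staged data is on media */
    close(fd);
    if (rc != 0)
        return -1;

    if (rename(staged, target) != 0) /* the all-or-nothing step */
        return -1;

    int dfd = open(dir, O_RDONLY);   /* persist the directory entry change */
    if (dfd >= 0) {
        fsync(dfd);
        close(dfd);
    }
    return 0;
}
```

A transactional file system extends the same kind of guarantee below the application, which is why an update of any size and scope can be handled safely, as the whitepaper below discusses.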

Whitepaper: Avoiding the brick

Security challenges and bug fixes are causing modern embedded devices to be updated more frequently than ever before, with more than 80% receiving updates yearly and 34% up to once a month. When a system update fails, the consequences range from trying again, all the way to a complete failure to boot – known as the brick.

The ACID acronym is a useful tool. But it’s just one part of a larger puzzle in keeping embedded devices safe from failed updates. The risk of complete failure can be mitigated with home-grown solutions, but these require extensive testing and rework, taking developers away from tasks that add customer value. With the help of a transactional file system, however, a system update of any size and scope can be handled seamlessly. A power failure or other interruption will never result in a brick, and no major changes to the application are required.

Download this whitepaper for information on how updates can go wrong, as well as the protections that are available in even minimal embedded environments.

Final thoughts

With updates to embedded devices occurring so frequently, it’s important to take steps in ensuring the updates don’t brick a device. The alternative? The risk of losing critical data, and the costly process of recovering the device. The ACID acronym is one tool for developers to use in achieving better fail-safety.


Let’s ensure your embedded device stays functional and fail-safe.

CONTACT US

Help! Why are my embedded devices failing?

When devices fail, the problems can be numerous. In conversations with the embedded OEMs we work with, a common issue affects almost every manufacturer – the cost of diagnosing and fixing the causes of field failure. This impacts time-to-market and pulls resources away from development, to be used instead for field diagnostics and post-mortem analysis. It is therefore vital to correctly understand what is causing a device to fail.

Pinpointing the causes of field failure

When delving into the causes, many of the core issues can be traced to an oversight regarding the importance of the file system. Issues such as critical data loss through power outages and badly worn-out flash can be effectively tackled through optimized file systems.

The associated cost in time and resources that results from device failure is especially relevant for the following reasons:

  1. The need for defect prevention during field operations: The high degree of reliability required for protecting critical data dictates that devices must not fail. Manufacturers are required to run extensive tests for a range of user scenarios to safeguard against edge cases. The analysis of test results can be a daunting task due to several interfaces between hardware, software, and application layers. Hence, there is a need to continuously track these interactions, so that any difference in the interactions can be quickly discovered and corrected.
  2. Vulnerability of the device to wear-related failures: As flash media continues to increase in density and complexity, it’s also becoming more vulnerable to wear-related failures. Shrinking lithography brings increased ECC requirements and a move to more bits per cell. With this comes a concern that what was written to the media may not in fact be what is read back. Most applications, however, assume that the data written through the file system will be completely accurate when read back. If the application does not fully validate the data it reads, errors in the data can cause the application to fail, hang, or simply misbehave. These complications call for checks that validate the data read against the data written (see the sketch after this list), so as to prevent device failures due to data corruption.
  3. Complexity of hardware and software integration: The complex nature of hardware and software integration within embedded devices makes finding the cause of failures a painstaking job, one that requires coordination between several hardware and software vendors. For this reason, it often takes OEMs days to investigate causes at the file system layer alone. Problems below that layer can entail more extensive testing and involve multiple vendors. Log messages can help manufacturers pinpoint the location of failure so that the correct vendor can be notified.
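As a minimal illustration of the read-back validation mentioned in point 2, the sketch below appends a CRC-32 to each record when it is written and recomputes it on every read. The record layout and helper names are invented for the example, not taken from any particular file system or application:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal CRC-32 (reflected, polynomial 0xEDB88320), computed bitwise. */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int)(crc & 1u));
    }
    return ~crc;
}

/* Store the payload with its CRC appended... */
int write_record(FILE *f, const uint8_t *payload, size_t len)
{
    uint32_t crc = crc32_calc(payload, len);

    if (fwrite(payload, 1, len, f) != len)         return -1;
    if (fwrite(&crc, sizeof(crc), 1, f) != 1)      return -1;
    return 0;
}

/* ...and refuse to trust the data unless the CRC still matches on read. */
int read_record(FILE *f, uint8_t *payload, size_t len)
{
    uint32_t stored;

    if (fread(payload, 1, len, f) != len)          return -1;
    if (fread(&stored, sizeof(stored), 1, f) != 1) return -1;

    return (crc32_calc(payload, len) == stored) ? 0 : -1;   /* -1: corrupted */
}
```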

As each of these three points indicates, identifying failures more quickly is the key to reducing the cost. The reason your embedded design is failing could be related to any or all of the above issues.

The ability to pinpoint the cause of failure is especially helpful when an OEM is:

  • Troubleshooting during the manufacturing and testing process to make sure that their devices do not fail for the given user scenarios.
  • Doing post-mortem analysis on parts returned from their customers, in order to understand the reasons for failures, and possible solutions.
  • Required to maintain a log of interactions between the various parts of the device, for future assistance with failure prevention or optimization.

Identifying the causes and costs of field failure is one thing, but what specific solutions can OEMs turn to in order to prevent these issues in the first place?

Fighting field failure with transactional file systems

As we mentioned, various file system solutions exist for safeguarding critical data. Basic FAT remains a simple and robust option with decent performance. Unfortunately, it isn’t able to provide the degree of data integrity that is sometimes needed in safety-critical industries like automotive, aerospace, and industrial.

It bears repeating that embedded device fail-safety can be achieved with the right file system. Transactional file systems like Tuxera’s own Reliance Edge™ and Reliance Nitro™ offer carefully engineered levels of reliability, control, and performance for data that is simply too vital to be lost or corrupted. One of the key features of a high-quality transactional file system is that it never overwrites live data, ensuring the previous version of that data remains safe and sound. This helps preserve user data in the event of power loss.

In the video below, I demonstrate the data preservation differences between a FAT driver and Reliance Edge. The embedded device in the video suffers a power outage that causes severe image corruption, rendering the device nearly useless. Scrambled screens are just one example of what device failure looks like in the field. Imagine if that corrupted image were your system or BIOS update, and that device were storing data critical to your use case – a disastrous situation.

Final thoughts

Embedded device failure can cause significant resource costs and time-to-market delays for the manufacturer. The first step in correctly finding and identifying the causes of those failures involves understanding the ways that the device can fail – such as the level of flash wear on the device, the complexity of hardware and software integration, and whether proper testing takes place.

The next step in tackling embedded device failures is understanding the role of a file system in securing your critical data, specifically against field failures and power loss. Selecting a quality-assured transactional file system is an effective way of doing that.

 

* This blog post was originally published in June 2020, and has been updated in November 2021.


Embedded device manufacturers – find out how file systems can help bulletproof your critical data.

CONTACT US

Automotive flash – what’s the real lifetime?

Flash memory is useful, but it doesn’t last forever. The process of writing and erasing data on the memory chip degrades the lifetime of the flash, harming reliability. While this is a threat to all embedded devices that store and handle important data, it’s a particular concern in sophisticated automotive systems. In a connected vehicle, poor flash lifetime can lead to corrupted data, potentially resulting in significant costs, critical system failures, and risks to end users.

As a result, it’s extremely important to calculate the flash lifetime of a vehicle’s storage design in order to avoid failures. Calculating the lifetime of the overall design can be a complicated task, though. It’s even trickier without an understanding of how the file system interacts with the media.

Let’s take a look at why that process is so complex, as well as some of the factors impacting the lifetime of an automotive system. Finally, we’ll look at what solutions are available for making the process easier.

NAND flash is the go-to memory choice for the vehicle

Automotive embedded devices now use flash media extensively in their design. Instead of single sensors that use NOR flash to store occasional updates, designs now use NAND flash to store both log files and more frequent updates. These devices are increasingly joining automotive storage clusters, with shared storage that must handle multiple use cases. These domain controllers can be tasked with additional logging and centralized analysis of gathered data before offloading to the cloud.

The NAND-based media has also evolved from earlier designs. It must offer a low-cost solution that still delivers the lifetime required by manufacturers. SLC NAND flash rated for 100,000 program/erase (P/E) cycles has been replaced by storage with multiple bits per cell and a much shorter lifetime.

Write amplification and fragmentation – a duo that’s damaging flash lifetime

Managing data from users, applications, and system updates is handled by the file system. Sensors generate a certain amount of data, but because of the operational limitations of flash media, this sensor data will end up getting multiplied by a small factor as it is stored on the media. This is known as write amplification, and it can shorten the lifetime of the flash.

Each file of data must also have metadata – the file name, permissions, and date and time information. How the gathered data is written can also add to write amplification, especially when data must be committed immediately for system integrity. Fragmentation also has a negative impact. A file system is by nature designed to store files contiguously across the storage media. However, as the file system ages and fewer contiguous regions remain, files start getting broken up into several pieces. When that happens, the file system must allocate additional metadata or expand allocation tables. Metadata writes flushed immediately to the media (for system integrity) result in a huge write amplification factor. A further penalty shows up in read performance, since read-ahead buffers become useless at every fragment boundary.

Based on these estimates and calculations, a designer must solve the equation for the required media lifetime. Ten years is not uncommon in the automotive industry. Recently, this estimate was shown to be insufficient for sensors and devices that were generating too many log files. Presuming these are also factored into the initial design, there is only one question left for the flash media vendor.

How many write cycles is enough?

When we talk to our customers, we find they have recently spoken with flash media vendors who give them a number like “1 million write cycles”. Without context, that figure just does not provide enough information about how the media will handle customer use cases. How can it be translated into an expected lifetime?

Each NAND flash media design has an expected maximum for program and erase cycles. This maximum is for each erase block or write page of the flash media. In a perfect world with no write amplification, the raw maximum data that can be written can be calculated by multiplying the maximum P/E cycles and the number (and size) of the erase blocks on the media.

The automotive community and SSD vendors both measure storage endurance in terabytes written (TBW). This is defined by the standards group JEDEC as the “number of terabytes that may be written to the media over its lifetime” (JEDEC standard JESD218). Factoring in the information mentioned above (the write amplification factor, or WAF, and the write volume workload), designers can define a requirement for the design. When problems occurred, one recent automaker’s design replaced an 8 GB eMMC with a 64 GB part, theoretically resulting in 8 times as much life.
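As a back-of-the-envelope illustration – the input values below are invented for the example, not taken from any particular part or vehicle – the lifetime implied by a part’s endurance, a measured write amplification factor, and an expected daily host workload can be computed like this:

```c
#include <stdio.h>

int main(void)
{
    /* Example inputs: replace these with datasheet values and measurements. */
    const double capacity_gb     = 64.0;    /* usable capacity               */
    const double pe_cycles       = 3000.0;  /* rated P/E cycles per block    */
    const double waf             = 4.0;     /* measured write amplification  */
    const double host_gb_per_day = 8.0;     /* expected host write workload  */

    /* Raw endurance of the part: capacity times rated P/E cycles. */
    const double endurance_gb = capacity_gb * pe_cycles;

    /* Media-level writes per day are the host writes multiplied by the WAF. */
    const double media_gb_per_day = host_gb_per_day * waf;

    const double lifetime_years = endurance_gb / (media_gb_per_day * 365.0);

    printf("Endurance:        %.0f GB (about %.0f TBW)\n",
           endurance_gb, endurance_gb / 1000.0);
    printf("Media writes/day: %.1f GB\n", media_gb_per_day);
    printf("Estimated life:   %.1f years\n", lifetime_years);
    return 0;
}
```

With these example numbers the part lasts roughly 16 years; double the workload or the WAF and the estimate is immediately halved, which is exactly the kind of sensitivity discussed below.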

This goes some distance toward repairing situations where the workload is greater than originally planned for. More frequent security updates (especially over-the-air) and increased logging of sensors and situations are a big factor in these unplanned or at least uncertain write increases. An even bigger factor can be the raw data from DVR and other cameras in modern vehicle designs.

I hate to be the bearer of bad tidings, but just increasing the capacity is not sufficient. Bigger is not necessarily better, when you factor in other changes that come with it.

Larger media, larger flash challenges

For larger media, flash calculations get a bit more difficult – for several reasons. Besides the additional cost for the new part, larger media tends to have larger write page sizes. File system block sizes also expand on larger media. All the previous write amplification factor calculations must be redone for the larger media, and WAF often grows considerably.

Teams which find they have additional space available may add to the amount of data logging they perform. As desktop hard drives and memory capacities have grown, desktop operating systems and applications have never gotten smaller or written less data.

Like your car, your smartphone, and your laptop, the performance of flash media can also drop as it approaches the maximum lifetime. One reason for this is the increase in correctable bit errors. While no data is lost, dealing with bit errors properly takes additional time and can reduce performance. The physical processes of programming the media can also take longer as NAND flash approaches the maximum program and erase cycles. The goal of every design is for that reduction to happen in year eight or nine, not year two.

Getting the numbers right with optimized flash testing

As we’ve seen so far, determining the lifetime of automotive flash is vitally important for ensuring a safe and secure car. But it’s also a complex challenge.

Tuxera’s team of experts can help your team navigate these waters. We have decades of experience in helping customers write their specs and requirements to ensure the right things are being measured – and measured in a consistent and repeatable way. We have a flash testing service that can use your workload and devices to measure the write amplification at both the design and media levels, and determine how all of that will affect the resulting lifetime. Tuxera system architects can examine your design to help your team understand the potential pitfalls.

Finally, our quality-assured automotive file systems like Tuxera Reliance™ Velocity and Tuxera FlashFX® Tera were written to work best on flash media, providing high reliability, fail-safety, the lowest write amplification – and the highest resistance to fragmentation possible.

Involving Tuxera in your next project will result in the best performance over the longest lifetime for your media – generating customer satisfaction instead of recalls.

Automotive Tier-1s and OEMs, learn more about how we can help you optimize your flash lifetime and performance.

TUXERA AUTOMOTIVE SOLUTIONS

The nuts & bolts of secure erase

When we remove critical data from our embedded devices using standard means, the data doesn’t fully disappear. Specific actions need to be taken to ensure sensitive data is not just hanging around on the device, waiting to be plundered. One of the best ways to protect that data is by properly erasing it when it is no longer needed.

Understanding NAND-based secure data removal

We spoke recently about this vital process of securely removing data, specifically from NAND-based media. Secure erase isn’t the only way to protect your embedded device data, but it is one of the most effective. Encryption and encoding are also good tools for securing data at rest. However, when a design falls into the wrong hands, these methods are insufficient to protect that data forever – it is better to have the data removed when it is no longer required. There is no lasting security through obscurity.

The truth is that securely erasing NAND-based media is not like erasing the media of the past. It is simply a more challenging process than with older magnetic designs, which we described in our earlier blog post for comparison. Ultimately, removing secure data is a process of connected steps, and the best designs involve information from the flash media, file system, and application vendors.

Tuxera has represented each of those roles. During the conference, I touched briefly on our software controller for raw flash media, FlashFX Tera. In this blog post I’d like to describe in more detail some of the steps taken to securely remove data at that level, including the specific tools and methods involved.

Tidying things up with garbage collection

We start by dealing with copies of secure data on the device. Since NAND cannot be modified in place, these copies are left over from copy-on-write commands, wear leveling, and other performance shortcuts. These obsolete copies are removed through a process known as compaction or garbage collection, occurring after the file system notifies the flash media controller that the data in question is no longer in use. FlashFX Tera has an API to request a compaction, similar to the Sanitize API provided by eMMC and UFS media.

From an application level, the process would look like this. Secure data is created on the media. Then, when that data needs to be modified, the application can “overwrite” that existing data. Although the NAND media will not physically overwrite the data pages, it will automatically mark the previous page as ready to be erased. The API can then be called to compact the erase block, resulting in only a single copy of the secure data.

Discards and trims to finish the job

For proper protection, the secure data file then needs to be completely removed. This can be done by overwriting it all first, but the better method is to use the file system discard or trim command instead. Following that, a normal compaction (or an immediate one triggered by the API) will remove the last remnant of that secure data from the media. At last, our data is fully erased from the device – it’s safe.
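On a Linux-based design with managed media, for example, the trimming step can be requested explicitly. The sketch below uses the standard FITRIM ioctl to ask a mounted file system to pass discards for all of its free space down to the media once the secure file has been unlinked; a raw-flash design managed by something like FlashFX Tera would use its own compaction request instead:

```c
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>      /* FITRIM, struct fstrim_range */
#include <sys/ioctl.h>
#include <unistd.h>

int trim_free_space(const char *mount_point)
{
    struct fstrim_range range = {
        .start  = 0,
        .len    = ULLONG_MAX,    /* trim the whole file system */
        .minlen = 0,
    };

    int fd = open(mount_point, O_RDONLY);
    if (fd < 0)
        return -1;

    /* After the secure file has been unlinked, this tells the file system to
     * discard every free extent so the flash layer can erase the old pages. */
    int rc = ioctl(fd, FITRIM, &range);
    close(fd);
    return rc;
}
```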

Whitepaper: Keep device data safe with secure erase

Keeping your embedded device data safe and secure is a detailed topic. For more information, download our whitepaper. Read the abstract below:

Removing data securely from flash media is more challenging than older magnetic designs. The software and firmware must work in unison to provide secure solutions that are increasingly in demand. In this paper, we detail the secure interface from the application to the media and point out the possible pitfalls along the way.

 


Let’s talk more about safeguarding your data on embedded devices.

CONTACT US

What is secure erase?

Embedded devices today store a wide variety of data. You would be forgiven for thinking that when data is removed from such a device, it’s completely gone. Unfortunately, that isn’t always the case. While sometimes data is inherently secure through techniques like encryption or encoding, not all device designs provide secure means of data removal.

Just how hard is it to remove embedded device data?

For electronic media, the data must be both erased and overwritten – only then is the data securely deleted from the drive. Use cases that demand such a thorough level of data erasure include temporary storage of secure data (for example, a web browser cache), changing users on a shared device, or selling the device. Another use case could be the event of remote theft – a “kill pill” to remove secure data before hackers gain access.

Data that’s removed from such devices can sometimes be recovered – a potentially significant security risk. Suboptimal data removal can lead to sensitive data falling into the wrong hands, and may even reduce the lifetime of the device itself. For these reasons, methods like secure erase are used to make sure data that needs to be disposed of gets properly removed, without the possibility of recovery.

Overwriting data for proper security

Secure erase is a data sanitization method for completely erasing data off of a device. More specifically, it’s a group of firmware commands that together function as an interface for secure data removal. Importantly, secure erase does not simply move data to a different location on the device. Instead, sanitization methods like secure erase aim to permanently wipe data from the device, preventing recoverability.

Secure erase works by overwriting the data at its location with new, meaningless data (typically fixed or random patterns of ones and zeros). Once this overwriting has been accomplished, software-based data recovery methods (like file or partition recovery programs) won’t be able to recover the data. Furthermore, because secure erase is a command baked into the firmware, any missed write operations are checked – ensuring a more complete and watertight overwriting process.

The above overwriting process is also affected by the type of media in the device. NAND media, for example, is particularly tricky. It adds layers of difficulty to secure erasure because data cannot be overwritten in place – updated data is written to a new location first, a technique called “copy-on-write” – so stale copies of the data can linger on the media.

While not everyone may agree on the single best method of data sanitization, secure erase is widely regarded as a reliable approach. It remains a good choice when a permanent solution is needed for data removal on embedded devices.

Secure erase and NAND at Embedded World 2021

Secure erase is a topic with a lot of detail – far too much for a single blog post. Join me this week at Embedded World, where I’ll be giving the following talk on secure erase on NAND media:

Title of talk: “Keeping device data safe with secure erase”

Session: 4.8 Safety & Security: Security Hardware

Date/Time: Wednesday, March 3, 2:00–2:30 PM (CET)

Abstract: Removing data securely from flash media is more challenging than older magnetic designs. The software and firmware must work in unison to provide secure solutions that are increasingly in demand. In this talk, we detail the secure interface from the application to the media and point out the possible pitfalls along the way.

After my talk, I’ll be online to answer your questions and talk about secure erase and NAND media.

Final thoughts

It is important to remember that for proper data security, how you get rid of the data is just as important as how you protect it while it’s kept on the device. It is not enough to store data securely and reliably – it must also be disposed of with the correct methods. Optimal data security is a process that encompasses the design of the entire embedded system – from the chosen media through the application itself.

 


Let’s talk about maximizing the security of your embedded device data.

CONTACT US

Comparing protocols for USB devices – which one's more significant?

Universal Serial Bus (USB) is a commonly used interface for transferring data between devices. To achieve that communication, USB-connected devices rely on storage protocols. Embedded devices connected through USB have two different ways of revealing the stored contents of their media: USB Mass Storage (UMS) and the Media Transfer Protocol (MTP). Both protocols allow files to be copied to and from host computers and dragged and dropped through GUI interfaces, and both give the host control over the media. Put simply, the protocol is what allows devices to communicate by way of USB. What are the benefits and disadvantages of these two protocols?

A tale of two protocols

Before I go any further, a little bit of background. Back in 2013, I spoke about comparing two protocols for USB devices. Examples of these designs, sporting operating systems and storage, ran the gamut from MP3 players, to smart phones, to handheld scanners and peripherals. Bluetooth was around, but uncommon for embedded designs and data transfer. As the years have rolled by, USB technology has become faster and more powerful. How do the protocols used measure up now, seven years later? Let’s first look at a technical description, and then an update for where we are today.

USB Mass Storage class, also called UMS, is a protocol that exposes the media block device to the connected host computer. In order for the host to use this storage, a common file system format is required. In other words, if a device with UMS has media formatted with exFAT, the host computer needs an exFAT driver to access those files. Usually the host driver has exclusive access to the media; this is the situation on Windows, where the user sees an “eject” option that should be used before removing the device. A USB stick and an SD card both operate in this same fashion.

The Media Transfer Protocol, also called MTP, is another option. This is an extension of the Picture Transfer Protocol originally developed for cameras. A host computer with an MTP driver is able to communicate with the device at a higher level than the block storage, using packet commands to access and copy files. This removes the requirement that a host computer have a file system driver to match the media format. The device no longer has exclusive access, and no “eject” requirement exists for media connected with this protocol.

Seven years later – is interoperability the deciding factor?

Time has shown connectivity and interoperability to be highly salient factors for USB protocols. According to recent research, more host computers today are running Linux than seven years ago, with both Linux and Mac workstations being more common in business environments. Each of these has stable MTP drivers. Similarly, Android smartphones use ext4 as their default file system, and these connect seamlessly over MTP with a Windows 10 desktop – which has no file system driver for ext4. So if Google moves Android to another default Linux file system in the future, no changes will be required to host-side MTP drivers.

Removing the “eject” requirement is another benefit of this difference. As long as the device has its own power, no loss of data is expected when an idle device is unplugged – data cached on the device end will be written as normal. Device manufacturers are free to choose a reliable file system to match their design goals, instead of being limited to a standard that must be matched on all host computers.

Final thoughts

While both MTP and UMS have their uses, the improved connectivity and file system interoperability afforded by MTP is a considerable advantage. And when it comes to preventing the corruption of valuable data in USB connected devices, that advantage becomes even more notable.

If you have any questions about USB or file systems, don’t hesitate to reach out to us. Our global team of file system experts works with a specific focus on customer success.

 


Let’s talk more about keeping file systems data secure.

CONTACT US

Wrapping up GENIVI AMM '20: Q&A and whitepaper

Thank you to Mike Nunnery and the GENIVI team for making this All Member Meeting a success at the end of October. While our booth didn’t have many visitors, our talk on Solving Data Storage Challenges in the Automotive Projects of Tomorrow was well attended. In fact, the video is available on YouTube.

Q&A

Below are some of the questions we received, along with slightly more detailed answers.

How should I estimate the lifetime for a memory component?

This starts with a conversation with the memory vendor(s). They provide information in datasheets, and connecting their numbers to block sizes and then to actual lifetime is important math. The vendor can also help you understand the controller firmware used in managed media such as eMMC, UFS, and NVMe. Once your team can translate these values, you can examine write amplification and start making more comprehensive estimates.

Once the design team has prototype designs available, you can start testing and simulating use cases. Be sure to factor in how the design handles under extreme conditions, where bit errors (and subsequent cleanup) can be more frequent. We provide a flash testing service that can measure this level of detail.

As you look ahead towards 2021, where do you see the opportunities and challenges for the data storage market?

Security can provide both challenges and opportunities.

Just as we are working in locations we didn’t expect at the beginning of the year, we are also using devices in more locations than we previously expected. The challenge is figuring out the broad range of expected and unexpected conditions a design must handle. Availability during an over-the-air update is one example we spoke of (the FCA Uconnect bug), and I expect that with autonomous driving, many more will occur.

An opportunity here is to respond to security issues with a consistent message. How to handle secure erase – from the firmware through the file system and into the application – is just one example. The flash media takes time to erase; but customers may not want the system to be unavailable while it does this work. How the system is designed to handle a power interruption and restoration during this work is also important.

With all the functional safety requirements, is this not handled automatically by the Flash memory components themselves and the associated operating system?

Broadly speaking, there are no certified memory components available. The biggest hurdle is determinism, because NAND media write times can vary with prior conditions and even the age of the part. While bit error correction, properly handled, should always return a complete block of data, we have also seen situations where that data is valid, but stale – part of a previous system state. Media redundancy could be an option to solve this sort of problem.

On properly functioning media, the next step is the file system. This can be built into the operating system or added later. Great strides are being made towards certification at this level, with solid designs and traceable testing. A related trend is tracking the process through methods like Automotive SPICE.

As you are looking at data storage challenges in automotive, particularly this year and as we move into next year, how would you prioritize these challenges, in terms of significance and what we need to be focusing on over the next 6-12 months?

During this session, we spoke about situations where programs and applications were writing more to the media than originally expected – Tesla and the desktop Spotify bug. I think it’s crucial to guard against the unknown future. This could perhaps be done by the hypervisor, limiting what a given guest OS is able to write, or at the level of the operating system (especially Android). File systems could play their part by utilizing a quota system or the like.

To an extent, this is a lifetime challenge, and I would prioritize this detail above all. As my colleague writes, cars aren’t cell phones, and most consumers won’t stand for a cell-phone-like lifetime from any vehicle.

Other challenges include new technologies, security, and multi-channel multithreaded access from many applications and devices. I think some progress will be made on these in the next year, but they aren’t quite as crucial as preventing future failures from unknown applications.

Do you have any suggestions for finding the root cause of corrupted data?

When it comes to automotive file system challenges, data corruption is something we’ve had plenty of experience in tackling. I’ve written a whitepaper on this very issue, in which I talk about some of the ways we at Tuxera approach data corruption, including (but not limited to) issues in automotive systems.

Read the whitepaper here.

Final thoughts

Thank you once again for the interesting questions. Hopefully my answers have helped shed new light onto automotive file systems. And please enjoy the whitepaper.


Let us help you solve your data storage challenges.

CONTACT US


The hidden costs of automotive storage decisions: flash media wear and MCU failures in 159,000 Tesla cars

Flash lifetime can’t be ignored. Late last year, Tesla had problems with the flash storage memory in its connected cars. The company was in the news again this month, when the Office of Defects Investigations released their report summarizing the MCU failures that affect approximately 159 thousand vehicles. This is interesting as much for the report as for the reaction among embedded developers, some of whom still don't understand that flash media has a limited lifetime.

Examining the metrics, visualizing the costs

Let’s take a closer look at the report itself. The Office of Defects Investigations report noted that there were 2,936 complaints, with thankfully no injuries, fatalities, crashes, or fires. Another 12,523 warranty and non-warranty claims for MCU replacements are also factored into this report. It is good that none of these MCU failures are directly related to safety. The closest problems related to total failure seem to be loss of HVAC controls (for defogging windows) and the Advanced Driver Assistance Support (ADAS).

What I found interesting about the report are Tesla's internal metrics for measuring the flash media wear in the vehicle. Each erase block on the Hynix media is rated for 3000 Program/Erase (P/E) cycles in total. Tesla described nominal daily P/E cycle use as a rate of 0.7 per block, and estimated for that rate that 11-12 years would be required to accumulate a total of 3000 P/E cycles per block. For the 8 GB media, that would work out to 5.6 GB written to the media per day. The file system writes considerably less than that, of course, due to write amplification.
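Those figures are easy to sanity-check. The short calculation below uses only the numbers quoted from the report and reproduces both the 11-12 year estimate and the 5.6 GB/day media write rate:

```c
#include <stdio.h>

int main(void)
{
    const double pe_rating        = 3000.0;  /* rated P/E cycles per block */
    const double daily_pe_per_blk = 0.7;     /* Tesla's nominal daily rate */
    const double capacity_gb      = 8.0;     /* original eMMC size         */

    /* Every block averaging 0.7 P/E cycles per day implies 0.7 x capacity
     * of media-level writes per day. */
    const double media_gb_per_day = daily_pe_per_blk * capacity_gb;

    /* Years until the rated P/E cycles are used up at that rate. */
    const double lifetime_years = pe_rating / daily_pe_per_blk / 365.0;

    printf("Media writes per day: %.1f GB\n", media_gb_per_day);   /* 5.6  */
    printf("Estimated lifetime:   %.1f years\n", lifetime_years);  /* 11.7 */
    return 0;
}
```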

Also highlighted were higher-rate users, the 95th percentile of daily use. Tesla expected their daily P/E cycle rate to be as high as 1.5, at which point it would take only 5-6 years to accumulate the maximum P/E cycles.

The rates of 0.7 and 1.5 are dependent on the chosen media and available space, of course. As of May 2020, Tesla remanufacturing began producing spare parts incorporating a Micron eMMC with 64 GB of storage. This should also bring those rates down by a factor of 8 – assuming the Micron part has a similar P/E cycle lifetime.

Importantly, all the complaints and claims for MCU replacements represent just a small percentage of the 159 thousand Model S and Model X vehicles. Tesla did indicate that MCU failures are likely to continue to occur in subject vehicles until 100% of units have failed. An expensive replacement of either the media or the entire MCU board is the only alternative. Tesla has admitted as much, recently informing its customers by email that the eMMC in the affected vehicles is warranted for 8 years or 160,000 km, and that the 8 GB part will be replaced with a 64 GB one. Tesla has also agreed to reimburse earlier repairs. All in all, a costly outcome.

Can patching up the damage be enough?

Tesla has not been idle. Through OTA updates, Tesla has already released 6 firmware patches to help deal with the problem. These patches first targeted the previously mentioned loss of HVAC controls and the ADAS problems. Overall, the patches have ranged from removing high-frequency system log messages and suppressing informational log messages when no user was in the vehicle, to increasing the journal commit interval and reducing the settings-database write frequency.

Unfortunately, these firmware patches are unlikely to be enough. Once P/E cycles are used, they cannot be regained without replacing the media. A patch late in the cycle will, at best, add only a year of life to the vehicle.

It is also unlikely that future automotive designs will be able to solve problems with reduced logging. If anything, data recording is expected to grow over the next decade. Hypervisors and domain controllers collect data from multiple sensors, storing to common media devices. Another larger source of growth will be autonomous vehicles, with multiple video streams and even more sensor data. These factors highlight the continuing importance of edge storage in the vehicle, as well as proper flash memory management.

Understand the storage stack – before things go wrong

So where should Tesla go from here to deal with all this? At Tuxera, we have encountered issues like Tesla’s numerous times. Our recommendation remains the same as when we wrote about this topic a year ago: a complete and correct understanding of the memory devices (and their limitations) and of the other software components related to data management (the file system and flash management) is key to designing robust systems. This is the approach that guides our continued collaboration with customers and partners on activities such as workload analysis, lifetime estimation, write amplification measurement, and ultimately the selection of that data management software.

Final thoughts

As we have mentioned before, we’re fans of Tesla. But the Office of Defects Investigations report paints a picture of the potential damage that can result from an incomplete understanding of a vehicle’s storage stack. With proper flash memory testing methods unique to the needs of a given use case, flash memory failures can more effectively be prevented.


We work closely with OEMs and Tier 1s to identify flash needs specific to each unique use case. Let’s solve your automotive data storage challenges.

CONTACT US