Wrapping up GENIVI AMM '20: Q&A and whitepaper

Thank you to Mike Nunnery and the GENIVI team for making this All Member Meeting a success at the end of October. While our booth didn’t have many attendees, our talk on Solving Data Storage Challenges in the Automotive Projects of Tomorrow was well attended. In fact, the video is available on YouTube.


Below are some of the questions we received, along with slightly more detailed answers.

How should I estimate the lifetime for a memory component?

This starts with a conversation with the memory vendor(s). They provide information in datasheets, and connecting their numbers to block sizes and then to actual lifetime is important math. The vendor can also help you understand any firmware they have chosen for eMMC, UFS, NVMe and the like. Once your team can translate these values, you can examine write amplification and start making more comprehensive estimates.
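As a rough illustration of that math, here is a minimal sketch in Python. The function name, its parameters, and the example numbers are assumptions for illustration, not values from any datasheet. The write amplification factor in particular has to be measured for your own workload, and the calculation assumes ideal wear leveling.

```python
def estimate_lifetime_years(capacity_gb, pe_cycles, host_gb_per_day, write_amplification):
    """Rough media lifetime in years, assuming ideal wear leveling."""
    if host_gb_per_day <= 0 or write_amplification < 1:
        raise ValueError("host_gb_per_day must be > 0 and write_amplification >= 1")
    # Total data the media can absorb before every block reaches its rated P/E cycles.
    total_media_writes_gb = capacity_gb * pe_cycles
    # Each GB written by the host costs write_amplification GB on the media.
    media_gb_per_day = host_gb_per_day * write_amplification
    return total_media_writes_gb / media_gb_per_day / 365.0


# Hypothetical part: 8 GB, 3000 P/E cycles, 2 GB/day from the host, amplification of 3.
print(round(estimate_lifetime_years(8, 3000, 2.0, 3.0), 1), "years")
```

Halving the write amplification in this model doubles the estimate, which is why measuring it early matters.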

Once the design team has prototype designs available, you can start testing and simulating use cases. Be sure to factor in how the design handles under extreme conditions, where bit errors (and subsequent cleanup) can be more frequent. We provide a flash testing service that can measure this level of detail.

As you look ahead towards 2021, where do you see the opportunities and challenges for the data storage market?

Security can provide both challenges and opportunities.

Just as we are working in locations we didn’t expect at the beginning of the year, we are also using devices in more locations than we previously expected. The challenge is figuring out the broad range of expected and unexpected options for a design. Availability for an over-the-air update is one that we spoke of (the FCA Uconnect bug), and I expect with autonomous driving situations, many more will occur.

An opportunity here is to respond to security issues with a consistent message. How to handle secure erase – from the firmware through the file system and into the application – is just one example. The flash media takes time to erase; but customers may not want the system to be unavailable while it does this work. How the system is designed to handle a power interruption and restoration during this work is also important.

With all the functional safety requirements, is this not handled automatically by the Flash memory components themselves and the associated operating system?

Broadly speaking, there are no certified memory components available. The biggest hurdle is determinism, because NAND media write times can vary with prior conditions and even the age of the part. While bit error correction, properly handled, should always return a complete block of data, we have also seen situations where that data is valid, but stale – part of a previous system state. Media redundancy could be an option to solve this sort of problem.

On properly functioning media, the next step is the file system. This can be built into the operating system or added later. Great strides are being made towards certification at this level, with solid designs and traceable testing. A related trend is tracking the process through methods like Automotive SPICE.

As you are looking at data storage challenges in automotive, particularly this year and as we move into next year, how would you prioritize these challenges, in terms of significance and what we need to be focusing on over the next 6-12 months?

During this session, we spoke about situations where programs and applications were writing more to the media than originally expected – Tesla and the desktop Spotify bug. I think it’s crucial to guard against the unknown future. This could perhaps be done by the hypervisor, limiting what a given guest OS is able to write, or at the level of the operating system (especially Android). File systems could play their part by utilizing a quota system or the like.
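To make the quota idea concrete, here is a minimal sketch of a per-application daily write budget. The class and its names are hypothetical and do not come from any particular hypervisor, operating system, or file system. A real implementation would live in one of those layers rather than in application code.

```python
class DailyWriteBudget:
    """Tracks bytes written per day and refuses writes past a fixed budget."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self.current_day = None

    def try_write(self, day, num_bytes):
        """Return True and charge the budget if the write fits, else False."""
        if day != self.current_day:
            # New day: the budget resets.
            self.current_day = day
            self.used_bytes = 0
        if self.used_bytes + num_bytes > self.budget_bytes:
            return False
        self.used_bytes += num_bytes
        return True


budget = DailyWriteBudget(budget_bytes=100)
print(budget.try_write(day=1, num_bytes=60))  # fits within the budget
print(budget.try_write(day=1, num_bytes=60))  # would exceed the budget
print(budget.try_write(day=2, num_bytes=60))  # new day, budget has reset
```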

To an extent, this is a lifetime challenge, and I would prioritize this detail above all. As my colleague writes, cars aren’t cell phones, and most consumers won’t stand for a cell-phone-like lifetime from any vehicle.

Other challenges include new technologies, security, and multi-channel multithreaded access from many applications and devices. I think some progress will be made on these in the next year, but they aren’t quite as crucial as preventing future failures from unknown applications.

Do you have any suggestions for finding the root cause of corrupted data?

When it comes to automotive file system challenges, data corruption is something we’ve had plenty of experience in tackling. I’ve written a whitepaper on this very issue, in which I talk about some of the ways we at Tuxera approach data corruption, including (but not limited to) issues in automotive systems.

Read the whitepaper here.

Final thoughts

Thank you once again for the interesting questions. Hopefully my answers have helped shed new light onto automotive file systems. And please enjoy the whitepaper.

Let us help you solve your data storage challenges.


The hidden costs of automotive storage decisions: flash media wear and MCU failures in 159,000 Tesla cars

Flash lifetime can’t be ignored. Late last year, Tesla had problems with the flash storage memory in its connected cars. The company was in the news again this month, when the Office of Defects Investigation (ODI) released its report summarizing the MCU failures that affect approximately 159,000 vehicles. This is interesting as much for the report as for the reaction among embedded developers, some of whom still don't understand that flash media has a limited lifetime.

Examining the metrics, visualizing the costs

Let’s take a closer look at the report itself. The Office of Defects Investigation report noted that there were 2,936 complaints, with thankfully no injuries, fatalities, crashes, or fires. Another 12,523 warranty and non-warranty claims for MCU replacements are also factored into this report. It is good that none of these MCU failures are directly related to safety. The closest problems related to total failure seem to be loss of HVAC controls (for defogging windows) and the Advanced Driver Assistance Systems (ADAS).

What I found interesting about the report are Tesla's internal metrics for measuring the flash media wear in the vehicle. Each erase block on the Hynix media is rated for 3000 Program/Erase (P/E) cycles in total. Tesla described nominal daily P/E cycle use as a rate of 0.7 per block, and estimated for that rate that 11-12 years would be required to accumulate a total of 3000 P/E cycles per block. For the 8 GB media, that would work out to 5.6 GB written to the media per day. The file system writes considerably less than that, of course, due to write amplification.

Also highlighted were higher-rate users, the 95th percentile of daily use. Tesla expected their daily P/E cycle use to run as high as 1.5 per block, at which rate it would take 5-6 years to accumulate the maximum P/E cycles.

The rates of 0.7 and 1.5 are dependent on the chosen media and available space, of course. As of May 2020, Tesla remanufacturing began producing spare parts incorporating a Micron eMMC with 64 GB of storage. This should also bring those rates down by a factor of 8 – assuming the Micron part has a similar P/E cycle lifetime.
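The arithmetic behind these figures can be checked with a few lines of Python. The function names are ours. The inputs are the numbers quoted above, plus the stated assumption that the 64 GB part has the same P/E rating and sees the same daily write volume.

```python
RATED_PE_CYCLES = 3000  # per erase block, as quoted in the report


def years_to_wear_out(pe_cycles_per_day, rated_cycles=RATED_PE_CYCLES):
    """Years until a block reaches its rated P/E cycles at a constant daily rate."""
    return rated_cycles / pe_cycles_per_day / 365.0


def media_writes_gb_per_day(capacity_gb, pe_cycles_per_day):
    """Data written to the media per day implied by a per-block P/E rate."""
    return capacity_gb * pe_cycles_per_day


def scaled_rate(pe_cycles_per_day, old_capacity_gb, new_capacity_gb):
    """Per-block P/E rate if the same daily writes are spread over a larger part."""
    return pe_cycles_per_day * old_capacity_gb / new_capacity_gb


for rate in (0.7, 1.5):
    print(rate, "per day:", round(years_to_wear_out(rate), 1), "years")
print(media_writes_gb_per_day(8, 0.7), "GB/day on the 8 GB part")
print(scaled_rate(0.7, 8, 64), "P/E cycles per block per day on a 64 GB part")
```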

Importantly, all the complaints and claims for MCU replacements represent just a small percentage of the 159,000 Model S and Model X vehicles. Tesla did indicate that MCU failures are likely to continue to occur in subject vehicles until 100% of units have failed. An expensive replacement of either the media or the entire MCU board is the only alternative. Tesla has admitted as much, recently informing its customers by email that the eMMC in the affected vehicles is warranted for 8 years or 160,000 km, and that it will upgrade the part from 8 GB to 64 GB. Tesla has also agreed to reimburse old repairs. All in all, a costly outcome.

Can patching up the damage be enough?

Tesla has not been idle. Through OTA updates, Tesla has already released 6 firmware patches to help deal with the problem. These patches have at least tried to alleviate previously mentioned loss of HVAC controls and ADAS problems first. The patches overall have ranged from removing high frequency system log messages and suppressing informational log messages when no user was in the vehicle, to increasing the journal commit interval and reducing the settings database write frequency.

Unfortunately, these firmware patches are unlikely to be enough. Once P/E cycles are used, they cannot be regained without replacing the media. A patch late in the cycle will, at best, add only a year of life to the vehicle.

It is also unlikely that future automotive designs will be able to solve problems with reduced logging. If anything, data recording is expected to grow over the next decade. Hypervisors and domain controllers collect data from multiple sensors, storing to common media devices. Another larger source of growth will be autonomous vehicles, with multiple video streams and even more sensor data. These factors highlight the continuing importance of edge storage in the vehicle, as well as proper flash memory management.

Understand the storage stack – before things go wrong

So where should Tesla go from here to deal with all this? At Tuxera, we have encountered issues like Tesla’s numerous times. Our recommendation remains the same as when we wrote about this topic a year ago. Namely, that a complete and correct understanding of the memory devices (and their limitations) and of the other software components related to data management (the file system and flash management) is key to designing systems that are robust. This is the approach that guides our continued collaboration with customers and partners on activities such as workload analysis, lifetime estimation, write amplification measures, and ultimately the selection of that data management software.

Final thoughts

As we have mentioned before, we’re fans of Tesla. But the Office of Defects Investigation report paints a picture of the potential damage that can result from an incomplete understanding of a vehicle’s storage stack. With proper flash memory testing methods unique to the needs of a given use case, flash memory failures can more effectively be prevented.

We work closely with OEMs and Tier 1s to identify flash needs specific to each unique use case. Let’s solve your automotive data storage challenges.


Wrapping up the Embedded Online Conference: Q&A

For a lot of 2020, we’ve been talking about avoiding end-of-life from NAND Correctable Errors. Recently, I spoke about this very topic at the Embedded Online Conference, where I got to digitally interact with many of you, and received your questions. For those not up to speed on the entire topic, please feel free to see the whitepaper we produced here. This topic brought up some interesting questions that I think warrant a little more discussion and digging.

All about the firmware

Perhaps the most common question was, “Where is the error management actually being handled?” For an example project – an ARM single board computer running Linux, with ext3 file systems on both microSD and eMMC – the answer starts with the firmware. This is special code written to work with the NAND flash media and controller. On Linux, there are also drivers to connect that firmware to a standard block device layer, allowing the developer to use block tools like encryption.

While error management is handled by the firmware, the file system can make requests which make that management much easier on the media, adding lifetime to the design. In this case, the interface used is known as Trim or Discard – a notification from the file system that blocks are no longer being used. Developers can use flash storage with the Trim or Discard notifications turned off, and they may see higher short-term performance – but both long term performance and media lifetime will suffer.
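A toy model shows why the notification helps. Before the firmware can erase a block, it must copy any pages it believes are still in use. The function and the 64-page block below are assumptions for illustration only. Real firmware is far more involved.

```python
def pages_copied_during_gc(block_pages, deleted_pages, trim_enabled):
    """Pages the firmware must relocate before erasing one block.

    block_pages:   set of page numbers holding data in the block
    deleted_pages: set of pages the file system no longer uses
    """
    if trim_enabled:
        # The firmware was told which pages are stale and can skip them.
        still_valid = block_pages - deleted_pages
    else:
        # Without the notification, every written page looks live.
        still_valid = block_pages
    return len(still_valid)


block = set(range(64))    # hypothetical 64-page erase block
deleted = set(range(48))  # the file system has freed 48 of those pages
print(pages_copied_during_gc(block, deleted, trim_enabled=False))
print(pages_copied_during_gc(block, deleted, trim_enabled=True))
```

Every page copied is an extra write to the media, so fewer copies mean lower write amplification and longer lifetime.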

Handling errors on flash media designs

Another question I received was related to special flash media designs that contain a one-time programmable (OTP) section. This sort of read-only area can be used for system firmware or default configuration settings. Even that use case does not mean it is impossible for bit errors to occur there. If the OTP section is provided by the vendor (and their firmware), they may have a contingency to handle the situation – reprogramming in place while maintaining power. This is a question worth asking. If the OTP section is more of a design choice by the development team, I would suggest working with the vendor and a flash software team to make sure errors are properly handled. In such cases, optimized and tailored support is crucial. Our team at Tuxera offers design review services which may be helpful.

Some designs, however, use flash media that doesn’t have firmware. We refer to this as “raw flash”, and on Linux that can mean using a flash file system, such as YAFFS, JFFS2 or UBIFS. This software must include the error handling logic which decides whether to ignore a bit error for now, or correct errors by relocating the data. Balancing this choice is dependent on use case and desired lifetime, and it’s something I discuss in our whitepaper. Unfortunately, the Linux flash file systems relocate the data on the first bit error, which can reduce lifetime considerably. This was a good choice when the NAND controllers could only handle error correction on 4 bits of data, but modern controllers can perform bit correction on 40 or more bits per NAND block.
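The trade-off can be sketched as a simple threshold policy. This is a hypothetical illustration, not the logic of any of the file systems named above. The function name and the default margin of 0.5 are arbitrary assumptions, and the right margin depends on the use case and desired lifetime.

```python
def should_relocate(bit_errors, ecc_correctable_bits, margin_fraction=0.5):
    """Decide whether to move data after a read that needed correction.

    Relocate only once the corrected bit count reaches a chosen fraction of
    what the ECC can correct. A margin_fraction of 0.0 reproduces the
    "relocate on the first bit error" behavior.
    """
    if bit_errors > ecc_correctable_bits:
        raise ValueError("uncorrectable read: data must be recovered another way")
    threshold = ecc_correctable_bits * margin_fraction
    return bit_errors >= threshold and bit_errors > 0


print(should_relocate(1, 40))                       # one error, 40-bit ECC: leave in place
print(should_relocate(25, 40))                      # past the margin: relocate
print(should_relocate(1, 40, margin_fraction=0.0))  # first-error policy: relocate
```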

Tuxera’s FlashFX Tera is a Linux solution which can handle these situations with ease. To learn more about it, click here.

Final thoughts

I’ve really appreciated getting to answer questions and discuss file systems with other enthusiasts in the Embedded Online Conference. Later this month, I’ll be speaking on the topic of automotive security at GENIVI AMM. It will be another great opportunity to talk to you about embedded software – this time on the topic of automotive safety. I’m looking forward to the questions and comments from all of you – perspectives that I’m sure will have me thinking about data storage in a new way.

Let us help you solve your data storage challenges.


The Embedded Online Conference – ongoing tech seminars a click away

Just last month, I spoke at the fully virtual Embedded Online Conference. Avoiding end of life from NAND correctable errors is a topic I’ve covered in the past, and it's still just as relevant when it comes to flash memory lifetime.

But just how did I end up speaking at the Embedded Online Conference in the first place?

I was on the train from Frankfurt to Nuremberg for Embedded World 2019, where I was speaking on a couple of topics, and manning the tradeshow booth. I pulled out my laptop to get a little work done, and noticed the gentleman across from me was doing the same. We got to chatting, and I found out that he, Jacob Beningo, also worked in the embedded systems industry, and was looking forward to the three-day show.

Jacob was a consultant, and pitched an interesting idea about an online conference. His idea was that attendees could go to virtual sessions, handle questions through a forum interface, and there would even be a virtual "trade show" floor with product demonstrations. I was definitely interested, and it turns out Jacob was ahead of his time.

Still live, still connected

Surveying my inbox, it looks like the remaining tradeshows this year are going virtual. Fortunately, the folks at Embedded Online Conference had a head-start. They put together a really nice site, with presentations "going live" at particular times. These sessions (and the show) will remain live through July – so you can watch the talks at your own pace, and leave questions and comments too. What's more, there is even a healthy discount on the registration page if you have been furloughed or laid off because of the Coronavirus – this is a great opportunity for training!

I've had some great questions from my talk, and I'm already thinking hard to come up with topics for next year's conference. Will I see you there?

Visit the Embedded Online Conference site


Help! Why are my embedded devices failing?

When devices fail, the problems can be numerous. In conversations with the embedded OEMs we work with, a common issue affects almost every manufacturer – the cost of diagnosing and fixing the causes of field failure. This impacts time-to-market and pulls resources away from development, to be used instead for field diagnostics and post-mortem analysis. This issue is especially relevant for the following reasons:

  1. The need for defect prevention during field operations: The high degree of reliability required for protecting critical data dictates that devices must not fail. To ensure that devices are wear-fail-safe, manufacturers are required to run extensive tests for a range of user scenarios so as to safeguard against edge cases. The analysis of test results can be a daunting task due to several interfaces between hardware, software, and application layers. Hence, there is a need to continuously track these interactions, so that during a failure, any difference in the interactions can be discovered and corrected.
  2. Vulnerability of device to wear-related failures: As flash media continues to increase in density and complexity, it’s also becoming more vulnerable to wear-related failures. With the shrinking lithography comes increased ECC requirements, and the move to more bits/cell. With this also comes a concern that what was written to the disk may not in fact be what is read off the disk. However, most applications assume that the data written to the file system will be completely accurate when read back. If the application does not fully validate the data read, there may be errors in the data that cause the application to fail, hang or just misbehave. These complications require checks to validate the data read against the data written, so as to prevent device failures due to data corruption.
  3. Complexity of hardware and software integration: The complex nature of hardware and software integration within embedded devices makes finding the cause of failures a painstaking job, one that requires coordination between several hardware and software vendors. For this reason, it often takes OEMs days to investigate causes at the file system layer alone. Problems below that layer can entail more extensive testing and involve multiple vendors. Log messages can help manufacturers pinpoint the location of failure so that the correct vendor can be notified.

This ability to pinpoint the cause of failure is especially helpful when an OEM is:

    • Troubleshooting during the manufacturing and testing process to make sure that their devices do not fail for the given user scenarios.
    • Doing post-mortem analysis on parts returned from their customers, in order to understand the reasons for failures, and possible solutions.
    • Required to maintain a log of interactions between the various parts of the device, for future assistance with failure prevention or optimization.
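The read-back validation mentioned in point 2 above can be sketched at the application level with a checksum stored alongside each record. The record layout and function names here are assumptions for illustration. `zlib.crc32` and `struct` are part of the Python standard library.

```python
import struct
import zlib


def pack_record(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload so corruption can be detected on read."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def unpack_record(record: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    if len(record) < 4:
        raise ValueError("record too short to contain a checksum")
    payload, stored = record[:-4], struct.unpack("<I", record[-4:])[0]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: data read differs from data written")
    return payload


record = pack_record(b"sensor=42")
print(unpack_record(record))
```

A checksum like this catches flipped bits, but not data that is internally valid and merely out of date. Detecting that would need something extra, such as a sequence number.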

Identifying the causes and costs of field failure is one thing, but what solutions can OEMs turn to in order to prevent these issues in the first place?

Fighting field failure with transactional file systems

Thankfully, various file system solutions exist for safeguarding critical data. FAT remains a simple and robust option with decent performance. Unfortunately, it isn’t able to provide the degree of data protection or performance that is sometimes needed. In safety-critical industries like automotive, aerospace, and industrial, basic file systems like FAT are often unable to meet the needed performance and reliability.

Transactional file systems like Tuxera’s own Reliance Edge offer a level of reliability, control, and performance for data that is simply too vital to be lost or corrupted. One of the key features of Reliance Edge is that it never overwrites live data, ensuring a backup version of that data remains safe and sound. This helps preserve user data in the event of power loss.
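The principle of never overwriting live data can be illustrated at the application level with the familiar write-new-copy-then-switch pattern. This is only an analogy, not how Reliance Edge is implemented, and the helper name is hypothetical. How durable the final switch is also depends on the underlying file system.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace the file at path without ever overwriting the old copy in place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the new copy is on the media
        os.replace(tmp_path, path)  # switch over to the new copy in one step
    except BaseException:
        # The old file is untouched; discard the incomplete new copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If power is lost before the switch, the previous version of the file is still intact. If it is lost afterwards, the new version is complete.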

In the video below, I demonstrate the performance and data preservation differences between a FAT driver and Reliance Edge.


Final thoughts

Correctly finding and identifying the cause of field failures is the first step in tackling them. The next step is choosing the right solution – one that’s optimized to secure your critical data specifically in case of field failure and power loss.

Embedded device manufacturers – find out how Reliance Edge can help bulletproof your critical data.



Are existing SMB solutions scalable and can they cope with current demands?

One or two parents required to work from home, online learning for the children, and even recreation all consist of streaming media of some sort. For more secure work, VPN is often the norm. Can all aspects of the internet handle the traffic?

In the current circumstances of home-centric everything, internet service traffic has skyrocketed in both professional and recreational spheres. The streaming service Netflix, for example, saw an unprecedented gain of 15.8 million subscribers for the first quarter of 2020. In my own region of Seattle and Washington State, internet traffic is up considerably – 30% to 40% higher than in January of this year. In response, local service providers such as Comcast and T-Mobile have waived their bandwidth caps, at least in the short term. One of their concerns is this stress test of the "last-mile" services - the modems, routers and other components of home networks.

SMB protocol is more relevant than ever for shared content access

Besides the need for high throughput – or high transfer speeds – another concern is secure access to shared files, and this is where networking protocols come in. Home routers connect to the enterprise local area network (LAN), often through VPN. Many workers staying at home connect through individual paths to a few enterprise servers, and Server Message Block (SMB) is the protocol that allows the sharing of the common files they need to do their jobs.

SMB servers can be open source solutions or proprietary implementations. The most commonly used implementation is called Samba, a helpful open-source alternative. Tuxera maintains its own proprietary implementation – Fusion File Share by Tuxera – with commercial-grade SMB features and enhancements that will handle the current stresses content providers and enterprises are facing during the COVID-19 pandemic – multiple users accessing the same content over the network.

Scalability is critical when countless organizations have switched to remote work

The key measurement for the current situation is scalability, because these network protocols need to provide files to more than just a few people – we’re talking 10s, 100s, even 1000s in the case of a large global enterprise such as a banking or medical institution. Companies are worried about whether their storage solutions can handle the full load of remote work. When an entire company hits the shared file at once, will all their requests get through without serious delay or even critical failures?

Under increased load, Samba can easily max out CPU and memory usage. This illustrates the challenges facing SMB implementations in today’s crisis. While Samba can be tuned to handle speed issues, implementing proper security and scalability measures unfortunately demands more human and infrastructure resources, increasing costs.

Final thoughts

The increased networking demands we’ve discussed place significant stress on widely used SMB services, with results felt across multiple industries, from banks to medical institutions. These disruptions can put organizations that are integral to societal function at risk. What’s worse, these risks are exacerbated given the uncertain nature of the current pandemic. This wasn't the use case that most network solution providers envisioned, but this is where we are today. Networking protocols that are sluggish and unreliable are simply unacceptable in a world that requires rapid data access.

Thankfully, solutions do exist to help network providers easily tackle speed and scalability in SMB. Latency and client overload are something Tuxera has tested for in SMB networking events for years, and we stand proudly behind our solution.

But regardless of the solution chosen, network service providers must evaluate how they can stay prepared for the scalability and security needs of the crisis today – as well as the needs of tomorrow.



Are you formatting your SD memory cards optimally?

We are excited to share with you an article from one of our valued partners, the SD Association. The following is a snippet from the original article. Be sure to read the full article here: The SD Memory Card Formatter – How this handy tool solves your memory card formatting needs.

For many of us, SD memory cards are an easy way to keep our important files and precious memories stored safely. But after using an SD memory card for a long time, files may begin to fragment, which can result in performance deterioration of the card. That’s when we use simple reformatting methods to wipe cards clean in an effort to restore their reliability and performance. Proper formatting is therefore essential in keeping our critical document files and favorite photos or videos available for future viewing.

First-rate formatting with the SD Memory Card Formatter

When formatting an SD memory card, specific tools and methods are required in order to ensure an effective process with minimal data loss.

The SD Memory Card Formatter, developed by Tuxera, handles SD memory cards in accordance with standards defined by the SD Association. In fact, it’s the official tool for formatting any SD, SDHC, and SDXC memory cards, as recommended by the SD Association. By optimizing an SD memory card to SD Association standards, the SD Memory Card Formatter safely improves the card's performance and lifetime. Operating system (OS) built-in formatters are rarely tested as rigorously, and often may not follow these standards as closely, resulting in formatting processes that are less reliable – and potentially leading to earlier memory card failure.

The SD Memory Card Formatter is designed to be the best tool for the job, for virtually every type of user – offering you the highest level of reliability and data integrity for all of your formatting and reformatting needs.

Read the full article on the SD Association’s website for more information and technical details on how the SD Memory Card Formatter can help you.


Embedded World 2020 wrap-up: smaller players and display tech get the stage

Tuxera has attended Embedded World for many years now, it being one of the premier events for embedded technology not just in Europe, but the world. It’s always an excellent opportunity for us to speak directly to our partners and other industry players. This year was no different, and though an impact from fears of illness was expected – after all Mobile World Congress was cancelled – the event proved insightful and productive.

Breathing room for meaningful interactions – and fun

Attendance was roughly a third of the previous year, and many companies opted out of exhibiting – some at the last minute. The organizers however did a good job of covering for missing booths, moving exhibitors from hall 5 into hall 4, and setting up places to sit in many parts of hall 4A – including an area with a beach theme! Nevertheless, it was a bit weird to walk by some large, fully constructed booths with no people or equipment in them.

There were also fewer people whose sole job was to pass out flyers and invite you into their stand. That led to more substantial conversations with more knowledgeable booth staff for the attendees.

Key meetings and greetings

It also seemed that there was a higher ratio of academics-to-working-professionals. Unlike in prior years when the bulk of students visited on Thursday, eager pupils wandered into our booth on all three days. I had the opportunity to demonstrate our GPL version of Reliance Edge and hear about some of the interesting projects they were working on. Perhaps the added data integrity of our file system will lead to their success!

Another silver lining was the opportunity for exhibitors to spend time with other exhibitors – visiting booths, seeing the demonstrations, and comparing notes. We came away with a few more partnership opportunities than in previous years, when we were busy talking to designers and students. I am especially excited about opportunities with Toradex – who premiered a product to assist those migrating from Windows Embedded to Linux. We also had a chance to explore deepening our partnerships with Green Hills in automotive and Mentor Graphics in resource-constrained certified markets.

Industry trends on display

Big themes this year among exhibitors were graphical interfaces and display technologies. Many of the exhibitors were discussing graphical interfaces and ways to speed up debugging. There were also some truly impressive display technologies, including large transparent screens and flexible ones.

Without the large semiconductor companies at the event, smaller players had a chance to get their messages out. There were a lot of special purpose chip vendors around, but far fewer bleeding edge chips shown due to the lack of big player attendance. As a result, special purpose chips and bespoke systems developers (as well as open source consortia) had audiences that would have probably overlooked them while busy talking to impressive players like STMicroelectronics and Wind River Systems in other years. At least one customer we spoke with had made the decision not to attend directly due to STMicroelectronics’ absence.

Looking onwards

The dates for next year have already been selected, and it is likely we will attend. What will be interesting to see is the impact this year’s show has on our plans for later this year, and of course next year. We’re already getting excited for Electronica this November, and the chance to meet more big players there. Will large booth companies like STMicroelectronics be back in the same space? Can the organizers do anything to improve attendance? And what new trends will emerge in embedded designs in the next year? We’re always interested to find out!


One small step to a reliable file system

The Reliance Edge File System Essentials (FSE) is one of two API sets supported by Reliance Edge. It’s a minimalistic but reliable alternative to the POSIX-like option.

What are its benefits and how does it work? This feature summary should answer those questions.

The micro storage issue

Many small embedded designs don't have storage for data. Instead, programs on the device are simply loaded and executed. More sensors in the device and data-heavy situations mean a greater need to log some data – or decisions made – for later troubleshooting. Then again, some newer embedded designs are primarily used to gather sensor data, even if it is only until the device is in range of the cloud.

Such increases in data storage needs mean that system designers must eventually migrate from having no file system to needing something – if not a full POSIX implementation. They can take these steps on their own – treating storage as a memory pool here, storing data from multiple sensors there. While such an approach is doable, it opens the door to considerable increases in workload through complexity. For instance, storing data in two different "files" in a memory pool can mean load balancing. When doing this, a system designer also needs to consider unexpected interruptions, special media handling, and especially tests. Not an easy task.

A far better alternative is to hand the task over to experts. Until recently, though, the only file systems available for microcontrollers were simple FAT implementations, with little real thought towards fail safety or even performance. Reliance Edge changes all that – and File System Essentials provides a solid first step.

Reliance Edge FSE – A smart solution

Within FSE, there are no file names or paths. Instead, a set of numbered locations is defined. Within the code, these numbers can be given #define names to make the project more readable. These locations can be read from, written to, and truncated. The size of these "files" can also be read. The number of available locations is fixed at compile time.
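To make the idea of numbered files concrete, here is a minimal sketch in C. It is a toy in-RAM model, not the Reliance Edge FSE API: every name in it (toy_write, FILE_SENSOR_LOG, POOL_BYTES and so on) is hypothetical and chosen only for illustration. It assumes a fixed file count, #define aliases for the file numbers, and a single pool of space shared by all files.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: a toy in-RAM model of "numbered files",
   not the Reliance Edge FSE API. */
#define FILE_SENSOR_LOG 0u   /* readable aliases for the file numbers */
#define FILE_CONFIG     1u
#define FILE_COUNT      2u   /* fixed at compile time */
#define POOL_BYTES      256u /* total space shared by all files */

typedef struct {
    uint8_t  data[POOL_BYTES];
    uint32_t size;
} ToyFile;

static ToyFile g_files[FILE_COUNT];

/* Bytes currently used by all files together. */
static uint32_t toy_used(void)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < FILE_COUNT; i++)
        total += g_files[i].size;
    return total;
}

/* "Format": every file exists from the start, with zero size. */
void toy_format(void)
{
    memset(g_files, 0, sizeof g_files);
}

/* Returns the number of bytes written, or -1 on a bad file number or no space. */
int32_t toy_write(uint32_t file, uint32_t offset, const void *buf, uint32_t len)
{
    if (file >= FILE_COUNT)
        return -1;
    ToyFile *f = &g_files[file];
    uint32_t end = offset + len;
    if (end < offset)
        return -1; /* offset + len overflowed */
    uint32_t new_size = (end > f->size) ? end : f->size;
    if (toy_used() - f->size + new_size > POOL_BYTES)
        return -1; /* no per-file quota, but everything must fit in the pool */
    if (offset > f->size)
        memset(f->data + f->size, 0, offset - f->size); /* zero-fill the gap */
    memcpy(f->data + offset, buf, len);
    f->size = new_size;
    return (int32_t)len;
}

/* Returns the number of bytes read (0 at or past end of file), or -1. */
int32_t toy_read(uint32_t file, uint32_t offset, void *buf, uint32_t len)
{
    if (file >= FILE_COUNT)
        return -1;
    const ToyFile *f = &g_files[file];
    if (offset >= f->size)
        return 0;
    if (len > f->size - offset)
        len = f->size - offset;
    memcpy(buf, f->data + offset, len);
    return (int32_t)len;
}

/* Shrinks or grows (zero-filled) a file. Returns 0 on success, -1 on error. */
int32_t toy_truncate(uint32_t file, uint32_t new_size)
{
    if (file >= FILE_COUNT)
        return -1;
    ToyFile *f = &g_files[file];
    if (new_size > f->size) {
        if (toy_used() - f->size + new_size > POOL_BYTES)
            return -1;
        memset(f->data + f->size, 0, new_size - f->size);
    }
    f->size = new_size;
    return 0;
}

int64_t toy_size(uint32_t file)
{
    return (file < FILE_COUNT) ? (int64_t)g_files[file].size : -1;
}

/* Usage example: no open or close, the file simply exists. */
int toy_demo(void)
{
    char out[8] = {0};
    toy_format();
    assert(toy_size(FILE_SENSOR_LOG) == 0);
    assert(toy_write(FILE_SENSOR_LOG, 0, "21.5C", 5) == 5);
    assert(toy_read(FILE_SENSOR_LOG, 0, out, sizeof out) == 5);
    assert(memcmp(out, "21.5C", 5) == 0);
    assert(toy_truncate(FILE_SENSOR_LOG, 2) == 0);
    assert(toy_size(FILE_SENSOR_LOG) == 2);
    return 0;
}
```

The sketch keeps everything in RAM and ignores the media entirely. Handling the media, and surviving interruptions, is where the real work lies.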

As mentioned earlier, multiple files increase the complexity of testing for a simple memory pool – doubly so if the effects of power interruption also have to be managed. With FSE however, all the tests are provided, and all core file interactions have already been validated. Reliability is provided by transaction points, also fully tested.
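The visible behavior of a transaction point can be sketched as follows. This is a hypothetical toy model rather than Reliance Edge code: it simply keeps a committed copy and a working copy of a tiny volume, whereas a real file system would not copy the whole volume on every transaction. All names (vol_append, vol_transact, vol_power_cycle) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of transaction points, not Reliance Edge code. */
#define VOL_BYTES 64u

typedef struct {
    uint8_t  data[VOL_BYTES];
    uint32_t size;
} VolState;

static VolState g_committed; /* state as of the last transaction point */
static VolState g_working;   /* state the application currently sees */

void vol_format(void)
{
    memset(&g_committed, 0, sizeof g_committed);
    g_working = g_committed;
}

/* Appends to the working state only. Returns 0 on success, -1 if full. */
int vol_append(const void *buf, uint32_t len)
{
    if (len > VOL_BYTES - g_working.size)
        return -1;
    memcpy(g_working.data + g_working.size, buf, len);
    g_working.size += len;
    return 0;
}

/* Transaction point: everything written so far becomes the committed state. */
void vol_transact(void)
{
    g_committed = g_working;
}

/* Simulates a power interruption followed by a remount:
   anything written after the last transaction point is gone. */
void vol_power_cycle(void)
{
    g_working = g_committed;
}

uint32_t vol_size(void)
{
    return g_working.size;
}

/* Usage example: "A" is committed, "B" is lost with the power. */
int vol_demo(void)
{
    vol_format();
    assert(vol_append("A", 1) == 0);
    vol_transact();
    assert(vol_append("B", 1) == 0);
    assert(vol_size() == 2);
    vol_power_cycle();
    assert(vol_size() == 1);
    assert(g_working.data[0] == 'A');
    return 0;
}
```

The point of the model is the guarantee, not the mechanism: after an interruption, the application sees exactly the state of the last transaction point, never a half-written mixture.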

These tests were designed for Automotive SPICE, and Reliance Edge is written in MISRA C. While that level of rigor is not required for every small embedded device, you can take comfort in just how fully tested this software is. Integrating Reliance Edge FSE into a project may be the simplest – and most effective – next step available today. No operating system required!

A compact, focused tool

Reliance Edge FSE has been designed to fill a very specific role. Like any highly focused solution, this means it sacrifices some breadth to achieve the levels of precision needed. The most obvious curveball of Reliance Edge FSE is that it’s not a full file system. Names, folders, handles, and file attributes are all missing. A file can never be "opened" or "closed" – it just exists.

Another aspect of FSE is that it’s modest with its disk space in order to stay tiny. And while this is a limitation, it’s probably a minor one for most designs. Reliance Edge FSE takes a simple, practical approach to your disk space needs – at format time, all the chosen files are created with zero size. There are no quotas or file size limitations, but all the data still has to fit in the available space.

Reliance Edge FSE uses just 3898 bytes of RAM – roughly the entire erasable memory of the Apollo Guidance Computer (2,048 words of 16 bits each, parity included), which speaks to how far flash storage needs have come.

Final thoughts

The File System Essentials API set can be a great stepping-stone from no file system at all to full POSIX. In fact, it can even be the only solution needed for some edge designs. With full tests, functionality, and reliability, it’s far better to use Reliance Edge than something written ad hoc by even the best developers. So if you’re looking for a compact, no-frills tool to handle your embedded flash storage needs – we’ve got you covered.

Embedded manufacturers – let’s work together to ensure your data storage needs are accurately met.


Embedded file systems – trickier than you think

The Electronic Engineering Journal published an interesting article by Jim Turley this week, discussing file systems and the popular SD media they are used on. While this article brings up some good points about media reliability, I’d like to dive a little deeper into two of the points he talks about – hopefully giving a bit more perspective. A file system designed for better reliability can be less tricky than you think.

Definitions of reliability

The users of embedded devices are probably not file system experts, and sometimes the designers of the devices aren't either. From the perspective of the user, they just want their data to be on the device when they expect it to be. We think of this as data integrity. As the device ages, data retention also becomes a consideration – but that’s a topic for another blog post. Some of the techniques that protect data integrity include journaling the data, using redundant writes, atomic updates like those in the Tuxera product family, and the transaction points provided by Tuxera's Reliance family of file systems.

The designer of the device may – or may not – care about the user data, but the absolute requirement from their perspective is that the device be able to boot and operate. This is the primary focus of most reliability improvements to file systems over the last decades – making the file system fail-safe. Some of the techniques used include logging or journaling the metadata, atomic operations, and utilizing the second FAT to provide a pseudo-transaction – as in Microsoft's Transaction-Safe exFAT (TexFAT). Most operations that protect the data integrity also provide a fail-safe environment for the system data.

Underlying all of this is the hardware, and as Jim Turley pointed out, reliability has to be a design concern from top to bottom, not just an add-on or an afterthought. The file system certainly can't prevent failures of the media – blocks or sectors going bad, in other words – but it should be able to detect and mitigate them.

Mitigating media problems

SD media fails in a number of ways, including failing to read, failing to write, and returning erroneous data. The first two are easily detected by the file system, but the third can be a bit trickier.

Detecting erroneous data in the system data provides a different level of fail-safety, and this is often done with a CRC on the file system structures and metadata. The default file system on Linux can do this, but it is not enabled by default.
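As a rough illustration of how a CRC catches erroneous data, the sketch below seals a small metadata record with a CRC-32 and checks it again after a simulated bit flip. The MetaNode layout and the function names are hypothetical and do not describe the on-disk format of any particular file system. The CRC itself is the common reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit for brevity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the common reflected polynomial 0xEDB88320. */
uint32_t crc32_calc(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical metadata record; the layout is an assumption for illustration. */
typedef struct {
    uint32_t crc;         /* CRC-32 of every byte after this field */
    uint32_t sequence;
    uint32_t file_size;
    uint32_t first_block;
} MetaNode;

static uint32_t meta_crc(const MetaNode *n)
{
    return crc32_calc((const uint8_t *)n + sizeof n->crc,
                      sizeof *n - sizeof n->crc);
}

/* Call before writing the record to the media. */
void meta_seal(MetaNode *n)
{
    n->crc = meta_crc(n);
}

/* Call after reading the record back. Returns nonzero if the CRC matches. */
int meta_is_valid(const MetaNode *n)
{
    return n->crc == meta_crc(n);
}

/* Usage example: a single flipped bit is detected. */
int meta_demo(void)
{
    MetaNode n = { 0u, 7u, 4096u, 12u };
    meta_seal(&n);
    assert(meta_is_valid(&n));
    n.file_size ^= 0x10u; /* the media "returns" one wrong bit */
    assert(!meta_is_valid(&n));
    return 0;
}
```

A check like this only detects the problem. What happens next, recovery or a graceful failure, is a separate design decision.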

Once detected, the next step is handling the error – is recovery possible? For user files and folders, a disk check can mark those files – or restore data to a fixed name like FILE0000.CHK – and move on. While the user may lose data, at least the system continues to function. For system files and folders, the solution can be a lot more difficult.

Our file systems either transparently recover on-the-fly or optionally throw an exception for these situations, allowing the system designer to handle some situations gracefully. As an example, in an automotive design, an error in the map data could result in a message letting the driver know that map data is unavailable or corrupt, and that they should return to a dealer for an update.

The unhandled exception, utilized primarily in system validation, is also useful because it can lock the system down in a read-only state. This allows the test engineers to step in and see exactly where failure occurred, helping them quickly determine the root cause of the failure.
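The two outcomes described above, a handled error and an unhandled one, can be sketched as a simple policy. Everything here is hypothetical (the Volume structure, the callback type, the file numbers) and is not the API of any shipping file system. It assumes the application may register a handler, and that an error nobody handles puts the volume into a read-only state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; a sketch of the policy, not a real API. */
typedef enum { ERR_HANDLED, ERR_UNHANDLED } ErrResult;
typedef ErrResult (*CorruptionHandler)(unsigned file_id);

typedef struct {
    bool              read_only;
    CorruptionHandler on_corruption; /* NULL: the application handles nothing */
} Volume;

/* Called by the (imagined) file system when corrupt data is detected. */
void vol_report_corruption(Volume *v, unsigned file_id)
{
    ErrResult r = ERR_UNHANDLED;
    if (v->on_corruption != NULL)
        r = v->on_corruption(file_id);
    if (r == ERR_UNHANDLED)
        v->read_only = true; /* freeze the state so engineers can inspect it */
}

bool vol_write_allowed(const Volume *v)
{
    return !v->read_only;
}

/* Application side: corrupt map data is survivable, anything else is not. */
#define FILE_MAP_DATA 3u
static bool g_show_map_warning;

static ErrResult app_handler(unsigned file_id)
{
    if (file_id == FILE_MAP_DATA) {
        g_show_map_warning = true; /* e.g. "Map data unavailable, see dealer" */
        return ERR_HANDLED;
    }
    return ERR_UNHANDLED;
}

/* Usage example: one handled error, then one unhandled error. */
int policy_demo(void)
{
    Volume v = { false, app_handler };
    vol_report_corruption(&v, FILE_MAP_DATA);
    assert(g_show_map_warning && vol_write_allowed(&v));
    vol_report_corruption(&v, 9u); /* some file the application cannot handle */
    assert(!vol_write_allowed(&v));
    return 0;
}
```

Routing the decision through a callback keeps the policy with the system designer, who knows which data the device can live without.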

We can go one step further and provide optional CRC protection of user data files, taking user data integrity to a much higher level.

Final thoughts

While Turley's article does point out key design concerns, he suggests that the media is most of the problem. I've used this space to explain some of the file system choices for reliability, and how data integrity differs from fail-safety. I also examined how detection of a problem can lead to possible solutions – or at least more graceful failures.

As we’ve seen, SD card reliability can be a tough nut to crack. But with the right expertise, it’s doable.

SD manufacturers – let’s work together to ensure your data is responsive, reliable, and fail-safe.
