The physical AI data layer: where reliability becomes economics
In physical AI, the data path is the system. As deployments scale, reliability stops being a background property of the...
In conversations with customers building physical AI systems (robots on factory floors, autonomous vehicles, drones, industrial controllers), one pattern shows up again and again. The teams building these systems treat networking determinism as a first-class engineering problem. They specify it, design around it, and qualify against it. Storage determinism gets nowhere near the same scrutiny.
I have not yet met a physical AI team that puts the same weight on storage determinism that they put on networking determinism. And yet in these systems, where a single missed control cycle can be the difference between a working robot and a damaged one, the storage path is exactly where the worst-case behaviors hide.
Earlier posts in this series laid the groundwork: the infrastructure gap holding physical AI back, how physical AI is industrializing once intelligence leaves the lab, and why the discipline is shifting from cloud-style averages to edge-style worst-case guarantees. This post takes that lens to storage.
The word deterministic gets used loosely, and that is part of the problem. In an RTOS context, it usually means task scheduling that meets its deadlines. In networking, it means bounded packet delivery times. In storage, it should mean the same thing: bounded read and write latency, every time, regardless of what else is happening in the device. Most of the storage industry does not talk about it that way.
A quick way to think about the distinction: a non-deterministic system is usually fast, but occasionally unpredictable. A deterministic system operates within strict timing bounds every single time. In a cloud environment, the first is fine. The architecture absorbs the occasional spike, which surfaces, at worst, as a slower page load. In a physical system, the occasional spike is the failure. The rest of the control loop is built against the bound that the storage or network is supposed to hold. Breaking the bound causes wrong downstream behavior.
For an engineering audience, that bound is more specific. Determinism is not peak speed or good average latency. It is bounded behavior within a defined operating envelope: workload, device state, temperature, wear level, power-fail model, and lifetime assumptions.
Unpredictable latency in a cloud system is annoying. Unpredictable latency in a physical AI system is a physical risk.
The deterministic networking problem has been worked on for years. Time-Sensitive Networking (TSN) extends Ethernet with bounded latency and scheduled traffic. Deterministic Networking (DetNet) does the same at the IP layer. RDMA, in its various forms, removes the variability of kernel-mediated network paths. None of these are perfect, and none of them solve every workload, but they exist, they are standardized, and they have a clear engineering discipline around them. Physical AI teams know how to ask for them.
Storage has not had the same treatment. There is no TSN equivalent for flash. The standard expectation (average throughput, IOPS, and latency numbers on a datasheet) is a cloud expectation. It tells you nothing about worst-case behavior under sustained write pressure, after the flash has aged, or during garbage collection. For a physical AI system, that is exactly the information that matters.
Some of the hardware-level tools exist, but no broadly adopted, end-to-end equivalent to TSN spans embedded flash deployments. The partial tools, NVMe I/O Determinism, NVM Sets, Predictable Latency Mode, SD performance classes, over-provisioning, pSLC modes, host-managed flash, and specialized file systems, are workload-specific, optional, device-dependent, and rarely treated as part of the physical AI timing architecture. Very few of the physical AI teams I have spoken to are using them deliberately.
It is also worth noting where physical AI edge systems actually run. Most are not built on NVMe SSDs. They use eMMC, UFS, SD, raw NAND, or managed NAND. These provide useful mechanisms (command queuing, cache controls, background-operation control, WriteBooster, HPB, health reporting, vendor-specific industrial modes), but they do not provide an end-to-end deterministic latency contract comparable to what the networking world expects from TSN/DetNet, or to NVMe’s optional I/O Determinism model. eMMC offers command queuing, cache controls, reliable write, and background-operation control, but none of these is a standardized deterministic-latency mode. UFS has performance and host-assist mechanisms, but no broad equivalent of an NVMe 1.4-style predictable latency mode.
Predictability is not a design goal for flash memory. The physics of NAND flash introduce several effects that work directly against determinism in ways that are well-known to embedded engineers, and largely invisible to anyone treating storage as a black box.
Writes do not go where they are told. Flash can only be written to erased pages, and erases happen at block granularity. When a logical write arrives, the flash translation layer decides where it actually lands. That decision depends on the state of the device at that moment: how full it is, how worn the candidate blocks are, and what background operations are in flight.
Garbage collection happens when it happens. Free pages run out, and the controller has to reclaim them by consolidating valid data and erasing blocks. This is an inherent property of how flash works, and the latency of a write that triggers garbage collection can be one or two orders of magnitude higher than one that does not.
Wear leveling, write amplification, and refresh add more variability. These are all mechanisms that protect endurance and data retention. They also mean that the behavior of the flash depends on its history, not just its current request.
These are not new problems. Engineers have studied and engineered around these problems for decades in embedded systems. What is new is their consequence. In a smart meter, a latency spike on flash is invisible. In a factory robot receiving control commands every millisecond, it is not.
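To make the effects above concrete, here is a minimal sketch of a toy flash translation layer, assuming a page-mapped design with greedy garbage collection. Everything in it is hypothetical: the `ToyFTL` class, the geometry (64 blocks of 32 pages, 25% spare capacity), and the timing constants are illustrative values, not figures from any device or datasheet. It does not model wear leveling or refresh. It only shows how a write that happens to trigger garbage collection inherits the cost of relocating valid pages and erasing a block.

```python
import random
import statistics

# Illustrative timings in microseconds (assumed, not from any datasheet).
PROGRAM_US, READ_US, ERASE_US = 200, 50, 2000


class ToyFTL:
    """Page-mapped flash translation layer with greedy garbage collection."""

    def __init__(self, blocks=64, pages_per_block=32, logical_pages=1536):
        # 1536 logical pages on 2048 physical pages leaves 25% spare capacity.
        self.ppb = pages_per_block
        self.valid = [set() for _ in range(blocks)]  # live logical pages per block
        self.used = [0] * blocks                     # programmed pages per block
        self.free_blocks = list(range(1, blocks))    # erased, unopened blocks
        self.active = 0                              # block currently being filled
        self.where = {}                              # logical page -> block
        self.logical_pages = logical_pages
        self.nand_programs = 0                       # host writes plus relocations

    def _program(self, lpn):
        if self.used[self.active] == self.ppb:       # active block full: open another
            self.active = self.free_blocks.pop()
        old = self.where.get(lpn)
        if old is not None:
            self.valid[old].discard(lpn)             # the previous copy becomes stale
        self.valid[self.active].add(lpn)
        self.used[self.active] += 1
        self.where[lpn] = self.active
        self.nand_programs += 1
        return PROGRAM_US

    def _collect(self):
        full = [b for b, n in enumerate(self.used)
                if n == self.ppb and b != self.active]
        victim = min(full, key=lambda b: len(self.valid[b]))  # least live data
        cost = ERASE_US
        for lpn in list(self.valid[victim]):
            cost += READ_US + self._program(lpn)     # relocate still-valid pages
        self.used[victim] = 0
        self.free_blocks.append(victim)
        return cost

    def write(self, lpn):
        """Return the latency of one host write in microseconds."""
        cost = 0
        while len(self.free_blocks) < 2:             # reclaim space before writing
            cost += self._collect()
        return cost + self._program(lpn)


def simulate(n_writes=20000, seed=1):
    """Random overwrites; returns (per-write latencies, write amplification)."""
    rng = random.Random(seed)
    ftl = ToyFTL()
    lat = [ftl.write(rng.randrange(ftl.logical_pages)) for _ in range(n_writes)]
    return lat, ftl.nand_programs / n_writes


if __name__ == "__main__":
    lat, waf = simulate()
    print("median us:", statistics.median(lat))
    print("max us:   ", max(lat))
    print("write amplification:", round(waf, 2))
```

Even in this simplified model, the median write costs a single page program while the slowest writes carry at least one block erase plus relocation work. The first writes on the fresh device see no garbage collection at all, so the latency of a request depends on the device's history and not only on the request itself.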
Consider a scenario that a team I work with recently ran through. A robot on a factory floor receives control updates every 1 millisecond. In parallel, it continuously records hundreds of megabytes per hour of sensor, diagnostic, and safety data onto its onboard flash. The control loop is not intentionally waiting on storage, but the logging path shares system resources with the real-time workload: CPU time, memory bandwidth, interrupts, DMA, kernel paths, I/O queues, and the same underlying flash device.
For most of the duty cycle, everything functions correctly and the storage path is fast enough that nobody notices it. Then the device reaches a state where background flash management becomes unavoidable: garbage collection begins, valid data moves, and blocks are erased. A write path that normally completes in sub-millisecond time suddenly takes multiple milliseconds, or on some devices and workloads, tens of milliseconds.
That spike may not directly block the servo loop. In a well-designed system, it should not be able to. However, it can still affect the timing envelope around the loop by filling logging buffers, causing interrupts to arrive at the wrong time, allowing lower-priority I/O to hold a shared lock, consuming memory bandwidth, or making a safety-state update wait behind storage work that the system never budgeted for. The result is not “slow storage” in isolation. It is a missed timing assumption somewhere else in the system.
At best, the robot drops data, enters a fallback mode, or stutters while the system catches up. At worst, it makes a motion decision with stale context or without the diagnostic evidence that the safety architecture expected to be available.
The root cause is not a defective flash device. The root cause is that the system’s design assumes average-case storage behavior, whereas the robot’s actions depend on worst-case timing. The storage subsystem was treated as a passive destination for data, when in reality it was part of the system’s timing architecture.
In autonomous vehicles, drones, industrial controllers, and other physical AI systems, the same pattern arises where real-time workloads and heavy data logging coexist on constrained edge hardware. The lesson is not that every control loop should write to deterministic storage. The lesson is that storage behavior has to be included in the timing analysis before the system reaches production.
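One small piece of that timing analysis can be sketched as arithmetic: how much logging buffer a storage stall consumes. The sketch below assumes a steady logging rate and a buffer that starts empty; both function names and all numbers in the usage note are hypothetical. Real logging is bursty, so the rate that matters is the peak rate during the stall, not the hourly average.

```python
def buffer_bytes_needed(log_rate_bytes_per_s, worst_stall_us):
    """Bytes that accumulate while storage accepts nothing for worst_stall_us."""
    # Integer ceiling division keeps the arithmetic exact (no float rounding).
    return -(-log_rate_bytes_per_s * worst_stall_us // 1_000_000)


def stall_tolerance_us(buffer_bytes, log_rate_bytes_per_s):
    """Longest stall an initially empty buffer absorbs before data is dropped."""
    return buffer_bytes * 1_000_000 // log_rate_bytes_per_s


if __name__ == "__main__":
    # Assumed figures: 360 MB/hour averages out to 100,000 bytes per second.
    average_rate = 360_000_000 // 3600
    print(buffer_bytes_needed(average_rate, 50_000))   # 50 ms stall, average rate
    # Assumed burst of 20 MB/s during the same 50 ms stall.
    print(buffer_bytes_needed(20_000_000, 50_000))
    # How long a 64 KiB buffer lasts at the average rate.
    print(stall_tolerance_us(64 * 1024, average_rate))
```

The useful input here is the measured worst-case stall, not the typical one. A buffer sized from average write latency is the same average-case assumption described above, moved into a different part of the system.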
Making storage deterministic in a physical AI context is not a single feature. It is a set of properties that the whole storage stack has to hold together, under realistic load, for the full lifetime of the device.
In practice, that means bounded worst-case read and write latency under sustained workloads, not just on fresh flash. Garbage collection that behaves predictably instead of surfacing as an occasional pause. Power-fail safety that does not require a long recovery on the next boot. Wear management that does not silently push worst-case latency upward as the device ages. And all of it tested and characterized at the system level, not inferred from datasheet numbers.
This is the work we spend most of our engineering effort on, and it is why we treat determinism as a first-class property of our file systems and flash management software rather than an optimization to be added later.
Determinism is a stack property, not a drive feature, a file-system feature, or an FTL feature in isolation. The application’s write pattern, the file system’s allocation behavior, the FTL’s garbage collection policy, the device’s spare capacity, power-fail handling, and the system’s scheduling model all have to agree on the same timing contract. That is what we engineer toward, and it is why piecemeal optimizations rarely hold under field conditions.
This is also why deployment age matters. A system that behaves well in the lab on fresh flash may look different after a year in the field. Log files have been created, rotated, deleted, and rewritten thousands of times. Free space is no longer clean and predictable from the file system’s point of view. Inside the flash device, the distribution of valid and invalid pages can make garbage collection more expensive. The same write pattern that looked harmless during qualification can now trigger longer stalls, more write amplification, and higher tail latency. Nothing broke; the storage stack simply moved into a state that the original timing analysis did not cover.
In physical AI, the question is not whether storage sits inside the control loop. It is whether storage can disturb any shared resource that the timing-critical workload requires.
A quick test protocol: ask for worst-case latency, not average throughput. Test aged and preconditioned media, not just fresh devices. Fill the device to realistic field levels. Run the real logging workload, not synthetic sequential writes. Include power-fail-safe commits. Measure p99.99 and max latency over long runs. Then decide whether the storage path belongs inside, adjacent to, or isolated from the timing-critical path.
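The measurement step of that protocol can be sketched as follows, assuming a POSIX-style file interface. The function names, record size, and record count are hypothetical, and a fixed-size record loop stands in for the real logging workload, which it should be replaced with. The sketch times each write together with its `os.fsync` call, so the numbers include the file system and whatever the device does to honor the flush, and it reports nearest-rank percentiles alongside the maximum.

```python
import math
import os
import tempfile
import time
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile; pct is given in percent, e.g. 99.99."""
    ordered = sorted(samples)
    # Fraction avoids float rounding nudging the rank across an integer boundary.
    rank = math.ceil(Fraction(str(pct)) / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def measure_commit_latency_us(path, n_records=1000, record_size=4096):
    """Time write+fsync for each record; returns latencies in microseconds."""
    payload = os.urandom(record_size)
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(n_records):
            start = time.perf_counter_ns()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this record before stopping the clock
            latencies.append((time.perf_counter_ns() - start) / 1000)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Tail-focused summary: the median is context, the tail is the result."""
    return {
        "p50": percentile(latencies, 50),
        "p99": percentile(latencies, 99),
        "p99.99": percentile(latencies, 99.99),
        "max": max(latencies),
    }


if __name__ == "__main__":
    # A temporary directory is used only for the demo; point the path at the
    # file system on the device under test for a real measurement.
    with tempfile.TemporaryDirectory() as tmp:
        samples = measure_commit_latency_us(os.path.join(tmp, "log.bin"))
    print(summarize(samples))
```

A p99.99 figure only means something with far more than 10,000 samples, which is one reason the protocol calls for long runs. Running the same harness on a fresh device and on an aged, realistically filled one gives the comparison that datasheet numbers do not.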
As physical AI systems move from pilot into volume production, the conversations we are having with customers are shifting. The question is no longer whether a system can hit average-case performance numbers. It is whether the system holds its timing envelope under the conditions it will actually see in the field, for the years it is expected to operate.
Deterministic storage is not a nice-to-have in that world. It is a precondition for scaling. Teams that bake it into their architecture from the start get to treat reliability as a capability; teams that do not will eventually discover that their systems meet the spec at launch and drift out of it in the field. The cost of closing that gap late is always higher than the cost of closing it early, and in physical AI, the cost often shows up in places that are hard to recover from.
Working on a physical AI system where storage timing matters? Let's talk.