When AI platforms misbehave in production, the instinct is to buy more GPUs. Many teams are solving the wrong problem.
The real bottleneck may not be compute, but storage. Teams add GPUs, optimize batching, tune the model, and still see latency spike unpredictably, scaling take longer than it should, and workers stall waiting for data. The compute isn’t the constraint; the infrastructure delivering data to that hardware is.
The phase where storage matters most is training. AI training feeds models large datasets of unstructured data: files, documents, and images, and a run can last days, weeks, or months. Checkpointing alone creates sustained, intensive storage demand throughout that process: saving the model’s state at regular intervals so training can resume after a failure means the cluster pauses repeatedly to write large amounts of data. In recent MLPerf llama3-8b checkpoint testing, Fusion SMB completed a 107.6 GB checkpoint write in 7.3 seconds; Samba took 27.1 seconds for the same operation. At training scale, that difference compounds across thousands of checkpoints into hours of lost compute time.
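To make the checkpoint pause concrete, here is a minimal sketch of a training loop that periodically serializes model and optimizer state with PyTorch. The model size, checkpoint interval, and the /mnt/shared mount path are illustrative assumptions, not details from the benchmarks above; real training frameworks layer sharding and asynchronous writes on top of the same basic pattern.

```python
import os
import time
import torch

# Illustrative model and optimizer; production models are far larger,
# which is what turns every checkpoint into a multi-gigabyte write.
model = torch.nn.Linear(4096, 4096)
optimizer = torch.optim.AdamW(model.parameters())

CHECKPOINT_EVERY = 500                      # steps between checkpoints (assumed)
CHECKPOINT_DIR = "/mnt/shared/checkpoints"  # shared file system mount (assumed)
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

for step in range(1, 2001):
    # ... forward pass, backward pass, and optimizer.step() would run here ...

    if step % CHECKPOINT_EVERY == 0:
        start = time.monotonic()
        # Training blocks on this write: model weights and optimizer state must
        # reach storage before the next step begins, so slow writes stall GPUs.
        torch.save(
            {
                "step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
            },
            os.path.join(CHECKPOINT_DIR, f"step_{step:06d}.pt"),
        )
        print(f"checkpoint at step {step}: {time.monotonic() - start:.1f}s")
```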
After training comes inference. This is the operational phase. The trained model is now responding to live requests. The long, storage-intensive work is complete, though the training data rarely remains static, and the cycle continues as new data is added. Storage still matters at production scale, but the nature of the demand shifts. Rather than sustained data ingestion, the challenge becomes fast, concurrent access across many workers simultaneously.
In both phases, the same underlying problem emerges: dozens, hundreds, or thousands of compute workers need fast, concurrent access to the same data. That’s not a computing challenge; it’s a file access challenge.
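As a sketch of that access pattern, the snippet below has many workers stream the same set of files concurrently from one shared path. The mount point, shard layout, and worker count are assumptions chosen for illustration; the point is that the workload is many readers hitting the same files at once, not heavy computation.

```python
import concurrent.futures
import pathlib

# Assumed shared mount and dataset layout; only the access pattern matters here.
DATASET_DIR = pathlib.Path("/mnt/shared/datasets/train-shards")

def read_shard(path: pathlib.Path) -> int:
    # Each worker streams a shard straight from the shared file system;
    # there is no per-node copy or distribution step.
    return len(path.read_bytes())

shards = sorted(DATASET_DIR.glob("*.tar"))
with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
    total_bytes = sum(pool.map(read_shard, shards))

print(f"read {total_bytes / 1e9:.2f} GB across {len(shards)} shards")
```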
The common workarounds each carry hidden costs: copying data to every node adds synchronization overhead, bespoke distribution logic has to be built and maintained, and object storage introduces unpredictable latency under high concurrency.
The solution: shared file access that was built for this
A remote shared file system solves this cleanly: one authoritative copy of each model or dataset, a consistent view across all workers, and fast startup when new nodes come online. No bespoke distribution logic. No synchronization overhead.
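Here is a minimal sketch of what that means for a worker coming online, assuming the shared file system is mounted at /mnt/shared and using a hypothetical model filename: the worker opens the one authoritative copy directly instead of copying weights to local disk or running a distribution script first.

```python
import torch

# Assumed shared mount; every worker sees the same, single copy of the model.
MODEL_PATH = "/mnt/shared/models/my-model/current.pt"

def start_worker(device: str = "cpu") -> dict:
    # A new node loads weights straight from the shared file system.
    # No rsync to local disk, no per-node cache to keep in sync.
    return torch.load(MODEL_PATH, map_location=device)

weights = start_worker()
print(f"loaded {len(weights)} tensors from the shared copy")
```

Because every worker reads the same file, an update to the model on the share is visible to any worker that reopens it, with no separate synchronization step.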
A remote shared file system requires a network protocol with enormous throughput and low latency. SMB has been evolving for over 40 years, but the version released in 2012 (SMB 3) is so different from what you saw in the 90s that you could consider it a new protocol.
AI and ML training is throughput-bound, not metadata-heavy. Recent MLPerf 3D-unet training results make this concrete: on identical hardware at 2x200GbE, Fusion SMB delivered 25.45 GB/s of training throughput, while NFSD managed 13.93 GB/s and Samba 2.82 GB/s. NFS was designed with metadata performance as a priority; SMB 3, with RDMA, multichannel, and scale-out, was built to move very large amounts of data fast, and AI training rewards the latter. Forget what you know about SMB from open source and old Windows environments; this is a different protocol designed for a different job.

Tuxera didn’t alter SMB itself (SMB is an open but Microsoft-owned protocol) but built an exceptionally high-performance implementation that fully realizes SMB 3’s potential. For teams with a Linux background, Samba is usually the reference point for SMB performance, but Samba isn’t representative of what SMB 3 can actually do.
Fusion SMB on Linux is significantly faster than Samba, scales to far more concurrent workers, and outperforms Windows Server’s own SMB implementation on the workloads that matter for AI. It’s the SMB engine behind several leading high-performance storage platforms, including Weka and IBM Storage Scale.
Storage problems in AI infrastructure are rarely visible until they become production incidents. By the time latency is spiking and workers are stalling, the storage layer is already a liability.
Fusion SMB removes storage as a limiting factor, not by introducing an exotic new system, but by delivering shared, predictable, high-performance file access that scales with the workload. It lets teams reuse familiar tools and security models, the ones already governing the rest of their infrastructure, while meeting the performance demands of modern AI at scale.
For teams moving AI from experimentation into production, that means fewer incidents, faster scaling, and a storage layer that simply stops being a problem. Reliability isn’t a nice-to-have. It’s the foundation everything else is built on.
For most teams running AI in production, the bottleneck isn’t compute. It’s the file system delivering data to that compute. When workers stall, latency spikes unpredictably, and scaling slows, the cause is usually shared file access at scale, not GPU capacity.
Both phases create storage pressure, but in different ways. Training drives sustained, intensive demand: large unstructured datasets ingested over days, weeks, or months, plus regular checkpointing. Inference shifts the demand to fast, concurrent access from many workers responding to live requests. In both phases, the underlying problem is the same: many compute workers needing consistent, high-performance access to the same data.
For AI and ML training, SMB 3 outperforms NFS. AI training is not metadata-heavy, and recent MLPerf 3D-unet benchmarks show this clearly: on identical hardware at 2x200GbE, Fusion SMB delivered 25.45 GB/s of training throughput compared with 13.93 GB/s for NFSD. NFS was designed with metadata performance as a priority. SMB 3 with RDMA, multichannel, and scale-out is built to read very large amounts of data quickly, which is what AI training actually demands.
The gap between Fusion SMB and Samba is substantial. On the same MLPerf 3D-unet training benchmark, Fusion SMB delivered 25.45 GB/s while Samba delivered 2.82 GB/s on identical hardware. On checkpoint save time for a 107.6 GB llama3-8b model, Fusion SMB completed the write in 7.3 seconds; Samba took 27.1 seconds. At training scale these gaps compound across thousands of operations.
Fusion SMB is Tuxera’s high-performance implementation of the SMB 3 protocol. It runs on Linux, delivers significantly higher throughput than Samba, scales to far more concurrent workers, and outperforms Windows Server’s own SMB implementation on AI-relevant workloads. It’s the SMB engine inside several leading high-performance storage platforms, including Weka and IBM Storage Scale.
Object storage can introduce latency variability that’s difficult to predict under high-concurrency AI workloads. Most AI training data exists as files, often petabytes of it, not as objects. Teams want to begin training immediately, not transform their data into a different format first.
Checkpointing is the process of saving a model’s state at regular intervals during training, so training can resume after a failure. Because training runs for days, weeks, or months, checkpointing creates sustained, intensive storage demand throughout the entire process. Slow checkpoint writes pause the training cluster, and at scale those pauses compound into hours of lost compute time.
See the benchmarks for yourself
The numbers in this article come from MLPerf testing on a single hardware configuration. Your workload, your network, and your storage will look different. We will run a proof of concept on your infrastructure and share the results.
Talk to a Fusion engineer