What the Fusion SMB benchmarks actually tell you

Fusion SMB allowed MLPerf to read training data 10x faster than Samba and keep 24 GPUs fed, trouncing the competition. Read on for a quick tour of the benchmark results, before diving into the full report. 

Key takeaways

  • When the file-sharing layer is the bottleneck in AI training, HPC, or media production, the protocol implementation matters more than the hardware underneath it. 
  • On a single test setup with 400 Gb/s of available network bandwidth, Fusion SMB transferred data at 46.86 GB/s, matching the local storage speed and saturating the available network capacity. Samba reached only 4.17 GB/s on the same hardware. 
  • With MLPerf 3D-UNet, Fusion SMB kept 24 simulated accelerators fed with data, allowing them to operate at or above the 90 percent utilization threshold. Samba could not sustain adequate transfer speed or latency to satisfy a single accelerator. 
  • A 104.7 GB AI checkpoint was saved in just 6.7 seconds with Fusion SMB. Across a multi-day run, even a few extra seconds per checkpoint compound into substantial idle time. 
  • In a production IBM Storage Scale deployment, four Fusion SMB nodes replaced eleven Samba nodes, cutting three-year TCO by 30 to 35 percent. 
  • The full report covers the methodology, the Windows Server comparison, CPU behavior at 2,500 connections, and the architectural reasons behind every gap. 

The differences in benchmark numbers for Fusion SMB and the competition are not subtle. On a single test server, Fusion SMB delivers more than ten times Samba's throughput and sustains roughly twice as many accelerators as the next-best alternative. The complete results, methodology, and architectural reasoning can be found in The Fusion SMB performance and evaluation primer. 

Download the Tuxera Fusion SMB Benchmark Report

How the file-sharing layer became the bottleneck 

For years, the file-sharing protocol was a secondary consideration. The network was the main constraint, followed by the storage system. The file server, as essential as it is, was almost an afterthought. 

With 200Gb and 400Gb network adapters now available in mainstream servers (and 800Gb on the way), and with fast solid-state storage systems capable of saturating them, the slowest piece is suddenly the file server itself. A protocol implementation that cannot distribute work evenly across multiple CPU cores, cannot use hardware offloading features such as Remote Direct Memory Access (RDMA), or does not cluster effectively across multiple nodes will cap your throughput long before your hardware does. 

This is the gap that our benchmarks reveal. The question is not network versus storage bandwidth, but how the various file-sharing implementations differ when pitted against each other on an otherwise level playing field. 

Throughput: the gap widens as the network gets faster

The clearest single point of comparison is sequential read throughput against Samba, the open-source starting point for most Linux deployments. 

Table 1 (sequential read throughput):
Sequential read throughput of Fusion SMB and Samba at three network speeds. All tests use a 1 MB block size.

Network      Fusion SMB    Samba
100GbE       11.4 GB/s     2.8 GB/s
200GbE       22.5 GB/s     4.17 GB/s
2x200GbE     46.86 GB/s    4.17 GB/s

As the network and storage get faster, Fusion SMB scales toward the storage ceiling while Samba falls further behind. Fusion SMB uses a multithreaded, user-mode design with a mature multichannel implementation and full RDMA support, enabling SMB Direct. The use of RDMA on capable hardware allows for bypassing much of the networking stack, leaving more CPU capacity available for other tasks. As a result, adding a second 200Gb path roughly doubles throughput with Fusion SMB. 
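To put the Table 1 figures in context, the short sketch below compares each result with the raw line rate of the link and computes the speedup over Samba. It assumes 1 GB = 10^9 bytes and ignores Ethernet and SMB protocol overhead, so the utilization values are approximate. The function and variable names are illustrative and do not come from the report.

```python
# Figures from Table 1 (GB/s, sequential reads, 1 MB blocks).
# Assumptions (not from the report): 1 GB = 10**9 bytes, and the ceiling is
# the raw link speed with no allowance for protocol overhead.
RESULTS = {
    # link speed in Gb/s: (Fusion SMB GB/s, Samba GB/s)
    100: (11.4, 2.8),
    200: (22.5, 4.17),
    400: (46.86, 4.17),  # 2x200GbE
}


def line_rate_gbytes(link_gbits: float) -> float:
    """Convert a link speed in gigabits/s to gigabytes/s."""
    return link_gbits / 8


def summarize(results: dict) -> list:
    """Return per-link utilization of raw line rate and the Fusion/Samba ratio."""
    rows = []
    for link, (fusion, samba) in sorted(results.items()):
        ceiling = line_rate_gbytes(link)
        rows.append({
            "link_gbits": link,
            "fusion_utilization": fusion / ceiling,
            "samba_utilization": samba / ceiling,
            "speedup": fusion / samba,
        })
    return rows


for row in summarize(RESULTS):
    print(
        f"{row['link_gbits']:>3} Gb/s: "
        f"Fusion {row['fusion_utilization']:.0%} of line rate, "
        f"Samba {row['samba_utilization']:.0%}, "
        f"speedup {row['speedup']:.1f}x"
    )
```

Under these assumptions, Fusion SMB stays at roughly 90 percent or more of raw line rate at every step, while Samba's share shrinks each time the link gets faster.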

Samba, by comparison, uses a process-per-connection model and does not support RDMA. The results show that increasing the available bandwidth or storage performance does almost nothing for Samba’s numbers, as the architectural limitations prevent it from turning the extra bandwidth into client-visible throughput. The report explains this in depth. For a more complete side-by-side feature comparison, see the Fusion SMB instead of Samba blog post. 

If your reference point is Windows Server rather than Samba, the report covers that comparison too. The largest gains in favor of Fusion there can be seen in random writes and small-block I/O. A more complete version of that comparison is in the Fusion SMB instead of Windows Server blog post. 

AI training: feeding the accelerators

Peak throughput is one thing, but keeping GPUs busy poses an additional challenge. The MLPerf 3D-UNet benchmark simulates a real AI training workload and measures how many accelerator units a storage server can keep at or above the required 90 percent utilization threshold. This requires sustained high performance, as momentary dips in transfer rates can cause stalls or bubbles in accelerator execution. In this benchmark, more passing units mean more GPUs receiving a sufficient and reliable flow of data from the server. 

Table 2 (AI training / MLPerf 3D-UNet):
MLPerf 3D-UNet results: accelerator units sustained above 90 percent utilization, by storage platform.

Platform                 Passing units    Throughput
Samba over TCP           0                2.82 GB/s
Ganesha over TCP         9                11.36 GB/s
NFSD over RDMA           11               13.83 GB/s
Fusion SMB over RDMA     24               27.77 GB/s

A single Fusion SMB server feeds roughly twice as many GPUs as the same host running other file server implementations such as NFSD or Ganesha. Samba does not register a single passing accelerator with this workload, which makes it unsuitable for AI training at scale. 
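As a rough illustration of what the Table 2 figures imply, the sketch below divides each platform's aggregate throughput by its number of passing accelerator units, and models utilization as compute time over compute plus stall time. That utilization formula is a simplified assumption for illustration only; the benchmark's exact definition may differ, and the helper names are hypothetical.

```python
# Figures from Table 2: platform -> (passing accelerator units, aggregate GB/s).
RESULTS = {
    "Samba over TCP": (0, 2.82),
    "Ganesha over TCP": (9, 11.36),
    "NFSD over RDMA": (11, 13.83),
    "Fusion SMB over RDMA": (24, 27.77),
}

THRESHOLD = 0.90  # utilization threshold quoted in the post


def accelerator_utilization(compute_s: float, stall_s: float) -> float:
    """Share of time spent computing rather than waiting on data.

    Simplified model (an assumption): utilization = compute / (compute + stall).
    """
    return compute_s / (compute_s + stall_s)


def per_unit_throughput(units: int, aggregate_gbs: float):
    """Average GB/s delivered per passing accelerator, or None if none passed."""
    return aggregate_gbs / units if units else None


for name, (units, aggregate) in RESULTS.items():
    share = per_unit_throughput(units, aggregate)
    detail = f"{share:.2f} GB/s per unit" if share is not None else "no passing units"
    print(f"{name:<22} {units:>2} units, {detail}")

# Hypothetical timings: 9 s of compute with 1 s of stall sits exactly at the threshold.
print(accelerator_utilization(9.0, 1.0) >= THRESHOLD)
```

Across the three passing platforms, each accelerator draws a little over 1 GB/s on average, so the number of units a server can sustain tracks the throughput it can hold steady.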

SMB 3, along with RDMA, multichannel, and scale-out, allows for the fast movement of very large amounts of data. On training workloads, that is what wins. For why the storage layer rather than GPU compute is the typical bottleneck in production AI, see the AI bottleneck blog post. 

Checkpointing: where seconds become hours

Checkpoints are bursty, large, and frequent. A training run saves its state at regular intervals so that it can resume after a failure. If storage cannot absorb those writes quickly, the checkpoint becomes the longest-running operation in the run. 

In a test with a 104.7 GB LLaMA3-8B checkpoint, Fusion SMB completed the save in 6.7 seconds. Samba took 28.1 seconds. NFSD and Ganesha landed at 7.9 and 10.4 seconds respectively. Across hundreds of checkpoints in a multi-day run, the difference between 7 and 28 seconds compounds into hours of GPU time spent waiting on storage rather than training. 
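The compounding effect can be worked through with a few lines of arithmetic. The sketch below derives the effective write throughput from the checkpoint size and save times quoted above, then estimates the extra waiting time over a run. The count of 500 checkpoints is a hypothetical value chosen only for illustration; it does not come from the report.

```python
# Checkpoint size and save times as quoted in this post.
CHECKPOINT_GB = 104.7
SAVE_SECONDS = {
    "Fusion SMB": 6.7,
    "NFSD": 7.9,
    "Ganesha": 10.4,
    "Samba": 28.1,
}

# Hypothetical number of checkpoints in a multi-day run (an assumption).
NUM_CHECKPOINTS = 500


def write_throughput(size_gb: float, seconds: float) -> float:
    """Effective checkpoint write throughput in GB/s."""
    return size_gb / seconds


def extra_wait_hours(slow_s: float, fast_s: float, checkpoints: int) -> float:
    """Additional hours spent waiting when each save takes slow_s instead of fast_s."""
    return (slow_s - fast_s) * checkpoints / 3600


for name, seconds in SAVE_SECONDS.items():
    print(f"{name:<10} {write_throughput(CHECKPOINT_GB, seconds):5.1f} GB/s")

hours = extra_wait_hours(SAVE_SECONDS["Samba"], SAVE_SECONDS["Fusion SMB"], NUM_CHECKPOINTS)
print(f"Extra wait over {NUM_CHECKPOINTS} checkpoints: {hours:.1f} hours")
```

With the assumed 500 checkpoints, the 21.4-second gap per save adds up to about three hours during which every accelerator in the job sits idle.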

Fewer nodes, lower cost: the IBM Storage Scale deployment

Performance numbers on a test system are exciting, but the cost of running a production environment is the more tangible concern. In a production IBM Storage Scale deployment serving roughly 150 GB/s of aggregate throughput, Fusion SMB replaced the existing Samba protocol layer with a much smaller footprint. 

IBM Storage Scale deployment comparison after replacing the Samba protocol layer with Fusion SMB. Protocol nodes dropped from 11 to 4 (64% reduction), switch ports from 44 to 16 (64%), and network connections from 22 to 8 (64%). Aggregate throughput rose from 154 GB/s to 160 GB/s (+4%), while 3-year TCO fell 30 to 35 percent below the Samba baseline.

Fewer nodes naturally mean fewer licenses, less power, less cooling, less cabling, and lower operating overhead in general. Despite the smaller footprint, Fusion’s performance characteristics allowed aggregate throughput to go up rather than down. 
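The footprint figures can be checked with simple arithmetic. The sketch below recomputes the percentage changes from the before and after counts quoted for this deployment, and adds a per-node throughput figure. The per-node value is a derived average, not a number stated in the report, and the dictionary keys are illustrative names.

```python
# Before/after counts as quoted for the IBM Storage Scale deployment.
BEFORE = {"protocol_nodes": 11, "switch_ports": 44, "network_connections": 22, "throughput_gbs": 154}
AFTER = {"protocol_nodes": 4, "switch_ports": 16, "network_connections": 8, "throughput_gbs": 160}


def percent_change(before: float, after: float) -> float:
    """Signed percentage change from before to after."""
    return (after - before) / before * 100


def per_node_throughput(config: dict) -> float:
    """Average aggregate throughput per protocol node (a derived figure)."""
    return config["throughput_gbs"] / config["protocol_nodes"]


for key in BEFORE:
    print(f"{key:<20} {BEFORE[key]:>4} -> {AFTER[key]:>4} ({percent_change(BEFORE[key], AFTER[key]):+.0f}%)")

print(f"Per-node throughput: {per_node_throughput(BEFORE):.0f} -> {per_node_throughput(AFTER):.0f} GB/s")
```

On these numbers, average throughput per protocol node rises from 14 GB/s to 40 GB/s, which is what makes the smaller node count possible.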

What the report covers that this post does not

This is the high-level view. The full Fusion SMB performance and evaluation primer dives deeper: 

  • The complete test environment and methodology, including why caching and buffering were disabled to measure the server rather than the cache. 
  • CPU and memory behavior at 2,500 simultaneous active SMB connections, where peak CPU utilization stayed under 62 percent. 
  • Full FIO comparison across Samba, Ganesha, NFSD, and Fusion SMB with sequential and random read and write patterns. 
  • A capability-by-capability feature comparison covering all the key features and licensing. 
  • How to start a 45-day evaluation on your own hardware. 

The numbers that matter are yours

Every figure here came from a controlled test environment. Yours will look different, because your network, your storage, and your workloads are different. That is the point of an evaluation: to find out which of these gaps show up most clearly on your stack. 

Read the full benchmark report, then talk to our team about running the appropriate tests on your hardware. 

Download the Tuxera Fusion SMB Benchmark Report

Ready to talk to a Fusion engineer? Get in contact
