SMB transparent failover with continuous availability eliminates server downtime and is essential in any data-critical environment.

Information is the lifeblood of organizations, driving critical decisions and actions. As the volume of data being processed continues to grow, companies need to prioritize fail-safe measures to safeguard this valuable asset. As businesses rely increasingly on digital platforms, they face the challenge of maintaining the integrity, accessibility, and availability of their data. The risk of data corruption or unplanned downtime calls for robust fail-safe mechanisms that prevent such issues or limit their impact.

Fail-safety in SMB and how it helps

The Server Message Block (SMB) protocol provides an efficient, controlled, and secure way for enterprises to access, share, and store files within their network. Reliability and redundancy are key to providing uninterrupted access to critical data assets: access to the correct data must be ensured even in the event of a failure.

Data sharing within and between modern organizations is ubiquitous, so sharing seamlessly and efficiently between devices and across teams is essential if data is to be processed and analyzed according to the expected workflow. A single incident affecting the availability of data can be costly to an organization. For example, a financial institution might lose millions in missed transaction fees or lost trades. Scientists making time-sensitive observations might miss or fail to document a cell wall change, a particle collision, or another equally important data point they have waited their entire careers to witness. Advanced medical research institutes generate massive amounts of data from sophisticated imaging equipment, and the timely processing of this data can help save lives. For cases like these, fail-safety is a must, not an option.

Fail-safety with SMB-based file sharing is provided by two core functionalities: failover and continuous availability. In simple terms, failover is the ability to recognize an unplanned point of failure and automatically route the incoming client data requests from the failed service to a secondary service.

Combining failover with continuous availability means that when a server becomes unavailable for any reason (planned or otherwise), all data operations are moved transparently to another server in the background and continue with little to no interruption for the end user.

A highly scalable, fail-safe SMB solution

An SMB server solution must be rock solid to ensure the availability of key data assets 24/7 without interruptions or data loss. Fusion File Share by Tuxera, our enterprise-grade SMB implementation, is engineered to provide a scalable and fail-safe SMB service with continuous availability and transparent failover for seamless enterprise file sharing. Whether you deploy it in containers, on protocol nodes, or on storage servers, Fusion File Share is built to be a fail-safe file sharing option for enterprise storage, with a wide array of performance tuning options and impressive scalability.

Failover and continuous availability at work in active-passive SMB file clusters

To make transparent failover work, Fusion File Share needs to keep information about incoming connections and open file handles in a database. This database is located in a shared storage location for the SMB cluster, visible and accessible to all the nodes in that cluster.
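
To make the idea of a shared state database concrete, here is a minimal Python sketch that records open file handles in a SQLite file and reads them back from a second "node". It is only an illustration: the `HandleStore` class, its method names, and the table layout are hypothetical and do not describe how Fusion File Share actually stores its state.

```python
import sqlite3
from pathlib import Path


class HandleStore:
    """Toy persistent store for open file handles (hypothetical layout)."""

    def __init__(self, db_path: Path) -> None:
        self.conn = sqlite3.connect(str(db_path))
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS open_handles ("
            " client TEXT NOT NULL,"
            " handle_id INTEGER NOT NULL,"
            " path TEXT NOT NULL,"
            " PRIMARY KEY (client, handle_id))"
        )
        self.conn.commit()

    def record_open(self, client: str, handle_id: int, path: str) -> None:
        # Commit before acknowledging the open, so the state survives a crash.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO open_handles VALUES (?, ?, ?)",
                (client, handle_id, path),
            )

    def record_close(self, client: str, handle_id: int) -> None:
        with self.conn:
            self.conn.execute(
                "DELETE FROM open_handles WHERE client = ? AND handle_id = ?",
                (client, handle_id),
            )

    def load_all(self) -> list[tuple[str, int, str]]:
        # What a newly activated node would read back after a failover.
        rows = self.conn.execute(
            "SELECT client, handle_id, path FROM open_handles"
            " ORDER BY client, handle_id"
        )
        return list(rows)
```

In this sketch, one `HandleStore` instance plays the active node and a second instance opened on the same file plays the node that takes over. SQLite is used only to keep the example self-contained; it is not a recommendation for a database on clustered storage.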

To configure continuous availability in Fusion File Share, use the following configuration options:

  • ca=true, which enables continuous availability.
  • ca_params=<path>, where <path> is the shared database location.

To allow SMB clients to reconnect to a new server quickly, automatically, and reliably, use the following configuration options:

  • tcp_tickle=true, which enables TCP tickle ACK.
  • tcp_tickle_params=<path>, where <path> is the same shared storage location as in the continuous availability option; client connections are tracked there.
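
Because both path options must point at storage that every node can see, it can help to sanity-check them before they go into the server configuration. The Python sketch below is an illustration under stated assumptions: the four option names come from the lists above, but the `render_failover_options` helper, the plain `key=value` rendering, and the example mount point are hypothetical, so consult the product documentation for the actual file format.

```python
from pathlib import PurePosixPath


def render_failover_options(shared_root: str, ca_path: str, tickle_path: str) -> list[str]:
    """Return the four option lines after checking both paths are on shared storage."""
    root = PurePosixPath(shared_root)
    for label, value in (("ca_params", ca_path), ("tcp_tickle_params", tickle_path)):
        path = PurePosixPath(value)
        # Every node must see these paths, so they must sit under the shared mount.
        if root != path and root not in path.parents:
            raise ValueError(f"{label} path {value!r} is not under {shared_root!r}")
    return [
        "ca=true",
        f"ca_params={ca_path}",
        "tcp_tickle=true",
        f"tcp_tickle_params={tickle_path}",
    ]
```

Calling `render_failover_options("/mnt/shared", "/mnt/shared/ca", "/mnt/shared/ca")` returns the four lines, while a path on node-local storage such as `/var/local/ca` raises a `ValueError` instead of producing a configuration that would fail only during a failover.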

Failover sequence in an active-passive Fusion File Share cluster

1. Active SMB server node fails.

By default, Fusion File Share uses Pacemaker to detect failed nodes. When a failure occurs, incoming client connections are paused.

2. A virtual IP address is floated to a passive node.

Pacemaker is also used to float the IP address. When the client reconnects, it won’t know it is connected to a different node.

3. The passive node is activated.

When the passive node is activated, it reads all the connection information from the shared database, the location of which is defined by ca_params. It then starts the SMB service.

4. TCP tickle ACK is sent to all the clients.

This will notify the clients that they can now resume ongoing operations, minimizing wait time.

5. All the ongoing operations pick up where they left off when the failure occurred.

The failover is now complete, with no restarts or data loss and, apart from the brief pause, no service interruption for the user.
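
The five steps can be walked through with a small Python simulation. This is only a sketch of the ordering described above: the `Node` and `Cluster` classes are hypothetical, the shared database is a plain dictionary, and the virtual IP move and the TCP tickle ACK are represented by log entries rather than by real Pacemaker or network operations.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    active: bool = False
    handles: dict[str, str] = field(default_factory=dict)  # client -> open file


@dataclass
class Cluster:
    active: Node
    passive: Node
    shared_db: dict[str, str] = field(default_factory=dict)  # stands in for the ca_params storage
    log: list[str] = field(default_factory=list)
    vip_owner: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # The virtual IP starts on the active node.
        self.vip_owner = self.active.name

    def open_file(self, client: str, path: str) -> None:
        # State goes to shared storage as well as to the node's memory.
        self.active.handles[client] = path
        self.shared_db[client] = path

    def fail_over(self) -> None:
        failed, standby = self.active, self.passive
        failed.active = False                    # 1. the active node fails
        failed.handles.clear()                   #    and its in-memory state is lost
        self.log.append(f"{failed.name} failed")
        self.vip_owner = standby.name            # 2. the virtual IP floats over
        self.log.append(f"VIP moved to {standby.name}")
        standby.handles = dict(self.shared_db)   # 3. state is read from the shared DB
        standby.active = True
        self.log.append(f"{standby.name} activated")
        for client in sorted(standby.handles):   # 4. every known client is notified
            self.log.append(f"tickle ACK -> {client}")
        # 5. operations resume against the node that now holds the handles
        self.active, self.passive = standby, failed
```

After `fail_over()` runs, the former passive node owns the virtual IP and holds the same client-to-file mapping that the failed node had, which is the property that lets ongoing operations pick up where they left off.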

Here is a video demonstrating how transparent failover works:

Final thoughts

Unlock the ability to reliably access your critical data when you need it, without costly, time-consuming server downtime. With its level of performance and scalability, Fusion File Share is the ultimate SMB solution for providers of cloud, software-defined storage (SDS), and other file sharing services. Fusion File Share not only eliminates downtime when you update, add, or retire nodes in your cluster, but also ensures that during unplanned service interruptions the workload is seamlessly transferred to a new node without users ever becoming aware of the interruption.

Discover more about how Fusion File Share helps you maintain access to your files when it matters most.