Examining Container Storage Interface (CSI) support for Fusion File Share by Tuxera in Kubernetes

Fusion File Share by Tuxera is an SMB solution that emphasizes high reliability, scalability, and performance. One of the benefits of Fusion is its support for Kubernetes storage (something I’ve talked about before) through the Container Storage Interface (CSI). What are Kubernetes and CSI, and how can they be used with Fusion File Share? Let’s examine these topics in more detail.

Delving into Kubernetes

The name Kubernetes originates from Greek, meaning helmsman or pilot. Kubernetes itself is a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem, and Kubernetes services, support, and tools are widely available. Google open-sourced the Kubernetes project in 2014. The platform combines over 15 years of Google’s experience running production workloads at scale with best-of-breed ideas and practices from the community.

A recent survey by StackRox shows that Kubernetes adoption stands at 86% of the container orchestration market, and the competition has now shifted to the cloud vendors who provide Kubernetes. Some of the salient features that have made Kubernetes popular include:

(a) Portability – Kubernetes offers portability, as well as faster, simpler deployment times. This means that companies can take advantage of multiple cloud providers if needed, and can grow rapidly without having to re-architect their infrastructure.

(b) Scalability – Kubernetes has the ability to run containers in one or more public cloud environments, in virtual machines, or on bare metal, which means it can be deployed almost anywhere.

(c) High availability (HA) – Kubernetes addresses high availability at both the application and the infrastructure level. Adding a reliable storage layer to Kubernetes ensures that stateful workloads are highly available, and the Kubernetes components themselves can be configured for multi-node replication (multi-master).

(d) Open source (OSS) – Since Kubernetes is open source, you can take advantage of the vast ecosystem of other open source tools designed specifically to work with Kubernetes without the lock-in of a closed/proprietary system.

(e) Huge ecosystem – With thousands of developers all over the world, as well as tools with 5,608 GitHub repositories (and counting), you won’t be forging ahead into new territory without help.

(f) Market leader – Kubernetes was developed, used, and maintained by Google. This not only gives it instant credibility, but it also means you can trust that bugs will be fixed and that new features will be released on a regular basis.

As a result of these features, traditional applications have had to undergo architectural changes as the need arose to make them compatible with, and deployable to, Kubernetes clusters.

How Kubernetes affects deployments

"

Let’s consider a typical website application with a web-ui frontend and a database backend. In traditional deployments, they are bundled together, but in a container world they are split up so that each operates independently. This makes scaling, upgrading, and so on much easier.

When such an application is deployed to a Kubernetes cluster with a scale factor of 1, each of those containers can end up on a different node. The networking, load balancing, and so on are handled internally by Kubernetes itself. Consider the above example deployed to a 3-node Kubernetes cluster: the web-ui frontend ends up on node-2, the database backend gets deployed to node-1, and Kubernetes creates an internal load balancer that routes the incoming traffic to node-2, which hosts the web-ui. A minimal sketch of what such a deployment could look like is shown below.
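
Here is a minimal, hedged sketch of how the web-ui frontend from the example could be deployed, with a Service load-balancing traffic to it. The image, names, and port are illustrative and not taken from the original article; the database backend would get an analogous Deployment and Service.

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-ui
spec:
  replicas: 1                 # scale factor of 1, as in the example
  selector:
    matchLabels:
      app: web-ui
  template:
    metadata:
      labels:
        app: web-ui
    spec:
      containers:
      - name: web-ui
        image: nginx:1.21     # placeholder frontend image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-ui
spec:
  type: LoadBalancer          # Kubernetes routes incoming traffic to whichever
  selector:                   # node is currently hosting the web-ui pod
    app: web-ui
  ports:
  - port: 80
    targetPort: 80
EOF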

"

Let’s talk a bit about the self-healing concept of Kubernetes. This is a topic I’ve covered in previous articles; it refers to the way Kubernetes monitors the health of the containers hosted inside the cluster. In case of unhealthy behavior or problems within a node, the containers are automatically shifted to healthy nodes without user intervention. In our website application, if node-2 becomes unhealthy, the web-ui frontend might be shifted to node-3, which is healthy and idle. Note that the load balancer picks this up and starts routing the incoming web-ui traffic to the new node.

Container storage interface (CSI) for Kubernetes

Our example above works quite well for the given scenario. In production workloads, however, this is not the case for storage. Usually, storage is hosted outside of the Kubernetes cluster for availability, ease of backup, and many other good reasons. For our website application, we would like to host the storage of the web-ui frontend and the database backend on cloud file storage.

One option is simply to mount the storage on all the nodes, but that solution does not scale. CSI for Kubernetes addresses this. Kubernetes core initially accepted storage contributions from vendors directly, and the core team soon realized that Kubernetes core releases had huge dependencies on these vendors. As a result, the Kubernetes Storage Special Interest Group (SIG) decided to create the Container Storage Interface (CSI), so that every storage vendor could implement that interface to get its storage working inside Kubernetes.

For our website application, when a Container Storage Interface (CSI) driver is implemented for a given cloud vendor and a container shifts from one node to another, the CSI driver ensures that the storage is shifted along with it. The first picture below shows the scenario with the web-ui frontend and database backend using cloud storage. The second picture shows the situation after both the container and the storage have shifted to the healthy node-3.

Kubernetes SMB CSI

The csi-driver-smb project within the Kubernetes CSI group provides a CSI-based solution for CIFS and SMB. It’s open source under the Apache 2.0 license and maintained on GitHub. The repository has well-detailed installation instructions, a development guide, a troubleshooting guide, plus examples.

The driver deployment is divided into two parts: the Provisioner, which creates or commissions the SMB server, and the Deployer, which deploys the driver itself.

Fusion File Share and Kubernetes CSI

Since Tuxera’s SMB solution, Fusion File Share, is SMB/CIFS compliant, the deployer works out of the box with Fusion. The demonstration below shows how to get CSI working with Fusion File Share. The following steps will be performed:

1) A Kubernetes cluster will be provisioned on a cloud vendor. In this example it will be Azure, and we will be using Azure Kubernetes Service (AKS). Learn more about AKS here.

2) Kubernetes SMB CSI will be deployed to the cluster.

3) Fusion File Share by Tuxera will be provisioned on a virtual machine in the Azure cloud.

4) A test workload will be deployed to the cluster to verify that the CSI driver works with Fusion File Share by Tuxera.

Demonstration

The following is just a demonstration of Kubernetes CSI with Fusion File Share by Tuxera. This approach is not recommended for production workloads. Please get in touch with us about configuring your systems for production scale.

This demonstration is mostly command-line based, using Azure, virtual machines, and a local CLI.

1.

First, ensure that a Kubernetes cluster is running and accessible from the command line.
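
In my case this meant pointing kubectl at the AKS cluster and listing the nodes. The output below is illustrative; your node names and versions will differ.

kubectl get nodes

# Example output (names and versions are placeholders):
# NAME                                STATUS   ROLES   AGE   VERSION
# aks-nodepool1-12345678-vmss000000   Ready    agent   15m   v1.17.9
# aks-nodepool1-12345678-vmss000001   Ready    agent   15m   v1.17.9
# aks-nodepool1-12345678-vmss000002   Ready    agent   15m   v1.17.9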

The above command shows that we have a 3-node Kubernetes cluster running on Azure Kubernetes Service (AKS).

2.

Next, deploy the Kubernetes SMB CSI driver to the cluster, using the instructions here.
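
At the time, the project’s remote install script could be used roughly as follows; check the repository’s installation docs for the exact, current command, as the script path and arguments may have changed.

# Install version 0.3.0 of the SMB CSI driver using the project's install script
curl -skSL https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/v0.3.0/deploy/install-driver.sh | bash -s v0.3.0 --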

This deploys version 0.3.0 of the Kubernetes CSI SMB driver to our cluster.

3.

Verify that the driver deployment was successful by listing the pods.
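
A hedged example of what this check can look like; the namespace, pod names, and container counts below are illustrative and may differ depending on how the driver was deployed.

kubectl get pods -n kube-system | grep csi-smb

# csi-smb-controller-7b9f6d6f4d-abcde   3/3   Running   0   2m
# csi-smb-node-xyz12                    3/3   Running   0   2m
# csi-smb-node-xyz34                    3/3   Running   0   2m
# csi-smb-node-xyz56                    3/3   Running   0   2m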

The ‘Running’ status of the csi-smb-* pods above shows that our deployment was successful and is healthy.

4.

Create a Kubernetes secret for accessing Fusion File Share. For testing purposes, let’s use test1/test1 as our username/password combination.
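
For example, something along these lines; the secret name smbcreds is an assumption taken from the upstream examples, and any name works as long as the storage class later refers to it.

# Create the SMB credentials secret in the default namespace
kubectl create secret generic smbcreds \
  --from-literal=username=test1 \
  --from-literal=password=test1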

5.

Next, install Fusion File Share by Tuxera on a virtual machine. This can be done in many ways, depending on your needs.

For this example, I will be using the Azure cloud with Docker. Let’s say the public IP address of the virtual machine is 40.113.16.31.

a) Ensure that the SSH connection to the virtual machine works (with or without a password).

b) Install Docker on the virtual machine. In my case it was already installed.

c) From another environment, create the Fusion File Share container archive and copy it to the virtual machine. Again, there are many ways to do this.

d) Load and run the Fusion File Share by Tuxera server container, exposing port 445 and bind-mounting the container’s /tmp, since tsmb.conf is configured to use /tmp. A rough sketch of c) and d) follows below.
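
A hedged sketch of sub-steps c) and d), assuming the VM user is azureuser and the container archive/image is called fusion-file-share; both names are placeholders for your own build.

# Copy the container archive to the VM and load it there
scp fusion-file-share.tar.gz azureuser@40.113.16.31:~
ssh azureuser@40.113.16.31 'docker load -i fusion-file-share.tar.gz'

# Run the server, exposing SMB on port 445 and bind-mounting /tmp,
# since tsmb.conf in this setup serves its share from /tmp
ssh azureuser@40.113.16.31 \
  'docker run -d --name tsmb-server -p 445:445 -v /tmp:/tmp fusion-file-share:latest'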

6.

After that, ensure that there is a Fusion File Share by Tuxera service listening on port 445. One rudimentary way to test this is with the telnet client, which also rules out firewall or network interference.
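
For example:

telnet 40.113.16.31 445

# Trying 40.113.16.31...
# Connected to 40.113.16.31.
# Escape character is '^]'.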

7.

Create a storage class with the server information and share name. Follow the instructions found here. In our config, we just have to replace the ‘source’ attribute with the IP address and share name.
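
A hedged storage class sketch following the shape of the upstream csi-driver-smb example; the share name ‘share’ and the secret parameters are assumptions to adapt to your own setup.

cat <<'EOF' > storageclass-smb.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: smb
provisioner: smb.csi.k8s.io
parameters:
  source: //40.113.16.31/share                            # Fusion File Share server and share
  csi.storage.k8s.io/node-stage-secret-name: smbcreds     # secret created in step 4
  csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - dir_mode=0777
  - file_mode=0777
EOF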

8.

Apply the storage class to the Kubernetes cluster.
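
Assuming the file created in the previous step, the output below is illustrative:

kubectl apply -f storageclass-smb.yaml
kubectl get storageclass smb

# NAME   PROVISIONER      RECLAIMPOLICY   VOLUMEBINDINGMODE   AGE
# smb    smb.csi.k8s.io   Retain          Immediate           5s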

9.

Now it’s time to create a workload. Let’s create a stateful set that writes a timestamp every second to the Fusion File Share mount point; the example I’ve used is from here.
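
A trimmed-down sketch of such a stateful set, adapted from the shape of the upstream example; the image, resource size, and names are placeholders, and the headless service is omitted for brevity.

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-smb
spec:
  serviceName: statefulset-smb      # headless service omitted in this sketch
  replicas: 1
  selector:
    matchLabels:
      app: statefulset-smb
  template:
    metadata:
      labels:
        app: statefulset-smb
    spec:
      containers:
      - name: statefulset-smb
        image: busybox:1.32         # placeholder image
        command:
        - /bin/sh
        - -c
        - "while true; do date >> /mnt/smb/outfile; sleep 1; done"
        volumeMounts:
        - name: smb-storage
          mountPath: /mnt/smb
  volumeClaimTemplates:
  - metadata:
      name: smb-storage
    spec:
      accessModes: ["ReadWriteMany"]
      storageClassName: smb         # storage class from step 7
      resources:
        requests:
          storage: 10Gi
EOF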

10.

Verify that the mount point exists inside the stateful set container. As shown below, it is mounted on /mnt/smb.
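
The pod name below follows from the sketch above; the output is illustrative.

kubectl exec -it statefulset-smb-0 -- df -h /mnt/smb

# Filesystem             Size  Used  Avail  Use%  Mounted on
# //40.113.16.31/share    10G   ...    ...   ..%  /mnt/smb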

11.

Finally, verify that the timestamps are being written to the storage on the virtual machine where the Fusion File Share server is hosted.
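
For example, assuming the share root maps directly onto the bind-mounted /tmp of the VM (the exact path depends on how the share is laid out in tsmb.conf):

ssh azureuser@40.113.16.31 'tail -n 3 /tmp/outfile'

# Mon Aug  3 10:15:01 UTC 2020
# Mon Aug  3 10:15:02 UTC 2020
# Mon Aug  3 10:15:03 UTC 2020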

Final thoughts

From what we’ve explored above, it’s evident that Tuxera’s SMB solution, Fusion File Share, supports the Container Storage Interface (CSI). The biggest benefit comes from the CSI driver for SMB being open source: we now have a common platform to work from and contribute to. As Jim Whitehurst put it, “Open source isn’t about saving money, it’s about doing more stuff, and getting incremental innovation with the finite budget you have”. This is the open mindset we all should have, and the one we strive towards at Tuxera.


Fusion File Share is our highly scalable enterprise SMB server on Linux.



Verifying Tuxera SMB failover and recovery using Kubernetes

Verifying Fusion File Share (formerly Tuxera SMB) server’s automatic connection failover and recovery with Kubernetes

Tuxera SMB (now called Fusion File Share by Tuxera) is a high-performance alternative to open-source Samba and other SMB/CIFS servers. Its state-of-the-art modular architecture runs in user or kernel space. This enables Fusion File Share to achieve maximum I/O throughput and low latency, with the lowest CPU usage and memory footprint compared to other solutions. It supports the Server Message Block (SMB) protocol’s latest features, including SMB 2, SMB 3, and all related security features.

One of the key benefits of Fusion File Share is its ability to automatically fail over and recover a connection. If a server node fails, terminates, or is shut down without any prior indication, the SMB client normally detects the node as unavailable only once a time-out or keep-alive mechanism kicks in, which makes connection recovery unreliable and slow. Fusion File Share instead forces reconnection using the TCP “tickle ACK” mechanism: it sends an ACK with invalid fields, which triggers a set of TCP exchanges causing the client to promptly recognize the stale connection and reconnect.

To test recovery, Fusion File Share is wrapped inside a Docker container and deployed to a Kubernetes cluster. Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation.1 The platform is self-healing by nature.2 This means Kubernetes can restart containers that fail, replace and reschedule containers when nodes die, kill containers that don't respond to user-defined health checks, and withhold them from clients until they are ready to serve. This makes it fairly easy to test connection failover and recovery with Fusion File Share running as a container inside a Kubernetes cluster.

Methods

A Windows client connects to the Fusion File Share mount, which is running inside the Kubernetes cluster. Then, a file copy to the mount is initiated to establish a continuous connection. While the file is copying, the container running Fusion File Share is killed explicitly. This causes Kubernetes to reschedule Fusion File Share onto another Kubernetes node, and within a few seconds the copy to the Fusion File Share mount resumes.

Environment setup

Figure 1. Tuxera SMB failover and recovery in Kubernetes

Figure 1 shows the test setup. Three different users have Windows 10 desktops acting as SMB clients. They have mounted a shared folder named “test” to different drives on their machines with the path \\tshare\test. The folder is served by Fusion File Share running inside a 2-node Kubernetes cluster. This is an on-premise Kubernetes cluster containing the nodes dhcp-180 and dhcp-185. In the figure, Fusion File Share is served from host dhcp-180. The cluster nodes also use iSCSI storage from an external iSCSI target. An NGINX reverse proxy runs between the users and the Kubernetes cluster, because as of Kubernetes 1.9.x it’s impossible to natively obtain a load-balanced external IP or ingress for an on-premise Kubernetes cluster.3

Testing failover and recovery

To demonstrate Fusion File Share’s failover and recovery, a file is copied from the user’s local disk on a Windows computer to the mount point \\tshare\test. As soon as copying is initiated, the operation proceeds at a speed determined by the network. Then, to force a failure, the pod tsmb-server, which runs the Fusion File Share container, is deleted during the copy operation. Kubernetes detects that the container is gone and starts the Fusion File Share server container on another node. The end user sees the copy speed drop to zero for a very short while before the copy resumes. The key point to note here is that copying does not fail, or get interrupted, by any errors.
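
In kubectl terms the forced failure looks roughly like this; if the pod is managed by a controller, the replacement pod will come up with a generated name on the surviving node.

# Delete the pod running the Fusion File Share container to simulate a failure
kubectl delete pod tsmb-server

# Watch Kubernetes reschedule the server onto the other node (dhcp-185)
kubectl get pods -o wide -w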

This gives a seamless experience to the user, who would not even notice the server failure on node dhcp-180 and the subsequent recovery on node dhcp-185. The Fusion File Share SMB server ensures that the client waits until the new SMB server instance has started, performs the reconnection, and resumes the file copy. This is demonstrated in the video clip below.

Final thoughts

Fusion File Share leverages the self-healing feature of Kubernetes to transfer the Fusion File Share SMB server container from one host to another when a failure occurs, giving users a seamless experience during a file copy.

A good end-user experience is key to success, and this test demonstrates one of Fusion File Share SMB’s reliability features: automatic recovery in case of failure without interrupting the user. The end goal of such a feature is to provide the best user experience, and here we have created a way to test that we can deliver on that promise.

Also, many organizations are now containerizing their infrastructure, so Fusion File Share by Tuxera is a great fit for customers running Kubernetes on their own premises. This lets users reap the combined features and benefits of Kubernetes and Fusion File Share.

References

  1. Kubernetes definition: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/
  2. Self healing: https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/#what-is-a-replicationcontroller
  3. Kubernetes loadbalancing on-premise: https://medium.com/@maniankara/kubernetes-tcp-load-balancer-service-on-premise-non-cloud-f85c9fd8f43c

Continuous monitoring in file systems development

Applying continuous monitoring of performance benchmarks in file systems development

Continuous Integration1 (CI) has become a very popular addition to software development cycles, and with this process companies are reaping higher-quality software. Typically, continuous integration comprises building (compilation and packaging) followed by smoke testing. In some cases, the system is extended to continuously deliver software products to production systems, an approach called Continuous Delivery2 (CD). Monitoring performance is a key aim, but it usually comes at a later stage, in the CD pipeline. Here, however, I describe the benefits of executing performance benchmarking earlier, in the smoke testing phase, using Elastic Stack3.

What is Continuous Integration?

According to Wikipedia, Continuous Integration (CI) is the practice of merging all developer working copies into a shared mainline several times a day. This is the first step in most DevOps practices and primarily aims to solve integration problems. Usually the software is checked out from the version control system, compiled, and packaged; this is called the build phase. Normally, there is a build for every change to the software repository. The smoke testing4 phase comes immediately after a successful build phase. If both phases succeed, the committer has successfully pushed the change into the system, and the system is ready to receive the next change.

A CI system is of little use if any of these phases takes hours to finish. The biggest challenge with this approach is keeping the build phase fast, because smoke testing is typically only a very minimal set of tests to verify the software.

After establishing a reliably working CI pipeline, and if there is a need for a faster release process, the Continuous Deployment or Delivery (CD) pipeline should be implemented. In most systems, CD implementation follows CI. CD is also far more difficult to achieve due to its duration and the bureaucracy around it, as it involves multiple phases of testing: integration testing, system testing, end-to-end testing, performance testing, and acceptance testing, to name a few. The software being tested does not have to undergo all of these phases, but at least some of them, based on product or customer needs. Put simply, CI’s scope ends at “deploy to QA” (see figure below), whereas CD’s scope extends until the software is deployed to production.

Tuxera – the basics of continuous delivery

In kernel file systems development, where read/write speed is a crucial factor, performance testing and tracking both short- and long-term performance degradation are important. Performance testing indicates how fast we execute operating system kernel operations; poor performance means slower kernel operations and a poor user experience. A result more than 10% below the previous performance values is not accepted. It is also important to visualize how we have been performing over time, as short-term degradations should not accumulate into a larger one.

Various methods exist to speed up CD

It is apparent that faster performance testing is important to achieve faster CD, which is challenging due to factors such as the duration of performance testing, the requirement for real hardware devices, and the huge variety of kernel-architecture-file system combinations, among others. Taking only the duration factor of slow CD, there are many options which are becoming industry standards. Some advocate that software can be lightly tested and delivered to a customer in a Canary release6 approach, so that rollback is easier and faster. There are also approaches which perform regression release testing after the software is in production; this method always rolls forward with fixes and never rolls back to previous versions.

How Tuxera does it

At Tuxera, we have performance bench tests for our kernel file system driver products. We run them as part of the CI pipeline, in the smoke testing phase, in parallel with other smoke tests. These are called quick performance tests and are run on every commit. They are run on a specific kernel-architecture-file system combination that is fast and already indicates any performance improvement or degradation. After a successful smoke testing phase, the change is promoted to a full performance testing phase, where we test more kernel-architecture-file system combinations on a variety of pre-configured real hardware devices.

This solution is based on an early-testing approach: phases which usually sit after CI but before CD are shifted earlier. Testing earlier gives us insight into driver performance already at the smoke testing phase.

Analyzing performance test results

Performance tests produce a lot of results, especially because they are run on a per-commit basis. We use Elastic Stack, which indexes and graphs the performance test results of every commit. Elastic Stack consists of Elasticsearch, Logstash, and Kibana, which store and index, process, and visualize logs, respectively.

Tuxera's Elastic Stack and CI

In the figure above, Jenkins8 acts as an orchestrator: it executes tests on devices, then fetches the logs and ships them to the Elastic Stack server. For simplicity, the logs are converted to JSON so they can be fed straight into Elasticsearch, where they are immediately visible in Kibana. An Nginx reverse proxy acts as a frontend serving the user interface.
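
As an illustration of how simple that path is, a single JSON result document can be posted straight to Elasticsearch; the host, index name, and fields below are placeholders rather than our actual schema.

# Ship one JSON-formatted test result document to Elasticsearch
curl -s -X POST 'http://elastic-stack.example:9200/perf-results/_doc' \
  -H 'Content-Type: application/json' \
  -d '{
        "commit": "2020-07-27",
        "test": "rsync",
        "duration_ms": 1234,
        "device": "example-board",
        "timestamp": "2020-07-27T12:00:00Z"
      }'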

The graph below shows the behavior of our unpack, browse, read, write, and rsync tests over Git commit tags in the full performance tests. The y-axis indicates the delay in milliseconds for an operation; the x-axis indicates Git commit tags. As one can see, there is a clear improvement of 75% in rsync speed as of the 27 July commit.

Tuxera – performance testing
Example graph

This is just one example. The overall picture of critical performance tests looks like this:

Tuxera performance test overview
Example performance tests overview

*Please get in touch with us (info@tuxera.com) if you would like to know more specific details about our tests.

What we learned

The CI approach, a byproduct of agile practices, moves the integration of software components much earlier in the development cycle compared with the older waterfall model. However, not many advancements have been made to the practice over the years. Our approach, which moves the performance benchmarking phase one step earlier, into CI, to find problems well in advance, is a clear improvement for us. It is also evident that large-scale log processing tools like Elastic Stack have helped us easily store, process, and plot large amounts of logs for our purposes.

Future enhancements

Currently, we are analyzing the graphs manually, which in rare instances can be laborious and slow before the committer is notified. It is also possible to define alerts in Elastic Stack, so that the committer is notified whenever performance drops below a certain range. This is a work in progress.

Final thoughts

It is always better to verify software products as early in the pipeline as possible to keep cycle time low. To make CD faster, it is also good practice to pull some CD phases forward into CI where feasible. Ultimately, what counts is reducing the cycle time for a faster product release to production, which is exactly what this approach aims to do.

References

  1. https://en.wikipedia.org/wiki/Continuous_integration
  2. https://en.wikipedia.org/wiki/Continuous_delivery
  3. https://www.elastic.co/products
  4. https://en.wikipedia.org/wiki/Smoke_testing_(software)
  5. https://medium.com/@raddougall/moving-to-continuous-delivery-is-hard-but-well-worth-the-effort-a0f4b492b12b
  6. https://martinfowler.com/bliki/CanaryRelease.html
  7. https://www.elastic.co/webinars/introduction-elk-stack
  8. https://jenkins.io
