With the Orion 5.0 release, Cohesity announced the introduction of SpanFS, a new file system uniquely designed to consolidate and manage all secondary storage at scale. SpanFS and its architecture are the core of the Cohesity DataPlatform that enables enterprises to unify the control of their secondary data with web-scale capabilities.

The emphasis in enterprise storage architectures typically falls on specialized capabilities and scalability that depend on vendors' proprietary hardware: space-efficiency features such as compression and deduplication, snapshots for resiliency, and standardized file interfaces such as NFS and SMB. Cloud storage architectures, developed by hyperscale companies like Google and Amazon, focus on delivering scale-out, software-defined solutions that run on commodity x86 hardware with robust resiliency to tolerate hardware failures. But they tend to rely on proprietary protocols and APIs for data access.
Today’s enterprise organizations are in desperate need of the best of both storage architectures. They are looking to move onto software-defined, web-scale solutions that run on commodity x86 hardware, just like cloud storage. Web-scale capabilities provide multiple advantages such as ‘pay-as-you-grow’ consumption, always-on availability, non-disruptive upgrades (instead of forklift upgrades), simpler management, and lower costs.

Enterprise storage solutions are traditionally deployed into segregated management silos because of their different use cases and requirements. Typically, purpose-built file systems are introduced that depend on vendor-specific proprietary features.

For example, purpose-built backup appliances (PBBAs) provide inline variable-length deduplication to maximize space efficiency, but at the expense of random IO performance. Test/dev filers, such as those from NetApp, provide much better random IO performance and great snapshots, but can’t afford the performance overhead of inline deduplication.
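To make the trade-off concrete, here is a minimal sketch of variable-length deduplication using content-defined chunking. The rolling hash, window size, and mask values are illustrative choices, not Cohesity's actual parameters; the point is that chunk boundaries follow content rather than fixed offsets, so identical data deduplicates even after insertions shift byte positions.

```python
import hashlib

def chunk_boundaries(data: bytes, window=16, mask=0x3F, min_size=32, max_size=256):
    """Split data at content-defined boundaries using a simple rolling sum.

    A boundary is declared when the rolling sum of the last `window` bytes
    matches the mask, so identical content produces identical chunks even
    when insertions shift byte offsets -- the key to variable-length dedup.
    """
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling += b
        if i - start >= window:
            rolling -= data[i - window]  # slide the window forward
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedup(store: dict, data: bytes):
    """Store only chunks whose SHA-256 fingerprint is not already present."""
    refs = []
    for chunk in chunk_boundaries(data):
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # skip the write if the chunk exists
        refs.append(key)
    return refs
```

The extra hashing and index lookups on every write are exactly the overhead that hurts random IO on a PBBA, and that a test/dev filer avoids by skipping inline dedup.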

To effectively consolidate secondary storage silos, enterprises need a file system which is simultaneously able to handle the requirements of multiple use cases. It must provide standard NFS, SMB and S3 interfaces, robust IO performance for both sequential and random IO, inline variable length deduplication, and scalable snapshots. And it must provide native integration with the public cloud to support a multicloud data fabric, enabling enterprises to send data to the cloud for archival or more advanced use cases like disaster recovery, test/dev, and analytics. All of this must be done on a web-scale architecture to manage the ever-increasing volumes of data effectively.

SpanFS was specifically designed to manage all secondary data, including backups, files, objects, test/dev, and analytics data, on a web-scale platform that spans from the edge to the cloud, and to overcome the limitations of the logical and physical constructs of today’s enterprise and cloud storage architectures. SpanFS combines the best of both enterprise and cloud storage architectures. And it’s the only file system in the industry that simultaneously provides NFS, SMB, and S3 interfaces, global deduplication, and unlimited snapshots and clones on a web-scale platform.

SpanFS Architecture

SpanFS is an entirely new file system designed for secondary storage consolidation.

Access Layer – SpanFS exposes industry-standard, globally distributed NFS, SMB, and S3 interfaces and our built-in DataProtect application. All volumes or object buckets can be configured simultaneously on a single Cohesity cluster. The volumes are completely distributed with no single choke point. Each of these volumes benefits from all the unique SpanFS capabilities such as global deduplication, encryption, replication, unlimited snapshots, and file/object level indexing and search.

IO Engine – manages IO operations for all the data written to or read from the system. It detects random vs. sequential IO profiles, splits the data into chunks, performs deduplication, and directs the data to the most appropriate storage tier (SSD, HDD, cloud storage) based on the IO profile. To keep track of and manage the data sitting across nodes, Cohesity also had to build an entirely new metadata store.
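The profile detection the IO Engine performs can be sketched as follows. This is a simplified heuristic of my own, assuming a classifier that looks at whether successive request offsets are block-contiguous; the real engine's logic is not public.

```python
def classify_io(offsets, block=4096, threshold=0.8):
    """Classify a stream of request offsets as 'sequential' or 'random'.

    Counts how many requests land exactly one block after the previous one;
    if that fraction exceeds `threshold`, the stream is treated as sequential
    (eligible to go straight to HDD) rather than random (routed to the SSD
    journal). `block` and `threshold` are illustrative values.
    """
    if len(offsets) < 2:
        return "sequential"
    seq = sum(1 for prev, cur in zip(offsets, offsets[1:]) if cur == prev + block)
    return "sequential" if seq / (len(offsets) - 1) >= threshold else "random"
```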

Metadata Store – incorporates a strictly consistent, distributed NoSQL store for fast IO operations at scale. On top of it, SnapTree provides a distributed metadata structure based on B+ tree concepts. SnapTree is unique in its ability to support unlimited, frequent snapshots with no performance degradation. SpanFS also has QoS controls built into all layers of the stack to support workload- and tenant-based QoS, and it can replicate, archive, and tier data to another Cohesity cluster or the cloud.
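The core idea behind tree-based snapshots can be illustrated with a small copy-on-write sketch. This is not SnapTree's implementation, just the general B+-tree-style technique it builds on: a snapshot is a new root that shares every node with the original, and a write copies only the nodes along the updated path, so untouched subtrees stay shared no matter how many snapshots accumulate.

```python
class Node:
    """A tree node mapping keys to values or child Nodes."""
    __slots__ = ("entries",)
    def __init__(self, entries=None):
        self.entries = dict(entries or {})

def snapshot(root):
    """O(1) snapshot: the new root shares all children with the original."""
    return Node(root.entries)

def write(root, path, value):
    """Copy-on-write update: clone only the nodes along `path`."""
    new = Node(root.entries)
    if len(path) == 1:
        new.entries[path[0]] = value
    else:
        child = root.entries.get(path[0], Node())
        new.entries[path[0]] = write(child, path[1:], value)
    return new

def read(root, path):
    node = root
    for p in path[:-1]:
        node = node.entries[p]
    return node.entries[path[-1]]
```

Because every snapshot is just another root, reads from old snapshots traverse the same shallow tree as reads from the live view, which is the property that avoids the chain-traversal cost of traditional snapshot designs.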

Data Store – is responsible for storing data on HDD, SSD, and cloud storage. The data is spread out across the nodes in the cluster to maximize throughput and performance and is protected either with multi-node replication or with erasure coding. Sequential IOs may go straight to HDDs or to SSDs based on QoS policies. Random IOs are directed to a distributed data journal that resides on SSDs. As the data becomes colder, the data store can tier the data down from SSD to HDD. And hot data can be up-tiered to SSD.
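To show what erasure coding buys over plain replication, here is a minimal single-parity (RAID-5-style) sketch. Cohesity's actual scheme supports configurable fault tolerance; XOR parity is the simplest instance of the idea: one extra block protects a whole stripe, versus a full copy per block with replication.

```python
def encode_stripe(blocks):
    """Compute one XOR parity block over equal-sized data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks, parity, lost_index):
    """Rebuild one lost block by XOR-ing parity with the surviving blocks."""
    rebuilt = bytearray(parity)
    for j, block in enumerate(blocks):
        if j == lost_index:
            continue
        for i, b in enumerate(block):
            rebuilt[i] ^= b
    return bytes(rebuilt)
```

For a 3-block stripe this stores 4 blocks and survives any single node loss, where 2x replication would store 6 blocks for the same protection.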

Consistent NoSQL Store – The metadata store uses a distributed NoSQL store that keeps the metadata on the SSD tier. The store is optimized for fast IO operations, provides data resiliency across nodes, and is continually balanced across all the nodes.
However, the key-value store by itself provides only ‘eventual consistency.’ To achieve strict consistency, the NoSQL store is complemented with Paxos algorithms.

With Paxos, the NoSQL store offers strict and consistent access to the value associated with each key.
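Paxos itself is considerably more involved, but the majority-quorum intersection property it relies on can be shown in a few lines. This sketch of my own uses versioned writes to a majority of replicas; because any two majorities overlap in at least one replica, a quorum read always observes the latest committed write, which is the "strict consistency" guarantee the text describes.

```python
import random

class Replica:
    """One copy of a key's value, tagged with a version number."""
    def __init__(self):
        self.version, self.value = 0, None

def quorum_write(replicas, value):
    """Write to a majority of replicas with a higher version number."""
    version = max(r.version for r in replicas) + 1
    for r in random.sample(replicas, len(replicas) // 2 + 1):
        r.version, r.value = version, value

def quorum_read(replicas):
    """Read a majority; the highest version seen is the latest write,
    since any two majorities intersect in at least one replica."""
    quorum = random.sample(replicas, len(replicas) // 2 + 1)
    return max(quorum, key=lambda r: r.version).value
```

A plain eventually consistent store would let a read land entirely on stale replicas; the quorum overlap is what rules that out.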

QoS – Quality of Service is designed into every component of the system. As data is processed by the IO Engine, Metadata Store, or Data Store, each operation is prioritized based on QoS. High priority requests are moved ahead in subsystem queues and are given priority placement on the SSD tier.
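The queue-reordering behavior described above amounts to a priority queue with FIFO ordering within each priority class. A minimal sketch, with hypothetical priority values (the actual QoS classes and numbering are Cohesity-internal):

```python
import heapq
import itertools

class QosQueue:
    """Priority queue for IO requests: lower priority number is served first;
    a monotonically increasing sequence keeps FIFO order within a class."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()
    def submit(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._seq), request))
    def next_request(self):
        return heapq.heappop(self._heap)[2]
```

A high-priority request submitted after a backlog of low-priority work is still dequeued first, which is the "moved ahead in subsystem queues" behavior.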

Replication and Cloud – SpanFS can replicate data to another Cohesity cluster for disaster recovery, and archive data to third-party storage such as tape libraries, NFS volumes, and S3 storage. SpanFS has also been designed to interoperate seamlessly with all the leading public clouds (AWS, Microsoft Azure, Google Cloud). SpanFS makes it simple to use the cloud in three different ways:

  • CloudArchive enables long-term archival to the cloud, providing a more manageable alternative to tape.
  • CloudTier supports data bursting to the cloud. Cold chunks of data are automatically stored in the cloud and can be tiered back to the Cohesity cluster once they become hot.
  • CloudReplicate provides replication to a Cohesity Cloud Edition cluster running in the cloud. The Cohesity cluster in the cloud manages the data to provide instant access for disaster recovery, test/dev, and analytics use cases.

Cohesity designed SpanFS as a web-scale, distributed file system that provides unlimited scale across any number of industry-standard x86 nodes. SpanFS manages data across private data centers and public clouds, spans media tiers, and covers all secondary storage use cases, including data protection, file and object storage, cloud integration, test/dev, and analytics.

– Enjoy

For future updates about Cohesity, Primary and Secondary Storage, Cloud Computing, Networking, Cloud-Native Applications (CNA), and anything in our wonderful world of technology, be sure to follow me on Twitter: @PunchingClouds.
