Distributed Parallel Fault-tolerant File Systems
Distributed file systems that are also parallel and fault tolerant stripe and replicate data across multiple servers, both for high performance and to maintain data integrity. Even if a server fails, no data is lost. These file systems are used in both high-performance computing (HPC) and high-availability clusters.
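As a rough illustration of striping combined with replication, the following minimal Python sketch splits a byte string into fixed-size stripes and stores each stripe on several consecutive servers. It is not taken from any file system listed here: the function names `stripe_and_replicate` and `read_back`, the round-robin placement rule, and the in-memory dictionary standing in for servers are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory model: server id -> list of (stripe index, stripe bytes).
Placement = Dict[int, List[Tuple[int, bytes]]]


def stripe_and_replicate(data: bytes, num_servers: int,
                         stripe_size: int, replicas: int) -> Placement:
    """Cut data into stripes and store each stripe on `replicas` servers.

    Assumed placement rule: stripe i goes to servers i, i+1, ...,
    i+replicas-1 (modulo num_servers), so the copies land on distinct servers.
    """
    if stripe_size < 1 or not 1 <= replicas <= num_servers:
        raise ValueError("need stripe_size >= 1 and 1 <= replicas <= num_servers")
    placement: Placement = {server: [] for server in range(num_servers)}
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        chunk = data[offset:offset + stripe_size]
        for r in range(replicas):
            placement[(index + r) % num_servers].append((index, chunk))
    return placement


def read_back(placement: Placement, failed: Set[int]) -> bytes:
    """Reassemble the data using only servers that have not failed."""
    # The total stripe count stands in for metadata a real system would keep.
    total = 1 + max((i for chunks in placement.values() for i, _ in chunks),
                    default=-1)
    surviving: Dict[int, bytes] = {}
    for server, chunks in placement.items():
        if server in failed:
            continue
        for index, chunk in chunks:
            surviving.setdefault(index, chunk)
    missing = [i for i in range(total) if i not in surviving]
    if missing:
        raise IOError(f"stripes lost: {missing}")
    return b"".join(surviving[i] for i in range(total))
```

Under this placement rule, with three servers and two replicas per stripe, the data can still be read after any single server fails, while losing two servers at once may lose a stripe. Real systems use more elaborate placement and recovery strategies.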
All file systems listed here focus on high availability, scalability and high performance unless otherwise stated below.
|Ceph||Inktank||LGPL||Linux||A massively scalable object store. Ceph was merged into the Linux kernel in 2010. Ceph's foundation is the Reliable Autonomic Distributed Object Store (RADOS), which provides object storage via a programmatic interface and S3 or Swift REST APIs, block storage to QEMU/KVM/Linux hosts, and POSIX filesystem storage that can be mounted by Linux kernel and FUSE clients.|
|CloudStore||Kosmix||Apache License 2.0||Google File System workalike|
|Cosmos||Microsoft internal||Focuses on fault tolerance, high throughput and scalability. Designed for terabyte- and petabyte-sized data sets and processing with Dryad.|
|dCache||DESY and others||A write once filesystem, accessible via various protocols|
|ExaFS||Exanet||proprietary software||Distributed file system, runs as part of ExaStore, a Linux based NAS solution that runs on commodity Intel based hardware, serving NFS v2/v3, SMB/CIFS and AFP to Windows, Mac OS, Linux and other UNIX clients.|
|FS-Manager||CDNetworks||proprietary software||Linux||Focused on Content Delivery Network|
|Gfarm file system||X11 License||Linux, Mac OS X, FreeBSD, NetBSD and Solaris||Uses OpenLDAP or PostgreSQL for metadata and FUSE or LUFS for mounting|
|General Parallel File System (GPFS)||IBM||proprietary||AIX, Linux and Windows||Supports replication between attached block storage. Symmetric or asymmetric (configurable).|
|GlusterFS||Gluster, a company acquired by Red Hat||GNU General Public License v3||A general purpose distributed file system for scalable storage. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. GlusterFS is the main component in Red Hat Storage Server.|
|Google File System (GFS)||Google||Focuses on fault tolerance, high throughput and scalability|
|IBRIX Fusion||IBRIX||proprietary software||Linux|
|Lustre||originally developed by Cluster File Systems and currently supported by Intel (formerly Whamcloud)||GPL||Linux||A POSIX-compliant, high-performance filesystem. Lustre has high availability via storage failover|
|MogileFS||Danga Interactive||GPL||Linux (but may be ported)||Not POSIX compliant. An application-level file system with a flat namespace that uses MySQL or PostgreSQL for metadata and HTTP for transport.|
|OneFS distributed file system||Isilon||proprietary software||BSD-based OS on dedicated Intel-based hardware, serving NFS v3 and SMB/CIFS to Windows, Mac OS, Linux and other UNIX clients|
|Panasas ActiveScale File System (PanFS)||Panasas||proprietary software||Linux||Uses object storage devices|
|PeerFS||Radiant Data Corporation||proprietary software||Linux||Focuses on high availability and high performance, and uses peer-to-peer replication with multiple sources and targets|
|TerraGrid Cluster File System||Terrascale Technologies Inc||proprietary software||Linux||Implements on-demand cache coherency and uses industry-standard iSCSI and a modified version of the XFS file system|
|XtreemFS||open-source (GPL)||Cross-platform file system for wide area networks. It replicates data for fault tolerance and caches metadata and data to improve performance over high-latency links. Support for SSL and X.509 certificates makes XtreemFS usable over public networks. It also supports striping for use in a cluster.|
|Chiron FS||A FUSE-based, transparent replication file system that layers on an existing file system and implements at the file system level what RAID 1 does at the device level. A notably convenient consequence is that single target directories can be replicated without replicating entire partitions. (The project has shown no visible activity since 2008; a status request posted to the chironfs-forum in October 2009 went unanswered.)|
- PlasmaFS is a free and open-source (GPL) userspace filesystem focusing on data safety and security. PlasmaFS provides a transactional API which is accessible over a SunRPC-based protocol. PlasmaFS can also be mounted as an NFS volume, and is POSIX-compliant. Both data and metadata are replicated.
- WebDFS is an open-source, scalable, decentralized file store similar to MogileFS in function and purpose. It uses HTTP as the transport. Data is automatically re-arranged to accommodate the addition of new resources. The lack of central metadata management greatly simplifies deployment and use.
- zFS from IBM (not to be confused with ZFS from Sun Microsystems or the zFS file system provided with IBM's z/OS operating system) focuses on cooperative cache and distributed transactions and uses object storage devices. Under development and not freely available.
- Hadoop Distributed File System - free GoogleFS clone produced by Apache. http://hadoop.apache.org/
- HAMMER/ANVIL by Matt Dillon
- OASIS from ETRI. Very similar to Lustre or Panasas. Available for Linux via a special technology transfer program provided by ETRI.
- GLORY-FS also from ETRI. Very similar to the Google File System or Hadoop, but fully POSIX compliant. It is specially optimized for large-scale Web 2.0 content services. Version 2.5 is available for Linux via a special technology transfer program provided by ETRI. A Windows version is under development.
- PNFS (Parallel NFS) - Clients available for Linux and OpenSolaris and back-ends from NetApp, Panasas, EMC Highroad and IBM GPFS
- Coherent Remote File System (CRFS) - requires Btrfs
- Parallel Optimized Host Message Exchange Layered File System (POHMELFS) and Distributed STorage (DST). POSIX compliant, added to Linux kernel 2.6.30
- Sector from National Center for Data Mining. Sector is a high performance, scalable, and secure distributed file system. Available under Apache License 2.0
- StarFS from CDNetworks. StarFS is a global storage platform that supports virtualization of distributed file systems and event-driven file synchronization with remote StarFS clusters.
- Unilium provides a decentralized, versioning file system stored in content addressable storage, whose data may be hosted across heterogeneous data storage nodes.
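The idea of a versioning file system kept in content addressable storage, as mentioned for Unilium above, can be sketched in a few lines of Python. This is only an illustration of the general technique and does not describe Unilium's actual design: the `ContentStore` class, its methods, and the choice of SHA-256 as the content hash are assumptions made for the example.

```python
import hashlib
from typing import Dict, List


class ContentStore:
    """Hypothetical content-addressable store with per-path version history."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}         # content digest -> data
        self._versions: Dict[str, List[str]] = {}  # path -> digests, oldest first

    def put(self, path: str, data: bytes) -> str:
        """Store data under its own hash and record it as a new version of path."""
        digest = hashlib.sha256(data).hexdigest()  # assumed hash function
        self._blobs.setdefault(digest, data)       # identical content is stored once
        self._versions.setdefault(path, []).append(digest)
        return digest

    def get(self, path: str, version: int = -1) -> bytes:
        """Return a version of path; by default the most recent one."""
        return self._blobs[self._versions[path][version]]

    def blob_count(self) -> int:
        """Number of distinct pieces of content currently held."""
        return len(self._blobs)
```

Because a blob's key is derived from its content, any node holding the blob can serve it and a reader can verify it by recomputing the hash, which is one reason the approach suits storage spread over heterogeneous nodes.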