Cloud computing promises large-scale and seamless access to vast quantities of data across the globe. Applications will demand the reliability, consistency, and performance of a traditional cluster file system regardless of the physical distance between data centers.
Panache is a scalable, high-performance, clustered file system cache for parallel data-intensive applications that require wide area file access. Panache is the first file system cache to exploit parallelism in every aspect of its design: parallel applications can access and update the cache from multiple nodes while data and metadata is pulled into and pushed out of the cache in parallel. Data is cached and updated using pNFS, which performs parallel I/O between clients and servers, eliminating the single-server bottleneck of vanilla client-server file access protocols. Furthermore, Panache shields applications from fluctuating WAN latencies and outages and is easy to deploy as it relies on open standards for high-performance file serving and does not require any proprietary hardware or software to be installed at the remote cluster.
In this paper, we present the overall design and implementation of Panache and evaluate its key features with multiple workloads across local and wide area networks.
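The abstract says that pNFS performs parallel I/O between clients and servers and removes the single-server bottleneck. A striped file layout makes this concrete. The Python sketch below is minimal and only in the spirit of a pNFS file layout. A file is cut into fixed-size stripe units placed round-robin across data servers, so a single client request splits into pieces that different servers can serve concurrently. The names `Segment`, `map_range` and `per_server_bytes`, the round-robin placement, and the `stripe_unit` parameter are illustrative assumptions. They are not the layout format used by GPFS or defined by pNFS.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical round-robin striping, loosely modeled on a pNFS file layout.
# Assumption: stripe unit i of the file lives on data server (i % num_servers).


@dataclass(frozen=True)
class Segment:
    server: int       # index of the data server holding this piece
    file_offset: int  # where the piece starts in the logical file
    length: int       # number of bytes in the piece


def map_range(offset: int, length: int, stripe_unit: int, num_servers: int) -> List[Segment]:
    """Split the byte range [offset, offset + length) into per-server segments."""
    if stripe_unit <= 0 or num_servers <= 0:
        raise ValueError("stripe_unit and num_servers must be positive")
    segments: List[Segment] = []
    pos, end = offset, offset + length
    while pos < end:
        unit = pos // stripe_unit            # stripe unit containing pos
        unit_end = (unit + 1) * stripe_unit  # first byte past that unit
        piece = min(end, unit_end) - pos
        segments.append(Segment(unit % num_servers, pos, piece))
        pos += piece
    return segments


def per_server_bytes(segments: List[Segment], num_servers: int) -> List[int]:
    """Total bytes each server must supply for the request."""
    totals = [0] * num_servers
    for seg in segments:
        totals[seg.server] += seg.length
    return totals


if __name__ == "__main__":
    segs = map_range(0, 10, stripe_unit=4, num_servers=2)
    print(segs)
    print(per_server_bytes(segs, 2))  # [6, 4]
```

With a single NFS server, all 10 bytes would come from one machine. In this layout the request splits 6/4 across two servers, and the client can issue each segment concurrently. Real pNFS file layouts carry additional fields, such as device IDs and per-server file handles, that this sketch omits.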
Link to the paper:
http://www.usenix.org/events/fast10/tech/full_papers/eshel.pdf
Presented by Ajinkyaatul Alekar
Link to the slides:
http://www.cse.buffalo.edu/faculty/tkosar/cse726/slides/04-alekar.pdf
Review #1 by Saakshi Verma
Panache is a read-write, multi-node, highly scalable, high-performance, clustered file system cache for parallel data-intensive applications that require wide area file access. It adds features such as disconnected operation, persistence across failures, and consistency management. The cache can be updated from multiple nodes simultaneously, while data and metadata updates are pushed from the cache to the remote file system using pNFS. Panache also shields applications from fluctuating WAN latencies, is easy to deploy, does not require any specific resources at the remote cluster, and eliminates the single-server bottleneck of traditional file access protocols. It also provides conflict handling and resolution for disconnected-mode operations in a cluster setting.
Its fully parallel design comes from parallel ingest, parallel access, parallel update, parallel delayed data write-back, and parallel delayed metadata write-back. In addition, all data and metadata updates are asynchronous, which lets Panache tolerate WAN latencies and outages. The two basic components of Panache are GPFS and pNFS. GPFS is a high-performance shared-disk cluster file system, and the pNFS protocol gives clients direct, parallel access to storage. The pNFS-GPFS architecture has three layers: file-based clients, GPFS data and state servers, and storage.
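A small single-process model can picture the asynchronous, delayed write-back and the disconnected-mode conflict handling described in this review. In this hypothetical sketch, each write goes straight to the local cache and is queued. `flush()` pushes queued writes to the remote only while the cache is connected. A queued write is flagged as a conflict if the remote version changed since the cache last saw it. Several pieces are assumptions for illustration only: the class and method names, the version counters, the coalescing of repeated writes to the same file, and the "flag and skip" conflict policy. The paper's actual queueing and conflict-resolution mechanisms are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical model: the remote file system is a dict path -> (version, data).


@dataclass
class PendingWrite:
    data: bytes
    base_version: int  # remote version the cached copy was based on


class WriteBackCache:
    def __init__(self, remote: Dict[str, Tuple[int, bytes]]):
        self.remote = remote
        self.local: Dict[str, bytes] = {}
        self.versions: Dict[str, int] = {}          # last remote version seen
        self.pending: Dict[str, PendingWrite] = {}  # insertion-ordered queue
        self.connected = True
        self.conflicts: List[str] = []

    def read(self, path: str) -> bytes:
        if path not in self.local:  # miss: must go to the remote
            if not self.connected:
                raise ConnectionError(f"{path} not cached and remote unreachable")
            version, data = self.remote[path]
            self.local[path] = data
            self.versions[path] = version
        return self.local[path]

    def write(self, path: str, data: bytes) -> None:
        """Apply locally and return at once; the remote update is deferred."""
        self.local[path] = data
        if path in self.pending:  # assumption: coalesce repeated writes to one file
            self.pending[path].data = data
        else:
            self.pending[path] = PendingWrite(data, self.versions.get(path, 0))

    def flush(self) -> int:
        """Push queued writes in order; returns how many reached the remote."""
        if not self.connected:
            return 0  # disconnected: writes stay queued
        applied = 0
        for path in list(self.pending):
            w = self.pending.pop(path)
            remote_version = self.remote.get(path, (0, b""))[0]
            if remote_version != w.base_version:
                self.conflicts.append(path)  # remote changed meanwhile; do not overwrite
                continue
            self.remote[path] = (remote_version + 1, w.data)
            self.versions[path] = remote_version + 1
            applied += 1
        return applied
```

While `connected` is `False`, writes keep succeeding locally and `flush()` returns 0. Once the cache reconnects, it replays the queued writes. Any file whose remote copy changed in the meantime ends up in `conflicts` and is not overwritten.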
Link to the full review:
http://www.cse.buffalo.edu/faculty/tkosar/cse726/reviews/04-review1-verma.pdf
Review #2 by Rishi Baldawa
Panache is an IBM product: a scalable, high-performance, clustered file system cache that aims to provide seamless access to remote datasets through a POSIX interface. It is intended for parallel, data-intensive applications that need file access across a wide area network (WAN). Every aspect of its architecture is parallel, including ingest, access, updates and write-backs. It masks WAN latencies and outages by using asynchronous operations, and it employs conflict handling and resolution for operations performed in disconnected mode.
Panache combines pNFS with GPFS. GPFS serves as the high-performance cluster storage system. The pNFS protocol reduces bottlenecks by using GPFS storage protocols to give clients direct access to GPFS. “Panache is implemented as a multi-node caching layer, integrated within the GPFS, that can persistently and consistently cache data and metadata from a remote cluster.” Within a cache cluster, every node can access all cached data and metadata. Applications running on that cluster therefore get the same performance they would see on the servers where the data and metadata actually reside. Panache allows asynchronous updates to the cache, improving application performance on the local machine. The architecture involves two kinds of file systems: the cache cluster file system and the remote cluster file system. Nodes in the cache cluster are also of two types. Application nodes service application data requests. Gateway nodes act as client proxies that fetch data in parallel from the remote site and store it in the cache.
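The following sketch shows how work divides between application nodes and gateway nodes. In this hypothetical Python model, an application-node read is served from a cache shared by the cluster. On a miss, the file is cut into chunks assigned round-robin to gateway nodes, and each gateway fetches its own chunks from the remote site concurrently. `CacheCluster`, `plan_chunks`, the chunk size, and the `fetch_remote` callback (a stand-in for a WAN read such as NFS) are assumptions, not Panache's interfaces. Threads stand in for separate gateway machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a WAN read: (path, offset, length) -> bytes.
FetchFn = Callable[[str, int, int], bytes]


def plan_chunks(size: int, chunk: int, gateways: int) -> List[Tuple[int, int, int]]:
    """Return (gateway, offset, length) triples covering [0, size), round-robin."""
    return [(i % gateways, off, min(chunk, size - off))
            for i, off in enumerate(range(0, size, chunk))]


class CacheCluster:
    def __init__(self, fetch_remote: FetchFn, remote_size: Callable[[str], int],
                 gateways: int = 4, chunk: int = 1 << 20):
        self.fetch_remote = fetch_remote
        self.remote_size = remote_size
        self.gateways = gateways
        self.chunk = chunk
        self.cache: Dict[str, bytes] = {}  # visible to every application node
        self.misses = 0

    def app_read(self, path: str) -> bytes:
        """Application-node read: serve from cache, ingest on a miss."""
        if path not in self.cache:
            self.misses += 1
            self.cache[path] = self._ingest(path)
        return self.cache[path]

    def _ingest(self, path: str) -> bytes:
        size = self.remote_size(path)
        plan = plan_chunks(size, self.chunk, self.gateways)
        buf = bytearray(size)

        def gateway_worker(g: int) -> None:
            # Each gateway fetches only the chunks assigned to it.
            for gw, off, length in plan:
                if gw != g:
                    continue
                data = self.fetch_remote(path, off, length)
                if len(data) != length:
                    raise IOError(f"short read at offset {off}")
                buf[off:off + length] = data  # disjoint ranges, same length

        with ThreadPoolExecutor(max_workers=self.gateways) as pool:
            list(pool.map(gateway_worker, range(self.gateways)))  # re-raises errors
        return bytes(buf)


if __name__ == "__main__":
    remote = {"data.bin": bytes(range(256)) * 10}
    cluster = CacheCluster(lambda p, o, n: remote[p][o:o + n],
                           lambda p: len(remote[p]), gateways=3, chunk=500)
    assert cluster.app_read("data.bin") == remote["data.bin"]
    cluster.app_read("data.bin")  # second read is a cache hit
    print(cluster.misses)  # 1
```

Each gateway handles a disjoint set of chunks, so adding gateways spreads the remote transfer across more concurrent fetches. The sketch makes two simplifications: it always ingests whole files, and it ignores eviction.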
Link to the full review: