Thursday, January 27, 2011

Panache: A Parallel File System Cache for Global File Access

Abstract
Cloud computing promises large-scale and seamless access to vast quantities of data across the globe. Applications will demand the reliability, consistency, and performance of a traditional cluster file system regardless of the physical distance between data centers.

Panache is a scalable, high-performance, clustered file system cache for parallel data-intensive applications that require wide area file access. Panache is the first file system cache to exploit parallelism in every aspect of its design—parallel applications can access and update the cache from multiple nodes while data and metadata is pulled into and pushed out of the cache in parallel. Data is cached and updated using pNFS, which performs parallel I/O between clients and servers, eliminating the single-server bottleneck of vanilla client-server file access protocols. Furthermore, Panache shields applications from fluctuating WAN latencies and outages and is easy to deploy as it relies on open standards for high-performance file serving and does not require any proprietary hardware or software to be installed at the remote cluster.

In this paper, we present the overall design and implementation of Panache and evaluate its key features with multiple workloads across local and wide area networks.

Link to the paper:
http://www.usenix.org/events/fast10/tech/full_papers/eshel.pdf

Presented by Ajinkyaatul Alekar
Link to the slides:
http://www.cse.buffalo.edu/faculty/tkosar/cse726/slides/04-alekar.pdf

Review #1 by Saakshi Verma
Panache is a read-write, multi-node, highly scalable, high-performance, clustered file system cache for parallel data-intensive applications that require wide area file access, with additional features such as disconnected operation, persistence across failures, and consistency management. The cache can be updated from multiple nodes simultaneously, while data and metadata move between the cache and the remote file system using pNFS. Panache also tolerates fluctuating WAN latencies, is easy to deploy, requires no special resources at the remote cluster, and eliminates the single-server bottleneck of traditional file access protocols. It also handles conflict detection and resolution for operations performed in disconnected mode, managing them in a cluster setting.

The fully parallel design comes from parallel ingest, parallel access, parallel update, and parallel delayed write-back of both data and metadata. In addition, all data and metadata updates are asynchronous, which masks WAN latencies and outages. The two basic components of Panache are GPFS and pNFS: GPFS is the high-performance shared-disk cluster file system, while the pNFS protocol gives clients direct, parallel access to storage. The pNFS-GPFS architecture is a three-layer structure consisting of file-based clients, GPFS data and state servers, and the storage itself.
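To make the asynchronous, parallel delayed write-back mentioned above concrete, here is a minimal sketch in plain Python (not Panache code): application nodes queue dirty updates locally, and several gateway workers drain the queue and push the updates to the remote cluster in parallel, so local writes never wait on the WAN. The remote_write() function and the queue layout are invented for illustration.

```python
# Illustrative sketch only (not Panache code): application nodes queue dirty
# data locally; gateway workers push it to the remote cluster in parallel and
# asynchronously, so local writes never block on WAN latency.
import queue
import threading

write_back_queue = queue.Queue()          # dirty (path, offset, data) records

def remote_write(path, offset, data):
    # Placeholder for a pNFS/NFS write to the remote (home) cluster.
    print(f"pushing {len(data)} bytes to {path}@{offset}")

def gateway_worker():
    while True:
        path, offset, data = write_back_queue.get()
        try:
            remote_write(path, offset, data)  # may be slow or retried on a WAN outage
        finally:
            write_back_queue.task_done()

# Several gateway workers drain the queue concurrently.
for _ in range(4):
    threading.Thread(target=gateway_worker, daemon=True).start()

# An application node's write returns as soon as the update is queued locally.
write_back_queue.put(("/cache/results.dat", 0, b"x" * 4096))
write_back_queue.join()                    # flush before exit (for the demo)
```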

Link to the full review: 
http://www.cse.buffalo.edu/faculty/tkosar/cse726/reviews/04-review1-verma.pdf

Review #2 by Rishi Baldawa
Panache is an IBM-developed scalable, high-performance, clustered file system cache that aims to provide seamless access to remote data sets through a POSIX interface. It targets parallel data-intensive applications that must operate over a wide area network. Every aspect of the architecture is parallel in nature: ingest, access, updates, and write-backs. It copes with WAN latencies and outages by using asynchronous operations, and it employs conflict handling and resolution for operations performed in disconnected mode.

Panache uses pNFS along with GPFS. GPFS serves as the high-performance storage cluster file system, while the pNFS protocol reduces bottlenecks by using GPFS's storage layout and giving clients direct access to GPFS. "Panache is implemented as a multi-node caching layer, integrated within the GPFS, that can persistently and consistently cache data and metadata from a remote cluster." Within a cache cluster, every node can access all cached data and metadata, so applications running on the cache cluster see performance comparable to running at the site where the data and metadata actually reside. Panache allows asynchronous updates of the cache to improve application performance at the local site. The architecture involves two file systems: the cache cluster file system and the remote cluster file system. The nodes of the cache cluster come in two types as well: application nodes, which service application data requests, and gateway nodes, which act as client proxies that fetch data in parallel from the remote site and store it in the cache.
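The gateway-node idea for parallel ingest can be sketched in a few lines of Python. This is only an illustration of the concept under stated assumptions: fetch_chunk(), the chunk size, and the number of gateways are hypothetical, and real Panache issues pNFS reads and places the chunks in the GPFS cache instead of returning bytes.

```python
# Illustrative sketch only: a cache miss is split into chunks that several
# "gateway" workers fetch from the remote cluster in parallel.
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB per request (arbitrary example value)

def fetch_chunk(remote_path, offset, length):
    # Placeholder for a pNFS read against the remote (home) cluster.
    return b"\0" * length

def parallel_ingest(remote_path, size, gateways=4):
    offsets = range(0, size, CHUNK)
    with ThreadPoolExecutor(max_workers=gateways) as pool:
        parts = pool.map(
            lambda off: fetch_chunk(remote_path, off, min(CHUNK, size - off)),
            offsets)
    return b"".join(parts)  # in practice the chunks land in the local cache

data = parallel_ingest("/remote/dataset.bin", 5 * CHUNK + 123)
assert len(data) == 5 * CHUNK + 123
```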

Link to the full review: 

Black-Box Problem Diagnosis in Parallel File Systems

Abstract
We focus on automatically diagnosing different performance problems in parallel file systems by identifying, gathering and analyzing OS-level, black-box performance metrics on every node in the cluster. Our peer-comparison diagnosis approach compares the statistical attributes of these metrics across I/O servers, to identify the faulty node. We develop a root-cause analysis procedure that further analyzes the affected metrics to pinpoint the faulty resource (storage or network), and demonstrate that this approach works commonly across stripe-based parallel file systems. We demonstrate our approach for realistic storage and network problems injected into three different file-system benchmarks (dd, IOzone, and PostMark), in both PVFS and Lustre clusters.

Link to the paper:
http://www.usenix.org/events/fast10/tech/full_papers/kasick.pdf

Presented by Rishi Baldawa
Link to the slides:
http://www.cse.buffalo.edu/faculty/tkosar/cse726/slides/03-baldawa.pdf

Review #1 by Hiraksh Bhagat
In this paper, the authors develop an algorithm for automatically diagnosing different performance problems in parallel file systems by comparing metrics gathered at every node. It uses black-box performance metrics for peer comparison to do two things: (i) determine whether any fault exists in the system, and (ii) analyze the metrics to pinpoint the faulty resource. The authors' main goals are application transparency, minimal false alarms, minimal instrumentation overhead, and coverage of many specific problems. The paper also states clearly what it does not aim to achieve, such as code-level debugging, handling pathological workloads, and diagnosis of non-peers. The paper demonstrates the authors' approach on realistic storage and network problems injected into different file system benchmarks on PVFS and Lustre clusters.

The paper aptly describes why it uses black-box metrics for peer comparison. It makes various assumptions, such as that all peer servers have identical software configurations, are synchronized, and run in a homogeneous environment. Problems involving storage and network resources are separated into two classes: hog faults and busy or loss faults. Considering a small file system, the paper makes a variety of observations that rest on assumptions which are not entirely true in general. Based on these observations, the authors develop the diagnosis algorithm, which works in two phases. The first phase finds the faulty server by comparing probability density functions of various OS-level metrics across servers; two approaches are given for this, a histogram-based approach and a time-based approach. Threshold selection is performed on training data using machine-learning techniques. Phase 2 observes peer divergence in storage and network resources by calculating throughput and latency...
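The phase-1 peer comparison can be illustrated with a small sketch that is not the authors' code: build a histogram of one OS-level metric per I/O server and flag any server whose histogram diverges from more than half of its peers. The metric values, the distance measure (total variation), and the threshold below are illustrative choices.

```python
# Rough sketch of histogram-based peer comparison across I/O servers.
import numpy as np

def histogram(samples, edges):
    h, _ = np.histogram(samples, bins=edges)
    return h / h.sum()                       # normalize to a probability vector

def diverging_servers(metric_by_server, threshold=0.3, bins=20):
    all_samples = np.concatenate(list(metric_by_server.values()))
    edges = np.histogram_bin_edges(all_samples, bins=bins)
    hists = {s: histogram(v, edges) for s, v in metric_by_server.items()}
    flagged = []
    for s, h in hists.items():
        peers = [hp for p, hp in hists.items() if p != s]
        # total-variation distance to each peer; a server is suspect when it
        # differs from more than half of its peers
        far = sum(0.5 * np.abs(h - hp).sum() > threshold for hp in peers)
        if far > len(peers) / 2:
            flagged.append(s)
    return flagged

rng = np.random.default_rng(0)
metrics = {                                  # e.g. per-server disk wait times (ms)
    "ios0": rng.normal(5, 1, 500),
    "ios1": rng.normal(5, 1, 500),
    "ios2": rng.normal(25, 5, 500),          # behaves like an injected disk-hog fault
}
print(diverging_servers(metrics))            # expected: ['ios2']
```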



Link to the full review:
http://www.cse.buffalo.edu/faculty/tkosar/cse726/reviews/03-review1-bhagat.pdf

Review #2 by Deepak Agrawal
The paper discusses gathering OS-level, black-box performance metrics on every node of a parallel file system such as PVFS or Lustre to identify different performance problems, with the aim of finding the faulty node and then using root-cause analysis to find the faulty resource.

The goals of the black-box approach are to be application-transparent, so that applications do not require modification; to minimize false alarms; and to keep instrumentation overhead minimal, so that the analysis does not adversely impact performance. The paper makes certain assumptions, such as that the I/O servers are synchronized and a majority of them exhibit fault-free behavior, and that the clients and servers have homogeneous hardware and workloads.


The storage and network problems the paper focuses on are disk hogs, disk busy, network hogs, and packet loss (network busy). The paper lists certain empirical observations about the PVFS and Lustre file systems, concluding that the approach might apply to stripe-based parallel file systems in general...
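A toy version of the root-cause step might look like the following; it is not the paper's actual procedure, and the metric names and grouping are invented, but it conveys the idea of labeling the faulty resource by which class of metrics diverged on the flagged server.

```python
# Toy illustration: label the faulty resource (storage vs. network) by the
# class of metrics that diverged on the already-identified faulty server.
STORAGE_METRICS = {"disk_await", "disk_util", "sectors_read"}
NETWORK_METRICS = {"rx_bytes", "tx_bytes", "congestion_window"}

def classify_resource(divergent_metrics):
    storage_hits = len(divergent_metrics & STORAGE_METRICS)
    network_hits = len(divergent_metrics & NETWORK_METRICS)
    if storage_hits > network_hits:
        return "storage"
    if network_hits > storage_hits:
        return "network"
    return "unknown"

print(classify_resource({"disk_await", "disk_util"}))   # -> storage
print(classify_resource({"congestion_window"}))         # -> network
```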

Link to the full review:
http://www.cse.buffalo.edu/faculty/tkosar/cse726/reviews/03-review2-agrawal.pdf

Tuesday, January 25, 2011

PVFS: A Parallel File System for Linux Clusters

Abstract
As Linux clusters have matured as platforms for low-cost, high-performance parallel computing, software packages to provide many key services have emerged, especially in areas such as message passing and networking. One area devoid of support, however, has been parallel file systems, which are critical for high-performance I/O on such clusters. We have developed a parallel file system for Linux clusters, called the Parallel Virtual File System (PVFS). PVFS is intended both as a high-performance parallel file system that anyone can download and use and as a tool for pursuing further research in parallel I/O and parallel file systems for Linux clusters.

In this paper, we describe the design and implementation of PVFS and present performance results on the Chiba City cluster at Argonne. We provide performance results for a workload of concurrent reads and writes for various numbers of compute nodes, I/O nodes, and I/O request sizes. We also present performance results for MPI-IO on PVFS, both for a concurrent read/write workload and for the BTIO benchmark. We compare the I/O performance when using a Myrinet network versus a fast-ethernet network for I/O-related communication in PVFS. We obtained read and write bandwidths as high as 700 Mbytes/sec with Myrinet and 225 Mbytes/sec with fast ethernet.


Link to the full paper: 
http://www.cct.lsu.edu/~kosar/csc7700-fall06/papers/Carns00.pdf


Presented by Shashank Kota Sathish
Link to the slides:

Review #1 by Rishi Baldawa
In PVFS, Carns et al. develop a parallel virtual file system layered on top of local file systems, hence the name PVFS (Parallel Virtual File System), for Linux clusters, providing dynamic distribution of I/O and metadata. The authors' main goals were high-bandwidth concurrent reads and writes from multiple nodes to a single file, support for multiple APIs, use of common UNIX commands with the distributed file system, use of the APIs without constant recompilation, robustness, scalability, and ease of installation and use. The paper provides a relatively cheap distributed file system that can be applied to any Linux-based cluster without significant hardware requirements and can be used for applications such as scientific research, media streaming, and complex computations.

PVFS allows users to store and retrieve data using common UNIX commands (such as ls, cp, and rm), with data striped in round-robin fashion and stored on multiple independent machines with separate network connections. Data is stored in a distributed fashion to avoid single-point bottlenecks and increase the aggregate bandwidth of the system...
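Round-robin striping is easy to picture with a short sketch (illustrative only, not PVFS source): a byte offset in the logical file maps to an I/O node and to an offset within that node's local file. The stripe size and node count below are arbitrary example values.

```python
# Minimal illustration of round-robin striping of a logical file.
STRIPE_SIZE = 64 * 1024          # bytes per stripe unit (example value)
NUM_IO_NODES = 4                 # example node count

def locate(offset):
    stripe_index = offset // STRIPE_SIZE
    io_node = stripe_index % NUM_IO_NODES             # round-robin placement
    local_offset = (stripe_index // NUM_IO_NODES) * STRIPE_SIZE \
                   + offset % STRIPE_SIZE
    return io_node, local_offset

for off in (0, 64 * 1024, 300 * 1024):
    print(off, "->", locate(off))   # consecutive stripes land on different nodes
```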

Link to the full review:

http://www.cse.buffalo.edu/faculty/tkosar/cse726/reviews/02-review1-baldawa.pdf


Review #2 by Sughosh Kadkol
PVFS is proposed as an open-source solution, available for anyone to download and use, for research in parallel file systems and parallel I/O. The paper discusses the motivation, techniques, and experimental results behind developing an alternative to the parallel file systems dominated by commercial parallel machines. PVFS is designed to provide high-bandwidth concurrent I/O, support multiple API sets along with basic UNIX interoperability, and be robust and scalable while remaining relatively easy to install and use. The tool described should provide a simple and cost-effective solution for data-intensive research projects.

Platform-specific commercial clusters and the unsuitability of existing distributed file systems for large parallel scientific applications created the need for a robust and scalable parallel file system. To allow simple operation, PVFS includes a custom kernel module whose wrapper replaces the standard UNIX I/O path with logic supporting both regular kernel I/O and PVFS I/O. The MPI-IO API is also supported for handling I/O operations over PVFS's custom data storage specification...
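Since PVFS is typically driven through MPI-IO (via ROMIO), a small hedged example of collective parallel I/O from Python may help; it uses mpi4py against a plain file path, so it runs on any MPI-IO backend and does not assume anything PVFS-specific.

```python
# Collective MPI-IO write: each rank writes its own block at a disjoint offset.
# Run with e.g.: mpirun -np 4 python write_demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

block = np.full(1024, rank, dtype=np.int32)          # each rank's own data

fh = MPI.File.Open(comm, "demo.out",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
# The collective call lets the MPI-IO layer (and a parallel file system
# underneath it) coordinate and optimize the access pattern.
fh.Write_at_all(rank * block.nbytes, block)
fh.Close()
```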

GPFS: A Shared-Disk File System for Large Computing Clusters

Abstract

GPFS is IBM’s parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.

Link to the full paper:

http://www.almaden.ibm.com/StorageSystems/projects/gpfs/Fast02.pdf

Presented by Prudhvi Reddy Avula
Link to the slides:

http://www.cse.buffalo.edu/faculty/tkosar/cse726/slides/01-avula.pdf


Review #1 by Venkata Sudheerkumar Mupparaju 
This paper, "GPFS: A Shared-Disk File System for Large Computing Clusters," describes the overall architecture of GPFS (General Parallel File System), IBM's parallel shared-disk file system for cluster computers. It explains the approach to achieving parallelism and data consistency in a cluster environment, details some of the features that contribute to its performance and scalability, describes the design for fault tolerance, and presents data on its performance.

GPFS achieves its extreme scalability through its shared-disk architecture. A SAN provides shared disks, but a SAN by itself does not provide a shared file system: if several computers access a shared disk through a regular, single-node file system, the disk's logical structure will be corrupted very quickly. Inconsistencies in disk space allocation and in file data make it impossible to use shared disks with regular file systems as shared file systems. Cluster file systems are designed to solve these problems. GPFS is one such parallel file system for cluster computers, and it provides, as closely as possible, the behavior of a general-purpose POSIX file system running on a single machine...
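A conceptual sketch of byte-range lock tokens handled by a central token manager, in the spirit of GPFS's distributed locking, is shown below. This is a toy model, not GPFS's protocol; the class and method names are invented, and real GPFS initially grants a whole-file token and splits it only when another node needs a conflicting range, so lock traffic stays low when there is no sharing.

```python
# Toy model of byte-range lock tokens granted by a central token manager.
class TokenManager:
    def __init__(self):
        self.granted = {}                 # node -> list of (start, end) ranges

    def acquire(self, node, start, end):
        """Grant [start, end) to `node`, revoking overlapping tokens first."""
        for other, ranges in self.granted.items():
            if other == node:
                continue
            keep = [r for r in ranges if r[1] <= start or r[0] >= end]
            if len(keep) != len(ranges):
                # In GPFS the holder would flush dirty data before giving up
                # the token; here we simply drop the overlapping ranges.
                self.granted[other] = keep
        self.granted.setdefault(node, []).append((start, end))

tm = TokenManager()
tm.acquire("nodeA", 0, 1 << 20)           # nodeA writes the first 1 MiB
tm.acquire("nodeB", 512 << 10, 2 << 20)   # conflicting write revokes nodeA's token
print(tm.granted)
```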

Link to the full review: 

http://www.cse.buffalo.edu/faculty/tkosar/cse726/reviews/01-review1-mupparaju.pdf


Review #2 by Pramod Kundapur Nayak

With the growing demand for computing power, cluster computing has become a trend. Because fault tolerance, ample computing power, and large storage capacity are prime requirements of a reliable system, cluster computing has been of keen interest to researchers. This paper focuses on the storage aspect of cluster computing by introducing GPFS (General Parallel File System), a file system package from IBM that provides functionality similar to a standard POSIX file system.

To summarize:
• GPFS appears to work like a traditional POSIX file system but provides parallel access to files.
• Enhanced performance is achieved through data striping at the block level across all disks in the file system.
• Supports up to 4096 disks of up to 1 TB each, providing a total of 4 petabytes per file system (a quick back-of-the-envelope check follows this list).
• Both file data and metadata on any disk are accessible from any node through disk I/O calls. Further, GPFS facilitates parallel flow of both data and metadata between nodes and disks.
• Highly reliable, with fault-tolerance and replication mechanisms.
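Here is the back-of-the-envelope check of the capacity bullet, plus a one-line illustration of block-to-disk placement; the round-robin mapping is a simple stand-in rather than GPFS's actual allocation policy, and the 256 KB block size is just a typical large value assumed for the example.

```python
# Capacity check for the bullet above: 4096 disks x 1 TB each.
disks = 4096
tb_per_disk = 1
total_tb = disks * tb_per_disk
print(total_tb, "TB =", total_tb / 1024, "PB")   # 4096 TB = 4 PB

# Block-level striping, illustrated as simple round-robin placement.
BLOCK_SIZE = 256 * 1024                          # assumed block size (bytes)
def disk_for_block(block_number, num_disks=disks):
    return block_number % num_disks

print(disk_for_block(0), disk_for_block(1), disk_for_block(4096))  # 0 1 0
```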

This paper highlights GPFS's answers to the performance, scalability, concurrency, and fault-tolerance issues of large file systems and provides a bird's-eye view of GPFS...

Link to the full review:

http://www.cse.buffalo.edu/faculty/tkosar/cse726/reviews/01-review2-nayak.pdf