Wednesday, February 16, 2011

The Google File System

Link to the paper: 
http://labs.google.com/papers/gfs-sosp2003.pdf

Presenter: Sudheer Mupparaju
Reviewers: Pramod Nayak and Akshayajit Bhide

7 comments:

  1. I think the single-master approach proposed in the paper has several drawbacks for such applications. For example, heavy demand for metadata operations on a small file puts pressure on the master, and since only the master knows the metadata mappings, the whole cluster goes down if the master does. I also think this approach limits the scalability of the system. Do you agree?

  2. In Section 3.1, the authors state: "The lease mechanism is designed to minimize management overhead at the master." That is a benefit of the lease mechanism; what are its disadvantages, and how can they be mitigated?
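
One cost of leases can be illustrated with a small sketch: while a lease is unexpired, the master cannot safely name a different primary, even if the current holder has become unreachable. The `LeaseTable` class and its method names are hypothetical, invented for this illustration; only the 60-second initial timeout comes from the paper.

```python
LEASE_SECONDS = 60  # initial lease timeout given in the paper


class LeaseTable:
    """Hypothetical master-side record of which replica holds each chunk's lease."""

    def __init__(self):
        self._leases = {}  # chunk handle -> (primary replica, expiry time)

    def grant(self, chunk, replica, now):
        """Return the primary for `chunk`, granting a new lease only if none is live."""
        holder = self._leases.get(chunk)
        if holder is not None and now < holder[1]:
            # An unexpired lease exists. Even if its holder is unreachable,
            # the master has to wait it out to avoid having two primaries.
            return holder[0]
        self._leases[chunk] = (replica, now + LEASE_SECONDS)
        return replica


table = LeaseTable()
first = table.grant("chunk-1", "server-a", now=0)
blocked = table.grant("chunk-1", "server-b", now=30)  # server-a's lease is still live
after = table.grant("chunk-1", "server-b", now=61)    # lease expired, new primary
```

In this sketch the request at `now=30` still gets `server-a`, so mutations on that chunk could stall for up to the remaining lease time if `server-a` had crashed. A short timeout bounds that delay.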

  3. In describing garbage collection in Section 4.4, the authors state that GFS does not immediately reclaim the physical storage after a file is deleted. Wouldn't this affect the availability of the file system?
    It might not matter for small files, but when a very large file, or a number of large files, is deleted, the space stays unreclaimed until the regular garbage collection runs, even though it could have been used to store other data.
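
The lazy reclamation discussed in this comment can be sketched roughly as follows. The paper describes renaming a deleted file to a hidden name carrying the deletion timestamp and removing it during a regular namespace scan once it is more than three days old (a configurable interval). The `Namespace` class, its method names, and the hidden-name format below are assumptions made for illustration.

```python
GRACE_SECONDS = 3 * 24 * 3600  # three days, the paper's default (configurable)


class Namespace:
    """Hypothetical, much simplified view of the master's file namespace."""

    def __init__(self):
        self.files = {}   # visible name -> list of chunk handles
        self.hidden = {}  # hidden name -> (deletion time, chunk handles)

    def delete(self, name, now):
        # Deletion only renames the file; no storage is freed at this point.
        self.hidden[f".deleted.{name}.{now}"] = (now, self.files.pop(name))

    def scan(self, now):
        """Periodic namespace scan: return chunks whose grace period has passed."""
        orphaned = []
        for hidden_name, (deleted_at, chunks) in list(self.hidden.items()):
            if now - deleted_at > GRACE_SECONDS:
                del self.hidden[hidden_name]
                orphaned.extend(chunks)
        return orphaned


ns = Namespace()
ns.files["/logs/a"] = ["c1", "c2"]
ns.delete("/logs/a", now=0)
early = ns.scan(now=3600)              # within the grace period: nothing reclaimed
late = ns.scan(now=GRACE_SECONDS + 1)  # grace period over: chunks are orphaned
```

The gap between `early` and `late` is exactly the window the comment worries about: the chunks of `/logs/a` keep occupying disk until a scan finds the hidden file past its grace period.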

  4. What happens if a read/write occurs in the middle of a mutation?

  5. During a mutation, the primary signals completion to the client only after all replicas have acknowledged applying it. Since the primary replica dictates the serial order of mutations, couldn't the client be acknowledged as soon as the mutation is applied successfully at the primary, leaving the primary responsible for carrying out the mutation on the remaining replicas?
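
A toy model may help show what the all-replicas acknowledgement buys. In this sketch the primary assigns a serial number, applies the mutation locally, forwards it to the secondaries, and reports success only if every secondary acknowledges. The `Replica` and `Primary` classes and the `healthy` flag are hypothetical, and data flow, leases, and retries are left out.

```python
class Replica:
    """Hypothetical chunk replica that records applied mutations in order."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.log = []  # (serial number, data) pairs applied so far

    def apply(self, serial, data):
        if not self.healthy:
            return False  # models a failed or unreachable replica
        self.log.append((serial, data))
        return True


class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data):
        """Apply locally, forward in serial order, succeed only if all acknowledge."""
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data)
        acks = [s.apply(serial, data) for s in self.secondaries]
        return all(acks)


s1, s2 = Replica("s1"), Replica("s2", healthy=False)
primary = Primary("p", [s1, s2])
ok = primary.write(b"record")
```

Here `ok` is `False` even though the primary and `s1` both hold the record, because `s2` does not. If the primary acknowledged right after its own apply, the client would be told the write succeeded while a read served by `s2` could miss it; reporting the failure lets the client retry instead.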

  6. @Duygu Master involvement in common operations is minimized by a large chunk size and by chunk leases, which delegate authority to primary replicas in data mutations.

  7. It is true that the master only gives the client a list of the replicas that hold the requested chunk. But when millions of such requests are directed at a single master, won't they slow it down, given that it is running other operations at the same time?
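
Part of the paper's answer to this load concern is that clients cache chunk locations, so many reads within one 64 MB chunk cost the master a single lookup. The sketch below illustrates the effect with hypothetical `Master` and `Client` classes; cache expiry, which the real client applies, is omitted.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in the paper


class Master:
    """Hypothetical master that counts how many lookups it serves."""

    def __init__(self, table):
        self.table = table  # (file name, chunk index) -> (handle, replica list)
        self.lookups = 0

    def lookup(self, name, index):
        self.lookups += 1
        return self.table[(name, index)]


class Client:
    def __init__(self, master):
        self.master = master
        self.cache = {}  # (file name, chunk index) -> (handle, replica list)

    def locate(self, name, offset):
        """Translate a byte offset to a chunk and consult the master only on a miss."""
        key = (name, offset // CHUNK_SIZE)
        if key not in self.cache:
            self.cache[key] = self.master.lookup(*key)
        return self.cache[key]


master = Master({("/f", 0): ("h0", ["cs1", "cs2", "cs3"])})
client = Client(master)
for offset in range(0, CHUNK_SIZE, 1024 * 1024):  # 64 reads, 1 MB apart
    client.locate("/f", offset)
```

After the loop, `master.lookups` is 1: all 64 reads fall in chunk index 0, and the data itself is fetched from chunkservers rather than from the master.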
