Tuesday, March 1, 2011

The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM

Link to the paper: 
http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf

Presenter: Luquan Huang
Reviewers: Fahim Patel and Sughosh Kadkol

11 comments:

  1. In Section 4.1, the authors state that the overhead of managing virtualization on application servers contributes to latency. Are the authors referring to the Xen virtualization technique employed by, for example, Amazon's Elastic Compute Cloud instances? In the Xen model, incoming packets must first pass through the driver domain (domain 0 or virtual machine monitor) and then the guest domain (guest operating system) before reaching the application.

  2. Follow-up question: Have any studies measured the average latency that virtualization adds within a network infrastructure?

    In addition, the authors mention methods to reduce the latency that virtualization causes, such as "utilizing network interfaces that can be mapped directly into an applications address space so that applications can send and receive packets without involvement of the operating system or virtual machine monitor". Has any further research been conducted in this area?

  3. I think the suggested approach may not be as revolutionary as claimed. In practice, a cost comparison should also take into account the replicas we will need. Power outages do happen at data servers, and when they happen they do not affect only one node; sometimes the whole cluster goes down. So data protection might be better with the traditional approach, and its operational cost might be lower as well.

  4. What is your take on the heat generated by the setup? Wouldn't it be a major flaw in the design of the system?
    "It may also be possible to reduce RAMCloud energy usage by taking advantage of the low-power mode offered by DRAM chips, particularly during periods of low activity."
    During periods of high throughput, can we expect low activity?

    Saakshi

  5. It was stated that RAMClouds currently cannot be used to store video or media files. Is it possible to store these files on RAMClouds in compressed format? Has any research been done on this? (However, this approach would compromise latency by some microseconds.)


    Currently, video files are stored on server farms (banks of servers). Whenever a video file is requested, it is served from one of the servers in the bank. The video files with a high number of hits are stored in cache. Can we use the same approach and store the video files with a high number of hits in DRAM?

  6. 1) In Section 3.6, when discussing the applicability of RAMClouds, which applications are considered "large"? (If Facebook's non-image data is the upper limit for RAMClouds, doesn't that undermine the usefulness of RAMClouds for all possible data?)

    2) The authors say, "It is probably not yet practical to use RAMClouds for large-scale storage of media such as videos, photos, and songs. However, RAMClouds are practical for almost all other online data today." But isn't most online data exchanged in exactly the formats RAMCloud is not suited for?

    3) Lastly, in Table 3, the authors give RAMCloud costs, but how do they compare to those of other systems?

  7. In Section 4.5, the authors simply say that consistency issues are eliminated because the high throughput of a RAMCloud makes data replication unnecessary. But later, in the Disadvantages section, they say that RAMCloud's advantages will be lost for applications that require data replication across datacenters. So there is a trade-off between replicating data and keeping latency low.

    Are there any experimental results showing the performance of RAMCloud in applications that require replication?

    I feel that without experimental results, the RAMCloud approach cannot be adopted for applications that require replication. What is your opinion on this?

  8. Is there an optimal size of DRAM to use for a given size of disk (one that may affect the performance of the system)?

    When are the primary server DRAM updates written to disk?

    How often are the log entries at backup servers written to disk?

  9. Adding to the question above from Sandeep: if data replication is employed, there will be a consistency issue. How does RAMCloud handle it? What mechanisms does RAMCloud use to achieve consistency?
    Again, in Section 4.6 the authors say that RAMClouds will need to provide access control and security mechanisms reliable enough to allow mutually antagonistic applications to cohabitate in the same RAMCloud cluster. So doesn't replication play a part here?

  10. The paper says that RAMClouds will probably use a technology other than DRAM for backup copies of data, because of its drawbacks. Which other technology can be used?

  11. Section 4.2 says,
    "If a RAMCloud system is to ensure data integrity, it must be able to detect all significant forms of corruption; the use of DRAM for storage complicates this in several ways. First, bit error rates for DRAM are relatively high [22] so ECC memory will be necessary to avoid undetected errors. Even so, there are several other ways that DRAM can become corrupted, such as errors in peripheral logic, software bugs that make stray writes to memory, and software or hardware errors related to DMA devices. Without special attention, such corruptions will not be detected."
    Are there any special mechanisms for RAMCloud to detect these errors?
