The author describes monitoring queue back-pressure to detect resource contention. What parameters are used to decide which resources to process when considering back-pressure? Does the scheduler account for priorities, or does it simply act as a FIFO queue?
I think it can be inferred that the network and disk are the more critical resources. If there is disk contention or the network is unavailable, any CPU scheduling is deferred. As for processing requests, the author mentions that if the disk is overloaded, no new disk read operations are scheduled; and since disk writes are critical to the transfer process, writes are given higher priority than reads.
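To make that concrete, here is a minimal sketch (my own illustration, not the paper's implementation) of a scheduler that uses queue depth as the back-pressure signal, always drains pending writes first, and stops admitting reads once the disk looks overloaded. BackPressureScheduler and DISK_DEPTH_LIMIT are names I made up for the example.

from collections import deque

DISK_DEPTH_LIMIT = 64   # assumed threshold above which the disk counts as overloaded

class BackPressureScheduler:
    """Toy priority scheduler: drain writes first, and only admit reads while
    the outstanding disk queue depth (the back-pressure signal) stays below the limit."""

    def __init__(self):
        self.writes = deque()   # writes are critical to completing the transfer
        self.reads = deque()    # reads are opportunistic and can be deferred
        self.in_flight = 0      # operations submitted to the disk but not yet completed

    def submit(self, op, is_write):
        (self.writes if is_write else self.reads).append(op)

    def completed(self):
        self.in_flight -= 1

    def schedule_next(self):
        if self.writes:                              # writes always take priority over reads
            self.in_flight += 1
            return self.writes.popleft()
        if self.reads and self.in_flight < DISK_DEPTH_LIMIT:
            self.in_flight += 1                      # disk not backed up: admit a read
            return self.reads.popleft()
        return None                                  # back-pressure: defer until the queue drains

So it is not a plain FIFO: the queues themselves act as the contention signal, and priority between operation types is applied on top of them.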
For correct file transfer, collision-free hashing of the file chunks is important. For very large files, the number of chunks can become very large, increasing the chance of a collision. How is this situation handled?
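A rough back-of-the-envelope estimate (my own calculation, not from the paper), assuming a 160-bit hash such as SHA-1 and 16 KiB chunks, suggests the collision probability stays negligible even for terabyte-scale files:

hash_bits = 160
chunk_size = 16 * 1024                      # 16 KiB chunks (assumed)
file_size = 1 * 1024**4                     # a 1 TiB file
n_chunks = file_size // chunk_size          # ~6.7e7 chunks

# Birthday bound: P(collision) is roughly n^2 / 2^(b+1) for n chunks and a b-bit hash
p_collision = n_chunks**2 / 2**(hash_bits + 1)
print(f"chunks: {n_chunks:.2e}, collision probability: {p_collision:.2e}")
# prints roughly 1.5e-33, so the chunk count is unlikely to be the limiting factor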
Dsync is designed for general file transfer and mirroring services. However, in the context of distributed systems it might need some modifications, such as a larger chunk size. Data replication would be the most compelling application of dsync in a distributed system, and since the number of mirrors is generally in the single digits, would it make sense to use a BitTorrent-style approach for the transfer?
Assumption 4 says that the authors exploit the similarity between files, which is computed by comparing hash codes. Is hashing the only method the authors suggest? Also, for large files (e.g. logs of long-running jobs) that run into GBs, the hash generation itself would become an overhead, more so if the logs get updated frequently.
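One way to keep that overhead down, sketched below purely as an assumption about how it could be done rather than what dsync actually does, is to cache per-chunk digests and re-hash only the chunks touched by an update; refresh_digests and CHUNK_SIZE are names I chose for the example.

import hashlib
import os

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB chunks (assumed)

def refresh_digests(path, cache, changed_ranges):
    """Update the per-chunk SHA-1 cache in place, re-hashing only chunks that
    overlap a changed byte range instead of the whole multi-GB file."""
    size = os.path.getsize(path)
    n_chunks = (size + CHUNK_SIZE - 1) // CHUNK_SIZE
    dirty = set()
    for start, end in changed_ranges:            # byte ranges [start, end) known to have changed
        last = min((end - 1) // CHUNK_SIZE + 1, n_chunks)
        dirty.update(range(start // CHUNK_SIZE, last))
    with open(path, "rb") as f:
        for index in sorted(dirty):
            f.seek(index * CHUNK_SIZE)
            cache[index] = hashlib.sha1(f.read(CHUNK_SIZE)).hexdigest()
    for index in [i for i in cache if i >= n_chunks]:
        del cache[index]                         # drop digests for truncated chunks
    return cache

For an append-only log that grows from old_size to new_size, changed_ranges would be [(old_size, new_size)], so only the final chunk or two are re-read and re-hashed on each update.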