The SlowNodeThreshold heuristic depends on the total progress score of the node requesting a task. Depending on the nature of the tasks it is running (heavy I/O, or heavy computation or input data), a fast node may show a low progress score even though it is capable of executing a speculative task. Does the heuristic fail in this case?
How is the progress score calculated in the LATE scheduler? Does it use the same method as Hadoop or a different one? (I ask because the paper points out problems with the one Hadoop uses.)
I think the same strategy as Hadoop is used to compute the progress score, because the authors say that how the score is calculated is not very important as long as it lets the scheduler work out the order in which tasks will finish. Furthermore, the authors say that the scenarios in which Hadoop's strategy fails usually do not occur in MapReduce jobs.
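To see why only the ordering matters: LATE feeds the progress score into an estimate of time left, ProgressRate = ProgressScore / T (T = time the task has been running) and time left = (1 - ProgressScore) / ProgressRate, and speculates on the task expected to finish farthest in the future. A minimal Python sketch of that estimate; TaskStatus and the function names are hypothetical, not code from Hadoop or the paper.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    task_id: str
    progress_score: float   # in [0, 1], assumed to be computed the same way as Hadoop's
    elapsed_seconds: float  # how long the task has been running


def progress_rate(t: TaskStatus) -> float:
    # ProgressRate = ProgressScore / T; treat a task that has not run yet as rate 0.
    if t.elapsed_seconds <= 0:
        return 0.0
    return t.progress_score / t.elapsed_seconds


def estimated_time_left(t: TaskStatus) -> float:
    # Time left = (1 - ProgressScore) / ProgressRate.
    rate = progress_rate(t)
    if rate == 0:
        return float("inf")
    return (1.0 - t.progress_score) / rate


def rank_by_time_left(tasks):
    # Speculation candidates, longest expected remaining time first.
    return sorted(tasks, key=estimated_time_left, reverse=True)


a = TaskStatus("a", 0.5, 100.0)
b = TaskStatus("b", 0.2, 40.0)
print([t.task_id for t in rank_by_time_left([a, b])])
```

Here task "a" has the higher score, yet "b" is ranked first because at the same rate it still has more work left; an error that scaled every score equally would not change that ranking.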
For the Hadoop scheduler, the paper assumes: "A task’s progress score is representative of fraction of its total work that it has done. Specifically, in a reduce task, the copy, sort and reduce phases each take about 1/3 of the total time." But in general the split is not exactly 1/3. How does this affect things in practice, given that the phases do not take equal time?
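A quick way to see the effect: Hadoop's reduce score is (phases completed + fraction of current phase) / 3, so if the phases take unequal time the score stops tracking elapsed time. The sketch below assumes a hypothetical reduce task where copy takes 60% of the running time, sort 10% and reduce 30%; the shares and function names are illustrative only.

```python
def hadoop_reduce_progress(phase_index: int, fraction_in_phase: float) -> float:
    """Hadoop-style reduce score: copy, sort and reduce each count for 1/3."""
    return (phase_index + fraction_in_phase) / 3.0


def hadoop_score_at_time(t: float, shares: tuple) -> float:
    """Score reported after a fraction t of the task's real running time,
    if the three phases actually take `shares` of that time."""
    start = 0.0
    for i, share in enumerate(shares):
        end = start + share
        if t <= end or i == len(shares) - 1:
            # Clamp to [0, 1] to absorb floating-point edge effects.
            fraction = min(max((t - start) / share, 0.0), 1.0)
            return hadoop_reduce_progress(i, fraction)
        start = end
    raise ValueError("shares must be non-empty")


uneven = (0.6, 0.1, 0.3)  # hypothetical: the copy phase dominates
for t in (0.3, 0.5, 0.65, 0.85):
    print(f"time {t:.2f} -> score {hadoop_score_at_time(t, uneven):.3f}")
```

With equal shares the score equals the elapsed fraction. With these uneven shares a task halfway through its real running time reports only about 0.28, then the score climbs quickly through the short sort phase. As long as all reducers in a job see similar shares, comparisons between them are less affected than the absolute values.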
One of the reasons for slow performance in the reduce phase is network load. If a reducer is running slowly because of network load, wouldn't starting another attempt of the same task simply double the network load? This is one reason why Yahoo and Facebook disable speculation for reduce tasks. Please shed some light on the discussion from this perspective.
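For reference, Hadoop's native rule (as described in the paper) marks a task as a straggler when its progress score is more than 0.2 below the average for its category (maps or reduces) and it has run for at least one minute. A minimal Python sketch with hypothetical names. Note that the rule only looks at the score and the runtime, not at the cause, so a reducer held back by network contention is speculated exactly like one on a faulty node.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    task_id: str
    category: str           # "map" or "reduce"
    progress_score: float   # in [0, 1]
    runtime_seconds: float


def hadoop_stragglers(tasks, gap=0.2, min_runtime=60.0):
    """IDs of tasks whose score is below (category average - gap)
    and that have been running for at least min_runtime seconds."""
    averages = {}
    for category in {t.category for t in tasks}:
        scores = [t.progress_score for t in tasks if t.category == category]
        averages[category] = sum(scores) / len(scores)
    return [
        t.task_id
        for t in tasks
        if t.progress_score < averages[t.category] - gap
        and t.runtime_seconds >= min_runtime
    ]
```

Usage: with map scores 0.9, 0.8, 0.3 and 0.2 the average is 0.55, so both 0.3 and 0.2 fall below 0.35, but only the task that has run for at least a minute is returned.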
@Sughosh: As you said, a node is not assigned a task if it is busy doing I/O. But that also means the node is under contention, so its progress score is low and it is no longer a fast node. Hence the heuristic seems fine.
@Ajinkya: Agreed, but SlowNodeThreshold is applied to the sum of the progress scores of all succeeded and in-progress tasks on the node. This means that when the node asks for a new task, it will not be given a speculative one, because the total progress of its succeeded and in-progress tasks is below the threshold; it will thus be wrongly perceived as a slow node, as you mentioned. The heuristic seems to fail in this situation.
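To make the concern concrete, here is a minimal Python sketch of the check: total work per node is the sum of its tasks' progress scores (1.0 for succeeded tasks), and a node below a percentile of those totals gets no speculative task. The paper's experiments use the 25th percentile; the linear-interpolation percentile, the function names and the node data are my own assumptions for illustration.

```python
import math


def node_progress_totals(progress_by_node):
    """Total work per node: sum of progress scores of its succeeded
    (score 1.0) and in-progress tasks."""
    return {node: sum(scores) for node, scores in progress_by_node.items()}


def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def may_run_speculative(node, totals, slow_node_percentile=25):
    """False if the node's total work is below SlowNodeThreshold."""
    threshold = percentile(list(totals.values()), slow_node_percentile)
    return totals[node] >= threshold


# Hypothetical cluster: n4 has fast hardware but has only been handed I/O-heavy tasks.
totals = node_progress_totals({
    "n1": [1.0, 1.0, 0.6],
    "n2": [1.0, 0.8],
    "n3": [1.0, 1.0, 1.0, 0.4],
    "n4": [0.3, 0.2],
})
print({node: may_run_speculative(node, totals) for node in totals})
```

The check only sees the totals, so n4 is excluded whether it is genuinely slow or just unlucky in the tasks it received, which is the situation described above.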
All the tests performed by the authors showed that the Hadoop scheduler with no speculation performed worst. But the authors also mention that Facebook and Yahoo disable speculation to get better performance. Can you comment on this?
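One relevant detail: LATE does not speculate without limit. It caps how many speculative tasks may run at once (SpeculativeCap, 10% of available task slots in the paper's experiments), which bounds the extra load speculation can add to a cluster. A minimal sketch of that check; the function names and rounding the cap down are my assumptions.

```python
import math


def speculative_cap(total_slots: int, cap_fraction: float = 0.10) -> int:
    """Maximum number of speculative tasks allowed to run at once."""
    return math.floor(cap_fraction * total_slots)


def may_launch_speculative(running_speculative: int, total_slots: int,
                           cap_fraction: float = 0.10) -> bool:
    """True if launching one more speculative copy stays within the cap."""
    return running_speculative < speculative_cap(total_slots, cap_fraction)
```

With 40 slots the cap is 4, so a fifth speculative copy is refused. With rounding down, a very small cluster (fewer than 10 slots) would get no speculation at all; whether that is desirable is a separate design choice.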
@sudheer: It's mentioned in the paper that the amount of work completed by a task is taken as its progress score in the LATE scheduler (similar to Hadoop).