It is assumed that all tasks are independent of each other. How realistic is this assumption?
The scheduler can kill tasks and resubmit them to the queue. Instead of restarting the whole task, can't it save its last state and resume from there next time?
How is data locality taken care of in the graph construction?
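On the graph-construction question: roughly, locality is not a separate mechanism but is encoded in the scheduling graph itself, with machines and racks that already hold a task's input getting cheaper edges, so the scheduler is steered toward them. A toy Python sketch of that idea; the machine names, rack layout, and cost values are all made up for illustration:

# Toy illustration only: locality expressed as placement cost.
# The paper's scheduler encodes preferences like these as edge
# costs when it builds the scheduling graph.
task_input_location = "machine-17"

def rack_of(machine):
    # hypothetical layout: 20 machines per rack
    return int(machine.split("-")[1]) // 20

def placement_cost(machine):
    if machine == task_input_location:
        return 0    # input on local disk: no network traffic
    if rack_of(machine) == rack_of(task_input_location):
        return 1    # same rack: only the top-of-rack switch is used
    return 10       # cross-rack: traffic crosses the cluster core

for m in ("machine-17", "machine-3", "machine-60"):
    print(m, "-> cost", placement_cost(m))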
The scheduler may decide to kill a worker task before it completes in order to give other jobs access to its resources. Such a killed task restarts from the beginning. Likewise, if a computer fails, the job is re-executed from the start.
Why can't some state be maintained for each job, so that the overhead of re-running such jobs is eliminated?
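Re-running from the start is the paper's design choice; presumably checkpointing was rejected because worker tasks are typically short and deterministic, so the lost work is bounded and re-execution is simpler than persisting state. For contrast, a minimal sketch of what per-task checkpointing could look like; everything here (the file name, the record granularity) is hypothetical and not part of Dryad or the paper's scheduler:

import json
import os

CHECKPOINT = "task_0042.ckpt"   # hypothetical per-task checkpoint file

def run_task(records):
    # Resume from the last saved state if a previous attempt was killed.
    start, state = 0, {"sum": 0}
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            saved = json.load(f)
        start, state = saved["next"], saved["state"]
    for i in range(start, len(records)):
        state["sum"] += records[i]          # stand-in for the real work
        if i % 1000 == 0:                   # periodically persist progress
            with open(CHECKPOINT, "w") as f:
                json.dump({"next": i + 1, "state": state}, f)
    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)               # clean up on success
    return state["sum"]

The trade-off is visible even in the sketch: every checkpoint costs extra I/O, and to survive a machine failure the state would have to live somewhere remote, which reintroduces exactly the network traffic the scheduler is trying to avoid.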
The paper mentions that if computations are not placed close to their input data, the network can become a bottleneck, and that reducing network traffic simplifies capacity planning. Can you elaborate on this?
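On the capacity-planning point, a back-of-envelope calculation makes it concrete: the cluster's core network only has to carry the reads that are not local, so the better the scheduler's locality, the less core bandwidth must be provisioned and the more predictable the traffic becomes. The numbers below are made up for illustration:

# Hypothetical cluster: 1000 machines, each reading local disks at ~1 GB/s.
# If every read were remote, the core would have to carry the full
# aggregate rate; with mostly-local reads it can be provisioned far smaller.
machines = 1000
disk_bw_gbps = 1.0   # per-machine local read bandwidth, GB/s

for local_fraction in (0.0, 0.5, 0.9, 0.99):
    core_gbps = machines * disk_bw_gbps * (1 - local_fraction)
    print(f"{local_fraction:.0%} local reads -> core carries ~{core_gbps:.0f} GB/s")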
In the cluster architecture described in the paper, there is only a single centralized scheduling service. What happens in the case of a single point of failure?
--
prudhvi
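The centralized scheduler is indeed a single point of failure for scheduling decisions. A standard mitigation, sketched below with entirely hypothetical names (none of this is in the paper), is a warm standby that promotes itself when the primary stops sending heartbeats:

import time

HEARTBEAT_TIMEOUT = 10.0   # seconds of silence before assuming failure

class StandbyScheduler:
    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.active = False

    def on_heartbeat(self):
        # Invoked whenever the primary scheduler reports in.
        self.last_heartbeat = time.monotonic()

    def poll(self):
        # Invoked periodically; promotes the standby if the primary is silent.
        # Queue state would have to be rebuilt from a replicated log or by
        # having root tasks re-register, since the scheduler holds the queue.
        if (not self.active
                and time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT):
            self.active = True
            print("primary presumed dead; standby takes over scheduling")

Even without automatic failover, one would expect a restartable scheduler whose queue is persisted to bound the damage to a brief stall in new placements, though the thread above does not settle what the paper actually does here.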
The author describes a computational model in which a "root task" manages the workflow and, as I understand it, assigns the individual "worker tasks" that run on any computer. As the worker tasks finish their jobs, they inform the root task.
So my question is: does the root task have to busy-wait until a worker task responds to it? What happens if a worker task, which is running on a separate node in the cluster, takes longer than usual (say, because of a sequential job) or drops out of the cluster due to connectivity problems?
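A root task does not need to busy-wait: waiting for completion messages is naturally event-driven (block on a notification rather than spin), and a timeout covers workers that straggle or drop off the network, after which the restart-from-scratch model lets the task simply be resubmitted on another machine. A small sketch of that pattern, using threads as stand-ins for cluster workers; the names here are illustrative, not the paper's API:

from concurrent.futures import ThreadPoolExecutor, as_completed, TimeoutError

def worker(part):
    return sum(part)   # stand-in for a worker task's real computation

parts = [[1, 2], [3, 4], [5, 6]]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(worker, p): p for p in parts}
    try:
        # The "root" sleeps until a worker finishes; no busy-waiting.
        # The timeout bounds how long it waits for slow or lost workers.
        for fut in as_completed(futures, timeout=30):
            print("worker finished:", fut.result())
    except TimeoutError:
        # Straggler or lost connectivity: kill the attempt and re-run the
        # task elsewhere, exactly as with a scheduler-initiated kill.
        print("worker timed out; resubmit its task on another machine")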
Is there a specific reason behind using Dryad?
@anudipa
I guess it's because of its fine-grained resource-sharing strategy.
--
prudhvi