Tuesday, March 1, 2011

Quincy: Fair Scheduling for Distributed Computing Clusters

Link to the paper: 
http://www.sigops.org/sosp/sosp09/papers/isard-sosp09.pdf


Presenter:  Saurabh Baisane
Reviewers: Fahim Patel and Kavyashree Prasad 

7 comments:

  1. Is there a specific reason behind using Dryad?

    It is assumed that all tasks are independent of each other. How realistic is this assumption?

    The scheduler can kill tasks and resubmit them to the queue. Instead of restarting the whole task, can't it save its last state and resume from there next time?

  2. How is data locality taken care of in the graph construction?

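Regarding the data-locality question above: the paper encodes locality directly in the flow graph. Each worker task gets edges to the computers and racks that hold its input data, with edge costs that grow with the amount of data the task would have to read remotely, plus a fallback edge to a cluster-wide node, so the min-cost flow solver naturally prefers local placements. Below is a minimal sketch of that cost construction, assuming a simple bytes-to-be-transferred cost model; the function, node names, and data structures are illustrative, not Quincy's exact formulation.

    def build_locality_edges(task_inputs, computer_of_block, rack_of_computer):
        """Return (destination_node, cost) edges for one worker task.

        task_inputs:       {block_id: size_in_bytes} of the task's input blocks
        computer_of_block: {block_id: computer_id} where each block is stored
        rack_of_computer:  {computer_id: rack_id}
        """
        total = sum(task_inputs.values())

        # Bytes of this task's input already stored on each computer / each rack.
        on_computer, on_rack = {}, {}
        for block, size in task_inputs.items():
            comp = computer_of_block[block]
            on_computer[comp] = on_computer.get(comp, 0) + size
            rack = rack_of_computer[comp]
            on_rack[rack] = on_rack.get(rack, 0) + size

        # Cost = bytes that would have to be read remotely, so more-local
        # placements get cheaper edges and the flow solver prefers them.
        edges = []
        for comp, local_bytes in on_computer.items():
            edges.append((("computer", comp), total - local_bytes))
        for rack, local_bytes in on_rack.items():
            edges.append((("rack", rack), total - local_bytes))
        # Fallback edge to the cluster-wide node: assume everything is remote.
        edges.append((("cluster", "X"), total))
        return edges
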
  3. The scheduler may decide to kill a worker task before it completes in order to give other jobs access to its resources. Such a killed task restarts from the beginning. Also, if the computer running a job fails, the job will be re-executed from the start.

    Why can't state be maintained for each job so that the overhead of re-running such jobs is eliminated?

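On the checkpointing idea raised in comments 1 and 3: the paper's scheduler simply re-runs a killed worker task from the beginning. Here is a minimal sketch of what periodic state saving could look like if a worker task did checkpoint its progress; the file name, record-at-a-time model, and checkpoint interval are assumptions for illustration and are not part of Dryad or Quincy.

    import json
    import os

    CHECKPOINT = "worker_task.ckpt"   # hypothetical checkpoint file

    def process(record):
        pass   # placeholder for the task's real per-record computation

    def run_worker(records):
        # Resume from the last saved position if a checkpoint exists,
        # instead of reprocessing everything from the start.
        start = 0
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                start = json.load(f)["next_record"]

        for i in range(start, len(records)):
            process(records[i])
            if i % 1000 == 0:                  # save progress every 1000 records
                with open(CHECKPOINT, "w") as f:
                    json.dump({"next_record": i + 1}, f)
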
  4. It is mentioned in the paper that if computations are not placed close to their input data, the network can become a bottleneck, and that reducing network traffic simplifies capacity planning. Can you elaborate on this?

  5. @anudipa
    I guess it's because of its fine-grained resource-sharing strategy.

    --
    prudhvi

  6. In the paper's cluster architecture, it is mentioned that there is only a single centralized scheduling service.
    But doesn't that make it a single point of failure? What happens if it goes down?

    --
    prudhvi

  7. The author describes a computational model wherein a "root task" manages the workflow and, as I understand it, assigns the individual "worker tasks" that run on any computer. As the worker tasks finish their work, they inform the root task.

    So, my question is: does the root task have to busy-wait until a worker task responds back to it? What happens if a worker task, which is running on a separate node in the cluster, takes longer than usual because of, say, a sequential job, or drops out of the cluster due to connectivity problems?

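On the busy-waiting question in the last comment: the root task does not have to spin; it can block on completion events and re-dispatch any work whose worker has been silent for too long (for example, a lost node or a connectivity problem). Below is a minimal sketch of that pattern, assuming an in-process event queue and a caller-supplied dispatch function; it illustrates the general idea and is not Dryad's actual job-manager implementation.

    import queue
    import time

    def root_task(pending_work, dispatch, timeout_s=300):
        events = queue.Queue()   # workers (or their node daemons) push (work_id, ok) here
        inflight = {}            # work_id -> time the work item was last dispatched

        for work_id in list(pending_work):
            dispatch(work_id, events)         # start a worker; it reports back via `events`
            inflight[work_id] = time.time()

        while inflight:
            try:
                work_id, ok = events.get(timeout=5)   # block on events instead of spinning
                if ok:
                    inflight.pop(work_id, None)       # worker finished successfully
                else:
                    dispatch(work_id, events)         # worker reported failure: re-run it
                    inflight[work_id] = time.time()
            except queue.Empty:
                pass
            # Re-dispatch anything that has been silent too long.
            now = time.time()
            for work_id, started in list(inflight.items()):
                if now - started > timeout_s:
                    dispatch(work_id, events)
                    inflight[work_id] = now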