Is demand estimation recalculated every time cores/aggregators/hosts are added to the network? Although the whole design is scalable, the requirement for convergence seems to make it lack a 'plug and play' property.
The setup of the initial state prior to the simulated annealing is understandable, but it is unclear why it is either needed or useful. Since later optimizations are made over the search space of all core flows instead of just core-destination ones, why can't all core flows be included in the initial state itself?
Is there an exit condition for the simulated annealing, say an energy-per-flow parameter? And if there is, what is the ideal value for large data centers?
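For what such an exit condition could look like: the paper does not spell one out, but a generic annealing loop typically stops when the temperature falls below a floor or the objective reaches a target. Here is a minimal sketch in Python; all names and parameter values are hypothetical, not taken from Hedera.

```python
import math
import random

def anneal(initial_state, energy, neighbor, t0=1.0, cooling=0.95,
           t_min=1e-3, energy_target=None):
    """Generic simulated annealing with two exit conditions:
    a temperature floor (t_min) and an optional energy target."""
    state = initial_state
    e = energy(state)
    best, best_e = state, e
    t = t0
    while t > t_min:
        cand = neighbor(state)
        e_cand = energy(cand)
        # Accept improvements always; accept worse moves with
        # probability exp(-delta/t), the standard Metropolis rule.
        if e_cand <= e or random.random() < math.exp((e - e_cand) / t):
            state, e = cand, e_cand
            if e < best_e:
                best, best_e = state, e
        if energy_target is not None and best_e <= energy_target:
            break  # early exit: "good enough" energy reached
        t *= cooling
    return best, best_e
```

An energy-per-flow stopping rule would plug in as `energy_target = per_flow_threshold * num_flows`; the right per-flow threshold would presumably have to be found empirically for a given data center size.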
It may be out of the scope of this paper, but do you think the problem could be addressed by simply partitioning the data into smaller pieces? The authors say that ECMP works better for smaller flows. So, instead of using an extra machine and doing all this dynamic computation, could we make better use of the bandwidth by splitting the big chunks of data into smaller ones, or would it be too hard to keep the data consistent?
In the case of simulated annealing, they assign a single core switch to each destination host. But if there is more traffic to a particular destination host, won't that single core switch become a bottleneck in the system? Also, is the mapping of core switches to destination hosts one-to-one, or is it left to the implementation?
A key limitation of ECMP is that two or more large, long-lived flows can collide on their hash and end up on the same output port, creating a bottleneck. Is there any mechanism to tweak the hash function when such hash collisions occur?
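To see why such collisions are inevitable with static hashing, here is a toy illustration (this is not Hedera's or any real switch's hash, just CRC32 over the flow 5-tuple): with more large flows than uplinks, the pigeonhole principle guarantees that some port carries at least two of them, no matter how the hash is tweaked.

```python
import zlib
from collections import Counter

NUM_UPLINKS = 4  # hypothetical number of equal-cost next hops

def ecmp_port(flow):
    """Deterministically hash a flow 5-tuple to one uplink,
    the way a static ECMP hash would."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % NUM_UPLINKS

# Ten large flows to distinct destinations; with only 4 uplinks,
# at least one uplink must carry 2+ flows regardless of the hash.
flows = [("10.0.0.%d" % i, "10.0.1.%d" % i, 6, 5000 + i, 80)
         for i in range(10)]
load = Counter(ecmp_port(f) for f in flows)
```

Retuning the hash only reshuffles which flows collide; it cannot remove collisions once there are more elephants than paths, which is why the paper argues for explicit placement instead.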
Currently in production networks, whenever hash collisions are observed, the network administrators manually tweak the hash function to achieve better performance.
The Global First Fit algorithm greedily searches for paths that can accommodate the flow. What happens when all the paths that match the flow would exceed their capacity?
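As I read Section 4, a flow for which no path has spare capacity is simply left on its default hash-based path until a later scheduling pass can place it. A minimal sketch of that greedy search (my own simplification, with link reservations kept in a plain dict):

```python
def global_first_fit(paths, reserved, demand, capacity):
    """Greedily pick the first candidate path on which every link can
    still fit `demand`; reserve the capacity and return the path.
    Returns None when no path fits, in which case the flow keeps
    using its default hash-based (ECMP) path."""
    for path in paths:
        if all(reserved.get(link, 0.0) + demand <= capacity
               for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + demand
            return path
    return None
```

The `None` case is exactly the saturation scenario raised above: the algorithm makes no guarantee of placing every flow.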
@shashank The Global First Fit algorithm doesn't guarantee that all flows will be accommodated, so it is not a good choice when the links become saturated.
In Section 3.2 the authors state that the “centralized scheduler is POSSIBLY replicated for fail-over and scalability.” Why do you think the authors said POSSIBLY instead of giving a definitive statement? Regarding the overhead involved in replication, do you think it would cause major latency issues? If so, what do you think can be done to mitigate these issues?
In Section 3.2, the authors state: “In this model, whenever a flow persists for some time and its bandwidth demand grows beyond a defined limit, we assign it a path using one of the scheduling algorithms described in Section 4.” How long is “some time”? Is there a particular duration or range? Also, what would be a typical defined limit? Does this limit determine which scheduling algorithm should be used?
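On the defined limit: if I recall the paper correctly, a flow is classified as large once it uses about 10% of the host NIC's bandwidth (100 Mbps on a gigabit link). Treat the exact numbers in this sketch as illustrative rather than authoritative:

```python
LINK_CAPACITY_BPS = 1e9     # 1 GbE host NIC (assumed)
THRESHOLD_FRACTION = 0.10   # illustrative: 10% of NIC bandwidth

def is_large_flow(rate_bps):
    """A flow becomes a candidate for scheduling once its measured
    rate crosses the threshold; smaller 'mice' stay on ECMP."""
    return rate_bps >= THRESHOLD_FRACTION * LINK_CAPACITY_BPS
```

The persistence question (“how long is some time”) would then reduce to how often the scheduler polls flow counters. As far as I can tell, the threshold only selects which flows get scheduled at all; it does not itself choose between Global First Fit and Simulated Annealing.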
The authors say that flow entries expire after a timeout once the flow terminates. Why do you think the flow entries have to be maintained for some time even after the flow has terminated?
It is said that Hedera uses PortLand routing to achieve fault tolerance. What is the PortLand routing mechanism, and how can it be used for fault tolerance?
The paper speaks about flow scheduling, but it does not assign any priority to flows. Would it not be a good idea to add flow-based priority?
Why do you think the authors chose to design a centralized scheduler rather than a distributed one? What are the major trade-offs between the two?