Loop-aware scheduling in HaLoop improves the efficiency of iterative algorithms by placing map and reduce tasks that access the same data on the same physical machines across iterations. The master maintains a mapping of data partitions to slave nodes and caches reducer inputs, reducer outputs, and mapper inputs so that later iterations can reuse prior work and minimize data shuffling. The number of reduce tasks is kept constant across iterations so that the hash function routing mapper outputs to reducers stays consistent.
8. Loop-aware Scheduling
The goal is to place on the same physical machines those map and reduce tasks that occur in different iterations but access the same data.
9. Scheduling Algorithm
The number of reduce tasks should be invariant across iterations, so that the hash function assigning mapper outputs to reducer nodes remains unchanged. The master node maintains a mapping from each slave node to the data partitions that this node processed in the previous iteration; when scheduling tasks for the next iteration, it consults this mapping to send each task back to the node that already holds its cached data (see the sketch below).
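To make the scheduling idea concrete, here is a minimal Java sketch, not HaLoop's actual code: the class and method names are hypothetical, and the mapping is stored here as partition-to-node for direct lookup (the notes describe the inverse, node-to-partitions, view of the same information). The master records which node processed each partition last iteration and prefers that node when the same partition comes up again:

```java
import java.util.*;

/** Hypothetical sketch of HaLoop-style loop-aware task assignment. */
public class LoopAwareScheduler {
    // partition id -> slave node that processed it in the previous iteration
    private final Map<Integer, String> previousAssignment = new HashMap<>();

    /**
     * Prefer the node that processed this partition last iteration, so its
     * cached data can be reused; otherwise fall back to any available node.
     */
    public String assign(int partitionId, Set<String> availableNodes) {
        String preferred = previousAssignment.get(partitionId);
        String chosen = (preferred != null && availableNodes.contains(preferred))
                ? preferred
                : availableNodes.iterator().next(); // naive fallback; a real scheduler also balances load
        previousAssignment.put(partitionId, chosen); // remember for the next iteration
        return chosen;
    }

    public static void main(String[] args) {
        LoopAwareScheduler sched = new LoopAwareScheduler();
        Set<String> nodes = new LinkedHashSet<>(Arrays.asList("slave1", "slave2", "slave3"));
        // Iteration 1: partition 0 is assigned for the first time.
        System.out.println("iter1, p0 -> " + sched.assign(0, nodes));
        // Iteration 2: partition 0 returns to the same node, enabling cache reuse.
        System.out.println("iter2, p0 -> " + sched.assign(0, nodes));
    }
}
```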
10. Caches

Reducer Input Cache
- The same key must be hashed to the same reducer in every iteration.
- The partition function f must be deterministic, remain the same across iterations, and take the tuple t as its only input (see the partitioning sketch after this list).
- The number of reducers must remain unchanged.

Reducer Output Cache
- If two Reduce function calls produce the same output key from two different reducer input keys, both reducer input keys must be in the same partition, so that they are sent to the same reduce task.

Mapper Input Cache
- Caches a mapper's input splits on local disk so that later iterations can read them locally rather than re-fetching them over the network.
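The determinism requirement on the partition function can be illustrated with a Hadoop-style hash partitioner; this is a sketch assuming the usual hashCode-modulo scheme, not necessarily HaLoop's exact function. As long as the function and the number of reducers stay fixed, a given key lands on the same reducer in every iteration, which is what makes the reducer input cache reusable:

```java
/** Sketch of a Hadoop-style hash partition function. */
public class PartitionDemo {
    /**
     * Deterministic: depends only on the key and the (fixed) reducer count,
     * so the same key maps to the same reducer in every iteration.
     */
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReducers = 4; // must stay invariant across iterations
        for (int iteration = 1; iteration <= 3; iteration++) {
            // "nodeA" goes to the same reducer every time, so the reducer
            // input cached there in iteration 1 can be reused later.
            System.out.printf("iteration %d: nodeA -> reducer %d%n",
                    iteration, getPartition("nodeA", numReducers));
        }
    }
}
```

If the number of reducers changed between iterations, the modulus would change and keys would be rerouted to different reducers, invalidating every cached reducer input; this is why the notes insist the reducer count stay fixed.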