Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 10b (unassigned)
Map and Reduce
Spring 2011
group member 1: ____________________________ login: ______________
group member 2: ____________________________ login: ______________
group member 3: ____________________________ login: ______________
group member 4: ____________________________ login: ______________
group member 5: ____________________________ login: ______________
In class we have studied how to convert familiar algorithms
to the MapReduce framework. Let's explore that in more detail.
Why can the reducer and combiner be the same subroutine in WordCount?
Suppose that rather than counting words, you want to compute the
range of frequencies of words, i.e., the lowest and highest word
counts over all documents in a set. Describe the mapper input and output
necessary to solve this problem. Feel free to use new classes.
Suppose that after mapping, you have four mapper outputs with contents
Mapper 0
key
value
foo
1
albatross
10
Mapper 1
key
value
albatross
7
foo
3
Mapper 2
key
value
bane
5
joe
1
Mapper 3
key
value
bar
5
albatross
7
Suppose that the partioning hash function maps words beginning
with "a"-"g" to 0, "h"-"n" to 1, "o"-"u" to 2, and "v"-"z" to 3.
Simulate the result of each step of the butterfly sort and describe
how many communication steps are necessary to completely sort the data
above to appropriate workers.
(Advanced) Suppose that we represent a table of distances between
cities in MapReduce as city1 -> < distance, city2 > where
city1 and city2 are cities and distance is the shortest distance between them.
Give a MapReduce algorithm for determining the shortest distance between
two cities X and Y, using intermediate cities when necessary. What does the
Map step do, and what does the Reduce step do?