The following are sketches of the important ingredients of a correct answer. Many answers are acceptable.
Answer: The key to the problem is what the mapper does and what the reducer does. The default input to the first mapper is a numeric offset and a document line. We need to transform that into pairs < document, position > where "document" is a document name and "position" is where the word occurs. Any pseudo-code that does this is acceptable.
For the reducer, one outputs the minimum "position" for each document. as a pair < document, position >. Since "minimum" is associative, one can use this reducer as a combiner.
But at this point, one isn't done. What if there are lots of these? One still needs to pick the one that is minimum. For this, we map < document, position > to < 1, < document, position > > and then, in a second reducer, pick the one for which the position is minimum.
Any pseudo-code that does this is acceptable. It is much less efficient to reduce < 1, < document, position > > in the first step.
Answer: The key is that in AppEngine, Map/Reduce operations are hidden inside data queries. To exploit this, one must input data in a form that implies appropriate Map/Reduce operations during queries. .
First, one must input the "documents" from Hadoop into a datastore that has the same structure, i.e., the AppEngine objects should have subparts such as < document, lineNumber, word >. One queries these for the objects whose words match the word in question, which is in essence a (hidden) Map operation. One can then iterate through the results and pick the minimum. Since one does not have "explicit" access to "reduce", this is slower.
However, there is another, much more clever option. One can set
the ordering for a query and the number of results to be
query.setRange(1,1) limits the query to
return the very first result in order, while using
query.setOrdering("lineNumber") orders the results by
line number. Using the two together is, in essence, a
The key to this problem is that a "negotiation" is a pair of repeated
loops, where the loops communicate with each other without deadlock.
In this diagram, which is a single negotiation, note that the decisions are done in parallel based upon communication, and that one does not branch before sharing state. Many people branched before sharing state, which causes the process to deadlock.
Answer: Cost of downtime = revenue lost + work lost.