Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 8
Pig and Data Transformation
Spring 2011

group member 1: ____________________________ login: ______________

group member 2: ____________________________ login: ______________

group member 3: ____________________________ login: ______________

group member 4: ____________________________ login: ______________

group member 5: ____________________________ login: ______________

In class we have studied the Pig language for Map/Reduce and its basic transformations. Let's explore the latter in more detail. Suppose that inside Pig we have:
grunt> DUMP x; 
... Success!!
grunt> DESCRIBE x;
x: {name: chararray, species: chararray}
grunt> DUMP y; 
... Success!!
grunt> DESCRIBE y;
y: {species: chararray, action: chararray}
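
If you want to reproduce this setup yourself, relations with these two schemas could be created roughly as follows. This is only a sketch: the file names animals.txt and actions.txt and the tab-delimited layout are assumptions made for illustration, not part of the exercise.

```pig
-- Hypothetical input files; the names and the tab delimiter are assumptions.
x = LOAD 'animals.txt' USING PigStorage('\t') AS (name:chararray, species:chararray);
y = LOAD 'actions.txt' USING PigStorage('\t') AS (species:chararray, action:chararray);

-- Print the schema that each relation was declared with.
DESCRIBE x;
DESCRIBE y;
```

Declaring the field names in the AS clause is what allows the scripts in the questions to refer to fields by name (name, species, action) as well as by position ($0, $1, ...).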

  1. What is printed by the following scripts?
    1. z = FOREACH x GENERATE name,$0; 
      DUMP z; 

    2. z = FILTER y BY action=='Growls'; 
      DUMP z; 

    3. z = GROUP y BY species; 
      DUMP z; 

    4. z = JOIN x BY species,y BY species; 
      DUMP z; 

    5. z = JOIN x BY species,y BY species; 
      w = FOREACH z GENERATE $0,$3; 
      DUMP w; 

  2. (Advanced) Why is execution delayed as long as possible in Pig?