Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 8
Pig and Data Transformation
Spring 2011

group member 1: ____________________________ login: ______________

group member 2: ____________________________ login: ______________

group member 3: ____________________________ login: ______________

group member 4: ____________________________ login: ______________

group member 5: ____________________________ login: ______________

In class we studied the Pig language for Map/Reduce and its basic data transformations. Let's explore those transformations in more detail. Suppose that, inside the Pig shell (grunt), we have the following two relations:
 
grunt> DUMP x; 
... Success!!
(George,Bear)
(Frank,Dog)
(George,Bear)
(Bill,Cat)
(Amy,Bear)
grunt> DESCRIBE x;
x: {name: chararray, species: chararray}
 
grunt> DUMP y; 
... Success!!
(Bear,Hugs)
(Dog,Barks)
(Dog,Growls)
(Cat,Purrs)
(Bear,Growls)
grunt> DESCRIBE y;
y: {species: chararray, action: chararray}
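After working the exercises by hand, you may want a way to check them outside Pig. The following is a minimal sketch in plain Python, not Pig itself: each relation is assumed to be a list of tuples, fields are addressed by position (like `$0`, `$1`), and the helper names `foreach_generate`, `filter_by`, `group_by`, and `join_by` are hypothetical. Pig does not guarantee output order, so compare contents rather than ordering.

```python
from collections import defaultdict

# Relations modeled as lists of tuples (a rough stand-in for a Pig bag).
# Field positions follow the DESCRIBE schemas: x = (name, species), y = (species, action).
x = [("George", "Bear"), ("Frank", "Dog"), ("George", "Bear"),
     ("Bill", "Cat"), ("Amy", "Bear")]
y = [("Bear", "Hugs"), ("Dog", "Barks"), ("Dog", "Growls"),
     ("Cat", "Purrs"), ("Bear", "Growls")]


def foreach_generate(rel, *positions):
    """Hypothetical analogue of FOREACH rel GENERATE $i, $j, ...: project by position."""
    return [tuple(t[p] for p in positions) for t in rel]


def filter_by(rel, pred):
    """Hypothetical analogue of FILTER rel BY <condition>: keep tuples where pred holds."""
    return [t for t in rel if pred(t)]


def group_by(rel, key_pos):
    """Hypothetical analogue of GROUP rel BY $k: one (group, bag) pair per distinct key.
    Each bag holds the complete original tuples. Keys are sorted only to make the
    result deterministic; Pig itself promises no order."""
    groups = defaultdict(list)
    for t in rel:
        groups[t[key_pos]].append(t)
    return [(k, groups[k]) for k in sorted(groups)]


def join_by(left, lkey, right, rkey):
    """Hypothetical analogue of an inner JOIN left BY $lkey, right BY $rkey.
    Each output tuple is a left tuple concatenated with a matching right tuple,
    so duplicate keys on either side multiply the output."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)
    return [l + r for l in left for r in index.get(l[lkey], [])]


# Mirrors the five scripts in question 1 (name is field $0 of x).
print(foreach_generate(x, 0, 0))
print(filter_by(y, lambda t: t[1] == "Growls"))
print(group_by(y, 0))
z = join_by(x, 1, y, 0)
print(z)
print(foreach_generate(z, 0, 3))
```

The join here builds an in-memory hash index on the right relation, which is just one simple way to compute an equi-join; Pig plans and runs its own join strategy as Map/Reduce work.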

  1. What is printed by the following scripts?
    1. z = FOREACH x GENERATE name,$0; 
      DUMP z; 
      





    2. z = FILTER y BY action=='Growls'; 
      DUMP z; 
      





    3. z = GROUP y BY species; 
      DUMP z; 
      





    4. z = JOIN x BY species,y BY species; 
      DUMP z; 
      





    5. z = JOIN x BY species,y BY species; 
      w = FOREACH z GENERATE $0,$3; 
      DUMP w; 
      




  2. (Advanced) Why is execution delayed as long as possible in Pig?
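In Pig, nothing is executed until a DUMP or STORE actually requires a result. One way to think about why: deferring lets the system see the whole data flow first, so it can plan it as a unit, for example by pipelining several operators into a single pass and never computing relations that no DUMP or STORE uses. The toy model below is hypothetical and illustrates only that idea; the `LazyRelation` class and its methods are assumptions, not Pig's API.

```python
# Hypothetical toy model of deferred (lazy) evaluation; this is not Pig's API.
class LazyRelation:
    """Each operator only records a step in a plan; dump() runs the plan."""

    def __init__(self, source, steps=()):
        self.source = source        # input tuples, never modified
        self.steps = tuple(steps)   # recorded ("filter" | "foreach", fn) pairs

    def filter(self, pred):
        # Record the step and return a new relation; no tuples are touched yet.
        return LazyRelation(self.source, self.steps + (("filter", pred),))

    def foreach(self, fn):
        return LazyRelation(self.source, self.steps + (("foreach", fn),))

    def dump(self):
        # Execute the whole plan in one pass over the input (operator fusion),
        # instead of materializing an intermediate list after every step.
        out = []
        for t in self.source:
            for kind, fn in self.steps:
                if kind == "filter":
                    if not fn(t):
                        break       # tuple rejected; skip its remaining steps
                else:
                    t = fn(t)
            else:
                out.append(t)       # reached only if no filter rejected t
        return out


calls = {"n": 0}                    # counts how often the predicate really runs


def growls(t):
    calls["n"] += 1
    return t[1] == "Growls"


y = [("Bear", "Hugs"), ("Dog", "Barks"), ("Dog", "Growls"),
     ("Cat", "Purrs"), ("Bear", "Growls")]

z = LazyRelation(y).filter(growls).foreach(lambda t: (t[0],))
unused = LazyRelation(y).filter(growls)   # never dumped, so never computed
print(calls["n"])                   # still 0: the plan has only been recorded
print(z.dump())                     # now the plan runs, calling growls once per input tuple
print(calls["n"])
```

Because `unused` is never dumped, its filter never runs, and `z` is computed in one sweep rather than one sweep per operator; a real system such as Pig has far more room to optimize once it holds the complete plan.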