Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 8
Pig and Data Transformation
Spring 2011

group member 1: ____________________________ login: ______________

group member 2: ____________________________ login: ______________

group member 3: ____________________________ login: ______________

group member 4: ____________________________ login: ______________

group member 5: ____________________________ login: ______________

In class we have studied the Pig language for Map/Reduce and its basic transformations. Let's explore the latter in more detail. Suppose that inside Pig, we have the following two relations, x and y:
 
grunt> DUMP x; 
... Success!!
(George,Bear)
(Frank,Dog)
(George,Bear)
(Bill,Cat)
(Amy,Bear)
grunt> DESCRIBE x;
x: {name: chararray, species: chararray}
 
grunt> DUMP y; 
... Success!!
(Bear,Hugs)
(Dog,Barks)
(Dog,Growls)
(Cat,Purrs)
(Bear,Growls)
grunt> DESCRIBE y;
y: {species: chararray, action: chararray}

  1. What is printed by the following scripts?
    1. z = FOREACH x GENERATE name,$0; 
      DUMP z; 
      

      Answer:
       
      (George,George)
      (Frank,Frank)
      (George,George)
      (Bill,Bill)
      (Amy,Amy)
      
    2. z = FILTER y BY action=='Growls'; 
      DUMP z; 
      

      Answer:
       
      (Dog,Growls)
      (Bear,Growls)
      
    3. z = GROUP y BY species; 
      DUMP z; 
      

      Answer (the order of the groups may vary, since Pig relations are unordered):
       
      (Bear,{(Bear,Hugs),(Bear,Growls)})
      (Dog,{(Dog,Barks),(Dog,Growls)})
      (Cat,{(Cat,Purrs)})
      
    4. z = JOIN x BY species,y BY species; 
      DUMP z; 
      

      Answer (the order of the rows may vary, since Pig relations are unordered):
       
      (George,Bear,Bear,Hugs)
      (George,Bear,Bear,Growls)
      (Frank,Dog,Dog,Barks)
      (Frank,Dog,Dog,Growls)
      (George,Bear,Bear,Hugs)
      (George,Bear,Bear,Growls)
      (Bill,Cat,Cat,Purrs)
      (Amy,Bear,Bear,Hugs)
      (Amy,Bear,Bear,Growls)
      
    5. z = JOIN x BY species,y BY species; 
      w = FOREACH z GENERATE $0,$3; 
      DUMP w; 
      

      Answer:
       
      (George,Hugs)
      (George,Growls)
      (Frank,Barks)
      (Frank,Growls)
      (George,Hugs)
      (George,Growls)
      (Bill,Purrs)
      (Amy,Hugs)
      (Amy,Growls)
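
The GROUP and JOIN answers above can be checked with a small model outside Pig. The following is a minimal Python sketch, not Pig itself: the helper names `group_by` and `inner_join` are hypothetical, relations are modeled as plain lists of tuples, and row order follows the input lists, which Pig does not promise.

```python
from collections import defaultdict

# Relations x and y from the exercise, modeled as lists of tuples.
x = [("George", "Bear"), ("Frank", "Dog"), ("George", "Bear"),
     ("Bill", "Cat"), ("Amy", "Bear")]
y = [("Bear", "Hugs"), ("Dog", "Barks"), ("Dog", "Growls"),
     ("Cat", "Purrs"), ("Bear", "Growls")]

def group_by(rel, key_index):
    """Model of GROUP rel BY field: one (key, bag) pair per distinct key."""
    bags = defaultdict(list)
    for row in rel:
        bags[row[key_index]].append(row)
    return list(bags.items())

def inner_join(left, left_key, right, right_key):
    """Model of JOIN left BY k, right BY k: each matching pair, concatenated."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    # Every left row is paired with every right row that shares its key.
    return [l + r for l in left for r in index[l[left_key]]]

grouped = group_by(y, 0)            # like: z = GROUP y BY species;
joined = inner_join(x, 1, y, 0)     # like: z = JOIN x BY species, y BY species;
projected = [(row[0], row[3]) for row in joined]  # like: FOREACH z GENERATE $0,$3;

for row in projected:
    print(row)
```

The join yields 9 rows because each key contributes (rows in x) times (rows in y): Bear gives 3 x 2, Dog gives 1 x 2, and Cat gives 1 x 1. The duplicate (George,Bear) row in x is why every George result appears twice.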
      
  2. (Advanced) Why is execution delayed as long as possible in Pig?
    Answer: Pig can optimize a whole query far more effectively than a series of partial queries, so delaying execution gives it more flexibility in making the query run as fast as possible. Also, in the case of an error, correcting one statement will sometimes repair the whole query, even though the correction is entered after the statements that depend on it.
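
The idea of delayed execution can be illustrated with a toy model. The sketch below is written in Python under loud assumptions: the `LazyRelation` class and its methods are hypothetical and are not how Pig is implemented. It only shows the general pattern: each statement records a step in a plan, and no data is touched until a `dump` is requested, at which point the whole plan is known and can be run in a single pass.

```python
class LazyRelation:
    """Hypothetical stand-in for a Pig alias: records a plan, runs nothing."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def filter(self, predicate):
        # Like FILTER ... BY: only appends a step to the plan.
        return LazyRelation(self.source, self.steps + (("filter", predicate),))

    def foreach(self, fn):
        # Like FOREACH ... GENERATE: only appends a step to the plan.
        return LazyRelation(self.source, self.steps + (("foreach", fn),))

    def dump(self):
        # Like DUMP: the full plan is visible here, so all steps
        # are applied to each row in one pass over the data.
        out = []
        for row in self.source:
            keep = True
            for kind, fn in self.steps:
                if kind == "filter":
                    if not fn(row):
                        keep = False
                        break
                else:
                    row = fn(row)
            if keep:
                out.append(row)
        return out

y = [("Bear", "Hugs"), ("Dog", "Barks"), ("Dog", "Growls"),
     ("Cat", "Purrs"), ("Bear", "Growls")]

z = LazyRelation(y).filter(lambda r: r[1] == "Growls")  # nothing runs yet
w = z.foreach(lambda r: (r[0],))                        # still nothing runs
result = w.dump()                                       # execution happens here
print(result)
```

Because `z` and `w` are only plans, an optimizer that sees them together at `dump` time is free to reorder or merge steps, which is not possible if each statement runs as soon as it is entered.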