Welcome to the COMP150DW Web Page

Spring 2004

CONTACT || ANNOUNCEMENTS || LECTURES || HOMEWORK || LINKS


Announcements

o Assignments 1-3 grades are now available via 'progress'
o The command psql -h pgsql may not be working. To access pgsql type (from sun or moon): ssh pgsql
When prompted, enter your password, and when you get the command prompt, type: psql
o There is a mailing list for the class. If you have not received messages from it, please email majordomo@cs.tufts.edu with the command 'subscribe comp150dw' in the body of the message.

Contact Information

Teacher: Professor Daniel Hebert - dhebert@mitre.org

TA: Kate Butler - kbutlr@cs.tufts.edu
Office Hours: Monday 3-4pm, Wednesday 4-5pm (Halligan 118)

Homework
HW1 || HW2 || HW3 || PROJECT || GENERAL INFO

Project :: DUE MAY 13
o Informational document and zip file located here
o Submit with 'provide comp150dw project'
o Note: All data must be entered into the tables. However, the program need not use it all.
o Please use 'provide' for as much of your information as possible: This includes .doc or .ppt files. If you turn something into my mailbox it MUST be there on Friday morning when I come to pick it up -- the projects will be graded that weekend and I won't be able to get at the office after Friday

Deliverables:
o Your Data Warehouse schema
o ALL data in your tables
o The program code
o A readme file with instructions on how to compile, run, etc., plus any information you feel might be helpful in addition.



Homework 3 :: DUE APRIL 22 BEFORE CLASS

o This homework assignment will utilize the data warehouse you previously built incorporating the hurricane data
o Implement an automated Attribute-Oriented Induction capability on your hurricane data
o Input needed :

  • Your relation tables for the hurricane data
  • A DMQuery for characterization
  • A list of attributes
  • A set of concept hierarchies or generalization operators
  • A generalized relation threshold
  • Attribute generalization thresholds for each attribute
o Transform the DMQL statement to a relational query (can do this by hand)
o Use this relational query in your program to retrieve data and then perform the AOI (see algorithm on pg. 188 of book)
o Visualize the results of the AOI via a generalized relation

Deliverables:
o The DMQL statement
o The program code
o A readme file with instructions on how to compile, run, etc., plus any information you feel might be helpful in addition.

Hints:
o This assignment requires just one query.
o After performing the AOI the result can be output in a table. You don't need to output the results from the original query -- just the results after the AOI.
o Even if your HW2 tables had some problems, you probably don't need to redo them to finish this assignment. The data won't be regraded.
o Input does not mean literal input to your program. It just means things you need to be aware of before you jump in to writing it.
o Interactivity is not a requirement.


Homework 2 :: DUE MARCH 18 BEFORE CLASS

o Design a data warehouse based on the hurricane data provided (three years of data). Two data sources with slightly different formats and data
o Utilize a star or snowflake schema
o Implement the data warehouse utilizing relational tables in postgres
o Integrate the data from the two separate data sources and import into your data warehouse
o Keep the data warehouse as you will utilize it in your next homework assignment

Hints
o There is no one right answer to this assignment.

Deliverables
o The schema for the data warehouse.
o The tables themselves (in PostGres).


Homework 1 :: DUE MARCH 11 BEFORE CLASS

o Implement a Data Warehouse using a star schema for the sales example in Chapter 2 page 49
o Import the following data into your database. SalesFact.txt, BranchDimension.txt, ItemDimension.txt, LocationDimension.txt, TimeDimension.txt (data files)
o Use “\COPY” command in postgresql to copy data
o Implement a program that allows you to do the following :

  • roll-up on total sales by year and by year/quarter
  • roll-up on total sales by country, by state, and by city
  • roll-up on total sales by item brand and by item type (digital or analog)
  • drill down on total sales by month and by day
  • drill down on total sales by item name
  • drill down on total sales by street address
o Have a simple user interface to allow activation of each of the functions
o Use SQL statements to implement each roll-up, drill-down

Hints
o Example - Run drill down on total sales by street address in Query Analyzer: select sum(s.dollarssold) as totalsales, l.street from SalesFactTable s, LocationDimensionTable l where l.locationkey = s.locationkey group by l.street
o You can run psql from any of the unix machines with: psql -h pgsql
o \?, \h and \h [command] in psql will provide information
o Here's an example of a c program which interfaces with psql: test1.c. To compile it, log into pgsql and use gcc -lpq.
o The syntax of the copy command is: \copy [table name] from '[file name]' using delimiters '[delimiter]'
o If you're getting errors running PostGres from a remote host, try logging in to pgsql directly.


How to turn in your assignments

To turn in your postgres assignments type the following:
provide comp150dw hwX filenames
where filenames are the names of the files you want to submit, and X is the number of the assignment.
Start with X = 1
You must always submit all your code and executables.

How to check your grades on assignments

To check your grades for your assignments, type:
progress comp150dw

Remember
To get credit for your assignments, you need to grant the TA permission to access your tables. This can be done with the following psql command:
'grant all on table to kbutlr;'
where table is a comma seperated list of your tables' names

Lecture Slides

Lecture slides are in MS PowerPoint format.

Lectures

PostgreSQL Stuff

You can find the Web page for PostgreSQL here.