A first course for those studying toward the Master of Science in Data Science and related degrees, including engineering Master of Science degrees with an emphasis upon data analysis.

Teaching Staff

Professor: J Singh, Jitendra.Singh@tufts.edu, Phone: (617) 444-9640
TA: Hongjie Wang, hongjie.wang@tufts.edu

Class Meetings

Lectures: Tue, Thu 10:30 am – 11:45 am, held synchronously over Zoom. You will need to sign in using your Tufts credentials.

Office Hours:
J: Wed 9:30 am – 10:30 am, Virtual Halligan, Room 44.
Hongjie: Mon 9:00 am – 6:00 pm, Tisch Library.

J's Office: Virtual Halligan, Room 44

Zoom recording policy

To accommodate students who are unable to attend class for any reason, the lectures and class discussions will be recorded.

By participating, you are consenting to the recording. If you have objections to being recorded, please contact me before class.

Prerequisites

COMP-205 is intended as a first course for those studying toward the Master of Science in Data Science and related degrees, including engineering Master of Science degrees with an emphasis upon data analysis.

The target audience for COMP-205 is MS students who do not have much Computer Science (CS) background, typically first-year Data Science students or non-CS grad students. If you were a CS undergrad or are a CS grad student, this course will probably not teach you much that you don't already know. If in doubt, please speak with the instructor. The formal prerequisites for the course are:

  • Two semesters of college mathematics (e.g., Calculus 1 and 2).
  • Ability to write a simple program in some high-level language will help considerably (e.g., C, C++, Java, Python, Basic, etc).

Computing Environment

We will be using Google Colab for computer resources in DS-205. Before the first class, please visit the Colab link. Then, perform these steps:

  1. File → Save a copy in Drive
  2. Runtime → Run all
  3. File → Rename the file to "My Welcome to Colaboratory"
  4. Close the tab.
  5. Visit your Google Drive, locate your "My Welcome to Colaboratory" file and double-click — the file should reopen!

Please visit gradescope.com to confirm that you have completed these steps. If you have any trouble, please reach out to your colleagues or get in touch with me right away!

Motivation: Literate Computing

Data Science is not about data! Well, it is 🙂 but Data Science is really about using data to persuade!

Doing data analysis is the easy part of Data Science. The hard part is showing others how you got there and persuading them to take action. From Economics to Science in general to Computational Biology, this is a pervasive and multi-disciplinary trend!

The perspective this course takes on data analysis is that data analysis is a social activity, and that human communication is as important as analysis skills. Thus, we emphasize not just analysis, but also communicating results to others in a manner that allows reproduction of results and critical analysis.

One of the most powerful techniques for data analysis is literate programming, in which an interpretation of program results is presented alongside the program itself. Another researcher can then view both your analysis methods and your conclusions and evaluate whether they make sense together.

Rather than organizing code according to requirements that privilege the computer’s execution of the code, literate programming treats a program as literature understandable to human beings, prioritizing the programmer’s own thought process. Literate programming as designed by Donald Knuth takes the form of written prose, with computer-actionable code embedded in macros (an abbreviated format for writing code). Literate programming tools are used to generate two outputs from the literate program: “tangled” code that can be executed by the computer, and “woven” formatted documentation. [Source]
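
To make the idea concrete, here is a minimal sketch of the notebook flavor of literate programming (not Knuth's original tangle/weave tooling). The comment blocks stand in for the Markdown cells a notebook would use, and the scores are made-up numbers chosen purely for illustration.

```python
# --- Narrative (a Markdown cell in a notebook) ---
# Question: did average quiz scores improve from week 1 to week 2?
# The scores below are invented for illustration only.

from statistics import mean

week1_scores = [6.0, 7.5, 8.0, 5.5]
week2_scores = [7.0, 8.5, 8.0, 6.5]

# --- Analysis (a code cell) ---
# Compare the two weekly means.
improvement = mean(week2_scores) - mean(week1_scores)
print(f"Mean improvement: {improvement:.2f} points")

# --- Interpretation (a Markdown cell) ---
# The mean rose by 0.75 points. With only four scores per week this is
# weak evidence, so a reader can see both the method and its limits.
```

The point of the layout is that question, computation, and interpretation sit side by side, so a reader can check that the conclusion actually follows from the code.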

There is increased emphasis on not only publishing interesting results but also publishing exactly how those results were obtained. Jupyter notebooks permit researchers to publish not only the raw data they obtained in their research but also the calculations that led to their conclusions, making it possible for other researchers to replicate the analysis and try alternative analyses. Jupyter, whose name is a compound of Julia, Python and R, serves the need for repeatable data science analyses.

Using Jupyter Notebooks and Python 3, this course concentrates on the programming tasks often required for data collection & transformation, with examples of analysis & modeling followed by interpretation & decision-making.

Reference Textbooks and Resources

There is no prescribed textbook for COMP-205, but the internet is rich with resources for Data Science using Python. In particular, Python Data Science Handbook by Jake VanderPlas is an excellent reference.

Seeking Help

Please keep the following in mind when approaching the instructor for help.

  1. First, please use the discussion boards on Piazza, which are monitored daily. Other students will likely benefit from the Q & A, and some may answer your questions even faster!
  2. To get in touch with a TA or the instructor, please send them a message via Piazza. As a second choice, send an email to the address provided above.
  3. To get in touch with the instructor about a matter unrelated to course content, please email the instructor. Please reserve email for confidential matters, not general class discussion.
  4. If you receive no response from the above within 24 hrs, or in case of an emergency, please call the instructor at (617) 444-9640.


Grades for the Course

The grades will be allocated as follows:

Item | % score
Class Participation | 7 %
  • Thoughtful (and helpful) questions/comments in class and on Canvas
  • Willingness to help peers when they are stuck (without doing the work for them)
  • Participation in office hours
Weekly Online exercises | 53 %
Take Home quizzes (points allocated as shown in the week-by-week plan below) | 40 %
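
As a quick sanity check on the arithmetic, the sketch below combines the three weights into a course score. It assumes each component is expressed as a percentage from 0 to 100 and that components combine linearly; the dictionary keys, the function name, and the sample student are hypothetical. The quiz points are the ones listed in the week-by-week plan.

```python
# Component weights from the grading table, in percent of the course grade.
weights = {"participation": 7, "weekly_exercises": 53, "take_home_quizzes": 40}

# Quiz points as listed in the week-by-week schedule.
quiz_points = {"Quiz 1": 7, "Quiz 2": 8, "Quiz 3": 10, "Quiz 4": 10, "Case Study": 5}

# The weights should cover the whole grade, and the quiz points
# should add up to the take-home quiz weight.
assert sum(weights.values()) == 100
assert sum(quiz_points.values()) == weights["take_home_quizzes"]


def course_score(component_percent):
    """Weighted course score, assuming each component is a 0-100 percentage."""
    return sum(weights[name] * component_percent[name] / 100 for name in weights)


# Hypothetical student: full participation, 90% on exercises, 80% on quizzes.
example = {"participation": 100, "weekly_exercises": 90, "take_home_quizzes": 80}
print(f"{course_score(example):.1f}")  # prints 86.7
```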

Week-by-week Schedule

Mtg | Date (Tu/Th) | Topic | Notebook | Quiz
1 | 9/9 | Introduction to Literate Programming | |
2 | 9/14 | Jupyter: Controls, REPL, Markdown; Jupyter: Here be Dragons | 01-02, 01-03 |
3 | 9/16 | Python: What does this code do?; Data Types: Numbers, Strings, Iterables; Iterables: Mutable, Duplicated, Ordered | 01-04, 01-05 |
4 | 9/21 | Control Flow: if/then, try/except, iteration, comprehensions | 01-06 |
5 | 9/23 | Functions & Classes | 01-07 | Quiz 1 due 9/28, 7 %
6 | 9/28 | Imports, Libraries & Modules | 02-12 |
7 | 9/30 | Functions & Encapsulation | 02-13 |
8 | 10/5 | Datetime Modules & Classes | 02-14 |
9 | 10/7 | Collection Classes | 02-15 | Quiz 2 due 10/12, 8 %
10 | 10/12 | Vector Data Operations: Numpy | 03-01 |
11 | 10/14 | Multi-Dimensional Arrays: Numpy | 03-02 |
12 | 10/19 | Array Data Ingest: Numpy | 03-03 |
13 | 10/21 | CSV Data Abstractions: Pandas | 03-04 |
14 | 10/26 | Data Indexing: Pandas | 03-05 |
15 | 10/28 | Data Merging and Joining: Pandas | 03-06 |
16 | 11/2 | Data Aggregation and Grouping: Pandas | 03-07 | Quiz 3 due 11/9, 10 %
17 | 11/4 | Scientific Plotting: Matplotlib | 04-01 |
18 | 11/9 | Statistical Plotting: Seaborn | 04-02 |
 | 11/11 | Veterans Day | |
19 | 11/16 | Other Plotting: Plotly, Bokeh, GMap | 04-03 |
20 | 11/18 | Statistical Analysis of Experiments | 05-01 |
21 | 11/23 | Organizing Jupyter Notebooks | 05-02 |
 | 11/25 | Thanksgiving Holiday | |
22 | 11/30 | Classification | 05-03 |
23 | 12/2 | Clustering | 05-04 | Quiz 4 due 12/7, 10 %
24 | 12/7 | Case Study 1: World Happiness Report | 06-01 |
25 | 12/9 | Case Study 2: Monte Carlo Methods | 06-02 |
26 | 12/14 | Wrap Up (last lecture) | | Case Study due 12/21, 5 %





About the Instructor

I have worked in Cloud Computing, Big Data and Python since 2008. Python has been my programming language of choice ever since!

I received my Ph.D. in Electrical Engineering, working on solving large-scale matrix problems in Electromagnetics. I was initially on the EE faculty. I then spent a major part of my career in industry, mostly in Systems Architect roles, first in Computer-Aided Design and later in Finance. Throughout my career, I have stayed close to data and databases as my area of focus.

Please call me Jitendra or J or Prof. J, whichever you prefer. (No period after the J)


Policies

Late Work Policy

For weekly online exercises, please plan to submit your work early and often! The last version submitted by the due date will be graded without penalty; submissions after the due date will be evaluated as follows:

  • Up to 24 hrs late, 5% penalty
  • 24-60 hrs late, 15% penalty
  • More than 60 hrs late, no credit.
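
The tiers above can be sketched as a small Python function. This is an illustration under stated assumptions, not the official grading code: it assumes the penalty is applied as a percentage of the raw score and that a submission exactly 24 or 60 hours late falls in the lower tier. The function names are hypothetical.

```python
def late_penalty(hours_late):
    """Fraction of credit deducted for a submission `hours_late` hours past the deadline."""
    if hours_late <= 0:
        return 0.0   # on time: no penalty
    if hours_late <= 24:
        return 0.05  # up to 24 hrs late (assumed to include exactly 24)
    if hours_late <= 60:
        return 0.15  # 24-60 hrs late (assumed to include exactly 60)
    return 1.0       # more than 60 hrs late: no credit


def adjusted_score(raw_score, hours_late):
    """Score after the penalty, assuming it scales the raw score."""
    return raw_score * (1.0 - late_penalty(hours_late))


# Example: a raw score of 80 submitted 30 hours late.
print(f"{adjusted_score(80, 30):.1f}")  # prints 68.0
```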

For take-home quizzes, due dates and times will be strictly enforced.

Academic Integrity

You are expected to be familiar with the Student Guide to Academic Integrity at Tufts (available here).

Academic Accommodations

If you need course adaptations or accommodations because of a disability, or if you have medical information to share with us that may impact your performance or participation in this course, please make an appointment with us as soon as possible.  

If you have approved accommodations, please request your accommodation letters online through the Office of Disability Services student portal. Students with disabilities who need accommodations for this course and have not yet contacted the Office of Disability Services are encouraged to do so as soon as possible, so that accommodations can be implemented in a timely fashion.