A first course for those studying toward the Master of Science in Data Science and related degrees, including engineering Master of Science degrees with an emphasis upon data analysis.
Professor: J Singh, Jitendra.Singh@tufts.edu, Phone: (617) 444-9640
TA: Hongjie Wang, email@example.com
Lectures: Tue, Thu 10:30 am to 11:45 am. Synchronously over Zoom. You will need to sign in using your Tufts credentials.
J: Wed 9:30 am to 10:30 am. Virtual Halligan, Room 44.
Hongjie: Mon 9:00 am to 6:00 pm. Tisch Library.
J's Office: Virtual Halligan, Room 44
Zoom recording policy
To accommodate students who are unable to attend class for any reason, the lectures and class discussion will be recorded.
By participating, you are consenting to the recording. If you have objections to being recorded, please contact me before class.
COMP-205 is intended as a first course for those studying toward the Master of Science in Data Science and related degrees, including engineering Master of Science degrees with an emphasis upon data analysis.
The target audience for COMP-205 is MS students who do not have much Computer Science (CS) background, typically first-year Data Science students or non-CS grad students. If you were a CS undergrad or are a CS grad student, this course will probably not teach you much you don't already know. If in doubt, please speak with the instructor. The formal prerequisites for the course are:
- Two semesters of college mathematics (e.g., Calculus 1 and 2).
- Ability to write a simple program in some high-level language will help considerably (e.g., C, C++, Java, Python, Basic, etc).
We will be using Google Colab for computing resources in COMP-205. Before the first class, please visit the Colab link. Then, perform these steps:
- File → Save a copy in Drive
- Runtime → Run all
- File → Rename the file to "My Welcome to Colaboratory"
- Close the tab.
- Visit your Google Drive, locate your "My Welcome to Colaboratory" file and double-click it. The file should reopen!
Please visit gradescope.com to confirm that you have completed these steps. If you had any trouble, please reach out to your colleagues or get in touch with me right away!
Motivation: Literate Computing
Data Science is not about data! Well, it is 🙂 but Data Science is really about using data to persuade!
Doing data analysis is the easy part of Data Science. The hard part is showing others how you got there and persuading them to take action. From Economics to Science in general to Computational Biology, this is a pervasive and multi-disciplinary trend!
The perspective this course takes on data analysis is that data analysis is a social activity, and that human communication is as important as analysis skills. Thus, we emphasize not just analysis, but also communicating results to others in a manner that allows reproduction of results and critical analysis.
One of the most powerful techniques for data analysis is literate programming, in which an interpretation of program results is presented alongside the program itself. Another researcher can then view both your analysis methods and your conclusions and evaluate whether they make sense together.
Rather than organizing code according to requirements that privilege the computer’s execution of the code, literate programming treats a program as literature understandable to human beings, prioritizing the programmer’s own thought process. Literate programming as designed by Donald Knuth takes the form of written prose, with computer-actionable code embedded in macros (an abbreviated format for writing code). Literate programming tools are used to generate two outputs from the literate program: “tangled” code that can be executed by the computer, and “woven” formatted documentation. [Source]
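To make the "tangle" and "weave" idea concrete, here is a minimal sketch in Python. It does not use Knuth's WEB syntax or macros. It assumes a made-up mini-format in which prose lines are plain text and code lines are indented by four spaces, and the names `LITERATE_SOURCE`, `tangle` and `weave` are hypothetical, chosen only for this illustration.

```python
# Hypothetical mini-format (an assumption for this sketch, not Knuth's WEB):
# prose lines are plain text, code lines are indented by four spaces.
LITERATE_SOURCE = """\
We start with a list of exam scores.

    scores = [70, 80, 90]

The mean is the sum divided by the count.

    mean = sum(scores) / len(scores)
"""

INDENT = "    "


def tangle(source: str) -> str:
    """Keep only the code lines, producing text the computer can execute."""
    code_lines = [
        line[len(INDENT):]
        for line in source.splitlines()
        if line.startswith(INDENT)
    ]
    return "\n".join(code_lines)


def weave(source: str) -> str:
    """Keep prose and code together, marking each code line with '| '."""
    woven = []
    for line in source.splitlines():
        if line.startswith(INDENT):
            woven.append("| " + line[len(INDENT):])
        else:
            woven.append(line)
    return "\n".join(woven)


# Run the tangled code in its own namespace, then show the woven document.
namespace = {}
exec(tangle(LITERATE_SOURCE), namespace)
print(namespace["mean"])
print(weave(LITERATE_SOURCE))
```

A Jupyter notebook plays a similar role without a separate tangle step: prose cells and code cells live in one document, and the code cells are executed in place.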
There is increased emphasis on not only publishing interesting results but also publishing exactly how those results were obtained. Jupyter notebooks permit researchers to publish not only the raw data they obtained in their research but also the calculations that led to their conclusions, making it possible for other researchers to replicate the analysis and try alternative analyses. Jupyter, whose name is a compound of Julia, Python and R, serves the need for repeatable data science analyses.
Using Jupyter Notebooks and Python 3, this course concentrates upon the programming tasks often required for data collection & transformation, with examples of analysis & modeling followed by interpretation & decision-making.
Reference Textbooks and Resources
There is no prescribed textbook for COMP-205, but the internet is rich with resources for Data Science using Python. Specifically, Python Data Science Handbook by Jake VanderPlas is an excellent reference.
Please keep the following in mind when approaching the instructor for help.
- First, please use the discussion boards on Piazza, which are monitored daily. Other students could likely benefit from the Q & A, and some students may answer your questions even faster!
- To get in touch with a TA or the instructor, please send them a message via Piazza. Second-best, send an email to the address provided above.
- To get in touch with the instructor for a matter unrelated to course content, please email the instructor. Please keep the use of email to confidential matters, not for general class discussion.
- If you receive no response from the above within 24 hrs, or in case of an emergency, please call the instructor at (617) 444-9640.
Grades for the Course
The grades will be allocated as follows:
- Weekly online exercises
- Take-home quizzes (points allocated as shown in the week-by-week plan below)
|1||9/9||Introduction to Literate Programming|
|2||9/14||Jupyter: Controls, REPL, Markdown; Jupyter: Here be Dragons|
|3||9/16||Python: What does this code do?; Data Types: Numbers, Strings, Iterables; Iterables: Mutable, Duplicated, Ordered|
|4||9/21||Control Flow: if/else, try/except, iteration, comprehensions||01-06|
|5||9/23||Functions & Classes||01-07||Quiz 1 due 9/28, 7 %|
|6||9/28||Imports, Libraries & Modules||02-12|
|7||9/30||Functions & Encapsulation||02-13|
|8||10/5||Datetime Modules & Classes||02-14|
|9||10/7||Collection Classes||02-15||Quiz 2 due 10/12, 8 %|
|10||10/12||Vector Data Operations: Numpy||03-01|
|11||10/14||Multi-Dimensional Arrays: Numpy||03-02|
|12||10/19||Array Data Ingest: Numpy||03-03|
|13||10/21||CSV Data Abstractions: Pandas||03-04|
|14||10/26||Data Indexing: Pandas||03-05|
|15||10/28||Data Merging and Joining: Pandas||03-06|
|16||11/2||Data Aggregation and Grouping: Pandas||03-07||Quiz 3 due 11/9, 10 %|
|17||11/4||Scientific Plotting: Matplotlib||04-01|
|18||11/9||Statistical Plotting: Seaborn||04-02|
|19||11/16||Other Plotting: Plotly, Bokeh, GMap||04-03|
|20||11/18||Statistical Analysis of Experiments||05-01|
|21||11/23||Organizing Jupyter Notebooks||05-02|
|23||12/2||Clustering||05-04||Quiz 4 due 12/7, 10 %|
|24||12/7||Case Study 1: World Happiness Report||06-01|
|25||12/9||Case Study 2: Monte Carlo Methods||06-02|
|26||12/14||Wrap Up (last lecture)||Case Study due 12/21, 5 %|
About the Instructor
I have worked in Cloud Computing, Big Data and Python since 2008. Python has been my programming language of choice ever since!
I received my Ph.D. in Electrical Engineering, working on solving large-scale matrix problems in Electromagnetics. I was initially on the EE faculty. I spent a major part of my career in industry, mostly in Systems Architect roles, first in Computer-Aided Design and later in Finance. Throughout my career, I have stayed close to data and databases as my area of focus.
Please call me Jitendra or J or Prof. J, whichever you prefer. (No period after the J)
Late Work Policy
For weekly online exercises, please plan to submit your work early and often! The last version submitted by the due date will be considered. Submissions after the due date will be penalized as follows:
- Up to 24 hrs late, 5% penalty
- 24-60 hrs late, 15% penalty
- More than 60 hrs late, no credit.
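The late-penalty schedule for weekly work can be sketched as a small Python function. This is an illustration under stated assumptions, not the official grading code: it assumes the penalty is taken as a percentage of the score earned, that exactly 24 hours late falls in the 5% tier, and that exactly 60 hours late falls in the 15% tier. The function name `apply_late_penalty` is hypothetical.

```python
def apply_late_penalty(score: float, hours_late: float) -> float:
    """Return the score after the late penalty.

    Assumptions (not stated in the policy): the penalty is a percentage
    of the earned score, and the 24 h and 60 h boundaries are inclusive.
    """
    if hours_late <= 0:
        penalty_percent = 0      # on time: no penalty
    elif hours_late <= 24:
        penalty_percent = 5      # up to 24 hrs late
    elif hours_late <= 60:
        penalty_percent = 15     # 24-60 hrs late
    else:
        return 0.0               # more than 60 hrs late: no credit
    return score * (100 - penalty_percent) / 100


# Example: a submission that earned 100 points, at various delays.
for hours in (0, 10, 48, 72):
    print(hours, apply_late_penalty(100.0, hours))
```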
For take-home quizzes, due dates and times will be strictly enforced.
You are expected to be familiar with the Student Guide to Academic Integrity at Tufts (available here).
If you need course adaptations or accommodations because of a disability, or if you have medical information to share with us that may impact your performance or participation in this course, please make an appointment with us as soon as possible.
If you have approved accommodations, please request your accommodation letters online through the Office of Disability Services student portal. Students with disabilities who need accommodations for this course and have not yet contacted the Office of Disability Services are encouraged to do so as soon as possible, so that the accommodations are implemented in a timely fashion.