Welcome to CS-119!
Professor: J Singh, Jitendra.Singh@tufts.edu
Lectures: Tue, Thu 4:30 pm — 5:45 pm. Joyce Cummings Center, Room 160.
If you are unable to attend the lecture in person, please consider attending synchronously over Zoom. You will need to sign in using your Tufts credentials.
Office Hours: To be announced over Piazza
Please do not come to class when exhibiting even mild Covid-19 symptoms.
If you are ill or symptomatic, please alert the teaching team via Piazza.
This guideline applies to everyone: the students, TAs and the instructor.
Zoom recordings will be available to those who are unable to attend due to illness, anxiety, grief or trauma. More on illness-related policies below.
Lectures are recorded so that Zoom recordings can be made available for unexpected absences. Please be aware that by attending you are consenting to being recorded. If you object to being recorded, please contact me before class.
Big Data deals with emerging applications in science and engineering disciplines that generate and collect data at unprecedented speed, scale, and complexity. This data needs to be managed and analyzed efficiently.
This course introduces the latest techniques and infrastructures developed for big data including parallel and distributed database systems, map-reduce infrastructures, scalable platforms for complex data types, stream processing systems, and cloud-based computing.
The course content will be a blend of theory, algorithms and practical (hands-on) work, involving software design, coding, testing and debugging!
Prerequisites: Formally, there are no prerequisites for this course. Still, COMP-119 should not be your first programming course. Familiarity with database internals (COMP-115) is helpful but is not required. The Background Assessment exercise available in Gradescope will give you an indication of your preparedness for CS-119.
Programming Languages: Big Data work across the industry involves Python, Java, Scala, shell programming and SQL. Most of our work in this class will be in Python, but familiarity with the programming languages cited will give you a head start. It is expected that students taking CS-119 know at least some of these and will pick up the rest, as required, on their own.
Reference Textbooks and Resources
All of the books are either on Safari Books Online (available free through the Library) or downloadable free for private use.
The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition
Luiz André Barroso
Download from Tufts Library with your credentials.
Mining of Massive Datasets, 2nd edition
Cambridge University Press
Download for private use only from Stanford InfoLab.
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Sources for Hands-On work
Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, 4th Edition
Spark: The Definitive Guide
Please keep the following in mind when approaching the instructor for help.
- For technical questions, please use Piazza, since other students will likely benefit from the Q&A. Some students may answer your questions even faster! Public questions will receive a higher priority because they apply to everyone.
- For questions that include specifics of your solution that you don't want others to see, it's OK to post private questions.
- To get in touch with the instructor about a matter unrelated to course content, please email the instructor. Please reserve email for confidential matters, not general class discussion.
- If you receive no response through the channels above within 24 hours, or in case of an emergency, please call the instructor at (617) 444-9640.
Grades for the Course
The grades will be allocated as follows:
Your class participation score is intended to reflect your effort in maintaining a collaborative learning environment for everyone.
Academic paper reviews
Big Data is a continuously evolving field. The ability to read and digest published, peer-reviewed papers is critical to your ability to stay current with it. We will follow a methodology proposed by S. Keshav to review some of the seminal papers in the field.
The quizzes are intended to reflect your effort in staying current with what's happening in class. They are typically due a week after they are assigned, never more than two weeks. Only the best 70% (7 out of 10) of your quiz scores will count towards the final grade.
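As a rough illustration of the best-7-out-of-10 rule above, here is a minimal Python sketch. The function name, the example scores, the 10-point scale, and the idea that a missed quiz is entered as 0 are all assumptions made for illustration; the sketch only shows which scores would be kept, not how the instructor converts them into a grade.

```python
def best_quiz_scores(scores, keep=7):
    """Return the `keep` highest scores, highest first.

    Hypothetical helper: the syllabus only states that the best
    7 of 10 quiz scores count, not how they are combined.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return sorted(scores, reverse=True)[:keep]


# Made-up scores for ten quizzes, each out of 10 (assumed scale).
# A missed quiz is entered as 0 here, which is also an assumption.
example_scores = [8, 10, 6, 9, 0, 7, 10, 5, 9, 8]

counted = best_quiz_scores(example_scores)
dropped_count = len(example_scores) - len(counted)

print("Scores that count:", counted)
print("Number of scores dropped:", dropped_count)
```

With these made-up numbers, the three lowest scores (including the missed quiz) are the ones left out.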
Choose a topic you wish to research, or a prototype you would like to build. Write a proposal for doing the research or building the prototype (as the case may be). A research proposal is a detailed plan or ‘blueprint’ for the intended study, and once it is completed, the research project should flow smoothly.
Writing proposals is an integral part of moving up in the organization where you work, be it academe or industry.
Most significant research projects involve putting together a system, obtaining the data and running experiments. The first step in the research is to obtain funding for it. You will not be doing the actual research just yet; you will be creating a proposal to secure funding for the research.
Each student has an allowance for use of a Cloud Platform to be used for Quizzes 3-6.
About the Instructor
I have worked in Cloud Computing, Big Data and Python since 2008. Python has been my programming language of choice ever since!
I received my Ph.D. in Electrical Engineering working on solving large-scale matrix problems in Electromagnetics. I was initially on the EE faculty at WPI, then left to spend a major part of my career in industry, mostly in Systems Architect roles, first in Computer-Aided Design and later in Finance. Throughout my career, I have stayed close to data and databases as my area of focus. I rejoined WPI in the CS department in 2009 teaching Databases and Big Data. I've been with Tufts since 2018.
Please call me Jitendra or J or Prof. J, whichever you prefer. (No period after the J)
Late Work Policy
All students will be allowed 6 late tokens. Each token allows you to be late on an assignment by 1 day.
You are expected to be familiar with the Student Guide to Academic Integrity at Tufts (available here).
If you need course adaptations or accommodations because of a disability, or if you have medical information to share with us that may impact your performance or participation in this course, please make an appointment with us as soon as possible.
If you have approved accommodations, please request your accommodation letters online through the Office of Disability Services student portal. If you have a disability and need accommodations for this course but have not yet contacted the Office of Disability Services, you are encouraged to do so as soon as possible to ensure that such accommodations are implemented in a timely fashion.