About the Course

Introduction to Data Science is being offered for the second time in Fall 2013. One major change this year is that I will be co-teaching with Dr. Kayur Patel, a computer scientist at Google. Here's the syllabus:

Introduction to Data Science Syllabus, Fall 2013
Statistics W4242, Columbia University

Professors: Dr. Kayur Patel (kp2566@columbia.edu) and Dr. Rachel Schutt (rrs2117@columbia.edu)
Lab Instructor: Jared Lander (jpl2135@columbia.edu)
Teaching Assistant: Haolei Weng (hw2375@columbia.edu)
Project Coordinator: Anna Hurley (anna.c.hurley@gmail.com)

Location and Time
Lectures: Mondays and Wednesdays, 6:10-7:25pm @ 428 Pupin Laboratories
Labs: Tuesdays, 6:10-7:25pm @ 312 Math OR 7:40-8:55pm @ 417 Math
TA Office Hours: Tuesdays and Fridays, 2:00-4:00pm @ 10th Floor Lounge, SSW (School of Social Work), and by appointment

Course Description
This course serves as an introduction to the interdisciplinary and emerging field of data science. Students will learn to combine tools and techniques from statistics, computer science, data visualization and the social sciences to solve problems using data. Central threads include: (1) the data science process from data collection to product, (2) tools for working with both big and small datasets, (3) statistical modeling and machine learning, and (4) real-world topics and case studies. The course consists of: (1) core lectures by the instructors, (2) guest lectures from data scientists who are experts in their fields, and (3) a course-long project. Topics and tools will include data wrangling and munging, machine learning algorithms, statistical models, data visualization, data journalism, R, ethics, MapReduce, and data pipelines.

Goals of the course
1) Learn about what it’s like to be a data scientist
2) Be able to do some of what a data scientist does

Schedule and course structure
The course is organized into two sections. The first section is devoted to the data science process. Lectures during this period will correspond to the various stages of the process, building students' skill sets and understanding. The second section is devoted to special topics and case studies in data science and will include guest lectures that demonstrate the data science process in context, as well as deeper dives into different classes of data, including text, images and graphs.


Canceled [Rosh Hashanah]


Introduction, Syllabus, Data Science Process


Data Science Process, Intro to Algorithms


Scoping Projects, Asking Good Questions [Drew Conway, DataKind]


Data: Unstructured vs. Structured Data, Databases


Sampling and exploratory data analysis


Statistical modeling and inference


HCI and Data Science


Feature Selection, Kaggle Competition [Will Cukierski, Kaggle]


Machine Learning Overview: Classification, Regression, Clustering


Machine Learning: Specific algorithms


Visualization: Charts, Graphs, Preattentive Features


Visualization: Interactive visualizations, Infographics


Data & Journalism


Data & Journalism [Steve Lohr & Andy Lehren, The New York Times]


Working at Scale: Memory, Parallelization, MapReduce [Aaron Kimball]


Midterm Project Presentations (Ignite Talks)


Academic Holiday

11/6/2013 - 12/2/2013

Special topics and case studies may include natural language processing, machine translation, crowdsourcing, Mechanical Turk, and social network data. Guest lecturers will most likely come from Facebook, Google, Foursquare, and Microsoft Research.

12/4/2013, 12/9/2013

Project Presentations

