Category: Course Information

Data products in the wild

Hi Students, Monday’s lecture will focus on Human Factors in Data Science. The class will be an onslaught of needs finding, design, prototyping, and evaluation. It will be intense; brace yourselves. As data scientists, you will ultimately produce a data product, be it a graph or a report or a presentation. This product will affect the […]

Philosophy of Data Science: Embrace the Practical and the Profound

This is my last blog post for Statistics 4242, Introduction to Data Science at Columbia University. All final projects have been turned in; grades have been given; the semester is over. I reserve the right to start blogging again at a later date. Dear Students, From the beginning, this course viewed Data Science simultaneously in […]

Tonight’s “Guest” Lecturers: The Wonderful Students

Tonight is the last class of the first iteration of Introduction to Data Science. I’m excited for the first half of tonight, when the students will take over and deliver their lecture. They’ve sat through the entire semester; now it’s their turn. What do they have to say about Data Science? I’m looking forward to […]

Tonight’s Guest Speakers: David Crawshaw and Josh Wills

Tonight we have two guest speakers, David Crawshaw and Josh Wills, both of whom I’ve had the pleasure of working with at Google. I hesitate to call them “data engineers” because that term is as problematic or potentially overloaded as “data scientist”, but suffice it to say that they’ve both worked as software engineers and […]

Tonight’s Guest Speaker: Claudia Perlich

Claudia Perlich currently serves as Chief Scientist at m6d. In this role, Claudia designs, develops, analyzes and optimizes the machine learning that informs brands on how to find their best prospective customers. She and the team of m6d scientists live and breathe web-wide data to fuel new business and marketplace intelligence. An active industry speaker […]

Tonight’s Guest Speaker: Ori Stitelman; Two Announcements

Tonight’s guest speaker is Ori Stitelman from Media 6 Degrees (m6d). Earlier in the semester, his colleague Brian Dalessandro spoke to us about classifiers, logistic regression, and evaluation. Ori will be talking about causal modeling. It will be interesting to think about how your understanding of Data Science has evolved since Brian visited us.

Ori Stitelman is a Senior Data Scientist at m6d. His responsibilities include prototyping methods for improving m6d’s display advertisement targeting product, creating fraud detection tools, and developing methods for estimating the causal effect of advertising. Ori received a Ph.D. in Biostatistics from the University of California, Berkeley, where his primary research focus was developing methods for estimating causal effects.

Two Announcements:

(1) Eurry and Kaz won Best Data Narrative in the Hubway Competition! Congratulations!!

(2) As a reminder, next week’s lecture is on Monday night (November 19th) rather than Wednesday night because of the long Thanksgiving weekend.

Columbia University Data Science-y Updates

Dear Students, I want to let you know about the following: (1) Institute for Data Sciences and Engineering: Columbia University’s new Institute for Data Sciences and Engineering has launched a website: The word “Data” also modifies “Engineering”, in case there was confusion. Data Sciences and Data Engineering. (2) Kaggle Competition: As you know, our class […]