Category Models

Data products in the wild

Hi Students, Monday’s lecture will focus on Human Factors in Data Science. The class will be an onslaught of needs finding, design, prototyping, and evaluation. It will be intense; brace yourselves. As data scientists, you will ultimately produce a data product, be it a graph or a report or a presentation. This product will affect the […]

Week 12: Predictive modeling, Data Leakage, Model Evaluation

Each week Cathy O’Neil blogs about the class. Cross-posted from This week’s guest lecturer in Rachel Schutt’s Columbia Data Science class was Claudia Perlich. Claudia has been the Chief Scientist at m6d for 3 years. Before that she was in the data analytics group at the IBM center that developed Watson, the computer that won […]

Experiments, A/B Testing and Causal Modeling

Screenshot of article by Brian Christian from Wired magazine, The A/B Test: Inside the Technology That’s Changing the Rules of Business from April 2012

Dear Students,

I want to address explicitly why Causal Modeling and Experiments are part of this course. The last two lectures have addressed observational studies and causal modeling and a bit on experiments. […]

Week 11: Estimating Causal Effects

Each week Cathy O’Neil blogs about the class. Cross-posted from This week in Rachel Schutt’s Data Science course at Columbia we had Ori Stitelman, a data scientist at Media6Degrees. We also learned last night of a new Columbia course: STAT 4249 Applied Data Science, taught by Rachel Schutt and Ian Langmore. More information can […]

Brief Introduction to Social Network Modeling

Here is a brief, and not comprehensive, introduction to social network modeling drawing from academic literature in mathematics, statistics, computer science, sociology and physics. This will simply introduce some basic models. I won’t get into stochastic (random) processes on networks (such as epidemics or “cascades”), dynamics of networks, nor algorithms for approximating metrics on large-scale […]

Week 7: Recommendation Engines, SVD, Alternating Least Squares, Convexity, Filter Bubbles

Each week Cathy O’Neil blogs about the class. Cross-posted from Last night in Rachel Schutt’s Columbia Data Science course we had Matt Gattis come and talk to us about recommendation engines. Matt graduated from MIT in CS, worked at SiteAdvisor, and co-founded hunch as its CTO, which recently got acquired by eBay. Here’s what […]

Week 5: GetGlue, time series, financial modeling, advanced regression, and ethics

Each week Cathy O’Neil blogs about the class. Cross-posted from But what makes this week unique is that Cathy was our guest lecturer. So first I need to introduce her, and then what follows is her blog post. Students in the class already know Cathy because she comes each week, asks good questions and […]