Category Defining Data Science

10 Important Data Science Ideas

Here’s a list of 10 important ideas we’ve explored this semester so far. 10. Interdisciplinary Data Science teams: My experience at Google, along with DJ Patil’s piece on Building Data Science teams, informs my understanding of the importance of interdisciplinary teams. The students who showed up to take this class are from across departments and disciplines. […]

The Case for Data Science

Dear Students, Data Science is an emerging field in industry, yet not well-defined as an academic subject. This is the first course at Columbia that has the term “Data Science” in the title. So recently, Allen Bernard, a freelance journalist working on an article about the emerging role of the data scientist, asked […]

Week 4: The Data Science Process, k-means, Classifiers, Logistic Regression and Evaluation

Each week Cathy O’Neil blogs about the class. Cross-posted from This week our guest lecturer for the Columbia Data Science class was Brian Dalessandro. Brian works at Media6Degrees as a VP of Data Science, and he’s super active in the research community. He’s also served as co-chair of the KDD competition. Before Brian started, […]

The Data Science Process

Dear Students, Now that we’ve had our first guest lecture, I’d like to revisit the general framework I proposed for thinking about the data science process on the first day of class (when I generalized the example from Google Plus), and show how Jake’s lecture fits within this framework. Throughout the semester we’ll see that […]

Curse of dimensionality

This is a guest post by Professor Matthew Jones, from Columbia’s History department, who has been attending the course. I invited him to give his perspective on the course thus far. Few things lurk as much a challenge and instigation in data mining (or machine learning or the data sciences) as the “curse of dimensionality.” […]

Visualizing Bill Cleveland’s original Data Science Proposal

I described the origins and short history of Data Science in week 1. The origins include a 2001 action plan by William Cleveland, a statistician, written when he was at Bell Labs, to propose Data Science as a new academic discipline. A student in our class, Eurry Kim (with permission), created the following: […]

Week 1 Report: Current View of the Scope of the Course

“data science”: collection of best practices taught to you by experts in the field eager to come teach you; filling a gap we see in current education. Data Science: research area; Columbia University Institute for Data Sciences. We’re at Columbia; we showed up for a Data Science class; we represent Columbia’s interdisciplinary research community. What […]