Category: Machine Learning Algorithms

The New York Philharmonic: Don’t be incompetent!

A couple of weeks ago I was at the New York Philharmonic. The conductor, the critically acclaimed Alan Gilbert, and the piano soloist, Emanuel Ax, “broke the fourth wall” and explained Schoenberg’s Piano Concerto to the audience before playing it. They described Schoenberg’s 12-tone technique for composing music as: the composer selects a range of 12 notes and […]

Week 7: Recommendation Engines, SVD, Alternating Least Squares, Convexity, Filter Bubbles

Each week Cathy O’Neil blogs about the class. Cross-posted from Last night in Rachel Schutt’s Columbia Data Science course we had Matt Gattis come and talk to us about recommendation engines. Matt graduated from MIT in CS, worked at SiteAdvisor, and co-founded Hunch as its CTO; Hunch was recently acquired by eBay. Here’s what […]

Week 6: Kaggle, crowdsourcing, decision trees, random forests, social networks, and Google’s hybrid research environment

Each week Cathy O’Neil blogs about the class. Cross-posted from Yesterday we had two guest lecturers, who took up approximately half the time each. First we welcomed William Cukierski from Kaggle, a data science competition platform. Will went to Cornell for a B.A. in physics and to Rutgers to get his Ph.D. in biomedical […]

Week 4: The Data Science Process, k-means, Classifiers, Logistic Regression and Evaluation

Each week Cathy O’Neil blogs about the class. Cross-posted from This week our guest lecturer for the Columbia Data Science class was Brian Dalessandro. Brian works at Media6Degrees as a VP of Data Science, and he’s super active in the research community. He’s also served as co-chair of the KDD competition. Before Brian started, […]

Human Ingenuity

Read the full paper here: Presented, in part, as inspiration: observe the elegance and simplicity of the model; the deep insight that solved a problem as massive as ranking sites on the web with a solution involving eigenvectors. Presented, also, for the student discussion on Anderson’s article.

The Data Science Process

Dear Students, Now that we’ve had our first guest lecture, I’d like to revisit the general framework I proposed for thinking about the data science process on the first day of class (when I generalized the example from Google Plus), and show how Jake’s lecture fits within this framework. Throughout the semester we’ll see that […]

Week 3: Naive Bayes, Laplace Smoothing, APIs and Scraping data off the web

Cathy O’Neil blogs about the class each week. Cross-posted from In the third week of the Columbia Data Science course, our guest lecturer was Jake Hofman. Jake is at Microsoft Research after recently leaving Yahoo! Research. He got a Ph.D. in physics at Columbia and taught a fantastic course on modeling last semester at […]