This week Rachel will cover machine learning. I hope you guys love the material as much as I do. Well, maybe not as much as I do… I spent the better part of a decade writing a book on how to build machine learning tools. Since I’ve spent some time thinking about making machine learning easier to use, I thought I’d supplement the lecture with a couple of pointers.
A deeper dive into machine learning algorithms
Rachel will cover a lot of material in two classes, but the field of machine learning is deep. If you’re interested in learning more, I’d suggest looking at Andrew Ng’s Coursera course, which happens to start on October 14th. Andrew is a clear communicator, which makes him a great instructor. I have firsthand experience: the Coursera course is modeled after the machine learning course I took as a graduate student. If you are interested in learning more of the math, you can look at videos from that course.
Feature generation is key
Learning the math isn’t just an opportunity to geek out: understanding machine learning algorithms is important. By understanding how they work, you know when and how to apply them. However, in most cases, you will not actually write the code that does the learning. Pick your favorite programming language. Someone has created a machine learning library for it. The library you need is a simple Google search away.
Most of the work is in data munging and feature generation. There is an excellent article called “A Few Useful Things to Know about Machine Learning” by Pedro Domingos in Communications of the ACM. On Pedro’s list of useful things, number eight is “feature engineering is the key.”
At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used. If you have many independent features that each correlate well with the class, learning is easy. On the other hand, if the class is a very complex function of the features, you may not be able to learn it. Often, the raw data is not in a form that is amenable to learning, but you can construct features from it that are. This is typically where most of the effort in a machine learning project goes. It is often also one of the most interesting parts, where intuition, creativity and “black art” are as important as the technical stuff.
Feature generation is the process by which you take raw data (e.g. images, text) and turn it into something the machine learning algorithm can understand (e.g. pixels of the image, word counts for the text). Building features that correlate well with the differences you are interested in is important. Pedro argues that this step is as important as the machine learning algorithm. I’d say it’s more important.
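To make the text example concrete, here is a minimal sketch of turning raw text into word counts. The function name, the tiny vocabulary, and the sample sentence are all made up for illustration; a real project would more likely lean on a library's tokenizer and vectorizer.

```python
import re
from collections import Counter


def word_count_features(text, vocabulary):
    """Turn raw text into a fixed-length list of word counts.

    `vocabulary` is a hypothetical, hand-picked list of words; each
    position in the returned list counts one vocabulary word.
    """
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for words that never appeared in the text.
    return [counts[word] for word in vocabulary]


vocabulary = ["free", "meeting", "money", "tomorrow"]
features = word_count_features("Free money! Claim your free money today.", vocabulary)
print(features)  # [2, 0, 2, 0]
```

The learning algorithm never sees the sentence itself, only the list of numbers, so the choice of vocabulary and tokenization decides what it can possibly learn.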
Play to your Domain…
The notion that feature generation is a “black art” comes from the fact that domain expertise is needed. Many great machine learning researchers are experts in various other domains (e.g. linguistics, medicine, computer vision, signal processing). Domain expertise allows them to better understand which measurements might help distinguish examples in their dataset and write code to take those measurements. The learning algorithm takes a list of potentially useful measurements (i.e. features) and builds a model from a subset of those measurements that correlate best.
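As a rough illustration of picking the measurements that correlate best, the sketch below ranks candidate features by the absolute value of their Pearson correlation with the label and keeps the top few. This is only one simple selection strategy, not how any particular learning algorithm works internally, and the feature names and values are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no information about the label.
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_k_features(columns, labels, k):
    """Return the names of the k features most correlated with the labels."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical measurements for four examples, with binary labels.
columns = {
    "pitch": [1.0, 2.0, 3.0, 4.0],
    "noise": [5.0, 1.0, 4.0, 2.0],
    "constant": [7.0, 7.0, 7.0, 7.0],
}
labels = [0, 0, 1, 1]

selected = top_k_features(columns, labels, k=1)
print(selected)  # ['pitch']
```

The ranking can only choose among the columns it is given. Deciding that something like pitch is worth measuring in the first place is where the domain expertise comes in.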
However, domain expertise in one field doesn’t mean domain expertise in all fields. If you are working on a speech recognition system, you are probably better off hiring someone who has a background in signal processing rather than someone who has spent their time analyzing medical records. For feature generation, it’s harder to create a general theory that spans domains.
… but don’t blindly optimize one metric
Finally, domain expertise is also important for choosing your evaluation metric. Many machine learning techniques have standard metrics for evaluating performance. For instance, if you are building a classifier, you might compare predictions made by a classifier with ground truth labels to get an accuracy measure.
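For reference, accuracy is just the fraction of predictions that match the ground truth labels. Here is a minimal sketch, with made-up predictions and labels:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the ground truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical classifier output and ground truth for five examples.
predictions = ["spam", "ham", "spam", "ham", "ham"]
labels = ["spam", "ham", "ham", "ham", "spam"]

print(accuracy(predictions, labels))  # 0.6
```

Note that this single number treats every mistake the same way. Whether that is appropriate depends on the application.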
It’s tempting to optimize this measure. If your accuracy is 82%, you might be able to tune a few parameters to increase accuracy to 84%. You might be able to use a more complicated algorithm to push it to 86%. It can be deeply satisfying to push these numbers higher and higher.
But even ignoring the obvious problem of overfitting, pushing accuracy up might not matter, particularly if it comes with other tradeoffs. You have to understand the performance in the context of the application. If a more complicated algorithm provides a small increase in accuracy at the cost of interpretability, it may not be worth it, especially if your application tries to explain its decisions to its users.