Vol. 1 - "Will we do the drawing again this year?"
I teach an introduction to data science for biology majors.
The greatest challenge of this class is helping students build an intuition for the many moving parts in the process of machine learning. What is data? What is a machine? How do we measure learning? Why are some datasets more difficult to learn from than others?
After a little experimentation, I ended up building a short activity, which we run on day 1, that communicates most of these concepts in a pretty fun, icebreaky sort of way. The setup is basic, yet it packs plenty of important ideas into a short amount of time.
Students are assigned to random groups of five, and each group shares a giant piece of kraft paper and some felt markers (I am, indeed, a Numberphile fan). One student in each group receives a sheet of paper that is kept hidden from the rest of the group; on it is a painting. This year it was a Joan Miró, and a Gustav Klimt the year before.
The challenge is as follows: the student describes the painting however they like, and the rest of the group must reproduce it. The group cannot erase; they can only build upon what they have already put on the paper. It is gloriously chaotic, but it also works surprisingly well!
I like this exercise because it serves as a starting point for discussions about information, the representation of information, and the process of learning. During the debriefing, we draw parallels between the in-class experience and the practice of machine learning. The painting is the reality. The description of the painting is the data representing that reality. The students reproducing the painting are the algorithms, learning from the features they identify in the data. The final drawing on the table is the prediction. How closely it resembles the reality is a measure of model performance.
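For readers who want to see this mapping in code, here is a minimal sketch using scikit-learn on a toy dataset; both are my illustrative choices for this post, not something we do in class, and the variable names simply mirror the analogy.

```python
# A minimal, illustrative sketch of the painting/description/drawing analogy;
# the toy dataset and the variable names are assumptions made for this post.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# The painting: the underlying reality we want to capture (with some noise)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

# The description: the imperfect, partial data the students actually get
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# The student: an algorithm learning from features it finds in the data
student = DecisionTreeRegressor(random_state=1)
student.fit(X_train, y_train)

# The drawing: a prediction; its resemblance to reality is model performance
drawing = student.predict(X_test)
print(f"Resemblance to the painting: R² = {r2_score(y_test, drawing):.2f}")
```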
And then, we can compare the drawings. Do they look the same? Are some of them more accurate about parts of the original painting than others? Are some features of the original painting more difficult to capture? This is a useful way to introduce ideas about bootstrapping (did all students describe the painting in the same way?) and ensemble models (would we get a better representation by merging all the drawings?), which we revisit much later in the term.
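The sketch above extends naturally to both of these ideas. Below is another illustrative version (again, the names are assumptions for this post) in which each "student" is trained on a bootstrap resample of the description, and the drawings are merged by averaging:

```python
# Continuing the analogy: bootstrapping and a simple averaging ensemble.
# Self-contained and illustrative; names are assumptions for this post.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

rng = np.random.default_rng(1)
drawings = []
for _ in range(25):
    # Bootstrapping: each student hears a slightly different description
    # (a resample, with replacement, of the training data)
    idx = rng.integers(0, len(X_train), size=len(X_train))
    student = DecisionTreeRegressor(random_state=1)
    student.fit(X_train[idx], y_train[idx])
    drawings.append(student.predict(X_test))

# Ensemble: would we get a better representation by merging all the drawings?
print(f"One drawing:    R² = {r2_score(y_test, drawings[0]):.2f}")
print(f"Merged drawing: R² = {r2_score(y_test, np.mean(drawings, axis=0)):.2f}")
```

In this kind of sketch, the merged drawing typically scores higher than any single one, which is exactly the point we come back to when we revisit these ideas later in the term.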
The last part of this activity is to think about “good enough”. Can we identify the original painting from the drawings? Do we get enough of the broad strokes that the prediction can serve as a useful representation of reality? Do we trust the model to convey meaningful information about what was measured? These are important questions, and they are difficult to engage with from a purely statistical standpoint. By bringing the complexity all the way down, students can start thinking about them in a more abstract way.
To my surprise, this exercise was very well received by students. “Will we do the drawing again this year?” is a question I was asked before the class had even started. Without dwelling too long on the fact that this class peaks in its first hour, I think the success of this exercise says something about how simple, low-budget, low-effort activities can still create engaging and active learning environments.
And now, stay tuned for Vol. 2 on Jan. 29, in which I will discuss the idea of “teaching the classics” during seminars, and argue for the importance of also discussing papers with outdated methodologies.