Peer grading: inventing the light bulb

A real-time chronicle of a seasoned professor who has just completed giving his first massively open online course.

With the deadline for submitting the final exam in my MOOC having now passed, the students are engaging in the Peer Evaluation process. I know of just two cases where this has been tried in a genuine MOOC (where the M means what it says), one in Computer Science, the other in Humanities; both encountered enormous difficulties and, as a result, a great deal of student frustration. My case was no different.

Anticipating problems, I had given the class a much simplified version of the process – with no grade points at stake – at the end of Week 4, so they could familiarize themselves with the procedure and the platform mechanics before they had to do it for real. That might have helped, but the real difficulties emerged only when 1,520 exam scripts started to make their way through the system.

By then the instructional part of the course was over. The class had seen and worked through all the material in the curriculum, and had completed five machine-graded problem sets. Consequently, there were enough data in the system to award certificates fairly if we had to abandon the peer evaluation process as a grading device, as happened for that humanities MOOC I mentioned, where the professor decided on the fly to make that part of the exam optional. So I was able to sleep at night. But only just.

With over 1,000 of the students now engaged in the peer review process, and three days left before the grading deadline, I am inclined to see the whole thing through to the (bitter) end. We need the data that this first trial will produce so we can figure out how to make it work better next time.

Long before the course launched, I felt sure there were two things we would need to accomplish, and accomplish well, in order to make a (conceptual, proof-oriented) advanced math MOOC work: the establishment of (and data gathering from) small study groups in which students could help one another, and the provision of a crowd-sourced evaluation and grading system.

When I put my course together, the Coursera platform supported neither. They were working on a calibrated peer review module, but implementing the group interaction side was still in the future. (The user-base growth of Coursera has been so phenomenal, it’s a wonder they can keep the system running at all!)

Thus, when my course launched, there was no grouping system, nor indeed any social media functionality other than the common discussion forums. So the students had to form their own groups using whatever media they could: Facebook, Skype, Google Groups, Google Docs, or even the local pub, bar, or coffee shop for co-located groups. Those probably worked out fine, but since they were outside our platform, we had no way to monitor the activity – an essential functionality if we are to turn this initial, experimental phase of MOOCs into something robust and useful in the long term.

Coursera had built a beta-release peer evaluation system for a course on Human Computer Interaction, given by a Stanford colleague of mine. But his needs were different from mine, so the platform module needed more work – more work than there was really time for! In my last post, I described some of the things I had to cope with to get my exam up and running. (To be honest, I like the atmosphere of working in startup mode, but even in Silicon Valley there are still only 24 hours in a day.)

It’s important to remember that the first wave of MOOCs in the current, explosive, growth period all came out of computer science departments, first at Stanford, then at MIT. But CS is an atypical case when it comes to online learning. Although many aspects of computer science involve qualitative judgments and conceptual reasoning, the core parts of the subject are highly procedural, and lend themselves to instruction-based learning and to machine evaluation and grading. (“Is that piece of code correct?” Just see if it runs as intended.)
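
To make that contrast concrete, here is a minimal sketch, in Python, of the kind of automatic grading that procedural programming assignments allow. It is purely illustrative; the autograder, the factorial task, and every name in it are hypothetical, not anything from the Coursera platform.

```python
# Minimal sketch of machine grading for procedural work: run the submitted
# function against known test cases and score it by the fraction it passes.
# (Hypothetical illustration only, not the Coursera autograder.)

def grade_submission(student_fn, test_cases):
    """Return the fraction of test cases that the student's function passes."""
    passed = 0
    for args, expected in test_cases:
        try:
            if student_fn(*args) == expected:
                passed += 1
        except Exception:
            # A crash on an input simply counts as a failed case.
            pass
    return passed / len(test_cases)

# A hypothetical student submission for "compute n factorial".
def student_factorial(n):
    return 1 if n <= 1 else n * student_factorial(n - 1)

tests = [((0,), 1), ((1,), 1), ((5,), 120), ((7,), 5040)]
print(grade_submission(student_factorial, tests))  # prints 1.0
```

A script along these lines can grade tens of thousands of submissions unattended, which is precisely what does not carry over to proofs.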

The core notion in university level mathematics, however, is the proof. But you can’t learn how to prove something by being told or shown how to do it, any more than you can learn how to ride a bike by being told or shown. You have to try for yourself, and keep trying, and falling, until it finally clicks. Moreover, apart from some very special, and atypical, simple cases, proofs cannot be machine graded. In that regard, they are more like essays than calculations. Indeed, one of the things I told my students was that a good proof is a story that explains why something is the case.

Feedback from others struggling to master abstract concepts and proofs can help enormously. Study groups can provide that, along with the psychological stimulus of knowing that others are having just as much difficulty as you are. Since companies like Facebook have shown us how to build platforms that support the creation of groups, that part can be provided online. And when Coursera is able to devote resources to doing it, I know it will work just fine. (If they want to, they can simply hire some engineers from Facebook, which is little more than a mile away. I gather that, like Google before it, the fun period there has long since passed and fully vested employees are looking to move.)

The other issue, that of evaluation and grading, is trickier. The traditional solution is for the professor to evaluate and grade the class, perhaps assisted by one or more TAs (Teaching Assistants). But for classes that number in the tens of thousands, that is clearly out of the question. Though it’s tempting to dream about building a Wikipedia-like community of dedicated, math-PhD-bearing volunteers who will participate in a mathematical MOOC whenever one is offered – indeed I do dream about it – it would take time to build up such a community, and what’s more, it’s hard to see there being enough qualified volunteers to handle the many different math MOOCs that will soon be offered by different instructors. (In contrast, there is just one Wikipedia, of course.)

That leaves just one solution: peer grading, where all the students in the class, or at least a significant portion thereof, are given the task of grading the work of their peers. In other words, we have to make this work. And to do that, we have to take the first step. I just did.
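
For readers curious about the underlying mechanics, here is a minimal sketch of one way peer grading can work, assuming (as a simplification) that each submission is sent to a few randomly chosen peers and that the reported grade is the median of their scores. The actual Coursera system adds a calibration phase and is considerably more elaborate; everything below, including the names and data, is hypothetical illustration.

```python
import random
from statistics import median

# Minimal sketch of peer-grading mechanics (hypothetical, not the Coursera
# implementation): every submission is assigned to k graders other than its
# author, and the reported grade is the median of the scores they return,
# which damps the effect of a single overly harsh or generous grader.

def assign_graders(submissions, students, k=3):
    """Map each submission id to k randomly chosen graders, never its author."""
    return {
        sub_id: random.sample([s for s in students if s != author], k)
        for sub_id, author in submissions.items()
    }

def aggregate(peer_scores):
    """Combine the peer scores for each submission into a single median grade."""
    return {sub_id: median(scores) for sub_id, scores in peer_scores.items()}

# Hypothetical data: three submissions, five students, three scores apiece.
submissions = {"exam_001": "ana", "exam_002": "ben", "exam_003": "chloe"}
students = ["ana", "ben", "chloe", "dev", "eli"]
print(assign_graders(submissions, students, k=3))
print(aggregate({"exam_001": [7, 8, 8], "exam_002": [3, 9, 4], "exam_003": [10, 9, 10]}))
# -> {'exam_001': 8, 'exam_002': 4, 'exam_003': 10}
```

Taking the median rather than the mean is one simple way to cope with graders who differ wildly in strictness; the calibration step in the platform's system, mentioned below, tackles the same problem more systematically.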

Knowing just how many unknowns we were dealing with, I kept my expectations low and tried to prepare the students for what could well turn out to be chaos. (It did.) The website description of the exam grading system was littered with my cautions and references to its being “live beta”. On October 15, when the test run without the grading part was about to launch, I posted yet one more cautionary note on the main course announcements page:

… using the Calibrated Peer Review System for a course like this is, I believe, new. (It’s certainly new to me and my assistants!) So this is all very much experimental. Please approach it in that spirit!

Even so, many of the students were taken aback by just how clunky and buggy the thing was, and the forums sprang to life with exasperated flames. I took solace in the recent release of Apple Maps on the iPhone, which showed that even with the resources and expert personnel available to one of the world’s wealthiest companies, product launches can go badly wrong – and we were just one guy and two part-time, volunteer student assistants, working on a platform being built under us by a small startup company sustained on free Coke and stock options. (I’m guessing the part about the Coke and the options, but that is the prevalent Silicon Valley model.)

At which point, one of those oh-so-timely events often described as “Acts of God” occurred. Just when I worried that I was about to witness, and be responsible for starting, the first global, massive open online riot (MOOR) in a math class, Hurricane Sandy struck the Eastern Seaboard, reminding everyone that a clunky system for grading math exams is not the worst thing in the world. Calm, reasoned, steadying, constructive posts started to appear on the forum. I was getting my feedback after all. The world was a good place once again.

Failure (meaning things don’t go smoothly, or maybe don’t work at all) doesn’t bother me. If it did, I’d never have become a mathematician, a profession in which the failure rate in first attempts to solve a problem is somewhere north of 95%. The important thing is to get enough data to increase the chances of getting it right – or far more likely, just getting it better – the second time round. Give me enough feedback, and I count that “failure” as a success.

As Edison is said to have replied to a young reporter about his many failed attempts to construct a light bulb, “Why would I ever give up? I now know definitively over 9,000 ways that an electric light bulb will not work. Success is almost in my grasp.” (Edison supposedly failed a further 1,000 times before he got it right. Please don’t tell my students that. We are just at failure 1.)

If there is one piece of advice I’d offer anyone about to give their first MOOC, it’s this: remember Edison.

To be continued …