Posts Tagged 'calibrated peer review'

Peer grading: inventing the light bulb

A real-time chronicle of a seasoned professor who has just completed giving his first massively open online course.

With the deadline for submitting the final exam in my MOOC having now passed, the students are engaging in the Peer Evaluation process. I know of just two cases where this has been tried in a genuine MOOC (where the M means what it says), one in Computer Science, the other in Humanities, and both encountered enormous difficulties, and as a result a lot of student frustration. My case was no different.

Anticipating problems, I had given the class a much simplified version of the process – with no grade points at stake – at the end of Week 4, so they could familiarize themselves with the process and the platform mechanics before they had to do it for real. That might have helped, but the real difficulties only emerged when 1,520 exam scripts started to make their way through the system.

By then the instructional part of the course was over. The class had seen and worked through all the material in the curriculum, and had completed five machine-graded problem sets. Consequently, there were enough data in the system to award certificates fairly if we had to abandon the peer evaluation process as a grading device, as happened for that humanities MOOC I mentioned, where the professor decided on the fly to make that part of the exam optional. So I was able to sleep at night. But only just.

With over 1,000 of the students now engaged in the peer review process, and three days left to the deadline for completing grading, I am inclined to see the whole thing through to the (bitter) end. We need the data that this first trial will produce so we can figure out how to make it work better next time.

Long before the course launched, I felt sure that there were two things we would need to accomplish, and accomplish well, in order to make a (conceptual, proof-oriented) advanced math MOOC work: the establishment of (and data gathering from) small study groups in which students could help one another, and the provision of a crowd-sourced evaluation and grading system.

When I put my course together, the Coursera platform supported neither. They were working on a calibrated peer review module, but implementing the group interaction side was still in the future. (The user-base growth of Coursera has been so phenomenal, it’s a wonder they can keep the system running at all!)

Thus, when my course launched, there was no grouping system, nor indeed any social media functionality other than the common discussion forums. So the students had to form their own groups using whatever media they could: Facebook, Skype, Google Groups, Google Docs, or even the local pub, bar, or coffee shop for co-located groups. Those probably worked out fine, but since they were outside our platform, we had no way to monitor the activity – an essential functionality if we are to turn this initial, experimental phase of MOOCs into something robust and useful in the long term.

Coursera had built a beta-release, peer evaluation system for a course on Human Computer Interaction, given by a Stanford colleague of mine. But his needs were different from mine, so the platform module needed more work – more work than there was really time for! In my last post, I described some of the things I had to cope with to get my exam up and running. (To be honest, I like the atmosphere of working in startup mode, but even in Silicon Valley there are still only 24 hours in a day.)

It’s important to remember that the first wave of MOOCs in the current, explosive, growth period all came out of computer science departments, first at Stanford, then at MIT. But CS is an atypical case when it comes to online learning. Although many aspects of computer science involve qualitative judgments and conceptual reasoning, the core parts of the subject are highly procedural, and lend themselves to instruction-based learning and to machine evaluation and grading. (“Is that piece of code correct?” Just see if it runs as intended.)

The core notion in university level mathematics, however, is the proof. But you can’t learn how to prove something by being told or shown how to do it any more than you can learn how to ride a bike by being told or shown. You have to try for yourself, and keep trying, and falling, until it finally clicks. Moreover, apart from some very special, and atypical, simple cases, proofs cannot be machine graded. In that regard, they are more like essays than calculations. Indeed, one of the things I told my students was that a good proof is a story, that explains why something is the case.

Feedback from others struggling to master abstract concepts and proofs can help enormously. Study groups can provide that, along with the psychological stimulus of knowing that others are having just as much difficulty as you are. Since companies like Facebook have shown us how to build platforms that support the creation of groups, that part can be provided online. And when Coursera is able to devote resources to doing it, I know it will work just fine. (If they want to, they can simply hire some engineers from Facebook, which is little more than a mile away. I gather that, like Google before it, the fun period there has long since passed and fully vested employees are looking to move.)

The other issue, that of evaluation and grading, is more tricky. The traditional solution is for the professor to evaluate and grade the class, perhaps assisted by one or more TAs (Teaching Assistants). But for classes that number in the tens of thousands, that is clearly out of the question. Though it’s tempting to dream about building a Wikipedia-like community of dedicated, math-PhD-bearing volunteers, who will participate in a mathematical MOOC whenever it is offered – indeed I do dream about it – it would take time to build up such a community, and what’s more, it’s hard to see there being enough qualified volunteers to handle the many different math MOOCs that will soon be offered by different instructors. (In contrast, there is just one Wikipedia, of course.)

That leaves just one solution: peer grading, where all the students in the class, or at least a significant portion thereof, are given the task of grading the work of their peers. In other words, we have to make this work. And to do that, we have to take the first step. I just did.

Knowing just how many unknowns we were dealing with, my expectations were not high, and I tried to prepare the students for what could well turn out to be chaos. (It did.) The website description of the exam grading system was littered with my cautions and references to being “live beta”. On October 15, when the test run without the grading part was about to launch, I posted yet one more cautionary note on the main course announcements page:

… using the Calibrated Peer Review System for a course like this is, I believe, new. (It’s certainly new to me and my assistants!) So this is all very much experimental. Please approach it in that spirit!

Even so, many of the students were taken aback by just how clunky and buggy the thing was, and the forums sprang to life with exasperated flames. I took solace in the recent release of Apple Maps on the iPhone, which showed that even with the resources and expert personnel available to one of the world’s wealthiest companies, product launches can go badly wrong – and we were just one guy and two part-time, volunteer student assistants, working on a platform being built under us by a small startup company sustained on free Coke and stock options. (I’m guessing the part about the Coke and the options, but that is the prevalent Silicon Valley model.)

At which point, one of those oh-so-timely events occurred that are often described as “Acts of God.” Just when I worried that I was about to witness, and be responsible for starting, the first global, massive open online riot (MOOR) in a math class, Hurricane Sandy struck the Eastern Seaboard, reminding everyone that a clunky system for grading math exams is not the worst thing in the world. Calm, reasoned, steadying, constructive posts started to appear on the forum.  I was getting my feedback after all. The world was a good place once again.

Failure (meaning things don’t go smoothly, or maybe don’t work at all) doesn’t bother me. If it did, I’d never have become a mathematician, a profession in which the failure rate in first attempts to solve a problem is somewhere north of 95%. The important thing is to get enough data to increase the chances of getting it right – or far more likely, just getting it better – the second time round. Give me enough feedback, and I count that “failure” as a success.

As Edison is said to have replied to a young reporter about his many failed attempts to construct a light bulb, “Why would I ever give up? I now know definitively over 9,000 ways that an electric light bulb will not work. Success is almost in my grasp.” (Edison supposedly failed a further 1,000 times before he got it right. Please don’t tell my students that. We are just at failure 1.)

If there were one piece of advice I’d give to anyone about to give their first MOOC, it’s this: remember Edison.

To be continued …

It’s About Time (in Part): MOOC Planning – Part 10

A real-time chronicle of a seasoned professor embarking on his first massively open online course.

Well, lectures have ended and the course has now switched gears. For those still left in the course (17% of the final enrollment total of 64,045), the next two weeks are focused on trying to make sense of everything they have learned, and working on the final exam — which in the case of my course involves peer evaluation.

Calibrated Peer Review is not new. A study of its use in the high school system by Sadler and Good, published in 2006, has become compulsory reading for those of us planning and giving MOOCs that cover material that cannot be machine graded. [If you want to see how I am using it, just enroll in the class and read the description of the "Peer Review system". There is no obligation to do anything more than browse around the site! No one will know you are not simply a dog that can use a computer.]

As I was working on my course, Coursera was still frantically building out their platform to support peer evaluation. There was a lot of just-in-time construction. It’s been a long time since I’ve had to go behind a user-friendly interface and dig into the underlying code to do something on a computer, and the programming languages have all changed since I last did that.

One thing I had to learn was one of the ways networked computers keep time. I now know that at the time of writing these words, 7:00AM Pacific Daylight Time on October 22, 2012, exactly 1,350,914,400 seconds have elapsed since the first second of January 1st, 1970, Coordinated Universal Time (UTC). That was the start of Unix Time.

I needed to learn to work in Unix Time in order to set the various opening times and completion deadlines for the exam process. I expect that by the time the next instructor puts together a MOOC, she or he will be greeted by a nice, friendly Coursera interface with pulldown menus and boxes to tick — which probably will come as a great relief to any humanities professors reading this, who don’t have any programming in their background.

[By coincidence, Unix was the last operating system I had any programming proficiency in, but I did not need to know Unix to use Unix Time. I just used an online converter. Unix was developed in 1969 at AT&T Bell Laboratories in New Jersey. Hence the 1970 baseline.]
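For readers who would rather check that figure than trust an online converter, here is a minimal Python sketch. It relies on the standard convention that Unix Time counts seconds from midnight UTC on January 1, 1970. The fixed seven-hour offset for Pacific Daylight Time and the helper name `to_unix_time` are my own assumptions for illustration; this is not how the Coursera platform itself handled times.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Daylight Time modeled as a fixed offset of UTC-7,
# so the example does not depend on a time zone database.
PDT = timezone(timedelta(hours=-7), "PDT")


def to_unix_time(moment: datetime) -> int:
    """Return whole seconds elapsed since 1970-01-01 00:00:00 UTC.

    Hypothetical helper; `moment` must carry time zone information.
    """
    return int(moment.timestamp())


# 7:00AM PDT on October 22, 2012, the moment quoted in the post.
writing_time = datetime(2012, 10, 22, 7, 0, tzinfo=PDT)
print(to_unix_time(writing_time))  # 1350914400
```

Because the datetime carries its own offset, the result does not depend on the time zone of the machine that runs the code.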

In fact, time conversion issues in general turned out to be a continuing, major headache in a course with students all over the world. One thing we will not do again is have 12:00PM Stanford Time, aka Coursera Time (i.e., PDT), as any of the course deadlines. It might seem a nice clean stopping point, and there are all those memories of Gary Cooper’s deadline in the classic Western movie High Noon, but many students missed the deadline for the first submitted assignment because they thought 12:00PM meant midnight, which in some parts of the world made them a whole day late.

The arbitrary illogicality of the AM/PM distinction is not apparent to those of us who grew up with it. But my course TA and I are now very aware of the problems it can lead to! In future, we’ll stick to unambiguous times that stay away from noon and midnight. But even then, with local computer systems usually working on local time, to say nothing of the different Summer and Winter Times, which change on different dates around the world, timing events in MOOCs is going to remain a problematic issue, just as it is for international travelers and professionals who collaborate globally over Skype and other conferencing services. (When I used the Unix Time conversion app, I had to remember that Unix Time is counted from midnight UTC, which is currently seven hours ahead of California, not the three hours by which New Jersey is ahead when United Airlines flies me there. Confusing, isn’t it?)
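One way to sidestep the AM/PM trap is to announce each deadline on the 24-hour clock, with the date, in several reference zones at once. The Python sketch below illustrates the idea. The deadline date, the choice of zones, the fixed offsets (UTC-7 for PDT, UTC+5 for Karachi, UTC+9 for Tokyo), and the helper name `describe_deadline` are all assumptions made for the example, not details of the actual course setup.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for real time zone rules, which is
# only safe for a single known date.
PDT = timezone(timedelta(hours=-7), "PDT")
ZONES = {
    "UTC": timezone.utc,
    "Karachi": timezone(timedelta(hours=5), "PKT"),
    "Tokyo": timezone(timedelta(hours=9), "JST"),
}


def describe_deadline(deadline: datetime) -> list[str]:
    """Render one instant as date plus 24-hour time in each reference zone."""
    lines = []
    for place, tz in ZONES.items():
        local = deadline.astimezone(tz)
        lines.append(f"{place}: {local:%Y-%m-%d %H:%M}")
    return lines


# Hypothetical deadline: noon "Stanford Time" on October 22, 2012.
deadline = datetime(2012, 10, 22, 12, 0, tzinfo=PDT)
for line in describe_deadline(deadline):
    print(line)
# UTC: 2012-10-22 19:00
# Karachi: 2012-10-23 00:00
# Tokyo: 2012-10-23 04:00
```

Noon in California turns out to be exactly midnight at the start of the next day in Karachi, which is the kind of boundary case a student reading "12:00PM" can easily get wrong by a whole day.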

The reason why times are an issue in my course is that it is a course. At first glance, it may look little different from Khan Academy, where there are no time issues at all. But Khan Academy is really just an educational resource. (At least, that’s the part most people are familiar with and use, namely the video library that started it all. People use it as a video version of a textbook — or more precisely a video equivalent to that good old standby Cliffs Notes, which got many of us through an exam in an obligatory subject we were not particularly interested in.)

In contrast, in my case, as I’ve discussed earlier in this blog series (in particular, Part 6), my goal was to take a standard university course (one I’ve given many times over the years, at different universities, including Stanford) and make it available to anyone in the world, for free. To the degree I could make it happen, they would get the same learning experience.

That meant that the main goal would be to build a (short-lived) learning community. The video-recorded lectures and tutorials were simply tools to make that happen and to orchestrate events. Real learning takes place when students work on assignments on their own, when they repeatedly fail to solve a problem, and when they interact (with the professor and with one another) — not when they watch a lecture or read a book.

To achieve that goal, the MOOC would, as I stated in Part 6, involve admissions, lectures, peer interaction, professor interaction, problem-solving, assignments, exams, deadlines, and certification. To use the mnemonic I coined early on in this series, the basic design principle is WYSIWOSG: What You See Is What Our Students Get.

As we go forward, I intend to iterate on the course design, based on the data we collect from the students (and 64,000 students very definitely puts us into the Big Data realm). But my basic principle will remain that of offering a course, not the provision of a video library. And the reason for that should be obvious to anyone who has been following this blog series, as well as some of the posts on my other blogs Devlin’s Angle and profkeithdevlin.org. The focus is not on acquiring facts or mastering basic skills, but on learning to think a certain way (in my case, like a professional mathematician). And that requires both a lot of effort and (for most of us) a lot of interaction with others trying to achieve the same goal.

Our ancestors in the 11th Century started to develop what to this day remains the best way we know to achieve this at scale: the university, where people become members of a learning community in which learning takes place in a hothouse atmosphere that involves periods of intense interaction as deadlines loom, sustained by the rapidly formed social bonds that emerge as a result of that same pressure.

While I will likely experiment with variants of this model that allow for participation by students who have demanding, full-time jobs, I doubt I will abandon that basic model. It has lasted for a thousand years for a good reason. It works.

To be continued …

Final Lecture: MOOC Planning – Part 9

A real-time chronicle of a seasoned professor embarking on his first massively open online course.

I gave my last lecture of the course yesterday (discounting the tutorial session that will go out next week), and we are now starting a two week exam period.

“Giving” a lecture means the video becomes available for streaming. For logistic reasons (high among them, my survival and continued sanity — assuming anyone who organizes and gives a MOOC, for no payment, is sane), I recorded all the lectures weeks ago, well before the course started.  The weekly tutorial sessions come the closest to being live. I record them one or two days before posting, so I can use them to respond to issues raised in the online course discussion forum.

The initial course enrollment of 63,649 has dropped to 11,848 individuals that the platform says are still active on the site. At around 20%, that’s pretty high by current MOOC standards, though I don’t know whether that is something to be pleased about, since it’s not at all clear what the right definition of “success” is for a MOOC.

Some might argue that 20% completion indicates that the standards are too low. I don’t think that’s true for my course. Completion does, after all, simply mean that a student is still engaged. The degree to which they have mastered the material is unclear. So having 80% drop out could mean the standard is too high.

In my case, I did not set out to achieve any particular completion rate; rather I adopted a WYSIWOSG approach — “What You See Is What Our Students Get.” I offered a MOOC that is essentially the first half of a ten week course I’ve given at many universities over the years, including Stanford. That meant my students would experience a Stanford-level course. But they would not be subject to passing a Stanford-level exam.

In fact, I could not offer anything close to a Stanford-exam experience. There is a Final Exam, and it has some challenging questions, but it is not taken under controlled, supervised conditions. Moreover, since it involves constructing proofs, it cannot be machine graded, and thus has to be graded by other students, using a crowd sourcing method (Calibrated Peer Review). That put a significant limitation on the kinds of exam questions I could ask. On top of that, the grading is done by as many different people as there are students, and I assume most of them are not expert mathematicians. As a result, it’s at most a “better-than-nothing” solution. Would any of us want to be treated by a doctor whose final exam had been peer graded (only) by fellow students, even if the exam and the grading had been carried out under strictly controlled conditions?

On the other hand, looking at and attempting to evaluate the work of fellow students is a powerful learning experience, so if you view MOOCs as vehicles for learning, rather than a route to a qualification, then peer evaluation has a lot to be said for it. Traditional universities offer both learning and qualifications. MOOCs currently provide the former. Whether they eventually offer the latter as well remains to be seen. There are certainly ways it can be done, and that may be one way that MOOCs will make money. (Udacity already does offer a credentialing option, for a fee.)

In designing my course, I tried to optimize for learning in small groups, perhaps five to fifteen at a time. The goal was to build learning communities, within which students could help one another. Since there is no possibility of regular, direct interaction with the instructor (me) and my one TA (Paul), students have to seek help from fellow students. There is no other way. But, on its own, group work is not enough. Learning how to think mathematically (the focus of my course) requires feedback from others, but it needs to include feedback from people already expert in mathematical thinking. This means that, in order to truly succeed, not only do students need to work in groups (at least part of the time), and subject their attempts to the scrutiny of others, some of those interactions have to be with experts.

One original idea I had turned out not to work, though whether through the idea itself being flawed or the naive way we implemented it is not clear to me. That was to ask students at the start of the course to register if they had sufficient knowledge and experience with the course material to act as “Community TAs”, and be so designated in the discussion forums. Though over 600 signed up to play that role, many soon found they did not have sufficient knowledge to perform the task. Fortunately, a relatively small number of sign-ups did have the necessary background, as well as the interpersonal skills to give advice in a supporting, non-threatening way, and they more or less ensured that the forum discussions met the needs of many students (or so it seems).

Another idea was to assign students to study groups, and use an initial survey to try to identify those with some background knowledge and seed them into the groups. Unfortunately, Coursera does not (yet) have functionality to support the creation and running of groups, apart from the creation of forum threads. So instead, in my first lecture, I suggested to the students that they form their own study groups in whatever way they could.

The first place to do that was, of course, the discussion forums on the course website, which very soon listed several pages of groups. Some used the discussion forum itself to work together, while others migrated offsite to some other location, physical or virtual, with Skype seeming a common medium. Shortly after the course launched, several students discovered GetStudyRoom, a virtual meeting place dedicated to MOOCs, built by a small startup company.

In any event, students quickly found their own solutions. But with students forming groups in so many different ways on different media, there was no way to track how many remained active or how successful they have been.

The study groups listed on the course website show a wide variety of criteria used to bring the groups together. Nationality and location were popular, with groups such as Brazil Study Group, Grupo de Estudo Português, All Students From Asia, and Study Group for Students Located in Karachi, Pakistan. Then there were groups with a more specific focus, such as Musicians, Parents of Homeschooled Children, Older/Retired English Speakers Discussion for Assignment 1, and, two of my favorites, After 8pm (UK time) English speakers with a day job and the delightfully named Just Hanging on Study Group.

The forum has seen a lot of activity: 15,088 posts and 13,622 comments, spread across 2,712 different threads, viewed 430,769 times. Though I have been monitoring the forums on an almost daily basis, to maintain an overall sense of how the course is going, it’s clearly not possible to view everything. For the most part I restricted my attention to the posts that garnered a number of up-votes. Students vote posts up and down, and once a post shows 5 or more up-votes, I take that as an indication that the issue may be worth looking at.

The thread with the highest number of up-votes (165) was titled Deadlines way too short. Clearly, the question of deadlines was a hot topic. How, if at all, to respond to such feedback is no easy matter. In a course with tens of thousands of students, even a post with hundreds of up-votes represents just a tiny fraction of the class. Moreover, threads typically include opinions on both sides of an issue.

For instance, in threads about the pace of the course, some students complained that they did not have enough time to complete assignments, and pleaded for more relaxed deadlines, whereas others said they thrived on the pace, which stimulated them to keep on top of the material. For many, an ivy-league MOOC offers the first opportunity to experience an elite university course, and I think some are surprised at the level and pace. (In fact, I did keep the pace down for the first three weeks, but I also do that when I give a transition course in a regular setting, since I know how difficult it is to make that transition from high school math to university level mathematics.)

A common suggestion/request was to simply post the course materials online and let students access them according to their own schedules, much like Khan Academy. This raises a lot of issues about the nature of learning and the role MOOCs can (might? should?) play. But this blog post has already gone on long enough, so I’ll take up that issue next time.

To be continued …


I'm Dr. Keith Devlin, a mathematician at Stanford University. In fall 2012, I gave my first free, open, online math course and this spring I am giving my second. This blog chronicles my experiences as they happen.
