Posts Tagged 'calibrated peer review'

MathThink MOOC v4 – Part 6

In Part 6, I talk about the new Test Flight process.

In the past, when students enrolled for my MOOC, they essentially had three options. One was not to take it as a course at all, but just regard it as a resource to peruse over time or to pick and choose from. A second was to take the entire course, but do so on their own time-scale. Or they could take it as a course, and go through it at the designated pace.

Like many MOOC designers, I tried to make sure my course could be used in all three ways. Though the vast majority of MOOC students fall into the first category, the other two are the ones that require by far the greatest effort from the course designer. They are the learners who have significant ambitions and will put in a lot of effort over several weeks.

The students in the last category will surely gain the most. In particular, they move through the course in lockstep with a cohort of several thousand other students who can all learn from and support one another, as they face each course deadline at the same time. Those students form the core community that is the heart of the course.

When the new class enrolls at the start of February, the ones intending to take an entire course as scheduled will have a new choice. They can take what I am calling the Basic Course, which lasts eight weeks, or the Extended Course, which lasts ten. As I described in my last post, those extra two weeks are devoted to a process I am calling Test Flight.

In the previous two versions of the course, the final two weeks (nine and ten) were devoted to a Final Exam: one week for completion of the (open book) exam itself, the following week for peer evaluation. In peer evaluation, which started as soon as the class had completed and submitted their exam solutions, each student went through a number of activities:

1. Using a rubric I supplied, each student evaluated three completed examination scripts assembled by me, and then compared their results to mine. (Those three samples were selected by me to highlight particular features of evaluation that typically arise for those problems.)

2. Having thus had some initial practice at evaluation, each student then evaluated three examination scripts submitted by fellow students. (The Coursera platform randomly and anonymously distributed the completed papers.)

3. Each student then evaluated their own completed examination.

This was the system Coursera recommended, and for which they developed their peer evaluation module. (Actually, they suggested that each student evaluate five peer submissions, but at least for my course, that would have put a huge time requirement on the students, so I settled for three.)

Their original goal, and mine, was to provide a means for assigning course grades in a discipline where machine evaluation is not possible. The theory was that, if each student is evaluated by sufficiently many fellow students, each of whom had undergone an initial training period, then the final grade – computed from all the peer grades plus the self-grade – would be fairly reliable, and indeed there is research that supports this assumption. (Certainly, students who evaluate their own work immediately after evaluating that of other students tend to be very objective.)
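Neither Coursera nor I published the exact aggregation formula here, but purely as an illustration of the general idea, here is a minimal sketch of one plausible scheme. The function name, the use of the median, and the self-grade weighting are all my own invented details, not the formula actually used:

```python
from statistics import median

def final_grade(peer_grades, self_grade, self_weight=0.25):
    """Illustrative only -- not the formula Coursera actually used.

    The median of the peer grades is robust against a single overly
    harsh or overly generous evaluator; the self-grade is blended in
    with a fixed weight.
    """
    return (1 - self_weight) * median(peer_grades) + self_weight * self_grade

# Three peer evaluations plus a self-evaluation:
print(final_grade([18, 20, 15], 19))  # 18.25
```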

As far as I could tell, the system worked as intended. If the goal of a MOOC is to take a regular university course and make it widely available on the Internet, then my first three sessions of the course were acceptably successful. But MOOCifying my regular Mathematical Thinking (transition) class was always just my starting point.

Since I was aware from the outset that the MOOC version of my regular classroom course was just a two-dimensional shadow of the real thing, where I interact with my class on a regular basis and give them specific feedback on their work, my intention always was to iteratively develop the MOOC into something that takes maximum advantage of the medium to provide something new of value – whatever that turns out to be.

I expected that, as MOOCs evolve, they would over time come to be structured differently and be used in ways that could be very different from our original design goals. That, after all, is what almost always happens with any new product or technology.

One thing I observed was that, while students often began by feeling very nervous about the requirement that they evaluate the work of fellow students, and (justifiably) had significant doubts about being able to do a good job, the majority found the process of evaluating mathematical arguments both enjoyable and hugely beneficial to their learning.

Actually, I need to say a bit more about that “majority” claim. My only regular means of judging the reactions of the class to the various elements of the course was to read the postings on the course discussion forums. I spent at least an hour every day going through those forums, occasionally posting a response of my own, but mostly just reading.

Since the number of regular forum posters is in the hundreds, but the effective (full-term) class was in excess of 5,000 in each of the sessions, forum posters are, by virtue of being forum posters, not representative. Nevertheless, I had to proceed on the assumption that any issue or opinion that was shared (or voted up) by more than one or two forum posters was likely to reflect the views of a significant percentage of the entire (full-term) class.

Since I made gradual changes to the course based on that feedback, this means that over time, my course has been developing in a way that suits the more active forum posters. Arguably that is reasonable, since their level of activity suggests they are the ones most committed, and hence the ones whose needs and preferences the course should try to meet. Still, there are many uncertainties here.

To return to my point about the learning and comprehension benefits evaluators gained from analyzing the work of their peers, that did not come as a surprise. I had found it myself when, as a graduate student TA, I first had to evaluate students’ work. I had observed it in my students when I used peer evaluation in some of my regular classes. And I had read and heard a number of reports from other instructors who noted the same thing.

It was when I factored the learning benefits of evaluating mathematical arguments in with my ongoing frustration at the degree to which “grade hunting” kept getting in the way of learning that I finally decided to turn the whole exam part on its head.

While some universities and some instructors may set out to provide credentialing MOOCs, my goal was always to focus on the learning, drawing more on my knowledge of video games and video-game learning (see my blog profkeithdevlin.org) than on my familiarity with university education (see my Stanford homepage).

Most of what I know about giving a university-level course involves significant student-faculty interaction and interpersonal engagement, whereas a well-designed video game maintains the player’s attention and involvement using very different mechanisms. With a MOOC of necessity being absent any significant instructor-student interaction, I felt from the outset that the worlds of television and gaming would provide the key weapons I needed to create and maintain student attention in a MOOC.

[A lot of my understanding of how TV captures the viewer’s attention I learned from my close Stanford colleague, Prof Byron Reeves, who did a lot of the groundbreaking research in that area. He subsequently took his findings on television into the video game business, co-authoring the book Total Engagement: Using Games and Virtual Worlds to Change the Way People Work and Businesses Compete.]

So from the outset of my foray into the world of online education, I was looking to move away from traditional higher-education pedagogic models and structure, and towards what we know about (television and) video games, hopefully ending up with something of value in between.

The idea of awarding a Statement of Accomplishment based on accumulated grade points had to go sooner or later, and along with it the Final Exam. Hence, with Session Four, both will be gone. From now on, it is all about the experience – about trying (and failing!).

The intention for the upcoming session is that a student who completes the Basic Course will have learned enough to make useful and confident use of mathematical thinking in their work and in their daily lives. Completion of the Test Flight process in the Extended Course will (start to) prepare them for further study in mathematics or a mathematically-dependent discipline – or at least provide enough of a taste of university-level mathematics to help them decide if they want to pursue it further.

At heart, Test Flight is the original Final Exam process, but with a very different purpose, and accordingly structured differently.

As a course culmination activity, building on but separate from the earlier part of the course – and definitely not designed to evaluate what has been learned in the course – Test Flight has its own goal: to provide those taking part with a brief hands-on experience of “life as a mathematician.”

The students are asked to construct mathematical arguments to prove results, and then to evaluate other proofs of the same results. The format is just like that of the weekly Problem Sets that have run throughout the course, and performance level has no more or less significance.

The evaluation rubric, originally employed to try to guarantee accurate peer grading of the exam, has been modified to guide the evaluator in understanding what factors go into making a good mathematical argument.  (I made that change in the previous session.)

After the students have used the rubric to evaluate the three Problem Set solutions supplied by me, they view a video in which I evaluate the same submissions. Not because mine provides the “correct” evaluations. There is usually no single solution to a question and no such thing as the “right” one. Rather, I am providing examples, so they can compare their evaluations with mine.

After that, they then proceed to evaluate three randomly-assigned, anonymously-presented submissions from other students, and finally they evaluate their own submission.

Procedurally, it is essentially the same as the previous Final Exam. But the emphasis has been totally switched from a focus on the person being evaluated (who wants to be evaluated fairly, of course) to the individual doing the evaluation (where striving for a reliable evaluation is a tool to aid learning on the part of the evaluator).

Though I ran a complete trial of the process last time, the course structure was largely unchanged. In particular, there was still a Final Exam for which performance affected the grade, and hence the awarding of a certificate. As a consequence, although I observed enough to give me confidence the Test Flight process could be made to work, there was a square-peg-in-a-round-hole aspect in what I did then that caused some issues.

I am hoping (and expecting) things will go more smoothly next time. For sure, further adjustments will be required. But overall, I am happy with the way things are developing. I feel the course is moving in the general direction I wanted to go when I set out. I believe I (and the successive generations of students) are slowly getting there. I just don’t know where “there” is exactly, what “there” looks like, or how far in the future we’ll arrive.

As the man said, “To boldly go …”


Evaluation rubrics: the good, the bad, and the ugly

A real-time chronicle of a seasoned professor just about to launch the third edition of his massively open online course.

With the third session of my MOOC Introduction to Mathematical Thinking starting on September 2, I am busy putting the final touches to the course materials. As I did when I offered the second session earlier this year, I have made some changes to the way the course is structured. The underlying content remains the same, however – indeed at heart it has not changed since I first began teaching a high school to university “transition” course back in the late 1970s, when I was a young university lecturer just starting out on my career.

With the primary focus on helping students develop a new way of thinking, the course was always very light on “content” but heavy on internal reflection. A typical assignment question might require four or five minutes to write out the answer; but getting to the point where that is possible might take the student several hours of thought, sometimes days. Students who approach the course thinking it is an introductory course on logic – some of whom likely will, as they have in the past, post on the course forum that they cannot understand why I am proceeding so slowly and making such heavy weather of the material – will, if they don’t walk away in disgust, eventually (by about week four) realize they are completely lost. Habituated to courses that rush through a pile of material requiring mostly procedural mastery, they find it challenging, and in many cases impossible, to slow down and adopt the questioning, reflective approach this course requires.

My course uses elementary linguistics and formal logic as a vehicle to help develop new thinking skills that are essential for university mathematics majors, very valuable for STEM majors, and of considerable value for anyone who wants to lead a more rewarding life. But it is definitely not a course in linguistics or logic. It is about thinking.

Starting with an analysis of certain features of ordinary language, as I do, provides a starting point that is accessible to everyone – though because the language I examine is English, students for whom that is a second language are at a disadvantage. That is unavoidable. (A Spanish language version, embedded in Hispanic culture, is currently under development. I hope other deep translations follow.)

And formal logic is so simple and structured, and so accessible to a beginner, that it too is well suited to an introductory level course on analytic, and in particular mathematical, thinking.

Why my course videos are longer than most

The imperative that students devote substantial periods of time to sustained contemplation of the course material led me to make two decisions that go against the current grain in MOOCs. First, the pace is slow. I speak far more slowly than I normally do, and I repeat each point at least once, often more. Second, I do not break my “lectures” into the now-almost-obligatory no-longer-than-seven-and-ideally-under-three-minutes snippets. For the course’s second running, I did split the later videos, some an hour or more long, into half-hour sections, but that was to make it easier for students without fast broadband access, who have to download the videos overnight to watch them.

Of course, students can speed up or slow down the videos, they can watch them as many times as they want, and they can stop and start them to suit their schedules. But then they are in control and make those decisions based on their own progress and understanding. My course does not come pre-digested. It is slow cooking, not fast food.

Learning by evaluation

The main difference returning students will notice in the new session is the much greater emphasis on developing evaluation skills. Fairly early in the course, students will be presented with purported mathematical proofs that they have to evaluate according to a grading rubric.

At first these will be fairly short arguments, designed by me to illustrate various key features of proofs, and often incorporating common mistakes beginners make. Later on, the complexity increases. For those students who elect to take the final exam (and thereby become eligible to earn a Distinction grade for the course), evaluation will culminate in grading three randomly assigned, anonymized exam submissions from fellow students, followed by grading their own submission.

Peer evaluation is essential in MOOCs that involve work that cannot be machine graded, definitely the category into which my Mathematical Thinking course falls. The method I use for the Final Exam is called Calibrated Peer Review. It has a long history and a proven record of acceptable results. (I describe it in some detail on my MOOC course website – accessible to anyone who signs up for the course.) So adopting peer evaluation for my course was unavoidable.

The first time I offered the course, I delayed peer evaluation until the final couple of weeks, when it was restricted to the final exam. Though things went better than I had feared, there were problems. The main issues, which came as no surprise, were, first, that many students felt very uneasy grading the work of others, second, many of them did not do a good job, and third, the rubric (which I had taken off another university’s Internet shelf) did not work at all well.

On the other hand, many students posted forum comments saying they found they enjoyed that part of the course, and learned more in those final two weeks than in the entire earlier part of the course.

I had in fact expected this would be the case, and had told the class early on that many of them would have that reaction. In particular, evaluating the work of fellow students is a very powerful, known way to learn new material. Nevertheless, it came as a great relief when this actually transpired.

As a result of my experience in the first session, when I gave the course a second time this spring, I increased the number of assignment exercises that required students to evaluate purported proofs. I also altered the rubric to make it better suited to what I see as the main points in the course.

The outcome, as far as I could ascertain from reading the comments students posted on the course discussion forum, was that it went much better. But it was still far from perfect. The two main issues were the rubric itself and how to use it.

Designing a rubric

Designing a good rubric is not at all easy for any course, and I think it is particularly challenging for a course on more advanced parts of mathematics. Qualitative grading of mathematical arguments, like grading essays or works of art, is a holistic skill that takes years to acquire to the point where it can be used to evaluate performance with some reliability. A beginner attempting evaluation needs guidance, most typically provided by an evaluation rubric. The idea is to replace the holistic application of a lifetime’s acquisition of tacit domain knowledge with a number of categories that the evaluator should look for.

The more fine-grained the rubric, the easier it will be for the novice evaluator, but the more onerous the grading task becomes. The rubric I started with for my course had six factors, which I felt was about right – enough to make the task doable for the student, yet not so many as to turn it into a dull chore. I have retained that number. But, based on the experiences of students using the rubric, I changed several categories the first time I repeated the course, and I have changed one category for the upcoming third session.

In each of the six categories in the rubric, the student must choose among three levels, which I name Novice, Apprentice, and Practitioner. I chose the names to emphasize that we are using evaluation as a way to learn, and the focus is to measure progress along a path of development, not assign summative performance judgments of “poor”, “okay”, and “good”.

The intention in having just three levels is to force a student evaluator to make a decision about the work being assessed. But this can be particularly difficult for a beginner who is, of course, lacking in confidence in their ability to do that. To counter that, in this third session, when the student enters the numerical value that the course software will use to track progress, the numerical equivalents to those three categories are not 0, 1, 2, but 0, 2, and 4. The student can enter 1 or 3 as a “middle value” if they are undecided as to which category to assign.
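As a concrete illustration of the arithmetic (the names and data layout here are my own; the Coursera platform’s internals are certainly different), here is a minimal sketch in Python:

```python
# The three named levels and their numerical equivalents.
RUBRIC_LEVELS = {"Novice": 0, "Apprentice": 2, "Practitioner": 4}

def category_score(entry):
    """Score one rubric category on the 0-4 scale.

    An undecided evaluator may enter the in-between values
    1 (Novice/Apprentice) or 3 (Apprentice/Practitioner).
    """
    if entry in (1, 3):           # a "middle value", entered directly
        return entry
    return RUBRIC_LEVELS[entry]   # otherwise one of the named levels

def total_score(entries):
    """Six equally weighted categories, so 24 points maximum."""
    assert len(entries) == 6
    return sum(category_score(e) for e in entries)

# Example evaluation: strong on one category, undecided on two.
print(total_score(["Practitioner", 3, "Apprentice", "Apprentice", 1, "Novice"]))  # 12
```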

Using the rubric

Even with “middling” grades available for the rubric items, most students will find the evaluation process difficult and very time consuming. A rubric simply breaks a single evaluation task into a number of smaller evaluation tasks, six in my case. In so doing, it guides the student as to what things to look for, but the student still has to make qualitative judgments within each of the categories.

To help them make these judgments, the last time I gave the course I provided tutorial videos that take them through the grading process. I recorded myself grading the same sample arguments they had just attempted to evaluate, verbalizing my thinking process as I went, explaining why I made the calls I did. They are not the most riveting of videos, and they can be a bit long (ten minutes for some assignment questions). But I don’t know of any other way of conveying something of the expertise I have built up over a lifetime. It is essentially a modern implementation of the age-old apprentice system of acquiring tacit knowledge by working alongside the expert.

Unfortunately, as an expert, I make calls based on important distinctions that for me jump from the student’s page, but are not even remotely apparent to a beginner. The result last time was, for some questions, considerable frustration on the part of the students.

To try to mitigate this problem (I don’t think it can be eliminated), I changed some aspects of the way the rubric is formulated and described, and decided to introduce the entire evaluation notion much earlier in the course. The result is that evaluation is now a very central component of the course. Indeed, evaluating mathematical arguments now plays a role equal to constructing them.

If it goes well – and based on my previous experience with this course, I think it will go better than last time – I will almost certainly adopt a similar approach if and when I give the course in a traditional classroom setting once again. (A heavy travel schedule associated with running a research lab means I have not taught a regular undergraduate class for several years now, though an attractive offer to spend a term at Princeton early next year will give me a much welcomed opportunity to spend some time in the classroom once again.)

Evaluating to learn, not to grade

One feature of a MOOC – or at least a MOOC like mine that does not offer college credit – is that the focus is on learning, not acquiring a credential. Thus, grading can be used entirely for formative purposes, as a guide to progress, not to provide a summative measure of achievement. As an instructor, I find the separation of the teaching and the grading extremely freeing. For one thing, with the assignment of grades out of the picture, the relationship between teacher and student is changed significantly. Also, it means numerical grades can be used as useful indicators of progress. A grade of 35% can be given for a piece of work annotated as “good” (i.e., good for someone taking an introductory course for the first time). The number indicates how much improvement would be required to take the student to the level of an expert practitioner.

To be sure, students who encounter this use of grades for the first time find it takes some getting used to. They are so habituated to the (nonsensical but widespread) notion that anything less than an A is a “failure” that they can be very discouraged when their work earns them a “mere” 35%. But in order to function as a school-to-university transition course, it has to help them adjust to a world where 35% is often a respectable passing grade.

(A student who regularly scores in the 90% range in advanced undergraduate mathematics courses can likely jump straight into a Ph.D. program – and some have done just that. 35% really can be a good result for a beginner.)

One final point about peer evaluation is an issue I encountered last time that surprised me, though perhaps it should not have, given everything I know about a lot of high school mathematics instruction. Many students approached grading the work of others as a punitive process of looking to deduct points. Some went so far as to complain (sometimes angrily) on the discussion forums about my video-streamed grading as being far too lenient.

In fact, one or two even held the view that if a mathematical argument was not logically correct, the only possible grade to give was 0. This particular perspective worried me on two counts.

First, it assumes a degree of logical infallibility that no living mathematician possesses. I doubt there is a single published mathematical proof of more than a few paragraphs that does not include some minor logical slips, and hence is technically incorrect. (Most of the geometric proofs in Euclid’s Elements would score 0 if logical correctness were the sole metric!)

Second, my course is not a mathematics course, it is about mathematical thinking, and has the clearly stated aim of looking at the many different aspects of mathematical arguments required to make them “good.” Logical correctness is just one item on that six-point rubric. As a result, at most 4 of the possible 24 points available can be deducted if an argument is logically incorrect. (Actually, 8 can be deducted, as the final category is “Overall assessment”, designed to encourage precisely what the phrase suggests.)

To be sure, if my course were a mathematics course, I would assign greater weight to logical correctness. As it is, all six categories carry equal weight. But that is deliberate. Most of my students’ entire mathematical education has been in a world where “getting the right answer” is the holy grail. One other objective of transition courses is to break them of that debilitating default assumption.

Finally, and remember, this is for posterity, so be honest. How do you feel?

I’ve written elsewhere that I think MOOCs as such will not be the cause of a revolution in higher education. Rather they are just part of what is more likely to be an evolution, though a major one to be sure. From the point of view of an instructor, though, they are providing us with a wonderful domain in which to re-examine all of our assumptions about how to teach and how students learn. As you can surely tell, I continue to have a blast in the MOOCasphere.

To be continued …

Peer grading: inventing the light bulb

A real-time chronicle of a seasoned professor who has just completed giving his first massively open online course.

With the deadline for submitting the final exam in my MOOC having now passed, the students are engaging in the Peer Evaluation process. I know of just two cases where this has been tried in a genuine MOOC (where the M means what it says), one in Computer Science, the other in Humanities, and both encountered enormous difficulties, and as a result a lot of student frustration. My case was no different.

Anticipating problems, I had given the class a much simplified version of the process – with no grade points at stake – at the end of Week 4, so they could familiarize themselves with the process and the platform mechanics before they had to do it for real. That might have helped, but the real difficulties only emerged when 1,520 exam scripts started to make their way through the system.

By then the instructional part of the course was over. The class had seen and worked through all the material in the curriculum, and had completed five machine-graded problem sets. Consequently, there were enough data in the system to award certificates fairly if we had to abandon the peer evaluation process as a grading device, as happened for that humanities MOOC I mentioned, where the professor decided on the fly to make that part of the exam optional. So I was able to sleep at night. But only just.

With over 1,000 of the students now engaged in the peer review process, and three days left to the deadline for completing grading, I am inclined to see the whole thing through to the (bitter) end. We need the data that this first trial will produce so we can figure out how to make it work better next time.

Long before the course launched, I felt sure that there were two things we would need to accomplish, and accomplish well, in order to make a (conceptual, proof-oriented) advanced math MOOC work: the establishment of (and data gathering from) small study groups in which students could help one another, and the provision of a crowd-sourced evaluation and grading system.

When I put my course together, the Coursera platform supported neither. They were working on a calibrated peer review module, but implementing the group interaction side was still in the future. (The user-base growth of Coursera has been so phenomenal, it’s a wonder they can keep the system running at all!)

Thus, when my course launched, there was no grouping system, nor indeed any social media functionality other than the common discussion forums. So the students had to form their own groups using whatever media they could: Facebook, Skype, Google Groups, Google Docs, or even the local pub, bar, or coffee shop for co-located groups. Those probably worked out fine, but since they were outside our platform, we had no way to monitor the activity – an essential functionality if we are to turn this initial, experimental phase of MOOCs  into something robust and useful in the long term.

Coursera had built a beta-release, peer evaluation system for a course on Human Computer Interaction, given by a Stanford colleague of mine. But his needs were different from mine, so the platform module needed more work – more work than there was really time for! In my last post, I described some of the things I had to cope with to get my exam up and running. (To be honest, I like the atmosphere of working in startup mode, but even in Silicon Valley there are still only 24 hours in a day.)

It’s important to remember that the first wave of MOOCs in the current, explosive, growth period all came out of computer science departments, first at Stanford, then at MIT. But CS is an atypical case when it comes to online learning. Although many aspects of computer science involve qualitative judgments and conceptual reasoning, the core parts of the subject are highly procedural, and lend themselves to instruction-based learning and to machine evaluation and grading. (“Is that piece of code correct?” Just see if it runs as intended.)

The core notion in university level mathematics, however, is the proof. But you can’t learn how to prove something by being told or shown how to do it any more than you can learn how to ride a bike by being told or shown. You have to try for yourself, and keep trying, and falling, until it finally clicks. Moreover, apart from some very special, and atypical, simple cases, proofs cannot be machine graded. In that regard, they are more like essays than calculations. Indeed, one of the things I told my students was that a good proof is a story that explains why something is the case.

Feedback from others struggling to master abstract concepts and proofs can help enormously. Study groups can provide that, along with the psychological stimulus of knowing that others are having just as much difficulty as you are. Since companies like Facebook have shown us how to build platforms that support the creation of groups, that part can be provided online. And when Coursera is able to devote resources to doing it, I know it will work just fine. (If they want to, they can simply hire some engineers from Facebook, which is little more than a mile away. I gather that, like Google before it, the fun period there has long since passed and fully vested employees are looking to move.)

The other issue, that of evaluation and grading, is more tricky. The traditional solution is for the professor to evaluate and grade the class, perhaps assisted by one or more TAs (Teaching Assistants). But for classes that number in the tens of thousands, that is clearly out of the question. Though it’s tempting to dream about building a Wikipedia-like community of dedicated, math-PhD-bearing volunteers, who will participate in a mathematical MOOC whenever it is offered – indeed I do dream about it – it would take time to build up such a community, and what’s more, it’s hard to see there being enough qualified volunteers to handle the many different math MOOCs that will soon be offered by different instructors. (In contrast, there is just one Wikipedia, of course.)

That leaves just one solution: peer grading, where all the students in the class, or at least a significant portion thereof, are given the task of grading the work of their peers. In other words, we have to make this work. And to do that, we have to take the first step. I just did.

Knowing just how many unknowns we were dealing with, my expectations were not high, and I tried to prepare the students for what could well turn out to be chaos. (It did.) The website description of the exam grading system was littered with my cautions and references to being “live beta”. On October 15, when the test run without the grading part was about to launch, I posted yet one more cautionary note on the main course announcements page:

… using the Calibrated Peer Review System for a course like this is, I believe, new. (It’s certainly new to me and my assistants!) So this is all very much experimental. Please approach it in that spirit!

Even so, many of the students were taken aback by just how clunky and buggy the thing was, and the forums sprang alive with exasperated flames. I took solace in the recent release of Apple Maps on the iPhone, which showed that even with the resources and expert personnel available to one of the world’s wealthiest companies, product launches can go badly wrong – and we were just one guy and two part-time, volunteer student assistants, working on a platform being built under us by a small startup company sustained on free Coke and stock options. (I’m guessing the part about the Coke and the options, but that is the prevalent Silicon Valley model.)

At which point, one of those oh-so-timely events occurred that are often described as “Acts of God.” Just when I worried that I was about to witness, and be responsible for starting, the first global, massive open online riot (MOOR) in a math class, Hurricane Sandy struck the Eastern Seaboard, reminding everyone that a clunky system for grading math exams is not the worst thing in the world. Calm, reasoned, steadying, constructive posts started to appear on the forum.  I was getting my feedback after all. The world was a good place once again.

Failure (meaning things don’t go smoothly, or maybe don’t work at all) doesn’t bother me. If it did, I’d never have become a mathematician, a profession in which the failure rate in first attempts to solve a problem is somewhere north of 95%. The important thing is to get enough data to increase the chances of getting it right – or far more likely, just getting it better – the second time round. Give me enough feedback, and I count that “failure” as a success.

As Edison is said to have replied to a young reporter about his many failed attempts to construct a light bulb, “Why would I ever give up? I now know definitively over 9,000 ways that an electric light bulb will not work. Success is almost in my grasp.” (Edison supposedly failed a further 1,000 times before he got it right. Please don’t tell my students that. We are just at failure 1.)

If there is one piece of advice I’d give to anyone about to give their first MOOC, it’s this: remember Edison.

To be continued …

It’s About Time (in Part): MOOC Planning – Part 10

 A real-time chronicle of a seasoned professor embarking on his first massively open online course.

Well, lectures have ended and the course has now switched gears. For those still left in the course (17% of the final enrollment total of 64,045), the next two weeks are focused on trying to make sense of everything they have learned, and working on the final exam — which in the case of my course involves peer evaluation.

Calibrated Peer Review is not new. A study of its use in the high school system by Sadler and Good, published in 2006, has become compulsory reading for those of us planning and giving MOOCs that cover material that cannot be machine graded. [If you want to see how I am using it, just enroll in the class and read the description of the “Peer Review system”. There is no obligation to do anything more than browse around the site! No one will know you are not simply a dog that can use a computer.]

As I was working on my course, Coursera was still frantically building out their platform to support peer evaluation. There was a lot of just-in-time construction. It’s been a long time since I’ve had to go behind a user-friendly interface and dig into the underlying code to do something on a computer, and the programming languages have all changed since I last did that.

One thing I had to learn was one of the ways networked computers keep time. I now know that at the time of writing these words, 7:00AM Pacific Daylight Time on October 22, 2012, exactly 1,350,914,400 seconds have elapsed since the first second of January 1st, 1970, UTC. That was the start of Unix Time.

I needed to learn to work in Unix Time in order to set the various opening times and completion deadlines for the exam process. I expect that by the time the next instructor puts together a MOOC, she or he will be greeted by a nice, friendly Coursera interface with pulldown menus and boxes to tick — which probably will come as a great relief to any humanities professors reading this, who don’t have any programming in their background.

[By coincidence, Unix was the last system I had any programming proficiency in, but I did not need to know Unix to use Unix Time. I just used an online converter. Unix was developed in 1969 at AT&T Bell Laboratories in New Jersey; hence the 1970 baseline (though the epoch is specified in UTC, not US Eastern time, despite Unix’s New Jersey origins).]
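For anyone curious, the conversion is just a few lines in a modern language. A minimal sketch in Python, using the standard-library zoneinfo module (which did not exist back then; in 2012 I used an online converter):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# 7:00 AM Pacific Daylight Time, October 22, 2012.
moment = datetime(2012, 10, 22, 7, 0, tzinfo=ZoneInfo("America/Los_Angeles"))

# Unix time: seconds elapsed since 1970-01-01 00:00:00 UTC.
print(int(moment.timestamp()))  # 1350914400
```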

In fact, time conversion issues in general turned out to be a  continuing, major headache in a course with students all over the world. One thing we will not do again is have 12:00PM Stanford Time, aka Coursera Time (i.e., PDT), as any of the course deadlines. It might seem a nice clean stopping point, and there are all those memories of Gary Cooper’s deadline in the classic Western movie High Noon, but many students missed the deadline for the first submitted assignment because they thought 12:00PM meant midnight, which in some parts of the world made them a whole day late.

The arbitrary illogicality of the AM/PM distinction is not apparent to those of us who grew up with it. But my course TA and I are now very aware of the problems it can lead to! In future, we’ll stick to unambiguous times that stay away from noon and midnight. But even then, with local computer systems usually working on local time, to say nothing of the different Summer and Winter Times, which change on different dates around the world, timing events in MOOCs is going to remain a problematic issue, just as it is for international travelers and professionals who collaborate globally over Skype and other conferencing services. (When I used the Unix Time conversion app, I had to remember that it displayed times in EST, which right now is just two hours ahead of California’s PDT, not the three hours United Airlines uses when it flies me to New Jersey, currently on EDT. Confusing, isn’t it?)
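The defensive habit we settled on can be sketched in a few lines: store each deadline as a single UTC instant, deliberately away from noon and midnight, and display it to each student in their own zone on a 24-hour clock. (The zone list below is just an example.)

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One unambiguous instant, stored in UTC.
deadline = datetime(2012, 10, 23, 4, 0, tzinfo=timezone.utc)

for zone in ("America/Los_Angeles", "Europe/London", "Asia/Karachi"):
    local = deadline.astimezone(ZoneInfo(zone))
    print(f"{zone}: {local:%a %Y-%m-%d %H:%M}")  # 24-hour clock, no AM/PM
```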

The reason why times are an issue in my course is that it is a course. At first glance, it may look little different from Khan Academy, where there are no time issues at all. But Khan Academy is really just an educational resource. (At least, that’s the part most people are familiar with and use, namely the video library that started it all. People use it as a video version of a textbook — or more precisely a video equivalent to that good old standby Cliffs Notes, which got many of us through an exam in an obligatory subject we were not particularly interested in.)

In contrast, in my case, as I’ve discussed earlier in this blog series (in particular, Part 6), my goal was to take a standard university course (one I’ve given many times over the years, at different universities, including Stanford) and make it available to anyone in the world, for free. To the degree I could make it happen, they would get the same learning experience.

That meant that the main goal would be to build a (short-lived) learning community. The video-recorded lectures and tutorials were simply tools to make that happen and to orchestrate events. Real learning takes place when students work on assignments on their own, when they repeatedly fail to solve a problem, and when they interact (with the professor and with one another) — not when they watch a lecture or read a book.

To achieve that goal, the MOOC would, as I stated in Part 6, involve admissions, lectures, peer interaction, professor interaction, problem-solving, assignments, exams, deadlines, and certification. To use the mnemonic I coined early on in this series, the basic design principle is WYSIWOSG: What You See Is What Our Students Get.

As we go forward, I intend to iterate on the course design, based on the data we collect from the students (and 64,000 students very definitely puts us into the Big Data realm). But my basic principle will remain that of offering a course, not the provision of a video library. And the reason for that should be obvious to anyone who has been following this blog series, as well as some of the posts on my other blogs Devlin’s Angle and profkeithdevlin.org. The focus is not on acquiring facts or mastering basic skills, but on learning to think a certain way (in my case, like a professional mathematician). And that requires both a lot of effort and (for most of us) a lot of interaction with others trying to achieve the same goal.

Our ancestors in the 11th Century started to develop what to this day remains the best way we know to achieve this at scale: the university, where people become members of a learning community in which learning takes place in a hothouse atmosphere that involves periods of intense interaction as deadlines loom, sustained by the rapidly formed social bonds that emerge as a result of that same pressure.

While I will likely experiment with variants of this model that allow for participation by students who have demanding, full-time jobs, I doubt I will abandon that basic model. It has lasted for a thousand years for a good reason. It works.

To be continued …

Final Lecture: MOOC Planning – Part 9

A real-time chronicle of a seasoned professor embarking on his first massively open online course.

I gave my last lecture of the course yesterday (discounting the tutorial session that will go out next week), and we are now starting a two week exam period.

“Giving” a lecture means the video becomes available for streaming. For logistic reasons (high among them, my survival and continued sanity — assuming anyone who organizes and gives a MOOC, for no payment, is sane), I recorded all the lectures weeks ago, well before the course started.  The weekly tutorial sessions come the closest to being live. I record them one or two days before posting, so I can use them to respond to issues raised in the online course discussion forum.

The initial course enrollment of 63,649 has dropped to 11,848 individuals that the platform says are still active on the site. At around 20%, that’s pretty high by current MOOC standards, though I don’t know whether that is something to be pleased about, since  it’s not at all clear what the right definition of “success” is for a MOOC.

Some might argue that 20% completion indicates that the standards are too low. I don’t think that’s true for my course. Completion does, after all, simply mean that a student is still engaged. The degree to which they have mastered the material is unclear. So having 80% drop out could mean the standard is too high.

In my case, I did not set out to achieve any particular completion rate; rather I adopted a WYSIWOSG approach — “What You See Is What Our Students Get.” I offered a MOOC that is essentially the first half of a ten week course I’ve given at many universities over the years, including Stanford. That meant my students would experience a Stanford-level course. But they would not be subject to passing a Stanford-level exam.

In fact, I could not offer anything close to a Stanford-exam experience. There is a Final Exam, and it has some challenging questions, but it is not taken under controlled, supervised conditions. Moreover,  since it involves constructing proofs, it cannot be machine graded, and thus has to be graded by other students, using a crowd sourcing method (Calibrated Peer Review). That put a significant limitation on the kinds of exam questions I could ask. On top of that, the grading is done by as many different people as there are students, and I assume most of them are not expert mathematicians. As a result, it’s at most a “better-than-nothing” solution. Would any of us want to be treated by a doctor whose final exam had been peer graded (only) by fellow students, even if the exam and the grading had been carried out under strictly controlled conditions?

On the other hand, looking at and attempting to evaluate the work of fellow students is a powerful learning experience, so if you view MOOCs as vehicles for learning, rather than a route to a qualification, then peer evaluation has a lot to be said for it. Traditional universities offer both learning and qualifications. MOOCs currently provide the former. Whether they eventually offer the latter as well remains to be seen. There are certainly ways it can be done, and that may be one way that MOOCs will make money. (Udacity already does offer a credentialing option, for a fee.)

In designing my course, I tried to optimize for learning in small groups, perhaps five to fifteen at a time. The goal was to build learning communities, within which students could help one another. Since there is no possibility of regular, direct interaction with the instructor (me) or my one TA (Paul), students have to seek help from fellow students. There is no other way. But, on its own, group work is not enough. Learning how to think mathematically (the focus of my course) requires feedback from others, and it needs to include feedback from people already expert in mathematical thinking. This means that, in order to truly succeed, not only do students need to work in groups (at least part of the time) and subject their attempts to the scrutiny of others, but some of those interactions have to be with experts.

One original idea I had turned out not to work, though whether the fault lay in the idea itself or in the naive way we implemented it is not clear to me. That was to ask students at the start of the course to register if they had sufficient knowledge and experience with the course material to act as “Community TAs”, and be so designated in the discussion forums. Though over 600 signed up to play that role, many soon found they did not have sufficient knowledge to perform the task. Fortunately, a relatively small number of sign-ups did have the necessary background, as well as the interpersonal skills to give advice in a supportive, non-threatening way, and they more or less ensured that the forum discussions met the needs of many students (or so it seems).

Another idea was to assign students to study groups, and use an initial survey to try to identify those with some background knowledge and seed them into the groups. Unfortunately, Coursera does not (yet) have functionality to support the creation and running of groups, apart from the creation of forum threads. So instead, in my first lecture, I suggested to the students that they form their own study groups in whatever way they could.

The first place to do that was, of course, the discussion forums on the course website, which very soon listed several pages of groups. Some used the discussion forum itself to work together, while others migrated offsite to some other location, physical or virtual, with Skype seeming a common medium. Shortly after the course launched, several students discovered GetStudyRoom, a virtual meeting place dedicated to MOOCs, built by a small startup company.

In any event, students quickly found their own solutions. But with students forming groups in so many different ways on different media, there was no way to track how many remained active or how successful they have been.

The study groups listed on the course website show a wide variety of criteria used to bring the groups together. Nationality and location were popular, with groups such as Brazil Study Group, Grupo de Estudo Português, All Students From Asia, and Study Group for Students Located in Karachi, Pakistan. Then there were groups with a more specific focus, such as Musicians, Parents of Homeschooled Children, Older/Retired English Speakers Discussion for Assignment 1, and, two of my favorites, After 8pm (UK time) English speakers with a day job and the delightfully named Just Hanging on Study Group.

The forum has seen a lot of activity: 15,088 posts and 13,622 comments, spread across 2,712 different threads, viewed 430,769 times. Though I have been monitoring the forums on an almost daily basis, to maintain an overall sense of how the course is going, it’s clearly not possible to view everything. For the most part I restricted my attention to the posts that garnered a number of up-votes. Students vote posts up and down, and once a post shows 5 or more up-votes, I take that as an indication that the issue may be worth looking at.

The thread with the highest number of up-votes (165) was titled Deadlines way too short. Clearly, the question of deadlines was a hot topic. How, if at all, to respond to such feedback is no easy matter. In a course with tens of thousands of students, even a post with hundreds of up-votes represents just a tiny fraction of the class. Moreover, threads typically include opinions on both sides of an issue.

For instance, in threads about the pace of the course, some students complained that they did not have enough time to complete assignments, and pleaded for more relaxed deadlines, whereas others said they thrived on the pace, which stimulated them to keep on top of the material. For many, an ivy-league MOOC offers the first opportunity to experience an elite university course, and I think some are surprised at the level and pace. (In fact, I did keep the pace down for the first three weeks, but I also do that when I give a transition course in a regular setting, since I know how difficult it is to make that transition from high school math to university level mathematics.)

A common suggestion/request was to simply post the course materials online and let students access them according to their own schedules, much like Khan Academy. This raises a lot of issues about the nature of learning and the role MOOCs can (might? should?) play. But this blog post has already gone on long enough, so I’ll take up that issue next time.

To be continued …


I'm Dr. Keith Devlin, a mathematician at Stanford University. I gave my first free, open, online math course in fall 2012, and have been offering it twice a year since then. This blog chronicles my experiences as they happen.
