Posts Tagged 'calibrated peer review'

MathThink MOOC v4 – Part 6

In Part 6, I talk about the new Test Flight process.

In the past, when students enrolled for my MOOC, they essentially had three options. One was not to take it as a course at all, but just regard it as a resource to peruse over time or to pick and choose from. A second was to take the entire course, but do so on their own time-scale. Or they could take it as a course, and go through it at the designated pace.

Like many MOOC designers, I tried to make sure my course could be used in all three ways. Though the vast majority of MOOC students fall into the first category, the other two are the ones that require by far the greatest effort from the course designer. They are the learners who have significant ambitions and will put in a lot of effort over several weeks.

The students in the last category will surely gain the most. In particular, they move through the course in lockstep with a cohort of several thousand other students who can all learn from and support one another, as they face each course deadline at the same time. Those students form the core community that is the heart of the course.

When the new class enrolls at the start of February, the ones intending to take an entire course as scheduled will have a new choice. They can take what I am calling the Basic Course, which lasts eight weeks, or the Extended Course, which lasts ten. As I described in my last post, those extra two weeks are devoted to a process I am calling Test Flight.

In the previous two versions of the course, the final two weeks (nine and ten) were devoted to a Final Exam: one week for completion of the (open book) exam itself, the following week for peer evaluation. In peer evaluation, which started as soon as the class had completed and submitted their exam solutions, each student went through a number of activities:

1. Using a rubric I supplied, each student evaluated three completed examination scripts assembled by me, and then compared their results to mine. (Those three samples were selected by me to highlight particular features of evaluation that typically arise for those problems.)

2. Having thus had some initial practice at evaluation, each student then evaluated three examination scripts submitted by fellow students. (The Coursera platform randomly and anonymously distributed the completed papers.)

3. Each student then evaluated their own completed examination.

This was the system Coursera recommended, and for which they developed their peer evaluation module. (Actually, they suggested that each student evaluate five peer submissions, but at least for my course, that would have put a huge time requirement on the students, so I settled for three.)

Their original goal, and mine, was to provide a means for assigning course grades in a discipline where machine evaluation is not possible. The theory was that, if each student is evaluated by sufficiently many fellow students, each of whom had undergone an initial training period, then the final grade – computed from all the peer grades plus the self-grade – would be fairly reliable, and indeed there is research that supports this assumption. (Certainly, students who evaluate their own work immediately after evaluating that of other students tend to be very objective.)
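Neither Coursera nor I published the exact aggregation formula here, but purely as an illustration of the general idea, here is a minimal sketch of one plausible scheme. The function name, the use of the median, and the self-grade weighting are all my own invented details, not the formula actually used:

```python
from statistics import median

def final_grade(peer_grades, self_grade, self_weight=0.25):
    """Illustrative only -- not the formula Coursera actually used.

    The median of the peer grades is robust against a single overly
    harsh or overly generous evaluator; the self-grade is blended in
    with a fixed weight.
    """
    return (1 - self_weight) * median(peer_grades) + self_weight * self_grade

# Three peer evaluations plus a self-evaluation:
print(final_grade([18, 20, 15], 19))  # 18.25
```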

As far as I could tell, the system worked as intended. If the goal of a MOOC is to take a regular university course and make it widely available on the Internet, then my first three sessions of the course were acceptably successful. But MOOCifying my regular Mathematical Thinking (transition) class was always just my starting point.

Since I was aware from the outset that the MOOC version of my regular classroom course was just a two-dimensional shadow of the real thing, where I interact with my class on a regular basis and give them specific feedback on their work, my intention always was to iteratively develop the MOOC into something that takes maximum advantage of the medium to provide something new of value – whatever that turns out to be.

I expected that, as MOOCs evolve, they would over time come to be structured differently and be used in ways that could be very different from our original design goals. That, after all, is what almost always happens with any new product or technology.

One thing I observed was that, while students often began by feeling very nervous about the requirement that they evaluate the work of fellow students, and (justifiably) had significant doubts about being able to do a good job, the majority found the process of evaluating mathematical arguments both enjoyable and hugely beneficial to their learning.

Actually, I need to say a bit more about that “majority” claim. My only regular means of judging the reactions of the class to the various elements of the course was to read the postings on the course discussion forums. I spent at least an hour every day going through those forums, occasionally posting a response of my own, but mostly just reading.

Since the number of regular forum posters is in the hundreds, but the effective (full-term) class was in excess of 5,000 in each of the sessions, forum posters are, by virtue of being forum posters, not representative. Nevertheless, I had to proceed on the assumption that any issue or opinion that was shared (or voted up) by more than one or two forum posters was likely to reflect the views of a significant percentage of the entire (full-term) class.

Since I made gradual changes to the course based on that feedback, this means that over time, my course has been developing in a way that suits the more active forum posters. Arguably that is reasonable, since their level of activity suggests they are the ones most committed, and hence the ones whose needs and preferences the course should try to meet. Still, there are many uncertainties here.

To return to my point about the learning and comprehension benefits evaluators gained from analyzing the work of their peers, that did not come as a surprise. I had found it myself when, as a graduate student TA, I first had to evaluate students’ work. I had observed it in my students when I used peer evaluation in some of my regular classes. And I had read and heard a number of reports from other instructors who noted the same thing.

It was when I factored the learning benefits of evaluating mathematical arguments in with my ongoing frustration at the degree to which “grade hunting” kept getting in the way of learning that I finally decided to turn the whole exam part on its head.

While some universities and some instructors may set out to provide credentialing MOOCs, my goal was always to focus on the learning, drawing more on my knowledge of video games and video-game learning (see my blog profkeithdevlin.org) than on my familiarity with university education (see my Stanford homepage).

Most of what I know about giving a university-level course involves significant student-faculty interaction and interpersonal engagement, whereas a well-designed video game maintains the player’s attention and involvement using very different mechanisms. With a MOOC of necessity being absent any significant instructor-student interaction, I felt from the outset that the worlds of television and gaming would provide the key weapons I needed to create and maintain student attention in a MOOC.

[A lot of my understanding of how TV captures the viewer’s attention I learned from my close Stanford colleague, Prof Byron Reeves, who did a lot of the groundbreaking research in that area. He subsequently took his findings on television into the video game business, co-authoring the book Total Engagement: Using Games and Virtual Worlds to Change the Way People Work and Businesses Compete.]

So from the outset of my foray into the world of online education, I was looking to move away from traditional higher-education pedagogic models and structure, and towards what we know about (television and) video games, hopefully ending up with something of value in between.

The idea of awarding a Statement of Accomplishment based on accumulated grade points had to go sooner or later, and along with it the Final Exam. Hence, with Session Four, both will be gone. From now on, it is all about the experience – about trying (and failing!).

The intention for the upcoming session is that a student who completes the Basic Course will have learned enough to make useful and confident use of mathematical thinking in their work and in their daily lives. Completion of the Test Flight process in the Extended Course will (start to) prepare them for further study in mathematics or a mathematically-dependent discipline – or at least provide enough of a taste of university-level mathematics to help them decide if they want to pursue it further.

At heart, Test Flight is the original Final Exam process, but with a very different purpose, and accordingly structured differently.

As a course culmination activity, building on but separate from the earlier part of the course – and definitely not designed to evaluate what has been learned in the course – Test Flight has its own goal: to provide those taking part with a brief hands-on experience of “life as a mathematician.”

The students are asked to construct mathematical arguments to prove results, and then to evaluate other proofs of the same results. The format is just like that of the weekly Problem Sets that have run throughout the course, and performance level has no more or less significance.

The evaluation rubric, originally employed to try to guarantee accurate peer grading of the exam, has been modified to guide the evaluator in understanding what factors go into making a good mathematical argument.  (I made that change in the previous session.)

After the students have used the rubric to evaluate the three Problem Set solutions supplied by me, they view a video in which I evaluate the same submissions. Not because mine provides the “correct” evaluations. There is usually no single solution to a question and no such thing as the “right” one. Rather, I am providing examples, so they can compare their evaluations with mine.

After that, they then proceed to evaluate three randomly-assigned, anonymously-presented submissions from other students, and finally they evaluate their own submission.

Procedurally, it is essentially the same as the previous Final Exam. But the emphasis has been totally switched from a focus on the person being evaluated (who wants to be evaluated fairly, of course) to the individual doing the evaluation (where striving for a reliable evaluation is a tool to aid learning on the part of the evaluator).

Though I ran a complete trial of the process last time, the course structure was largely unchanged. In particular, there was still a Final Exam for which performance affected the grade, and hence the awarding of a certificate. As a consequence, although I observed enough to give me confidence the Test Flight process could be made to work, there was a square-peg-in-a-round-hole aspect in what I did then that caused some issues.

I am hoping (and expecting) things will go more smoothly next time. For sure, further adjustments will be required. But overall, I am happy with the way things are developing. I feel the course is moving in the general direction I wanted to go when I set out. I believe I (and the successive generations of students) are slowly getting there. I just don’t know where “there” is exactly, what “there” looks like, or how far in the future we’ll arrive.

As the man said, “To boldly go …”


Evaluation rubrics: the good, the bad, and the ugly

A real-time chronicle of a seasoned professor just about to launch the third edition of his massively open online course.

With the third session of my MOOC Introduction to Mathematical Thinking starting on September 2, I am busy putting the final touches to the course materials. As I did when I offered the second session earlier this year, I have made some changes to the way the course is structured. The underlying content remains the same, however – indeed at heart it has not changed since I first began teaching a high school to university “transition” course back in the late 1970s, when I was a young university lecturer just starting out on my career.

With the primary focus on helping students develop a new way of thinking, the course was always very light on “content” but heavy on internal reflection. A typical assignment question might require four or five minutes to write out the answer; but getting to the point where that is possible might take the student several hours of thought, sometimes days. Students who approach the course thinking it is an introductory course on logic – some of whom likely will, as they have in the past, post on the course forum that they cannot understand why I am proceeding so slowly and making such heavy weather of the material – will, if they don’t walk away in disgust, eventually (by about week four) realize they are completely lost. Habituated to courses that rush through a pile of material requiring mostly procedural mastery, they find it challenging, and in many cases impossible, to slow down and adopt the questioning, reflective approach this course requires.

My course uses elementary linguistics and formal logic as a vehicle to help develop new thinking skills that are essential for university mathematics majors, very valuable for STEM majors, and of considerable value for anyone who wants to lead a more rewarding life. But it is definitely not a course in linguistics or logic. It is about thinking.

Starting with an analysis of certain features of ordinary language, as I do, provides a starting point that is accessible to everyone – though because the language I examine is English, students for whom that is a second language are at a disadvantage. That is unavoidable. (A Spanish language version, embedded in Hispanic culture, is currently under development. I hope other deep translations follow.)

And formal logic is so simple and structured, and so accessible to a beginner, that it too is well suited to an introductory level course on analytic, and in particular mathematical, thinking.

Why my course videos are longer than most

The imperative that students devote substantial periods of time to sustained contemplation of the course material led me to make two decisions that go against the current grain in MOOCs. First, the pace is slow. I speak far more slowly than I normally do, and I repeat each point at least once, often more. Second, I do not break my “lectures” into the now-almost-obligatory no-longer-than-seven-and-ideally-under-three-minutes snippets. For the course’s second running, I did split the later videos, some an hour or more long, into half-hour sections, but that was to make it easier for students without fast broadband access, who have to download the videos overnight to watch them.

Of course, students can speed up or slow down the videos, they can watch them as many times as they want, and they can stop and start them to suit their schedules. But then they are in control and make those decisions based on their own progress and understanding. My course does not come pre-digested. It is slow cooking, not fast food.

Learning by evaluation

The main difference returning students will notice in the new session is the much greater emphasis on developing evaluation skills. Fairly early in the course, students will be presented with purported mathematical proofs that they have to evaluate according to a grading rubric.

At first these will be fairly short arguments, designed by me to illustrate various key features of proofs, and often incorporating common mistakes beginners make. Later on, the complexity increases. For those students who elect to take the final exam (and thereby become eligible to earn a Distinction grade for the course), evaluation will culminate in grading three randomly assigned, anonymized exam submissions from fellow students, followed by grading their own submission.

Peer evaluation is essential in MOOCs that involve work that cannot be machine graded, definitely the category into which my Mathematical Thinking course falls. The method I use for the Final Exam is called Calibrated Peer Review. It has a long history and a proven record of acceptable results. (I describe it in some detail on my MOOC course website – accessible to anyone who signs up for the course.) So adopting peer evaluation for my course was unavoidable.

The first time I offered the course, I delayed peer evaluation until the final couple of weeks, when it was restricted to the final exam. Though things went better than I had feared, there were problems. The main issues, which came as no surprise, were, first, that many students felt very uneasy grading the work of others, second, many of them did not do a good job, and third, the rubric (which I had taken off another university’s Internet shelf) did not work at all well.

On the other hand, many students posted forum comments saying they found they enjoyed that part of the course, and learned more in those final two weeks than in the entire earlier part of the course.

I had in fact expected this would be the case, and had told the class early on that many of them would have that reaction. In particular, evaluating the work of fellow students is a very powerful, known way to learn new material. Nevertheless, it came as a great relief when this actually transpired.

As a result of my experience in the first session, when I gave the course a second time this spring, I increased the number of assignment exercises that required students to evaluate purported proofs. I also altered the rubric to make it better suited to what I see as the main points in the course.

The outcome, as far as I could ascertain from reading the comments students posted on the course discussion forum, was that it went much better. But it was still far from perfect. The two main issues were the rubric itself and how to use it.

Designing a rubric

Designing a good rubric is not at all easy for any course, and I think it is particularly challenging for a course on more advanced parts of mathematics. Qualitative grading of mathematical arguments, like grading essays or works of art, is a holistic skill that takes years to acquire to the point where it can be used to evaluate performance with some reliability. A beginner attempting evaluation needs guidance, most typically provided by an evaluation rubric. The idea is to replace the holistic application of a lifetime’s acquisition of tacit domain knowledge with a number of categories that the evaluator should look for.

The more fine-grained the rubric, the easier it will be for the novice evaluator, but the more onerous the grading task becomes. The rubric I started with for my course had six factors, which I felt was about right – enough to make the task doable for the student, yet not so many as to turn it into a dull chore. I have retained that number. But, based on the experiences of students using the rubric, I changed several categories the first time I repeated the course, and I have changed one category for the upcoming third session.

In each of the six categories in the rubric, the student must choose among three levels, which I name Novice, Apprentice, and Practitioner. I chose the names to emphasize that we are using evaluation as a way to learn, and the focus is to measure progress along a path of development, not assign summative performance judgments of “poor”, “okay”, and “good”.

The intention in having just three levels is to force a student evaluator to make a decision about the work being assessed. But this can be particularly difficult for a beginner who is, of course, lacking in confidence in their ability to do that. To counter that, in this third session, when the student enters the numerical value that the course software will use to track progress, the numerical equivalents to those three categories are not 0, 1, 2, but 0, 2, and 4. The student can enter 1 or 3 as a “middle value” if they are undecided as to which category to assign.
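As a concrete illustration of the arithmetic (the names and data layout here are my own; the Coursera platform’s internals are certainly different), here is a minimal sketch in Python:

```python
# The three named levels and their numerical equivalents.
RUBRIC_LEVELS = {"Novice": 0, "Apprentice": 2, "Practitioner": 4}

def category_score(entry):
    """Score one rubric category on the 0-4 scale.

    An undecided evaluator may enter the in-between values
    1 (Novice/Apprentice) or 3 (Apprentice/Practitioner).
    """
    if entry in (1, 3):           # a "middle value", entered directly
        return entry
    return RUBRIC_LEVELS[entry]   # otherwise one of the named levels

def total_score(entries):
    """Six equally weighted categories, so 24 points maximum."""
    assert len(entries) == 6
    return sum(category_score(e) for e in entries)

# Example evaluation: strong on one category, undecided on two.
print(total_score(["Practitioner", 3, "Apprentice", "Apprentice", 1, "Novice"]))  # 12
```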

Using the rubric

Even with “middling” grades available for the rubric items, most students will find the evaluation process difficult and very time consuming. A rubric simply breaks a single evaluation task into a number of smaller evaluation tasks, six in my case. In so doing, it guides the student as to what things to look for, but the student still has to make qualitative judgments within each of the categories.

To help them make these judgments, the last time I gave the course I provided tutorial videos that take them through the grading process. I recorded myself grading the same sample arguments they had just attempted to evaluate, verbalizing my thinking process as I went, explaining why I made the calls I did. They are not the most riveting of videos, and they can be a bit long (ten minutes for some assignment questions). But I don’t know of any other way of conveying something of the expertise I have built up over a lifetime. It is essentially a modern implementation of the age-old apprentice system of acquiring tacit knowledge by working alongside the expert.

Unfortunately, as an expert, I make calls based on important distinctions that for me jump from the student’s page, but are not even remotely apparent to a beginner. The result last time was, for some questions, considerable frustration on the part of the students.

To try to mitigate this problem (I don’t think it can be eliminated), I changed some aspects of the way the rubric is formulated and described, and decided to introduce the entire evaluation notion much earlier in the course. The result is that evaluation is now a very central component of the course. Indeed, evaluating mathematical arguments now plays a role equal to constructing them.

If it goes well – and based on my previous experience with this course, I think it will go better than last time – I will almost certainly adopt a similar approach if and when I give the course in a traditional classroom setting once again. (A heavy travel schedule associated with running a research lab means I have not taught a regular undergraduate class for several years now, though an attractive offer to spend a term at Princeton early next year will give me a much welcomed opportunity to spend some time in the classroom once again.)

Evaluating to learn, not to grade

One feature of a MOOC – or at least a MOOC like mine that does not offer college credit – is that the focus is on learning, not acquiring a credential. Thus, grading can be used entirely for formative purposes, as a guide to progress, not to provide a summative measure of achievement. As an instructor, I find the separation of the teaching and the grading extremely freeing. For one thing, with the assignment of grades out of the picture, the relationship between teacher and student is changed significantly. Also, it means numerical grades can be used as useful indicators of progress. A grade of 35% can be given for a piece of work annotated as “good” (i.e., good for someone taking an introductory course for the first time). The number indicates how much improvement would be required to take the student to the level of an expert practitioner.

To be sure, students who encounter this use of grades for the first time find it takes some getting used to. They are so habituated to the (nonsensical but widespread) notion that anything less than an A is a “failure” that they can be very discouraged when their work earns them a “mere” 35%. But in order to function as a school-to-university transition course, it has to help them adjust to a world where 35% is often a respectable passing grade.

(A student who regularly scores in the 90% range in advanced undergraduate mathematics courses can likely jump straight into a Ph.D. program – and some have done just that. 35% really can be a good result for a beginner.)

One final point about peer evaluation is an issue I encountered last time that surprised me, though perhaps it should not have, given everything I know about a lot of high school mathematics instruction. Many students approached grading the work of others as a punitive process of looking to deduct points. Some went so far as to complain (sometimes angrily) on the discussion forums about my video-streamed grading as being far too lenient.

In fact, one or two even held the view that if a mathematical argument was not logically correct, the only possible grade to give was 0. This particular perspective worried me on two counts.

First, it assumes a degree of logical infallibility that no living mathematician possesses. I doubt there is a single published mathematical proof of more than a few paragraphs that does not include some minor logical slips, and hence is technically incorrect. (Most of the geometric proofs in Euclid’s Elements would score 0 if logical correctness were the sole metric!)

Second, my course is not a mathematics course, it is about mathematical thinking, and has the clearly stated aim of looking at the many different aspects of mathematical arguments required to make them “good.” Logical correctness is just one item on that six-point rubric. As a result, at most 4 of the possible 24 points available can be deducted if an argument is logically incorrect. (Actually, 8 can be deducted, as the final category is “Overall assessment”, designed to encourage precisely what the phrase suggests.)

To be sure, if my course were a mathematics course, I would assign greater weight to logical correctness. As it is, all six categories carry equal weight. But that is deliberate. Most of my students’ entire mathematical education has been in a world where “getting the right answer” is the holy grail. One other objective of transition courses is to break them of that debilitating default assumption.

Finally, and remember, this is for posterity, so be honest. How do you feel?

I’ve written elsewhere that I think MOOCs as such will not be the cause of a revolution in higher education. Rather they are just part of what is more likely to be an evolution, though a major one to be sure. From the point of view of an instructor, though, they are providing us with a wonderful domain in which to re-examine all of our assumptions about how to teach and how students learn. As you can surely tell, I continue to have a blast in the MOOCasphere.

To be continued …

Peer grading: inventing the light bulb

A real-time chronicle of a seasoned professor who has just completed giving his first massively open online course.

With the deadline for submitting the final exam in my MOOC having now passed, the students are engaging in the Peer Evaluation process. I know of just two cases where this has been tried in a genuine MOOC (where the M means what it says), one in Computer Science, the other in Humanities, and both encountered enormous difficulties, and as a result a lot of student frustration. My case was no different.

Anticipating problems, I had given the class a much simplified version of the process – with no grade points at stake – at the end of Week 4, so they could familiarize themselves with the process and the platform mechanics before they had to do it for real. That might have helped, but the real difficulties only emerged when 1,520 exam scripts started to make their way through the system.

By then the instructional part of the course was over. The class had seen and worked through all the material in the curriculum, and had completed five machine-graded problem sets. Consequently, there were enough data in the system to award certificates fairly if we had to abandon the peer evaluation process as a grading device, as happened for that humanities MOOC I mentioned, where the professor decided on the fly to make that part of the exam optional. So I was able to sleep at night. But only just.

With over 1,000 of the students now engaged in the peer review process, and three days left to the deadline for completing grading, I am inclined to see the whole thing through to the (bitter) end. We need the data that this first trial will produce so we can figure out how to make it work better next time.

Long before the course launched, I felt sure that there were two things we would need to accomplish, and accomplish well, in order to make a (conceptual, proof-oriented) advanced math MOOC work: the establishment of (and data gathering from) small study groups in which students could help one another, and the provision of a crowd-sourced evaluation and grading system.

When I put my course together, the Coursera platform supported neither. They were working on a calibrated peer review module, but implementing the group interaction side was still in the future. (The user-base growth of Coursera has been so phenomenal, it’s a wonder they can keep the system running at all!)

Thus, when my course launched, there was no grouping system, nor indeed any social media functionality other than the common discussion forums. So the students had to form their own groups using whatever media they could: Facebook, Skype, Google Groups, Google Docs, or even the local pub, bar, or coffee shop for co-located groups. Those probably worked out fine, but since they were outside our platform, we had no way to monitor the activity – an essential functionality if we are to turn this initial, experimental phase of MOOCs  into something robust and useful in the long term.

Coursera had built a beta-release, peer evaluation system for a course on Human Computer Interaction, given by a Stanford colleague of mine. But his needs were different from mine, so the platform module needed more work – more work than there was really time for! In my last post, I described some of the things I had to cope with to get my exam up and running. (To be honest, I like the atmosphere of working in startup mode, but even in Silicon Valley there are still only 24 hours in a day.)

It’s important to remember that the first wave of MOOCs in the current, explosive, growth period all came out of computer science departments, first at Stanford, then at MIT. But CS is an atypical case when it comes to online learning. Although many aspects of computer science involve qualitative judgments and conceptual reasoning, the core parts of the subject are highly procedural, and lend themselves to instruction-based learning and to machine evaluation and grading. (“Is that piece of code correct?” Just see if it runs as intended.)

The core notion in university level mathematics, however, is the proof. But you can’t learn how to prove something by being told or shown how to do it any more than you can learn how to ride a bike by being told or shown. You have to try for yourself, and keep trying, and falling, until it finally clicks. Moreover, apart from some very special, and atypical, simple cases, proofs cannot be machine graded. In that regard, they are more like essays than calculations. Indeed, one of the things I told my students was that a good proof is a story that explains why something is the case.

Feedback from others struggling to master abstract concepts and proofs can help enormously. Study groups can provide that, along with the psychological stimulus of knowing that others are having just as much difficulty as you are. Since companies like Facebook have shown us how to build platforms that support the creation of groups, that part can be provided online. And when Coursera is able to devote resources to doing it, I know it will work just fine. (If they want to, they can simply hire some engineers from Facebook, which is little more than a mile away. I gather that, like Google before it, the fun period there has long since passed and fully vested employees are looking to move.)

The other issue, that of evaluation and grading, is more tricky. The traditional solution is for the professor to evaluate and grade the class, perhaps assisted by one or more TAs (Teaching Assistants). But for classes that number in the tens of thousands, that is clearly out of the question. Though it’s tempting to dream about building a Wikipedia-like community of dedicated, math-PhD-bearing volunteers, who will participate in a mathematical MOOC whenever it is offered – indeed I do dream about it – it would take time to build up such a community, and what’s more, it’s hard to see there being enough qualified volunteers to handle the many different math MOOCs that will soon be offered by different instructors. (In contrast, there is just one Wikipedia, of course.)

That leaves just one solution: peer grading, where all the students in the class, or at least a significant portion thereof, are given the task of grading the work of their peers. In other words, we have to make this work. And to do that, we have to take the first step. I just did.

Knowing just how many unknowns we were dealing with, my expectations were not high, and I tried to prepare the students for what could well turn out to be chaos. (It did.) The website description of the exam grading system was littered with my cautions and references to being “live beta”. On October 15, when the test run without the grading part was about to launch, I posted yet one more cautionary note on the main course announcements page:

… using the Calibrated Peer Review System for a course like this is, I believe, new. (It’s certainly new to me and my assistants!) So this is all very much experimental. Please approach it in that spirit!

Even so, many of the students were taken aback by just how clunky and buggy the thing was, and the forums sprang alive with exasperated flames. I took solace in the recent release of Apple Maps on the iPhone, which showed that even with the resources and expert personnel available to one of the world’s wealthiest companies, product launches can go badly wrong – and we were just one guy and two part-time, volunteer student assistants, working on a platform being built under us by a small startup company sustained on free Coke and stock options. (I’m guessing the part about the Coke and the options, but that is the prevalent Silicon Valley model.)

At which point, one of those oh-so-timely events occurred that are often described as “Acts of God.” Just when I worried that I was about to witness, and be responsible for starting, the first global, massive open online riot (MOOR) in a math class, Hurricane Sandy struck the Eastern Seaboard, reminding everyone that a clunky system for grading math exams is not the worst thing in the world. Calm, reasoned, steadying, constructive posts started to appear on the forum.  I was getting my feedback after all. The world was a good place once again.

Failure (meaning things don’t go smoothly, or maybe don’t work at all) doesn’t bother me. If it did, I’d never have become a mathematician, a profession in which the failure rate in first attempts to solve a problem is somewhere north of 95%. The important thing is to get enough data to increase the chances of getting it right – or far more likely, just getting it better – the second time round. Give me enough feedback, and I count that “failure” as a success.

As Edison is said to have replied to a young reporter about his many failed attempts to construct a light bulb, “Why would I ever give up? I now know definitively over 9,000 ways that an electric light bulb will not work. Success is almost in my grasp.” (Edison supposedly failed a further 1,000 times before he got it right. Please don’t tell my students that. We are just at failure 1.)

If there is one piece of advice I’d give to anyone about to give their first MOOC, it’s this: remember Edison.

To be continued …

It’s About Time (in Part): MOOC Planning – Part 10

 A real-time chronicle of a seasoned professor embarking on his first massively open online course.

Well, lectures have ended and the course has now switched gears. For those still left in the course (17% of the final enrollment total of 64,045), the next two weeks are focused on trying to make sense of everything they have learned, and working on the final exam — which in the case of my course involves peer evaluation.

Calibrated Peer Review is not new. A study of its use in the high school system by Sadler and Good, published in 2006, has become compulsory reading for those of us planning and giving MOOCs that cover material that cannot be machine graded. [If you want to see how I am using it, just enroll in the class and read the description of the “Peer Review system”. There is no obligation to do anything more than browse around the site! No one will know you are not simply a dog that can use a computer.]

As I was working on my course, Coursera was still frantically building out their platform to support peer evaluation. There was a lot of just-in-time construction. It’s been a long time since I’ve had to go behind a user-friendly interface and dig into the underlying code to do something on a computer, and the programming languages have all changed since I last did that.

One thing I had to learn was one of the ways networked computers keep time. I now know that at the time of writing these words, 7:00AM Pacific Daylight Time on October 22, 2012, exactly 1,350,914,400 seconds have elapsed since the first second of January 1st, 1970, UTC. That was the start of Unix Time.

I needed to learn to work in Unix Time in order to set the various opening times and completion deadlines for the exam process. I expect that by the time the next instructor puts together a MOOC, she or he will be greeted by a nice, friendly Coursera interface with pulldown menus and boxes to tick — which probably will come as a great relief to any humanities professors reading this, who don’t have any programming in their background.

[By coincidence, Unix was the last system I had any programming proficiency in, but I did not need to know Unix to use Unix Time. I just used an online converter. Unix was developed in 1969 at AT&T Bell Laboratories in New Jersey; hence the 1970 baseline (though the epoch is specified in UTC, not US Eastern time, despite Unix’s New Jersey origins).]
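For anyone curious, the conversion is just a few lines in a modern language. A minimal sketch in Python, using the standard-library zoneinfo module (which did not exist back then; in 2012 I used an online converter):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# 7:00 AM Pacific Daylight Time, October 22, 2012.
moment = datetime(2012, 10, 22, 7, 0, tzinfo=ZoneInfo("America/Los_Angeles"))

# Unix time: seconds elapsed since 1970-01-01 00:00:00 UTC.
print(int(moment.timestamp()))  # 1350914400
```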

In fact, time conversion issues in general turned out to be a  continuing, major headache in a course with students all over the world. One thing we will not do again is have 12:00PM Stanford Time, aka Coursera Time (i.e., PDT), as any of the course deadlines. It might seem a nice clean stopping point, and there are all those memories of Gary Cooper’s deadline in the classic Western movie High Noon, but many students missed the deadline for the first submitted assignment because they thought 12:00PM meant midnight, which in some parts of the world made them a whole day late.

The arbitrary illogicality of the AM/PM distinction is not apparent to those of us who grew up with it. But my course TA and I are now very aware of the problems it can lead to! In future, we’ll stick to unambiguous times that stay away from noon and midnight. But even then, with local computer systems usually working on local time, to say nothing of the different Summer and Winter Times, which change on different dates around the world, timing events in MOOCs is going to remain a problematic issue, just as it is for international travelers and professionals who collaborate globally over Skype and other conferencing services. (When I used the Unix Time conversion app, I had to remember that it displayed times in EST, which right now is just two hours ahead of California’s PDT, not the three hours United Airlines uses when it flies me to New Jersey, currently on EDT. Confusing, isn’t it?)
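The defensive habit we settled on can be sketched in a few lines: store each deadline as a single UTC instant, deliberately away from noon and midnight, and display it to each student in their own zone on a 24-hour clock. (The zone list below is just an example.)

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One unambiguous instant, stored in UTC.
deadline = datetime(2012, 10, 23, 4, 0, tzinfo=timezone.utc)

for zone in ("America/Los_Angeles", "Europe/London", "Asia/Karachi"):
    local = deadline.astimezone(ZoneInfo(zone))
    print(f"{zone}: {local:%a %Y-%m-%d %H:%M}")  # 24-hour clock, no AM/PM
```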

The reason why times are an issue in my course is that it is a course. At first glance, it may look little different from Khan Academy, where there are no time issues at all. But Khan Academy is really just an educational resource. (At least, that’s the part most people are familiar with and use, namely the video library that started it all. People use it as a video version of a textbook — or more precisely a video equivalent to that good old standby Cliffs Notes, which got many of us through an exam in an obligatory subject we were not particularly interested in.)

In contrast, in my case, as I’ve discussed earlier in this blog series (in particular, Part 6), my goal was to take a standard university course (one I’ve given many times over the years, at different universities, including Stanford) and make it available to anyone in the world, for free. To the degree I could make it happen, they would get the same learning experience.

That meant that the main goal would be to build a (short-lived) learning community. The video-recorded lectures and tutorials were simply tools to make that happen and to orchestrate events. Real learning takes place when students work on assignments on their own, when they repeatedly fail to solve a problem, and when they interact (with the professor and with one another) — not when they watch a lecture or read a book.

To achieve that goal, the MOOC would, as I stated in Part 6, involve admissions, lectures, peer interaction, professor interaction, problem-solving, assignments, exams, deadlines, and certification. To use the mnemonic I coined early on in this series, the basic design principle is WYSIWOSG: What You See Is What Our Students Get.

As we go forward, I intend to iterate on the course design, based on the data we collect from the students (and 64,000 students very definitely puts us into the Big Data realm). But my basic principle will remain that of offering a course, not the provision of a video library. And the reason for that should be obvious to anyone who has been following this blog series, as well as some of the posts on my other blogs Devlin’s Angle and profkeithdevlin.org. The focus is not on acquiring facts or mastering basic skills, but on learning to think a certain way (in my case, like a professional mathematician). And that requires both a lot of effort and (for most of us) a lot of interaction with others trying to achieve the same goal.

Our ancestors in the 11th Century started to develop what to this day remains the best way we know to achieve this at scale: the university, where people become members of a learning community in which learning takes place in a hothouse atmosphere that involves periods of intense interaction as deadlines loom, sustained by the rapidly formed social bonds that emerge as a result of that same pressure.

While I will likely experiment with variants of this model that allow for participation by students who have demanding, full-time jobs, I doubt I will abandon that basic model. It has lasted for a thousand years for a good reason. It works.

To be continued …

Final Lecture: MOOC Planning – Part 9

A real-time chronicle of a seasoned professor embarking on his first massively open online course.

I gave my last lecture of the course yesterday (discounting the tutorial session that will go out next week), and we are now starting a two week exam period.

“Giving” a lecture means the video becomes available for streaming. For logistic reasons (high among them, my survival and continued sanity — assuming anyone who organizes and gives a MOOC, for no payment, is sane), I recorded all the lectures weeks ago, well before the course started.  The weekly tutorial sessions come the closest to being live. I record them one or two days before posting, so I can use them to respond to issues raised in the online course discussion forum.

The initial course enrollment of 63,649 has dropped to 11,848 individuals that the platform says are still active on the site. At around 20%, that’s pretty high by current MOOC standards, though I don’t know whether that is something to be pleased about, since  it’s not at all clear what the right definition of “success” is for a MOOC.

Some might argue that 20% completion indicates that the standards are too low. I don’t think that’s true for my course. Completion does, after all, simply mean that a student is still engaged. The degree to which they have mastered the material is unclear. So having 80% drop out could mean the standard is too high.

In my case, I did not set out to achieve any particular completion rate; rather I adopted a WYSIWOSG approach — “What You See Is What Our Students Get.” I offered a MOOC that is essentially the first half of a ten week course I’ve given at many universities over the years, including Stanford. That meant my students would experience a Stanford-level course. But they would not be subject to passing a Stanford-level exam.

In fact, I could not offer anything close to a Stanford-exam experience. There is a Final Exam, and it has some challenging questions, but it is not taken under controlled, supervised conditions. Moreover,  since it involves constructing proofs, it cannot be machine graded, and thus has to be graded by other students, using a crowd sourcing method (Calibrated Peer Review). That put a significant limitation on the kinds of exam questions I could ask. On top of that, the grading is done by as many different people as there are students, and I assume most of them are not expert mathematicians. As a result, it’s at most a “better-than-nothing” solution. Would any of us want to be treated by a doctor whose final exam had been peer graded (only) by fellow students, even if the exam and the grading had been carried out under strictly controlled conditions?

On the other hand, looking at and attempting to evaluate the work of fellow students is a powerful learning experience, so if you view MOOCs as vehicles for learning, rather than a route to a qualification, then peer evaluation has a lot to be said for it. Traditional universities offer both learning and qualifications. MOOCs currently provide the former. Whether they eventually offer the latter as well remains to be seen. There are certainly ways it can be done, and that may be one way that MOOCs will make money. (Udacity already does offer a credentialing option, for a fee.)

In designing my course, I tried to optimize for learning in small groups, perhaps five to fifteen at a time. The goal was to build learning communities, within which students could help one another. Since there is no possibility of regular, direct interaction with the instructor (me) or my one TA (Paul), students have to seek help from fellow students. There is no other way. But, on its own, group work is not enough. Learning how to think mathematically (the focus of my course) requires feedback from others, and it needs to include feedback from people already expert in mathematical thinking. This means that, in order to truly succeed, not only do students need to work in groups (at least part of the time) and subject their attempts to the scrutiny of others, but some of those interactions have to be with experts.

One original idea I had turned out not to work, though whether the fault lay in the idea itself or in the naive way we implemented it is not clear to me. That was to ask students at the start of the course to register if they had sufficient knowledge and experience with the course material to act as “Community TAs”, and be so designated in the discussion forums. Though over 600 signed up to play that role, many soon found they did not have sufficient knowledge to perform the task. Fortunately, a relatively small number of sign-ups did have the necessary background, as well as the interpersonal skills to give advice in a supportive, non-threatening way, and they more or less ensured that the forum discussions met the needs of many students (or so it seems).

Another idea was to assign students to study groups, and use an initial survey to try to identify those with some background knowledge and seed them into the groups. Unfortunately, Coursera does not (yet) have functionality to support the creation and running of groups, apart from the creation of forum threads. So instead, in my first lecture, I suggested to the students that they form their own study groups in whatever way they could.

The first place to do that was, of course, the discussion forums on the course website, which very soon listed several pages of groups. Some used the discussion forum itself to work together, while others migrated offsite to some other location, physical or virtual, with Skype seeming a common medium. Shortly after the course launched, several students discovered GetStudyRoom, a virtual meeting place dedicated to MOOCs, built by a small startup company.

In any event, students quickly found their own solutions. But with students forming groups in so many different ways on different media, there was no way to track how many remained active or how successful they have been.

The study groups listed on the course website show a wide variety of criteria used to bring the groups together. Nationality and location were popular, with groups such as Brazil Study Group, Grupo de Estudo Português, All Students From Asia, and Study Group for Students Located in Karachi, Pakistan. Then there were groups with a more specific focus, such as Musicians, Parents of Homeschooled Children, Older/Retired English Speakers Discussion for Assignment 1, and, two of my favorites, After 8pm (UK time) English speakers with a day job and the delightfully named Just Hanging on Study Group.

The forum has seen a lot of activity: 15,088 posts and 13,622 comments, spread across 2,712 different threads, viewed 430,769 times. Though I have been monitoring the forums on an almost daily basis, to maintain an overall sense of how the course is going, it’s clearly not possible to view everything. For the most part I restricted my attention to the posts that garnered a number of up-votes. Students vote posts up and down, and once a post shows 5 or more up-votes, I take that as an indication that the issue may be worth looking at.

The thread with the highest number of up-votes (165) was titled Deadlines way too short. Clearly, the question of deadlines was a hot topic. How, if at all, to respond to such feedback is no easy matter. In a course with tens of thousands of students, even a post with hundreds of up-votes represents just a tiny fraction of the class. Moreover, threads typically include opinions on both sides of an issue.

For instance, in threads about the pace of the course, some students complained that they did not have enough time to complete assignments, and pleaded for more relaxed deadlines, whereas others said they thrived on the pace, which stimulated them to keep on top of the material. For many, an ivy-league MOOC offers the first opportunity to experience an elite university course, and I think some are surprised at the level and pace. (In fact, I did keep the pace down for the first three weeks, but I also do that when I give a transition course in a regular setting, since I know how difficult it is to make that transition from high school math to university level mathematics.)

A common suggestion/request was to simply post the course materials online and let students access them according to their own schedules, much like Khan Academy. This raises a lot of issues about the nature of learning and the role MOOCs can (might? should?) play. But this blog post has already gone on long enough, so I’ll take up that issue next time.

To be continued …


I'm Dr. Keith Devlin, a mathematician at Stanford University. I gave my first free, open, online math course in fall 2012, and have been offering it twice a year since then. This blog chronicles my experiences as they happen.
