## Posts Tagged 'evaluation rubric'

### Do all Americans have the same age?

With a special Bonus Feature: A correct proof of Fermat’s Last Theorem that fits in a Tweet.

When the students in my Introduction to Mathematical Thinking MOOC encounter a difficulty with an assignment problem, many of them take to the course Discussion Forum to talk it through. By far the longest single thread in the course, a couple of weeks ago, was the one for Problem Set 6, Question 5, which quickly grew to 193 original student posts and garnered 1,051 views.

The mathematical topic was proofs by mathematical induction. I had given an example in the video lecture, and then presented the students with a number of purported induction proofs to evaluate according to the course rubric. (See the previous post in this blog for background on the course structure and its rationale, together with a link to the rubric.)

PS6, Q5 presented them with a purported induction proof that in any finite group of Americans, everyone has the same age (and hence all Americans have the same age). Clearly, this is a ludicrously false claim.

The argument I gave in support of the statement was 19 lines long. Each line comprised a single, fairly simple statement. The lines were numbered. The students’ task was to locate the first line where the proof broke down.

The question had a clear and unambiguous correct answer. The logical chain held up for a certain number of steps, and then the logic failed. But I had constructed the argument with the deliberate intent of making the identification of that failure line a tricky task. (You will find variants of this problem all over the Web. I made it particularly fiendish.)

And fiendish is how the students found it. In fact, only 1 in 5 (exactly 20%) got it right. One particular incorrect line was chosen by slightly more students (23%), while the remaining choices were spread widely across many of the other lines. Indeed, only two of the 19 lines were chosen by no one.

Many interesting points were raised and debated – in many cases in heated fashion – in the ensuing forum discussion. For an online course focused on group discussion, this was easily one of the most successful problems I gave them, with learning taking place on many levels.

One of the meta lessons I wanted this particular exercise to provide was the realization that there is a lot more to proofs than whether they are right or wrong. (See the companion post to this in my profkeithdevlin.org blog for a lot more on what role proofs play in mathematics.) The argument I had constructed was, apart from one subtly positioned logical slip, entirely correct: 18 of the 19 lines are fine. Yet the claim purportedly being proved is so absurd that, in a very real sense, the entire argument must be nonsense from the get-go. And so it is.

The widespread belief that proofs are primarily about right and wrong is the argumentation analog of the equally widely held belief that mathematics is about “answer getting” that I discussed in my recent post on Devlin’s Angle for the Mathematical Association of America. (Yes, that makes three Devlin blogs. Everybody has a blog these days. If you want to stand out from the crowd, you need two or more.)

Both beliefs – math is all about answer getting and proofs are all about truth – are, I believe, a consequence of the way mathematics and proofs are presented in our K-12 system. What is taught is so unrepresentative of mathematics as practiced by professional mathematicians that there surely has to be an explanation.

Presumably, the perception that mathematics is about answer getting came about in the days before we had calculators and computers, when (accurate) answer getting was an important part of a useful mathematics education. Its continued survival well into the digital age can probably be ascribed to systemic inertia (of which there is no lack in the world of education), with the additional incentive that right/wrong questions are extremely easy to grade (by machine, if you are an administrator who prefers buying equipment to paying teachers)!

In contrast, evaluating mathematical thinking and problem solving is much more difficult and requires a lot of time on the part of a skilled teacher.

Similarly, for the simple kinds of proofs encountered in high school, determining whether an argument is correct or not is usually easy, but evaluating it as a proof is much more difficult and requires a lot more skill and experience – as the students in my MOOC have been discovering to their continued great frustration.

The idea that proofs are primarily about truth and correctness is very ingrained. When presented with an argument that is extremely well crafted but has an obvious flaw (so this clearly does not include my Americans’ age example), many students find it hard to evaluate the overall structure of the argument. Yet proofs are all about structure. As I keep emphasizing to my MOOC students, and to anyone else who is willing to listen, proofs are, in effect, stories mathematicians tell to convince the intended recipient that a certain statement is true.

If you forget that, and focus entirely, or even almost entirely, on logical validity, you end up with absurdities like my example of a logically correct proof of Fermat’s Last Theorem so small it will fit into a Tweet, let alone the margin of a book:

Thanks to work by Andrew Wiles and Richard Taylor, that tweeted argument is logically correct. Every statement follows logically from the preceding part of the argument. If you want to fault it, you have to examine the structure, pointing out that some steps are omitted that the intended reader may not be able to reconstruct, especially as no reasons are given. (See here and here for the missing bits.)

The point is not that logical correctness is unimportant. It is that its importance arises only in the context of the many other features proofs need to have in order to function as intended.

What features? Well, for starters, how about the features of proofs I list in the rubric for my MOOC?

I’ll tell you one thing. Andrew Wiles would not have had his paper accepted for publication if he had not addressed all the points on that rubric!

No, Wiles did not take my course before proving his famous result. The flow is the other way round. I formulated the rubric to try to identify some of the factors professional mathematicians like Wiles make tacit use of all the time when writing up proofs for publication. You would not believe how strongly many people object to a rubric that tries to make that skill set available.

And I’m not talking about the strange folks who post “it’s the end of civilized life as we know it” commentaries on the Drexel Math Forum (cc-ing me directly, because they suspect, rightly, that I don’t frequent the site). Many of the good folks who voluntarily spend ten weeks struggling through my MOOC object as well. And not a few of them indicate in Forum posts where they learned to put so much emphasis on logical correctness. A fictional composite of a fair number of posts I’ve seen over the five runs of my MOOC reads thus: “When I was at university, if there was a logical error in my proof, the professor would award zero points.”

As a mathematician who knows how f-ing hard it can be to prove an original result, reading those kinds of comments fills me with more dismay than you can possibly imagine.

To end on a positive note, at least you have now seen a concise, but correct proof of Fermat’s Last Theorem.

### How is it going this time?

My Mathematical Thinking MOOC is now starting its ninth week out of a possible ten. (The last two weeks are optional, for those wanting to get more heavily involved in the mathematics.)

At the start of the week, registrations were at 38,221, of whom 24,342 had visited the site at least once, with 2,818 logging on in the previous week. But none of those numbers is significant – by which I mean significant in terms of the course I am offering. (People drop in on MOOCs for a variety of reasons besides taking the course.)

The figure of most interest to me is the number of students who completed and submitted the weekly Problem Set. In my sense, those are the real course students. As of last week, they numbered 1,013, and all of them will almost certainly complete the course. That is a big class. The undergraduate class I taught at Princeton this past spring (using my MOOC as one of several resources) had just 9 students.

My MOOC has two main themes: understanding how mathematicians abstract formal counterparts to everyday notions, and how they make use of those abstractions to extend our cognitive understanding of our world.

For much of the time the focus is on language, since that is the mechanism used to formulate and define abstract concepts and prove results about them.

The heavy focus on language and its use in reasoning gives the course appeal to two different kinds of students: those looking to investigate some issues of language use and sharpen their reasoning skills, and those wanting to develop their analytic problem solving skills for mathematics, science, or engineering. (The latter are the ones who typically do the optional final two weeks of the course.)

The pedagogy underlying the course is Inquiry-Based Learning.

To make that approach work in a MOOC, where many students have no opportunity to interact directly with a mathematics expert, I have to design the course in a way that encourages interaction with other students, either on the course Discussion Forum on the course website or using social media or local meetings.

Early in the course, I identify a few students whose Forum posts indicate good metacognitive skills and appoint them “Community Teaching Assistants”. A badge against their name then tells other students that their posts are worth paying attention to. The CTAs (there are currently thirteen of them) and I also have a back-channel discussion forum where we discuss any problematic issues before posting on the public channel.

It seems to work acceptably well. To date, there have been over 3,700 original posts (from 957 students) and 3,639 response comments on the course Discussion Forum.

Since the only practical form of regular performance evaluation in a MOOC involves machine grading – which boils down to some form of multiple choice questions – it’s not possible to ask students to construct mathematical proofs. The process is far too creative.

Instead, I ask them to evaluate proofs (more precisely, purported proofs). To help them do this, I provide a rubric of five evaluation categories that requires them to view each argument from different perspectives, assigning a “grade” in each category on a five-point numerical scale. See here for the current version of the evaluation rubric.

Notice that the rubric has a sixth category, where they have to summarize their five individual-category evaluations into a single, overall “grade” on the same five-point scale. How they perform the aggregation is up to them. The overall goal is to help the students come to appreciate the different features of proofs, as used in present-day mathematics. The rubric asks them first to look at the proof from the five different perspectives, then integrate those assessments into a single evaluation.

After the students have completed an evaluation of a purported proof, their (numerical) evaluations are machine graded (more about this in a moment), after which they view a video of me evaluating the same proof so they can compare their assessment to that of an expert.

The goal in comparing their evaluation to mine is not to learn to assign numerical evaluation marks the way I do. For one thing, evaluation of proofs is a very subjective, holistic thing. For another, having been evaluating proofs by both students and experts for many decades, I have achieved a level of expertise that no beginner could hope to match. Moreover, I almost never evaluate using a rubric.

Rather, the point of the exercise is to help the students come to understand what makes an argument (1) a proof, and (2) a good proof, by examining it from different perspectives. (For a discussion of the approach to proofs I take, see my most recent post on my other blog, profkeithdevlin.org.)

To facilitate this, the entire process is set up as a game with rules. (Of course, that is true for any organized educational process, but in the case of my MOOC the course design is strongly influenced by video games – see many of the previous posts in that blog for more on game-based learning, starting here.)

In particular, the points they are awarded (by machine grading) for how close they get to my numerical proof-evaluation score are, like all the points the Coursera platform gives out in my course, very much like the points awarded in a typical video game. They are important in the moment, but have no external significance. Notably, success in the course and the award of a certificate do not depend on a student’s points total. My course offers a learning experience, not a paper qualification. (The certificate attests that they had that experience.)

Overall, I’ve been pleased with the results of this way of handling mathematical argumentation in a MOOC. But it is not without difficulties. I’ll say more in my next post, where I will describe some of the observations I have made so far.

Stay tuned…

### Evaluation rubrics: the good, the bad, and the ugly

A real-time chronicle of a seasoned professor just about to launch the third edition of his massively open online course.

With the third session of my MOOC Introduction to Mathematical Thinking starting on September 2, I am busy putting the final touches to the course materials. As I did when I offered the second session earlier this year, I have made some changes to the way the course is structured. The underlying content remains the same, however – indeed at heart it has not changed since I first began teaching a high school to university “transition” course back in the late 1970s, when I was a young university lecturer just starting out on my career.

With the primary focus on helping students develop a new way of thinking, the course was always very light on “content” but high on internal reflection. A typical assignment question might require four or five minutes to write out the answer; but getting to the point where that is possible might take the student several hours of thought, sometimes days. Students who approach the course thinking it is an introductory course on logic – some of whom likely will, as they have in the past, post on the course forum that they cannot understand why I am proceeding so slowly and making such heavy weather of the material – will, if they don’t walk away in disgust, eventually (by about week four) realize they are completely lost. Habituated to courses that rush through a pile of material requiring mostly procedural mastery, they find it challenging, and in many cases impossible, to slow down and adopt the questioning, reflective approach this course requires.

My course uses elementary linguistics and formal logic as a vehicle to help develop new thinking skills that are essential for university mathematics majors, very valuable for STEM majors, and of considerable value for anyone who wants to lead a more rewarding life. But it is definitely not a course in linguistics or logic. It is about thinking.

Starting with an analysis of certain features of ordinary language, as I do, provides a starting point that is accessible to everyone – though because the language I examine is English, students for whom that is a second language are at a disadvantage. That is unavoidable. (A Spanish language version, embedded in Hispanic culture, is currently under development. I hope other deep translations follow.)

And formal logic is so simple and structured, and so accessible to a beginner, that it too is well suited to an introductory level course on analytic, and in particular mathematical, thinking.

Why my course videos are longer than most

The imperative that students devote substantial periods of time to sustained contemplation of the course material has led me to make two decisions that go against the current grain in MOOCs. First, the pace is slow. I speak far more slowly than I normally do, and I repeat each point at least once, often more. Second, I do not break my “lectures” into the now-almost-obligatory no-longer-than-seven-and-ideally-under-three-minutes snippets. For the course’s second running, I did split the later videos, some of them an hour or more long, into half-hour sections, but that was to make it easier for students without fast broadband access, who have to download the videos overnight to watch them.

Of course, students can speed up or slow down the videos, they can watch them as many times as they want, and they can stop and start them to suit their schedules. But then they are in control and make those decisions based on their own progress and understanding. My course does not come pre-digested. It is slow cooking, not fast food.

Learning by evaluation

The main difference returning students will notice in the new session is the much greater emphasis on developing evaluation skills. Fairly early in the course, students will be presented with purported mathematical proofs that they have to evaluate according to a grading rubric.

At first these will be fairly short arguments, designed by me to illustrate various key features of proofs, and often incorporating common mistakes beginners make. Later on, the complexity increases. For those students who elect to take the final exam (and thereby become eligible to earn a Distinction grade for the course), evaluation will culminate in grading three randomly assigned, anonymized exam submissions from fellow students, followed by grading their own submission.

Peer evaluation is essential in MOOCs that involve work that cannot be machine graded, definitely the category into which my Mathematical Thinking course falls. The method I use for the Final Exam is called Calibrated Peer Review. It has a long history and a proven record of acceptable results. (I describe it in some detail on my MOOC course website – accessible to anyone who signs up for the course.) So adopting peer evaluation for my course was unavoidable.

The first time I offered the course, I delayed peer evaluation until the final couple of weeks, when it was restricted to the final exam. Though things went better than I had feared, there were problems. The main issues, which came as no surprise, were, first, that many students felt very uneasy grading the work of others, second, many of them did not do a good job, and third, the rubric (which I had taken off another university’s Internet shelf) did not work at all well.

On the other hand, many students posted forum comments saying they found they enjoyed that part of the course, and learned more in those final two weeks than in the entire earlier part of the course.

I had in fact expected this, and had told the class early on that many of them would have that reaction. After all, evaluating the work of fellow students is a well-known and very powerful way to learn new material. Nevertheless, it came as a great relief when this actually transpired.

As a result of my experience in the first session, when I gave the course a second time this spring, I increased the number of assignment exercises that required students to evaluate purported proofs. I also altered the rubric to make it better suited to what I see as the main points in the course.

The outcome, as far as I could ascertain from reading the comments students posted on the course discussion forum, was that it went much better. But it was still far from perfect. The two main issues were the rubric itself and how to use it.

Designing a rubric

Designing a good rubric is not at all easy for any course, and I think particularly challenging for a course on more advanced parts of mathematics. Qualitative grading of mathematical arguments, like grading essays or works of art, is a holistic skill that takes years to acquire to the point where it can be used to evaluate performance with some degree of reliability. A beginner attempting evaluation needs guidance, most typically provided by an evaluation rubric. The idea is to replace the holistic application of a lifetime’s acquisition of tacit domain knowledge with a number of categories that the evaluator should look for.

The more fine-grained the rubric, the easier it will be for the novice evaluator, but the more onerous the grading task becomes. The rubric I started with for my course had six factors, which I felt was about right – enough to make the task doable for the student yet not too many to turn it into a dull chore. I have retained that number. But, based on the experiences of students using the rubric, I changed several categories the first time I repeated the course and I have changed one category for the upcoming third session.

In each of the six categories in the rubric, the student must choose between three levels, which I name Novice, Apprentice, and Practitioner. I chose the names to emphasize that we are using evaluation as a way to learn, and the focus is to measure progress along a path of development, not assign summative performance judgments of “poor”, “okay”, and “good”.

The intention in having just three levels is to force a student evaluator to make a decision about the work being assessed. But this can be particularly difficult for a beginner who, of course, lacks confidence in their ability to do that. To counter that, in this third session, when the student enters the numerical value that the course software will use to track progress, the numerical equivalents of those three categories are not 0, 1, and 2, but 0, 2, and 4. The student can enter 1 or 3 as a “middle value” if they are undecided as to which category to assign.
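A minimal Python sketch of the scale just described may make it concrete. It assumes that each of the six categories takes one whole-number entry from 0 to 4, with the even values standing for the three named levels and the odd values for the undecided “middle values”. The functions `describe` and `total` are hypothetical illustrations, not the actual course software.

```python
# Hypothetical sketch of the rubric's numeric scale (not the actual course software).
LEVELS = {0: "Novice", 2: "Apprentice", 4: "Practitioner"}
NUM_CATEGORIES = 6  # five perspectives plus an overall assessment


def describe(entry: int) -> str:
    """Translate a numeric entry into the level (or pair of levels) it stands for."""
    if entry in LEVELS:
        return LEVELS[entry]
    if entry in (1, 3):
        # An odd value records indecision between the two neighboring levels.
        return f"between {LEVELS[entry - 1]} and {LEVELS[entry + 1]}"
    raise ValueError(f"entry must be an integer from 0 to 4, got {entry}")


def total(entries: list[int]) -> int:
    """Sum one entry per category, checking that each lies on the 0-4 scale."""
    if len(entries) != NUM_CATEGORIES:
        raise ValueError(f"expected {NUM_CATEGORIES} entries, got {len(entries)}")
    for entry in entries:
        describe(entry)  # raises ValueError for anything off the scale
    return sum(entries)


print(describe(3))                # between Apprentice and Practitioner
print(total([4, 2, 3, 2, 4, 3]))  # 18 (out of a possible 24)
```

Mapping the named levels to even numbers is what leaves room for the in-between values without adding a fourth or fifth named level.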

Using the rubric

Even with “middling” grades available for the rubric items, most students will find the evaluation process difficult and very time consuming. A rubric simply breaks a single evaluation task into a number of smaller evaluation tasks, six in my case. In so doing, it guides the student as to what things to look for, but the student still has to make qualitative judgments within each of the categories.

To help them make these judgments, the last time I gave the course, I provided them with tutorial videos that take them through the grading process. I record myself grading the same sample arguments that they have just attempted to evaluate, verbalizing my thinking process as I go, explaining why I make the calls I do. They are not the most riveting of videos, and they can be a bit long (ten minutes for some assignment questions). But I don’t know of any other way of conveying something of the expertise I have built up over a lifetime. It is essentially a modern implementation of the age-old apprentice system of acquiring tacit knowledge by working alongside the expert.

Unfortunately, as an expert, I make calls based on important distinctions that for me jump from the student’s page, but are not even remotely apparent to a beginner. The result last time was, for some questions, considerable frustration on the part of the students.

To try to mitigate this problem (I don’t think it can be eliminated), I changed some aspects of the way the rubric is formulated and described, and decided to introduce the entire evaluation notion much earlier in the course. The result is that evaluation is now a very central component of the course. Indeed, evaluating mathematical arguments now plays a role equal to constructing them.

If it goes well – and based on my previous experience with this course, I think it will go better than last time – I will almost certainly adopt a similar approach if and when I give the course in a traditional classroom setting once again. (A heavy travel schedule associated with running a research lab means I have not taught a regular undergraduate class for several years now, though an attractive offer to spend a term at Princeton early next year will give me a much welcomed opportunity to spend some time in the classroom once again.)

Evaluating to learn, not to grade

One feature of a MOOC – or at least a MOOC like mine that does not offer college credit – is that the focus is on learning, not acquiring a credential. Thus, grading can be used entirely for formative purposes, as a guide to progress, not to provide a summative measure of achievement. As an instructor, I find the separation of the teaching and the grading extremely freeing. For one thing, with the assignment of grades out of the picture, the relationship between teacher and student is changed significantly. Also, it means numerical grades can be used as useful indicators of progress. A grade of 35% can be given for a piece of work annotated as “good” (i.e., good for someone taking an introductory course for the first time). The number indicates how much improvement would be required to take the student to the level of an expert practitioner.

To be sure, students who encounter this use of grades for the first time find it takes some getting used to. They are so habituated to the (nonsensical but widespread) notion that anything less than an A is a “failure” that they can be very discouraged when their work earns them a “mere” 35%. But in order to function as a school-to-university transition course, it has to help them adjust to a world where 35% is often a respectable passing grade.

(A student who regularly scores in the 90% range in advanced undergraduate mathematics courses can likely jump straight into a Ph.D. program – and some have done just that. 35% really can be a good result for a beginner.)

One final point about peer evaluation is an issue I encountered last time that surprised me, though perhaps it should not have, given everything I know about a lot of high school mathematics instruction. Many students approached grading the work of others as a punitive process of looking to deduct points. Some went so far as to complain (sometimes angrily) on the discussion forums about my video-streamed grading as being far too lenient.

In fact, one or two even held the view that if a mathematical argument was not logically correct, the only possible grade to give was 0. This particular perspective worried me on two counts.

First, it assumes a degree of logical infallibility that no living mathematician possesses. I doubt there is a single published mathematical proof of more than a few paragraphs that does not include some minor logical slips, and hence is technically incorrect. (Most of the geometric proofs in Euclid’s Elements would score 0 if logical correctness were the sole metric!)

Second, my course is not a mathematics course, it is about mathematical thinking, and has the clearly stated aim of looking at the many different aspects of mathematical arguments required to make them “good.” Logical correctness is just one item on that six-point rubric. As a result, at most 4 of the possible 24 points can be deducted if an argument is logically incorrect. (Actually, 8 can be deducted, as the final category is “Overall assessment”, designed to encourage precisely what the phrase suggests.)
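The point arithmetic in that paragraph can be checked with a short Python sketch. It assumes six categories scored 0 to 4 each, and, purely for illustration, places logical correctness first and the overall assessment last; the rubric itself does not fix those positions, and `points_lost` is a hypothetical helper.

```python
# Illustrative check of the deduction arithmetic (category positions are assumed).
MAX_PER_CATEGORY = 4
NUM_CATEGORIES = 6
MAX_TOTAL = MAX_PER_CATEGORY * NUM_CATEGORIES  # 24 points in all


def points_lost(scores: list[int]) -> int:
    """Points below the maximum for one completed six-category evaluation."""
    if len(scores) != NUM_CATEGORIES:
        raise ValueError(f"expected {NUM_CATEGORIES} scores, got {len(scores)}")
    return MAX_TOTAL - sum(scores)


# Assumed layout: index 0 = logical correctness, index 5 = overall assessment.
logic_only = [0, 4, 4, 4, 4, 4]         # only the logic category marked down
logic_and_overall = [0, 4, 4, 4, 4, 0]  # overall assessment marked down as well

print(points_lost(logic_only))         # 4
print(points_lost(logic_and_overall))  # 8, leaving 16 of 24 points
```

Even in the worst case, an otherwise excellent but logically flawed argument keeps two thirds of the available points, which is the deliberate effect of weighting all six categories equally.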

To be sure, if my course were a mathematics course, I would assign greater weight to logical correctness. As it is, all six categories carry equal weight. But that is deliberate. Most of my students’ entire mathematical education has been in a world where “getting the right answer” is the holy grail. One other objective of transition courses is to break them of that debilitating default assumption.

Finally, and remember, this is for posterity, so be honest. How do you feel?

I’ve written elsewhere that I think MOOCs as such will not be the cause of a revolution in higher education. Rather, they are just part of what is more likely to be an evolution, though a major one to be sure. From the point of view of an instructor, though, they are providing us with a wonderful domain in which to re-examine all of our assumptions about how to teach and how students learn. As you can surely tell, I continue to have a blast in the MOOCasphere.

To be continued …

I'm Dr. Keith Devlin, a mathematician at Stanford University. I gave my first free, open, online math course in fall 2012, and have been offering it twice a year since then. This blog chronicles my experiences as they happen.