Friday, April 10, 2009

Assessing Student Learning

Overview

What are instructional objectives and how are they used? Setting out objectives at the beginning of a course is an essential step in providing a framework into which individual lessons will fit. Without such a framework it is easy to wander off the track, to spend too much time on topics that are not central to the course. An instructional objective, sometimes called a behavioral objective, is a statement of skills or concepts that students are expected to know at the end of some period of instruction.

Typically, an instructional objective is stated in such a way as to make clear how the objective will be measured. In practice, the skeleton of a behavioral objective is condition-performance-criterion. Instructional objectives must be adapted to the subject matter being taught (Hamilton, 1985). When students must learn well-defined skills or information with a single right answer, specific instructional objectives should be written as follows:

1. Given 10 problems involving addition of two fractions with like denominators, students will solve at least 9 correctly.

2. Given 10 sentences lacking verbs, students will correctly choose verbs that agree in number in at least 8 sentences. Examples: My cat and I [has, have] birthdays in May. Each of us [want, wants] to go to college.

3. Given a 4-meter rope attached to the ceiling, students will be able to climb to the top in less than 20 seconds.

Instructional objectives should be specific enough to be meaningful. For example, an objective concerning immigrants might be written as follows: Students will develop a full appreciation for the diversity of peoples who have contributed to the development of U.S. society. An objective stated this broadly sounds appealing, but it gives little indication of what students will actually be able to do or how their appreciation could be measured; a useful objective names an observable performance.

In planning lessons, it is important to consider the skills required in the tasks to be taught or assigned. For example, a teacher might ask students to use the school library to write a brief report on a topic of interest. The task seems straightforward enough, but consider the separate skills involved:

* Knowing alphabetical order
* Using the card catalog to find books on a subject
* Using a book index to find information on a topic
* Getting the main idea from expository material
* Planning or outlining a brief report
* Writing expository paragraphs
* Knowing language mechanics skills (such as capitalization, punctuation, and usage)

These skills could themselves be broken down into subskills. The teacher must be aware of the subskills involved in any learning task to be certain that students know what they need to know to succeed. Before assigning the library report task, the teacher would need to be sure that students knew how to use the card catalog and book indexes, among other things, and could comprehend and write expository material.

The process of breaking tasks or objectives down into their simpler components is called task analysis. In planning a lesson, a three-step process for task analysis may be used:

1. Identify prerequisite skills. What should students already know before you teach the lesson? For example, for a lesson on long division, students must know their subtraction, multiplication, and division facts and must be able to subtract and multiply with renaming.

2. Identify component skills. In the actual lesson, what subskills must students be taught before they can learn to achieve the larger objective? To return to the long-division example, students will need to learn estimating, dividing, multiplying, subtracting, checking, bringing down the next digit, and then repeating the process. Each of these steps must be planned for, taught, and assessed during the lesson.

3. Plan how component skills will be assembled into the final skill. The final step in task analysis is to assemble the subskills back into the complete process being taught. For example, students might be able to estimate, to divide, and to multiply, but this does not necessarily mean that they can do long division. The subskills must be integrated into a complete process that students can understand and practice (see the sketch following this list).
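
To make the idea of assembling subskills concrete, here is a minimal sketch in Python (not part of the original text) in which each long-division subskill named above appears as a separately labeled step that the complete procedure puts together. The numbers and the function name are invented for illustration.

def long_division(dividend: int, divisor: int):
    """Divide step by step, labeling each application of a subskill."""
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        remainder = remainder * 10 + int(digit)       # subskill: bring down the next digit
        q = remainder // divisor                       # subskill: estimate the next quotient digit
        product = q * divisor                          # subskill: multiply
        remainder = remainder - product                # subskill: subtract
        quotient_digits.append(str(q))                 # then repeat the cycle for the next digit
    quotient = int("".join(quotient_digits))
    assert quotient * divisor + remainder == dividend  # subskill: check the result
    return quotient, remainder

print(long_division(7429, 6))   # -> (1238, 1)

A student who can perform each commented step in isolation still has to practice the loop as a whole, which is exactly the point of step 3.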

Because instructional objectives are stated in terms of how they will be measured, it is clear that objectives are closely linked to assessment. An assessment is any measure of the degree to which students have learned the objectives set out for them. Most assessments in schools are tests or quizzes, or informal verbal assessments such as questions in class. One critical principle of assessment is that assessments and objectives must be clearly linked. Students learn some proportion of what they are taught; the greater the overlap between what was taught and what is tested, the better students will score on the test and the more accurately any need for additional instruction can be determined.

BLOOM’S TAXONOMY In 1956, Benjamin Bloom and some fellow researchers published a taxonomy of educational objectives that has been extremely influential in the research and practice of education ever since. Bloom and his colleagues categorized objectives from simple to complex or from factual to conceptual. The key elements of what is commonly called Bloom’s taxonomy are as follows:

1. Knowledge (recalling information): The lowest level of objectives in Bloom’s hierarchy, knowledge refers to objectives such as memorizing math facts or formulas, scientific principles, or verb conjugations.

2. Comprehension (translating, interpreting, or extrapolating information): Comprehension objectives require that students show an understanding of information as well as the ability to use it. Examples include interpreting the meaning of a diagram, graph, or parable; inferring the principle underlying a science experiment; and predicting what might happen next in a story.

3. Application (using principles or abstractions to solve novel or real-life problems): Application objectives require students to use knowledge or principles to solve practical problems. Examples include using geometric principles to figure out how many gallons of water to put into a swimming pool of given dimensions and using knowledge of the relationship between temperature and pressure to explain why a balloon is larger on a hot day than on a cold day.

4. Analysis (breaking down complex information or ideas into simpler parts to understand how the parts relate or are organized): Analysis objectives involve having students see the underlying structure of complex information or ideas. Examples of analysis objectives include contrasting schooling in the United States with education in Japan, understanding how the functions of the carburetor and distributor are related in an automobile engine, and identifying the main idea of a short story.

5. Synthesis (creation of something that did not exist before): Synthesis objectives involve using skills to create completely new products. Examples include writing a composition, deriving a mathematical rule, designing a science experiment to solve a problem, and making up a new sentence in a foreign language.

6. Evaluation (judging something against a given standard): Evaluation objectives require making value judgments against some criterion or standard. For example, students might be asked to compare the strengths and weaknesses of two home computers in terms of flexibility, power, and available software.

The primary importance of Bloom’s taxonomy is in its reminder that we want students to have many levels of skills. All too often, teachers focus on measurable knowledge and comprehension objectives and forget that students cannot be considered proficient in many skills until they can apply or synthesize those skills. On the other side of the coin, some teachers fail to make certain that students are well rooted in the basics before heading off into higher-order objectives.

Learning facts and skills is not the only important goal of instruction. Sometimes the feelings that students have about a subject or about their own skills are at least as important as how much information they learn. Instructional goals related to attitudes and values are called affective objectives. Many people would argue that a principal purpose of a U.S. history or civics course is to promote values of patriotism and civic responsibility, and one purpose of any mathematics course is to give students confidence in their ability to use mathematics. In planning instruction, it is important to consider affective as well as cognitive objectives.

Why is evaluation important? Evaluation, or assessment, refers to all the means used in schools to formally measure student performance (Weber, 1999; Wiggins, 1999). These include quizzes and tests, written evaluations, and grades. Student evaluation usually focuses on academic achievement, but many schools also assess behaviors and attitudes.

Student evaluations serve six primary purposes:

1. Feedback to students
2. Feedback to teachers
3. Information to parents
4. Information for selection and certification
5. Information for accountability
6. Incentives to increase student effort

Students need to know the results of their efforts. Regular evaluation gives them feedback on their strengths and weaknesses. To be useful as feedback, evaluations should be as specific as possible. For example, Cross and Cross (1980/81) found that students who received written feedback in addition to letter grades were more likely than other students to believe that their efforts, rather than luck or other external factors, determined their success in school. One of the most important (and often overlooked) functions of evaluating student learning is to provide feedback to teachers on the effectiveness of their instruction.

A report card is called a report card because it reports information on student progress. This reporting function of evaluation is important for several reasons. First, routine school evaluations of many kinds (test scores, stars, and certificates as well as report card grades) keep parents informed about their children’s schoolwork. For example, if a student’s grades are dropping, the parents might know why and might be able to help the student get back on track. Second, grades and other evaluations set up informal home-based reinforcement systems.

Often, evaluations of students serve as data for the evaluation of teachers, schools, districts, or even states. Every state has some form of statewide testing program that allows the states to rank every school in terms of student performance. In addition to state tests, school districts often use tests for similar purposes (for example, in grades not tested by the state). These test scores are also often used in evaluations of principals, teachers, and superintendents. Consequently, these tests are taken very seriously.

One important use of evaluations is to motivate students to give their best efforts. In essence, high grades, stars, and prizes are given as rewards for good work. Students value grades and prizes primarily because their parents value them. Some high school students also value grades because they are important for getting into selective colleges.

How is student learning evaluated? To understand how assessments can be used most effectively in classroom instruction, it is important to know the differences between formative and summative evaluation and between norm-referenced and criterion-referenced interpretation. The distinction between formative and summative evaluations was explained in the discussion of mastery learning in Chapter 9, but this distinction also applies to a broader range of evaluation issues. Essentially, a formative evaluation asks, “How well are you doing and how can you be doing better?” A summative evaluation asks, “How well did you do?” Formative, or diagnostic, tests are given to discover strengths and weaknesses in learning and to make midcourse corrections in pace or content of instruction. Formative evaluations might even be made “on the fly” during instruction through oral or brief written learning probes. In contrast, summative evaluation refers to tests of student knowledge at the end of instructional units (such as final exams). Summative evaluations may or may not be frequent, but they must be reliable and (in general) should allow for comparisons among students. Summative evaluations should also be closely tied to formative evaluations and to course objectives.

Norm-referenced interpretations focus on comparisons of a student’s scores with those of other students. Within a classroom, for example, grades commonly are used to give teachers an idea of how well a student has performed in comparison with classmates. A student might also have a grade-level or school rank; and in standardized testing, student scores might be compared with those of a nationally representative norm group.

Criterion-referenced interpretations focus on assessing students’ mastery of specific skills, regardless of how other students did on the same skills. Criterion-referenced evaluations are best if they are closely tied to specific objectives or well-specified domains of the curriculum being taught.
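
The contrast can be illustrated with a small hypothetical sketch in Python: the same raw score of 84 is reported two ways, once as a standing relative to classmates and once as a mastery decision against a preset criterion. The class scores and the cutoff are invented for illustration.

from bisect import bisect_left

class_scores = sorted([52, 61, 68, 73, 75, 80, 84, 88, 91, 96])  # classmates' raw scores (invented)
mastery_cutoff = 80                                               # preset criterion (invented)

def norm_referenced(score):
    """Percentile rank: how this score compares with the rest of the class."""
    return 100 * bisect_left(class_scores, score) / len(class_scores)

def criterion_referenced(score):
    """Mastery decision: did the student reach the standard, regardless of classmates?"""
    return score >= mastery_cutoff

print(norm_referenced(84), criterion_referenced(84))   # 60.0 True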

At a minimum, two types of evaluation should be used: one directed at providing incentive and feedback and the other directed at ranking individual students relative to the larger group. Traditional grades are often inadequate as incentives to encourage students to give their best efforts and as feedback to teachers and students. The principal problem is that grades are given too infrequently, are too far removed in time from student performance, and are poorly tied to specific student behaviors.

Comparative evaluations are traditionally provided by grades and by standardized tests. Unlike incentive/feedback evaluations, comparative evaluations need not be conducted frequently. Rather, the emphasis in comparative evaluations must be on fair, unbiased, reliable assessment of student performance.

How are tests constructed? Achievement tests should measure clearly defined learning objectives that are in harmony with instructional objectives. Perhaps the most important principle of achievement testing is that the tests should correspond with the course objectives and with the instruction that is actually provided. Achievement tests should be as reliable as possible and should be interpreted with caution. A test is reliable to the degree that students who were tested a second time would fall in the same rank order. In general, writers of achievement tests increase reliability by using relatively large numbers of items and by using few items that almost all students get right or that almost all students miss.
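
One common way to express this rank-order notion of reliability is a rank correlation between two administrations of the same test. The sketch below, with invented scores and no handling of tied ranks, uses the simplified Spearman formula; a coefficient near 1.0 indicates that students kept roughly the same rank order.

def rank_positions(scores):
    """Rank each score within the group (1 = highest); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def spearman(first, second):
    """Simplified Spearman rank correlation for two score lists without ties."""
    r1, r2 = rank_positions(first), rank_positions(second)
    n = len(first)
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

test_1 = [92, 75, 83, 60, 88]    # first administration (invented)
test_2 = [90, 70, 85, 65, 84]    # second administration, similar rank order
print(spearman(test_1, test_2))  # 0.9, i.e., a highly consistent rank order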

The first step in the test development process is to decide which concept domains the test will measure and how many test items will be allocated to each concept. Gronlund (2000) and Bloom, Hastings, and Madaus (1971) suggest that teachers make up a table of specifications for each instructional unit listing the various objectives taught and different levels of understanding to be assessed. The levels of understanding might correspond to Bloom’s taxonomy of educational objectives. The table of specifications varies for each type of course and is nearly identical to the behavior content matrixes discussed earlier in this chapter. This is as it should be; a behavior content matrix is used to lay out objectives for a course, and the table of specifications tests those objectives.
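
As an illustration only, a table of specifications might be sketched as a small data structure: rows are the objectives taught, columns are the levels of understanding to be assessed (a few Bloom levels here), and each cell is the number of items planned. The unit, objectives, and counts below are invented.

table_of_specifications = {
    "adding fractions with like denominators": {"knowledge": 2, "application": 4},
    "interpreting a line graph":               {"comprehension": 3, "analysis": 2},
    "designing a simple experiment":           {"synthesis": 1, "evaluation": 1},
}

total_items = sum(count
                  for levels in table_of_specifications.values()
                  for count in levels.values())
print("Planned test length:", total_items, "items")   # 13 items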

Test items that can be scored correct or incorrect without the need for interpretation are referred to as selected-response items. Multiple-choice, true-false, and matching items are the most common forms. Note that the correct answer appears on the test and the student’s task is to select it. There is no ambiguity about whether the student has or has not selected the correct answer.

Considered by some educators to be the most useful and flexible of all test forms, multiple-choice items can be used in tests for most school subjects. True-false items can be seen as one form of multiple-choice item. The main drawback of true-false items is that students have a 50 percent chance of guessing correctly. In compensation, students can respond to them quickly, so such items can cover a broad range of content efficiently. Matching items are commonly presented in the form of two lists, say A and B. For each item in list A, the student has to select one item in list B. The basis for choosing must be clearly explained in the directions.

Constructed-response items require the student to supply rather than to select the answer. They also usually require some degree of judgment in scoring. The simplest form is fill-in-the-blank items, which can often be written to reduce or eliminate ambiguity in scoring.

Short essay questions allow students to respond in their own words. The most common form for a short essay item includes a question for the student to answer. The answer may range from a sentence or two to a page of, say, 100 to 150 words. A long essay item requires more length and more time, allowing greater opportunity for students to demonstrate organization and development of ideas. Although they differ in length, the methods available to write and score them are similar. An essay item should contain specific information that students are to address. Some teachers are reluctant to name the particulars that they wish the student to discuss, because they believe that supplying a word or phrase in the instructions is giving away too much information.

Essay items have a number of advantages in addition to letting students state ideas in their own words. Essay items are not susceptible to correct guesses. They can be used to measure creative abilities, such as writing talent or imagination in constructing hypothetical events. Essay items might require students to combine several concepts in their response. They can assess organization and fluency.

On the negative side is the problem of reliability in scoring essay responses. Some studies demonstrate that independent marking of the same essay response by several teachers results in appraisals ranging from excellent to a failing grade. This gross difference in evaluations indicates a wide range of marking criteria and standards among teachers of similar backgrounds.

A problem-solving assessment requires students to organize, select, and apply complex procedures that have at least several important steps or components. As in evaluating short essay items, you should begin your preparation for appraising problem-solving responses by writing either a model response or, perhaps more practically, an outline of the essential components or procedures that are involved in problem solving. As with essays, problem-solving responses may take several different yet valid approaches. The outline must be flexible enough to accommodate all valid possibilities.

What are authentic, portfolio, and performance assessments? In response to extensive criticism of standardized testing, educators have developed and implemented alternative assessment systems that are designed to avoid the problems of typical multiple-choice tests. The key idea behind these testing alternatives is that students should be asked to document their learning or demonstrate that they can actually do something real with the information and skills they have learned in school.

One goal of these “alternative assessments” is to demonstrate achievement in realistic contexts. In reading, for example, the authentic assessment movement has led to the development of tests in which students are asked to read and interpret longer sections and show their metacognitive awareness of reading strategies.

One popular form of alternative assessment is called portfolio assessment: the collection and evaluation of samples of student work over an extended period. Teachers may collect student compositions, projects, and other evidence of higher-order functioning and use this evidence to evaluate student progress over time. Portfolio assessment has important uses when teachers want to evaluate students for reports to parents or other within-school purposes. When combined with a consistent and public rubric, portfolios showing improvement over time can provide powerful evidence of change to parents and to students themselves.

Tests that involve actual demonstrations of knowledge or skills in real life are called performance assessments. For example, ninth-graders might be asked to conduct an oral history project, reading about a significant recent event and then interviewing the individuals involved. The quality of the oral histories, done over a period of weeks, would indicate the degree of the students’ mastery of the social studies concepts involved.

Performance assessments are far more expensive than traditional multiple-choice measures, but most experts and policy makers are coming to agree that the investment is worthwhile if it produces markedly better tests and therefore leads to better teaching and learning.

How are grades determined? Many sets of grading criteria exist, but regardless of the level at which they teach, teachers generally agree on the need to explain the meaning of the grades they give. Grades should communicate at least the relative value of a student’s work in a class. They should also help students to understand better what is expected of them and how they might improve. Teachers and schools that use letter grades attach the following general meanings to the letters:

1. A = superior; exceptional; outstanding attainment
2. B = very good, but not superior; above average
3. C = competent, but not remarkable work or performance; average
4. D = minimum passing, but serious weaknesses are indicated; below average
5. F = failure to pass; serious weaknesses demonstrated

The criteria for giving letter grades might be specified by a school administration, but grading criteria are most often set by individual teachers using very broad guidelines. In practice, few teachers could get away with giving half their students A’s or with failing too many students; but between these two extremes, teachers may have considerable leeway.

ABSOLUTE GRADING STANDARDS Grades may be given according to absolute or relative standards. Absolute grading standards might consist of preestablished percentage scores required for a given grade. In another form of absolute standards, called criterion-referenced grading, the teacher decides in advance what performances constitute outstanding (A), above-average (B), average (C), below-average (D), and inadequate (F) mastery of the instructional objective.
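
A minimal sketch of an absolute standard, with cutoffs chosen purely for illustration, might look like this in Python: the letter depends only on whether the student's percentage reaches a preestablished cutoff, never on how classmates performed.

GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]   # example cutoffs only

def absolute_grade(percent):
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"

print(absolute_grade(86))   # -> "B", regardless of how the rest of the class scored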

RELATIVE GRADING STANDARDS A relative grading standard exists whenever a teacher gives grades according to the students’ rank in their class or grade. The classic form of relative grading is specifying what percentage of students will be given A’s, B’s, and so on. A form of this practice is called grading on the curve, because students are given grades on the basis of their position on a predetermined distribution of scores. Strict grading on the curve and guidelines for numbers of A’s and B’s have been disappearing in recent years. For one thing, there has been a general grade inflation; more A’s and B’s are given now than in the past, and C is no longer the expected average grade but often indicates below-average performance.
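
By contrast, strict grading on the curve hands out a predetermined share of each letter purely by rank, as in the following sketch; the distribution and the class scores are invented, and, as noted above, this practice has been fading.

curve = [("A", 0.10), ("B", 0.20), ("C", 0.40), ("D", 0.20), ("F", 0.10)]  # predetermined shares
scores = {"Ana": 91, "Ben": 84, "Cal": 78, "Dee": 73, "Eli": 69,
          "Fay": 66, "Gus": 62, "Hana": 58, "Ivan": 55, "Jo": 49}

ranked = sorted(scores, key=scores.get, reverse=True)   # highest score first
grades, start = {}, 0
for letter, share in curve:
    count = round(share * len(ranked))
    for student in ranked[start:start + count]:
        grades[student] = letter
    start += count

print(grades)   # the top 10 percent get an A, the next 20 percent a B, and so on, purely by rank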

SCORING RUBRICS FOR PERFORMANCE GRADING A key requirement for the use of performance grading is collection of work samples from students that indicate their level of performance on a developmental sequence. Collecting and evaluating work that students are already doing in class (such as compositions, lab reports, or projects) is called portfolio assessment.

Several other approaches to grading are used in conjunction with innovative instructional approaches. In contract grading, students negotiate a particular amount of work or level of performance that they will achieve to receive a certain grade. For example, a student might agree to complete five book reports of a given length in a marking period to receive an A. Mastery grading, an important part of mastery learning, involves establishing a standard of mastery, such as 80 or 90 percent correct on a test.

Most schools give report cards four or six times per year, that is, every 9 or 6 weeks. Report card grades are most often derived from some combination of the following factors (a sketch of one possible weighting appears after the list):

1. Scores on quizzes and tests
2. Scores on papers and projects
3. Scores on homework
4. Scores on seatwork
5. Class participation (academic behaviors in class, answers to class questions, and so on)
6. Deportment (classroom behavior, tardiness, attitude)
7. Effort
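
The sketch below shows one hypothetical way the factors above might be combined into a marking-period percentage; the weights and the scores are invented, and in practice each teacher or school would set (and announce in advance) its own weighting.

weights = {"quizzes_tests": 0.40, "papers_projects": 0.25, "homework": 0.15,
           "seatwork": 0.10, "participation": 0.05, "effort": 0.05}   # invented weights

marking_period = {"quizzes_tests": 88, "papers_projects": 92, "homework": 75,
                  "seatwork": 80, "participation": 90, "effort": 95}  # invented averages (0-100)

overall = sum(weights[k] * marking_period[k] for k in weights)
print(round(overall, 1))   # 86.7, then converted to a letter by the teacher's grading standard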

One important principle in report card grading is that grades should never be a surprise. Students should always know how their grades will be computed, whether classwork and homework are included, and whether class participation and effort are taken into account. Being clear about standards for grading helps a teacher avert many complaints about unexpectedly low grades and, more important, lets students know exactly what they must do to improve their grades.

An important principle is that grades should be private. There is no need for students to know one another’s grades; making grades public only invites invidious comparisons among students. Finally, it is important to restate that grades are only one method of student evaluation. Written evaluations that go beyond a letter grade can provide additional useful information to parents and students.
