Introduction to Twenty Questions:
A Proposed Student Reaction to Instruction Form
UW Colleges

 

UWC Senate Professional Standards Committee
[Senators Angela Burger, Michael Collins, David Etzwiler, Shirley Hensch
Students: Sharla Pringle;  Chandra Amann, Craig Trost]
 

 

We recognize that student reactions to instruction provide some evidence of the multiple dimensions of instructional effectiveness (see studies by Marsh, Feldman, Abrami, Kulik & McKeachie, and Cashin). We also assumed that faculty would appreciate a form in which specific questions and sets of questions have a solid foundation in research. We have chosen twenty questions that fit both criteria. The Colleges will be using the attached scantron form, which offers three write-in areas for comments on the reverse. An obvious question: what if a particular question does not apply to a specific course? Instructions will advise students to leave the alternatives blank if a question is not applicable to a particular course.
There is no perfect instrument, and the form is in "take it or leave it" status. We are not open to amendments to questions at this point. The Senate will vote it up or down at its first meeting in October. If passed, it will be used this fall semester. If it is rejected, the Committee will want specific detail on what is to be changed, so that we can have an alternate form ready for discussion and decision by November.
In this report the Senate Professional Standards Committee explains or justifies the items we chose, citing our sources. Read it carefully. Just as important as choosing questions is suggesting the most appropriate use for the data collected. The detailed description of the individual items the Committee is proposing is followed by an extended discussion of how the data from these items can be used in both summative (comparative, as in merit deliberations) and formative (individual analysis, as for promotion and tenure; diagnostic) reviews.
We decided to begin with queries about the student, partly to give students "ownership." Then we move to questions about the course, and then to questions about the instructor. The final two questions relate back to the student. The alternatives range from Strongly Agree (5 pts) through Agree, Uncertain, and Disagree to Strongly Disagree (1 pt); this type of range is mandated by the scantron form. We preferred these alternatives to "Definitely true ... definitely false." Note that we did have to revise the wording of our questions, with reluctance, to suit the options.

 

 

Student Reaction to Instruction in the UW Colleges

 
                                Questions                 Justification

 

1. I really wanted to take this course. 

 

This question taps students' motivation while avoiding the problems associated with "required" versus "elective." (Some courses may be required for an AA or for a major; the dichotomy does not offer a reliable guide to students' interest or motivation. The critical issue is not whether the course was required, but whether students had any interest in the subject matter.) As Cashin (1988, 3) notes, instructors are more likely to obtain higher global ratings in classes where students are motivated. See Marsh (1984, 1987). 

Whether students had a strong desire to take the course is an important variable. In Cashin and Downey's study (1992, 566-567) motivation correlated with the overall evaluation measure at .31, accounted for 9% of the variance, and was significant at the .0001 level. Instructors usually have little control over student motivation. 
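
The share of variance "accounted for" by a single predictor is the square of its correlation. Applying that to the motivation figure above gives a quick consistency check (our arithmetic, not a figure reported by Cashin and Downey):

```latex
r = .31 \quad\Longrightarrow\quad r^{2} = (.31)^{2} = .0961 \approx 9.6\% \text{ of the variance}
```

This is close to the 9% cited.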

 

2. I worked harder on this course than on most of my other college courses. This item, together with q 7 ("difficulty of course"), gives us two variables for measuring Marsh's workload/difficulty factor (see the extended discussion below). Marsh shows this measure loading at .9 on his workload/difficulty factor (1994, 638; Marsh 1984). Cashin shows a positive correlation of .29 between this item and the global instructor rating (1988, 4; Cashin & Slawson, 1977). This is a diagnostic measure for formative evaluation. Correlating this item with expected grade should give us useful information. 

 

3. Compared to my other college courses, I feel I learned a great deal in this course. Cashin and Downey’s 1992 study showed that a question on how much was learned was an excellent global indicator, correlating .83 with other measures of learning (facts, concepts, problem-solving, liberal education, creativity, expressiveness oral and written, etc., weighted in terms of their essentiality in the course). Marsh's Confirmatory Factor Analysis showed a .89 fit with the learning factor. Indeed, Marsh thought this question ought to be one of the three "global" questions used by committees in personnel decisions (1994, 646). Student learning is a measure of instructional effectiveness in studies by Cohen (1981) and Feldman (1989a). 

However, Cashin and Marsh point out that very few departments and universities use this measure. And Scriven introduced a limitation: "The best teaching is not that which produces the most learning, since what is learned may be worthless" (1981, 248). 

We include it as an important question, which would be of diagnostic use by faculty, departments and campus committees, but which has limited value for summative (merit) evaluations. We do include this in one adjusted score, which, again, would be most useful in formative evaluation. 

 

4. The text(s) and readings were useful in helping me learn. This question was found in the Iowa State University form and retained because it provides useful information for the instructor; definitely formative. 

 

 
 
5. The outside assignments (e.g., homework, reports, and projects) were useful in helping me learn. Also found in the Iowa form. Provides useful feedback. Formative. 

  

 

6. The in-class activities (e.g., discussions, small groups, and problems) were useful in helping me learn. 

 

Also from the Iowa form. Provides useful feedback. Formative. 
7. This course was one of the most difficult college courses I have taken. Marsh uses this question as one measure of his "workload/difficulty" factor. (See q 17 comments for details.) The difficulty level of the course is often considered a key variable to control for bias in measuring effective teaching and was included by Cashin & Downey in their multiple composite measure. This question itself is useful as a diagnostic for formative evaluations. 

 

8. The tests, assignments and projects focused on the objectives of the course. This question fits into Marsh's "exams/grades" factor of SEEQ (Students' Evaluations of Educational Quality). It is similar to one of the IDEA items, which asks whether tests, etc., covered "important points" of the course. Useful for diagnostic or formative evaluation. 

 

9. The instructor was fair and reasonable in grading exams and assignments. Students asked for this question, which is not found in the IDEA, Marsh, Iowa, or similar forms. A similar question was in the Hieser/Andrews "Son of Student" form. Students contend that a low grade of their own is unlikely to make their response to this question negative, because they can separate general fairness from their own performance. It will be interesting to correlate this item with expected grade. Useful for formative, diagnostic feedback. 

 

10. The instructor and/or syllabus provided an adequate description of the course and its requirements. Students requested this item; the course should "have direction." Some faculty put basic information in the syllabus; others describe the course, often in the initial class periods, but some may periodically "sum and steer." This wording should suffice. This question fits within Marsh's "organization" factor (1994, 638). Note Scriven’s definition of good teaching included "the curriculum coverage and the teaching process are consistent with what has been promised." (1981, 248) Discovering if anything has been promised would be important for diagnostic, formative evaluation. 

 

 
 
11. In class the instructor provided material beyond that offered in the text or readings. Students wanted to know whether the instructor "taught the text," "supplemented the text," or "ignored the text." This query was high on their list of concerns. The response options (Strongly Agree ...) do not permit asking that question directly, so we have introduced substitute wording.
12. The instructor usually seemed well-prepared for class. Initially proposed by Hieser/Andrews in "Son of Student," this question speaks to a basic expectation of students. Marsh uses this item to measure "organization," one of his critical factors.  
Useful as a diagnostic for the faculty member, and for formative reviews by departments and campus committees. 

 

13. The instructor seemed enthusiastic about the subject matter. One of the basic items in multidimensional studies of effective teaching by several researchers, including Marsh (1982, 1987), Feldman (1988, 1989), and Abrami (1990). In Marsh's study, enthusiasm and "expressiveness" loaded highest in explaining the global rating of the instructor. (This is just another way of saying that if an instructor is not interested in or enthusiastic about the subject, it will be difficult to generate interest among the students.) As Cashin says, "making the lecture interesting as well as informative helps students learn content....being expressive or interesting [is] related to an instructor's teaching effectiveness,...[is] not source of bias." (1988, 5) 

This measure is useful for diagnostic purposes by instructors and for formative evaluation. 

 

14. The instructor introduced stimulating ideas about the subject. In stepwise regression analysis to determine whether there were better combinations of variables than those originally selected to measure teaching effectiveness, Cashin and Downey (1992, 566-567) discovered that this item accounted for 58% of the variance in measures of student learning. No other item except the global item explained more than 12% of the variance. Marsh's factor analysis shows this item loading on learning/value, enthusiasm, and workload of course. 

 This item is useful in formative evaluation, and gives the instructor useful feedback on student perception. Given its high correlation with student learning, it could be used as part of a score focusing on learning. 
 

 
 
15. The instructor spoke clearly and distinctly. The Committee vetoed questions on speaking in a monotone, giving dry and dull presentations, or speaking with expressiveness and variety (even though, according to Marsh, 1994, 638, these are measures of enthusiasm).
    "Spoke clearly and distinctly" raised issues of accent. The Committee rejected "the instructor mumbled and was not audible." Student and faculty members felt strongly that the instructor should be understandable. 

Question 15 is diagnostic, for formative evaluation. 

 

16. The instructor was available to help with questions or homework outside of class. UWC stresses openness to students, frequently citing instructors' availability for help. Students want this question. We will instruct students to give no response if they never sought help outside of class. 

Diagnostic item for formative analysis. We have no data on how this item relates to "good instruction" or effective teaching, as it has not been included on IDEA, Marsh, etc. 

 

17. Overall, I consider this instructor to be an excellent teacher. Global question, essential for summative evaluation (merit). Cashin & Downey (1992, 566-7), using regression analysis in a multidimensional study of over 17,000 classes from 105 institutions of higher learning, showed that over 50% of the variance in student learning measures was accounted for by this question (instructor) or the following one (course). Marsh (1994, 642) showed very high correlations between the global instructor item and five of his six factors of teacher effectiveness (learning .847, enthusiasm .837, organization .895, interaction .777, exams .528, difficulty .058). 

 

18. Overall, I consider this to be an excellent course. Standard question for evaluation, although it is probably more useful for formative analysis by departments. The merit exercise does not "rate" courses but instructors. Cashin & Downey (1992) showed high correlations between this item and their overall learning scale, and Marsh (1994) showed high correlations between this item and five of his six factors. 

 

19. I expect to get a grade of: A=A; B=B; C=C or Pass/Satisfactory; D=D; E=F or Fail/Unsatisfactory. This is an obvious question, and the Committee hopes that faculty, departments and/or campus committees will be able to analyze the link between grades and ratings. 

  

A 20x20 correlation matrix would be very useful. Studies by Feldman (1976), Howard & Maxwell (1982), Marsh (1984) and Cashin (1988) show a "low positive" correlation between expected grades and student ratings (.1 to .3).
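
As a minimal sketch of the grade/rating link described above, the following computes a Pearson correlation between q 19 (expected grade) and q 17 (global instructor rating) for one class. The numeric grade coding (A=4 down to E=0) and the function names are our assumptions for illustration; the form itself records only the letter chosen. Students who left either item blank are dropped.

```python
from math import sqrt

# Hypothetical numeric coding for q 19; the form records only letters.
# E covers F, Fail, or Unsatisfactory.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}


def pearson_r(xs, ys):
    """Pearson correlation; None if fewer than two pairs or no variance."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def grade_rating_correlation(pairs):
    """pairs: (q 19 letter, q 17 points 1-5) per student; None = blank."""
    kept = [(GRADE_POINTS[g], r) for g, r in pairs if g is not None and r is not None]
    return pearson_r([g for g, _ in kept], [r for _, r in kept])


# Example: five students, one of whom skipped q 19.
sample = [("A", 5), ("B", 4), ("C", 3), (None, 4), ("B", 5)]
print(round(grade_rating_correlation(sample), 2))  # prints 0.85
```

A toy class this small says nothing about leniency; across real classes, values in the .1 to .3 range would match the "low positive" relationship reported in the studies above.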

 
 
20. I am completing college credits of: A=15 or less; B=16-30; C=31-45; D=46-60; E=more than 60. 

 

The Committee deliberated at length on this question, trying different formats. The freshman/sophomore label was rejected as too broad. We had vigorous debate on credit categories, because very bright students often take 18 credits their first semester. However, 15 credits is the standard definition of a semester load in most colleges, and the other breaks we wrote in are likewise standard. There is no perfect set of categories. 

Cashin (1988), Braskamp (1984) and Aleamoni (1981) showed that higher-level courses tend to receive higher ratings, although the IDEA data show a correlation of only .07. Sophomores are more likely than freshmen to take higher-level courses. 

Note that Cashin's Summary of Research (1988, 3) reported that neither age, sex, level of student (Fr, Soph), GPA, nor personality scores were related to student ratings. 
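
The q 20 bands can be written as a simple lookup, which makes the boundaries explicit. This is an illustrative sketch only; the function name is hypothetical, and we read "60+" as "more than 60," since a student with exactly 60 credits already falls in band D.

```python
def credit_option(credits):
    """Map completed credits to the q 20 scantron option (A-E).

    Bands from the form: A = 15 or less, B = 16-30, C = 31-45,
    D = 46-60, E = more than 60. Assumes whole-number credits.
    """
    if credits < 0:
        raise ValueError("credits cannot be negative")
    for upper_bound, option in ((15, "A"), (30, "B"), (45, "C"), (60, "D")):
        if credits <= upper_bound:
            return option
    return "E"


# A student carrying 18 credits lands in band B, not A.
print(credit_option(18))  # prints B
```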

 
 
Write-in Questions
Because the Scantron form must be printed, the Committee has reluctantly concluded that we have to offer common questions for write-in responses. We propose:
Write-in Area 1: What did you like best about the course or the instructor?
Write-in Area 2: What do you suggest might improve the course or the instruction?
Write-in Area 3: What else do you want the Department, the campus, and the instructor to know about this course or its instruction?
 
 
Presentation of data:
For each class, the n and percentage of responses, plus the mean and standard deviation, will be given for each row (question).
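
A minimal sketch of the per-question statistics described above, for one question in one class. It assumes the point values given earlier (Strongly Agree = 5 ... Strongly Disagree = 1), treats a blank as "not applicable" and leaves it out of every figure, and uses the sample standard deviation; the report does not say which standard deviation will be used, so that choice, like the function name, is our assumption.

```python
from collections import Counter
from statistics import stdev

# Point values stated in the report: Strongly Agree = 5 ... Strongly Disagree = 1.
LIKERT_POINTS = {
    "Strongly Agree": 5,
    "Agree": 4,
    "Uncertain": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}


def summarize_item(responses):
    """n, counts, percentages, mean and SD for one question in one class.

    None marks a blank ("not applicable") and is left out of every figure.
    """
    answered = [r for r in responses if r is not None]
    n = len(answered)
    counts = Counter(answered)
    points = [LIKERT_POINTS[r] for r in answered]
    return {
        "n": n,
        "counts": {label: counts[label] for label in LIKERT_POINTS},
        "percents": {
            label: round(100 * counts[label] / n, 1) if n else 0.0
            for label in LIKERT_POINTS
        },
        "mean": round(sum(points) / n, 2) if n else None,
        # Sample standard deviation (an assumption); needs two or more answers.
        "sd": round(stdev(points), 2) if n > 1 else None,
    }


summary = summarize_item(["Strongly Agree", "Agree", "Agree", None, "Uncertain"])
print(summary["n"], summary["mean"], summary["sd"])  # prints 4 4.0 0.82
```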
In addition, to improve the quality of the response statistics, adjusted calculations will be offered for each class. One (offered in two forms) adjusts for student bias, which has been studied extensively in the literature; this adjusted score might be useful for summative (comparative, as in merit) evaluation. The other adjusts the global instructor score by two measures of "learning"; it stems from the research but is not a standard adjustment, and it might be useful for formative (diagnostic) evaluation while we test it. No instructor's global rating will decline in the adjustment process, but some ratings will increase more than others. These adjustments have often been the topic of conversations among faculty, but they could not be considered at all under the old essay form. Before the essay form was adopted, the studies that would support any adjustment had not yet been made.
     
  1. Adjustment for Student Bias
    The first measure adjusts the "global instructor" rating (q 17) for known biases in student ratings, and for this we offer two options. Marsh (1984) and Cashin (1988) define bias in terms of variables which are "not a function of the instructor's teaching effectiveness" and over which there may be little control.
    "Student motivation or class size may impact teaching effectiveness, but instructors should not be faulted if they are less effective teaching large classes of unmotivated students than their colleagues are with small classes of motivated students." (Cashin, 1988, 3)
    In 1985 Cashin called for comparative data with which to determine bias. By 1992, Cashin and Downey had data on 17,183 classes from 105 institutions of higher learning. They identified three sources of bias that are largely outside a professor's control but nonetheless affect student ratings, each significant at the .0001 level (1992, 566-567): motivation of students, difficulty of course, and size of class. They confirmed the reliability of these items using the 23,488-class database (1992, 566). In stepwise regression analysis of 29,543 classes, the three control variables explained between 13% and 18% of the variance, depending on which of five criteria was chosen as the dependent variable (Cashin, Downey and Sixbury, 1994, 652-53).
    1. Motivation of the student
      1. Students who do not want to take the course are likely to give lower ratings to the instructor, as the research shows (Marsh 1984, 1987; Aleamoni, 1981; Feldman 1978; Cashin & Downey 1992). In the 1992 study, motivation had the highest R and R² of the three items.
      2. All instructors receive some points for motivation, but the larger adjustments go to instructors of classes with less-motivated students. These classes are harder to teach, and their instructors are likely to receive lower ratings.
    2. Difficulty of the subject matter
      1. Multiple concerns led us to include this item in the composite score. On the one hand, many studies show a low (.22) but nonetheless positive correlation between difficulty and rating of the instructor (Marsh 1984, Cashin 1988). In 1994, in one of the largest studies made (29,543 classes), Marsh reported the correlation of global instructor with difficulty to be .058, which is notable because the correlations with the other factors were much higher, ranging from .53 to .89 (642).
      2. On the other hand, grading leniency is one hypothesis for the positive correlation between expected grades and global instructor rating (Marsh 1984, Feldman 1976, Howard and Maxwell 1980, 1981). A second hypothesis is that students who learn more obtain higher grades and give higher ratings (so leniency is not always at fault). A third is that high motivation leads to greater learning, higher grades, and higher ratings, so that leniency is again not the factor. (See Cashin 1988 for a review of the literature.)
      3. Grading leniency is under the instructor's control and is known to affect the global instructor rating; that concern makes the inclusion of "difficulty" a reasonable check measure. Cashin and Downey added it as a control variable in their 1992 study. Its correlation with an overall-learning score was low but still significant at the .0001 level (566-567).
      4. Each instructor receives some points for difficulty, but the larger adjustments go to those whose students judge the course to be more difficult.
    3. Size of the class
      1. Students give slightly higher ratings to instructors in small classes. Feldman (1984) found a weak inverse association of -.09; Cashin & Slawson (1977) found -.18, and Cashin & Downey (1992) found -.19. Again, this is a minor item, but still significant at the .0001 level. We have modified the class sizes found in the literature (1-14, 15-34, 35-99, 100+) to fit the conditions in the Colleges. The larger the class, the more fractional points are added to the global rating:
        1. 12 or less: .1
        2. 13-25: .2
        3. 26-50: .3
        4. 51 or more: .4
    The adjustments are in fractions of a point: .5, not 5; .1, not 1.
    Examples of Adjustment for Student Bias
    Using an arbitrary global instructor (q 17) mean of 4.2:
Components: Global Instructor + Motivation + Difficulty of course + Size of class
  • Motivation: High = .1, Average = .3, Low = .5
  • Difficulty: Easy = .1, Average = .3, Hard = .5
  • Size: small (1-12) = .1; 13-25 = .2; 26-50 = .3; 51 or more = .4
     
    Instructor 4.2, low student M, average difficulty, size 25: 4.2+.5+.3+.2 = 5.2
    Instructor 4.2, average M, average difficulty, size 25: 4.2+.3+.3+.2 = 5.0
    Instructor 4.2, high M, average difficulty, size 25: 4.2+.1+.3+.2 = 4.8

    Instructor 4.2, average M, easy course, size 25: 4.2+.3+.1+.2 = 4.8
    Instructor 4.2, average M, average difficulty course, size 25: 4.2+.3+.3+.2 = 5.0
    Instructor 4.2, average M, difficult course, size 25: 4.2+.3+.5+.2 = 5.2

    Instructor 4.2, average M, average D, size 12: 4.2+.3+.3+.1 = 4.9
    Instructor 4.2, average M, average D, size 25: 4.2+.3+.3+.2 = 5.0
    Instructor 4.2, average M, average D, size 45: 4.2+.3+.3+.3 = 5.1
    Instructor 4.2, average M, average D, size 60: 4.2+.3+.3+.4 = 5.2
     

     
     

    The Professional Standards Committee does not want the size adjustment to push faculty toward offering large lectures with smaller discussion or lab sections, or away from writing courses, just so that they might earn a few more points. Nor do we want systematic discrimination against faculty who teach valued but small courses. Therefore we offer a second adjustment that omits size.

     

    Global Instructor + Motivation of student (high = .1, low = .5) + Difficulty of course (most difficult = .5, easiest = .1). A few limited examples will suffice:

     
     
    Instructor 4.2, low student M, average difficulty: 4.2+.5+.3 = 5.0
    Instructor 4.2, average M, average difficulty: 4.2+.3+.3 = 4.8
    Instructor 4.2, high M, average difficulty: 4.2+.1+.3 = 4.6

    Instructor 4.2, average M, easy course: 4.2+.3+.1 = 4.6
    Instructor 4.2, average M, average difficulty course: 4.2+.3+.3 = 4.8
    Instructor 4.2, average M, difficult course: 4.2+.3+.5 = 5.0

    Instructor 4.2, average M, average D: 4.2+.3+.3 = 4.8
     

     

    Why offer both tabulations? One member said that departments might want to include size, to compare faculty teaching similar-sized sections, faculty giving mammoth lectures, or those teaching labs. Campuses might not find it useful at all. So we provide both.
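
The two student-bias adjustments can be sketched as one function, with class size optional. The report gives the endpoints (.1 and .5), and its examples use .3 for "average"; to handle actual class means, this sketch assumes a straight-line mapping from the q 1 and q 7 means (1-5) to points, which reproduces .1, .3 and .5 at means of 5, 3 and 1 for motivation and 1, 3 and 5 for difficulty. That mapping and the function names are our assumptions, not part of the proposal.

```python
def _check_mean(value):
    """Scantron item means must fall on the 1-5 scale."""
    if not 1 <= value <= 5:
        raise ValueError("item means must lie between 1 and 5")


def motivation_points(q1_mean):
    """q 1: high motivation earns .1, low motivation .5 (assumed linear)."""
    _check_mean(q1_mean)
    return (6 - q1_mean) / 10


def difficulty_points(q7_mean):
    """q 7: easiest course earns .1, hardest .5 (assumed linear)."""
    _check_mean(q7_mean)
    return q7_mean / 10


def size_points(enrollment):
    """Size bands from the proposal: 12 or less .1; 13-25 .2; 26-50 .3; 51+ .4."""
    if enrollment <= 12:
        return 0.1
    if enrollment <= 25:
        return 0.2
    if enrollment <= 50:
        return 0.3
    return 0.4


def bias_adjusted(q17_mean, q1_mean, q7_mean, enrollment=None):
    """Global instructor mean (q 17) plus student-bias points.

    With enrollment given, this is the first form (size included);
    with enrollment=None, it is the second form (size omitted).
    """
    total = q17_mean + motivation_points(q1_mean) + difficulty_points(q7_mean)
    if enrollment is not None:
        total += size_points(enrollment)
    return round(total, 2)


# Low motivation (mean 1), average difficulty (mean 3), 25 students.
print(bias_adjusted(4.2, 1, 3, enrollment=25))  # prints 5.2
print(bias_adjusted(4.2, 1, 3))                 # prints 5.0
```

Because every component is positive, no instructor's rating goes down; only the size of the increase differs, as the proposal states.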

  2. Formative Purpose: Adjustment in terms of Learning
    This adjustment is not found in the literature. However, Marsh, Cashin and Downey, and others have put great effort into trying to use student learning as a criterion for effective teaching. [The IDEA form includes ten learning goals, recognizing that different courses and different instructors have different goals. Some focus on facts, others on creativity, principles, providing a liberal education, or problem-solving. Cashin and Downey tried to develop a small set of questions that correlated highly with learning (1992). Marsh (1994) criticized their composite, which Cashin et al. (1994) defended.]
     
    The debate, and the evidence each side put forth, showed very clearly that two items correlated highly with different types of learning, whether raw or weighted composite scores were used. One item was "the instructor introduced stimulating ideas" (our q 14) and the other was the global "Overall I learned a great deal" (our q 3).

    These two variables seem to give us a way to adjust the global instructor item (q 17) for a slightly different orientation on effectiveness, in terms of learning. One variable focuses on the instructor, and the other on student perception. This adjusted score might be useful for departmental assessment of student learning, and for formative (retention, promotion, tenure deliberations) or diagnostic review. Given the vigor of the debate (Marsh 1994; Cashin, Downey and Sixbury 1994), we would caution against its use in summative (merit) evaluations. Remember that we made this one up, and it will need testing.

     
     
    Components: Global Instructor (q 17) + Introduced Stimulating Ideas (q 14) + Learned a great deal (q 3)
      1. Stimulating Ideas: Strongly Agree = .5; Strongly Disagree = .1
      2. Learned a great deal: Strongly Agree=.5; Strongly Disagree=.1
     
    Examples of Adjustment for Teaching/Learning
    Utilizing an arbitrary Global Instructor mean of 4.2
     
     
    I=4.2 + high Stimulation + high Learning 4.2+.5+.5 = 5.2
    I=4.2 + med Stimulation + high Learning 4.2+.3+.5 = 5.0
    I=4.2 + low Stimulation + high Learning 4.2+.1+.5 = 4.8
    I=4.2 + high Stimulation + med Learning 4.2+.5+.3 = 5.0
    I=4.2 + med Stimulation + med Learning 4.2+.3+.3 = 4.8
    I=4.2 + low Stimulation + med Learning 4.2+.1+.3 = 4.6
    I=4.2 + high Stimulation + low Learning 4.2+.5+.1 = 4.8
    I=4.2 + med Stimulation + low Learning 4.2+.3+.1 = 4.6
    I=4.2 + low Stimulation + low Learning 4.2+.1+.1 = 4.4
     
     
    While not exhaustive, these examples show the range possible with this adjustment with a common global instructor mean.
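
The teaching/learning adjustment follows the same pattern. As with student bias, the proposal fixes only the endpoints (Strongly Agree = .5, Strongly Disagree = .1) and uses .3 for a middling response, so this sketch assumes that points equal the item mean divided by 10; the function name is hypothetical.

```python
def learning_adjusted(q17_mean, q14_mean, q3_mean):
    """Global instructor (q 17) + stimulating ideas (q 14) + learned a great deal (q 3).

    Assumed linear mapping: an item mean of 5 earns .5, 3 earns .3, 1 earns .1.
    """
    for value in (q14_mean, q3_mean):
        if not 1 <= value <= 5:
            raise ValueError("item means must lie between 1 and 5")
    return round(q17_mean + q14_mean / 10 + q3_mean / 10, 2)


# Corners of the example table, with high = 5, med = 3, low = 1.
for stim, learn in [(5, 5), (3, 5), (1, 1)]:
    print(stim, learn, learning_adjusted(4.2, stim, learn))
```

Under this assumption a real class mean such as 4.3 on q 14 would add .43, a case the proposal's whole-step examples do not address.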

    Two Additional Features of Presentation

    In addition to n, percentages, means and standard deviations, this new form permits two additional features: (1) an additional statistical analysis, a 20x20 correlation matrix, will be printed out for each class. This will enable faculty to see relationships for themselves.
     

    Individual analysis may lead faculty to question whether an anomaly they have discovered, or linkages of significance for them, also exist within their department or for particular courses. It is very easy, under this new form, to get access to data for departments or courses for such analysis. Or, analysis of their own data might lead to a full-scale research project on all UWC data. Faculty may present findings in their department (or campus); they also might get some publishable articles. (Given recent trends in the UWC, grants may be available for such studies.)
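
A minimal sketch of how a per-class correlation matrix could be produced from individual answer sheets. It assumes each student's responses are stored as a list of 20 numeric codes (None for a blank) and drops blanks pair by pair, so each cell uses only the students who answered both questions; the function names and data layout are illustrative assumptions, not the processing campus's actual procedure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; None when undefined (under two pairs or no variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def correlation_matrix(sheets, n_items=20):
    """sheets: one list of n_items numeric answers per student (None = blank).

    Blanks are dropped pairwise: cell [i][j] uses only the students who
    answered both question i+1 and question j+1.
    """
    matrix = [[None] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            pairs = [(s[i], s[j]) for s in sheets if s[i] is not None and s[j] is not None]
            matrix[i][j] = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
    return matrix


# Toy example with three questions and four students.
toy = [[5, 4, None], [4, 4, 3], [3, 2, 3], [2, 1, 1]]
m = correlation_matrix(toy, n_items=3)
print(round(m[0][1], 2))  # prints 0.95
```

For the full 20-question form, cell m[16][18] (zero-based indices) would hold the q 17 / q 19 correlation discussed earlier.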

    (2) Faculty will receive the statistical data by e-mail attachment in a usable form, so that they can conduct more esoteric studies (3-way correlations, factor analysis, stepwise regression, analysis of variance, etc.). Having their own data also permits faculty to make studies over time, which might be useful for building a file for promotion or post-tenure review. Again, analysis of their own data may lead them to make more extensive analyses for the department or the Colleges. Faculty who don't want to engage in such activities can simply delete the attachment.

    Each instrument of student evaluation has strengths and weaknesses. One of the strong points of this form is that we gain the data to make a different kind of study about ourselves, our teaching and our students. It also gives us access to a body of data with which we can conduct scholarly research. Asking three general write-in questions meets the concerns of those who value the written student response in essay form.
     

    Recommendations on Utilization of Data
    Cashin & Downey (1992) make a case, which we found persuasive, for choosing a few global indicators for summative (merit) evaluations, with a larger number of varied questions for formative (promotion, tenure, post-tenure) and diagnostic reviews. Their view is echoed by Braskamp et al. (1984) and Abrami, Leventhal & Dickens (1981). Cashin & Downey (1992, 1994) also advocate offering a measure that adjusts for known student biases, based on solid research over the last two decades.
    Herbert Marsh, a disciple of multidimensionality, offers a different approach. He opposes the use of global measures, preferring a weighted average of six to nine factors (learning/value, enthusiasm, organization, group interaction, individual rapport, breadth of coverage, exams/grades, assignments, and workload/difficulty). Marsh, of course, wants the weights to be assigned on the basis of empirical and logical analysis.
    Unfortunately, Marsh has not developed such weights. A second problem is his willingness to accept quite different items as evidence of his factors. A third is that some of the factors (breadth of coverage, for example) require analysis other than student response. A fourth problem is the criticism from other scholars of the wisdom of combining six or nine factors into one score. Hoyt points out that "well-organized garbage still smells" (1973, 153). Scriven opposes "style" items such as organization and promoting discussion because these do not correlate reliably with learning measures across disciplines, levels, and circumstances (1981, 251). Naftulin, Ware & Donnelly (1973) warn against using some measures of enthusiasm (i.e., expressiveness in presentation), pointing to the "Dr. Fox" effect (students respond to style rather than substance). On the other hand, Cashin points out that "making the lecture interesting as well as informative helps students learn content, especially when incentives and testing are missing" (1988, 3). Scriven cautioned against using student learning items because "what is learned may be worthless" (1981, 248).

    Given these (and other) caveats, the Committee strongly suggests that no composite score of 6-9 factors, weighted or not, be utilized for summative (comparative, as in merit) deliberations. However, we recognize that faculty on personnel committees will make their own decisions on what is important, and how it is to be collated and used.

    Conclusion
    1. This proposal is divided into three parts, and the Committee recommends adoption of all three:
      • the twenty-question form, and the write-in alternatives.
      • the package of statistical data (n, %, means, standard deviations) to be sent to the usual three recipients (department, campus, and instructor), and a correlation matrix to be sent to the faculty member.
      • the adjusted scores, two of which control for student bias and one of which includes variables on learning.
    2. The Committee recommends that merit committees engaged in summative evaluation use:
      1. q 17 Overall, I consider this instructor to be an excellent teacher.
      2. (They also might want to use the global q 18: Overall, I consider this to be an excellent course. The caveat is that merit committees are not rating courses, but instructors.)
    3. Summative committees will have both Adjusted-for-Student-Bias scores, which they could use at their discretion.
    4. The Committee realizes that summative committees might wish to take note of responses to other questions, but would strongly advise against developing a composite mean of all questions, or even of those which might fit the "Marsh 6."
    5. The Committee recommends that other questions, as well as the adjusted score for teaching/learning, be used for formative or diagnostic evaluation.
    6. The Committee recommends that Fox Valley retain the electronic data set, and notes that the original scantron forms will be returned, because these may contain students' written comments. The Committee seeks guidance from the Senate on the return policy:
      1. The easiest might be to return the originals to the campus, with each campus making two copies (at least of the "essay" side) and distributing one to the department chair and the other to the faculty member.
      2. An alternative would be to send the originals to departments, with each chair making two or more copies: one for faculty, and others for relevant campuses.
      3. The most time-consuming option for the processing campus (Fox Valley) would be for it to copy the forms twice and send the three copies to faculty, departments, and campus.
    7. If the faculty, through departments, campuses, and Senate, do not adopt this instrument, the Professional Standards Committee would like specific guidance on modifications.
     
    CITATIONS
    (Starred items can be requested from the Committee.)
    Abrami, P. C. (1985) Dimensions of Effective College Instruction. Review of Higher Education 8, 211-228.

    Abrami, P. C. (1989). How Should We Use Student Ratings to Evaluate Teaching? Research in Higher Education 30, 221-227.

    Abrami, P. C. & d'Apollonia (1990). The Dimensionality of Ratings and Their Use in Personnel Decisions. In M. Theall & J. Franklin, eds. Student ratings of Instruction: Issues for Improving Practice: New Directions for Teaching and Learning 43, (pp. 97-111). San Francisco: Jossey-Bass.

    Abrami, P. C. & d'Apollonia (1991). Multidimensional Students' Evaluations of Teaching Effectiveness: Generalizability of "N=1" Research: Comment on Marsh (1991). Journal of Educational Psychology, 83, 411-415.

    Abrami, P. C., Leventhal, L. L. & Dickens, W. J. (1981). Multidimensionality of Student Ratings of Instruction. Instructional Evaluation, 6, 12-17.

    Aleamoni, L. M. (1981). Student Ratings of Instruction. In J. Millman (Ed.), Handbook of Teacher Evaluation (pp. 110-145). Beverly Hills, CA: Sage.

    Braskamp, L. A., Brandenburg, D. C. & Ory, J. C. (1984). Evaluating Teaching Effectiveness: A Practical Guide. Beverly Hills, CA: Sage.

    Cashin, W. E. (1985). Student Ratings: The Need for Comparative Data. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL. [ERIC Document Reproduction Service No. ED 261 098.]

    *Cashin, W. E. (1988). Student Ratings of Teaching: Summary of Research. (IDEA paper No. 20). Manhattan, KS: Kansas State University, Division of Continuing Education.

    *Cashin, W. E. & Downey, R. G. (1992). Using Global Student Rating Items for Summative Evaluation. Journal of Educational Psychology, 84(4), 563-572.

    *Cashin, W. E., Downey, R. G., & Sixbury, G. R. (1994). Global and Specific Ratings of Teaching Effectiveness and Their Relation to Course Objectives: Reply to Marsh (1994). Journal of Educational Psychology, 86(4), 649-657.

    Cashin, W. E. & Slawson (1977). IDEA Technical Report No. 2: Description of Data Base 1976-1977. Manhattan, KS: Kansas State University, Center for Faculty Evaluation and Development.

    Cohen, P. A. (1981). Student Ratings of Instruction and Student Achievement: A Meta-analysis of Multisection Validity Studies. Review of Educational Research, 51, 281-309.

    Feldman, K. A. (1976). Grades and College Students' Evaluations of Their Courses and Teachers. Research in Higher Education, 4, 69-111.

    Feldman, K. A. (1978). Course Characteristics and College Students' Ratings of Their Teachers: What We Know and What We Don't. Research in Higher Education, 9, 199-242.

    Feldman, K. A. (1984). Class Size and College Students' Evaluations of Teachers and Courses: A Closer Look. Research in Higher Education, 21, 45-116.

    Feldman, K. A. (1988). Effective College Teaching from the Students' and Faculty's View: Matched or Mismatched Priorities? Research in Higher Education, 28, 291-344.

    Feldman, K. A. (1989). The Association Between Student Ratings of Specific Instructional Dimensions and Student Achievement: Refining and Extending the Synthesis of Data from Multisection Validity Studies. Research in Higher Education, 30, 583-645.

    Howard, G. S. & Maxwell, S. E. (1980). The Correlation Between Student Satisfaction and Grades: A Case of Mistaken Causation? Journal of Educational Psychology, 72, 810-820.

    Howard, G. S. & Maxwell, S. E. (1982). Do Grades Contaminate Student Evaluations of Instruction? Research in Higher Education, 16, 175-188.

    Hoyt, D. P. (1973). Measurement of Instructional Effectiveness. Research in Higher Education, 1, 367-378.

    *Koon, J. & Murray, H. G. (1995). Using Multiple Outcomes to Validate Student Ratings of Overall Teacher Effectiveness. Journal of Higher Education, 66(1), 61-81.

    Kulik, J. A. & McKeachie, W. J. (1975). The Evaluation of Teachers in Higher Education. In F. N. Kerlinger (Ed.), Review of Research in Education, 3 (pp. 210-240). Itasca, IL: F. E. Peacock.

    Marsh, H. W. (1982). SEEQ: A Reliable, Valid and Useful Instrument for Collecting Students' Evaluations of University Teaching. British Journal of Educational Psychology, 52, 77-95.

    Marsh, H. W. (1984). Students' Evaluations of University Teaching: Dimensionality, Reliability, Validity, Potential Biases, and Utility. Journal of Educational Psychology, 76, 707-754.

    Marsh, H. W. (1987). Students' Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research. International Journal of Educational Research, 11, 253-388.

    Marsh, H. W. (1989). Responses to Reviews of "Students' Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research." Instructional Evaluation, 10, 5-9.

    Marsh, H. W. (1991a). A Multidimensional Perspective on Students' Evaluations of Teaching Effectiveness: A Test of Alternative Higher-Order Structures. Journal of Educational Psychology, 83, 285-296.

    Marsh, H. W. (1991b). A Multidimensional Perspective on Students' Evaluations of Teaching Effectiveness: Reply to Abrami and d'Apollonia (1991). Journal of Educational Psychology, 83, 416-421.

    *Marsh, H. W. (1994). Weighting for the Right Criteria in the Instructional Development and Effectiveness Assessment (IDEA) System: Global and Specific Ratings of Teaching Effectiveness and Their Relation to Course Objectives. Journal of Educational Psychology, 86(4), 631-648.

    Marsh, H. W. & Ware, J. E. (1982). Effects of Expressiveness, Content Coverage, and Incentive on Multidimensional Student Rating Scales: New Interpretations of the Dr. Fox Effect. Journal of Educational Psychology, 74, 126-134.

    Naftulin, D. H., Ware, J. E., & Donnelly, F. A. (1973). The Doctor Fox Lecture: A Paradigm of Educational Seduction. Journal of Medical Education, 48, 630-635.

    Scriven, M. (1981). Summative Teacher Evaluation. In J. Millman (Ed.), Handbook of Teacher Evaluation (pp. 244-271). Beverly Hills, CA: Sage.
     

     VOTE ON THE 20 QS