A literature review on the student evaluation of teaching: An examination of the search, experience, and credence qualities of SET

Competition among higher education institutions has pushed universities to expand their competitive advantages. Based on the assumption that the core functions of universities are academic, understanding the teaching–learning process with the help of student evaluation of teaching (SET) would seem to be a logical solution in increasing competitiveness. The paper aims to discuss these issues.

Design/methodology/approach

The current paper presents a narrative literature review examining how SETs work within the concept of service marketing, focusing specifically on the search, experience, and credence qualities of the provider. A review of the various factors that affect the collection of SETs is also included.

Findings

Relevant findings show the influence of students’ prior expectations on SET ratings. Therefore, teachers are advised to establish a psychological contract with the students at the start of the semester. Such an agreement should be negotiated, setting out the potential benefits of undertaking the course and a clear definition of acceptable performance within the class. Moreover, connections should be made between courses and subjects in order to provide an overall view of the entire program together with future career pathways.

Originality/value

Given the complex factors affecting SETs and the antecedents involved, there appears to be no single perfect tool to adequately reflect what is happening in the classroom. As different SETs may be needed for different courses and subjects, options such as faculty self-evaluation and peer-evaluation might be considered to augment current SETs.

Keywords

Higher education
Student expectations
Service marketing
Teacher evaluation
Teaching and learning process

Citation

Ching, G. (2018), "A literature review on the student evaluation of teaching: An examination of the search, experience, and credence qualities of SET", Higher Education Evaluation and Development, Vol. 12 No. 2, pp. 63-84. https://doi.org/10.1108/HEED-04-2018-0009

Publisher

Emerald Publishing Limited

License

Published in Higher Education Evaluation and Development. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

Over the past several years, the increasing number of degree-providing institutions has dramatically changed global higher education (Altbach et al., 2009; Usher, 2009). This rising number of higher education institutions has led to increased competition among universities (Naidoo, 2016). Furthermore, with cutbacks in government funding for higher education (Mitchell et al., 2016), differentiation is essential for universities to distinguish themselves and compete with other institutions (Staley and Trinkle, 2011). Such differentiation of higher education institutions has become commonplace, forcing universities to become more innovative, cost conscious, and entrepreneurial (Longanecker, 2016; MacGregor, 2015).

These global dilemmas are not new to Taiwan, where universities have to outperform each other for financial subsidies, while also competing to recruit new students (Chou and Ching, 2012). The problem of recruitment results from a serious decline in the birth rate in Taiwan. The National Statistics Office of Taiwan (2018) reported that birth figures declined from 346,208 in 1985 to 166,886 in 2010, representing a fall of just over 50 percent. Projecting these numbers into university entrants, a drop of around 20,000 incoming students can be noted for the academic year 2016/2017 (Chang, 2014). In fact, only 241,000 freshmen are noted for the current 2017/2018 academic year and this number is expected to drop to only around 157,000 in 2028 (Wu, 2018). The declining number of students has resulted in financial difficulties for academic institutions (Chen and Chang, 2010). In such difficult times, it is crucial for higher education institutions in Taiwan to differentiate themselves and develop their competitive advantages.
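As a quick check on the scale of these declines, the figures quoted above can be converted into percentage drops. The following is a minimal sketch rather than part of the original analysis; the helper name percent_decline is hypothetical and introduced only for illustration.

```python
def percent_decline(start: float, end: float) -> float:
    """Return the percentage drop from start to end, relative to start."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (start - end) / start * 100.0


# Birth figures reported for 1985 and 2010 (National Statistics Office of Taiwan, 2018).
births = percent_decline(346_208, 166_886)

# Freshmen in 2017/2018 versus the number projected for 2028 (Wu, 2018).
freshmen = percent_decline(241_000, 157_000)

print(f"Births, 1985 to 2010: {births:.1f}% decline")
print(f"Freshmen, 2017/2018 to 2028 (projected): {freshmen:.1f}% decline")
```

Under these inputs the birth figures fall by about 51.8 percent and the projected freshman intake by about 34.9 percent, which is consistent with the order of magnitude described in the text.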

In the age of big data, differentiation can be achieved with the help of large data sets that provide institutions with the capacity to address complex institutional issues (Daniel, 2015; Norris and Baer, 2013). Many researchers have begun to collect and analyze institutional data sets to address various administrative and instructional issues faced by universities (Picciano, 2012). The results of these studies can provide school administrators and students with useful information (Castleman, 2016). In Taiwan, big data has provided institutions with information on topics such as trends in enrollment rates, students’ online learning performances, and research outputs measured by number of academic publications (Tseng, 2016). Another study reported on the advantages of collecting and understanding student learning experiences using big data (Lin and Chen, 2016). Based on the assumption that the core functions of higher education institutions remain academic (Altbach, 2011), i.e., teaching and learning, determining and understanding the quality of the teaching–learning process with the aid of big data can be extremely useful.

In order to understand the quality of the teaching–learning process, higher education institutions in Taiwan and elsewhere have long been using the student evaluation of teaching (SET), which provides feedback on teaching performance and appraises faculty members (Aleamoni and Hexner, 1980; Centra, 1979; Clayson, 2009; Curran and Rosen, 2006; Pozo-Muñoz et al., 2000). Even though the practice of using SETs is well established in higher education institutions (Rice, 1988; Wachtel, 2006) and is considered relatively reliable for evaluating courses and instructors (Aleamoni, 1999; Marsh, 1987, 2007; Nasser and Fresko, 2002), its usefulness and effectiveness have been challenged (Boring et al., 2016).

Over time, numerous issues have arisen in research on SETs. It has been claimed that SETs are used as a tool by students to reward or punish their instructor (Clayson et al., 2006), that SET results differ across areas (course, subject, and discipline) (Chen, 2006) and type (including course design and class size) of study (Feldman, 1978; Marsh, 1980), and that completion rate and background demographics of students significantly affect SETs (Stark and Freishtat, 2014). Moreover, SETs can be biased with respect to the gender of the instructor and that of the students (Boring et al., 2016). Interestingly, recent research has found that effective teachers are receiving low SET ratings (Braga et al., 2014; Kornell and Hausman, 2016). This has caused many institutions, including universities in Taiwan, to analyze and redesign their SETs (Chen, 2006; Zhang, 2003).

In light of these issues, the current paper seeks to provide a better understanding of the inner workings of SETs. With a better understanding of SETs, more appropriate and effective evaluation tools can be developed. In addition, the categorization of education as a type of service (WTO, 1998) has also opened up new ways of looking into the entire academe. Anchored in the narrative literature review paradigm, this paper shapes the discussion of SETs within the concept of service marketing, using a common framework for evaluating services: determining the search, experience, and credence qualities of the provider (Fisk et al., 2014, p. 151; Wilson et al., 2012, p. 29). In addition, the paper reviews the definitions of SET in the existing literature as well as the dimensions commonly used to measure the quality of teaching. Finally, the various factors that affect the collection of SETs are discussed.

2. Methodology

The current study is anchored on a literature review paradigm. For any study, a literature review is an integral part of the entire process (Fink, 2005; Hart, 1998; Machi and McEvoy, 2016). In general, literature reviews involve database retrievals and searches defined by a specific topic (Rother, 2007). To perform a comprehensive literature review, researchers adopt various approaches for organizing and synthesizing information, adopting either a qualitative or quantitative perspective for data interpretation (Baumeister and Leary, 1997; Cronin et al., 2008; Fink, 2005; Hart, 1998; Lipsey and Wilson, 2001; Petticrew and Roberts, 2005; Rocco and Plakhotnik, 2009; Torraco, 2005).

For the current study, the researcher adopts a narrative literature review approach. A narrative review, more commonly referred to as a traditional literature review, is a comprehensive, critical, and objective analysis of the current knowledge on a topic (Charles Stuart University Library, 2018). The review should be objective insofar as it should have a specific focus but should provide critiques of important issues (Dudovskiy, 2018). More importantly, the results of a narrative review are qualitative in nature (Rother, 2007).

The study follows the suggestion of Green et al. (2006) with regard to synthesizing search results retrieved from computer databases. For the current study, the researcher used Google Scholar as a starting point, followed by searches within ProQuest and PsycINFO. Keywords used for searches were “student evaluation of teaching” and related terminologies (see next section for more information on SET synonymous terms). Selections of relevant articles are explicit and potentially biased insofar as the researcher focuses on the search, experience, and credence qualities of providers within SET studies. Data analysis methods consist of a procedure for organizing information into specific themes developed by Miles and Huberman (1994) and Glaser’s (1965, 1978) technique for continuous comparison of previously gathered data.

3. Defining student evaluation of teaching

In relation to students’ college experience, determining whether a course or a teacher is good or bad can be equated to measuring service quality (Curran and Rosen, 2006). This is especially the case with regard to SETs. The concepts behind SETs have been discussed since the early 1920s (Otani et al., 2012), and literally thousands of studies have been carried out on these various interrelated concepts (Marsh, 2007). Furthermore, within the vast spectrum of literature on the topic, a variety of related terms are used interchangeably. Hence, a thorough, comprehensive literature review is impossible.

SET is a relatively recent term that is used synonymously with several earlier terms such as Student Evaluation of Educational Quality (SEEQ) (Coffey and Gibbs, 2001; Grammatikopoulos et al., 2015; Lidice and Saglam, 2013), SET effectiveness (Marsh, 1987, 2007), student evaluation of teacher performance (Chuah and Hill, 2004; Coburn, 1984; Flood, 1970; Poonyakanok et al., 1986), student evaluation of instruction (Aleamoni, 1974; Aleamoni and Hexner, 1980; Clayson et al., 2006; Powell, 1977), student course satisfaction (Betoret, 2007; Bolliger, 2004; Rivera and Rice, 2002), or simply student course evaluation (Anderson et al., 2005; Babcock, 2010; Bembenutty, 2009; Chen, 2016; Duggan and Carlson-Bancroft, 2016; Huynh, 2015; Pravikoff and Nadasen, 2015; Stark and Freishtat, 2014). Despite the difference in terms, the core objectives of all of the above are similar.

According to the classification and definition of basic higher education terms set by the United Nations Educational, Scientific and Cultural Organization, SET is described as:

[…] the process of using student inputs concerning the general activity and attitude of teachers. These observations allow the overall assessors to determine the degree of conformability between student expectations and the actual teaching approaches of teachers. Student evaluations are expected to offer insights regarding the attitude in class of a teacher and/or the abilities of a teacher […]
(Vlăsceanu et al., 2004, pp. 59-60).

This definition implies three main aspects, namely, the evaluation of the teacher (the teachers themselves), the teaching process (general activity and teaching approaches), and the learning outcomes as perceived by the students (student expectations). This is similar to the framework for evaluating service marketing, whereby the teacher corresponds to the “search” qualities, the teaching process to the “experience” qualities, and the learning outcomes to the “credence” qualities (see Figure 1).

3.1 Search qualities in SET

As previously mentioned, one of the first aspects of SET that focuses on the teacher is the student evaluation of the teacher, or rather the student’s perception of the teacher’s characteristics (Fox et al., 1983). As Tagiuri (1969) notes, in an early study, a person’s (in this case a teacher’s) personality, characteristics, qualities, and inner states (p. 395) matter significantly. Early research findings suggest that students sometimes interpret a teacher’s creativeness as a positive characteristic (Costin et al., 1971), while others note that a teacher’s personality traits affect their SET ratings (Clayson and Sheffet, 2006; Mogan and Knox, 1987; Murray et al., 1990). For instance, the interpersonal characteristics of teachers influence interactions between the students (Mogan and Knox, 1987), which ultimately leads to better engagement and learning (Hu and Ching, 2012; Hu et al., 2015; Skinner and Belmont, 1993).

This focus on the teacher also leads to various biases in SET. For example, teachers’ physical appearance can have an effect on their SET ratings (Bonds-Raacke and Raacke, 2007; Hultman and Oghazi, 2008). Felton et al. (2004), in their study of the university teacher rating website (www.ratemyprofessors.com/), conclude after analyzing 3,190 faculty members across 65,678 posts that physically attractive teachers get higher ratings. In addition, a study by Buck and Tiene (1989) finds that attractive female teachers, even if they are considered authoritarian, tend to receive higher SET ratings compared to their less attractive female counterparts. Besides the teacher’s physical appearance, gender and age are also important (Buck and Tiene, 1989; Sohr-Preston et al., 2016). Younger male faculty members were found to receive higher ratings (Boring et al., 2016), while more senior teachers received lower SET ratings (Clayson, 1999). Similarly, a teacher’s ethnicity is also a factor (Dee, 2005; Ehrenberg et al., 1995). For instance, students may consider female African American teachers more sympathetic (Patton, 1999), which can affect their SET ratings. These biases in SETs are unfair since an individual’s demographics and personality traits are fixed and cannot be changed.

Drawing on the concept of service marketing, the aforementioned teacher factors can be considered the search qualities that students look for before enrolling in a particular course. Students sometimes look for easy teachers just to pass a subject (Felton et al., 2004). However, research shows that most students tend to search for competent teachers (Feldman, 1984) and credible faculty members (Patton, 1999; Pogue and Ahyun, 2006). This disproves the fallacy that easy teachers receive high SET ratings (Beatty and Zahn, 1990; Marsh and Roche, 2000).

By definition, search qualities are the easily observable and most common physical attributes a product (or in this case a teacher or course) may possess (Curran and Rosen, 2006, p. 136). Moreover, these search qualities are apparent and can be judged relative to similar options (Lubienski, 2007). What is most important is that students are able to judge these search qualities beforehand. This means that students have certain initial preferences with regard to aspects such as the type of teacher, the physical characteristics of the classroom, or even the schedule of the course. Students tend to compare various options before signing up for a class. In addition, social psychology literature has long demonstrated the influence of beauty on individual judgments (Adams, 1977; Berscheid and Walster, 1974). Individuals tend to relate beauty to being good (Eagly et al., 1991). This halo effect explains why teachers’ attractiveness tends to influence their SET ratings (Felton et al., 2004). Furthermore, students may also have a preference with regard to the physical situation of the classroom (Douglas and Gifford, 2001; Hill and Epps, 2010), which influences their overall level of satisfaction.

In summary, more emphasis should be placed on the perceived expectations of students, which can be discerned from their search qualities. As studies by Buck and Tiene (1989) and Patton (1999) find, students tend to associate physical attributes with certain types of behavior, such as expecting attractive female teachers to be more feminine and female African American teachers to be more sympathetic. Another important issue here is that students are expecting something, regardless of their reasons for having these expectations of their teachers and courses. Students are satisfied only when these expectations, whether arising from stereotyping of attributes or hearsay from schoolmates, are met. This should not be the case, however; teachers should instead focus on building their professionalism and credibility (Carter, 2016). In-class behaviors such as self-disclosure, humor, warmth, clarity, enthusiasm, verbal and nonverbal messages, teacher immediacy (nonverbal interactions that enhance closeness; Mehrabian, 1968, p. 203), and affinity seeking (the creation of positive feelings toward oneself; Bell and Daly, 1984) are just a few examples of effective strategies that can positively affect how students view teachers (Patton, 1999). These behaviors make for effective teaching and can also prevent students from stereotyping teachers because of their appearance or on account of demographic features.

3.2 Experience qualities in SET

Besides the teacher, the second aspect of SET identified is the teaching process. In reality, this is what the majority of SETs currently in use measure (Algozzine et al., 2004; Wachtel, 2006). The goal of SETs is to determine the teachers’ teaching effectiveness (Marsh, 2007). Such instruments have been used throughout academia for a long time, but their validity, reliability, and usefulness are still being challenged (Aleamoni, 1974, 1999; Aleamoni and Hexner, 1980; Arreola, 2007; Costin et al., 1971; Feldman, 1978, 1984; Marsh, 1984, 2007; Marsh and Roche, 1997; Rodin and Rodin, 1972; Wright et al., 1984). This makes sense since teaching is a complex activity (Shulman, 1987), so the factors used to measure a teacher’s effectiveness are multidimensional (Marsh, 1991, 2007; Marsh and Bailey, 1993) and difficult to comprehend. Nonetheless, studies have shown that SETs contribute to faculty development by enhancing the teaching and learning experience (Centra, 1979; Marsh, 1980, 1984, 1987, 1991, 2007; Marsh and Bailey, 1993; Marsh and Roche, 1997). This function is formative (Shulman, 1987), meaning that SETs can provide evidence to support improvements that shape the overall quality of teaching (Berk, 2005, p. 48).

Evaluating the teaching process is a complex and complicated undertaking, requiring a full understanding of how students came to conclusions with regard to their teachers and courses (Curran and Rosen, 2006). Typically, taking a university course would require the student to attend class every week, which corresponds to repeated service encounters that are critical to later evaluation (Solomon et al., 1985). Within the concept of service marketing, these repeated service encounters (which in this case are the repeated classroom encounters) correspond to the experience qualities that students perceive when taking the course. These experience qualities are not readily apparent and can only be judged when the entire service experience is over (generally at the end of the course) (Curran and Rosen, 2006; Lubienski, 2007). However, because such experiences are repeated, it can be difficult to know whether the resulting SET ratings are based on an overall experience of the course or just on one single event that made a lasting impression (Curran and Rosen, 2006). Furthermore, students attend class with their classmates, so there are other individuals partaking of the service delivery at the same time. Therefore, the interactions of these students within the class might either enhance or reduce the service quality, which might, in turn, affect an individual student’s experience (Grove and Fisk, 1997).

Based on the above points, evidence shows that students can compare their teachers with other teachers teaching the same course before actually signing up for the class. However, it is most likely that students would ask around, seeking other students who have already taken the course and asking for their comments. This is because students generally do not have access to SET results (Marlin, 1987). Marsh (2007) notes that although a few universities do publish their SET summaries, this is solely for the purpose of course or subject selection. The publication of SET results is controversial (Babad et al., 1999; Perry et al., 1979) and is generally regarded negatively by faculty members (Howell and Symbaluk, 2001).

It is important to note that, based on asking around prior to taking a course, students might expect to receive a certain grade or a certain amount of classwork, or even have expectations with regard to how the lectures are conducted (Nowell, 2007; Remedios and Lieberman, 2008; Sander et al., 2000; Voss et al., 2007). If teachers then behave contrary to the students’ expectations, students may be disappointed and SET ratings may be affected (Bennett, 1982). Such student expectations can also contribute to the development of a psychological contract between the teacher and the students. These prior expectations, whether arising from the students’ desire to benefit from the course (Voss et al., 2007) or from hearsay, are found to contribute to such a psychological contract (Roehling, 1997).

A psychological contract can be defined as any individual beliefs, shaped by the institution, regarding the terms of an exchange agreement between students and their teachers (Kolb et al., 1984; Rousseau, 1995, 2001). Recent research finds that when the psychological contract between the teacher and the students is positive, learning motivation is enhanced (Liao, 2013). Furthermore, these agreements might be either implicitly or explicitly made between the teachers and students. To make them more effective, the agreements should be negotiated at the start of the term, and should constitute a shared psychological contract between the teacher and the students (Pietersen, 2014). More importantly, Cohen (1980) notes that if SETs are administered during the middle of the semester, teachers are still able to improve their teaching pedagogy, re-aligning the previously agreed upon psychological contract. Hence, faculty members who received mid-semester feedback ended up with significantly higher SET ratings than their counterparts who did not have a mid-semester evaluation (Cohen, 1980). Ultimately, mid-semester feedback provides ample opportunity for instructional improvement (Overall and Marsh, 1979).

In summary, it has been noted in the literature that evaluating the teaching process, or rather the effectiveness of teaching, is a complex task. It is multidimensional and mostly concerns the experience qualities of the students who have taken the course. More importantly, in relation to the numerous biases associated with SETs discussed in the introduction of this paper, perceptions of course delivery and service quality are affected by a variety of issues, including peers, class size, and type of course. Given that students also have their own personal expectations of what the course should offer, it is difficult to satisfy every student. Pietersen (2014) suggests the making of a psychological contract between the teacher and the students, to provide clear study goals and remove false expectations. In addition, an evaluation can be conducted in the middle of a semester, giving the teacher ample opportunity to address students’ doubts and to re-adjust the shared contract based on students’ abilities. Furthermore, as the goal of SETs is to provide formative suggestions for teachers to improve teaching, it is also prudent to include statements on the provision of formative lessons and on how course designs contribute to student learning (Brownell and Swaner, 2009; Kuh, 2008; Kuh et al., 2013).

3.3 Credence qualities and SETs

The last component of SETs identified is the evaluation of learning outcomes, more specifically, the accomplishment of goals. It has long been accepted that goals are important predictors of educationally relevant learning outcomes (Ames and Archer, 1988), with research also focusing on the motivational aspects driven by mastery and performance-approach goals (Harackiewicz et al., 2002). In simple terms, if students clearly understand the skills necessary for future employment, and also understand that taking a certain course will enable them to master those skills, they should be motivated to do well in that course. Research shows that students are more engaged with their academic classwork when future career consequences are clearly understood (Greene et al., 2004; Miller et al., 1996). However, in reality many students are uncertain of their study goals and are at risk of dropping out (Mäkinen et al., 2004).

A university education is characterized by high credence properties (Curran and Rosen, 2006). Credence qualities are those properties that are not apparent, can never be fully known or appreciated by students (Lubienski, 2007), and might, therefore, be impossible to evaluate (Curran and Rosen, 2006). Credence properties are generally found in goods and services that are characterized by high levels of complexity (Darby and Karni, 1973), such as the teaching and learning process. More importantly, even after the service has been used (in this case, when a student graduates from the university), the consumer (student) may still find it difficult to evaluate the service (Zeithaml, 1981). A famous example of credence qualities in a product can be found in the taking of vitamin pills, where there is low verification of the alleged effectiveness and quality of the product, even after it has been tried by the consumer (Galetzka et al., 2006). In higher education, the true value of a course may be realized by a student only after the skills and knowledge learned are used in employment, which might be several months or even years after the service has ceased (Curran and Rosen, 2006).

The credence qualities of higher education are related to the concept of academic delay of gratification (Bembenutty, 2009). Academic delay of gratification is a term used to describe the “postponement of immediately available opportunities to satisfy impulses in favor of pursuing chosen important academic rewards or goals that are temporally remote but ostensibly more valuable” (Bembenutty and Karabenick, 1998, p. 329). Similar to what is described by the achievement goals theory, students are motivated when they clearly perceive benefits that lead to future success (Bembenutty, 1999). In addition, students who adhere to the academic delay of gratification principle tend to become autonomous learners (Bembenutty and Karabenick, 2004). If students know the usefulness of the course subject, they are more willing to delay gratification, participate in class, and complete academic tasks, and are ultimately more satisfied and hence give high SET ratings (Bembenutty, 2009).

In summary, the literature shows that besides formative evaluations, SETs also include summative evaluations (Kuzmanovic et al., 2012; Mortelmans and Spooren, 2009; Otani et al., 2012; Spooren and Van Loon, 2012), which involve summing up the overall performance of teachers (Berk, 2005). Summative SETs generally contribute to teacher audits and evaluations that may lead to the hiring, tenure, and even promotion of faculty members (Arthur, 2009; Berk, 2005; Marks, 2000; Stark and Freishtat, 2014). The literature suggests that school administrators should be careful in using SET results containing many summative evaluations (Spooren et al., 2013), because, with respect to the credence properties of education, students might be unable to grasp the entire and actual benefits of certain courses. In order for effective learning to occur, the potential benefits of the course and an outline of acceptable performance should be defined in advance (Otter, 1995). Moreover, connections should be made between previous, current, and future courses, thus providing an overview of the entire program together with a clear outline of future career pathways.

4. Dimensions of SET

As has been noted, SETs are complex and involve multiple interrelated dimensions. In his early meta-analysis, Feldman (1978) shows that most studies focus on the overall rating of the instructor. However, SETs that focus only on summative evaluations and that use global measures (few summary items) are highly discouraged (Cashin and Downey, 1992; Marks, 2000; Sproule, 2000). The majority of SETs aim at a more comprehensive rating of teachers and, as Marsh (2007) notes, are mostly constructed around the concept of effective teaching. The usefulness and effectiveness of an SET depend on how well it can capture the concepts it measures. Hence, careful design is essential (Aleamoni, 1974, 1999; Aleamoni and Hexner, 1980; Arreola, 2007).

One of the early syntheses of SETs was conducted by analyzing students’ views of the characteristics of a superior teacher (Feldman, 1976). In that study, three categories are identified: presentation, which includes teachers’ enthusiasm for teaching and for the subject matter, their ability to motivate students, their knowledge of the subject matter, clarity of presentation, and organization of the course; facilitation, which denotes teachers’ availability for consultation (helpfulness), their ability to show concern and respect for students (friendliness), and their capacity to encourage learners through class interactions and discussions (openness); and regulation, which includes the teachers’ ability to set clear objectives and requirements, appropriateness of course materials (including supplementary learning resources) and coursework (with regard to difficulty and workload), fairness in evaluating students and providing feedback, and classroom management skills (Feldman, 1976).

Another early analysis of SETs, conducted by Hildebrand (1973) and Hildebrand et al. (1971), identifies five constructs for measuring the effectiveness of teaching: analytic/synthetic skills, which include the depth of a teacher’s scholarship and his or her analytic ability and conceptual understanding of the course content; organization/clarity, denoting the teacher’s presentation skills in the course subject area; instructor–group interaction, which describes the teacher’s ability to actively interact with the class, his or her overall rapport with the class, sensitivity to class responses, and ability to maintain active class participation; instructor–individual student interaction, which includes the teacher’s ability to establish mutual respect and rapport with individual students; and dynamism/enthusiasm, which relates to the teacher’s enthusiasm for teaching and includes confidence, excitement about the subject, and pleasure in teaching (Hildebrand et al., 1971, p. 18).

More recently, the SEEQ is frequently used by many higher education institutions. The SEEQ measures nine factors that constitute quality instruction (Marsh, 1982, 1987; Marsh and Dunkin, 1997; Richardson, 2005). These are assignments and readings, breadth of coverage, examinations and grading, group interaction, individual rapport, instructor enthusiasm, learning and academic value, organization and clarity, and workload and difficulty (Marsh, 2007, p. 323). Some SEEQ studies include an overall summative evaluation of the course subject as an additional factor (Schellhase, 2010). The similarities with the criteria of effective teaching proposed by Hildebrand (1973), Hildebrand et al. (1971), and Feldman (1976) are apparent.

In a series of studies conducted at the University of Hawaii, SET is first analyzed with respect to the perspectives of faculty members, which identifies important factors such as evaluation information from students, information from peers (colleagues), student performance and grades, and external performance evaluations of teachers (Meredith, 1977). A study that included apprentice teachers (practice teachers) found that students preferred instructors who exhibited classroom organizational skills, who focused on students’ learning outcomes, and who interacted well with students (Meredith and Bub, 1977). A set of evaluation criteria was developed based on a study of both faculty members and students in the School of Law at the University of Hawaii, which included dimensions such as knowledge of subject matter, ability to stimulate interest and motivate students, organization of the course, preparation for the course, concern for students, quality of course materials, and an overall summative evaluation of the teacher (Meredith, 1978). Other studies measured teaching excellence by subject mastery, teaching skills, and personal qualities of the teacher (Meredith, 1985b), while an overall analysis of student satisfaction used the criteria of social interaction, teaching quality, campus environment, employment opportunities, and classroom facilities (Meredith, 1985a), all of which contribute to SET ratings.

In summary, it is noted that SETs can vary depending on whether the evaluations are from the perspective of faculty members (how teachers teach) or from the students (how students learn). However, although several variations of SETs exist, comparisons suggest that as long as the overall objective is to evaluate effective teaching, dimensions within these SETs are interrelated and may overlap (Marsh, 1984, 2007; Marsh and Bailey, 1993; Marsh and Dunkin, 1997). A study conducted by the American Association of University Professors involving 9,000 faculty members found that SETs are generally established with controversial biases and issues (Flaherty, 2015). The more important issue is the establishment of the objectives for SET implementation within the university and careful decision making about who should participate in the development of such an evaluation instrument.

5. Antecedents of SET

Within the vast literature on SETs, analysis of their validity and reliability has identified various antecedents affecting effective evaluation. SET ratings are dependent on several issues, including the various biases already discussed. The first obvious antecedent is the instructor, as can be discerned from the previous discussions. Besides personality issues, gender plays an important role. Boring et al. (2016) find that SETs are statistically biased against female faculty, and that such biases can cause effective teachers to get lower SET ratings than less effective ones. MacNell et al. (2015) conducted an experiment in which students were blind to the gender of their online course instructors. For the experiment, two online course instructors were selected, one male and one female, and each was given two classes to teach. Later in the course, each instructor presented as one gender to one class and the opposite gender to the other class. The SET results gathered at the end of the semester are interesting. Regardless of the instructor’s real gender, students gave the teacher they thought was male and the actual male teacher higher SET ratings than the teacher they perceived as female. This experiment clearly shows that the rating difference results from gender bias (Marcotte, 2014).

Previous studies also show that the time of SET evaluation matters. As discussed, when SET evaluations are administered during the middle of the semester, results can assist teachers in re-evaluating their course design to better fit with the students’ needs and capabilities. However, this phenomenon is limited. SETs are mostly given before the end of the term or during final examinations, and studies have shown that ratings taken at this time tend to be lower compared to evaluations conducted a few weeks before final exams (Braskamp et al., 1984). Interestingly, no significant differences were found when comparing SET ratings gathered before the end of the semester with those taken in the first week of the succeeding term (Frey, 1976). This debunks the fallacy that students tend to seek revenge on teachers because of issues with the grades received (Clayson et al., 2006; Skinner and Belmont, 1993). In fact, studies have shown that students who received poor grades were less likely to care enough to complete the SET (Liegle and McDonald, 2005).

In terms of the students themselves, as previously mentioned, the background demographics of students do significantly affect SETs (Stark and Freishtat, 2014). Although some biases are found between gender and SET ratings (Boring et al., 2016; Feldman, 1977), there is still no consistent evidence that such a difference exists (Wachtel, 2006). For instance, different studies have shown that male and female students give higher ratings as compared to their peers of opposite genders (Tatro, 1995). In some instances, students evaluate their same-gender teachers higher than their opposite-gender instructors (Centra, 1993a, b). With regard to ethnicity, Marsh et al. (1997) translated the SEEQ instrument into Chinese and found no significant differences between the results reported and those of studies done in the USA. In other Chinese studies, besides the significant differences in SET ratings between students of various disciplines and nature (Chen and Watkins, 2010; Liu et al., 2016), it is well noted that linguistics or foreign language teachers tend to receive higher evaluations than the faculty of other disciplines (Chen and Watkins, 2010).

Administration conditions, or the way SETs are administered, also matter. Currently, SETs are mostly collected using online course evaluations (Spooren and Van Loon, 2012). However, literature shows that online SETs result in lower participation (Anderson et al., 2005; Avery et al., 2006), although reminders do increase the response rate (Norris and Conn, 2005). With paper-and-pen SETs, the person administering the evaluation also contributes to any inconsistencies in the ratings. This holds true even if the teacher leaves the room during the SET administration and the forms are anonymous, as students may still be reluctant to provide an objective evaluation (Pulich, 1984). Many researchers have agreed that SETs should be entrusted to a third-party individual for effective collection (Braskamp et al., 1984; Centra, 1979).

The characteristics of the course subject also matter. Wachtel (2006) notes that the nature of the course subject, such as whether it is a required course or an elective, affects how students rate its importance. Sometimes students give higher ratings for elective course subjects due to their having a prior interest in the subject (Feldman, 1978). Class schedule can sometimes affect ratings, and odd schedules such as early morning classes or late afternoon classes have been found to receive the lowest SET ratings (Koushki and Kuhn, 1982). However, inconsistencies were found in several other studies (Aleamoni, 1999; Centra, 1979; Feldman, 1978; Wachtel, 2006), but it has been suggested that the level of the course is a relevant factor. The year or level of the course is closely related to the students’ age; as students continue with their studies, they become more mature and become aware that their opinions are taken seriously by the school administration (Spooren and Van Loon, 2012). Class size has also been found to have an impact (Feldman, 1978; Marsh, 2007) since bigger classes tend to present fewer opportunities for interaction between the teacher and the individual students, which can affect ratings (Meredith and Ogasawara, 1982). Finally, the subject area and the discipline also greatly influence SET ratings. Since the discipline affects how classes are held (e.g. laboratory classes compared to lecture-intensive courses), comparisons between colleges are not advisable (Wachtel, 2006). For instance, task-oriented subjects such as mathematics and science offer less interaction than the social sciences (Centra, 1993a, b).

In summary, apart from the issues relating to students that affect SETs discussed in the “Experience Qualities” section of this paper, including their gender, learning motivations, and grade expectations (Boring et al., 2016), many more have been added to the discussion. Having examined the various antecedents of SETs, it is apparent that one model is not suitable for all instances. More specifically, one single type of SET cannot and should not be used to collect students’ perceptions across all courses and subjects. This is actually the main reason why some higher education institutions choose to use global measures to collect the summative evaluations of the class. In practice, separate SETs should be used for different course types. Since this can place a significant burden on institutions, careful analysis and research are necessary.

6. Conclusion

To sum up, literature has shown that the use of SETs to collect information regarding the teaching–learning process is commonplace. However, given the complex nature of academic processes, the data resulting from SETs are questionable and limited. The current paper presents a review of the literature on SETs, focusing on the concept of service marketing evaluation. The framework’s three criteria are used to examine SETs, whereby the teacher represents the “search” qualities, the teaching process the “experience” qualities, and the learning outcomes the “credence” qualities.

The search qualities with regard to SETs are the easily observable attributes of teachers. These may include the appearance, gender, age, ethnicity, and personality traits of faculty members. In practice, course subject selections are made prior to enrollment in a course; students can compare faculty members when deciding which one to enroll with. Hence, the expectations of students are important. It has been noted that stereotyping faculty members according to certain demographic factors such as gender and age is unfair since these features are fixed and impossible to change. Students should look beyond these obvious factors and focus more on the teachers’ credibility and competencies.

Beyond initial search preferences, students place much importance on evaluating their learning experiences. As the literature suggests, for the sake of simplicity, many SETs include only global summative evaluations of the teaching–learning process. However, given that the nature of the learning experience is complex and multidimensional, evidence to support student development should be in the form of formative judgments. Furthermore, the actual teaching–learning process is composed of repeated service encounters (a semester in Taiwan typically lasts around 18 weeks). It is, therefore, difficult to determine whether a single class experience or the collective sum of the semester’s learning encounters contributes to the SET ratings. Considering the influence of prior expectations on SET ratings, teachers are advised to establish a psychological contract with the students. To make these agreements effective, they should be negotiated at the start of the term, so that they are shared contracts between the teacher and the students.

Finally, accepting that university education is characterized by high credence qualities, students must be aware of the concept of academic delay of gratification, so that they understand and accept that the benefits of undertaking a course are not immediate. Combining this with the importance of students’ expectations and the usefulness of creating a psychological contract, clear definitions of the potential benefits and acceptable performance should be provided during the first class. Moreover, connections should be made between previous, current, and future courses, thus providing an overview of the entire program together with career pathways.

In summary, since SETs are frequently used to collect information on effective teaching, it is important for higher education institutions to establish what kinds of SETs are effective. Given the complex factors involved and the various antecedents of SETs, it appears that no one perfect tool exists to accurately measure what happens in the classroom. As different SETs may be necessary for different courses and subjects, options such as faculty members’ self-evaluation and/or faculty members’ peer-evaluation might be considered to provide what is lacking in SETs. It is hoped that as technology advances, an innovative way of collecting SETs might be found to make the process more productive.

6.1 Recommendations for further research

Having analyzed the above issues, several recommendations for further research are proposed:

Develop and validate an SET

The development of the SET is important in the ongoing dialogue within the literature. As the literature shows, SETs are only useful if they can appropriately capture what they are being used to measure. Hence, in order to develop a relevant and constructive SET, the participation of significant stakeholders, such as school administrators, faculty, and students, is essential. A constructive SET would be capable of providing formative recommendations to improve the performance of both faculty members and students. More important, an effective SET should consider the service attributes (the search, experience, and credence qualities) that students want.

Develop an SET software program

In the current age of technological advances and big data, students are adept at using mobile devices (Gikas and Grant, 2013). Therefore, an app designed to collect SET ratings – either directly after each class, twice a semester (after midterm exams and before the end of the semester), or once before the end of the semester – could be made available to students for easy and convenient collection of data. This could initiate a new strand of SET literature. Combining technology with pedagogy can provide a more accurate evaluation of teaching, by facilitating the collection of real-time SET results.