Year: 2015 | Volume: 28 | Issue: 3 | Page: 194-200
An Enhancement-focused framework for developing high quality single best answer multiple choice questions
Tahra AlMahmoud1, Margaret Ann Elzubeir2, Sami Shaban3, Frank Branicki4
1 Department of Surgery, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
2 Professor, Department of Medical Education, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
3 Department of Medical Education, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
4 Professor, Department of Surgery, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
Date of Web Publication: 11-Mar-2016
College of Medicine and Health Sciences, United Arab Emirates University, P.O. Box: 17666, Al Ain
United Arab Emirates
Source of Support: None, Conflict of Interest: None
Background: The primary goal of any assessment of students is to provide valid and reliable evaluations of students' knowledge and skills as well as provision of accurate feedback to students about their performance. Contrary to best practice guidelines for development of multiple choice questions (MCQs), however, items used within medical schools are often flawed. This disappoints students and discourages examiners from using in-house MCQ databases. Vetting and reviewing items can improve the quality of MCQs. In this paper, we describe our approach to standardize the format used for MCQ assessment and provide recommendations for quality enhancement of high-stakes assessment. Methods: A collaborative enhancement-focused vetting and review approach to development of high quality single best answer MCQs is described. Results: Implementation of a collaborative strategy to blueprint, vet, review and standard set MCQ items for high stakes examinations can effectively contribute to assessment quality assurance. Similarly, shared responsibility for post examination analyses of items may reveal the psychometric properties of items in need of improvement and contribute to closure of the assessment outcomes feedback loop. Discussion: Devolving responsibility for implementation of assessment processes as an integral part of educational practices and values can maximize reliability and standards of assessment processes. We contend that while logistics and time constraints are of concern to busy faculty members, judicious utilization of resources to develop well-written MCQ items is well worth the effort to produce reliable and valid examinee scores. An enhancement-focused approach can be institutionally rewarding and lead to improved quality of high stakes assessments.
Keywords: Multiple Choice Questions, quality assurance, enhancement-focused, vetting, standard setting, blueprinting
How to cite this article:
AlMahmoud T, Elzubeir MA, Shaban S, Branicki F. An Enhancement-focused framework for developing high quality single best answer multiple choice questions. Educ Health 2015;28:194-200
How to cite this URL:
AlMahmoud T, Elzubeir MA, Shaban S, Branicki F. An Enhancement-focused framework for developing high quality single best answer multiple choice questions. Educ Health [serial online] 2015 [cited 2020 May 29];28:194-200. Available from: http://www.educationforhealth.net/text.asp?2015/28/3/194/178604
Background
"Nothing that we do to, or for, our students is more important than our assessment of their work and the feedback we give them on it. The results of our assessment influence our students for the rest of their lives and careers - fine if we get it right, but unthinkable if we get it wrong". 
Historically, there have been misconceptions that Multiple Choice Questions (MCQs) are a tool for assessing only isolated factual knowledge. The perception is often that MCQs lack the ability to test higher order cognitive skills when compared with free-response questions, for which examinees generate answers rather than select from fixed response options. The use of clinical vignettes requiring analysis and synthesis of information, however, facilitates clinical reasoning and critical decision making skills and also enables basic science knowledge to be tested in a clinical context. Furthermore, in contrast to short-answer and hybrid format questions, MCQ items containing authentic clinical vignettes have been shown to improve retrieval practice and meaningful learning. MCQs are a commonly used student assessment method in health professions education, with the advantages of testing a wide range of content with large numbers of students and of rapid scoring, objectivity, reliability and efficiency if constructed properly. Hence, more recently, many high stakes examinations have adopted MCQs rather than true/false and other question formats. Several authors have given best practice guidance on construction of high quality MCQs, but violations persist within health professions education. Research has, however, shown that education and training of faculty substantially improves the quality of the MCQs they create, and particularly for life-altering, high stakes examinations, developing faculty skills in writing good quality MCQs should be an imperative. In recent years many international medical schools have introduced faculty enhancement initiatives, conducted by medical education experts, to assist question writers in improving item quality. Research has also shown that these development opportunities should ideally be longitudinal, allowing for practice, reflection and feedback.
Why an enhancement-focused approach?
An enhancement-focused approach requires collaborative efforts from key stakeholders, including content experts who contribute to validating the relevance of the content being assessed, clerkship coordinators who can identify questions on material not covered during the courses, and medical educators who assess item structures. Paramount to development of items and an assessment blueprint is stakeholders' understanding of the curriculum. Design of the blueprint (also referred to as the test specification) involves reflection on core content areas to be covered in the examination, weightings of the content areas, levels of cognitive complexity, the most appropriate test item format (e.g. MCQs, OSCE stations) and number of items. Indeed, throughout the process recommended herein, we highlight MCQ quality enhancement as a core dimension of quality assurance and consider promoting and safeguarding validity and reliability of MCQ items achievable only through a collaborative approach involving various core clinical disciplines, including Internal Medicine, Family Medicine, Obstetrics/Gynaecology, Paediatrics, Psychiatry, Public Health, Radiology and Surgery.
Test material development and review process
We outline below a time-efficient, stepwise, practical approach to quality enhancement of a written in-house MCQ assessment, from vetting in departments to standard setting, students' orientation and assessment delivery. Stepwise approaches are not limited to health care education quality improvement initiatives. Both educational and health care organizations increasingly advocate use of structured, incremental approaches to improving delivery and management of educational and care processes, which can be reviewed and adjusted according to circumstances and needs of relevant stakeholders.
Step 1: Item development, review and collation at department level
The first step is the requisition of a specified set of MCQs from each clinical department, wherein a single faculty member in the discipline coordinates the assignment and outlines the content and number of items to be prepared for his/her department. The number of requested questions varies between sub-specialties and depends on the weighting of the discipline in the assessment blueprint. Following generation of the items at department level, these are reviewed by a Coordinator, nominated by the chair of each department, who makes a brief assessment of the soundness of each individual test item. The Coordinator also checks the general structure and grammar of the submitted material and, if necessary, returns the question to its source for resolution of queries.
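The dependence of requested item numbers on blueprint weighting can be illustrated with a minimal sketch. The article does not state how weights are converted into item counts, so the largest-remainder rule used here, the function name `allocate_items` and the example weights are all assumptions made purely for illustration.

```python
def allocate_items(weights, total_items):
    """Split total_items across disciplines in proportion to blueprint weights.

    Assumption: a largest-remainder rule is used so that the whole-number
    counts always add up to total_items. The source does not specify a rule.
    """
    weight_sum = sum(weights.values())
    # Exact (possibly fractional) share of items for each discipline.
    exact = {d: total_items * w / weight_sum for d, w in weights.items()}
    # Start from the whole-number part of each share.
    counts = {d: int(share) for d, share in exact.items()}
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda d: exact[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1
    return counts


# Hypothetical weights for three disciplines and a seven-item request.
weights = {"Internal Medicine": 5, "Surgery": 3, "Psychiatry": 2}
requested = allocate_items(weights, 7)
print(requested)
```

In practice the weights would come from the agreed assessment blueprint rather than being set ad hoc.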
Step 2: Review by the Final Integrated Examination (FIE) Committee
The second step involves vigorous review of each item. In our context this was undertaken by the Final Integrated Examination (FIE) Committee, a panel of content experts most familiar with the various domains being assessed, comprising six to ten faculty clinicians (including coordinators from different disciplines) along with a senior medical educator. In this way, items are developed using a multidisciplinary collaborative approach which reduces unnecessary burden on individual faculty preparing MCQs, increases collective ownership of the end product, and ensures transparency and a focus on enhancement as a core dimension of quality assurance in assessment.
The Coordinator representing each academic department reads the projected question aloud. The item is reviewed collectively, including its structure, content, clarity and level of difficulty. At this point, if a question is thought to be suitable but in need of major modification, the item is returned to the source for revision and resubmission [Table 1] and [Table 2]. Questions that are considered poor in structure or content, principally requiring recall of isolated facts or posing an ill-defined task, and which cannot be clarified or revamped, are eliminated by the panel. Inappropriate items in the option list are replaced by plausible options. Finally, the edited questions are returned to the source for revision and final approval.
Table 1: Example of an acceptable MCQ item from a structural perspective. Note that the stem is clear and the incorrect options are not totally wrong
Table 2: Example of a poorly constructed MCQ; options provide clues to the test-wise examinee as the symptoms are acute
Finalized contributions are considered as a whole in terms of the appropriate level of difficulty for a graduating medical student, and whether there is an obviously better way of testing the content.
Step 3: Generating first attempt and resit papers
Subsequently, department chairs are requested to select a set of 40 questions for two in-house written examination papers. In the event of students failing the first paper (i.e., they have an overall mark below the pass mark), they are given a resit opportunity in the autumn. Two papers are therefore generated, each comprising 100 different questions; one of these papers is for the resit examination. Candidates for re-examination are normally re-examined in all elements of the assessment and by the same methods as the first attempt.
Step 4: Standard setting
Standard setting involves deciding on a score that determines success or failure on the assessment. It thus connotes the minimal desirable level of performance (the standard) students must achieve to demonstrate mastery of knowledge, skills or ability before graduating. The purpose and stakes of the assessment play an important role in determining which method of standard setting is applied. Having determined that an absolute rather than a relative or norm-referenced approach was appropriate for this high stakes examination, where competence for licensing is being established, we adopted practical standard setting considerations including: deciding on the method of standard setting; selecting the judges; training judges; holding standard setting meetings; and calculating the cut point.
We selected the modified Angoff method of standard setting, which utilizes trained subject experts as standard setters. This well researched method, which is relatively easy to describe and execute, is frequently adopted in health professions assessment. Despite extensive research comparing the various standard-setting methods, none unequivocally supports use of one method above others. Downing et al., however, list steps for implementing the Angoff standard setting process. In our context, a minimum of six content expert judges who are teachers or course coordinators at the level of attainment being considered were included. All standard setters must be knowledgeable about what students should know at this final stage of undergraduate education and training.
Following training, judges individually estimate what percentage of hypothetical borderline candidates would respond correctly to each item. A useful aid to this procedure in our context was the use of clickers, an audience response system, to log and display each individual judge's response. Estimates appeared instantly on a screen in view of the group. If there was wide divergence of opinion, outliers were free to change their responses after discussion. Judges' estimates are then averaged for each item and the cut-score is the sum of these averages. Standard setting results were compiled once judgments had been made for each of the 100 MCQ items. Recently published research has, however, indicated that an individual modified Angoff method is feasible and less time consuming than a group standard setting approach.
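The arithmetic described above (average the judges' estimates for each item, then sum the item averages) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `angoff_cut_score` is hypothetical and the six judges' estimates for three items are invented example data.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return (per-item mean estimates, cut score) for a modified Angoff panel.

    ratings: one row per judge; each row lists that judge's estimated
    proportion (0 to 1) of borderline candidates answering each item correctly.
    """
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every judge must rate every item")
    # Average the judges' estimates item by item.
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    # The cut score (in raw marks) is the sum of the item averages.
    return item_means, sum(item_means)


# Invented estimates: six judges (rows) rating three items (columns).
ratings = [
    [0.60, 0.70, 0.50],
    [0.50, 0.80, 0.50],
    [0.70, 0.60, 0.50],
    [0.60, 0.70, 0.40],
    [0.60, 0.70, 0.60],
    [0.60, 0.70, 0.50],
]
item_means, cut = angoff_cut_score(ratings)
print(f"cut score: {cut:.2f} of {len(item_means)} marks")
```

For a 100-item paper the same calculation yields a cut score out of 100 marks.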
Following standard setting, it is advisable to evaluate the procedure to determine its validity. Validity evidence includes procedural, internal and external sources. Twenty years ago, Kane indicated that procedural validity evidence comes from evaluation of the selection and execution of the standard setting method(s), including the selection and qualifications of judges, their application of the methodology and their perspectives regarding its implementation and appropriateness. Internal validity evidence can be provided by evaluation of the consistency of judges' ratings, e.g., correlation and measures of variability of recommended cut scores. Triangulation of the results of standard setting with another external indicator of cut score is a recommended source of external validity evidence. This type of validity evidence is, however, more difficult to collect because it typically involves use of an additional panel of judges, an additional standard setting method or results from another measure of the same constructs.
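One simple way to look at the variability of recommended cut scores is to compute the cut score implied by each judge and summarize the spread across judges. The sketch below assumes the same rating layout as a modified Angoff panel (one row per judge, one column per item); the function name `cut_score_spread` and the data are hypothetical, and the choice of standard deviation and range as summaries is an assumption rather than the authors' stated procedure.

```python
from statistics import mean, stdev


def cut_score_spread(ratings):
    """Summarize how much the judges' implied cut scores differ.

    Returns (mean, sample standard deviation, range) of the per-judge
    cut scores, where each judge's cut score is the sum of that judge's
    item estimates.
    """
    judge_cuts = [sum(row) for row in ratings]
    return mean(judge_cuts), stdev(judge_cuts), max(judge_cuts) - min(judge_cuts)


# Invented estimates: six judges (rows) rating three items (columns).
ratings = [
    [0.60, 0.70, 0.50],
    [0.50, 0.80, 0.50],
    [0.70, 0.60, 0.50],
    [0.60, 0.70, 0.40],
    [0.60, 0.70, 0.60],
    [0.60, 0.70, 0.50],
]
avg_cut, sd_cut, range_cut = cut_score_spread(ratings)
print(f"mean {avg_cut:.2f}, SD {sd_cut:.3f}, range {range_cut:.2f}")
```

A small spread relative to the cut score suggests that the judges share a similar view of the borderline candidate; a large spread may prompt further discussion or training.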
Step 6: Student orientation to the exam
Most medical schools provide orientation for students at commencement of their studies, informing them of teaching, learning and assessment principles and practices at their institution. Students are important stakeholders in assessment, and making procedures explicit and transparent is good practice. Some medical schools have also found that engaging students in the development of formative assessment items, which then contribute to the bank of questions, is advantageous to both students and the institution. In our context, the FIE Director annually orients students, verbally and in writing, about examination components and process some weeks prior to this high stakes Final Integrated Exam (FIE). Furthermore, for the first time in the 2012/13 academic year, an electronic version of the FIE was made available to students and they were given information on how to take the test online. We have found that use of technology to deliver and manage assessments in a secure, reliable and valid manner, with scrambling of questions for security, is highly effective.
Faculty members of medical colleges may or may not be familiar with modern concepts in medical education, or may indeed be relatively new to teaching and assessment processes. Another issue is that in many cosmopolitan countries such as the UAE, tutors/examiners are from diverse cultural and linguistic backgrounds, with English not being their first language. This carries the potential for a lack of uniformity in the conduct and quality of assessment. Furthermore, clinical faculty are often preoccupied with other commitments such as teaching, research and clinical duties, and may have little time for the generation of quality assessment tools.
Discussion
It is often incorrectly assumed that MCQs are unsuitable for testing problem solving skills, but this is in fact quite possible if items are constructed properly. Indeed, if constructed according to published guidelines and drawn from a representative sample of pre-determined learning outcomes and objectives, MCQs can provide a high degree of test validity and extensive coverage of higher order cognitive thinking skills such as evaluation, synthesis and application of knowledge. We avoided context-free scenarios, as these tend to test factual knowledge, and substituted short case-based questions, which involve a more complex thought process when making a decision.
Logistics and the costs of context-rich MCQ preparation and refinement might raise concerns, particularly because producing a well-constructed scenario may be time consuming, taking from three hours for an inexperienced question writer to only 15 minutes for someone who is experienced. This could be a major barrier to implementing the process, but as faculty members become familiar with and gain experience of the exercise as a whole, it becomes more effective. As observed by others, questions prepared solely by individuals are poorer test material than those generated by collaborative efforts.
Using a 'blueprint' [Table 3] and [Table 4] ensures mapping of test items to specific learning outcomes and allows sampling across a wide range of topics and skill domains. We discovered that including a theme, wherein the title for the clinical scenario is chosen, focuses attention on a specific problem [Table 1], [Table 2]. In addition, faculty have told us that providing ideal sample questions is useful. Some have suggested that during the Committee review process, after reading the stem, reviewers should be able to answer the question without even seeing the option list, sometimes referred to as a 'hand-cover' test. We support their view that it does provide a check on item difficulty. We have further observed that, with an increasing number of faculty engaged in this enhancement-focused exercise, the initial item offerings submitted by departments are improving in quality, acceptability and applicability.
We support the premise that it is course outcomes that should determine what the question is testing. Question formats should be consistent with the stated outcomes of the course; hence, we ask a content expert to be the item writer for that content. In many instances he/she is the faculty member involved in that particular aspect of clerkship teaching. This also supports students' motivation to do well by focusing their attention on what might appear in the examination paper, directing their study efforts to curriculum outcomes. When question content is embedded in the learned curriculum, this enhances classroom interactive teaching and attendance at hospital activities (alignment and positive reinforcement). Tips on preparing high quality MCQ exams are provided in [Table 5].
Table 5: Key learning points for high quality single best answer multiple choice questions
Following the conduct of assessments, individual item discriminatory index (point bi-serial) data have been considered in collaboration with our College Assessment Officer, who has expertise in data analysis. Occasionally these data have led to exclusion of new questions from summative assessments. As a result, we now draw 50% of our future examination papers from questions in our bank with acceptable point bi-serial data. Accordingly, following completion of the exercise for generation of new FIE questions for each exam paper, we can examine which parts of the 'blueprint' are covered and go directly to the bank for curriculum content in the blueprint that is not covered. This process can be facilitated by prior tagging of bank questions by various fields: options, organ systems and disciplines, domain (epidemiology, diagnosis, investigations, treatment) and degree of difficulty. This endeavor will hopefully serve to further enhance the quality of our written MCQ papers used for high stakes summative assessments.
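The idea of checking blueprint coverage and then filling gaps from a tagged question bank can be sketched as follows. This is an illustrative sketch only: the dictionary layout, the `topic` and `point_biserial` field names, the function names and the threshold of 0.2 are all assumptions, not details reported by the authors.

```python
from collections import Counter


def blueprint_shortfall(blueprint, new_items):
    """Return {topic: items still needed} after counting newly written items."""
    covered = Counter(item["topic"] for item in new_items)
    return {topic: need - covered[topic]
            for topic, need in blueprint.items() if need > covered[topic]}


def fill_from_bank(shortfall, bank, min_point_biserial):
    """Pick banked items for each uncovered topic, best discriminating first."""
    picked = []
    for topic, need in shortfall.items():
        candidates = [q for q in bank
                      if q["topic"] == topic
                      and q["point_biserial"] >= min_point_biserial]
        candidates.sort(key=lambda q: q["point_biserial"], reverse=True)
        picked.extend(candidates[:need])
    return picked


# Hypothetical blueprint, newly written items and tagged bank.
blueprint = {"Surgery": 2, "Paediatrics": 1, "Psychiatry": 1}
new_items = [{"id": "N1", "topic": "Surgery"},
             {"id": "N2", "topic": "Paediatrics"}]
bank = [{"id": "B1", "topic": "Surgery", "point_biserial": 0.30},
        {"id": "B2", "topic": "Surgery", "point_biserial": 0.10},
        {"id": "B3", "topic": "Psychiatry", "point_biserial": 0.25}]

shortfall = blueprint_shortfall(blueprint, new_items)
picked = fill_from_bank(shortfall, bank, min_point_biserial=0.2)
print(shortfall, [q["id"] for q in picked])
```

Richer tagging (organ system, domain, degree of difficulty) would simply add further fields to filter on.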
For high stakes assessments in which the described process was utilized, analysis of 800 questions used over the past eight years revealed that most questions had good psychometric properties: the mean item difficulty index (i.e., percentage of students with the correct response) was 66% (range 62-68%) and the items' discriminatory index (i.e., the point bi-serial value, which is a correlation between those who got the question right and those who did better regarding the mean exam score) was 2.1 (range: 1.2-2.2). Another indication of the high quality of the questions produced is the high correlation between our high stakes assessments (produced via the described process) and international assessments that the same students take, such as the National Board of Medical Examiners and International Foundations of Medicine examinations (correlations were 0.7 and 0.8, respectively).
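The two item statistics defined above can be computed as in the following minimal sketch. The function names and the five-student data set are invented for illustration, and the point bi-serial is computed here as a plain Pearson correlation between the 0/1 item score and the total exam score (with the item itself included in the total), which is one common convention and not necessarily the one used by the authors.

```python
from math import sqrt


def difficulty_index(item_scores):
    """Percentage of students who answered the item correctly (scores are 0/1)."""
    return 100.0 * sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total exam score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when all scores are identical
    return cov / sqrt(var_x * var_y)


# Invented data: five students, one item, and their total exam scores.
item = [1, 1, 1, 0, 0]
totals = [90, 80, 70, 60, 50]
p = difficulty_index(item)
r = point_biserial(item, totals)
print(f"difficulty {p:.0f}%, point bi-serial {r:.3f}")
```

Because it is a correlation coefficient, a point bi-serial value computed this way lies between -1 and 1, with higher positive values indicating that stronger students were more likely to answer the item correctly.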
Conclusion
Writing MCQs for high stakes examinations is a complex process. While logistics, cost considerations and time constraints are of concern to busy faculty members, there is, nevertheless, a consensus that well-written MCQs based on blueprinting and standard setting are required to produce reliable and valid examinee scores. An enhancement-focused approach can be rewarding and lead to improved outcomes for assessment of students.
We wish to acknowledge the important contributions of all Heads of Department, College of Medicine and Health Sciences, their representatives and coordinators who have been vital to the success of our enhancement-focused approach to MCQ improvement at our institution.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Race P, Brown S, Smith B: 500 Tips on Assessment. 2nd edition. London: Routledge-Falmer; 2005.
Case SM, Swanson DB: Extended-matching items: A practical alternative to free-response questions. Teaching and Learning in Medicine 1993;5(2):107-15.
Schuwirth LW, van der Vleuten CP: ABC of learning and teaching in medicine: Written assessment. BMJ 2003;326(7390):643-5.
Karpicke JD, Blunt JR: Retrieval practice produces more learning than elaborative studying with concept mapping. Science 2011;331(6018):772-5.
Smith MA, Karpicke JD: Retrieval practice with short-answer, multiple-choice, and hybrid tests. Memory 2014;22(7):784-802.
Case SM, Swanson DB: Constructing Written Test Questions for the Basic and Clinical Sciences. 3rd edition (revised). Philadelphia: National Board of Medical Examiners; 2001.
McCoubrie P, McKnight L: Single best answer MCQs: A new format for the FRCR part 2a exam. Clin Radiol 2008;63(5):506-10.
Tan LT, McAleer JJ: The introduction of single best answer questions as a test of knowledge in the final examination for the fellowship of the Royal College of Radiologists in Clinical Oncology. Clin Oncol (R Coll Radiol) 2008;20(8):571-6.
Downing SM: The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract 2005;10(2):133-43.
Knight AM, Cole KA, Kern DE, Barker LR, Kolodner K, Wright SM: Long-term follow-up of a longitudinal faculty development program in teaching skills. J Gen Intern Med 2005;20(8):721-5.
Casteleijn NF, Visser FW, Drenth JP, Gevers TJ, Groen GJ, Hogan MC, Gansevoort RT: A stepwise approach for effective management of chronic pain in autosomal-dominant polycystic kidney disease. Nephrol Dial Transplant 2014;29 Suppl 4:iv142-153.
de Groot JJ, Maessen JM, Slangen BF, Winkens B, Dirksen CD, van der Weijden T: A stepped strategy that aims at the nationwide implementation of the Enhanced Recovery After Surgery programme in major gynaecological surgery: Study protocol of a cluster randomised controlled trial. Implement Sci 2015;10(1):106.
Downing SM, Tekian A, Yudkowsky R: Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med 2006;18(1):50-7.
Senthong V, Chindaprasirt J, Sawanyawisuth K, Aekphachaisawat N, Chaowattanapanit S, Limpawattana P, Choonhakarn C, Sookprasert A: Group versus modified individual standard-setting on multiple-choice questions with the Angoff method for fourth-year medical students in the internal medicine clerkship. Adv Med Educ Pract 2013;4:195-200.
Kane M: Validating the performance standards associated with passing scores. Review of Educational Research 1994;64(3):425-61.
Oldham J, Freeman A, Chamberlain S, Ricketts C: Enhancing Teaching and Learning through assessment: Deriving an appropriate model. The Netherlands: Springer; 2007.
Norcini JJ, McKinley DW: Assessment methods in medical education. Teaching and Teacher Education 2007;23:239-50.
van der Vleuten CP, Schuwirth LW: Assessing professional competence: From methods to programmes. Med Educ 2005;39(3):309-17.