Education for Health

Year: 2016  |  Volume: 29  |  Issue: 1  |  Page: 25-29

Intersecting gender, evaluations, and examinations: Averting gender bias in an obstetrics and gynecology clerkship in the United States

Laura Jacques1, Kristina Kaljo1, Robert Treat2, Joseph Davis1, Rahmouna Farez1, Michael Lund1
1 Department of Obstetrics and Gynecology, Medical College of Wisconsin, Wisconsin, USA
2 Department of Emergency Medicine, Medical College of Wisconsin, Wisconsin, USA

Correspondence Address:
Laura Jacques
Department of Obstetrics and Gynecology, Medical College of Wisconsin, 9200 West Wisconsin Avenue, Milwaukee, Wisconsin 53226


Background: The purpose of this study was to determine whether gender bias was present in the final third-year medical student obstetrics/gynecology clerkship performance evaluation completed by faculty and resident physicians. Methods: This was a retrospective cohort study of third-year medical students over the course of ten years (2004-2014) at a private medical school in the northern US state of Wisconsin. Each student's performance during the required 6-week obstetrics/gynecology clerkship was assessed by a combination of the student's scores on a clinical performance evaluation and on a standardized national subject examination. The clinical performance evaluations comprise 10 domains, each scored on a 9-point Likert scale and completed by faculty and resident physicians. All clerkships at our institution use the same evaluation form, which was designed and validated by the medical education statistics department. Final obstetrics/gynecology clerkship average clinical evaluation scores (scale 1-9) and obstetrics/gynecology standardized national subject examination scores (percentile 1-99) were compared to see whether a gender-based difference between subject examination and performance evaluation scores existed. Results: A total of 1,976 student records were analyzed. Mean standardized national subject exam scores were significantly higher for females [74.4 (8.1)] than males [72.9 (8.2)] (possible range 1-99), with Cohen's d = 0.2 (P = 0.001). The mean (SD) clinical evaluation score was 7.4 (0.9) for females, compared with 7.2 (1.0) for males (P = 0.001) (range 1-9). Performance on the standardized national subject exam was significantly correlated (r = 0.3, P = 0.001) with clinical evaluation scores, and when split by gender the strength of the correlation remained. Discussion: Medical student performance on the standardized national subject exam correlated with clinical evaluations independent of gender. Women had higher scores on both the subject examination and the clinical performance evaluations. There was no evidence of gender bias in the students' clinical evaluation scores.

How to cite this article:
Jacques L, Kaljo K, Treat R, Davis J, Farez R, Lund M. Intersecting gender, evaluations, and examinations: Averting gender bias in an obstetrics and gynecology clerkship in the United States. Educ Health 2016;29:25-29.




After decades of female underrepresentation in medicine, medical education in the United States has begun to shift towards a more equal composition of female and male students.[1],[2],[3] As of 2011, women comprised 47% of the medical student population, a significant increase from 1966, when only 7% of students enrolled in medical schools were female.[4] The specialty of obstetrics and gynecology (OB/GYN) has followed an even more marked trajectory of gender redistribution, as women now make up 81% of the OB/GYN resident population. These statistics, while reassuring, do not address apparent academic achievement gaps between male and female medical students.[5],[6],[7],[8]

Women enrolled in the OB/GYN clerkship appear to perform at or above the level of men on standardized tests, but faculty clinical performance evaluations of these same women do not always follow the same pattern. In some cases women have received lower performance evaluations than their male counterparts.[5],[9],[10] Unfortunately, because they are subjective, performance assessments can allow implicit bias to occur. Several studies have proposed that clinical evaluations may be skewed to favor male students,[9],[10],[11],[12],[13] as suggested by equal evaluation scores for men and women despite females outperforming males on knowledge-based written exams.[9],[11] Specifically, research has shown that “evaluators may see competence signals as a threat to the traditional gender hierarchy, which leads to a negative bias when evaluating women's on-the-job performance”.[13] This is of concern because the clinical evaluation has become a key metric in modern medical education, and the performance of US medical students is often measured through a combination of standardized testing and clinical evaluations.

These considerations prompted an analysis of a decade's worth of student data at our institution, with two main goals: to ascertain whether gender bias exists in the outcomes for third-year medical students enrolled in an obstetrics and gynecology clerkship, and to examine whether such bias, if present, was more pronounced among women with more “competence signals,” that is, the most academically proficient women. Additionally, we wanted to evaluate the individual questions of the subjective clinical evaluation to identify whether certain questions favor one gender. By identifying areas of bias, we can work to ensure that these gender disparities are reduced or eliminated.


We conducted this retrospective cohort study at a private Midwestern U.S. medical school that enrolls students nationally and internationally. The study received approval from the Institutional Review Board. Over the ten years in which the data were collected (2004-2014), the student body of approximately 200 students per academic year was 45% female [Figure 1]. Data from a total of 1,976 students were included in the study; no students were excluded. The principal investigator is the current third-year OB/GYN clerkship director but was not involved with the clerkship during the study period. The other investigators include the clerkship director during the study period as well as other members of the OB/GYN educational team.

The OB/GYN clerkship is a third-year, six-week rotation in which students work with the majority of the OB/GYN faculty and residents at three different clinical sites. At the conclusion of the rotation, residents and faculty are asked to evaluate the students they worked with via an online clinical evaluation form. The form is divided into ten competency domains [Table 1], each scored on a 9-point Likert scale. The evaluation instrument was created and validated by the medical education statistics department at our institution and is used for all third-year medical student clerkships. The evaluation form did not change over the study period, and no major changes in course construction or objectives occurred during that time. An overall average of the evaluation's competency scores is used as part of the student's clerkship grade. At the conclusion of the clerkship, all students take the OB/GYN National Board of Medical Examiners (NBME) written subject examination, and this score is factored into the student's overall grade.{Figure 1}{Table 1}

The data were de-identified and stratified according to student gender. Student gender was assigned based upon the students' self-reported gender on their medical school application. NBME subject exam scores were selected in this study as a surrogate for academic proficiency. Students who scored in the top tercile (top thirty-three percent) of their class were identified as the most “academically successful”. Data were analyzed with two-way analysis of variance (ANOVA), using clinical evaluation scores as the outcome variable and gender and NBME subject exam terciles (i.e., scores split into upper, middle, and lower thirds) as predictors. Cohen's d was calculated as the effect size for gender and for NBME exam tercile.
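The tercile split described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the helper name `assign_terciles`, the use of `statistics.quantiles` for the cut points, and the toy scores are all assumptions, and the source does not say how scores that fall exactly on a cut point were handled.

```python
from statistics import quantiles


def assign_terciles(scores):
    """Label each score 'lower', 'middle', or 'upper' by tercile.

    Hypothetical helper: cut points come from statistics.quantiles with
    n=3, and a score equal to a cut point is placed in the lower of the
    two adjacent groups (an assumption, not stated in the source).
    """
    low_cut, high_cut = quantiles(scores, n=3)
    labels = []
    for score in scores:
        if score <= low_cut:
            labels.append("lower")
        elif score <= high_cut:
            labels.append("middle")
        else:
            labels.append("upper")
    return labels


# Toy exam scores, invented for illustration (not study data).
toy_scores = [61, 65, 68, 70, 72, 74, 77, 80, 85]
toy_labels = assign_terciles(toy_scores)
```

Each student's tercile label and gender would then serve as the two factors in the two-way ANOVA, with the clinical evaluation score as the outcome.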

Additionally, NBME exam scores were compared between genders with an independent t-test. Pearson correlations (r) were used to establish relational strength between clinical evaluation domains and NBME subject exam scores. Inter-item reliability was determined with Cronbach alpha, and an inter-item correlation matrix for the ten clinical evaluation domains was reported to indicate the relational strength between the domains. The statistical power to detect mean differences in clinical evaluation scores between females and males was determined with Power Analysis and Sample Size (PASS) 12.0 software (Kaysville, Utah). Power was estimated to be 0.91 for an estimated sample size of N = 2000, a mean difference of 0.2, a standard deviation of 1.5, and criterion alpha = 0.050.
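As a rough plausibility check on the power figure, the calculation can be approximated with a normal distribution. The sketch below is not the PASS procedure: the function name `approx_power` is hypothetical, the two groups are assumed to be equal in size (1,000 each), and the source does not state whether the test was one-sided or two-sided, so both are shown.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mean_diff, sd, n_per_group, alpha=0.05, two_sided=False):
    """Normal-approximation power for comparing two independent means.

    Assumes equal group sizes and a common standard deviation. For the
    two-sided case the negligible rejection probability in the opposite
    tail is ignored.
    """
    std_normal = NormalDist()
    # Standard error of the difference between two group means.
    se = sd * sqrt(2.0 / n_per_group)
    tail = alpha / 2 if two_sided else alpha
    z_crit = std_normal.inv_cdf(1.0 - tail)
    # Probability that the test statistic exceeds the critical value
    # in the direction of the true difference.
    return std_normal.cdf(mean_diff / se - z_crit)


# Inputs taken from the text: difference 0.2, SD 1.5, N = 2000 in total.
power_one_sided = approx_power(0.2, 1.5, 1000)
power_two_sided = approx_power(0.2, 1.5, 1000, two_sided=True)
```

Under these assumptions the one-sided approximation comes out close to the reported 0.91, while the two-sided version is somewhat lower. Because the source does not describe the exact PASS settings, this is only a consistency check.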


The average female score on the clinical evaluation was 7.4 (S.D. 0.9), compared to an average clinical evaluation score of 7.2 (1.0) for males (P = 0.001) [Table 1]. The average female score on the clinical evaluation for the top tercile of students was 7.6 compared to an average clinical evaluation score of 7.5 for males (P = 0.920) [Table 2].{Table 2}

Mean NBME subject exam scores were significantly higher for females [74.4 (8.1)] than males [72.9 (8.2)], with Cohen's d = 0.2 (P = 0.001). The top tercile NBME scores averaged 81.9 for females and 81.6 for males (P = 0.379).
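The reported effect sizes can be approximately reproduced from the summary statistics alone. The sketch below assumes that the values in parentheses are standard deviations and that the pooled SD can be taken as the root mean square of the two group SDs, which is exact only for equal group sizes. The function name `cohens_d_from_summary` is hypothetical, and the authors' exact formula is not given in the source.

```python
from math import sqrt


def cohens_d_from_summary(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d from group means and SDs.

    Uses the root mean square of the two SDs as the pooled SD, which
    assumes roughly equal group sizes (an assumption; the per-group
    counts are not reported in the text).
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# NBME subject exam: females 74.4 (8.1), males 72.9 (8.2).
d_exam = cohens_d_from_summary(74.4, 8.1, 72.9, 8.2)

# Clinical evaluation: females 7.4 (0.9), males 7.2 (1.0).
d_clinical = cohens_d_from_summary(7.4, 0.9, 7.2, 1.0)
```

Both values round to 0.2, matching the reported effect sizes. By the usual convention this is a small effect, which helps explain how a difference of this size can reach statistical significance in a sample of nearly 2,000 students.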

Students' performance on the subjective clinical evaluations correlated with their NBME subject exam scores. Inter-item reliability across the ten clinical evaluation domains was high (Cronbach alpha = 0.97). The Pearson correlation between clinical evaluations and NBME exam scores for all students was r = 0.3 (P = 0.001), and when split by gender, the strength of the correlation remained at r = 0.3 for each group.
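Cronbach alpha, which the methods describe as the inter-item reliability measure for the ten clinical evaluation domains, can be computed from per-student domain scores as sketched below. The function name `cronbach_alpha` and the three-domain toy ratings are assumptions made for illustration; the study's instrument has ten domains, and its data are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of per-student score lists.

    Each inner list holds one student's scores, one entry per domain.
    Uses sample variances of each domain and of the per-student totals.
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Toy ratings for four students on three domains (invented data).
toy_ratings = [
    [7, 7, 8],
    [5, 6, 5],
    [9, 8, 9],
    [6, 6, 7],
]
toy_alpha = cronbach_alpha(toy_ratings)
```

Values near 1 indicate that the domains rise and fall together across students. A very high alpha is consistent with evaluators giving a student similar scores across all domains.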

Two-way ANOVA showed statistically significant (P ≤ 0.050) differences in clinical evaluation mean scores for (a) gender (higher for females in eight of ten domains and in the overall average) [Table 1] and (b) NBME subject exam tercile (higher for the upper tercile in all ten domains and in the overall average). The effect size for the overall clinical evaluation average was Cohen's d = 0.2 when split by gender and Cohen's d = 0.6 when split by NBME subject exam tercile. Women who scored higher on their NBME subject examinations scored higher on their clinical evaluations.

[Table 1] reports the mean scores for the individual clinical evaluation domains split by gender. These data showed significantly higher scores for women in eight of ten domains; technical skills and medical problem solving-decision making showed no statistically significant differences.

[Table 3] reports the data for the individual clinical evaluation domains broken down by both gender and NBME subject exam score tercile. There were four statistically significant interactions between gender and NBME subject exam tercile, in the domains of history (P = 0.043), medical problem solving-assessment (P = 0.045), medical problem solving-decision making (P = 0.015) and oral presentation (P = 0.048). These interactions indicate that at least two of the six subgroups (females in the upper, middle and lower terciles and males in the upper, middle and lower terciles) have significant mean differences.{Table 3}

[Table 2] reports the level of statistical significance between female and male clinical evaluation scores for the upper NBME subject exam tercile only. No significant differences in clinical evaluation scores between males and females were found in any of these domains.


We hypothesized that while women might perform better on the NBME subject exam, men would match women in clinical performance evaluations, indicating a gender bias in the faculty's assessments of students. Notably, unlike previous studies, we also proposed an explanation for this discrepancy. Based upon the work of Inesi and Cable,[13] we proposed that female students might be penalized due to subconsciously held social mores; specifically, a successful woman may upset the gender hierarchy. The more successful a woman is, the more likely she is to experience this negative bias. Unlike previous studies that demonstrated females underperforming on clinical evaluations compared to males,[9],[11] our results show that, overall, females scored better on both their clinical evaluations and their NBME subject exams.

We further surmised that the most academically proficient women (as identified by the top third of NBME subject exam scores) might be more vulnerable to gender bias, as their intelligence might intimidate evaluators.[13] The analysis indicated that there was no statistically significant difference between the male and female clinical evaluation scores in the top NBME subject exam tercile. The women in this group also had NBME exam scores equivalent to those of the men. The results did not support the hypothesis.

Overall these results are encouraging; the subjective evaluation scores correlate with examination scores, suggesting that clinical evaluations are another valuable representation of student performance. Greater OB/GYN knowledge is associated with higher evaluation scores regardless of gender.

In addition to the leading hypotheses, we analyzed the individual clinical evaluation domains to detect evidence of bias. Scoring in nearly all clinical evaluation domains was associated with gender, and women scored higher than men in these areas. The two domains that did not favor females were “medical problem solving-decision making” and “technical skills”. These two areas are interesting in that they may be evaluating more stereotypically male behaviors, and thus it may be more difficult for women to achieve high scores in them. Certain behaviors, those that are self-assertive and achievement-oriented, have been stereotyped by society as “male behaviors”, and research has indicated that women are evaluated punitively when they exhibit these behaviors.[12],[13] Perhaps the exception of these two evaluation domains from the pattern of women scoring higher than men is in part due to these implicit cultural biases.

Existing research alludes to various discrepancies between male and female competencies and skills in medical school. For example, females outperformed males on multiple-choice exams as well as on Objective Structured Clinical Examinations (OSCE) during an OB/GYN clerkship, yet there was no difference between male and female clinical evaluation scores.[9] Similarly, a retrospective analysis found that while female students received higher clerkship grades and typically outperformed male students in both the OSCE and the NBME subject exam during the obstetrics and gynecology rotation, they did not score higher on clinical evaluations.[11] Finally, in another report, women attained clinical evaluation scores equal to those of men despite scoring significantly higher on their NBME subject exam.

As a consequence of our investigation, we generated new questions to continue our research in this area. Why are women scoring higher on their subject exams? Do women have an inherent advantage because of exposure to material such as the menstrual cycle, contraception and reproduction through life experience? Is there something that our clerkship should be doing in order to better assist men with learning and preparing for the subject exam? What are the cultural expectations for both genders that may be influencing the assessment of their clinical performance? These questions provide ample opportunities for continuous and ongoing research.

The strengths of this study are the number of students included in the study population, the extended length of time over which data were collected, the use of reliable data, and the examination of individual domain scores for evidence of bias. The weaknesses of the study are that we used only two sources of clinical performance measures as reportable outcomes and that we do not have information about the gender of evaluators, which may additionally contribute to gender bias. Our work shows that gender bias is not always overt; it comes with nuances that might not be apparent unless we closely scrutinize the means by which our students are assessed.


Medical student performance on the clinical evaluation correlated with the NBME subject examination independent of gender. Female medical students had higher scores on both clinical evaluations and subject examinations. These results are reassuring, as they suggest that gender bias was not evident in the evaluation of male and female medical students and that faculty and resident physicians evaluated student performance irrespective of gender. While this study has, for now, allayed concerns about gender bias among the obstetrics/gynecology faculty, residents, and medical students, it is important to remain cognizant that issues of gender equity continue to be prevalent both nationally and globally.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.


1. Ku M. When does gender matter? Gender differences in specialty choice among physicians. Work Occup 2011;38:221-62.
2. Longo P, Straehley CJ. Whack! I've hit the glass ceiling! Women's efforts to gain status in surgery. Gend Med 2008;5:88-100.
3. Miller GD, Kemmelmeier M, Dupey P. Gender differences in worry during medical school. Med Educ 2013;47:932-41.
4. Roskovensky LB, Grbic D, Matthew D. The changing gender composition of US medical school applicants and matriculants. Analysis in Brief-AAMC. Washington, DC: Association of American Medical Colleges 2012;12:1-2.
5. Chang JC, Odrobina MR, McIntyre-Seltman K. The effect of student gender on the obstetrics and gynecology clerkship experience. J Womens Health (Larchmt) 2010;19:87-92.
6. Erikson C, Jones K, Tilton C. 2012 Physician Specialty Data Book. Washington, DC: Association of American Medical Colleges; 2012.
7. Bleakley A. Gender matters in medical education. Med Educ 2013;47:59-70.
8. McKinstry B. Are there too many female medical graduates? Yes. BMJ 2008;336:748.
9. Bienstock JL, Martin S, Tzou W, Fox HE. Medical students' gender is a predictor of success in the obstetrics and gynecology basic clerkship. Teach Learn Med 2002;14:240-3.
10. Bibbo C, Bustamante A, Wang L, Friedman F Jr., Chen KT. Toward a better understanding of gender-based performance in the obstetrics and gynecology clerkship: Women outscore men on the NBME subject examination at one medical school. Acad Med 2015;90:379-83.
11. Craig LB, Smith C, Crow SM, Driver W, Wallace M, Thompson BM. Obstetrics and gynecology clerkship for males and females: Similar curriculum, different outcomes? Med Educ Online 2013;18:21506.
12. Heilman ME, Wallen AS, Fuchs D, Tamkins MM. Penalties for success: Reactions to women who succeed at male gender-typed tasks. J Appl Psychol 2004;89:416-27.
13. Inesi M, Cable D. When accomplishments come back to haunt you: The negative effect of competence signals on women's performance evaluations. Pers Psychol 2015;68:615-57.