ORIGINAL RESEARCH PAPER
Year: 2012 | Volume: 25 | Issue: 1 | Page: 33-39
Assessing Undergraduate Competence in Evidence-based Medicine: A Preliminary Study on the Correlation Between Two Objective Instruments
NM Lai1, CL Teng2, S Nalliah3
1 Department of Paediatrics, Monash University Sunway Campus, Jeffrey Cheah School of Medicine and Health Sciences, Malaysia
2 Department of Family Medicine, International Medical University, Malaysia
3 Department of Obstetrics and Gynaecology, International Medical University, Malaysia
Date of Submission: 05-Aug-2011
Date of Revision: 24-Feb-2012
Date of Acceptance: 20-Mar-2012
Date of Web Publication: 30-Jul-2012
N M Lai
Senior Lecturer, Department of Paediatrics, Monash University Sunway Campus, Jeffrey Cheah School of Medicine and Health Sciences, JKR 1235, Bukit Azah, 80100, Johor Bahru, Johor
Source of Support: None, Conflict of Interest: None
Context: The Fresno test and the Berlin Questionnaire are two validated instruments for objectively assessing competence in evidence-based medicine (EBM). Although both instruments purport to assess a comprehensive range of EBM knowledge, they differ in their formats. We undertook a preliminary study using adapted versions of the two instruments to assess their correlation when administered to medical students. The adaptations were made mainly to simplify the presentation for our undergraduate students while preserving the contents that were assessed.
Methods: We recruited final-year students from a Malaysian medical school from September 2006 to August 2007. The students received a structured EBM training program within their curriculum. They took the two instruments concurrently, midway through their final six months of training. We determined the correlations using either Pearson's or Spearman's correlation, depending on the data distribution.
Results: Of the 120 students invited, 72 (60.0%) participated in the study. The adapted Fresno test and the Berlin Questionnaire had a Cronbach's alpha of 0.66 and 0.70, respectively. Inter-rater correlation (r) of the adapted Fresno test was 0.9. The students scored 45.4% on average [standard deviation (SD) 10.1] on the Fresno test and 44.7% (SD 14.9) on the Berlin Questionnaire (P = 0.7). The overall correlation between the two instruments was poor (r = 0.2, 95% confidence interval: -0.07 to 0.42, P = 0.08), and correlations remained poor between items assessing the same EBM domains (r = 0.01-0.2, P = 0.07-0.9).
Discussion: The adapted versions of the Fresno test and the Berlin Questionnaire correlated poorly when administered to medical students. The two instruments may not be used interchangeably to assess undergraduate competence in EBM.
Keywords: Assessment, evidence-based medicine, medical education
How to cite this article:
Lai N M, Teng C L, Nalliah S. Assessing Undergraduate Competence in Evidence-based Medicine: A Preliminary Study on the Correlation Between Two Objective Instruments. Educ Health 2012;25:33-9
How to cite this URL:
Lai N M, Teng C L, Nalliah S. Assessing Undergraduate Competence in Evidence-based Medicine: A Preliminary Study on the Correlation Between Two Objective Instruments. Educ Health [serial online] 2012 [cited 2020 Mar 31];25:33-9. Available from: http://www.educationforhealth.net/text.asp?2012/25/1/33/99204
Introduction
Evidence-based medicine (EBM) was formally introduced in 1992 as an approach to enable health care practitioners to make well-informed clinical decisions in the face of the rapidly expanding medical literature of variable quality. A well-accepted definition of EBM was provided in 1996 by Sackett et al., as "the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients." EBM is now a central approach in health care decision making and a standard component in the Health Sciences curricula. EBM is a broad discipline with multiple domains, including asking good clinical questions and accessing, appraising and applying the evidence. Many teaching methods on EBM have been described under a variety of settings, including classroom, bedside and mixed settings, by teachers of different backgrounds, including researchers, epidemiologists and clinicians. Given the broad scope of the subject, its central role in patient care and the varied methods of teaching, it is important to evaluate the effectiveness of EBM teaching by assessing the learners' competence in EBM. Numerous instruments have been developed to measure competence in EBM, each incorporating some domains or key skills, although none is recognized to date as the reference standard.
Two recently developed EBM assessment instruments are the Fresno test and the Berlin Questionnaire. Both instruments purport to measure a comprehensive range of EBM knowledge objectively. The Fresno test comprises mostly short-answer questions, some of which demand basic arithmetic skills. The 12-question test has been administered to medical residents, occupational therapists and medical students in published reports. The Berlin Questionnaire comprises 15 scenario-based multiple-choice questions, some involving more extensive arithmetic than the Fresno test. If these two instruments are truly comprehensive in what they measure, and indeed measure the same domains, the results of the two should correlate strongly and they could be used interchangeably depending on the format of questions desired and practicality. As far as we are aware, no reports have evaluated the correlation between these two instruments when administered concurrently to subjects in one sitting.
We undertook this study principally to evaluate the internal consistencies of the instruments adapted to suit our setting and measure their overall correlation in student performance. We also aimed to measure the correlations between the groups of questions in the two instruments purported to assess the same EBM domains. We hypothesized that the two instruments would correlate strongly overall and also between items addressing the same domains, as indicated by correlation coefficients of at least 0.5.
Methods
Participants and settings
This was a cross-sectional study in which the participants completed the two instruments in one sitting. We recruited a group of medical students from the International Medical University of Malaysia who were in the final 6 months of their 5-year undergraduate training. All 120 students from two cohorts were invited to participate in the study. The first cohort undertook their 6-month training from September 2006 to February 2007 and the second cohort from March to August 2007.
The students received EBM training progressively in their undergraduate training from the first year, with lectures, problem-based learning, research projects and short EBM summaries. In the final six months of their training, they received a clinically integrated training program encompassing all major tenets of EBM. This training program, jointly developed by the first two authors in May 2003, consisted of (i) overview lectures on the principles of EBM, searching and critical appraisal, (ii) small-group training integrated with bedside clinical sessions and (iii) journal club. The first author, who was then the resident coordinator and the assigned EBM teacher, facilitated all the training sessions, including the introductory lectures and small-group sessions, and trained the other resident faculty members.
We adapted both instruments to simplify the presentation to suit the students in our setting. The first author undertook the adaptations and two other authors determined the face validity of the adaptations and made changes to the wording to ensure that the contents assessed were the same as the original instruments. We piloted the adapted Fresno test on 12 final-year medical students in December 2005, with resultant further wording changes to some questions. The adapted instruments are included in the Appendix. The following are specific details on the adaptations:
The Fresno test
- Both clinical scenarios on question one were changed to reflect common clinical conditions in Malaysia, with corresponding changes made in question three.
- The order of questions from five to seven was reversed to reflect the format in which critical appraisal was taught in our program, namely, assessing validity and clinical importance, followed by applicability.
- Question 10 in the original test, a short-answer question, was moved to become question 12 in our adapted version and changed to a multiple-choice format. In the original test, the candidate was asked to write down an acceptable 95% confidence interval for a given clinical scenario. In our revised version, the candidate was given seven response choices for the same clinical scenario, two of which were correct. The candidate was required to choose both correct answers to be awarded the full mark. A single correct answer was awarded half the mark, and any wrong answer chosen in combination with any other answer was awarded no mark. We changed the format to reduce the confusion that might arise if students mistakenly thought that they had to derive the 95% confidence interval statistically from the scenario given (in fact, any interval would be accepted provided that it encompassed the average and did not include one).
- For questions 10 and 11, while all the statistical concepts assessed remained unchanged, the clinical scenarios were changed to actual published studies that are quoted.
- The grading rubrics were revised according to the changes above.
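The marking rule described above for the adapted multiple-choice confidence-interval question can be sketched as follows. This is only an illustration: the option labels, the full-mark value and the function name are hypothetical, and the sketch assumes that a single wrong answer chosen on its own also earns no mark, which the description does not state explicitly.

```python
from typing import Iterable, Set


def score_ci_question(chosen: Iterable[str], correct: Set[str], full_mark: float) -> float:
    """Score a multiple-choice item that has exactly two correct options.

    Both correct options chosen -> full mark.
    One correct option chosen alone -> half the mark.
    Any wrong option chosen -> no mark (assumed to hold even if chosen alone).
    """
    chosen_set = set(chosen)
    # Any wrong option forfeits the mark, whatever else was chosen.
    if chosen_set - correct:
        return 0.0
    # Both correct options, and nothing else, earn the full mark.
    if chosen_set == correct:
        return full_mark
    # A single correct option on its own earns half the mark.
    if len(chosen_set) == 1:
        return full_mark / 2
    # Nothing chosen.
    return 0.0


# Hypothetical example: options A-G, with B and E as the two correct answers.
CORRECT = {"B", "E"}
print(score_ci_question({"B", "E"}, CORRECT, full_mark=4.0))
print(score_ci_question({"B"}, CORRECT, full_mark=4.0))
print(score_ci_question({"B", "A"}, CORRECT, full_mark=4.0))
```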
The Berlin Questionnaire
- We reworded all the questions except question nine to shorten the scenarios and simplify the presentation while retaining all the elements assessed.
- Question nine: we added a short lead-in statement: "Which statement/s is/are correct?".
- Question 11: we replaced the settings of the hypothetical studies cited (Bolivian, Argentinean and Chilean) with the letters A, B and C, respectively. To keep the question brief, we removed the specific settings of the hypothetical studies, as they were considered non-essential to the question.
- Question 13: we removed the references to the places where the studies were conducted, as they were considered non-essential to the question.
- Question 14: we removed the reference to the German language literature, as it was considered non-essential to the question.
We categorized each item in the instruments into eight major domains in EBM to enable a meaningful comparison and correlation between the matching domains. These were well-accepted core domains in EBM, and were in accordance with the domains delineated by Ramos et al. in the grading rubrics of the original Fresno test. Two authors (NML and CLT) categorized the items, first independently and later via discussion, leading to a consensus. The result of our categorization is displayed in [Table 1].
Table 1: The items in both instruments categorized according to the major EBM domains assessed
Conduct of the study
The students received a study information leaflet and a briefing from the first author (NML) prior to the study. Participation was voluntary with written consent. We informed the students that their choice to participate or not would not affect their university standing. A member of the administrative staff oversaw the consent signing in the absence of any investigator. All students received the clinically integrated EBM training, a standard program in the university curriculum, whether or not they participated.
Two authors (NML and CLT) independently scored the adapted Fresno test scripts, while a member of the administrative staff scored the Berlin Questionnaire using a multiple-choice answer grid. The answer grid was identical to the answer sheet that the participants used, but was pre-marked with the correct answers. The staff member was instructed to compare each participant's answers with the correct answers provided and to count the total number of correct answers for each participant at the end of the marking. We analyzed the inter-rater difference in the final scores of the Fresno test and obtained a mean difference and a standard deviation (SD). If the inter-rater difference was more than two SDs apart, we discussed the scripts concerned to reach a consensus score for each question. We averaged the final scores of all other scripts. We also converted the scores for both instruments to percentages. We obtained the sum scores under each domain by combining the scores of all items categorized under the domain. For example, for the domain of "study design," the sum score for the Fresno test consisted of the combined scores from questions three, eight and nine, and the sum score for the Berlin Questionnaire consisted of the combined scores from questions six, seven and 14 [Table 1].
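The reconciliation of the two raters' Fresno scores described above could be sketched along the following lines. The sketch assumes that "more than two SDs apart" means a difference lying more than two SDs from the mean inter-rater difference; the function names and the example scores are hypothetical and are not the study data.

```python
from statistics import mean, stdev


def reconcile_scores(rater_a, rater_b, k=2.0):
    """Average two raters' total scores, flagging scripts that need discussion.

    A script is flagged when its inter-rater difference lies more than k SDs
    from the mean difference (an assumed reading of the rule in the text).
    Returns (averaged, flagged): a dict of script index -> averaged score,
    and a list of script indices that need a consensus score.
    """
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    d_mean, d_sd = mean(diffs), stdev(diffs)
    averaged, flagged = {}, []
    for i, (a, b, d) in enumerate(zip(rater_a, rater_b, diffs)):
        if abs(d - d_mean) > k * d_sd:
            flagged.append(i)            # resolve by discussion
        else:
            averaged[i] = (a + b) / 2    # otherwise take the average
    return averaged, flagged


def to_percent(score, max_score):
    """Convert a raw score to a percentage of the maximum score."""
    return 100.0 * score / max_score


# Hypothetical scores out of 212 for nine scripts; the last one disagrees badly.
a = [100, 90, 110, 95, 105, 100, 98, 102, 150]
b = [98, 92, 108, 96, 104, 101, 97, 103, 100]
averaged, flagged = reconcile_scores(a, b)
print(flagged)
print(round(to_percent(averaged[0], 212), 1))
```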
We performed reliability analysis to assess the internal consistency of both instruments, expressed as Cronbach's alpha with the 95% confidence interval (CI) of the intra-class correlation coefficient. We assessed the inter-rater correlation of the adapted Fresno test scores using Pearson's correlation. We constructed a Bland-Altman plot to illustrate the agreement between the two instruments graphically, by plotting the average of the percentage scores of the two instruments for each participant on the X-axis and the difference between the percentage scores on the Y-axis. The plot enables a global visual assessment of the level of agreement between the two instruments by the degree of scattering of the data points. We used a paired t-test to compare the total scores and the Mann-Whitney U test to compare the scores in each domain. We used Pearson's correlation to correlate the total scores of both instruments and Spearman's correlation to correlate the scores in each domain [PASW 18 (Chicago, IL, USA)].
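As an illustration of two of the quantities named above, the following sketch computes Cronbach's alpha from item-level scores and the coordinates and limits of a Bland-Altman plot from paired percentage scores. It uses only the Python standard library rather than PASW, the function names are hypothetical, and the small data sets are invented for demonstration.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with one row per participant and one
    column per item: alpha = k/(k-1) * (1 - sum(item variances)/variance(totals))."""
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def bland_altman(x, y, k=2.0):
    """Return the per-participant averages (X-axis), differences (Y-axis),
    the mean difference, and the limits at k SDs either side of the mean."""
    averages = [(a + b) / 2 for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    d_mean, d_sd = mean(diffs), stdev(diffs)
    return averages, diffs, d_mean, (d_mean - k * d_sd, d_mean + k * d_sd)


# Invented example: three participants, two perfectly consistent items.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))

# Invented percentage scores on two tests for three participants.
averages, diffs, d_mean, limits = bland_altman([50, 60, 40], [40, 70, 40])
print(averages, diffs, d_mean, limits)
```

Plotting `averages` against `diffs`, with horizontal lines at `d_mean` and at the two limits, gives the kind of display described in the text.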
We performed a post-hoc power analysis using the methods of Faul et al. via the G*Power software. We considered a correlation coefficient (r) of at least 0.5 to be important, and set a correlation coefficient of zero as a reference for our null hypothesis, with an alpha of 0.05. We considered a power of at least 80% as acceptable in detecting such a degree of correlation. Our sample of 72 paired data provided a power of 99.5% in detecting such a degree of correlation.
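A rough check of this power figure can be made with the Fisher z approximation, sketched below. This is an assumption for illustration and not necessarily the computation that G*Power performs; the function name is hypothetical, and a two-sided test is assumed.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation r against a
    null of zero with n pairs, using the Fisher z transformation
    (atanh(r) is approximately normal with standard error 1/sqrt(n - 3))."""
    z = NormalDist()
    effect = atanh(r) * sqrt(n - 3)      # standardized effect under the alternative
    crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    return z.cdf(effect - crit) + z.cdf(-effect - crit)


# Values taken from the text: r = 0.5, n = 72, alpha = 0.05.
print(round(correlation_power(0.5, 72), 3))
```

Under these assumptions the approximation returns a power above 99%, in line with the figure reported above.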
The study was approved by the Research and Ethics Committees, International Medical University, Malaysia.
Results
Seventy-two of the 120 students invited (60%) participated in the study: 53 of 59 from the first cohort (89.8%) and 19 of 61 from the second cohort (31.1%). The low response rate of the second cohort arose because around half of the class was unavailable for the study owing to learning commitments in their attached hospital. The internal consistency of the adapted Fresno test, expressed as Cronbach's alpha, was 0.66 (95% CI for intra-class correlation coefficient: 0.54-0.76), while the adapted Berlin Questionnaire had a Cronbach's alpha of 0.70 (95% CI for intra-class correlation coefficient: 0.60-0.79). The final scores of the Fresno test between the two raters differed by a mean of 9.6 out of 212 points (4.5%) (95% CI: 7.2-12.1 points), with a very strong correlation between the two raters' scores (r: 0.9). Mean group performances in both tests were similar: for the Fresno test, the mean score was 96.2 out of 212 (45.4%) [SD 21.4 (10.1%)] and for the Berlin Questionnaire, the mean score was 6.7 out of 15 (44.7%) [SD 2.2 (14.9%)]. The mean difference in percentage between the scores of the two tests was 0.7% (95% CI: -3.2% to 4.5%) (P = 0.7). The score distributions of both tests are illustrated in [Figure 1] and [Figure 2].
Figure 1: A histogram illustrating the score distribution (in percentages) for the Fresno test
Figure 2: A histogram illustrating the score distribution (in percentages) for the Berlin Questionnaire
Figure 3: A Bland-Altman plot illustrating the agreement between the Fresno test and the Berlin Questionnaire*
*The Bland-Altman plot provides a visual assessment of the agreement between the readings of two different scales that measure the same property by plotting the average against the difference between the readings of the two scales. We used the Bland-Altman plot to depict the agreement between the student scores in the Fresno test and the Berlin Questionnaire, both of which were developed to measure competence in evidence-based medicine. In the figure, each data point was constructed by plotting each student's average score on the Fresno test and the Berlin Questionnaire (X-axis) against the difference in scores between the two instruments (Y-axis). The mean difference in the participants' scores between the two instruments and two standard deviations on either side of the mean are also displayed.
The agreement between the scores of the Fresno test and the Berlin Questionnaire is graphically depicted in a Bland-Altman plot in [Figure 3]. Scores on the two tests were widely scattered, with no specific pattern of association. There was a wide range of difference in scores between the two tests, from 33.1% higher (2 SD above the mean difference) to 31.8% lower in the Fresno test (2 SD below the mean difference) [Figure 3]. Overall, the correlation between the total scores of the two tests was poor (r = 0.2, 95% CI: -0.07 to 0.42, P = 0.08).
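As a consistency check on these figures, the mean difference and its SD can be recovered from the two limits reported above, and from them the 95% CI of the mean difference. This is a worked sketch only: it assumes that the limits lie exactly 2 SDs either side of the mean difference, that n = 72, and that the t critical value for 71 degrees of freedom is about 1.99.

```latex
\begin{aligned}
\bar{d} &\approx \frac{33.1 + (-31.8)}{2} = 0.65, \\
s_d &\approx \frac{33.1 - (-31.8)}{4} \approx 16.2, \\
\mathrm{SE}(\bar{d}) &= \frac{s_d}{\sqrt{72}} \approx 1.91, \\
\bar{d} \pm t_{0.975,\,71}\,\mathrm{SE}(\bar{d}) &\approx 0.65 \pm 3.81 \approx (-3.2,\ 4.5).
\end{aligned}
```

These values agree with the reported mean difference of 0.7% and its 95% CI of -3.2% to 4.5%.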
[Table 2] displays the student scores in points and in percentages on the Fresno test and the Berlin Questionnaire according to the EBM domain assessed. In general, the students scored higher on the Fresno test in the domains of "study design" and "internal validity," and on the Berlin Questionnaire in the domains of "magnitude of effect/clinical importance" and "diagnostic accuracy" [Table 2]. [Table 3] shows the correlations between the two instruments in each domain, where applicable. Even within the same domain, the correlations remained poor (r: -0.2 to 0.2, P = 0.07-0.9) [Table 3].
Table 2: A breakdown of scores according to the EBM domains assessed by the two instruments. All the figures are rounded to one decimal place. The P values were obtained by comparing the scores of the two instruments in the same domain
Table 3: Correlations between the two instruments according to each EBM domain assessed
Discussion
Our study shows that the adapted versions of the Fresno test and the Berlin Questionnaire had reasonable internal consistencies, and our inter-rater correlation was very strong in the adapted Fresno test. However, the overall student scores in the two instruments correlated poorly. The correlations remained poor even between questions that were mapped to the same domains of EBM. We also found that the poor correlations in this study could not be explained by a difference in the levels of difficulty as the students' performances were similar in these two tests.
Previous studies have evaluated the association between self-perceived competence and objectively assessed competence in EBM, and most of them showed weak associations. As far as we are aware, this is the first study to evaluate the correlation between two objective assessment instruments in EBM. According to a systematic review, the two instruments that we used in this study have been considered the most comprehensively validated tools in assessing competence in EBM. Our findings provide some evidence of the limitations of these two instruments.
Although the Fresno test and the Berlin Questionnaire cover some common EBM domains [Table 1], they differ not only in format but also in the way they address those domains. The Fresno test covers all four major steps in EBM (asking, accessing, appraising and applying) but appears to test mainly recall and, in some parts, comprehension. The Berlin Questionnaire covers a narrower range of EBM domains (mainly appraising and applying) and appears to assess comprehension and, possibly in some parts, application. For example, questions five to seven of the Fresno test ask the candidates to nominate criteria for internal validity, clinical importance and applicability, respectively, which involves mainly recall and probably, in some parts, comprehension. Question eight asks the candidates to name the study design best suited to a study about diagnosis, which also assesses recall. Of the twelve questions, only three appear to assess application: question one (generating a clinical question), question 11 (deriving absolute and relative risk reductions) and question 12 (deriving expressions of diagnostic accuracy). In the Berlin Questionnaire, on the other hand, all items require the candidates to identify the most appropriate choices among similar-sounding statements. These include descriptions of study design, comments relating to the biases of a study and procedures for deriving statistical expressions, which cannot be answered by recall alone and probably require a sufficient level of comprehension to apply the knowledge acquired. However, it is questionable whether the exclusive use of multiple-choice questions in the Berlin Questionnaire is effective in assessing application. Although the students performed better in certain domains in the Fresno test and in others in the Berlin Questionnaire, the wide ranges of their scores precluded any meaningful conclusion.
Judging from the poor correlations between the two instruments even among the same domains, it could be inferred that the two instruments presented here address the EBM domains very differently, and neither, if used alone, provides a sufficiently comprehensive assessment in EBM.
Another factor that might have contributed to the poor correlations between the instruments was our students' commitment to the study. Because of the voluntary nature of the study, only 60% of the students participated. Because the tests did not count toward their grades, the students might not have been able to sustain their interest or focus throughout. As the students took the two tests consecutively, loss of interest or mental fatigue might have affected their performance, especially in the second test. These factors might have introduced random variations in their scores, which might have affected the correlations. In this study, the students' overall performance in the Fresno test was poorer than expected of our graduates. They scored 45.4% on average (SD 10.1%) compared with 57.5% (SD 10.4%) attained by a previous cohort of students who sat for the test under similar circumstances at the end of their medical training. This was possibly because the tests were administered midway through their clerkship, before they had completed their EBM training.
We acknowledge the following limitations in our study. First, this is a preliminary study using the adapted instruments. Although the aim of our adaptation was to simplify the presentation and improve readability while preserving the contents, the internal consistency of the adapted Fresno test did not reach the acceptable level of 0.7. Further revisions are needed to provide a better validation of the original instrument. Second, the difference in the scoring format of the instruments (one had a wide score range of 0-212 and the other a narrow score range of 0-15) might have contributed to the apparent poor agreement between the instruments. Third, both instruments were originally validated on postgraduate respondents. Although one could argue that the EBM domains taught and assessed should not differ, whether at an undergraduate or postgraduate level, the approach to assessment and hence the design of the assessment tools might need to be tailored to a specific level of training. Thus, some uncertainties remained on the suitability of the instruments in assessing undergraduates. Fourth, our sample was small and consisted only of undergraduate medical students. Our findings might not be applicable to students of other health disciplines or to practicing health care providers. Finally, the low response rate of the second cohort, due partly to the suboptimal timing of the study in relation to the students' learning commitments, represented a further limitation.
Conclusions
This study provides preliminary evidence that the adapted Fresno test and the Berlin Questionnaire correlated poorly when administered to a group of medical students. This suggests that they should not be used interchangeably in assessing undergraduate competence in EBM. Further correlation studies using more extensively validated versions of the instruments should be conducted. Because EBM is a broad discipline with multiple key skills, further efforts appear necessary to identify the elements that are essential to a sufficiently comprehensive instrument for measuring EBM competence at different levels of learning.
Acknowledgment
The authors would like to thank Mrs. Hapiza Baharom from the Administrative Department, International Medical University, for providing administrative support in the study, including scoring the Berlin Questionnaire.
References
1. Evidence-based medicine Working Group. A new approach to teaching the practice of medicine. Journal of the American Medical Association. 1992; 268(17):2420-2425.
2. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: What it is and what it isn't. British Medical Journal. 1996; 312(7023):71-72.
3. Straus S, Richardson W, Glasziou P, Haynes R. Evidence based medicine: How to teach and practice EBM. Edinburgh: Elsevier, Churchill Livingstone; 2005.
4. Del Mar C, Glasziou P, Mayer D. Teaching evidence based medicine. British Medical Journal. 2004; 329:989-990.
5. Shaneyfelt T, Baum KD, Bell D, Feldstein D, Houston TK, Kaatz S, Whelan C, Green M. Instruments for evaluating education in evidence-based practice: A systematic review. Journal of the American Medical Association. 2006; 296(9):1116-1127.
6. Ramos KD, Schafer S, Tracz SM. Validation of the Fresno test of competence in evidence based medicine. British Medical Journal. 2003; 326(7384):318-321.
7. Fritsche L, Greenhalgh T, Falck-Ytter Y, Neumayer HH, Kunz R. Do short courses in evidence based medicine improve knowledge and skills? Validation of Berlin questionnaire and before and after study of courses in evidence based medicine. British Medical Journal. 2002; 325(7376):1338-1341.
8. Dinkevich E, Markinson A, Ahsan S, Lawrence B. Effect of a brief intervention on evidence-based medicine skills of pediatric residents. BMC Medical Education. 2006; 6:1.
9. McCluskey A, Lovarini M. Providing education on evidence-based practice improved knowledge but did not change behaviour: A before and after study. BMC Medical Education. 2005; 5:40.
10. West CP, Jaeger TM, McDonald FS. Extended evaluation of a longitudinal medical school evidence-based medicine curriculum. Journal of General Internal Medicine. 2011; 26(6):611-615.
11. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986; 1(8476):307-310.
12. Faul F, Erdfelder E, Buchner A, Lang AG. Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods. 2009; 41(4):1149-1160.
13. Caspi O, McKnight P, Kruse L, Cunningham V, Figueredo AJ, Sechrest L. Evidence-based medicine: Discrepancy between perceived competence and actual performance among graduating medical students. Medical Teacher. 2006; 28(4):318-325.
14. Okoromah CA, Adenuga AO, Lesi FE. Evidence-based medicine curriculum: Impact on medical students. Medical Education. 2006; 40(5):465-466.
15. Young JM, Glasziou P, Ward JE. General practitioners' self ratings of skills in evidence based medicine: Validation study. British Medical Journal. 2002; 324(7343):950-951.
16. Khan KS, Awonuga AO, Dwarakanath LS, Taylor R. Assessments in evidence-based medicine workshops: Loose connection between perception of knowledge and its objective assessment. Medical Teacher. 2001; 23(1):92-94.
17. West CP, McDonald FS. Evaluation of a longitudinal medical school evidence-based medicine curriculum: A pilot study. Journal of General Internal Medicine. 2008; 23(7):1057-1059.
18. Rogers EM. Diffusion of innovations. New York: Free Press; 1983.