Education for Health

: 2018  |  Volume : 31  |  Issue : 3  |  Page : 148--154

A psychometric appraisal of the Dundee Ready Education Environment Measure in a medical school in Chile

Irmeli Roine1, Yerko Molina2, Marianella Cáneo1,  
1 Medical Faculty, University Diego Portales, Santiago, Chile
2 School of Psychology, University Adolfo Ibáñez, Santiago, Chile

Correspondence Address:
Irmeli Roine
Quillay 2580, Providencia, Santiago


Background: The Dundee Ready Education Environment Measure (DREEM) is used in the curricular development of future health professionals worldwide, but often without first locally testing its psychometric qualities, for example, construct validity and internal consistency. These characteristics are modified by different environments, but must be locally appropriate to obtain unequivocal and reliable conclusions about the strong and weak areas of a curriculum. Here, we report the results of the psychometric testing of DREEM results in our institution in Chile. Methods: All 1st–5th-year undergraduate medical students were asked to respond to the DREEM questionnaire. The construct validity of the results was assessed by an exploratory factor analysis (EFA), and their internal consistency was assessed by Cronbach's α. The Institutional Review Board approved the study, and each student signed an informed consent. Results: A total of 304 (88%) eligible students, aged 22 ± 2 years, 46% female, answered the questionnaire. The EFA identified four subareas instead of the original DREEM's five, with clearly different item contents. The internal consistencies of the locally defined subareas of teaching, learning, teachers and organizational aspects, and self-perception surpassed the originals, with Cronbach's α values of 0.79, 0.78, 0.77, and 0.82, respectively. Discussion: The optimal psychometric structure for accurately interpreting our DREEM results differed from both the original and previous similar studies, including one from Chile. There are several potential explanations for these differences, but most importantly, they underline the need to first define the psychometric characteristics of the test results in order to obtain accurate conclusions about the strengths and weaknesses of a curriculum.

How to cite this article:
Roine I, Molina Y, Cáneo M. A psychometric appraisal of the Dundee Ready Education Environment Measure in a medical school in Chile. Educ Health 2018;31:148-154




The Dundee Ready Education Environment Measure (DREEM) was developed 20 years ago by an international team led by Dr. S. Roff to evaluate the educational climate of future health professionals.[1] It consists of fifty items denoting different aspects of teaching and learning, grouped into five subareas (also called factors, or domains) covering teaching, teachers, academic self-perception, atmosphere, and social self-perception. The responder's degree of agreement with an item is scored on a 5-point Likert scale ranging from “fully disagree” to “fully agree,” giving it 0 to 4 points, respectively. This transforms the responder's perceptions into numbers and scores that allow quantitative analysis. The total DREEM score describes the general educational climate and has been used to compare different curricula, medical faculties, or the students' perceptions according to sex, study year, and grade point average.[2],[3] The grouping of the items into smaller data units, represented by the subareas, is meant to facilitate the interpretation and understanding of relationships and patterns within the fifty items, and to help identify in more detail the strong and weak areas of a given curriculum.
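The scoring logic described above can be sketched in a few lines of code. This is an illustrative implementation only; the set of reversed (negative) item numbers used here follows the commonly cited DREEM key and is an assumption, not part of this article.

```python
# Illustrative DREEM scoring: each item is scored 0-4 on a Likert scale,
# and negative items are reversed (4 - score) before summing.
# The negative-item set below follows the commonly cited DREEM key
# (an assumption; verify against the official instrument).
NEGATIVE_ITEMS = {4, 8, 9, 17, 25, 35, 39, 48, 50}

def score_dreem(responses):
    """responses: dict mapping item number (1-50) to a Likert score 0-4.
    Returns the total DREEM score (0-200) after reversing negative items."""
    total = 0
    for item, value in responses.items():
        if not 0 <= value <= 4:
            raise ValueError(f"item {item}: score {value} outside 0-4")
        total += (4 - value) if item in NEGATIVE_ITEMS else value
    return total
```

For example, a student who answers 3 (“agree”) to every item would score 41 × 3 + 9 × (4 − 3) = 132 under this key.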

When an instrument measuring perceptions is used in a novel environment, some of its psychometric characteristics can change, such as the subarea structure (construct validity) and reliability (internal consistency). Experts therefore recommend testing these characteristics locally before using the instrument as a diagnostic tool.[4],[5],[6] Such local studies from Sweden, Germany, Chile, and Ghana defined subareas that differed from the originals.[4],[7],[8],[9] They also found the reliability of specific subareas insufficient, even though it was sufficient for the whole instrument. The potential consequence of a locally inaccurate subarea structure is equivocal conclusions about a curriculum's areas of strength and weakness, whereas insufficient internal consistency makes the instrument an inaccurate tool.

The objective of our study was to examine both the construct validity and the internal consistency of DREEM using the results from our 1st- to 5th-year medical students in Santiago, Chile, and to compare them with the original DREEM and previous similar studies.


This is a descriptive, observational study.

All the 1st- to 5th-year medical students present at the medical school in the 3rd week of the second semester of 2016, before the end-of-term examinations, were invited to respond anonymously to the DREEM questionnaire in its validated Spanish version.[10] The students who were absent on that day were offered another opportunity to participate a few days later. The answers, marked on an electronically readable sheet, were transferred into a database, where the points given to the negative items were reversed, as instructed.[1] The Faculty's Ethics Committee approved the study protocol, and each participant signed the informed consent form.

Our medical school, part of the Faculty of Medicine of the private University Diego Portales, has a curriculum of 14 semesters leading to the professional title of a physician. The curriculum consists of ten semesters of integrated undergraduate studies and four semesters of internships. The students have contact with patients starting from the first semester by performing medical interviews under the supervision of teachers. The methodology of problem-based learning starts by the third semester, and simulation is employed during the fifth–tenth semesters as an aid to develop practical skills in various medical procedures and scenarios.

Evaluation of construct validity

Construct validity was evaluated by an exploratory factor analysis (EFA).[11],[12] An adequate sample size for this analysis is defined as at least 300.[6]

To select an appropriate matrix and extraction method, all items' results were first subjected to a descriptive analysis, including statistical evaluation of skewness and kurtosis and the Kolmogorov–Smirnov test of normality. Because nine of the fifty items had a skewness coefficient below −1 or above 1, and none had a normal distribution, a polychoric matrix was used [13],[14] with an ordinary least squares extraction method. The appropriateness of using a polychoric matrix was examined by Bartlett's test [15] and the Kaiser–Meyer–Olkin (KMO) index.[16]
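An item-level screening of this kind can be sketched with standard scientific Python tools. The data below are simulated Likert responses, not the study's dataset, and the shape (304 × 50) simply mirrors the sample described in the article.

```python
# Sketch of the per-item descriptive screening described above: skewness,
# kurtosis, and a Kolmogorov-Smirnov test against a fitted normal.
# The response matrix here is simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated Likert responses: 304 students x 50 items, values 0-4.
data = rng.integers(0, 5, size=(304, 50))

flagged = []  # items whose |skewness| exceeds 1, as in the text
for j in range(data.shape[1]):
    col = data[:, j]
    skew = stats.skew(col)
    kurt = stats.kurtosis(col)
    # KS test of the item against a normal with the item's mean and SD
    _, p = stats.kstest(col, "norm", args=(col.mean(), col.std()))
    if abs(skew) > 1:
        flagged.append(j + 1)
```

With real ordinal data, items flagged here (or failing the normality test) would argue for a polychoric rather than a Pearson correlation matrix, as the study concluded.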

The number of subareas to extract was determined by considering three criteria in a complementary fashion: (1) the Kaiser–Guttman eigenvalue >1 rule,[16] (2) Cattell's scree plot,[17] and (3) Horn's parallel analysis.[18] The adequacy of the solution was examined through its root mean square of residuals (RMSR).[19]
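Of the three criteria, Horn's parallel analysis is the least self-explanatory; a minimal sketch follows. It uses Pearson correlations on continuous data for simplicity (the study used a polychoric matrix), and retains the leading factors whose observed eigenvalues exceed the average eigenvalues of random data of the same shape.

```python
# Minimal sketch of Horn's parallel analysis: keep the leading factors
# whose observed eigenvalues exceed the mean eigenvalues of random data
# with the same number of rows and columns.
import numpy as np

def parallel_analysis(data, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, k = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.zeros(k)
    for _ in range(n_iter):
        r = rng.normal(size=(n, k))
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    rand /= n_iter
    # count the leading eigenvalues that stay above their random counterparts
    n_keep = int(np.argmax(obs <= rand)) if np.any(obs <= rand) else k
    return n_keep, obs, rand
```

On data generated from a single common factor, this routine recovers one factor even though several sample eigenvalues may exceed 1, which is why parallel analysis typically retains fewer factors than the Kaiser–Guttman rule (14 versus 4 in this study).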

Items were assigned to subareas based on their factor loadings. If an item showed cross-loading, that is, a factor loading above 0.3 on more than one subarea, it was assigned to the subarea with the higher loading if the difference was above 0.110; when the difference was below 0.110, it was assigned to complement one of the other subareas according to that subarea's other item contents, after discussion and in agreement with all the authors. All the authors participated in naming the subareas to best reflect their item contents.
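The allocation rule above can be expressed as a small decision procedure. This sketch is an interpretation of the rule as stated (0.3 loading threshold, 0.110 margin); the content-based step is represented only as a flag, since it was resolved by discussion rather than by an algorithm.

```python
# Sketch of the cross-loading allocation rule: loadings >= 0.3 on more
# than one subarea count as cross-loading; if the top two such loadings
# differ by more than 0.110 the item goes to the higher one, otherwise
# it is flagged for content-based assignment by the authors.
def allocate(loadings, threshold=0.3, margin=0.110):
    """loadings: factor loadings for one item, indexed by subarea.
    Returns (subarea_index, None) for an automatic assignment, or
    (None, reason) when the rule cannot decide on its own."""
    strong = sorted(
        ((ld, i) for i, ld in enumerate(loadings) if ld >= threshold),
        reverse=True,
    )
    if not strong:
        return None, "low loading"   # e.g. items 3, 25, and 48 in the text
    if len(strong) == 1 or strong[0][0] - strong[1][0] > margin:
        return strong[0][1], None    # clear winner
    return None, "discuss"           # assign by item content
```

For instance, loadings of 0.45 and 0.32 (difference 0.13) resolve automatically to the first subarea, while 0.35 and 0.33 fall inside the 0.110 margin and go to discussion.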

Internal consistency

The reliability of DREEM was analyzed in the dimension of internal consistency using Cronbach's α.[20] This was done for the original DREEM subareas and for the whole instrument (although the latter is considered inappropriate for a large questionnaire), as well as for the subareas defined in the present analysis. Reliability expressed by an α value from 0.60 to 0.69 was considered acceptable, from 0.70 to 0.79 high, and above 0.8 optimal, whereas values above 0.9 suggest that the items may be too similar.[21]
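Cronbach's α is straightforward to compute from the item scores; a minimal implementation of the standard formula, α = k/(k−1) · (1 − Σ item variances / variance of the total score), is sketched below.

```python
# Cronbach's alpha from an items matrix (rows = respondents, cols = items):
# alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)
```

Perfectly correlated items yield α = 1, while unrelated items drive α toward 0, which is why regrouping items into more coherent subareas (as done here) can raise the subarea α values.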

StatView® software version 5.0.1 (SAS Institute, Cary, NC, USA) and SPSS software (SPSS for Windows, version 22, 2012; SPSS Inc., Chicago, IL, USA) were used for the data analysis, Excel for graphics, and FACTOR for the factor analysis.


In all, 304 of the 345 (88%) eligible undergraduate students fulfilling the inclusion criteria answered the questionnaire: 140 (46%) females and 164 (54%) males. Twenty-five students could not be reached, and 16 students did not sign the informed consent. The mean age of the participants was 22 ± 2.5 years. The number (percentage) of participants per year of study was 68/72 (94%), 60/70 (86%), 63/78 (81%), 72/82 (88%), and 41/43 (95%) for the 1st-, 2nd-, 3rd-, 4th-, and 5th-year students, respectively. The sample contained a similar proportion of 1st–4th-year students (20%–24%) but a smaller proportion of 5th-year students (13%) because many of the latter were stationed at a clinical facility outside the medical school. A total of 298 students answered every question, five missed one question, and one missed two questions. The minimum total score was 59 and the maximum was 170, with a normal distribution and a mean of 122 ± 22 points.

Construct validity

The analyzed matrix possessed adequate fit statistics for factor analysis, with a Bartlett's test result of 3460.7 (df = 1225; P < 0.001), showing that the polychoric correlation matrix did not correspond to an identity matrix. Furthermore, the KMO index of 0.84 confirmed that the matrix was adequate for this analysis.

The Kaiser–Guttman test identified 14 subareas with eigenvalues above 1, whereas Cattell's scree test demonstrated a clear change in slope at four subareas [Figure 1]. Horn's parallel analysis also indicated four as the ideal number of subareas, based on the finding that these had eigenvalues above those obtained from random samples, in agreement with the scree test. Taking the three results into account, we decided to use four subareas. Their eigenvalues were 11.03, 3.05, 2.54, and 2.21, respectively, and they explained 37.7% of the total variance of the instrument. The subarea solution was rotated obliquely because we did not expect orthogonality between the subareas. The adjustment statistics for the four-subarea solution showed an RMSR value of 0.0559, lower than the value of 0.0711 set by Kelley's criterion for the present model, implying that the four-subarea solution adequately fits the empirical data.{Figure 1}

Three of the fifty items showed factor loadings below 0.3 (items 3, 25, and 48) [Table 1], and they, together with item 38 (factor loading 0.323), had communalities below 0.2. They were not excluded from our subarea solution, in order to preserve the total DREEM score and thus the comparability with results from numerous previous studies. Of the other items, 32 loaded on one subarea, whereas 14 cross-loaded on two subareas and one on three. Of the 15 cross-loading items, nine had a factor loading difference above 0.110, whereas six had a difference below 0.110 and were assigned to the subarea with similar contents. The four subareas formed [Table 1] were titled teaching, learning experience, teacher characteristics and organizational aspects, and self-perception. Their item contents differed clearly from those of the original DREEM subareas. For instance, our “self-perception” subarea had two items from the original “teachers,” three from “academic self-perception,” four from “atmosphere,” and four from “social self-perception.”{Table 1}

Internal consistency

For the original DREEM subareas, the internal consistency α was 0.77, 0.75, 0.64, 0.69, and 0.59 for teaching, teachers, academic self-perception, atmosphere, and social self-perception, respectively (0.91 for the whole instrument). For the new four-subarea solution, the internal consistency was higher, with α values of 0.79, 0.78, and 0.77 for the subareas of teaching, learning, and teacher characteristics and organizational aspects, respectively, and optimal for self-perception with an α of 0.82.


Our analysis of the subarea structure (construct validity) of DREEM results in Chilean undergraduate medical students identified four subareas instead of the original DREEM's five, and the new subareas surpassed the originals in internal consistency [Table 1]. In finding a subarea structure that differed from the original DREEM, our results agree with four previous EFAs from similar samples [Table 2]. They found either five subareas [4],[7] or four,[8],[9] and, similarly to our results, explained approximately 40% of the encountered variance. Importantly, in all cases, including ours, the subarea titles were similar to the originals, but their item contents differed clearly both from the original DREEM and between the different studies. Like us, they also found improved internal consistencies in the new subareas.{Table 2}

The variation in the psychometric qualities of DREEM in different environments is usually attributed to cultural, social, or curricular aspects. Yet, a closer look at the methodological details of these studies reveals other aspects that can also explain the divergent results. First, although the student samples were similar [Table 2], and at least four of the five had the minimal required sample size, they presented some differences in age, sex, year of study, sampling methods, and response rates. In one study,[4] some of the students were included twice in different years, whereas in another,[8] the sampling or inclusion criteria were not specified, and two studies either had a very low response rate of 55%[7] or did not report it.[8] Although these differences were not very striking, they can limit the direct comparability of the results, because sex, year of study, and a possible selection bias toward voluntary students with possibly higher grade point averages are known to significantly influence the students' perceptions [2] and thus the DREEM results.

Second, the procedures to examine construct validity varied. We used an EFA, which is presently considered best practice,[6],[22],[23] whereas the others reportedly used EFA but actually employed principal component analysis (PCA) [Table 2]. Both EFA and PCA are complex mathematical procedures requiring adequate software and specific know-how, but they are not interchangeable and probably give different results.[6] Furthermore, the types of matrix used, the extraction methods, and the criteria to define the number of factors differed between the studies [Table 2].

Third, the varying criteria for allocating the cross-loading items may also have contributed to the differences in the subareas. Although ideally there should be only a few cross-loading items,[24] their number in the compared studies was substantial, ranging from 13 to 21 in four of the five studies and thus encompassing 26%–42% of the total DREEM items [Table 3]. None of the cross-loading items was the same in all five studies, and only three were repeated in four of the five studies. The allocation principle in two studies [4],[8] was to adhere to the higher factor loading even if the difference was minimal, whereas we took the size of the difference into account, and two other studies [7],[9] did not specify their criteria.{Table 3}

All the studies reported very high internal consistencies, with Cronbach's α values above 0.9 for the whole of the original DREEM, indicating unnecessary repetition, whereas in three studies, including ours, the original DREEM subarea of social self-perception displayed insufficient reliability below 0.6. In all five studies, the internal consistencies of the subareas improved with the new subarea solution (data not shown, except for our results), thus increasing the reliability of their results.

Altogether, we find DREEM a highly useful instrument for examining the educational climate, with the great advantages of its several validated translations and its widespread use in medical schools and health faculties all over the world. Our results, and their comparison with previous similar studies showing psychometric differences not only from the original DREEM but also between the studies themselves, underline the need to first test the psychometric characteristics of DREEM results locally to obtain optimal accuracy and reliability in defining the strengths and weaknesses of a given curriculum.

The psychometric differences may stem not only from cultural aspects, but also from the notable methodological differences between the individual studies. A robust analysis of combined DREEM data from different parts of the world with adequate methodology may be a way to define a truly international version of it.[25]


The limitations of our study include an underrepresentation of the 5th-year students' opinions, because half of them were not present at the medical school at the time of the study, although we obtained a good response rate and almost completely answered questionnaires from the students. How much this could have influenced the factor analysis results is not known, but based on the total number of participants, we deem its effect probably quite small.


The authors thank Mr. Carlos Roldan for transforming the answer sheets into a database and the goodwill of the students in answering the questionnaire.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.


1Roff S, McAleer S, Harden RN, Al-Qahtani M, Uddin Ahmed A, Deza H, et al. Development and validation of the Dundee ready education environment measure (DREEM). Med Teach 1997;19:295-9.
2McAleer S, Roff S. A practical guide to using the Dundee Ready Education Measure (DREEM). AMEE Medical Education Guide No. 23 Curriculum, Environment, Climate, Quality and Change in Medical Education; a Unifying Perspective. Dundee, UK: Association of Medical Education in Europe; 2002.
3Roff S. The Dundee ready educational environment measure (DREEM) – a generic instrument for measuring students' perceptions of undergraduate health professions curricula. Med Teach 2005;27:322-5.
4Jakobsson U, Danielsen N, Edgren G. Psychometric evaluation of the Dundee ready educational environment measure: Swedish version. Med Teach 2011;33:e267-74.
5Hammond SM, O'Rourke M, Kelly M, Bennett D, O'Flynn S. A psychometric appraisal of the DREEM. BMC Med Educ 2012;12:2.
6Wetzel AP. Factor analysis methods and validity evidence: A review of instrument development across the medical education continuum. Acad Med 2012;87:1060-9.
7Rotthoff T, Ostapczuk MS, De Bruin J, Decking U, Schneider M, Ritz-Timme S, et al. Assessing the learning environment of a faculty: Psychometric validation of the German version of the Dundee ready education environment measure with students and teachers. Med Teach 2011;33:e624-36.
8Ortega BJ, Pérez VC, Ortiz ML, Fasce HE, McColl CP, Torres AG, et al. An assessment of the Dundee ready education environment measure (DREEM) in Chilean university students. Rev Med Chil 2015;143:651-7.
9Mogre V, Amalba A. Psychometric properties of the Dundee ready educational environment measure in a sample of Ghanaian medical students. Educ Health (Abingdon) 2016;29:16-24.
10Riquelme A, Oporto M, Oporto J, Méndez JI, Viviani P, Salech F, et al. Measuring students' perceptions of the educational climate of the new curriculum at the Pontificia Universidad Católica de Chile: Performance of the Spanish translation of the Dundee ready education environment measure (DREEM). Educ Health (Abingdon) 2009;22:112.
11Floyd FJ, Widaman KF. Factor analysis in the development and refinement of clinical assessment instruments. Psychol Assess 1995;7:286-99.
12Batista-Foguet JM, Coenders G, Alonso J. Confirmatory factor analysis. It's usefulness in the validation of questionnaires related with health. Med Clin (Barcelona) 2004;122:21-7.
13Muthén B, Kaplan D. A comparison of some methodologies for the factor analysis of non-normal Likert variables. Br J Math Stat Psychol 1985;38:171-89.
14Muthén B, Kaplan D. A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. Br J Math Stat Psychol 1992;45:19-30.
15Bandalos DL, Finney SJ. Factor analysis: Exploratory and confirmatory. In: Hancock GR, Mueller RO, editors. The Reviewer's Guide to Quantitative Methods in the Social Sciences. 1st ed. New York: Routledge; 2010.
16Kaiser HF. A second generation little jiffy. Psychometrika 1970;35:401-15.
17Cattell RB, editor. Extracting factors: The algebraic picture. In: The Scientific use of Factor Analysis in Behavioral and Life Sciences. New York: Plenum Press; 1978. p. 15-39.
18Horn JL. A rationale and test for the number of factors in factor analysis. Psychometrika 1965;30:179-85.
19Harman HH, editor. Factor analysis model. In: Modern Factor Analysis. 2nd ed. Chicago: University of Chicago Press; 1962. p. 11-33.
20Cortina JM. What is coefficient alpha? An examination of theory and applications. J Appl Psychol 1993;78:98-104.
21Cervantes V. Interpretaciones del coeficiente alpha de Cronbach. Av Med 2005;3:9-28.
22Lloret-Segura S, Ferreres-Traver A, Hernández-Baeza A, Tomás-Marco I. Exploratory analysis of items: A practical, reviewed and updated guide. Ann Psicol Spain 2014;30:1151-69.
23American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 2014.
24Yong AG, Pearce S. A beginner's guide to factor analysis: Focusing on exploratory factor analysis. Tutor Quant Methods Psychol 2013;9:79-94.
25Roff S, McAleer S. Robust DREEM factor analysis. Med Teach 2015;37:602-3.