Education for Health

Year: 2010  |  Volume: 23  |  Issue: 1  |  Page: 348

Validation of the Greek Translation of the Dundee Ready Education Environment Measure (DREEM)

ID Dimoliatis1, E Vasilaki1, P Anastassopoulos2, JP Ioannidis3, S Roff4
1 Department of Hygiene & Epidemiology, University of Ioannina School of Medicine, Greece
2 Fetal Medicine, Harris Birthright Research Centre, King's College Hospital, London, UK
3 Department of Hygiene & Epidemiology, University of Ioannina School of Medicine, Greece; Department of Medicine, Tufts University School of Medicine, Boston, USA
4 Centre for Medical Education, Dundee University Medical School, Dundee, UK

Correspondence Address:
I D Dimoliatis
University Campus, 45110, Ioannina, Greece


Context: The educational environment makes an important contribution to student learning. The DREEM questionnaire is a validated tool for assessing that environment.

Objectives: To translate the DREEM into Greek and validate the translation.

Methods: Forward translations from English were produced by three independent Greek translators, followed by back translations by five independent bilingual translators. The resulting Greek DREEM.v0 was administered to 831 undergraduate students from six Greek medical schools. Cronbach’s alpha and test-retest correlation were used to evaluate reliability, and factor analysis was used to assess validity. Questions whose deletion increased alpha and/or that sorted unexpectedly in factor analysis were checked further in two focus groups.

Findings: Questionnaires were returned by 487 respondents (59%), who were representative of all surveyed students by gender but not by year of study or medical school. The instrument’s overall alpha was 0.90, and for the learning, teachers, academic, atmosphere and social subscales the alphas were 0.79 (expected 0.69), 0.78 (0.67), 0.69 (0.60), 0.68 (0.69) and 0.48 (0.57), respectively. In a subset of the whole sample, test and retest alphas were both 0.90, and mean item scores were highly correlated (p<0.001). Factor analysis produced meaningful subscales, although these did not always match the original ones. Focus group evaluation revealed possible misunderstanding of questions 17, 25, 29 and 38, which were revised in the DREEM.Gr.v1. The group mean overall scale score was 107.7 (SD 20.2), with significant differences across medical schools (p<0.001).

Conclusion: Alphas and test-retest correlation suggest that the translated and validated Greek DREEM scale is a reliable tool for assessing the medical education environment and for informing policy. Factor analysis and focus group input suggest it is a valid tool. Plausible differences between schools suggest that the instrument is sensitive.

How to cite this article:
Dimoliatis I D, Vasilaki E, Anastassopoulos P, Ioannidis J P, Roff S. Validation of the Greek Translation of the Dundee Ready Education Environment Measure (DREEM). Educ Health 2010;23:348



Aside from the formal curriculum, students and teachers are well aware of the educational ‘environment’ or ‘climate’ of their institution. Is it competitive? Authoritarian? Relaxed? Stressful? Does it vary among courses within the curriculum? Does it motivate? Demotivate? All students respond to these elements in their learning environment1-4.

Roff et al.5 developed the Dundee Ready Education Environment Measure (DREEM), an international, culturally non-specific, generic instrument that provides global readings and diagnostic analyses of undergraduate educational environments within health professions institutions6. It generates a profile of a particular institution’s environmental strengths and weaknesses. The measure can be used by faculty and evaluators to compare students’ perceptions of the education environments within and between institutions or student cohorts. It can be used to assess relationships between environments and students’ academic achievements and can serve as a predictive tool for identifying achievers and underachievers7-11.

This valuable tool was originally developed in English and has been translated into about twenty languages, but not yet into Greek. The aim of this study was to translate the DREEM into the Greek language and to validate the translated instrument.


The DREEM Questionnaire

The DREEM inventory (questionnaire, scale, instrument, tool) consists of 41 positively worded statements (items, questions), each scored 0 to 4, and nine negative ones scored in reverse (4 to 0). It generates an overall score and five subscale scores regarding students’ perceptions of learning, perceptions of teachers, academic self-perceptions, perceptions of the atmosphere, and social self-perceptions. On all measures (items, subscales, overall score), high scores indicate a good environment - the higher the better5,6. Items and subscales can be seen in Table 1.

Table 1:  The original DREEM questionnaire, and the overall, subscale and question mean scores obtained by DREEM.Gr.v0 and their interpretation.
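The scoring rule described above (0 to 4 for positive items, reversed for the nine negative ones) can be sketched as follows. This is a minimal illustration, not the authors' software: the function names are hypothetical, and the list of negative item numbers is an assumption taken from the commonly cited English DREEM layout, so it should be checked against Table 1 before use.

```python
# Assumed reverse-scored items. The article states only that nine of the
# 50 items are negative; this specific list is an assumption to verify.
NEGATIVE_ITEMS = frozenset({4, 8, 9, 17, 25, 35, 39, 48, 50})
N_ITEMS = 50


def score_item(item_number, raw, negative_items=NEGATIVE_ITEMS):
    """Score one answer. `raw` is 0..4 with 4 = 'strongly agree'; None = unanswered."""
    if raw is None:
        return None  # keep missing answers missing
    if not 0 <= raw <= 4:
        raise ValueError(f"raw answer out of range for item {item_number}: {raw}")
    # Negative items are scored in reverse (4 to 0).
    return 4 - raw if item_number in negative_items else raw


def overall_score(answers, negative_items=NEGATIVE_ITEMS):
    """Sum the 50 item scores (range 0..200); None if any item is unanswered."""
    scores = [score_item(i, answers.get(i), negative_items)
              for i in range(1, N_ITEMS + 1)]
    if any(s is None for s in scores):
        return None
    return sum(scores)
```

With 50 items scored 0 to 4, the overall score ranges from 0 to 200. Returning None for incomplete questionnaires is one possible design choice for handling missing answers.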


Three researchers (ID, EV, PA) independently translated the original English version of the DREEM into Greek. Via face-to-face and email interactive sessions, they discussed and resolved the differences in their translations and reached consensus on an initial best wording. This draft was piloted with 50 third-year Ioannina University medical students, and their comments were incorporated into an improved version which was piloted with 14 more volunteers from the same school. Pilot testing showed that students had some difficulties grasping the meaning of two questions (17, 29) and because almost all Greek medical students are competent in English, the English terms “cheating” and “feedback” within these items were kept in the Greek translation, in parentheses, to help in students’ understanding. Advice was also sought from eight interested faculty members. The open-ended question on the instrument, “Comments”, was replaced by the more specific “If you could change three things about the medical school, what would they be?” after Whittle et al.11.

The product was back-translated by five independent bilingual translators who were unaware of the original English version. All back-translators recovered the original meaning and, for many statements, the exact original wording. Two of us (ID, EV) refereed wording differences, and the DREEM.Gr.v0 was thus produced.


We conducted a survey among Greek medical students approved by the Body of Directors of the Ioannina University Medical School. With this data, we tested reliability, validity, sensitivity and responsiveness of the translated instrument.


The DREEM.Gr.v0 was transformed to an anonymous, scannable form and distributed by interested faculty members to a convenience sample of students within six of the seven Greek medical schools during November and December 2007. The questionnaire was retested under the same conditions 3.5 weeks later with the same Ioannina students, who were asked to respond without trying to recall their previous responses. The first two years of medical school in Greece are preclinical, the third is transitional, and the last three are clinical.

All completed forms were scanned by a highly reliable optical mark recognition scanner (OpScan iNSIGHT™, Pearson NCS). Using the QuickTesting software produced by Anova Consulting, an electronic data file was obtained, which was checked against the original completed questionnaires. The nine negative questions were reverse-coded prior to any calculations.


Cronbach’s alpha, alpha if item deleted, and test-retest reliability metrics were calculated. Since alpha depends on both the length of the scale (the number of questions) and the correlation of the items within the scale (actual reliability), the Spearman-Brown formula,

$$\alpha_{\text{subscale}} = \frac{k\,\alpha_{\text{scale}}}{1 + (k-1)\,\alpha_{\text{scale}}}$$

where k is the number of items of the subscale divided by the number of items of the overall scale12,13, was used to estimate expected subscale alphas. For good reliability, the observed scale alpha should be greater than 0.70 (ref. 13), if not greater than 0.80 (ref. 14), and observed subscale alphas should be greater than expected.
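The reliability statistics named in this section can be sketched in a few lines. The code below is an illustrative reimplementation with hypothetical function names, not the SPSS procedure the authors cite. It assumes that answers are already reverse-coded, that rows with any missing value are dropped (listwise deletion), and that sample variances are used.

```python
from statistics import variance


def complete_cases(rows):
    """Keep only respondents with no missing (None) answers."""
    return [r for r in rows if all(v is not None for v in r)]


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    Needs at least two items, two complete respondents, and a non-zero
    variance of the respondents' total scores.
    """
    rows = complete_cases(rows)
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn (same complete cases)."""
    rows = complete_cases(rows)
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != drop] for r in rows])
        for drop in range(k)
    ]


def expected_subscale_alpha(alpha_scale, n_subscale_items, n_scale_items):
    """Spearman-Brown estimate of the alpha expected for a shorter subscale."""
    k = n_subscale_items / n_scale_items
    return k * alpha_scale / (1 + (k - 1) * alpha_scale)
```

For example, `expected_subscale_alpha(0.90, 12, 50)` gives about 0.68 for a hypothetical 12-item subscale of a 50-item scale whose overall alpha is 0.90. An item whose entry in `alpha_if_item_deleted` exceeds the full-scale alpha is a candidate for closer inspection.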

Responses from tested and retested Ioannina students (both anonymous) were compared by checking Cronbach’s alphas and item mean score correlation.
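The item mean score correlation is reported in the Results as Kendall’s tau-b. A minimal sketch of that statistic, assuming two equal-length lists of mean item scores (test and retest), is shown below. The function name is hypothetical, and the simple pairwise loop is adequate for 50 items but is not an optimized implementation.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, with the usual correction for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_pairs = concordant + discordant
    denominator = sqrt((untied_pairs + tied_x_only) * (untied_pairs + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator
```

Calling `kendall_tau_b(test_means, retest_means)` with the 50 item means from each administration would return a value between -1 and 1; values near 1 indicate that items keep nearly the same rank order on retest.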


Content validity was addressed by the original DREEM. To preserve our ability to compare results internationally, we did not delete or add any items, change the items’ original (randomly arranged) order, or rearrange subscales. We followed standard instrument translation/validation methodology13,14.

We checked whether the original five subscales fit our data (construct validity) using confirmatory factor analysis (CFA) under the same conditions as used in the instrument’s original development, i.e., five factors and requiring loadings of 0.3 or greater5. To see whether different factor clustering operated in our data, we also performed exploratory factor analysis (EFA) under the conditions of eigenvalues>1 and loadings≥0.3.

Questions whose deletion increased the overall alpha, and those that loaded less than 0.3 on the expected factor, loaded on two factors or were grouped within an unexpected factor were thoroughly examined in a preliminary focus group (one psychologist, one PhD psychiatry student, three fifth-year and six fourth-year Ioannina medical students) and in a second focus group of 48 third-year Ioannina medical students.
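The screening rule for sending items to the focus groups can be illustrated with a small helper. This is a sketch under stated assumptions: the data layout, the function name and the labels are hypothetical, and comparing absolute loadings against 0.3 is an assumption, since the text states only the 0.3 cut-off.

```python
def classify_loadings(loadings, expected_factor, threshold=0.3):
    """Flag each item by how it loads on the extracted factors.

    loadings:        {item: {factor_name: loading}}
    expected_factor: {item: factor_name the item was expected to load on}
    """
    flags = {}
    for item, by_factor in loadings.items():
        # Factors on which the item reaches the loading cut-off.
        salient = [f for f, value in by_factor.items() if abs(value) >= threshold]
        if not salient:
            flags[item] = "no loading"
        elif len(salient) > 1:
            flags[item] = "two or more factors"
        elif salient[0] != expected_factor[item]:
            flags[item] = "unexpected factor"
        else:
            flags[item] = "as expected"
    return flags


# Made-up loadings for three hypothetical items, for illustration only.
example = {
    "qA": {"f1": 0.55, "f2": 0.10},
    "qB": {"f1": 0.35, "f2": 0.41},
    "qC": {"f1": 0.12, "f2": 0.21},
}
expected = {"qA": "f2", "qB": "f2", "qC": "f1"}
flags = classify_loadings(example, expected)
```

Every item not flagged "as expected" would then be a candidate for review, alongside items whose deletion increases the overall alpha.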

Sensitivity and Responsiveness

We checked differences among groups (genders, years of study, and schools) to test the instrument’s ability to detect differences if they really exist (sensitivity). The instrument’s responsiveness, reflecting changes within a group over time, was beyond the scope of this study and was not tested.

Statistics & Software

Non-parametric tests were used to compare means and to correlate tested and retested mean item scores. To be consistent with most published relevant work on the DREEM and related instruments, we also considered their parametric equivalents14-16. Reported p-values for differences according to gender, year of study and medical school were considered significant at p […]

Results


Eight hundred and thirty-one questionnaires were distributed and 487 students responded. Respondents were representative of Greek medical students in terms of gender but not according to school or year of study. Overall response was 58.6% (Athens 10/284; Crete 43/75; Ioannina 102/102; Thessaly 112/150; Thessaloniki 127/127; Thrace 93/93). Participants’ year of study was preclinical 12%, transitional 62%, clinical 26%.
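The per-school figures above can be turned into response rates, which makes the uneven coverage across schools easy to see. The snippet is a simple arithmetic check that uses only the counts reported in this paragraph; the variable and function names are illustrative.

```python
# (returned, distributed) per school, as reported in the text.
RETURNED_AND_DISTRIBUTED = {
    "Athens": (10, 284),
    "Crete": (43, 75),
    "Ioannina": (102, 102),
    "Thessaly": (112, 150),
    "Thessaloniki": (127, 127),
    "Thrace": (93, 93),
}


def response_rate(returned, distributed):
    """Response rate as a percentage."""
    return 100.0 * returned / distributed


def summarize(counts):
    """Return total returned, total distributed and per-school rates (%)."""
    total_returned = sum(r for r, _ in counts.values())
    total_distributed = sum(d for _, d in counts.values())
    per_school = {s: response_rate(r, d) for s, (r, d) in counts.items()}
    return total_returned, total_distributed, per_school


total_returned, total_distributed, per_school = summarize(RETURNED_AND_DISTRIBUTED)
```

The totals reproduce the 487 of 831 questionnaires (58.6%) reported above, while the per-school rates range from about 3.5% (Athens) to 100%.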


There were 79 invalid scanner readings (0.3% of all 24350 possible readings) within 61 questionnaires (13%). Checking against the original completed questionnaires, we found that invalid readings arose when participants had marked two options and had either crossed one of them out with an X (73) or left both standing (6). We therefore corrected the electronic data by keeping the option that had not been crossed out when the choice was clear, or by deleting both responses and treating them as missing values when the choice was not clear.

There were 279 missing values (1%) in 129 questionnaires (27%); thus the overall alpha calculations were based on the remaining 358 questionnaires. All questions had missing values, but questions 29, 18, 6 and 11 had 2.7, 4.3, 4.8 and 5.7 times more, respectively, than the average of the remaining 46 questions, reflecting either difficulty in grasping the feedback concept (q29) or participants not yet having clinical experience (q6, q11, q18). The 24071 non-missing values were distributed as follows: ‘strongly agree’ 2016 (8%), ‘agree’ 8373 (35%), ‘uncertain’ 7104 (30%), ‘disagree’ 5077 (21%) and ‘strongly disagree’ 1501 (6%).
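The counts in this paragraph are internally consistent, which can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported above; the names are illustrative.

```python
N_QUESTIONNAIRES = 487
N_ITEMS = 50
N_MISSING = 279
N_QUESTIONNAIRES_WITH_MISSING = 129

# Non-missing answers per response option, as reported in the text.
ANSWER_COUNTS = {
    "strongly agree": 2016,
    "agree": 8373,
    "uncertain": 7104,
    "disagree": 5077,
    "strongly disagree": 1501,
}


def answer_shares(counts):
    """Percentage of non-missing answers falling in each response option."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


possible_readings = N_QUESTIONNAIRES * N_ITEMS
non_missing = sum(ANSWER_COUNTS.values())
complete_questionnaires = N_QUESTIONNAIRES - N_QUESTIONNAIRES_WITH_MISSING
shares = answer_shares(ANSWER_COUNTS)
```

The 24350 possible readings split into 24071 non-missing and 279 missing values, and 358 complete questionnaires remain for the overall alpha calculation.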


Cronbach’s alpha (Table 2) was greater than 0.70 when the entire inventory was considered and greater than expected when subscales were considered, except for the social subscale and marginally for the atmosphere subscale. If items 17 and 25 (and marginally 19 and 50) were deleted, overall alpha was increased.

Alphas among tested and retested students were identical (0.90) and both greater than 0.70 (Table 2). The correlation between tested and retested mean item scores was very high (Kendall’s tau-b 0.862, p<0.001).

Table 2:  Observed and (in parentheses) minimum expected Cronbach’s alpha for different scales and subscales.

Figure 1:  Correlation between tested and retested mean item scores of Ioannina medical students.


In the confirmatory factor analysis (CFA), eight items did not reach the target factor loading of 0.3 (Tables 3 and 4). The remaining 42 items loaded onto factors f1 to f5 somewhat differently than the original DREEM analysis, while six of those items loaded onto two factors (loadings≥0.3). However, the new factors can be interpreted in a meaningful way: f1 describes the qualities of a good teacher, with the exception of two negative questions (35,48) that might fit better with f3; f2 describes a relaxed atmosphere; f3 addresses the negative aspects of the climate; f4 addresses students’ learning and coping strategies; and f5 addresses students’ social life.

In the exploratory factor analysis (EFA), six items, mostly the same as in CFA, had loadings less than 0.3 (range 0.19 to 0.29). The remaining 44 items spread across 12 factors, and seven of them loaded onto two factors. The new arrangement of items also produced meaningful underlying factors. It seems that f1 has been divided into F1 (good teachers), F2 (motivating teaching, with the exception of the negative question 35), F8 (developing teaching), and F9 (encouraging and supporting). There are two contradictions involving negative questions: q35 (disappointing experience) fitted into two opposite factors, one about motivating (F2; wrongly) and the other about demotivating (F3; correctly); and q25 (factual learning, a negative item) was sorted onto the same factor (F12) as q45 (relevant content, a positive item).

In the focus groups, the vast majority of students misunderstood q29 and q38, almost none translated ‘cheating’ (q17) differently than ‘αντιγραφή’ (a Greek slang term for copying during exams), and three in four perceived the negative q25 as positive. When q25 was recoded as positive and EFA was rerun under exactly the same conditions as in Table 4, q25 loaded 0.23 onto F1 ‘good teachers’, suggesting that the whole sample might have understood its concept in reverse.

Table 3:  The original five factors (subscales) and the Confirmatory and Exploratory Factor Analysis of our data.

Table 4:  The original five factors and the Confirmatory and Exploratory Factor Matrix of our data set.


The mean overall score for all students was 107.7 (SD 20.2), with no differences according to gender (p=0.86) or year of study (p=0.21). Mean overall scores did differ across schools (p<0.001).

Discussion

This study’s goal was to translate and validate the DREEM in the Greek language. We will discuss four interrelated but also independently important features of the scales from the translation: reliability, validity, sensitivity and responsiveness. We will also discuss the translated instrument’s limitations.


All overall alphas were much higher than the thresholds of 0.70 (ref. 13) or 0.80 (ref. 14) generally considered acceptable for scales, and were similar to those in published studies of the DREEM translation in other languages (Primparyon et al.17: 0.91; Mayya & Roff18: 0.92; de Oliveira Filho19: 0.93; Riquelme et al.20: 0.91). The subscale alphas were higher than expected except for the social subscale (and perhaps the atmosphere subscale).

In our test-retest exercise we found the same alpha and an extremely high correlation between tested and retested mean question scores. We have not seen a previous test-retest assessment of the DREEM in the published literature.

Thus, we can conclude that our translation produced a reliable questionnaire.


Content validity, a matter of the original DREEM, and face validity are both optimized by involving a wide range of individuals in scale development13. The various types and numbers of people involved in the translation, back-translation, consulting, piloting and focus groups indicate that the original content has been successfully transferred into the target language. That all five alternative answers were used in all questions supports this conclusion21. More missing values in the ‘clinical’ questions 6, 11 and 18 were expected (most participants were in the preclinical or transitional years), but we did not expect this in q29 about feedback.

Factor analysis (FA) plays a major role in construct validation13. In general, both confirmatory and exploratory factor analysis produced sensible subscales; however, they did not quite match the original English ones. One reason might be that the originals were largely arrived at by consensus of a qualitative group rather than by statistical methods5, as in our case. A second reason might be that participants failed to realize that negative questions should be chosen in reverse, because the scannable forms provided empty boxes without a reminder of whether each was an ‘agree’ or a ‘disagree’ box (a face validity issue, corrected in DREEM.Gr.v1). But the most likely reason seems to be the ineffectively translated items spotted by factor analysis and/or ‘alpha-if-item-deleted’ reliability analysis. Although these items were translated appropriately and back-translators had no difficulty with them, the translations failed to adjust for English-Greek cultural differences. Our solution was to give definitions in paraphrases instead of simply translating terms. For example, q38 in DREEM.Gr.v0 read “The learning objectives of the courses are clear to me” and in DREEM.Gr.v1 it became “At the beginning of the course, the teachers clarify what new things I should know or what I should be able to do at the end of the course”22. Finally, though the original ‘cheating’ might convey a spectrum of cheats, no student in the focus groups thought q17 might mean something different than ‘αντιγραφή’, and we adopted that term. These solutions made keeping English terms in parentheses unnecessary.

There is no other ‘gold standard’ or well-established instrument in Greek and this is not a prognostic study; thus, concurrent and prognostic validity remain unchecked.

Sensitivity and Responsiveness

In addition to reliability usually being a prerequisite for sensitivity13, the differences found among schools might be a good index of the translated instrument’s ability to detect real differences among groups (sensitivity). It seems reasonable that these differences really do exist. The fact that there were no differences between schools for the social subscale reinforces rather than vitiates this claim; it seems reasonable that the social life is equally good in all cities. The same applies to understanding the lack of differences among students of different genders or year of study: it seems reasonable that all students perceive the educational environment of their schools comparably, whether they are male or female, or in the preclinical, transitional or clinical stage of training. Alternatively, the scale might have missed real differences between gender and year of training: we cannot know that from this study’s data.

Testing the ability of our tool to detect changes over time within a group was beyond the scope of this study. Thus we have no indication of its responsiveness other than its sensitivity: a highly sensitive scale will usually also be highly responsive13.

Losing information

Almost one in three answers (30%) fell on neither the ‘agree’ nor the ‘disagree’ side but on the middle ‘uncertain’ option, making their interpretation almost impossible. What does ‘uncertain’ really mean? Either the participant has no experience or is unable to understand what is being asked, and would thus prefer a ‘not applicable’ option; for preclinical students this applies to questions 6, 11 and 18. Or the student does have the experience and does understand what is asked, but either has not yet decided (the truly ‘uncertain’ option) or has decided to stand exactly in the middle (a ‘neither agree nor disagree’ option).

The solution adopted in DREEM.Gr.v1 was to split the ‘uncertain’ option into ‘rather agree’ and ‘rather disagree’ options (slightly agree/disagree), and to prompt students to answer all questions except when they really have no knowledge or experience. They could then leave the question unanswered. While offering more options to participants, the solution is closer to ‘five to nine’ options, an ideal congruent with both respondent preferences and reliability statistics14. This also prevents the ‘central tendency bias’ that results in loss of reliability and sensitivity14.


After this validation study successfully debugged the DREEM.Gr.v0, the final DREEM.Gr.v1 has been created (Table 5). A reliable, valid, sensitive and probably responsive Greek version of the DREEM is now available for prospectively evaluating and monitoring the medical educational environment. Although no one can prove scale validity in any absolute sense13,16, we expect that its future comprehensive application will help educators and evaluators obtain a good picture of the medical educational environment to inform policies and interventions.

Table 5:  The final version of the Greek DREEM after this validation study (DREEM.Gr.v1)

This study highlights the importance of assessing and reporting details of scale reliability and factor loadings, done best through the use of a combination of criteria23, even when assessing a tool already validated in another language. This study also suggests that when translated instruments have items that do not load onto any factor, they may be suffering from poor translation, especially a failure to adequately address cultural differences between the peoples speaking the original and target languages. Providing subjects with definitions (glossary) of confusing terms, instead of attempting to find single-word substitutes in the new language, can sometimes be the solution.


Although this is an almost national administration of the DREEM, our sample is not representative of all medical schools and years of study. The varying sampling schemes, response rates and cohort sizes across medical schools and years of study make the data more appropriate for testing the translated instrument than for assessing the quality of the educational environment or comparing schools. Although our mean overall score (107.7) was quite similar to those from other countries, which ranged from 97 to 143 in a recent review24, reported scores should be interpreted with caution and as a starting point, especially those for the ill-translated questions discussed above.


We thank Virginia Draina, Didi Vergados, Mary O’Connor, Elpiniki Pappu and Lena Cavanagh for back translation; Mary Gouva for her help in reliability and factor analysis; George Tsianos for his comments; and colleagues Antony Kafatos, Victoria Kalapothaki, Anastassios Germenis, Alexis Benos, and Theodore Constantinidis for their invaluable help in data collection. We would also like to thank the Education for Health editors and the anonymous reviewers, whose comments greatly improved our work.


1. Genn J. AMEE Medical Education Guide No 23 (Part I): Curriculum, environment, climate, quality and change in medical education - a unifying perspective. Medical Teacher. 2001; 23(4):337-344.

2. Genn J, Harden RM. What is medical education here really like? Suggestions for action research studies of climates of medical education environments. Medical Teacher. 1986; 8:111-124.

3. Hutchinson L. The ABC of learning and teaching: educational environment. British Medical Journal. 2003; 326:810-812.

4. Roff S. Educational Environment and Climate. University of Dundee, Centre for Medical Education, Theme Curriculum Development CD10; 2005.

5. Roff S, McAleer S, Harden RM, Al-Qahtani M, Uddin AA, Deza H, Groenen G, Primparyon P. Development and Validation of the Dundee Ready Education Environment Measure (DREEM). Medical Teacher. 1997; 19(4):295-299.

6. McAleer S, Roff S. Part 3; A practical guide to using the Dundee Ready Education Measure (DREEM). In, J. M. Genn (ed), AMEE Medical Education Guide No.23 Curriculum, environment, climate, quality and change in medical education; a unifying perspective. Dundee, UK: Association of Medical Education in Europe; 2002.

7. Roff S. The Dundee Ready Educational Measurement (DREEM): a generic instrument for measuring students’ perceptions of undergraduate health professions curricula. Medical Teacher. 2005; 27(4):322-325.

8. Dunne F, McAleer S, Roff S. Assessment of the undergraduate medical education environment in a large UK medical school. Health Education Journal. 2006; 65:149-158.

9. Avalos G, Freeman C, Dunne F. Determining the quality of the medical educational environment at an Irish medical school using the DREEM inventory. Irish Medical Journal. 2007; 100(7):522-525.

10. Miles S, Leinster SJ. Medical students' perceptions of their educational environment: expected versus actual perceptions. Medical Education. 2007; 41(3):265-72.

11. Whittle SR, Whelan B, Murdoch-Eaton DG. DREEM and beyond; studies of the educational environment as a means for its enhancement. Education for Health. 2007; 20(1):7.

12. Norusis/SPSS Inc. SPSS Professional Statistics™ 7.5. Chapter 13 Measuring scales: reliability analysis examples; 1997. p. 103-111.

13. Fayers PM, Machin D. Quality of life: assessment, analysis and interpretation. John Wiley & Sons. England; 2000.

14. Streiner DL, Norman GR. Health Measurement Scales - a practical guide to their development and use. 4th edition. Oxford University Press. Oxford; 2008.

15. Berk RA. Thirteen strategies to measure college teaching. Stylus publishing LLC, Sterling, Virginia, USA; 2006. p. 188.

16. Norman G, Eva KW. Quantitative research methods in medical education. Association for the Study of Medical Education booklet series Understanding Medical Education; 2008.

17. Primparyon P, Roff S, McAleer S, Poonchai B, Pemba S. Educational environment, student approaches to learning and academic achievement in a Thai nursing school. Medical Teacher. 2000; 22:359-365.

18. Mayya S, Roff S. Students’ Perceptions of Educational Environment: A Comparison of Academic Achievers and Under-Achievers at Kasturba Medical College, India. Education for Health. 2004; 17(3):280-291.

19. de Oliveira Filho GR, Vieira JE, Schonhorst L. Psychometric properties of the Dundee Ready Educational Environment Measure (DREEM) applied to medical residents. Medical Teacher. 2005; 27(4):343-347.

20. Riquelme A, Oporto M, Oporto J, Mendez J, Viviani P, Salech F, Chianale J, Moreno R, Sanchez I. Measuring Students' Perceptions of the Educational Climate of the New Curriculum at the Pontificia Universidad Catolica de Chile: Performance of the Spanish Translation of the Dundee Ready Education Environment Measure (DREEM). Education for Health 9 (online), 2009: 112.

21. Pappa E, Kontodimopoulos N, Niakas D. Psychometric evaluation and normative data for the Greek SF-36 health survey using a large urban population sample. Archives of Hellenic Medicine. 2006; 23(2):159-166.

22. Vassilopoulos D. Medical Education. Self-edition, Athens (in Greek); 2006. p. 27-43.

23. Schönrock-Adema J, Heijne-Penninga M, Van Hell EA, Cohen-Schotanus J. Necessary steps in factor analysis: enhancing validation studies of educational instruments. The PHEEM applied to clerks as an example. Medical Teacher. 2009; 31(6):e226-232.

24. Dimoliatis I. D. K. (accepted 6 July 2009). The Dundee Ready Education Environment Measure (DREEM) instrument in Greek: how to use and preliminary results for the Greek medical educational environment. Archives of Hellenic Medicine (in press). In Greek with an English abstract.