Predicting Trainee Therapists’ Abilities with Letters of Recommendation Part 1
Clinical Impact Statement: Quantitative ratings provided with the letters of recommendation (LORs) were not related to psychotherapy process measures obtained from the trainee’s first clinical case, raising questions regarding their utility for this purpose. Several limitations, inherent in LORs, are addressed.
According to the American Psychological Association’s 2019 report on Admissions, Applications, and Acceptances, over 40,000 individuals applied to clinical psychology programs in the 2016-2017 academic year, with acceptance rates of 12-30% (Michalski et al., 2019). Because interest in clinical and counseling psychology is increasing (Norcross & Sayette, 2014) while space in graduate programs remains limited, identifying the factors that predict an applicant’s success is important. The most common tools used to select graduate students for admission are Graduate Record Examination (GRE) scores, undergraduate grade point average (GPA), letters of recommendation (LORs), and personal statements (Kuncel et al., 2001; Sternberg & Williams, 1997).
While GRE scores and undergraduate GPAs have shown some ability to predict outcomes such as graduate GPA, graduation rate, and faculty ratings (Kuncel et al., 2010; Schwager et al., 2015; Sternberg & Williams, 1997), neither correlates with qualities considered important for conducting therapy (Educational Testing Services, n.d.; Smaby et al., 2005). Moreover, the coronavirus pandemic has increased graduate admission committees’ reliance on considerations other than GPA and GRE scores (Burke, 2020; Hu, 2020).
Other areas of consideration for admission to clinical and counseling programs include personal statements and letters of recommendation, both of which provide a more qualitative view of the applicant and, as such, may better assess applicants’ important personal characteristics. LORs provide a unique view of an applicant from outside sources (e.g., professors, supervisors), offering a more objective view of the applicant’s characteristics than personal statements. Given their advantage over other application materials in capturing nuanced personal qualities, many universities use LORs to identify qualities related to greater clinical potential, such as creativity, interpersonal style, mental agility, maturity, and drive. However, previous research has raised questions regarding the effectiveness of using LORs as predictors of trainee success due to difficulties such as restriction of range, various biases, and lack of reliability (McCarthy & Goffin, 2001; Miller & Van Rybroek, 1988). Prior research suggests that the best way to compensate for these difficulties and improve predictive validity is by having letter-writers (as opposed to letter-readers) give a quantitative rating of applicant characteristics or by examining LORs in a structured way in which specific content areas are coded (Kuncel et al., 2014).
The primary aim of this study is to evaluate whether LORs and associated quantitative ratings are able to predict therapeutic ability in clinical graduate trainees. Results will be presented in two parts. In this issue of the Bulletin, we will provide an analysis of the quantitative rating scores that were provided by letter-writers. Then, in the next issue, we will provide an analysis of the qualitative LOR scores obtained through structured letter-reader analysis. Specifically, the relationships between letter writers’ quantitative ratings, the qualitative LOR scores, and client ratings of average session depth and quality, the alliance, and overall perceived helpfulness from the therapy will be assessed in this series.
All participants (n = 45) were trainees in a clinical master’s program. The sample was 69.6% female, with a mean age of 23.7 (SD = 3.59). Participants were 71.7% European American, 13.0% African American, 8.7% Hispanic, 2.2% Asian American, and 4.4% other.
Clients were undergraduate volunteers from a class focused on personal growth and learning. None of the clients or therapists knew each other prior to the therapy, and the professor of the undergraduate course did not receive any information about the therapy other than confirmation of student attendance. Clients (n = 45) were 73.3% female with a mean age of 20.8 (SD = 4.14). Clients were 44.4% European American, 35.6% African American, 11.1% Asian American, 4.4% Hispanic, and 4.4% other.
Letter-writers were primarily professors (87.7%) of various ranks (e.g., instructor, assistant, associate, full) but also included employers, coworkers, and graduate students. Just over half of all letter-writers (53.7%) were female. On average, the letter-writers had known the applicants for nearly two and a half years, but this varied widely (M = 28.92 months, SD = 33.20 months, range = 2 to 276 months).
Of the 45 participants, 35 (77.8%) had a closed file, meaning that these participants chose to give up their right to review their own LORs.
Attainment of letter-writer data
The consent process for the graduate students who participated in this study occurred after the admissions process and was completely voluntary. For those who consented, LORs were obtained from the participants’ application materials. Each participant was required to provide three LORs as part of the application packet. In addition to a written letter, letter-writers also provided quantitative ratings of the applicant. The quantitative ratings and qualitative assessments of letter-writer data were linked to the clients’ process and outcome ratings for each participant.
During their first year of graduate school, all clinical students took a beginning therapy course based on the three-stage helping skills model from Helping Skills: Facilitating Exploration, Insight, and Action (3rd ed. & 4th ed.; Hill, 2009, 2014). As a part of this class, trainees saw their first clinical case, a 4-session therapy with an undergraduate client. Prior to the start of the therapy, all participants consented to provide ratings of the therapy session to be utilized for both training and research purposes. Sessions were non-manualized, and clients were told they could discuss any topics they desired, with the exception of harm to self or others, or the endangerment of a child or elder. The first session was a 1.5-hour intake session, and the remaining three sessions were approximately 45 minutes in length.
Quantitative Ratings of Applicants: Letter-writers rated applicants on the following traits: intellectual ability, oral communication skills, written expression, imagination/originality, initiative/motivation, industry/perseverance, and maturity. Each trait was rated on a 4-point scale ranging from 5 (Excellent [Highest 10%]) to 2 (Below Average). Raters could also indicate a 1 for Not Observed; all Not Observed scores were discarded from analyses. Letter-writers also responded to the item “please indicate your overall recommendation” on a 5-point scale from 1 (Not Recommended) to 5 (Recommended Strongly).
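The coding rule above matters in practice because the Not Observed code (1) sits numerically just below the lowest substantive rating (2), so leaving it in would pull scores down. The following is a minimal Python sketch of one way to handle it. The function name, the dictionary layout, the example ratings, and the choice to average each trait across an applicant's letter-writers are illustrative assumptions; the article does not describe how ratings were combined across letters.

```python
from statistics import mean

NOT_OBSERVED = 1  # code reserved for "Not Observed"; not a point on the 2-5 scale


def mean_trait_ratings(letters):
    """Average each trait across letters, skipping Not Observed codes.

    `letters` is a list of dicts mapping trait name -> rating (1, or 2-5).
    Averaging across letter-writers is an assumption made for illustration.
    Traits with no observed rating map to None.
    """
    traits = sorted({trait for letter in letters for trait in letter})
    result = {}
    for trait in traits:
        observed = [
            letter[trait]
            for letter in letters
            if trait in letter and letter[trait] != NOT_OBSERVED
        ]
        result[trait] = mean(observed) if observed else None
    return result


# Hypothetical ratings from three letter-writers for one applicant.
example = [
    {"maturity": 5, "written expression": 4},
    {"maturity": 4, "written expression": 1},  # 1 = Not Observed, discarded
    {"maturity": 5, "written expression": 5},
]
print(mean_trait_ratings(example))
```

Returning None for a trait that no letter-writer observed keeps a missing rating distinct from a low one.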
Assessment of Therapy Measures: After every session, clients completed the Session Evaluation Questionnaire (SEQ; Stiles & Snow, 1984). The SEQ consists of 24 bipolar adjective pairs, each rated on a 7-point semantic differential scale (1 to 7). Two SEQ scores were used in this study: the single item asking clients to rate the overall quality of the session (bad-good), with higher scores indicating higher quality, and the Depth index, which comprises several bipolar adjective items reflecting how powerful or effective the client perceived the session to be. Each score was summed across all four sessions to create one summary quality score and one summary depth score for each participant, with 28 as the highest possible score on both measures. At the end of the third session, clients completed the Working Alliance Inventory (WAI; Horvath & Greenberg, 1989) to assess client-rated alliance. At the end of the fourth, and final, session, the client rated the overall helpfulness of the therapy using one item, rated on a 7-point Likert-type scale (1 = not at all helpful to 7 = extremely helpful).
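The summary scoring described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' scoring script: the function names and example responses are hypothetical, and each session's Depth index is assumed to be the mean of its items, which is consistent with the stated maximum of 28 across four sessions.

```python
from statistics import mean

N_SESSIONS = 4
SCALE_MIN, SCALE_MAX = 1, 7


def _check(session_scores):
    """Validate that there is one in-range score per session."""
    if len(session_scores) != N_SESSIONS:
        raise ValueError(f"expected {N_SESSIONS} sessions, got {len(session_scores)}")
    if any(not SCALE_MIN <= s <= SCALE_MAX for s in session_scores):
        raise ValueError("scores must lie between 1 and 7")


def summary_quality(quality_by_session):
    """Sum the single bad-good item (1-7) over the four sessions (max 28)."""
    _check(quality_by_session)
    return sum(quality_by_session)


def summary_depth(depth_items_by_session):
    """Sum the per-session Depth index over the four sessions (max 28).

    Assumption: each session's Depth index is the mean of its item ratings.
    """
    per_session = [mean(items) for items in depth_items_by_session]
    _check(per_session)
    return sum(per_session)


# Hypothetical client responses for one trainee's four sessions.
quality = [5, 6, 6, 7]
depth_items = [
    [5, 6, 5, 4, 5],
    [6, 6, 5, 5, 6],
    [6, 7, 6, 6, 5],
    [7, 6, 6, 7, 6],
]
print(summary_quality(quality), summary_depth(depth_items))
```

Because both summary scores are simple sums, a client who skips a session would need separate handling; the sketch raises an error in that case rather than guessing.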
Results and Discussion
The restriction of range found for the quantitative ratings highlights some of the issues inherent in LORs, which make it difficult to use them to identify the positive characteristics desired in trainee therapists. Specifically, the marked restriction of range in the quantitative ratings in this study likely reflects two types of selection bias. First, applicants choose letter-writers who they believe will write favorably about them. Second, all of the letters analyzed came from applicants who were accepted into the program. Given that only students with mostly positive LORs and high quantitative ratings from their selected letter-writers would have been accepted, the information from these items may be more effective in identifying problematic applicants who were not offered acceptance (i.e., LORs may be more effective in eliminating applicants as opposed to differentiating accepted applicants). Unfortunately, only accepted applicants could be analyzed for ethical reasons. A related issue is that nearly a quarter of applicants opted to have their letters in an open file. A letter-writer who knows the applicant will be able to read the letter that they provide may be inclined to score the applicant more favorably.
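To illustrate why restriction of range can mask a real relationship, the sketch below applies the standard textbook formula for direct range restriction on a predictor (often called Thorndike Case II), which assumes a linear, homoscedastic relationship. It was not part of this study's analyses. The function name, the correlation of .40, and the standard deviation ratios are hypothetical values chosen only for illustration, and selection in real admissions is indirect, so the formula is at best a rough guide.

```python
import math


def restricted_correlation(r_unrestricted, sd_ratio):
    """Expected predictor-outcome correlation after direct range restriction.

    r_unrestricted: correlation in the full (unselected) applicant pool.
    sd_ratio: predictor SD in the selected group divided by its SD in the
        full pool (1.0 means no restriction; smaller means more restriction).
    """
    if not -1.0 <= r_unrestricted <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    if not 0.0 < sd_ratio <= 1.0:
        raise ValueError("sd_ratio must lie in (0, 1]")
    r, u = r_unrestricted, sd_ratio
    return (r * u) / math.sqrt(1.0 - r**2 + (r**2) * (u**2))


# Hypothetical example: a true correlation of .40 under increasing restriction.
for u in (1.0, 0.75, 0.5, 0.25):
    print(f"SD ratio {u:.2f}: expected r = {restricted_correlation(0.40, u):.3f}")
```

As the ratings of accepted applicants cluster near the top of the scale, the SD ratio shrinks and the expected correlation moves toward zero, even when the ratings would be informative across the full applicant pool.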
The findings and implications of this study must be considered in light of its limitations. As all participants came from a single university, results may differ at universities with different admissions policies, graduate acceptance committees, areas of importance, etc. Additionally, this study followed trainees through only one four-session training case with volunteer clients. Results may differ as the number of sessions or clients increases. Assessing outcomes with other measures, such as symptom reduction, may also yield different results. Future research should continue to investigate the use of LORs and associated quantitative ratings in terms of predicting therapy effectiveness.
As noted previously, the issues of selection bias and restriction of range are inherent problems with information received from letter writers. Specifically, the apparent ceiling effect of the quantitative ratings is a possible explanation for the null results found when correlating with therapy process and outcome measures. While LORs do offer some advantages over other application materials for assessing important therapist qualities, the problems associated with LORs may prevent them from being useful in predicting therapeutic ability, at least within accepted applicants. The next issue of the Bulletin will investigate the ability of LORs to predict therapist abilities using a qualitative analysis of the letters written. A more comprehensive view of the inherent issues of LORs, as well as future directions, will be addressed in the next issue. With the coronavirus having spurred many admissions committees to more heavily weight LORs, the degree to which LORs can help identify students who will be good therapists has become a timely issue. Continuing to explore the best methods for assessing applicants in terms of potential therapeutic ability is of critical importance.
Cite This Article
Hoffman, Z. T., Jarrard, C., Lewis, C., Widner, S., Johnson, A., Siefert, C., & Slavin-Mulford, J. (2021). Predicting trainee therapists’ abilities with letters of recommendation part 1: Quantitative scores. Psychotherapy Bulletin, 56(3), 19-24.
References
Burke, L. (2020, April 13). The asterisk semester. Inside Higher Ed. https://www.insidehighered.com/news/2020/04/13/how-will-passfail-affect-students-future
Educational Testing Services (2017). Guidelines for use of GRE scores. https://www.ets.org/gre/institutions/admissions/using_scores/guidelines?WT.ac=40361_owt19_180820
Hill, C. E. (2009). Helping skills: Facilitating exploration, insight, and action (3rd ed.). American Psychological Association.
Hill, C. E. (2014). Helping skills: Facilitating exploration, insight, and action (4th ed.). American Psychological Association. https://doi.org/10.1037/14345-000
Horvath, A., & Greenberg, L. (1989). Development and validation of the Working Alliance Inventory. Journal of Counseling Psychology, 36(2), 223-233. https://doi.org/10.1037/0022-0167.36.2.223
Hu, J. C. (2020, June 24). Graduate programs drop GRE after online version raises concerns about fairness. Science. https://www.sciencemag.org/careers/2020/06/graduate-programs-drop-gre-after-online-version-raises-concerns-about-fairness. https://doi.org/10.1126/science.caredit.abd4989
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate student selection and performance. Psychological Bulletin, 127(1), 162-181. https://doi.org/10.1037//0033-2909.127.1.162
Kuncel, N. R., Wee, S., Serafin, L., & Hezlett, S. A. (2010). The validity of the Graduate Record Examination for master’s and doctoral programs: A meta-analytic investigation. Educational and Psychological Measurement, 70(2), 340-352. https://doi.org/10.1177/0013164409344508
Kuncel, N. R., Kochevar, R. J., & Ones, D. S. (2014). A meta-analysis of letters of recommendation in college and graduate admissions: Reasons for hope. International Journal of Selection and Assessment, 22(1), 101-108. https://doi.org/10.1111/ijsa.12060
McCarthy, J. M., & Goffin, R. D. (2001). Improving the validity of letters of recommendation: An investigation of three standardized reference forms. Military Psychology, 13(4), 199-222. https://doi.org/10.1207/S15327876MP1304_2
Michalski, D. S., Cope, C., & Fowler, G. A. (2019). Graduate study in psychology summary report: Admissions, applications, and acceptances. American Psychological Association.
Miller, R. K., & Van Rybroek, G. J. (1988). Internship letters of recommendation: Where are the other 90%? Professional Psychology: Research and Practice, 19(1), 115-117. https://doi.org/10.1037/0735-7028.19.1.115
Norcross, J. C., & Sayette, M. A. (2014). Insider’s guide to graduate programs in clinical and counseling psychology, 2014/2015 edition. Guilford Press. https://doi.org/10.1037/t15120-000
Schwager, I. T. L., Hülsheger, U. R., Bridgeman, B., & Lang, J. W. B. (2015). Graduate student selection: Graduate Record Examination, socioeconomic status, and undergraduate grade point average as predictors of study success in a western European university. International Journal of Selection and Assessment, 23(1), 71-79. https://doi.org/10.1111/ijsa.12096
Smaby, M. H., Maddux, C. D., Richmond, A. S., Lepkowski, W. J., & Packman, J. (2005). Academic admission requirements as predictors of counseling knowledge, personal development, and counseling skills. Counselor Education and Supervision, 45(1), 43-57. https://doi.org/10.1002/j.1556-6978.2005.tb00129.x
Sternberg, R. J., & Williams, W. M. (1997). Does the Graduate Record Examination predict meaningful success in the graduate training of psychologists? A case study. American Psychologist, 52(6), 630-641. https://doi.org/10.1037/0003-066X.52.6.630
Stiles, W. B., & Snow, J. S. (1984). Counseling session impact as viewed by novice counselors and their clients. Journal of Counseling Psychology, 31(1), 3-12. https://doi.org/10.1037/0022-0167.31.1.3