Psychotherapy Bulletin

Clinical Impact Statement: Qualitative ratings derived from letters of recommendation (LORs) were not related to psychotherapy process measures obtained from each trainee’s first clinical case, raising questions regarding their utility for this purpose. Several limitations inherent in LORs are addressed.

In the last issue of the Bulletin, we began exploring the very timely issue of the use of letters of recommendation (LORs) by clinical and counseling graduate programs as a tool to select students with high potential to be effective therapists. Not only do programs use LORs routinely for this process, but LORs have received even more emphasis in the admissions process in the time of the coronavirus given issues related to accessibility of GRE tests and inconsistencies within undergraduate GPAs (Burke, 2020; Hu, 2020). Widely used admissions materials such as GRE scores, undergraduate GPA, and personal statements do not seem to be able to comprehensively capture important therapist characteristics, such as interpersonal style (e.g., Brown, 2004; Smaby et al., 2005; Sternberg & Williams, 1997). On the other hand, LORs provide a window into applicants’ work ethic, interpersonal style, performance in real-world settings, and other qualities that other admissions materials do not (McCarthy & Goffin, 2001).

There is limited research regarding the predictive ability of LORs (McCarthy & Goffin, 2001). In the last issue of the Bulletin, we investigated the possible relationship between letter writers’ quantitative ratings of traits (such as intellectual ability, oral communication skills, written expression, and overall recommendation) and therapy process and outcome measures. As discussed in the previous issue, our results indicated that these quantitative scores were not helpful in predicting therapist ability (e.g., therapy process and outcomes). It is likely that at least part of the reason for the null findings is that ratings provided by the letter writers were uniformly high (i.e., a strong ceiling effect). This issue reflects a problem inherent in LORs: applicants typically ask only individuals who they believe will rate them favorably to write letters (see McCarthy & Goffin, 2001). One possible way to limit these inherent issues in LORs is to convert the written letter into numerical ratings using specific coding methodology (Kuncel et al., 2014).

The current study qualitatively examined the LORs as a potential method of evaluating these characteristics and screening for high-potential therapists. Specifically, LORs were qualitatively assessed using an adjective coding method developed by Peres and Garcia (1962). As part of this method, all relevant adjectives from a LOR are organized into one of five broad categories (i.e., mental agility, vigor, dependability-reliability, urbanity, cooperation-consideration). We examined the relationships between these LOR scores and client ratings of average session depth and quality, the alliance, and overall perceived helpfulness of the therapy. Therapy process and outcome measures were completed by the first therapy client of each respective trainee. Examining trainees’ first case minimizes the influence of factors such as training and orientation, allowing trainees’ personal characteristics to exert greater influence (Chapman et al., 2009).



All participants (n = 45) were trainees in a clinical master’s program. The sample was 69.6% female, with a mean age of 23.7 (SD = 3.59). Participants were 71.7% European American, 13.0% African American, 8.7% Hispanic, 2.2% Asian American, and 4.4% other.

Clients were undergraduate volunteers from a class focused on personal growth and learning. None of the clients or therapists knew each other prior to the therapy and the professor of the undergraduate course did not receive any information about the therapy other than confirmation of student attendance. Clients (n = 45) were 73.3% female with a mean age of 20.8 (SD = 4.14). Clients were 44.4% European American, 35.6% African American, 11.1% Asian American, 4.4% Hispanic, and 4.4% other.

Letter-writers were primarily professors (87.7%) of various ranks (e.g., instructors, assistant, associate, full) but also included employers, coworkers, graduate students, and other positions. Just over half of all letter-writers (53.7%) were female. The mean length of time the letter-writers had known the applicants was just over two years, but this varied widely (M = 28.92 months, SD = 33.20 months, range = 2 to 276 months).

Of the 45 participants, 35 (77.8%) had a closed file, meaning that these participants chose to give up their right to review their own LORs.


Attainment of LORs

The consent process for graduate students participating in this study occurred after the admissions process and was completely voluntary. For those consenting, LORs were obtained from the participants’ application materials. Each participant was required to provide three LORs as part of the application packet. In addition to a written letter, letter-writers also provided quantitative ratings of the applicant. Both LORs and letter-writers’ quantitative ratings were linked to the clients’ process and outcome ratings for each participant.


During their first year of graduate school, all clinical students took a beginning therapy course based on the three-stage helping skills model from Helping Skills: Facilitating Exploration, Insight, and Action (3rd ed. & 4th ed.; Hill, 2009, 2014). As a part of this class, trainees saw their first clinical case, a 4-session therapy with an undergraduate client. Prior to the start of the therapy, all participants consented to providing ratings of the therapy session to be utilized for both training and research purposes. Sessions were non-manualized, and clients were told they could discuss any topics they desired, with the exception of self or other harm, or the endangerment of a child or elder. The first session was a 1.5-hour intake session and the remaining three sessions were approximately 45 minutes in length.


Qualitative Letter of Recommendation Coding: We used Peres and Garcia’s (1962) method for qualitative LOR assessment. Two trained raters read each LOR and highlighted all adjectives relating to the applicant. Highlighted adjectives were then sorted into one of the five categories described by Peres and Garcia (1962). To facilitate categorization of adjectives, raters made use of the list of pre-factored adjectives created by Aamodt (1996). This list includes short phrases as well as single-word adjectives. The five categories, with the brief descriptions or example adjectives drawn from this list, were mental agility (i.e., ability to apply information/knowledge), dependability-reliability (i.e., ability to follow through), vigor (i.e., active in class discussion), urbanity (e.g., achiever, assertive, defends ideas), and cooperation-consideration (e.g., altruistic, conscientious, desire to help others). For adjectives or phrases related to relevant categories but not in this list, raters used their best judgment to select the most appropriate category. Each rater then summed the number of adjectives for each category in each letter. The average score for each category across the three letters was then computed for each rater. Finally, the average of the two raters’ scores was used for each variable (see Figure 1 for clarification on the rating process).
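The aggregation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' scoring code: the function name, the data layout, and the example counts are all hypothetical, and the adjective highlighting and sorting are assumed to have been done by hand already.

```python
from statistics import mean

# Category names follow Peres and Garcia (1962) as listed in the article.
CATEGORIES = (
    "mental agility",
    "vigor",
    "dependability-reliability",
    "urbanity",
    "cooperation-consideration",
)


def applicant_scores(counts_by_rater):
    """Return one score per category for a single applicant.

    counts_by_rater maps a rater label to a list of per-letter dicts,
    each mapping category -> number of adjectives that rater coded.
    """
    per_rater = {}
    for rater, letters in counts_by_rater.items():
        # Step 1: average each category across the applicant's letters.
        per_rater[rater] = {
            cat: mean([letter.get(cat, 0) for letter in letters])
            for cat in CATEGORIES
        }
    # Step 2: average the raters' per-category means.
    return {
        cat: mean([rater_means[cat] for rater_means in per_rater.values()])
        for cat in CATEGORIES
    }


# Hypothetical counts for one applicant (three letters, two raters).
example = {
    "rater_a": [
        {"mental agility": 3, "cooperation-consideration": 0},
        {"mental agility": 5, "cooperation-consideration": 8},
        {"mental agility": 4, "cooperation-consideration": 16},
    ],
    "rater_b": [
        {"mental agility": 3, "cooperation-consideration": 2},
        {"mental agility": 5, "cooperation-consideration": 8},
        {"mental agility": 6, "cooperation-consideration": 14},
    ],
}
scores = applicant_scores(example)
```

Categories with no coded adjectives in a letter are treated as zero, which is an assumption of this sketch rather than something the article states.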

Intraclass correlation coefficients (ICCs) from two-way random-effects models were calculated, using the Spearman-Brown prediction formula, to assess rater agreement for each of the five qualitative variables. Agreement was in the excellent range (Shrout & Fleiss, 1979) for each category (ICC [2,2] = 0.90 to 0.94).
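For readers unfamiliar with the notation, the following is a minimal sketch of how an ICC(2,k) can be computed from a targets-by-raters table, using the two-way random-effects formulas in Shrout and Fleiss (1979). It is not the authors' analysis code: the function names and the small ratings table are hypothetical, and it assumes a complete table in which every rater scored every applicant.

```python
def icc_2_1(ratings):
    """Single-rater ICC(2,1) for a complete targets x raters table."""
    n = len(ratings)        # number of targets (applicants)
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                 # between-targets mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def spearman_brown(single_rater_icc, k):
    """Step a single-rater reliability up to the mean of k raters."""
    return k * single_rater_icc / (1 + (k - 1) * single_rater_icc)


# Hypothetical category scores: rows are applicants, columns are two raters.
table = [[4.0, 4.3], [7.7, 8.0], [2.3, 2.0], [5.0, 5.7], [9.3, 9.0]]
icc_single = icc_2_1(table)
icc_average = spearman_brown(icc_single, k=2)   # corresponds to ICC(2,2)
```

Applying the Spearman-Brown formula to ICC(2,1) is algebraically the same as computing ICC(2,k) directly from the mean squares, which is why the averaged-rater coefficient is reported as ICC [2,2].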

Assessment of Therapy Measures: After every session, clients completed the Session Evaluation Questionnaire (SEQ; Stiles & Snow, 1984). The SEQ consists of 24 bipolar adjective pairs, each rated on a 1 to 7 semantic differential scale. Scores from the item asking about clients’ perception of session quality (bad-good), as well as the SEQ Depth index, were used for this study. Session quality consisted of one scale asking clients to rate the overall quality of the session, with higher scores indicating higher quality. The Depth index comprises several items using bipolar adjectives that reflect how powerful or effective the client perceived the session to be. The scores in both of these areas were added across all four sessions to create one summary quality score and one summary depth score for each participant, with 28 as the highest possible score for both measures. At the end of the third session, clients completed the Working Alliance Inventory (WAI; Horvath & Greenberg, 1989) to assess client-rated alliance. At the end of the fourth and final session, the client rated how helpful they believed the therapy to be overall using one item, rated on a 7-point Likert-type scale (1 = not at all helpful to 7 = extremely helpful).
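The summary scores described above amount to a simple sum over the four sessions. The sketch below illustrates this under stated assumptions: the function name and the example ratings are hypothetical, and each session's Depth index is assumed to already be on the 1 to 7 metric (for example, as the mean of its items), which is consistent with the stated maximum of 28.

```python
def summary_score(session_ratings, n_sessions=4, low=1, high=7):
    """Sum per-session ratings into one summary score per participant."""
    if len(session_ratings) != n_sessions:
        raise ValueError(f"expected {n_sessions} session ratings")
    for rating in session_ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low} to {high}")
    return sum(session_ratings)


# Hypothetical client ratings of session quality across the four sessions.
quality_total = summary_score([6, 7, 5, 7])
# The ceiling is reached only if every session is rated 7.
max_total = summary_score([7, 7, 7, 7])
```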

Results and Discussion

Table 1 provides descriptive information on the qualitative LOR ratings and the therapy process and outcome variables. Additionally, Table 2 shows the correlation coefficients between each of the variables. Here, none of the qualitative LOR variables correlated with any of the therapy process and outcome variables. These results question whether LORs are helpful in predicting clinical ability. In understanding these null findings, several issues should be considered.

First, in line with previous research on LORs (Kuncel et al., 2014), we found that the letter writers differed widely in their use of adjectives when discussing the same applicant. Stated differently, qualitative ratings varied widely within each participant. For example, one participant had zero adjectives in letter 1 and 16 adjectives in letter 3 that fell within the cooperation-consideration category. Another participant had 20 adjectives in letter 1 and 1 adjective in letter 3 that fell within the mental agility category.

This low agreement may be due to markedly different letter lengths, different opinions on what factors are important to highlight in LORs, or even letter-writers using a single letter for multiple applicants. In fact, we encountered nearly identical recommendation letters written by the same person for different students, with only the names of students and courses changing. If letter-writers are using a pre-made letter for multiple students, the letters will not accurately reflect individual applicants’ abilities. Of note, these findings are in line with previous research showing that a single letter writer is likely to show more agreement across applicants than several writers for the same applicant (Aamodt et al., 1993). This finding suggests that letters are more influenced by the writer than the applicant.

A second major inherent issue of LORs has to do with the environment in which letter-writers are observing the applicant, which may also account for some of the variability across letters. The majority of letters in this sample (87.7%) were written by professors observing the applicants in an academic setting. While some of a professor’s assessment of a student for a LOR is likely based on attributes such as demeanor and interpersonal qualities, much of the content is probably based on performance in classes and research experience at the undergraduate level.

Overall, when these findings are reviewed alongside those presented in the last Bulletin, the results suggest that neither quantitative scores provided by letter writers nor qualitatively coded scores from the letters predict therapy process and outcome variables when looking at letters written for admissions to graduate school. These null findings are likely related to the inherent issues of LORs (e.g., selection bias, low agreement, and restriction of range) which have been raised in prior research regarding the predictive ability of LORs more broadly (e.g., Aamodt et al., 1993; Kuncel et al., 2014; Miller & Van Rybroek, 1988).


This study raises important implications for the direction of future research on LORs, such as examining differences in LOR content based on the position of the letter-writer (e.g., professors, employers, fellow students). Future research could also examine reference calls, as opposed to LORs, for information on applicants. It is possible that recommenders will be more forthright about an applicant’s qualities through verbal rather than written communication. Such a method could provide graduate programs with better information during the selection process. Given an insufficient number of mental health providers and limited spots available in graduate programs, finding methods that can select for students with the highest potential as therapists is important. Indeed, selecting applicants via current screening methods may be excluding individuals with higher potential as therapists, but with little to no information on rejected applicants, it is almost impossible to know for sure. Regardless, LORs in their current state may not be the best method of assessing applicants with regard to clinical ability.

Cite This Article

Hoffman, Z. T., Jarrard, C., Lewis, C., Widner, S., Johnson, A., Siefert, C., & Slavin-Mulford, J. (2021). Predicting trainee therapists’ abilities with letters of recommendation part 2: Quantitative scores. Psychotherapy Bulletin, 56(4), 19-24.


Aamodt, M. (1996). Applied Industrial-Organizational Psychology (2nd ed.). Brooks/Cole Publishing.

Aamodt, M., Bryan, D., & Whitcomb, A. (1993). Predicting performance with letters of recommendation. Public Personnel Management, 22, 81-90.

Brown, R. (2004). Self-composed: Rhetoric in psychology personal statements. Written Communication, 21, 242-260.

Burke, L. (2020, April 13). The asterisk semester. Inside Higher Ed.

Chapman, B., Talbot, N., Tatman, A., & Britton, P. (2009). Personality traits and the working alliance in psychotherapy trainees: An organizing role for the five factor model? Journal of Social and Clinical Psychology, 28, 577-596.

Hill, C. E. (2009). Helping Skills: Facilitating exploration, insight, and action (3rd ed.). American Psychological Association.

Hill, C. E. (2014). Helping Skills: Facilitating exploration, insight, and action (4th ed.). American Psychological Association.

Horvath, A., & Greenberg, L. (1989). Development and validation of the working alliance inventory. Journal of Counseling Psychology, 36(2), 223-233.

Hu, J. C. (2020, June 24). Graduate programs drop GRE after online version raises concerns about fairness. Science.

Kuncel, N. R., Kochevar, R. J., & Ones, D. S. (2014). A meta-analysis of letters of recommendation in college and graduate admissions: Reasons for hope. International Journal of Selection and Assessment, 22(1), 101-107.

McCarthy, J. M. & Goffin, R. D. (2001). Improving the validity of letters of recommendation: An investigation of three standardized reference forms. Military Psychology, 13(4), 199-222.

Miller, K. & Van Rybroek, G. (1988). Internship letters of recommendation: Where are the other 90%? Professional Psychology: Research and Practice, 19, 115-117.

Peres, S. & Garcia, R. (1962). Validity and dimensions of descriptive adjectives used in reference letters for engineering applicants. Personnel Psychology, 15, 279-286.

Shrout, P. & Fleiss, J. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.

Smaby, M. H., Maddux, C. D., Richmond, A. S., Lepkowski, W. J., & Packman, J. (2005). Academic admission requirements as predictors of counseling knowledge, personal development, and counseling skills. Counselor Education and Supervision, 45(1), 43-57.

Sternberg, R. J., & Williams, W. M. (1997). Does the Graduate Record Examination predict meaningful success in the graduate training of psychologists? A case study. American Psychologist, 52(6), 630-641.

Stiles, W. B., & Snow, J. S. (1984). Counseling session impact as viewed by novice counselors and their clients. Journal of Counseling Psychology, 31(1), 3-12.

