Psychotherapy Bulletin

Authors’ Note: We are honored and delighted that we were awarded the Norine Johnson Psychotherapy Research Grant by Division 29 of the American Psychological Association (APA). This generous grant provides a unique opportunity to study therapist factors, which is an area of research that is rarely supported by external funding. We are fortunate to have this support to further our research program on therapist effects and measurement-based care in routine treatment settings.


Research has demonstrated significant between-therapist variability in both process (e.g., working alliance) and outcome (e.g., symptom reduction), pointing to the so-called therapist effect (Baldwin & Imel, 2013). Although still in its infancy with regard to empirical scrutiny, thinking in this area has largely assumed that more effective therapists possess specific characteristics that foster consistently positive processes and outcomes with all of their clients. The possibility that therapists possess strengths and weaknesses (e.g., certain therapists are more effective with certain types of clients) has received less attention from both researchers and therapists themselves. As therapists, if we assume that we are generally and equally effective with all of our clients, this will likely have implications for our decision making and client care (e.g., the clients we choose to see, referrals that we make [or do not make], whether or not we seek feedback, and our use of outcome measurement to guide treatment).

Previous research has shown that when asked about their general effectiveness, most therapists report that they are more effective than the average therapist (the so-called “Lake Wobegon effect”; Walfish, McAlister, O’Donnell, & Lambert, 2012). Furthermore, when based on clinical judgment alone (compared with actuarially guided judgments), therapists are much less accurate in predicting whether or not a particular client will deteriorate or drop out of treatment (Hannan et al., 2005). To date, research in this area has primarily focused on global measures of outcome. However, some research suggests that therapist effectiveness may differ across problem domains (Kraus, Castonguay, Boswell, Nordberg, & Hayes, 2011; Kraus et al., 2015). Thus, investigations of therapist effectiveness (and, consequently, “therapist effects”) and the accuracy of therapists’ judgments of their own effectiveness may benefit from greater specificity.

For example, Kraus et al. (2011) demonstrated that the majority of therapists in a large naturalistic sample had clients who demonstrated improvement in multiple domains of psychopathology and functioning (measured by the Treatment Outcome Package, TOP; Kraus, Seligman, & Jordan, 2005). However, the precise domains in which these improvements were observed differed among therapists; certain therapists had clients who consistently demonstrated improvements in depression, while others had clients who consistently demonstrated improvements in social functioning. Therefore, there may be a specificity or “matching” factor that helps explain the relative effectiveness of different therapists, as well as the accuracy (or inaccuracy) of therapists’ judgments of their own effectiveness. Furthermore, inconsistent with therapists’ belief in their overall effectiveness with their clients, Kraus et al. (2011) found that many therapists have at least one domain in which their clients demonstrate reliable deterioration. These findings were largely replicated in another large therapist and client sample (Kraus et al., 2015). This more recent study also applied a risk-adjustment model that accounted for diverse client characteristics, similar to the approach taken by Saxon and Barkham (2012).

If it is true that therapists are differentially effective with their clients, it is crucial to find methods to help therapists predict with whom they are likely to be more or less effective. Given the limitations of therapist self-ratings of general effectiveness and clinical judgment alone, there is a pressing need for more robust methods to assess therapist effectiveness in treating different problem domains. Employing a multi-dimensional outcome tool has immense promise for harnessing the potential specificity of therapist effectiveness in the service of enhancing client care. Consequently, the goals of this mixed methods study are to examine (a) therapists’ predictions regarding their own effectiveness with particular types of clients, by comparing perceptions of effectiveness across a number of domains (including working alliance formation and symptom reduction) with data derived from multidimensional routine outcome monitoring (ROM), and (b) the factors that contribute to therapists’ judgments regarding their effectiveness, or lack thereof, with particular clients.


We are recruiting psychotherapists (N=40) practicing in community mental health care (CMHC) settings. There are no specific exclusion criteria; the only requirement is willingness to comply with study procedures. We anticipate that the final therapist sample will resemble that from our previous work on therapist effects in community settings (Kraus et al., 2011; Kraus et al., 2015), with therapists averaging 10 or more years of experience. Various training backgrounds will also be represented, including, but not limited to, psychologists, mental health counselors, social workers, and clinical nurses.


Treatment Outcome Package (TOP; Kraus et al., 2005). The TOP will serve as the primary outcome measure, and its subscales will form the basis of therapists’ domain-specific effectiveness predictions. The TOP evaluates behavioral health symptoms, functioning, and case mix variables. It consists of 58 items assessing 12 symptom and functional domains (risk-adjusted based on case mix assessment): work functioning, sexual functioning, social conflict, depression, panic (somatic anxiety), psychosis, suicidal ideation, violence, mania, sleep, substance abuse, and quality of life. Global symptom severity can also be assessed by summing all items or by averaging the z-scores across the 12 clinical scales. Domain-specific symptom severity is quantified as the risk-adjusted individual z-score for each clinical scale. The TOP has been shown to have excellent factorial structure, as well as good 1-week test-retest reliability across the 12 scales. It is sensitive to change while possessing limited floor and ceiling effects. The TOP also has demonstrated good convergent validity.
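As a rough illustration of the second global scoring option described above (averaging z-scores across the 12 clinical scales), consider the minimal Python sketch below. It assumes that each domain z-score has already been computed and risk-adjusted elsewhere; the function name `global_severity` and the example client values are hypothetical and are not part of the TOP or its scoring software.

```python
from statistics import mean

# The 12 TOP clinical scales as listed in the text.
TOP_DOMAINS = [
    "work functioning", "sexual functioning", "social conflict", "depression",
    "panic", "psychosis", "suicidal ideation", "violence", "mania", "sleep",
    "substance abuse", "quality of life",
]


def global_severity(domain_z):
    """Average precomputed domain z-scores into one global index.

    Assumption: domain_z maps every TOP domain name to a z-score that has
    already been risk-adjusted; this sketch does no risk adjustment itself.
    """
    missing = [d for d in TOP_DOMAINS if d not in domain_z]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    return mean(domain_z[d] for d in TOP_DOMAINS)


# Hypothetical client: elevated depression and sleep scores, average elsewhere.
example_client = {d: 0.0 for d in TOP_DOMAINS}
example_client["depression"] = 1.8
example_client["sleep"] = 0.6

print(round(global_severity(example_client), 2))
```

The sketch refuses to compute a global score when any domain is missing, which is one conservative design choice among several; how incomplete administrations are actually handled is not specified here.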

Working Alliance Inventory, Short Form (WAI-S; Tracey & Kokotovic, 1989). The 12-item WAI-S assesses three dimensions of the therapeutic relationship outlined by Bordin (1979): (a) agreement on goals (goals), (b) agreement on how to achieve these goals (tasks), and (c) the affective relationship (bond). A total alliance score can also be calculated. Internal consistency is strong, with alphas ranging between .89 and .98. 

Effectiveness Predictions. We will obtain two types of ratings to examine the stability or variability in self-ratings across the 12 TOP domains and the WAI-S. One rating will be a Likert scale rating of effectiveness across all TOP domains. For example, “In treating your clients’ symptoms of DEPRESSION, would you say you are: (1) Always ineffective to (7) Always effective?” (middle anchor [4] Inconsistently effective). The working alliance item will ask about their effectiveness at establishing a positive working alliance with their clients. These ratings will allow us to see whether therapists view themselves as “generalists” who are effective across most domains or as “specialists” who are more effective in some domains than in others. The second type of rating will be a rank ordering of relative effectiveness across the 12 TOP domains (e.g., most effective in treating depression, followed by anxiety, substance use, etc.).
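One possible way to summarize how closely a therapist's rank ordering matches a ranking derived from client outcome data is a Spearman rank correlation. The Python sketch below is offered only as an illustration under stated assumptions: the study description does not name this statistic, the function `spearman_rho` is hypothetical, the rankings are assumed to contain no ties, and the example ranks are invented.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings.

    Assumption: both arguments map the same domain names to ranks 1..n,
    with no tied ranks (the shortcut formula below requires this).
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("rankings must cover the same domains")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two domains")
    # Sum of squared rank differences across domains.
    d_squared = sum((rank_a[d] - rank_b[d]) ** 2 for d in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical therapist: self-ranked effectiveness vs. a ranking derived
# from client outcomes (1 = most effective), shown for four domains only.
self_rank = {"depression": 1, "panic": 2, "sleep": 3, "substance abuse": 4}
observed_rank = {"depression": 2, "panic": 1, "sleep": 3, "substance abuse": 4}

print(round(spearman_rho(self_rank, observed_rank), 2))
```

A value near 1 would indicate that the self-ranking and the outcome-based ranking largely agree, and a value near -1 that they are reversed.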


Volunteering therapists will first complete the symptom domain and working alliance prediction items (along with a general information questionnaire). To address our first research question, we will then identify the domains in which each therapist predicts the highest and lowest effectiveness. We will then compare how these therapists perform across multiple clients (minimum n=5 per therapist) in their caseloads with regard to their clients’ reported alliance ratings and outcome scores in these problem domains. We also will conduct comparisons across all therapists and their respective clients to see if therapists indeed reliably produce better alliances and outcomes in the domains of client functioning for which they see themselves as being more effective, relative to domains in which they perceive themselves as being less effective. To address our second question, we will also ask a random subset of therapists (n=15) via semi-structured interview about the factors that contributed to their judgments regarding their effectiveness, or lack thereof, with particular clients. These responses will be examined qualitatively with consensual methods.
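The first step of this procedure can be sketched in Python as follows. This is a simplified illustration, not the planned analysis: the function names, the data layout, and the example values are hypothetical, and the sketch assumes that higher z-scores indicate greater severity, so that a positive pre-minus-post difference reflects improvement.

```python
from statistics import mean

MIN_CLIENTS = 5  # minimum number of clients per therapist stated in the text


def extreme_domains(ratings):
    """Return the (highest-rated, lowest-rated) domains from 1-7 self-ratings.

    If several domains tie, the first one encountered is returned.
    """
    best = max(ratings, key=ratings.get)
    worst = min(ratings, key=ratings.get)
    return best, worst


def mean_change(clients, domain):
    """Mean pre-minus-post z-score change in one domain.

    Assumption: higher z-scores mean greater severity, so a positive
    value indicates average improvement.
    """
    if len(clients) < MIN_CLIENTS:
        raise ValueError("fewer than the minimum number of clients")
    return mean(c[domain]["pre"] - c[domain]["post"] for c in clients)


# Hypothetical therapist self-ratings and a caseload of five identical clients.
ratings = {"depression": 6, "sleep": 3, "substance abuse": 4}
clients = [
    {"depression": {"pre": 1.5, "post": 0.5}, "sleep": {"pre": 1.0, "post": 1.0}}
    for _ in range(MIN_CLIENTS)
]

best, worst = extreme_domains(ratings)
print(best, mean_change(clients, best))
print(worst, mean_change(clients, worst))
```

In this invented example the therapist's clients improve in the domain the therapist rated highest and show no change in the domain rated lowest, which is the pattern an accurate self-assessment would produce.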

Anticipated Outcomes

Despite emerging evidence indicating potential domain specificity in effectiveness (Kraus et al., 2011; Kraus et al., 2015), we anticipate finding more generalist tendencies than specialist tendencies in self-ratings, which also reflects the current and historical state of clinical training. However, given the lack of research in this area, the aims of this study are largely exploratory. If therapists are not accurate prognosticators (e.g., have clients who fail to demonstrate either positive alliances or improvements in domains that were self-perceived to be particular strengths), this would support the importance of providing therapists with multi-dimensional feedback via routine outcome and alliance measurement and perhaps helping them to consider additional training or supervision in particular areas. Conversely, if therapists’ predictions match their clients’ alliances and outcomes, this would suggest that therapists might be good at predicting their areas of relative competence. In our view, effective therapists possess a balanced view of their relative strengths and weaknesses in addressing particular problem areas and clients. This awareness should lead them to work more often with particular clients, to seek particular training experiences that address areas of relative difficulty, and/or to limit their practices to specialty areas of known efficacy. A therapist’s relative accuracy in making these determinations has important implications for client care, the use of measurement tools, and better understanding of the nuances of the therapist effect.

Lessons Learned

We would like to end this article by “meta-communicating” about the process of implementing this research. As noted, this study is being conducted in routine community mental health settings and involves both therapists and their clients. This research does not involve archival data (e.g., secondary analyses of de-identified routine outcomes data) or a university-based clinic staffed by graduate student trainees. We are in no way diminishing research involving archival data or research that is conducted with graduate trainees. We have conducted, and will continue to conduct, such research. However, for this particular study, we elected to adopt a community-based participatory research approach. In doing so we have faced various obstacles to study implementation and have only recently reached the initial recruitment phase. These obstacles were not entirely unanticipated. We have written elsewhere about potential barriers to the implementation of ROM and process research in community treatment settings (see Boswell, Kraus, Miller, & Lambert, 2015; Castonguay, Boswell et al., 2010; Castonguay, Nelson et al., 2010).

One consequence of a truly collaborative approach to community-based research is that participants play multiple roles and partnering institutions must navigate those roles when reaching formal work agreements. For example, a CMHC may have multiple layers of internal review that are completely removed from an academic institution’s institutional review board (IRB). Perhaps not surprisingly, these boards may have unique concerns regarding the study procedures, and the solution for one problem that is raised by organization A may directly contradict a demand from organization B. Furthermore, because they are not research institutions, many CMHCs do not carry a Federalwide Assurance number, which may be a requirement for some academic institutions to engage in collaborative research. In short, the complexities of this research approach require a high degree of open communication and considerable patience. Thankfully, more academic institutions are seeing the value of community-based and community-engaged research and are becoming less inclined to rigidly apply old rules to new methods. Fortunately, we have also received tremendous support from Division 29 throughout this process, and we are extremely grateful for this.

Cite This Article

Boswell, J. F., & Constantino, M. J. (2015). Clinicians’ self-judgement of effectiveness. Psychotherapy Bulletin, 50(4), 15-19.


Baldwin, S. A., & Imel, Z. E. (2013). Therapist effects: Findings and methods. In M. J. Lambert (Ed.), Bergin and Garfield’s handbook of psychotherapy and behavior change (pp. 258–297). New York, NY: Wiley.

Bordin, E. S. (1979). The generalizability of the psychoanalytic concept of the working alliance. Psychotherapy: Theory, Research & Practice, 16, 252-260.

Boswell, J. F., Kraus, D. R., Miller, S., & Lambert, M. J. (2015). Implementing routine outcome assessment in clinical practice: Benefits, challenges, and solutions. Psychotherapy Research, 25, 6-19. doi: 10.1080/10503307.2013.817696

Castonguay, L. G., Boswell, J. F., Zack, S. E., Baker, S., Boutselis, M. A., Chiswick, N. R.,…Grosse Holtforth, M. (2010). Helpful and hindering events in psychotherapy: A practice research network study. Psychotherapy: Theory, Research, Practice, and Training, 47, 327-344.

Castonguay, L. G., Nelson, D., Boutselis, M., Chiswick, N., Damer, D., Hemmelstein, N.,…Borkovec, T. (2010). Clinicians and/or researchers? A qualitative analysis of therapists’ experiences in a practice research network. Psychotherapy: Theory, Research, Practice, and Training, 47, 345-354.

Hannan, C., Lambert, M. J., Harmon, C., Nielsen, S. L., Smart, D. W., Shimokawa, K., & Sutton, S. W. (2005). A lab test and algorithms for identifying clients at risk for treatment failure. Journal of Clinical Psychology: In Session, 61, 155-163. doi: 10.1002/jclp.20108

Kraus, D. R., Anderson, P., Bentley, J. H., Boswell, J. F., Constantino, M. J., Baxter, E. E., & Castonguay, L. G. (2015). Predicting therapist effectiveness from their own practice-based evidence. Manuscript submitted for review.

Kraus, D. R., Castonguay, L. G., Boswell, J. F., Nordberg, S. S., & Hayes, J. A. (2011). Therapist effectiveness: Implications for accountability and patient care. Psychotherapy Research, 21, 267–276.

Kraus, D. R., Seligman, D., & Jordan, J. R. (2005). Validation of a behavioral health treatment outcome and assessment tool designed for naturalistic settings: The Treatment Outcome Package. Journal of Clinical Psychology, 61, 285–314.

Saxon, D., & Barkham, M. (2012). Patterns of therapist variability: Therapist effects and the contribution of patient severity and risk. Journal of Consulting and Clinical Psychology, 80, 535-546. doi: 10.1037/a0028898

Tracey, T. J., & Kokotovic, A. M. (1989). Factor structure of the Working Alliance Inventory. Psychological Assessment, 1, 207-210.

Walfish, S., McAlister, B., O’Donnell, P., & Lambert, M. J. (2012). An investigation of self-assessment bias in mental health providers. Psychological Reports, 110, 639–644.

