An MTurk Primer for Psychotherapy Researchers
Clinical Impact Statement: This article describes the history and basics of using MTurk for conducting psychotherapy research. It also discusses potential advantages and disadvantages of MTurk and the existing research on this online recruitment platform. Recommendations for using MTurk and future research directions are discussed.
In recent years, psychology researchers have begun to use online methods for participant recruitment and data collection. One of the most popular online methods is Amazon’s Mechanical Turk (MTurk), an online crowdsourcing website. To get a sense of its popularity, we recently did a Google Scholar search using the keyword “Mechanical Turk” (see Figure 1 for a summary of citations by year).
While Google Scholar identified only 23 scholarly articles mentioning MTurk in 2005, 10,500 citations were identified for 2017. Specific to the field of psychology, some have estimated that approximately half of all researchers have utilized MTurk for data collection (Goodman, Cryder, & Cheema, 2013). Despite the increasingly high use of MTurk in the field of psychology in general, many psychotherapy researchers are unfamiliar with this data collection tool. Thus, in this article we present a primer on MTurk for psychotherapy researchers.
MTurk is a crowdsourcing marketplace tool that was originally created in 2005 by Amazon.com’s founder, Jeff Bezos. Crowdsourcing means that work is distributed to an unorganized collection of individuals; it can be contrasted with outsourcing, in which work is allocated to a defined organization. In creating MTurk, Bezos aimed to create a decentralized marketplace of workers to perform tasks that computers could not perform, or could not perform efficiently, such as recognizing patterns, transcribing audio, filtering adult content, writing short product descriptions, and discerning meaning from images or text (Mason & Suri, 2012; Pontin, 2007). Seeing the value that MTurk brought to his own company, Bezos believed that it might be useful to others with similar needs (Pontin, 2007).
Interestingly, the name MTurk was inspired by “The Turk,” an automated chess-playing invention developed by Wolfgang von Kempelen (Paolacci, Chandler, & Ipeirotis, 2010). The Turk was eventually revealed to be not an automaton but a human chess master, hidden beneath the chessboard and controlling the moves of a humanoid dummy. In a similar way, MTurk is a platform that allows humans to perform tasks for which computers are not yet suited.
To utilize this platform, researchers must first register for an account at the MTurk website (https://www.mturk.com/mturk/welcome). In this system, individuals or businesses from around the world can register as “requesters” who advertise tasks that require completing, or as “workers” who work to fulfill the specific tasks the requesters post. “Requesters” post Human Intelligence Tasks (HITs), which are online tasks that can be done by “workers” using a computing device. Examples of these tasks include writing, evaluating product advertisements and websites, using simple templates, transcribing, and completing online research (Buhrmester, Kwang, & Gosling, 2011). For online research, researchers can create a survey within MTurk or can utilize other online survey tools like SurveyMonkey or Qualtrics. If using a different survey tool, researchers simply create a HIT that gives the worker a unique identifier and a link to the survey, thus allowing the researcher to approve only HITs that were submitted by “workers” with the specific identifier (Mason & Suri, 2012).
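To illustrate, the completion-code handshake between an external survey and MTurk could be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names and code format are hypothetical illustrations, not part of any MTurk or survey-tool API.

```python
import secrets
import string

# Alphabet for completion codes (hypothetical choice: uppercase letters and digits).
CODE_ALPHABET = string.ascii_uppercase + string.digits

def generate_completion_code(length=10):
    """Generate a random alphanumeric completion code to display to a
    worker at the end of the external survey."""
    return "".join(secrets.choice(CODE_ALPHABET) for _ in range(length))

def verify_submissions(issued_codes, submitted):
    """Split MTurk submissions into approve/reject lists by checking each
    worker's entered code against the codes the survey handed out.

    issued_codes: set of codes shown to participants who finished the survey
    submitted:    dict mapping worker_id -> code the worker entered on the HIT
    """
    approved, rejected = [], []
    for worker_id, code in submitted.items():
        (approved if code in issued_codes else rejected).append(worker_id)
    return approved, rejected
```

In practice the researcher would record each code the survey tool displays, then approve only the HITs whose submitted code matches an issued one.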
Once created, tasks are displayed on the site in a standardized format, with “workers” being able to browse or search for specific jobs. All HITs include information about the title of the HIT, the name of the “requester” who created the HIT, the compensation associated with completing the HIT, the number of HITs of this type available to be worked on, how much time the “requester” has allotted for completing the HIT, and the date/time when the HIT expires (Mason & Suri, 2012). Other information that is commonly presented includes a more detailed description of the HIT and the “worker” qualifications to be able to work on a HIT. “Requesters” can also provide keywords that workers can search, with the keywords with the most HITs being “data,” “collection,” “easy,” “writing,” and “transcribe” (Ipeirotis, 2010a). After reviewing posted studies, “workers” can decide which HITs they would like to complete. Upon completion, “requesters” get to review the quality of the work completed by the “workers” and make a decision about compensation based on the quality of the work. “Workers” who repeatedly receive poor ratings on their work quality may be disqualified from future HITs, depending on the specifications set up by the “requesters.”
As of 2007, there were more than 100,000 MTurk workers in more than 100 countries across the globe (Pontin, 2007), with that number expanding to over 500,000 workers in more than 190 countries in 2014 (Paolacci & Chandler, 2014). A number of studies have sought to examine the characteristics of the MTurk worker population (Berinsky, Huber, & Lenz, 2012; Paolacci et al., 2010; Shapiro, Chandler, & Mueller, 2013). In fact, there is a website called MTurk Tracker (Ipeirotis, 2010a) that shows a daily update of the demographics of MTurk users based on a brief survey (gender, year of birth, marital status, household size, household income, and country) that is posted to MTurk every 15 minutes; workers are restricted to answering the survey once per month (Ipeirotis, 2015). Overall, MTurk samples are found to be more representative than college samples (Berinsky et al., 2012) and samples obtained through many other online sources (Casler, Bickel, & Hackett, 2013).
Research indicates that money is not the sole motivation for participating in MTurk. For example, approximately 70% of U.S. MTurk workers indicate that they use it as a fruitful way to spend free time while making cash, and approximately 40% report doing it because the tasks are fun. Most workers spend a day or less per week working on MTurk, and generally complete between 20 and 100 HITs in this amount of time (Ipeirotis, 2010b). There is no set reimbursement fee required by MTurk, although there have been discussions about requiring reimbursement comparable to minimum wage (Miller, Crowe, Weiss, Maples-Keller, & Lynam, 2017). Some studies have found that the current mean pay for an MTurk worker is between $1.38 and $1.71 per hour (Horton & Chilton, 2010; Paolacci et al., 2010). When analyzing data collected from over 165,000 HIT groups, Ipeirotis (2010a) found that 10% of HITs had an incentive of $0.02 or less, 50% had a price above $0.10, and 15% had a price above $1. Researchers must pay a 40% commission fee to Amazon for using MTurk, a recent increase from its 10% commission fee (Buhrmester et al., 2011; Miller et al., 2017). Based on this, a researcher who pays $100 in participant payments would owe MTurk an additional $40 in fees, bringing the total cost of participant reimbursement up to $140.
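The fee arithmetic above reduces to a simple calculation, sketched below. The 40% default mirrors the rate cited in the text; researchers should check Amazon's current fee schedule before budgeting, as rates change.

```python
def total_study_cost(participant_payments, commission_rate=0.40):
    """Total amount owed for a study: payments to participants plus
    Amazon's commission charged on top of those payments.

    commission_rate defaults to the 40% figure cited in the text;
    verify the current rate on MTurk's pricing page."""
    commission = participant_payments * commission_rate
    return round(participant_payments + commission, 2)
```

For example, `total_study_cost(100)` returns `140.0`, matching the $140 total described above.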
Potential Advantages of Using MTurk for Psychological Research
The increased popularity of MTurk is likely related to the numerous potential benefits of conducting research through the crowdsourcing platform. One of the main advantages of using MTurk for study recruitment is increased access to a large subject pool. With hundreds of thousands of “workers,” researchers have access to a substantially larger subject pool through MTurk than they might have using traditional recruiting methods (e.g., undergraduate research pools or flyer advertisements; Pontin, 2007). A large subject pool can allow researchers to more easily recruit enough participants based on a priori power analyses for more complex statistical designs. This may be particularly valuable for researchers who would not otherwise have easy access to participants, such as researchers from smaller colleges or universities, researchers in more isolated geographical locations, and new researchers who may not yet have a network of collaborators to aid in study recruitment at multiple sites (Mason & Suri, 2012; Smith & Leigh, 1997).
A second advantage of using MTurk is the ability to recruit a more diverse sample of participants than might be available through traditional recruitment methods (Pontin, 2007). Because data collection is not tied to a single location, researchers have access to a more demographically diverse subset of the population and, depending on the online recruitment criteria, can study international populations. MTurk workers have been found to be more diverse than traditional undergraduate samples and standard Internet samples, especially in regard to age, ethnicity, and educational level (Behrend, Sharek, Meade, & Wiebe, 2011; Casler et al., 2013). This diversity in research samples can aid the generalizability of the data and can also allow cross-cultural research questions to be more easily examined. Beyond diversity in common demographics, the anonymous format also allows access to unique populations, such as people from stigmatized groups or people who hold socially undesirable views (Wright, 2005).
A third advantage of using MTurk for psychology research is the speed at which studies can be conducted (Mason & Suri, 2012; Wright, 2005). While there are some daily and weekly trends in HIT workload on MTurk (e.g., HITs are slightly more likely to be posted on weekdays, and completion of HITs drops on Mondays, often because few HITs are posted over the weekend), overall, participation is fairly constant (Ipeirotis, 2010a). Because of this ease of quick data collection, researchers have been able to recruit several hundred participants per day using MTurk (Berinsky et al., 2012).
A fourth benefit of conducting psychological research through MTurk is that the research can be conducted at a relatively low cost. Through advertising online, researchers can save costs on recruitment methods (e.g., making flyers, postage, research assistant time, etc.) and travel costs that might be needed to bring participants to the researcher or the researcher to the participants. There can also be saved costs associated with not having to have a dedicated place to conduct in-person studies. Additionally, as mentioned earlier, the incentives given to participants tend to be less than traditional incentive rates given to in-person participants (Berinsky et al., 2012; Ipeirotis, 2010a).
Drawbacks of Using MTurk for Psychological Research
While there are numerous benefits to MTurk, there are also several disadvantages. One disadvantage is self-selection bias: workers who volunteer for a specific study may differ in important ways from workers who do not. While there could be personality or clinical differences between those groups, this bias might also reflect socioeconomic factors, with individuals who do not have the financial means to access the Internet regularly being underrepresented in research (Hartz et al., 2017). These sampling issues can limit the generalizability of study results.
Attrition is another disadvantage of online studies conducted through MTurk. Research indicates that attrition is more likely to occur in online experiments than laboratory experiments, possibly because of technology issues (i.e., Internet connectivity), distraction, or lack of the social pressure that is present in in-person data collection (Mason & Suri, 2012). When looking at MTurk dropout rates, researchers have found that it can be as high as 51% of the sample (Zhou & Fishbach, 2016).
The fact that MTurk workers have the opportunity to participate in many studies also creates some unique challenges. Because MTurk workers often complete many surveys, they are more likely to have already filled out many common psychological instruments, which could impact study results (Miller et al., 2017). Therefore, researchers should be cautious about conducting research where practice effects might influence study findings.
Another concern with MTurk is that participants might not meet study criteria. To address this concern, Chandler and Shapiro (2016) recommend unobtrusive prescreening: using an initial questionnaire to screen for desired criteria and restricting access to the longer questionnaire to workers who meet the inclusion criteria. In an example of this process at work, Wiens and Walker (2015) used an initial questionnaire on beverage preference to screen for inclusion criteria in a study on alcoholism. To help ensure that individuals meet the screening criteria, researchers have also either re-asked a screening question during the actual survey or used knowledge-based questions that correlate with the screening criteria (e.g., having individuals claiming to be Veterans order insignia by rank) to determine responses that should be excluded from the analysis (Chandler & Shapiro, 2016). Relatedly, some workers create “bots” to complete the work instead of responding themselves. Researchers who use MTurk should include CAPTCHA questions, attention checks, and typed-response questions to help ensure human participation.
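Exclusion logic along these lines, combining a repeated screening item with an attention check, might be sketched as follows. The field names and check structure are hypothetical illustrations, not a published standard or any platform's API.

```python
def flag_suspect_responses(responses, attention_answer):
    """Return the ids of responses that fail basic quality checks.

    Each response is a dict with (hypothetical) fields:
      'id'             worker identifier
      'screen_initial' answer given on the prescreening questionnaire
      'screen_repeat'  the same screening item re-asked in the main survey
      'attention'      answer to an embedded attention-check item

    A response is flagged when the repeated screening item contradicts
    the initial answer, or when the attention check is missed."""
    flagged = []
    for r in responses:
        if r["screen_repeat"] != r["screen_initial"] or r["attention"] != attention_answer:
            flagged.append(r["id"])
    return flagged
```

Flagged responses would then be reviewed and, where appropriate, excluded from analysis rather than automatically rejected, since honest participants occasionally miss a single check.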
There is worry that participants in online forums like MTurk do not provide high-quality, valid data (Buhrmester et al., 2011). However, many studies have indicated that similar or even higher quality data can be obtained through MTurk when compared to other recruitment methods (Buhrmester et al., 2011; Eriksson & Simpson, 2010; Horton, Rand, & Zeckhauser, 2011; Lutz, 2016; Paolacci et al., 2010; Rand, 2012). Still, to check data quality, researchers can examine the data, just as in in-person studies, and consider excluding responses that show obvious random responding. To examine the accuracy and truthfulness of data provided by MTurk workers, comparison studies can also be done with samples collected through more traditional methods.
MTurk for Psychotherapy Research
Although there is a growing body of evidence supporting the validity of data obtained through MTurk for psychology research in general, less is known about whether it is an appropriate tool for psychotherapy research. It is possible that MTurk workers who engage in psychotherapy differ in some way from individuals who present to the psychotherapy clinics where research is typically conducted.
To date, only a few studies have specifically examined the clinical characteristics of MTurk users (Arditte, Cek, Shaw, & Timpano, 2016; Kim & Hodgins, 2017; Miller et al., 2017; Shapiro et al., 2013; Wymbs & Dawson, 2015). Overall, it has been noted that the rates of some clinical phenomena (such as depression, anxiety disorder, trauma, and substance use) met or exceeded the rates reported in the general population (Shapiro et al., 2013). Further, the psychometric properties of several clinical measures (e.g., BDI, BAI, DASS-21, PID-5) have been established in MTurk samples (Arditte et al., 2016; Miller et al., 2017; Shapiro et al., 2013).
Future Directions for Psychotherapy Research in MTurk
While these studies have added to the knowledge base regarding clinical phenomena in the general population of MTurk users, more research is needed before psychotherapy process and outcome researchers can confidently use this research platform. Studies need to examine the prevalence of clinical symptoms and the psychometric properties of common instruments in a population of MTurk users who report currently engaging in psychotherapy. More specifically, research is needed that directly compares results obtained from MTurk workers who report engaging in psychotherapy to clients presenting for psychotherapy in traditional clinics. In addition, further research is needed to establish best practice standards (e.g., compensation rates, screening questions, eligibility requirements) for conducting psychotherapy research on MTurk.
Cite This Article
Tompkins, K. A. & Swift, J. K. (2019). An MTurk primer for psychotherapy researchers. Psychotherapy Bulletin, 54(2), 22-28.
References
Arditte, K. A., Çek, D., Shaw, A. M., & Timpano, K. R. (2016). The importance of assessing clinical phenomena in Mechanical Turk research. Psychological Assessment, 28(6), 684-691. https://doi.org/10.1037/pas0000217
Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43, 800-813. https://doi.org/10.3758/s13428-011-0081-0
Berinsky, A. J., Huber, G. A., & Lenz, G. S. (2012). Evaluating online labor markets for experimental research: Amazon.com’s Mechanical Turk. Political Analysis, 20(3), 351-368. https://doi.org/10.1093/pan/mpr057
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3-5. https://doi.org/10.1177/1745691610393980
Casler, K., Bickel, L., & Hackett, E. (2013). Separate but equal? A comparison of participants and data gathered via Amazon’s MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior, 29(6), 2156-2160. https://doi.org/10.1016/j.chb.2013.05.009
Chandler, J., & Shapiro, D. (2016). Conducting clinical research using crowdsourced convenience samples. Annual Review of Clinical Psychology, 12, 53-81. https://doi.org/10.1146/annurev-clinpsy-021815-093623
Eriksson, K., & Simpson, B. (2010). Emotional reactions to losing explain gender differences in entering a risky lottery. Judgment and Decision Making, 5(3), 159-163.
Goodman, J. K., Cryder, C. E., & Cheema, A. (2013). Data collection in a flat world: The strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision Making, 26(3), 213-224. https://doi.org/10.1002/bdm.1753
Hartz, S. M., Quan, T., Ibiebele, A., Fisher, S. L., Olfson, E., Salyer, P., & Bierut, L. J. (2017). The significant impact of education, poverty, and race on Internet-based research participant engagement. Genetics in Medicine, 19(2), 240-243. https://doi.org/10.1038/gim.2016.91
Horton, J. J., & Chilton, L. B. (2010). The labor economics of paid crowdsourcing. Proceedings of the 11th Association for Computing Machinery conference on Electronic Commerce. New York, NY: ACM.
Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14(3), 399-425. https://doi.org/10.1007/s10683-011-9273-9
Ipeirotis, P. G. (2010a). Analyzing the Amazon Mechanical Turk marketplace. ACM XRDS: Crossroads, 17(2), 16-21. https://doi.org/10.1145/1869086.1869094
Ipeirotis, P. (2010b, March 9). The new demographics of Mechanical Turk. [blog post]. Retrieved from: http://www.behind-the-enemy-lines.com/2010/03/new-demographics-of-mechanical-turk.html
Ipeirotis, P. (2015, April 6). Demographics of Mechanical Turk: Now Live! [blog post]. Retrieved from: http://www.behind-the-enemy-lines.com/2015/04/demographics-of-mechanical-turk-now.html
Kim, H. S., & Hodgins, D. C. (2017). Reliability and validity of data obtained from alcohol, cannabis, and gambling populations on Amazon’s Mechanical Turk. Psychology of Addictive Behaviors, 31(1), 85-94. https://doi.org/10.1037/adb0000219
Lutz, J. (2016). The validity of crowdsourcing data in studying anger and aggressive behavior: A comparison of online and laboratory data. Social Psychology, 47(1), 38-51. https://doi.org/10.1027/1864-9335/a000256
Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods, 44(1), 1-23. https://doi.org/10.3758/s13428-011-0124-6
Miller, J. D., Crowe, M., Weiss, B., Maples-Keller, J. L., & Lynam, D. R. (2017). Using online, crowdsourcing platforms for data collection in personality disorder research: The example of Amazon’s Mechanical Turk. Personality Disorders: Theory, Research, and Treatment, 8(1), 26-34. https://doi.org/10.1037/per0000191
Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a participant pool. Current Directions in Psychological Science, 23(3), 184-188. https://doi.org/10.1177/0963721414531598
Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411-419.
Pontin, J. (2007, March 25). Artificial intelligence: With help from the humans. The New York Times. Retrieved from: http://www.nytimes.com/
Rand, D. G. (2012). The promise of Mechanical Turk: How online labor markets can help theorists run behavioral experiments. Journal of Theoretical Biology, 299, 172-179. https://doi.org/10.1016/j.jtbi.2011.03.004
Shapiro, D. N., Chandler, J., & Mueller, P. A. (2013). Using Mechanical Turk to study clinical populations. Clinical Psychological Science, 1(2), 213–220. https://doi.org/10.1177/2167702612469015
Smith, M. A., & Leigh, B. (1997). Virtual subjects: Using the Internet as an alternative source of subjects and research environment. Behavior Research Methods, 29(4), 496-505. https://doi.org/10.3758/BF03210601
Wiens, T. K., & Walker, L. J. (2015). The chronic disease concept of addiction: Helpful or harmful? Addiction Research & Theory, 23(4), 309-321. https://doi.org/10.3109/16066359.2014.987760
Wright, K. B. (2005). Researching Internet-based populations: Advantages and disadvantages of online survey research, online questionnaire authoring software packages, and web survey services. Journal of Computer-Mediated Communication, 10(3). https://doi.org/10.1111/j.1083-6101.2005.tb00259.x
Wymbs, B. T., & Dawson, A. E. (2015). Screening Amazon’s Mechanical Turk for adults with ADHD. Journal of Attention Disorders. https://doi.org/10.1177/1087054715597471
Zhou, H., & Fishbach, A. (2016). The pitfall of experimenting on the web: How unattended selective attrition leads to surprising (yet false) research conclusions. Journal of Personality and Social Psychology, 111(4), 493-504. https://doi.org/10.1037/pspa0000056