June 6, 2023 – At a time of growing interest in the potential role of artificial intelligence (AI) technology in medicine and healthcare, a new study finds that the groundbreaking ChatGPT chatbot performs poorly on a major specialty self-assessment tool, reports Urology Practice®, an Official Journal of the American Urological Association (AUA). The journal is published in the Lippincott portfolio by Wolters Kluwer.
ChatGPT answered fewer than 30% of questions correctly on the AUA's widely used Self-Assessment Study Program for Urology (SASP). "ChatGPT not only has a low rate of correct answers regarding clinical questions in urologic practice, but also makes certain types of errors that pose a risk of spreading medical misinformation," comment Christopher M. Deibert, MD, MPH, and colleagues of the University of Nebraska Medical Center.
Can an AI-trained chatbot pass a test of clinical urology knowledge?
Recent advances in large language models (LLMs) provide opportunities for adapting AI technology as a tool for mediating human interaction. "With adequate training and application, these AI systems can process complex information, analyze relationships between ideas, and generate coherent responses to an inquiry," according to the authors.
ChatGPT (Chat Generative Pre-Trained Transformer) is an innovative LLM chatbot that has spurred interest in its use across a wide range of settings, including health and medicine. In one recent study, ChatGPT scored at or near passing levels on all three steps of the United States Medical Licensing Examination (USMLE), without any special training or feedback on medical topics. Could this innovative AI-trained tool perform similarly well on a more advanced test of clinical knowledge in a surgical specialty?
To find out, Dr. Deibert and colleagues evaluated ChatGPT's performance on the AUA's Self-Assessment Study Program (SASP) – a 150-question practice examination addressing the core curriculum of medical knowledge in urology. The SASP is a valuable test of clinical knowledge for urologists in training and practicing specialists preparing for Board certification. The study excluded 15 questions containing visual information such as pictures or graphs, leaving 135 questions for analysis.
ChatGPT scores low on SASP, with 'redundant and cyclical' explanations
Overall, ChatGPT gave correct answers to less than 30% of SASP questions: 28.2% of multiple-choice questions and 26.7% of open-ended questions. The chatbot provided "indeterminate" responses to several questions. On these questions, accuracy decreased when the LLM was asked to regenerate its answers.
For most open-ended questions, ChatGPT provided an explanation for the selected answer. These explanations were longer than those supplied by SASP, but "frequently redundant and cyclical in nature," according to the authors.
"Overall, ChatGPT often gave vague justifications with broad statements and rarely commented on specifics," Dr. Deibert and colleagues write. Even when given feedback, "ChatGPT continuously reiterated the original explanation despite it being inaccurate."
ChatGPT's poor accuracy on the SASP contrasts with its performance on the USMLE and other graduate-level exams. The authors suggest that while ChatGPT may do well on tests requiring recall of facts, it seems to fall short on questions pertaining to clinical medicine, which require "simultaneous weighing of multiple overlapping facts, situations and outcomes."
"Given that LLMs are limited by their human training, further research is needed to understand their limitations and capabilities across multiple disciplines before it is made available for general use," Dr. Deibert and colleagues conclude. "As is, utilization of ChatGPT in urology has a high likelihood of facilitating medical misinformation for the untrained user."
Wolters Kluwer provides trusted clinical technology and evidence-based solutions that engage clinicians, patients, researchers and students in effective decision-making and outcomes across healthcare. We support clinical effectiveness, learning and research, clinical surveillance and compliance, as well as data solutions. For more information about our solutions, visit https://www.wolterskluwer.com/en/health and follow us on LinkedIn and Twitter @WKHealth.
About Urology Practice
An Official Journal of the American Urological Association (AUA), Urology Practice focuses on clinical trends, challenges and practice applications in the four areas of Business, Health Policy, the Specialty and Patient Care. Information that can be used in everyday practice will be provided to the urology community via peer-reviewed clinical practice articles (including best practices, reviews, clinical guidelines, select clinical trials, editorials and white papers), "research letters" (brief original studies with an important clinical message), the business of the practice of urology, urology health policy issues, urology education and training, as well as content for urology care team members. Urology Practice is now indexed by MEDLINE, the National Library of Medicine's bibliographic database of life sciences and biomedical information.
About the American Urological Association
Founded in 1902 and headquartered near Baltimore, Maryland, the American Urological Association is a leading advocate for the specialty of urology, and has more than 23,000 members throughout the world. The AUA is a premier urologic association, providing invaluable support to the urologic community as it pursues its mission of fostering the highest standards of urologic care through education, research and the formulation of health care policy. To learn more about the AUA visit: www.auanet.org
About Wolters Kluwer
Wolters Kluwer (EURONEXT: WKL) is a global leader in professional information, software solutions, and services for the healthcare, tax and accounting, financial and corporate compliance, legal and regulatory, and corporate performance and ESG sectors. We help our customers make critical decisions every day by providing expert solutions that combine deep domain knowledge with specialized technology and services.
Wolters Kluwer reported 2022 annual revenues of €5.5 billion. The group serves customers in over 180 countries, maintains operations in over 40 countries, and employs approximately 20,000 people worldwide. The company is headquartered in Alphen aan den Rijn, the Netherlands.