As AI capabilities advance in complex medical scenarios that doctors face on a daily basis, the technology remains controversial in medical communities.

  • geosoco@kbin.social
    link
    fedilink
    arrow-up
    9
    ·
    10 months ago

    Details matter here. Here’s a few from the study:

    ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI 67.8%-86.1%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=–15.8%; P<.001) and clinical management (β=–7.4%; P=.02) question types.

    At the time of the study, 36 vignette modules were available on the web, and 34 of the 36 were available on the web as of ChatGPT’s September 2021 training data cutoff date. All 36 modules passed the eligibility criteria of having a primarily textual basis and were included in the ChatGPT model assessment.

    All questions requesting the clinician to analyze images were excluded from our study, as ChatGPT is a text-based AI without the ability to interpret visual information.