Enhancing sonologist examination performance with large language models: an analytical study of ChatGPT-4 and Claude 3
Abstract
Aim: To evaluate the effectiveness of two large language models, ChatGPT-4 and Claude 3, in improving the accuracy of question responses by a senior and a junior sonologist.
Material and methods: A senior and a junior sonologist completed a practice examination. After answering the questions, they reviewed the responses and explanations provided by ChatGPT-4 and Claude 3. Accuracy and scores before and after incorporating the models' input were analyzed to compare the two models' effectiveness.
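The abstract does not name the statistical test behind the p-values reported below. As a minimal illustrative sketch only, the following Python snippet shows one way such a paired before/after score comparison could be run, assuming a Wilcoxon signed-rank test on per-section scores; all values and the choice of test are hypothetical, not taken from the study.

```python
# Hypothetical sketch: paired before/after comparison of exam-section scores.
# The study abstract does not specify its statistical method; a Wilcoxon
# signed-rank test is assumed here purely for illustration.
from scipy.stats import wilcoxon

# Entirely hypothetical per-section scores for one examinee
scores_before = [62, 70, 58, 65, 73, 60, 68, 64]
scores_after = [70, 74, 66, 71, 75, 69, 72, 70]  # after reviewing model output

# Paired nonparametric test on the score differences
stat, p_value = wilcoxon(scores_before, scores_after)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.3f}")
```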
Results: No statistically significant differences were found between the two models' response scores for any section (all p>0.05). For the junior sonologist, both ChatGPT-4 (p=0.039) and Claude 3 (p=0.039) significantly improved scores in basic knowledge. ChatGPT-4's responses also significantly improved scores in relevant professional knowledge (p=0.038), though its explanations did not (p=0.077). Across all exam sections combined, both models' responses and explanations significantly improved scores (all p<0.05). For the senior sonologist, both ChatGPT-4's responses (p=0.022) and explanations (p=0.034) improved scores in basic knowledge, as did Claude 3's explanations (p=0.003). Across all sections combined, Claude 3's explanations significantly improved scores (p=0.041).
Conclusion: ChatGPT-4 and Claude 3 significantly improved sonologists' examination performance, particularly in basic knowledge.
DOI: http://dx.doi.org/10.11152/mu-4505