Large language models for structured cardiovascular data extraction: a foundation for scalable research and clinical applications
article
Aims: Automated extraction of information from cardiac reports would benefit both clinical reporting and research. Large language models (LLMs) hold promise for such automation, but their clinical performance and practical implementation across different computational environments remain unclear. This study evaluates the feasibility and performance of LLM-based classification of echocardiogram and invasive coronary angiography reports, using real-world clinical data across local, high-performance computing, and cloud-based platforms.
Methods and results: The angiography and echocardiography reports of 1000 patients admitted with acute coronary syndrome were labelled for multiple key diagnostic elements, including left ventricular function (LVF), culprit vessel, and acute occlusions. Report classification models were developed using LLMs via (i) prompt-based and (ii) fine-tuning approaches. Performance was assessed across different model types and compute infrastructures, with attention to class imbalance, ambiguous label annotations, and implementation costs. Large language models demonstrated strong performance in extracting structured diagnostic information from cardiac reports. Cloud-based models (such as GPT-4o) achieved the highest accuracy (0.87 for culprit vessel and 1.0 for LVF) and generalizability, while smaller models run on a local high-performance cluster also achieved reasonable accuracy, especially for less complex tasks (0.634 for culprit vessel and 0.984 for LVF). Classification was feasible with minimal pre-processing, enabling potential integration into electronic health record systems or research pipelines. Class imbalance, reflective of real-world prevalence, had a greater impact on fine-tuning approaches than on prompt-based approaches.
Conclusion: Large language models can reliably classify structured cardiology reports across diverse computing infrastructures. Their accuracy and adaptability support their use in clinical and research settings, particularly for scalable report structuring and dataset generation.
© The Author(s) 2025. Published by Oxford University Press on behalf of the European Society of Cardiology.
TNO Identifier
1025845
Source title
European Heart Journal Digital Health
Pages
1-10