Copyright © 2024. All rights reserved by Centaur Labs.
In late 2024, researchers from Microsoft Research and the University of Alicante released PadChest-GR (Grounded-Reporting), a novel dataset designed to improve the quality of generative AI models for chest X-ray (CXR) imaging.
The team used the Centaur Labs platform to complete all the annotations for this dataset, and we’re thrilled to have been able to contribute to their success. This dataset is now available to researchers globally.
If Radiology Report Generation (RRG) aims to create free-text radiology reports from clinical images, Grounded Radiology Report Generation (GRRG) takes it a step further by including the localization of individual findings in the image. The 2024 paper "MAIRA-2: Grounded Radiology Report Generation" introduces both the task of GRRG, with its output of a "grounded radiology report," and MAIRA-2, the first model to demonstrate the power of GRRG.
The MAIRA-2 research team defines a "grounded radiology report" as "a list of sentences from the Findings section [of a radiology report], each describing at most a single observation from the image(s), and associated with zero or more spatial annotations indicating the location of that observation if appropriate." An example of a grounded radiology report is below.
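To make the definition above concrete, here is a minimal sketch of how one might represent a grounded report in code: a list of Findings sentences, each carrying zero or more boxes. The type and field names are illustrative only, not the dataset's or MAIRA-2's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max); normalized image coordinates are
# assumed here for illustration.
Box = Tuple[float, float, float, float]

@dataclass
class GroundedFinding:
    sentence: str  # one Findings sentence, describing at most one observation
    boxes: List[Box] = field(default_factory=list)  # zero or more spatial annotations

# A grounded report is simply an ordered list of such findings.
report = [
    GroundedFinding("Cardiomegaly.", boxes=[(0.30, 0.45, 0.75, 0.80)]),
    GroundedFinding("No pleural effusion.", boxes=[]),  # no localization needed
]
```

Note how the structure directly mirrors the definition: a positive finding carries its spatial annotation, while a negative finding legitimately has zero boxes.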
By spatially grounding radiological findings, AI teams will be able to more easily verify the quality of the draft radiology reports their models generate. This verification is essential, as model quality and explainability are critical to build both clinician and patient trust in AI, particularly in generative AI.
Today, there are many CXR image datasets labeled for diagnosis and finding-classification tasks, or that come with associated text-based radiology reports for automated draft-report generation. Some datasets also include spatial annotations to localize labels (for a finding, anatomy, or device; e.g., 'pneumothorax') or single finding phrases, such as 'Left retrocardiac opacity'.
What has been missing, and is needed to enable AI teams to build GRRG models, are datasets that pair spatial annotations with direct links to the complete sets of descriptive sentences from the Findings section.
PadChest-GR is the first manually curated dataset for Grounded Radiology Report Generation (GRRG).
It includes:
We collaborated closely with researchers from Microsoft Research, the University of Alicante, and the rest of the team to ensure seamless annotation of this novel dataset. Radiologist annotators used our HIPAA-compliant annotation platform to complete all data annotation.
Annotation was completed in two stages:
For both stages, every study or finding was analyzed independently by two professionals. The frontal image was always displayed beside the prior image (when available), so that findings describing progression could be identified.
The development of PadChest-GR was also supported by Microsoft Research, the Department of Radiology at University Hospital Sant Joan d’Alacant, Universitat d'Alacant, MedBravo, and the University of Cambridge. The research was financially supported by the University of Alicante-Microsoft research collaboration, which is funded by Microsoft.
📚 Read the pre-print - You can read the complete pre-print about the PadChest-GR dataset here - "PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation"
🎁 Use the dataset - Researchers interested in using the PadChest-GR dataset can access it and its accompanying documentation here.
🛠️ Build a dataset to enable GRRG - The MAIRA-2 research team created a useful and public resource to get you started - "Grounded Radiology Reporting: Annotation Protocol".
👋 Reach out to us - If you are interested in building datasets to enable GRRG, we'd be happy to support you!