Blog

Crowdsourced annotations research accepted to MICCAI 2024 conference 

Mike Jin, Senior Research Scientist
August 26, 2024

Read on for an overview of the research and the key results demonstrating the power of crowdsourced annotations to improve AI-based congestion scoring for bedside lung ultrasound.

We are excited to share that our latest research manuscript in collaboration with Brigham and Women’s Hospital (BWH), "Can Crowdsourced Annotations Improve AI-based Congestion Scoring for Bedside Lung Ultrasound?", has been accepted for presentation at the MICCAI 2024 conference. This acceptance underscores the importance of our work in improving AI models for medical data analysis using crowdsourced annotations.


Addressing Variability in Lung Ultrasound Interpretation

Lung ultrasound (LUS) is a crucial diagnostic tool in emergency and acute care settings due to its portability and cost-effectiveness. However, the interpretation of LUS images, especially B-line artifacts indicative of pulmonary congestion, often varies significantly among clinicians. This variability can impact diagnostic accuracy and patient outcomes.

Automated algorithms can help reduce this variability when used in conjunction with clinician review. However, training these algorithms requires high-quality annotated data, which can be costly to obtain from experts. Our study leverages crowdsourced annotations from Centaur’s labeling platform to enhance AI model training. By collecting over 550,000 crowdsourced opinions on LUS images from 299 patients, we generated a substantial dataset of 31,000 B-line annotations. This large-scale annotation effort enabled us to train a more accurate AI model for LUS analysis, achieving 94% accuracy in B-line counting (within a margin of 1 B-line) on a test set of 100 patients.
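To make the reported metric concrete, here is a minimal sketch of how accuracy within a margin of 1 B-line can be computed; the function and the sample counts are illustrative assumptions, not the study's actual code or data.

```python
# Minimal sketch of the "accuracy within 1 B-line" counting metric.
# The sample arrays below are hypothetical stand-ins, not study data.
import numpy as np

def counting_accuracy(predicted, actual, tolerance=1):
    """Fraction of clips where |predicted - actual| <= tolerance."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(np.abs(predicted - actual) <= tolerance))

# Hypothetical example: model B-line counts vs. expert reference counts.
model_counts  = [0, 2, 3, 5, 1, 4]
expert_counts = [0, 1, 3, 7, 1, 4]
print(counting_accuracy(model_counts, expert_counts))  # 0.833... (5 of 6)
```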


Key Results

1. Improvement in AI Model Performance with Data Volume

The research provides robust evidence for the utility of crowdsourced data in AI model training. By bootstrapping a small amount of high-quality, expert-labeled data with large volumes of crowdsourced annotations, the study shows substantial improvements in AI model performance. The AI model trained on this augmented dataset demonstrated high accuracy and reliability in B-line detection, which is essential for assessing pulmonary congestion.
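As a rough illustration of this bootstrapping idea, the sketch below pools a small expert-labeled set with a much larger crowd-labeled set before training; the array sizes, the 2.0 expert weight, and the weighted-training call are assumptions for illustration, not the study's actual configuration.

```python
# Hedged sketch of bootstrapping scarce expert labels with a large
# crowd-labeled pool. All sizes and weights here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature/label arrays standing in for LUS frames.
X_expert, y_expert = rng.normal(size=(500, 16)), rng.integers(0, 2, 500)
X_crowd,  y_crowd  = rng.normal(size=(30_000, 16)), rng.integers(0, 2, 30_000)

X_train = np.concatenate([X_expert, X_crowd])
y_train = np.concatenate([y_expert, y_crowd])

# One common choice: upweight the scarce expert labels so they are not
# swamped by the crowd labels (the 2.0 factor is arbitrary here).
weights = np.concatenate([np.full(len(y_expert), 2.0),
                          np.full(len(y_crowd), 1.0)])
# model.fit(X_train, y_train, sample_weight=weights)  # any weighted learner
```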


2. Time Efficiency of Crowdsourced Annotations

The study highlights the significant time savings achieved through crowdsourcing compared to expert labeling. Traditional expert labeling is time-intensive: our five experts spent an average of 75 seconds each to annotate a single image from a LUS video clip. Given that our crowdsourcing approach generated expert-quality annotations on 31,000 LUS images by analyzing 550,000 crowdsourced opinions, we saved roughly 650 expert hours. An expert annotating for one hour per day would produce 48 annotated frames per day, whereas the Centaur platform generated 1,500 annotated frames per day. This efficiency is critical for developing AI models that require extensive labeled datasets.
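For readers who want to verify the figures, the arithmetic works out as follows (a plain back-of-the-envelope calculation, not code from the study):

```python
# Reproducing the time-savings arithmetic quoted above.
seconds_per_frame = 75      # expert annotation time per frame
frames = 31_000             # expert-quality frames produced via crowdsourcing

expert_hours_saved = frames * seconds_per_frame / 3600
print(round(expert_hours_saved))   # ~646, i.e. roughly 650 expert hours

frames_per_expert_hour = 3600 / seconds_per_frame
print(frames_per_expert_hour)      # 48 frames per expert per hour of labeling
```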


3. Generalizability of Centaur’s Crowdsourcing Approach

The study serves as another proof point for the generalizability of our crowdsourcing platform across medical data labeling tasks. Previous studies have shown that crowdsourced annotations from the Centaur platform can match or outperform individual experts in classifying skin lesions (Duhaime et al., 2023), classifying B-lines (Duggan et al., 2024), and segmenting B-lines (Jin et al., 2023). Centaur crowdsourced annotations have also been used to train highly accurate deep learning models for multimodal surgical tissue segmentation (Skinner et al., 2024), breast tissue pathology (link), and lung sound detection from digital stethoscopes (link). This study extends our previous results on the accuracy of crowdsourced B-line segmentations and demonstrates their value in training accurate medical deep learning models. This versatility suggests that Centaur’s crowdsourcing methods can be effectively applied to a wide range of medical data annotation needs.


Practical Implications and Future Work

For clinicians, AI models trained on the volumes of expert-quality data now achievable through crowdsourced annotations can provide more consistent and accurate LUS interpretations, supporting better diagnosis and treatment decisions. For researchers and developers, the study offers a scalable and cost-effective method for generating the large annotated datasets essential for training high-performance AI models.

In our ongoing research collaboration with BWH on ultrasound AI, we are addressing challenges such as the annotation ambiguity between single and merged B-lines by developing novel, unambiguous pulmonary congestion measures, which will enhance the consistency of AI model outputs.

For the Centaur platform, we are continually optimizing our crowdsourcing algorithms for labeler skill assessment, opinion aggregation, and measuring annotation confidence. We are also expanding its application to other domains, including 3D segmentation of DICOM studies and named entity recognition (NER) for medical and scientific texts.
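To give a flavor of what skill-weighted opinion aggregation can look like, here is a hedged sketch of a weighted vote with a simple agreement-based confidence score; the skill values, votes, and weighting scheme are hypothetical and are not the Centaur platform's actual algorithm.

```python
# Hedged sketch of skill-weighted opinion aggregation: combine many
# labeler votes on a binary question (e.g., "does this frame contain
# a B-line?"), weighted by each labeler's estimated skill.
# All skills and votes below are hypothetical.
from collections import defaultdict

def aggregate(votes, skills):
    """votes: list of (labeler_id, item_id, label in {0, 1});
    skills: labeler_id -> weight in (0, 1].
    Returns item_id -> (label, confidence)."""
    tallies = defaultdict(lambda: [0.0, 0.0])  # item -> [weight for 0, weight for 1]
    for labeler, item, label in votes:
        tallies[item][label] += skills[labeler]
    results = {}
    for item, (w0, w1) in tallies.items():
        label = int(w1 > w0)
        confidence = max(w0, w1) / (w0 + w1)  # crude agreement-based confidence
        results[item] = (label, confidence)
    return results

skills = {"a": 0.9, "b": 0.6, "c": 0.5}
votes = [("a", "frame1", 1), ("b", "frame1", 0), ("c", "frame1", 1),
         ("a", "frame2", 0), ("b", "frame2", 0), ("c", "frame2", 1)]
print(aggregate(votes, skills))
# frame1 -> (1, 0.70), frame2 -> (0, 0.75)
```

In practice, labeler skill is typically estimated from performance on items with known ground truth, and richer aggregation models (such as Dawid-Skene) estimate per-class labeler reliability rather than a single weight.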


Conclusion

The acceptance of our manuscript to MICCAI 2024 marks a significant step forward in the application of crowdsourced annotations for medical AI. Our findings suggest that crowdsourcing is not only a viable alternative to traditional expert labeling but also a powerful tool for enhancing AI model accuracy and efficiency. As we present our work at MICCAI 2024, we look forward to contributing to the ongoing advancements in medical AI and data annotation methodologies.

We extend our gratitude to the researchers, crowd annotators, and collaborators who made this study possible. We are excited to share our insights and continue exploring innovative solutions to improve medical data analysis.

Our research was only possible with the support of Mass Life Sciences. We are very excited to continue our partnership as we advance AI development in healthcare.


➡️ Visit our research page to learn more about our collaborations.

📄 Want a copy of our existing publications? Download our Research e-book.


References

Duhaime, E. P., Jin, M., Moulton, T., Weber, J., Kurtansky, N. R., Halpern, A., & Rotemberg, V. (2023). Nonexpert Crowds Outperform Expert Individuals in Diagnostic Accuracy on a Skin Lesion Diagnosis Task. 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI). https://doi.org/10.1109/isbi53787.2023.10230646

Duggan, N. M., Jin, M., Mendicuti, M. A. D., Hallisey, S., Bernier, D., Selame, L. A., Asgari-Targhi, A., Fischetti, C. E., Lucassen, R., Samir, A. E., Duhaime, E., Kapur, T., & Goldsmith, A. J. (2024). Gamified Crowdsourcing as a Novel Approach to Lung Ultrasound Dataset Labeling: A Prospective Analysis. Journal of Medical Internet Research, 26, e51397. https://doi.org/10.2196/51397

Jin, M., Duggan, N. M., Bashyakarla, V., Mendicuti, M. A. D., Hallisey, S., Bernier, D., Stegeman, J., Duhaime, E., Kapur, T., & Goldsmith, A. J. (2023, December 15). Expert-Level Annotation Quality Achieved by Gamified Crowdsourcing for B-line Segmentation in Lung Ultrasound. arXiv. https://arxiv.org/abs/2312.10198v1

Skinner, G., Chen, T., Jentis, G., Liu, Y., McCulloh, C., Harzman, A., Huang, E., Kalady, M., & Kim, P. (2024). Real-time near infrared artificial intelligence using scalable non-expert crowdsourcing in colorectal surgery. npj Digital Medicine, 7(1). https://doi.org/10.1038/s41746-024-01095-8

