DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology

Abstract

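In hematology, computational models have significant potential to improve diagnostic accuracy, streamline workflows, and reduce the tedious work of analyzing single cells in peripheral blood or bone marrow smears. However, their clinical adoption has been hampered by large batch effects, small dataset sizes, and poor performance of transfer learning from natural images. To address these challenges, we introduce DinoBloom, the first foundation model for single-cell images in hematology, built with a tailored DINOv2 pipeline. Our model is trained on an extensive collection of 13 diverse, publicly available datasets of peripheral blood and bone marrow smears, the largest open-source cohort in hematology to date, comprising over 380,000 white blood cell images. To assess its generalization capability, we evaluate it on an external dataset with a challenging domain shift. We show that our model outperforms existing medical and non-medical vision models in linear probing and k-nearest-neighbor evaluations for cell-type classification on blood and bone marrow smears, as well as in weakly supervised multiple instance learning. A family of four DinoBloom models (small, base, large, and giant) can be adapted to a wide range of downstream applications, serves as a strong baseline for classification problems, and facilitates the assessment of batch effects in new datasets. All models are available at github.com/marrlab/DinoBloom.

Problem Addressed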

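Computational models in hematology have seen limited clinical adoption because of large batch effects, small datasets, and poor transfer learning from natural images. The paper addresses this gap with a foundation model for single-cell images.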
Key Idea

The paper proposes DinoBloom, the first foundation model for single cell images in hematology, utilizing a tailored DINOv2 pipeline. The model is built upon an extensive collection of 13 diverse, publicly available datasets of peripheral blood and bone marrow smears, comprising over 380,000 white blood cell images. The model outperforms existing medical and non-medical vision models in cell-type classification and acute myeloid leukemia subtyping.
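To make the k-nearest-neighbor evaluation mentioned above more concrete, here is a minimal sketch of how frozen embeddings can be scored by k-NN classification. It is an illustration under stated assumptions, not the authors' evaluation code: the embeddings are synthetic stand-ins for feature vectors that a model such as DinoBloom would produce, the cell-type labels and class centers are hypothetical, and cosine similarity with majority voting is one common choice that the source does not confirm.

```python
import math
import random
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, train_embeddings, train_labels, k=5):
    """Predict a label by majority vote among the k most similar training embeddings."""
    ranked = sorted(
        zip(train_embeddings, train_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_embeddings, train_labels, test_embeddings, test_labels, k=5):
    """Fraction of test embeddings whose k-NN prediction matches the true label."""
    correct = sum(
        knn_predict(query, train_embeddings, train_labels, k) == label
        for query, label in zip(test_embeddings, test_labels)
    )
    return correct / len(test_labels)


def make_synthetic_embeddings(n_per_class, centers, noise, rng):
    """Hypothetical helper: sample noisy points around each class center."""
    embeddings, labels = [], []
    for label, center in centers.items():
        for _ in range(n_per_class):
            embeddings.append([c + rng.gauss(0.0, noise) for c in center])
            labels.append(label)
    return embeddings, labels


# Assumption: 4-dimensional toy vectors stand in for real model embeddings.
rng = random.Random(0)
centers = {
    "lymphocyte": [1.0, 0.0, 0.0, 0.0],
    "monocyte": [0.0, 1.0, 0.0, 0.0],
    "neutrophil": [0.0, 0.0, 1.0, 0.0],
}
train_x, train_y = make_synthetic_embeddings(30, centers, 0.1, rng)
test_x, test_y = make_synthetic_embeddings(10, centers, 0.1, rng)

accuracy = knn_accuracy(train_x, train_y, test_x, test_y, k=5)
print(f"k-NN accuracy on synthetic embeddings: {accuracy:.2f}")
```

Because the backbone stays frozen and only the neighbor lookup uses labels, this kind of evaluation measures how well the embedding space itself separates cell types. Linear probing follows the same idea but fits a linear classifier on the frozen embeddings instead of voting among neighbors.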
Other Highlights

The model is evaluated on an external dataset with a challenging domain shift and shows promising generalization capability. A family of four DinoBloom models can be adapted for a wide range of downstream applications and facilitate the assessment of batch effects in new datasets. The paper also provides open-source code for all models on GitHub.
Related Work

Recent related studies include 'Deep learning for image analysis in hematology: A review' and 'Deep learning for hematological malignancy: Detection and diagnosis'.
