A portable retina fundus photos dataset for clinical, demographic, and diabetic retinopathy prediction
Chenwei Wu, David Restrepo, Luis Filipe Nakayama, Lucas Zago Ribeiro, Zitao Shuai, Nathan Santos Barboza, Maria Luiza Vieira Sousa, Raul Dias Fitterman, Alexandre Durao Alves Pereira, Caio Vinicius Saito Regatieri, Jose Augusto Stuchi, Fernando Korn Malerbi & Rafael E. Andrade
Ophthalmological data is scarce in low- and middle-income countries (LMICs), and traditional tabletop fundus cameras are costly and inaccessible for widespread screening. This lack of data and accessibility hinders the early diagnosis and monitoring of ocular pathologies, such as diabetic retinopathy (DR), a leading cause of preventable blindness. The emergence of compact, portable retinal cameras offers a cost-effective and accessible solution for community health screenings and telemedicine, but a lack of representative datasets captured with this new modality, particularly from diverse LMIC populations, prevents the development and fair validation of generalizable Artificial Intelligence (AI) algorithms.
The study was approved by the Institutional Review Board of Instituto de Ensino Superior Presidente Tancredo de Almeida Neves (IPTAN) under protocol number CAAE 64219922.3.0000.9667. It comprised retinal fundus photos together with clinical and demographic data. All identifiable patient information was removed from the images in this dataset to ensure confidentiality and adherence to ethical standards. All patients gave written consent to image capture and open publication.
The mBRSET dataset, comprising 5,164 images from 1,291 diverse Brazilian patients captured with a handheld camera (Phelcom Eyer), was collected and made publicly available. Most patients were female (65.1%), the mean age was 61.4 years, systemic hypertension was highly prevalent (71.4%), and the large majority lacked health insurance (92.3%). Image analysis revealed that 76.8% of images had no DR, while 4.3% had proliferative DR. State-of-the-art deep learning models (ConvNeXt V2, DINO V2, and SwinV2) achieved high performance in clinical tasks, with F1 scores up to 87.4 for binary DR classification and 83.06 for macular edema detection. Notably, the models also predicted demographic and socioeconomic factors such as gender (F1 score up to 84.38) and insurance status (F1 score up to 76.11) from retinal images, a previously unexplored finding. Diagnostic performance remained robust despite the inherent variability of handheld camera images.
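For readers unfamiliar with the metric, the F1 scores reported above can be illustrated with a minimal sketch. It assumes the standard definition of F1 as the harmonic mean of precision and recall for a single positive class, reported on a 0 to 100 scale; the abstract does not state the averaging scheme the authors used, and the function name `f1_score_binary` and the toy labels below are hypothetical, not drawn from mBRSET.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall.

    Assumption: this is the textbook binary F1; the averaging used in the
    paper's experiments is not specified in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only): 1 = DR present, 0 = no DR.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 1, 0, 0, 1]

score = f1_score_binary(y_true, y_pred)
score_percent = 100 * score  # same 0-100 scale as the figures quoted above
print(f"F1 = {score_percent:.2f}")
```

Because 76.8% of images show no DR, a metric that balances precision and recall on the positive class is more informative here than raw accuracy, which a model could inflate by predicting "no DR" for every image.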
The mBRSET dataset is the first publicly available diabetic retinopathy dataset captured using handheld retinal cameras in real-world, high-burden LMIC settings (Brazil). By including extensive clinical and demographic metadata, it serves as a crucial resource for developing, benchmarking, and validating fair and generalizable AI algorithms for ocular care. The high performance of deep learning models on this dataset, even with the image variability of portable devices, validates its utility for real-world applications. Furthermore, the capacity of AI models to infer socioeconomic disparities from retinal images emphasizes the dataset's potential for population health research and addressing healthcare inequalities in resource-constrained environments.