ABSTRACT
BACKGROUND: Deep learning is increasingly being investigated as an aid to clinical in vitro fertilization (IVF). In many tasks, the first technical step is to visually detect and locate sperm, oocytes, and embryos in images. Clinical deployment of such models raises a concern: because clinics use different image acquisition hardware and different sample preprocessing protocols, the accuracy that a deep learning model achieves in one clinic may not be reproducible in another. Here we aim to investigate the effect of each imaging factor on the generalizability of object detection models, using sperm analysis as a pilot example. METHODS: Ablation studies were performed with state-of-the-art models for detecting human sperm to quantify how model precision (reduced by false-positive detections) and recall (reduced by missed detections) were affected by imaging magnification, imaging mode, and sample preprocessing protocol. The results led to the hypothesis that the richness of image acquisition conditions in a training dataset deterministically affects model generalizability. The hypothesis was tested by first enriching the training dataset with a wide range of imaging conditions and then validating the model through internal blind tests on new samples and external multi-center clinical validation. RESULTS: Ablation experiments revealed that removing subsets of data from the training dataset significantly reduced model precision. Removing raw sample images from the training dataset caused the largest drop in precision, whereas removing 20x images caused the largest drop in recall. By incorporating different imaging and sample preprocessing conditions into a rich training dataset, the model achieved an intraclass correlation coefficient (ICC) of 0.97 (95% CI: 0.94-0.99) for precision and an ICC of 0.97 (95% CI: 0.93-0.99) for recall. Multi-center clinical validation showed no significant differences in model precision or recall across clinics and applications. CONCLUSIONS: The results validated the hypothesis that the richness of data in the training dataset is a key factor in model generalizability. These findings highlight the importance of training-dataset diversity for model evaluation and suggest that future deep learning models in andrology and reproductive medicine should incorporate comprehensive feature sets to generalize better across clinics.
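To illustrate how the precision and recall figures above relate to false-positive and missed detections, here is a minimal sketch; it is not the authors' evaluation code. It assumes axis-aligned bounding boxes given as (x1, y1, x2, y2), greedy one-to-one matching in the order the predictions are listed, and an IoU threshold of 0.5. The function names `iou` and `precision_recall` and the threshold value are hypothetical choices.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box],
                     thr: float = 0.5) -> Tuple[float, float]:
    """Greedily match each prediction to its best unmatched ground-truth box.

    Unmatched predictions count as false positives; unmatched ground-truth
    boxes count as missed detections (false negatives).
    """
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # false-positive detections
    fn = len(truths) - tp  # missed detections
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall
```

For example, one prediction that overlaps an annotated sperm plus one spurious box, scored against two annotated sperm, gives one true positive, one false positive, and one missed detection, so precision and recall are both 0.5.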
Subjects
Deep Learning , Spermatozoa , Humans , Pilot Projects , Male , Spermatozoa/physiology , Fertilization in Vitro/methods , Image Processing, Computer-Assisted/methods , Semen Analysis/methods
ABSTRACT
BACKGROUND: The XPEN60 CRP&SAA (hereafter XPEN60) is a new automated hematology analyzer that rapidly measures C-reactive protein (CRP), serum amyloid A (SAA), and the complete blood count (CBC), including the 5-part white blood cell differential (5-DIFF). The aim of this study was to evaluate the analytical performance of XPEN60. METHODS: The analytical performance of XPEN60 was evaluated on the basis of several parameters, including the limit of blank (LoB), limit of detection (LoD), limit of quantitation (LoQ), precision, accuracy, carryover, linearity, clinical reportable range (CRR), and interference. In addition, CBC with 5-DIFF, CRP, and SAA results were compared with those of several other systems. RESULTS: Total imprecision and accuracy for all parameters fell within acceptable criteria, and dilution linearity was excellent (coefficient of determination, R² > 0.99). The LoBs and LoDs (0 and 0.21 mg/L for CRP; 1.1 and 2.27 mg/L for SAA) met the manufacturer's claims. The LoQs were 0.61 and 3.62 mg/L for CRP and SAA, respectively. No significant carryover or interference (<10%) was observed in this study. Method comparison showed strong agreement between XPEN60 and Sysmex-XN1000 (XN1000) results, except for basophils (Bas) and eosinophils (Eos). CRP results correlated well with those of E601 and Mindray CRP-M100. CONCLUSION: XPEN60 demonstrated satisfactory analytical performance, making it well-suited for use in clinical laboratories, emergency departments, and community hospitals.
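Verification parameters such as LoB, LoD, and carryover are commonly computed with formulas like those below. This is a simplified sketch and an assumption on our part: the abstract does not state which formulas the study used. It follows the parametric CLSI EP17 approach (LoB = mean of blank replicates + 1.645 × their SD; LoD = LoB + 1.645 × SD of a low-concentration sample, with the small-sample correction factor omitted) and the widely used carryover formula (L1 − L3) / (H3 − L3) × 100%. The function names are hypothetical.

```python
import statistics
from typing import Sequence

# One-sided 95th percentile of the standard normal distribution.
Z_95 = 1.645


def limit_of_blank(blank_results: Sequence[float]) -> float:
    """Parametric LoB: mean of blank replicates + 1.645 * their SD."""
    return statistics.mean(blank_results) + Z_95 * statistics.stdev(blank_results)


def limit_of_detection(lob: float, low_results: Sequence[float]) -> float:
    """Simplified LoD: LoB + 1.645 * SD of a low-concentration sample.

    Assumption: the small-sample correction factor used in CLSI EP17 is omitted.
    """
    return lob + Z_95 * statistics.stdev(low_results)


def carryover_percent(high: Sequence[float], low: Sequence[float]) -> float:
    """Carryover (%) = (L1 - L3) / (H3 - L3) * 100.

    `high` holds three consecutive runs of a high sample (H1, H2, H3);
    `low` holds the three low-sample runs measured immediately afterwards
    (L1, L2, L3).
    """
    if len(high) != 3 or len(low) != 3:
        raise ValueError("expected exactly three high and three low replicates")
    h3 = high[2]
    l1, l3 = low[0], low[2]
    return (l1 - l3) / (h3 - l3) * 100.0
```

For instance, high-sample runs of 100, 100, 100 followed by low-sample runs of 2, 1, 1 give a carryover of about 1.01%: the first low run is inflated by residue from the high sample, and the third run serves as the unaffected baseline.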