ABSTRACT
Pigeons' unexpected competence in learning to categorize unseen histopathological images has remained an unexplained discovery for almost a decade (Levenson et al 2015 PLoS One 10 e0141357). Could it be that knowledge transferred from their bird's-eye views of the earth's surface gleaned during flight contributes to this ability? Employing a simulation-based verification strategy, we recapitulate this biological phenomenon with a machine-learning analog. We model pigeons' visual experience during flight with the self-supervised pre-training of a deep neural network on BirdsEyeViewNet, our large-scale aerial imagery dataset. As an analog of the differential food reinforcement performed in Levenson et al's study (2015 PLoS One 10 e0141357), we apply transfer learning from this pre-trained model to the same Hematoxylin and Eosin (H&E) histopathology and radiology images and tasks that the pigeons were trained and tested on. The study demonstrates that pre-training neural networks with bird's-eye view data results in close agreement with pigeons' performance. These results support transfer learning as a reasonable computational model of pigeon representation learning. This is further validated with six large-scale downstream classification tasks using H&E stained whole slide image datasets representing diverse cancer types.
Subjects
Columbidae; Neoplasms; Neural Networks, Computer; Animals; Columbidae/physiology; Neoplasms/pathology; Neoplasms/diagnostic imaging; Machine Learning; Flight, Animal/physiology
ABSTRACT
Patients with High-Grade Serous Ovarian Cancer (HGSOC) exhibit varied responses to treatment, with 20-30% showing de novo resistance to platinum-based chemotherapy. While hematoxylin-eosin (H&E) pathological slides are used for routine diagnosis of cancer type, they may also contain diagnostically useful information about treatment response. Our study demonstrates that combining H&E-stained Whole Slide Images (WSIs) with proteomic signatures using a multimodal deep learning framework significantly improves the prediction of platinum response in both discovery and validation cohorts. This method outperforms the Homologous Recombination Deficiency (HRD) score in predicting platinum response and overall patient survival. The study sets new performance benchmarks and explores the intersection of histology and proteomics, highlighting phenotypes related to treatment response pathways, including homologous recombination, DNA damage response, nucleotide synthesis, apoptosis, and ER stress. This integrative approach has the potential to improve personalized treatment and provide insights into the therapeutic vulnerabilities of HGSOC.
ABSTRACT
Body condition scoring is a simple method to estimate the energy supply of dairy cattle. Our study aims to investigate the accuracy with which supervised machine learning, specifically a deep convolutional neural network (CNN), can be used to retrieve body condition score (BCS) classes estimated by an expert. We recorded images of animals' rumps on three large-scale farms using a simple action camera. The images were annotated by an expert with classes and three different-sized bounding boxes. A pretrained CNN model was fine-tuned on 12 and 3 BCS classes. When training on 12 classes with an error range of 0, the Cohen's kappa values indicated minimal agreement between the model predictions and the ground truth. Allowing an error range of 0.25, we obtained minimal or weak agreement. With an error range of 0.5, we had strong or almost perfect agreement. The kappa values for the approach trained on three classes show that we can classify all animals into BCS categories with at least moderate agreement. Furthermore, CNNs trained on 3 BCS classes showed a remarkably higher proportion of strong agreement than those trained on 12 classes. The prediction precision when training with various annotation region sizes showed no meaningful differences. The weights of our trained CNNs are freely available to support similar work.
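The error-range evaluation described above can be illustrated with a minimal sketch. It assumes that a prediction within the allowed error range of the expert score is counted as an agreement before Cohen's kappa is computed; the abstract does not spell out the exact procedure, so this is one plausible reading. The function names and the sample scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the two marginal label distributions.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


def kappa_with_tolerance(y_true, y_pred, tol):
    """Assumed scheme: predictions within `tol` of the expert score count as hits."""
    adjusted = [t if abs(t - p) <= tol else p for t, p in zip(y_true, y_pred)]
    return cohens_kappa(y_true, adjusted)


# Hypothetical expert scores and model predictions on a quarter-point scale.
expert = [2.5, 2.75, 3.0, 3.25, 3.5, 3.0]
model = [2.5, 3.0, 3.0, 3.5, 3.0, 3.25]

for tol in (0.0, 0.25, 0.5):
    print(tol, round(kappa_with_tolerance(expert, model, tol), 3))
```

On this toy data the kappa value rises as the error range widens, which mirrors the pattern reported in the abstract without reproducing its actual figures.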
ABSTRACT
Leveraging recent advances in computational modeling of proteins with AlphaFold2 (AF2), we provide a complete curated data set of all single mutations of the spike protein receptor binding domain (RBD) from each of the 7 main SARS-CoV-2 lineages, resulting in 3819 × 7 = 26733 PDB structures. We visualize the generated structures and show that AF2 pLDDT values are correlated with state-of-the-art disorder approximations, implying that some internal protein dynamics are also captured by the model. Jointly increasing the mutational coverage of both structural and phenotype data, coupled with advances in machine learning, can be leveraged to accelerate virology research, specifically future variant prediction. We hope this data release can assist further understanding of the local and global mutational landscape of SARS-CoV-2 as well as provide insight into the biological understanding that 3D structure acts as a bridge between protein genotype and phenotype.
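The structure count can be checked with a short sketch of single-mutant enumeration. The figure 3819 is consistent with 201 sequence positions times 19 alternative amino acids; that factorization is our reading of the arithmetic, not something the abstract states. The function name and the placeholder sequence are hypothetical, and the real RBD sequences are not reproduced here.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(seq):
    """Yield (label, mutated_sequence) for every single substitution of `seq`.

    Labels follow the common wild-type/position/mutant form, e.g. "A1C",
    with 1-based positions relative to the start of `seq`.
    """
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]


# Placeholder sequence of the assumed length (201 residues); not a real RBD.
placeholder = "A" * 201
per_lineage = sum(1 for _ in single_mutants(placeholder))
print(per_lineage, per_lineage * 7)
```

Each position contributes 19 substitutions, so the per-lineage count scales linearly with sequence length and the total is that count multiplied by the number of lineages.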
Subjects
COVID-19; SARS-CoV-2; Humans; Computer Simulation; Furylfuramide; Mutation; SARS-CoV-2/genetics
ABSTRACT
IMPORTANCE: Climate-sensitive disease vectors, such as ticks, respond to the environment with changes in their microbiome. These changes can affect the emergence or re-emergence of various vector-borne pathogens, such as the causative agent of Lyme borreliosis (LB) or tick-borne encephalitis. This aspect is particularly emphasized in light of climate change. The climatically representative assessment of microbiome differences in various developmental stages of the most common Central European tick species, Ixodes ricinus, deepens our understanding of the potential climatic factors behind microbial relative abundance and interaction changes. This knowledge can support the development of novel disease vector control strategies.
Subjects
Ixodes; Lyme Disease; Animals; Hungary; Lyme Disease/epidemiology; Surveys and Questionnaires
ABSTRACT
Statistical learning algorithms rely strongly on an oversimplified assumption for optimal performance: that source (training) and target (testing) data are independent and identically distributed. Variation in human tissue, physician labeling and physical imaging parameters (PIPs) in the generative process yields medical image datasets with statistics that render this central assumption false. When models are deployed, new examples are often out of distribution with respect to the training data. Training robust, dependable and predictive models is therefore still a challenge in medical imaging, with significant accuracy drops common for deployed models. This statistical variation between training and testing data is referred to as domain shift (DS). To the best of our knowledge, we provide the first empirical evidence that variation in PIPs between test and train medical image datasets is a significant driver of DS and that model generalization error is correlated with this variance. We show that significant covariate shift occurs due to a selection bias in sampling from a small area of PIP space for both inter- and intra-hospital regimes. To show this, we control for population shift, prevalence shift, data selection biases and annotation biases to investigate the sole effect of the physical generation process on model generalization for a proxy task of age group estimation on a combined 44k-image mammogram dataset collected from five hospitals. We hypothesize that training data should be sampled evenly from PIP space to produce the most robust models, and we hope this study provides motivation to retain medical image generation metadata that is almost always discarded or redacted in open source datasets. This metadata, measured in standard international units, can provide a universal regularizing anchor between distributions generated across the world for all current and future imaging modalities.
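The hypothesis that training data should be sampled evenly from PIP space can be illustrated with a minimal sketch. It assumes a single scalar imaging parameter, equal-width bins, and a fixed number of samples per bin; none of these choices come from the abstract. The function name, the `kvp` field and the toy records are hypothetical.

```python
import random
from collections import defaultdict


def sample_evenly(records, key, n_bins, per_bin, seed=0):
    """Draw up to `per_bin` records from each equal-width bin of `key(record)`."""
    values = [key(r) for r in records]
    lo, hi = min(values), max(values)
    # Fall back to a unit width when every record has the same value.
    width = (hi - lo) / n_bins or 1.0
    bins = defaultdict(list)
    for record, value in zip(records, values):
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((value - lo) / width), n_bins - 1)
        bins[idx].append(record)
    rng = random.Random(seed)
    chosen = []
    for idx in sorted(bins):
        pool = bins[idx]
        chosen.extend(rng.sample(pool, min(per_bin, len(pool))))
    return chosen


# Toy metadata: most images cluster in a narrow range of the parameter,
# mimicking selection bias toward a small area of PIP space.
records = [{"kvp": 26 + (i % 3)} for i in range(90)]
records += [{"kvp": 30 + (i % 3)} for i in range(10)]

balanced = sample_evenly(records, key=lambda r: r["kvp"], n_bins=3, per_bin=10)
print(len(records), len(balanced))
```

A real PIP space is multi-dimensional, so a practical version would bin jointly over several parameters or use a density-based weighting instead of one-dimensional bins.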