ABSTRACT
Motivation: Segmental duplications (SDs) are DNA fragments longer than 1 kbp, distributed within and between chromosomes and sharing more than 90% identity. Although they play a significant role in genomic fluidity and adaptability, many key questions about their intrinsic characteristics and mutability remain unsolved because of the persistent difficulty of sequencing highly duplicated genomic regions. The recent development of long-read and linked-read NGS technologies will increase the need to search for SDs in genomes newly sequenced with these techniques. The main limitation of SD analysis will soon be the availability of efficient detection software to retrieve and compare the SD components of genomes between species or lineages. Results: In this paper, we present the open-source ASGART, 'A Segmental duplications Gathering And Refining Tool', developed to search for SDs in any assembled sequence. We have tested and benchmarked ASGART on five model organisms. Our results demonstrate ASGART's ability to extract SDs from any genome-wide sequence, regardless of genomic size or organizational complexity, and more quickly than any other available software. Availability and implementation: The online version of ASGART is available at http://asgart.irit.fr. The source code of ASGART is available both on the ASGART website and at https://github.com/delehef/asgart. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Segmental Duplications, Genomic; Sequence Analysis, DNA/methods; Software; Animals; Chromosome Mapping/methods; Eukaryota/genetics; Genomics/methods; Humans
ABSTRACT
Artificial intelligence (AI)-assisted diagnosis is an ongoing revolution in pathology. However, a frequent drawback of AI models is their propensity to base decisions on bias in the training dataset rather than on concrete biological features, which weakens pathologists' trust in these tools. Technically, it is well known that microscopic images are altered by tissue processing and staining procedures, which are among the main sources of bias in machine learning for digital pathology. To deal with this, many teams have written about color normalization and augmentation methods. However, only a few have monitored their effects on bias reduction and model generalizability. In our study, two methods, one for stain augmentation (AugmentHE) and one for fast normalization (HEnorm), were created and their effect on bias reduction was monitored. They were also compared to previously described strategies. To that end, a multicenter dataset created for breast cancer histological grading was used. With it, classification models were trained in a single center before their performance was assessed on images from other centers. This setting made it possible to monitor bias reduction extensively while providing accurate insight into both augmentation and normalization methods. AugmentHE provided an 81% increase in color dispersion compared to geometric augmentations only. In addition, every classification model that involved AugmentHE presented a significant increase in the area under the receiver operating characteristic curve (AUC) over the widely used RGB shift. More precisely, AugmentHE-based models showed at least a 0.14 AUC increase over RGB shift-based models. Regarding normalization, HEnorm appeared to be up to 78x faster than conventional methods. It also provided satisfactory results in terms of bias reduction. Altogether, our pipeline composed of AugmentHE and HEnorm improved AUC on biased data by up to 21.7% compared to usual augmentations. Conventional normalization methods coupled with AugmentHE yielded similar results while being much slower. In conclusion, we have validated an open-source tool that can be used in any deep learning-based digital pathology project on H&E whole slide images (WSI), that efficiently reduces stain-induced bias, and that might later help increase pathologists' confidence when using AI-based products.
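For readers unfamiliar with the RGB shift baseline mentioned above, a minimal sketch follows. It illustrates the generic technique only (one random offset per color channel, clipped to the 8-bit range) and is not the AugmentHE or HEnorm method, whose details the abstract does not give. The function name `rgb_shift`, the `max_shift` parameter, and the sample tile are assumptions made for this illustration.

```python
import random

def rgb_shift(image, max_shift=20, rng=None):
    """Add one random offset per color channel to every pixel, clipped to 0..255.

    `image` is a list of rows, each row a list of (r, g, b) integer tuples.
    Hypothetical helper: a generic RGB shift, not the AugmentHE method.
    """
    rng = rng or random.Random()
    # One offset per channel, shared by all pixels of the image.
    shifts = [rng.randint(-max_shift, max_shift) for _ in range(3)]
    return [
        [tuple(min(255, max(0, px[c] + shifts[c])) for c in range(3)) for px in row]
        for row in image
    ]

# Hypothetical 2x2 tile with pink and purple pixels, loosely H&E-like.
tile = [[(230, 160, 200), (120, 60, 150)],
        [(250, 240, 245), (90, 40, 130)]]
augmented = rgb_shift(tile, max_shift=20, rng=random.Random(0))
```

Because the offsets are drawn independently per channel in RGB space, such a shift can produce colors that no real stain would yield, which is one plausible reason a stain-aware augmentation could behave differently.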
Subject(s)
Artificial Intelligence; Breast Neoplasms; Female; Humans; Coloring Agents; Machine Learning; Staining and Labeling; Multicenter Studies as Topic
ABSTRACT
We designed artificial intelligence-based prediction models (AIPM) using 52 diagnostic variables from 3687 patients included in the DATAML registry and treated with intensive chemotherapy (IC, N = 3030) or azacitidine (AZA, N = 657) for acute myeloid leukemia (AML). A neural network called a multilayer perceptron (MLP) achieved a prediction accuracy for overall survival (OS) of 68.5% and 62.1% in the IC and AZA cohorts, respectively. The Boruta algorithm could select the most important variables for prediction without decreasing accuracy. Thirteen features were retained with this algorithm in the IC cohort: age, cytogenetic risk, white blood cell count, LDH, platelet count, albumin, MPO expression, mean corpuscular volume, CD117 expression, NPM1 mutation, AML status (de novo or secondary), multilineage dysplasia and ASXL1 mutation; and 7 variables were retained in the AZA cohort: blood blasts, serum ferritin, CD56, LDH, hemoglobin, CD13 and disseminated intravascular coagulation (DIC). We believe that AIPM could help hematologists deal with the huge amount of data available at diagnosis, giving them an OS estimate to guide their treatment choice. Our registry-based AIPM could offer a large real-life dataset with original and exhaustive features and select a small number of diagnostic features with equivalent prediction accuracy, which is more appropriate for routine practice.
Subject(s)
Antimetabolites, Antineoplastic; Leukemia, Myeloid, Acute; Humans; Antimetabolites, Antineoplastic/therapeutic use; Artificial Intelligence; Treatment Outcome; Leukemia, Myeloid, Acute/diagnosis; Leukemia, Myeloid, Acute/drug therapy; Leukemia, Myeloid, Acute/genetics; Azacitidine/therapeutic use; Registries
ABSTRACT
A crucial step in inbred plant breeding is the choice of mating design to derive high-performing inbred varieties while also maintaining a competitive breeding population to secure sufficient genetic gain in future generations. In practice, the mating design usually relies on crosses involving the best parental inbred lines to ensure high mean progeny performance. This excludes crosses involving lower performing but more complementary parents in terms of favorable alleles. We predicted the ability of crosses to produce putative outstanding progenies (high mean and high variance progeny distribution) using genomic prediction models. This study compared the benefits and drawbacks of 7 genomic cross selection criteria (CSC) in terms of genetic gain for 1 trait and genetic diversity in the next generation. Six CSC were already published, and we propose an improved CSC that can estimate the proportion of progeny above a threshold defined for the whole mating plan. We simulated mating designs optimized using different CSC. The 835 elite parents came from a real breeding program and were evaluated between 2000 and 2016. We applied constraints on parental contributions and genetic similarities between selected parents according to usual breeder practices. Our results showed that CSC based on progeny variance estimation increased the genetic value of superior progenies by up to 5% in the next generation compared to CSC based on the progeny mean estimation (i.e. parental genetic values) alone. It also increased the genetic gain (up to 4%) and/or maintained more genetic diversity at QTLs (up to 4% more genic variance when the marker effects were perfectly estimated).
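The idea of ranking crosses by the proportion of progeny above a threshold can be sketched as follows. This is a minimal illustration, not the criterion as implemented in the study: it assumes that the progeny genetic values of a cross are normally distributed with a predicted mean and standard deviation, and the cross names, values, and threshold are hypothetical.

```python
from statistics import NormalDist

def proportion_above(mean, sd, threshold):
    """Expected fraction of progeny whose value exceeds `threshold`.

    Assumes normally distributed progeny values (an assumption of this sketch).
    """
    if sd <= 0:
        # Degenerate case: all progeny share the same value.
        return 1.0 if mean > threshold else 0.0
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)

# Hypothetical predicted (mean, standard deviation) for two candidate crosses.
crosses = {"A x B": (10.0, 1.0), "C x D": (9.5, 2.0)}
threshold = 12.0  # one threshold shared by the whole mating plan

ranked = sorted(
    crosses,
    key=lambda name: proportion_above(*crosses[name], threshold),
    reverse=True,
)
```

In this toy case the cross with the lower mean but larger variance ("C x D") ranks first, which is the kind of complementary-parent cross that a criterion based on the progeny mean alone would discard.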