Search | BVS España Search Portal

1.

Machine learning models for accurate prioritization of variants of uncertain significance.

Mahecha, Daniel; Nuñez, Haydemar; Lattig, Maria C; Duitama, Jorge.

Hum Mutat ; 43(4): 449-460, 2022 04.

Article in English | MEDLINE | ID: mdl-35143088

ABSTRACT

The growing use of next-generation sequencing technologies in genetic diagnosis has produced an exponential increase in the number of variants of uncertain significance (VUS). In this manuscript, we compare three machine learning methods to classify VUS as Pathogenic or Non-pathogenic, implementing a Random Forest (RF), a Support Vector Machine (SVM), and a Multilayer Perceptron. To train the models, we extracted high-quality variants from ClinVar that were previously classified as VUS. For each variant, we retrieved nine conservation scores, the loss-of-function tool, and allele frequencies. For the RF and SVM models, hyperparameters were tuned using cross-validation with a grid search. The three models were tested on a nonoverlapping set of variants that had been classified as VUS over the last 3 years, but had been reclassified in August 2020. The three models yielded superior accuracy on this set compared to the benchmarked tools. The RF-based model yielded the best performance across different variant types and was used to create VusPrize, an open-source software tool for prioritization of VUS. We believe that our model can improve the process of genetic diagnosis in research and clinical settings.
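
The tuning step named in this abstract (grid search scored by cross-validation) can be sketched in plain Python. This is only an illustration of the procedure, not the authors' pipeline: the k-nearest-neighbour classifier is a stand-in for the RF and SVM models, the data are synthetic, and every name (`knn_predict`, `cross_val_accuracy`, `grid_search`, the grid values) is hypothetical.

```python
import random
from itertools import product

def knn_predict(train, x, k, weight):
    """Toy stand-in classifier: vote among the k nearest training points."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], x)) ** 0.5
    votes = {}
    for row in sorted(train, key=dist)[:k]:
        w = 1.0 / (dist(row) + 1e-9) if weight == "distance" else 1.0
        votes[row[1]] = votes.get(row[1], 0.0) + w
    return max(votes, key=votes.get)

def cross_val_accuracy(data, k_folds, **params):
    """Mean accuracy over k_folds held-out folds for one hyperparameter setting."""
    folds = [data[i::k_folds] for i in range(k_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct = sum(knn_predict(train, x, **params) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def grid_search(data, grid, k_folds=5):
    """Try every combination in the grid and keep the best cross-validated one."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(data, k_folds, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Synthetic two-feature data (assumption): class 0 near (0, 0), class 1 near (1, 1).
rng = random.Random(0)
data = [((rng.gauss(c, 0.3), rng.gauss(c, 0.3)), c) for c in (0, 1) for _ in range(40)]
rng.shuffle(data)

best_params, best_score = grid_search(data, {"k": [1, 3, 5], "weight": ["uniform", "distance"]})
print(best_params, round(best_score, 3))
```

Because every setting is scored only on folds it was not fitted to, the selected hyperparameters are less likely to reflect overfitting to the training variants. A separate test set, as described in the abstract, is still needed for the final accuracy estimate.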

Subject(s)

High-Throughput Nucleotide Sequencing , Machine Learning , High-Throughput Nucleotide Sequencing/methods , Humans , Neural Networks, Computer , Software , Support Vector Machine

2.

Structural variants in 3000 rice genomes.

Fuentes, Roven Rommel; Chebotarov, Dmytro; Duitama, Jorge; Smith, Sean; De la Hoz, Juan Fernando; Mohiyuddin, Marghoob; Wing, Rod A; McNally, Kenneth L; Tatarinova, Tatiana; Grigoriev, Andrey; Mauleon, Ramil; Alexandrov, Nickolai.

Genome Res ; 29(5): 870-880, 2019 05.

Article in English | MEDLINE | ID: mdl-30992303

ABSTRACT

Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 million allelic variants. We found enrichment of long SVs in promoters and an excess of shorter variants in 5' UTRs. Across the rice genomes, we identified regions of high SV frequency enriched in stress response genes. We demonstrated how SVs may help in finding causative variants in genome-wide association analysis. These new insights into rice genome biology are valuable for understanding the effects SVs have on gene function, with the prospect of identifying novel agronomically important alleles that can be utilized to improve cultivated rice.
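
The abstract does not say how the 63 million individual calls were grouped into 1.5 million allelic variants. One common way to do this kind of grouping is by SV type and reciprocal overlap, sketched below. The 80% threshold, the greedy comparison against the first call of each group, and all names are assumptions made for illustration, not the published method.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval (start, end) covered by their overlap."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))

def group_calls(calls, min_overlap=0.8):
    """Greedily group (sv_type, start, end) calls that look like the same allele."""
    groups = []
    for call in sorted(calls):
        for group in groups:
            representative = group[0]
            same_type = representative[0] == call[0]
            if same_type and reciprocal_overlap(representative[1:], call[1:]) >= min_overlap:
                group.append(call)
                break
        else:
            # No existing group matched, so this call starts a new allelic variant.
            groups.append([call])
    return groups

# Hypothetical calls from different samples.
calls = [
    ("DEL", 100, 200), ("DEL", 105, 198), ("DEL", 500, 900),
    ("DUP", 100, 200), ("DEL", 110, 205),
]
groups = group_calls(calls)
print(len(calls), "calls ->", len(groups), "allelic variants")
```

In this toy input, three slightly different deletion calls near position 100 collapse into one variant, while the distant deletion and the duplication stay separate.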

Subject(s)

Genetic Variation , Genome, Plant , Genomic Structural Variation , Genomics/methods , Oryza/genetics , Alleles , Chromosome Mapping , DNA Transposable Elements , Genome-Wide Association Study/methods , Phenotype , Sequence Analysis, DNA/methods , Stress, Physiological/genetics

3.

Loss of pod strings in common bean is associated with gene duplication, retrotransposon insertion and overexpression of PvIND.

Parker, Travis A; Cetz, Jose; de Sousa, Lorenna Lopes; Kuzay, Saarah; Lo, Sassoum; Floriani, Talissa de Oliveira; Njau, Serah; Arunga, Esther; Duitama, Jorge; Jernstedt, Judy; Myers, James R; Llaca, Victor; Herrera-Estrella, Alfredo; Gepts, Paul.

New Phytol ; 235(6): 2454-2465, 2022 09.

Article in English | MEDLINE | ID: mdl-35708662

ABSTRACT

Fruit development has been central in the evolution and domestication of flowering plants. In common bean (Phaseolus vulgaris), the principal global grain legume staple, two main production categories are distinguished by fibre deposition in pods: dry beans, with fibrous, stringy pods; and stringless snap/green beans, with reduced fibre deposition, which frequently revert to the ancestral stringy state. Here, we identify genetic and developmental patterns associated with pod fibre deposition. Transcriptional, anatomical, epigenetic and genetic regulation of pod strings were explored through RNA-seq, RT-qPCR, fluorescence microscopy, bisulfite sequencing and whole-genome sequencing. Overexpression of the INDEHISCENT ('PvIND') orthologue was observed in stringless types compared with isogenic stringy lines, associated with overspecification of weak dehiscence-zone cells throughout the pod vascular sheath. No differences in DNA methylation were correlated with this phenotype. Nonstringy varieties showed a tandemly direct duplicated PvIND and a Ty1-copia retrotransposon inserted between the two repeats. These sequence features are lost during pod reversion and are predictive of pod phenotype in diverse materials, supporting their role in PvIND overexpression and reversible string phenotype. Our results give insight into reversible gain-of-function mutations and possible genetic solutions to the reversion problem, of considerable economic value for green bean production.

Subject(s)

Phaseolus , Domestication , Gene Duplication , Phaseolus/genetics , Phenotype , Retroelements/genetics

4.

The Cerebrospinal Fluid Proteomic Response to Traumatic and Nontraumatic Acute Brain Injury: A Prospective Study.

Santacruz, Carlos A; Vincent, Jean-Louis; Duitama, Jorge; Bautista, Edwin; Imbault, Virginie; Bruneau, Michaël; Creteur, Jacques; Brimioulle, Serge; Communi, David; Taccone, Fabio S.

Neurocrit Care ; 37(2): 463-470, 2022 10.

Article in English | MEDLINE | ID: mdl-35523916

ABSTRACT

BACKGROUND: Quantitative analysis of ventricular cerebrospinal fluid (vCSF) proteins following acute brain injury (ABI) may help identify pathophysiological pathways and potential biomarkers that can predict unfavorable outcome. METHODS: In this prospective proteomic analysis study, consecutive patients with severe ABI expected to require intraventricular catheterization for intracranial pressure (ICP) monitoring for at least 5 days and patients without ABI admitted for elective clipping of an unruptured cerebral aneurysm were included. vCSF samples were collected within the first 24 h after ABI and ventriculostomy insertion and then every 24 h for 5 days. In patients without ABI, a single vCSF sample was collected at the time of elective clipping. Data-independent acquisition and sequential window acquisition of all theoretical spectra (SWATH) mass spectrometry were used to compare differences in protein expression in patients with ABI and patients without ABI and in patients with traumatic and nontraumatic ABI. Differences in protein expression according to different ICP values, intensive care unit outcome, subarachnoid hemorrhage (SAH) versus traumatic brain injury (TBI), and good versus poor 3-month functional status (assessed by using the Glasgow Outcome Scale) were also evaluated. vCSF proteins with significant differences between groups were compared by using linear models and selected for gene ontology analysis using R Language and the Panther database. RESULTS: We included 50 patients with ABI (SAH n = 23, TBI n = 15, intracranial hemorrhage n = 6, ischemic stroke n = 3, others n = 3) and 12 patients without ABI. There were significant differences in the expression of 255 proteins between patients with and without ABI (p < 0.01). There were intraday and interday differences in expression of seven proteins related to increased inflammation, apoptosis, oxidative stress, and cellular response to hypoxia and injury. 
Among these, glial fibrillary acidic protein expression was higher in patients with ABI with severe intracranial hypertension (ICH) (ICP ≥ 30 mm Hg) or death compared to those without (log 2 fold change: + 2.4; p < 0.001), suggesting extensive primary astroglial injury or death. There were differences in the expression of 96 proteins between patients with traumatic and nontraumatic ABI (p < 0.05); intraday and interday differences were observed for six proteins related to structural damage, complement activation, and cholesterol metabolism. Thirty-nine vCSF proteins were associated with an increased risk of severe ICH (ICP ≥ 30 mm Hg) in patients with traumatic compared with nontraumatic ABI (p < 0.05). No significant differences were found in protein expression between patients with SAH versus TBI or between those with good versus poor 3-month Glasgow Outcome Scale score. CONCLUSIONS: Dysregulated vCSF protein expression after ABI may be associated with an increased risk of severe ICH and death.
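
As a reading aid for the effect size reported above, a log2 fold change can be converted back to a linear ratio. The sketch assumes the conventional definition (log2 of the ratio of abundances between groups); the function name is hypothetical.

```python
import math

def fold_change_from_log2(log2_fc):
    """Convert a log2 fold change into a linear abundance ratio."""
    return 2.0 ** log2_fc

ratio = fold_change_from_log2(2.4)
print(f"log2 FC +2.4 corresponds to a ratio of about {ratio:.2f}")

# Round trip: taking log2 of the ratio recovers the reported value.
assert math.isclose(math.log2(ratio), 2.4)
```

Under that assumption, a log2 fold change of +2.4 corresponds to roughly a 5.3-fold higher abundance.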

Subject(s)

Brain Injuries, Traumatic , Brain Injuries , Intracranial Hypertension , Subarachnoid Hemorrhage , Biomarkers , Cholesterol , Glial Fibrillary Acidic Protein , Humans , Intracranial Hypertension/etiology , Intracranial Pressure/physiology , Prospective Studies , Proteomics , Subarachnoid Hemorrhage/complications

5.

Genetic mapping for agronomic traits in a MAGIC population of common bean (Phaseolus vulgaris L.) under drought conditions.

Diaz, Santiago; Ariza-Suarez, Daniel; Izquierdo, Paulo; Lobaton, Juan David; de la Hoz, Juan Fernando; Acevedo, Fernando; Duitama, Jorge; Guerrero, Alberto F; Cajiao, Cesar; Mayor, Victor; Beebe, Stephen E; Raatz, Bodo.

BMC Genomics ; 21(1): 799, 2020 Nov 16.

Article in English | MEDLINE | ID: mdl-33198642

ABSTRACT

BACKGROUND: Common bean is an important staple crop in the tropics of Africa, Asia and the Americas. Particularly smallholder farmers rely on bean as a source for calories, protein and micronutrients. Drought is a major production constraint for common bean, a situation that will be aggravated with current climate change scenarios. In this context, new tools designed to understand the genetic basis governing the phenotypic responses to abiotic stress are required to improve transfer of desirable traits into cultivated beans. RESULTS: A multiparent advanced generation intercross (MAGIC) population of common bean was generated from eight Mesoamerican breeding lines representing the phenotypic and genotypic diversity of the CIAT Mesoamerican breeding program. This population was assessed under drought conditions in two field trials for yield, 100 seed weight, iron and zinc accumulation, phenology and pod harvest index. Transgressive segregation was observed for most of these traits. Yield was positively correlated with yield components and pod harvest index (PHI), and negative correlations were found with phenology traits and micromineral contents. Founder haplotypes in the population were identified using Genotyping by Sequencing (GBS). No major population structure was observed in the population. Whole Genome Sequencing (WGS) data from the founder lines was used to impute genotyping data for GWAS. Genetic mapping was carried out with two methods, using association mapping with GWAS, and linkage mapping with haplotype-based interval screening. Thirteen high confidence QTL were identified using both methods and several QTL hotspots were found controlling multiple traits. A major QTL hotspot located on chromosome Pv01 for phenology traits and yield was identified. Further hotspots affecting several traits were observed on chromosomes Pv03 and Pv08. A major QTL for seed Fe content was contributed by MIB778, the founder line with highest micromineral accumulation. 
Based on imputed WGS data, candidate genes are reported for the identified major QTL, and sequence changes were identified that could cause the phenotypic variation. CONCLUSIONS: This work demonstrates the importance of this common bean MAGIC population for genetic mapping of agronomic traits, to identify trait associations for molecular breeding tool design and as a new genetic resource for the bean research community.

Subject(s)

Phaseolus , Africa , Asia , Chromosome Mapping , Droughts , Phaseolus/genetics , Phenotype , Plant Breeding , Quantitative Trait Loci

6.

NGSEP3: accurate variant calling across species and sequencing protocols.

Tello, Daniel; Gil, Juanita; Loaiza, Cristian D; Riascos, John J; Cardozo, Nicolás; Duitama, Jorge.

Bioinformatics ; 35(22): 4716-4723, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31099384

ABSTRACT

MOTIVATION: Accurate detection, genotyping and downstream analysis of genomic variants from high-throughput sequencing data are fundamental features in modern production pipelines for genetic-based diagnosis in medicine or genomic selection in plant and animal breeding. Our research group maintains the Next-Generation Sequencing Experience Platform (NGSEP) as a precise, efficient and easy-to-use software solution for these features. RESULTS: Understanding that incorrect alignments around short tandem repeats are an important source of genotyping errors, we implemented in NGSEP new algorithms for realignment and haplotype clustering of reads spanning indels and short tandem repeats. We performed extensive benchmark experiments comparing NGSEP to state-of-the-art software using real data from three sequencing protocols and four species with different distributions of repetitive elements. NGSEP consistently shows comparable accuracy and better efficiency than existing solutions. We expect that this work will contribute to the continuous improvement of quality in variant calling needed for modern applications in medicine and agriculture. AVAILABILITY AND IMPLEMENTATION: NGSEP is available as open source software at http://ngsep.sf.net. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

High-Throughput Nucleotide Sequencing , Software , Algorithms , Genomics , INDEL Mutation , Sequence Analysis, DNA

7.

Genomic Variability of Phytophthora palmivora Isolates from Different Oil Palm Cultivation Regions in Colombia.

Gil, Juanita; Herrera, Mariana; Duitama, Jorge; Sarria, Greicy; Restrepo, Silvia; Romero, Hernán Mauricio.

Phytopathology ; 110(9): 1553-1564, 2020 Sep.

Article in English | MEDLINE | ID: mdl-32314947

ABSTRACT

Palm oil is the most consumed vegetable oil globally, and Colombia is the largest palm oil producer in South America and fourth worldwide. However, oil palm plantations in Colombia are affected by bud rot disease caused by the oomycete Phytophthora palmivora, leading to significant economic losses. Infection processes by plant pathogens involve the secretion of effector molecules, which alter the functioning or structure of host cells. Current long-read sequencing technologies provide the information needed to produce high-quality genome assemblies, enabling a comprehensive annotation of effectors. Here, we describe the development of genomic resources for P. palmivora, including a high-quality genome assembly based on long and short-read sequencing data, intraspecies variability for 12 isolates from different oil palm cultivation regions in Colombia, and a catalog of over 1,000 candidate effector proteins. A total of 45,416 genes were annotated from the new genome assembled in 2,322 contigs adding to 165.5 Mbp, which represents an improvement of two times more gene models, 33 times better contiguity, and 11 times less fragmentation compared with currently available genomic resources for the species. Analysis of nucleotide evolution in paralogs suggests a recent whole-genome duplication event. Genetic differences were identified among isolates showing variable virulence levels. We expect that these novel genomic resources contribute to the characterization of the species and the understanding of the interaction of P. palmivora with oil palm and could be further exploited as tools for the development of effective strategies for disease control.

Subject(s)

Phytophthora , Colombia , Genomics , Plant Diseases , South America

8.

Translocation of a parthenogenesis gene candidate to an alternate carrier chromosome in apomictic Brachiaria humidicola.

Worthington, Margaret; Ebina, Masumi; Yamanaka, Naoki; Heffelfinger, Christopher; Quintero, Constanza; Zapata, Yeny Patricia; Perez, Juan Guillermo; Selvaraj, Michael; Ishitani, Manabu; Duitama, Jorge; de la Hoz, Juan Fernando; Rao, Idupulapati; Dellaporta, Stephen; Tohme, Joe; Arango, Jacobo.

BMC Genomics ; 20(1): 41, 2019 Jan 14.

Article in English | MEDLINE | ID: mdl-30642244

ABSTRACT

BACKGROUND: The apomictic reproductive mode of Brachiaria (syn. Urochloa) forage species allows breeders to faithfully propagate heterozygous genotypes through seed over multiple generations. In Brachiaria, reproductive mode segregates as a single dominant locus, the apospory-specific genomic region (ASGR). The ASGR has been mapped to an area of reduced recombination on Brachiaria decumbens chromosome 5. A primer pair designed within ASGR-BABY BOOM-like (BBML), the candidate gene for the parthenogenesis component of apomixis in Pennisetum squamulatum, was diagnostic for reproductive mode in the closely related species B. ruziziensis, B. brizantha, and B. decumbens. In this study, we used a mapping population of the distantly related commercial species B. humidicola to map the ASGR and test for conservation of ASGR-BBML sequences across Brachiaria species. RESULTS: Dense genetic maps were constructed for the maternal and paternal genomes of a hexaploid (2n = 6x = 36) B. humidicola F1 mapping population (n = 102) using genotyping-by-sequencing, simple sequence repeat, amplified fragment length polymorphism, and transcriptome derived single nucleotide polymorphism markers. Comparative genomics with Setaria italica provided confirmation for x = 6 as the base chromosome number of B. humidicola. High resolution molecular karyotyping indicated that the six homologous chromosomes of the sexual female parent paired at random, whereas preferential pairing of subgenomes was observed in the apomictic male parent. Furthermore, evidence for compensated aneuploidy was found in the apomictic parent, with only five homologous linkage groups identified for chromosome 5 and seven homologous linkage groups of chromosome 6. The ASGR mapped to B. humidicola chromosome 1, a region syntenic with chromosomes 1 and 7 of S. italica. The ASGR-BBML specific PCR product cosegregated with the ASGR in the F1 mapping population, despite its location on a different carrier chromosome than B. decumbens. CONCLUSIONS: The first dense molecular maps of B. humidicola provide strong support for cytogenetic evidence indicating a base chromosome number of six in this species. Furthermore, these results show conservation of the ASGR across the Paniceae in different chromosomal backgrounds and support postulation of the ASGR-BBML as candidate genes for the parthenogenesis component of apomixis.

Subject(s)

Apomixis , Brachiaria/genetics , Chromosome Mapping , Parthenogenesis/genetics , Chromosomes, Plant , Genomics , Karyotyping , Translocation, Genetic

9.

EXPLoRA-web: linkage analysis of quantitative trait loci using bulk segregant analysis.

Pulido-Tamayo, Sergio; Duitama, Jorge; Marchal, Kathleen.

Nucleic Acids Res ; 44(W1): W142-6, 2016 07 08.

Article in English | MEDLINE | ID: mdl-27105844

ABSTRACT

Identification of genomic regions associated with a phenotype of interest is a fundamental step toward solving questions in biology and improving industrial research. Bulk segregant analysis (BSA) combined with high-throughput sequencing is a technique to efficiently identify these genomic regions associated with a trait of interest. However, distinguishing true from spuriously linked genomic regions and accurately delineating the genomic positions of these truly linked regions requires the use of complex statistical models currently implemented in software tools that are generally difficult to operate for non-expert users. To facilitate the exploration and analysis of data generated by bulked segregant analysis, we present EXPLoRA-web, a web service wrapped around our previously published algorithm EXPLoRA, which exploits linkage disequilibrium to increase the power and accuracy of quantitative trait loci identification in BSA analysis. EXPLoRA-web provides a user friendly interface that enables easy data upload and parallel processing of different parameter configurations. Results are provided graphically and as BED file and/or text file and the input is expected in widely used formats, enabling straightforward BSA data analysis. The web server is available at http://bioinformatics.intec.ugent.be/explora-web/.

Subject(s)

Algorithms , Linkage Disequilibrium , Quantitative Trait Loci , Quantitative Trait, Heritable , Software , Alleles , Animals , Bacteria/genetics , Bacteria/metabolism , Computer Graphics , Gene Frequency , High-Throughput Nucleotide Sequencing , Information Storage and Retrieval , Internet , Phenotype

10.

Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP.

Perea, Claudia; De La Hoz, Juan Fernando; Cruz, Daniel Felipe; Lobaton, Juan David; Izquierdo, Paulo; Quintero, Juan Camilo; Raatz, Bodo; Duitama, Jorge.

BMC Genomics ; 17 Suppl 5: 498, 2016 08 31.

Article in English | MEDLINE | ID: mdl-27585926

ABSTRACT

BACKGROUND: The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes, to generate sequencing fragments at predictable and reproducible sites. This allows genotyping of thousands of genetic markers in populations with hundreds of individuals. Because GBS protocols achieve parallel genotyping through high throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data. RESULTS: Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one step wizard to perform parallel read alignment, variant identification and genotyping from HTS reads sequenced from entire populations. We added different filters for variants, samples and genotype calls as well as calculation of summary statistics overall and per sample, and diversity statistics per site. NGSEP includes a module to translate genotype calls to some of the most widely used input formats for integration with several tools to perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits and phenotype prediction for genomic selection. We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy compared to other widely used software packages for variants detection such as GATK, Samtools and Tassel.
CONCLUSIONS: NGSEP is a powerful, accurate and efficient bioinformatics software tool for analysis of HTS data, and also one of the best bioinformatic packages to facilitate the analysis and to maximize the genomic variability information that can be obtained from GBS experiments for population genomics.
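
The restriction-digestion step that GBS protocols rely on can be illustrated with a small in silico digest. This sketch is not part of NGSEP. The enzyme is assumed to be PstI (recognition site CTGCAG, cutting after CTGCA), which is used here only as an example, and the sequence and function name are hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Split `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream fragment.
    """
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments

# Hypothetical toy sequence containing two PstI sites.
toy_sequence = "AAACTGCAGTTTTCTGCAGGG"
fragments = digest(toy_sequence, site="CTGCAG", cut_offset=5)
print(fragments)
```

Because the cut positions depend only on the sequence, the same fragments are produced in every individual carrying the same sites. This is why the resulting markers are reproducible across a population.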

Subject(s)

Genes, Plant , Genotyping Techniques , High-Throughput Nucleotide Sequencing , Computational Biology , Genotype , Manihot/genetics , Phaseolus/genetics , Sequence Analysis, DNA

11.

Large-scale analysis of tandem repeat variability in the human genome.

Duitama, Jorge; Zablotskaya, Alena; Gemayel, Rita; Jansen, An; Belet, Stefanie; Vermeesch, Joris R; Verstrepen, Kevin J; Froyen, Guy.

Nucleic Acids Res ; 42(9): 5728-41, 2014 May.

Article in English | MEDLINE | ID: mdl-24682812

ABSTRACT

Tandem repeats are short DNA sequences that are repeated head-to-tail with a propensity to be variable. They constitute a significant proportion of the human genome, also occurring within coding and regulatory regions. Variation in these repeats can alter the function and/or expression of genes allowing organisms to swiftly adapt to novel environments. Importantly, some repeat expansions have also been linked to certain neurodegenerative diseases. Therefore, accurate sequencing of tandem repeats could contribute to our understanding of common phenotypic variability and might uncover missing genetic factors in idiopathic clinical conditions. However, despite long-standing evidence for the functional role of repeats, they are largely ignored because of technical limitations in sequencing, mapping and typing. Here, we report on a novel capture technique and data filtering protocol that allowed simultaneous sequencing of thousands of tandem repeats in the human genomes of a three generation family using GS-FLX-plus Titanium technology. Our results demonstrated that up to 7.6% of tandem repeats in this family (4% in coding sequences) differ from the reference sequence, and identified a de novo variation in the family tree. The method opens new routes to look at this underappreciated type of genetic variability, including the identification of novel disease-related repeats.
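
To make the notion of a tandem repeat differing from the reference concrete, the sketch below counts head-to-tail copies of a motif in a reference and a sample sequence. It is a toy illustration, not the capture and filtering protocol described in the abstract; the sequences, the CAG motif, and the function names are hypothetical.

```python
def count_tandem_copies(sequence, motif, start=0):
    """Count consecutive head-to-tail copies of `motif` beginning at `start`."""
    if not motif:
        raise ValueError("motif must not be empty")
    copies, pos = 0, start
    while sequence.startswith(motif, pos):
        copies += 1
        pos += len(motif)
    return copies

def longest_tandem_run(sequence, motif):
    """Return (start, copies) of the longest run of `motif` in `sequence`."""
    best_start, best_copies = -1, 0
    for i in range(len(sequence)):
        copies = count_tandem_copies(sequence, motif, i)
        if copies > best_copies:
            best_start, best_copies = i, copies
    return best_start, best_copies

# Hypothetical reference and sample alleles of the same locus.
reference = "TTCAGCAGCAGCAGGA"
sample = "TTCAGCAGCAGCAGCAGCAGGA"

_, ref_copies = longest_tandem_run(reference, "CAG")
_, sample_copies = longest_tandem_run(sample, "CAG")
print(f"reference: {ref_copies} copies, sample: {sample_copies} copies")
```

A sample whose copy number differs from the reference, as in this toy case, is the kind of variation the abstract reports for up to 7.6% of the repeats in the sequenced family.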

Subject(s)

Genome, Human , Polymorphism, Genetic , Tandem Repeat Sequences , Base Sequence , Female , Gene Components , Humans , Male , Molecular Sequence Data , Pedigree , Sequence Analysis, DNA

12.

An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments.

Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R; Verstrepen, Kevin J; Thevelein, Johan M; Tohme, Joe.

Nucleic Acids Res ; 42(6): e44, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24413664

ABSTRACT

Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.

Subject(s)

Genetic Variation , Genotyping Techniques , High-Throughput Nucleotide Sequencing , Software , Algorithms , DNA Copy Number Variations , Genomics/methods , Humans , INDEL Mutation , Oryza/genetics

13.

Comparative polygenic analysis of maximal ethanol accumulation capacity and tolerance to high ethanol levels of cell proliferation in yeast.

Pais, Thiago M; Foulquié-Moreno, María R; Hubmann, Georg; Duitama, Jorge; Swinnen, Steve; Goovaerts, Annelies; Yang, Yudi; Dumortier, Françoise; Thevelein, Johan M.

PLoS Genet ; 9(6): e1003548, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23754966

ABSTRACT

The yeast Saccharomyces cerevisiae is able to accumulate ≥17% ethanol (v/v) by fermentation in the absence of cell proliferation. The genetic basis of this unique capacity is unknown. Up to now, all research has focused on tolerance of yeast cell proliferation to high ethanol levels. Comparison of maximal ethanol accumulation capacity and ethanol tolerance of cell proliferation in 68 yeast strains showed a poor correlation, but higher ethanol tolerance of cell proliferation clearly increased the likelihood of superior maximal ethanol accumulation capacity. We have applied pooled-segregant whole-genome sequence analysis to identify the polygenic basis of these two complex traits using segregants from a cross of a haploid derivative of the sake strain CBS1585 and the lab strain BY. From a total of 301 segregants, 22 superior segregants accumulating ≥17% ethanol in small-scale fermentations and 32 superior segregants growing in the presence of 18% ethanol, were separately pooled and sequenced. Plotting SNP variant frequency against chromosomal position revealed eleven and eight Quantitative Trait Loci (QTLs) for the two traits, respectively, and showed that the genetic basis of the two traits is partially different. Fine-mapping and Reciprocal Hemizygosity Analysis identified ADE1, URA3, and KIN3, encoding a protein kinase involved in DNA damage repair, as specific causative genes for maximal ethanol accumulation capacity. These genes, as well as the previously identified MKT1 gene, were not linked in this genetic background to tolerance of cell proliferation to high ethanol levels. The superior KIN3 allele contained two SNPs, which are absent in all yeast strains sequenced up to now. This work provides the first insight in the genetic basis of maximal ethanol accumulation capacity in yeast and reveals for the first time the importance of DNA damage repair in yeast ethanol tolerance.
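
The idea of plotting SNP variant frequency against chromosomal position can be sketched numerically. In a pool of selected segregants, markers unlinked to the trait are expected near a frequency of 0.5, while linked markers drift towards 1. The read counts, the window size, the 0.9 threshold, and all function names below are assumptions for illustration, not values from the study.

```python
def variant_frequencies(counts):
    """Turn (position, superior_allele_reads, total_reads) into (position, frequency)."""
    return [(pos, alt / total) for pos, alt, total in counts if total > 0]

def smooth(freqs, window):
    """Average each marker with all markers within `window` bp of it."""
    smoothed = []
    for pos, _ in freqs:
        nearby = [f for p, f in freqs if abs(p - pos) <= window]
        smoothed.append((pos, sum(nearby) / len(nearby)))
    return smoothed

def candidate_positions(smoothed, threshold):
    """Positions whose smoothed frequency reaches the threshold."""
    return [pos for pos, f in smoothed if f >= threshold]

# Hypothetical pooled read counts along one chromosome.
counts = [
    (1000, 12, 24), (2000, 13, 25), (3000, 22, 24),
    (4000, 23, 24), (5000, 24, 25), (6000, 11, 22),
]
smoothed = smooth(variant_frequencies(counts), window=1000)
candidates = candidate_positions(smoothed, threshold=0.9)
print(candidates)
```

Smoothing over neighbouring markers reduces the influence of sampling noise at any single SNP, at the cost of blurring the boundaries of the linked region.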

Subject(s)

Cell Proliferation , Ethanol/metabolism , Quantitative Trait Loci/genetics , Saccharomyces cerevisiae/genetics , Alcoholic Beverages/microbiology , Alleles , Chromosome Mapping , DNA Damage/drug effects , DNA Damage/genetics , DNA Repair/drug effects , DNA Repair/genetics , Drug Tolerance/genetics , Ethanol/pharmacology , Genome , Polymorphism, Single Nucleotide , Protein Serine-Threonine Kinases/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics

14.

Improved linkage analysis of Quantitative Trait Loci using bulk segregants unveils a novel determinant of high ethanol tolerance in yeast.

Duitama, Jorge; Sánchez-Rodríguez, Aminael; Goovaerts, Annelies; Pulido-Tamayo, Sergio; Hubmann, Georg; Foulquié-Moreno, María R; Thevelein, Johan M; Verstrepen, Kevin J; Marchal, Kathleen.

BMC Genomics ; 15: 207, 2014 Mar 19.

Article in English | MEDLINE | ID: mdl-24640961

ABSTRACT

BACKGROUND: Bulk segregant analysis (BSA) coupled to high throughput sequencing is a powerful method to map genomic regions related to phenotypes of interest. It relies on crossing two parents, one inferior and one superior for a trait of interest. Segregants displaying the trait of the superior parent are pooled, the DNA extracted and sequenced. Genomic regions linked to the trait of interest are identified by searching the pool for overrepresented alleles that normally originate from the superior parent. BSA data analysis is non-trivial due to sequencing, alignment and screening errors. RESULTS: To increase the power of the BSA technology and obtain a better distinction between spuriously and truly linked regions, we developed EXPLoRA (EXtraction of over-rePresented aLleles in BSA), an algorithm for BSA data analysis that explicitly models the dependency between neighboring marker sites by exploiting the properties of linkage disequilibrium through a Hidden Markov Model (HMM). Reanalyzing a BSA dataset for high ethanol tolerance in yeast allowed reliably identifying QTLs linked to this phenotype that could not be identified with statistical significance in the original study. Experimental validation of one of the least pronounced linked regions, by identifying its causative gene VPS70, confirmed the potential of our method. CONCLUSIONS: EXPLoRA has a performance at least as good as the state-of-the-art and it is robust even at low signal to noise ratios, i.e., when the true linkage signal is diluted by sampling or screening errors or when few segregants are available.
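
The general idea of modelling dependency between neighbouring markers with an HMM can be sketched with a two-state model decoded by the Viterbi algorithm. This is not the EXPLoRA model: the two states, their assumed allele frequencies (0.5 and 0.9), the binomial emissions, the 0.9 probability of staying in the same state, and the read counts are all illustrative assumptions.

```python
import math

# Assumed superior-parent allele frequency in the pool for each hidden state.
STATES = {"unlinked": 0.5, "linked": 0.9}

def log_binomial(k, n, p):
    """Log-probability of k superior-allele reads out of n at frequency p."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)

def viterbi(observations, stay=0.9):
    """Most likely state per marker for a list of (k, n) read counts."""
    names = list(STATES)
    log_stay, log_switch = math.log(stay), math.log(1 - stay)

    def step(prev, cur):
        return log_stay if prev == cur else log_switch

    k0, n0 = observations[0]
    scores = {s: math.log(1 / len(names)) + log_binomial(k0, n0, STATES[s]) for s in names}
    back = []
    for k, n in observations[1:]:
        new_scores, pointers = {}, {}
        for s in names:
            prev = max(names, key=lambda r: scores[r] + step(r, s))
            new_scores[s] = scores[prev] + step(prev, s) + log_binomial(k, n, STATES[s])
            pointers[s] = prev
        back.append(pointers)
        scores = new_scores
    # Trace the best path backwards from the best final state.
    state = max(names, key=scores.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical pooled counts; the fourth marker (17/24) is noisy.
observations = [(12, 24), (13, 25), (22, 24), (17, 24), (23, 24), (22, 25), (11, 22)]
path = viterbi(observations)
print(path)
```

Taken alone, the fourth marker slightly favours the unlinked state, but the cost of switching state twice outweighs that evidence, so it is kept inside the linked block. Borrowing strength from neighbouring markers in this way is the kind of behaviour that helps separate spurious from true linkage signals.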

Subject(s)

Algorithms , Ethanol/pharmacology , Quantitative Trait Loci , Saccharomyces cerevisiae/drug effects , Chromosome Mapping , Genetic Linkage , Linkage Disequilibrium , Markov Chains , Phenotype , Saccharomyces cerevisiae/genetics
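The idea in the abstract above, modeling the dependency between neighboring markers with an HMM, can be illustrated with a minimal sketch. This is not the EXPLoRA model: the two hidden states (unlinked, linked), the binomial emission on read counts, and the parameter values `p_unlinked`, `p_linked`, and `stay` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log-probability of k successes in n trials with success rate p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def viterbi_linked_regions(counts, p_unlinked=0.5, p_linked=0.9, stay=0.95):
    """Label each marker as unlinked (0) or linked (1) with Viterbi decoding.

    counts: list of (superior_allele_reads, total_reads), ordered along
    the chromosome. All parameter values are illustrative assumptions.
    """
    if not counts:
        return []
    probs = (p_unlinked, p_linked)
    log_stay, log_switch = math.log(stay), math.log(1 - stay)

    # Uniform prior over the two states at the first marker.
    k, n = counts[0]
    score = [math.log(0.5) + log_binom_pmf(k, n, p) for p in probs]
    back = []

    for k, n in counts[1:]:
        new_score, ptr = [], []
        for state in (0, 1):
            # Best previous state given the cost of staying or switching.
            cands = [score[prev] + (log_stay if prev == state else log_switch)
                     for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + log_binom_pmf(k, n, probs[state]))
        score = new_score
        back.append(ptr)

    # Trace the most likely state path backwards.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Because switching states carries a penalty, a single marker with a moderately skewed allele count is not enough to open a linked region, while a run of consistently skewed markers is. That is the sense in which neighboring sites support each other in this kind of model.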

15.

A comprehensively molecular haplotype-resolved genome of a European individual.

Suk, Eun-Kyung; McEwen, Gayle K; Duitama, Jorge; Nowick, Katja; Schulz, Sabrina; Palczewski, Stefanie; Schreiber, Stefan; Holloway, Dustin T; McLaughlin, Stephen; Peckham, Heather; Lee, Clarence; Huebsch, Thomas; Hoehe, Margret R.

Genome Res ; 21(10): 1672-85, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21813624

ABSTRACT

Independent determination of both haplotype sequences of an individual genome is essential to relate genetic variation to genome function, phenotype, and disease. To address the importance of phase, we have generated the most complete haplotype-resolved genome to date, "Max Planck One" (MP1), by fosmid pool-based next generation sequencing. Virtually all SNPs (>99%) and 80,000 indels were phased into haploid sequences of up to 6.3 Mb (N50 ~1 Mb). The completeness of phasing allowed determination of the concrete molecular haplotype pairs for the vast majority of genes (81%) including potential regulatory sequences, of which >90% were found to be constituted by two different molecular forms. A subset of 159 genes with potentially severe mutations in either cis or trans configurations exemplified in particular the role of phase for gene function, disease, and clinical interpretation of personal genomes (e.g., BRCA1). Extended genomic regions harboring manifold combinations of physically and/or functionally related genes and regulatory elements were resolved into their underlying "haploid landscapes," which may define the functional genome. Moreover, the majority of genes and functional sequences were found to contain individual or rare SNPs, which cannot be phased from population data alone, emphasizing the importance of molecular phasing for characterizing a genome in its molecular individuality. Our work provides the foundation to understand that the distinction of molecular haplotypes is essential to resolve the (inherently individual) biology of genes, genomes, and disease, establishing a reference point for "phase-sensitive" personal genomics. MP1's annotated haploid genomes are available as a public resource.

Subject(s)

Genome, Human , Haplotypes , Female , Genomics , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Male , Middle Aged , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
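The abstract above reports phased sequences with an N50 of about 1 Mb. As a reminder of how that statistic is commonly computed, here is a small sketch. The function name `n50` is hypothetical, and the convention used (the length of the block at which the cumulative sum of sorted lengths first reaches half the total) is the usual one, assumed here rather than taken from the paper.

```python
def n50(block_lengths):
    """Return the N50 of a collection of block lengths.

    Blocks are sorted from longest to shortest; N50 is the length of the
    block at which the running total first reaches half of the overall sum.
    """
    total = sum(block_lengths)
    running = 0
    for length in sorted(block_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("block_lengths must not be empty")
```

For block lengths 2, 3, 4, 5 and 6 (total 20), the two longest blocks already sum to 11, which is at least half of the total, so the N50 is 5.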

16.

Functional divergence of gene duplicates through ectopic recombination.

Christiaens, Joaquin F; Van Mulders, Sebastiaan E; Duitama, Jorge; Brown, Chris A; Ghequire, Maarten G; De Meester, Luc; Michiels, Jan; Wenseleers, Tom; Voordeckers, Karin; Verstrepen, Kevin J.

EMBO Rep ; 13(12): 1145-51, 2012 Dec.

Article in English | MEDLINE | ID: mdl-23070367

ABSTRACT

Gene duplication stimulates evolutionary innovation as the resulting paralogs acquire mutations that lead to sub- or neofunctionalization. A comprehensive in silico analysis of paralogs in Saccharomyces cerevisiae reveals that duplicates of cell-surface and subtelomeric genes also undergo ectopic recombination, which leads to new chimaeric alleles. Mimicking such intergenic recombination events in the FLO (flocculation) family of cell-surface genes shows that chimaeric FLO alleles confer different adhesion phenotypes than the parental genes. Our results indicate that intergenic recombination between paralogs can generate a large set of new alleles, thereby providing the raw material for evolutionary adaptation and innovation.

Subject(s)

Gene Duplication/genetics , Mannose-Binding Lectins/genetics , Recombination, Genetic , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Alleles , Cell Adhesion/genetics , Evolution, Molecular , Gene Expression Regulation, Fungal , Genetic Variation , Mutation , Phenotype , Sequence Homology, Amino Acid

17.

Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques.

Duitama, Jorge; McEwen, Gayle K; Huebsch, Thomas; Palczewski, Stefanie; Schulz, Sabrina; Verstrepen, Kevin; Suk, Eun-Kyung; Hoehe, Margret R.

Nucleic Acids Res ; 40(5): 2041-53, 2012 Mar.

Article in English | MEDLINE | ID: mdl-22102577

ABSTRACT

Determining the underlying haplotypes of individual human genomes is an essential, but currently difficult, step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased into contiguous molecular haplotypes computationally by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but the accuracy of such methods has been difficult to assess due to the lack of real benchmark data. To address this problem, we generated whole genome fosmid sequence data from a HapMap trio child, NA12878, for which reliable haplotypes have already been produced. We assembled haplotypes using eight algorithms for SIH and carried out direct comparisons of their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, is able to efficiently produce high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the current haplotypes with our fosmid-based haplotypes, producing near-to-complete new gold-standard haplotypes containing almost 98% of heterozygous SNPs. This improvement includes notable fractions of disease-related and GWA SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics.

Subject(s)

Genome, Human , HapMap Project , Haplotypes , Sequence Analysis, DNA , Algorithms , Genomics/standards , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/standards

18.

A graph clustering algorithm for detection and genotyping of structural variants from long reads.

Gaitán, Nicolás; Duitama, Jorge.

Gigascience ; 13, 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38206589

ABSTRACT

BACKGROUND: Structural variants (SVs) are genomic polymorphisms defined by their length (>50 bp). The usual types of SVs are deletions, insertions, translocations, inversions, and copy number variants. SV detection and genotyping are fundamental given the role of SVs in phenomena such as phenotypic variation and evolutionary events. Thus, methods to identify SVs using long-read sequencing data have been recently developed. FINDINGS: We present an accurate and efficient algorithm to predict germline SVs from long-read sequencing data. The algorithm starts by collecting evidence (signatures) of SVs from read alignments. Then, signatures are clustered based on a Euclidean graph with coordinates calculated from lengths and genomic positions. Clustering is performed by the DBSCAN algorithm, which provides the advantage of delimiting clusters with high resolution. Clusters are transformed into SVs and a Bayesian model allows precise genotyping of SVs based on their supporting evidence. This algorithm is integrated into the single sample variants detector of the Next Generation Sequencing Experience Platform, which facilitates the integration with other functionalities for genomics analysis. We performed multiple benchmark experiments, including simulation and real data, representing different genome profiles, sequencing technologies (PacBio HiFi, ONT), and read depths. CONCLUSION: The results show that our approach outperformed state-of-the-art tools on germline SV calling and genotyping, especially at low depths, and in error-prone repetitive regions. We believe this work significantly contributes to the development of bioinformatic strategies to maximize the use of long-read sequencing technologies.

Subject(s)

Algorithms , Benchmarking , Bayes Theorem , Genotype , Cluster Analysis
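The clustering step described in the abstract above can be sketched as follows. This is a toy illustration rather than the published implementation: each signature is reduced to a hypothetical `(position, length)` pair, the raw values are used as Euclidean coordinates without the scaling the authors may apply, and `eps`, `min_pts`, and the median-based summary are assumptions. The DBSCAN routine itself is a plain textbook version written without external libraries.

```python
import math
from statistics import median


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep expanding
    return labels


def summarize(points, labels):
    """Collapse each cluster into one (median position, median length) call."""
    calls = {}
    for label in sorted(set(labels) - {-1}):
        members = [p for p, l in zip(points, labels) if l == label]
        calls[label] = (median(p[0] for p in members),
                        median(p[1] for p in members))
    return calls


# Toy signatures as (genomic position, SV length); values are made up.
signatures = [(1000, 300), (1005, 310), (998, 295),
              (5000, 60), (5003, 62), (5001, 58),
              (9000, 1200)]
labels = dbscan(signatures, eps=20, min_pts=3)
calls = summarize(signatures, labels)
```

In this toy input, the first three signatures agree on position and length and form one cluster, the next three form a second, and the isolated signature at position 9000 is left as noise. Genotyping, which the abstract assigns to a Bayesian model, is not covered by this sketch.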

19.

Phylogenomic approaches reveal a robust time-scale phylogeny of the Terminal Fusarium Clade.

Lizcano Salas, Andrés Felipe; Duitama, Jorge; Restrepo, Silvia; Celis Ramírez, Adriana Marcela.

IMA Fungus ; 15(1): 13, 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38849861

ABSTRACT

The Terminal Fusarium Clade (TFC) is a group in the Nectriaceae family with agricultural and clinical relevance. In recent years, various phylogenies have been presented in the literature, showing disagreement in the topologies, but only a few studies have conducted analyses on the divergence time scale of the group. Therefore, the evolutionary history of this group is still being determined. This study aimed to understand the evolutionary history of the TFC from a phylogenomic perspective. To achieve this objective, we performed a phylogenomic analysis using the available genomes in GenBank and ran eight different pipelines. We presented a new robust topology of the TFC that differs at some nodes from previous studies. These new relationships allowed us to formulate new hypotheses about the evolutionary history of the TFC. We also inferred new divergence time estimates, which differ from those of previous studies due to topology discordances and taxon sampling. The results suggested an important diversification process in the Neogene period, likely associated with the diversification and predominance of terrestrial ecosystems by angiosperms. In conclusion, we presented a robust time-scale phylogeny that allowed us to formulate new hypotheses regarding the evolutionary history of the TFC.

20.

A phased genome assembly of a Colombian Trypanosoma cruzi TcI strain and the evolution of gene families.

Hoyos Sanchez, Maria Camila; Ospina Zapata, Hader Sebastian; Suarez, Brayhan Dario; Ospina, Carlos; Barbosa, Hamilton Julian; Carranza Martinez, Julio Cesar; Vallejo, Gustavo Adolfo; Urrea Montes, Daniel; Duitama, Jorge.

Sci Rep ; 14(1): 2054, 2024 01 24.

Article in English | MEDLINE | ID: mdl-38267502

ABSTRACT

Chagas is an endemic disease in tropical regions of Latin America, caused by the parasite Trypanosoma cruzi. High intraspecies variability and genome complexity have made it challenging to assemble the high-quality genomes needed for studies in evolution, population genomics, diagnosis and drug development. Here we present a chromosome-level phased assembly of a TcI T. cruzi strain (Dm25). While 29 chromosomes show a large collinearity with the assembly of the Brazil A4 strain, three chromosomes show both large heterozygosity and large divergence, compared to previous assemblies of TcI T. cruzi strains. Nucleotide and protein evolution statistics indicate that T. cruzi Marinkellei separated before the diversification of T. cruzi in the known DTUs. Interchromosomal paralogs of dispersed gene families and histones appeared before but at the same time are under stricter purifying selection, compared to other repeat families. Previously unreported large tandem arrays of protein kinases and histones were identified in this assembly. Over one million variants obtained from Illumina reads aligned to the primary assembly clearly separate the main DTUs. We expect that this new assembly will be a valuable resource for further studies on evolution and functional genomics of Trypanosomatids.

Subject(s)

Chagas Disease , Trypanosoma cruzi , Humans , Trypanosoma cruzi/genetics , Colombia , Histones , Brazil
