Search | BVS Regional Portal

Co-expression pan-network reveals genes involved in complex traits within maize pan-genome.

Cagirici, H Busra; Andorf, Carson M; Sen, Taner Z.

BMC Plant Biol ; 22(1): 595, 2022 Dec 19.

Article in English | MEDLINE | ID: mdl-36529716

ABSTRACT

BACKGROUND: With advances in high-throughput next-generation sequencing technologies, genome-wide association studies (GWAS) have identified a large set of variants associated with complex phenotypic traits at a very fine scale. Despite this progress, identifying genotype-phenotype relationships remains challenging in maize, where dozens of variants can control the same trait. Because causal variants result in changes in gene expression, gene expression analyses play a pivotal role in unraveling the transcriptional regulatory mechanisms behind phenotypes. RESULTS: To address these challenges, we integrated gene expression data and GWAS-driven traits to extend knowledge of genotype-phenotype relationships and of the transcriptional regulatory mechanisms behind the phenotypes. We constructed a large collection of gene co-expression networks and identified more than 2 million co-expressing gene pairs in the GWAS-driven pan-network, which contains all the gene pairs in the individual genomes of the nested association mapping (NAM) population. We defined four sub-categories of the pan-network: (1) the core-network contains the most highly represented ~1% of the gene pairs, (2) the near-core network contains the next most highly represented 1-5% of the gene pairs, (3) the private-network contains the ~50% of the gene pairs that are unique to individual genomes, and (4) the dispensable-network contains the remaining 50-95% of the gene pairs in the maize pan-genome. Strikingly, the private-network contained almost all the genes in the pan-network but lacked half of the interactions. We performed gene ontology (GO) enrichment analysis for the pan-, core-, and private-networks and compared the contributions of variants overlapping with genes and promoters to the GWAS-driven pan-network. CONCLUSIONS: Gene co-expression networks revealed meaningful information about groups of co-regulated genes that play a central role in regulatory processes.
The pan-network approach enabled us to visualize a global view of the gene regulatory network for the studied system that could not be well inferred from the core-network alone.
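The abstract defines the four sub-networks by how widely each gene pair is represented across the NAM genomes, but it does not spell out the ranking procedure. A minimal Python sketch, assuming each genome's co-expression network is a set of gene-ID pairs, that pairs are ranked by the number of genomes containing them, and that pairs seen in only one genome are private; the function name, input format, and tie-breaking rule are hypothetical and are not the authors' code:

```python
from collections import Counter


def classify_pan_network(networks, core_frac=0.01, near_core_frac=0.05):
    """Label each gene pair as core, near-core, dispensable, or private.

    networks: one set of (gene_a, gene_b) pairs per genome (hypothetical format).
    core_frac / near_core_frac: cumulative fractions of the ranked pairs
    (~1% and ~5% in the abstract), applied to shared pairs by rank.
    """
    counts = Counter()
    for net in networks:
        # Normalize (a, b) and (b, a) to one key, counted once per genome.
        counts.update({tuple(sorted(pair)) for pair in net})

    # Most widely represented pairs first; ties broken alphabetically.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    n = len(ranked)
    core_cut = max(1, round(core_frac * n))
    near_cut = max(core_cut, round(near_core_frac * n))

    labels = {}
    for rank, pair in enumerate(ranked):
        if counts[pair] == 1:
            labels[pair] = "private"  # present in a single genome only
        elif rank < core_cut:
            labels[pair] = "core"
        elif rank < near_cut:
            labels[pair] = "near-core"
        else:
            labels[pair] = "dispensable"
    return labels
```

Because the cut-offs are applied to ranks, pairs tied at the same genome count can fall on either side of a boundary; a real analysis would need an explicit tie policy.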

Subject(s)

Genome-Wide Association Study, Zea mays, Zea mays/genetics, Genome-Wide Association Study/methods, Multifactorial Inheritance, Phenotype, Gene Regulatory Networks, Single Nucleotide Polymorphism/genetics

G4Boost: a machine learning-based tool for quadruplex identification and stability prediction.

Cagirici, H Busra; Budak, Hikmet; Sen, Taner Z.

BMC Bioinformatics ; 23(1): 240, 2022 Jun 18.

Article in English | MEDLINE | ID: mdl-35717172

ABSTRACT

BACKGROUND: G-quadruplexes (G4s), formed within guanine-rich nucleic acids, are secondary structures involved in important biological processes. Although every G4 motif has the potential to form a stable G4 structure, not every G4 motif does, and accurate energy-based methods are needed to assess their structural stability. Here, we present a decision tree-based prediction tool, G4Boost, to identify G4 motifs and predict their secondary structure folding probability and thermodynamic stability based on their sequences, nucleotide compositions, and estimated structural topologies. RESULTS: G4Boost predicted the quadruplex folding state with an accuracy greater than 93% and an F1-score of 0.96, and the folding energy with an RMSE of 4.28 and an R2 of 0.95, using only sequence-intrinsic features. G4Boost was successfully applied and validated to predict the stability of experimentally determined G4 structures, including those from plants and humans. CONCLUSION: G4Boost outperformed the three machine learning-based prediction tools DeepG4, Quadron, and G4RNA Screener in terms of both accuracy and F1-score, and can be highly useful for G4 prediction to understand gene regulation across species, including plants and humans.
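The abstract reports classification quality (accuracy, F1-score) for the folding state and regression quality (RMSE, R2) for the folding energy. The standard definitions can be sketched in plain Python. The abstract does not say whether its R2 is the coefficient of determination (used below) or a squared correlation coefficient, so treat that choice as an assumption; the function names are illustrative:

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (folded) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```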

Subject(s)

G-Quadruplexes, Gene Expression Regulation, Guanine/chemistry, Humans, Machine Learning, Thermodynamics

GrainGenes: a data-rich repository for small grains genetics and genomics.

Yao, Eric; Blake, Victoria C; Cooper, Laurel; Wight, Charlene P; Michel, Steve; Cagirici, H Busra; Lazo, Gerard R; Birkett, Clay L; Waring, David J; Jannink, Jean-Luc; Holmes, Ian; Waters, Amanda J; Eickholt, David P; Sen, Taner Z.

Database (Oxford) ; 2022, 2022 05 25.

Article in English | MEDLINE | ID: mdl-35616118

ABSTRACT

As one of the US Department of Agriculture-Agricultural Research Service flagship databases, GrainGenes (https://wheat.pw.usda.gov) serves the data and community needs of globally distributed small grains researchers working on the genetic improvement of the Triticeae family and Avena species, which include wheat, barley, rye and oat. GrainGenes accomplishes its mission by continually enriching its cross-linked data content following the findable, accessible, interoperable and reusable (FAIR) principles, enhancing and maintaining an intuitive web interface, creating tools that enable easy data access, and establishing data connections within and between GrainGenes and other biological databases to facilitate knowledge discovery. GrainGenes operates within the biological database community, collaborates with curators and genome sequencing groups, and contributes to the AgBioData Consortium and the International Wheat Initiative through the Wheat Information System (WheatIS). Interactive and linked content is paramount for successful biological databases, and GrainGenes now has 2917 manually curated gene records, including 289 genes and 254 alleles from the Wheat Gene Catalogue (WGC). It holds >4.8 million gene models in 51 genome browser assemblies, 6273 quantitative trait loci, and >1.4 million genetic loci on 4756 genetic and physical maps within 443 mapping sets, complete with standardized metadata. Most notably, 50 new genome browsers, including outputs from the Wheat and Barley PanGenome projects, have been created. We provide an example of an expression quantitative trait loci (eQTL) track on the International Wheat Genome Sequencing Consortium Chinese Spring wheat browser to demonstrate how genome browser tracks can be adapted for different data types. To help users benefit more from its data, GrainGenes has created four tutorials available on YouTube.
GrainGenes is executing its vision of service by continuously responding to the needs of the global small grains community and creating a centralized, long-term, interconnected data repository. Database URL: https://wheat.pw.usda.gov.

Subject(s)

Plant Genome, Hordeum, Avena/genetics, Chromosome Mapping, Genetic Databases, Plant Genome/genetics, Genomics, Hordeum/genetics, Quantitative Trait Loci, Triticum/genetics

Multiple Variant Calling Pipelines in Wheat Whole Exome Sequencing.

Cagirici, H Busra; Akpinar, Bala Ani; Sen, Taner Z; Budak, Hikmet.

Int J Mol Sci ; 22(19), 2021 Sep 27.

Article in English | MEDLINE | ID: mdl-34638743

ABSTRACT

The highly challenging hexaploid wheat (Triticum aestivum) genome is becoming ever more accessible due to the continued development of multiple reference genomes, which aids efforts to better understand variation in important traits. Although the process of variant calling is relatively straightforward, selecting the best combination of computational tools for the read alignment and variant calling stages of the analysis, and efficiently filtering false variant calls, are not always easy tasks. Previous studies have analyzed the impact of these methods on quality metrics in diploid organisms. Given that variant identification in wheat largely relies on accurate mining of exome data, there is a critical need to better understand how different methods affect the analysis of whole exome sequencing (WES) data in polyploid species. This study addresses this need by performing whole exome sequencing of 48 wheat cultivars and assessing the performance of various variant calling pipelines at their suggested settings. The results show that all the pipelines require filtering to eliminate false-positive calls. The high consensus among the reference SNPs called by the best-performing pipelines suggests that filtering provides accurate and reproducible results. This study also provides detailed comparisons of sensitivity and precision at the individual and population levels for the raw and filtered SNP calls.
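One simple way to quantify the consensus among pipelines mentioned above is to count how many pipelines report each SNP and to measure pairwise overlap between call sets. The sketch below is a hedged illustration, assuming filtered calls have already been reduced to (chromosome, position, ref, alt) tuples; the pipeline names, the support threshold, and the Jaccard overlap measure are illustrative choices, not the study's method:

```python
from collections import Counter


def consensus_snps(call_sets, min_support):
    """Return SNPs reported by at least `min_support` pipelines.

    call_sets: mapping of pipeline name -> iterable of
    (chrom, pos, ref, alt) tuples (a hypothetical, pre-parsed VCF view).
    """
    support = Counter()
    for calls in call_sets.values():
        support.update(set(calls))  # count each SNP at most once per pipeline
    return {snp for snp, n in support.items() if n >= min_support}


def pairwise_jaccard(a, b):
    """Overlap of two call sets: |A & B| / |A | B| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


calls = {
    "pipeline_a": {("1A", 100, "A", "G"), ("1A", 200, "C", "T")},
    "pipeline_b": {("1A", 100, "A", "G")},
    "pipeline_c": {("1A", 100, "A", "G"), ("2B", 50, "G", "A")},
}
print(consensus_snps(calls, 2))                                    # {('1A', 100, 'A', 'G')}
print(pairwise_jaccard(calls["pipeline_a"], calls["pipeline_b"]))  # 0.5
```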

Subject(s)

Exome Sequencing, Plant Genome, Single Nucleotide Polymorphism, Polyploidy, Triticum/genetics

mirMachine: A One-Stop Shop for Plant miRNA Annotation.

Cagirici, H Busra; Sen, Taner Z; Budak, Hikmet.

J Vis Exp ; (171), 2021 05 01.

Article in English | MEDLINE | ID: mdl-33999024

ABSTRACT

Of the different types of noncoding RNAs, microRNAs (miRNAs) have arguably been in the spotlight over the last decade. As post-transcriptional regulators of gene expression, miRNAs play key roles in various cellular pathways, including development and responses to abiotic and biotic stresses such as drought and diseases. The availability of high-quality reference genome sequences has enabled the identification and annotation of miRNAs in several plant species, where miRNA sequences are highly conserved. Because computational miRNA identification and annotation are largely error-prone processes, homology-based predictions increase prediction accuracy. Over the last decade, we developed and improved the miRNA annotation pipeline SUmir, which has since been used for several plant genomes. This study presents a new, fully automated miRNA pipeline, mirMachine (miRNA Machine), built by (i) adding a filtering step on the secondary structure predictions, (ii) making it fully automated, and (iii) introducing new options to predict either known miRNAs based on homology or novel miRNAs based on small RNA sequencing reads using the previous pipeline. The new pipeline, mirMachine, was tested using the TAIR10 release of the Arabidopsis genome from The Arabidopsis Information Resource (TAIR) and the International Wheat Genome Sequencing Consortium (IWGSC) wheat reference genome v2.

Subject(s)

Arabidopsis, MicroRNAs, Arabidopsis/genetics, Base Sequence, Plant Gene Expression Regulation, Plant Genome/genetics, High-Throughput Nucleotide Sequencing, MicroRNAs/genetics, Plant RNA/genetics, RNA Sequence Analysis

Genome-wide discovery of G-quadruplexes in barley.

Cagirici, H Busra; Budak, Hikmet; Sen, Taner Z.

Sci Rep ; 11(1): 7876, 2021 04 12.

Article in English | MEDLINE | ID: mdl-33846409

ABSTRACT

G-quadruplexes (G4s) are four-stranded nucleic acid structures in which closely spaced guanine bases form square planar G-quartets. Aberrant formation of G4 structures has been associated with genomic instability. However, most plant species lack comprehensive studies of G4 motifs. In this study, genome-wide identification of G4 motifs in barley was performed, followed by a comparison of their genomic distribution and molecular functions with those of other monocot species, such as wheat, maize, and rice. As reported for humans and some plants such as wheat, G4 motifs peaked around the 5' untranslated region (5' UTR), the first coding domain sequence, and the first intron start sites on antisense strands. Our comparative analyses in human, Arabidopsis, maize, rice, and sorghum demonstrated that these peaks can be erroneously merged into a single peak when large window sizes are used. We also showed that G4 distributions around genic regions are relatively similar across the species studied, except for Arabidopsis. G4-containing genes in monocots showed conserved molecular functions for transcription initiation and hydrolase activity. Additionally, we provided examples of imperfect G4 motifs.
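The window-size effect described above can be illustrated with a toy histogram: two nearby clusters of motif positions show up as separate peaks with narrow windows and collapse into a single peak with wide windows. The positions below are synthetic, and the centred binning scheme and peak rule are assumptions for illustration, not the study's procedure:

```python
from collections import Counter


def bin_positions(positions, window):
    """Count motifs per window; bin b is centred on b * window (in bp).

    positions: motif coordinates relative to a landmark such as the TSS
    (negative values are upstream).
    """
    half = window // 2
    return Counter((pos + half) // window for pos in positions)


def count_peaks(bins):
    """A bin is a peak if it holds more motifs than both neighbouring bins
    (missing neighbours count as zero)."""
    return sum(
        1
        for b, n in bins.items()
        if n > bins.get(b - 1, 0) and n > bins.get(b + 1, 0)
    )


# Two synthetic clusters, about 60 bp upstream and 110 bp downstream of the TSS.
positions = [-70, -60, -55, 105, 110, 120]
print(count_peaks(bin_positions(positions, 50)))   # 2: clusters resolved
print(count_peaks(bin_positions(positions, 500)))  # 1: clusters merged
```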

Subject(s)

G-Quadruplexes, Hordeum/genetics, Arabidopsis/genetics, Human Genome, Plant Genome, Humans, Single Nucleotide Polymorphism, Zea mays/genetics

LncMachine: a machine learning algorithm for long noncoding RNA annotation in plants.

Cagirici, H Busra; Galvez, S; Sen, Taner Z; Budak, Hikmet.

Funct Integr Genomics ; 21(2): 195-204, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33635499

ABSTRACT

Following the elucidation of the critical roles they play in numerous important biological processes, long noncoding RNAs (lncRNAs) have gained vast attention in recent years. Manual annotation of lncRNAs is restricted by known gene annotations and is prone to false predictions due to the incompleteness of available data. However, with the advent of high-throughput sequencing technologies, a large amount of high-quality data has become available for annotation, especially for plant species such as wheat. Here, we compared the prediction accuracies of several machine learning algorithms using 10-fold cross-validation. This study includes a comprehensive feature selection step to remove irrelevant and redundant features. We present a crop-specific, alignment-free coding potential prediction tool, LncMachine, that achieves higher prediction accuracy than the currently available popular tools (CPC2, CPAT, and CNIT) when used with the Random Forest algorithm. Further, LncMachine with Random Forest performed well on human and mouse data, with an average accuracy of 92.67%. LncMachine requires only a FASTA file or a tab-separated file containing features as input. LncMachine can deploy several user-provided algorithms in real time and can therefore be readily applied to a wide range of studies.
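In 10-fold cross-validation, the data are partitioned into ten folds; each algorithm is trained on nine folds and tested on the held-out fold, in turn. A stdlib-only Python sketch of the index splitting, assuming a simple shuffled (non-stratified) split; LncMachine's own implementation is not described in the abstract, and the function name is hypothetical:

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Spread the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Each algorithm would be fitted on `train` and scored on `test` for every fold, and the ten scores averaged to compare algorithms. For unbalanced coding/noncoding classes, a stratified split that preserves class proportions per fold is a common alternative.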

Subject(s)

Computational Biology, Molecular Sequence Annotation, Plants/genetics, Long Noncoding RNA/genetics, Algorithms, High-Throughput Nucleotide Sequencing, Machine Learning, Long Noncoding RNA/classification

Genome-Wide Discovery of G-Quadruplexes in Wheat: Distribution and Putative Functional Roles.

Cagirici, H Busra; Sen, Taner Z.

G3 (Bethesda) ; 10(6): 2021-2032, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32295768

ABSTRACT

G-quadruplexes are nucleic acid secondary structures formed by a stack of square planar G-quartets. G-quadruplexes have been implicated in many biological functions, including telomere maintenance, replication, transcription, and translation, in many species including humans and plants. For wheat, however, although it is one of the world's most important staple foods, no G-quadruplex studies have been reported to date. Here, we computationally identify putative G4 structures (G4s) in the wheat genome for the first time and compare their distribution across the genome against five other genomes (human, maize, Arabidopsis, rice, and sorghum). We identified close to 1 million G4 motifs, with a density of 76 G4s/Mb across the whole genome and 93 G4s/Mb over genic regions. Remarkably, G4s were enriched around three regions, two on the antisense strand and one on the sense strand: 1) the transcription start site (TSS) (antisense), 2) the first coding domain sequence (CDS) (antisense), and 3) the start codon (sense). Functional enrichment analysis revealed that the gene models containing G4 motifs within these peaks were associated with specific gene ontology (GO) terms, such as developmental process, localization, and cellular component organization or biogenesis. We investigated genes encoding MADS-box transcription factors and showed examples of G4 motifs within critical regulatory regions of the VRN-1 genes in wheat. Furthermore, comparison with other plants showed that monocots share a similar distribution of G4s, whereas Arabidopsis shows a unique G4 distribution. Our study shows for the first time the prevalence and possible functional roles of G4s in wheat.
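The abstract does not state the motif definition used. A widely used convention for putative G4 motifs is four runs of at least three guanines separated by loops of 1 to 7 nucleotides; the same pattern written with C-tracts on the given strand marks a motif on the opposite strand. Density is then the motif count divided by the scanned length in megabases (for instance, 760 motifs in a hypothetical 10 Mb region gives 76 G4s/Mb). A hedged Python sketch under those assumptions; the pattern thresholds and function names are illustrative, and `finditer` reports non-overlapping matches only:

```python
import re

# Assumed motif definition (a common convention, not confirmed for this study):
# four G-tracts of >= 3 G separated by loops of 1-7 nt (any base, including G).
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# The same pattern written with C-tracts; a match here is a motif on the
# opposite (reverse-complement) strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def scan_g4(seq):
    """Return non-overlapping (start, end) spans of putative G4 motifs.

    Spans are 0-based, end-exclusive coordinates on the input strand.
    """
    seq = seq.upper()
    plus = [m.span() for m in G4_PLUS.finditer(seq)]
    minus = [m.span() for m in G4_MINUS.finditer(seq)]
    return plus, minus


def g4_density_per_mb(n_motifs, length_bp):
    """Motifs per megabase: count divided by length in Mb."""
    return n_motifs / (length_bp / 1_000_000)


plus, minus = scan_g4("GGGTTAGGGTTAGGGTTAGGG")
print(plus, minus)                         # [(0, 21)] []
print(g4_density_per_mb(760, 10_000_000))  # 76.0
```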

Subject(s)

G-Quadruplexes, Humans, Nucleic Acid Regulatory Sequences, Transcription Initiation Site, Triticum/genetics, Zea mays
