Results 1 - 20 of 164
1.
Nat Methods ; 21(5): 793-797, 2024 May.
Article in English | MEDLINE | ID: mdl-38509328

ABSTRACT

SQANTI3 is a tool designed for the quality control, curation and annotation of long-read transcript models obtained with third-generation sequencing technologies. Leveraging its annotation framework, SQANTI3 calculates quality descriptors of transcript models, junctions and transcript ends. With this information, potential artifacts can be identified and replaced with reliable sequences. Furthermore, the integrated functional annotation feature enables subsequent functional iso-transcriptomics analyses.


Subjects
Molecular Sequence Annotation , Transcriptome , Humans , Molecular Sequence Annotation/methods , Software , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Protein Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods
2.
Nucleic Acids Res ; 52(5): e28, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38340337

ABSTRACT

Advances in affordable transcriptome sequencing combined with better exon and gene prediction have motivated many to compare transcription across the tree of life. We develop a mathematical framework to calculate complexity and compare transcript models. Structural features, i.e. intron retention (IR), donor/acceptor site variation, alternative exon cassettes and alternative 5'/3' UTRs, are compared and the distance between transcript models is calculated with nucleotide-level precision. All metrics are implemented in a PyPI package, TranD, and its output can be used to summarize splicing patterns for a transcriptome (1GTF) and between transcriptomes (2GTF). TranD output enables quantitative comparisons between: annotations augmented by empirical RNA-seq data and the original transcript models; transcript model prediction tools for long-read RNA-seq (e.g. FLAIR versus Isoseq3); alternate annotations for a species (e.g. RefSeq vs Ensembl); and closely related species. In C. elegans, Z. mays, D. melanogaster, D. simulans and H. sapiens, alternative exons were observed more frequently in combination with an alternative donor/acceptor than alone. Transcript models in RefSeq and Ensembl are linked and both have unique transcript models with empirical support. D. melanogaster and D. simulans share many transcript models, and long-read RNA-seq data suggest that both species are under-annotated. We recommend combined references.
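
To illustrate what a nucleotide-level distance between two transcript models can look like, the Python sketch below counts the positions that are exonic in only one of the two models. It is not TranD's implementation or API: the function names and the use of a symmetric-difference count are assumptions made for this example. Exons are assumed to be 1-based inclusive (start, end) pairs on the same chromosome and strand.

```python
def exon_positions(exons):
    """Expand 1-based inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_distance(exons_a, exons_b):
    """Number of nucleotides that are exonic in exactly one of the two models.

    Hypothetical metric for illustration only; 0 means the exonic footprints
    of the two transcript models are identical.
    """
    return len(exon_positions(exons_a) ^ exon_positions(exons_b))


# Toy transcript models (coordinates are made up for the example).
reference = [(100, 200), (300, 400)]
alt_acceptor = [(100, 200), (300, 450)]   # second exon ends 50 nt later
intron_retained = [(100, 400)]            # intron 201-299 kept as exon

print(nucleotide_distance(reference, alt_acceptor))
print(nucleotide_distance(reference, intron_retained))
```

Expanding intervals into position sets is simple but memory-hungry for long transcripts; an interval-sweep version would give the same count with less memory.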


Subjects
Alternative Splicing , Transcriptome , Animals , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Nucleotides , RNA Splicing , Sequence Analysis, RNA , Species Specificity , Transcriptome/genetics , Software
3.
Bioinformatics ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38976653

ABSTRACT

MOTIVATION: Understanding the dynamics of gene expression across different cellular states is crucial for discerning the mechanisms underlying cellular differentiation. Genes that exhibit variation in mean expression as a function of Pseudotime and between branching trajectories are expected to govern cell fate decisions. We introduce scMaSigPro, a method for the identification of differential gene expression patterns along Pseudotime and branching paths simultaneously. RESULTS: We assessed the performance of scMaSigPro using synthetic and public datasets. Our evaluation shows that scMaSigPro outperforms existing methods in controlling the False Positive Rate and is computationally efficient. AVAILABILITY AND IMPLEMENTATION: scMaSigPro is available as a free R package (version 4.0 or higher) under the GPL(≥2) license on GitHub at 'github.com/BioBam/scMaSigPro' and archived with version 0.03 on Zenodo at 'zenodo.org/records/12568922'.

4.
Nucleic Acids Res ; 50(W1): W551-W559, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35609982

ABSTRACT

PaintOmics is a web server for the integrative analysis and visualisation of multi-omics datasets using biological pathway maps. PaintOmics 4 has several notable updates that improve and extend analyses. Three pathway databases are now supported: KEGG, Reactome and MapMan, providing more comprehensive pathway knowledge for animals and plants. New metabolite analysis methods fill gaps in traditional pathway-based enrichment methods. The metabolite hub analysis selects compounds with a high number of significant genes in their neighbouring network, suggesting regulation by gene expression changes. The metabolite class activity analysis tests the hypothesis that a metabolic class has a higher-than-expected proportion of significant elements, indicating that these compounds are regulated in the experiment. Finally, PaintOmics 4 includes a regulatory omics module to analyse the contribution of trans-regulatory layers (microRNAs, transcription factors and RNA-binding proteins) to pathway regulation. We show the performance of PaintOmics 4 on both mouse and plant data to highlight how these new analysis features provide novel insights into regulatory biology. PaintOmics 4 is available at https://paintomics.org/.
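
The abstract does not say which statistical test the metabolite class activity analysis uses. One common way to test for a higher-than-expected proportion of significant elements is a one-sided hypergeometric test, sketched below in Python. The test choice, the function name and the toy numbers are all assumptions made for illustration, not a description of PaintOmics 4 internals.

```python
from math import comb


def class_enrichment_pvalue(total, total_significant, class_size, class_significant):
    """Upper-tail hypergeometric p-value (assumed test, for illustration).

    total             -- all measured metabolites
    total_significant -- metabolites called significant in the experiment
    class_size        -- metabolites belonging to the class of interest
    class_significant -- significant metabolites inside that class

    Returns P(X >= class_significant) when class_size metabolites are drawn
    at random, without replacement, from the measured set.
    """
    upper = min(class_size, total_significant)
    favourable = sum(
        comb(total_significant, k) * comb(total - total_significant, class_size - k)
        for k in range(class_significant, upper + 1)
    )
    return favourable / comb(total, class_size)


# Toy example: 200 metabolites measured, 30 significant overall,
# a class of 12 metabolites of which 6 are significant.
print(class_enrichment_pvalue(200, 30, 12, 6))
```

A small p-value here would suggest that the class contains more significant compounds than random sampling explains; in practice the p-values for all classes would also need multiple-testing correction.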


Subjects
MicroRNAs , Multiomics , Animals , Mice , Databases, Factual , MicroRNAs/genetics , Transcription Factors , Computational Biology/methods
5.
Bioinformatics ; 38(9): 2657-2658, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35238331

ABSTRACT

MOTIVATION: Batch effects in omics datasets are usually a source of technical noise that masks the biological signal and hampers data analysis. Batch effect removal has been widely addressed for individual omics technologies. However, multi-omic datasets may combine data obtained in different batches where omics type and batch are often confounded. Moreover, systematic biases may be introduced without notice during data acquisition, which creates a hidden batch effect. Current methods fail to address batch effect correction in these cases. RESULTS: In this article, we introduce the MultiBaC R package, a tool for batch effect removal in multi-omics and hidden batch effect scenarios. The package includes a diversity of graphical outputs for model validation and assessment of the batch effect correction. AVAILABILITY AND IMPLEMENTATION: MultiBaC package is available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/MultiBaC.html) and GitHub (https://github.com/ConesaLab/MultiBaC.git). The data underlying this article are available in Gene Expression Omnibus repository (accession numbers GSE11521, GSE1002, GSE56622 and GSE43747). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Software
6.
Mol Syst Biol ; 17(6): e9864, 2021 06.
Article in English | MEDLINE | ID: mdl-34132490

ABSTRACT

Understanding stem cell regulatory circuits is the next challenge in plant biology, as these cells are essential for tissue growth and organ regeneration in response to stress. In the Arabidopsis primary root apex, stem cell-specific transcription factors BRAVO and WOX5 co-localize in the quiescent centre (QC) cells, where they commonly repress cell division so that these cells can act as a reservoir to replenish surrounding stem cells, yet their molecular connection remains unknown. Genetic and biochemical analysis indicates that BRAVO and WOX5 form a transcription factor complex that modulates gene expression in the QC cells to preserve overall root growth and architecture. Furthermore, by using mathematical modelling we establish that BRAVO uses the WOX5/BRAVO complex to promote WOX5 activity in the stem cells. Our results unveil the importance of transcriptional regulatory circuits in plant stem cell development.


Subjects
Arabidopsis Proteins , Arabidopsis , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Cell Division , Gene Expression Regulation, Plant , Homeodomain Proteins/genetics , Meristem/genetics , Meristem/metabolism , Nitriles , Plant Roots/genetics , Plant Roots/metabolism
7.
PLoS Biol ; 17(4): e2006506, 2019 04.
Article in English | MEDLINE | ID: mdl-30978178

ABSTRACT

The differentiation of self-renewing progenitor cells requires not only the regulation of lineage- and developmental stage-specific genes but also the coordinated adaptation of housekeeping functions from a metabolically active, proliferative state toward quiescence. How metabolic and cell-cycle states are coordinated with the regulation of cell type-specific genes is an important question, because dissociation between differentiation, cell cycle, and metabolic states is a hallmark of cancer. Here, we use a model system to systematically identify key transcriptional regulators of Ikaros-dependent B cell-progenitor differentiation. We find that the coordinated regulation of housekeeping functions and tissue-specific gene expression requires a feedforward circuit whereby Ikaros down-regulates the expression of Myc. Our findings show how coordination between differentiation and housekeeping states can be achieved by interconnected regulators. Similar principles likely coordinate differentiation and housekeeping functions during progenitor cell differentiation in other cell lineages.


Subjects
B-Lymphocytes/cytology , Genes, myc , Precursor Cells, B-Lymphoid/cytology , Animals , B-Lymphocytes/metabolism , Cell Cycle/physiology , Cell Differentiation/genetics , Cell Lineage , Databases, Genetic , Down-Regulation , Gene Expression Regulation , Genes, Essential , Humans , Ikaros Transcription Factor/metabolism , Lymphocyte Activation , Mice , Precursor Cells, B-Lymphoid/metabolism , Transcription Factors/metabolism
8.
Genome Res ; 2018 Feb 09.
Article in English | MEDLINE | ID: mdl-29440222

ABSTRACT

High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that a substantial number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact on the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.

9.
Brief Bioinform ; 20(2): 471-481, 2019 03 22.
Article in English | MEDLINE | ID: mdl-29040385

ABSTRACT

Over the last few years, RNA-seq has been used to study alterations in alternative splicing related to several diseases. Bioinformatics workflows used to perform these studies can be divided into two groups: those finding changes in absolute isoform expression and those studying differential splicing. Many computational methods for transcriptomics analysis have been developed, evaluated and compared; however, there are not enough reports of systematic and objective assessment of processing pipelines as a whole. Moreover, comparative studies have considered changes in absolute and relative isoform expression levels separately. Consequently, no consensus exists about the best practices and appropriate workflows to analyse alternative and differential splicing. To assist in choosing an adequate pipeline, we present here a benchmarking of nine commonly used workflows to detect differential isoform expression and splicing. We evaluated workflow performance over different experimental scenarios where changes in absolute and relative isoform expression occurred simultaneously. In addition, the effects of the number of isoforms per gene and of the magnitude of the expression change on pipeline performance were also evaluated. Our results suggest that workflow performance is influenced by the number of replicates per condition and the heterogeneity of the conditions. In general, workflows based on DESeq2, DEXSeq, Limma and NOISeq performed well over a wide range of transcriptomics experiments. In particular, we suggest the use of workflows based on Limma when high precision is required, and DESeq2 and DEXSeq pipelines to prioritize sensitivity. When several replicates per condition are available, NOISeq and Limma pipelines are indicated.


Subjects
Alternative Splicing , Benchmarking/methods , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Neoplasm Proteins/genetics , Prostatic Neoplasms/genetics , Sequence Analysis, RNA/methods , Case-Control Studies , Gene Expression Profiling , Humans , Male , Neoplasm Proteins/metabolism , Prostate/metabolism , Prostatic Neoplasms/metabolism , Protein Isoforms , Workflow
10.
Bioinformatics ; 36(Suppl_2): i795-i803, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381819

ABSTRACT

MOTIVATION: Molecular pathway databases represent cellular processes in a structured and standardized way. These databases support the community-wide utilization of pathway information in biological research and the computational analysis of high-throughput biochemical data. Although pathway databases are critical in genomics research, the fast progress of biomedical sciences prevents databases from staying up-to-date. Moreover, the compartmentalization of cellular reactions into defined pathways reflects arbitrary choices that might not always be aligned with the needs of the researcher. Today, no tool exists that allows the easy creation of user-defined pathway representations. RESULTS: Here we present Padhoc, a pipeline for pathway ad hoc reconstruction. Based on a set of user-provided keywords, Padhoc combines natural language processing, database knowledge extraction, orthology search and powerful graph algorithms to create navigable pathways tailored to the user's needs. We validate Padhoc with a set of well-established Escherichia coli pathways and demonstrate usability to create not-yet-available pathways in model (human) and non-model (sweet orange) organisms. AVAILABILITY AND IMPLEMENTATION: Padhoc is freely available at https://github.com/ConesaLab/padhoc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Software , Algorithms , Databases, Factual , Genomics , Humans
11.
Bioinformatics ; 36(Suppl_2): i618-i624, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381847

ABSTRACT

MOTIVATION: microRNAs (miRNAs) are essential components of gene expression regulation at the post-transcriptional level. miRNAs have a well-defined molecular structure and this has facilitated the development of computational and high-throughput approaches to predict miRNA genes. However, due to their short size, miRNAs have often been incorrectly annotated in both plants and animals. Consequently, published miRNA annotations and miRNA databases are enriched for false miRNAs, jeopardizing their utility as molecular information resources. To address this problem, we developed MirCure, new software for quality control, filtering and curation of miRNA candidates. MirCure is an easy-to-use tool with a graphical interface that allows both scoring of miRNA reliability and browsing of supporting evidence by manual curators. RESULTS: Given a list of miRNA candidates, MirCure evaluates a number of miRNA-specific features based on gene expression, biogenesis and conservation data, and generates a score that can be used to discard poorly supported miRNA annotations. MirCure can also curate and adjust the annotation of the 5p and 3p arms based on user-provided small RNA-seq data. We evaluated MirCure on a set of manually curated animal and plant miRNAs and demonstrated great accuracy. Moreover, we show that MirCure can be used to revisit previous bona fide miRNA annotations to improve miRNA databases. AVAILABILITY AND IMPLEMENTATION: The MirCure software and all the additional scripts used in this project are publicly available at https://github.com/ConesaLab/MirCure. A Docker image of MirCure is available at https://hub.docker.com/r/conesalab/mircure. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
MicroRNAs , Animals , Computational Biology , MicroRNAs/genetics , Plants/genetics , Quality Control , Reproducibility of Results , Software
12.
EMBO Rep ; 20(12): e47964, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31680439

ABSTRACT

RNA-binding proteins (RBPs) participate in all steps of gene expression, underscoring their potential as regulators of RNA homeostasis. We structurally and functionally characterize Mip6, a four-RNA recognition motif (RRM)-containing RBP, as a functional and physical interactor of the export factor Mex67. Mip6-RRM4 directly interacts with the ubiquitin-associated (UBA) domain of Mex67 through a loop containing tryptophan 442. Mip6 shuttles between the nucleus and the cytoplasm in a Mex67-dependent manner and concentrates in cytoplasmic foci under stress. Photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation experiments show preferential binding of Mip6 to mRNAs regulated by the stress-response Msn2/4 transcription factors. Consistent with this binding, MIP6 deletion affects their export and expression levels. Additionally, Mip6 interacts physically and/or functionally with proteins with a role in mRNA metabolism and transcription such as Rrp6, Xrn1, Sgf73, and Rpb1. These results reveal a novel role for Mip6 in the homeostasis of Msn2/4-dependent transcripts through its direct interaction with the Mex67 UBA domain.


Subjects
Cell Nucleus/metabolism , Nuclear Proteins/metabolism , Nucleocytoplasmic Transport Proteins/metabolism , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Active Transport, Cell Nucleus , Binding Sites , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Nuclear Proteins/chemistry , Nuclear Proteins/genetics , Nucleocytoplasmic Transport Proteins/chemistry , Nucleocytoplasmic Transport Proteins/genetics , Protein Binding , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Stress, Physiological , Transcription Factors/genetics , Transcription Factors/metabolism
13.
Cell Biol Toxicol ; 37(1): 129-149, 2021 02.
Article in English | MEDLINE | ID: mdl-33404927

ABSTRACT

Patients with liver cirrhosis may develop covert or minimal hepatic encephalopathy (MHE). Hyperammonemia (HA) and peripheral inflammation play synergistic roles in inducing the cognitive and motor alterations in MHE. The cerebellum is one of the main cerebral regions affected in MHE. Rats with chronic HA show some motor and cognitive alterations reproducing neurological impairment in cirrhotic patients with MHE. Neuroinflammation and altered neurotransmission and signal transduction in the cerebellum from hyperammonemic (HA) rats are associated with motor and cognitive dysfunction, but underlying mechanisms are not completely known. The aim of this work was to use a multi-omic approach to study molecular alterations in the cerebellum from hyperammonemic rats to uncover new molecular mechanisms associated with hyperammonemia-induced cerebellar function impairment. We analyzed metabolomic, transcriptomic, and proteomic data from the same cerebellums from control and HA rats and performed a multi-omic integrative analysis of signaling pathway enrichment with the PaintOmics tool. The histaminergic system, corticotropin-releasing hormone, cyclic GMP-protein kinase G pathway, and intercellular communication in the cerebellar immune system were some of the most relevant enriched pathways in HA rats. In summary, this is a good approach to find altered pathways, which helps to describe the molecular mechanisms involved in the alteration of brain function in rats with chronic HA and to propose possible therapeutic targets to improve MHE symptoms.


Subjects
Cerebellum/physiopathology , Hyperammonemia/complications , Animals , Antigen Presentation/immunology , Cell Adhesion Molecules/metabolism , Cyclic GMP/metabolism , Cyclic GMP-Dependent Protein Kinases/metabolism , Hyperammonemia/immunology , Ligands , Male , Rats, Wistar , Synaptic Transmission/physiology
14.
Genome Res ; 27(11): 1807-1815, 2017 11.
Article in English | MEDLINE | ID: mdl-29025893

ABSTRACT

Genome-wide association studies (GWAS) have identified multiple, shared allelic associations with many autoimmune diseases. However, the pathogenic contributions of variants residing in risk loci remain unresolved. The location of the majority of shared disease-associated variants in noncoding regions suggests they contribute to risk of autoimmunity through effects on gene expression in the immune system. In the current study, we test this hypothesis by applying RNA sequencing to CD4+, CD8+, and CD19+ lymphocyte populations isolated from 81 subjects with type 1 diabetes (T1D). We characterize and compare the expression patterns across these cell types for three gene sets: all genes, the set of genes implicated in autoimmune disease risk by GWAS, and the subset of these genes specifically implicated in T1D. We performed RNA sequencing and aligned the reads to both the human reference genome and a catalog of all possible splicing events developed from the genome, thereby providing a comprehensive evaluation of the roles of gene expression and alternative splicing (AS) in autoimmunity. Autoimmune candidate genes displayed greater expression specificity in the three lymphocyte populations relative to other genes, with significantly increased levels of splicing events, particularly those predicted to have substantial effects on protein isoform structure and function (e.g., intron retention, exon skipping). The majority of single-nucleotide polymorphisms within T1D-associated loci were also associated with one or more cis-expression quantitative trait loci (cis-eQTLs) and/or splicing eQTLs. Our findings highlight a substantial, and previously underrecognized, role for AS in the pathogenesis of autoimmune disorders and particularly for T1D.


Subjects
Alternative Splicing , Diabetes Mellitus, Type 1/genetics , Gene Expression Profiling/methods , Lymphocytes/chemistry , Sequence Analysis, RNA/methods , Adult , CD4-Positive T-Lymphocytes/chemistry , CD4-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/chemistry , CD8-Positive T-Lymphocytes/immunology , Diabetes Mellitus, Type 1/immunology , Female , Gene Regulatory Networks , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Lymphocytes/immunology , Male , Protein Isoforms/chemistry , Protein Isoforms/metabolism , Quantitative Trait Loci , Receptors, CCR1/metabolism
15.
BMC Plant Biol ; 20(1): 539, 2020 Nov 30.
Article in English | MEDLINE | ID: mdl-33256589

ABSTRACT

BACKGROUND: RNA sequencing has been widely used to profile genome-wide gene expression and identify candidate genes controlling disease resistance and other important traits in plants. Gerbera daisy is one of the most important flowers in the global floricultural trade, and powdery mildew (PM) is the most important disease of gerbera. Genetic improvement of gerbera PM resistance has become a crucial goal in gerbera breeding. A better understanding of the genetic control of gerbera resistance to PM can expedite the development of PM-resistant cultivars. RESULTS: The objectives of this study were to identify gerbera genotypes with contrasting phenotypes in PM resistance and sequence and analyze their leaf transcriptomes to identify disease resistance and susceptibility genes differentially expressed and associated with PM resistance. An additional objective was to identify SNPs and SSRs for use in future genetic studies. We identified two gerbera genotypes, UFGE 4033 and 06-245-03, that were resistant and susceptible to PM, respectively. De novo assembly of their leaf transcriptomes using four complementary pipelines resulted in 145,348 transcripts with an N50 of 1124 bp, of which 67,312 transcripts contained open reading frames and 48,268 were expressed in both genotypes. A total of 494 transcripts were likely involved in disease resistance, and 17 and 24 transcripts were up- and down-regulated, respectively, in UFGE 4033 compared to 06-245-03. These gerbera disease resistance transcripts were most similar to the NBS-LRR class of plant resistance genes conferring resistance to various pathogens in plants. Four disease susceptibility transcripts (MLO-like) were expressed only or highly expressed in 06-245-03, offering excellent candidate targets for gene editing for PM resistance in gerbera. A total of 449,897 SNPs and 19,393 SSRs were revealed in the gerbera transcriptomes, which can be a valuable resource for developing new molecular markers. CONCLUSION: This study represents the first transcriptomic analysis of gerbera PM resistance, a highly important yet complex trait in a globally important floral crop. The differentially expressed disease resistance and susceptibility transcripts identified provide excellent targets for development of molecular markers and genetic maps, cloning of disease resistance genes, or targeted mutagenesis of disease susceptibility genes for PM resistance in gerbera.
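
The N50 reported for the assembly is a standard contiguity statistic: the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases. The Python sketch below shows the usual way to compute it. The function name and the toy lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy example: total is 20 bases, half is 10; 6 + 5 = 11 reaches it, so N50 = 5.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about whether the assembled transcripts are correct, which is why it is usually reported alongside measures such as ORF content.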


Subjects
Ascomycota , Asteraceae/genetics , Disease Resistance/genetics , Plant Diseases/genetics , Transcriptome/genetics , Asteraceae/microbiology , Genotype , Microsatellite Repeats , Phenotype , Plant Breeding , Plant Diseases/microbiology , Plant Leaves/metabolism , Polymorphism, Single Nucleotide , RNA-Seq , Real-Time Polymerase Chain Reaction
16.
PLoS Comput Biol ; 15(11): e1006555, 2019 11.
Article in English | MEDLINE | ID: mdl-31682608

ABSTRACT

Rapid advances in single-cell assays have outpaced methods for analysis of those data types. Different single-cell assays show extensive variation in sensitivity and signal-to-noise levels. In particular, scATAC-seq generates extremely sparse and noisy datasets. Existing methods developed to analyze these data require cells amenable to pseudo-time analysis or require datasets with drastically different cell types. We describe a novel approach using self-organizing maps (SOM) to link scATAC-seq regions with scRNA-seq genes that overcomes these challenges and can generate draft regulatory networks. Our SOMatic package generates chromatin and gene expression SOMs separately and combines them using a linking function. We applied SOMatic to a mouse pre-B cell differentiation time-course using controlled Ikaros over-expression to recover gene ontology enrichments, identify motifs in genomic regions showing similar single-cell profiles, and generate a gene regulatory network that both recovers known interactions and predicts new Ikaros targets during the differentiation process. The ability of linked SOMs to detect emergent properties from multiple types of high-dimensional genomic data with very different signal properties opens new avenues for integrative analysis of heterogeneous data.


Subjects
Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Animals , Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Software
17.
Nucleic Acids Res ; 46(W1): W503-W509, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29800320

ABSTRACT

The increasing availability of multi-omic platforms poses new challenges to data analysis. Joint visualization of multi-omics data is instrumental in better understanding interconnections across molecular layers and in fully utilizing the multi-omic resources available to make biological discoveries. We present here PaintOmics 3, a web-based resource for the integrated visualization of multiple omic data types onto KEGG pathway diagrams. PaintOmics 3 combines server-end capabilities for data analysis with the potential of modern web resources for data visualization, providing researchers with a powerful framework for interactive exploration of their multi-omics information. Unlike other visualization tools, PaintOmics 3 covers a comprehensive pathway analysis workflow, including automatic feature name/identifier conversion, multi-layered feature matching, pathway enrichment, network analysis, interactive heatmaps, trend charts, and more. It accepts a wide variety of omic types, including transcriptomics, proteomics and metabolomics, as well as region-based approaches such as ATAC-seq or ChIP-seq data. The tool is freely available at www.paintomics.org.


Subjects
Gene Expression Regulation , Metabolic Networks and Pathways/genetics , Signal Transduction/genetics , Software , Transcriptome , Cell Line, Transformed , Cellular Reprogramming , Computer Graphics , Fibroblasts/cytology , Fibroblasts/metabolism , Genomics/methods , Humans , Internet , Metabolomics/methods , Molecular Sequence Annotation , Proteomics/methods
18.
Mol Microbiol ; 107(1): 116-131, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29105190

ABSTRACT

Transcriptional regulation is the key to ensuring that proteins are expressed at the proper time and the proper amount. In Escherichia coli, the transcription factor cAMP receptor protein (CRP) is responsible for much of this regulation. Questions remain, however, regarding the regulation of CRP activity itself. Here, we demonstrate that a lysine (K100) on the surface of CRP has a dual function: to promote CRP activity at Class II promoters, and to ensure proper CRP steady state levels. Both functions require the lysine's positive charge; intriguingly, the positive charge of K100 can be neutralized by acetylation using the central metabolite acetyl phosphate as the acetyl donor. We propose that CRP K100 acetylation could be a mechanism by which the cell downwardly tunes CRP-dependent Class II promoter activity, whilst elevating CRP steady state levels, thus indirectly increasing Class I promoter activity. This mechanism would operate under conditions that favor acetate fermentation, such as during growth on glucose as the sole carbon source or when carbon flux exceeds the capacity of the central metabolic pathways.


Subjects
Cyclic AMP Receptor Protein/genetics , Cyclic AMP Receptor Protein/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Lysine/metabolism , Acetylation , Binding Sites , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial/genetics , Promoter Regions, Genetic/genetics , Protein Processing, Post-Translational/genetics , Repressor Proteins/metabolism , Transcription Factors/metabolism
19.
Bioinformatics ; 34(9): 1547-1554, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29272325

ABSTRACT

Motivation: Best-performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models. Results: We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can be theoretically applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. Those results put GRAM-CNN in the lead of the biological NER methods. To the best of our knowledge, we are the first to apply CNN-based structures to BioNER problems. Availability and implementation: The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN. Contact: andyli@ece.ufl.edu or aconesa@ufl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
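
The F1-scores quoted above are harmonic means of precision and recall. The Python sketch below shows one common way to compute an entity-level F1 with exact matching of span boundaries and entity type. The abstract does not describe the exact matching criterion used in the GRAM-CNN evaluation, so the exact-match rule, the (start, end, type) tuple layout and the toy annotations are assumptions made for illustration.

```python
def entity_f1(gold, predicted):
    """Entity-level F1 with exact matching (an assumed criterion).

    gold, predicted -- iterables of (start, end, entity_type) tuples.
    A prediction counts as correct only if all three fields match a gold entity.
    """
    gold_set, predicted_set = set(gold), set(predicted)
    true_positives = len(gold_set & predicted_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy annotations (token offsets and labels are made up for the example).
gold = [(0, 2, "GENE"), (5, 6, "GENE"), (9, 10, "DISEASE")]
predicted = [(0, 2, "GENE"), (5, 7, "GENE"), (9, 10, "DISEASE"), (12, 13, "GENE")]

# Two exact matches: precision = 2/4, recall = 2/3.
print(round(entity_f1(gold, predicted), 4))
```

Exact matching penalises the prediction (5, 7, "GENE") twice, once as a false positive and once as a missed gold entity, which is why boundary errors lower F1 quickly in NER benchmarks.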


Subjects
Computational Biology/methods , Deep Learning , Software
20.
Bioinformatics ; 34(3): 524-526, 2018 02 01.
Article in English | MEDLINE | ID: mdl-28968682

ABSTRACT

Motivation: As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data. Results: Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with a differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as those genes with a shift in major expressed isoform. Availability and implementation: The package is freely available under the LGPL license from the Bioconductor web site. Contact: mj.nueda@ua.es or aconesa@ufl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
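
To make the idea of a "shift in major expressed isoform" concrete, the Python sketch below finds the most highly expressed isoform of a gene at each time point and reports whether it changes over the series. This is only a descriptive illustration: Iso-maSigPro is an R package and tests isoform usage statistically, which this sketch does not attempt. The function names, the dictionary layout and the expression values are hypothetical.

```python
def major_isoform_per_timepoint(expression):
    """Return the most expressed isoform at each time point.

    expression -- dict mapping isoform id to a list of expression values,
                  one per time point (all lists must have the same length).
    Ties go to the isoform listed first in the dict.
    """
    n_timepoints = len(next(iter(expression.values())))
    return [
        max(expression, key=lambda isoform: expression[isoform][t])
        for t in range(n_timepoints)
    ]


def has_major_isoform_switch(expression):
    """True if more than one isoform is the major isoform across the series."""
    return len(set(major_isoform_per_timepoint(expression))) > 1


# Hypothetical gene with two isoforms measured at four time points.
gene = {
    "isoform_A": [50.0, 40.0, 20.0, 10.0],
    "isoform_B": [10.0, 25.0, 35.0, 60.0],
}

print(major_isoform_per_timepoint(gene))
print(has_major_isoform_switch(gene))
```

With replicates and noise, a raw argmax like this would flag many spurious switches, which is the reason a model-based test is needed in real analyses.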


Subjects
Gene Expression Profiling/methods , RNA Isoforms/analysis , Sequence Analysis, RNA/methods , Software , Animals , B-Lymphocytes/metabolism , B-Lymphocytes/physiology , Cell Differentiation , Gene Expression Regulation , Mice , RNA Isoforms/genetics