Results 1 - 14 of 14
1.
Bioinformatics ; 37(Suppl_1): i289-i298, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252942

ABSTRACT

MOTIVATION: Circular RNA (circRNA) is a novel class of long non-coding RNAs that have been broadly discovered in the eukaryotic transcriptome. The circular structure arises from a non-canonical splicing process in which a donor site is backspliced to an upstream acceptor site. These circRNA sequences are conserved across species. More importantly, growing evidence suggests that they play vital roles in gene regulation and are associated with diseases. As a fundamental step toward elucidating their functions and mechanisms, several computational methods have been proposed to predict the circular structure from the primary sequence. Recent methods leverage deep learning to capture relevant patterns in RNA sequences and model their interactions to facilitate prediction. However, these methods do not fully exploit the positional information of splice junctions or the deep interactions among them.

RESULTS: We present a robust end-to-end framework, Junction Encoder with Deep Interaction (JEDI), for circRNA prediction using only nucleotide sequences. JEDI first uses an attention mechanism to encode each junction site based on deep bidirectional recurrent neural networks, and then introduces a novel cross-attention layer to model the deep interactions among these sites for backsplicing. Finally, JEDI can not only predict circRNAs but also interpret relationships among splice sites to discover backsplicing hotspots within a gene region. Experiments demonstrate that JEDI significantly outperforms state-of-the-art approaches in circRNA prediction at both the isoform level and the gene level. Moreover, JEDI shows promising results on zero-shot backsplicing discovery, a task that none of the existing approaches can perform.

AVAILABILITY AND IMPLEMENTATION: The implementation of our framework is available at https://github.com/hallogameboy/JEDI.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA, Circular; RNA, Long Noncoding; Neural Networks, Computer; RNA/genetics; RNA Splice Sites/genetics; RNA Splicing
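The cross-attention step in the JEDI abstract above can be illustrated with a minimal scaled dot-product cross-attention in plain Python. This is a generic sketch and not JEDI's implementation. It assumes each donor and acceptor site has already been encoded as a fixed-length vector; JEDI derives these with attention over bidirectional RNN states. The toy vectors and the function names are hypothetical. The resulting weight matrix is the kind of donor-acceptor relationship one could inspect when looking for backsplicing hotspots.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product cross-attention.

    Each query (here: an encoded donor site) attends over all keys (encoded
    acceptor sites). weights[i][j] says how much query i draws on key j.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    weights, outputs = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        # Attended output: attention-weighted sum of the value vectors.
        outputs.append([sum(wj * v[t] for wj, v in zip(w, values)) for t in range(dim)])
    return weights, outputs


# Hypothetical toy encodings: 2 donor sites attend over 3 acceptor sites.
donors = [[1.0, 0.0], [0.0, 1.0]]
acceptors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
w, out = cross_attention(donors, acceptors, acceptors)
for i, row in enumerate(w):
    best = max(range(len(row)), key=row.__getitem__)
    print(f"donor {i}: highest-weight acceptor {best}")
```

On these toy vectors, donor 0 places the most weight on acceptor 0 and donor 1 on acceptor 2. Each row of weights sums to 1.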
2.
Am J Hum Genet ; 100(5): 789-802, 2017 May 04.
Article in English | MEDLINE | ID: mdl-28475861

ABSTRACT

Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH), which arises from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWASs and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4,152 loci with strong evidence of AH. The proportion of loci with identified AH is 4%-23% in eQTL datasets, 35% in GWASs of high-density lipoprotein (HDL), and 23% in GWASs of schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R² = 0.85, p = 2.2 × 10⁻¹⁶), suggesting that limited statistical power prevents the identification of AH at other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.


Subject(s)
Alleles; Gene Frequency; Quantitative Trait Loci; Databases, Genetic; Genetic Association Studies; Humans; Linkage Disequilibrium; Models, Molecular; Phenotype
3.
Bioinformatics ; 35(14): i305-i314, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510705

ABSTRACT

MOTIVATION: Sequence-based protein-protein interaction (PPI) prediction is a fundamental problem in computational biology. To address it, extensive research efforts have been made to extract predefined features from the sequences, on which statistical algorithms are then trained to classify PPIs. However, such explicit features are usually costly to extract and typically have limited coverage of PPI information.

RESULTS: We present an end-to-end framework, PIPR (Protein-Protein Interaction Prediction Based on Siamese Residual RCNN), for PPI prediction using only protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in a Siamese architecture, which leverages both robust local features and contextualized information; both are important for capturing the mutual influence of protein sequences. PIPR avoids the data pre-processing effort that other systems require and generalizes well to different application scenarios. Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows promising performance on the more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.

AVAILABILITY AND IMPLEMENTATION: The implementation is available at https://github.com/muhaochen/seq_ppi.git.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology; Neural Networks, Computer; Algorithms; Amino Acid Sequence; Protein Binding; Proteins
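Here is a minimal sketch of the Siamese idea in the PIPR abstract above. One shared encoder embeds both proteins, and the two embeddings are merged with a symmetric operation, so swapping the pair does not change the prediction. Every component is a stand-in chosen for illustration: the amino-acid-composition encoder, the element-wise-product merge, the logistic output and the weights. PIPR's actual encoder is a deep residual RCNN, and none of these names come from the PIPR code.

```python
import math
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(seq: str) -> List[float]:
    """Toy shared encoder: amino-acid composition (stand-in for a deep encoder)."""
    n = max(len(seq), 1)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def pair_score(seq_a: str, seq_b: str, weights: List[float], bias: float = 0.0) -> float:
    """Siamese scoring: the SAME encoder (same parameters) embeds both proteins."""
    emb_a, emb_b = encode(seq_a), encode(seq_b)
    # Element-wise product is symmetric, so (a, b) and (b, a) score identically.
    merged = [x * y for x, y in zip(emb_a, emb_b)]
    z = bias + sum(w * m for w, m in zip(weights, merged))
    return 1.0 / (1.0 + math.exp(-z))  # logistic output: interaction probability


weights = [0.5] * len(AMINO_ACIDS)  # hypothetical "learned" weights
p_ab = pair_score("MKTAYIAK", "GAVLIMKT", weights)
p_ba = pair_score("GAVLIMKT", "MKTAYIAK", weights)
print(p_ab == p_ba)  # True
```

Sharing one encoder halves the parameters compared with two separate encoders. The symmetric merge encodes the fact that "A interacts with B" and "B interacts with A" are the same statement.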
4.
Hum Genomics ; 13(Suppl 1): 47, 2019 10 22.
Article in English | MEDLINE | ID: mdl-31639050

ABSTRACT

BACKGROUND: Microbes are closely associated with human health and disease, especially in densely populated cities. Understanding the microbial ecosystem of an urban environment is essential for cities to monitor the transmission of infectious diseases and detect potentially urgent threats. To this end, DNA samples have been collected and analyzed at subway stations in major cities. However, city-scale sampling with fine-grained geospatial resolution is expensive and laborious. In this paper, we introduce MetaMLAnn, a neural-network-based approach to infer microbial communities at unsampled locations from information reflecting different factors, including subway line networks, sampling material types, and microbial composition patterns.

RESULTS: We evaluate the effectiveness of MetaMLAnn on the public metagenomics dataset collected from multiple locations in the New York and Boston subway systems. The experimental results suggest that MetaMLAnn consistently performs better than five other conventional classifiers across different taxonomic ranks. At the genus level, MetaMLAnn achieves F1 scores of 0.63 and 0.72 on the New York and Boston datasets, respectively.

CONCLUSIONS: By exploiting heterogeneous features, MetaMLAnn captures hidden interactions between microbial compositions and the urban environment, which enables precise predictions of microbial communities at unmeasured locations.


Subject(s)
Metagenomics/methods; Microbiota/genetics; Neural Networks, Computer; Algorithms; Boston; Cities; Databases, Genetic; Models, Genetic; New York; Reproducibility of Results
5.
Methods ; 166: 74-82, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30885720

ABSTRACT

The human microbiome plays a number of critical roles, impacting almost every aspect of human health and well-being. Conditions in the microbiome have been linked to a number of significant diseases. Additionally, revolutions in sequencing technology have led to a rapid increase in publicly available sequencing data. Consequently, there have been growing efforts to predict disease status from metagenomic sequencing data, with a proliferation of new approaches in the last few years. Some of these efforts have explored deep learning, a powerful form of machine learning that has been applied successfully in several biological domains. Here, we review some of these methods and the algorithms they are based on, with a particular focus on deep learning methods. We also perform a deeper analysis of Type 2 Diabetes and obesity datasets, for which improved results have proven elusive, using a variety of machine learning and feature extraction methods. We conclude by offering perspectives on study design considerations that may impact results, and on future directions the field can take to improve results and draw more valuable conclusions. The scripts and extracted features for the analyses conducted in this paper are available via GitHub: https://github.com/nlapier2/metapheno.


Subject(s)
Deep Learning; Diabetes Mellitus, Type 2/genetics; Metagenome/genetics; Obesity/genetics; Algorithms; Diabetes Mellitus, Type 2/microbiology; Humans; Machine Learning/statistics & numerical data; Metagenomics/methods; Microbiota/genetics; Obesity/microbiology
6.
Plant Cell Environ ; 35(4): 682-701, 2012 Apr.
Article in English | MEDLINE | ID: mdl-21988609

ABSTRACT

In the autumn, stems of woody perennials such as forest trees undergo a transition from active growth to dormancy. We used microarray transcriptomic profiling in combination with proteomic analysis to elucidate processes that occur during this growth-to-dormancy transition in a conifer, white spruce (Picea glauca [Moench] Voss). Several differentially expressed genes were likely associated with the developmental transition that occurs during growth cessation in the cambial zone and with the concomitant completion of cell maturation in vascular tissues. Genes encoding cell wall and membrane biosynthetic enzymes showed transcript abundance patterns consistent with the completion of cell maturation, and also with cell wall and membrane modifications that potentially enable cells to withstand harsh winter conditions. Several differentially expressed genes were identified that encoded putative regulators of cambial activity, cell development, and the photoperiodic pathway. Reconfiguration of carbon allocation figured centrally in the tree's overwintering preparations. For example, genes associated with carbon-based defences such as terpenoids were down-regulated, while many genes associated with protein-based defences and other stress mitigation mechanisms were up-regulated. Several of these genes correspond to proteins that accumulated during the growth-to-dormancy transition, emphasizing the importance of stress protection in the tree's adaptive response to overwintering.


Subject(s)
Gene Expression Profiling/methods; Gene Expression Regulation, Plant/physiology; Picea/physiology; Proteomics/methods; Adaptation, Physiological/physiology; Cambium/genetics; Cambium/growth & development; Cambium/metabolism; Cell Wall/genetics; Cell Wall/metabolism; Cold Temperature; Oligonucleotide Array Sequence Analysis/methods; Photoperiod; Picea/genetics; Picea/growth & development; Picea/metabolism; Plant Proteins/genetics; Plant Proteins/metabolism; Stress, Physiological/physiology; Trees
7.
Physiol Plant ; 144(4): 303-19, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22172013

ABSTRACT

While many studies have characterized changes to the transcriptome of plants attacked by shoot-feeding insect pests, few have examined the transcriptome-level effects of root pests. Maize (Zea mays) seedlings were subjected to infestation for approximately 2 weeks by the root herbivore southern corn rootworm (SCR), Diabrotica undecimpunctata howardi, and changes in transcript abundance within both roots and shoots were analyzed using a 57K-element microarray. A total of 541 genes showed statistically significant changes in transcript abundance in infested roots, including genes encoding many pathogenesis-related proteins such as chitinases, proteinase inhibitors, peroxidases and β-1,3-glucanases. Several WRKY transcription factors, which are often associated with biotic responses, exhibited increased transcript abundance upon SCR feeding. Differentially expressed (DE) genes were also detected in shoots of infested vs. control plants. Quantitative reverse transcriptase polymerase chain reaction (RT-PCR) was used to confirm patterns of transcript abundance for several significant DE genes in an independent experiment with a 2-6 day period of SCR infestation. Because of the well-documented roles that jasmonic acid (JA) and salicylic acid (SA) play in herbivory responses, the effect of exogenous JA or SA application on transcript abundance of the same subset of SCR-responsive genes was assessed. The transcript-level responses of these genes to SA and JA differed between roots and shoots and also among the genes examined. These data suggest that SA- and JA-dependent and -independent signals contribute to the transcriptome-level changes in maize roots and shoots in response to SCR infestation.


Subject(s)
Coleoptera/physiology; Gene Expression Regulation, Plant/genetics; Plant Diseases/parasitology; Plant Roots/genetics; Transcriptome; Zea mays/genetics; Animals; Cyclopentanes/pharmacology; Gene Expression Profiling; Herbivory; Larva; Oligonucleotide Array Sequence Analysis; Oxylipins/pharmacology; Plant Growth Regulators/pharmacology; Plant Roots/parasitology; Plant Shoots/genetics; Plant Shoots/parasitology; RNA, Plant/genetics; Salicylic Acid/pharmacology; Signal Transduction/genetics; Up-Regulation/genetics; Zea mays/parasitology
8.
Plant Cell Environ ; 34(3): 480-500, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21118421

ABSTRACT

Bud formation is an adaptive trait that temperate forest trees have acquired to facilitate seasonal synchronization. We have characterized transcriptome-level changes that occur during bud formation in white spruce [Picea glauca (Moench) Voss], a primarily determinate species in which preformed stem units contained within the apical bud constitute most of the next season's growth. Microarray analysis identified 4460 differentially expressed sequences in shoot tips during short-day-induced bud formation. Cluster analysis revealed distinct temporal patterns of expression, and functional classification of the genes in these clusters pointed to molecular processes that coincide with anatomical changes occurring in the developing bud. Comparing expression profiles in developing buds under long-day and short-day conditions identified possible photoperiod-responsive genes that may not be essential for bud development. Several genes putatively associated with hormone signalling were identified, and hormone quantification revealed distinct profiles for abscisic acid (ABA), cytokinins, auxin and their metabolites that can be related to morphological changes in the bud. Comparison of gene expression profiles during bud formation in different tissues revealed 108 genes that are differentially expressed only in developing buds and show greater transcript abundance in developing buds than in other tissues. These findings provide a temporal roadmap of bud formation in white spruce.


Subject(s)
Gene Expression Profiling; Picea/growth & development; Picea/genetics; Abscisic Acid/analysis; Cluster Analysis; Cytokinins/analysis; Gene Expression Regulation, Plant; Indoleacetic Acids/analysis; Oligonucleotide Array Sequence Analysis; Photoperiod; Plant Shoots/genetics; Plant Shoots/growth & development; Quebec; RNA, Plant/genetics
9.
Med Rev (Berl) ; 1(2): 114-125, 2021 Dec 20.
Article in English | MEDLINE | ID: mdl-35881666

ABSTRACT

Objectives: Genomic signatures such as k-mers have become one of the most prominent ways to describe genomic data. As a result, myriad real-world applications, such as the construction of de Bruijn graphs in genome assembly, have benefited from recognizing genomic signatures. Efficient genomic signature profiling is therefore essential for handling high-throughput sequencing reads. However, most existing approaches recognize only fixed-size k-mers, while many studies have shown the importance of considering variable-length k-mers.

Methods: In this paper, we present a novel genomic signature profiling approach, TahcoRoll, which extends the Aho-Corasick algorithm (AC) to the task of profiling variable-length k-mers. We first group nucleotides into two clusters and represent each cluster with a single bit. The rolling hash technique is then used to encode signatures and read patterns for efficient matching.

Results: In extensive experiments, TahcoRoll significantly outperforms most state-of-the-art k-mer counters and can process reads from different sequencing platforms on a budget desktop computer.

Conclusions: The single-thread version of TahcoRoll is as efficient as the eight-thread version of the state-of-the-art tool, JellyFish, while the eight-thread TahcoRoll outperforms the eight-thread JellyFish by at least a factor of four.
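The two encoding ideas in the Methods paragraph above, one bit per nucleotide and a rolling hash over read windows, can be sketched as follows. This is not TahcoRoll itself. It omits the Aho-Corasick automaton and simply keeps one hash table per k-mer length. The assignment of A/G to bit 0 and C/T to bit 1 is a hypothetical choice, because the abstract does not say how the two clusters are formed. Reads are assumed to contain only A, C, G and T.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical 1-bit grouping (purines -> 0, pyrimidines -> 1). The abstract only
# says nucleotides are split into two clusters, not which ones. Reads are assumed
# to contain only A, C, G and T.
BIT = {"A": 0, "G": 0, "C": 1, "T": 1}


def encode_bits(seq: str) -> int:
    """Pack a sequence into an integer, one bit per nucleotide."""
    code = 0
    for ch in seq:
        code = (code << 1) | BIT[ch]
    return code


def count_kmers(read: str, kmers: Iterable[str]) -> Dict[str, int]:
    """Count occurrences of variable-length k-mers in `read` with rolling 1-bit hashes."""
    unique = list(dict.fromkeys(kmers))  # de-duplicate, keep input order
    by_len: Dict[int, Dict[int, List[str]]] = defaultdict(lambda: defaultdict(list))
    for km in unique:
        by_len[len(km)][encode_bits(km)].append(km)

    counts = {km: 0 for km in unique}
    for k, table in by_len.items():
        if k > len(read):
            continue
        mask = (1 << k) - 1
        code = 0
        for i, ch in enumerate(read):
            # Roll the window: shift in the new base's bit, drop the oldest bit.
            code = ((code << 1) | BIT[ch]) & mask
            if i >= k - 1 and code in table:
                window = read[i - k + 1 : i + 1]
                # The 1-bit code is lossy, so confirm each candidate exactly.
                for km in table[code]:
                    if km == window:
                        counts[km] += 1
    return counts


print(encode_bits("ACGT"))  # 5 (binary 0101)
print(count_kmers("ACGTACGT", ["ACG", "CGTA", "T", "GGG"]))
# {'ACG': 2, 'CGTA': 1, 'T': 2, 'GGG': 0}
```

Two nucleotides share each bit, so distinct k-mers can share a code. Here the window "GTA" collides with "ACG", which is why the sketch confirms every hash hit with a direct string comparison.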

10.
NAR Genom Bioinform ; 2(2): lqaa015, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32166223

ABSTRACT

The functional impact of protein mutations is reflected in alterations of the conformation and thermodynamics of protein-protein interactions (PPIs). Quantifying the changes of two interacting proteins upon mutation is commonly carried out by computational approaches. Hence, extensive research efforts have been devoted to extracting energetic or structural features of proteins, followed by statistical learning methods that estimate the effects of mutations on PPI properties. Nonetheless, such features require extensive human labor and expert knowledge to obtain, and have limited ability to reflect point mutations. We present an end-to-end deep learning framework, MuPIPR (Mutation Effects in Protein-protein Interaction PRediction Using Contextualized Representations), to estimate the effects of mutations on PPIs. MuPIPR incorporates a contextualized representation mechanism for amino acids that propagates the effect of a point mutation to the representations of surrounding amino acids, thereby amplifying the subtle change in a long protein sequence. On top of that, MuPIPR uses a Siamese residual recurrent convolutional neural encoder to encode a wild-type protein pair and its mutant pair. Multi-layer perceptron regressors are applied to the protein pair representations to predict the quantifiable changes in PPI properties upon mutation. Experimental evaluations show that, with only sequence information, MuPIPR outperforms various state-of-the-art systems in estimating the changes in binding affinity on SKEMPI v1, and offers comparable performance on SKEMPI v2. MuPIPR also demonstrates state-of-the-art performance in estimating changes in buried surface area. The software implementation is available at https://github.com/guangyu-zhou/MuPIPR.
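A toy model can illustrate the abstract's claim that contextualized representations spread a point mutation's effect to neighboring residues. Here a fixed sliding-window mean stands in for MuPIPR's learned contextual encoder; this substitution is an assumption for illustration only. The numeric residue embedding, the sequence and the Y5W mutation are all hypothetical.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq: str) -> List[float]:
    """Toy per-residue embedding: the residue's index in AMINO_ACIDS."""
    return [float(AMINO_ACIDS.index(aa)) for aa in seq]


def contextualize(vec: List[float], radius: int) -> List[float]:
    """Toy contextual representation: mean over a +/- radius window."""
    out = []
    for i in range(len(vec)):
        lo, hi = max(0, i - radius), min(len(vec), i + radius + 1)
        window = vec[lo:hi]
        out.append(sum(window) / len(window))
    return out


def mutate(seq: str, pos: int, aa: str) -> str:
    """Apply a point mutation at 0-based position `pos`."""
    return seq[:pos] + aa + seq[pos + 1 :]


wild_type = "MKTAYIAKQR"              # hypothetical sequence
mutant = mutate(wild_type, 4, "W")    # hypothetical Y5W substitution
ctx_wt = contextualize(embed(wild_type), radius=2)
ctx_mt = contextualize(embed(mutant), radius=2)
changed = [i for i, (a, b) in enumerate(zip(ctx_wt, ctx_mt)) if a != b]
print(changed)  # [2, 3, 4, 5, 6]
```

With radius 2, the mutation at index 4 alters the representations at indices 2 through 6. The single-residue change is therefore visible at five positions rather than one.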

11.
Nat Commun ; 10(1): 4054, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31492842

ABSTRACT

Transposable elements (TE) comprise roughly half of the human genome. Though initially derided as junk DNA, they have been widely hypothesized to contribute to the evolution of gene regulation. However, the contribution of TE to the genetic architecture of diseases remains unknown. Here, we analyze data from 41 independent diseases and complex traits to draw three conclusions. First, TE are uniquely informative for disease heritability. Despite an overall depletion for heritability (TE contain 54% of SNPs but explain 39 ± 2% of heritability), TE explain substantially more heritability than expected based on their depletion for known functional annotations. This implies that TE acquire function in ways that differ from known functional annotations. Second, older TE contribute more to disease heritability, consistent with their having acquired biological function. Third, Short Interspersed Nuclear Elements (SINE) are far more enriched for blood traits than for other traits. Our results can help elucidate the biological roles that TE play in the genetic architecture of diseases.


Subject(s)
DNA Transposable Elements/genetics; Disease/genetics; Gene Expression Regulation; Genome, Human/genetics; Inheritance Patterns/genetics; Retroelements/genetics; Algorithms; Autoimmune Diseases/blood; Autoimmune Diseases/genetics; Brain Diseases/blood; Brain Diseases/genetics; Evolution, Molecular; Humans; Polymorphism, Single Nucleotide; Quantitative Trait Loci/genetics; Short Interspersed Nucleotide Elements/genetics
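Heritability enrichment is commonly defined as the fraction of SNP heritability attributed to an annotation C divided by the fraction of SNPs that fall in C. Stratified LD score regression uses this definition; the abstract does not name its estimator, so applying it here is an assumption. In the formula, M_C and M are the numbers of SNPs in C and in total. A worked example with the TE figures quoted above shows what the reported depletion means:

```latex
\mathrm{Enrichment}(C)
  = \frac{h^2_g(C) / h^2_g}{M_C / M}
  = \frac{0.39}{0.54}
  \approx 0.72
```

A value below 1 indicates depletion. The abstract's further claim is that TE nonetheless explain more heritability than their depletion for known functional annotations would predict.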
12.
Nat Genet ; 50(7): 1041-1047, 2018 07.
Article in English | MEDLINE | ID: mdl-29942083

ABSTRACT

There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability across 41 diseases and complex traits (5.84× for expression QTLs (eQTLs); P = 1.19 × 10⁻³¹) than annotations containing all significant molecular QTLs (1.80× for eQTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10⁻³⁵). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.


Subject(s)
Disease/genetics; Multifactorial Inheritance; Quantitative Trait Loci; Genome-Wide Association Study/methods; Humans; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable
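One plausible way to turn fine-mapping output into the kind of annotation described above is to give each SNP the maximum causal posterior probability (CPP) it receives across all fine-mapped QTLs. The abstract says only that the annotations are "based on causal posterior probabilities". The max rule, the data layout, the SNP identifiers and the function name are therefore assumptions. A continuous annotation like this could then be passed to a heritability-partitioning method.

```python
from typing import Dict, Iterable, List


def max_cpp_annotation(
    fine_mapped: Iterable[Dict[str, float]], snps: List[str]
) -> Dict[str, float]:
    """Per-SNP annotation = maximum causal posterior probability (CPP).

    Each dict in `fine_mapped` is the output for one fine-mapped molecular
    QTL (e.g., one gene in one tissue), mapping SNP id -> CPP. SNPs that never
    appear in any fine-mapping result keep the value 0.0.
    """
    annot = {snp: 0.0 for snp in snps}
    for cpp_by_snp in fine_mapped:
        for snp, cpp in cpp_by_snp.items():
            if snp in annot:
                annot[snp] = max(annot[snp], cpp)
    return annot


# Hypothetical fine-mapping results for two QTLs.
results = [{"rs1": 0.9, "rs2": 0.1}, {"rs2": 0.6, "rs3": 0.4}]
annot = max_cpp_annotation(results, ["rs1", "rs2", "rs3", "rs4"])
print(annot)  # {'rs1': 0.9, 'rs2': 0.6, 'rs3': 0.4, 'rs4': 0.0}
```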
13.
Article in English | MEDLINE | ID: mdl-27429446

ABSTRACT

RNA sequencing (RNA-Seq) has been the leading technology for quantifying the expression of thousands of genes simultaneously. The data analysis of an RNA-Seq experiment starts by aligning short reads to the reference genome/transcriptome or to a reconstructed transcriptome. However, current aligners lack the sensitivity to distinguish reads that come from homologous regions of a genome. One group of such homologies is paralogous pseudogenes. Pseudogenes arise from the duplication of protein-coding genes and have been considered degraded paralogs in the genome because of their loss of function. Recent studies have provided evidence supporting novel regulatory roles for them in biological processes. With growing interest in quantifying the expression levels of pseudogenes in different tissues or cell lines, it is critical to have a sensitive method that can correctly align ambiguous reads and accurately estimate expression levels among homologous genes. Previously, in PseudoLasso, we proposed a linear regression approach that learns read alignment behaviors and leverages this knowledge for abundance estimation and alignment correction. In this paper, we extend PseudoLasso by grouping homologous genomic regions into communities using a community detection algorithm and then building a separate linear regression model for each community. The results show that this approach retains the same accuracy as PseudoLasso. By breaking the genome into smaller homologous communities, the running time grows linearly rather than quadratically with the number of genes.


Subject(s)
Computational Biology/methods; Pseudogenes/genetics; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Algorithms; Humans; Linear Models; RNA, Untranslated/genetics
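The abstract above does not name the community detection algorithm it uses. As a simpler stand-in, this sketch groups genes into connected components of a homology graph with union-find. It then compares the quadratic cost of one global model with the summed cost of per-group models. The gene and pseudogene names and the homology pairs are hypothetical.

```python
from typing import Dict, List, Tuple


def find(parent: Dict[str, str], x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def homology_groups(genes: List[str], pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Group genes into connected components of the homology graph."""
    parent = {g: g for g in genes}
    for a, b in pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra
    groups: Dict[str, List[str]] = {}
    for g in genes:
        groups.setdefault(find(parent, g), []).append(g)
    return list(groups.values())


def pairwise_cost(groups: List[List[str]]) -> int:
    """Work for models whose cost grows with the square of the genes they cover."""
    return sum(len(grp) ** 2 for grp in groups)


# Hypothetical genes (G*) and pseudogenes (P*) with homology links.
genes = ["G1", "P1", "G2", "P2a", "P2b", "G3"]
pairs = [("G1", "P1"), ("G2", "P2a"), ("G2", "P2b")]
groups = homology_groups(genes, pairs)
print(groups)                                         # [['G1', 'P1'], ['G2', 'P2a', 'P2b'], ['G3']]
print(pairwise_cost([genes]), pairwise_cost(groups))  # 36 14
```

If every group stays smaller than some bound s as genes are added, the summed cost is at most s times the number of genes. It therefore grows linearly, which is consistent with the scaling the abstract reports.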
14.
J Chem Ecol ; 34(8): 1013-25, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18581175

ABSTRACT

Colorado potato beetle (CPB) is a leading pest of solanaceous plants. Despite the economic importance of this pest, surprisingly few studies have characterized its molecular interaction with the potato plant. In particular, little is known about the effect of CPB elicitors on the expression of genes associated with the plant's defense response. To discover putative CPB elicitor-responsive genes, the TIGR 11,421 EST Solanaceae microarray was used to identify genes that are differentially expressed in response to the addition of CPB regurgitant to wounded potato leaves. Applying a cutoff corresponding to an adjusted P-value of <0.01 and a fold change of >1.5 or <0.67, we found that 73 genes were induced by regurgitant treatment of wounded leaves compared with wounding alone, whereas 54 genes were repressed by this treatment. This gene set likely includes regurgitant-responsive genes as well as wounding-responsive genes whose expression patterns are further enhanced by the presence of regurgitant. Real-time polymerase chain reaction was used to validate differential expression by regurgitant treatment for five of these genes. In general, genes encoding proteins involved in secondary metabolism and stress were induced by regurgitant, whereas genes associated with photosynthesis were repressed. One induced gene encodes aromatic amino acid decarboxylase, which is responsible for synthesizing the precursor of 2-phenylethanol. This is significant because 2-phenylethanol is recognized by the CPB predator Perillus bioculatus. In addition, three of the 16 type 1 and type 2 proteinase inhibitor clones present on the potato microarray were repressed by application of CPB regurgitant to wounded leaves. Given that proteinase inhibitors are known to interfere with protein digestion in the insect midgut, repression of these proteinase inhibitors by CPB may disable this component of the plant's defense arsenal.
These data suggest that beyond the wound response, CPB elicitors play a role in mediating the plant/insect interaction.


Subject(s)
Coleoptera/physiology; Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Plant Leaves/genetics; Plant Leaves/physiology; Solanum tuberosum/genetics; Solanum tuberosum/physiology; Animals; Carbohydrate Metabolism/genetics; Down-Regulation; Food; Gene Expression Regulation, Plant; Nitrogen/metabolism; Photosynthesis/genetics; Plant Leaves/metabolism; Protein Biosynthesis/genetics; Solanum tuberosum/metabolism; Up-Regulation
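The CPB abstract above states a combined cutoff: adjusted P-value <0.01 and fold change >1.5 or <0.67. It does not say which adjustment was used. This sketch assumes Benjamini-Hochberg false-discovery-rate adjustment; the gene identifiers and measurements are made up.

```python
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up procedure, kept monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(
    genes: Dict[str, Tuple[float, float]],
    alpha: float = 0.01,
    up: float = 1.5,
    down: float = 0.67,
) -> Dict[str, str]:
    """Label each gene from its (raw p-value, fold change vs. wounding alone)."""
    names = list(genes)
    adjusted = bh_adjust([genes[n][0] for n in names])
    labels = {}
    for name, q in zip(names, adjusted):
        fold_change = genes[name][1]
        if q < alpha and fold_change > up:
            labels[name] = "induced"
        elif q < alpha and fold_change < down:
            labels[name] = "repressed"
        else:
            labels[name] = "not significant"
    return labels


# Hypothetical measurements: gene -> (raw p-value, fold change).
genes = {
    "g1": (0.0001, 2.0),
    "g2": (0.0002, 0.5),
    "g3": (0.0003, 1.2),
    "g4": (0.5, 3.0),
}
labels = classify(genes)
print(labels)
```

Here g3 passes the P-value test but not the fold-change test, and g4 fails the P-value test. The two fold-change thresholds are roughly reciprocal (1/1.5 ≈ 0.67), so the rule treats induction and repression symmetrically on a log scale.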