Results 1 - 20 of 30
1.
Cell ; 180(5): 915-927.e16, 2020 03 05.
Article in English | MEDLINE | ID: mdl-32084333

ABSTRACT

The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we also found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and different signatures encode for mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (∼12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.


Subjects
Genome, Human/genetics , Genomics/methods , Mutation/genetics , Neoplasms/genetics , DNA Mutational Analysis/methods , Disease Progression , Humans , Neoplasms/pathology , Whole Genome Sequencing
2.
PLoS Comput Biol ; 19(7): e1011222, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37410793

ABSTRACT

The COVID-19 pandemic caused by the SARS-CoV-2 virus has resulted in millions of deaths worldwide. The disease presents with various manifestations that can vary in severity and long-term outcomes. Previous efforts have contributed to the development of effective strategies for treatment and prevention by uncovering the mechanism of viral infection. We now know all the direct protein-protein interactions that occur during the lifecycle of SARS-CoV-2 infection, but it is critical to move beyond these known interactions to a comprehensive understanding of the "full interactome" of SARS-CoV-2 infection, which incorporates human microRNAs (miRNAs), additional human protein-coding genes, and exogenous microbes. Potentially, this will help in developing new drugs to treat COVID-19, differentiating the nuances of long COVID, and identifying histopathological signatures in SARS-CoV-2-infected organs. To construct the full interactome, we developed a statistical modeling approach called MLCrosstalk (multiple-layer crosstalk) based on latent Dirichlet allocation. MLCrosstalk integrates data from multiple sources, including microbes, human protein-coding genes, miRNAs, and human protein-protein interactions. It constructs "topics" that group SARS-CoV-2 with genes and microbes based on similar patterns of co-occurrence across patient samples. We use these topics to infer linkages between SARS-CoV-2 and protein-coding genes, miRNAs, and microbes. We then refine these initial linkages using network propagation to contextualize them within a larger framework of network and pathway structures. Using MLCrosstalk, we identified genes in the IL1-processing and VEGFA-VEGFR2 pathways that are linked to SARS-CoV-2. We also found that Rothia mucilaginosa and Prevotella melaninogenica are positively and negatively correlated, respectively, with SARS-CoV-2 abundance, a finding corroborated by analysis of single-cell sequencing data.


Subjects
COVID-19 , MicroRNAs , Humans , SARS-CoV-2/genetics , Post-Acute COVID-19 Syndrome , Pandemics/prevention & control , MicroRNAs/genetics
3.
Bioinformatics ; 37(18): 2998-3000, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33792640

ABSTRACT

MOTIVATION: Traditionally, an individual can only query and retrieve information from a genome browser by using accessories such as a mouse and keyboard. However, technology has changed the way that people interact with their screens. We hypothesized that we could leverage technological advances to use voice recognition as an interactive input to query and visualize genomic information. RESULTS: We developed an Amazon Alexa skill called Gene Tracer that allows users to use their voice to find disease-associated gene information, deleterious mutations and gene networks, while simultaneously enjoying a genome browser-like visualization experience on their screen. As the voice can be well recognized and understood, Gene Tracer provides users with more flexibility to acquire knowledge and is broadly applicable to other scenarios. AVAILABILITY AND IMPLEMENTATION: Alexa skill store (https://www.amazon.com/LT-Gene-tracer/dp/B08HCL1V68/) and a demonstration video (https://youtu.be/XbDbx7JDKmI). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics , Software , Genome , Information Storage and Retrieval , Mutation
4.
PLoS Comput Biol ; 17(8): e1009303, 2021 08.
Article in English | MEDLINE | ID: mdl-34424894

ABSTRACT

The development of mobile-health technology has the potential to revolutionize personalized medicine. Biomedical sensors (e.g., wearables) can assist with determining treatment plans for individuals, provide quantitative information to healthcare providers, and give objective measurements of health, leading to the goal of precise phenotypic correlates for genotypes. Even though treatments and interventions are becoming more specific and datasets more abundant, measuring the causal impact of health interventions requires careful considerations of complex covariate structures, as well as knowledge of the temporal and spatial properties of the data. Thus, interpreting biomedical sensor data needs to make use of specialized statistical models. Here, we show how the Bayesian structural time series framework, widely used in economics, can be applied to these data. This framework corrects for covariates to provide accurate assessments of the significance of interventions. Furthermore, it allows for a time-dependent confidence interval of impact, which is useful for considering individualized assessments of intervention efficacy. We provide a customized biomedical adaptor tool, MhealthCI, around a specific implementation of the Bayesian structural time series framework that uniformly processes, prepares, and registers diverse biomedical data. We apply the software implementation of MhealthCI to a structured set of examples in biomedicine to showcase the ability of the framework to evaluate interventions with varying levels of data richness and covariate complexity and also compare the performance to other models. Specifically, we show how the framework is able to evaluate an exercise intervention's effect on stabilizing blood glucose in a diabetes dataset. We also provide a future-anticipating illustration from a behavioral dataset showcasing how the framework integrates complex spatial covariates. Overall, we show the robustness of the Bayesian structural time series framework when applied to biomedical sensor data, highlighting its increasing value for current and future datasets.


Subjects
Bayes Theorem , Models, Statistical , Biosensing Techniques , Datasets as Topic , Humans , Software
5.
PLoS Genet ; 15(8): e1007860, 2019 08.
Article in English | MEDLINE | ID: mdl-31469829

ABSTRACT

There has been much effort to prioritize genomic variants with respect to their impact on "function". However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. First, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding to be the most predictive feature, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, makes almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out. In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL).


Subjects
Gene Expression Regulation/genetics , Genetic Variation/genetics , Sequence Analysis, DNA/methods , Algorithms , Binding Sites , Computer Simulation , Forecasting/methods , Genomics/methods , Humans , Linkage Disequilibrium/genetics , Models, Genetic , Protein Binding/genetics , Quantitative Trait Loci/genetics , Software , Transcription Factors/genetics
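
The LASSO-based feature selection mentioned in the abstract of record 5 can be illustrated with a small, generic sketch. This is not the GRAM implementation. It is a minimal coordinate-descent LASSO in plain Python, run on hypothetical toy data, that shows how an L1 penalty shrinks the coefficient of a weakly predictive feature to exactly zero while keeping a strongly predictive one. The function names, the toy data, and the penalty value are all assumptions chosen for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(features, target, penalty, sweeps=100):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent.

    features: list of n rows, each a list of p feature values (no all-zero column).
    target:   list of n response values.
    """
    n = len(target)
    p = len(features[0])
    coef = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(
                    features[i][k] * coef[k] for k in range(p) if k != j
                )
                rho += features[i][j] * (target[i] - fit_without_j)
                norm += features[i][j] ** 2
            coef[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return coef


# Hypothetical toy data: the response depends strongly on feature 0
# and only weakly on feature 1.
toy_features = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
toy_target = [2.1, -1.9, 1.9, -2.1]
print(lasso_coordinate_descent(toy_features, toy_target, penalty=0.5))
```

With a penalty of zero the same routine reduces to ordinary least squares, which makes it easy to compare which features survive as the penalty grows.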
6.
Bioinformatics ; 36(Suppl_1): i474-i481, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657410

ABSTRACT

MOTIVATION: Recently, many chromatin immunoprecipitation sequencing experiments have been carried out for a diverse group of transcription factors (TFs) in many different types of human cells. These experiments manifest large-scale and dynamic changes in regulatory network connectivity (i.e. network 'rewiring'), highlighting the different regulatory programs operating in disparate cellular states. However, due to the dense and noisy nature of current regulatory networks, directly comparing the gains and losses of targets of key TFs across cell states is often not informative. Thus, here, we seek an abstracted, low-dimensional representation to understand the main features of network change. RESULTS: We propose a method called TopicNet that applies latent Dirichlet allocation to extract functional topics for a collection of genes regulated by a given TF. We then define a rewiring score to quantify regulatory-network changes in terms of the topic changes for this TF. Using this framework, we can pinpoint particular TFs that change greatly in network connectivity between different cellular states (such as observed in oncogenesis). Also, incorporating gene expression data, we define a topic activity score that measures the degree to which a given topic is active in a particular cellular state. We also show how activity differences can indicate differential survival in various cancers. AVAILABILITY AND IMPLEMENTATION: The TopicNet framework and related analysis were implemented using R and all code is available at https://github.com/gersteinlab/topicnet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Regulatory Networks , Transcription Factors , Chromatin Immunoprecipitation Sequencing , Humans , Transcription Factors/genetics
7.
BMC Bioinformatics ; 21(1): 457, 2020 Oct 15.
Article in English | MEDLINE | ID: mdl-33059594

ABSTRACT

BACKGROUND: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. RESULTS: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression. CONCLUSION: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.


Subjects
Algorithms , Asthma/genetics , Gene Expression Profiling , Gene Expression Regulation , Sputum/metabolism , Area Under Curve , Asthma/pathology , Cluster Analysis , Humans , Molecular Sequence Annotation , ROC Curve , Severity of Illness Index , Support Vector Machine
8.
BMC Bioinformatics ; 21(1): 281, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32615918

ABSTRACT

BACKGROUND: During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control gene expression. Alterations in groups of TF-binding profiles (i.e. "co-binding changes") can affect the co-regulating associations between TFs (i.e. "rewiring the co-regulator network"). This, in turn, can potentially drive downstream expression changes, phenotypic variation, and even disease. However, quantification of co-regulatory network rewiring has not been comprehensively studied. RESULTS: To address this, we propose DiNeR, a computational method to directly construct a differential TF co-regulation network from paired disease-to-normal ChIP-seq data. Specifically, DiNeR uses a graphical model to capture the gained and lost edges in the co-regulation network. Then, it adopts a stability-based, sparsity-tuning criterion (sub-sampling the complete binding profiles to remove spurious edges) to report only significant co-regulation alterations. Finally, DiNeR highlights hubs in the resultant differential network as key TFs associated with disease. We assembled genome-wide binding profiles of 104 TFs in the K562 and GM12878 cell lines, which loosely model the transition between normal and cancerous states in chronic myeloid leukemia (CML). In total, we identified 351 significantly altered TF co-regulation pairs. In particular, we found that the co-binding of the tumor suppressor BRCA1 and RNA polymerase II, a well-known transcriptional pair in healthy cells, was disrupted in tumors. Thus, DiNeR successfully extracted hub regulators and discovered well-known risk genes. CONCLUSIONS: Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators.


Subjects
Gene Regulatory Networks , Models, Genetic , Software , Chromatin Immunoprecipitation , Gene Expression Regulation , Genome , Humans , K562 Cells , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics , Protein Binding , Transcription Factors/metabolism , Transcription, Genetic
9.
PLoS Comput Biol ; 13(7): e1005647, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28742097

ABSTRACT

Genome-wide proximity ligation based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). From a visual examination of the chromosomal contact map, however, it is clear that the organization of the domains is not simple or obvious. Instead, TADs exhibit various length scales and, in many cases, a nested arrangement. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as a network optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component is deriving an appropriate background model for contacts in a random chain, by numerically solving a set of matrix equations. The background model preserves the observed coverage of each genomic bin as well as the distance dependence of the contact frequency for any pair of bins exhibited by the empirical map. Also, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the acronym "Mr" standing for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domain boundaries are marked by characteristic signatures in chromatin marks and transcription factors (TFs) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of HOT (high-occupancy target) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions. To further explore the interplay between TADs and epigenetic marks, as tumor mutational burden is known to be coupled to chromatin structure, we examine how somatic mutations are distributed across boundaries and find a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework to explore the multi-scale structures in Hi-C contact maps.


Subjects
Chromatin , Chromosomes , Computational Biology/methods , Models, Genetic , Algorithms , Cell Line , Cell Nucleus/chemistry , Cell Nucleus/genetics , Chromatin/chemistry , Chromatin/genetics , Chromatin/ultrastructure , Chromosomes/chemistry , Chromosomes/genetics , Chromosomes/ultrastructure , Genome/genetics , Genome/physiology , Humans , Protein Binding , Transcription Factors/metabolism
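
The idea in record 9 of scoring a domain partition by modularity with a tunable resolution parameter can be sketched in a few lines. This is a simplified illustration, not the MrTADFinder algorithm: it assumes the standard configuration-model null (expected contact proportional to the product of bin coverages), whereas the abstract describes a background model that also preserves distance dependence. The function name, the `boundaries` convention, and the toy contact matrix are hypothetical.

```python
def modularity(contact, boundaries, gamma=1.0):
    """Generalized modularity of a partition of bins into contiguous domains.

    contact:    symmetric matrix (list of lists) of contact counts between bins.
    boundaries: indices of bins that start a new domain (bin 0 always starts one).
    gamma:      resolution parameter; larger values favor smaller domains.
    """
    n = len(contact)
    coverage = [sum(row) for row in contact]
    total = sum(coverage)  # twice the total contact weight for a symmetric matrix

    # Assign a domain label to every bin by walking along the chromosome.
    starts = set(boundaries)
    labels = []
    domain = 0
    for i in range(n):
        if i in starts:
            domain += 1
        labels.append(domain)

    score = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Simplified null model (an assumption of this sketch).
                expected = coverage[i] * coverage[j] / total
                score += contact[i][j] - gamma * expected
    return score / total


# Hypothetical 4-bin map with two tightly connected pairs of bins.
toy_map = [
    [0, 4, 0, 0],
    [4, 0, 1, 0],
    [0, 1, 0, 4],
    [0, 0, 4, 0],
]
print(modularity(toy_map, boundaries=[2]))  # two domains: bins 0-1 and bins 2-3
print(modularity(toy_map, boundaries=[]))   # one domain covering all bins
```

A search procedure would then look for the boundary set that maximizes this score at each value of `gamma`; repeating the search over several values is what yields domains at multiple length scales.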
10.
PLoS Comput Biol ; 11(5): e1004269, 2015 May.
Article in English | MEDLINE | ID: mdl-25996148

ABSTRACT

The regulatory architecture of breast cancer is extraordinarily complex and gene misregulation can occur at many levels, with transcriptional malfunction being a major cause. This dysfunctional process typically involves additional regulatory modulators including DNA methylation. Thus, transcription factor (TF) binding and DNA methylation are two components of a cancer regulatory interactome presumed to display correlated signals. As proof of concept, we performed a systematic motif-based in silico analysis to infer all potential TFs that are involved in breast cancer prognosis through an association with DNA methylation changes. Using breast cancer DNA methylation and clinical data derived from The Cancer Genome Atlas (TCGA), we carried out a systematic inference of TFs whose misregulation underlies different clinical subtypes of breast cancer. Our analysis identified TFs known to be associated with clinical outcomes of p53 and ER (estrogen receptor) subtypes of breast cancer, while also predicting new TFs that may be involved. Furthermore, our results suggest that misregulation in breast cancer can be caused by the binding of alternative factors to the binding sites of TFs whose activity has been ablated. Overall, this study provides a comprehensive analysis that links DNA methylation to TF binding to patient prognosis.


Subjects
Breast Neoplasms/genetics , DNA Methylation , Gene Expression Regulation, Neoplastic , Amino Acid Motifs , Binding Sites , Breast Neoplasms/pathology , Cluster Analysis , CpG Islands , DNA, Neoplasm/metabolism , Female , Gene Expression Profiling , Humans , Prognosis , Receptors, Estrogen/genetics , Receptors, Estrogen/metabolism , Transcription Factors/metabolism , Treatment Outcome , Tumor Suppressor Protein p53/metabolism
11.
bioRxiv ; 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38562822

ABSTRACT

Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet, little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multi-omics datasets into a resource comprising >2.8M nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550K cell-type-specific regulatory elements and >1.4M single-cell expression-quantitative-trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.

12.
Science ; 384(6698): eadi5199, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781369

ABSTRACT

Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.


Subjects
Brain , Gene Regulatory Networks , Mental Disorders , Single-Cell Analysis , Humans , Aging/genetics , Brain/metabolism , Cell Communication/genetics , Chromatin/metabolism , Chromatin/genetics , Genomics , Mental Disorders/genetics , Prefrontal Cortex/metabolism , Prefrontal Cortex/physiology , Quantitative Trait Loci
13.
Bioinformatics ; 27(3): 421-2, 2011 Feb 01.
Article in English | MEDLINE | ID: mdl-21169377

ABSTRACT

Sequencing reads generated by RNA-sequencing (RNA-seq) must first be mapped back to the genome through alignment before they can be further analyzed. Current fast and memory-saving short-read mappers can give a quick view of the transcriptome. However, they are designed neither for reads that span splice junctions nor for repetitive reads, which can be mapped to multiple locations in the genome (multi-reads). Here, we describe a new software package, ABMapper, which is specifically designed for exploring all putative locations of reads that map to splice junctions or are repetitive in nature. AVAILABILITY AND IMPLEMENTATION: The software is freely available at: http://abmapper.sourceforge.net/. The software is written in C++ and PERL. It runs on all major platforms and operating systems including Windows, Mac OS X and LINUX.


Subjects
Genomics/methods , Sequence Alignment/methods , Software , Humans , RNA Splicing , Transcriptome
14.
BMC Bioinformatics ; 12 Suppl 5: S2, 2011.
Article in English | MEDLINE | ID: mdl-21988959

ABSTRACT

BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners are capable of mapping millions of sequencing reads onto a reference genome. For reads that can be mapped to multiple positions along the reference genome (multireads), these aligners may either randomly assign them to a location, or discard them altogether. Either choice could bias downstream analyses. Meanwhile, challenges remain in the alignment of reads spanning splice junctions. Existing splicing-aware aligners that rely on the read-count method to identify junction sites are inevitably affected by sequencing depth. RESULTS: The distance between the aligned positions of paired-end (PE) reads, or of the two parts of a spliced read, depends on the experimental protocol and gene structures. Here, we propose a new method that employs an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice in multiread selection and splice-site detection, according to the aligned distances from PE and spliced reads. CONCLUSIONS: GT models that combine sequence similarity from alignment with the probability given by the length distribution can accurately determine the location of both multireads and spliced reads.


Subjects
RNA Splicing , Sequence Analysis, RNA/methods , Animals , Gene Expression , Genome , Humans , Introns , Likelihood Functions , Software , Statistical Distributions
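
The use of a length distribution to choose among candidate placements, as described in record 14, can be sketched as follows. The abstract does not define the geometric-tail (GT) distribution, so this sketch assumes one plausible form: empirical frequencies up to a cutoff and a geometric decay beyond it. The function name, the cutoff, and the toy lengths are hypothetical, and the sequence-similarity term that the abstract also mentions is left out.

```python
from collections import Counter


def fit_geometric_tail(lengths, cutoff):
    """Return a probability function over lengths.

    Assumed form (not confirmed by the abstract): lengths up to `cutoff` use
    their empirical frequency; lengths beyond it follow a geometric decay
    fitted to the observed excesses over the cutoff.
    """
    n = len(lengths)
    body_counts = Counter(x for x in lengths if x <= cutoff)
    tail = [x for x in lengths if x > cutoff]
    tail_mass = len(tail) / n
    # Maximum-likelihood geometric parameter from the mean excess (>= 1).
    mean_excess = sum(x - cutoff for x in tail) / len(tail) if tail else 1.0
    decay = 1.0 / mean_excess

    def prob(length):
        if length <= cutoff:
            return body_counts.get(length, 0) / n
        return tail_mass * decay * (1.0 - decay) ** (length - cutoff - 1)

    return prob


# Hypothetical observed intron lengths and two candidate placements that
# imply different intron lengths for the same read.
observed_lengths = [100, 100, 100, 200, 200, 300, 500, 700]
prob = fit_geometric_tail(observed_lengths, cutoff=300)
candidate_lengths = [100, 5000]
best = max(candidate_lengths, key=prob)
print(best)
```

In a fuller model this probability would be multiplied by an alignment-quality term, so that a slightly worse alignment with a typical intron length can outrank a perfect alignment that implies an implausibly long intron.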
15.
BMC Genomics ; 11: 402, 2010 Jun 24.
Article in English | MEDLINE | ID: mdl-20576098

ABSTRACT

BACKGROUND: Thousands of plants and animals possess pharmacological properties and there is increased interest in using these materials for therapy and health maintenance. The efficacy of such applications depends critically on the use of genuine materials. From time to time, life-threatening poisoning occurs because a toxic adulterant or substitute is administered. DNA barcoding provides a definitive means of authentication and of conducting molecular systematics studies. Owing to the reduced cost of DNA authentication, the volume of DNA barcodes produced for medicinal materials is on the rise and necessitates the development of an integrated DNA database. DESCRIPTION: We have developed an integrated DNA barcode multimedia information platform, the Medicinal Materials DNA Barcode Database (MMDBD), for data retrieval and similarity search. MMDBD contains over 1000 species of medicinal materials listed in the Chinese Pharmacopoeia and American Herbal Pharmacopoeia. MMDBD also contains useful information about each medicinal material, including resources, adulterant information, medical parts, photographs, primers used for obtaining the barcodes and key references. MMDBD can be accessed at http://www.cuhk.edu.hk/icm/mmdbd.htm. CONCLUSIONS: This work provides a centralized medicinal materials DNA barcode database and bioinformatics tools for data storage, analysis and exchange to promote the identification of medicinal materials. MMDBD has the largest collection of DNA barcodes of medicinal materials and is a useful resource for researchers in conservation, systematics, forensics and the herbal industry.


Subjects
DNA Fingerprinting , Databases, Nucleic Acid , Internet , Pharmacology , Animals , Sequence Analysis, DNA , Software , User-Computer Interface
16.
Genome Biol ; 21(1): 151, 2020 07 30.
Article in English | MEDLINE | ID: mdl-32727537

ABSTRACT

RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation and disease. Their binding sites cover more of the genome than coding exons; nevertheless, most noncoding variant prioritization methods only focus on transcriptional regulation. Here, we integrate the portfolio of ENCODE-RBP experiments to develop RADAR, a variant-scoring framework. RADAR uses conservation, RNA structure, network centrality, and motifs to provide an overall impact score. Then, it further incorporates tissue-specific inputs to highlight disease-specific variants. Our results demonstrate RADAR can successfully pinpoint variants, both somatic and germline, associated with RBP-function dysregulation, which cannot be found by most current prioritization methods, for example, variants affecting splicing.


Subjects
Genomics/methods , RNA Processing, Post-Transcriptional/genetics , RNA-Binding Proteins/genetics , Software , Breast Neoplasms/genetics , Humans
17.
Genome Biol ; 21(1): 150, 2020 06 22.
Article in English | MEDLINE | ID: mdl-32571363

ABSTRACT

Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA-link (where LDA stands for latent Dirichlet allocation) connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes.


Subjects
Asthma/microbiology , Computational Biology/methods , Microbiota , Sequence Analysis, RNA , Sputum/chemistry , Asthma/genetics , Case-Control Studies , Female , Humans , Male , Middle Aged , Sputum/cytology , Unsupervised Machine Learning
18.
Nat Commun ; 11(1): 3696, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32728046

ABSTRACT

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.


Subjects
Databases, Genetic , Genomics , Neoplasms/genetics , Cell Line, Tumor , Cell Transformation, Neoplastic/genetics , Gene Regulatory Networks , Humans , Mutation/genetics , Reproducibility of Results , Transcription Factors/metabolism
19.
Structure ; 27(9): 1469-1481.e3, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31279629

ABSTRACT

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.


Subjects
Computational Biology/methods , Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Databases, Protein , Drug Design , Humans , Ligands , Machine Learning , Models, Statistical , Molecular Docking Simulation , Protein Binding , Protein Conformation , Proteins/metabolism
20.
Nucleic Acids Res ; 34(Database issue): D664-7, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381954

ABSTRACT

Antisense oligonucleotide (ODN) technology is an important approach for the sequence-specific knockdown of gene expression. ODNs have been used as research tools in the post-genome era, as well as new types of therapeutic agents. Since finding effective target sites within RNA is difficult in antisense ODN design, various experimental methods and computational approaches have been proposed. To improve sharing of tested and published ODNs, valid and invalid ODNs reported in the literature are screened, collected and stored in AOBase. To date, AOBase contains approximately 700 ODNs against 46 target mRNAs. Entries can be explored via the TargetSearch and AOSearch web retrieval interfaces. AOBase is useful not only for ODN selection in gene function exploration, but also for mining rules and developing algorithms for rational ODN design. AOBase is freely accessible via http://www.bioit.org.cn/ao/aobase.


Subjects
Databases, Nucleic Acid , Oligonucleotides, Antisense/chemistry , Algorithms , Internet , RNA, Messenger/chemistry , Software , User-Computer Interface