Results 1 - 20 of 115
1.
Cell ; 183(6): 1617-1633.e22, 2020 12 10.
Article in English | MEDLINE | ID: mdl-33259802

ABSTRACT

Histone H3.3 glycine 34 to arginine/valine (G34R/V) mutations drive deadly gliomas and show exquisite regional and temporal specificity, suggesting a developmental context permissive to their effects. Here we show that 50% of G34R/V tumors (n = 95) bear activating PDGFRA mutations that display strong selection pressure at recurrence. Although considered gliomas, G34R/V tumors actually arise in GSX2/DLX-expressing interneuron progenitors, where G34R/V mutations impair neuronal differentiation. The lineage of origin may facilitate PDGFRA co-option through a chromatin loop connecting PDGFRA to GSX2 regulatory elements, promoting PDGFRA overexpression and mutation. At the single-cell level, G34R/V tumors harbor dual neuronal/astroglial identity and lack oligodendroglial programs, actively repressed by GSX2/DLX-mediated cell fate specification. G34R/V may become dispensable for tumor maintenance, whereas mutant-PDGFRA is potently oncogenic. Collectively, our results open novel research avenues in deadly tumors. G34R/V gliomas are neuronal malignancies where interneuron progenitors are stalled in differentiation by G34R/V mutations and malignant gliogenesis is promoted by co-option of a potentially targetable pathway, PDGFRA signaling.


Subjects
Brain Neoplasms/genetics , Carcinogenesis/genetics , Glioma/genetics , Histones/genetics , Interneurons/metabolism , Mutation/genetics , Neural Stem Cells/metabolism , Receptor, Platelet-Derived Growth Factor alpha/genetics , Animals , Astrocytes/metabolism , Astrocytes/pathology , Brain Neoplasms/pathology , Carcinogenesis/pathology , Cell Lineage , Cellular Reprogramming/genetics , Chromatin/metabolism , Embryo, Mammalian/metabolism , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Gene Silencing , Glioma/pathology , Histones/metabolism , Lysine/metabolism , Mice, Inbred C57BL , Models, Biological , Neoplasm Grading , Oligodendroglia/metabolism , Promoter Regions, Genetic/genetics , Prosencephalon/embryology , Receptor, Platelet-Derived Growth Factor alpha/metabolism , Transcription, Genetic , Transcriptome/genetics
2.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38291894

ABSTRACT

MOTIVATION: Up to 75% of the human genome encodes RNAs. The function of many non-coding RNAs relies on their ability to fold into 3D structures. Specifically, nucleotides inside secondary structure loops form non-canonical base pairs that help stabilize complex local 3D structures. These RNA 3D motifs can promote specific interactions with other molecules or serve as catalytic sites. RESULTS: We introduce PERFUMES, a computational pipeline to identify 3D motifs that can be associated with observable features. Given a set of RNA sequences with associated binary experimental measurements, PERFUMES searches for RNA 3D motifs using BayesPairing2 and extracts those that are over-represented in the set of positive sequences. It also conducts a thermodynamics analysis of the structural context that can support the interpretation of the predictions. We illustrate PERFUMES' usage on the SNRPA protein binding site, for which the tool retrieved both previously known binder motifs and new ones. AVAILABILITY AND IMPLEMENTATION: PERFUMES is an open-source Python package (https://jwgitlab.cs.mcgill.ca/arnaud_chol/perfumes).


Subjects
Perfume , Humans , Nucleic Acid Conformation , Nucleotide Motifs , Base Pairing , RNA/chemistry
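
The PERFUMES abstract above describes extracting motifs that are over-represented among positive sequences but does not name the statistical test. As a minimal sketch only, assuming a one-sided hypergeometric test (an assumption, not the published method), the enrichment of one motif could be scored as follows. The function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(pos_with_motif, n_pos, total_with_motif, n_total):
    """One-sided hypergeometric tail P(X >= pos_with_motif).

    Hypothetical helper: n_total sequences were scanned, total_with_motif of
    them carry the motif, and n_pos of them are labelled positive.
    """
    upper = min(n_pos, total_with_motif)
    denom = comb(n_total, n_pos)
    # Sum the probability of seeing k motif-bearing positives for every k at
    # least as extreme as the observed count.
    tail = sum(
        comb(total_with_motif, k) * comb(n_total - total_with_motif, n_pos - k)
        for k in range(pos_with_motif, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers: 10 sequences, 5 positive, 3 carry the motif, all 3 positive.
    print(hypergeom_enrichment_pvalue(3, 5, 3, 10))
```

A small p-value suggests the motif occurs among positives more often than chance alone would explain; with many motifs tested, a multiple-testing correction would also be needed.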
3.
Bioinformatics ; 39(39 Suppl 1): i386-i393, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387127

ABSTRACT

MOTIVATION: Accurately assessing contacts between DNA fragments inside the nucleus with Hi-C experiments is crucial for understanding the role of 3D genome organization in gene regulation. This challenging task is due in part to the high sequencing depth of Hi-C libraries required to support high-resolution analyses. Most existing Hi-C data are collected with limited sequencing coverage, leading to poor chromatin interaction frequency estimation. Current computational approaches to enhance Hi-C signals focus on the analysis of individual Hi-C datasets of interest, without taking advantage of the facts that (i) several hundred Hi-C contact maps are publicly available and (ii) the vast majority of local spatial organizations are conserved across multiple cell types. RESULTS: Here, we present RefHiC-SR, an attention-based deep learning framework that uses a reference panel of Hi-C datasets to facilitate the enhancement of Hi-C data resolution of a given study sample. We compare RefHiC-SR against tools that do not use reference samples and find that RefHiC-SR outperforms other programs across different cell types and sequencing depths. It also enables high-accuracy mapping of structures such as loops and topologically associating domains. AVAILABILITY AND IMPLEMENTATION: https://github.com/BlanchetteLab/RefHiC.


Subjects
Cell Nucleus , Libraries , Chromatin/genetics
4.
Bioinformatics ; 38(Suppl 1): i299-i306, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758792

ABSTRACT

MOTIVATION: The computational prediction of regulatory function associated with a genomic sequence is of utmost importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA-protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. RESULTS: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA-RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA-RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle, easy to implement, and yet yields impressive results. AVAILABILITY AND IMPLEMENTATION: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics , Regulatory Sequences, Nucleic Acid , DNA , Genomics/methods , Humans , RNA , Sequence Analysis, DNA/methods
5.
BMC Cancer ; 22(1): 1297, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36503484

ABSTRACT

BACKGROUND: Juvenile Pilocytic Astrocytomas (JPAs) are one of the most common pediatric brain tumors, and they are driven by aberrant activation of the mitogen-activated protein kinase (MAPK) signaling pathway. RAF-fusions are the most common genetic alterations identified in JPAs, with the prototypical KIAA1549-BRAF fusion leading to loss of BRAF's auto-inhibitory domain and subsequent constitutive kinase activation. JPAs are highly vascular and show pervasive immune infiltration, which can lead to low tumor cell purity in clinical samples. This can result in gene fusions that are difficult to detect with conventional omics approaches including RNA-Seq. METHODS: To this end, we applied RNA-Seq as well as linked-read whole-genome sequencing and in situ Hi-C as new approaches to detect and characterize low-frequency gene fusions at the genomic, transcriptomic and spatial level. RESULTS: Integration of these datasets allowed the identification and detailed characterization of two novel BRAF fusion partners, PTPRZ1 and TOP2B, in addition to the canonical fusion with partner KIAA1549. Additionally, our Hi-C datasets enabled investigations of 3D genome architecture in JPAs which showed a high level of correlation in 3D compartment annotations between JPAs compared to other pediatric tumors, and high similarity to normal adult astrocytes. We detected interactions between BRAF and its fusion partners exclusively in tumor samples containing BRAF fusions. CONCLUSIONS: We demonstrate the power of integrating multi-omic datasets to identify low frequency fusions and characterize the JPA genome at high resolution. We suggest that linked-reads and Hi-C could be used in the clinic for the detection and characterization of JPAs.


Subjects
Astrocytoma , Brain Neoplasms , Child , Adult , Humans , Multiomics , Proto-Oncogene Proteins B-raf/genetics , Oncogene Proteins, Fusion/genetics , Astrocytoma/pathology , Brain Neoplasms/pathology , Receptor-Like Protein Tyrosine Phosphatases, Class 5
6.
Nucleic Acids Res ; 48(D1): D166-D173, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31724725

ABSTRACT

Protein-RNA interactions are essential for controlling most aspects of RNA metabolism, including synthesis, processing, trafficking, stability and degradation. In vitro selection methods, such as RNAcompete and RNA Bind-n-Seq, have defined the consensus target motifs of hundreds of RNA-binding proteins (RBPs). However, readily available information about the distribution features of these motifs across full transcriptomes was hitherto lacking. Here, we introduce oRNAment (o RNA motifs enrichment in transcriptomes), a database that catalogues the putative motif instances of 223 RBPs, encompassing 453 motifs, in a transcriptome-wide fashion. The database covers 525 718 complete coding and non-coding RNA species across the transcriptomes of human and four prominent model organisms: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster and Mus musculus. The unique features of oRNAment include: (i) hosting of the most comprehensive mapping of RBP motif instances to date, with 421 133 612 putative binding sites described across five species; (ii) options for the user to filter the data according to a specific threshold; (iii) a user-friendly interface and efficient back-end allowing the rapid querying of the data through multiple angles (i.e. transcript, RBP, or sequence attributes) and (iv) generation of several interactive data visualization charts describing the results of user queries. oRNAment is freely available at http://rnabiology.ircm.qc.ca/oRNAment/.


Subjects
Databases, Genetic , RNA-Binding Proteins/metabolism , RNA/chemistry , Animals , Binding Sites , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Humans , Mice , Nucleotide Motifs , RNA/metabolism , RNA, Messenger/chemistry , RNA, Messenger/metabolism , Transcriptome , Zebrafish/genetics
7.
Bioinformatics ; 36(Suppl_1): i353-i361, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657367

ABSTRACT

MOTIVATION: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood. RESULTS: We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes. AVAILABILITY AND IMPLEMENTATION: Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Machine Learning , Neural Networks, Computer , Benchmarking , Phylogeny , Sequence Alignment , Software
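
The EvoLSTM abstract above mentions predicting mutation probabilities at each position from the 14 flanking nucleotides. The abstract does not say how those flanks are arranged or how sequence ends are handled, so the sketch below assumes 7 nucleotides on each side and pads the ends with "N". Both choices, and the function name, are illustrative assumptions rather than the authors' implementation.

```python
def flanking_contexts(seq, flank=7, pad="N"):
    """Return one context window per position of seq.

    Each window holds the focal nucleotide plus `flank` nucleotides on each
    side (2 * flank flanking nucleotides in total). Positions near the ends
    are padded with `pad`; this padding rule is an assumption.
    """
    padded = pad * flank + seq + pad * flank
    width = 2 * flank + 1
    return [padded[i:i + width] for i in range(len(seq))]


if __name__ == "__main__":
    for window in flanking_contexts("ACGTTGCA", flank=2):
        print(window)
```

Windows like these could serve as per-position inputs to any context-dependent mutation model, not only a recurrent network.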
8.
Bioinformatics ; 36(Suppl_1): i276-i284, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657407

ABSTRACT

MOTIVATION: RNA-protein interactions are key effectors of post-transcriptional regulation. Significant experimental and bioinformatics efforts have been expended on characterizing protein binding mechanisms on the molecular level, and on highlighting the sequence and structural traits of RNA that impact the binding specificity for different proteins. Yet our ability to predict these interactions in silico remains relatively poor. RESULTS: In this study, we introduce RPI-Net, a graph neural network approach for RNA-protein interaction prediction. RPI-Net learns and exploits a graph representation of RNA molecules, yielding significant performance gains over existing state-of-the-art approaches. We also introduce an approach to rectify an important type of sequence bias caused by the RNase T1 enzyme used in many CLIP-Seq experiments, and we show that correcting this bias is essential in order to learn meaningful predictors and properly evaluate their accuracy. Finally, we provide new approaches to interpret the trained models and extract simple, biologically interpretable representations of the learned sequence and structural motifs. AVAILABILITY AND IMPLEMENTATION: Source code can be accessed at https://www.github.com/HarveyYan/RNAonGraph. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neural Networks, Computer , RNA , Protein Binding , Protein Structure, Secondary , RNA/metabolism , Software
9.
Bioinformatics ; 36(1): 212-220, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31197316

ABSTRACT

MOTIVATION: The genotype assignment problem consists of predicting, from the genotype of an individual, which of a known set of populations it originated from. The problem arises in a variety of contexts, including wildlife forensics, invasive species detection and biodiversity monitoring. Existing approaches perform well under ideal conditions but are sensitive to a variety of common violations of the assumptions they rely on. RESULTS: In this article, we introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Those features are then used as input to a Random Forests classifier. The classification accuracy was assessed on multiple published empirical SNP, microsatellite or consensus sequence datasets with wide ranges of size, geographical distribution and population structure and on simulated datasets. It compared favorably against widely used assessment tests or mixture analysis methods such as STRUCTURE and Admixture, and against another machine-learning based approach using principal component analysis for dimensionality reduction. Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium. Moreover, the phylogenetic network approach estimates mixture proportions with good accuracy. AVAILABILITY AND IMPLEMENTATION: Mycorrhiza is released as an easy-to-use open-source Python package at github.com/jgeofil/mycorrhiza. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Computational Biology , Phylogeny , Software , Computational Biology/methods , Genotype , Genotyping Techniques , Machine Learning
10.
Bioinformatics ; 36(Suppl_2): i895-i902, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381838

ABSTRACT

MOTIVATION: The ability to develop robust machine-learning (ML) models is considered imperative to the adoption of ML techniques in biology and medicine. This challenge is particularly acute when data available for training is not independent and identically distributed (iid), in which case trained models are vulnerable to out-of-distribution generalization problems. Of particular interest are problems where data correspond to observations made on phylogenetically related samples (e.g. antibiotic resistance data). RESULTS: We introduce DendroNet, a new approach to train neural networks in the context of evolutionary data. DendroNet explicitly accounts for the relatedness of the training/testing data, while allowing the model to evolve along the branches of the phylogenetic tree, hence accommodating potential changes in the rules that relate genotypes to phenotypes. Using simulated data, we demonstrate that DendroNet produces models that can be significantly better than non-phylogenetically aware approaches. DendroNet also outperforms other approaches at two biological tasks of significant practical importance: antibiotic resistance prediction in bacteria and trophic level prediction in fungi. AVAILABILITY AND IMPLEMENTATION: https://github.com/BlanchetteLab/DendroNet.


Subjects
Machine Learning , Neural Networks, Computer , Phylogeny , Supervised Machine Learning
11.
J Proteome Res ; 19(1): 18-27, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31738558

ABSTRACT

The PAQosome is an 11-subunit chaperone involved in the biogenesis of several human protein complexes. We show that ASDURF, a recently discovered upstream open reading frame (uORF) in the 5' UTR of ASNSD1 mRNA, encodes the 12th subunit of the PAQosome. ASDURF displays significant structural homology to β-prefoldins and assembles with the five known subunits of the prefoldin-like module of the PAQosome to form a heterohexameric prefoldin-like complex. A model of the PAQosome prefoldin-like module is presented. The data presented here provide an example of a eukaryotic uORF-encoded polypeptide whose function is not limited to cis-acting translational regulation of downstream coding sequence and highlight the importance of including alternative ORF products in proteomic studies.


Subjects
Molecular Chaperones , Proteomics , Humans , Molecular Chaperones/genetics , Open Reading Frames
12.
Mol Biol Evol ; 36(4): 766-783, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30698742

ABSTRACT

Genetic code deviations involving stop codons have been previously reported in mitochondrial genomes of several green plants (Viridiplantae), most notably chlorophyte algae (Chlorophyta). However, as changes in codon recognition from one amino acid to another are more difficult to infer, such changes might have gone unnoticed in particular lineages with high evolutionary rates that are otherwise prone to codon reassignments. To gain further insight into the evolution of the mitochondrial genetic code in green plants, we have conducted an in-depth study across mtDNAs from 51 green plants (32 chlorophytes and 19 streptophytes). Besides confirming known stop-to-sense reassignments, our study documents the first cases of sense-to-sense codon reassignments in Chlorophyta mtDNAs. In several Sphaeropleales, we report the decoding of AGG codons (normally arginine) as alanine, by tRNA(CCU) of various origins that carry the recognition signature for alanine tRNA synthetase. In Chromochloris, we identify tRNA variants decoding AGG as methionine and the synonymous codon CGG as leucine. Finally, we find strong evidence supporting the decoding of AUA codons (normally isoleucine) as methionine in Pycnococcus. Our results rely on a recently developed conceptual framework (CoreTracker) that predicts codon reassignments based on the disparity between DNA sequence (codons) and the derived protein sequence. These predictions are then validated by an evaluation of tRNA phylogeny, to identify the evolution of new tRNAs via gene duplication and loss, and structural modifications that lead to the assignment of new tRNA identities and a change in the genetic code.


Subjects
Chlorophyta/genetics , Evolution, Molecular , Genetic Code , Genome, Mitochondrial , Phylogeny , RNA, Transfer/genetics
13.
RNA ; 24(1): 98-113, 2018 01.
Article in English | MEDLINE | ID: mdl-29079635

ABSTRACT

Cells are highly asymmetrical, a feature that relies on the sorting of molecular constituents, including proteins, lipids, and nucleic acids, to distinct subcellular locales. The localization of RNA molecules is an important layer of gene regulation required to modulate localized cellular activities, although its global prevalence remains unclear. We combine biochemical cell fractionation with RNA-sequencing (CeFra-seq) analysis to assess the prevalence and conservation of RNA asymmetric distribution on a transcriptome-wide scale in Drosophila and human cells. This approach reveals that the majority (∼80%) of cellular RNA species are asymmetrically distributed, whether considering coding or noncoding transcript populations, in patterns that are broadly conserved evolutionarily. Notably, a large number of Drosophila and human long noncoding RNAs and circular RNAs display enriched levels within specific cytoplasmic compartments, suggesting that these RNAs fulfill extra-nuclear functions. Moreover, fraction-specific mRNA populations exhibit distinctive sequence characteristics. Comparative analysis of mRNA fractionation profiles with that of their encoded proteins reveals a general lack of correlation in subcellular distribution, marked by strong cases of asymmetry. However, coincident distribution profiles are observed for mRNA/protein pairs related to a variety of functional protein modules, suggesting complex regulatory inputs of RNA localization to cellular organization.


Subjects
RNA, Messenger/genetics , RNA, Untranslated/genetics , Animals , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster , Hep G2 Cells , Humans , Protein Transport , RNA Transport , RNA, Double-Stranded/genetics , RNA, Double-Stranded/metabolism , RNA, Messenger/metabolism , RNA, Untranslated/metabolism , Species Specificity
14.
Bioinformatics ; 35(14): i117-i126, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510664

ABSTRACT

MOTIVATION: Genome rearrangements drastically change gene order along great stretches of a chromosome. There has been initial evidence that these apparently non-local events in the 1D sense may have breakpoints that are close in the 3D sense. We harness the power of the Double Cut and Join model of genome rearrangement, along with Hi-C chromosome conformation capture data to test this hypothesis between human and mouse. RESULTS: We devise novel statistical tests that show that indeed, rearrangement scenarios that transform the human into the mouse gene order are enriched for pairs of breakpoints that have frequent chromosome interactions. This is observed for both intra-chromosomal breakpoint pairs, as well as for inter-chromosomal pairs. For intra-chromosomal rearrangements, the enrichment exists from close (<20 Mb) to very distant (100 Mb) pairs. Further, the pattern exists across multiple cell lines in Hi-C data produced by different laboratories and at different stages of the cell cycle. We show that similarities in the contact frequencies between these many experiments contribute to the enrichment. We conclude that either (i) rearrangements usually involve breakpoints that are spatially close or (ii) there is selection against rearrangements that act on spatially distant breakpoints. AVAILABILITY AND IMPLEMENTATION: Our pipeline is freely available at https://bitbucket.org/thekswenson/locality. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin , Genome , Software , Animals , Cell Cycle , Chromosome Breakpoints , Chromosomes , Humans , Mammals , Mice
15.
Bioinformatics ; 35(14): i333-i342, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510698

ABSTRACT

MOTIVATION: Messenger RNA subcellular localization mechanisms play a crucial role in post-transcriptional gene regulation. This trafficking is mediated by trans-acting RNA-binding proteins interacting with cis-regulatory elements called zipcodes. While new sequencing-based technologies allow the high-throughput identification of RNAs localized to specific subcellular compartments, the precise mechanisms at play, and their dependency on specific sequence elements, remain poorly understood. RESULTS: We introduce RNATracker, a novel deep neural network built to predict, from their sequence alone, the distributions of mRNA transcripts over a predefined set of subcellular compartments. RNATracker integrates several state-of-the-art deep learning techniques (e.g. CNN, LSTM and attention layers) and can make use of both sequence and secondary structure information. We report on a variety of evaluations showing RNATracker's strong predictive power, which is significantly superior to a variety of baseline predictors. Despite its complexity, several aspects of the model can be isolated to yield valuable, testable mechanistic hypotheses, and to locate candidate zipcode sequences within transcripts. AVAILABILITY AND IMPLEMENTATION: Code and data can be accessed at https://www.github.com/HarveyYan/RNATracker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neural Networks, Computer , Deep Learning , Protein Structure, Secondary , RNA, Messenger
16.
Genet Epidemiol ; 42(3): 233-249, 2018 04.
Article in English | MEDLINE | ID: mdl-29423954

ABSTRACT

Predicting a phenotype and understanding which variables improve that prediction are two very challenging and overlapping problems in the analysis of high-dimensional (HD) data such as those arising from genomic and brain imaging studies. It is often believed that the number of truly important predictors is small relative to the total number of variables, making computational approaches to variable selection and dimension reduction extremely important. To reduce dimensionality, commonly used two-step methods first cluster the data in some way, and build models using cluster summaries to predict the phenotype. It is known that important exposure variables can alter correlation patterns between clusters of HD variables, that is, alter network properties of the variables. However, it is not well understood whether such altered clustering is informative in prediction. Here, assuming there is a binary exposure with such network-altering effects, we explore whether the use of exposure-dependent clustering relationships in dimension reduction can improve predictive modeling in a two-step framework. Hence, we propose a modeling framework called ECLUST to test this hypothesis, and evaluate its performance through extensive simulations. With ECLUST, we found improved prediction and variable selection performance compared to methods that do not consider the environment in the clustering step, or to methods that use the original data as features. We further illustrate this modeling framework through the analysis of three data sets from very different fields, each with HD data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.


Subjects
Disease/genetics , Models, Genetic , Adolescent , Algorithms , Child , Child, Preschool , Cluster Analysis , Computer Simulation , Databases as Topic , Epigenesis, Genetic , Gene Expression Regulation , Humans , Magnetic Resonance Imaging
17.
BMC Genomics ; 20(1): 162, 2019 Feb 28.
Article in English | MEDLINE | ID: mdl-30819105

ABSTRACT

BACKGROUND: Understanding how transcription occurs requires the integration of genome-wide and locus-specific information gleaned from robust technologies. Chromatin immunoprecipitation (ChIP) is a staple in gene expression studies, and while genome-wide methods are available, high-throughput approaches to analyze defined regions are lacking. RESULTS: Here, we present carbon copy-ChIP (2C-ChIP), a versatile, inexpensive, and high-throughput technique to quantitatively measure the abundance of DNA sequences in ChIP samples. This method combines ChIP with ligation-mediated amplification (LMA) and deep sequencing to probe large genomic regions of interest. 2C-ChIP recapitulates results from benchmark ChIP approaches. We applied 2C-ChIP to the HOXA cluster to find that a region where H3K27me3 and SUZ12 linger encodes HOXA-AS2, a long non-coding RNA that enhances gene expression during cellular differentiation. CONCLUSIONS: 2C-ChIP fills the need for a robust molecular biology tool designed to probe dedicated genomic regions in a high-throughput setting. The flexible nature of the 2C-ChIP approach allows rapid changes in experimental design at relatively low cost, making it a highly efficient method for chromatin analysis.


Subjects
Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Cell Differentiation/genetics , Cells, Cultured , Epigenesis, Genetic , Gene Expression , Genes, Homeobox , Genomics , Humans , RNA, Long Noncoding/physiology , Real-Time Polymerase Chain Reaction
18.
Nucleic Acids Res ; 45(6): 2994-3005, 2017 04 07.
Article in English | MEDLINE | ID: mdl-28334773

ABSTRACT

Topologically associating domains (TADs) have been proposed to be the basic unit of chromosome folding and have been shown to play key roles in genome organization and gene regulation. Several different tools are available for TAD prediction, but their properties have never been thoroughly assessed. In this manuscript, we compare the output of seven different TAD prediction tools on two published Hi-C data sets. TAD predictions varied greatly between tools in number, size distribution and other biological properties. Assessed against a manual annotation of TADs, individual TAD boundary predictions were found to be quite reliable, but their assembly into complete TAD structures was much less so. In addition, many tools were sensitive to sequencing depth and resolution of the interaction frequency matrix. This manuscript provides users and designers of TAD prediction tools with information that will help guide the choice of tools and the interpretation of their predictions.


Subjects
Chromosomes/chemistry , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Algorithms , Binding Sites , CCCTC-Binding Factor , Humans , Repressor Proteins/metabolism
19.
Nucleic Acids Res ; 45(2): 556-566, 2017 01 25.
Article in English | MEDLINE | ID: mdl-27899600

ABSTRACT

MicroRNAs (miRNAs) are short single-stranded RNA molecules derived from hairpin-forming precursors that play a crucial role as post-transcriptional regulators in eukaryotes and viruses. In the past years, many microRNA target genes (MTGs) have been identified experimentally. However, because of the high costs of experimental approaches, target gene databases remain incomplete. Although several target prediction programs have been developed in recent years to identify MTGs in silico, their specificity and sensitivity remain low. Here, we propose a new approach called MirAncesTar, which uses ancestral genome reconstruction to boost the accuracy of existing MTG prediction tools for human miRNAs. For each miRNA and each putative human target UTR, our algorithm makes use of existing prediction tools to identify putative target sites in the human UTR, as well as in its mammalian orthologs and inferred ancestral sequences. It then evaluates evidence in support of selective pressure to maintain target site counts (rather than sequences), accounting for the possibility of target site turnover. It finally integrates this measure with several simpler ones using a logistic regression predictor. MirAncesTar improves the accuracy of existing MTG predictors by 26% to 157%. Source code and prediction results for human miRNAs, as well as supporting evolutionary data are available at http://cs.mcgill.ca/~blanchem/mirancestar.


Subjects
Computational Biology/methods , MicroRNAs/genetics , RNA Interference , RNA, Messenger/genetics , Algorithms , Animals , Binding Sites , Computer Simulation , Humans , MicroRNAs/chemistry , RNA, Messenger/chemistry
20.
Nucleic Acids Res ; 45(18): 10415-10427, 2017 Oct 13.
Article in English | MEDLINE | ID: mdl-28977652

ABSTRACT

Biological networks are rich representations of the relationships between entities such as genes or proteins and have become increasingly complete thanks to various high-throughput network mapping experimental approaches. Here, we propose a method to use such networks to guide the search for functional sequence motifs. Specifically, we introduce Local Enrichment of Sequence Motifs in biological Networks (LESMoN), an enumerative motif discovery algorithm that identifies 5' untranslated region (UTR) sequence motifs whose associated proteins form unexpectedly dense clusters in a given biological network. When applied to the human protein-protein interaction network from BioGRID, LESMoN identifies several highly significant 5' UTR sequence motifs, including both previously known motifs and uncharacterized ones. The vast majority of these motifs are evolutionarily conserved and the genes containing them are significantly enriched for various gene ontology terms suggesting new associations between 5' UTR motifs and a number of biological processes. We validate in vivo the role in protein expression regulation of three motifs identified by LESMoN.


Subjects
5' Untranslated Regions/genetics , Algorithms , Computational Biology/methods , Gene Expression Regulation , Gene Regulatory Networks , Regulatory Elements, Transcriptional , Binding Sites/genetics , Gene Ontology , Genetic Association Studies , Humans , Mutation , Protein Interaction Maps/genetics , Transcription Factors/metabolism