Results 1-20 of 116
1.
Cell ; 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38986619

ABSTRACT

Posterior fossa group A (PFA) ependymoma is a lethal brain cancer diagnosed in infants and young children. The lack of driver events in the PFA linear genome led us to search its 3D genome for characteristic features. Here, we reconstructed 3D genomes from diverse childhood tumor types and uncovered a global topology in PFA that is highly reminiscent of stem and progenitor cells in a variety of human tissues. A remarkable feature present exclusively in PFA is the occurrence of type B ultra long-range interactions in PFAs (TULIPs), regions separated by great distances along the linear genome that interact with each other in 3D nuclear space with surprising strength. TULIPs occur in all PFA samples and recur at predictable genomic coordinates, and their formation is induced by expression of EZHIP. The universality of TULIPs across PFA samples suggests a conservation of molecular principles that could be exploited therapeutically.

2.
Cell ; 183(6): 1617-1633.e22, 2020 12 10.
Article in English | MEDLINE | ID: mdl-33259802

ABSTRACT

Histone H3.3 glycine 34 to arginine/valine (G34R/V) mutations drive deadly gliomas and show exquisite regional and temporal specificity, suggesting a developmental context permissive to their effects. Here we show that 50% of G34R/V tumors (n = 95) bear activating PDGFRA mutations that display strong selection pressure at recurrence. Although considered gliomas, G34R/V tumors actually arise in GSX2/DLX-expressing interneuron progenitors, where G34R/V mutations impair neuronal differentiation. The lineage of origin may facilitate PDGFRA co-option through a chromatin loop connecting PDGFRA to GSX2 regulatory elements, promoting PDGFRA overexpression and mutation. At the single-cell level, G34R/V tumors harbor dual neuronal/astroglial identity and lack oligodendroglial programs, actively repressed by GSX2/DLX-mediated cell fate specification. G34R/V may become dispensable for tumor maintenance, whereas mutant-PDGFRA is potently oncogenic. Collectively, our results open novel research avenues in deadly tumors. G34R/V gliomas are neuronal malignancies where interneuron progenitors are stalled in differentiation by G34R/V mutations and malignant gliogenesis is promoted by co-option of a potentially targetable pathway, PDGFRA signaling.


Subject(s)
Brain Neoplasms/genetics; Carcinogenesis/genetics; Glioma/genetics; Histones/genetics; Interneurons/metabolism; Mutation/genetics; Neural Stem Cells/metabolism; Receptor, Platelet-Derived Growth Factor alpha/genetics; Animals; Astrocytes/metabolism; Astrocytes/pathology; Brain Neoplasms/pathology; Carcinogenesis/pathology; Cell Lineage; Cellular Reprogramming/genetics; Chromatin/metabolism; Embryo, Mammalian/metabolism; Epigenesis, Genetic; Gene Expression Regulation, Neoplastic; Gene Silencing; Glioma/pathology; Histones/metabolism; Lysine/metabolism; Mice, Inbred C57BL; Models, Biological; Neoplasm Grading; Oligodendroglia/metabolism; Promoter Regions, Genetic/genetics; Prosencephalon/embryology; Receptor, Platelet-Derived Growth Factor alpha/metabolism; Transcription, Genetic; Transcriptome/genetics
3.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38291894

ABSTRACT

MOTIVATION: Up to 75% of the human genome encodes RNAs. The function of many non-coding RNAs relies on their ability to fold into 3D structures. Specifically, nucleotides inside secondary structure loops form non-canonical base pairs that help stabilize complex local 3D structures. These RNA 3D motifs can promote specific interactions with other molecules or serve as catalytic sites. RESULTS: We introduce PERFUMES, a computational pipeline to identify 3D motifs that can be associated with observable features. Given a set of RNA sequences with associated binary experimental measurements, PERFUMES searches for RNA 3D motifs using BayesPairing2 and extracts those that are over-represented in the set of positive sequences. It also conducts a thermodynamics analysis of the structural context that can support the interpretation of the predictions. We illustrate PERFUMES' usage on the SNRPA protein binding site, for which the tool retrieved both previously known binder motifs and new ones. AVAILABILITY AND IMPLEMENTATION: PERFUMES is an open-source Python package (https://jwgitlab.cs.mcgill.ca/arnaud_chol/perfumes).


Subject(s)
Perfume; Humans; Nucleic Acid Conformation; Nucleotide Motifs; Base Pairing; RNA/chemistry
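
PERFUMES extracts the RNA 3D motifs that are over-represented in the set of positive sequences. A minimal sketch of one way to score such over-representation, assuming a one-sided hypergeometric test on per-sequence presence or absence of each motif; the statistic PERFUMES actually uses is not stated in the abstract, motif detection with BayesPairing2 is replaced here by precomputed hit flags, and `enriched_motifs` and its `alpha` cutoff are hypothetical:

```python
from math import comb


def hypergeom_enrichment_pvalue(hits_pos: int, n_pos: int, hits_total: int, n_total: int) -> float:
    """One-sided P(X >= hits_pos) with X ~ Hypergeometric(n_total, hits_total, n_pos).

    n_total:    number of sequences (positives + negatives)
    hits_total: number of sequences containing the motif
    n_pos:      number of positive sequences
    hits_pos:   number of positive sequences containing the motif
    """
    upper = min(hits_total, n_pos)
    tail = sum(
        comb(hits_total, k) * comb(n_total - hits_total, n_pos - k)
        for k in range(hits_pos, upper + 1)
    )
    return tail / comb(n_total, n_pos)


def enriched_motifs(motif_hits, labels, alpha=0.05):
    """Return {motif: p-value} for motifs over-represented among positive sequences.

    motif_hits: dict motif name -> list of bools (motif found in sequence i)
    labels:     list of bools (True = positive experimental measurement)
    alpha:      hypothetical significance cutoff (no multiple-testing correction)
    """
    n_total = len(labels)
    n_pos = sum(labels)
    result = {}
    for name, hits in motif_hits.items():
        hits_total = sum(hits)
        hits_pos = sum(h and y for h, y in zip(hits, labels))
        p = hypergeom_enrichment_pvalue(hits_pos, n_pos, hits_total, n_total)
        if p < alpha:
            result[name] = p
    return result
```

With five positive and five negative sequences, a motif found in all positives and no negatives gets p = 1/252, while a motif split 3/2 between them gets p = 0.5. A real analysis over many candidate motifs would also need a multiple-testing correction.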
4.
Bioinformatics ; 39(Suppl 1): i386-i393, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387127

ABSTRACT

MOTIVATION: Accurately assessing contacts between DNA fragments inside the nucleus with Hi-C experiments is crucial for understanding the role of 3D genome organization in gene regulation. The task is challenging in part because Hi-C libraries require high sequencing depth to support high-resolution analyses. Most existing Hi-C data are collected with limited sequencing coverage, leading to poor estimates of chromatin interaction frequency. Current computational approaches to enhance Hi-C signals focus on the analysis of individual Hi-C datasets of interest, without taking advantage of the facts that (i) several hundred Hi-C contact maps are publicly available and (ii) the vast majority of local spatial organizations are conserved across multiple cell types. RESULTS: Here, we present RefHiC-SR, an attention-based deep learning framework that uses a reference panel of Hi-C datasets to enhance the resolution of Hi-C data from a given study sample. We compare RefHiC-SR against tools that do not use reference samples and find that RefHiC-SR outperforms other programs across different cell types and sequencing depths. It also enables high-accuracy mapping of structures such as loops and topologically associating domains. AVAILABILITY AND IMPLEMENTATION: https://github.com/BlanchetteLab/RefHiC.


Subject(s)
Cell Nucleus; Libraries; Chromatin/genetics
5.
Bioinformatics ; 38(Suppl 1): i299-i306, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758792

ABSTRACT

MOTIVATION: The computational prediction of regulatory function associated with a genomic sequence is of utmost importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include predicting transcription factor binding in DNA regulatory regions and predicting RNA-protein interactions in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of orthologous data available across multitudes of extant and ancestral genomes, which presents a ready opportunity to improve the accuracy of existing computational methods. RESULTS: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA-RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. In our experiments, PhyloPGM showed significant improvement over baselines such as the sequence-based RNA-RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, yet yields impressive results. AVAILABILITY AND IMPLEMENTATION: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics; Regulatory Sequences, Nucleic Acid; DNA; Genomics/methods; Humans; RNA; Sequence Analysis, DNA/methods
6.
BMC Cancer ; 22(1): 1297, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36503484

ABSTRACT

BACKGROUND: Juvenile Pilocytic Astrocytomas (JPAs) are among the most common pediatric brain tumors, and they are driven by aberrant activation of the mitogen-activated protein kinase (MAPK) signaling pathway. RAF fusions are the most common genetic alterations identified in JPAs, with the prototypical KIAA1549-BRAF fusion leading to loss of BRAF's auto-inhibitory domain and subsequent constitutive kinase activation. JPAs are highly vascular and show pervasive immune infiltration, which can lead to low tumor cell purity in clinical samples. This can result in gene fusions that are difficult to detect with conventional omics approaches, including RNA-Seq. METHODS: To address this, we applied RNA-Seq as well as linked-read whole-genome sequencing and in situ Hi-C as new approaches to detect and characterize low-frequency gene fusions at the genomic, transcriptomic and spatial levels. RESULTS: Integration of these datasets allowed the identification and detailed characterization of two novel BRAF fusion partners, PTPRZ1 and TOP2B, in addition to the canonical fusion with partner KIAA1549. Additionally, our Hi-C datasets enabled investigations of 3D genome architecture in JPAs, which showed a high level of correlation in 3D compartment annotations between JPAs compared to other pediatric tumors, and high similarity to normal adult astrocytes. We detected interactions between BRAF and its fusion partners exclusively in tumor samples containing BRAF fusions. CONCLUSIONS: We demonstrate the power of integrating multi-omic datasets to identify low-frequency fusions and characterize the JPA genome at high resolution. We suggest that linked reads and Hi-C could be used in the clinic for the detection and characterization of JPAs.


Subject(s)
Astrocytoma; Brain Neoplasms; Child; Adult; Humans; Multiomics; Proto-Oncogene Proteins B-raf/genetics; Oncogene Proteins, Fusion/genetics; Astrocytoma/pathology; Brain Neoplasms/pathology; Receptor-Like Protein Tyrosine Phosphatases, Class 5
7.
Nucleic Acids Res ; 48(D1): D166-D173, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31724725

ABSTRACT

Protein-RNA interactions are essential for controlling most aspects of RNA metabolism, including synthesis, processing, trafficking, stability and degradation. In vitro selection methods, such as RNAcompete and RNA Bind-n-Seq, have defined the consensus target motifs of hundreds of RNA-binding proteins (RBPs). However, readily available information about the distribution features of these motifs across full transcriptomes was hitherto lacking. Here, we introduce oRNAment (o RNA motifs enrichment in transcriptomes), a database that catalogues the putative motif instances of 223 RBPs, encompassing 453 motifs, in a transcriptome-wide fashion. The database covers 525 718 complete coding and non-coding RNA species across the transcriptomes of human and four prominent model organisms: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster and Mus musculus. The unique features of oRNAment include: (i) hosting of the most comprehensive mapping of RBP motif instances to date, with 421 133 612 putative binding sites described across five species; (ii) options for the user to filter the data according to a specific threshold; (iii) a user-friendly interface and efficient back-end allowing the rapid querying of the data through multiple angles (i.e. transcript, RBP, or sequence attributes) and (iv) generation of several interactive data visualization charts describing the results of user queries. oRNAment is freely available at http://rnabiology.ircm.qc.ca/oRNAment/.


Subject(s)
Databases, Genetic; RNA-Binding Proteins/metabolism; RNA/chemistry; Animals; Binding Sites; Caenorhabditis elegans/genetics; Drosophila melanogaster/genetics; Humans; Mice; Nucleotide Motifs; RNA/metabolism; RNA, Messenger/chemistry; RNA, Messenger/metabolism; Transcriptome; Zebrafish/genetics
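
oRNAment maps putative RBP motif instances across transcripts and lets users filter them by a score threshold. A minimal sketch of a threshold-based motif scan, assuming a position weight matrix (PWM) scored as log-odds against a uniform background; the scoring scheme, the toy PWM built from the hypothetical consensus `UGCAUG`, and the threshold are illustrative assumptions, not oRNAment's own procedure:

```python
from math import log2

BASES = "ACGU"


def pwm_from_consensus(consensus, match_prob=0.85):
    """Toy PWM: match_prob for the consensus base, the remainder split evenly (no zeros)."""
    other = (1.0 - match_prob) / (len(BASES) - 1)
    return [{b: (match_prob if b == c else other) for b in BASES} for c in consensus]


def window_score(window, pwm, background=0.25):
    """Log-odds score (bits) of a window against a uniform nucleotide background."""
    return sum(log2(column[base] / background) for base, column in zip(window, pwm))


def scan_transcript(seq, pwm, threshold):
    """Return (0-based start, score) for every window scoring at or above threshold.

    Assumes seq contains only uppercase A, C, G, U.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = window_score(seq[start:start + width], pwm)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

With this toy PWM an exact match scores about 10.6 bits and a single mismatch about 6.5 bits, so a threshold of 8.0 keeps only exact matches; lowering it admits near-matches, which is the trade-off a user-set threshold controls.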
8.
Bioinformatics ; 36(Suppl_1): i353-i361, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657367

ABSTRACT

MOTIVATION: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood. RESULTS: We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes. AVAILABILITY AND IMPLEMENTATION: Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning; Neural Networks, Computer; Benchmarking; Phylogeny; Sequence Alignment; Software
9.
Bioinformatics ; 36(Suppl_1): i276-i284, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657407

ABSTRACT

MOTIVATION: RNA-protein interactions are key effectors of post-transcriptional regulation. Significant experimental and bioinformatics efforts have been expended on characterizing protein binding mechanisms on the molecular level, and on highlighting the sequence and structural traits of RNA that impact the binding specificity for different proteins. Yet our ability to predict these interactions in silico remains relatively poor. RESULTS: In this study, we introduce RPI-Net, a graph neural network approach for RNA-protein interaction prediction. RPI-Net learns and exploits a graph representation of RNA molecules, yielding significant performance gains over existing state-of-the-art approaches. We also introduce an approach to rectify an important type of sequence bias caused by the RNase T1 enzyme used in many CLIP-Seq experiments, and we show that correcting this bias is essential in order to learn meaningful predictors and properly evaluate their accuracy. Finally, we provide new approaches to interpret the trained models and extract simple, biologically interpretable representations of the learned sequence and structural motifs. AVAILABILITY AND IMPLEMENTATION: Source code can be accessed at https://www.github.com/HarveyYan/RNAonGraph. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer; RNA; Protein Binding; Protein Structure, Secondary; RNA/metabolism; Software
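
RPI-Net works on a graph representation of RNA molecules. One common way to build such a graph, assumed here and not necessarily RPI-Net's exact construction, treats nucleotides as nodes and connects backbone neighbours plus base-paired positions read from a dot-bracket secondary structure (the helper names are hypothetical):

```python
def dot_bracket_to_edges(structure: str):
    """Return sorted undirected edges (i, j, kind) with i < j.

    kind is "backbone" for consecutive nucleotides and "pair" for base pairs
    given by matching parentheses in a dot-bracket string.
    """
    edges = [(i, i + 1, "backbone") for i in range(len(structure) - 1)]
    stack = []  # positions of unmatched '('
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            edges.append((stack.pop(), i, "pair"))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(edges)


def adjacency(n, edges):
    """Adjacency sets for an n-node graph, ignoring edge kinds."""
    adj = {i: set() for i in range(n)}
    for i, j, _ in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj
```

In a graph neural network the nucleotide identities (A, C, G, U) would typically become node features and the edge kind an edge feature; how RPI-Net encodes them is described in the paper and repository.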
10.
Bioinformatics ; 36(1): 212-220, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31197316

ABSTRACT

MOTIVATION: The genotype assignment problem consists of predicting, from the genotype of an individual, which of a known set of populations it originated from. The problem arises in a variety of contexts, including wildlife forensics, invasive species detection and biodiversity monitoring. Existing approaches perform well under ideal conditions but are sensitive to a variety of common violations of the assumptions they rely on. RESULTS: In this article, we introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Those features are then used as input to a Random Forests classifier. The classification accuracy was assessed on multiple published empirical SNP, microsatellite or consensus sequence datasets with wide ranges of size, geographical distribution and population structure, as well as on simulated datasets. It compared favorably against widely used assessment tests or mixture analysis methods such as STRUCTURE and Admixture, and against another machine-learning based approach using principal component analysis for dimensionality reduction. Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium. Moreover, the phylogenetic network approach estimates mixture proportions with good accuracy. AVAILABILITY AND IMPLEMENTATION: Mycorrhiza is released as an easy-to-use open-source Python package at github.com/jgeofil/mycorrhiza. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Computational Biology; Phylogeny; Software; Computational Biology/methods; Genotype; Genotyping Techniques; Machine Learning
11.
Bioinformatics ; 36(Suppl_2): i895-i902, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381838

ABSTRACT

MOTIVATION: The ability to develop robust machine-learning (ML) models is considered imperative to the adoption of ML techniques in biology and medicine. This challenge is particularly acute when the data available for training are not independent and identically distributed (iid), in which case trained models are vulnerable to out-of-distribution generalization problems. Of particular interest are problems where data correspond to observations made on phylogenetically related samples (e.g. antibiotic resistance data). RESULTS: We introduce DendroNet, a new approach to train neural networks in the context of evolutionary data. DendroNet explicitly accounts for the relatedness of the training/testing data, while allowing the model to evolve along the branches of the phylogenetic tree, hence accommodating potential changes in the rules that relate genotypes to phenotypes. Using simulated data, we demonstrate that DendroNet produces models that can be significantly better than non-phylogenetically aware approaches. DendroNet also outperforms other approaches at two biological tasks of significant practical importance: antibiotic resistance prediction in bacteria and trophic level prediction in fungi. AVAILABILITY AND IMPLEMENTATION: https://github.com/BlanchetteLab/DendroNet.


Subject(s)
Machine Learning; Neural Networks, Computer; Phylogeny; Supervised Machine Learning
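
DendroNet lets the model "evolve along the branches of the phylogenetic tree". A minimal sketch of that idea, assuming one logistic-regression weight vector per tree node and an L2 penalty on parent-child weight differences; this parameterization, the function names and the `lam` setting are illustrative assumptions, and the actual architecture and loss are in the DendroNet paper and repository:

```python
from math import exp, log


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def dendro_loss(weights, parent, samples, lam):
    """Mean logistic loss plus an L2 penalty tying each node's weights to its parent's.

    weights: dict node -> list of floats (one weight vector per tree node)
    parent:  dict node -> parent node, with the root mapped to None
    samples: list of (node, features, label) tuples with label in {0, 1};
             node is the tree node (e.g. species) the sample belongs to
    lam:     strength of the parent-child smoothness penalty (hypothetical)
    """
    data_loss = 0.0
    for node, x, y in samples:
        p = sigmoid(sum(w * xi for w, xi in zip(weights[node], x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        data_loss -= y * log(p) + (1 - y) * log(1.0 - p)
    penalty = 0.0
    for node, par in parent.items():
        if par is not None:
            penalty += sum((a - b) ** 2 for a, b in zip(weights[node], weights[par]))
    return data_loss / len(samples) + lam * penalty
```

Minimizing this loss over all node weights (for example by gradient descent) lets each lineage deviate from its ancestor only when its data justify it, while nodes with few samples are pulled toward their ancestors.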
12.
J Proteome Res ; 19(1): 18-27, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31738558

ABSTRACT

The PAQosome is an 11-subunit chaperone involved in the biogenesis of several human protein complexes. We show that ASDURF, a recently discovered upstream open reading frame (uORF) in the 5' UTR of ASNSD1 mRNA, encodes the 12th subunit of the PAQosome. ASDURF displays significant structural homology to β-prefoldins and assembles with the five known subunits of the prefoldin-like module of the PAQosome to form a heterohexameric prefoldin-like complex. A model of the PAQosome prefoldin-like module is presented. The data presented here provide an example of a eukaryotic uORF-encoded polypeptide whose function is not limited to cis-acting translational regulation of the downstream coding sequence, and highlight the importance of including alternative ORF products in proteomic studies.


Subject(s)
Molecular Chaperones; Proteomics; Humans; Molecular Chaperones/genetics; Open Reading Frames
13.
Mol Biol Evol ; 36(4): 766-783, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30698742

ABSTRACT

Genetic code deviations involving stop codons have been previously reported in mitochondrial genomes of several green plants (Viridiplantae), most notably chlorophyte algae (Chlorophyta). However, as changes in codon recognition from one amino acid to another are more difficult to infer, such changes might have gone unnoticed in particular lineages with high evolutionary rates that are otherwise prone to codon reassignments. To gain further insight into the evolution of the mitochondrial genetic code in green plants, we have conducted an in-depth study across mtDNAs from 51 green plants (32 chlorophytes and 19 streptophytes). Besides confirming known stop-to-sense reassignments, our study documents the first cases of sense-to-sense codon reassignments in Chlorophyta mtDNAs. In several Sphaeropleales, we report the decoding of AGG codons (normally arginine) as alanine, by tRNA(CCU) of various origins that carry the recognition signature for alanine tRNA synthetase. In Chromochloris, we identify tRNA variants decoding AGG as methionine and the synonymous codon CGG as leucine. Finally, we find strong evidence supporting the decoding of AUA codons (normally isoleucine) as methionine in Pycnococcus. Our results rely on a recently developed conceptual framework (CoreTracker) that predicts codon reassignments based on the disparity between DNA sequence (codons) and the derived protein sequence. These predictions are then validated by an evaluation of tRNA phylogeny, to identify the evolution of new tRNAs via gene duplication and loss, and structural modifications that lead to the assignment of new tRNA identities and a change in the genetic code.


Subject(s)
Chlorophyta/genetics; Evolution, Molecular; Genetic Code; Genome, Mitochondrial; Phylogeny; RNA, Transfer/genetics
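
CoreTracker predicts reassignments from the disparity between codons and the derived protein sequence. A minimal sketch of that core tally, assuming codon-aligned input: for a target codon, count the amino acids observed at the aligned positions in homologous proteins from other species, and flag the codon when the dominant amino acid differs from its standard-code meaning. CoreTracker's alignment construction, weighting, statistical filtering and tRNA validation are omitted, and the input layout and `min_fraction` cutoff are hypothetical:

```python
from collections import Counter

# Standard-code meaning of the codons discussed above (a small subset, for illustration).
STANDARD_CODE = {"AGG": "R", "CGG": "R", "AUA": "I"}


def residues_aligned_to_codon(codons, aligned_columns, codon):
    """Count amino acids aligned to each occurrence of `codon` in the focal gene.

    codons:          in-frame codons of the focal gene
    aligned_columns: aligned_columns[i] lists the amino acids at aligned position i
                     in homologous proteins of other species ('-' marks a gap)
    """
    counts = Counter()
    for c, column in zip(codons, aligned_columns):
        if c == codon:
            counts.update(aa for aa in column if aa != "-")
    return counts


def candidate_reassignment(counts, standard_aa, min_fraction=0.6):
    """Return the dominant amino acid if it differs from the standard one and is frequent enough."""
    total = sum(counts.values())
    if total == 0:
        return None
    aa, n = counts.most_common(1)[0]
    if aa != standard_aa and n / total >= min_fraction:
        return aa
    return None
```

In the toy case where most positions aligned to AGG carry alanine in the homologs, the function returns `"A"`, mirroring the AGG-to-alanine reassignment reported for several Sphaeropleales. Such a candidate would still need the tRNA-level evidence described in the abstract.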
14.
RNA ; 24(1): 98-113, 2018 01.
Article in English | MEDLINE | ID: mdl-29079635

ABSTRACT

Cells are highly asymmetrical, a feature that relies on the sorting of molecular constituents, including proteins, lipids, and nucleic acids, to distinct subcellular locales. The localization of RNA molecules is an important layer of gene regulation required to modulate localized cellular activities, although its global prevalence remains unclear. We combine biochemical cell fractionation with RNA-sequencing (CeFra-seq) analysis to assess the prevalence and conservation of RNA asymmetric distribution on a transcriptome-wide scale in Drosophila and human cells. This approach reveals that the majority (∼80%) of cellular RNA species are asymmetrically distributed, whether considering coding or noncoding transcript populations, in patterns that are broadly conserved evolutionarily. Notably, a large number of Drosophila and human long noncoding RNAs and circular RNAs display enriched levels within specific cytoplasmic compartments, suggesting that these RNAs fulfill extra-nuclear functions. Moreover, fraction-specific mRNA populations exhibit distinctive sequence characteristics. Comparative analysis of mRNA fractionation profiles with those of their encoded proteins reveals a general lack of correlation in subcellular distribution, marked by strong cases of asymmetry. However, coincident distribution profiles are observed for mRNA/protein pairs related to a variety of functional protein modules, suggesting complex regulatory inputs of RNA localization to cellular organization.


Subject(s)
RNA, Messenger/genetics; RNA, Untranslated/genetics; Animals; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Drosophila melanogaster; Hep G2 Cells; Humans; Protein Transport; RNA Transport; RNA, Double-Stranded/genetics; RNA, Double-Stranded/metabolism; RNA, Messenger/metabolism; RNA, Untranslated/metabolism; Species Specificity
15.
Bioinformatics ; 35(14): i117-i126, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510664

ABSTRACT

MOTIVATION: Genome rearrangements drastically change gene order along great stretches of a chromosome. There has been initial evidence that these apparently non-local events in the 1D sense may have breakpoints that are close in the 3D sense. We harness the power of the Double Cut and Join model of genome rearrangement, along with Hi-C chromosome conformation capture data to test this hypothesis between human and mouse. RESULTS: We devise novel statistical tests that show that indeed, rearrangement scenarios that transform the human into the mouse gene order are enriched for pairs of breakpoints that have frequent chromosome interactions. This is observed for both intra-chromosomal breakpoint pairs, as well as for inter-chromosomal pairs. For intra-chromosomal rearrangements, the enrichment exists from close (<20 Mb) to very distant (100 Mb) pairs. Further, the pattern exists across multiple cell lines in Hi-C data produced by different laboratories and at different stages of the cell cycle. We show that similarities in the contact frequencies between these many experiments contribute to the enrichment. We conclude that either (i) rearrangements usually involve breakpoints that are spatially close or (ii) there is selection against rearrangements that act on spatially distant breakpoints. AVAILABILITY AND IMPLEMENTATION: Our pipeline is freely available at https://bitbucket.org/thekswenson/locality. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatin; Genome; Software; Animals; Cell Cycle; Chromosome Breakpoints; Chromosomes; Humans; Mammals; Mice
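
The statistical tests described above ask whether breakpoint pairs have more Hi-C contacts than expected. A minimal sketch of that kind of test, assuming a one-sided permutation test on mean contact frequency against a background set of locus pairs; the published tests, DCJ scenario sampling and any distance matching are not reproduced, and `contact` stands in for a hypothetical Hi-C matrix lookup:

```python
import random


def permutation_pvalue(breakpoint_pairs, background_pairs, contact, n_perm=10000, seed=0):
    """One-sided permutation test for elevated contact frequency.

    Compares the mean contact frequency of breakpoint_pairs with that of
    equally sized random draws (without replacement) from background_pairs.
    contact: callable (a, b) -> contact frequency, e.g. a Hi-C matrix lookup.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    k = len(breakpoint_pairs)
    observed = sum(contact(a, b) for a, b in breakpoint_pairs) / k
    background_scores = [contact(a, b) for a, b in background_pairs]
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(background_scores, k)
        if sum(sample) / k >= observed:
            at_least += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least + 1) / (n_perm + 1)
```

Because Hi-C contact frequency decays strongly with genomic distance, a meaningful background for intra-chromosomal pairs should be matched for distance; this sketch leaves that choice to the caller.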
16.
Bioinformatics ; 35(14): i333-i342, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510698

ABSTRACT

MOTIVATION: Messenger RNA subcellular localization mechanisms play a crucial role in post-transcriptional gene regulation. This trafficking is mediated by trans-acting RNA-binding proteins interacting with cis-regulatory elements called zipcodes. While new sequencing-based technologies allow the high-throughput identification of RNAs localized to specific subcellular compartments, the precise mechanisms at play, and their dependency on specific sequence elements, remain poorly understood. RESULTS: We introduce RNATracker, a novel deep neural network built to predict, from their sequence alone, the distributions of mRNA transcripts over a predefined set of subcellular compartments. RNATracker integrates several state-of-the-art deep learning techniques (e.g. CNN, LSTM and attention layers) and can make use of both sequence and secondary structure information. We report on a variety of evaluations showing RNATracker's strong predictive power, which is significantly superior to a variety of baseline predictors. Despite its complexity, several aspects of the model can be isolated to yield valuable, testable mechanistic hypotheses, and to locate candidate zipcode sequences within transcripts. AVAILABILITY AND IMPLEMENTATION: Code and data can be accessed at https://www.github.com/HarveyYan/RNATracker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer; Deep Learning; Protein Structure, Secondary; RNA, Messenger
17.
Genet Epidemiol ; 42(3): 233-249, 2018 04.
Article in English | MEDLINE | ID: mdl-29423954

ABSTRACT

Predicting a phenotype and understanding which variables improve that prediction are two very challenging and overlapping problems in the analysis of high-dimensional (HD) data such as those arising from genomic and brain imaging studies. It is often believed that the number of truly important predictors is small relative to the total number of variables, making computational approaches to variable selection and dimension reduction extremely important. To reduce dimensionality, commonly used two-step methods first cluster the data in some way, and build models using cluster summaries to predict the phenotype. It is known that important exposure variables can alter correlation patterns between clusters of HD variables, that is, alter network properties of the variables. However, it is not well understood whether such altered clustering is informative in prediction. Here, assuming there is a binary exposure with such network-altering effects, we explore whether the use of exposure-dependent clustering relationships in dimension reduction can improve predictive modeling in a two-step framework. Hence, we propose a modeling framework called ECLUST to test this hypothesis, and evaluate its performance through extensive simulations. With ECLUST, we found improved prediction and variable selection performance compared to methods that do not consider the environment in the clustering step, or to methods that use the original data as features. We further illustrate this modeling framework through the analysis of three data sets from very different fields, each with HD data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.


Subject(s)
Disease/genetics; Models, Genetic; Adolescent; Algorithms; Child; Child, Preschool; Cluster Analysis; Computer Simulation; Databases as Topic; Epigenesis, Genetic; Gene Expression Regulation; Humans; Magnetic Resonance Imaging
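
ECLUST lets the exposure shape the clustering step of a two-step prediction model. A minimal sketch of that first step under stated assumptions: correlations are computed separately in exposed and unexposed samples, two variables are linked when their correlation differs by more than a hypothetical `threshold`, connected components stand in for the clustering procedure the eclust package actually uses, and each cluster is summarized by the average of its z-scored variables:

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)


def columns(rows):
    """Transpose a list of samples into a list of variables."""
    return [list(col) for col in zip(*rows)]


def exposure_dependent_clusters(X, exposure, threshold=0.5):
    """Group variables whose pairwise correlation changes with exposure.

    X: list of samples, each a list of p variable values; exposure: 0/1 per sample.
    Variables i, j are linked when |r_exposed - r_unexposed| > threshold;
    clusters are the connected components of that graph (union-find).
    """
    exposed = columns([r for r, e in zip(X, exposure) if e == 1])
    unexposed = columns([r for r, e in zip(X, exposure) if e == 0])
    p = len(X[0])
    root = list(range(p))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            diff = abs(pearson(exposed[i], exposed[j]) - pearson(unexposed[i], unexposed[j]))
            if diff > threshold:
                root[find(i)] = find(j)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_summaries(X, clusters):
    """One feature per cluster per sample: the mean of the member variables' z-scores."""
    z = []
    for col in columns(X):
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s if s > 0 else 0.0 for v in col])
    return [[mean(z[j][n] for j in cluster) for cluster in clusters] for n in range(len(X))]
```

The per-cluster features would then feed the second-step prediction model (for example, a penalized regression). For real analyses, the eclust CRAN package named in the abstract is the authors' implementation.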
18.
BMC Genomics ; 20(1): 162, 2019 Feb 28.
Article in English | MEDLINE | ID: mdl-30819105

ABSTRACT

BACKGROUND: Understanding how transcription occurs requires the integration of genome-wide and locus-specific information gleaned from robust technologies. Chromatin immunoprecipitation (ChIP) is a staple in gene expression studies, and while genome-wide methods are available, high-throughput approaches to analyze defined regions are lacking. RESULTS: Here, we present carbon copy-ChIP (2C-ChIP), a versatile, inexpensive, and high-throughput technique to quantitatively measure the abundance of DNA sequences in ChIP samples. This method combines ChIP with ligation-mediated amplification (LMA) and deep sequencing to probe large genomic regions of interest. 2C-ChIP recapitulates results from benchmark ChIP approaches. We applied 2C-ChIP to the HOXA cluster to find that a region where H3K27me3 and SUZ12 linger encodes HOXA-AS2, a long non-coding RNA that enhances gene expression during cellular differentiation. CONCLUSIONS: 2C-ChIP fills the need for a robust molecular biology tool designed to probe dedicated genomic regions in a high-throughput setting. The flexible nature of the 2C-ChIP approach allows rapid changes in experimental design at relatively low cost, making it a highly efficient method for chromatin analysis.


Subject(s)
Chromatin Immunoprecipitation/methods; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Cell Differentiation/genetics; Cells, Cultured; Epigenesis, Genetic; Gene Expression; Genes, Homeobox; Genomics; Humans; RNA, Long Noncoding/physiology; Real-Time Polymerase Chain Reaction
19.
Nucleic Acids Res ; 45(6): 2994-3005, 2017 04 07.
Article in English | MEDLINE | ID: mdl-28334773

ABSTRACT

Topologically associating domains (TADs) have been proposed to be the basic unit of chromosome folding and have been shown to play key roles in genome organization and gene regulation. Several different tools are available for TAD prediction, but their properties have never been thoroughly assessed. In this manuscript, we compare the output of seven different TAD prediction tools on two published Hi-C data sets. TAD predictions varied greatly between tools in number, size distribution and other biological properties. Assessed against a manual annotation of TADs, individual TAD boundary predictions were found to be quite reliable, but their assembly into complete TAD structures was much less so. In addition, many tools were sensitive to sequencing depth and resolution of the interaction frequency matrix. This manuscript provides users and designers of TAD prediction tools with information that will help guide the choice of tools and the interpretation of their predictions.


Subject(s)
Chromosomes/chemistry; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Software; Algorithms; Binding Sites; CCCTC-Binding Factor; Humans; Repressor Proteins/metabolism
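
The comparison assesses individual TAD boundary predictions against a manual annotation. A minimal sketch of one such score, assuming boundaries given as Hi-C bin indices and a hypothetical tolerance of one bin when matching predicted to annotated boundaries; the paper's exact matching criteria may differ:

```python
from bisect import bisect_left


def boundary_precision_recall(predicted, reference, tolerance=1):
    """Precision and recall of predicted TAD boundaries against reference boundaries.

    Both inputs are iterables of boundary positions in bins. A boundary counts as
    matched if the other set has a boundary within `tolerance` bins of it.
    """
    ref = sorted(set(reference))
    pred = sorted(set(predicted))

    def has_match(x, targets):
        # First target >= x - tolerance; matched if it is also <= x + tolerance.
        k = bisect_left(targets, x - tolerance)
        return k < len(targets) and targets[k] <= x + tolerance

    precision = sum(has_match(x, ref) for x in pred) / len(pred) if pred else 0.0
    recall = sum(has_match(y, pred) for y in ref) / len(ref) if ref else 0.0
    return precision, recall
```

For reference boundaries at bins 10, 25, 40 and 60 and predictions at 11, 24, 50, 61 and 80, this gives a precision of 0.6 and a recall of 0.75. It scores boundaries only; as the abstract notes, assembling boundaries into complete TADs is where tools were much less reliable, which calls for a domain-level comparison.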
20.
Nucleic Acids Res ; 45(2): 556-566, 2017 01 25.
Article in English | MEDLINE | ID: mdl-27899600

ABSTRACT

MicroRNAs (miRNAs) are short single-stranded RNA molecules derived from hairpin-forming precursors that play a crucial role as post-transcriptional regulators in eukaryotes and viruses. In the past years, many microRNA target genes (MTGs) have been identified experimentally. However, because of the high costs of experimental approaches, target gene databases remain incomplete. Although several target prediction programs have been developed in recent years to identify MTGs in silico, their specificity and sensitivity remain low. Here, we propose a new approach called MirAncesTar, which uses ancestral genome reconstruction to boost the accuracy of existing MTG prediction tools for human miRNAs. For each miRNA and each putative human target UTR, our algorithm makes use of existing prediction tools to identify putative target sites in the human UTR, as well as in its mammalian orthologs and inferred ancestral sequences. It then evaluates evidence of selective pressure to maintain target site counts (rather than sequences), accounting for the possibility of target site turnover. Finally, it integrates this measure with several simpler ones using a logistic regression predictor. MirAncesTar improves the accuracy of existing MTG predictors by 26% to 157%. Source code and prediction results for human miRNAs, as well as supporting evolutionary data, are available at http://cs.mcgill.ca/~blanchem/mirancestar.


Subject(s)
Computational Biology/methods; MicroRNAs/genetics; RNA Interference; RNA, Messenger/genetics; Algorithms; Animals; Binding Sites; Computer Simulation; Humans; MicroRNAs/chemistry; RNA, Messenger/chemistry