1.
Nucleic Acids Res ; 48(8): 4066-4080, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32182345

ABSTRACT

We introduce an R package and a web-based visualization tool for the representation, analysis and integration of epigenomic data in the context of 3D chromatin interaction networks. GARDEN-NET allows for the projection of user-submitted genomic features on pre-loaded chromatin interaction networks, exploiting the functionalities of the ChAseR package to explore the features in combination with chromatin network topology properties. We demonstrate the approach using published epigenomic and chromatin structure datasets in haematopoietic cells, including a collection of gene expression, DNA methylation and histone modifications data in primary healthy myeloid cells from hundreds of individuals. These datasets allow us to test the robustness of chromatin assortativity, which highlights which epigenomic features, alone or in combination, are more strongly associated with 3D genome architecture. We find evidence for genomic regions with specific histone modifications, DNA methylation, and gene expression levels to be forming preferential contacts in 3D nuclear space, to a different extent depending on the cell type and lineage. Finally, we examine replication timing data and find it to be the genomic feature most strongly associated with overall 3D chromatin organization at multiple scales, consistent with previous results from the literature.


Subject(s)
Chromatin/metabolism , Epigenesis, Genetic , Hematopoietic Stem Cells/metabolism , Software , B-Lymphocytes/metabolism , DNA Methylation , DNA Replication Timing , Gene Expression , Histone Code , Humans , Neutrophils/metabolism , Promoter Regions, Genetic
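
The abstract above relates epigenomic features to network topology through chromatin assortativity. As a rough illustration only, the sketch below computes a generic numeric assortativity (the Pearson correlation of a feature across the two ends of each contact) in plain Python. The function name and the toy fragments are hypothetical, and this is not the ChAseR API.

```python
def chromatin_assortativity(edges, feature):
    """Pearson correlation of a node feature across the two ends of each edge.

    edges   : iterable of (fragment_a, fragment_b) contacts (undirected)
    feature : dict mapping fragment id -> numeric feature value
    """
    xs, ys = [], []
    for u, v in edges:
        if u in feature and v in feature:
            # Count each undirected contact in both directions so the
            # measure does not depend on the order of the two ends.
            xs += [feature[u], feature[v]]
            ys += [feature[v], feature[u]]
    if not xs:
        raise ValueError("no edge has feature values at both ends")
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by construction
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("feature is constant over the connected fragments")
    return cov / var


# Toy example: fragments with the same mark level contact each other.
marks = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
print(chromatin_assortativity([("a", "b"), ("c", "d")], marks))
```

Values near 1 mean that fragments with similar feature values tend to be in contact; values near -1 mean that contacts preferentially join dissimilar fragments.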
2.
Nucleic Acids Res ; 47(6): 2778-2792, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30799488

ABSTRACT

The concept of tissue-specific gene expression posits that lineage-determining transcription factors (LDTFs) determine the open chromatin profile of a cell via collaborative binding, providing molecular beacons to signal-dependent transcription factors (SDTFs). However, the guiding principles of LDTF binding, chromatin accessibility and enhancer activity have not yet been systematically evaluated. We sought to study these features of the macrophage genome by the combination of experimental (ChIP-seq, ATAC-seq and GRO-seq) and computational approaches. We show that Random Forest and Support Vector Regression machine learning methods can accurately predict chromatin accessibility using the binding patterns of the LDTF PU.1 and four other key TFs of macrophages (IRF8, JUNB, CEBPA and RUNX1). Any of these TFs alone were not sufficient to predict open chromatin, indicating that TF binding is widespread at closed or weakly opened chromatin regions. Analysis of the PU.1 cistrome revealed that two-thirds of PU.1 binding occurs at low accessible chromatin. We termed these sites labelled regulatory elements (LREs), which may represent a dormant state of a future enhancer and contribute to macrophage cellular plasticity. Collectively, our work demonstrates the existence of LREs occupied by various key TFs, regulating specific gene expression programs triggered by divergent macrophage polarizing stimuli.


Subject(s)
Chromatin Assembly and Disassembly/physiology , Macrophages/metabolism , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Animals , Cells, Cultured , Computational Biology , Gene Expression Regulation/physiology , Genome , Machine Learning , Mice , Mice, Inbred C57BL , Protein Binding/physiology , Staining and Labeling/methods , Transcriptional Activation/physiology
3.
Genome Res ; 27(1): 95-106, 2017 01.
Article in English | MEDLINE | ID: mdl-27821408

ABSTRACT

The impact of RNA structures in coding sequences (CDS) within mRNAs is poorly understood. Here, we identify a novel and highly conserved mechanism of translational control involving RNA structures within coding sequences and the DEAD-box helicase Dhh1. Using yeast genetics and genome-wide ribosome profiling analyses, we show that this mechanism, initially derived from studies of the Brome Mosaic virus RNA genome, extends to yeast and human mRNAs highly enriched in membrane and secreted proteins. All Dhh1-dependent mRNAs, viral and cellular, share key common features. First, they contain long and highly structured CDSs, including a region located around nucleotide 70 after the translation initiation site; second, they are directly bound by Dhh1 with a specific binding distribution; and third, complementary experimental approaches suggest that they are activated by Dhh1 at the translation initiation step. Our results show that ribosome translocation is not the only unwinding force of CDS and uncover a novel layer of translational control that involves RNA helicases and RNA folding within CDS providing novel opportunities for regulation of membrane and secretome proteins.


Subject(s)
DEAD-box RNA Helicases/genetics , Peptide Chain Initiation, Translational , Protein Biosynthesis , RNA/genetics , Saccharomyces cerevisiae Proteins/genetics , Bromovirus/genetics , Exons/genetics , Gene Expression Regulation/genetics , Humans , Nucleic Acid Conformation , Open Reading Frames/genetics , RNA, Messenger/genetics , Ribosomes/genetics , Saccharomyces cerevisiae/genetics
4.
PLoS Comput Biol ; 15(11): e1007496, 2019 11.
Article in English | MEDLINE | ID: mdl-31765368

ABSTRACT

The sheer size of the human genome makes it improbable that identical somatic mutations at the exact same position are observed in multiple tumours solely by chance. The scarcity of cancer driver mutations also precludes positive selection as the sole explanation. Therefore, recurrent mutations may be highly informative of characteristics of mutational processes. To explore the potential, we use recurrence as a starting point to cluster >2,500 whole genomes of a pan-cancer cohort. We describe each genome with 13 recurrence-based and 29 general mutational features. Using principal component analysis we reduce the dimensionality and create independent features. We apply hierarchical clustering to the first 18 principal components followed by k-means clustering. We show that the resulting 16 clusters capture clinically relevant cancer phenotypes. High levels of recurrent substitutions separate the clusters that we link to UV-light exposure and deregulated activity of POLE from the one representing defective mismatch repair, which shows high levels of recurrent insertions/deletions. Recurrence of both mutation types characterizes cancer genomes with somatic hypermutation of immunoglobulin genes and the cluster of genomes exposed to gastric acid. Low levels of recurrence are observed for the cluster where tobacco-smoke exposure induces mutagenesis and the one linked to increased activity of cytidine deaminases. Notably, the majority of substitutions are recurrent in a single tumour type, while recurrent insertions/deletions point to shared processes between tumour types. Recurrence also reveals susceptible sequence motifs, including TT[C>A]TTT and AAC[T>G]T for the POLE and 'gastric-acid exposure' clusters, respectively. Moreover, we refine knowledge of mutagenesis, including increased C/G deletion levels in general for lung tumours and specifically in midsize homopolymer sequence contexts for microsatellite instable tumours. 
Our findings are an important step towards the development of a generic cancer diagnostic test for clinical practice based on whole-genome sequencing that could replace multiple diagnostics currently in use.


Subject(s)
Computational Biology/methods , Neoplasms/classification , Neoplasms/genetics , Cohort Studies , Databases, Nucleic Acid , Genetic Predisposition to Disease/genetics , Genome, Human/genetics , Humans , INDEL Mutation/genetics , Mutagenesis/genetics , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Sequence Deletion/genetics
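
The abstract above treats a mutation as recurrent when the identical change is seen at the same position in more than one tumour. A minimal sketch of that bookkeeping follows, assuming each tumour is given as a set of hypothetical (chromosome, position, ref, alt) tuples. It illustrates only the definition and not the authors' 42-feature pipeline or their clustering.

```python
from collections import Counter


def recurrent_mutations(calls):
    """Return the mutations observed in at least two tumours.

    calls : dict mapping tumour id -> set of (chrom, pos, ref, alt) tuples
    """
    counts = Counter(m for muts in calls.values() for m in set(muts))
    return {m for m, c in counts.items() if c >= 2}


def recurrence_fraction(calls):
    """Per tumour, the fraction of its mutations that are recurrent."""
    recurrent = recurrent_mutations(calls)
    return {
        tumour: (len(set(muts) & recurrent) / len(set(muts)) if muts else 0.0)
        for tumour, muts in calls.items()
    }


# Toy cohort (made-up coordinates, for illustration only).
cohort = {
    "t1": {("1", 100, "C", "A"), ("2", 200, "T", "G")},
    "t2": {("1", 100, "C", "A")},
    "t3": set(),
}
print(recurrence_fraction(cohort))
```

Per-tumour fractions of this kind are one plausible input to a feature table that could then be reduced and clustered.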
5.
Biotechnol Bioeng ; 116(3): 677-692, 2019 03.
Article in English | MEDLINE | ID: mdl-30512195

ABSTRACT

The existence of dynamic cellular phenotypes in changing environmental conditions is of major interest for cell biologists who aim to understand the mechanism and sequence of regulation of gene expression. In the context of therapeutic protein production by Chinese Hamster Ovary (CHO) cells, a detailed temporal understanding of cell-line behavior and control is necessary to achieve a more predictable and reliable process performance. Of particular interest are data on dynamic, temporally resolved transcriptional regulation of genes in response to altered substrate availability and culture conditions. In this study, the gene transcription dynamics throughout a 9-day batch culture of CHO cells was examined by analyzing histone modifications and gene expression profiles in regular 12- and 24-hr intervals, respectively. Three levels of regulation were observed: (a) the presence or absence of DNA methylation in the promoter region provides an ON/OFF switch; (b) a temporally resolved correlation is observed between the presence of active transcription- and promoter-specific histone marks and the expression level of the respective genes; and (c) a major mechanism of gene regulation is identified by interaction of coding genes with long non-coding RNA (lncRNA), as observed in the regulation of the expression level of both neighboring coding/lnc gene pairs and of gene pairs where the lncRNA is able to form RNA-DNA-DNA triplexes. Such triplex-forming regions were predominantly found in the promoter or enhancer region of the targeted coding gene. Significantly, the coding genes with the highest degree of variation in expression during the batch culture are characterized by a larger number of possible triplex-forming interactions with differentially expressed lncRNAs. This indicates a specific role of lncRNA-triplexes in enabling rapid and large changes in transcription. 
A more comprehensive understanding of these regulatory mechanisms will provide an opportunity for new tools to control cellular behavior and to engineer enhanced phenotypes.


Subject(s)
Batch Cell Culture Techniques/methods , Epigenesis, Genetic/genetics , Gene Expression Regulation/genetics , Adaptation, Physiological , Animals , CHO Cells , Cricetinae , Cricetulus , Gene Expression Profiling , RNA, Long Noncoding/genetics , Transcriptome
6.
Genome Res ; 25(4): 478-87, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25644835

ABSTRACT

While analyzing the DNA methylome of multiple myeloma (MM), a plasma cell neoplasm, by whole-genome bisulfite sequencing and high-density arrays, we observed a highly heterogeneous pattern globally characterized by regional DNA hypermethylation embedded in extensive hypomethylation. In contrast to the widely reported DNA hypermethylation of promoter-associated CpG islands (CGIs) in cancer, hypermethylated sites in MM, as opposed to normal plasma cells, were located outside CpG islands and were unexpectedly associated with intronic enhancer regions defined in normal B cells and plasma cells. Both RNA-seq and in vitro reporter assays indicated that enhancer hypermethylation is globally associated with down-regulation of its host genes. ChIP-seq and DNase-seq further revealed that DNA hypermethylation in these regions is related to enhancer decommissioning. Hypermethylated enhancer regions overlapped with binding sites of B cell-specific transcription factors (TFs) and the degree of enhancer methylation inversely correlated with expression levels of these TFs in MM. Furthermore, hypermethylated regions in MM were methylated in stem cells and gradually became demethylated during normal B-cell differentiation, suggesting that MM cells either reacquire epigenetic features of undifferentiated cells or maintain an epigenetic signature of a putative myeloma stem cell progenitor. Overall, we have identified DNA hypermethylation of developmentally regulated enhancers as a new type of epigenetic modification associated with the pathogenesis of MM.


Subject(s)
DNA Methylation/genetics , Enhancer Elements, Genetic/genetics , Multiple Myeloma/genetics , Neoplastic Stem Cells/cytology , Plasma Cells/cytology , Cell Differentiation/genetics , Cell Line, Tumor , CpG Islands/genetics , DNA, Neoplasm/genetics , Down-Regulation/genetics , Epigenesis, Genetic/genetics , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Humans , Promoter Regions, Genetic , Transcription Factors/biosynthesis , Transcription Factors/genetics
7.
Theor Popul Biol ; 123: 70-79, 2018 09.
Article in English | MEDLINE | ID: mdl-29964061

ABSTRACT

We introduce the conditional Site Frequency Spectrum (SFS) for a genomic region linked to a focal mutation of known frequency. An exact expression for its expected value is provided for the neutral model without recombination. Its relation with the expected SFS for two sites, 2-SFS, is discussed. These spectra derive from the coalescent approach of Fu (1995) for finite samples, which is reviewed. Remarkably simple expressions are obtained for the linked SFS of a large population, which are also solutions of the multi-allelic Kolmogorov equations. These formulae are the immediate extensions of the well known single site θ∕f neutral SFS. Besides the general interest in these spectra, they relate to relevant biological cases, such as structural variants and introgressions. As an application, a recipe to adapt Tajima's D and other SFS-based neutrality tests to a non-recombining region containing a neutral marker is presented.


Subject(s)
Genetics, Population/methods , Models, Genetic , Mutation Rate , Evolution, Molecular , Linkage Disequilibrium , Selection, Genetic
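
For orientation, the single-site neutral spectrum mentioned in the abstract above (expected number of sites proportional to θ divided by the derived-allele count) and the classical Tajima's D can be sketched as below. This is the textbook, unconditioned version with hypothetical function names. It does not implement the conditional or linked spectra derived in the paper.

```python
import math


def expected_neutral_sfs(theta, n):
    """Expected unfolded SFS for n sequences: theta / i for i = 1 .. n-1."""
    return [theta / i for i in range(1, n)]


def tajimas_d(sfs):
    """Classical Tajima's D from an unfolded SFS.

    sfs[i-1] = number of sites whose derived allele occurs in i of n sequences.
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if n < 3 or s == 0:
        raise ValueError("need at least 3 sequences and one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Mean pairwise difference computed from the spectrum.
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / (n * (n - 1) / 2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Under the expected neutral spectrum the two estimators of theta agree.
print(tajimas_d(expected_neutral_sfs(10.0, 10)))
```

An excess of rare variants relative to θ/i pushes D below zero, which is the kind of deviation SFS-based neutrality tests are designed to detect.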
9.
Nature ; 452(7189): 840-5, 2008 Apr 17.
Article in English | MEDLINE | ID: mdl-18421347

ABSTRACT

Sequencing DNA from several organisms has revealed that duplication and drift of existing genes have primarily moulded the contents of a given genome. Though the effect of knocking out or overexpressing a particular gene has been studied in many organisms, no study has systematically explored the effect of adding new links in a biological network. To explore network evolvability, we constructed 598 recombinations of promoters (including regulatory regions) with different transcription or sigma-factor genes in Escherichia coli, added over a wild-type genetic background. Here we show that approximately 95% of new networks are tolerated by the bacteria, that very few alter growth, and that expression level correlates with factor position in the wild-type network hierarchy. Most importantly, we find that certain networks consistently survive over the wild type under various selection pressures. Therefore new links in the network are rarely a barrier for evolution and can even confer a fitness advantage.


Subject(s)
Escherichia coli/genetics , Escherichia coli/metabolism , Evolution, Molecular , Gene Expression Regulation, Bacterial/genetics , Gene Regulatory Networks/genetics , Genetic Engineering , Selection, Genetic , Escherichia coli/growth & development , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Genes, Bacterial/genetics , Heat-Shock Response , Oligonucleotide Array Sequence Analysis , Open Reading Frames/genetics , Promoter Regions, Genetic/genetics , Serial Passage , Sigma Factor/genetics , Sigma Factor/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
10.
Nucleic Acids Res ; 40(20): 10073-83, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22962361

ABSTRACT

High-throughput sequencing of cDNA libraries constructed from cellular RNA complements (RNA-Seq) naturally provides a digital quantitative measurement for every expressed RNA molecule. Nature, impact and mutual interference of biases in different experimental setups are, however, still poorly understood, mostly due to the lack of data from intermediate protocol steps. We analysed multiple RNA-Seq experiments, involving different sample preparation protocols and sequencing platforms: we broke them down into their common, and currently indispensable, technical components (reverse transcription, fragmentation, adapter ligation, PCR amplification, gel segregation and sequencing), investigating how such different steps influence abundance and distribution of the sequenced reads. For each of those steps, we developed universally applicable models, which can be parameterised by empirical attributes of any experimental protocol. Our models are implemented in a computer simulation pipeline called the Flux Simulator, and we show that read distributions generated by different combinations of these models reproduce well corresponding evidence obtained from the corresponding experimental setups. We further demonstrate that our in silico RNA-Seq provides insights about hidden precursors that determine the final configuration of reads along gene bodies; enhancing or compensatory effects that explain apparently controversial observations can be observed. Moreover, our simulations identify hitherto unreported sources of systematic bias from RNA hydrolysis, a fragmentation technique currently employed by most RNA-Seq protocols.


Subject(s)
Computer Simulation , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA , Hydrolysis , RNA/metabolism
11.
BMC Genomics ; 14: 148, 2013 Mar 05.
Article in English | MEDLINE | ID: mdl-23497037

ABSTRACT

BACKGROUND: In contrast to international pig breeds, the Iberian breed has not been admixed with Asian germplasm. This makes it an important model to study both domestication and relevance of Asian genes in the pig. Besides, Iberian pigs exhibit high meat quality as well as appetite and propensity to obesity. Here we provide a genome-wide analysis of nucleotide and structural diversity in a reduced representation library from a pool (n = 9 sows) and shotgun genomic sequence from a single sow of the highly inbred Guadyerbas strain. In the pool, we applied newly developed tools to account for the peculiarities of these data. RESULTS: A total of 254,106 SNPs in the pool (79.6 Mb covered) and 643,783 in the Guadyerbas sow (1.47 Gb covered) were called. The nucleotide diversity (1.31 × 10^-3 per bp in autosomes) is very similar to that reported in wild boar. A much lower than expected diversity in the X chromosome was confirmed (1.79 × 10^-4 per bp in the individual and 5.83 × 10^-4 per bp in the pool). A strong (0.70) correlation between recombination and variability was observed, but not with gene density or GC content. Multicopy regions affected about 4% of annotated pig genes in their entirety, and 2% of the genes partially. Genes within the lowest variability windows comprised interferon genes and, in chromosome X, genes involved in behavior like HTR2C or MECP2. A modified Hudson-Kreitman-Aguadé test for pools also indicated an accelerated evolution in genes involved in behavior, as well as in spermatogenesis and in lipid metabolism. CONCLUSIONS: This work illustrates the strength of current sequencing technologies to picture a comprehensive landscape of variability in livestock species, and to pinpoint regions containing genes potentially under selection. Among those genes, we report genes involved in behavior, including feeding behavior, and lipid metabolism. The pig X chromosome is an outlier in terms of nucleotide diversity, which suggests selective constraints. 
Our data further confirm the importance of structural variation in the species, including Iberian pigs, and allowed us to identify new paralogs for known gene families.


Subject(s)
Animals, Inbred Strains/genetics , Chromosome Mapping , Polymorphism, Single Nucleotide/genetics , Swine/genetics , Animals , Breeding , Genetic Variation , Nucleotides/genetics
12.
BMC Genomics ; 14: 363, 2013 May 31.
Article in English | MEDLINE | ID: mdl-23721540

ABSTRACT

BACKGROUND: The only known albino gorilla, named Snowflake, was a male wild-born individual from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain the genetic cause, this is still unknown. Here, we study the genetic cause of his albinism and, making use of whole-genome sequencing data, find a higher inbreeding coefficient compared to other gorillas. RESULTS: We successfully identified the causal genetic variant for Snowflake's albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that shows that this amino acid replacement alters the membrane-spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygosity revealing that Snowflake's parents were related, making this the first report of inbreeding in a wild-born Western lowland gorilla. CONCLUSIONS: In this study we demonstrate how the use of whole-genome sequencing can be extended to link genotype and phenotype in non-model organisms and that, with the expected decrease in sequencing cost, it can be a powerful tool in conservation genetics (e.g., inbreeding and genetic diversity).


Subject(s)
Genomics , Gorilla gorilla/genetics , High-Throughput Nucleotide Sequencing , Inbreeding , Amino Acid Sequence , Animals , Female , Heterozygote , Male , Membrane Transport Proteins/chemistry , Membrane Transport Proteins/genetics , Microsatellite Repeats/genetics , Molecular Sequence Data , Mutation , Sequence Analysis, DNA
13.
PLoS Biol ; 8(9)2010 Sep 07.
Article in English | MEDLINE | ID: mdl-20838655

ABSTRACT

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.


Subject(s)
Genome , Turkeys/genetics , Animals , Base Sequence , Chromosome Mapping , DNA/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Species Specificity
14.
Nucleic Acids Res ; 39(16): 6886-95, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21624887

ABSTRACT

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Sequence Analysis, RNA , Algorithms , Sequence Alignment , Software
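
The recoding step described in the abstract above (RNA rewritten as a protein-like sequence so that a protein search tool can compare dinucleotides) might look roughly like the sketch below. The use of overlapping dinucleotides and the particular 16-letter mapping are assumptions made for illustration. The abstract does not specify BlastR's actual encoding.

```python
from itertools import product

BASES = "ACGU"
# Hypothetical mapping: the 16 dinucleotides are assigned to 16 arbitrary
# amino-acid letters. The real tool may use a different assignment.
AMINO_LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER = {
    a + b: AMINO_LETTERS[i] for i, (a, b) in enumerate(product(BASES, repeat=2))
}


def recode_rna(seq):
    """Recode an RNA (or DNA) string as one letter per overlapping dinucleotide.

    Dinucleotides containing an unknown base are written as 'X'.
    """
    seq = seq.upper().replace("T", "U")
    return "".join(
        DINUC_TO_LETTER.get(seq[i:i + 2], "X") for i in range(len(seq) - 1)
    )


print(recode_rna("ACGU"))
```

A sequence of length n becomes a protein-like string of length n - 1, which could then be searched with a dinucleotide substitution matrix in place of a standard amino-acid matrix.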
15.
Bioinform Adv ; 3(1): vbac101, 2023.
Article in English | MEDLINE | ID: mdl-36726731

ABSTRACT

Summary: Nanopore reads encode information on the methylation status of cytosines in CpG dinucleotides. The length of the reads makes it comparatively easy to look at patterns consisting of multiple loci; here, we exploit this property to search for regions where one can define subpopulations of molecules based on methylation patterns. As an example, we run our clustering algorithm on known imprinted genes; we also scan chromosome 15 looking for windows corresponding to heterogeneous methylation. Our software can also compute the covariance of methylation across these regions while taking into account the mixture of different types of reads. Availability and implementation: https://github.com/EmanueleRaineri/cvlr. Contact: simon.heath@cnag.crg.eu. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

16.
BMC Bioinformatics ; 13: 239, 2012 Sep 20.
Article in English | MEDLINE | ID: mdl-22992255

ABSTRACT

BACKGROUND: Performing high throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by setting a restriction on a minimum number of reads per allele, this would have a strong and undesired effect in pools because it is unlikely that alleles at low frequency in the pool will be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g. cancer tissues), but this is not true in pools: in fact, under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa, there can be more than one read - or, more likely, none - from a true singleton. RESULTS: To improve upon existing theory and software packages, we have developed a Bayesian approach for minor allele frequency (MAF) computation and SNP calling in pools (and implemented it in a program called snape): the approach takes into account sequencing errors and allows users to choose different priors. We also set up a pipeline which can simulate the coalescence process giving rise to the SNPs, the pooling procedure and the sequencing. 
We used it to compare the performance of snape to that of other packages. CONCLUSIONS: We present software that helps in calling SNPs in pooled samples: it has good power while retaining a low false discovery rate (FDR). The method also provides the posterior probability that a SNP is segregating and the full posterior distribution of f for every SNP. In order to test the behaviour of our software, we generated (through simulated coalescence) artificial genomes and computed the effect of a pooled sequencing protocol, followed by SNP calling. In this setting, snape has better power and FDR than the comparable packages samtools, PoPoolation and Varscan: for N = 50 chromosomes, snape has power ≈ 35% and FDR ≈ 2.5%. snape is available at http://code.google.com/p/snape-pooled/ (source code and precompiled binaries).


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Alleles , Bayes Theorem , Gene Frequency , Genome , Humans , Software
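
To make the Bayesian reasoning in the abstract above concrete, here is a deliberately simplified sketch of a posterior over the number of alternative alleles in a pool. It assumes a 1/k prior on segregating sites, a single symmetric error rate and binomially sampled reads. The function name, parameters and defaults are hypothetical, and snape's actual model is richer than this.

```python
from math import comb


def allele_count_posterior(alt_reads, total_reads, n_chrom,
                           error=0.01, p_segregating=0.01):
    """Posterior over k = 0 .. n_chrom-1 alternative alleles in the pool.

    Prior (assumed): P(k = 0) = 1 - p_segregating, and for k >= 1 the mass
    p_segregating is spread proportionally to 1/k, mimicking P(f) ∝ 1/f.
    """
    harmonic = sum(1.0 / k for k in range(1, n_chrom))
    prior = [1.0 - p_segregating] + [
        p_segregating / (k * harmonic) for k in range(1, n_chrom)
    ]
    weights = []
    for k, pk in enumerate(prior):
        f = k / n_chrom
        # Probability that a read shows the alternative allele, allowing
        # for miscalls in both directions (simplified error model).
        p_alt = f * (1.0 - error) + (1.0 - f) * error
        lik = (comb(total_reads, alt_reads)
               * p_alt ** alt_reads
               * (1.0 - p_alt) ** (total_reads - alt_reads))
        weights.append(pk * lik)
    total = sum(weights)
    return [w / total for w in weights]


post = allele_count_posterior(alt_reads=20, total_reads=100, n_chrom=50)
print("P(segregating) =", 1.0 - post[0])
```

Here `1 - post[0]` plays the role of the posterior probability that the site is segregating, and the full list is the posterior distribution of the allele count (equivalently of f = k / N).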
17.
Bioinformatics ; 26(14): 1685-9, 2010 Jul 15.
Article in English | MEDLINE | ID: mdl-20519287

ABSTRACT

MOTIVATION: Molecular chaperones prevent the aggregation of their substrate proteins and thereby ensure that they reach their functional native state. The bacterial GroEL/ES chaperonin system is understood in great detail on a structural, mechanistic and functional level; its interactors in Escherichia coli have been identified and characterized. However, a long-standing question in the field is: What makes a protein a chaperone substrate? RESULTS: Here we identify, using a bioinformatics-based approach, a simple set of quantities that characterize the GroEL-substrate proteome. We define three novel parameters differentiating GroEL interactors from other cellular proteins: lower rate of evolution, hydrophobicity and aggregation propensity. Combining them with other known features into a simple Bayesian predictor allows us to identify known homologous and heterologous GroEL substrate proteins. We discuss our findings in relation to established mechanisms of protein folding and evolutionary buffering by chaperones.


Subject(s)
Chaperonins/chemistry , Computational Biology/methods , Chaperonin 60/chemistry , Chaperonin 60/metabolism , Evolution, Molecular , Kinetics , Protein Folding , Proteome/metabolism
18.
Bioinformatics ; 24(24): 2839-48, 2008 Dec 15.
Article in English | MEDLINE | ID: mdl-18845582

ABSTRACT

BACKGROUND: The computation of the statistical properties of motif occurrences has an obviously relevant application: patterns that are significantly over- or under-represented in genomes or proteins are interesting candidates for biological roles. However, the problem is computationally hard; as a result, virtually all the existing motif finders use fast but approximate scoring functions, in spite of the fact that they have been shown to produce systematically incorrect results. A few interesting exact approaches are known, but they are very slow and hence not practical in the case of realistic sequences. RESULTS: We give an exact solution, solely based on deterministic finite-state automata (DFA), to the problem of finding the whole relevant part of the probability distribution function of a simple-word motif in a homogeneous (biological) sequence. Out of that, the z-value can always be computed, while the P-value can be obtained either when it is not too extreme with respect to the number of floating-point digits available in the implementation, or when the number of pattern occurrences is moderately low. In particular, the time complexity of the algorithms for Markov models of moderate order (0 ≤ m ≤ 2) is far better than that of Nuel, which was the fastest similar exact algorithm known to date; in many cases, even approximate methods are outperformed. CONCLUSIONS: DFA are a standard tool of computer science for the study of patterns; previous works in biology propose algorithms involving automata, but there they are used, respectively, as a first step to write a generating function, or to build a finite Markov-chain imbedding (FMCI). In contrast, we directly rely on DFA to perform the calculations; thus we manage to obtain an algorithm which is both easily interpretable and efficient. This approach can be used for exact statistical studies of very long genomes and protein sequences, as we illustrate with some examples on the scale of the human genome.


Subject(s)
Computational Biology/methods , Genomics/methods , Markov Chains , Amino Acid Motifs , Genome, Human , Humans , Probability , Sequence Analysis, Protein
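
The idea in the abstract above of obtaining an exact occurrence distribution directly from a DFA can be illustrated with a small dynamic programme over (automaton state, occurrence count). The sketch assumes an order-0 (independent letters) model and counts overlapping occurrences. The function names are hypothetical, and this is neither the authors' algorithm nor comparable to it in efficiency.

```python
from collections import defaultdict


def build_dfa(word, alphabet):
    """DFA whose state is the length of the longest prefix of `word`
    that is a suffix of the text read so far."""
    m = len(word)
    dfa = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            text = word[:state] + ch
            k = min(m, len(text))
            while k > 0 and text[-k:] != word[:k]:
                k -= 1
            dfa[state][ch] = k
    return dfa


def occurrence_distribution(word, probs, length):
    """Exact P(number of overlapping occurrences = c) in a random sequence.

    probs  : dict letter -> probability (letters drawn independently)
    length : length of the random sequence
    """
    m = len(word)
    dfa = build_dfa(word, list(probs))
    dist = {(0, 0): 1.0}  # (DFA state, occurrences so far) -> probability
    for _ in range(length):
        nxt = defaultdict(float)
        for (state, count), p in dist.items():
            for ch, p_ch in probs.items():
                new_state = dfa[state][ch]
                # Reaching the final state means one more occurrence ended here.
                nxt[(new_state, count + (new_state == m))] += p * p_ch
        dist = nxt
    out = defaultdict(float)
    for (_, count), p in dist.items():
        out[count] += p
    return dict(out)


print(occurrence_distribution("AA", {"A": 0.5, "C": 0.5}, 3))
```

From such a distribution, an exact P-value for observing at least c occurrences is a tail sum. Higher-order Markov models would need the previous letters added to the state.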
19.
Cell Rep ; 17(8): 2101-2111, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27851971

ABSTRACT

DNA methylation and the localization and post-translational modification of nucleosomes are interdependent factors that contribute to the generation of distinct phenotypes from genetically identical cells. With 112 whole-genome bisulfite sequencing datasets from the BLUEPRINT Epigenome Project, we analyzed the global development of DNA methylation patterns during lineage commitment and maturation of a range of immune system effector cells and the cancers that arise from them. We show clear trends in methylation patterns that are distinct in the innate and adaptive arms of the human immune system, both globally and in relation to consistently positioned nucleosomes. Most notable are a progressive loss of methylation in developing lymphocytes and the consistent occurrence of non-CG methylation in specific cell types. Cancer samples from the two lineages are further polarized, suggesting the involvement of distinct lineage-specific epigenetic mechanisms. We anticipate broad utility for this resource as a basis for further comparative epigenetic analyses.


Subject(s)
Adaptive Immunity/genetics , DNA Methylation/genetics , Immunity, Innate/genetics , B-Lymphocytes/metabolism , Base Sequence , Binding Sites , CCCTC-Binding Factor , Dinucleoside Phosphates/genetics , Exons/genetics , Humans , Lymphocytes/metabolism , Myeloid Cells/metabolism , Nucleosomes
20.
Cancer Cell ; 30(5): 806-821, 2016 Nov 14.
Article in English | MEDLINE | ID: mdl-27846393

ABSTRACT

We analyzed the in silico purified DNA methylation signatures of 82 mantle cell lymphomas (MCL) in comparison with cell subpopulations spanning the entire B cell lineage. We identified two MCL subgroups, respectively carrying epigenetic imprints of germinal-center-inexperienced and germinal-center-experienced B cells, and we found that DNA methylation profiles during lymphomagenesis are largely influenced by the methylation dynamics in normal B cells. An integrative epigenomic approach revealed 10,504 differentially methylated regions in regulatory elements marked by H3K27ac in MCL primary cases, including a distant enhancer showing de novo looping to the MCL oncogene SOX11. Finally, we observed that the magnitude of DNA methylation changes per case is highly variable and serves as an independent prognostic factor for MCL outcome.


Subject(s)
DNA Methylation , Enhancer Elements, Genetic , Epigenomics/methods , High-Throughput Nucleotide Sequencing/methods , Lymphoma, Mantle-Cell/genetics , B-Lymphocytes/metabolism , Cell Line, Tumor , Cell Lineage , Computer Simulation , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Humans , SOXC Transcription Factors/genetics