Results 1 - 20 of 26
1.
Nucleic Acids Res ; 48(8): 4066-4080, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32182345

ABSTRACT

We introduce an R package and a web-based visualization tool for the representation, analysis and integration of epigenomic data in the context of 3D chromatin interaction networks. GARDEN-NET allows for the projection of user-submitted genomic features on pre-loaded chromatin interaction networks, exploiting the functionalities of the ChAseR package to explore the features in combination with chromatin network topology properties. We demonstrate the approach using published epigenomic and chromatin structure datasets in haematopoietic cells, including a collection of gene expression, DNA methylation and histone modifications data in primary healthy myeloid cells from hundreds of individuals. These datasets allow us to test the robustness of chromatin assortativity, which highlights which epigenomic features, alone or in combination, are more strongly associated with 3D genome architecture. We find evidence for genomic regions with specific histone modifications, DNA methylation, and gene expression levels to be forming preferential contacts in 3D nuclear space, to a different extent depending on the cell type and lineage. Finally, we examine replication timing data and find it to be the genomic feature most strongly associated with overall 3D chromatin organization at multiple scales, consistent with previous results from the literature.


Subjects
Chromatin/metabolism , Epigenesis, Genetic , Hematopoietic Stem Cells/metabolism , Software , B-Lymphocytes/metabolism , DNA Methylation , DNA Replication Timing , Gene Expression , Histone Code , Humans , Neutrophils/metabolism , Promoter Regions, Genetic
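
As a rough illustration of the chromatin assortativity mentioned in the abstract above, the sketch below computes the Pearson correlation of a feature value across the two ends of each contact in a small interaction network. It is not the ChAseR implementation; the function name `assortativity`, the edge-list input and the toy feature values are assumptions made purely for illustration.

```python
def assortativity(edges, feature):
    """Pearson correlation of a node feature across the two ends of each edge.

    edges   -- list of (node_a, node_b) pairs (undirected contacts)
    feature -- dict mapping node -> numeric feature value
    """
    xs, ys = [], []
    for u, v in edges:
        # Count every undirected edge in both directions so the measure is symmetric.
        xs += [feature[u], feature[v]]
        ys += [feature[v], feature[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share one mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("feature is constant over the network; assortativity is undefined")
    return cov / var


# Toy example (hypothetical values): fragments with the same mark contact each other.
marks = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
like_with_like = assortativity([("a", "b"), ("c", "d")], marks)
like_with_unlike = assortativity([("a", "c"), ("b", "d")], marks)
```

Values near 1 indicate that fragments with similar feature levels preferentially contact each other, values near -1 the opposite, and values near 0 no association with network structure.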
2.
Nucleic Acids Res ; 47(6): 2778-2792, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30799488

ABSTRACT

The concept of tissue-specific gene expression posits that lineage-determining transcription factors (LDTFs) determine the open chromatin profile of a cell via collaborative binding, providing molecular beacons to signal-dependent transcription factors (SDTFs). However, the guiding principles of LDTF binding, chromatin accessibility and enhancer activity have not yet been systematically evaluated. We sought to study these features of the macrophage genome by the combination of experimental (ChIP-seq, ATAC-seq and GRO-seq) and computational approaches. We show that Random Forest and Support Vector Regression machine learning methods can accurately predict chromatin accessibility using the binding patterns of the LDTF PU.1 and four other key TFs of macrophages (IRF8, JUNB, CEBPA and RUNX1). None of these TFs alone was sufficient to predict open chromatin, indicating that TF binding is widespread at closed or weakly opened chromatin regions. Analysis of the PU.1 cistrome revealed that two-thirds of PU.1 binding occurs at low-accessibility chromatin. We termed these sites labelled regulatory elements (LREs), which may represent a dormant state of a future enhancer and contribute to macrophage cellular plasticity. Collectively, our work demonstrates the existence of LREs occupied by various key TFs, regulating specific gene expression programs triggered by divergent macrophage polarizing stimuli.


Subjects
Chromatin Assembly and Disassembly/physiology , Macrophages/metabolism , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Animals , Cells, Cultured , Computational Biology , Gene Expression Regulation/physiology , Genome , Machine Learning , Mice , Mice, Inbred C57BL , Protein Binding/physiology , Staining and Labeling/methods , Transcriptional Activation/physiology
3.
Genome Res ; 27(1): 95-106, 2017 01.
Article in English | MEDLINE | ID: mdl-27821408

ABSTRACT

The impact of RNA structures in coding sequences (CDS) within mRNAs is poorly understood. Here, we identify a novel and highly conserved mechanism of translational control involving RNA structures within coding sequences and the DEAD-box helicase Dhh1. Using yeast genetics and genome-wide ribosome profiling analyses, we show that this mechanism, initially derived from studies of the Brome Mosaic virus RNA genome, extends to yeast and human mRNAs highly enriched in membrane and secreted proteins. All Dhh1-dependent mRNAs, viral and cellular, share key common features. First, they contain long and highly structured CDSs, including a region located around nucleotide 70 after the translation initiation site; second, they are directly bound by Dhh1 with a specific binding distribution; and third, complementary experimental approaches suggest that they are activated by Dhh1 at the translation initiation step. Our results show that ribosome translocation is not the only unwinding force of CDS and uncover a novel layer of translational control that involves RNA helicases and RNA folding within CDS providing novel opportunities for regulation of membrane and secretome proteins.


Subjects
DEAD-box RNA Helicases/genetics , Peptide Chain Initiation, Translational , Protein Biosynthesis , RNA/genetics , Saccharomyces cerevisiae Proteins/genetics , Bromovirus/genetics , Exons/genetics , Gene Expression Regulation/genetics , Humans , Nucleic Acid Conformation , Open Reading Frames/genetics , RNA, Messenger/genetics , Ribosomes/genetics , Saccharomyces cerevisiae/genetics
4.
PLoS Comput Biol ; 15(11): e1007496, 2019 11.
Article in English | MEDLINE | ID: mdl-31765368

ABSTRACT

The sheer size of the human genome makes it improbable that identical somatic mutations at the exact same position are observed in multiple tumours solely by chance. The scarcity of cancer driver mutations also precludes positive selection as the sole explanation. Therefore, recurrent mutations may be highly informative of characteristics of mutational processes. To explore the potential, we use recurrence as a starting point to cluster >2,500 whole genomes of a pan-cancer cohort. We describe each genome with 13 recurrence-based and 29 general mutational features. Using principal component analysis we reduce the dimensionality and create independent features. We apply hierarchical clustering to the first 18 principal components followed by k-means clustering. We show that the resulting 16 clusters capture clinically relevant cancer phenotypes. High levels of recurrent substitutions separate the clusters that we link to UV-light exposure and deregulated activity of POLE from the one representing defective mismatch repair, which shows high levels of recurrent insertions/deletions. Recurrence of both mutation types characterizes cancer genomes with somatic hypermutation of immunoglobulin genes and the cluster of genomes exposed to gastric acid. Low levels of recurrence are observed for the cluster where tobacco-smoke exposure induces mutagenesis and the one linked to increased activity of cytidine deaminases. Notably, the majority of substitutions are recurrent in a single tumour type, while recurrent insertions/deletions point to shared processes between tumour types. Recurrence also reveals susceptible sequence motifs, including TT[C>A]TTT and AAC[T>G]T for the POLE and 'gastric-acid exposure' clusters, respectively. Moreover, we refine knowledge of mutagenesis, including increased C/G deletion levels in general for lung tumours and specifically in midsize homopolymer sequence contexts for microsatellite instable tumours. 
Our findings are an important step towards the development of a generic cancer diagnostic test for clinical practice based on whole-genome sequencing that could replace multiple diagnostics currently in use.


Subjects
Computational Biology/methods , Neoplasms/classification , Neoplasms/genetics , Cohort Studies , Databases, Nucleic Acid , Genetic Predisposition to Disease/genetics , Genome, Human/genetics , Humans , INDEL Mutation/genetics , Mutagenesis/genetics , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Sequence Deletion/genetics
5.
Biotechnol Bioeng ; 116(3): 677-692, 2019 03.
Article in English | MEDLINE | ID: mdl-30512195

ABSTRACT

The existence of dynamic cellular phenotypes in changing environmental conditions is of major interest for cell biologists who aim to understand the mechanism and sequence of regulation of gene expression. In the context of therapeutic protein production by Chinese Hamster Ovary (CHO) cells, a detailed temporal understanding of cell-line behavior and control is necessary to achieve a more predictable and reliable process performance. Of particular interest are data on dynamic, temporally resolved transcriptional regulation of genes in response to altered substrate availability and culture conditions. In this study, the gene transcription dynamics throughout a 9-day batch culture of CHO cells was examined by analyzing histone modifications and gene expression profiles in regular 12- and 24-hr intervals, respectively. Three levels of regulation were observed: (a) the presence or absence of DNA methylation in the promoter region provides an ON/OFF switch; (b) a temporally resolved correlation is observed between the presence of active transcription- and promoter-specific histone marks and the expression level of the respective genes; and (c) a major mechanism of gene regulation is identified by interaction of coding genes with long non-coding RNA (lncRNA), as observed in the regulation of the expression level of both neighboring coding/lnc gene pairs and of gene pairs where the lncRNA is able to form RNA-DNA-DNA triplexes. Such triplex-forming regions were predominantly found in the promoter or enhancer region of the targeted coding gene. Significantly, the coding genes with the highest degree of variation in expression during the batch culture are characterized by a larger number of possible triplex-forming interactions with differentially expressed lncRNAs. This indicates a specific role of lncRNA-triplexes in enabling rapid and large changes in transcription. 
A more comprehensive understanding of these regulatory mechanisms will provide an opportunity for new tools to control cellular behavior and to engineer enhanced phenotypes.


Subjects
Batch Cell Culture Techniques/methods , Epigenesis, Genetic/genetics , Gene Expression Regulation/genetics , Adaptation, Physiological , Animals , CHO Cells , Cricetinae , Cricetulus , Gene Expression Profiling , RNA, Long Noncoding/genetics , Transcriptome
6.
Genome Res ; 25(4): 478-87, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25644835

ABSTRACT

While analyzing the DNA methylome of multiple myeloma (MM), a plasma cell neoplasm, by whole-genome bisulfite sequencing and high-density arrays, we observed a highly heterogeneous pattern globally characterized by regional DNA hypermethylation embedded in extensive hypomethylation. In contrast to the widely reported DNA hypermethylation of promoter-associated CpG islands (CGIs) in cancer, hypermethylated sites in MM, as opposed to normal plasma cells, were located outside CpG islands and were unexpectedly associated with intronic enhancer regions defined in normal B cells and plasma cells. Both RNA-seq and in vitro reporter assays indicated that enhancer hypermethylation is globally associated with down-regulation of its host genes. ChIP-seq and DNase-seq further revealed that DNA hypermethylation in these regions is related to enhancer decommissioning. Hypermethylated enhancer regions overlapped with binding sites of B cell-specific transcription factors (TFs) and the degree of enhancer methylation inversely correlated with expression levels of these TFs in MM. Furthermore, hypermethylated regions in MM were methylated in stem cells and gradually became demethylated during normal B-cell differentiation, suggesting that MM cells either reacquire epigenetic features of undifferentiated cells or maintain an epigenetic signature of a putative myeloma stem cell progenitor. Overall, we have identified DNA hypermethylation of developmentally regulated enhancers as a new type of epigenetic modification associated with the pathogenesis of MM.


Subjects
DNA Methylation/genetics , Enhancer Elements, Genetic/genetics , Multiple Myeloma/genetics , Neoplastic Stem Cells/cytology , Plasma Cells/cytology , Cell Differentiation/genetics , Cell Line, Tumor , CpG Islands/genetics , DNA, Neoplasm/genetics , Down-Regulation/genetics , Epigenesis, Genetic/genetics , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Humans , Promoter Regions, Genetic , Transcription Factors/biosynthesis , Transcription Factors/genetics
7.
Theor Popul Biol ; 123: 70-79, 2018 09.
Article in English | MEDLINE | ID: mdl-29964061

ABSTRACT

We introduce the conditional Site Frequency Spectrum (SFS) for a genomic region linked to a focal mutation of known frequency. An exact expression for its expected value is provided for the neutral model without recombination. Its relation with the expected SFS for two sites, 2-SFS, is discussed. These spectra derive from the coalescent approach of Fu (1995) for finite samples, which is reviewed. Remarkably simple expressions are obtained for the linked SFS of a large population, which are also solutions of the multi-allelic Kolmogorov equations. These formulae are the immediate extensions of the well known single site θ∕f neutral SFS. Besides the general interest in these spectra, they relate to relevant biological cases, such as structural variants and introgressions. As an application, a recipe to adapt Tajima's D and other SFS-based neutrality tests to a non-recombining region containing a neutral marker is presented.


Subjects
Genetics, Population/methods , Models, Genetic , Mutation Rate , Evolution, Molecular , Linkage Disequilibrium , Selection, Genetic
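
For orientation, the single-site baseline that the abstract above extends can be sketched in a few lines: under the standard neutral model the expected number of sites with derived-allele count i in a sample of n is θ/i, and the classical Tajima's D compares two estimators of θ computed from such a spectrum. This is the textbook unconditional statistic, not the conditional or linked SFS derived in the paper, and the function names are assumptions chosen for illustration.

```python
from math import sqrt


def expected_neutral_sfs(theta, n):
    """Expected unfolded SFS for sample size n: entry i-1 holds theta / i, i = 1..n-1."""
    return [theta / i for i in range(1, n)]


def tajimas_d(sfs):
    """Classical Tajima's D from an unfolded SFS.

    sfs[i-1] is the number of sites whose derived allele appears i times;
    the sample size is len(sfs) + 1.
    """
    n = len(sfs) + 1
    seg = sum(sfs)  # number of segregating sites, S
    pairs = n * (n - 1) / 2
    # Mean pairwise difference: a site with derived count i separates i * (n - i) pairs.
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


d_neutral = tajimas_d(expected_neutral_sfs(5.0, 10))  # numerator vanishes for theta / i
d_singletons = tajimas_d([10, 0, 0, 0])                # excess of rare variants
```

With the expected neutral spectrum both estimators equal θ, so D is zero up to rounding; a spectrum dominated by singletons gives a negative value.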
9.
Nature ; 452(7189): 840-5, 2008 Apr 17.
Article in English | MEDLINE | ID: mdl-18421347

ABSTRACT

Sequencing DNA from several organisms has revealed that duplication and drift of existing genes have primarily moulded the contents of a given genome. Though the effect of knocking out or overexpressing a particular gene has been studied in many organisms, no study has systematically explored the effect of adding new links in a biological network. To explore network evolvability, we constructed 598 recombinations of promoters (including regulatory regions) with different transcription or sigma-factor genes in Escherichia coli, added over a wild-type genetic background. Here we show that approximately 95% of new networks are tolerated by the bacteria, that very few alter growth, and that expression level correlates with factor position in the wild-type network hierarchy. Most importantly, we find that certain networks consistently survive over the wild type under various selection pressures. Therefore new links in the network are rarely a barrier for evolution and can even confer a fitness advantage.


Subjects
Escherichia coli/genetics , Escherichia coli/metabolism , Evolution, Molecular , Gene Expression Regulation, Bacterial/genetics , Gene Regulatory Networks/genetics , Genetic Engineering , Selection, Genetic , Escherichia coli/growth & development , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Genes, Bacterial/genetics , Heat-Shock Response , Oligonucleotide Array Sequence Analysis , Open Reading Frames/genetics , Promoter Regions, Genetic/genetics , Serial Passage , Sigma Factor/genetics , Sigma Factor/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
10.
Nucleic Acids Res ; 40(20): 10073-83, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22962361

ABSTRACT

High-throughput sequencing of cDNA libraries constructed from cellular RNA complements (RNA-Seq) naturally provides a digital quantitative measurement for every expressed RNA molecule. Nature, impact and mutual interference of biases in different experimental setups are, however, still poorly understood, mostly due to the lack of data from intermediate protocol steps. We analysed multiple RNA-Seq experiments, involving different sample preparation protocols and sequencing platforms: we broke them down into their common, and currently indispensable, technical components (reverse transcription, fragmentation, adapter ligation, PCR amplification, gel segregation and sequencing), investigating how such different steps influence abundance and distribution of the sequenced reads. For each of those steps, we developed universally applicable models, which can be parameterised by empirical attributes of any experimental protocol. Our models are implemented in a computer simulation pipeline called the Flux Simulator, and we show that read distributions generated by different combinations of these models closely reproduce the evidence obtained from the corresponding experimental setups. We further demonstrate that our in silico RNA-Seq provides insights about hidden precursors that determine the final configuration of reads along gene bodies; enhancing or compensatory effects that explain apparently controversial observations can be observed. Moreover, our simulations identify hitherto unreported sources of systematic bias from RNA hydrolysis, a fragmentation technique currently employed by most RNA-Seq protocols.


Subjects
Computer Simulation , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA , Hydrolysis , RNA/metabolism
11.
BMC Genomics ; 14: 148, 2013 Mar 05.
Article in English | MEDLINE | ID: mdl-23497037

ABSTRACT

BACKGROUND: In contrast to international pig breeds, the Iberian breed has not been admixed with Asian germplasm. This makes it an important model to study both domestication and relevance of Asian genes in the pig. Besides, Iberian pigs exhibit high meat quality as well as appetite and propensity to obesity. Here we provide a genome-wide analysis of nucleotide and structural diversity in a reduced representation library from a pool (n=9 sows) and shotgun genomic sequence from a single sow of the highly inbred Guadyerbas strain. In the pool, we applied newly developed tools to account for the peculiarities of these data. RESULTS: A total of 254,106 SNPs in the pool (79.6 Mb covered) and 643,783 in the Guadyerbas sow (1.47 Gb covered) were called. The nucleotide diversity (1.31×10⁻³ per bp in autosomes) is very similar to that reported in wild boar. A much lower than expected diversity in the X chromosome was confirmed (1.79×10⁻⁴ per bp in the individual and 5.83×10⁻⁴ per bp in the pool). A strong (0.70) correlation between recombination and variability was observed, but not with gene density or GC content. Multicopy regions affected about 4% of annotated pig genes in their entirety, and 2% of the genes partially. Genes within the lowest variability windows comprised interferon genes and, in chromosome X, genes involved in behavior like HTR2C or MECP2. A modified Hudson-Kreitman-Aguadé test for pools also indicated an accelerated evolution in genes involved in behavior, as well as in spermatogenesis and in lipid metabolism. CONCLUSIONS: This work illustrates the strength of current sequencing technologies to picture a comprehensive landscape of variability in livestock species, and to pinpoint regions containing genes potentially under selection. Among those genes, we report genes involved in behavior, including feeding behavior, and lipid metabolism. The pig X chromosome is an outlier in terms of nucleotide diversity, which suggests selective constraints.
Our data further confirm the importance of structural variation in the species, including Iberian pigs, and allowed us to identify new paralogs for known gene families.


Subjects
Animals, Inbred Strains/genetics , Chromosome Mapping , Polymorphism, Single Nucleotide/genetics , Swine/genetics , Animals , Breeding , Genetic Variation , Nucleotides/genetics
12.
BMC Genomics ; 14: 363, 2013 May 31.
Article in English | MEDLINE | ID: mdl-23721540

ABSTRACT

BACKGROUND: The only known albino gorilla, named Snowflake, was a male wild-born individual from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain the genetic cause, this is still unknown. Here, we study the genetic cause of his albinism and, making use of whole-genome sequencing data, find a higher inbreeding coefficient compared to other gorillas. RESULTS: We successfully identified the causal genetic variant for Snowflake's albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that shows that this amino acid replacement alters the membrane spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygosity revealing that Snowflake's parents were related, this being the first report of inbreeding in a wild-born Western lowland gorilla. CONCLUSIONS: In this study we demonstrate how the use of whole-genome sequencing can be extended to link genotype and phenotype in non-model organisms and how, with the expected decrease in sequencing cost, it can be a powerful tool in conservation genetics (e.g., inbreeding and genetic diversity).


Subjects
Genomics , Gorilla gorilla/genetics , High-Throughput Nucleotide Sequencing , Inbreeding , Amino Acid Sequence , Animals , Female , Heterozygote , Male , Membrane Transport Proteins/chemistry , Membrane Transport Proteins/genetics , Microsatellite Repeats/genetics , Molecular Sequence Data , Mutation , Sequence Analysis, DNA
13.
PLoS Biol ; 8(9)2010 Sep 07.
Article in English | MEDLINE | ID: mdl-20838655

ABSTRACT

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.


Subjects
Genome , Turkeys/genetics , Animals , Base Sequence , Chromosome Mapping , DNA/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Species Specificity
14.
Nucleic Acids Res ; 39(16): 6886-95, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21624887

ABSTRACT

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.


Subjects
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Sequence Analysis, RNA , Algorithms , Sequence Alignment , Software
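
The recoding step described in the abstract above, turning an RNA sequence into a protein-like sequence of dinucleotide symbols, might look roughly like the sketch below. The specific letter assigned to each dinucleotide and the use of overlapping dinucleotides are assumptions made for illustration; the abstract does not specify either, and this is not the BlastR code.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Hypothetical assignment: the 16 dinucleotides are mapped onto 16 amino-acid letters.
AMINO_LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER = {
    a + b: AMINO_LETTERS[i]
    for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))
}


def recode(rna):
    """Recode an RNA (or DNA) string as one letter per overlapping dinucleotide.

    Raises KeyError on characters outside A, C, G, U/T.
    """
    rna = rna.upper().replace("T", "U")
    return "".join(DINUC_TO_LETTER[rna[i:i + 2]] for i in range(len(rna) - 1))


recoded = recode("ACGU")  # dinucleotides AC, CG, GU
```

A sequence of length L yields L - 1 symbols, which could then be passed to a protein search tool together with a substitution matrix defined over the same 16 letters.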
15.
Bioinform Adv ; 3(1): vbac101, 2023.
Article in English | MEDLINE | ID: mdl-36726731

ABSTRACT

Summary: Nanopore reads encode information on the methylation status of cytosines in CpG dinucleotides. The length of the reads makes it comparatively easy to look at patterns consisting of multiple loci; here, we exploit this property to search for regions where one can define subpopulations of molecules based on methylation patterns. As an example, we run our clustering algorithm on known imprinted genes; we also scan chromosome 15 looking for windows corresponding to heterogeneous methylation. Our software can also compute the covariance of methylation across these regions while taking into account the mixture of different types of reads. Availability and implementation: https://github.com/EmanueleRaineri/cvlr. Contact: simon.heath@cnag.crg.eu. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

16.
BMC Bioinformatics ; 13: 239, 2012 Sep 20.
Article in English | MEDLINE | ID: mdl-22992255

ABSTRACT

BACKGROUND: Performing high throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by setting a restriction on a minimum number of reads per allele, this would have a strong and undesired effect in pools because it is unlikely that alleles at low frequency in the pool will be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g. cancer tissues), but this is not true in pools: in fact, under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa, there can be more than one read - or, more likely, none - from a true singleton. RESULTS: To improve upon existing theory and software packages, we have developed a Bayesian approach for minor allele frequency (MAF) computation and SNP calling in pools (and implemented it in a program called snape): the approach takes into account sequencing errors and allows users to choose different priors. We also set up a pipeline which can simulate the coalescence process giving rise to the SNPs, the pooling procedure and the sequencing. 
We used it to compare the performance of snape to that of other packages. CONCLUSIONS: We present software that helps in calling SNPs in pooled samples: it has good power while retaining a low false discovery rate (FDR). The method also provides the posterior probability that a SNP is segregating and the full posterior distribution of f for every SNP. In order to test the behaviour of our software, we generated (through simulated coalescence) artificial genomes and computed the effect of a pooled sequencing protocol, followed by SNP calling. In this setting, snape has better power and FDR than the comparable packages samtools, PoPoolation and VarScan: for N = 50 chromosomes, snape has power ≈ 35% and FDR ≈ 2.5%. snape is available at http://code.google.com/p/snape-pooled/ (source code and precompiled binaries).


Subjects
High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Alleles , Bayes Theorem , Gene Frequency , Genome , Humans , Software
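
The Bayesian reasoning outlined in the abstract above can be illustrated with a deliberately simplified model: a grid over the number k of alternative alleles among the N pooled chromosomes, a prior proportional to 1/k for segregating sites, and a binomial read likelihood with a symmetric error rate. This is a sketch under those stated assumptions, not the model implemented in snape; the function name, the `prior_fixed` parameter and the default error rate are hypothetical.

```python
from math import comb


def pooled_posterior(alt_reads, total_reads, n_chrom, error=0.01, prior_fixed=0.9):
    """Posterior over k = 0..n_chrom alternative alleles in the pool.

    Assumed prior: probability prior_fixed that the site is monomorphic (k = 0),
    with the remaining mass spread over k >= 1 proportionally to 1/k.
    Assumed likelihood: each read shows the alternative allele with probability
    f*(1-error) + (1-f)*error, where f = k / n_chrom.
    """
    harmonic = sum(1 / k for k in range(1, n_chrom + 1))
    weights = []
    for k in range(n_chrom + 1):
        prior = prior_fixed if k == 0 else (1 - prior_fixed) / (k * harmonic)
        f = k / n_chrom
        p_alt = f * (1 - error) + (1 - f) * error
        likelihood = (
            comb(total_reads, alt_reads)
            * p_alt ** alt_reads
            * (1 - p_alt) ** (total_reads - alt_reads)
        )
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


post_variant = pooled_posterior(alt_reads=10, total_reads=40, n_chrom=50)
post_clean = pooled_posterior(alt_reads=0, total_reads=40, n_chrom=50)
prob_segregating_variant = 1 - post_variant[0]
prob_segregating_clean = 1 - post_clean[0]
```

The first entry of the returned list is the posterior probability that the site is monomorphic, so one minus that value plays the role of the probability that the SNP is segregating; the full list is the posterior distribution of the allele count, and hence of f.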
17.
Bioinformatics ; 26(14): 1685-9, 2010 Jul 15.
Article in English | MEDLINE | ID: mdl-20519287

ABSTRACT

MOTIVATION: Molecular chaperones prevent the aggregation of their substrate proteins and thereby ensure that they reach their functional native state. The bacterial GroEL/ES chaperonin system is understood in great detail on a structural, mechanistic and functional level; its interactors in Escherichia coli have been identified and characterized. However, a long-standing question in the field is: What makes a protein a chaperone substrate? RESULTS: Here we identify, using a bioinformatics-based approach, a simple set of quantities that characterize the GroEL-substrate proteome. We define three novel parameters differentiating GroEL interactors from other cellular proteins: lower rate of evolution, hydrophobicity and aggregation propensity. Combining them with other known features into a simple Bayesian predictor allows us to identify known homologous and heterologous GroEL substrate proteins. We discuss our findings in relation to established mechanisms of protein folding and evolutionary buffering by chaperones.


Subjects
Chaperonins/chemistry , Computational Biology/methods , Chaperonin 60/chemistry , Chaperonin 60/metabolism , Evolution, Molecular , Kinetics , Protein Folding , Proteome/metabolism
18.
Bioinformatics ; 24(24): 2839-48, 2008 Dec 15.
Article in English | MEDLINE | ID: mdl-18845582

ABSTRACT

BACKGROUND: The computation of the statistical properties of motif occurrences has an obviously relevant application: patterns that are significantly over- or under-represented in genomes or proteins are interesting candidates for biological roles. However, the problem is computationally hard; as a result, virtually all the existing motif finders use fast but approximate scoring functions, in spite of the fact that they have been shown to produce systematically incorrect results. A few interesting exact approaches are known, but they are very slow and hence not practical in the case of realistic sequences. RESULTS: We give an exact solution, solely based on deterministic finite-state automata (DFA), to the problem of finding the whole relevant part of the probability distribution function of a simple-word motif in a homogeneous (biological) sequence. Out of that, the z-value can always be computed, while the P-value can be obtained either when it is not too extreme with respect to the number of floating-point digits available in the implementation, or when the number of pattern occurrences is moderately low. In particular, the time complexity of the algorithms for Markov models of moderate order (0 ≤ m ≤ 2) is far better than that of Nuel, which was the fastest similar exact algorithm known to date; in many cases, even approximate methods are outperformed. CONCLUSIONS: DFA are a standard tool of computer science for the study of patterns; previous works in biology propose algorithms involving automata, but there they are used, respectively, as a first step to write a generating function, or to build a finite Markov-chain imbedding (FMCI). In contrast, we directly rely on DFA to perform the calculations; thus we manage to obtain an algorithm which is both easily interpretable and efficient. This approach can be used for exact statistical studies of very long genomes and protein sequences, as we illustrate with some examples on the scale of the human genome.


Subjects
Computational Biology/methods , Genomics/methods , Markov Chains , Amino Acid Motifs , Genome, Human , Humans , Probability , Sequence Analysis, Protein
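
To make the idea in the abstract above concrete, the sketch below computes the exact distribution of the number of (possibly overlapping) occurrences of a single word in a random sequence by pushing probability mass through a DFA. It covers only the simplest case, independent letters (a Markov model of order 0), and is a naive illustration rather than the authors' algorithm; the function names and the two-letter toy alphabet are assumptions.

```python
from collections import defaultdict


def build_dfa(word, alphabet):
    """DFA whose state is the length of the longest prefix of `word`
    that is a suffix of the text read so far."""
    m = len(word)
    dfa = [{} for _ in range(m + 1)]
    for state in range(m + 1):
        for letter in alphabet:
            seen = word[:state] + letter
            k = min(m, len(seen))
            # Shrink k until the length-k prefix of word matches the end of `seen`.
            while k > 0 and word[:k] != seen[-k:]:
                k -= 1
            dfa[state][letter] = k
    return dfa


def occurrence_distribution(word, length, probs):
    """Exact P(number of occurrences = c) in an i.i.d. sequence of given length.

    probs -- dict mapping each letter of the alphabet to its probability
    """
    dfa = build_dfa(word, probs)
    m = len(word)
    mass = {(0, 0): 1.0}  # (DFA state, occurrences so far) -> probability
    for _ in range(length):
        nxt = defaultdict(float)
        for (state, count), p in mass.items():
            for letter, p_letter in probs.items():
                new_state = dfa[state][letter]
                # Entering the final state means one more occurrence just ended here.
                nxt[(new_state, count + (new_state == m))] += p * p_letter
        mass = nxt
    dist = defaultdict(float)
    for (_, count), p in mass.items():
        dist[count] += p
    return dict(dist)


dist = occurrence_distribution("AA", 3, {"A": 0.5, "B": 0.5})
```

For the word AA in sequences of length 3 over two equally likely letters, only AAA contains two (overlapping) occurrences and AAB and BAA contain one each, which gives probabilities 1/8, 2/8 and 5/8 for two, one and zero occurrences.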
19.
Cell Rep ; 17(8): 2101-2111, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27851971

ABSTRACT

DNA methylation and the localization and post-translational modification of nucleosomes are interdependent factors that contribute to the generation of distinct phenotypes from genetically identical cells. With 112 whole-genome bisulfite sequencing datasets from the BLUEPRINT Epigenome Project, we analyzed the global development of DNA methylation patterns during lineage commitment and maturation of a range of immune system effector cells and the cancers that arise from them. We show clear trends in methylation patterns that are distinct in the innate and adaptive arms of the human immune system, both globally and in relation to consistently positioned nucleosomes. Most notable are a progressive loss of methylation in developing lymphocytes and the consistent occurrence of non-CG methylation in specific cell types. Cancer samples from the two lineages are further polarized, suggesting the involvement of distinct lineage-specific epigenetic mechanisms. We anticipate broad utility for this resource as a basis for further comparative epigenetic analyses.


Subjects
Adaptive Immunity/genetics , DNA Methylation/genetics , Immunity, Innate/genetics , B-Lymphocytes/metabolism , Base Sequence , Binding Sites , CCCTC-Binding Factor , Dinucleoside Phosphates/genetics , Exons/genetics , Humans , Lymphocytes/metabolism , Myeloid Cells/metabolism , Nucleosomes
20.
Cancer Cell ; 30(5): 806-821, 2016 Nov 14.
Article in English | MEDLINE | ID: mdl-27846393

ABSTRACT

We analyzed the in silico purified DNA methylation signatures of 82 mantle cell lymphomas (MCL) in comparison with cell subpopulations spanning the entire B cell lineage. We identified two MCL subgroups, respectively carrying epigenetic imprints of germinal-center-inexperienced and germinal-center-experienced B cells, and we found that DNA methylation profiles during lymphomagenesis are largely influenced by the methylation dynamics in normal B cells. An integrative epigenomic approach revealed 10,504 differentially methylated regions in regulatory elements marked by H3K27ac in MCL primary cases, including a distant enhancer showing de novo looping to the MCL oncogene SOX11. Finally, we observed that the magnitude of DNA methylation changes per case is highly variable and serves as an independent prognostic factor for MCL outcome.


Subjects
DNA Methylation , Enhancer Elements, Genetic , Epigenomics/methods , High-Throughput Nucleotide Sequencing/methods , Lymphoma, Mantle-Cell/genetics , B-Lymphocytes/metabolism , Cell Line, Tumor , Cell Lineage , Computer Simulation , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Humans , SOXC Transcription Factors/genetics