Results 1 - 20 of 24
1.
PLoS Comput Biol ; 15(11): e1007496, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31765368

ABSTRACT

The sheer size of the human genome makes it improbable that identical somatic mutations at the exact same position are observed in multiple tumours solely by chance. The scarcity of cancer driver mutations also precludes positive selection as the sole explanation. Therefore, recurrent mutations may be highly informative of characteristics of mutational processes. To explore the potential, we use recurrence as a starting point to cluster >2,500 whole genomes of a pan-cancer cohort. We describe each genome with 13 recurrence-based and 29 general mutational features. Using principal component analysis we reduce the dimensionality and create independent features. We apply hierarchical clustering to the first 18 principal components followed by k-means clustering. We show that the resulting 16 clusters capture clinically relevant cancer phenotypes. High levels of recurrent substitutions separate the clusters that we link to UV-light exposure and deregulated activity of POLE from the one representing defective mismatch repair, which shows high levels of recurrent insertions/deletions. Recurrence of both mutation types characterizes cancer genomes with somatic hypermutation of immunoglobulin genes and the cluster of genomes exposed to gastric acid. Low levels of recurrence are observed for the cluster where tobacco-smoke exposure induces mutagenesis and the one linked to increased activity of cytidine deaminases. Notably, the majority of substitutions are recurrent in a single tumour type, while recurrent insertions/deletions point to shared processes between tumour types. Recurrence also reveals susceptible sequence motifs, including TT[C>A]TTT and AAC[T>G]T for the POLE and 'gastric-acid exposure' clusters, respectively. Moreover, we refine knowledge of mutagenesis, including increased C/G deletion levels in general for lung tumours and specifically in midsize homopolymer sequence contexts for microsatellite instable tumours. Our findings are an important step towards the development of a generic cancer diagnostic test for clinical practice based on whole-genome sequencing that could replace multiple diagnostics currently in use.
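
The notion of recurrence used in the abstract above can be illustrated with a small sketch. It assumes that a mutation counts as recurrent when the same change at the same position is seen in at least two tumours, and it computes one hypothetical per-genome feature (the fraction of a tumour's mutations that are recurrent). The function name, the data layout and the feature itself are illustrative assumptions, not the 13 recurrence-based features of the paper.

```python
from collections import Counter


def recurrent_fraction(mutations_by_tumour):
    """Fraction of each tumour's mutations that also occur in another tumour.

    mutations_by_tumour maps a tumour id to an iterable of hashable mutation
    keys, e.g. (chromosome, position, ref, alt). This layout is hypothetical.
    """
    # Count in how many tumours each distinct mutation is observed.
    tumour_counts = Counter()
    for mutations in mutations_by_tumour.values():
        tumour_counts.update(set(mutations))

    fractions = {}
    for tumour, mutations in mutations_by_tumour.items():
        unique = set(mutations)
        if not unique:
            fractions[tumour] = 0.0
            continue
        # A mutation is treated as recurrent if at least two tumours carry it.
        recurrent = sum(1 for m in unique if tumour_counts[m] >= 2)
        fractions[tumour] = recurrent / len(unique)
    return fractions
```

Features of this kind could then be fed, together with general mutational features, into a dimensionality-reduction and clustering step such as the one the abstract describes.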

2.
Nucleic Acids Res ; 47(6): 2778-2792, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30799488

ABSTRACT

The concept of tissue-specific gene expression posits that lineage-determining transcription factors (LDTFs) determine the open chromatin profile of a cell via collaborative binding, providing molecular beacons to signal-dependent transcription factors (SDTFs). However, the guiding principles of LDTF binding, chromatin accessibility and enhancer activity have not yet been systematically evaluated. We sought to study these features of the macrophage genome by the combination of experimental (ChIP-seq, ATAC-seq and GRO-seq) and computational approaches. We show that Random Forest and Support Vector Regression machine learning methods can accurately predict chromatin accessibility using the binding patterns of the LDTF PU.1 and four other key TFs of macrophages (IRF8, JUNB, CEBPA and RUNX1). Any of these TFs alone were not sufficient to predict open chromatin, indicating that TF binding is widespread at closed or weakly opened chromatin regions. Analysis of the PU.1 cistrome revealed that two-thirds of PU.1 binding occurs at low accessible chromatin. We termed these sites labelled regulatory elements (LREs), which may represent a dormant state of a future enhancer and contribute to macrophage cellular plasticity. Collectively, our work demonstrates the existence of LREs occupied by various key TFs, regulating specific gene expression programs triggered by divergent macrophage polarizing stimuli.


Subjects
Chromatin Assembly and Disassembly/physiology , Macrophages/metabolism , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Animals , Cells, Cultured , Computational Biology , Gene Expression Regulation/physiology , Genome , Machine Learning , Mice , Mice, Inbred C57BL , Protein Binding/physiology , Staining and Labeling/methods , Transcriptional Activation/physiology
3.
Biotechnol Bioeng ; 116(3): 677-692, 2019 03.
Article in English | MEDLINE | ID: mdl-30512195

ABSTRACT

The existence of dynamic cellular phenotypes in changing environmental conditions is of major interest for cell biologists who aim to understand the mechanism and sequence of regulation of gene expression. In the context of therapeutic protein production by Chinese Hamster Ovary (CHO) cells, a detailed temporal understanding of cell-line behavior and control is necessary to achieve a more predictable and reliable process performance. Of particular interest are data on dynamic, temporally resolved transcriptional regulation of genes in response to altered substrate availability and culture conditions. In this study, the gene transcription dynamics throughout a 9-day batch culture of CHO cells was examined by analyzing histone modifications and gene expression profiles in regular 12- and 24-hr intervals, respectively. Three levels of regulation were observed: (a) the presence or absence of DNA methylation in the promoter region provides an ON/OFF switch; (b) a temporally resolved correlation is observed between the presence of active transcription- and promoter-specific histone marks and the expression level of the respective genes; and (c) a major mechanism of gene regulation is identified by interaction of coding genes with long non-coding RNA (lncRNA), as observed in the regulation of the expression level of both neighboring coding/lnc gene pairs and of gene pairs where the lncRNA is able to form RNA-DNA-DNA triplexes. Such triplex-forming regions were predominantly found in the promoter or enhancer region of the targeted coding gene. Significantly, the coding genes with the highest degree of variation in expression during the batch culture are characterized by a larger number of possible triplex-forming interactions with differentially expressed lncRNAs. This indicates a specific role of lncRNA-triplexes in enabling rapid and large changes in transcription. A more comprehensive understanding of these regulatory mechanisms will provide an opportunity for new tools to control cellular behavior and to engineer enhanced phenotypes.

4.
Theor Popul Biol ; 123: 70-79, 2018 09.
Article in English | MEDLINE | ID: mdl-29964061

ABSTRACT

We introduce the conditional Site Frequency Spectrum (SFS) for a genomic region linked to a focal mutation of known frequency. An exact expression for its expected value is provided for the neutral model without recombination. Its relation with the expected SFS for two sites, 2-SFS, is discussed. These spectra derive from the coalescent approach of Fu (1995) for finite samples, which is reviewed. Remarkably simple expressions are obtained for the linked SFS of a large population, which are also solutions of the multi-allelic Kolmogorov equations. These formulae are the immediate extensions of the well-known single-site θ/f neutral SFS. Besides the general interest in these spectra, they relate to relevant biological cases, such as structural variants and introgressions. As an application, a recipe to adapt Tajima's D and other SFS-based neutrality tests to a non-recombining region containing a neutral marker is presented.


Subjects
Genetics, Population/methods , Models, Genetic , Mutation Rate , Evolution, Molecular , Linkage Disequilibrium , Selection, Genetic
6.
Genome Res ; 27(1): 95-106, 2017 01.
Article in English | MEDLINE | ID: mdl-27821408

ABSTRACT

The impact of RNA structures in coding sequences (CDS) within mRNAs is poorly understood. Here, we identify a novel and highly conserved mechanism of translational control involving RNA structures within coding sequences and the DEAD-box helicase Dhh1. Using yeast genetics and genome-wide ribosome profiling analyses, we show that this mechanism, initially derived from studies of the Brome Mosaic virus RNA genome, extends to yeast and human mRNAs highly enriched in membrane and secreted proteins. All Dhh1-dependent mRNAs, viral and cellular, share key common features. First, they contain long and highly structured CDSs, including a region located around nucleotide 70 after the translation initiation site; second, they are directly bound by Dhh1 with a specific binding distribution; and third, complementary experimental approaches suggest that they are activated by Dhh1 at the translation initiation step. Our results show that ribosome translocation is not the only unwinding force of CDS and uncover a novel layer of translational control that involves RNA helicases and RNA folding within CDS providing novel opportunities for regulation of membrane and secretome proteins.


Subjects
DEAD-box RNA Helicases/genetics , Peptide Chain Initiation, Translational , Protein Biosynthesis , RNA/genetics , Saccharomyces cerevisiae Proteins/genetics , Bromovirus/genetics , Exons/genetics , Gene Expression Regulation/genetics , Humans , Nucleic Acid Conformation , Open Reading Frames/genetics , RNA, Messenger/genetics , Ribosomes/genetics , Saccharomyces cerevisiae/genetics
7.
Cell Rep ; 17(8): 2101-2111, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27851971

ABSTRACT

DNA methylation and the localization and post-translational modification of nucleosomes are interdependent factors that contribute to the generation of distinct phenotypes from genetically identical cells. With 112 whole-genome bisulfite sequencing datasets from the BLUEPRINT Epigenome Project, we analyzed the global development of DNA methylation patterns during lineage commitment and maturation of a range of immune system effector cells and the cancers that arise from them. We show clear trends in methylation patterns that are distinct in the innate and adaptive arms of the human immune system, both globally and in relation to consistently positioned nucleosomes. Most notable are a progressive loss of methylation in developing lymphocytes and the consistent occurrence of non-CG methylation in specific cell types. Cancer samples from the two lineages are further polarized, suggesting the involvement of distinct lineage-specific epigenetic mechanisms. We anticipate broad utility for this resource as a basis for further comparative epigenetic analyses.


Subjects
Adaptive Immunity/genetics , DNA Methylation/genetics , Immunity, Innate/genetics , B-Lymphocytes/metabolism , Base Sequence , Binding Sites , CCCTC-Binding Factor , Dinucleoside Phosphates/genetics , Exons/genetics , Humans , Lymphocytes/metabolism , Myeloid Cells/metabolism , Nucleosomes
8.
Cancer Cell ; 30(5): 806-821, 2016 Nov 14.
Article in English | MEDLINE | ID: mdl-27846393

ABSTRACT

We analyzed the in silico purified DNA methylation signatures of 82 mantle cell lymphomas (MCL) in comparison with cell subpopulations spanning the entire B cell lineage. We identified two MCL subgroups, respectively carrying epigenetic imprints of germinal-center-inexperienced and germinal-center-experienced B cells, and we found that DNA methylation profiles during lymphomagenesis are largely influenced by the methylation dynamics in normal B cells. An integrative epigenomic approach revealed 10,504 differentially methylated regions in regulatory elements marked by H3K27ac in MCL primary cases, including a distant enhancer showing de novo looping to the MCL oncogene SOX11. Finally, we observed that the magnitude of DNA methylation changes per case is highly variable and serves as an independent prognostic factor for MCL outcome.


Subjects
DNA Methylation , Enhancer Elements, Genetic , Epigenomics/methods , High-Throughput Nucleotide Sequencing/methods , Lymphoma, Mantle-Cell/genetics , B-Lymphocytes/metabolism , Cell Line, Tumor , Cell Lineage , Computer Simulation , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Humans , SOXC Transcription Factors/genetics
9.
Nat Commun ; 6: 10001, 2015 Dec 09.
Article in English | MEDLINE | ID: mdl-26647970

ABSTRACT

As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here, using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ~100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.


Subjects
High-Throughput Nucleotide Sequencing/methods , Leukemia, Lymphoid/genetics , Medulloblastoma/genetics , Mutation , Genome, Human , Humans
10.
Nat Genet ; 47(7): 746-56, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26053498

ABSTRACT

We analyzed the DNA methylome of ten subpopulations spanning the entire B cell differentiation program by whole-genome bisulfite sequencing and high-density microarrays. We observed that non-CpG methylation disappeared upon B cell commitment, whereas CpG methylation changed extensively during B cell maturation, showing an accumulative pattern and affecting around 30% of all measured CpG sites. Early differentiation stages mainly displayed enhancer demethylation, which was associated with upregulation of key B cell transcription factors and affected multiple genes involved in B cell biology. Late differentiation stages, in contrast, showed extensive demethylation of heterochromatin and methylation gain at Polycomb-repressed areas, and genes with apparent functional impact in B cells were not affected. This signature, which has previously been linked to aging and cancer, was particularly widespread in mature cells with an extended lifespan. Comparing B cell neoplasms with their normal counterparts, we determined that they frequently acquire methylation changes in regions already undergoing dynamic methylation during normal B cell differentiation.


Subjects
B-Lymphocytes/physiology , DNA Methylation , Epigenesis, Genetic/immunology , Base Sequence , Cell Differentiation , Cells, Cultured , CpG Islands , Gene Expression Regulation, Leukemic , Genome, Human , Humans , Leukemia, B-Cell/genetics , Sequence Analysis, DNA
11.
Genome Res ; 25(4): 478-87, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25644835

ABSTRACT

While analyzing the DNA methylome of multiple myeloma (MM), a plasma cell neoplasm, by whole-genome bisulfite sequencing and high-density arrays, we observed a highly heterogeneous pattern globally characterized by regional DNA hypermethylation embedded in extensive hypomethylation. In contrast to the widely reported DNA hypermethylation of promoter-associated CpG islands (CGIs) in cancer, hypermethylated sites in MM, as opposed to normal plasma cells, were located outside CpG islands and were unexpectedly associated with intronic enhancer regions defined in normal B cells and plasma cells. Both RNA-seq and in vitro reporter assays indicated that enhancer hypermethylation is globally associated with down-regulation of its host genes. ChIP-seq and DNase-seq further revealed that DNA hypermethylation in these regions is related to enhancer decommissioning. Hypermethylated enhancer regions overlapped with binding sites of B cell-specific transcription factors (TFs) and the degree of enhancer methylation inversely correlated with expression levels of these TFs in MM. Furthermore, hypermethylated regions in MM were methylated in stem cells and gradually became demethylated during normal B-cell differentiation, suggesting that MM cells either reacquire epigenetic features of undifferentiated cells or maintain an epigenetic signature of a putative myeloma stem cell progenitor. Overall, we have identified DNA hypermethylation of developmentally regulated enhancers as a new type of epigenetic modification associated with the pathogenesis of MM.


Subjects
DNA Methylation/genetics , Enhancer Elements, Genetic/genetics , Multiple Myeloma/genetics , Neoplastic Stem Cells/cytology , Plasma Cells/cytology , Cell Differentiation/genetics , Cell Line, Tumor , CpG Islands/genetics , DNA, Neoplasm/genetics , Down-Regulation/genetics , Epigenesis, Genetic/genetics , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Humans , Promoter Regions, Genetic , Transcription Factors/biosynthesis , Transcription Factors/genetics
12.
PLoS One ; 9(5): e97349, 2014.
Article in English | MEDLINE | ID: mdl-24824426

ABSTRACT

We apply a known algorithm for computing exactly inequalities between Beta distributions to assess whether a given position in a genome is differentially methylated across samples. We discuss the advantages brought by the adoption of this solution with respect to two approximations (Fisher's test and Z score). The same formalism presented here can be applied in a similar way to variant calling.


Subjects
DNA Methylation/genetics , Genome/genetics , Models, Genetic , Bayes Theorem , Genomics/methods , Probability
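
As an illustration of what an exact inequality between Beta distributions looks like in practice, the sketch below evaluates P(p_B > p_A) for two independent Beta-distributed methylation levels using a well-known closed-form sum that is valid when the first parameter of B is a positive integer. Whether this is the specific algorithm the abstract refers to is an assumption; the function names and the uniform Beta(1, 1) prior on read counts are also illustrative choices.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_b_greater(alpha_a, beta_a, alpha_b, beta_b):
    """Exact P(p_B > p_A) for p_A ~ Beta(alpha_a, beta_a), p_B ~ Beta(alpha_b, beta_b).

    Uses a finite sum over i = 0 .. alpha_b - 1, so alpha_b must be a
    positive integer (true for read counts plus a Beta(1, 1) prior).
    """
    total = 0.0
    for i in range(alpha_b):
        total += exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


def methylation_posterior(methylated_reads, unmethylated_reads):
    """Beta posterior parameters under an assumed uniform Beta(1, 1) prior."""
    return methylated_reads + 1, unmethylated_reads + 1
```

For example, `prob_b_greater(*methylation_posterior(3, 17), *methylation_posterior(15, 5))` gives the posterior probability that the second sample is more methylated than the first at that position, without resorting to a normal approximation.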
13.
BMC Genomics ; 14: 363, 2013 May 31.
Article in English | MEDLINE | ID: mdl-23721540

ABSTRACT

BACKGROUND: The only known albino gorilla, named Snowflake, was a male wild-born individual from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain the genetic cause, this is still unknown. Here, we study the genetic cause of his albinism and, making use of whole-genome sequencing data, find a higher inbreeding coefficient compared to other gorillas. RESULTS: We successfully identified the causal genetic variant for Snowflake's albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that shows that this amino acid replacement alters the membrane-spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygosity revealing that Snowflake's parents were related, this being the first report of inbreeding in a wild-born Western lowland gorilla. CONCLUSIONS: In this study we demonstrate how the use of whole-genome sequencing can be extended to link genotype and phenotype in non-model organisms and how, with the expected decrease in sequencing cost, it can be a powerful tool in conservation genetics (e.g., inbreeding and genetic diversity).


Subjects
Genomics , Gorilla gorilla/genetics , High-Throughput Nucleotide Sequencing , Inbreeding , Amino Acid Sequence , Animals , Female , Heterozygote , Male , Membrane Transport Proteins/chemistry , Membrane Transport Proteins/genetics , Microsatellite Repeats/genetics , Molecular Sequence Data , Mutation , Sequence Analysis, DNA
14.
BMC Genomics ; 14: 148, 2013 Mar 05.
Article in English | MEDLINE | ID: mdl-23497037

ABSTRACT

BACKGROUND: In contrast to international pig breeds, the Iberian breed has not been admixed with Asian germplasm. This makes it an important model to study both domestication and relevance of Asian genes in the pig. Besides, Iberian pigs exhibit high meat quality as well as appetite and propensity to obesity. Here we provide a genome-wide analysis of nucleotide and structural diversity in a reduced representation library from a pool (n=9 sows) and shotgun genomic sequence from a single sow of the highly inbred Guadyerbas strain. In the pool, we applied newly developed tools to account for the peculiarities of these data. RESULTS: A total of 254,106 SNPs in the pool (79.6 Mb covered) and 643,783 in the Guadyerbas sow (1.47 Gb covered) were called. The nucleotide diversity (1.31 × 10⁻³ per bp in autosomes) is very similar to that reported in wild boar. A much lower than expected diversity in the X chromosome was confirmed (1.79 × 10⁻⁴ per bp in the individual and 5.83 × 10⁻⁴ per bp in the pool). A strong (0.70) correlation between recombination and variability was observed, but not with gene density or GC content. Multicopy regions affected about 4% of annotated pig genes in their entirety, and 2% of the genes partially. Genes within the lowest variability windows comprised interferon genes and, in chromosome X, genes involved in behavior like HTR2C or MECP2. A modified Hudson-Kreitman-Aguadé test for pools also indicated an accelerated evolution in genes involved in behavior, as well as in spermatogenesis and in lipid metabolism. CONCLUSIONS: This work illustrates the strength of current sequencing technologies to picture a comprehensive landscape of variability in livestock species, and to pinpoint regions containing genes potentially under selection. Among those genes, we report genes involved in behavior, including feeding behavior, and lipid metabolism. The pig X chromosome is an outlier in terms of nucleotide diversity, which suggests selective constraints. Our data further confirm the importance of structural variation in the species, including Iberian pigs, and allowed us to identify new paralogs for known gene families.


Subjects
Animals, Inbred Strains/genetics , Chromosome Mapping , Polymorphism, Single Nucleotide/genetics , Swine/genetics , Animals , Breeding , Genetic Variation , Nucleotides/genetics
15.
BMC Bioinformatics ; 13: 239, 2012 Sep 20.
Article in English | MEDLINE | ID: mdl-22992255

ABSTRACT

BACKGROUND: Performing high throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by setting a restriction on a minimum number of reads per allele, this would have a strong and undesired effect in pools because it is unlikely that alleles at low frequency in the pool will be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g. cancer tissues), but this is not true in pools: in fact, under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa, there can be more than one read (or, more likely, none) from a true singleton. RESULTS: To improve upon existing theory and software packages, we have developed a Bayesian approach for minor allele frequency (MAF) computation and SNP calling in pools (and implemented it in a program called snape): the approach takes into account sequencing errors and allows users to choose different priors. We also set up a pipeline which can simulate the coalescence process giving rise to the SNPs, the pooling procedure and the sequencing. We used it to compare the performance of snape to that of other packages. CONCLUSIONS: We present software that helps in calling SNPs in pooled samples: it has good power while retaining a low false discovery rate (FDR). The method also provides the posterior probability that a SNP is segregating and the full posterior distribution of f for every SNP. In order to test the behaviour of our software, we generated (through simulated coalescence) artificial genomes and computed the effect of a pooled sequencing protocol, followed by SNP calling. In this setting, snape has better power and FDR than the comparable packages samtools, PoPoolation and VarScan: for N = 50 chromosomes, snape has power ≈ 35% and FDR ≈ 2.5%. snape is available at http://code.google.com/p/snape-pooled/ (source code and precompiled binaries).


Subjects
High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Alleles , Bayes Theorem , Gene Frequency , Genome , Humans , Software
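
The three difficulties listed in the abstract above (sequencing errors, a prior proportional to 1/f, and the loose link between read counts and allele counts) can be made concrete with a toy Bayesian calculation. The sketch below is not snape's model: the binomial read model, the uniform split of errors over three alternative bases, the parameter `theta`, and the function name are all simplifying assumptions chosen for illustration.

```python
from math import comb


def allele_count_posterior(alt_reads, depth, n_chrom, error=0.01, theta=0.001):
    """Posterior over k, the number of pooled chromosomes carrying the allele.

    Returns a list indexed by k = 0 .. n_chrom - 1. All modelling choices
    here are illustrative assumptions, not those of any published tool.
    """
    # Prior: segregating classes k >= 1 get mass theta / k (the 1/f shape);
    # the monomorphic class k = 0 receives the remaining mass.
    prior = [0.0] + [theta / k for k in range(1, n_chrom)]
    prior[0] = 1.0 - sum(prior)

    unnormalised = []
    for k, prior_k in enumerate(prior):
        f = k / n_chrom
        # Probability that one read shows the allele: a true copy read
        # correctly, or another base misread as this allele.
        p_alt = f * (1.0 - error) + (1.0 - f) * error / 3.0
        likelihood = (
            comb(depth, alt_reads)
            * p_alt ** alt_reads
            * (1.0 - p_alt) ** (depth - alt_reads)
        )
        unnormalised.append(prior_k * likelihood)

    norm = sum(unnormalised)
    return [value / norm for value in unnormalised]
```

The posterior probability that the site is segregating is then `1 - posterior[0]`, and the posterior mean of f is `sum(k / n_chrom * p for k, p in enumerate(posterior))`.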
16.
Nucleic Acids Res ; 40(20): 10073-83, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22962361

ABSTRACT

High-throughput sequencing of cDNA libraries constructed from cellular RNA complements (RNA-Seq) naturally provides a digital quantitative measurement for every expressed RNA molecule. Nature, impact and mutual interference of biases in different experimental setups are, however, still poorly understood, mostly due to the lack of data from intermediate protocol steps. We analysed multiple RNA-Seq experiments, involving different sample preparation protocols and sequencing platforms: we broke them down into their common (and currently indispensable) technical components (reverse transcription, fragmentation, adapter ligation, PCR amplification, gel segregation and sequencing), investigating how such different steps influence abundance and distribution of the sequenced reads. For each of those steps, we developed universally applicable models, which can be parameterised by empirical attributes of any experimental protocol. Our models are implemented in a computer simulation pipeline called the Flux Simulator, and we show that read distributions generated by different combinations of these models closely reproduce the evidence obtained from the corresponding experimental setups. We further demonstrate that our in silico RNA-Seq provides insights about hidden precursors that determine the final configuration of reads along gene bodies; enhancing or compensatory effects that explain apparently controversial observations can be observed. Moreover, our simulations identify hitherto unreported sources of systematic bias from RNA hydrolysis, a fragmentation technique currently employed by most RNA-Seq protocols.


Subjects
Computer Simulation , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA , Hydrolysis , RNA/metabolism
17.
Genetics ; 191(4): 1397-401, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22661328

ABSTRACT

Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θW, Tajima's D, Fay and Wu's H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.


Subjects
Gene Frequency , Genetic Variation , Models, Genetic , Algorithms , Computer Simulation , Data Interpretation, Statistical , Genetics, Population
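
For orientation, the sketch below computes the standard, complete-data versions of two of the statistics named in the abstract above: the Watterson estimator θW and Tajima's D, using Tajima's (1989) variance constants. It deliberately does not implement the missing-data modifications proposed in the paper; the function name and the input format (equal-length aligned strings, at least two sequences, no gaps or missing bases) are assumptions.

```python
from itertools import combinations
from math import sqrt


def summary_statistics(sequences):
    """Return (S, theta_W, pi, Tajima's D) for aligned, complete sequences."""
    n = len(sequences)

    # S: number of segregating sites (alignment columns with > 1 state).
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Harmonic sums and Tajima's variance constants.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = s / a1
    variance = e1 * s + e2 * s * (s - 1)
    # D is undefined when there is no variation (variance of zero).
    d = (pi - theta_w) / sqrt(variance) if variance > 0 else float("nan")
    return s, theta_w, pi, d
```

With missing data, the sample size n differs from site to site, so a single set of constants a1, a2, ... no longer applies; that is the situation the modified estimators in the paper are designed for.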
18.
PLoS One ; 7(1): e30377, 2012.
Article in English | MEDLINE | ID: mdl-22276185

ABSTRACT

We present a fast mapping-based algorithm to compute the mappability of each region of a reference genome up to a specified number of mismatches. Knowing the mappability of a genome is crucial for the interpretation of massively parallel sequencing experiments. We investigate the properties of the mappability of eukaryotic DNA/RNA both as a whole and at the level of the gene family, providing for various organisms tracks which allow the mappability information to be visually explored. In addition, we show that mappability varies greatly between species and gene classes. Finally, we suggest several practical applications where mappability can be used to refine the analysis of high-throughput sequencing data (SNP calling, gene expression quantification and paired-end experiments). This work highlights mappability as an important concept which deserves to be taken into full account, in particular when massively parallel sequencing technologies are employed. The GEM mappability program belongs to the GEM (GEnome Multitool) suite of programs, which can be freely downloaded for any use from its website (http://gemlibrary.sourceforge.net).


Subjects
Algorithms , Computational Biology/methods , Genome, Human/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Humans
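
To make the concept of mappability concrete, the sketch below computes a naive version of it: for every start position, the reciprocal of the number of times the k-mer beginning there occurs in the sequence. It assumes exact matches on the forward strand only, whereas the abstract describes a fast mapping-based algorithm that allows mismatches; the function name is hypothetical and this is not the GEM implementation.

```python
from collections import Counter


def kmer_mappability(genome, k):
    """Naive exact-match mappability for every k-mer start position.

    A value of 1.0 means the k-mer is unique in the sequence; 1/m means it
    occurs m times. Mismatches and the reverse strand are ignored here.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    occurrences = Counter(kmers)
    return [1.0 / occurrences[kmer] for kmer in kmers]
```

Positions with low values are those where a read of length k cannot be placed unambiguously, which is why such a track is useful when interpreting SNP calls or expression estimates.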
19.
Nucleic Acids Res ; 39(16): 6886-95, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21624887

ABSTRACT

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odds substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odds substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.


Subjects
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Sequence Analysis, RNA , Algorithms , Sequence Alignment , Software
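
The recoding step mentioned in the abstract above, turning an RNA sequence into a protein-like sequence of di-nucleotide symbols, can be sketched as follows. The use of overlapping di-nucleotides and the particular assignment of the 16 di-nucleotides to amino-acid letters are assumptions made for illustration; the mapping actually used by BlastR is not given in the abstract.

```python
from itertools import product

# Hypothetical assignment: 16 di-nucleotides -> 16 arbitrary amino-acid letters.
SYMBOLS = "ACDEFGHIKLMNPQRS"
DINUCLEOTIDE_TO_SYMBOL = {
    "".join(pair): symbol
    for pair, symbol in zip(product("ACGU", repeat=2), SYMBOLS)
}


def recode(rna):
    """Recode an RNA (or DNA) string as one symbol per overlapping di-nucleotide.

    Raises KeyError if the sequence contains characters other than A, C, G, U/T.
    """
    rna = rna.upper().replace("T", "U")
    return "".join(
        DINUCLEOTIDE_TO_SYMBOL[rna[i:i + 2]] for i in range(len(rna) - 1)
    )
```

A sequence of length L becomes a string of length L - 1 over a 16-letter alphabet, which a protein search program can then score with a 16 × 16 substitution matrix in place of a standard amino-acid matrix.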
20.
PLoS Biol ; 8(9), 2010 Sep 07.
Article in English | MEDLINE | ID: mdl-20838655

ABSTRACT

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.


Subjects
Genome , Turkeys/genetics , Animals , Base Sequence , Chromosome Mapping , DNA/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Species Specificity