1.
Nature ; 487(7406): 190-5, 2012 Jul 11.
Article in English | MEDLINE | ID: mdl-22785314

ABSTRACT

Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ∼100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.


Subjects
Human Genome , Genomics/methods , DNA Sequence Analysis/methods , Alleles , Cell Line , Female , Gene Silencing , Genetic Variation , Haplotypes , Humans , Mutation , Reproducibility of Results , DNA Sequence Analysis/economics , DNA Sequence Analysis/standards
2.
J Comput Biol ; 19(3): 279-92, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22175250

ABSTRACT

Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of unique characteristics of these mated pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurate calling of SNPs and short substitutions and indels (<100 bp); the same methods apply to evaluation of hypothesized larger, structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until convergence is achieved to an optimum diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process in order to reduce the chance of converging to local optima. Finally, a correlation-based filter is applied to reduce the false positive rate caused by the presence of repetitive regions in the reference genome.
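
The iterative optimization described in this abstract can be illustrated with a heavily simplified sketch. It is not the authors' method: it assumes a haploid sequence, reads that are already mapped to known start positions, substitutions only (no insertions or deletions), and no de novo seeding or repeat filter. The function names and the `error_rate` and `variant_prior` values are hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"

def log_posterior(candidate, reference, reads, error_rate, variant_prior):
    """Unnormalized log posterior of a candidate sequence given mapped reads."""
    score = 0.0
    # Prior (assumption): each position differs from the reference
    # independently with probability variant_prior.
    for c, r in zip(candidate, reference):
        score += math.log(variant_prior if c != r else 1.0 - variant_prior)
    # Likelihood (assumption): each read base is correct with probability
    # 1 - error_rate, otherwise one of the three other bases uniformly.
    for start, read in reads:
        for offset, base in enumerate(read):
            match = candidate[start + offset] == base
            score += math.log(1.0 - error_rate if match else error_rate / 3.0)
    return score

def optimize(reference, reads, error_rate=0.01, variant_prior=0.001):
    """Greedily apply one-base substitutions until no change improves the score."""
    current = list(reference)
    best = log_posterior(current, reference, reads, error_rate, variant_prior)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for base in BASES:
                if base == current[i]:
                    continue
                trial = current[:i] + [base] + current[i + 1:]
                score = log_posterior(trial, reference, reads,
                                      error_rate, variant_prior)
                if score > best:
                    current, best, improved = trial, score, True
    return "".join(current)

# Three reads support an A at position 3, where the reference has a T.
reads = [(0, "ACGAACGT"), (2, "GAAC"), (1, "CGAA")]
print(optimize("ACGTACGT", reads))  # ACGAACGT
```

In this toy setting a single mismatching read is not enough to overcome the prior, so `optimize("ACGTACGT", [(0, "ACGAACGT")])` keeps the reference base. Like any greedy search, the loop can stop at a local optimum, which is the problem the abstract says the de novo seeding step is meant to reduce.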


Subjects
Contig Mapping/methods , Human Genome , DNA Sequence Analysis/methods , Algorithms , Alleles , Base Sequence , Bayes Theorem , Chromosome Mapping , Computer Simulation , Statistical Data Interpretation , Humans , Genetic Models
3.
Science ; 327(5961): 78-81, 2010 Jan 01.
Article in English | MEDLINE | ID: mdl-19892942

ABSTRACT

Genome sequencing of large numbers of individuals promises to advance the understanding, treatment, and prevention of human diseases, among other applications. We describe a genome sequencing platform that achieves efficient imaging and low reagent consumption with combinatorial probe anchor ligation chemistry to independently assay each base from patterned nanoarrays of self-assembling DNA nanoballs. We sequenced three human genomes with this platform, generating an average of 45- to 87-fold coverage per genome and identifying 3.2 to 4.5 million sequence variants per genome. Validation of one genome data set demonstrates a sequence accuracy of about 1 false variant per 100 kilobases. The high accuracy, affordable cost of $4400 for sequencing consumables, and scalability of this platform enable complete human genome sequencing for the detection of rare variants in large-scale genetic studies.


Subjects
DNA/chemistry , Human Genome , Microarray Analysis , DNA Sequence Analysis/methods , Base Sequence , Computational Biology , Costs and Cost Analysis , DNA/genetics , Nucleic Acid Databases , Genomic Library , Genotype , Haplotypes , Human Genome Project , Humans , Male , Nanostructures , Nanotechnology , Nucleic Acid Amplification Techniques , Single Nucleotide Polymorphism , DNA Sequence Analysis/economics , DNA Sequence Analysis/instrumentation , DNA Sequence Analysis/standards , Software
4.
Protein Sci ; 17(1): 54-65, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18042678

ABSTRACT

Metals play a variety of roles in biological processes, and hence their presence in a protein structure can yield vital functional information. Because the residues that coordinate a metal often undergo conformational changes upon binding, detection of binding sites based on simple geometric criteria in proteins without bound metal is difficult. However, aspects of the physicochemical environment around a metal binding site are often conserved even when this structural rearrangement occurs. We have developed a Bayesian classifier using known zinc binding sites as positive training examples and nonmetal binding regions that nonetheless contain residues frequently observed in zinc sites as negative training examples. In order to allow variation in the exact positions of atoms, we average a variety of biochemical and biophysical properties in six concentric spherical shells around the site of interest. At a specificity of 99.8%, this method achieves 75.5% sensitivity in unbound proteins at a positive predictive value of 73.6%. We also test its accuracy on predicted protein structures obtained by homology modeling using templates with 30%-50% sequence identity to the target sequences. At a specificity of 99.8%, we correctly identify at least one zinc binding site in 65.5% of modeled proteins. Thus, in many cases, our model is accurate enough to identify metal binding sites in proteins of unknown structure for which no high sequence identity homologs of known structure exist. Both the source code and a Web interface are available to the public at http://feature.stanford.edu/metals.
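
The shell-averaging step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives six concentric shells but not their width, so the `shell_width` of 1.25 (in the same units as the coordinates) is a hypothetical value, as are the function name, the input layout, and the choice to report 0.0 for an empty shell.

```python
import math

def shell_features(site, atoms, n_shells=6, shell_width=1.25):
    """Average one per-atom property in concentric spherical shells.

    site  -- (x, y, z) coordinates of the point of interest
    atoms -- iterable of ((x, y, z), value) pairs, where value is any
             numeric property of the atom (hypothetical input layout)
    """
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for coords, value in atoms:
        distance = math.dist(site, coords)
        shell = int(distance // shell_width)  # shell 0 is the innermost
        if shell < n_shells:                  # ignore atoms beyond the last shell
            sums[shell] += value
            counts[shell] += 1
    # Assumption: an empty shell is reported as 0.0 rather than missing.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

atoms = [
    ((1.0, 0.0, 0.0), 2.0),
    ((0.0, 1.0, 0.0), 4.0),
    ((0.0, 0.0, 2.0), 1.0),
    ((10.0, 0.0, 0.0), 9.0),  # too far away to fall in any shell
]
print(shell_features((0.0, 0.0, 0.0), atoms))
```

Because each feature depends only on an atom's distance from the site, small shifts in atom positions within a shell leave the feature vector unchanged, which matches the stated goal of tolerating variation in exact atom positions.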


Subjects
Carrier Proteins/chemistry , Carrier Proteins/metabolism , Zinc/chemistry , Zinc/metabolism , Binding Sites , Carrier Proteins/genetics , Genomics , Biological Models , Molecular Models , Protein Conformation , Sensitivity and Specificity
5.
BMC Bioinformatics ; 8 Suppl 4: S10, 2007 May 22.
Article in English | MEDLINE | ID: mdl-17570144

ABSTRACT

BACKGROUND: Structural genomics initiatives are producing increasing numbers of three-dimensional (3D) structures for which there is little functional information. Structure-based annotation of molecular function is therefore becoming critical. We previously presented FEATURE, a method for describing microenvironments around functional sites in proteins. However, FEATURE uses supervised machine learning and so is limited to building models for sites of known importance and location. We hypothesized that there are a large number of sites in proteins that are associated with function that have not yet been recognized. Toward that end, we have developed a method for clustering protein microenvironments in order to evaluate the potential for discovering novel sites that have not been previously identified. RESULTS: We have prototyped a computational method for rapid clustering of millions of microenvironments in order to discover residues whose surrounding environments are similar and which may therefore share a functional or structural role. We clustered nearly 2,000,000 environments from 9,600 protein chains and defined 4,550 clusters. As a preliminary validation, we asked whether known 3D environments associated with PROSITE motifs were "rediscovered". We found examples of clusters highly enriched for residues that share PROSITE sequence motifs. CONCLUSION: Our results demonstrate that we can cluster protein environments successfully using a simplified representation and K-means clustering algorithm. The rediscovery of known 3D motifs allows us to calibrate the size and intercluster distances that characterize useful clusters. This information will then allow us to find new clusters with similar characteristics that represent novel structural or functional sites.
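
The K-means clustering named in the conclusion can be sketched in a few lines. This is a generic textbook version, not the authors' implementation: it assumes each microenvironment has already been reduced to a fixed-length numeric vector, and the function name, the random initialization, and the iteration cap are illustrative choices.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means: returns (assignment, centroids).

    points -- list of equal-length numeric tuples (one per environment)
    k      -- number of clusters to form
    """
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels)
```

The cluster labels themselves are arbitrary; what matters is which points share a label. At the scale the abstract reports (nearly 2,000,000 environments), a vectorized or approximate implementation would be needed in practice.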


Subjects
Algorithms , Chemical Models , Molecular Models , Proteins/chemistry , Proteins/ultrastructure , Protein Sequence Analysis/methods , Amino Acid Motifs , Binding Sites , Computer Simulation , Three-Dimensional Imaging/methods , Ligands , Protein Binding , Protein Conformation
...