1.
Nature ; 487(7406): 190-5, 2012 Jul 11.
Article in English | MEDLINE | ID: mdl-22785314

ABSTRACT

Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ∼100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.


Subject(s)
Human genome , Genomics/methods , DNA sequence analysis/methods , Alleles , Cell line , Female , Gene silencing , Genetic variation , Haplotypes , Humans , Mutation , Reproducibility of results , DNA sequence analysis/economics , DNA sequence analysis/standards
2.
J Comput Biol ; 19(3): 279-92, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22175250

ABSTRACT

Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of unique characteristics of these mated pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurate calling of SNPs and short substitutions and indels (<100 bp); the same methods apply to evaluation of hypothesized larger, structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until convergence is achieved to an optimum diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process in order to reduce the chance of converging to local optima. Finally, a correlation-based filter is applied to reduce the false positive rate caused by the presence of repetitive regions in the reference genome.


Subject(s)
Contig mapping/methods , Human genome , DNA sequence analysis/methods , Algorithms , Alleles , Base sequence , Bayes theorem , Chromosome mapping , Computer simulation , Statistical data interpretation , Humans , Genetic models
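
The iterative optimization described in the abstract above (apply one-base substitutions, insertions, and deletions until the posterior probability of the sequence given the reads stops improving) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the sequence is haploid rather than diploid, the prior is flat so the posterior is proportional to the likelihood, each read is scored at its best ungapped placement with a uniform per-base error rate `err`, and the seed is simply a slightly wrong sequence rather than a local de novo assembly. The function names `log_likelihood`, `neighbors`, and `optimize` are hypothetical.

```python
import math


def log_likelihood(seq, reads, err=0.01):
    """Toy read model: each read contributes the log-likelihood of its best
    ungapped placement on seq, with a uniform per-base error rate."""
    total = 0.0
    for read in reads:
        best = -math.inf  # stays -inf if the read is longer than seq
        for start in range(len(seq) - len(read) + 1):
            window = seq[start:start + len(read)]
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            ll = (mismatches * math.log(err / 3)
                  + (len(read) - mismatches) * math.log(1 - err))
            best = max(best, ll)
        total += best
    return total


def neighbors(seq):
    """Yield every sequence one substitution, deletion, or insertion away."""
    for i in range(len(seq)):
        for base in "ACGT":
            if base != seq[i]:
                yield seq[:i] + base + seq[i + 1:]  # substitution
        yield seq[:i] + seq[i + 1:]                 # deletion
    for i in range(len(seq) + 1):
        for base in "ACGT":
            yield seq[:i] + base + seq[i:]          # insertion


def optimize(seed, reads, err=0.01, tol=1e-9):
    """Steepest-ascent search: move to the best one-base edit until no edit
    improves the score by more than tol (flat prior assumed)."""
    current, score = seed, log_likelihood(seed, reads, err)
    while True:
        best_seq, best_score = current, score
        for candidate in neighbors(current):
            s = log_likelihood(candidate, reads, err)
            if s > best_score + tol:
                best_seq, best_score = candidate, s
        if best_seq == current:
            return current
        current, score = best_seq, best_score


# Usage: error-free 6-base reads from a hypothetical 14-base sequence,
# starting from a seed that carries one wrong base.
truth = "ACGTTGCAAGGCTA"
reads = [truth[i:i + 6] for i in range(len(truth) - 5)]
seed = "ACGTTGCTAGGCTA"
print(optimize(seed, reads))
```

Like any greedy search, this can stop at a local optimum, which is the reason the abstract gives for seeding the optimization with a local de novo assembly.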
3.
Science ; 327(5961): 78-81, 2010 Jan 01.
Article in English | MEDLINE | ID: mdl-19892942

ABSTRACT

Genome sequencing of large numbers of individuals promises to advance the understanding, treatment, and prevention of human diseases, among other applications. We describe a genome sequencing platform that achieves efficient imaging and low reagent consumption with combinatorial probe anchor ligation chemistry to independently assay each base from patterned nanoarrays of self-assembling DNA nanoballs. We sequenced three human genomes with this platform, generating an average of 45- to 87-fold coverage per genome and identifying 3.2 to 4.5 million sequence variants per genome. Validation of one genome data set demonstrates a sequence accuracy of about 1 false variant per 100 kilobases. The high accuracy, affordable cost of $4400 for sequencing consumables, and scalability of this platform enable complete human genome sequencing for the detection of rare variants in large-scale genetic studies.


Subject(s)
DNA/chemistry , Human genome , Microarray analysis , DNA sequence analysis/methods , Base sequence , Computational biology , Costs and cost analysis , DNA/genetics , Nucleic acid databases , Genomic library , Genotype , Haplotypes , Human Genome Project , Humans , Male , Nanostructures , Nanotechnology , Nucleic acid amplification techniques , Single nucleotide polymorphism , DNA sequence analysis/economics , DNA sequence analysis/instrumentation , DNA sequence analysis/standards , Software
4.
Protein Sci ; 17(1): 54-65, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18042678

ABSTRACT

Metals play a variety of roles in biological processes, and hence their presence in a protein structure can yield vital functional information. Because the residues that coordinate a metal often undergo conformational changes upon binding, detection of binding sites based on simple geometric criteria in proteins without bound metal is difficult. However, aspects of the physicochemical environment around a metal binding site are often conserved even when this structural rearrangement occurs. We have developed a Bayesian classifier using known zinc binding sites as positive training examples and nonmetal binding regions that nonetheless contain residues frequently observed in zinc sites as negative training examples. In order to allow variation in the exact positions of atoms, we average a variety of biochemical and biophysical properties in six concentric spherical shells around the site of interest. At a specificity of 99.8%, this method achieves 75.5% sensitivity in unbound proteins at a positive predictive value of 73.6%. We also test its accuracy on predicted protein structures obtained by homology modeling using templates with 30%-50% sequence identity to the target sequences. At a specificity of 99.8%, we correctly identify at least one zinc binding site in 65.5% of modeled proteins. Thus, in many cases, our model is accurate enough to identify metal binding sites in proteins of unknown structure for which no high sequence identity homologs of known structure exist. Both the source code and a Web interface are available to the public at http://feature.stanford.edu/metals.


Subject(s)
Carrier proteins/chemistry , Carrier proteins/metabolism , Zinc/chemistry , Zinc/metabolism , Binding sites , Carrier proteins/genetics , Genomics , Biological models , Molecular models , Protein conformation , Sensitivity and specificity
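
The shell-averaging step mentioned in the abstract above (averaging properties in six concentric spherical shells around the site of interest) might look roughly like the sketch below. It covers only the feature computation, not the Bayesian classifier. The shell width of 1.25 Å, the function name `shell_profile`, the `(position, value)` atom representation, and the choice to report empty shells as 0.0 are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def shell_profile(center, atoms, n_shells=6, shell_width=1.25):
    """Average one per-atom property in concentric shells around center.

    atoms is an iterable of (position, value) pairs, where position is an
    (x, y, z) tuple in angstroms. Shell i spans distances from
    i * shell_width up to (i + 1) * shell_width. Atoms beyond the outermost
    shell are ignored; empty shells are reported as 0.0 (an assumption).
    """
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for position, value in atoms:
        distance = math.dist(center, position)
        index = int(distance // shell_width)
        if index < n_shells:
            sums[index] += value
            counts[index] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Usage with hypothetical atoms carrying a made-up property value.
atoms = [
    ((1.0, 0.0, 0.0), 2.0),    # 1.0 A from the center: shell 0
    ((0.0, 1.0, 0.0), 4.0),    # 1.0 A from the center: shell 0
    ((3.0, 0.0, 0.0), -1.0),   # 3.0 A from the center: shell 2
    ((10.0, 0.0, 0.0), 5.0),   # outside all six shells, ignored
]
print(shell_profile((0.0, 0.0, 0.0), atoms))
```

Because each value is averaged over a shell rather than tied to an exact atomic position, small rearrangements of the coordinating residues change the profile only slightly, which matches the motivation given in the abstract.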
5.
BMC Bioinformatics ; 8 Suppl 4: S10, 2007 May 22.
Article in English | MEDLINE | ID: mdl-17570144

ABSTRACT

BACKGROUND: Structural genomics initiatives are producing increasing numbers of three-dimensional (3D) structures for which there is little functional information. Structure-based annotation of molecular function is therefore becoming critical. We previously presented FEATURE, a method for describing microenvironments around functional sites in proteins. However, FEATURE uses supervised machine learning and so is limited to building models for sites of known importance and location. We hypothesized that there are a large number of sites in proteins that are associated with function that have not yet been recognized. Toward that end, we have developed a method for clustering protein microenvironments in order to evaluate the potential for discovering novel sites that have not been previously identified.

RESULTS: We have prototyped a computational method for rapid clustering of millions of microenvironments in order to discover residues whose surrounding environments are similar and which may therefore share a functional or structural role. We clustered nearly 2,000,000 environments from 9,600 protein chains and defined 4,550 clusters. As a preliminary validation, we asked whether known 3D environments associated with PROSITE motifs were "rediscovered". We found examples of clusters highly enriched for residues that share PROSITE sequence motifs.

CONCLUSION: Our results demonstrate that we can cluster protein environments successfully using a simplified representation and K-means clustering algorithm. The rediscovery of known 3D motifs allows us to calibrate the size and intercluster distances that characterize useful clusters. This information will then allow us to find new clusters with similar characteristics that represent novel structural or functional sites.


Subject(s)
Algorithms , Chemical models , Molecular models , Proteins/chemistry , Proteins/ultrastructure , Protein sequence analysis/methods , Amino acid motifs , Binding sites , Computer simulation , Three-dimensional imaging/methods , Ligands , Protein binding , Protein conformation
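
The abstract above names K-means as the clustering algorithm applied to microenvironment vectors. A minimal sketch of standard K-means (Lloyd's algorithm) follows, under stated assumptions: the two-dimensional points are made-up stand-ins for the real feature vectors, the centroids are initialized from the first `k` points for reproducibility, and the function name `kmeans` is hypothetical. The paper's simplified representation and its handling of millions of environments are not reproduced here.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    recompute centroids as cluster means, repeat until labels stop changing.

    Centroids start at the first k points (a simplification; real use would
    pick a better initialization).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave an empty cluster's centroid where it was
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Usage: two well-separated groups of hypothetical 2D "environments".
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The returned centroids make it straightforward to compute cluster sizes and intercluster distances, the two quantities the abstract says are calibrated against rediscovered PROSITE motifs.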