Results 1 - 20 of 41
1.
Nature ; 601(7893): 422-427, 2022 01.
Article in English | MEDLINE | ID: mdl-34987224

ABSTRACT

Maternal morbidity and mortality continue to rise, and pre-eclampsia is a major driver of this burden (ref. 1). Yet the ability to assess underlying pathophysiology before clinical presentation to enable identification of pregnancies at risk remains elusive. Here we demonstrate the ability of plasma cell-free RNA (cfRNA) to reveal patterns of normal pregnancy progression and determine the risk of developing pre-eclampsia months before clinical presentation. Our results centre on comprehensive transcriptome data from eight independent prospectively collected cohorts comprising 1,840 racially diverse pregnancies and retrospective analysis of 2,539 banked plasma samples. The pre-eclampsia data include 524 samples (72 cases and 452 non-cases) from two diverse independent cohorts collected 14.5 weeks (s.d., 4.5 weeks) before delivery. We show that cfRNA signatures from a single blood draw can track pregnancy progression at the placental, maternal and fetal levels and can robustly predict pre-eclampsia, with a sensitivity of 75% and a positive predictive value of 32.3% (s.d., 3%), which is superior to the state-of-the-art method (ref. 2). cfRNA signatures of normal pregnancy progression and pre-eclampsia are independent of clinical factors, such as maternal age, body mass index and race, which cumulatively account for less than 1% of model variance. Further, the cfRNA signature for pre-eclampsia contains gene features linked to biological processes implicated in the underlying pathophysiology of pre-eclampsia.
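
The link between sensitivity and positive predictive value in the abstract above can be illustrated with Bayes' rule. The following is a minimal sketch, not the authors' analysis: the function name `positive_predictive_value` is hypothetical, and the specificity passed to it is an assumed value, because the abstract does not report one. The prevalence is taken from the 72 cases among 524 samples.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Return PPV = TP / (TP + FP) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalence in the pre-eclampsia data set described in the abstract.
prevalence = 72 / 524

# Sensitivity is from the abstract; the specificity is a hypothetical input.
ppv = positive_predictive_value(sensitivity=0.75, specificity=0.75, prevalence=prevalence)
print(f"PPV at assumed specificity 0.75: {ppv:.3f}")
```

With these inputs the function returns 54/167, roughly 0.323. That this is close to the reported figure does not establish the specificity the authors obtained; it only shows how strongly PPV depends on prevalence when cases are a small fraction of samples.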


Subjects
Cell-Free Nucleic Acids , Pre-Eclampsia , RNA , Cell-Free Nucleic Acids/blood , Female , Humans , Pre-Eclampsia/diagnosis , Pre-Eclampsia/genetics , Predictive Value of Tests , Pregnancy , RNA/blood , Retrospective Studies , Sensitivity and Specificity
2.
Nature ; 518(7537): 102-6, 2015 Feb 05.
Article in English | MEDLINE | ID: mdl-25487149

ABSTRACT

Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl⁻¹. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.


Subjects
Alleles , Apolipoproteins A/genetics , Exome/genetics , Genetic Predisposition to Disease/genetics , Myocardial Infarction/genetics , Receptors, LDL/genetics , Age Factors , Age of Onset , Apolipoprotein A-V , Case-Control Studies , Cholesterol, LDL/blood , Coronary Artery Disease/genetics , Female , Genetics, Population , Heterozygote , Humans , Male , Middle Aged , Mutation/genetics , Myocardial Infarction/blood , National Heart, Lung, and Blood Institute (U.S.) , Triglycerides/blood , United States
3.
Bioinformatics ; 35(21): 4389-4391, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30916319

ABSTRACT

SUMMARY: Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation. Here we present GenomeWarp, a tool for efficiently transforming variants between genome assemblies. GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome. AVAILABILITY AND IMPLEMENTATION: GenomeWarp is written in Java. All source code and the user manual are freely available at https://github.com/verilylifesciences/genomewarp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics , Software , Genome, Human , Humans
4.
Nature ; 491(7422): 56-65, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-23128226

ABSTRACT

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.


Subjects
Genetic Variation/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Alleles , Binding Sites/genetics , Conserved Sequence/genetics , Evolution, Molecular , Genetics, Medical , Genome-Wide Association Study , Haplotypes/genetics , Humans , Nucleotide Motifs , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Sequence Deletion/genetics , Transcription Factors/metabolism
5.
Mol Cell ; 37(3): 311-20, 2010 Feb 12.
Article in English | MEDLINE | ID: mdl-20159551

ABSTRACT

Antibiotic resistance arises through mechanisms such as selection of naturally occurring resistant mutants and horizontal gene transfer. Recently, oxidative stress has been implicated as one of the mechanisms whereby bactericidal antibiotics kill bacteria. Here, we show that sublethal levels of bactericidal antibiotics induce mutagenesis, resulting in heterogeneous increases in the minimum inhibitory concentration for a range of antibiotics, irrespective of the drug target. This increase in mutagenesis correlates with an increase in ROS and is prevented by the ROS scavenger thiourea and by anaerobic conditions, indicating that sublethal concentrations of antibiotics induce mutagenesis by stimulating the production of ROS. We demonstrate that these effects can lead to mutant strains that are sensitive to the applied antibiotic but resistant to other antibiotics. This work establishes a radical-based molecular mechanism whereby sublethal levels of antibiotics can lead to multidrug resistance, which has important implications for the widespread use and misuse of antibiotics.


Subjects
Anti-Bacterial Agents/pharmacology , Drug Resistance, Multiple, Bacterial/drug effects , Escherichia coli/drug effects , Mutagenesis , Amino Acid Sequence , Base Sequence , Drug Resistance, Multiple, Bacterial/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Transfer, Horizontal/drug effects , Microbial Sensitivity Tests , Molecular Sequence Data , Reactive Oxygen Species/metabolism , Sequence Alignment , Sequence Analysis, Protein
6.
Hum Mol Genet ; 20(7): 1285-9, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21212097

ABSTRACT

Exome sequencing is a powerful tool for discovery of Mendelian disease genes. Previously, we reported a novel locus for autosomal recessive non-syndromic mental retardation (NSMR) in a consanguineous family [Nolan, D.K., Chen, P., Das, S., Ober, C. and Waggoner, D. (2008) Fine mapping of a locus for nonsyndromic mental retardation on chromosome 19p13. Am. J. Med. Genet. A, 146A, 1414-1422]. Using linkage and homozygosity mapping, we previously localized the gene to chromosome 19p13. The parents of this sibship were recently included in an exome sequencing project. Using a series of filters, we narrowed the putative causal mutation to a single variant site that segregated with NSMR: the mutation was homozygous in five affected siblings but in none of eight unaffected siblings. This mutation causes a substitution of a leucine for a highly conserved proline at amino acid 182 in TECR (trans-2,3-enoyl-CoA reductase), a synaptic glycoprotein. Our results reveal the value of massively parallel sequencing for identification of novel disease genes that could not be found using traditional approaches and identify only the seventh causal mutation for autosomal recessive NSMR.


Subjects
Chromosomes, Human, Pair 19/genetics , Genetic Diseases, Inborn/genetics , Intellectual Disability/genetics , Membrane Glycoproteins/genetics , Mutation , Oxidoreductases/genetics , Synaptic Membranes/genetics , Female , Genetic Diseases, Inborn/enzymology , Humans , Intellectual Disability/enzymology , Male , Membrane Glycoproteins/metabolism , Oxidoreductases/metabolism , Pedigree , Synaptic Membranes/enzymology
7.
N Engl J Med ; 363(23): 2220-7, 2010 Dec 02.
Article in English | MEDLINE | ID: mdl-20942659

ABSTRACT

We sequenced all protein-coding regions of the genome (the "exome") in two family members with combined hypolipidemia, marked by extremely low plasma levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These two participants were compound heterozygotes for two distinct nonsense mutations in ANGPTL3 (encoding the angiopoietin-like 3 protein). ANGPTL3 has been reported to inhibit lipoprotein lipase and endothelial lipase, thereby increasing plasma triglyceride and HDL cholesterol levels in rodents. Our finding of ANGPTL3 mutations highlights a role for the gene in LDL cholesterol metabolism in humans and shows the usefulness of exome sequencing for identification of novel genetic causes of inherited disorders. (Funded by the National Human Genome Research Institute and others.)


Subjects
Angiopoietins/genetics , Codon, Nonsense , Hypobetalipoproteinemias/genetics , Angiopoietin-Like Protein 3 , Angiopoietin-Like Proteins , Cholesterol, HDL/blood , Cholesterol, HDL/genetics , Cholesterol, LDL/blood , Cholesterol, LDL/genetics , DNA Mutational Analysis , Female , Genetic Linkage , Humans , Male , Pedigree
8.
Genome Res ; 20(9): 1297-303, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20644199

ABSTRACT

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genomes pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
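
The MapReduce idea mentioned in the abstract above, applied to a coverage calculator, can be sketched as follows. This is a minimal illustration in Python and does not use or reproduce the GATK API: the names `Read`, `map_locus` and `mean_coverage` are hypothetical, and reads are simplified to half-open intervals on a single contig.

```python
from functools import reduce
from typing import Iterable, List, Tuple

Read = Tuple[int, int]  # half-open alignment interval [start, end) on one contig


def map_locus(position: int, reads: Iterable[Read]) -> int:
    """Map step: depth of coverage at a single reference position."""
    return sum(1 for start, end in reads if start <= position < end)


def mean_coverage(reads: List[Read], region_start: int, region_end: int) -> float:
    """Reduce step: combine per-locus depths into a mean over the region."""
    depths = (map_locus(pos, reads) for pos in range(region_start, region_end))
    total = reduce(lambda acc, depth: acc + depth, depths, 0)
    return total / (region_end - region_start)


reads = [(0, 5), (3, 8)]
print(mean_coverage(reads, 0, 8))  # → 1.25
```

Because the map step looks at one locus at a time and the reduce step is a plain sum, the region could be split into shards and the partial sums combined afterwards, which is the property that makes this style convenient to parallelize.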


Subjects
Genome , Genomics/methods , Sequence Analysis, DNA/methods , Software , Base Sequence
9.
PLoS Comput Biol ; 8(7): e1002604, 2012.
Article in English | MEDLINE | ID: mdl-22807667

ABSTRACT

High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF < 5%), when low coverage sequence reads are added to dense genome-wide SNP arrays--the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling.


Subjects
Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Algorithms , Cluster Analysis , Databases, Genetic , Genome-Wide Association Study , Genotype , Humans , Sensitivity and Specificity , White People
10.
BMC Genomics ; 13: 375, 2012 Aug 05.
Article in English | MEDLINE | ID: mdl-22863213

ABSTRACT

BACKGROUND: Pacific Biosciences technology provides a fundamentally new data type that provides the potential to overcome some limitations of current next generation sequencing platforms by providing significantly longer reads, single molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects. RESULTS: We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis. CONCLUSION: Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, but can be extended to other organisms with a reference genome.


Subjects
Sequence Analysis, DNA , Genetic Variation , Genome, Human , Genotype , Humans , Polymorphism, Single Nucleotide , Software , User-Computer Interface
11.
Bioinformatics ; 27(15): 2156-8, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-21653522

ABSTRACT

SUMMARY: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. AVAILABILITY: http://vcftools.sourceforge.net
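
To make the format described above concrete, the sketch below parses the eight fixed, tab-separated columns of a single VCF data line. It is a minimal illustration written in plain Python, not VCFtools or its Perl API: the helper names `parse_info` and `parse_record` are hypothetical, and header lines, genotype columns and type conversion of INFO values are deliberately left out.

```python
from typing import Any, Dict

FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, Any]:
    """Split the INFO column into key/value pairs; bare keys are flags."""
    parsed: Dict[str, Any] = {}
    if info == ".":
        return parsed
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def parse_record(line: str) -> Dict[str, Any]:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, Any] = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(fields[1])         # 1-based reference position
    record["ALT"] = fields[4].split(",")   # comma-separated alternate alleles
    record["INFO"] = parse_info(fields[7])
    return record


example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB"
record = parse_record(example)
print(record["CHROM"], record["POS"], record["ALT"], record["INFO"]["DP"])
```

For real files, which are usually compressed and indexed as the abstract notes, a dedicated library is the safer choice; this sketch only shows the column layout.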


Subjects
Genetic Variation , Genomics/methods , Information Storage and Retrieval/methods , Software , Alleles , Genome, Human , Genotype , Humans
12.
Nat Biotechnol ; 40(6): 932-937, 2022 06.
Article in English | MEDLINE | ID: mdl-35190689

ABSTRACT

Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by >9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools.


Subjects
Deep Learning , Amino Acid Sequence , Databases, Protein , Humans , Molecular Sequence Annotation , Proteome/metabolism , Proteomics
13.
BMC Genomics ; 12: 42, 2011 Jan 18.
Article in English | MEDLINE | ID: mdl-21244689

ABSTRACT

BACKGROUND: Comprehensive sequence characterization across the MHC is important for successful organ transplantation and genetic association studies. To this end, we have developed an automated sample preparation, molecular barcoding and multiplexing protocol for the amplification and sequence-determination of class I HLA loci. We have coupled this process to a novel HLA calling algorithm to determine the most likely pair of alleles at each locus. RESULTS: We have benchmarked our protocol with 270 HapMap individuals from four worldwide populations with 96.4% accuracy at 4-digit resolution. A variation of this initial protocol, more suitable for large sample sizes, in which molecular barcodes are added during PCR rather than library construction, was tested on 95 HapMap individuals with 98.6% accuracy at 4-digit resolution. CONCLUSIONS: Next-generation sequencing on the 454 FLX Titanium platform is a reliable, efficient, and scalable technology for HLA typing.


Subjects
Genes, MHC Class I/genetics , Histocompatibility Testing/methods , Sequence Analysis, DNA/methods , Humans , Polymerase Chain Reaction
14.
Mol Biol Evol ; 27(9): 2198-209, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20427419

ABSTRACT

Over the past decade, attempts to explain the unusual size and prevalence of low-complexity regions (LCRs) in the proteins of the human malaria parasite Plasmodium falciparum have used both neutral and adaptive models. This past research has offered conflicting explanations for LCR characteristics and their role in, and influence on, the evolution of genome structure. Here we show that P. falciparum LCRs (PfLCRs) are not a single phenomenon, but rather consist of at least three distinct types of sequence, and this heterogeneity is the source of the conflict in the literature. Using molecular and population genetics, we show that these families of PfLCRs are evolving by different mechanisms. One of these families, named here the HighGC family, is of particular interest because these LCRs act as recombination hotspots, both in genes under positive selection for high levels of diversity which can be created by recombination (antigens) and those likely to be evolving neutrally or under negative selection (metabolic enzymes). We discuss how the discovery of these distinct species of PfLCRs helps to resolve previous contradictory studies on LCRs in malaria and contributes to our understanding of the evolution of the parasite's unusual genome.


Subjects
Evolution, Molecular , Genome, Protozoan/genetics , Plasmodium falciparum/genetics , Animals
15.
Nature ; 433(7022): 128-32, 2005 Jan 13.
Article in English | MEDLINE | ID: mdl-15650731

ABSTRACT

We present a protocol for the experimental determination of ensembles of protein conformations that represent simultaneously the native structure and its associated dynamics. The procedure combines the strengths of nuclear magnetic resonance spectroscopy--for obtaining experimental information at the atomic level about the structural and dynamical features of proteins--with the ability of molecular dynamics simulations to explore a wide range of protein conformations. We illustrate the method for human ubiquitin in solution and find that there is considerable conformational heterogeneity throughout the protein structure. The interior atoms of the protein are tightly packed in each individual conformation that contributes to the ensemble but their overall behaviour can be described as having a significant degree of liquid-like character. The protocol is completely general and should lead to significant advances in our ability to understand and utilize the structures of native proteins.


Subjects
Computer Simulation , Magnetic Resonance Spectroscopy , Ubiquitin/chemistry , Ubiquitin/metabolism , Humans , Models, Molecular , Protein Conformation , Reproducibility of Results , Solutions/chemistry
16.
Trends Biochem Sci ; 31(7): 359-65, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16766188

ABSTRACT

In Escherichia coli, the multi-enzyme RNA degradosome contributes to the global, posttranscriptional regulation of gene expression. The degradosome components are recognized through natively unstructured "microdomains" comprising as few as 15-40 amino acids. Consequently, the degradosome might experience a comparatively smaller number of evolutionary constraints, because there is little requirement to maintain a folded state for the interaction sites. New regulatory properties of the degradosome could arise with relative rapidity, because partners that modify its function could be recruited by quickly evolving microdomains. The unusual combination of the centrality of RNA degradation in gene expression and the generality of natively unstructured microdomains in recognition can fortuitously confer a capacity for efficacious adaptive change to degradosome-like assemblies in eubacteria.


Subjects
Endoribonucleases/physiology , Escherichia coli/genetics , Evolution, Molecular , Multienzyme Complexes/physiology , Polyribonucleotide Nucleotidyltransferase/physiology , RNA Helicases/physiology , Models, Molecular , Phylogeny
17.
Mol Biol Evol ; 26(11): 2455-62, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19602543

ABSTRACT

Understanding the molecular details of the sequence of events in multistep evolutionary pathways can reveal the extent to which natural selection exploits regulatory mutations affecting expression, amino acid replacements affecting the active site, amino acid replacements affecting protein folding or stability, or variations affecting gene copy number. In experimentally exploring the adaptive landscape of the evolution of resistance to beta-lactam antibiotics in enteric bacteria, we noted that a regulatory mutation that increases beta-lactamase expression by about 2-fold has a very strong tendency to be fixed at or near the end of the evolutionary pathway. This pattern contrasts with previous experiments selecting for the utilization of novel substrates, in which regulatory mutations that increase expression are often fixed early in the process. To understand the basis of the difference, we carried out experiments in which the expression of beta-lactamase was under the control of a tunable arabinose promoter. We find that the fitness effect of an increase in gene expression is highly dependent on the catalytic activity of the coding sequence. An increase in expression of an inefficient enzyme has a negligible effect on drug resistance; however, the effect of an increase in expression of an efficient enzyme is very large. The contrast in the temporal incorporation of regulatory mutants between antibiotic resistance and the utilization of novel substrates is related to the nature of the function that relates enzyme activity to fitness. A mathematical model of beta-lactam resistance is examined in detail and shown to be consistent with the observed results.


Subjects
Escherichia coli/classification , Escherichia coli/genetics , Evolution, Molecular , Mutation/genetics , Directed Molecular Evolution , Drug Resistance, Microbial/genetics , Kinetics , Microbial Sensitivity Tests , beta-Lactamases/genetics
18.
Curr Opin Struct Biol ; 16(2): 160-5, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16483766

ABSTRACT

Conformational sampling by direct optimization of an all-atom energy function is ineffective and inefficient because of the ruggedness of the energy landscape. Discrete sampling schemes represent an attractive alternative for generating ensembles of conformers consistent with spatial restraints derived from empirical data. Conformational sampling is becoming increasingly important for structure prediction as the bottleneck in accurate prediction shifts from energy functions to the methods used to find low-energy conformers. Experimental structure determination remains a perennial challenge as investigators tackle larger macromolecular systems, and begin to incorporate more complete descriptions of uncertainty, heterogeneity and dynamics into their models. Computational approaches that combine dense, discrete sampling with all-atom energy evaluation and refinement may help to overcome the remaining barriers to solving these problems.


Subjects
Models, Molecular , Proteins/chemistry , Algorithms , Crystallography, X-Ray , Protein Conformation
19.
Nat Biotechnol ; 37(10): 1155-1162, 2019 10.
Article in English | MEDLINE | ID: mdl-31406327

ABSTRACT

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
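
The contig N50 quoted in the abstract above is a standard assembly contiguity statistic: the contig length at which contigs of that length or longer account for at least half of the assembled bases. The sketch below shows the usual calculation; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the total bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8
```

In the toy input the total is 54 bases, and the three longest contigs (10, 9 and 8) already sum to 27, so the N50 is 8.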


Subjects
DNA, Circular/genetics , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Base Sequence , Genetic Variation , Haplotypes , Humans
20.
Structure ; 14(8): 1313-20, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16905105

ABSTRACT

The accurate and effective interpretation of low-resolution data in X-ray crystallography is becoming increasingly important as structural initiatives turn toward large multiprotein complexes. Substantial challenges remain due to the poor information content and ambiguity in the interpretation of electron density maps at low resolution. Here, we describe a semiautomated procedure that employs a restraint-based conformational search algorithm, RAPPER, to produce a starting model for the structure determination of ligase interacting factor 1 in complex with a fragment of DNA ligase IV at low resolution. The combined use of experimental data and a priori knowledge of protein structure enabled us not only to generate an all-atom model but also to reaffirm the inferred sequence registry. This approach provides a means to extract quickly from experimental data useful information that would otherwise be discarded and to take into account the uncertainty in the interpretation--an overriding issue for low-resolution data.


Subjects
Algorithms , Crystallography, X-Ray/methods , Models, Molecular , Multiprotein Complexes/chemistry , Signal Processing, Computer-Assisted , DNA Ligase ATP , DNA Ligases/chemistry , DNA-Binding Proteins/chemistry , Protein Structure, Tertiary , Saccharomyces cerevisiae Proteins/chemistry