ABSTRACT
Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
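Three quantities in the abstract above can be illustrated with a minimal Python sketch: homopolymer compression, the NG50 statistic, and the conversion from consensus accuracy to a Phred-scaled QV. This is not HiCanu's implementation. The function names are hypothetical, and the NG50 definition used here (the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half the estimated genome size) is the conventional one, assumed rather than taken from the abstract.

```python
import math
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases to a single base."""
    return "".join(base for base, _run in groupby(seq))


def ng50(contig_lengths, genome_size):
    """Return the NG50 contig length, or None if the contigs cover
    less than half of the (assumed) genome size."""
    half = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return None


def phred_qv(accuracy: float) -> float:
    """Convert per-base accuracy (0..1) to a Phred-scaled quality value."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    print(homopolymer_compress("GATTTACCA"))
    print(ng50([50, 40, 30, 20, 10], genome_size=200))
    print(round(phred_qv(0.99999), 3))
```

Compression removes homopolymer length differences between reads before they are compared. The abstract does not say how HiCanu uses it, so treat the compression step here as an isolated illustration.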
Subjects
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Alleles , Animals , Cell Line , Chromosome Duplication , DNA, Neoplasm , DNA, Satellite , Drosophila/genetics , Genome, Human , Haplotypes , Humans , Reproducibility of Results , Software
ABSTRACT
Quantitative mass spectrometry methods offer near-comprehensive proteome coverage; however, these methods remain limited in sample throughput. Multiplex quantitation via isobaric chemical tags (e.g., TMT and iTRAQ) provides an avenue for mass spectrometry-based proteome quantitation experiments to move away from simple binary comparisons and toward greater parallelization. Herein, we demonstrate a straightforward method for immediately expanding the throughput of the TMT isobaric reagents from 6-plex to 8-plex. This method is based upon our ability to resolve the isotopic shift that results from substituting a ¹⁵N for a ¹³C. In an accommodation to the preferred fragmentation pathways of ETD, the TMT-127 and -129 reagents were recently modified such that a ¹³C was exchanged for a ¹⁵N. As a result of this substitution, the new TMT reporter ions are 6.32 mDa lighter. Even though the mass difference between these reporter ion isotopologues is extremely small, modern analyzers with high resolution and high mass accuracy can resolve these ions. On the basis of our ability to resolve and accurately measure the relative intensity of these isobaric reporter ions, we demonstrate that we are able to quantify across eight samples simultaneously by combining the ¹³C- and ¹⁵N-containing reporter ions. Considering the structure of the TMT reporter ion, we believe this work serves as a blueprint for expanding the multiplexing capacity of the TMT reagents to at least 10-plex and possibly up to 18-plex.
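The 6.32 mDa figure can be checked from standard isotope masses. The sketch below is a worked calculation, not part of the original method. The isotope masses are standard reference values, the reporter-ion m/z of 127 is only the nominal value taken from the reagent name, and the m/Δm estimate is a rough rule of thumb for the resolving power needed, not the authors' instrument setting.

```python
# Isotope masses in daltons (standard reference values).
M_12C = 12.0
M_13C = 13.0033548378
M_14N = 14.0030740048
M_15N = 15.0001088982


def isotopologue_shift_mda() -> float:
    """Mass difference (mDa) between a 13C-labelled and a 15N-labelled
    reporter ion that are otherwise identical."""
    gain_from_13c = M_13C - M_12C   # mass added by one 13C substitution
    gain_from_15n = M_15N - M_14N   # mass added by one 15N substitution
    return (gain_from_13c - gain_from_15n) * 1000.0


def approx_resolving_power(mz: float, delta_mda: float) -> float:
    """Rough m / delta-m estimate of the resolving power needed to
    distinguish two peaks separated by delta_mda at a given m/z."""
    return mz / (delta_mda / 1000.0)


if __name__ == "__main__":
    shift = isotopologue_shift_mda()
    print(f"shift = {shift:.2f} mDa")
    # Nominal m/z of 127 is an assumption based on the reagent name.
    print(f"approx. resolving power at m/z 127: {approx_resolving_power(127.0, shift):.0f}")
```

The ¹⁵N-containing reporter is the lighter of the pair because a ¹⁵N substitution adds slightly less mass than a ¹³C substitution.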
Subjects
Chromatography, High Pressure Liquid , Proteome/analysis , Tandem Mass Spectrometry , Animals , Brain/metabolism , Carbon Isotopes/chemistry , HeLa Cells , Humans , Mice , Nitrogen Isotopes/chemistry , Spleen/metabolism , Thiazoles/chemistry
ABSTRACT
Numerous soluble proteins convert to insoluble amyloid-like fibrils that have common properties. Amyloid fibrils are associated with fatal diseases such as Alzheimer's, and amyloid-like fibrils can be formed in vitro. For the yeast protein Sup35, conversion to amyloid-like fibrils is associated with a transmissible infection akin to that caused by mammalian prions. A seven-residue peptide segment from Sup35 forms amyloid-like fibrils and closely related microcrystals, from which we have determined the atomic structure of the cross-beta spine. It is a double beta-sheet, with each sheet formed from parallel segments stacked in register. Side chains protruding from the two sheets form a dry, tightly self-complementing steric zipper, bonding the sheets. Within each sheet, every segment is bound to its two neighbouring segments through stacks of both backbone and side-chain hydrogen bonds. The structure illuminates the stability of amyloid fibrils, their self-seeding characteristic and their tendency to form polymorphic structures.
Subjects
Amyloid/chemistry , Models, Molecular , Peptide Fragments/chemistry , Prions/chemistry , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae/chemistry , Amides/chemistry , Amino Acid Sequence , Amyloid/metabolism , Crystallization , Crystallography, X-Ray , Hydrogen Bonding , Molecular Sequence Data , Peptide Fragments/metabolism , Peptide Termination Factors , Prions/metabolism , Protein Structure, Secondary , Saccharomyces cerevisiae Proteins/metabolism , Thermodynamics
ABSTRACT
The seven-residue peptide GNNQQNY from the N-terminal region of the yeast prion protein Sup35, which forms amyloid fibers, colloidal aggregates and highly ordered nanocrystals, provides a model system for characterizing the elusively protean cross-beta conformation. Depending on preparative conditions, orthorhombic and monoclinic crystals with similar lath-shaped morphology have been obtained. Ultra high-resolution (<0.5 Å spacing) electron diffraction patterns from single nanocrystals show that the peptide chains pack in parallel cross-beta columns with approximately 4.86 Å axial spacing. Mosaic striations 20-50 nm wide observed by electron microscopy indicate lateral size-limiting crystal growth related to amyloid fiber formation. Frequently obtained orthorhombic forms, with apparent space group symmetry P2₁2₁2₁, have cell dimensions ranging from a = 22.7-21.2 Å, b = 39.9-39.3 Å, c = 4.89-4.86 Å for wet to dried states. Electron diffraction data from single nanocrystals, recorded in tilt series of still frames, have been mapped in reciprocal space. However, reliable integrated intensities cannot be obtained from these series, and dynamical electron diffraction effects present problems in data analysis. The diversity of ordered structures formed under similar conditions has made it difficult to obtain reproducible X-ray diffraction data from powder specimens, and overlapping Bragg reflections in the powder patterns preclude separated structure factor measurements for these data. Model protofilaments, consisting of tightly paired, half-staggered beta strands related by a screw axis, can be fit in the crystal lattices, but model refinement will require accurate structure factor measurements. Nearly anhydrous packing of this hydrophilic peptide can account for the insolubility of the crystals, since the activation energy for rehydration may be extremely high. Water-excluding packing of paired cross-beta peptide segments in thin protofilaments may be characteristic of the wide variety of anomalously stable amyloid aggregates.
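The reported cell dimensions allow a quick estimate of how much the orthorhombic cell contracts on drying. The sketch below is only a worked calculation. It assumes that the first value of each quoted range belongs to the wet state and the second to the dried state, and it uses the orthorhombic relation V = a·b·c. The function names are hypothetical.

```python
def orthorhombic_volume(a: float, b: float, c: float) -> float:
    """Unit-cell volume in cubic angstroms for an orthorhombic cell
    (all angles 90 degrees), with cell edges given in angstroms."""
    return a * b * c


def fractional_shrinkage(v_wet: float, v_dried: float) -> float:
    """Fraction of the wet-cell volume lost on drying."""
    return 1.0 - v_dried / v_wet


if __name__ == "__main__":
    # Assumed assignment: first value of each range = wet, second = dried.
    v_wet = orthorhombic_volume(22.7, 39.9, 4.89)
    v_dried = orthorhombic_volume(21.2, 39.3, 4.86)
    print(f"wet:   {v_wet:.0f} cubic angstroms")
    print(f"dried: {v_dried:.0f} cubic angstroms")
    print(f"shrinkage: {100 * fractional_shrinkage(v_wet, v_dried):.1f} %")
```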
Subjects
Amyloid/chemistry , Peptides/chemistry , Saccharomyces cerevisiae Proteins , Crystallography, X-Ray , Electrons , Fungal Proteins/chemistry , Microscopy, Electron , Peptide Termination Factors , Polymorphism, Genetic , Prions/chemistry , Protein Conformation , Water/chemistry , X-Ray Diffraction
ABSTRACT
Mass spectrometry-based proteomics experiments have become an important tool for studying biological systems. Identifying the proteins in complex mixtures by assigning peptide fragmentation spectra to peptide sequences is an important step in the proteomics process. The 1-2 ppm mass accuracy of hybrid instruments, like the LTQ-FT, has been cited as a key factor in their ability to identify a larger number of peptides with greater confidence than competing instruments. However, in replicate experiments of an 18-protein mixture, we note that parent masses deviate 171 ppm, on average, for ion-trap data-directed identifications and 8 ppm, on average, for preview Fourier transform (FT) data-directed identifications. These deviations are caused neither by poor calibration nor by excessive ion loading and are most likely due to errors in parent mass estimation. To reduce these deviations, we introduce msPrefix, a program to re-estimate a peptide's parent mass from an associated high-accuracy full-scan survey spectrum. In 18-protein mixture experiments, msPrefix parent mass estimates deviate only 1 ppm, on average, from the identified peptides. In a cell lysate experiment searched with a tolerance of 50 ppm, 2295 peptides were confidently identified using native data and 4560 using msPrefixed data. Likewise, in a plasma experiment searched with a tolerance of 50 ppm, 326 peptides were identified using native data and 1216 using msPrefixed data. msPrefix is also able to determine which MS/MS spectra were possibly derived from multiple precursor ions. In complex mixture experiments, we demonstrate that more than 50% of triggered MS/MS may have had multiple precursor ions and note that spectra with multiple candidate ions are less likely to result in an identification using TANDEM. These results demonstrate that integration of msPrefix into traditional shotgun proteomics workflows significantly improves identification results.
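The deviations quoted above are relative mass errors in parts per million. The following minimal sketch shows how such an error and a search-tolerance check are conventionally computed. It does not reproduce msPrefix, whose algorithm is not described here. The function names and the example masses are hypothetical.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed relative mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tolerance_ppm: float = 50.0) -> bool:
    """True if the observed parent mass lies within the search tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


if __name__ == "__main__":
    theoretical = 1000.0  # hypothetical peptide mass in Da
    for observed in (1000.171, 1000.008, 1000.001):
        err = ppm_error(observed, theoretical)
        print(f"{observed:.3f} Da: {err:7.1f} ppm, "
              f"within 50 ppm: {within_tolerance(observed, theoretical)}")
```

With a 50 ppm tolerance, a parent mass that is off by 171 ppm falls outside the search window. That is consistent with the abstract's report that more peptides are identified once parent masses are re-estimated.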
Subjects
Blood Proteins/chemistry , Peptides/analysis , Tandem Mass Spectrometry/methods , Algorithms , Databases, Protein , Humans , Tandem Mass Spectrometry/instrumentation
ABSTRACT
The GXXXG motif is a frequently occurring sequence of residues that is known to favor helix-helix interactions in membrane proteins. Here we show that the GXXXG motif is also prevalent in soluble proteins whose structures have been determined. Some 152 proteins from a non-redundant PDB set contain at least one alpha-helix with the GXXXG motif, 41 ± 9% more than expected if glycine residues were uniformly distributed in those alpha-helices. More than 50% of the GXXXG-containing alpha-helices participate in helix-helix interactions. In fact, 26 of those helix-helix interactions are structurally similar to the helix-helix interaction of the glycophorin A dimer, where two transmembrane helices associate to form a dimer stabilized by the GXXXG motif. As for the glycophorin A structure, we find backbone-to-backbone atomic contacts of the Cα-H···O type in each of these 26 helix-helix interactions that display the stereochemical hallmarks of hydrogen bond formation. These glycophorin A-like helix-helix interactions are enriched in the general set of helix-helix interactions containing the GXXXG motif, suggesting that the inferred Cα-H···O hydrogen bonds stabilize the helix-helix interactions. In addition to the GXXXG motif, some 808 proteins from the non-redundant PDB set contain at least one alpha-helix with the AXXXA motif (30 ± 3% greater than expected). Both the GXXXG and AXXXA motifs occur frequently in predicted alpha-helices from 24 fully sequenced genomes. Occurrence of the AXXXA motif is enhanced to a greater extent in thermophiles than in mesophiles, suggesting that helical interaction based on the AXXXA motif may be a common mechanism of thermostability in protein structures. We conclude that the GXXXG sequence motif stabilizes helix-helix interactions in proteins, and that the AXXXA sequence motif also stabilizes the folded state of proteins.
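Locating GXXXG or AXXXA motifs in a sequence amounts to a simple pattern search. The sketch below uses a regular-expression lookahead so that overlapping occurrences are all reported. It is an illustration only: the function name and test sequences are hypothetical, and unlike the study it scans the whole sequence without restricting matches to residues in alpha-helices.

```python
import re


def find_small_xxx_small(sequence: str, residue: str) -> list[int]:
    """Return 0-based start positions of every residue-X-X-X-residue motif
    (e.g. GXXXG for residue 'G'), including overlapping occurrences."""
    # A zero-width lookahead lets consecutive motifs share a residue.
    pattern = re.compile(rf"(?={re.escape(residue)}.{{3}}{re.escape(residue)})")
    return [match.start() for match in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical sequences chosen only to exercise the search.
    print(find_small_xxx_small("MGAVLGAAAG", "G"))
    print(find_small_xxx_small("AKLLAVVVA", "A"))
```

An analysis like the one in the abstract would additionally need secondary-structure assignments to keep only the matches inside helices, and a background model to estimate the expected motif frequency.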