Results 1 - 11 of 11
1.
bioRxiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38077089

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.

2.
Science ; 376(6588): eabl4178, 2022 04.
Article in English | MEDLINE | ID: mdl-35357911

ABSTRACT

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.


Subject(s)
Centromere/genetics , Chromosome Mapping , Epigenesis, Genetic , Genome, Human , Evolution, Molecular , Genomics , Humans , Repetitive Sequences, Nucleic Acid
3.
Science ; 376(6588): 44-53, 2022 04.
Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


Subject(s)
Genome, Human , Human Genome Project , Sequence Analysis, DNA/standards , Cell Line , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Human/genetics , Humans , Reference Values
4.
Nature ; 585(7823): 79-84, 2020 09.
Article in English | MEDLINE | ID: mdl-32663838

ABSTRACT

After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist [1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38 [2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome [3], we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.


Subject(s)
Chromosomes, Human, X/genetics , Genome, Human/genetics , Telomere/genetics , Centromere/genetics , CpG Islands/genetics , DNA Methylation , DNA, Satellite/genetics , Female , Humans , Hydatidiform Mole/genetics , Male , Pregnancy , Reproducibility of Results , Testis/metabolism
5.
Sci Transl Med ; 6(254): 254ra126, 2014 Sep 17.
Article in English | MEDLINE | ID: mdl-25232178

ABSTRACT

Public health officials have raised concerns that plasmid transfer between Enterobacteriaceae species may spread resistance to carbapenems, an antibiotic class of last resort, thereby rendering common health care-associated infections nearly impossible to treat. To determine the diversity of carbapenemase-encoding plasmids and assess their mobility among bacterial species, we performed comprehensive surveillance and genomic sequencing of carbapenem-resistant Enterobacteriaceae in the National Institutes of Health (NIH) Clinical Center patient population and hospital environment. We isolated a repertoire of carbapenemase-encoding Enterobacteriaceae, including multiple strains of Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, and Pantoea species. Long-read genome sequencing with full end-to-end assembly revealed that these organisms carry the carbapenem resistance genes on a wide array of plasmids. K. pneumoniae and E. cloacae isolated simultaneously from a single patient harbored two different carbapenemase-encoding plasmids, indicating that plasmid transfer between organisms was unlikely within this patient. We did, however, find evidence of horizontal transfer of carbapenemase-encoding plasmids between K. pneumoniae, E. cloacae, and C. freundii in the hospital environment. Our data, including full plasmid identification, challenge assumptions about horizontal gene transfer events within patients and identify possible connections between patients and the hospital environment. In addition, we identified a new carbapenemase-encoding plasmid of potentially high clinical impact carried by K. pneumoniae, E. coli, E. cloacae, and Pantoea species, in unrelated patients and in the hospital environment.


Subject(s)
Bacterial Proteins/biosynthesis , Cross Infection , Enterobacteriaceae/enzymology , Plasmids , beta-Lactamases/biosynthesis , Enterobacteriaceae/classification , Enterobacteriaceae/genetics , Hospitals, Public , Humans , National Institutes of Health (U.S.) , Population Surveillance , Real-Time Polymerase Chain Reaction , United States
6.
J Clin Microbiol ; 51(3): 752-8, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23254127

ABSTRACT

With increasing rates of antibiotic resistance, bacterial infections have become more difficult to treat, elevating the importance of surveillance and prevention. Effective surveillance relies on the availability of rapid, cost-effective, and informative typing methods to monitor bacterial isolates. PCR-based typing assays are fast and inexpensive, but their utility is limited by the lack of targets which are capable of distinguishing between strains within a species. To identify highly informative PCR targets from the growing base of publicly available bacterial genome sequences, we developed pan-PCR. This computer algorithm uses existing genome sequences for isolates of a species of interest and identifies a set of genes whose patterns of presence or absence provide the best discrimination between strains in this species. A set of PCR primers targeting the identified genes is then designed, with each PCR product being of a different size to allow multiplexing. These target DNA regions and PCR primers can then be utilized to type bacterial isolates. To evaluate pan-PCR, we designed an assay for the emerging pathogen Acinetobacter baumannii. Taking as input a set of 29 previously sequenced genomes, pan-PCR identified 6 genetic loci whose presence or absence was capable of distinguishing all the input strains. This assay was applied to a set of patient isolates, and its discriminatory power was compared to that of multilocus sequence typing (MLST) and whole-genome optical maps. We found that the pan-PCR assay was capable of making clinically relevant distinctions between strains with identical MLST profiles and showed a discriminatory power similar to that of optical maps. Pan-PCR represents a tool capable of exploiting available genome sequence data to design highly discriminatory PCR assays. The ease of design and implementation makes this approach feasible for diagnostic facilities of all sizes.


Subject(s)
Computational Biology/methods , Genome, Bacterial , Molecular Typing/methods , Polymerase Chain Reaction/methods , Algorithms , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Bacterial Infections/epidemiology , Bacterial Infections/microbiology , DNA Primers/genetics , Humans , Molecular Epidemiology/methods , Software
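
The gene-selection step described in the pan-PCR abstract above (record 6) can be illustrated with a small sketch. The abstract does not say how pan-PCR chooses its gene set, so the greedy strategy below is an assumption made purely for illustration and is not the published algorithm. The function name `select_discriminating_genes`, the `presence` input layout, and the toy strain and gene names are all hypothetical.

```python
from itertools import combinations


def select_discriminating_genes(presence, max_genes=None):
    """Greedily pick genes whose presence/absence pattern separates strains.

    presence: dict mapping strain name -> set of gene names found in that strain.
    Returns (chosen_genes, unresolved_pairs). This is an illustrative greedy
    heuristic, NOT the selection procedure used by pan-PCR itself.
    """
    strains = sorted(presence)
    genes = sorted(set().union(*presence.values()))
    # Every pair of strains starts out indistinguishable.
    unresolved = set(combinations(strains, 2))
    chosen = []

    def split(gene):
        # Pairs that this gene would tell apart (present in exactly one of the two).
        return {(a, b) for a, b in unresolved
                if (gene in presence[a]) != (gene in presence[b])}

    while unresolved and (max_genes is None or len(chosen) < max_genes):
        candidates = [g for g in genes if g not in chosen]
        best = max(candidates, key=lambda g: len(split(g)), default=None)
        if best is None or not split(best):
            break  # no remaining gene separates any unresolved pair
        unresolved -= split(best)
        chosen.append(best)
    return chosen, unresolved


# Hypothetical toy input: four strains, three candidate genes.
toy_presence = {
    "strainA": {"g1", "g2"},
    "strainB": {"g1"},
    "strainC": {"g2", "g3"},
    "strainD": set(),
}
toy_genes, toy_unresolved = select_discriminating_genes(toy_presence)
```

On this toy input the sketch selects two genes and leaves no strain pair unresolved. A real assay design would also have to satisfy the constraint mentioned in the abstract that each PCR product has a different size to allow multiplexing, which this sketch does not model.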
7.
BMC Biol ; 10: 107, 2012 Dec 21.
Article in English | MEDLINE | ID: mdl-23259493

ABSTRACT

BACKGROUND: Calcium-activated photoproteins are luciferase variants found in photocyte cells of bioluminescent jellyfish (Phylum Cnidaria) and comb jellies (Phylum Ctenophora). The complete genomic sequence from the ctenophore Mnemiopsis leidyi, a representative of the earliest branch of animals that emit light, provided an opportunity to examine the genome of an organism that uses this class of luciferase for bioluminescence and to look for genes involved in light reception. To determine when photoprotein genes first arose, we examined the genomic sequence from other early-branching taxa. We combined our genomic survey with gene trees, developmental expression patterns, and functional protein assays of photoproteins and opsins to provide a comprehensive view of light production and light reception in Mnemiopsis. RESULTS: The Mnemiopsis genome has 10 full-length photoprotein genes situated within two genomic clusters with high sequence conservation that are maintained due to strong purifying selection and concerted evolution. Photoprotein-like genes were also identified in the genomes of the non-luminescent sponge Amphimedon queenslandica and the non-luminescent cnidarian Nematostella vectensis, and phylogenomic analysis demonstrated that photoprotein genes arose at the base of all animals. Photoprotein gene expression in Mnemiopsis embryos begins during gastrulation in migrating precursors to photocytes and persists throughout development in the canals where photocytes reside. We identified three putative opsin genes in the Mnemiopsis genome and show that they do not group with well-known bilaterian opsin subfamilies. Interestingly, photoprotein transcripts are co-expressed with two of the putative opsins in developing photocytes. Opsin expression is also seen in the apical sensory organ. We present evidence that one opsin functions as a photopigment in vitro, absorbing light at wavelengths that overlap with peak photoprotein light emission, raising the hypothesis that light production and light reception may be functionally connected in ctenophore photocytes. We also present genomic evidence of a complete ciliary phototransduction cascade in Mnemiopsis. CONCLUSIONS: This study elucidates the genomic organization, evolutionary history, and developmental expression of photoprotein and opsin genes in the ctenophore Mnemiopsis leidyi, introduces a novel dual role for ctenophore photocytes in both bioluminescence and phototransduction, and raises the possibility that light production and light reception are linked in this early-branching non-bilaterian animal.


Subject(s)
Ctenophora/cytology , Ctenophora/genetics , Evolution, Molecular , Gene Expression Regulation , Genome/genetics , Luminescent Proteins/genetics , Opsins/genetics , Amino Acid Sequence , Animals , Cluster Analysis , Ctenophora/radiation effects , Gene Expression Regulation/radiation effects , Green Fluorescent Proteins/metabolism , Light , Light Signal Transduction/radiation effects , Luminescent Proteins/chemistry , Luminescent Proteins/metabolism , Molecular Sequence Data , Opsins/chemistry , Opsins/metabolism , Phylogeny , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reproducibility of Results , Selection, Genetic , Sequence Alignment , Sequence Analysis, Protein , Spectrum Analysis
8.
Genome Biol ; 13(7): R64, 2012 Jul 25.
Article in English | MEDLINE | ID: mdl-22830599

ABSTRACT

BACKGROUND: While Staphylococcus epidermidis is commonly isolated from healthy human skin, it is also the most frequent cause of nosocomial infections on indwelling medical devices. Despite its importance, few genome sequences existed and the most frequent hospital-associated lineage, ST2, had not been fully sequenced. RESULTS: We cultivated 71 commensal S. epidermidis isolates from 15 skin sites and compared them with 28 nosocomial isolates from venous catheters and blood cultures. We produced 21 commensal and 9 nosocomial draft genomes, and annotated and compared their gene content, phylogenetic relatedness and biochemical functions. The commensal strains had an open pan-genome with 80% core genes and 20% variable genes. The variable genome was characterized by an overabundance of transposable elements, transcription factors and transporters. Biochemical diversity, as assayed by antibiotic resistance and in vitro biofilm formation, demonstrated the varied phenotypic consequences of this genomic diversity. The nosocomial isolates exhibited both large-scale rearrangements and single-nucleotide variation. We showed that S. epidermidis genomes separate into two phylogenetic groups, one consisting only of commensals. The formate dehydrogenase gene, present only in commensals, is a discriminatory marker between the two groups. CONCLUSIONS: Commensal skin S. epidermidis have an open pan-genome and show considerable diversity between isolates, even when derived from a single individual or body site. For ST2, the most common nosocomial lineage, we detect variation between three independent isolates sequenced. Finally, phylogenetic analyses revealed a previously unrecognized group of S. epidermidis strains characterized by reduced virulence and formate dehydrogenase, which we propose as a clinical molecular marker.


Subject(s)
Catheter-Related Infections/microbiology , Cross Infection/microbiology , Sequence Analysis, DNA/methods , Skin/microbiology , Staphylococcus epidermidis/classification , Staphylococcus epidermidis/genetics , Drug Resistance, Bacterial , Evolution, Molecular , Genetic Variation , Genome, Bacterial , Humans , Molecular Sequence Data , Molecular Typing , Phylogeny , Staphylococcus epidermidis/isolation & purification
9.
BMC Genomics ; 11: 21, 2010 Jan 11.
Article in English | MEDLINE | ID: mdl-20064230

ABSTRACT

BACKGROUND: The approaches for shotgun-based sequencing of vertebrate genomes are now well-established, and have resulted in the generation of numerous draft whole-genome sequence assemblies. In contrast, the process of refining those assemblies to improve contiguity and increase accuracy (known as 'sequence finishing') remains tedious, labor-intensive, and expensive. As a result, the vast majority of vertebrate genome sequences generated to date remain at a draft stage. RESULTS: To date, our genome sequencing efforts have focused on comparative studies of targeted genomic regions, requiring sequence finishing of large blocks of orthologous sequence (average size 0.5-2 Mb) from various subsets of 75 vertebrates. This experience has provided a unique opportunity to compare the relative effort required to finish shotgun-generated genome sequence assemblies from different species, which we report here. Importantly, we found that the sequence assemblies generated for the same orthologous regions from various vertebrates show substantial variation with respect to misassemblies and, in particular, the frequency and characteristics of sequence gaps. As a consequence, the work required to finish different species' sequences varied greatly. Application of the same standardized methods for finishing provided a novel opportunity to "assay" characteristics of genome sequences among many vertebrate species. It is important to note that many of the problems we have encountered during sequence finishing reflect unique architectural features of a particular vertebrate's genome, which in some cases may have important functional and/or evolutionary implications. Finally, based on our analyses, we have been able to improve our procedures to overcome some of these problems and to increase the overall efficiency of the sequence-finishing process, although significant challenges still remain. CONCLUSION: Our findings have important implications for the eventual finishing of the draft whole-genome sequences that have now been generated for a large number of vertebrates.


Subject(s)
Genomics/methods , Sequence Analysis, DNA/methods , Vertebrates/genetics , Animals , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Genome
10.
Genome Res ; 14(11): 2235-44, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15479945

ABSTRACT

Although the cost of generating draft-quality genomic sequence continues to decline, refining that sequence by the process of "sequence finishing" remains expensive. Near-perfect finished sequence is an appropriate goal for the human genome and a small set of reference genomes; however, such a high-quality product cannot be cost-justified for large numbers of additional genomes, at least for the foreseeable future. Here we describe the generation and quality of an intermediate grade of finished genomic sequence (termed comparative-grade finished sequence), which is tailored for use in multispecies sequence comparisons. Our analyses indicate that this sequence is very high quality (with the residual gaps and errors mostly falling within repetitive elements) and reflects 99% of the total sequence. Importantly, comparative-grade sequence finishing requires approximately 40-fold less reagents and approximately 10-fold less personnel effort compared to the generation of near-perfect finished sequence, such as that produced for the human genome. Although applied here to finishing sequence derived from individual bacterial artificial chromosome (BAC) clones, one could envision establishing routines for refining sequences emanating from whole-genome shotgun sequencing projects to a similar quality level. Our experience to date demonstrates that comparative-grade sequence finishing represents a practical and affordable option for sequence refinement en route to comparative analyses.


Subject(s)
Chromosomes, Artificial, Bacterial/genetics , Contig Mapping/economics , Exons/genetics , Genome , Sequence Analysis, DNA/economics , Software , Animals , Base Sequence , Cloning, Molecular , Computational Biology , Contig Mapping/methods , Costs and Cost Analysis , Databases, Genetic , Lemur/genetics , Molecular Sequence Data , Papio/genetics , Rats , Repetitive Sequences, Nucleic Acid , Sequence Homology, Nucleic Acid , Software/economics
11.
Science ; 302(5646): 842-6, 2003 Oct 31.
Article in English | MEDLINE | ID: mdl-14593172

ABSTRACT

Functional analysis of a genome requires accurate gene structure information and a complete gene inventory. A dual experimental strategy was used to verify and correct the initial genome sequence annotation of the reference plant Arabidopsis. Sequencing full-length cDNAs and hybridizations using RNA populations from various tissues to a set of high-density oligonucleotide arrays spanning the entire genome allowed the accurate annotation of thousands of gene structures. We identified 5817 novel transcription units, including a substantial amount of antisense gene transcription, and 40 genes within the genetically defined centromeres. This approach resulted in completion of approximately 30% of the Arabidopsis ORFeome as a resource for global functional experimentation of the plant proteome.


Subject(s)
Arabidopsis/genetics , Genome, Plant , RNA, Messenger/genetics , RNA, Plant/genetics , Transcription, Genetic , Chromosome Mapping , Chromosomes, Plant/genetics , Cloning, Molecular , Computational Biology , DNA, Complementary/genetics , DNA, Intergenic , Expressed Sequence Tags , Gene Expression Profiling , Genes, Plant , Genomics , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Open Reading Frames , Reverse Transcriptase Polymerase Chain Reaction