1.
bioRxiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38077089

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.

2.
Science ; 376(6588): 44-53, 2022 04.
Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


Subject(s)
Genome, Human , Human Genome Project , Sequence Analysis, DNA/standards , Cell Line , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Human/genetics , Humans , Reference Values
3.
Nature ; 585(7823): 79-84, 2020 09.
Article in English | MEDLINE | ID: mdl-32663838

ABSTRACT

After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist [1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38 [2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome [3], we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.


Subject(s)
Chromosomes, Human, X/genetics , Genome, Human/genetics , Telomere/genetics , Centromere/genetics , CpG Islands/genetics , DNA Methylation , DNA, Satellite/genetics , Female , Humans , Hydatidiform Mole/genetics , Male , Pregnancy , Reproducibility of Results , Testis/metabolism
4.
Sci Transl Med ; 6(254): 254ra126, 2014 Sep 17.
Article in English | MEDLINE | ID: mdl-25232178

ABSTRACT

Public health officials have raised concerns that plasmid transfer between Enterobacteriaceae species may spread resistance to carbapenems, an antibiotic class of last resort, thereby rendering common health care-associated infections nearly impossible to treat. To determine the diversity of carbapenemase-encoding plasmids and assess their mobility among bacterial species, we performed comprehensive surveillance and genomic sequencing of carbapenem-resistant Enterobacteriaceae in the National Institutes of Health (NIH) Clinical Center patient population and hospital environment. We isolated a repertoire of carbapenemase-encoding Enterobacteriaceae, including multiple strains of Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, and Pantoea species. Long-read genome sequencing with full end-to-end assembly revealed that these organisms carry the carbapenem resistance genes on a wide array of plasmids. K. pneumoniae and E. cloacae isolated simultaneously from a single patient harbored two different carbapenemase-encoding plasmids, indicating that plasmid transfer between organisms was unlikely within this patient. We did, however, find evidence of horizontal transfer of carbapenemase-encoding plasmids between K. pneumoniae, E. cloacae, and C. freundii in the hospital environment. Our data, including full plasmid identification, challenge assumptions about horizontal gene transfer events within patients and identify possible connections between patients and the hospital environment. In addition, we identified a new carbapenemase-encoding plasmid of potentially high clinical impact carried by K. pneumoniae, E. coli, E. cloacae, and Pantoea species, in unrelated patients and in the hospital environment.


Subject(s)
Bacterial Proteins/biosynthesis , Cross Infection , Enterobacteriaceae/enzymology , Plasmids , beta-Lactamases/biosynthesis , Enterobacteriaceae/classification , Enterobacteriaceae/genetics , Hospitals, Public , Humans , National Institutes of Health (U.S.) , Population Surveillance , Real-Time Polymerase Chain Reaction , United States
5.
PLoS Genet ; 10(3): e1004190, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24603370

ABSTRACT

Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a "European" vs. "African" genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same as in AA with only African local ancestry (2-3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA.


Subject(s)
Black or African American/genetics , Ethnicity/genetics , Genome-Wide Association Study , Lipids/genetics , Genetic Variation , Genotype , High-Throughput Nucleotide Sequencing , Humans , Linkage Disequilibrium , Lipids/blood , Polymorphism, Single Nucleotide , White People/genetics
6.
Genome Biol ; 13(7): R64, 2012 Jul 25.
Article in English | MEDLINE | ID: mdl-22830599

ABSTRACT

BACKGROUND: While Staphylococcus epidermidis is commonly isolated from healthy human skin, it is also the most frequent cause of nosocomial infections on indwelling medical devices. Despite its importance, few genome sequences existed and the most frequent hospital-associated lineage, ST2, had not been fully sequenced. RESULTS: We cultivated 71 commensal S. epidermidis isolates from 15 skin sites and compared them with 28 nosocomial isolates from venous catheters and blood cultures. We produced 21 commensal and 9 nosocomial draft genomes, and annotated and compared their gene content, phylogenetic relatedness and biochemical functions. The commensal strains had an open pan-genome with 80% core genes and 20% variable genes. The variable genome was characterized by an overabundance of transposable elements, transcription factors and transporters. Biochemical diversity, as assayed by antibiotic resistance and in vitro biofilm formation, demonstrated the varied phenotypic consequences of this genomic diversity. The nosocomial isolates exhibited both large-scale rearrangements and single-nucleotide variation. We showed that S. epidermidis genomes separate into two phylogenetic groups, one consisting only of commensals. The formate dehydrogenase gene, present only in commensals, is a discriminatory marker between the two groups. CONCLUSIONS: Commensal skin S. epidermidis have an open pan-genome and show considerable diversity between isolates, even when derived from a single individual or body site. For ST2, the most common nosocomial lineage, we detect variation between three independent isolates sequenced. Finally, phylogenetic analyses revealed a previously unrecognized group of S. epidermidis strains characterized by reduced virulence and formate dehydrogenase, which we propose as a clinical molecular marker.


Subject(s)
Catheter-Related Infections/microbiology , Cross Infection/microbiology , Sequence Analysis, DNA/methods , Skin/microbiology , Staphylococcus epidermidis/classification , Staphylococcus epidermidis/genetics , Drug Resistance, Bacterial , Evolution, Molecular , Genetic Variation , Genome, Bacterial , Humans , Molecular Sequence Data , Molecular Typing , Phylogeny , Staphylococcus epidermidis/isolation & purification
7.
Nat Genet ; 43(3): 189-96, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21258341

ABSTRACT

Ciliary dysfunction leads to a broad range of overlapping phenotypes, collectively termed ciliopathies. This grouping is underscored by genetic overlap, where causal genes can also contribute modifier alleles to clinically distinct disorders. Here we show that mutations in TTC21B, which encodes the retrograde intraflagellar transport protein IFT139, cause both isolated nephronophthisis and syndromic Jeune asphyxiating thoracic dystrophy. Moreover, although resequencing of TTC21B in a large, clinically diverse ciliopathy cohort and matched controls showed a similar frequency of rare changes, in vivo and in vitro evaluations showed a significant enrichment of pathogenic alleles in cases (P < 0.003), suggesting that TTC21B contributes pathogenic alleles to ∼5% of ciliopathy cases. Our data illustrate how genetic lesions can be both causally associated with diverse ciliopathies and interact in trans with other disease-causing genes and highlight how saturated resequencing followed by functional analysis of all variants informs the genetic architecture of inherited disorders.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Alleles , Ciliary Motility Disorders/genetics , Animals , Genetic Variation , Humans , Mice , Mutation , Pedigree , Photoreceptor Cells/physiology , Zebrafish/genetics
8.
BMC Genomics ; 11: 21, 2010 Jan 11.
Article in English | MEDLINE | ID: mdl-20064230

ABSTRACT

BACKGROUND: The approaches for shotgun-based sequencing of vertebrate genomes are now well-established, and have resulted in the generation of numerous draft whole-genome sequence assemblies. In contrast, the process of refining those assemblies to improve contiguity and increase accuracy (known as 'sequence finishing') remains tedious, labor-intensive, and expensive. As a result, the vast majority of vertebrate genome sequences generated to date remain at a draft stage. RESULTS: To date, our genome sequencing efforts have focused on comparative studies of targeted genomic regions, requiring sequence finishing of large blocks of orthologous sequence (average size 0.5-2 Mb) from various subsets of 75 vertebrates. This experience has provided a unique opportunity to compare the relative effort required to finish shotgun-generated genome sequence assemblies from different species, which we report here. Importantly, we found that the sequence assemblies generated for the same orthologous regions from various vertebrates show substantial variation with respect to misassemblies and, in particular, the frequency and characteristics of sequence gaps. As a consequence, the work required to finish different species' sequences varied greatly. Application of the same standardized methods for finishing provided a novel opportunity to "assay" characteristics of genome sequences among many vertebrate species. It is important to note that many of the problems we have encountered during sequence finishing reflect unique architectural features of a particular vertebrate's genome, which in some cases may have important functional and/or evolutionary implications. Finally, based on our analyses, we have been able to improve our procedures to overcome some of these problems and to increase the overall efficiency of the sequence-finishing process, although significant challenges still remain. 
CONCLUSION: Our findings have important implications for the eventual finishing of the draft whole-genome sequences that have now been generated for a large number of vertebrates.


Subject(s)
Genomics/methods , Sequence Analysis, DNA/methods , Vertebrates/genetics , Animals , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Genome
9.
Genome Res ; 19(9): 1665-74, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19602640

ABSTRACT

ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicine, including the exploration of issues related to the genetic architecture of disease, implementation of genomic technology, informed consent, disclosure of genetic information, and archiving, analyzing, and displaying sequence data. In the initial phase of ClinSeq, we are enrolling roughly 1000 participants; the evaluation of each includes obtaining a detailed family and medical history, as well as a clinical evaluation. The participants are being consented broadly for research on many traits and for whole-genome sequencing. Initially, Sanger-based sequencing of 300-400 genes thought to be relevant to atherosclerosis is being performed, with the resulting data analyzed for rare, high-penetrance variants associated with specific clinical traits. The participants are also being consented to allow the contact of family members for additional studies of sequence variants to explore their potential association with specific phenotypes. Here, we present the general considerations in designing ClinSeq, preliminary results based on the generation of an initial 826 Mb of sequence data, the findings for several genes that serve as positive controls for the project, and our views about the potential implications of ClinSeq. The early experiences with ClinSeq illustrate how large-scale medical sequencing can be a practical, productive, and critical component of research in genomic medicine.


Subject(s)
Atherosclerosis/genetics , Biomedical Research , Cardiovascular Diseases/genetics , Genome, Human , Genomics , Pilot Projects , Sequence Analysis, DNA/methods , Aged , Cohort Studies , Female , Humans , Male , Pedigree , Phenotype
10.
Science ; 324(5931): 1190-2, 2009 May 29.
Article in English | MEDLINE | ID: mdl-19478181

ABSTRACT

Human skin is a large, heterogeneous organ that protects the body from pathogens while sustaining microorganisms that influence human health and disease. Our analysis of 16S ribosomal RNA gene sequences obtained from 20 distinct skin sites of healthy humans revealed that physiologically comparable sites harbor similar bacterial communities. The complexity and stability of the microbial community are dependent on the specific characteristics of the skin site. This topographical and temporal survey provides a baseline for studies that examine the role of bacterial communities in disease states and the microbial interdependencies required to maintain healthy skin.


Subject(s)
Bacteria/isolation & purification , Metagenome , Skin/microbiology , Actinobacteria/classification , Actinobacteria/genetics , Actinobacteria/isolation & purification , Adult , Bacteria/classification , Bacteria/genetics , Bacteroidetes/classification , Bacteroidetes/genetics , Bacteroidetes/isolation & purification , Biodiversity , Female , Genes, rRNA , Humans , Male , Molecular Sequence Data , Phylogeny , Proteobacteria/classification , Proteobacteria/genetics , Proteobacteria/isolation & purification , RNA, Ribosomal, 16S , Time Factors , Young Adult
11.
Genome Res ; 18(7): 1043-50, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18502944

ABSTRACT

The many layers and structures of the skin serve as elaborate hosts to microbes, including a diversity of commensal and pathogenic bacteria that contribute to both human health and disease. To determine the complexity and identity of the microbes inhabiting the skin, we sequenced bacterial 16S small-subunit ribosomal RNA genes isolated from the inner elbow of five healthy human subjects. This analysis revealed 113 operational taxonomic units (OTUs; "phylotypes") at the level of 97% similarity that belong to six bacterial divisions. To survey all depths of the skin, we sampled using three methods: swab, scrape, and punch biopsy. Proteobacteria dominated the skin microbiota at all depths of sampling. Interpersonal variation is approximately equal to intrapersonal variation when considering bacterial community membership and structure. Finally, we report strong similarities in the complexity and identity of mouse and human skin microbiota. This study of healthy human skin microbiota will serve to direct future research addressing the role of skin microbiota in health and disease, and metagenomic projects addressing the complex physiological interactions between the skin and the microbes that inhabit this environment.


Subject(s)
Bacteria/genetics , Genetic Variation , Skin/microbiology , Adult , Aged , Animals , DNA, Bacterial/analysis , DNA, Bacterial/genetics , DNA, Ribosomal/genetics , Female , Humans , Male , Mice , Mice, Inbred C57BL , Middle Aged , RNA, Ribosomal, 16S/genetics
12.
J Mol Evol ; 65(3): 207-14, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17676366

ABSTRACT

It is understood that DNA and amino acid substitution rates are highly sequence context-dependent, e.g., C → T substitutions in vertebrates may occur much more frequently at CpG sites and that cysteine substitution rates may depend on support of the context for participation in a disulfide bond. Furthermore, many applications rely on quantitative models of nucleotide or amino acid substitution, including phylogenetic inference and identification of amino acid sequence positions involved in functional specificity. We describe quantification of the context dependence of nucleotide substitution rates using baboon, chimpanzee, and human genomic sequence data generated by the NISC Comparative Sequencing Program. Relative mutation rates are reported for the 96 classes of mutations of the form 5' αβγ 3' → 5' αδγ 3', where α, β, γ, and δ are nucleotides and β ≠ δ, based on maximum likelihood calculations. Our results confirm that C → T substitutions are enhanced at CpG sites compared with other transitions, relatively independent of the identity of the preceding nucleotide. While, as expected, transitions generally occur more frequently than transversions, we find that the most frequent transversions involve the C at CpG sites (CpG transversions) and that their rate is comparable to the rate of transitions at non-CpG sites. A four-class model of the rates of context-dependent evolution of primate DNA sequences, CpG transitions > non-CpG transitions ≈ CpG transversions > non-CpG transversions, captures qualitative features of the mutation spectrum. We find that despite qualitative similarity of mutation rates among different genomic regions, there are statistically significant differences.
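
The count of 96 classes can be checked with a short enumeration. The sketch below is illustrative only and is not taken from the paper: it assumes the 96 classes arise by treating each trinucleotide substitution and its reverse complement as the same class (4 × 4 × 4 × 3 = 192 strand-specific substitutions, halved), and the function names are hypothetical. It also sorts each class into the four groups of the four-class model, treating a site as CpG when the substituted base is the C or the G of a CpG dinucleotide.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical(before, after):
    """Pick one representative for a substitution and its reverse complement.

    Assumption: the two strands are treated as equivalent.
    """
    other = (reverse_complement(before), reverse_complement(after))
    return min((before, after), other)


def substitution_classes():
    """Enumerate strand-collapsed classes of the form abg -> adg with b != d."""
    classes = set()
    for alpha, beta, gamma, delta in product("ACGT", repeat=4):
        if beta != delta:
            classes.add(canonical(alpha + beta + gamma, alpha + delta + gamma))
    return classes


def four_class_label(before, after):
    """Assign a class to one of the four groups named in the abstract."""
    # The middle base is at a CpG site if it is a C followed by G,
    # or a G preceded by C (the C of a CpG on the opposite strand).
    at_cpg = before[1:] == "CG" or before[:2] == "CG"
    is_transition = (before[1], after[1]) in TRANSITIONS
    kind = "transition" if is_transition else "transversion"
    return ("CpG " if at_cpg else "non-CpG ") + kind


classes = substitution_classes()
counts = Counter(four_class_label(before, after) for before, after in classes)
print(len(classes))
print(dict(counts))
```

Under these assumptions the enumeration yields 96 classes, of which 4 are CpG transitions, 28 non-CpG transitions, 8 CpG transversions, and 56 non-CpG transversions.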


Subject(s)
Genomic Instability , Mutation , Primates/genetics , Animals , Base Composition , Base Sequence , CpG Islands , DNA Mutational Analysis , Humans , Likelihood Functions , Models, Genetic
13.
Genome Res ; 17(6): 760-74, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567995

ABSTRACT

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.


Subject(s)
Evolution, Molecular , Genome, Human , Mammals/genetics , Open Reading Frames , Phylogeny , Sequence Alignment , Animals , Human Genome Project , Humans
14.
Genome Res ; 16(6): 796-803, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16672307

ABSTRACT

Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection Initiative. Here we present 10,967 full ORF verified cDNA clones (8049 from X. laevis and 2918 from X. tropicalis) as a community resource. Because the genome of X. laevis, but not X. tropicalis, has undergone allotetraploidization, comparison of coding sequences from these two clawed (pipid) frogs provides a unique angle for exploring the molecular evolution of duplicate genes. Within our clone set, we have identified 445 gene trios, each comprised of an allotetraploidization-derived X. laevis gene pair and their shared X. tropicalis ortholog. Pairwise dN/dS comparisons within trios show strong evidence for purifying selection acting on all three members. However, dN/dS ratios between X. laevis gene pairs are elevated relative to their X. tropicalis ortholog. This difference is highly significant and indicates an overall relaxation of selective pressures on duplicated gene pairs. We have found that the paralogs that have been lost since the tetraploidization event are enriched for several molecular functions, but have found no such enrichment in the extant paralogs. Approximately 14% of the paralogous pairs analyzed here also show differential expression indicative of subfunctionalization.


Subject(s)
Base Sequence , Gene Library , Polyploidy , Xenopus laevis/genetics , Xenopus/genetics , Animals , Evolution, Molecular , Gene Expression , Genes, Duplicate , Genome , Molecular Sequence Data , Open Reading Frames/genetics , Phylogeny , Sequence Homology, Nucleic Acid
15.
Proc Natl Acad Sci U S A ; 103(4): 1030-5, 2006 Jan 24.
Article in English | MEDLINE | ID: mdl-16418266

ABSTRACT

Identification of the specific cytogenetic abnormality is one of the critical steps for classification of acute myeloblastic leukemia (AML), which influences the selection of appropriate therapy and provides information about disease prognosis. However, at present, the genetic complexity of AML is only partially understood. To obtain a comprehensive, unbiased, quantitative measure, we performed serial analysis of gene expression (SAGE) on CD15(+) myeloid progenitor cells from 22 AML patients who had four of the most common translocations, namely t(8;21), t(15;17), t(9;11), and inv(16). The quantitative data provide clear evidence that the major change in all these translocation-carrying leukemias is a decrease in expression of the majority of transcripts compared with normal CD15(+) cells. From a total of 1,247,535 SAGE tags, we identified 2,604 transcripts whose expression was significantly altered in these leukemias compared with normal myeloid progenitor cells. The gene ontology of the 1,110 transcripts that matched known genes revealed that each translocation had a uniquely altered profile in various functional categories including regulation of transcription, cell cycle, protein synthesis, and apoptosis. Our global analysis of gene expression of common translocations in AML can focus attention on the function of the genes with altered expression for future biological studies as well as highlight genes/pathways for more specifically targeted therapy.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Expression Regulation , Leukemia, Myeloid, Acute/genetics , Leukemia/genetics , Translocation, Genetic , Apoptosis , Cell Differentiation , Chromosomes, Human, Pair 11/genetics , Chromosomes, Human, Pair 9/genetics , Computational Biology , DNA, Complementary/metabolism , Expressed Sequence Tags , Gene Library , Humans , Leukocytes, Mononuclear/cytology , Lewis X Antigen/biosynthesis , Myeloid Progenitor Cells/cytology , Oligonucleotide Array Sequence Analysis , RNA/chemistry , RNA, Messenger/metabolism , Time Factors
16.
Genome Res ; 14(11): 2235-44, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15479945

ABSTRACT

Although the cost of generating draft-quality genomic sequence continues to decline, refining that sequence by the process of "sequence finishing" remains expensive. Near-perfect finished sequence is an appropriate goal for the human genome and a small set of reference genomes; however, such a high-quality product cannot be cost-justified for large numbers of additional genomes, at least for the foreseeable future. Here we describe the generation and quality of an intermediate grade of finished genomic sequence (termed comparative-grade finished sequence), which is tailored for use in multispecies sequence comparisons. Our analyses indicate that this sequence is very high quality (with the residual gaps and errors mostly falling within repetitive elements) and reflects 99% of the total sequence. Importantly, comparative-grade sequence finishing requires approximately 40-fold less reagents and approximately 10-fold less personnel effort compared to the generation of near-perfect finished sequence, such as that produced for the human genome. Although applied here to finishing sequence derived from individual bacterial artificial chromosome (BAC) clones, one could envision establishing routines for refining sequences emanating from whole-genome shotgun sequencing projects to a similar quality level. Our experience to date demonstrates that comparative-grade sequence finishing represents a practical and affordable option for sequence refinement en route to comparative analyses.


Subject(s)
Chromosomes, Artificial, Bacterial/genetics , Contig Mapping/economics , Exons/genetics , Genome , Sequence Analysis, DNA/economics , Software , Animals , Base Sequence , Cloning, Molecular , Computational Biology , Contig Mapping/methods , Costs and Cost Analysis , Databases, Genetic , Lemur/genetics , Molecular Sequence Data , Papio/genetics , Rats , Repetitive Sequences, Nucleic Acid , Sequence Homology, Nucleic Acid , Software/economics
17.
Nucleic Acids Res ; 32(Database issue): D572-4, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681483

ABSTRACT

Hembase (http://hembase.niddk.nih.gov) is an integrated browser and genome portal designed for web-based examination of the human erythroid transcriptome. To date, Hembase contains 15,752 entries from erythroblast Expressed Sequence Tags (ESTs) and 380 referenced genes relevant for erythropoiesis. The database is organized to provide a cytogenetic band position, a unique name as well as a concise annotation for each entry. Search queries may be performed by name, keyword or cytogenetic location. Search results are linked to primary sequence data and three major human genome browsers for access to information considered current at the time of each search. Hembase provides interested scientists and clinical hematologists with a genome-based approach toward the study of erythroid biology.


Subject(s)
Databases, Genetic , Erythrocytes/metabolism , Erythropoiesis/genetics , Genomics , Hematology , Computational Biology , Cytogenetics , Expressed Sequence Tags , Genome, Human , Humans , Information Storage and Retrieval , Internet , Transcription, Genetic/genetics
18.
Genome Res ; 13(1): 55-63, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12529306

ABSTRACT

Duplications have long been postulated to be an important mechanism by which genomes evolve. Interspecies genomic comparisons are one method by which the origin and molecular mechanism of duplications can be inferred. By comparative mapping in human, mouse, and rat, we previously found evidence for a recent chromosome-fission event that occurred in the mouse lineage. Cytogenetic mapping revealed that the genomic segments flanking the fission site appeared to be duplicated, with copies residing near the centromere of multiple mouse chromosomes. Here we report the mapping and sequencing of the regions of mouse chromosomes 5 and 6 involved in this chromosome-fission event as well as the results of comparative sequence analysis with the orthologous human and rat genomic regions. Our data indicate that the duplications associated with mouse chromosomes 5 and 6 are recent and that the resulting duplicated segments share significant sequence similarity with a series of regions near the centromeres of the mouse chromosomes previously identified by cytogenetic mapping. We also identified pericentromeric duplicated segments shared between mouse chromosomes 5 and 1. Finally, novel mouse satellite sequences as well as putative chimeric transcripts were found to be associated with the duplicated segments. Together, these findings demonstrate that pericentromeric duplications are not restricted to primates and may be a common mechanism for genome evolution in mammals.


Subject(s)
Centromere/genetics , Gene Duplication , Animals , Chimera/genetics , Chromosomes/genetics , Chromosomes, Human/genetics , Conserved Sequence/genetics , DNA, Satellite/genetics , Evolution, Molecular , Genetic Markers/genetics , Humans , Mice , Physical Chromosome Mapping/methods , Rats
19.
Nucleic Acids Res ; 30(11): 2469-77, 2002 Jun 01.
Article in English | MEDLINE | ID: mdl-12034835

ABSTRACT

In parallel with the production of genomic sequence data, attention is being focused on the generation of comprehensive cDNA-sequence resources. Such efforts are increasingly emphasizing the production of high-accuracy sequence corresponding to the entire insert of cDNA clones, especially those presumed to reflect the full-length mRNA. The complete sequencing of cDNA clones on a large scale presents unique challenges because of the generally small, yet heterogeneous, sizes of the cloned inserts. We have developed a strategy for high-throughput sequencing of cDNA clones using the transposon Tn5. This approach has been tailored for implementation within an existing large-scale 'shotgun-style' sequencing program, although it could be readily adapted for use in virtually any sequencing environment. In addition, we have developed a modified version of our strategy that can be applied to cDNA clones with large cloning vectors, thereby overcoming a potential limitation of transposon-based approaches. Here we describe the details of our cDNA-sequencing pipeline, including a summary of the experience in sequencing more than 4200 cDNA clones to produce more than 8 million base pairs of high-accuracy cDNA sequence. These data provide both convincing evidence that the insertion of Tn5 into cDNA clones is sufficiently random for its effective use in large-scale cDNA sequencing as well as interesting insight about the sequence context preferred for insertion by Tn5.


Subject(s)
DNA Transposable Elements/genetics , DNA, Complementary/genetics , Sequence Analysis, DNA/methods , Base Composition , Binomial Distribution , Cloning, Molecular , Genetic Vectors/genetics , Mutagenesis, Insertional/genetics , Physical Chromosome Mapping/methods , Recombination, Genetic/genetics , Sensitivity and Specificity
20.
Genome Res ; 12(1): 3-15, 2002 Jan.
Article in English | MEDLINE | ID: mdl-11779826

ABSTRACT

Williams syndrome is a complex developmental disorder that results from the heterozygous deletion of an approximately 1.6-Mb segment of human chromosome 7q11.23. These deletions are mediated by large (approximately 300 kb) duplicated blocks of DNA of near-identical sequence. Previously, we showed that the orthologous region of the mouse genome is devoid of such duplicated segments. Here, we extend our studies to include the generation of approximately 3.3 Mb of genomic sequence from the mouse Williams syndrome region, of which just over 1.4 Mb is finished to high accuracy. Comparative analyses of the mouse and human sequences within and immediately flanking the interval commonly deleted in Williams syndrome have facilitated the identification of nine previously unreported genes, provided detailed sequence-based information regarding 30 genes residing in the region, and revealed a number of potentially interesting conserved noncoding sequences. Finally, to facilitate comparative sequence analysis, we implemented several enhancements to the program, including the addition of links from annotated features within a generated percent-identity plot to specific records in public databases. Taken together, the results reported here provide an important comparative sequence resource that should catalyze additional studies of Williams syndrome, including those that aim to characterize genes within the commonly deleted interval and to develop mouse models of the disorder.


Subject(s)
Chromosomes, Human, Pair 7/genetics , Sequence Analysis, DNA/methods , Sequence Homology, Nucleic Acid , Williams Syndrome/genetics , Animals , Base Composition , Conserved Sequence/genetics , Humans , Mice , Molecular Sequence Data , Physical Chromosome Mapping