1.
Cell ; 185(15): 2708-2724, 2022 Jul 21.
Article in English | MEDLINE | ID: mdl-35868275

ABSTRACT

Synthetic genomics is the construction of viruses, bacteria, and eukaryotic cells with synthetic genomes. It involves two basic processes: synthesis of complete genomes or chromosomes and booting up of those synthetic nucleic acids to make viruses or living cells. The first synthetic genomics efforts resulted in the construction of viruses. This led to a revolution in viral reverse genetics and improvements in vaccine design and manufacture. The first bacterium with a synthetic genome led to construction of a minimal bacterial cell and recoded Escherichia coli strains able to incorporate multiple non-standard amino acids in proteins and resistant to phage infection. Further advances led to a yeast strain with a synthetic genome and new approaches for animal and plant artificial chromosomes. On the horizon there are dramatic advances in DNA synthesis that will enable extraordinary new opportunities in medicine, industry, agriculture, and research.


Subject(s)
Bacteriophages , Chromosomes , Animals , Bacteriophages/genetics , Chromosomes/genetics , Escherichia coli/genetics , Genome, Viral , Genomics/methods , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA , Synthetic Biology/methods
2.
Physiol Rev ; 104(3): 1409-1459, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38517040

ABSTRACT

The collective efforts of scientists over multiple decades have led to advancements in molecular and cellular biology-based technologies including genetic engineering and animal cloning that are now being harnessed to enhance the suitability of pig organs for xenotransplantation into humans. Using organs sourced from pigs with multiple gene deletions and human transgene insertions, investigators have overcome formidable immunological and physiological barriers in pig-to-nonhuman primate (NHP) xenotransplantation and achieved prolonged pig xenograft survival. These studies informed the design of Revivicor's (Revivicor Inc, Blacksburg, VA) genetically engineered pigs with 10 genetic modifications (10 GE) (including the inactivation of 4 endogenous porcine genes and insertion of 6 human transgenes), whose hearts and kidneys have now been studied in preclinical human xenotransplantation models with brain-dead recipients. Additionally, the first two clinical cases of pig-to-human heart xenotransplantation were recently performed with hearts from this 10 GE pig at the University of Maryland. Although this review focuses on xenotransplantation of hearts and kidneys, multiple organs, tissues, and cell types from genetically engineered pigs will provide much-needed therapeutic interventions in the future.


Subject(s)
Animals, Genetically Modified , Transplantation, Heterologous , Animals , Transplantation, Heterologous/methods , Humans , Swine , Genetic Engineering/methods , Heart Transplantation/methods
3.
Nat Rev Genet ; 19(1): 51-62, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29082913

ABSTRACT

A gene can be defined as essential when loss of its function compromises viability of the individual (for example, embryonic lethality) or results in profound loss of fitness. At the population level, identification of essential genes is accomplished by observing intolerance to loss-of-function variants. Several computational methods are available to score gene essentiality, and recent progress has been made in defining essentiality in the non-coding genome. Haploinsufficiency is emerging as a critical aspect of gene essentiality: approximately 3,000 human genes cannot tolerate loss of one of the two alleles. Genes identified as essential in human cell lines or knockout mice may be distinct from those in living humans. Reconciling these discrepancies in how we evaluate gene essentiality has applications in clinical genetics and may offer insights for drug development.


Subject(s)
Genes, Essential , Animals , Genetic Variation , Genome, Human , Genomics , Haploinsufficiency , Humans , Mice , Mice, Knockout , RNA, Untranslated/genetics
4.
Proc Natl Acad Sci U S A ; 117(6): 3053-3062, 2020 Feb 11.
Article in English | MEDLINE | ID: mdl-31980526

ABSTRACT

Genome sequencing has established clinical utility for rare disease diagnosis. While increasing numbers of individuals have undergone elective genome sequencing, a comprehensive study surveying genome-wide disease-associated genes in adults with deep phenotyping has not been reported. Here we report the results of a 3-y precision medicine study with a goal to integrate whole-genome sequencing with deep phenotyping. A cohort of 1,190 adult participants (402 female [33.8%]; mean age, 54 y [range 20 to 89+]; 70.6% European) had whole-genome sequencing, and were deeply phenotyped using metabolomics, advanced imaging, and clinical laboratory tests in addition to family/medical history. Of 1,190 adults, 206 (17.3%) had at least 1 genetic variant with pathogenic (P) or likely pathogenic (LP) assessment that suggests a predisposition of genetic risk. A multidisciplinary clinical team reviewed all reportable findings for the assessment of genotype and phenotype associations, and 137 (11.5%) had genotype and phenotype associations. A high percentage of genotype and phenotype associations (>75%) was observed for dyslipidemia (n = 24), cardiomyopathy, arrhythmia, and other cardiac diseases (n = 42), and diabetes and endocrine diseases (n = 17). A lack of genotype and phenotype associations, a potential burden for patient care, was observed in 69 (5.8%) individuals with P/LP variants. Genomics and metabolomics associations identified 61 (5.1%) heterozygotes with phenotype manifestations affecting serum metabolite levels in amino acid, lipid and cofactor, and vitamin pathways. Our descriptive analysis provides results on the integration of whole-genome sequencing and deep phenotyping for clinical assessments in adults.


Subject(s)
Diagnostic Imaging , Metabolomics , Precision Medicine/methods , Whole Genome Sequencing , Adult , Aged , Aged, 80 and over , Cohort Studies , Female , Genetic Predisposition to Disease/genetics , Genotype , Heart Diseases/genetics , Humans , Male , Middle Aged , Phenotype , Young Adult
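
Note on entry 4: the percentages quoted in the abstract can be rechecked from the counts it gives. The short Python sketch below uses only the numbers stated in the abstract; the labels and the helper name `percent` are illustrative choices, not part of the study.

```python
# Counts and percentages as stated in the abstract of entry 4 (cohort of 1,190 adults).
COHORT = 1190
reported = {
    "female participants": (402, 33.8),
    "carriers of a P/LP variant": (206, 17.3),
    "genotype and phenotype association": (137, 11.5),
    "P/LP variant without phenotype association": (69, 5.8),
    "heterozygotes with metabolite phenotypes": (61, 5.1),
}


def percent(count, total=COHORT):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, (count, stated) in reported.items():
    print(f"{label}: {count}/{COHORT} = {percent(count)}% (abstract: {stated}%)")

# Carriers with and without a matching phenotype should add up to all carriers.
print("137 + 69 == 206:", 137 + 69 == 206)
```

The recomputed values agree with the abstract, and the two carrier subgroups (137 and 69) sum to the 206 carriers reported.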
5.
Proc Natl Acad Sci U S A ; 116(18): 8960-8965, 2019 Apr 30.
Article in English | MEDLINE | ID: mdl-30988206

ABSTRACT

Sequence variation data of the human proteome can be used to analyze 3D protein structures to derive functional insights. We used genetic variant data from nearly 140,000 individuals to analyze 3D positional conservation in 4,715 proteins and 3,951 homology models using 860,292 missense and 465,886 synonymous variants. Sixty percent of protein structures harbor at least one intolerant 3D site as defined by significant depletion of observed over expected missense variation. Structural intolerance data correlated with deep mutational scanning functional readouts for PPARG, MAPK1/ERK2, UBE2I, SUMO1, PTEN, CALM1, CALM2, and TPK1 and with shallow mutagenesis data for 1,026 proteins. The 3D structural intolerance analysis revealed different features for ligand binding pockets and orthosteric and allosteric sites. Large-scale data on human genetic variation support a definition of functional 3D sites proteome-wide.


Subject(s)
Genetic Variation/genetics , Imaging, Three-Dimensional/methods , Proteome/genetics , Binding Sites , Calmodulin/genetics , DNA Mutational Analysis/methods , Humans , Ligands , Mitogen-Activated Protein Kinase 1/genetics , Models, Molecular , Molecular Conformation , Mutation , PPAR gamma/genetics , PTEN Phosphohydrolase/genetics , Protein Conformation , SUMO-1 Protein/genetics , Ubiquitin-Activating Enzymes/genetics
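
Note on entry 5: the abstract defines an intolerant 3D site by "significant depletion of observed over expected missense variation" but does not say which statistical test was used. The Python sketch below illustrates the general idea with a one-sided Poisson test, which is an assumption made here for illustration; the function names and the example counts are hypothetical.

```python
import math


def poisson_lower_tail(observed: int, expected: float) -> float:
    """Return P(X <= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    total = term
    for k in range(1, observed + 1):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        total += term
    return min(total, 1.0)


def depletion(observed: int, expected: float):
    """Return the observed/expected ratio and the one-sided depletion p-value."""
    return observed / expected, poisson_lower_tail(observed, expected)


# Hypothetical 3D site: 12 missense variants expected, only 2 observed.
ratio, p_value = depletion(observed=2, expected=12.0)
print(f"obs/exp = {ratio:.2f}, one-sided p = {p_value:.2e}")
```

A ratio well below 1 with a small p-value would mark the site as a candidate intolerant site under this simplified model; a real analysis would also need a correction for testing many sites.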
6.
Am J Hum Genet ; 102(4): 609-619, 2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29625023

ABSTRACT

There is a significant interest in the standardized classification of human genetic variants. We used whole-genome sequence data from 10,495 unrelated individuals to contrast population frequency of pathogenic variants to the expected population prevalence of the disease. Analyses included the ACMG-recommended 59 gene-condition sets for incidental findings and 463 genes associated with 265 OrphaNet conditions. A total of 25,505 variants were used to identify patterns of inflation (i.e., excess genetic risk and misclassification). Inflation increases as the level of evidence supporting the pathogenic nature of the variant decreases. We observed up to 11.5% of genetic disorders with inflation in pathogenic variant sets and up to 92.3% for the variant set with conflicting interpretations. This improved to 7.7% and 57.7%, respectively, after filtering for disease-specific allele frequency. The patterns of inflation were replicated using public data from more than 138,000 genomes. The burden of rare variants was a main contributing factor of the observed inflation, indicating collective misclassified rare variants. We also analyzed the dynamics of re-classification of variant pathogenicity in ClinVar over time, which indicates progressive improvement in variant classification. The study shows that databases include a significant proportion of wrongly ascertained variants; however, it underscores the critical role of ClinVar to contrast claims and foster validation across submitters.


Subject(s)
Disease/genetics , Genetic Variation , Genetic Predisposition to Disease , Humans , Prevalence , Reproducibility of Results , Risk Factors , Software , Time Factors
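
Note on entry 6: the abstract contrasts the population frequency of pathogenic variants with the expected prevalence of the disease and calls an excess "inflation". The Python sketch below shows one simplified way to make that comparison. It assumes Hardy-Weinberg proportions and full penetrance, and the allele-frequency cap used for filtering (half the prevalence for a dominant disorder) is a rough illustrative choice. None of these assumptions, names, or numbers come from the study.

```python
def implied_disease_frequency(allele_freqs, inheritance):
    """Disease frequency implied by a variant set (Hardy-Weinberg, full penetrance)."""
    q = sum(allele_freqs)  # combined frequency of all pathogenic alleles
    if inheritance == "dominant":
        return 1 - (1 - q) ** 2  # at least one pathogenic allele
    if inheritance == "recessive":
        return q ** 2  # two pathogenic alleles
    raise ValueError(f"unknown inheritance mode: {inheritance}")


def is_inflated(allele_freqs, inheritance, prevalence):
    """True when the variant set implies more disease than is actually observed."""
    return implied_disease_frequency(allele_freqs, inheritance) > prevalence


# Hypothetical dominant disorder with a prevalence of 1 in 10,000.
prevalence = 1e-4
variants = [2e-5, 1e-5, 3e-4]  # the third variant is too common to be causal
print(is_inflated(variants, "dominant", prevalence))

# Filtering with a disease-specific allele-frequency cap removes the outlier.
cap = prevalence / 2
filtered = [q for q in variants if q <= cap]
print(is_inflated(filtered, "dominant", prevalence))
```

In this toy case a single overly common variant accounts for the inflation, which mirrors the abstract's observation that filtering by disease-specific allele frequency reduces it.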
7.
Proc Natl Acad Sci U S A ; 115(14): 3686-3691, 2018 Apr 03.
Article in English | MEDLINE | ID: mdl-29555771

ABSTRACT

Reducing premature mortality associated with age-related chronic diseases, such as cancer and cardiovascular disease, is an urgent priority. We report early results using genomics in combination with advanced imaging and other clinical testing to proactively screen for age-related chronic disease risk among adults. We enrolled active, symptom-free adults in a study of screening for age-related chronic diseases associated with premature mortality. In addition to personal and family medical history and other clinical testing, we obtained whole-genome sequencing (WGS), noncontrast whole-body MRI, dual-energy X-ray absorptiometry (DXA), global metabolomics, a new blood test for prediabetes (Quantose IR), echocardiography (ECHO), ECG, and cardiac rhythm monitoring to identify age-related chronic disease risks. Precision medicine screening using WGS and advanced imaging along with other testing among active, symptom-free adults identified a broad set of complementary age-related chronic disease risks associated with premature mortality and strengthened WGS variant interpretation. This and other similarly designed screening approaches anchored by WGS and advanced imaging may have the potential to extend healthy life among active adults through improved prevention and early detection of age-related chronic diseases (and their risk factors) associated with premature mortality.


Subject(s)
Disease/genetics , Genetic Predisposition to Disease , Image Processing, Computer-Assisted/methods , Mutation , Precision Medicine/methods , Whole Genome Sequencing/methods , Adult , Aged , Aged, 80 and over , Cardiovascular Diseases/diagnostic imaging , Cardiovascular Diseases/genetics , Cardiovascular Diseases/pathology , Disease/classification , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Neoplasms/diagnostic imaging , Neoplasms/genetics , Neoplasms/pathology , Nervous System Diseases/diagnostic imaging , Nervous System Diseases/genetics , Nervous System Diseases/pathology , Risk Assessment , Sequence Analysis, RNA , Young Adult
8.
Am J Hum Genet ; 101(5): 700-715, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-29100084

ABSTRACT

Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are composed entirely of guanines or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.


Subject(s)
Genome, Human/genetics , Microsatellite Repeats/genetics , Adolescent , Adult , Alleles , Child , Female , Genetics, Population/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Male , Middle Aged , Polymorphism, Genetic/genetics , Sequence Analysis, DNA/methods , Software
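
Note on entry 8: the abstract explains that read length limits how large a repeat can be detected directly. The Python sketch below only illustrates that limit by counting repeat units inside a single read. It is not TREDPARSE's method, which the abstract describes as a probabilistic model combining several read-level cues; the function names, the example read, and the flank length are hypothetical.

```python
def longest_repeat_run(read: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found anywhere in `read`."""
    best = 0
    for start in range(len(read)):
        count = 0
        pos = start
        while read.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


def max_units_spanned(read_length: int, motif_length: int, flank: int = 0) -> int:
    """Most repeat units one read can contain while keeping `flank` bases on each side."""
    return max(0, (read_length - 2 * flank) // motif_length)


# A hypothetical read carrying five CAG units between two short flanks.
read = "ACGT" + "CAG" * 5 + "TTGA"
print(longest_repeat_run(read, "CAG"))

# Upper bounds for a trinucleotide repeat with 10-base flanks on each side.
print(max_units_spanned(100, 3, flank=10), max_units_spanned(150, 3, flank=10))
```

Alleles longer than these bounds cannot be sized by direct counting in a single read, which is the gap the abstract says TREDPARSE addresses with additional evidence such as paired-end distances.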
9.
Proc Natl Acad Sci U S A ; 114(30): 8059-8064, 2017 Jul 25.
Article in English | MEDLINE | ID: mdl-28674023

ABSTRACT

The HLA gene complex on human chromosome 6 is one of the most polymorphic regions in the human genome and contributes in large part to the diversity of the immune system. Accurate typing of HLA genes with short-read sequencing data has historically been difficult due to the sequence similarity between the polymorphic alleles. Here, we introduce an algorithm, xHLA, that iteratively refines the mapping results at the amino acid level to achieve 99-100% four-digit typing accuracy for both class I and II HLA genes, taking only ∼3 min to process a 30× whole-genome BAM file on a desktop computer.


Subject(s)
Histocompatibility Testing/methods , Algorithms , Benchmarking , Humans
10.
Proc Natl Acad Sci U S A ; 114(38): 10166-10171, 2017 Sep 19.
Article in English | MEDLINE | ID: mdl-28874526

ABSTRACT

Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.


Subject(s)
Confidentiality , DNA Fingerprinting , Models, Genetic , Phenotype , Whole Genome Sequencing , Adult , Age Factors , Algorithms , Body Size , Cohort Studies , Data Anonymization , Female , Humans , Male , Middle Aged , Pigmentation/genetics , Young Adult
11.
PLoS Pathog ; 13(3): e1006292, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28328962

ABSTRACT

The characterization of the blood virome is important for the safety of blood-derived transfusion products, and for the identification of emerging pathogens. We explored non-human sequence data from whole-genome sequencing of blood from 8,240 individuals, none of whom were ascertained for any infectious disease. Viral sequences were extracted from the pool of sequence reads that did not map to the human reference genome. Analyses sifted through close to 1 petabyte of sequence data and performed 0.5 trillion similarity searches. With a lower bound for identification of 2 viral genomes/100,000 cells, we mapped sequences to 94 different viruses, including sequences from 19 human DNA viruses, proviruses and RNA viruses (herpesviruses, anelloviruses, papillomaviruses, three polyomaviruses, adenovirus, HIV, HTLV, hepatitis B, hepatitis C, parvovirus B19, and influenza virus) in 42% of the study participants. Of possible relevance to transfusion medicine, we identified Merkel cell polyomavirus in 49 individuals, papillomavirus in blood of 13 individuals, parvovirus B19 in 6 individuals, and herpesvirus 8 in 3 individuals. The presence of DNA sequences from two RNA viruses was unexpected: the hepatitis C virus sequence reveals an integration event, while the influenza virus sequence resulted from immunization with a DNA vaccine. Age, sex and ancestry contributed significantly to the prevalence of infection. The remaining 75 viruses mostly reflect extensive contamination from commercial reagents and from the environment. These technical problems represent a major challenge for the identification of novel human pathogens. Increasing availability of human whole-genome sequences will contribute substantial amounts of data on the composition of the normal and pathogenic human blood virome. Distinguishing contaminants from real human viruses is challenging.


Subject(s)
Blood/virology , Virus Diseases/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , DNA, Viral/blood , Female , Humans , Infant , Male , Middle Aged , Prevalence , Young Adult
12.
Proc Natl Acad Sci U S A ; 113(42): 11901-11906, 2016 Oct 18.
Article in English | MEDLINE | ID: mdl-27702888

ABSTRACT

We report on the sequencing of 10,545 human genomes at 30×-40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.


Subject(s)
Genome, Human , Genomics , Whole Genome Sequencing , Chromosome Mapping , Computational Biology/methods , Databases, Nucleic Acid , Genetic Predisposition to Disease , Genetic Variation , Genomics/methods , Humans , Open Reading Frames , Polymorphism, Single Nucleotide , Reproducibility of Results , Untranslated Regions
13.
Mol Biol Evol ; 34(12): 3154-3168, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-29029226

ABSTRACT

Human high-altitude (HA) adaptation or mal-adaptation is explored to understand the physiology, pathophysiology, and molecular mechanisms that underlie long-term exposure to hypoxia. Here, we report the results of an analysis of the largest whole-genome sequencing of Chronic Mountain Sickness (CMS) and nonCMS individuals, in which we identified candidate genes and functionally validated these candidates in a genetic model system (Drosophila). We used the PreCIOSS algorithm, which uses the Haplotype Allele Frequency score to separate haplotypes carrying the favored allele from the noncarriers and, accordingly, to prioritize genes associated with the CMS or nonCMS phenotype. Haplotypes in eleven candidate regions, with SNPs mostly in nonexonic regions, were significantly different between CMS and nonCMS subjects. Closer examination of individual genes in these regions revealed the involvement of previously identified candidates (e.g., SENP1) and also previously unreported ones (SGK3, COPS5, PRDM1, and IFT122) in CMS. Remarkably, in addition to genes like SENP1, SGK3, and COPS5, which are HIF-dependent, our study reveals for the first time the HIF-independent gene PRDM1, indicating an involvement of wider, nonHIF pathways in HA adaptation. Finally, we observed that down-regulating orthologs of these genes in Drosophila significantly enhanced their hypoxia tolerance. Taken together, the PreCIOSS algorithm, applied to a large number of genomes, identifies the involvement of both new and previously reported genes in selection sweeps, highlighting the involvement of multiple hypoxia response systems. Since the overwhelming majority of SNPs are in nonexonic (and possibly regulatory) regions, we speculate that adaptation to HA necessitates greater genetic flexibility allowing for transcript variability in response to graded levels of hypoxia.


Subject(s)
Acclimatization/genetics , Altitude Sickness/genetics , Adaptation, Physiological/genetics , Adult , Alleles , Altitude , Altitude Sickness/metabolism , Altitude Sickness/physiopathology , Animals , Chronic Disease , Drosophila/genetics , Evolution, Molecular , Gene Frequency/genetics , Haplotypes/genetics , Humans , Hypoxia/genetics , Hypoxia/physiopathology , Male , Peru , Polymorphism, Single Nucleotide/genetics , Positive Regulatory Domain I-Binding Factor 1/genetics , Positive Regulatory Domain I-Binding Factor 1/metabolism , Whole Genome Sequencing/methods
14.
Genome Res ; 25(3): 435-44, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25654978

ABSTRACT

The availability of genetically tractable organisms with simple genomes is critical for the rapid, systems-level understanding of basic biological processes. Mycoplasma bacteria, with the smallest known genomes among free-living cellular organisms, are ideal models for this purpose, but the natural versions of these cells have genome complexities still too great to offer a comprehensive view of a fundamental life form. Here we describe an efficient method for reducing genomes from these organisms by identifying individually deletable regions using transposon mutagenesis and progressively clustering deleted genomic segments using meiotic recombination between the bacterial genomes harbored in yeast. Mycoplasmal genomes subjected to this process and transplanted into recipient cells yielded two mycoplasma strains. The first simultaneously lacked eight singly deletable regions of the genome, representing a total of 91 genes and ∼10% of the original genome. The second strain lacked seven of the eight regions, representing 84 genes. Growth assay data revealed an absence of genetic interactions among the 91 genes under tested conditions. Despite predicted effects of the deletions on sugar metabolism and the proteome, growth rates were unaffected by the gene deletions in the seven-deletion strain. These results support the feasibility of using single-gene disruption data to design and construct viable genomes lacking multiple genes, paving the way toward genome minimization. The progressive clustering method is expected to be effective for the reorganization of any mega-sized DNA molecules cloned in yeast, facilitating the construction of designer genomes in microbes as well as genomic fragments for genetic engineering of higher eukaryotes.


Subject(s)
Bacteria/genetics , Gene Transfer, Horizontal , Genome, Bacterial , Multigene Family , Sequence Deletion , Yeasts/genetics , DNA Transposable Elements
15.
Proc Natl Acad Sci U S A ; 112(45): 14024-9, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26512100

ABSTRACT

Observations from human microbiome studies are often conflicting or inconclusive. Many factors likely contribute to these issues including small cohort sizes, sample collection, and handling and processing differences. The field of microbiome research is moving from 16S rDNA gene sequencing to a more comprehensive genomic and functional representation through whole-genome sequencing (WGS) of complete communities. Here we performed quantitative and qualitative analyses comparing WGS metagenomic data from human stool specimens using the Illumina Nextera XT and Illumina TruSeq DNA PCR-free kits, and the KAPA Biosystems Hyper Prep PCR and PCR-free systems. Significant differences in taxonomy are observed among the four different next-generation sequencing library preparations using a DNA mock community and a cell control of known concentration. We also revealed biases in error profiles, duplication rates, and loss of reads representing organisms that have a high %G+C content that can significantly impact results. As with all methods, the use of benchmarking controls has revealed critical differences among methods that impact sequencing results and later would impact study interpretation. We recommend that the community adopt PCR-free-based approaches to reduce PCR bias that affects calculations of abundance and to improve assemblies for accurate taxonomic assignment. Furthermore, the inclusion of a known-input cell spike-in control provides accurate quantitation of organisms in clinical samples.


Subject(s)
Gene Library , Genome, Bacterial/genetics , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Microbiota/genetics , Analysis of Variance , Base Composition , Base Sequence , Feces/chemistry , Humans , Metagenomics/trends , Molecular Sequence Data , Polymerase Chain Reaction , Sequence Analysis, DNA , Species Specificity
16.
BMC Genomics ; 18(1): 296, 2017 Apr 13.
Article in English | MEDLINE | ID: mdl-28407798

ABSTRACT

BACKGROUND: Metagenomics is the study of the microbial genomes isolated from communities found on our bodies or in our environment. By correctly determining the relation between human health and the human-associated microbial communities, novel mechanisms of health and disease can be found, thus enabling the development of novel diagnostics and therapeutics. Due to the diversity of the microbial communities, strategies developed for aligning human genomes cannot be utilized, and genomes of the microbial species in the community must be assembled de novo. However, in order to obtain the best metagenomic assemblies, it is important to choose the proper assembler. Due to the rapidly evolving nature of metagenomics, new assemblers are constantly created, and the field has not yet agreed on a standardized process. Furthermore, the truth sets used to compare these methods are either too simple (computationally derived diverse communities) or too complex (microbial communities of unknown composition), yielding results that are hard to interpret. In this analysis, we interrogate the strengths and weaknesses of five popular assemblers through the use of defined biological samples of known genomic composition and abundance. We assessed the performance of each assembler on its ability to reassemble genomes, call taxonomic abundances, and recreate open reading frames (ORFs). RESULTS: We tested five metagenomic assemblers (Omega, metaSPAdes, IDBA-UD, metaVelvet, and MEGAHIT) on known and synthetic metagenomic data sets. MetaSPAdes excelled in diverse sets, IDBA-UD performed well all around, metaVelvet had high accuracy in high-abundance organisms, and MEGAHIT was able to accurately differentiate similar organisms within a community. At the ORF level, metaSPAdes and MEGAHIT had the fewest missing ORFs within diverse and similar communities, respectively. CONCLUSIONS: Depending on the metagenomics question asked, the correct assembler for the task at hand will differ. It is important to choose the appropriate assembler, and thus clearly define the biological problem of an experiment, as different assemblers will give different answers to the same question.


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Metagenomics/methods , Data Accuracy , Genome, Bacterial , Humans , Open Reading Frames , Software
18.
Genome Res ; 23(5): 826-32, 2013 May.
Article in English | MEDLINE | ID: mdl-23282328

ABSTRACT

There is increasing evidence that the phenotypic effects of genomic sequence variants are best understood in terms of variant haplotypes rather than as isolated polymorphisms. Haplotype analysis is also critically important for uncovering population histories and for the study of evolutionary genetics. Although the sequencing of individual human genomes to reveal personal collections of sequence variants is now well established, there has been slower progress in the phasing of these variants into pairs of haplotypes along each pair of chromosomes. Here, we have developed a distinct approach to haplotyping that can yield chromosome-length haplotypes, including the vast majority of heterozygous single-nucleotide polymorphisms (SNPs) in an individual human genome. This approach exploits the haploid nature of sperm cells and employs a combination of genotyping and low-coverage sequencing on a short-read platform. In addition to generating chromosome-length haplotypes, the approach can directly identify recombination events (averaging 1.1 per chromosome) with a median resolution of <100 kb.


Subject(s)
Genome, Human , Haplotypes/genetics , Spermatozoa , Chromosome Mapping , Genotype , Humans , Male , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
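
Note on entry 18: the abstract says the approach "exploits the haploid nature of sperm cells". Because each sperm carries one haplotype per chromosome, alleles seen together in the same cell lie on the same haplotype unless a recombination event falls between them. The Python sketch below illustrates that reasoning with a majority vote between neighbouring heterozygous sites. The data layout, function name, and example cells are hypothetical, and the published method (genotyping plus low-coverage sequencing) is more involved.

```python
def phase_from_sperm(sites, cells):
    """Phase heterozygous sites by majority vote across haploid sperm cells.

    sites: ordered list of heterozygous SNP identifiers along one chromosome.
    cells: list of dicts mapping site -> allele (0 or 1) observed in one sperm;
           sites not observed in a cell are simply absent from its dict.
    Returns one haplotype as a dict site -> allele; the other is its complement.
    """
    haplotype = {sites[0]: 0}  # arbitrary starting label for the first site
    for prev, curr in zip(sites, sites[1:]):
        informative = [c for c in cells if prev in c and curr in c]
        same = sum(1 for c in informative if c[prev] == c[curr])
        diff = len(informative) - same
        if same == diff:
            raise ValueError(f"cannot phase {prev} and {curr}: no majority")
        link = 0 if same > diff else 1  # 1 means the alleles differ along the haplotype
        haplotype[curr] = haplotype[prev] ^ link
    return haplotype


sites = ["s1", "s2", "s3"]
cells = [
    {"s1": 0, "s2": 1, "s3": 0},
    {"s1": 1, "s2": 0},
    {"s2": 0, "s3": 1},
    {"s1": 0, "s2": 1, "s3": 1},  # recombinant between s2 and s3
    {"s1": 1, "s2": 0, "s3": 1},
]
print(phase_from_sperm(sites, cells))
```

The fourth cell disagrees with the majority between s2 and s3. Under this model such a minority cell is a candidate recombinant, which corresponds to the abstract's remark that the approach can directly identify recombination events.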
19.
Genome Res ; 23(5): 867-77, 2013 May.
Article in English | MEDLINE | ID: mdl-23564253

ABSTRACT

Although biofilms have been shown to be reservoirs of pathogens, our knowledge of the microbial diversity in biofilms within critical areas, such as health care facilities, is limited. Available methods for pathogen identification and strain typing have some inherent restrictions. In particular, culturing will yield only a fraction of the species present, PCR of virulence or marker genes is mainly focused on a handful of known species, and shotgun metagenomics is limited in the ability to detect strain variations. In this study, we present a single-cell genome sequencing approach to address these limitations and demonstrate it by specifically targeting bacterial cells within a complex biofilm from a hospital bathroom sink drain. A newly developed, automated platform was used to generate genomic DNA by the multiple displacement amplification (MDA) technique from hundreds of single cells in parallel. MDA reactions were screened and classified by 16S rRNA gene PCR sequence, which revealed a broad range of bacteria covering 25 different genera representing environmental species, human commensals, and opportunistic human pathogens. Here we focus on the recovery of a nearly complete genome representing a novel strain of the periodontal pathogen Porphyromonas gingivalis (P. gingivalis JCVI SC001) using the single-cell assembly tool SPAdes. Single-cell genomics is becoming an accepted method to capture novel genomes, primarily in the marine and soil environments. Here we show for the first time that it also enables comparative genomic analysis of strain variation in a pathogen captured from complex biofilm samples in a healthcare facility.


Subject(s)
Biofilms , High-Throughput Nucleotide Sequencing , Porphyromonas gingivalis/genetics , Single-Cell Analysis , Bacteroidaceae Infections/genetics , Bacteroidaceae Infections/microbiology , Cross Infection/genetics , Cross Infection/microbiology , Genome, Bacterial , Humans , Porphyromonas gingivalis/pathogenicity
20.
Nat Methods ; 10(5): 410-2, 2013 May.
Article in English | MEDLINE | ID: mdl-23542886

ABSTRACT

Transfer of genomes into yeast facilitates genome engineering for genetically intractable organisms, but this process has been hampered by the need for cumbersome isolation of intact genomes before transfer. Here we demonstrate direct cell-to-cell transfer of bacterial genomes as large as 1.8 megabases (Mb) into yeast under conditions that promote cell fusion. Moreover, we discovered that removal of restriction endonucleases from donor bacteria resulted in the enhancement of genome transfer.


Subject(s)
Genome, Bacterial , Genome, Fungal , Transfection