Results 1 - 9 of 9
1.
Genome Res ; 27(1): 1-14, 2017 01.
Article in English | MEDLINE | ID: mdl-27965293

ABSTRACT

Siberia and Northwestern Russia are home to over 40 culturally and linguistically diverse indigenous ethnic groups, yet genetic variation and histories of peoples from this region are largely uncharacterized. We present deep whole-genome sequencing data (∼38×) from 28 individuals belonging to 14 distinct indigenous populations from that region. We combined these data sets with an additional 32 modern-day and 46 ancient human genomes to reconstruct genetic histories of several indigenous Northern Eurasian populations. We found that Siberian and East Asian populations shared 38% of their ancestry with a 45,000-yr-old Ust'-Ishim individual who was previously believed to have no modern-day descendants. Western Siberians trace 57% of their ancestry to ancient North Eurasians, represented by the 24,000-yr-old Siberian Mal'ta boy MA-1. Eastern Siberian populations formed a distinct sublineage that separated from other East Asian populations ∼10,000 yr ago. In addition, we uncovered admixtures between Siberians and Eastern European hunter-gatherers from Samara, Karelia, Hungary, and Sweden (from 8000-6600 yr ago); Yamnaya people (5300-4700 yr ago); and modern-day Northeastern Europeans. Our results provide new insights into genetic histories of Siberian and Northeastern European populations and evidence of ancient gene flow from Siberia into Europe.


Subject(s)
DNA, Mitochondrial/genetics , Genetics, Population , Genome, Human , White People/genetics , Asian People/genetics , Ethnicity/genetics , Gene Flow , Genetic Variation , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Phylogeography , Russia , Siberia
2.
Genome Res ; 25(4): 534-43, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25665577

ABSTRACT

Accurate evaluation of microbial communities is essential for understanding global biogeochemical processes and can guide bioremediation and medical treatments. Metagenomics is most commonly used to analyze microbial diversity and metabolic potential, but assemblies of the short reads generated by current sequencing platforms may fail to recover heterogeneous strain populations and rare organisms. Here we used short (150-bp) and long (multi-kb) synthetic reads to evaluate strain heterogeneity and study microorganisms at low abundance in complex microbial communities from terrestrial sediments. The long-read data revealed multiple (probably dozens of) closely related species and strains from previously undescribed Deltaproteobacteria and Aminicenantes (candidate phylum OP8). Notably, these are the most abundant organisms in the communities, yet short-read assemblies achieved only partial genome coverage, mostly in the form of short scaffolds (N50 = ∼2200 bp). Genome architecture and metabolic potential for these lineages were reconstructed using a new synteny-based method. Analysis of long-read data also revealed thousands of species whose abundances were <0.1% in all samples. Most of the organisms in this "long tail" of rare organisms belong to phyla that are also represented by abundant organisms. Genes encoding glycosyl hydrolases are significantly more abundant than expected in rare genomes, suggesting that rare species may augment the capability for carbon turnover and confer resilience to changing environmental conditions. Overall, the study showed that a diversity of closely related strains and rare organisms account for a major portion of the communities. These are probably common features of many microbial communities and can be effectively studied using a combination of long and short reads.


Subject(s)
Bacterial Proteins/genetics , Deltaproteobacteria/genetics , Geologic Sediments/microbiology , Hydrolases/genetics , Microbial Consortia/genetics , Base Sequence , Biodiversity , Chloroflexi/genetics , Chloroflexi/isolation & purification , DNA, Bacterial/genetics , Deltaproteobacteria/isolation & purification , Genome, Bacterial , Geologic Sediments/analysis , Glucose/metabolism , Metagenomics/methods , Sequence Analysis, DNA
3.
Lancet ; 375(9725): 1525-35, 2010 May 01.
Article in English | MEDLINE | ID: mdl-20435227

ABSTRACT

BACKGROUND: The cost of genomic information has fallen steeply, but the clinical translation of genetic risk estimates remains unclear. We aimed to undertake an integrated analysis of a complete human genome in a clinical context. METHODS: We assessed a patient with a family history of vascular disease and early sudden death. Clinical assessment included analysis of this patient's full genome sequence, risk prediction for coronary artery disease, screening for causes of sudden cardiac death, and genetic counselling. Genetic analysis included the development of novel methods for the integration of whole genome and clinical risk. Disease and risk analysis focused on prediction of genetic risk of variants associated with mendelian disease, recognised drug responses, and pathogenicity for novel variants. We queried disease-specific mutation databases and pharmacogenomics databases to identify genes and mutations with known associations with disease and drug response. We estimated post-test probabilities of disease by applying likelihood ratios derived from integration of multiple common variants to age-appropriate and sex-appropriate pre-test probabilities. We also accounted for gene-environment interactions and conditionally dependent risks. FINDINGS: Analysis of 2.6 million single nucleotide polymorphisms and 752 copy number variations showed increased genetic risk for myocardial infarction, type 2 diabetes, and some cancers. We discovered rare variants in three genes that are clinically associated with sudden cardiac death: TMEM43, DSP, and MYBPC3. A variant in LPA was consistent with a family history of coronary artery disease. The patient had a heterozygous null mutation in CYP2C19 suggesting probable clopidogrel resistance, several variants associated with a positive response to lipid-lowering therapy, and variants in CYP4F2 and VKORC1 that suggest he might have a low initial dosing requirement for warfarin. Many variants of uncertain importance were reported.
INTERPRETATION: Although challenges remain, our results suggest that whole-genome sequencing can yield useful and clinically relevant information for individual patients. FUNDING: National Institute of General Medical Sciences; National Heart, Lung and Blood Institute; National Human Genome Research Institute; Howard Hughes Medical Institute; National Library of Medicine; Lucile Packard Foundation for Children's Health; Hewlett Packard Foundation; Breetwor Family Foundation.
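
The METHODS passage above describes applying likelihood ratios to pre-test probabilities to obtain post-test probabilities. The standard odds form of that calculation can be sketched as below. This is only an illustration, not the authors' method: the function name is hypothetical, the input values are made up, and multiplying the likelihood ratios together assumes the variants are independent, whereas the abstract states that the study also accounted for conditionally dependent risks.

```python
def post_test_probability(pre_test_probability, likelihood_ratios):
    """Update a pre-test probability with a series of likelihood ratios.

    Assumption (simplification): the likelihood ratios are independent,
    so they can be multiplied onto the pre-test odds one after another.
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")

    # Convert probability to odds: p / (1 - p).
    odds = pre_test_probability / (1.0 - pre_test_probability)

    # Each likelihood ratio scales the odds.
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= ratio

    # Convert the post-test odds back to a probability.
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Illustrative numbers only: a 20% pre-test probability and two variants.
    print(post_test_probability(0.20, [1.3, 0.9]))
```

Working in odds rather than probabilities keeps the update multiplicative, which is why a likelihood ratio of 1.0 leaves the estimate unchanged.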


Subject(s)
Genetic Predisposition to Disease/genetics , Genetic Testing , Genome, Human , Sequence Analysis, DNA , Vascular Diseases/genetics , Adult , Aryl Hydrocarbon Hydroxylases/genetics , Carrier Proteins/genetics , Cytochrome P-450 CYP2C19 , Cytochrome P-450 Enzyme System/genetics , Cytochrome P450 Family 4 , Death, Sudden, Cardiac , Desmoplakins/genetics , Environment , Family Health , Genetic Counseling , Humans , Lipoprotein(a)/genetics , Male , Membrane Proteins/genetics , Mixed Function Oxygenases/genetics , Mutation , Osteoarthritis/genetics , Pedigree , Pharmacogenetics , Polymorphism, Single Nucleotide , Risk Assessment , Vitamin K Epoxide Reductases
4.
Nat Biotechnol ; 32(3): 261-266, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24561555

ABSTRACT

The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.


Subject(s)
Genomics/methods , Haplotypes/genetics , Sequence Analysis, DNA/methods , Algorithms , DNA Methylation/genetics , Genome, Human/genetics , Humans , Polymerase Chain Reaction , Polymorphism, Single Nucleotide/genetics
5.
PLoS One ; 9(9): e106689, 2014.
Article in English | MEDLINE | ID: mdl-25188499

ABSTRACT

High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly-parallel library preparation and local assembly of short read data and which achieve lengths of 1.5-18.5 Kbp with an extremely low error rate (∼0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp) achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long-reads, offer a powerful approach to improve de novo assemblies of whole genomes.
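
Several abstracts in this list report assembly contiguity as an N50 value. For readers unfamiliar with the statistic, a minimal sketch of the usual definition follows: the length L such that contigs of length L or longer together contain at least half of the assembled bases. The function name and the toy contig lengths are hypothetical and are not taken from any of the studies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")

    # Walk from the longest contig downwards, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly defines the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # made-up lengths
    print(n50(toy_contigs))
```

Because the statistic is weighted by length, a few long contigs can dominate it, so it is usually read alongside total assembly size or reference coverage.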


Subject(s)
DNA Transposable Elements/genetics , Drosophila melanogaster/genetics , High-Throughput Nucleotide Sequencing/methods , Animals , Genome/genetics
6.
Nat Genet ; 46(12): 1343-9, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25326703

ABSTRACT

Haplotype-resolved genome sequencing enables the accurate interpretation of medically relevant genetic variation, deep inferences regarding population history and non-invasive prediction of fetal genomes. We describe an approach for genome-wide haplotyping based on contiguity-preserving transposition (CPT-seq) and combinatorial indexing. Tn5 transposition is used to modify DNA with adaptor and index sequences while preserving contiguity. After DNA dilution and compartmentalization, the transposase is removed, resolving the DNA into individually indexed libraries. The libraries in each compartment, enriched for neighboring genomic elements, are further indexed via PCR. Combinatorial 96-plex indexing at both the transposition and PCR stage enables the construction of phased synthetic reads from each of the nearly 10,000 'virtual compartments'. We demonstrate the feasibility of this method by assembling >95% of the heterozygous variants in a human genome into long, accurate haplotype blocks (N50 = 1.4-2.3 Mb). The rapid, scalable and cost-effective workflow could enable haplotype resolution to become routine in human genome sequencing.
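
The "nearly 10,000 virtual compartments" figure follows from combining two 96-plex index sets: every pairing of a transposition index with a PCR index labels one compartment. The sketch below illustrates that arithmetic and how reads might be grouped by index pair. The function names and the `(transposition_index, pcr_index, sequence)` read layout are assumptions made for illustration, not the CPT-seq software.

```python
from collections import defaultdict


def count_virtual_compartments(n_transposition_indexes=96, n_pcr_indexes=96):
    """Number of distinct (transposition index, PCR index) pairs."""
    return n_transposition_indexes * n_pcr_indexes


def group_reads_by_compartment(reads):
    """Group reads by their index pair.

    `reads` is an iterable of (transposition_index, pcr_index, sequence)
    tuples; this layout is hypothetical.
    """
    compartments = defaultdict(list)
    for transposition_index, pcr_index, sequence in reads:
        compartments[(transposition_index, pcr_index)].append(sequence)
    return dict(compartments)


if __name__ == "__main__":
    print(count_virtual_compartments())  # 96 * 96 pairs

    toy_reads = [(0, 5, "ACGT"), (0, 5, "TTGA"), (3, 5, "GGCC")]
    print(group_reads_by_compartment(toy_reads))
```

Reads sharing an index pair come from the same compartment, which is what allows them to be treated together when building phased synthetic reads.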


Subject(s)
Haplotypes , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Chromosome Mapping , Cluster Analysis , DNA/genetics , Female , Gene Library , Genome, Human , Genomics , Heterozygote , Humans , Male , Polymerase Chain Reaction , Reproducibility of Results , Transposases/genetics
7.
Science ; 341(6144): 384-7, 2013 Jul 26.
Article in English | MEDLINE | ID: mdl-23888037

ABSTRACT

Histocompatibility is the basis by which multicellular organisms of the same species distinguish self from nonself. Relatively little is known about the mechanisms underlying histocompatibility reactions in lower organisms. Botryllus schlosseri is a colonial urochordate, a sister group of vertebrates, that exhibits a genetically determined natural transplantation reaction, whereby self-recognition between colonies leads to formation of parabionts with a common vasculature, whereas rejection occurs between incompatible colonies. Using genetically defined lines, whole-transcriptome sequencing, and genomics, we identified a single gene that encodes self-nonself and determines "graft" outcomes in this organism. This gene is significantly up-regulated in colonies poised to undergo fusion and/or rejection, is highly expressed in the vasculature, and is functionally linked to histocompatibility outcomes. These findings establish a platform for advancing the science of allorecognition.


Subject(s)
Genes , Histocompatibility/genetics , Urochordata/genetics , Urochordata/immunology , Alleles , Animals , Genome , Genotype , Immune Tolerance , Molecular Sequence Data , Sequence Analysis, DNA , Transcriptome , Up-Regulation , Urochordata/physiology
8.
Elife ; 2: e00569, 2013 Jul 02.
Article in English | MEDLINE | ID: mdl-23840927

ABSTRACT

Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration. DOI:http://dx.doi.org/10.7554/eLife.00569.001.


Subject(s)
Chordata/genetics , Genome , Animals , Chordata/classification , Chordata/physiology , Chromosome Mapping , High-Throughput Nucleotide Sequencing , Phylogeny , Reproduction
9.
Nat Biotechnol ; 27(9): 847-50, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19668243

ABSTRACT

Recent advances in high-throughput DNA sequencing technologies have enabled order-of-magnitude improvements in both cost and throughput. Here we report the use of single-molecule methods to sequence an individual human genome. We aligned billions of 24- to 70-bp reads (32 bp average) to approximately 90% of the National Center for Biotechnology Information (NCBI) reference genome, with 28× average coverage. Our results were obtained on one sequencing instrument by a single operator with four data collection runs. Single-molecule sequencing enabled analysis of human genomic information without the need for cloning, amplification or ligation. We determined approximately 2.8 million single nucleotide polymorphisms (SNPs) with a false-positive rate of less than 1% as validated by Sanger sequencing and 99.8% concordance with SNP genotyping arrays. We identified 752 regions of copy number variation by analyzing coverage depth alone and validated 27 of these using digital PCR. This milestone should allow widespread application of genome sequencing to many aspects of genetics and human health, including personal genomics.
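
The relationship between read count, read length, and average coverage quoted above can be sketched with the usual approximation: mean depth equals total aligned bases divided by the length of the covered target. The function names and toy numbers below are hypothetical, and the sketch assumes every read aligns in full, which real data do not satisfy.

```python
import math


def mean_coverage(n_reads, mean_read_length, target_length):
    """Average depth: total sequenced bases divided by target length."""
    if target_length <= 0:
        raise ValueError("target length must be positive")
    return n_reads * mean_read_length / target_length


def reads_needed(desired_coverage, mean_read_length, target_length):
    """Smallest whole number of reads that reaches the desired mean depth."""
    if mean_read_length <= 0:
        raise ValueError("mean read length must be positive")
    return math.ceil(desired_coverage * target_length / mean_read_length)


if __name__ == "__main__":
    # Toy target of 1,600 bp sequenced with 1,000 reads of 32 bp on average.
    print(mean_coverage(1000, 32, 1600))
    # Reads of 32 bp required for 28-fold depth over a 3,200 bp toy target.
    print(reads_needed(28, 32, 3200))
```

With short reads, the required read count grows quickly with target size, which is consistent with the abstract's mention of billions of reads for a human genome.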


Subject(s)
Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Computer Simulation , Humans , Polymorphism, Single Nucleotide , Reproducibility of Results