Results 1 - 20 of 205
1.
Nat Rev Genet ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38649458

ABSTRACT

Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.

2.
Nature ; 625(7994): 312-320, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38200293

ABSTRACT

The Holocene (beginning around 12,000 years ago) encompassed some of the most significant changes in human evolution, with far-reaching consequences for the dietary, physical and mental health of present-day populations. Using a dataset of more than 1,600 imputed ancient genomes1, we modelled the selection landscape during the transition from hunting and gathering, to farming and pastoralism across West Eurasia. We identify key selection signals related to metabolism, including that selection at the FADS cluster began earlier than previously reported and that selection near the LCT locus predates the emergence of the lactase persistence allele by thousands of years. We also find strong selection in the HLA region, possibly due to increased exposure to pathogens during the Bronze Age. Using ancient individuals to infer local ancestry tracts in over 400,000 samples from the UK Biobank, we identify widespread differences in the distribution of Mesolithic, Neolithic and Bronze Age ancestries across Eurasia. By calculating ancestry-specific polygenic risk scores, we show that height differences between Northern and Southern Europe are associated with differential Steppe ancestry, rather than selection, and that risk alleles for mood-related phenotypes are enriched for Neolithic farmer ancestry, whereas risk alleles for diabetes and Alzheimer's disease are enriched for Western hunter-gatherer ancestry. Our results indicate that ancient selection and migration were large contributors to the distribution of phenotypic diversity in present-day Europeans.


Subject(s)
Asian , European People , Genome, Human , Selection, Genetic , Humans , Affect , Agriculture/history , Alleles , Alzheimer Disease/genetics , Asia/ethnology , Asian/genetics , Diabetes Mellitus/genetics , Europe/ethnology , European People/genetics , Farmers/history , Genetic Loci/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , History, Ancient , Human Migration , Hunting/history , Multigene Family/genetics , Phenotype , UK Biobank , Multifactorial Inheritance/genetics
3.
Nature ; 612(7939): 283-291, 2022 12.
Article in English | MEDLINE | ID: mdl-36477129

ABSTRACT

Late Pliocene and Early Pleistocene epochs 3.6 to 0.8 million years ago1 had climates resembling those forecasted under future warming2. Palaeoclimatic records show strong polar amplification with mean annual temperatures of 11-19 °C above contemporary values3,4. The biological communities inhabiting the Arctic during this time remain poorly known because fossils are rare5. Here we report an ancient environmental DNA6 (eDNA) record describing the rich plant and animal assemblages of the Kap København Formation in North Greenland, dated to around two million years ago. The record shows an open boreal forest ecosystem with mixed vegetation of poplar, birch and thuja trees, as well as a variety of Arctic and boreal shrubs and herbs, many of which had not previously been detected at the site from macrofossil and pollen records. The DNA record confirms the presence of hare and mitochondrial DNA from animals including mastodons, reindeer, rodents and geese, all ancestral to their present-day and late Pleistocene relatives. The presence of marine species including horseshoe crab and green algae supports a warmer climate than today. The reconstructed ecosystem has no modern analogue. The survival of such ancient eDNA probably relates to its binding to mineral surfaces. Our findings open new areas of genetic research, demonstrating that it is possible to track the ecology and evolution of biological communities from two million years ago using ancient eDNA.


Subject(s)
DNA, Environmental , Ecosystem , Ecology , Fossils , Greenland
4.
Nature ; 600(7887): 86-92, 2021 12.
Article in English | MEDLINE | ID: mdl-34671161

ABSTRACT

During the last glacial-interglacial cycle, Arctic biotas experienced substantial climatic changes, yet the nature, extent and rate of their responses are not fully understood1-8. Here we report a large-scale environmental DNA metagenomic study of ancient plant and mammal communities, analysing 535 permafrost and lake sediment samples from across the Arctic spanning the past 50,000 years. Furthermore, we present 1,541 contemporary plant genome assemblies that were generated as reference sequences. Our study provides several insights into the long-term dynamics of the Arctic biota at the circumpolar and regional scales. Our key findings include: (1) a relatively homogeneous steppe-tundra flora dominated the Arctic during the Last Glacial Maximum, followed by regional divergence of vegetation during the Holocene epoch; (2) certain grazing animals consistently co-occurred in space and time; (3) humans appear to have been a minor factor in driving animal distributions; (4) higher effective precipitation, as well as an increase in the proportion of wetland plants, show negative effects on animal diversity; (5) the persistence of the steppe-tundra vegetation in northern Siberia enabled the late survival of several now-extinct megafauna species, including the woolly mammoth until 3.9 ± 0.2 thousand years ago (ka) and the woolly rhinoceros until 9.8 ± 0.2 ka; and (6) phylogenetic analysis of mammoth environmental DNA reveals a previously unsampled mitochondrial lineage. Our findings highlight the power of ancient environmental metagenomics analyses to advance understanding of population histories and long-term ecological dynamics.


Subject(s)
Biota , DNA, Ancient/analysis , DNA, Environmental/analysis , Metagenomics , Animals , Arctic Regions , Climate Change/history , Databases, Genetic , Datasets as Topic , Extinction, Biological , Geologic Sediments , Grassland , Greenland , Haplotypes/genetics , Herbivory/genetics , History, Ancient , Humans , Lakes , Mammoths , Mitochondria/genetics , Perissodactyla , Permafrost , Phylogeny , Plants/genetics , Population Dynamics , Rain , Siberia , Spatio-Temporal Analysis , Wetlands
5.
PLoS Genet ; 20(7): e1011318, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39024186

ABSTRACT

Sex chromosomes are evolutionarily labile in many animals and sometimes fuse with autosomes, creating so-called neo-sex chromosomes. Fusions between sex chromosomes and autosomes have been proposed to reduce sexual conflict and to promote adaptation and reproductive isolation among species. Recently, advances in genomics have fuelled the discovery of such fusions across the tree of life. Here, we discovered multiple fusions leading to neo-sex chromosomes in the sapho subclade of the classical adaptive radiation of Heliconius butterflies. Heliconius butterflies generally have 21 chromosomes with very high synteny. However, the five Heliconius species in the sapho subclade show large variation in chromosome number ranging from 21 to 60. We find that the W chromosome is fused with chromosome 4 in all of them. Two sister species pairs show subsequent fusions between the W and chromosomes 9 or 14, respectively. These fusions between autosomes and sex chromosomes make Heliconius butterflies an ideal system for studying the role of neo-sex chromosomes in adaptive radiations and the degeneration of sex chromosomes over time. Our findings emphasize the capability of short-read resequencing to detect genomic signatures of fusion events between sex chromosomes and autosomes even when sex chromosomes are not explicitly assembled.


Subject(s)
Butterflies , Evolution, Molecular , Sex Chromosomes , Animals , Butterflies/genetics , Sex Chromosomes/genetics , Female , Male , Phylogeny , Genomics/methods , Synteny , Chromosomes, Insect/genetics , Genome, Insect
6.
Genome Res ; 33(7): 1023-1031, 2023 07.
Article in English | MEDLINE | ID: mdl-37562965

ABSTRACT

The pairwise sequentially Markovian coalescent (PSMC) algorithm and its extensions infer the coalescence time of two homologous chromosomes at each genomic position. This inference is used in reconstructing demographic histories, detecting selection signatures, studying genome-wide associations, constructing ancestral recombination graphs, and more. Inference of coalescence times between each pair of haplotypes in a large data set is of great interest, as they may provide rich information about the population structure and history of the sample. Here, we introduce a new method, Gamma-SMC, which is more than 10 times faster than current methods. To obtain this speed-up, we represent the posterior coalescence time distributions succinctly as a gamma distribution with just two parameters; in contrast, PSMC and its extensions hold these in a vector over discrete intervals of time. Thus, Gamma-SMC has constant time-complexity per site, without dependence on the number of discrete time states. Additionally, because of this continuous representation, our method is able to infer times spanning many orders of magnitude and, as such, is robust to parameter misspecification. We describe how this approach works, show its performance on simulated and real data, and illustrate its use in studying recent positive selection in the 1000 Genomes Project data set.


Subject(s)
Genome , Genomics , Haplotypes , Chromosomes/genetics , Algorithms , Models, Genetic , Genetics, Population
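
Note on record 6: the abstract contrasts a two-parameter gamma summary of the coalescence-time posterior with a vector over discrete time intervals. The following is a minimal sketch of that contrast only, not the Gamma-SMC update rule itself. The class name `GammaPosterior` and the helper `from_moments` are hypothetical, and moment matching is used here purely as an illustration of how two numbers can stand in for a whole distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaPosterior:
    """Hypothetical two-parameter summary of a coalescence-time posterior."""

    shape: float  # alpha > 0
    rate: float   # beta > 0

    @property
    def mean(self) -> float:
        # Mean of a gamma distribution with shape alpha and rate beta.
        return self.shape / self.rate

    @property
    def variance(self) -> float:
        # Variance of a gamma distribution with shape alpha and rate beta.
        return self.shape / (self.rate ** 2)


def from_moments(mean: float, variance: float) -> GammaPosterior:
    """Build the gamma distribution that has the given mean and variance."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    return GammaPosterior(shape=mean * mean / variance, rate=mean / variance)


# Storage per site is two floats, whatever the time scale; a discretised
# representation would instead need one probability per time interval.
posterior = from_moments(mean=2.0, variance=0.5)
print(posterior.shape, posterior.rate)
```

Because the summary has a fixed size, the cost of storing or passing it along does not grow with the number of time intervals, which is the property the abstract attributes to the continuous representation.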
7.
Nature ; 570(7760): 182-188, 2019 06.
Article in English | MEDLINE | ID: mdl-31168093

ABSTRACT

Northeastern Siberia has been inhabited by humans for more than 40,000 years but its deep population history remains poorly understood. Here we investigate the late Pleistocene population history of northeastern Siberia through analyses of 34 newly recovered ancient genomes that date to between 31,000 and 600 years ago. We document complex population dynamics during this period, including at least three major migration events: an initial peopling by a previously unknown Palaeolithic population of 'Ancient North Siberians' who are distantly related to early West Eurasian hunter-gatherers; the arrival of East Asian-related peoples, which gave rise to 'Ancient Palaeo-Siberians' who are closely related to contemporary communities from far-northeastern Siberia (such as the Koryaks), as well as Native Americans; and a Holocene migration of other East Asian-related peoples, who we name 'Neo-Siberians', and from whom many contemporary Siberians are descended. Each of these population expansions largely replaced the earlier inhabitants, and ultimately generated the mosaic genetic make-up of contemporary peoples who inhabit a vast area across northern Eurasia and the Americas.


Subject(s)
Genome, Human/genetics , Human Migration/history , Asia/ethnology , DNA, Ancient/analysis , Europe/ethnology , Gene Pool , Haplotypes , History, 15th Century , History, Ancient , History, Medieval , Humans , Indians, North American , Male , Siberia/ethnology
8.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35042809

ABSTRACT

The Earth BioGenome Project (EBP) is an audacious endeavor to obtain whole-genome sequences of representatives from all eukaryotic species on Earth. In addition to the project's technical and organizational challenges, it also faces complicated ethical, legal, and social issues. This paper, from members of the EBP's Ethical, Legal, and Social Issues (ELSI) Committee, catalogs these ELSI concerns arising from EBP. These include legal issues, such as sample collection and permitting; the applicability of international treaties, such as the Convention on Biological Diversity and the Nagoya Protocol; intellectual property; sample accessioning; and biosecurity and ethical issues, such as sampling from the territories of Indigenous peoples and local communities, the protection of endangered species, and cross-border collections, among several others. We also comment on the intersection of digital sequence information and data rights. More broadly, this list of ethical, legal, and social issues for large-scale genomic sequencing projects may be useful in the consideration of ethical frameworks for future projects. While we do not, and cannot, provide simple, overarching solutions for all the issues raised here, we conclude our perspective by beginning to chart a path forward for EBP's work.


Subject(s)
Endangered Species/legislation & jurisprudence , Ethics, Research , Genomics , Animals , Biosecurity/ethics , Biosecurity/legislation & jurisprudence , Genomics/ethics , Genomics/legislation & jurisprudence , Humans
9.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35042801

ABSTRACT

Life on Earth has evolved from initial simplicity to the astounding complexity we experience today. Bacteria and archaea have largely excelled in metabolic diversification, but eukaryotes additionally display abundant morphological innovation. How have these innovations come about and what constraints are there on the origins of novelty and the continuing maintenance of biodiversity on Earth? The history of life and the code for the working parts of cells and systems are written in the genome. The Earth BioGenome Project has proposed that the genomes of all extant, named eukaryotes (about 2 million species) should be sequenced to high quality to produce a digital library of life on Earth, beginning with strategic phylogenetic, ecological, and high-impact priorities. Here we discuss why we should sequence all eukaryotic species, not just a representative few scattered across the many branches of the tree of life. We suggest that many questions of evolutionary and ecological significance will only be addressable when whole-genome data representing divergences at all of the branchings in the tree of life or all species in natural ecosystems are available. We envisage that a genomic tree of life will foster understanding of the ongoing processes of speciation, adaptation, and organismal dependencies within entire ecosystems. These explorations will resolve long-standing problems in phylogenetics, evolution, ecology, conservation, agriculture, bioindustry, and medicine.


Subject(s)
Base Sequence/genetics , Eukaryota/genetics , Genomics/ethics , Animals , Biodiversity , Biological Evolution , Ecology , Ecosystem , Genome , Genomics/methods , Humans , Phylogeny
10.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35042802

ABSTRACT

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met.


Subject(s)
Base Sequence/genetics , Eukaryota/genetics , Genomics/standards , Animals , Biodiversity , Genomics/methods , Humans , Reference Standards , Reference Values , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
11.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37194566

ABSTRACT

We present genome sequences for the caecilians Geotrypetes seraphini (3.8 Gb) and Microcaecilia unicolor (4.7 Gb), representatives of a limbless, mostly soil-dwelling amphibian clade with reduced eyes, and unique putatively chemosensory tentacles. More than 69% of both genomes are composed of repeats, with retrotransposons being the most abundant. We identify 1,150 orthogroups that are unique to caecilians and enriched for functions in olfaction and detection of chemical signals. There are 379 orthogroups with signatures of positive selection on caecilian lineages with roles in organ development and morphogenesis, sensory perception, and immunity amongst others. We discover that caecilian genomes are missing the zone of polarizing activity regulatory sequence (ZRS) enhancer of Sonic Hedgehog which is also mutated in snakes. In vivo deletions have shown ZRS is required for limb development in mice, thus revealing a shared molecular target implicated in the independent evolution of limblessness in snakes and caecilians.


Subject(s)
Amphibians , Hedgehog Proteins , Animals , Mice , Hedgehog Proteins/genetics , Amphibians/genetics , Genome , Snakes/genetics , Acclimatization , Evolution, Molecular
12.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36525368

ABSTRACT

SUMMARY: We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file) which is compatible with similar tools and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity. AVAILABILITY AND IMPLEMENTATION: YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at https://github.com/sanger-tol/yahs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Documentation , Software
13.
BMC Bioinformatics ; 24(1): 288, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37464285

ABSTRACT

BACKGROUND: PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly leading to more accurate and contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. RESULTS: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence fasta file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. CONCLUSIONS: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub (https://github.com/marcelauliano/MitoHiFi). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).


Subject(s)
Genome, Mitochondrial , Phylogeny , RNA , Eukaryota , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing
14.
Mol Biol Evol ; 39(2)2022 02 03.
Article in English | MEDLINE | ID: mdl-35084493

ABSTRACT

Joint phylogenetic analysis of ancient DNA (aDNA) with modern phylogenies is hampered by low sequence coverage and post-mortem deamination, often resulting in overconservative or incorrect assignment. We provide a new efficient likelihood-based workflow, pathPhynder, that takes advantage of all the polymorphic sites in the target sequence. This effectively evaluates the number of ancestral and derived alleles present on each branch and reports the most likely placement of an ancient sample in the phylogeny and a haplogroup assignment, together with alternatives and supporting evidence. To illustrate the application of pathPhynder, we show improved Y chromosome assignments for published aDNA sequences, using a newly compiled Y variation data set (120,908 markers from 2,014 samples) that significantly enhances Y haplogroup assignment for low coverage samples. We apply the method to all published male aDNA samples from Africa, giving new insights into ancient migrations and the relationships between ancient and modern populations. The same software can be used to place samples with large amounts of missing data into other large non-recombining phylogenies such as the mitochondrial tree.


Subject(s)
Chromosomes, Human, Y , DNA, Ancient , Phylogeny , Base Sequence , DNA, Ancient/analysis , DNA, Mitochondrial/genetics , Haplotypes , Humans , Likelihood Functions , Male , Sequence Analysis, DNA/methods
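
Note on record 14: the abstract describes evaluating the number of ancestral and derived alleles present on each branch of a phylogeny. The sketch below illustrates only that counting step under simplifying assumptions. It is not the pathPhynder implementation; the function name `count_alleles`, the marker layout, and the example data are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: branch name -> {position: (ancestral allele, derived allele)}
BranchMarkers = Dict[str, Dict[int, Tuple[str, str]]]


def count_alleles(
    markers: BranchMarkers, calls: Dict[int, str]
) -> Dict[str, Tuple[int, int]]:
    """Return (derived, ancestral) allele counts per branch for one sample.

    Sites absent from `calls` are treated as missing data and ignored,
    as are calls matching neither the ancestral nor the derived allele.
    """
    counts: Dict[str, Tuple[int, int]] = {}
    for branch, sites in markers.items():
        derived = 0
        ancestral = 0
        for position, (anc_allele, der_allele) in sites.items():
            allele: Optional[str] = calls.get(position)
            if allele == der_allele:
                derived += 1    # evidence that the sample descends from this branch
            elif allele == anc_allele:
                ancestral += 1  # evidence against placement below this branch
        counts[branch] = (derived, ancestral)
    return counts


# Toy example: two branches, one low-coverage sample with three covered sites.
example_markers: BranchMarkers = {
    "A": {10: ("C", "T"), 20: ("G", "A")},
    "B": {30: ("A", "G")},
}
example_calls = {10: "T", 20: "G", 30: "G"}
print(count_alleles(example_markers, example_calls))
```

A placement method could then weigh the supporting and conflicting counts along each path from the root. How those counts are combined, and how post-mortem deamination is handled, is left out of this sketch.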
15.
Mol Biol Evol ; 39(11)2022 11 03.
Article in English | MEDLINE | ID: mdl-36376993

ABSTRACT

Rapid ecological speciation along depth gradients has taken place repeatedly in freshwater fishes, yet molecular mechanisms facilitating such diversification are typically unclear. In Lake Masoko, an African crater lake, the cichlid Astatotilapia calliptera has diverged into shallow-littoral and deep-benthic ecomorphs with strikingly different jaw structures within the last 1,000 years. Using genome-wide transcriptome data, we explore two major regulatory transcriptional mechanisms, expression and splicing-QTL variants, and examine their contributions to differential gene expression underpinning functional phenotypes. We identified 7,550 genes with significant differential expression between ecomorphs, of which 5.4% were regulated by cis-regulatory expression QTLs, and 9.2% were regulated by cis-regulatory splicing QTLs. We also found strong signals of divergent selection on differentially expressed genes associated with craniofacial development. These results suggest that large-scale transcriptome modification plays an important role during early-stage speciation. We conclude that regulatory variants are important targets of selection driving ecologically relevant divergence in gene expression during adaptive diversification.


Subject(s)
Cichlids , Genetic Speciation , Animals , Cichlids/genetics , Lakes , Phenotype , Quantitative Trait Loci
16.
Nat Methods ; 17(6): 615-620, 2020 06.
Article in English | MEDLINE | ID: mdl-32366989

ABSTRACT

Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.


Subject(s)
RNA-Seq/methods , RNA/genetics , Single-Cell Analysis/methods , Algorithms , Base Sequence , Cell Line , Cluster Analysis , Genotype , Humans , Polymorphism, Single Nucleotide , Sensitivity and Specificity , Software
18.
Nature ; 546(7658): 370-375, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28489815

ABSTRACT

Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. In addition, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.


Subject(s)
Genetic Variation/genetics , Induced Pluripotent Stem Cells/metabolism , Cells, Cultured , Cellular Reprogramming/genetics , DNA Copy Number Variations/genetics , Gene Expression Regulation/genetics , Genotype , Humans , Organ Specificity , Phenotype , Quality Control , Quantitative Trait Loci/genetics , Transcriptome/genetics