Results 1 - 20 of 202
1.
bioRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38617250

ABSTRACT

East African cichlid fishes have diversified in an explosive fashion, but the (epi)genetic basis of the phenotypic diversity of these fishes remains largely unknown. Although transposable elements (TEs) have been associated with phenotypic variation in cichlids, little is known about their transcriptional activity and epigenetic silencing. Here, we describe dynamic patterns of TE expression in African cichlid gonads and during early development. Orthology inference revealed an expansion of piwil1 genes in Lake Malawi cichlids, likely driven by PiggyBac TEs. The expanded piwil1 copies have signatures of positive selection and retain amino acid residues essential for catalytic activity. Furthermore, the gonads of African cichlids express a Piwi-interacting RNA (piRNA) pathway that targets TEs. We define the genomic sites of piRNA production in African cichlids and find divergence in closely related species, in line with fast evolution of piRNA-producing loci. Our findings suggest dynamic co-evolution of TEs and host silencing pathways in the African cichlid radiations. We propose that this co-evolution has contributed to cichlid genomic diversity.

2.
Nat Rev Genet ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38649458

ABSTRACT

Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.

3.
bioRxiv ; 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38352436

ABSTRACT

Neural crest (NC) is a vertebrate-specific embryonic progenitor cell population at the basis of important vertebrate features such as the craniofacial skeleton and pigmentation patterns. Despite the wide-ranging variation of NC-derived traits across vertebrates, the contribution of NC to species diversification remains largely unexplored. Here, leveraging the adaptive diversity of African Great Lakes' cichlid species, we combined comparative transcriptomics and population genomics to investigate the evolution of the NC genetic programme in the context of their morphological divergence. Our analysis revealed substantial differences in transcriptional landscapes across somitogenesis, an embryonic period coinciding with NC development and migration. This included dozens of genes with described functions in the vertebrate NC gene regulatory network, several of which showed signatures of positive selection. Among candidate genes showing between-species expression divergence, we focused on two teleost-specific paralogs of the NC-specifier gene sox10 (sox10a and sox10b) as prime candidates to influence NC development. These genes, expressed in NC cells, displayed remarkable spatio-temporal expression variation in cichlids, suggesting their contribution to inter-specific morphological differences. Finally, through CRISPR/Cas9 mutagenesis, we experimentally demonstrated the functional divergence between cichlid sox10 paralogs, with the acquisition of a novel skeletogenic function by sox10a. When compared to the two teleost models zebrafish and medaka, our findings reveal that sox10 duplication, although retained in most teleost lineages, has had variable functional fates across their phylogeny. Altogether, our study suggests that NC-related processes - particularly those controlled by sox10s - might be involved in generating morphological diversification between species and lays the groundwork for further investigations into the mechanisms underpinning vertebrate NC diversification.

4.
Cell Genom ; 4(3): 100507, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38417441

ABSTRACT

The harsh climate of Arabia has posed challenges in generating ancient DNA from the region, hindering the direct examination of ancient genomes for understanding the demographic processes that shaped Arabian populations. In this study, we report whole-genome sequence data obtained from four Tylos-period individuals from Bahrain. Their genetic ancestry can be modeled as a mixture of sources from ancient Anatolia, Levant, and Iran/Caucasus, with variation between individuals suggesting population heterogeneity in Bahrain before the onset of Islam. We identify the G6PD Mediterranean mutation associated with malaria resistance in three out of four ancient Bahraini samples and estimate that it rose in frequency in Eastern Arabia from 5 to 6 kya onward, around the time agriculture appeared in the region. Our study characterizes the genetic composition of ancient Arabians, shedding light on the population history of Bahrain and demonstrating the feasibility of studies of ancient DNA in the region.


Subject(s)
Arabs , DNA, Ancient , Genetics, Population , Genome, Human , Humans , Arabs/genetics , Bahrain
5.
Nature ; 625(7994): 312-320, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38200293

ABSTRACT

The Holocene (beginning around 12,000 years ago) encompassed some of the most significant changes in human evolution, with far-reaching consequences for the dietary, physical and mental health of present-day populations. Using a dataset of more than 1,600 imputed ancient genomes [1], we modelled the selection landscape during the transition from hunting and gathering to farming and pastoralism across West Eurasia. We identify key selection signals related to metabolism, including that selection at the FADS cluster began earlier than previously reported and that selection near the LCT locus predates the emergence of the lactase persistence allele by thousands of years. We also find strong selection in the HLA region, possibly due to increased exposure to pathogens during the Bronze Age. Using ancient individuals to infer local ancestry tracts in over 400,000 samples from the UK Biobank, we identify widespread differences in the distribution of Mesolithic, Neolithic and Bronze Age ancestries across Eurasia. By calculating ancestry-specific polygenic risk scores, we show that height differences between Northern and Southern Europe are associated with differential Steppe ancestry, rather than selection, and that risk alleles for mood-related phenotypes are enriched for Neolithic farmer ancestry, whereas risk alleles for diabetes and Alzheimer's disease are enriched for Western hunter-gatherer ancestry. Our results indicate that ancient selection and migration were large contributors to the distribution of phenotypic diversity in present-day Europeans.


Subject(s)
Asian , European People , Genome, Human , Selection, Genetic , Humans , Affect , Agriculture/history , Alleles , Alzheimer Disease/genetics , Asia/ethnology , Asian/genetics , Diabetes Mellitus/genetics , Europe/ethnology , European People/genetics , Farmers/history , Genetic Loci/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , History, Ancient , Human Migration , Hunting/history , Multigene Family/genetics , Phenotype , UK Biobank , Multifactorial Inheritance/genetics
6.
ArXiv ; 2023 Aug 15.
Article in English | MEDLINE | ID: mdl-37645045

ABSTRACT

De novo assembly is the process of reconstructing the genome sequence of an organism from sequencing reads. Genome sequences are essential to biology, and assembly has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best but technological advances in long-read sequencing now enable near complete chromosome-level assembly, also known as telomere-to-telomere assembly, for many organisms. Here we review recent progress on assembly algorithms and protocols. We focus on how to derive near telomere-to-telomere assemblies and discuss potential future developments.

7.
Genome Res ; 33(7): 1023-1031, 2023 07.
Article in English | MEDLINE | ID: mdl-37562965

ABSTRACT

The pairwise sequentially Markovian coalescent (PSMC) algorithm and its extensions infer the coalescence time of two homologous chromosomes at each genomic position. This inference is used in reconstructing demographic histories, detecting selection signatures, studying genome-wide associations, constructing ancestral recombination graphs, and more. Inference of coalescence times between each pair of haplotypes in a large data set is of great interest, as they may provide rich information about the population structure and history of the sample. Here, we introduce a new method, Gamma-SMC, which is more than 10 times faster than current methods. To obtain this speed-up, we represent the posterior coalescence time distributions succinctly as a gamma distribution with just two parameters; in contrast, PSMC and its extensions hold these in a vector over discrete intervals of time. Thus, Gamma-SMC has constant time-complexity per site, without dependence on the number of discrete time states. Additionally, because of this continuous representation, our method is able to infer times spanning many orders of magnitude and, as such, is robust to parameter misspecification. We describe how this approach works, show its performance on simulated and real data, and illustrate its use in studying recent positive selection in the 1000 Genomes Project data set.


Subject(s)
Genome , Genomics , Haplotypes , Chromosomes/genetics , Algorithms , Models, Genetic , Genetics, Population
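Note on item 7: the abstract contrasts a two-parameter gamma summary of the coalescence-time posterior with a vector over discrete time intervals. The sketch below only illustrates that contrast, by fitting a gamma distribution to a given mean and variance (standard moment matching). It does not reproduce Gamma-SMC's per-site update, which the abstract does not describe; the function names are hypothetical and chosen for this example.

```python
import math


def gamma_from_moments(mean: float, variance: float) -> tuple[float, float]:
    """Moment-match a gamma distribution: return (shape, rate).

    For a gamma with shape a and rate b, mean = a / b and variance = a / b**2,
    so a = mean**2 / variance and b = mean / variance.
    """
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    return mean * mean / variance, mean / variance


def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Density of the gamma distribution at x > 0, computed in log space."""
    log_pdf = (
        shape * math.log(rate)
        - math.lgamma(shape)
        + (shape - 1.0) * math.log(x)
        - rate * x
    )
    return math.exp(log_pdf)


# Two numbers stand in for the whole distribution, regardless of how many
# discrete time intervals a vector-based representation would need.
shape, rate = gamma_from_moments(mean=2.0, variance=4.0)
```

Storing only `shape` and `rate` per site is what makes the cost independent of any time discretisation in this illustration.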
8.
BMC Bioinformatics ; 24(1): 288, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37464285

ABSTRACT

BACKGROUND: PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. RESULTS: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence fasta file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. CONCLUSIONS: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub ( https://github.com/marcelauliano/MitoHiFi ). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).


Subject(s)
Genome, Mitochondrial , Phylogeny , RNA , Eukaryota , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing
9.
Nat Commun ; 14(1): 3412, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37296119

ABSTRACT

Numerous novel adaptations characterise the radiation of notothenioids, the dominant fish group in the freezing seas of the Southern Ocean. To improve understanding of the evolution of this iconic fish group, here we generate and analyse new genome assemblies for 24 species covering all major subgroups of the radiation, including five long-read assemblies. We present a new estimate for the onset of the radiation at 10.7 million years ago, based on a time-calibrated phylogeny derived from genome-wide sequence data. We identify a two-fold variation in genome size, driven by expansion of multiple transposable element families, and use the long-read data to reconstruct two evolutionarily important, highly repetitive gene family loci. First, we present the most complete reconstruction to date of the antifreeze glycoprotein gene family, whose emergence enabled survival in sub-zero temperatures, showing the expansion of the antifreeze gene locus from the ancestral to the derived state. Second, we trace the loss of haemoglobin genes in icefishes, the only vertebrates lacking functional haemoglobins, through complete reconstruction of the two haemoglobin gene clusters across notothenioid families. Both the haemoglobin and antifreeze genomic loci are characterised by multiple transposon expansions that may have driven the evolutionary history of these genes.


Subject(s)
Fishes , Perciformes , Animals , Fishes/genetics , Genomics , Vertebrates , Phylogeny , Hemoglobins/genetics , Antarctic Regions
10.
Mol Biol Evol ; 40(5), 2023 05 02.
Article in English | MEDLINE | ID: mdl-37194566

ABSTRACT

We present genome sequences for the caecilians Geotrypetes seraphini (3.8 Gb) and Microcaecilia unicolor (4.7 Gb), representatives of a limbless, mostly soil-dwelling amphibian clade with reduced eyes and unique putatively chemosensory tentacles. More than 69% of both genomes are composed of repeats, with retrotransposons being the most abundant. We identify 1,150 orthogroups that are unique to caecilians and enriched for functions in olfaction and detection of chemical signals. There are 379 orthogroups with signatures of positive selection on caecilian lineages with roles in organ development and morphogenesis, sensory perception, and immunity amongst others. We discover that caecilian genomes are missing the zone of polarizing activity regulatory sequence (ZRS) enhancer of Sonic Hedgehog, which is also mutated in snakes. In vivo deletions have shown ZRS is required for limb development in mice, thus revealing a shared molecular target implicated in the independent evolution of limblessness in snakes and caecilians.


Subject(s)
Amphibians , Hedgehog Proteins , Animals , Mice , Hedgehog Proteins/genetics , Amphibians/genetics , Genome , Snakes/genetics , Acclimatization , Evolution, Molecular
11.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36525368

ABSTRACT

SUMMARY: We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file) which is compatible with similar tools and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity. AVAILABILITY AND IMPLEMENTATION: YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at https://github.com/sanger-tol/yahs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Documentation , Software
12.
Mol Ecol Resour ; 23(4): 872-885, 2023 May.
Article in English | MEDLINE | ID: mdl-36533297

ABSTRACT

The ithomiine butterflies (Nymphalidae: Danainae) represent the largest known radiation of Müllerian mimetic butterflies. They numerically dominate the mimetic butterfly communities, which include species such as the iconic neotropical Heliconius genus. Recent studies on the ecology and genetics of speciation in Ithomiini have suggested that sexual pheromones, colour pattern and perhaps hostplant could drive reproductive isolation. However, no reference genome was available for Ithomiini, which has hindered further exploration of the genetic architecture of these candidate traits, and more generally of the genomic patterns of divergence. Here, we generated high-quality, chromosome-scale genome assemblies for two Melinaea species, M. marsaeus and M. menophilus, and a draft genome of the species Ithomia salapia. We obtained genomes with a size ranging from 396 to 503 Mb across the three species and scaffold N50 of 40.5 and 23.2 Mb for the two chromosome-scale assemblies. Using collinearity analyses we identified massive rearrangements between the two closely related Melinaea species. An annotation of transposable elements and gene content was performed, as well as a specialist annotation to target chemosensory genes, which are crucial for host plant detection and mate recognition in mimetic species. A comparative genomic approach revealed independent gene expansions in ithomiines, particularly in gustatory receptor genes. These first three genomes of ithomiine mimetic butterflies constitute a valuable addition and a welcome comparison to existing biological models such as Heliconius, and will enable further understanding of the mechanisms of adaptation in butterflies.


Subject(s)
Butterflies , Animals , Butterflies/genetics , Adaptation, Physiological , Phenotype , Genomics , Chromosomes/genetics
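Note on item 12: the abstract reports scaffold N50 values without defining the statistic. As a reminder, N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch follows; the function name `n50` and the toy lengths are illustrative and are not taken from the paper.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example with made-up lengths: total is 20, and the two longest
# scaffolds (6 + 5 = 11) are the first to reach half of it.
example = n50([2, 3, 4, 5, 6])
```

Because the statistic depends only on the sorted lengths, it says nothing about correctness of the scaffolds, which is why it is usually reported alongside other quality measures.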
13.
Wellcome Open Res ; 8: 401, 2023.
Article in English | MEDLINE | ID: mdl-38680652

ABSTRACT

Sequences derived from circular DNA molecules (i.e. most bacterial, viral and plastid genomes) are expected to be linearised and rotated to a common start position for most downstream analyses including alignment. Despite this being a common and straightforward task, available software is either limited to a small number of input sequences, lacks the option to specify a custom anchor string, or requires a commercial license. Here, we present rotate, a simple, open source command line program written in C with no external dependencies, which can rotate a set of input sequences to a custom anchor string (allowing for a specified number of mismatches), or offset the input sequences to the desired position. The combination of both functionalities allows the rotation of all input sequences to any desired starting position, enabling downstream analysis. rotate is extremely fast and scales linearly with the number of input sequences, taking only seconds to rotate over a thousand mitochondrial sequences.
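
Note on item 13: the anchor-based rotation described in the abstract can be sketched in a few lines. This is an illustrative Python version, not the rotate program itself (which the abstract says is written in C). The function names, the use of substitution-only (Hamming) mismatches, and the behaviour when no match is found are assumptions made for this example.

```python
from typing import Optional


def find_anchor(seq: str, anchor: str, max_mismatches: int = 0) -> Optional[int]:
    """Return the first position where anchor matches the circular seq with
    at most max_mismatches substitutions, or None if there is no such match."""
    n, m = len(seq), len(anchor)
    if m == 0 or m > n:
        return None
    # Append the first m-1 bases so that matches spanning the point where
    # the circular molecule was linearised are also found.
    circular = seq + seq[: m - 1]
    for start in range(n):
        mismatches = 0
        for a, b in zip(circular[start : start + m], anchor):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            return start
    return None


def rotate_to_anchor(
    seq: str, anchor: str, max_mismatches: int = 0, offset: int = 0
) -> str:
    """Rotate seq to start at the anchor match, shifted by offset bases."""
    pos = find_anchor(seq, anchor, max_mismatches)
    if pos is None:
        return seq  # assumption: leave unmatched sequences unchanged
    start = (pos + offset) % len(seq)
    return seq[start:] + seq[:start]
```

Combining the anchor search with an offset mirrors the abstract's point that the two functions together allow rotation to any desired start position.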

14.
Nature ; 612(7939): 283-291, 2022 12.
Article in English | MEDLINE | ID: mdl-36477129

ABSTRACT

Late Pliocene and Early Pleistocene epochs 3.6 to 0.8 million years ago [1] had climates resembling those forecasted under future warming [2]. Palaeoclimatic records show strong polar amplification with mean annual temperatures of 11-19 °C above contemporary values [3,4]. The biological communities inhabiting the Arctic during this time remain poorly known because fossils are rare [5]. Here we report an ancient environmental DNA [6] (eDNA) record describing the rich plant and animal assemblages of the Kap København Formation in North Greenland, dated to around two million years ago. The record shows an open boreal forest ecosystem with mixed vegetation of poplar, birch and thuja trees, as well as a variety of Arctic and boreal shrubs and herbs, many of which had not previously been detected at the site from macrofossil and pollen records. The DNA record confirms the presence of hare and mitochondrial DNA from animals including mastodons, reindeer, rodents and geese, all ancestral to their present-day and late Pleistocene relatives. The presence of marine species including horseshoe crab and green algae supports a warmer climate than today. The reconstructed ecosystem has no modern analogue. The survival of such ancient eDNA probably relates to its binding to mineral surfaces. Our findings open new areas of genetic research, demonstrating that it is possible to track the ecology and evolution of biological communities from two million years ago using ancient eDNA.


Subject(s)
DNA, Environmental , Ecosystem , Ecology , Fossils , Greenland
15.
Mol Biol Evol ; 39(11), 2022 11 03.
Article in English | MEDLINE | ID: mdl-36376993

ABSTRACT

Rapid ecological speciation along depth gradients has taken place repeatedly in freshwater fishes, yet molecular mechanisms facilitating such diversification are typically unclear. In Lake Masoko, an African crater lake, the cichlid Astatotilapia calliptera has diverged into shallow-littoral and deep-benthic ecomorphs with strikingly different jaw structures within the last 1,000 years. Using genome-wide transcriptome data, we explore two major regulatory transcriptional mechanisms, expression and splicing-QTL variants, and examine their contributions to differential gene expression underpinning functional phenotypes. We identified 7,550 genes with significant differential expression between ecomorphs, of which 5.4% were regulated by cis-regulatory expression QTLs, and 9.2% were regulated by cis-regulatory splicing QTLs. We also found strong signals of divergent selection on differentially expressed genes associated with craniofacial development. These results suggest that large-scale transcriptome modification plays an important role during early-stage speciation. We conclude that regulatory variants are important targets of selection driving ecologically relevant divergence in gene expression during adaptive diversification.


Subject(s)
Cichlids , Genetic Speciation , Animals , Cichlids/genetics , Lakes , Phenotype , Quantitative Trait Loci
17.
Nat Ecol Evol ; 6(12): 1940-1951, 2022 12.
Article in English | MEDLINE | ID: mdl-36266459

ABSTRACT

Epigenetic variation can alter transcription and promote phenotypic divergence between populations facing different environmental challenges. Here, we assess the epigenetic basis of diversification during the early stages of speciation. Specifically, we focus on the extent and functional relevance of DNA methylome divergence in the very young radiation of Astatotilapia calliptera in crater Lake Masoko, southern Tanzania. Our study focuses on two lake ecomorphs that diverged approximately 1,000 years ago and a population in the nearby river from which they separated approximately 10,000 years ago. The two lake ecomorphs show no fixed genetic differentiation, yet are characterized by different morphologies, depth preferences and diets. We report extensive genome-wide methylome divergence between the two lake ecomorphs, and between the lake and river populations, linked to key biological processes and associated with altered transcriptional activity of ecologically relevant genes. Such genes differing between lake ecomorphs include those involved in steroid metabolism, hemoglobin composition and erythropoiesis, consistent with their divergent habitat occupancy. Using a common-garden experiment, we found that global methylation profiles are often rapidly remodeled across generations but ecomorph-specific differences can be inherited. Collectively, our study suggests an epigenetic contribution to the early stages of vertebrate speciation.


Subject(s)
Cichlids , Lakes , Animals , Biological Evolution , Cichlids/genetics , Ecosystem , Epigenesis, Genetic
18.
Elife ; 11, 2022 Oct 12.
Article in English | MEDLINE | ID: mdl-36222650

ABSTRACT

The ANOSPP amplicon panel is a genus-wide targeted sequencing panel to facilitate large-scale monitoring of Anopheles species diversity. Combining information from the 62 nuclear amplicons present in the ANOSPP panel allows for a more sensitive and specific species assignment than single gene (e.g. COI) barcoding, which is desirable in the light of permeable species boundaries. Here, we present NNoVAE, a method using Nearest Neighbours (NN) and Variational Autoencoders (VAE), which we apply to k-mers resulting from the ANOSPP amplicon sequences in order to hierarchically assign species identity. The NN step assigns a sample to a species-group by comparing the k-mers arising from each haplotype's amplicon sequence to a reference database. The VAE step is required to distinguish between closely related species, and also has sufficient resolution to reveal population structure within species. In tests on independent samples with over 80% amplicon coverage, NNoVAE correctly classifies to species level 98% of samples within the An. gambiae complex and 89% of samples outside the complex. We apply NNoVAE to over two thousand new samples from Burkina Faso and Gabon, identifying unexpected species in Gabon. NNoVAE presents an approach that may be of value to other targeted sequencing panels, and is a method that will be used to survey Anopheles species diversity and Plasmodium transmission patterns through space and time on a large scale, with plans to analyse half a million mosquitoes in the next five years.


Subject(s)
Anopheles , Animals , Anopheles/genetics , Burkina Faso , Gabon
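Note on item 18: the nearest-neighbour step in the abstract compares k-mers from a sample's amplicon sequence with a reference database. The sketch below shows the general idea of k-mer based nearest-neighbour assignment only. The Jaccard similarity, the value of k, the function names and the toy sequences are all assumptions made for illustration; the abstract does not state which distance or parameters NNoVAE uses.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_group(query: str, references: dict[str, str], k: int = 4):
    """Return (label, similarity) of the reference sharing most k-mers."""
    query_kmers = kmers(query, k)
    best_label, best_score = None, -1.0
    for label, ref_seq in references.items():
        score = jaccard(query_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy reference database with made-up sequences and hypothetical labels.
toy_refs = {"group_A": "ACGTACGTAC", "group_B": "TTTTGGGGCC"}
label, score = nearest_group("ACGTACGTTC", toy_refs)
```

A coarse assignment of this kind would then be refined by a second, higher-resolution step, which in the abstract is the variational autoencoder.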
20.
Genome Biol ; 23(1): 204, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36167554

ABSTRACT

BACKGROUND: Many short-read genome assemblies have been found to be incomplete and contain mis-assemblies. The Vertebrate Genomes Project has been producing new reference genome assemblies with an emphasis on being as complete and error-free as possible, which requires utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. A more thorough evaluation of the recent references relative to prior assemblies can provide a detailed overview of the types and magnitude of improvements. RESULTS: Here we evaluate new vertebrate genome references relative to the previous assemblies for the same species and, in two cases, the same individuals, including a mammal (platypus), two birds (zebra finch, Anna's hummingbird), and a fish (climbing perch). We find that up to 11% of genomic sequence is entirely missing in the previous assemblies. In the Vertebrate Genomes Project zebra finch assembly, we identify eight new GC- and repeat-rich micro-chromosomes with high gene density. The impact of missing sequences is biased towards GC-rich 5'-proximal promoters and 5' exon regions of protein-coding genes and long non-coding RNAs. Between 26 and 60% of genes include structural or sequence errors that could lead to misunderstanding of their function when using the previous genome assemblies. CONCLUSIONS: Our findings reveal novel regulatory landscapes and protein coding sequences that have been greatly underestimated in previous assemblies and are now present in the Vertebrate Genomes Project reference genomes.


Subject(s)
Genome , Vertebrates , Animals , Base Composition/genetics , Chromosomes , Genome/genetics , Sequence Analysis, DNA , Vertebrates/genetics