Results 1 - 20 of 202
1.
bioRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38617250

ABSTRACT

East African cichlid fishes have diversified in an explosive fashion, but the (epi)genetic basis of the phenotypic diversity of these fishes remains largely unknown. Although transposable elements (TEs) have been associated with phenotypic variation in cichlids, little is known about their transcriptional activity and epigenetic silencing. Here, we describe dynamic patterns of TE expression in African cichlid gonads and during early development. Orthology inference revealed an expansion of piwil1 genes in Lake Malawi cichlids, likely driven by PiggyBac TEs. The expanded piwil1 copies have signatures of positive selection and retain amino acid residues essential for catalytic activity. Furthermore, the gonads of African cichlids express a Piwi-interacting RNA (piRNA) pathway that targets TEs. We define the genomic sites of piRNA production in African cichlids and find divergence in closely related species, in line with fast evolution of piRNA-producing loci. Our findings suggest dynamic co-evolution of TEs and host silencing pathways in the African cichlid radiations. We propose that this co-evolution has contributed to cichlid genomic diversity.

2.
Nat Rev Genet ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38649458

ABSTRACT

Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.

3.
Cell Genom ; 4(3): 100507, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38417441

ABSTRACT

The harsh climate of Arabia has posed challenges in generating ancient DNA from the region, hindering the direct examination of ancient genomes for understanding the demographic processes that shaped Arabian populations. In this study, we report whole-genome sequence data obtained from four Tylos-period individuals from Bahrain. Their genetic ancestry can be modeled as a mixture of sources from ancient Anatolia, Levant, and Iran/Caucasus, with variation between individuals suggesting population heterogeneity in Bahrain before the onset of Islam. We identify the G6PD Mediterranean mutation associated with malaria resistance in three out of four ancient Bahraini samples and estimate that it rose in frequency in Eastern Arabia from 5 to 6 kya onward, around the time agriculture appeared in the region. Our study characterizes the genetic composition of ancient Arabians, shedding light on the population history of Bahrain and demonstrating the feasibility of studies of ancient DNA in the region.


Subjects
Arabs , Ancient DNA , Population Genetics , Human Genome , Humans , Arabs/genetics , Bahrain
4.
bioRxiv ; 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38352436

ABSTRACT

Neural crest (NC) is a vertebrate-specific embryonic progenitor cell population at the basis of important vertebrate features such as the craniofacial skeleton and pigmentation patterns. Despite the wide-ranging variation of NC-derived traits across vertebrates, the contribution of NC to species diversification remains largely unexplored. Here, by leveraging the adaptive diversity of African Great Lakes' cichlid species, we combined comparative transcriptomics and population genomics to investigate the role of NC development in morphological diversification. Our analysis revealed substantial differences in transcriptional landscapes across somitogenesis, an embryonic period coinciding with NC development and migration. Notably, several NC-related gene expression clusters showed both species-specific divergence in transcriptional landscapes and signatures of positive selection. Specifically, we identified two paralogs of the sox10 gene as prime NC-related candidates contributing to interspecific morphological variation, which displayed remarkable spatio-temporal expression variation in cichlids. Finally, through CRISPR-KO mutants, we experimentally validated the functional divergence between sox10 paralogs, with the acquisition of a novel role in cichlid skeletogenesis by sox10-like. Our study demonstrates the central role of NC-related processes - in particular those controlled by sox10s - in generating morphological diversification among closely-related species and lays the groundwork for further investigations into the mechanisms underpinning vertebrate NC diversification.

5.
Nature ; 625(7994): 312-320, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38200293

ABSTRACT

The Holocene (beginning around 12,000 years ago) encompassed some of the most significant changes in human evolution, with far-reaching consequences for the dietary, physical and mental health of present-day populations. Using a dataset of more than 1,600 imputed ancient genomes, we modelled the selection landscape during the transition from hunting and gathering to farming and pastoralism across West Eurasia. We identify key selection signals related to metabolism, including that selection at the FADS cluster began earlier than previously reported and that selection near the LCT locus predates the emergence of the lactase persistence allele by thousands of years. We also find strong selection in the HLA region, possibly due to increased exposure to pathogens during the Bronze Age. Using ancient individuals to infer local ancestry tracts in over 400,000 samples from the UK Biobank, we identify widespread differences in the distribution of Mesolithic, Neolithic and Bronze Age ancestries across Eurasia. By calculating ancestry-specific polygenic risk scores, we show that height differences between Northern and Southern Europe are associated with differential Steppe ancestry, rather than selection, and that risk alleles for mood-related phenotypes are enriched for Neolithic farmer ancestry, whereas risk alleles for diabetes and Alzheimer's disease are enriched for Western hunter-gatherer ancestry. Our results indicate that ancient selection and migration were large contributors to the distribution of phenotypic diversity in present-day Europeans.


Subjects
Asian People , European People , Human Genome , Genetic Selection , Humans , Affect , Agriculture/history , Alleles , Alzheimer Disease/genetics , Asia/ethnology , Asian People/genetics , Diabetes Mellitus/genetics , Europe/ethnology , European People/genetics , Farmers/history , Genetic Loci/genetics , Genetic Predisposition to Disease , Human Genome/genetics , Ancient History , Human Migration , Hunting/history , Multigene Family/genetics , Phenotype , UK Biobank , Multifactorial Inheritance/genetics
6.
Genome Res ; 33(7): 1023-1031, 2023 07.
Article in English | MEDLINE | ID: mdl-37562965

ABSTRACT

The pairwise sequentially Markovian coalescent (PSMC) algorithm and its extensions infer the coalescence time of two homologous chromosomes at each genomic position. This inference is used in reconstructing demographic histories, detecting selection signatures, studying genome-wide associations, constructing ancestral recombination graphs, and more. Inference of coalescence times between each pair of haplotypes in a large data set is of great interest, as they may provide rich information about the population structure and history of the sample. Here, we introduce a new method, Gamma-SMC, which is more than 10 times faster than current methods. To obtain this speed-up, we represent the posterior coalescence time distributions succinctly as a gamma distribution with just two parameters; in contrast, PSMC and its extensions hold these in a vector over discrete intervals of time. Thus, Gamma-SMC has constant time-complexity per site, without dependence on the number of discrete time states. Additionally, because of this continuous representation, our method is able to infer times spanning many orders of magnitude and, as such, is robust to parameter misspecification. We describe how this approach works, show its performance on simulated and real data, and illustrate its use in studying recent positive selection in the 1000 Genomes Project data set.


Subjects
Genome , Genomics , Haplotypes , Chromosomes/genetics , Algorithms , Genetic Models , Population Genetics
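
The contrast drawn in the abstract above, a two-parameter gamma summary versus a vector over discrete time intervals, can be illustrated with a small sketch. This is not the Gamma-SMC implementation: the moment-matching step, the class name `GammaSummary`, and the function `gamma_from_discrete` are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class GammaSummary:
    """Two-parameter summary of a distribution over coalescence time (hypothetical helper)."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    @property
    def variance(self) -> float:
        return self.shape / self.rate ** 2


def gamma_from_discrete(times, probs):
    """Moment-match a discretised distribution to a gamma distribution.

    `times` are representative time points and `probs` their weights,
    i.e. the kind of vector a discretised method would store.
    """
    total = sum(probs)
    mean = sum(t * p for t, p in zip(times, probs)) / total
    var = sum(p * (t - mean) ** 2 for t, p in zip(times, probs)) / total
    if mean <= 0 or var <= 0:
        raise ValueError("need a positive mean and a non-degenerate distribution")
    # For a gamma distribution: mean = shape / rate, variance = shape / rate**2.
    return GammaSummary(shape=mean ** 2 / var, rate=mean / var)


summary = gamma_from_discrete([1.0, 2.0, 3.0], [0.25, 0.5, 0.25])
print(summary.shape, summary.rate)
```

However many time intervals the discrete vector has, the summary stays at two numbers, which is the property the abstract links to constant per-site cost.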
7.
ArXiv ; 2023 Aug 15.
Article in English | MEDLINE | ID: mdl-37645045

ABSTRACT

De novo assembly is the process of reconstructing the genome sequence of an organism from sequencing reads. Genome sequences are essential to biology, and assembly has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best but technological advances in long-read sequencing now enable near complete chromosome-level assembly, also known as telomere-to-telomere assembly, for many organisms. Here we review recent progress on assembly algorithms and protocols. We focus on how to derive near telomere-to-telomere assemblies and discuss potential future developments.

8.
BMC Bioinformatics ; 24(1): 288, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37464285

ABSTRACT

BACKGROUND: PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes, the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. RESULTS: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence fasta file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. CONCLUSIONS: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub ( https://github.com/marcelauliano/MitoHiFi ). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).


Subjects
Mitochondrial Genome , Phylogeny , RNA , Eukaryota , DNA Sequence Analysis , High-Throughput Nucleotide Sequencing
9.
Nat Commun ; 14(1): 3412, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37296119

ABSTRACT

Numerous novel adaptations characterise the radiation of notothenioids, the dominant fish group in the freezing seas of the Southern Ocean. To improve understanding of the evolution of this iconic fish group, here we generate and analyse new genome assemblies for 24 species covering all major subgroups of the radiation, including five long-read assemblies. We present a new estimate for the onset of the radiation at 10.7 million years ago, based on a time-calibrated phylogeny derived from genome-wide sequence data. We identify a two-fold variation in genome size, driven by expansion of multiple transposable element families, and use the long-read data to reconstruct two evolutionarily important, highly repetitive gene family loci. First, we present the most complete reconstruction to date of the antifreeze glycoprotein gene family, whose emergence enabled survival in sub-zero temperatures, showing the expansion of the antifreeze gene locus from the ancestral to the derived state. Second, we trace the loss of haemoglobin genes in icefishes, the only vertebrates lacking functional haemoglobins, through complete reconstruction of the two haemoglobin gene clusters across notothenioid families. Both the haemoglobin and antifreeze genomic loci are characterised by multiple transposon expansions that may have driven the evolutionary history of these genes.


Subjects
Fishes , Perciformes , Animals , Fishes/genetics , Genomics , Vertebrates , Phylogeny , Hemoglobins/genetics , Antarctic Regions
10.
Mol Biol Evol ; 40(5), 2023 05 02.
Article in English | MEDLINE | ID: mdl-37194566

ABSTRACT

We present genome sequences for the caecilians Geotrypetes seraphini (3.8 Gb) and Microcaecilia unicolor (4.7 Gb), representatives of a limbless, mostly soil-dwelling amphibian clade with reduced eyes and unique putatively chemosensory tentacles. More than 69% of both genomes are composed of repeats, with retrotransposons being the most abundant. We identify 1,150 orthogroups that are unique to caecilians and enriched for functions in olfaction and detection of chemical signals. There are 379 orthogroups with signatures of positive selection on caecilian lineages with roles in organ development and morphogenesis, sensory perception, and immunity amongst others. We discover that caecilian genomes are missing the zone of polarizing activity regulatory sequence (ZRS) enhancer of Sonic Hedgehog, which is also mutated in snakes. In vivo deletions have shown ZRS is required for limb development in mice, thus revealing a shared molecular target implicated in the independent evolution of limblessness in snakes and caecilians.


Subjects
Amphibians , Hedgehog Proteins , Animals , Mice , Hedgehog Proteins/genetics , Amphibians/genetics , Genome , Snakes/genetics , Acclimatization , Molecular Evolution
11.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36525368

ABSTRACT

SUMMARY: We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file) which is compatible with similar tools and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity. AVAILABILITY AND IMPLEMENTATION: YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at https://github.com/sanger-tol/yahs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Documentation , Software
12.
Mol Ecol Resour ; 23(4): 872-885, 2023 May.
Article in English | MEDLINE | ID: mdl-36533297

ABSTRACT

The ithomiine butterflies (Nymphalidae: Danainae) represent the largest known radiation of Müllerian mimetic butterflies. They dominate by number the mimetic butterfly communities, which include species such as the iconic neotropical Heliconius genus. Recent studies on the ecology and genetics of speciation in Ithomiini have suggested that sexual pheromones, colour pattern and perhaps hostplant could drive reproductive isolation. However, no reference genome was available for Ithomiini, which has hindered further exploration on the genetic architecture of these candidate traits, and more generally on the genomic patterns of divergence. Here, we generated high-quality, chromosome-scale genome assemblies for two Melinaea species, M. marsaeus and M. menophilus, and a draft genome of the species Ithomia salapia. We obtained genomes with a size ranging from 396 to 503 Mb across the three species and scaffold N50 of 40.5 and 23.2 Mb for the two chromosome-scale assemblies. Using collinearity analyses we identified massive rearrangements between the two closely related Melinaea species. An annotation of transposable elements and gene content was performed, as well as a specialist annotation to target chemosensory genes, which is crucial for host plant detection and mate recognition in mimetic species. A comparative genomic approach revealed independent gene expansions in ithomiines and particularly in gustatory receptor genes. These first three genomes of ithomiine mimetic butterflies constitute a valuable addition and a welcome comparison to existing biological models such as Heliconius, and will enable further understanding of the mechanisms of adaptation in butterflies.


Subjects
Butterflies , Animals , Butterflies/genetics , Physiological Adaptation , Phenotype , Genomics , Chromosomes/genetics
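
The scaffold N50 figures quoted in the abstract above follow the usual definition: the length L such that scaffolds of length L or longer contain at least half of the assembled bases. A minimal sketch of that calculation is given here; the function name `n50` and the toy lengths are illustrative and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy example: total length 20, so the threshold is 10 and is reached at 6 + 5.
print(n50([2, 3, 4, 5, 6]))
```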
13.
Wellcome Open Res ; 8: 401, 2023.
Article in English | MEDLINE | ID: mdl-38680652

ABSTRACT

Sequences derived from circular DNA molecules (i.e. most bacterial, viral and plastid genomes) are expected to be linearised and rotated to a common start position for most downstream analyses including alignment. Despite this being a common and straightforward task, available software is either limited to a small number of input sequences, lacks the option to specify a custom anchor string, or requires a commercial license. Here, we present rotate, a simple, open source command line program written in C with no external dependencies, which can rotate a set of input sequences to a custom anchor string (allowing for a specified number of mismatches), or offset the input sequences to the desired position. The combination of both functionalities allows the rotation of all input sequences to any desired starting position, enabling downstream analysis. rotate is extremely fast and scales linearly with the number of input sequences, taking only seconds to rotate over a thousand mitochondrial sequences.
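
The two operations described in the abstract above, rotating a circular sequence to an anchor string with a tolerated number of mismatches and offsetting to a given position, can be sketched in a few lines. This is an illustrative Python sketch, not the rotate program itself (which the abstract says is written in C); the function names and the first-match policy are assumptions.

```python
def rotate_by_offset(seq, offset):
    """Rotate a circular sequence so that position `offset` becomes the start."""
    if not seq:
        return seq
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def rotate_to_anchor(seq, anchor, max_mismatches=0):
    """Rotate `seq` so it starts at the first window matching `anchor`.

    The sequence is treated as circular, so the anchor may span the
    junction between the end and the start. Returns None if no window
    matches within `max_mismatches` substitutions.
    """
    k = len(anchor)
    if k == 0 or k > len(seq):
        return None
    # Append the first k-1 bases so windows can wrap around the junction.
    doubled = seq + seq[:k - 1]
    for start in range(len(seq)):
        window = doubled[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, anchor))
        if mismatches <= max_mismatches:
            return rotate_by_offset(seq, start)
    return None


print(rotate_to_anchor("GGGATGCCC", "ATG"))
print(rotate_to_anchor("TGCCCA", "ATG"))
```

Scanning each start position once keeps the work linear in the sequence length for a fixed anchor, in line with the linear scaling the abstract reports.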

14.
Nature ; 612(7939): 283-291, 2022 12.
Article in English | MEDLINE | ID: mdl-36477129

ABSTRACT

Late Pliocene and Early Pleistocene epochs 3.6 to 0.8 million years ago had climates resembling those forecasted under future warming. Palaeoclimatic records show strong polar amplification with mean annual temperatures of 11-19 °C above contemporary values. The biological communities inhabiting the Arctic during this time remain poorly known because fossils are rare. Here we report an ancient environmental DNA (eDNA) record describing the rich plant and animal assemblages of the Kap København Formation in North Greenland, dated to around two million years ago. The record shows an open boreal forest ecosystem with mixed vegetation of poplar, birch and thuja trees, as well as a variety of Arctic and boreal shrubs and herbs, many of which had not previously been detected at the site from macrofossil and pollen records. The DNA record confirms the presence of hare and mitochondrial DNA from animals including mastodons, reindeer, rodents and geese, all ancestral to their present-day and late Pleistocene relatives. The presence of marine species including horseshoe crab and green algae supports a warmer climate than today. The reconstructed ecosystem has no modern analogue. The survival of such ancient eDNA probably relates to its binding to mineral surfaces. Our findings open new areas of genetic research, demonstrating that it is possible to track the ecology and evolution of biological communities from two million years ago using ancient eDNA.


Subjects
Environmental DNA , Ecosystem , Ecology , Fossils , Greenland
16.
Mol Biol Evol ; 39(11), 2022 11 03.
Article in English | MEDLINE | ID: mdl-36376993

ABSTRACT

Rapid ecological speciation along depth gradients has taken place repeatedly in freshwater fishes, yet molecular mechanisms facilitating such diversification are typically unclear. In Lake Masoko, an African crater lake, the cichlid Astatotilapia calliptera has diverged into shallow-littoral and deep-benthic ecomorphs with strikingly different jaw structures within the last 1,000 years. Using genome-wide transcriptome data, we explore two major regulatory transcriptional mechanisms, expression and splicing-QTL variants, and examine their contributions to differential gene expression underpinning functional phenotypes. We identified 7,550 genes with significant differential expression between ecomorphs, of which 5.4% were regulated by cis-regulatory expression QTLs, and 9.2% were regulated by cis-regulatory splicing QTLs. We also found strong signals of divergent selection on differentially expressed genes associated with craniofacial development. These results suggest that large-scale transcriptome modification plays an important role during early-stage speciation. We conclude that regulatory variants are important targets of selection driving ecologically relevant divergence in gene expression during adaptive diversification.


Subjects
Cichlids , Genetic Speciation , Animals , Cichlids/genetics , Lakes , Phenotype , Quantitative Trait Loci
17.
Elife ; 11, 2022 Oct 12.
Article in English | MEDLINE | ID: mdl-36222650

ABSTRACT

The ANOSPP amplicon panel is a genus-wide targeted sequencing panel to facilitate large-scale monitoring of Anopheles species diversity. Combining information from the 62 nuclear amplicons present in the ANOSPP panel allows for a more sensitive and specific species assignment than single gene (e.g. COI) barcoding, which is desirable in the light of permeable species boundaries. Here, we present NNoVAE, a method using Nearest Neighbours (NN) and Variational Autoencoders (VAE), which we apply to k-mers resulting from the ANOSPP amplicon sequences in order to hierarchically assign species identity. The NN step assigns a sample to a species-group by comparing the k-mers arising from each haplotype's amplicon sequence to a reference database. The VAE step is required to distinguish between closely related species, and also has sufficient resolution to reveal population structure within species. In tests on independent samples with over 80% amplicon coverage, NNoVAE correctly classifies to species level 98% of samples within the An. gambiae complex and 89% of samples outside the complex. We apply NNoVAE to over two thousand new samples from Burkina Faso and Gabon, identifying unexpected species in Gabon. NNoVAE presents an approach that may be of value to other targeted sequencing panels, and is a method that will be used to survey Anopheles species diversity and Plasmodium transmission patterns through space and time on a large scale, with plans to analyse half a million mosquitoes in the next five years.


Subjects
Anopheles , Animals , Anopheles/genetics , Burkina Faso , Gabon
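
The nearest-neighbour idea in the abstract above, comparing k-mers from a sample's amplicon sequence against a reference database, can be sketched in simplified form. This is not the NNoVAE method: the Jaccard similarity, the value of k, the function names, and the toy sequences are all assumptions chosen for illustration.

```python
def kmers(seq, k):
    """Return the set of k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_group(query, references, k=4):
    """Assign `query` to the reference group with the most similar k-mer set.

    `references` maps a (hypothetical) group label to a reference sequence.
    Returns the best label and its similarity score.
    """
    query_kmers = kmers(query, k)
    scores = {name: jaccard(query_kmers, kmers(ref, k))
              for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy reference database with made-up sequences.
references = {
    "group_A": "ACGTACGTTAGC",
    "group_B": "TTTTGGGGCCCCAAAA",
}
print(nearest_group("ACGTACGTTAGG", references))
```

A real panel would combine evidence across many amplicons and haplotypes; the sketch scores a single sequence only.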
18.
Nat Ecol Evol ; 6(12): 1940-1951, 2022 12.
Article in English | MEDLINE | ID: mdl-36266459

ABSTRACT

Epigenetic variation can alter transcription and promote phenotypic divergence between populations facing different environmental challenges. Here, we assess the epigenetic basis of diversification during the early stages of speciation. Specifically, we focus on the extent and functional relevance of DNA methylome divergence in the very young radiation of Astatotilapia calliptera in crater Lake Masoko, southern Tanzania. Our study focuses on two lake ecomorphs that diverged approximately 1,000 years ago and a population in the nearby river from which they separated approximately 10,000 years ago. The two lake ecomorphs show no fixed genetic differentiation, yet are characterized by different morphologies, depth preferences and diets. We report extensive genome-wide methylome divergence between the two lake ecomorphs, and between the lake and river populations, linked to key biological processes and associated with altered transcriptional activity of ecologically relevant genes. Such genes differing between lake ecomorphs include those involved in steroid metabolism, hemoglobin composition and erythropoiesis, consistent with their divergent habitat occupancy. Using a common-garden experiment, we found that global methylation profiles are often rapidly remodeled across generations but ecomorph-specific differences can be inherited. Collectively, our study suggests an epigenetic contribution to the early stages of vertebrate speciation.


Subjects
Cichlids , Lakes , Animals , Biological Evolution , Cichlids/genetics , Ecosystem , Genetic Epigenesis
19.
Genome Biol ; 23(1): 204, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36167554

ABSTRACT

BACKGROUND: Many short-read genome assemblies have been found to be incomplete and contain mis-assemblies. The Vertebrate Genomes Project has been producing new reference genome assemblies with an emphasis on being as complete and error-free as possible, which requires utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. A more thorough evaluation of the recent references relative to prior assemblies can provide a detailed overview of the types and magnitude of improvements. RESULTS: Here we evaluate new vertebrate genome references relative to the previous assemblies for the same species and, in two cases, the same individuals, including a mammal (platypus), two birds (zebra finch, Anna's hummingbird), and a fish (climbing perch). We find that up to 11% of genomic sequence is entirely missing in the previous assemblies. In the Vertebrate Genomes Project zebra finch assembly, we identify eight new GC- and repeat-rich micro-chromosomes with high gene density. The impact of missing sequences is biased towards GC-rich 5'-proximal promoters and 5' exon regions of protein-coding genes and long non-coding RNAs. Between 26 and 60% of genes include structural or sequence errors that could lead to misunderstanding of their function when using the previous genome assemblies. CONCLUSIONS: Our findings reveal novel regulatory landscapes and protein coding sequences that have been greatly underestimated in previous assemblies and are now present in the Vertebrate Genomes Project reference genomes.


Subjects
Genome , Vertebrates , Animals , Base Composition/genetics , Chromosomes , Genome/genetics , DNA Sequence Analysis , Vertebrates/genetics