Results 1 - 20 of 38
1.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37194566

ABSTRACT

We present genome sequences for the caecilians Geotrypetes seraphini (3.8 Gb) and Microcaecilia unicolor (4.7 Gb), representatives of a limbless, mostly soil-dwelling amphibian clade with reduced eyes, and unique putatively chemosensory tentacles. More than 69% of both genomes are composed of repeats, with retrotransposons being the most abundant. We identify 1,150 orthogroups that are unique to caecilians and enriched for functions in olfaction and detection of chemical signals. There are 379 orthogroups with signatures of positive selection on caecilian lineages with roles in organ development and morphogenesis, sensory perception, and immunity amongst others. We discover that caecilian genomes are missing the zone of polarizing activity regulatory sequence (ZRS) enhancer of Sonic Hedgehog which is also mutated in snakes. In vivo deletions have shown ZRS is required for limb development in mice, thus revealing a shared molecular target implicated in the independent evolution of limblessness in snakes and caecilians.


Subject(s)
Amphibians , Hedgehog Proteins , Animals , Mice , Hedgehog Proteins/genetics , Amphibians/genetics , Genome , Snakes/genetics , Acclimatization , Molecular Evolution
2.
BMC Bioinformatics ; 24(1): 288, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37464285

ABSTRACT

BACKGROUND:  PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly leading to more accurate and contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. RESULTS:  MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence fasta file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. CONCLUSIONS:  MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub ( https://github.com/marcelauliano/MitoHiFi ). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).


Subject(s)
Mitochondrial Genome , Phylogeny , RNA , Eukaryota , DNA Sequence Analysis , High-Throughput Nucleotide Sequencing
3.
Genome Res ; 27(5): 849-864, 2017 05.
Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.


Subject(s)
Contig Mapping/methods , Human Genome , Genomics/methods , DNA Sequence Analysis/methods , Software , Contig Mapping/standards , Genomics/standards , Haploidy , Haplotypes , Humans , Genetic Polymorphism , Reference Standards , DNA Sequence Analysis/standards
4.
Nature ; 496(7446): 498-503, 2013 Apr 25.
Article in English | MEDLINE | ID: mdl-23594743

ABSTRACT

Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.


Subject(s)
Conserved Sequence/genetics , Genome/genetics , Zebrafish/genetics , Animals , Chromosomes/genetics , Molecular Evolution , Female , Genes/genetics , Human Genome/genetics , Genomics , Humans , Male , Meiosis/genetics , Molecular Sequence Annotation , Pseudogenes/genetics , Reference Standards , Sex Determination Processes/genetics , Zebrafish Proteins/genetics
5.
Bioinformatics ; 32(16): 2508-10, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153597

ABSTRACT

MOTIVATION: For most research approaches, genome analyses are dependent on the existence of a high quality genome reference assembly. However, the local accuracy of an assembly remains difficult to assess and improve. The gEVAL browser allows the user to interrogate an assembly in any region of the genome by comparing it to different datasets and evaluating the concordance. These analyses include: a wide variety of sequence alignments, comparative analyses of multiple genome assemblies, and consistency with optical and other physical maps. gEVAL highlights allelic variations, regions of low complexity, abnormal coverage, and potential sequence and assembly errors, and offers strategies for improvement. Although gEVAL focuses primarily on sequence integrity, it can also display arbitrary annotation including from Ensembl or TrackHub sources. We provide gEVAL web sites for many human, mouse, zebrafish and chicken assemblies to support the Genome Reference Consortium, and gEVAL is also downloadable to enable its use for any organism and assembly. AVAILABILITY AND IMPLEMENTATION: Web Browser: http://geval.sanger.ac.uk, Plugin: http://wchow.github.io/wtsi-geval-plugin CONTACT: kj2@sanger.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Web Browser , Animals , Genome , Humans , Internet , Mice , Sequence Alignment
6.
Curr Biol ; 34(19): 4412-4423.e5, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39260362

ABSTRACT

Oxford ragwort (Senecio squalidus) is one of only two homoploid hybrid species known to have originated very recently, so it is a unique model for determining genomic changes and stabilization following homoploid hybrid speciation. Here, we provide a chromosome-level genome assembly of S. squalidus with 95% of the assembly contained in the 10 longest scaffolds, corresponding to its haploid chromosome number. We annotated 30,249 protein-coding genes and estimated that ∼62% of the genome consists of repetitive elements. We then characterized genome-wide patterns of linkage disequilibrium, polymorphism, and divergence in S. squalidus and its two parental species, finding that (1) linkage disequilibrium is highly heterogeneous, with a region on chromosome 4 showing increased values across all three species but especially in S. squalidus; (2) regions harboring genetic incompatibilities between the two parental species tend to be large, show reduced recombination, and have lower polymorphism in S. squalidus; (3) the two parental species have an unequal contribution (70:30) to the genome of S. squalidus, with long blocks of parent-specific ancestry supporting a very rapid stabilization of the hybrid lineage after hybrid formation; and (4) genomic regions with major parent ancestry exhibit an overrepresentation of loci with evidence for divergent selection occurring between the two parental species on Mount Etna. Our results show that both genetic incompatibilities and natural selection play a role in determining genome-wide reorganization following hybrid speciation and that patterns associated with homoploid hybrid speciation, typically seen in much older systems, can evolve very quickly following hybridization.


Subject(s)
Genetic Speciation , Plant Genome , Genetic Hybridization , Senecio , Senecio/genetics , Linkage Disequilibrium
7.
Wellcome Open Res ; 9: 551, 2024.
Article in English | MEDLINE | ID: mdl-39429628

ABSTRACT

We present a genome assembly from an individual female An. coustani (African malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae) from Lopé, Gabon. The genome sequence is 270 megabases in span. Most of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled for both species. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.

8.
Wellcome Open Res ; 8: 74, 2023.
Article in English | MEDLINE | ID: mdl-37424773

ABSTRACT

We present a genome assembly from an individual female Anopheles gambiae (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), Ifakara strain. The genome sequence is 264 megabases in span. Most of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.

9.
Wellcome Open Res ; 8: 507, 2023.
Article in English | MEDLINE | ID: mdl-38046191

ABSTRACT

We present a genome assembly from an individual male Anopheles moucheti (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), from a wild population in Cameroon. The genome sequence is 271 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.5 kilobases in length.

10.
Nat Commun ; 14(1): 3412, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37296119

ABSTRACT

Numerous novel adaptations characterise the radiation of notothenioids, the dominant fish group in the freezing seas of the Southern Ocean. To improve understanding of the evolution of this iconic fish group, here we generate and analyse new genome assemblies for 24 species covering all major subgroups of the radiation, including five long-read assemblies. We present a new estimate for the onset of the radiation at 10.7 million years ago, based on a time-calibrated phylogeny derived from genome-wide sequence data. We identify a two-fold variation in genome size, driven by expansion of multiple transposable element families, and use the long-read data to reconstruct two evolutionarily important, highly repetitive gene family loci. First, we present the most complete reconstruction to date of the antifreeze glycoprotein gene family, whose emergence enabled survival in sub-zero temperatures, showing the expansion of the antifreeze gene locus from the ancestral to the derived state. Second, we trace the loss of haemoglobin genes in icefishes, the only vertebrates lacking functional haemoglobins, through complete reconstruction of the two haemoglobin gene clusters across notothenioid families. Both the haemoglobin and antifreeze genomic loci are characterised by multiple transposon expansions that may have driven the evolutionary history of these genes.


Subject(s)
Fishes , Perciformes , Animals , Fishes/genetics , Genomics , Vertebrates , Phylogeny , Hemoglobins/genetics , Antarctic Regions
11.
Cell Rep ; 42(1): 111992, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36662619

ABSTRACT

Insights into the evolution of non-model organisms are limited by the lack of reference genomes of high accuracy, completeness, and contiguity. Here, we present a chromosome-level, karyotype-validated reference genome and pangenome for the barn swallow (Hirundo rustica). We complement these resources with a reference-free multialignment of the reference genome with other bird genomes and with the most comprehensive catalog of genetic markers for the barn swallow. We identify potentially conserved and accelerated genes using the multialignment and estimate genome-wide linkage disequilibrium using the catalog. We use the pangenome to infer core and accessory genes and to detect variants using it as a reference. Overall, these resources will foster population genomics studies in the barn swallow, enable detection of candidate genes in comparative genomics studies, and help reduce bias toward a single reference genome.


Subject(s)
Swallows , Animals , Swallows/genetics , Metagenomics , Genome/genetics , Genomics , Chromosomes
12.
Mar Biotechnol (NY) ; 24(3): 655-660, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35394576

ABSTRACT

The yellowfin seabream, Acanthopagrus latus, is widely distributed throughout the Indo-West Pacific. As a euryhaline member of the Sparidae, it inhabits coastal environments with large and frequent salinity fluctuations, which makes A. latus an ideal species for elucidating the evolutionary mechanisms of salinity stress adaptation in teleost fishes. Here, a chromosome-scale assembly of A. latus was obtained with a hybrid PacBio and Hi-C sequencing strategy. The final genome assembly of A. latus is 685.14 Mbp, with contig N50 and scaffold N50 values of 14.88 Mbp and 30.72 Mbp, respectively. In total, 29,227 genes were predicted for A. latus. Comparative genomics and phylogenetic analyses were then used to investigate the different osmoregulation strategies of salinity stress adaptation across Sparidae species at the whole-genome scale. The highly accurate chromosomal information provides an important genomic resource for understanding the evolution of osmoregulation in euryhaline Sparidae species.


Subject(s)
Perciformes , Sea Bream , Animals , Chromosomes/genetics , Perciformes/genetics , Phylogeny , Salt Stress , Sea Bream/genetics
13.
Wellcome Open Res ; 7: 287, 2022.
Article in English | MEDLINE | ID: mdl-36874567

ABSTRACT

We present a genome assembly from an individual female Anopheles funestus (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae). The genome sequence is 251 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.

14.
BMC Bioinformatics ; 12: 383, 2011 Sep 29.
Article in English | MEDLINE | ID: mdl-21957981

ABSTRACT

BACKGROUND: In previous work, we reported the development of caCORRECT, a novel microarray quality control system built to identify and correct spatial artifacts commonly found on Affymetrix arrays. We have made recent improvements to caCORRECT, including the development of a model-based data-replacement strategy and integration with typical microarray workflows via caCORRECT's web portal and caBIG grid services. In this report, we demonstrate that caCORRECT improves the reproducibility and reliability of experimental results across several common Affymetrix microarray platforms. caCORRECT represents an advance over state-of-the-art quality control methods such as Harshlighting, and acts to improve gene expression calculation techniques such as PLIER, RMA and MAS5.0, because it incorporates spatial information into outlier detection as well as outlier information into probe normalization. The ability of caCORRECT to recover accurate gene expressions from low quality probe intensity data is assessed using a combination of real and synthetic artifacts with PCR follow-up confirmation and the affycomp spike in data. The caCORRECT tool can be accessed at the website: http://cacorrect.bme.gatech.edu. RESULTS: We demonstrate that (1) caCORRECT's artifact-aware normalization avoids the undesirable global data warping that happens when any damaged chips are processed without caCORRECT; (2) when used upstream of RMA, PLIER, or MAS5.0, the data imputation of caCORRECT generally improves the accuracy of microarray gene expression in the presence of artifacts more than using Harshlighting or not using any quality control; (3) biomarkers selected from artifactual microarray data which have undergone the quality control procedures of caCORRECT are more likely to be reliable, as shown by both spike in and PCR validation experiments. Finally, we present a case study of the use of caCORRECT to reliably identify biomarkers for renal cell carcinoma, yielding two diagnostic biomarkers with potential clinical utility, PRKAB1 and NNMT. CONCLUSIONS: caCORRECT is shown to improve the accuracy of gene expression, and the reproducibility of experimental results in clinical application. This study suggests that caCORRECT will be useful to clean up possible artifacts in new as well as archived microarray data.


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Artifacts , Renal Cell Carcinoma/genetics , Follow-Up Studies , Humans , Oligonucleotide Array Sequence Analysis/standards , Quality Control , Reproducibility of Results
15.
Gigascience ; 10(1)2021 01 09.
Article in English | MEDLINE | ID: mdl-33420778

ABSTRACT

Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.


Subject(s)
Genome , Genomics , Algorithms , Eukaryota , Software
16.
G3 (Bethesda) ; 11(5)2021 05 07.
Article in English | MEDLINE | ID: mdl-33734373

ABSTRACT

Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read, and high-throughput chromosome conformation capture sequencing technologies. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analyzed an established lab population to investigate the genomic variation and architecture of the BSF revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of the lab population by assessing runs of homozygosity. This provided evidence for inbreeding events including long runs of homozygosity on chromosome 5. The release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterization of genes of interest and genetic modification of this economically important species.


Subject(s)
Chromosomes , Diptera , Animals , Chromosomes/genetics , Diptera/genetics , Genome , Genomics , Repetitive Nucleic Acid Sequences
17.
Genome Biol Evol ; 13(9)2021 09 01.
Article in English | MEDLINE | ID: mdl-34499122

ABSTRACT

The reed warbler (Acrocephalus scirpaceus) is a long-distance migrant passerine with a wide distribution across Eurasia. This species has fascinated researchers for decades, especially its role as host of a brood parasite, and its capacity for rapid phenotypic change in the face of climate change. Currently, it is expanding its range northwards in Europe, and is altering its migratory behavior in certain areas. Thus, there is great potential to discover signs of recent evolution and its impact on the genomic composition of the reed warbler. Here, we present a high-quality reference genome for the reed warbler, based on PacBio, 10×, and Hi-C sequencing. The genome has an assembly size of 1,075,083,815 bp with a scaffold N50 of 74,438,198 bp and a contig N50 of 12,742,779 bp. BUSCO analysis using aves_odb10 as a model showed that 95.7% of BUSCO genes were complete. We found unequivocal evidence of two separate macrochromosomal fusions in the reed warbler genome, in addition to the previously identified fusion between chromosome Z and a part of chromosome 4A in the Sylvioidea superfamily. We annotated 14,645 protein-coding genes, and a BUSCO analysis of the protein sequences indicated 97.5% completeness. This reference genome will serve as an important resource, and will provide new insights into the genomic effects of evolutionary drivers such as coevolution, range expansion, and adaptations to climate change, as well as chromosomal rearrangements in birds.


Subject(s)
Passeriformes , Songbirds , Animals , Chromosomes/genetics , Genome , Genomics , Passeriformes/genetics , Songbirds/genetics
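
Several entries in this list (12, 16 and 17) report contig and scaffold N50 values. As a minimal sketch of how that statistic is conventionally computed, and not the method used by any of the cited studies, the Python function below sorts sequence lengths from longest to shortest and returns the length at which the running total first reaches half of the assembly span. The function name `n50` and the toy lengths are illustrative assumptions, not data from any of the abstracts.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly span.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_span = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_span:
            return length
    raise ValueError("lengths must not be empty")


# Toy example with made-up lengths (total span 54, half span 27):
# 10 + 9 + 8 = 27 reaches the half span, so the N50 is 8.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing on its own about correctness or completeness, which is why the abstracts pair it with measures such as BUSCO scores.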
18.
Wellcome Open Res ; 6: 225, 2021.
Article in English | MEDLINE | ID: mdl-34703904

ABSTRACT

We present a genome assembly from a clonal population of Eimeria tenella Houghton parasites (Apicomplexa; Conoidasida; Eucoccidiorida; Eimeriidae). The genome sequence is 53.25 megabases in span. The entire assembly is scaffolded into 15 chromosomal pseudomolecules, with complete mitochondrion and apicoplast organellar genomes also present.

19.
Wellcome Open Res ; 6: 162, 2021.
Article in English | MEDLINE | ID: mdl-35600244

ABSTRACT

We present a genome assembly from an individual male Arvicola amphibius (the European water vole; Chordata; Mammalia; Rodentia; Cricetidae). The genome sequence is 2.30 gigabases in span. The majority of the assembly is scaffolded into 18 chromosomal pseudomolecules, including the X sex chromosome. Gene annotation of this assembly on Ensembl has identified 21,394 protein coding genes.

20.
Gigascience ; 10(12)2021 12 20.
Article in English | MEDLINE | ID: mdl-34927191

ABSTRACT

BACKGROUND: The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such an endeavour requires a high-quality genome assembly and transcriptome. FINDINGS: This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck using the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, which were used for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. CONCLUSIONS: This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our findings from a comparison between short-read and long-read reference transcriptomics contribute to a deeper understanding of these competing options. In this study, both technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including many waterfowl relatives with differing susceptibilities to avian influenza viruses.


Subject(s)
Ducks , Avian Influenza , Animals , Ducks/genetics , Female , Genome , Genomics , Humans , Avian Influenza/epidemiology , Avian Influenza/genetics , Male , Transcriptome