ABSTRACT
BACKGROUND: PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (>Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes, the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. RESULTS: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence FASTA file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. CONCLUSIONS: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub (https://github.com/marcelauliano/MitoHiFi). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).
Subjects
Mitochondrial Genome , Phylogeny , RNA , Eukaryotes , DNA Sequence Analysis , High-Throughput Nucleotide Sequencing
ABSTRACT
The Puma lineage within the family Felidae consists of 3 species that last shared a common ancestor around 4.9 million years ago. Whole-genome sequences of 2 species from the lineage were previously reported: the cheetah (Acinonyx jubatus) and the mountain lion (Puma concolor). The present report describes a whole-genome assembly of the remaining species, the jaguarundi (Puma yagouaroundi). We sequenced the genome of a male jaguarundi with 10X Genomics linked reads and assembled the whole-genome sequence. The assembled genome contains a series of scaffolds that reach the length of chromosome arms and is similar in scaffold contiguity to the genome assemblies of cheetah and puma, with a contig N50 = 100.2 kbp and a scaffold N50 = 49.27 Mbp. We assessed the assembled sequence of the jaguarundi genome using BUSCO, aligned reads of the sequenced individual and another published female jaguarundi to the assembled genome, annotated protein-coding genes, repeats, genomic variants and their effects with respect to the protein-coding genes, and analyzed differences of the 2 jaguarundis from the reference mitochondrial genome. The jaguarundi genome assembly and its annotation were compared in quality, variants, and features to the previously reported genome assemblies of puma and cheetah. Computational analyses used in the study were implemented in a transparent and reproducible way to allow their further reuse and modification.
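The contig and scaffold N50 values quoted in this abstract summarise assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal sketch of that definition only; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest sequence to the shortest, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which we cover half the assembly is the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kbp), not data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150 covers half of the 300 total
```

In practice the lengths would be read from an assembly FASTA file; the statistic itself does not depend on how they are obtained.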
Subjects
Felidae , Puma , Animals , Female , Genome , Genomics , Male , Molecular Sequence Annotation , Puma/genetics
ABSTRACT
The Russian Federation is the largest and one of the most ethnically diverse countries in the world; however, no centralized reference database of genetic variation exists to date. Such data are crucial for medical genetics and essential for studying population history. The Genome Russia Project aims at filling this gap by performing whole genome sequencing and analysis of peoples of the Russian Federation. Here we report the characterization of genome-wide variation of 264 healthy adults, including 60 newly sequenced samples. People of Russia carry known and novel genetic variants of adaptive, clinical and functional consequence that in many cases show allele frequency divergence from neighboring populations. Population genetics analyses revealed six phylogeographic partitions among indigenous ethnicities corresponding to their geographic locales. This study presents a characterization of population-specific genomic variation in Russia with results important for medical genetics and for understanding the dynamic population history of the world's largest country.
Subjects
Genetic Variation , Adult , Communicable Diseases/genetics , Demography , Haplotypes , Humans , INDEL Mutation , Pharmacogenetics , Phenotype , Phylogeography , Single Nucleotide Polymorphism , Russia/ethnology , Genetic Selection , Whole Genome Sequencing
ABSTRACT
Pangolins, unique mammals with scales over most of their body, no teeth, poor vision, and an acute olfactory system, comprise the only placental order (Pholidota) without a whole-genome map. To investigate pangolin biology and evolution, we developed genome assemblies of the Malayan (Manis javanica) and Chinese (M. pentadactyla) pangolins. Strikingly, we found that interferon epsilon (IFNE), exclusively expressed in epithelial cells and important in skin and mucosal immunity, is pseudogenized in all African and Asian pangolin species that we examined, perhaps impacting resistance to infection. We propose that scale development was an innovation that provided protection against injuries or stress and reduced pangolin vulnerability to infection. Further evidence of specialized adaptations was evident from positively selected genes involving immunity-related pathways, inflammation, energy storage and metabolism, muscular and nervous systems, and scale/hair development. Olfactory receptor gene families are significantly expanded in pangolins, reflecting their well-developed olfaction system. This study provides insights into mammalian adaptation and functional diversification, new research tools and questions, and perhaps a new natural IFNE-deficient animal model for studying mammalian immunity.
Subjects
Animal Scales/anatomy & histology , Molecular Evolution , Genome , Innate Immunity/genetics , Mammals/genetics , Physiological Adaptation , Animals , Endangered Species , Interferons/genetics , Mammals/anatomy & histology , Mammals/classification , Mammals/immunology , Odorant Receptors/genetics
ABSTRACT
Whole-genome analysis of Mycobacterium tuberculosis isolates collected in Russia (N = 71) from patients with tuberculous spondylitis supports a detailed characterization of pathogen strain distributions and drug resistance phenotype, plus distinguished occurrence and association of known resistance mutations. We identify known and novel genome determinants related to bacterial virulence, pathogenicity, and drug resistance.
Subjects
Bacterial Genome , Mycobacterium tuberculosis/genetics , Spondylitis/epidemiology , Spondylitis/microbiology , Tuberculosis/epidemiology , Tuberculosis/microbiology , Whole Genome Sequencing , Antitubercular Agents/pharmacology , Bacterial Drug Resistance , Geography , Humans , Microbial Sensitivity Tests , Mutation , Mycobacterium tuberculosis/drug effects , Phylogeny , Russia/epidemiology , Virulence
ABSTRACT
We present a genome assembly from an individual female Anopheles coustani (African malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae) from Lopé, Gabon. The genome sequence is 270 megabases in span. Most of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled for both species. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
ABSTRACT
Pusa sibirica, the Baikal seal, is the only extant, exclusively freshwater pinniped species. An open question is how and when the species reached its current habitat, the rift lake Baikal, more than three thousand kilometers away from the Arctic Ocean. To explore the demographic history and genetic diversity of this species, we generated a de novo chromosome-length assembly, and compared it with three closely related marine pinniped species. Multiple whole genome alignment of the four species compared with their karyotypes showed high conservation of chromosomal features, except for three large inversions on chromosome VI. We found the mean heterozygosity of the studied Baikal seal individuals was relatively low (0.61 SNPs/kbp), but comparable to other analyzed pinniped samples. Demographic reconstruction of seals revealed differing trajectories, yet remarkable variations in Ne occurred during approximately the same time periods. The Baikal seal showed a significantly more severe decline relative to other species. This could be due to the difference in environmental conditions encountered by the earlier populations of Baikal seals, as ice sheets changed during glacial-interglacial cycles. We connect this period to the time of migration to Lake Baikal, which occurred ~3-0.3 Mya, after which the population stabilized, indicating balanced habitat conditions.
Subjects
Lakes , Earless Seals , Animals , Earless Seals/genetics , Karyotype
ABSTRACT
We present a genome assembly from an individual female Anopheles gambiae (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), Ifakara strain. The genome sequence is 264 megabases in span. Most of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
ABSTRACT
We present a genome assembly from an individual male Anopheles moucheti (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), from a wild population in Cameroon. The genome sequence is 271 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.5 kilobases in length.
ABSTRACT
We present a genome assembly from an individual female Anopheles funestus (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae). The genome sequence is 251 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
ABSTRACT
BACKGROUND: Large-scale sequencing projects provide high-quality full-genome data that can be used for reconstruction of chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained on the basis of whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequences stored in a binary format demand an accurate and efficient computational approach for synteny block production. FINDINGS: halSynteny performs efficient processing of pairwise alignment blocks for any pair of genomes in the alignment. The tool is part of the HAL comparative genomics suite and is targeted to build synteny blocks for multi-hundred-way, reference-free vertebrate alignments built with the Cactus system. CONCLUSIONS: halSynteny enables accurate and rapid identification of synteny in multiple full-genome alignments. The method is implemented in C++11 as a component of the halTools software and released under the MIT license. The package is available at https://github.com/ComparativeGenomicsToolkit/hal/.
Subjects
Algorithms , Computational Biology/methods , Genomics/methods , Software , Reproducibility of Results , Sequence Alignment/methods , Synteny
ABSTRACT
Mycobacterium tuberculosis is a highly studied pathogen due to its public health importance. Despite this, problems like early drug resistance, diagnostics and treatment success prediction are still not fully resolved. Here, we analyze the incidence of point mutations widely used for drug resistance detection in laboratory practice and conduct comparative analysis of whole-genome sequence (WGS) for clinical M. tuberculosis strains collected from patients with pulmonary tuberculosis (PTB) and extra-pulmonary tuberculosis (XPTB) localization. A total of 72 pulmonary and 73 extrapulmonary microbiologically characterized M. tuberculosis isolates were collected from patients from 2007 to 2014 in Russia. Genomic DNA was used for WGS, and the data obtained allowed identification of major mutations known to be associated with drug resistance to first-line and second-line antituberculous drugs. In some cases previously described mutations were not identified. Using genome-based phylogenetic analysis we identified M. tuberculosis substrains associated with distinctions in the occurrence in PTB vs. XPTB cases. Phylogenetic analyses did reveal M. tuberculosis genetic substrains associated with TB localization. XPTB was associated with Beijing sublineages Central Asia (Beijing CAO), Central Asia Clade A (Beijing A) and 4.8 groups, while PTB localization was associated with group LAM (4.3). Further, the XPTB strain in some cases showed elevated drug resistance patterns relative to PTB isolates. HIV was significantly associated with the development of XPTB in the Beijing B0/W148 group and among unclustered Beijing isolates.
ABSTRACT
Genome-wide assessment of genetic diversity has the potential to increase the ability to understand admixture, inbreeding, kinship and erosion of genetic diversity affecting both captive (ex situ) and wild (in situ) populations of threatened species. The sable antelope (Hippotragus niger), native to the savannah woodlands of sub-Saharan Africa, is a species that is being managed ex situ in both public (zoo) and private (ranch) collections in the United States. Our objective was to develop whole genome sequence resources that will serve as a foundation for characterizing the genetic status of ex situ populations of sable antelope relative to populations in the wild. Here we report the draft genome assembly of a male sable antelope, a member of the subfamily Hippotraginae (Bovidae, Cetartiodactyla, Mammalia). The 2.596 Gb draft genome consists of 136,528 contigs with an N50 of 45.5 Kbp and 16,927 scaffolds with an N50 of 4.59 Mbp. De novo annotation identified 18,828 protein-coding genes and repetitive sequences encompassing 46.97% of the genome. The discovery of single nucleotide variants (SNVs) was assisted by the re-sequencing of seven additional captive and wild individuals, representing two different subspecies, leading to the identification of 1,987,710 bi-allelic SNVs. Assembly of the mitochondrial genomes revealed that each individual was defined by a unique haplotype and these data were used to infer the mitochondrial gene tree relative to other hippotragine species. The sable antelope genome constitutes a valuable resource for assessing genome-wide diversity and evolutionary potential, thereby facilitating long-term conservation of this charismatic species.
Subjects
Antelopes/genetics , Genome , Genomics , Whole Genome Sequencing , Animals , Antelopes/classification , Biodiversity , Biological Evolution , Computational Biology/methods , Female , Genetic Variation , Population Genetics , Mitochondrial Genome , Genomics/methods , Male , Molecular Sequence Annotation , Phenotype , Phylogeny , United States
ABSTRACT
A comparative analysis of whole genome sequencing (WGS) and genotype calling was initiated for ten human genome samples sequenced by St. Petersburg State University Peterhof Sequencing Center and by three commercial sequencing centers outside of Russia. Sequence quality and the efficiency of DNA variant and genotype calling were compared among the centers and with DNA microarrays for each of the ten study subjects. We assessed calling of SNPs, indels, copy number variation, and the speed of WGS throughput promised. Twenty separate QC analyses showed high similarities among the sequence quality and called genotypes. The ten genomes tested by the centers included eight American patients afflicted with autoimmune hepatitis (AIH), plus one case's unaffected parents, in a prelude to discovering genetic influences in this rare disease of unknown etiology. The detailed internal replication and parallel analyses allowed the observation of two of eight AIH cases carrying a rare allele genotype for a previously described AIH-associated gene (FTCD), plus multiple occurrences of known HLA-DRB1 alleles associated with AIH (HLA-DRB1-03:01:01, 13:01:01 and 7:01:01). We also list putative SNVs in other genes that are suggestive of an influence on AIH.
Subjects
Genotyping Techniques , Autoimmune Hepatitis/genetics , Whole Genome Sequencing , Adolescent , Ammonia-Lyases/genetics , Child , Preschool Child , Cohort Studies , DNA Copy Number Variations , Female , Genetic Predisposition to Disease , Glutamate Formimidoyltransferase/genetics , HLA-DRB1 Chains/genetics , Humans , INDEL Mutation , Male , Multifunctional Enzymes , Oligonucleotide Array Sequence Analysis , Single Nucleotide Polymorphism , Quality Control , Russia , Time Factors
ABSTRACT
Solenodons are insectivores that live in Hispaniola and Cuba. They form an isolated branch in the tree of placental mammals that is highly divergent from other eulipotyphlan insectivores. The history, unique biology, and adaptations of these enigmatic venomous species could be illuminated by the availability of genome data. However, a whole genome assembly for solenodons has not been previously performed, partially due to the difficulty in obtaining samples from the field. Island isolation and reduced numbers have likely resulted in high homozygosity within the Hispaniolan solenodon (Solenodon paradoxus). Thus, we tested the performance of several assembly strategies on the genome of this genetically impoverished species. The string graph-based assembly strategy seemed a better choice compared to the conventional de Bruijn graph approach due to the high levels of homozygosity, which is often a hallmark of endemic or endangered species. A consensus reference genome was assembled from sequences of 5 individuals from the southern subspecies (S. p. woodi). In addition, we obtained a sequence from 1 sample of the northern subspecies (S. p. paradoxus). The resulting genome assemblies were compared to each other and annotated for genes, with an emphasis on venom genes, repeats, variable microsatellite loci, and other genomic variants. Phylogenetic positioning and selection signatures were inferred based on 4,416 single-copy orthologs from 10 other mammals. We estimated that solenodons diverged from other extant mammals 73.6 million years ago. Patterns of single-nucleotide polymorphism variation allowed us to infer population demography, which supported a subspecies split within the Hispaniolan solenodon at least 300 thousand years ago.
Subjects
Biological Evolution , Conserved Sequence/genetics , Endangered Species , Islands , Mammals/genetics , DNA Sequence Analysis/methods , Animals , Cuba , Genome , Heterozygote , Species Specificity
ABSTRACT
BACKGROUND: As the number of sequenced genomes rapidly increases, chromosome assembly is becoming an even more crucial step of any genome study. Since de novo chromosome assemblies are confounded by repeat-mediated artifacts, reference-assisted assemblies that use comparative inference have become widely used, prompting the development of several reference-assisted assembly programs for prokaryotic and eukaryotic genomes. FINDINGS: We developed Chromosomer, a reference-based genome arrangement tool, which rapidly builds chromosomes from genome contigs or scaffolds using their alignments to a reference genome of a closely related species. Chromosomer does not require mate-pair libraries and it offers a number of auxiliary tools that implement common operations accompanying the genome assembly process. CONCLUSIONS: Despite implementing a straightforward alignment-based approach, Chromosomer is a useful tool for genomic analysis of species without chromosome maps. Putative chromosome assemblies by Chromosomer can be used in comparative genomic analysis, genomic variation assessment, potential linkage group inference and other kinds of analysis involving contig or scaffold mapping to a high-quality assembly.
Subjects
Chromosome Mapping/methods , Contig Mapping/methods , Genome , Sequence Alignment , Software
ABSTRACT
Pangolins (order Pholidota) are the only mammals covered by scales. We have recently sequenced and analyzed the genomes of two critically endangered Asian pangolin species, namely the Malayan pangolin (Manis javanica) and the Chinese pangolin (Manis pentadactyla). These complete genome sequences will serve as reference sequences for future research to address issues of species conservation and to advance knowledge in mammalian biology and evolution. To further facilitate the global research effort in pangolin biology, we developed the Pangolin Genome Database (PGD), as a future hub for hosting pangolin genomic and transcriptomic data and annotations, and with useful analysis tools for the research community. Currently, the PGD provides the reference pangolin genome and transcriptome data, gene sequences and functional information, expressed transcripts, pseudogenes, genomic variations, organ-specific expression data and other useful annotations. We anticipate that the PGD will be an invaluable platform for researchers who are interested in pangolin and mammalian research. We will continue updating this hub by including more data, annotation and analysis tools particularly from our research consortium. Database URL: http://pangolin-genome.um.edu.my.
Subjects
Genetic Databases , Genome , Mammals , Animals , Expressed Sequence Tags , Genetic Variation , Mammals/genetics , Mammals/metabolism , Molecular Sequence Annotation , Pseudogenes , Transcriptome/physiology
ABSTRACT
BACKGROUND: Patterns of genetic and genomic variance are informative in inferring population history for human, model species and endangered populations. RESULTS: Here the genome sequence of wild-born African cheetahs reveals extreme genomic depletion in SNV incidence, SNV density, SNVs of coding genes, MHC class I and II genes, and mitochondrial DNA SNVs. Cheetah genomes are on average 95 % homozygous compared to the genomes of the outbred domestic cat (24.08 % homozygous), Virunga Mountain Gorilla (78.12 %), inbred Abyssinian cat (62.63 %), Tasmanian devil, domestic dog and other mammalian species. Demographic estimators impute two ancestral population bottlenecks: one >100,000 years ago coincident with cheetah migrations out of the Americas and into Eurasia and Africa, and a second 11,084-12,589 years ago in Africa coincident with late Pleistocene large mammal extinctions. MHC class I gene loss and dramatic reduction in functional diversity of MHC genes would explain why cheetahs ablate skin graft rejection among unrelated individuals. Significant excess of non-synonymous mutations in AKAP4 (p<0.02), a gene mediating spermatozoon development, indicates cheetah fixation of five function-damaging amino acid variants distinct from AKAP4 homologues of other Felidae or mammals; AKAP4 dysfunction may cause the cheetah's extremely high (>80 %) pleiomorphic sperm. CONCLUSIONS: The study provides an unprecedented genomic perspective for the rare cheetah, with potential relevance to the species' natural history, physiological adaptations and unique reproductive disposition.