Results 1 - 4 of 4
1.
bioRxiv; 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38853857

ABSTRACT

Despite the widespread adoption of k-mer-based methods in bioinformatics, a fundamental question persists: how can we quantify the influence of k sizes in applications? With no universal answer available, choosing an optimal k size or employing multiple k sizes remains application-specific, arbitrary, and computationally expensive. The primary parameter k is typically assessed empirically, based on the end products of applications that pass through complex processes of genome analysis, comparison, assembly, alignment, and error correction. The elusiveness of the problem stems from a limited understanding of how k-mers transition as k changes. Indeed, there is considerable room to improve both practice and theory by exploring k-mer-specific quantities across multiple k sizes. This paper introduces an algorithmic framework built upon a novel substring representation: the Prokrustean graph. The primary functionality of this framework is to extract various k-mer-based quantities across a range of k sizes, but its computational complexity depends only on maximal repeats, not on the k range. For example, counting maximal unitigs of de Bruijn graphs for k = 10, …, 100 takes just a few seconds with a Prokrustean graph built on a gigabase-scale read set. This efficiency sets the graph apart from other substring indices, such as the FM-index, which are normally optimized for string pattern searching rather than for depicting the substring structure across varying lengths. However, the Prokrustean graph is expected to close this gap, as it can be built using the extended Burrows-Wheeler Transform (eBWT) in a space-efficient manner. The framework is particularly useful in pangenome and metagenome analyses, where the demand for precise multi-k approaches is increasing due to the complex and diverse nature of the information being managed.
We introduce four applications implemented with the framework that extract key quantities actively utilized in modern pangenomics and metagenomics.
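To make the per-k cost concrete, the sketch below counts maximal unitigs naively, rebuilding the de Bruijn graph from scratch for every k. It is not the Prokrustean graph method. It assumes error-free reads over the alphabet ACGT, ignores reverse complements, and uses a node-centric graph in which any two distinct k-mers overlapping by k-1 characters are joined by an edge; the function names are illustrative only.

```python
from typing import Dict, Iterable, List, Set

ALPHABET = "ACGT"


def kmer_set(reads: Iterable[str], k: int) -> Set[str]:
    """Distinct k-mers occurring in the reads (forward strand only)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def count_maximal_unitigs(reads: Iterable[str], k: int) -> int:
    """Count maximal unitigs of the node-centric de Bruijn graph of order k.

    Nodes are the distinct k-mers; an edge u -> v exists whenever
    u[1:] == v[:-1] (a (k-1)-overlap), regardless of read adjacency.
    """
    nodes = kmer_set(reads, k)
    succ: Dict[str, List[str]] = {
        v: [v[1:] + c for c in ALPHABET if v[1:] + c in nodes] for v in nodes
    }
    pred: Dict[str, List[str]] = {
        v: [c + v[:-1] for c in ALPHABET if c + v[:-1] in nodes] for v in nodes
    }

    def extends_predecessor(v: str) -> bool:
        # v continues its predecessor's unitig only if the edge u -> v is
        # both the sole edge into v and the sole edge out of u.
        return len(pred[v]) == 1 and len(succ[pred[v][0]]) == 1

    # Every node that does not extend a predecessor begins a new unitig.
    starts = [v for v in nodes if not extends_predecessor(v)]
    covered: Set[str] = set()
    for v in starts:
        covered.add(v)
        # Follow the non-branching path until the next node starts a new unitig.
        while len(succ[v]) == 1 and extends_predecessor(succ[v][0]):
            v = succ[v][0]
            covered.add(v)

    # Nodes never reached lie on isolated non-branching cycles;
    # each such cycle is one additional maximal unitig.
    cycles = 0
    for v in nodes - covered:
        if v in covered:
            continue
        cycles += 1
        while v not in covered:
            covered.add(v)
            v = succ[v][0]
    return len(starts) + cycles


if __name__ == "__main__":
    reads = ["ACGTTGCA", "CGTTGCAT"]  # toy reads, purely illustrative
    for k in range(3, 6):
        print(k, count_maximal_unitigs(reads, k))
```

Because each call rebuilds the graph, sweeping k = 10, …, 100 this way costs one full construction per k value, which is the dependence on the k range that the framework described above is designed to avoid.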

2.
Genome Biol Evol; 16(3), 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38502059

ABSTRACT

Siphonophores (Cnidaria: Hydrozoa) are abundant predators found throughout the ocean and are important constituents of the global zooplankton community. They range in length from a few centimeters to tens of meters. They are gelatinous, fragile, and difficult to collect, so many aspects of the biology of these roughly 200 species remain poorly understood. To survey siphonophore genome diversity, we performed Illumina sequencing of 32 species sampled broadly across the phylogeny. Sequencing depth was sufficient to estimate nuclear genome size from k-mer spectra in six specimens, ranging from 0.7 to 2.3 Gb, with heterozygosity estimates between 0.69% and 2.32%. Incremental k-mer counting indicates k-mer peaks can be absent with nearly 20× read coverage, suggesting minimum genome sizes range from 1.4 to 5.6 Gb in the 25 samples without peaks in the k-mer spectra. This work confirms most siphonophore nuclear genomes are large relative to the genomes of other cnidarians, but also identifies several with reduced size that are tractable targets for future siphonophore nuclear genome assembly projects. We also assembled complete mitochondrial genomes for 33 specimens from these new data, indicating a conserved gene order shared among nonsiphonophore hydrozoans, Cystonectae, and some Physonectae, revealing the ancestral mitochondrial gene order of siphonophores. Our results also suggest extensive rearrangement of mitochondrial genomes within other Physonectae and in Calycophorae. Though siphonophores comprise a small fraction of cnidarian species, this survey greatly expands our understanding of cnidarian genome diversity. This study further illustrates both the importance of deep phylogenetic sampling and the utility of k-mer-based genome skimming in understanding the genomic diversity of a clade.
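Genome size is commonly estimated from a k-mer spectrum by dividing the number of non-error k-mers by the depth of the main coverage peak. The sketch below implements that textbook calculation under simplifying assumptions; it is not necessarily the procedure used in this study, and it ignores heterozygosity (a heterozygous genome adds a second peak at roughly half depth, which dedicated tools such as GenomeScope model explicitly). The function names and the example spectrum in the test are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def kmer_spectrum(reads: Iterable[str], k: int) -> Dict[int, int]:
    """Histogram: multiplicity -> number of distinct k-mers seen that many times."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return dict(Counter(counts.values()))


def estimate_genome_size(spectrum: Dict[int, int]) -> Optional[float]:
    """Estimate genome size as (k-mers beyond the error valley) / peak depth.

    Returns None when no peak rises above the low-coverage error tail,
    in which case the spectrum alone cannot give a size estimate.
    """
    max_m = max(spectrum)
    # h[m] = number of distinct k-mers with multiplicity m (0 if absent).
    h = [spectrum.get(m, 0) for m in range(max_m + 2)]
    # The error tail decreases from multiplicity 1; the valley is the first rise.
    valley = next((m for m in range(1, max_m + 1) if h[m] < h[m + 1]), None)
    if valley is None:
        return None
    peak = max(range(valley, max_m + 1), key=lambda m: h[m])
    total_kmers = sum(m * h[m] for m in range(valley, max_m + 1))
    return total_kmers / peak
```

Returning None for a monotonically decreasing spectrum corresponds to the samples described above whose k-mer spectra showed no peak.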


Subjects
Cnidaria; Genome, Mitochondrial; Hydrozoa; Animals; Cnidaria/genetics; Phylogeny; Hydrozoa/genetics; Genomics; Genome Size
3.
Microbiol Spectr; 12(2): e0366923, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38214524

ABSTRACT

Microsporidia are obligate intracellular eukaryotic parasites with an extremely broad host range. They have both economic and public health importance. Ploidy in microsporidia is variable, with a few species formally identified as diploid and one as polyploid. Given the increase in the number of studies sequencing microsporidian genomes, it is now possible to assess ploidy levels across all currently explored microsporidian diversity. We estimate ploidy for all microsporidian data sets available on the Sequence Read Archive using k-mer-based analyses, indicating that polyploidy is widespread in Microsporidia and that ploidy change is dynamic in the group. Using genome-wide heterozygosity estimates, we also show that polyploid microsporidian genomes are relatively homozygous, and we discuss the implications of these findings on the timing of polyploidization events and their origin.

IMPORTANCE: Microsporidia are single-celled intracellular parasites, distantly related to fungi, that can infect a broad range of hosts, from humans all the way to protozoans. Exploiting the wealth of microsporidian genomic data available, we use k-mer-based analyses to assess ploidy status across the group. Understanding a genome's ploidy is crucial in order to assemble it effectively and may also be relevant for better understanding a parasite's behavior and life cycle. We show that tetraploidy is present in at least six species in Microsporidia and that these polyploidization events are likely to have occurred independently. We discuss why these findings may be paradoxical, given that Microsporidia, like other intracellular parasites, have extremely small, reduced genomes.
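The abstract does not say which k-mer method was used. One common approach, popularized by Smudgeplot, compares the coverages of pairs of k-mers that differ at a single position: the minor k-mer's share of the pair's total coverage clusters near 1/2 for AB (diploid) or AABB sites, near 1/3 for AAB, and near 1/4 for AAAB. The sketch below assumes such pairs have already been found and simply assigns each pair to the nearest expected fraction; the function names and coverage values are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Iterable, Tuple

# Expected minor-allele coverage fractions at biallelic sites:
# AB or AABB -> 1/2, AAB -> 1/3, AAAB -> 1/4.
EXPECTED = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))


def minor_fraction(cov_a: int, cov_b: int) -> float:
    """Share of the pair's total coverage carried by the less-covered k-mer."""
    return min(cov_a, cov_b) / (cov_a + cov_b)


def nearest_expected(cov_a: int, cov_b: int) -> Fraction:
    """Expected fraction closest to the observed minor fraction."""
    f = minor_fraction(cov_a, cov_b)
    return min(EXPECTED, key=lambda e: abs(f - e))


def fraction_profile(pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Tally k-mer pairs by their nearest expected minor fraction."""
    return Counter(nearest_expected(a, b) for a, b in pairs)
```

A prominent 1/4 class is the kind of signal that points to tetraploidy, whereas a profile dominated by 1/2 alone cannot distinguish AB from AABB and so needs additional evidence, such as total pair coverage.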


Subjects
Microsporidia; Humans; Phylogeny; Evolution, Molecular; Genome, Fungal; Polyploidy
4.
BMC Evol Biol; 20(1): 157, 2020 Nov 23.
Article in English | MEDLINE | ID: mdl-33228538

ABSTRACT

BACKGROUND: The k-mer spectra of DNA sequences contain important information about sequence composition and sequence evolution. We aim to reveal the evolutionary rules of genome sequences by studying their k-mer spectra.

RESULTS: We analyzed the intrinsic laws of the k-mer spectra of 920 genome sequences, from primates to prokaryotes. We found two types of evolutionary selection modes in genome sequences, named CG Independent Selection and TA Independent Selection, which mutually inhibit each other. The intensity of CG and TA independent selection correlates closely with genome evolution and with the G + C content of genome sequences. The living habits of species are closely related to the independent selection modes adopted by their genomes. Consequently, we proposed an evolutionary mechanism in which genome evolution is determined by the intensities of CG and TA independent selection and their mutual inhibition. Based on this mechanism, we also speculated on the evolutionary modes of prokaryotes in mild and extreme environments during the anaerobic age, on the transition of prokaryotes from anaerobic to aerobic environments on Earth, and on the origins of different eukaryotes.

CONCLUSION: Genome sequences exhibit two independent selection modes. The evolution of genome sequences is determined by these two modes and the mutual inhibition between them.
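The abstract does not define how CG and TA independent selection are measured, so that part is not reproduced here. As a starting point, the sketch below computes the two basic inputs the abstract does name: the k-mer spectrum of a sequence, taken here (as an assumption) to mean the occurrence count of every possible DNA k-mer, and its G + C content. Function names are illustrative.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int) -> Dict[str, int]:
    """Occurrence count of every possible DNA k-mer (all 4**k, zeros included)."""
    seq = seq.upper()
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {
        "".join(p): observed.get("".join(p), 0)
        for p in product("ACGT", repeat=k)
    }


def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (non-ACGT symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt
```

Listing all 4**k k-mers, including those that never occur, keeps spectra from different genomes directly comparable position by position, which matters when hundreds of genomes are analyzed together.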


Subjects
Eukaryota; Evolution, Molecular; Genome; Animals; Base Composition; Genome/genetics; Prokaryotic Cells; Selection, Genetic