Results 1 - 20 of 1,098
1.
Microbiol Resour Announc ; : e0088223, 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39230279

ABSTRACT

The complete genome assembly of Candida auris strains B11103, B11221, and B11244 is reported in this manuscript. These strains represent the three geographical clades, namely, South Asian (Clade I), South African (Clade III), and South American (Clade IV).

2.
J Thromb Haemost ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39260745

ABSTRACT

BACKGROUND: Targeted long-read sequencing (LRS) is expected to comprehensively analyse diverse complex variants in haemophilia A (HA) and B (HB), caused by variants in the F8 and F9 genes, respectively. However, its clinical applicability still requires extensive validation. OBJECTIVES: To evaluate the clinical applicability of targeted LRS-based analysis, compared with routine PCR-based methods. METHODS: Gene variants of retrieved subjects were retrospectively and prospectively analysed. Whole-genome sequencing (WGS) was performed to further analyse undiagnosed cases. Breakpoints of novel genomic rearrangements were mapped and validated using long-distance-PCR and long-range-PCR combined with sequencing. RESULTS: In total, 122 subjects were retrieved. In the retrospective analysis of the 90 HA cases, the HA-LRS assay showed results consistent with routine methods in 84 cases and, in the remaining six, characterized large deletions whose exact breakpoints were confirmed by further validation (routine methods had only shown failure to amplify the involved exons). In the prospective analysis of the 21 HA subjects, 20 F8 variants were identified in 20 cases. For the remaining HA patient, no duplication/deletion or SNV/InDel was found, but a potential recombination involving exons 14 and 21 of F8 was observed by LRS. WGS analysis and further verification defined a 30,478 bp tandem repeat involving exons 14-21 of F8. Among the 11 HB patients, HB-LRS analysis detected 11 SNVs/InDels in F9, consistent with routine methods. CONCLUSIONS: Targeted LRS-based analysis is efficient and comprehensive for identifying SNVs/InDels and genomic rearrangements of haemophilia genes, and this study is the first to expand the panel to include F9. However, further investigation of complex gross rearrangements is still essential.

3.
J Psychopharmacol ; : 2698811241268899, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39262167

ABSTRACT

BACKGROUND: The enzyme expression (i.e. phenotype) of the Cytochrome P450 2D6 (CYP2D6) gene is highly relevant to the metabolism of psychotropic medications, and therefore to precision medicine (i.e. personalised prescribing). AIMS: This review aims to assess the improvement in CYP2D6 phenotyping sensitivity (IPS) and accuracy (IPA) offered by long-read sequencing (LRS), a new genetic testing technology. METHODS: Human DNA samples that underwent LRS genotyping of CYP2D6 in published, peer-reviewed clinical research were eligible for inclusion. A systematic literature search was conducted until 30 September 2023. CYP2D6 genotypes were translated into phenotypes using the international consensus method. IPS was the percentage of non-normal LRS CYP2D6 phenotypes undetectable with FDA-approved testing (AmpliChip). IPA was the percentage of LRS CYP2D6 phenotypes mischaracterised by non-LRS genetic tests (for samples with LRS and non-LRS data). RESULTS: Six studies and 1411 samples were included. In a meta-analysis of four studies, IPS was 10% overall (95% CI = (2, 18); n = 1385), 20% amongst Oceanians (95% CI = (17, 23); n = 582) and 2% amongst Europeans (95% CI = (1, 4); n = 803). IPA was 4% in a large European cohort (95% CI = (2, 7); n = 567). When LRS was used selectively (e.g. for novel or complex CYP2D6 genotypes), very high figures were observed for IPS (e.g. 88%; 95% CI = (72, 100); n = 17; country = Japan) and IPA (e.g. 76%; 95% CI = (55, 98); n = 17; country = Japan). CONCLUSIONS: LRS improves CYP2D6 phenotyping compared to established genetic tests, particularly amongst Oceanian and Japanese individuals, and those with novel or complex genotypes. LRS may therefore assist in optimising personalised prescribing of psychotropic medications. Further research is needed to determine associated clinical benefits, such as increased medication safety and efficacy.

4.
BMC Med Genomics ; 17(1): 227, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39251998

ABSTRACT

BACKGROUND: Duchenne Muscular Dystrophy (DMD) is an X-linked disorder caused by mutations in the DMD gene, with large deletions being the most common type of mutation. Inversions involving the DMD gene are a less frequent cause of the disorder, largely because they often evade detection by standard diagnostic methods such as multiplex ligation-dependent probe amplification (MLPA) and whole exome sequencing (WES). CASE PRESENTATION: Our research identified two intrachromosomal inversions involving the dystrophin gene in two unrelated families through long-read sequencing (LRS). These variants were subsequently confirmed via Sanger sequencing. The first case involved a pericentric inversion extending from DMD intron 47 to Xq27.3. The second case featured a paracentric inversion between DMD intron 42 and Xp21.1, inherited from the mother. In both cases, simple repeat sequences (SRS) were present at the breakpoints of these inversions. CONCLUSIONS: Our findings demonstrate that LRS is an effective tool for detecting atypical mutations. The identification of SRS at the breakpoints in DMD patients enhances our understanding of the mechanisms underlying structural variations, thereby facilitating the exploration of potential treatments.


Subject(s)
Chromosome Inversion; Dystrophin; Muscular Dystrophy, Duchenne; Humans; Dystrophin/genetics; Muscular Dystrophy, Duchenne/genetics; Male; Chromosome Inversion/genetics; Chromosome Breakpoints; Female; Pedigree; Child; Sequence Analysis, DNA
5.
Comput Struct Biotechnol J ; 23: 3186-3198, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39263210

ABSTRACT

Axolotls are known for their remarkable regeneration ability. Exploring their transcriptome provides insight into regenerative mechanisms. However, the current annotation of the axolotl transcriptome is limited, leaving the role of unannotated transcripts in regeneration unknown. To address this challenge, we exploited long-read sequencing technology, which enables direct observation of full-length RNA transcripts, greatly enhancing the coverage and accuracy of axolotl transcriptome annotation. Using this method, we identified 222 novel gene loci and 4775 novel transcripts, which were quantified using short-read sequencing data. Through comprehensive analysis, we discovered novel homologs, potential functional proteins, noncoding RNAs, and alternative splicing events in key regeneration pathways. In particular, we identified novel transcripts with high protein-coding potential implicated in cell cycle regulation, musculoskeletal development, and regeneration. Interestingly, alternative splice variants were also detected across diverse pathways critical to regeneration. This suggests that these novel transcripts potentially play vital roles underpinning the robust regenerative capacities of axolotls. Single-cell transcriptomic analysis further revealed that these isoforms exist predominantly in axolotl limb chondrocytes and mature tissue cell populations. Overall, the findings significantly advance understanding of the axolotl transcriptome and provide a new perspective on the mechanisms behind the regenerative abilities of axolotls.

6.
Pract Lab Med ; 41: e00423, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39228674

ABSTRACT

Background: Long-read sequencing technology, widely used in research, is proving useful in clinical diagnosis, especially for infectious diseases. Despite recent advances, it hasn't been routinely applied to constitutional human diseases. Long-read sequencing detects intronic variants and phases variants, crucial for identifying recessive diseases. Methods: We integrated long-read sequencing into the clinical diagnostic workflow for the MEFV gene, responsible for familial Mediterranean fever (FMF), using a Nanopore-based workflow. This involved long-range PCR amplification, native barcoding kit library preparation, GridION sequencing, and in-house bioinformatics. We compared this new workflow against our validated method using 39 patient samples and 3 samples from an external quality assessment scheme to ensure compliance with ISO15189 standards. Results: Our evaluation demonstrated excellent performance, meeting ISO15189 requirements for reproducibility, repeatability, sensitivity, and specificity. Since October 2022, 150 patient samples were successfully analyzed with no failures. Among these samples, we identified 13 heterozygous carriers of likely pathogenic (LP) or pathogenic (P) variants, 1 patient with a homozygous LP/P variant in MEFV, and 4 patients with compound heterozygous variants. Conclusion: This study represents the first integration of long-read sequencing for FMF clinical diagnosis, achieving 100 % sensitivity and specificity. Our findings highlight its potential to identify pathogenic variants without parental segregation analysis, offering faster, cost-effective, and accurate clinical diagnosis. This successful implementation lays the groundwork for future applications in other constitutional human diseases, advancing precision medicine.

7.
Biol Methods Protoc ; 9(1): bpae057, 2024.
Article in English | MEDLINE | ID: mdl-39262440

ABSTRACT

Rapid advancements in sequencing technologies have led to significant progress in microbial genomics, yet challenges persist in accurately identifying microbial strain diversity in metagenomic samples, especially when working with noisy long-read data from platforms like Oxford Nanopore Technologies (ONT). In this article, we introduce NanoMGT, a tool designed to enhance marker gene typing in low-complexity mono-species samples, leveraging the unique properties of long reads. NanoMGT excels in its ability to accurately identify mutations amidst high error rates, ensuring the reliable detection of multiple strain-specific marker genes. Our tool implements a novel scoring system that rewards mutations co-occurring across different reads and penalizes densely grouped, likely erroneous variants, thereby achieving a good balance between sensitivity and precision. A comparative evaluation of NanoMGT, using a simulated multi-strain sample of seven bacterial species, demonstrated superior performance relative to existing tools and the advantages of using a threshold-based filtering approach to calling minority variants in ONT's sequencing data. NanoMGT's potential as a post-binning tool in metagenomic pipelines is particularly notable, enabling researchers to more accurately determine specific alleles and understand strain diversity in microbial communities. Our findings have significant implications for clinical diagnostics, environmental microbiology, and the broader field of genomics. The findings offer a reliable and efficient approach to marker gene typing in complex metagenomic samples.

8.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39256200

ABSTRACT

Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous methodologies have been developed recently. Consequently, there is a pressing need to assess these methods and aid researchers in selecting appropriate techniques for CNV detection using long-read sequencing. Hence, we conducted an evaluation of eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of CNV callers varied substantially and was influenced by the input dataset type, sequencing depth, and CNV type, among others. Specifically, the PacBio CCS sequencing platform outperformed PacBio CLR and Nanopore platforms regarding CNV detection recall rates. A sequencing depth of 10x demonstrated the capability to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were more generally detectable than duplications. Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall rates.


Subject(s)
Algorithms; DNA Copy Number Variations; High-Throughput Nucleotide Sequencing; Humans; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Computational Biology/methods; Genome, Human
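
The depth comparison in the abstract above (a 10x dataset recovering 85% of the CNVs detected at 50x) amounts to matching one call set against another. A minimal Python sketch of that calculation follows. It assumes calls are stored as (chromosome, start, end, type) records and that two calls match when they share chromosome and type and overlap reciprocally by at least 50%. The `CNV` type, the threshold, and the toy coordinates are illustrative assumptions, not the matching criteria used in the benchmark.

```python
from typing import NamedTuple

class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"

def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls cannot match."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))

def fraction_recovered(reference: list[CNV], query: list[CNV],
                       min_ro: float = 0.5) -> float:
    """Share of reference calls matched by at least one query call."""
    if not reference:
        return 0.0
    hits = sum(
        any(reciprocal_overlap(r, q) >= min_ro for q in query)
        for r in reference
    )
    return hits / len(reference)

# Toy call sets with made-up coordinates (hypothetical data).
calls_50x = [
    CNV("chr1", 1000, 5000, "DEL"),
    CNV("chr1", 20000, 30000, "DUP"),
    CNV("chr2", 500, 2500, "DEL"),
    CNV("chr3", 100, 1100, "DUP"),
]
calls_10x = [
    CNV("chr1", 1200, 5100, "DEL"),
    CNV("chr1", 20000, 24000, "DUP"),  # covers only 40% of the 50x call
    CNV("chr2", 400, 2600, "DEL"),
]
recovered = fraction_recovered(calls_50x, calls_10x)
```

In this toy example two of the four 50x calls are matched, so `recovered` is 0.5. A stricter or looser overlap threshold would change which partial calls count as recovered.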
9.
Front Vet Sci ; 11: 1443855, 2024.
Article in English | MEDLINE | ID: mdl-39144078

ABSTRACT

Introduction: Spillover events of Mycoplasma ovipneumoniae have devastating effects on the wild sheep populations. Multilocus sequence typing (MLST) is used to monitor spillover events and the spread of M. ovipneumoniae between the sheep populations. Most studies involving the typing of M. ovipneumoniae have used Sanger sequencing. However, this technology is time-consuming, expensive, and is not well suited to efficient batch sample processing. Methods: Our study aimed to develop and validate an MLST workflow for typing of M. ovipneumoniae using Nanopore Rapid Barcoding sequencing and multiplex polymerase chain reaction (PCR). We compare the workflow with Nanopore Native Barcoding library preparation and Illumina MiSeq amplicon protocols to determine the most accurate and cost-effective method for sequencing multiplex amplicons. A multiplex PCR was optimized for four housekeeping genes of M. ovipneumoniae using archived DNA samples (N = 68) from nasal swabs. Results: Sequences recovered from Nanopore Rapid Barcoding correctly identified all MLST types with the shortest total workflow time and lowest cost per sample when compared with Nanopore Native Barcoding and Illumina MiSeq methods. Discussion: Our proposed workflow is a convenient and effective method for strain typing of M. ovipneumoniae and can be applied to other bacterial MLST schemes. The workflow is suitable for diagnostic settings, where reduced hands-on time, cost, and multiplexing capabilities are important.

10.
medRxiv ; 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-39108517

ABSTRACT

Background: Mutations within the genes PRKN and PINK1 are the leading cause of early onset autosomal recessive Parkinson's disease (PD). However, the genetic cause of most early-onset PD (EOPD) cases still remains unresolved. Long-read sequencing has successfully identified many pathogenic structural variants that cause disease, but this technology has not been widely applied to PD. We recently identified the genetic cause of EOPD in a pair of monozygotic twins by uncovering a complex structural variant that spans over 7 Mb, utilizing Oxford Nanopore Technologies (ONT) long-read sequencing. In this study, we aimed to expand on this and assess whether a second variant could be detected with ONT long-read sequencing in other unresolved EOPD cases reported to carry one heterozygous variant in PRKN or PINK1. Methods: ONT long-read sequencing was performed on patients with one reported PRKN/PINK1 pathogenic variant. EOPD patients with an age at onset younger than 50 were included in this study. As a positive control, we also included EOPD patients who had already been identified to carry two known PRKN pathogenic variants. Initial genetic testing was performed using short-read targeted panel sequencing for single nucleotide variants and multiplex ligation-dependent probe amplification (MLPA) for copy number variants. Results: 48 patients were included in this study (PRKN "one-variant" n = 24, PINK1 "one-variant" n = 12, PRKN "two-variants" n = 12). Using ONT long-read sequencing, we detected a second pathogenic variant in six PRKN "one-variant" patients (26%, 6/23) but none in the PINK1 "one-variant" patients (0%, 0/12). Long-read sequencing identified one case with a complex inversion, two instances of structural variant overlap, and three cases of duplication. In addition, in the positive control PRKN "two-variants" group, we were able to identify both pathogenic variants in PRKN in all the patients (100%, 12/12). Conclusions: These data highlight that ONT long-read sequencing is a powerful tool for identifying pathogenic structural variants at the PRKN locus that are often missed by conventional methods. Therefore, for cases where conventional methods fail to detect a second variant for EOPD, long-read sequencing should be considered as an alternative and complementary approach.

11.
BMC Bioinformatics ; 25(1): 263, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118013

ABSTRACT

BACKGROUND: Genome assembly, which involves reconstructing a target genome, relies on scaffolding methods to organize and link partially assembled fragments. The rapid evolution of long read sequencing technologies toward more accurate long reads, coupled with the continued use of short read technologies, has created a unique need for hybrid assembly workflows. The construction of accurate genomic scaffolds in hybrid workflows is complicated due to scale, sequencing technology diversity (e.g., short vs. long reads, contigs or partial assemblies), and repetitive regions within a target genome. RESULTS: In this paper, we present a new parallel workflow for hybrid genome scaffolding that allows combining pre-constructed partial assemblies with newly sequenced long reads toward an improved assembly. More specifically, the workflow, called Maptcha, is aimed at generating long scaffolds of a target genome from two sets of input sequences: an already constructed partial assembly of contigs, and a set of newly sequenced long reads. Our scaffolding approach internally uses an alignment-free mapping step to build a ⟨contig, contig⟩ graph using long reads as linking information. Subsequently, this graph is used to generate scaffolds. We present and evaluate a graph-theoretic "wiring" heuristic to perform this scaffolding step. To enable efficient workload management in a parallel setting, we use a batching technique that partitions the scaffolding tasks so that the more expensive alignment-based assembly step at the end can be efficiently parallelized. This step also allows the use of any standalone assembler for generating the final scaffolds. CONCLUSIONS: Our experiments with Maptcha on a variety of input genomes, and comparison against two state-of-the-art hybrid scaffolders, demonstrate that Maptcha is able to generate longer and more accurate scaffolds substantially faster. In almost all cases, the scaffolds produced by Maptcha are at least an order of magnitude longer (in some cases two orders) than the scaffolds produced by state-of-the-art tools. Maptcha runs significantly faster too, reducing time-to-solution from hours to minutes for most input cases. We also performed a coverage experiment by varying the sequencing coverage depth for long reads, which demonstrated the potential of Maptcha to generate significantly longer scaffolds in low coverage settings (1×-10×).


Subject(s)
Genomics; Workflow; Genomics/methods; Sequence Analysis, DNA/methods; Software; Genome; High-Throughput Nucleotide Sequencing/methods; Algorithms
12.
Mitochondrial DNA B Resour ; 9(8): 1020-1023, 2024.
Article in English | MEDLINE | ID: mdl-39119347

ABSTRACT

Heptathela kimurai (Kishida, 1920) is a spider belonging to the family Heptathelidae, a basal lineage of spiders. Molecular information on species from basal families such as Heptathelidae is limited compared with that on spider species from derived families. Here we present the complete mitochondrial genome sequence (mtDNA) of H. kimurai. The sequence was obtained using massively parallel sequencing technology. The circular genome was 14,224 bp in length, and the AT content was 69.53%. The H. kimurai mitochondrial genome contains 13 protein-coding genes (PCGs), 21 tRNA genes, and 2 rRNA genes. The majority of PCGs were found on the heavy strand.
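
The AT content quoted in the abstract above is the share of A and T bases in the sequence. A minimal Python sketch of the calculation is given here, assuming the genome is available as a plain string. The function name and the short sequence are made-up placeholders, not the H. kimurai mitogenome or the authors' tooling.

```python
def at_content(seq: str) -> float:
    """Percentage of A/T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * at / total

toy_seq = "ATTAGCATTA"  # placeholder sequence: 8 of 10 bases are A or T
toy_at = at_content(toy_seq)
```

Applied to the reported figures, an AT content of 69.53% over 14,224 bp corresponds to roughly 9,890 A or T bases.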

13.
J Pharm Biomed Anal ; 249: 116397, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39111245

ABSTRACT

We proposed a single-color fluorogenic DNA decoding sequencing method designed to improve sequencing accuracy, increase read length and throughput, as well as decrease scanning time. This method involves the incorporation of a mixture of four types of 3'-O-modified nucleotide reversible terminators into each reaction. Among them, two nucleotides are labeled with the same fluorophore, while the remaining two are unlabeled. Only one nucleotide can be extended in each reaction, and an encoding that partially defines base composition can be obtained. Through cyclic interrogation of a template twice with different nucleotide combinations, two sets of encodings are sequentially obtained, enabling the determination of the sequence. We demonstrate the feasibility of this method using established sequencing chemistry, achieving a cycle efficiency of approximately 99.5 %. Notably, this strategy exhibits remarkable efficacy in the detection and correction of sequencing errors, achieving a theoretical error rate of 0.00016 % at a sequencing depth of ×2, which is lower than Sanger sequencing. This method is theoretically compatible with the existing sequencing-by-synthesis (SBS) platforms, and the instrument is simpler, which may facilitate further reductions in sequencing costs, thereby broadening its applications in biology and medicine. Moreover, we demonstrate the capability to detect known mutation sites using information from only a single sequencing run. We validate this approach by accurately identifying a mutation site in the human mitochondrial DNA.


Subject(s)
Fluorescent Dyes; Mutation; Fluorescent Dyes/chemistry; Humans; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; DNA/genetics; Genotype; Genotyping Techniques/methods; DNA Mutational Analysis/methods; DNA, Mitochondrial/genetics
14.
mSystems ; : e0024224, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39158287

ABSTRACT

Although long-read sequencing has enabled obtaining high-quality and complete genomes from metagenomes, many challenges still remain to completely decompose a metagenome into its constituent prokaryotic and viral genomes. This study focuses on decomposing an estuarine metagenome to obtain a more accurate estimate of microbial diversity. To achieve this, we developed a new bead-based DNA extraction method, a novel bin refinement method, and obtained 150 Gbp of Nanopore sequencing. We estimate that there are ~500 bacterial and archaeal species in our sample and obtained 68 high-quality bins (>90% complete, <5% contamination, ≤5 contigs, contig length of >100 kbp, and all ribosomal and tRNA genes). We also obtained many contigs of picoeukaryotes, environmental DNA of larger eukaryotes such as mammals, and complete mitochondrial and chloroplast genomes and detected ~40,000 viral populations. Our analysis indicates that there are only a few strains that comprise most of the species abundances. IMPORTANCE: Ocean and estuarine microbiomes play critical roles in global element cycling and ecosystem function. Despite the importance of these microbial communities, many species still have not been cultured in the lab. Environmental sequencing is the primary way the function and population dynamics of these communities can be studied. Long-read sequencing provides an avenue to overcome limitations of short-read technologies to obtain complete microbial genomes but comes with its own technical challenges, such as needed sequencing depth and obtaining high-quality DNA. We present here new sampling and bioinformatics methods to attempt decomposing an estuarine microbiome into its constituent genomes. Our results suggest there are only a few strains that comprise most of the species abundances from viruses to picoeukaryotes, and to fully decompose a metagenome of this diversity requires 1 Tbp of long-read sequencing. 
We anticipate that as long-read sequencing technologies continue to improve, less sequencing will be needed.

15.
Brief Funct Genomics ; 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39158328

ABSTRACT

Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models in comparison to more common and earlier methods, such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancements of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on what bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, with 25 tools being presented. Furthermore, this review intends to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.

16.
Genome Biol Evol ; 16(8)2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39101619

ABSTRACT

The plant Arabidopsis thaliana is a model system used throughout much of plant research. Recent efforts have focused on discovering the genomic variation found in naturally occurring ecotypes isolated from around the world. These ecotypes have come from diverse climates and therefore have faced and adapted to a variety of abiotic and biotic stressors. The sequencing and comparative analysis of these genomes can offer insight into the adaptive strategies of plants. While there are a large number of ecotype genome sequences available, the majority were created using short-read technology. Mapping short reads that contain structural variation to a reference genome bereft of that variation leads to incorrect mapping of those reads, resulting in a loss of genetic information and the introduction of false heterozygosity. For this reason, long-read de novo sequencing of genomes is required to resolve structural variation events. In this article, we sequenced the genomes of eight natural variants of A. thaliana using nanopore sequencing. This resulted in highly contiguous assemblies with >95% of the genome contained within five contigs. The sequencing results from this study include five ecotypes from relict and African populations, an area of untapped genetic diversity. With this study, we increase the knowledge of diversity across A. thaliana ecotypes and contribute to the ongoing production of an A. thaliana pan-genome.


Subject(s)
Arabidopsis; Ecotype; Genome, Plant; Arabidopsis/genetics; Chromosomes, Plant/genetics; Molecular Sequence Annotation; Genetic Variation
17.
G3 (Bethesda) ; 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39115373

ABSTRACT

The northern pike Esox lucius is a freshwater fish with low genetic diversity but ecological success throughout the Northern Hemisphere. Here we generate an annotated chromosome-level genome assembly of 941 Mbp in length with 25 chromosome-length scaffolds. We then genotype 47 northern pike from Alaska through New Jersey at a genome-wide scale and characterize a striking decrease in genetic diversity along the sampling range. Individuals west of the North American Continental Divide have substantially higher diversity than those to the east (e.g., Interior Alaska and St. Lawrence River have on average 181K and 64K heterozygous SNPs per individual, or a heterozygous SNP every 5.2 kbp and 14.6 kbp, respectively). Individuals clustered within each population with strong support, with numerous private alleles observed within each population. Evidence for recent population expansion was observed for a Manitoba hatchery and the St. Lawrence population (Tajima's D = -1.07 and -1.30, respectively). Several chromosomes have large regions with elevated diversity, including LG24, which holds amhby, the ancestral sex determining gene. As expected, amhby was largely male-specific in Alaska and the Yukon and absent southeast of these populations, but we document some amhby(-) males in Alaska and amhby(+) males in the Columbia River, providing evidence for a patchwork of presence of this system in the western region. These results support the theory that northern pike recolonized North America from refugia in Alaska and expanded following deglaciation from west to east, with probable founder effects resulting in loss of both neutral and functional diversity (e.g., amhby).
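
The SNP spacing quoted in the abstract above can be approximated by dividing the assembly length by the per-individual count of heterozygous SNPs. A minimal Python sketch follows, assuming the rounded figures from the abstract (941 Mbp, 181K and 64K SNPs) and the full assembly length as denominator. Both choices are assumptions, and the authors may have used unrounded counts or a different callable genome size.

```python
ASSEMBLY_BP = 941_000_000  # assembly length from the abstract (941 Mbp)

def bp_per_het_snp(het_snps: int, genome_bp: int = ASSEMBLY_BP) -> float:
    """Average spacing between heterozygous SNPs, in base pairs."""
    if het_snps <= 0:
        raise ValueError("het_snps must be positive")
    return genome_bp / het_snps

# Rounded per-individual averages taken from the abstract.
alaska_kbp = bp_per_het_snp(181_000) / 1000
st_lawrence_kbp = bp_per_het_snp(64_000) / 1000
```

With these rounded inputs the spacing comes out at about 5.2 kbp for Interior Alaska and about 14.7 kbp for the St. Lawrence River, close to the reported 5.2 kbp and 14.6 kbp. The small difference in the second figure is consistent with rounding in the quoted SNP counts.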

18.
Genome Biol ; 25(1): 226, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39160564

ABSTRACT

Long-read sequencing holds great potential for characterizing complex microbial communities, yet taxonomic profiling tools designed specifically for long reads remain lacking. We introduce Melon, a novel marker-based taxonomic profiler that capitalizes on the unique attributes of long reads. Melon employs a two-stage classification scheme to reduce computational time and is equipped with an expectation-maximization-based post-correction module to handle ambiguous reads. Melon achieves superior performance compared to existing tools in both mock and simulated samples. Using wastewater metagenomic samples, we demonstrate the applicability of Melon by showing it provides reliable estimates of overall genome copies, and species-level taxonomic profiles.


Subject(s)
Metagenomics; Metagenomics/methods; Metagenome; Genetic Markers; Wastewater/microbiology; Software
19.
J Hered ; 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39171826

ABSTRACT

Pteronarcys californica (Newport 1848) is commonly referred to as the giant salmonfly and is the largest species of stonefly (Insecta: Plecoptera) in the western United States. Historically, it was widespread and abundant in western rivers, but populations have experienced a substantial decline in the past few decades, becoming locally extirpated in numerous rivers in Utah, Colorado, and Montana. Although previous research has explored the ecological variables conducive to the survivability of populations of the giant salmonfly, a lack of genomic resources hampers exploration of how genetic variation is spread across extant populations. To accelerate research on this imperiled species, we present a de novo chromosomal-length genome assembly of P. californica generated from PacBio HiFi sequencing and Hi-C chromosome conformation capture. Our assembly includes 14 predicted pseudo chromosomes and 98.8% of Insecta universal core orthologs. At 2.40 gigabases, the P. californica assembly is the largest of available stonefly assemblies, highlighting at least 9.5-fold variation in assembly size across the order. Repetitive elements (REs) account for much of the genome size increase in P. californica relative to other stonefly species, with the content of Class I retroelements alone exceeding the entire assembly size of all but two other species studied. We also observed preliminary suborder-specific trends in genome size that merit testing with more robust taxon sampling.

20.
G3 (Bethesda) ; 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39148415

ABSTRACT

The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well-represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualising two-dimensional representations of read tetranucleotide composition learned by a Variational Autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualisation tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
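
The read-level feature described in the abstract above, tetranucleotide composition, can be illustrated with a short Python sketch that turns one read into a 256-element frequency vector. This is only a plausible reading of the input representation: it counts 4-mers on the given strand without collapsing reverse complements, skips windows containing ambiguous bases, and does not include the Variational Autoencoder step. The function name and toy read are hypothetical.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed (lexicographic) order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetranucleotide_profile(read: str) -> list[float]:
    """Relative frequency of each 4-mer; windows with non-ACGT symbols are skipped."""
    read = read.upper()
    counts = Counter(read[i:i + 4] for i in range(len(read) - 3))
    valid = [counts.get(k, 0) for k in KMERS]
    total = sum(valid)
    if total == 0:
        # Read shorter than 4 bases, or no unambiguous window.
        return [0.0] * len(KMERS)
    return [c / total for c in valid]

toy_read = "ACGTACGTNACGT"  # made-up read with one ambiguous base
profile = tetranucleotide_profile(toy_read)
```

Vectors like `profile`, one per read, are the kind of fixed-length input a dimensionality-reduction model could embed in two dimensions. Whether to merge each 4-mer with its reverse complement is a design choice left open here.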
