Results 1 - 20 of 794
1.
G3 (Bethesda) ; 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39148415

ABSTRACT

The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well-represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualising two-dimensional representations of read tetranucleotide composition learned by a Variational Autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualisation tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
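
As a rough illustration of the "read tetranucleotide composition" mentioned in the abstract above, the sketch below turns a single read into a normalised 4-mer frequency vector. It is not the author's implementation. The function names, the merging of each 4-mer with its reverse complement, and the skipping of windows containing ambiguous bases are all assumptions made for this example. Vectors of this kind could then be passed to a dimensionality-reduction model such as a Variational Autoencoder.

```python
from collections import Counter
from itertools import product

# Translation table for complementing a DNA string (assumes upper-case ACGT).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


# Fixed, sorted list of canonical 4-mers so every read maps to the same vector layout.
CANONICAL_4MERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_profile(seq: str) -> list[float]:
    """Hypothetical helper: relative frequency of each canonical 4-mer in one read."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        # Skip windows containing ambiguous bases such as N (an assumption of this sketch).
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CANONICAL_4MERS)
    return [counts[k] / total for k in CANONICAL_4MERS]
```

Calling `tetranucleotide_profile(read)` on each read yields one fixed-length vector per read. Collapsing reverse complements makes the profile independent of the strand the read came from.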

2.
medRxiv ; 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-39108517

ABSTRACT

Background: Mutations within the genes PRKN and PINK1 are the leading cause of early-onset autosomal recessive Parkinson's disease (PD). However, the genetic cause of most early-onset PD (EOPD) cases remains unresolved. Long-read sequencing has successfully identified many pathogenic structural variants that cause disease, but this technology has not been widely applied to PD. We recently identified the genetic cause of EOPD in a pair of monozygotic twins by uncovering a complex structural variant that spans over 7 Mb, utilizing Oxford Nanopore Technologies (ONT) long-read sequencing. In this study, we aimed to expand on this and assess whether a second variant could be detected with ONT long-read sequencing in other unresolved EOPD cases reported to carry one heterozygous variant in PRKN or PINK1. Methods: ONT long-read sequencing was performed on patients with one reported PRKN/PINK1 pathogenic variant. EOPD patients with an age at onset younger than 50 were included in this study. As a positive control, we also included EOPD patients who had already been identified to carry two known PRKN pathogenic variants. Initial genetic testing was performed using short-read targeted panel sequencing for single nucleotide variants and multiplex ligation-dependent probe amplification (MLPA) for copy number variants. Results: 48 patients were included in this study (PRKN "one-variant" n = 24, PINK1 "one-variant" n = 12, PRKN "two-variants" n = 12). Using ONT long-read sequencing, we detected a second pathogenic variant in six PRKN "one-variant" patients (26%, 6/23) but none in the PINK1 "one-variant" patients (0%, 0/12). Long-read sequencing identified one case with a complex inversion, two instances of structural variant overlap, and three cases of duplication. In addition, in the positive control PRKN "two-variants" group, we were able to identify both pathogenic variants in PRKN in all the patients (100%, 12/12). Conclusions: These data highlight that ONT long-read sequencing is a powerful tool to identify a pathogenic structural variant at the PRKN locus that is often missed by conventional methods. Therefore, for cases where conventional methods fail to detect a second variant for EOPD, long-read sequencing should be considered as an alternative and complementary approach.

3.
Cells ; 13(15)2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39120292

ABSTRACT

Biallelic variants in USH2A are associated with retinitis pigmentosa (RP) and Type 2 Usher Syndrome (USH2), leading to impaired vision and, additionally, hearing loss in the latter. Although the introduction of next-generation sequencing into clinical diagnostics has led to a significant uplift in molecular diagnostic rates, many patients remain molecularly unsolved. It is thought that non-coding variants or variants of uncertain significance contribute significantly to this diagnostic gap. This study aims to demonstrate the clinical utility of the reverse transcription-polymerase chain reaction (RT-PCR)-Oxford Nanopore Technologies (ONT) sequencing of USH2A mRNA transcripts from nasal epithelial cells to determine the splice-altering effect of candidate variants. Five affected individuals with USH2 or non-syndromic RP who had undergone whole genome sequencing were recruited for further investigation. All individuals had uncertain genotypes in USH2A, including deep intronic rare variants, c.8682-654C>G, c.9055+389G>A, and c.9959-2971C>T; a synonymous variant of uncertain significance, c.2139C>T; p.(Gly713=); and a predicted loss of function duplication spanning an intron/exon boundary, c.3812-3_3837dup p.(Met1280Ter). In silico assessment using SpliceAI provided splice-altering predictions for all candidate variants, which were investigated using ONT sequencing. All predictions were found to be accurate; however, in the case of c.3812-3_3837dup, the outcome was a complex cryptic splicing pattern with predominant in-frame exon 18 skipping and a low level of exon 18 inclusion leading to the predicted stop gain. This study detected and functionally characterised simple and complex mis-splicing patterns in USH2A arising from previously unknown deep intronic variants and previously reported variants of uncertain significance, confirming the pathogenicity of the variants.


Subject(s)
Extracellular Matrix Proteins , RNA Splicing , Usher Syndromes , Humans , Extracellular Matrix Proteins/genetics , Usher Syndromes/genetics , Female , Male , RNA Splicing/genetics , High-Throughput Nucleotide Sequencing/methods , Exons/genetics , Mutation/genetics , Retinitis Pigmentosa/genetics , Adult , RNA, Messenger/genetics , RNA, Messenger/metabolism , Introns/genetics , Middle Aged
4.
Front Vet Sci ; 11: 1443855, 2024.
Article in English | MEDLINE | ID: mdl-39144078

ABSTRACT

Introduction: Spillover events of Mycoplasma ovipneumoniae have devastating effects on the wild sheep populations. Multilocus sequence typing (MLST) is used to monitor spillover events and the spread of M. ovipneumoniae between the sheep populations. Most studies involving the typing of M. ovipneumoniae have used Sanger sequencing. However, this technology is time-consuming, expensive, and is not well suited to efficient batch sample processing. Methods: Our study aimed to develop and validate an MLST workflow for typing of M. ovipneumoniae using Nanopore Rapid Barcoding sequencing and multiplex polymerase chain reaction (PCR). We compare the workflow with Nanopore Native Barcoding library preparation and Illumina MiSeq amplicon protocols to determine the most accurate and cost-effective method for sequencing multiplex amplicons. A multiplex PCR was optimized for four housekeeping genes of M. ovipneumoniae using archived DNA samples (N = 68) from nasal swabs. Results: Sequences recovered from Nanopore Rapid Barcoding correctly identified all MLST types with the shortest total workflow time and lowest cost per sample when compared with Nanopore Native Barcoding and Illumina MiSeq methods. Discussion: Our proposed workflow is a convenient and effective method for strain typing of M. ovipneumoniae and can be applied to other bacterial MLST schemes. The workflow is suitable for diagnostic settings, where reduced hands-on time, cost, and multiplexing capabilities are important.

5.
Genome Biol Evol ; 16(8)2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39101619

ABSTRACT

The plant Arabidopsis thaliana is a model system used throughout much of plant research. Recent efforts have focused on discovering the genomic variation found in naturally occurring ecotypes isolated from around the world. These ecotypes have come from diverse climates and therefore have faced and adapted to a variety of abiotic and biotic stressors. The sequencing and comparative analysis of these genomes can offer insight into the adaptive strategies of plants. While there are a large number of ecotype genome sequences available, the majority were created using short-read technology. Mapping of short reads containing structural variation to a reference genome bereft of that variation leads to incorrect mapping of those reads, resulting in a loss of genetic information and introduction of false heterozygosity. For this reason, long-read de novo sequencing of genomes is required to resolve structural variation events. In this article, we sequenced the genomes of eight natural variants of A. thaliana using nanopore sequencing. This resulted in highly contiguous assemblies with >95% of the genome contained within five contigs. The sequencing results from this study include five ecotypes from relict and African populations, an area of untapped genetic diversity. With this study, we increase the knowledge of diversity we have across A. thaliana ecotypes and contribute to ongoing production of an A. thaliana pan-genome.


Subject(s)
Arabidopsis , Ecotype , Genome, Plant , Arabidopsis/genetics , Chromosomes, Plant/genetics , Molecular Sequence Annotation , Genetic Variation
6.
Mitochondrial DNA B Resour ; 9(8): 1020-1023, 2024.
Article in English | MEDLINE | ID: mdl-39119347

ABSTRACT

Heptathela kimurai (Kishida, 1920) is a spider that belongs to the family Heptathelidae, which is a basal lineage of spiders. The molecular information available for ancestral species belonging to families like Heptathelidae is limited compared with that for spider species from derived families. Here we present the complete mitochondrial genome sequence (mtDNA) of H. kimurai. The sequence was obtained using massively parallel sequencing technology. The circular genome was 14,224 bp in length, and the AT content was 69.53%. The H. kimurai mitochondrial genome contains 13 protein-coding genes (PCGs), 21 tRNA genes, and 2 rRNA genes. The majority of PCGs were found on the heavy strand.

7.
J Clin Microbiol ; : e0074124, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39136450

ABSTRACT

The transition from MIRU-VNTR-based epidemiology studies in tuberculosis (TB) to genomic epidemiology has transformed how we track transmission. However, short-read sequencing is poor at analyzing repetitive regions such as the MIRU-VNTR loci. This causes a gap between the new genomic data and the large amount of information stored in historical databases. Long-read sequencing could bridge this knowledge gap by allowing analysis of repetitive regions. However, the feasibility of extracting MIRU-VNTRs from long reads and linking them to historical data has not been evaluated. In our study, an in silico arm, consisting of inference of MIRU patterns from long-read sequences (using MIRUReader program), was compared with an experimental arm, involving standard amplification and fragment sizing. We analyzed overall performance on 39 isolates from South Africa and confirmed reproducibility in a sample enriched with 62 clustered cases from Spain. Finally, we ran 25 consecutive incident cases, demonstrating the feasibility of correctly assigning new clustered/orphan cases by linking data inferred from genomic analysis to MIRU-VNTR databases. Of the 3,024 loci analyzed, only 11 discrepancies (0.36%) were found between the two arms: three attributed to experimental error and eight to misassigned alleles from long-read sequencing. A second round of analysis of these discrepancies resulted in agreement between the experimental and in silico arms in all but one locus. Adjusting the MIRUReader program code allowed us to flag potential in silico misassignments due to suboptimal coverage or unfixed double alleles. Our study indicates that long-read sequencing could help address potential chronological and geographical gaps arising from the transition from molecular to genomic epidemiology of tuberculosis. 
IMPORTANCE: The transition from molecular epidemiology in tuberculosis (TB), based on the analysis of repetitive regions (VNTR-based genotyping), to genomic epidemiology transforms the precision with which we track transmission. However, short-read sequencing, the most common method for performing genomic analysis, is poor at analyzing repetitive regions. This means that we face a gap between the new genomic data and the large amount of information stored in historical databases, which is also an obstacle to cross-national surveillance involving settings where only molecular data are available. Long-read sequencing could help bridge this knowledge gap by allowing analysis of repetitive regions. Our study demonstrates that MIRU-VNTR patterns can be successfully inferred from long-read sequences, allowing the correct assignment of new cases as clustered/orphan by linking new data extracted from genomic analysis to historical MIRU-VNTR databases. Our data may provide a starting point for bridging the knowledge gap between the molecular and genomic eras in tuberculosis epidemiology.

8.
BMC Genomics ; 25(1): 679, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978005

ABSTRACT

BACKGROUND: Oxford Nanopore provides high-throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain if they obtain the accuracy needed for the high-resolution source tracking of foodborne illness outbreaks. RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near-perfect accuracy (99.9999% accuracy or ~ 5 nucleotide errors across the 4.8 Mbp genome, excluding low confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered: using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS: Short reads are still needed to correct errors in nanopore-sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
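
The accuracy figures quoted in the abstract above can be converted into expected error counts with simple arithmetic: errors ≈ (1 − accuracy) × genome length. The sketch below is only an illustration. The function name is hypothetical, and applying the 99.95% figure to the 4.8 Mbp genome is an assumption made here for comparison, not a result reported in the abstract.

```python
def expected_errors(accuracy_percent: float, genome_length_bp: int) -> float:
    """Expected number of erroneous bases for a given per-base accuracy (in percent)."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * genome_length_bp


# Genome size taken from the abstract (4.8 Mbp).
GENOME_BP = 4_800_000

# Unpolished accuracy quoted in the abstract, applied to this genome as an assumption.
unpolished = expected_errors(99.95, GENOME_BP)

# Near-perfect accuracy reported for the combined long- and short-read polishing pipelines.
polished = expected_errors(99.9999, GENOME_BP)

print(f"99.95% accuracy:   about {unpolished:.0f} errors")
print(f"99.9999% accuracy: about {polished:.1f} errors")
```

Under these assumptions the first figure comes to roughly 2,400 errors and the second to about 4.8, consistent with the "~ 5 nucleotide errors" stated in the abstract.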


Subject(s)
Benchmarking , Disease Outbreaks , Genome, Bacterial , Nanopores , Nanopore Sequencing/methods , High-Throughput Nucleotide Sequencing/methods , Salmonella enterica/genetics , Salmonella enterica/isolation & purification , Humans , Phylogeny
9.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38980375

ABSTRACT

Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.


Subject(s)
Algorithms , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Genomic Structural Variation , Software
10.
bioRxiv ; 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38948813

ABSTRACT

Organismal aging is marked by decline in cellular function and anatomy, ultimately resulting in death. To inform our understanding of the mechanisms underlying this degeneration, we performed standard RNA sequencing and Nanopore direct RNA sequencing over an adult time course in Caenorhabditis elegans. Long reads allowed for identification of hundreds of novel isoforms and age-associated differential isoform accumulation, resulting from alternative splicing and terminal exon choice. Genome-wide analysis reveals a decline in RNA processing fidelity and a rise in inosine and pseudouridine editing events in transcripts from older animals. In this first map of pseudouridine modifications for C. elegans, we find that they largely reside in coding sequences and that the number of genes with this modification increases with age. Collectively, this analysis discovers transcriptomic signatures associated with age and is a valuable resource to understand the many processes that dictate altered gene expression patterns and post-transcriptional regulation in aging.

11.
Data Brief ; 54: 110296, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38962209

ABSTRACT

Antimicrobial resistance remains a significant global and One Health threat, owing to the diminishing effectiveness of antibiotics against rapidly evolving multidrug-resistant bacteria, and the limited innovative research towards the development of new antibiotic therapeutics. In this article, we present the whole-genome sequence data of Proteus mirabilis-MN029 obtained from highly accurate long-read PacBio® HiFi technology. The antibacterial activities of the selected African native plant species were also evaluated using the disk diffusion method. Acquired antibiotic resistance genes and chromosomal mutations corresponding to antibiotics of clinical importance were identified from genomic data. Using ethyl acetate as solvent, Pterocarpus angolensis leaf extracts showed the most promising antibacterial effects against Proteus mirabilis-MN029. These datasets will be useful for future experimental research aimed at designing new antibacterial drugs from plant extracts that are effective alone or in combination with existing antibiotics to overcome multidrug-resistance mechanisms.

12.
Mol Genet Genomics ; 299(1): 65, 2024 Jul 07.
Article in English | MEDLINE | ID: mdl-38972030

ABSTRACT

BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations. RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution. CONCLUSION: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.


Subject(s)
DNA Copy Number Variations , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , DNA Copy Number Variations/genetics , High-Throughput Nucleotide Sequencing/methods , Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Genetic Variation/genetics , Genetic Predisposition to Disease , Genetics, Population/methods , INDEL Mutation
13.
Genomics ; 116(5): 110894, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39019410

ABSTRACT

Technologies for detecting structural variation (SV) have advanced with the advent of long-read sequencing, which enables the validation of SVs at the nucleotide level. Optical genome mapping (OGM), a technology based on physical mapping, can also provide comprehensive SV analysis. We applied long-read whole genome sequencing (LRWGS) to accurately reconstruct breakpoint (BP) segments in a patient with complex chromosome 6q rearrangements that remained elusive by conventional karyotyping. Although all BPs were precisely identified by LRWGS, there were two possible ways to construct the BP segments in terms of their orders and orientations. Thus, we also used OGM analysis. Notably, OGM recognized entire inversions exceeding 500 kb in size, which LRWGS could not characterize. Consequently, here we successfully unveil the full genomic structure of this complex chromosomal 6q rearrangement and cryptic SVs through combined long-molecule genomic analyses, showcasing how LRWGS and OGM can complement each other in SV analysis.

14.
ISME Commun ; 4(1): ycae099, 2024 Jan.
Article in English | MEDLINE | ID: mdl-39081363

ABSTRACT

While the air microbiome and its diversity are essential for human health and ecosystem resilience, comprehensive air microbial diversity monitoring has remained rare, so that little is known about the air microbiome's composition, distribution, or functionality. Here we show that nanopore sequencing-based metagenomics can robustly assess the air microbiome in combination with active air sampling through liquid impingement and tailored computational analysis. We provide fast and portable laboratory and computational approaches for air microbiome profiling, which we leverage to robustly assess the taxonomic composition of the core air microbiome of a controlled greenhouse environment and of a natural outdoor environment. We show that long-read sequencing can resolve species-level annotations and specific ecosystem functions through de novo metagenomic assemblies despite the low amount of fragmented DNA used as an input for nanopore sequencing. We then apply our pipeline to assess the diversity and variability of an urban air microbiome, using Barcelona, Spain, as an example; this randomized experiment gives first insights into the presence of highly stable location-specific air microbiomes within the city's boundaries, and showcases the robust microbial assessments that can be achieved through automatable, fast, and portable nanopore sequencing technology.

15.
Front Genet ; 15: 1435087, 2024.
Article in English | MEDLINE | ID: mdl-39045321

ABSTRACT

Introduction: Structural Variants (SVs) are a type of variation that can significantly influence phenotypes and cause diseases. Thus, the accurate detection of SVs is a vital part of modern genetic analysis. The advent of long-read sequencing technology ushers in a new era of more accurate and comprehensive SV calling, and many tools have been developed to call SVs using long-read data. Haplotype-tagging is a procedure that can tag haplotype information on reads and can thus potentially improve SV detection; nevertheless, few methods make use of this information. In this article, we introduce HapKled, a new SV detection tool that can accurately detect SVs from Oxford Nanopore Technologies (ONT) long-read alignment data. Methods: HapKled utilizes haplotype information underlying alignment data by conducting haplotype-tagging of the reads using WhatsHap to improve detection performance, with three distinctive calling mechanisms: altering clustering conditions according to the haplotype information of signatures, determining similar SVs based on haplotype information, and relaxing filtering conditions based on haplotype quality. Results: In our evaluations, HapKled outperformed state-of-the-art tools and delivered better SV detection results on both simulated and real sequencing data. The code and experiments of HapKled can be obtained from https://github.com/CoREse/HapKled. Discussion: With the superb SV detection performance that HapKled can deliver, HapKled could be useful in bioinformatics research, clinical diagnosis, and medical research and development.

16.
PeerJ ; 12: e17605, 2024.
Article in English | MEDLINE | ID: mdl-39011377

ABSTRACT

Viral outbreaks are a constant threat to aquaculture, limiting production for better global food security. A lack of diagnostic testing and monitoring in resource-limited areas hinders the capacity to respond rapidly to disease outbreaks and to prevent viral pathogens becoming endemic in productive fishery waters. Recent developments in diagnostic testing for emerging viruses, however, offer a solution for rapid in situ monitoring of viral outbreaks. Genomic epidemiology has furthermore proven highly effective in detecting viral mutations involved in pathogenesis and assisting in resolving chains of transmission. Here, we demonstrate the application of an in-field epidemiological tool kit to track viral outbreaks in aquaculture on farms with reduced access to diagnostic labs, and with non-destructive sampling. Inspired by the "lab in a suitcase" approach used for genomic surveillance of human viral pathogens and wastewater monitoring of COVID-19, we evaluated the feasibility of real-time genome sequencing surveillance of the fish pathogen, Infectious spleen and kidney necrosis virus (ISKNV) in Lake Volta. Viral fractions from water samples collected from cages holding Nile tilapia (Oreochromis niloticus) with suspected ongoing ISKNV infections were concentrated and used as a template for whole genome sequencing, using a previously developed tiled PCR method for ISKNV. Mutations in ISKNV in samples collected from the water surrounding the cages matched those collected from infected caged fish, illustrating that water samples can be used for detecting predominant ISKNV variants in an ongoing outbreak. This approach allows for the detection of ISKNV and tracking of the dynamics of variant frequencies, and may thus assist in guiding control measures for the rapid isolation and quarantine of infected farms and facilities.


Subject(s)
Aquaculture , Fish Diseases , Iridoviridae , Animals , Fish Diseases/virology , Fish Diseases/epidemiology , Fish Diseases/diagnosis , Iridoviridae/genetics , Iridoviridae/isolation & purification , Ghana/epidemiology , Lakes/virology , DNA Virus Infections/virology , DNA Virus Infections/epidemiology , DNA Virus Infections/veterinary , DNA Virus Infections/transmission , Genome, Viral/genetics , Tilapia/virology , Disease Outbreaks/veterinary , Disease Outbreaks/prevention & control , Whole Genome Sequencing/methods , Cichlids/virology
17.
BMC Genom Data ; 25(1): 70, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39009995

ABSTRACT

OBJECTIVES: Ants are ecologically dominant insects in most terrestrial ecosystems, with more than 14,000 extant species in about 340 genera recorded to date. However, genomic resources are still scarce for most species, especially for species endemic in East or Southeast Asia, limiting the study of phylogeny, speciation and adaptation of this evolutionarily successful animal lineage. Here, we assemble and annotate the genomes of Odontoponera transversa and Camponotus friedae, two ant species with a natural distribution in China, to facilitate future study of ant evolution. DATA DESCRIPTION: We obtained a total of 16 Gb and 51 Gb PacBio HiFi data for O. transversa and C. friedae, respectively, which were assembled into the draft genomes of 339 Mb for O. transversa and 233 Mb for C. friedae. Genome assessments by multiple metrics showed good completeness and high accuracy of the two assemblies. Gene annotations assisted by RNA-seq data yielded a comparable number of protein-coding genes in the two genomes (10,892 for O. transversa and 11,296 for C. friedae), while repeat annotations revealed a remarkable difference of repeat content between these two ant species (149.4 Mb for O. transversa versus 49.7 Mb for C. friedae). Besides, complete mitochondrial genomes for the two species were assembled and annotated.


Subject(s)
Ants , Genome, Insect , Animals , Ants/genetics , Ants/classification , Genome, Insect/genetics , Molecular Sequence Annotation , Phylogeny , Genomics/methods
18.
Med ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39047733

ABSTRACT

BACKGROUND: Delineating base-resolution breakpoints of complex rearrangements is crucial for an accurate clinical understanding of pathogenic variants and for carrier screening within family networks or the broader population. However, despite advances in genetic testing using short-read sequencing (SRS), this task remains costly and challenging. METHODS: This study addresses the challenges of resolving missing disease-causing breakpoints in complex genomic disorders with suspected homozygous rearrangements by employing multiple long-read sequencing (LRS) strategies, including a novel and efficient strategy named nanopore-based rapid acquisition of neighboring genomic regions (NanoRanger). NanoRanger does not require large amounts of ultrahigh-molecular-weight DNA and stands out for its ease of use and rapid acquisition of large genomic regions of interest with deep coverage. FINDINGS: We describe a cohort of 16 familial cases, each harboring homozygous rearrangements that defied breakpoint determination by SRS and optical genome mapping (OGM). NanoRanger identified the breakpoints with single-base-pair resolution, enabling accurate determination of the carrier status of unaffected family members as well as the founder nature of these genomic lesions and their frequency in the local population. The resolved breakpoints revealed that repetitive DNA, gene regulatory elements, and transcription activity contribute to genome instability in these novel recessive rearrangements. CONCLUSIONS: Our data suggest that NanoRanger greatly improves the success rate of resolving base-resolution breakpoints of complex genomic disorders and expands access to LRS for the benefit of patients with Mendelian disorders. FUNDING: M.L. is supported by KAUST Baseline Award no. BAS/1/1080-01-01 and KAUST Research Translation Fund Award no. REI/1/4742-01.

19.
Article in English | MEDLINE | ID: mdl-39049755

ABSTRACT

CONTEXT: Genetic testing for 21-hydroxylase deficiency (21-OHD) is always challenging. Current approaches, short-read sequencing and multiplex ligation-dependent probe amplification (MLPA), are insufficient for the detection of chimeric genes or complicated variants from multiple copies. Recently developed long-read sequencing (LRS) can solve this problem. OBJECTIVE: To investigate the clinical utility of LRS in precision diagnosis of 21-hydroxylase deficiency. METHODS: In the cohort of 832 patients with 21-OHD, the current approaches provided the precise molecular diagnosis for 81.7% (680/832) of cases. LRS was performed to solve the remaining 144 cases with complex chimeric variants and eight cases with variants from multiple copies. Clinical manifestations in patients with continuous deletions of CYP21A2 extending to TNXB (namely CAH-X) were further evaluated. RESULTS: Using LRS in combination with previous genetic test results, a total of 16.9% (281/1664) CYP21A1P/CYP21A2 or TNXA/TNXB chimeric alleles were identified in 832 patients, with CYP21A1P/CYP21A2 accounting for 10.4% and TNXA/TNXB for 6.5%. The top three common chimeras were CYP21 CH-1, TNX CH-1 and TNX CH-2, accounting for 77.2% (217/281) of all chimeric alleles. The eight patients with variants on multiple copies of CYP21A2 were accurately identified with LRS. The prevalence of CAH-X in our cohort was 12.1%, and a high frequency of connective tissue-related symptoms was observed in CAH-X patients. CONCLUSION: LRS can detect all types of CYP21A2 variants, including complex chimeras and pathogenic variants on multiple copies in patients with 21-OHD, which could be utilized as a first-tier routine test for the precision diagnosis and categorization of congenital adrenal hyperplasia.

20.
J Neurol ; 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39078482

ABSTRACT

BACKGROUND: Neuronal intranuclear inclusion disease (NIID) is a rare neurodegenerative disease caused by the expansion of GGC repeats in the 5'-untranslated region (5'-UTR) of NOTCH2NLC. Although increasing evidence suggests that NIID affects various organs, its association with renal involvement remains unclear. We studied the genetic background of a family with NIID, in which four of five members presented with proteinuria as the initial manifestation. The renal pathology of three patients was diagnosed as focal segmental glomerulosclerosis (FSGS) at a previous hospital. These patients also presented with tremors, retinal degeneration, and episodic neurological events. Finally, one patient exhibited reversible bilateral thalamic high-intensity signal changes on diffusion-weighted imaging during episodic neurological events. METHODS: Exome sequencing (ES) and nanopore long-read whole-genome sequencing (LR-WGS) were performed on the index case, followed by nanopore target sequencing using Cas9-mediated PCR-free enrichment and methylation analysis. RESULTS: ES revealed no candidate variants; however, nanopore LR-WGS in the index case revealed expansion of short tandem repeats (STR) in NOTCH2NLC. Subsequent nanopore target sequencing using Cas9-mediated PCR-free enrichment showed STR expansion of NOTCH2NLC in an affected sibling and asymptomatic father. Methylation analysis using nanopore data revealed hypermethylation of the expanded allele in the asymptomatic father and partial hypermethylation in a mildly symptomatic sibling, whereas the expanded allele was hypomethylated in the index case. CONCLUSIONS: This investigation expands the clinical spectrum of NIID, suggesting that STR expansion of NOTCH2NLC is a cause of renal diseases, including FSGS.
