Results 1 - 20 of 68
1.
mSystems ; : e0016024, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39105591

ABSTRACT

As antimicrobial resistance (AMR) surveillance shifts to genomics, ensuring the quality of whole-genome sequencing (WGS) data produced across laboratories is critical. Participation in genomic proficiency tests (GPTs) not only increases individual laboratories' WGS capacity but also provides a unique opportunity to improve species-specific thresholds for WGS quality control (QC) by repeated resequencing of distinct isolates. Here, we present the results of the EU Reference Laboratory for Antimicrobial Resistance (EURL-AR) network GPTs of 2021 and 2022, which included 25 EU national reference laboratories (NRLs). A total of 392 genomes from 12 AMR-bacteria were evaluated based on WGS QC metrics. Two percent (n = 9) of the data were excluded due to contamination, and 11% (n = 41) of the remaining genomes were identified as outliers in at least one QC metric and excluded from computation of the adjusted QC thresholds (AQT). Two QC metric correlation groups were identified through linear regression. Eight percent (n = 28) of the submitted genomes, from 11 laboratories, failed one or more of the AQTs. However, only three laboratories (12%) were identified as underperformers, failing across AQTs for uncorrelated QC metrics in at least two genomes. Finally, new species-specific thresholds for "N50" and "number of contigs > 200 bp" are presented for guidance in routine laboratory QC. The continued participation of NRLs in GPTs will reveal WGS workflow flaws and improve AMR surveillance data. GPT data will continue to contribute to the development of reliable species-specific thresholds for routine WGS QC, standardizing sequencing data QC and ensuring inter- and intranational laboratory comparability. IMPORTANCE: Illumina next-generation sequencing is an integral part of antimicrobial resistance (AMR) surveillance and the most widely used whole-genome sequencing (WGS) platform. The high throughput, relatively low cost, high discriminatory power, and rapid turnaround time of WGS compared to classical biochemical methods mean the technology will likely remain a fundamental tool in AMR surveillance and public health. In this study, we present the current level of WGS capacity among national reference laboratories in the EU Reference Laboratory for AMR network, summarizing applied methodology and statistically evaluating the quality of the obtained sequence data. These findings provide the basis for setting new and revised thresholds for quality metrics used in routine WGS, which have previously been arbitrarily defined. In addition, underperforming participants are identified and encouraged to evaluate their workflows to produce reliable results.

2.
Methods Mol Biol ; 2833: 161-183, 2024.
Article in English | MEDLINE | ID: mdl-38949710

ABSTRACT

Outbreaks are a risk to public health particularly when pathogenic, hypervirulent, and/or multidrug-resistant organisms (MDROs) are involved. In a hospital setting, vulnerable populations such as the immunosuppressed, intensive care patients, and neonates are most at risk. Rapid and accurate outbreak detection is essential to implement effective interventions in clinical areas to control and stop further transmission. Advances in the field of whole genome sequencing (WGS) have resulted in lowered costs, increased capacity, and improved reproducibility of results. WGS now has the potential to revolutionize the investigation and management of outbreaks replacing conventional genotyping and other discrimination systems. Here, we outline specific procedures and protocols to implement WGS into investigation of outbreaks in healthcare settings.


Subject(s)
Disease Outbreaks , Genomics , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Genomics/methods , Genome, Bacterial
3.
Methods Mol Biol ; 2822: 245-262, 2024.
Article in English | MEDLINE | ID: mdl-38907923

ABSTRACT

RNA sequencing (RNA-Seq) has emerged as a powerful and versatile tool for the comprehensive analysis of transcriptomes and has been widely used to investigate gene expression, copy number variation, alternative splicing, and novel transcript discovery. This chapter outlines the methodology for conducting short-read RNA-Seq, starting from RNA enrichment to library preparation and sequencing. Throughout the chapter, practical tips and best practices are provided to guide researchers in order to optimize each step of the RNA-Seq workflow. Multiple quality control steps throughout the workflow that are critical to obtain high-quality RNA-Seq data are also discussed.


Subject(s)
RNA-Seq , Humans , RNA-Seq/methods , Gene Expression Profiling/methods , Transcriptome/genetics , Sequence Analysis, RNA/methods , Gene Library , High-Throughput Nucleotide Sequencing/methods , Quality Control , RNA/genetics , Workflow , Software , Alternative Splicing/genetics , Computational Biology/methods
4.
bioRxiv ; 2024 May 21.
Article in English | MEDLINE | ID: mdl-38826378

ABSTRACT

The extremely high levels of genetic polymorphism within the human major histocompatibility complex (MHC) limit the usefulness of reference-based alignment methods for sequence assembly. We incorporate a short-read de novo assembly algorithm into a workflow for novel application to the MHC. MHConstructor is a containerized pipeline designed for high-throughput, haplotype-informed, reproducible assembly of both whole genome sequencing and target-capture short-read data in large population cohorts. To date, no other self-contained tool exists for the generation of de novo MHC assemblies from short-read data. MHConstructor facilitates widespread access to high-quality, alignment-free MHC sequence analysis.

5.
Genes (Basel) ; 15(6)2024 May 22.
Article in English | MEDLINE | ID: mdl-38927591

ABSTRACT

Glycogen synthase kinase-3β (GSK3β) not only plays a crucial role in regulating sperm maturation but also is pivotal in orchestrating the acrosome reaction. Here, we integrated single-molecule long-read and short-read sequencing to comprehensively examine GSK3β expression patterns in adult Diannan small-ear pig (DSE) testes. We identified the most important transcript ENSSSCT00000039364 of GSK3β, obtaining its full-length coding sequence (CDS) spanning 1263 bp. Gene structure analysis located GSK3β on pig chromosome 13 with 12 exons. Protein structure analysis reflected that GSK3β consisted of 420 amino acids containing PKc-like conserved domains. Phylogenetic analysis underscored the evolutionary conservation and homology of GSK3β across different mammalian species. The evaluation of the protein interaction network, KEGG, and GO pathways implied that GSK3β interacted with 50 proteins, predominantly involved in the Wnt signaling pathway, papillomavirus infection, hippo signaling pathway, hepatocellular carcinoma, gastric cancer, colorectal cancer, breast cancer, endometrial cancer, basal cell carcinoma, and Alzheimer's disease. Functional annotation identified that GSK3β was involved in thirteen GOs, including six molecular functions and seven biological processes. ceRNA network analysis suggested that DSE GSK3β was regulated by 11 miRNA targets. Furthermore, qPCR expression analysis across 15 tissues highlighted that GSK3β was highly expressed in the testis. Subcellular localization analysis indicated that the majority of the GSK3β protein was located in the cytoplasm of ST (swine testis) cells, with a small amount detected in the nucleus. Overall, our findings shed new light on GSK3β's role in DSE reproduction, providing a foundation for further functional studies of GSK3β function.


Subject(s)
Glycogen Synthase Kinase 3 beta , Spermatogenesis , Animals , Glycogen Synthase Kinase 3 beta/genetics , Glycogen Synthase Kinase 3 beta/metabolism , Male , Swine/genetics , Spermatogenesis/genetics , Testis/metabolism , Phylogeny , Gene Expression Regulation
6.
Article in English | MEDLINE | ID: mdl-38862430

ABSTRACT

Tandem duplication (TD) is a major type of structural variations (SVs) that plays an important role in novel gene formation and human diseases. However, TDs are often missed or incorrectly classified as insertions by most modern SV detection methods due to the lack of specialized operation on TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD is capable of detecting TDs with a wide size range at single nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.
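
The abstract notes that TDs are often reported as plain insertions. A minimal sketch of that distinction follows: an insertion is a tandem duplication when the inserted bases are an exact copy of the reference segment directly next to the insertion point. This is an illustration only, not Pindel-TD's pattern growth algorithm; the function name and the 0-based coordinate convention are assumptions made for the example.

```python
def is_tandem_duplication(ref: str, pos: int, inserted: str) -> bool:
    """Return True if `inserted`, placed at 0-based offset `pos` in `ref`,
    exactly copies the reference segment immediately before or after `pos`.

    Hypothetical helper for illustration; real callers work from read
    alignments and tolerate mismatches, which this sketch does not.
    """
    n = len(inserted)
    if n == 0:
        return False
    upstream = ref[max(0, pos - n):pos]   # segment ending at the insertion point
    downstream = ref[pos:pos + n]         # segment starting at the insertion point
    return inserted == upstream or inserted == downstream


# Reference AACGTT|GCA with "CGTT" inserted at offset 6 duplicates ref[2:6].
print(is_tandem_duplication("AACGTTGCA", 6, "CGTT"))
```

A caller that only reports "insertion of CGTT at offset 6" loses the information that the event copies adjacent sequence, which is the misclassification the abstract describes.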


Subject(s)
Software , Humans , K562 Cells , Gene Duplication , Tandem Repeat Sequences/genetics , Algorithms
8.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605641

ABSTRACT

Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM or BAM format, with the SAM and BAM output containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.


Subject(s)
Genome , RNA , RNA-Seq , Sequence Analysis, RNA , Computer Simulation , RNA/genetics , High-Throughput Nucleotide Sequencing
9.
Evol Appl ; 17(3): e13653, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38495945

ABSTRACT

Genomic structural variants (SVs) are now recognized as an integral component of intraspecific polymorphism and are known to contribute to evolutionary processes in various organisms. However, they are inherently difficult to detect and genotype from readily available short-read sequencing data, and therefore remain poorly documented in wild populations. Salmonid species displaying strong interpopulation variability in both life history traits and habitat characteristics, such as Atlantic salmon (Salmo salar), offer a prime context for studying adaptive polymorphism, but the contribution of SVs to fine-scale local adaptation has yet to be explored. Here, we performed a comparative analysis of SVs, single nucleotide polymorphisms (SNPs) and small indels (<50 bp) segregating in the Romaine and Puyjalon salmon, two putatively locally adapted populations inhabiting neighboring rivers (Québec, Canada) and showing pronounced variation in life history traits, namely growth, fecundity, and age at maturity and smoltification. We first catalogued polymorphism using a hybrid SV characterization approach pairing both short- (16X) and long-read sequencing (20X) for variant discovery with graph-based genotyping of SVs across 60 salmon genomes, along with characterization of SNPs and small indels from short reads. We thus identified 115,907 SVs, 8,777,832 SNPs and 1,089,321 short indels, with SVs covering 4.8 times more base pairs than SNPs. All three variant types revealed a highly congruent population structure and similar patterns of FST and density variation along the genome. Finally, we performed outlier detection and redundancy analysis (RDA) to identify variants of interest in the putative local adaptation of Romaine and Puyjalon salmon. Genes located near these variants were enriched for biological processes related to nervous system function, suggesting that observed variation in traits such as age at smoltification could arise from differences in neural development. This study therefore demonstrates the feasibility of large-scale SV characterization and highlights its relevance for salmonid population genomics.

10.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38349061

ABSTRACT

Extrachromosomal circular DNA (eccDNA) is currently attracting considerable attention from researchers due to its significant impact on tumor biogenesis. High-throughput sequencing (HTS) methods for eccDNA identification are continually evolving. However, an efficient pipeline for the integrative and comprehensive analysis of eccDNA obtained from HTS data is still lacking. Here, we introduce eccDNA-pipe, an accessible software package that offers a user-friendly pipeline for conducting eccDNA analysis starting from raw sequencing data. This dataset includes data from various sequencing techniques such as whole-genome sequencing (WGS), Circle-seq and Circulome-seq, obtained through short-read sequencing or long-read sequencing. eccDNA-pipe presents a comprehensive solution for both upstream and downstream analysis, encompassing quality control and eccDNA identification in upstream analysis and downstream tasks such as eccDNA length distribution analysis, differential analysis of genes enriched with eccDNA and visualization of eccDNA structures. Notably, eccDNA-pipe automatically generates high-quality publication-ready plots. In summary, eccDNA-pipe provides a comprehensive and user-friendly pipeline for customized analysis of eccDNA research.


Subject(s)
DNA, Circular , Neoplasms , Humans , DNA, Circular/genetics , DNA/genetics , High-Throughput Nucleotide Sequencing , Whole Genome Sequencing
11.
Microbiol Resour Announc ; 13(2): e0085423, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38179913

ABSTRACT

We present the closed genome sequence of the Clostridium botulinum BT-22100019 strain isolated from the stool specimen of an infant diagnosed with botulism. The genome is 4.33 Mb in size with 28.0% G + C content. The bont/B1 gene, which encodes botulinum neurotoxin serotype B, was found on a 262-kb plasmid arranged in a ha+ orfx- cluster.

12.
J Pers Med ; 13(12)2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38138882

ABSTRACT

BACKGROUND: Pharmacogenetics (PGx) aims to determine genetic signatures that can be used in clinical settings to individualize treatment for each patient, including anti-cancer drugs, anti-psychotics, and painkillers. Taken together, a better understanding of the impacts of genetic variants on the corresponding protein function or expression permits the prediction of the pharmacological response: responders, non-responders, and those with adverse drug reactions (ADRs). OBJECTIVE: This work provides a comparison between innovative long-read sequencing (LRS) and short-read sequencing (SRS) techniques. METHODS AND MATERIALS: The gene panel captured using PacBio HiFi® sequencing was tested on thirteen clinical samples on GENTYANE's platform. SRS, using a comprehensive pharmacogenetics panel, was performed in routine settings at the Civil Hospitals of Lyon. We focused on complex regions analysis, including copy number variations (CNVs), structural variants, repeated regions, and phasing-haplotyping for three key pharmacogenes: CYP2D6, UGT1A1, and NAT2. RESULTS: Variants and the corresponding expected star (*) alleles were reported. Although only 38.4% concordance was found for haplotype determination and 61.5% for diplotype, this did not affect the metabolism scoring. LRS was more accurate in detecting the CYP2D6*5 haplotype in the presence of the duplicated wild-type CYP2D6*2 form. Total concordance was obtained for UGT1A1 TA repeat detection. Direct phasing using the LRS approach allowed us to correct certain NAT2 profiles. CONCLUSIONS: Combining an optimized variant-calling pipeline with direct phasing analysis, LRS is a robust technique for PGx analysis that can minimize the risk of mis-haplotyping.

13.
Curr Hematol Malig Rep ; 18(6): 284-291, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37947937

ABSTRACT

PURPOSE OF REVIEW: The length of telomeres, protective structures at the chromosome ends, is a well-established biomarker for pathological conditions including multisystemic syndromes called telomere biology disorders. Approaches to measure telomere length (TL) differ on whether they estimate average, distribution, or chromosome-specific TL, and each presents their own advantages and limitations. RECENT FINDINGS: The development of long-read sequencing and publication of the telomere-to-telomere human genome reference has allowed for scalable and high-resolution TL estimation in pre-existing sequencing datasets but is still impractical as a dedicated TL test. As sequencing costs continue to fall and strategies for selectively enriching telomere regions prior to sequencing improve, these approaches may become a promising alternative to classic methods. Measurement methods rely on probe hybridization, qPCR or more recently, computational methods using sequencing data. Refinements of existing techniques and new approaches have been recently developed but a test that is accurate, simple, and scalable is still lacking.
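
As a rough illustration of the computational approaches mentioned above, the sketch below flags sequencing reads that carry runs of the human telomeric repeat TTAGGG (or its reverse complement CCCTAA) and reports the fraction of such reads. It is a toy example, not a method described in the review: the function names and the default of four consecutive repeats are assumptions, and the output is a relative telomere content signal, not a calibrated length in base pairs.

```python
import re

TELOMERE_FORWARD = "TTAGGG"   # human telomeric repeat
TELOMERE_REVERSE = "CCCTAA"   # its reverse complement


def is_telomeric(read: str, min_repeats: int = 4) -> bool:
    """True if the read contains at least `min_repeats` consecutive repeats.

    `min_repeats=4` is an arbitrary illustrative cutoff, not a published one.
    """
    pattern = (
        f"(?:{TELOMERE_FORWARD}){{{min_repeats},}}"
        f"|(?:{TELOMERE_REVERSE}){{{min_repeats},}}"
    )
    return re.search(pattern, read.upper()) is not None


def telomeric_fraction(reads, min_repeats: int = 4) -> float:
    """Fraction of reads classified as telomeric (0.0 for an empty input)."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(is_telomeric(read, min_repeats) for read in reads)
    return hits / len(reads)
```

Comparing this fraction between samples sequenced to similar depth gives only a relative estimate; converting it to an absolute length would need normalization choices that the sketch leaves out.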


Subject(s)
Telomere , Humans , Forecasting , Telomere/genetics
14.
Vet Res ; 54(1): 95, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37853447

ABSTRACT

When resequencing animal genomes, some short reads cannot be mapped to the reference genome and are usually discarded. In this study, unmapped reads from 302 German Black Pied cattle were analyzed to identify potential pathogenic DNA. These unmapped reads were assembled and blasted against NCBI's database to identify bacterial and viral sequences. The results provided evidence for the presence of pathogens. We found sequences of Bovine parvovirus 3 and Mycoplasma species. These findings emphasize the information content of unmapped reads for gaining insight into bacterial and viral infections, which is important for veterinarians and epidemiologists.
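
The first step described above, collecting the reads that did not map, can be sketched as follows. In the SAM format, bit 0x4 of the FLAG field marks an unmapped segment, so the example filters on that bit. It assumes uncompressed SAM text as input; the function name is hypothetical, and the later assembly and BLAST steps from the study are not reproduced.

```python
def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records whose FLAG has bit 0x4 set.

    `sam_lines` is any iterable of SAM text lines, e.g. an open file handle.
    Hypothetical helper for illustration only.
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM requires 11 mandatory fields
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # 0x4 = segment unmapped
            yield fields[0], fields[9]


example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
]
print(list(unmapped_reads(example)))
```

The yielded sequences could then be written to FASTA or FASTQ and passed to an assembler, which is where a real workflow would continue.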


Subject(s)
Cattle Diseases , Virus Diseases , Cattle , Animals , Sequence Analysis, DNA/veterinary , Whole Genome Sequencing/veterinary , Virus Diseases/veterinary , Bacteria/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/veterinary
15.
Front Microbiol ; 14: 1221668, 2023.
Article in English | MEDLINE | ID: mdl-37720160

ABSTRACT

Culture-independent metagenomic sequencing of enriched agricultural water could expedite the detection and virulotyping of Shiga toxin-producing Escherichia coli (STEC). We previously determined the limits of a complete, closed metagenome-assembled genome (MAG) assembly and of a complete, fragmented MAG assembly for O157:H7 in enriched agricultural water using long reads (Oxford Nanopore Technologies, Oxford), which were 10^7 and 10^5 CFU/ml, respectively. However, the nanopore assemblies did not have enough accuracy to be used in Single Nucleotide Polymorphism (SNP) phylogenies and cannot be used for the precise identification of an outbreak STEC strain. The present study aimed to determine the limits of detection and assembly for STECs in enriched agricultural water by Illumina MiSeq sequencing technology alone, followed by establishing the limit of hybrid assembly with nanopore long-read sequencing using three different hybrid assemblers (SPAdes, Unicycler, and OPERA-MS). We also aimed to generate a genome with enough accuracy to be used in a SNP phylogeny. The classification of MiSeq and nanopore sequencing identified the same highly abundant species. Using the totality of the MiSeq output and a precision metagenomics approach in which the E. coli reads are binned before assembly, the limit of detection and assembly of STECs by MiSeq were determined to be 10^5 and 10^7 CFU/ml, respectively. While a complete, closed MAG could not be generated at any concentration, a complete, fragmented MAG was produced using the SPAdes assembler with an STEC concentration of at least 10^7 CFU/ml. At this concentration, hybrid assembled contigs aligned to the nanopore-assembled genome could be accurately placed in a neighbor-joining tree. The MiSeq limit of detection and assembly was less sensitive than nanopore sequencing, which was likely due to factors including the small starting material (50 vs. 1 µg) and the dilution of the library loaded on the cartridge. This pilot study demonstrates that MiSeq sequencing requires higher coverage in precision metagenomic samples; however, with sufficient concentration, STECs can be characterized and phylogeny can be accurately determined.

16.
Pediatr Clin North Am ; 70(5): 905-916, 2023 10.
Article in English | MEDLINE | ID: mdl-37704349

ABSTRACT

Selecting the ideal test to evaluate an individual with a suspected genetic disorder can be challenging. While several clinical testing options are available, no single test yet captures all potentially causative genetic variants. Thus, clinicians may order testing in a stepwise fashion, and what to order after non-diagnostic testing can be challenging to determine. Here, we provide an overview of commonly used clinical genetic tests, guidance on when they are best used, and what they may miss. We conclude with a discussion of how new technologies might be used to identify challenging variants and simplify clinical testing in the future.


Subject(s)
Exome , Genetic Testing , Humans
17.
Appl Plant Sci ; 11(4): e11537, 2023.
Article in English | MEDLINE | ID: mdl-37601316

ABSTRACT

Recent technological advances in long-read high-throughput sequencing and assembly methods have facilitated the generation of annotated chromosome-scale whole-genome sequence data for evolutionary studies; however, generating such data can still be difficult for many plant species. For example, obtaining high-molecular-weight DNA is typically impossible for samples in historical herbarium collections, which often have degraded DNA. The need to fast-freeze newly collected living samples to conserve high-quality DNA can be complicated when plants are only found in remote areas. Therefore, short-read reduced-genome representations, such as target capture and genome skimming, remain important for evolutionary studies. Here, we review the pros and cons of each technique for non-model plant taxa. We provide guidance related to logistics, budget, the genomic resources previously available for the target clade, and the nature of the study. Furthermore, we assess the available bioinformatic analyses, detailing best practices and pitfalls, and suggest pathways to combine newly generated data with legacy data. Finally, we explore the possible downstream analyses allowed by the type of data generated using each technique. We provide a practical guide to help researchers make the best-informed choice regarding reduced genome representation for evolutionary studies of non-model plants in cases where whole-genome sequencing remains impractical.

18.
Microb Genom ; 9(8)2023 08.
Article in English | MEDLINE | ID: mdl-37526643

ABSTRACT

The global surveillance and outbreak investigation of antimicrobial resistance (AMR) is amidst a paradigm shift from traditional biology to bioinformatics. This is due to developments in whole-genome-sequencing (WGS) technologies, bioinformatics tools, and reduced costs. The increased use of WGS is accompanied by challenges such as standardization, quality control (QC), and data sharing. Thus, there is global need for inter-laboratory WGS proficiency test (PT) schemes to evaluate laboratories' capacity to produce reliable genomic data. Here, we present the results of the first iteration of the Genomic PT (GPT) organized by the Global Capacity Building Group at the Technical University of Denmark in 2020. Participating laboratories sequenced two isolates and corresponding DNA of Salmonella enterica, Escherichia coli and Campylobacter coli, using WGS methodologies routinely employed at their laboratories. The participants' ability to obtain consistently good-quality WGS data was assessed based on several QC WGS metrics. A total of 21 laboratories from 21 European countries submitted WGS and meta-data. Most delivered high-quality sequence data with only two laboratories identified as overall underperforming. The QC metrics, N50 and number of contigs, were identified as good indicators for high-sequencing quality. We propose QC thresholds for N50 greater than 20 000 and 25 000 for Campylobacter coli and Escherichia coli, respectively, and number of contigs >200 bp greater than 225, 265 and 100 for Salmonella enterica, Escherichia coli and Campylobacter coli, respectively. The GPT2020 results confirm the importance of systematic QC procedures, ensuring the submission of reliable WGS data for surveillance and outbreak investigation to meet the requirements of the paradigm shift in methodology.
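
The two QC metrics highlighted above can be computed from a list of contig lengths. The sketch below uses the standard definition of N50 (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size) and counts contigs longer than 200 bp. The function names are illustrative assumptions, and this is not the organizers' QC pipeline.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of the
    total assembly size. Returns 0 for an empty assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    return 0


def contigs_over(contig_lengths, min_len=200):
    """Number of contigs strictly longer than `min_len` bp."""
    return sum(1 for length in contig_lengths if length > min_len)


lengths = [100, 200, 300, 400]
print(n50(lengths), contigs_over(lengths))  # → 300 2
```

A laboratory could compare these values against the species-specific thresholds proposed in the abstract, for example checking `n50(lengths) > 25_000` for an Escherichia coli assembly. Treating a higher N50 as the passing direction is an assumption of this sketch.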


Subject(s)
Anti-Bacterial Agents , Salmonella enterica , Humans , Anti-Bacterial Agents/pharmacology , European Union , Drug Resistance, Bacterial/genetics , Escherichia coli/genetics , Genomics , Salmonella enterica/genetics
19.
Microbiol Spectr ; 11(4): e0035623, 2023 08 17.
Article in English | MEDLINE | ID: mdl-37466446

ABSTRACT

Escherichia coli sequence type 131 (ST131) has contributed to the spread of extended-spectrum beta-lactamase (ESBL) and has emerged as the dominant cause of hospital- and community-acquired urinary tract infections. Here, we report for the first time an in-depth analysis of whole-genome sequencing (WGS) of 4 ESBL-producing E. coli ST131 isolates recovered from patients in two hospitals in Armenia using Illumina short-read sequencing for accurate base calling to determine their genotype and to infer their phylogeny and using Oxford Nanopore Technologies long-read sequencing to resolve plasmid and chromosomal genetic elements. Genotypically, the four Armenian isolates were identified as part of the H30Rx/clade C2 (n = 2) and H41/clade A (n = 2) lineages and were phylogenetically closely related to isolates from the European Nucleotide Archive (ENA) database previously recovered from patients in the United States, Australia, and New Zealand. The Armenian isolates recovered in this study had chromosomal integration of the blaCTX-M-15 gene in the H30Rx isolates and a high number of virulence genes found in the H41 isolates associated with the carriage of a rare genomic island (in the context of E. coli ST131) containing the S fimbrial, salmochelin siderophore, and microcin H47 virulence genes. Furthermore, our data show the evolution of the IncF[2:A2:B20] plasmid harboring both blaCTX-M-15 and blaCTX-M-27 genes, derived from the recombination of genes from an IncF[F2:A-:B-] blaCTX-M-15-associated plasmid into the IncF[F1:A2:B20] blaCTX-M-27-associated plasmid backbone seen in two genetically closely related H41 Armenian isolates. IMPORTANCE Combining short and long reads from whole-genome sequencing analysis provided a genetic context for uncommon genes of clinical importance to better understand transmission and evolutionary features of ESBL-producing uropathogenic E. coli (UPEC) ST131 isolates recovered in Armenia. 
Using hybrid genome assembly in countries lacking genomic surveillance studies can inform us about new lineages not seen in other countries with genes encoding high virulence and antibiotic resistance harbored on mobile genetic elements.


Subject(s)
Escherichia coli Infections , Escherichia coli Proteins , Humans , Escherichia coli/genetics , Plasmids/genetics , Escherichia coli Infections/epidemiology , Escherichia coli Proteins/genetics , beta-Lactamases/genetics , Anti-Bacterial Agents
20.
Biol Res ; 56(1): 42, 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37468985

ABSTRACT

The human genome contains regions that cannot be adequately assembled or aligned using next-generation short-read sequencing technologies. More than 2500 genes are known to contain such 'dark' regions. In this study, we investigate the negative consequences of dark regions on gene discovery across a range of disease and study types, showing that dark regions are likely preventing researchers from identifying genetic variants relevant to human disease.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Genome, Human/genetics , Sequence Analysis, DNA