Search | Secretaria de Estado da Saúde

1.

eccDNA-pipe: an integrated pipeline for identification, analysis and visualization of extrachromosomal circular DNA from high-throughput sequencing data.

Fang, Minghao; Fang, Jingwen; Luo, Songwen; Liu, Ke; Yu, Qiaoni; Yang, Jiaxuan; Zhou, Youyang; Li, Zongkai; Sun, Ruoming; Guo, Chuang; Qu, Kun.

Brief Bioinform ; 25(2)2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38349061

ABSTRACT

Extrachromosomal circular DNA (eccDNA) is currently attracting considerable attention from researchers due to its significant impact on tumor biogenesis. High-throughput sequencing (HTS) methods for eccDNA identification are continually evolving. However, an efficient pipeline for the integrative and comprehensive analysis of eccDNA obtained from HTS data is still lacking. Here, we introduce eccDNA-pipe, an accessible software package that offers a user-friendly pipeline for conducting eccDNA analysis starting from raw sequencing data. These raw data can come from various sequencing techniques such as whole-genome sequencing (WGS), Circle-seq and Circulome-seq, obtained through short-read sequencing or long-read sequencing. eccDNA-pipe presents a comprehensive solution for both upstream and downstream analysis, encompassing quality control and eccDNA identification in upstream analysis and downstream tasks such as eccDNA length distribution analysis, differential analysis of genes enriched with eccDNA and visualization of eccDNA structures. Notably, eccDNA-pipe automatically generates high-quality publication-ready plots. In summary, eccDNA-pipe provides a comprehensive and user-friendly pipeline for customized analysis of eccDNA research.

Subjects

Circular DNA, Neoplasms, Humans, Circular DNA/genetics, DNA/genetics, High-Throughput Nucleotide Sequencing, Whole Genome Sequencing
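The eccDNA length distribution analysis mentioned in the abstract above can be illustrated with a minimal sketch. This is not eccDNA-pipe's code or interface: the function name `length_distribution`, the `(chrom, start, end)` tuple layout (0-based, half-open, as in BED) and the example calls are all assumptions made for illustration.

```python
from collections import Counter


def length_distribution(intervals, bin_size=100):
    """Count circles per length bin.

    intervals: iterable of (chrom, start, end) tuples, assumed 0-based and
    half-open so that length = end - start.
    Returns {bin lower bound: number of circles}, sorted by bin.
    """
    counts = Counter()
    for _chrom, start, end in intervals:
        length = end - start
        counts[(length // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


# Hypothetical eccDNA calls, for illustration only.
calls = [
    ("chr1", 1000, 1180),
    ("chr2", 500, 860),
    ("chr2", 9000, 9370),
    ("chr7", 200, 1350),
]
print(length_distribution(calls))
```

The binned counts could then be passed to any plotting library; the bin width is a free choice and would normally be tuned to the size range of the circles under study.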

2.

BEERS2: RNA-Seq simulation through high fidelity in silico modeling.

Brooks, Thomas G; Lahens, Nicholas F; Mrcela, Antonijo; Sarantopoulou, Dimitra; Nayak, Soumyashant; Naik, Amruta; Sengupta, Shaon; Choi, Peter S; Grant, Gregory R.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38605641

ABSTRACT

Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM or BAM format, with the SAM or BAM output containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.

Subjects

Genome, RNA, RNA-Seq, RNA Sequence Analysis, Computer Simulation, RNA/genetics, High-Throughput Nucleotide Sequencing

3.

Benchmarking genome assembly methods on metagenomic sequencing data.

Zhang, Zhenmiao; Yang, Chao; Veldsman, Werner Pieter; Fang, Xiaodong; Zhang, Lu.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36917471

ABSTRACT

Metagenome assembly is an efficient approach to reconstruct microbial genomes from metagenomic sequencing data. Although short-read sequencing has been widely used for metagenome assembly, linked- and long-read sequencing have shown their advancements in assembly by providing long-range DNA connectedness. Many metagenome assembly tools were developed to simplify the assembly graphs and resolve the repeats in microbial genomes. However, there remains no comprehensive evaluation of metagenomic sequencing technologies, and there is a lack of practical guidance on selecting the appropriate metagenome assembly tools. This paper presents a comprehensive benchmark of 19 commonly used assembly tools applied to metagenomic sequencing datasets obtained from simulation, mock communities or human gut microbiomes. These datasets were generated using mainstream sequencing platforms, such as Illumina and BGISEQ short-read sequencing, 10x Genomics linked-read sequencing, and PacBio and Oxford Nanopore long-read sequencing. The assembly tools were extensively evaluated against many criteria, which revealed that long-read assemblers generated high contig contiguity but failed to reveal some medium- and high-quality metagenome-assembled genomes (MAGs). Linked-read assemblers obtained the highest number of overall near-complete MAGs from the human gut microbiomes. Hybrid assemblers using both short- and long-read sequencing were promising methods to improve both total assembly length and the number of near-complete MAGs. This paper also discussed the running time and peak memory consumption of these assembly tools and provided practical guidance on selecting them.

Subjects

Metagenome, Microbiota, Humans, Benchmarking, Microbiota/genetics, Metagenomics/methods, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods
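Contig contiguity, one of the criteria in the benchmark above, is commonly summarized with N50. The sketch below uses the standard definition and is not taken from the paper; the function name `n50` and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the largest length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in base pairs.
example = [80, 70, 50, 40, 30, 20]
print(n50(example))
```

Here the total is 290 bp, so the two longest contigs (80 + 70 = 150 bp) already cover half the assembly and the N50 is 70.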

4.

Characterization of fine geographic scale population genetics in sugar kelp (Saccharina latissima) using genome-wide markers.

Bråtelund, Signe; Ruttink, Tom; Goecke, Franz; Broch, Ole Jacob; Klemetsdal, Gunnar; Ødegård, Jørgen; Ergon, Åshild.

BMC Genomics ; 25(1): 901, 2024 Sep 30.

Article in English | MEDLINE | ID: mdl-39350004

ABSTRACT

BACKGROUND: Kelps are not only ecologically important, being primary producers and habitat-forming species; they also hold substantial economic potential. Expansion of the kelp cultivation industry raises interest in genetic improvement of kelp for cultivation, as well as concerns about genetic introgression from cultivated to wild populations. Thus, increased understanding of population genetics in natural kelp populations is crucial. Genotyping-by-sequencing (GBS) is a powerful tool for studying population genetics. Here, using Saccharina latissima (sugar kelp) as our study species, we characterize the population genetics at a fine geographic scale, while also investigating the influence of marker type (biallelic SNPs versus multi-allelic short read-backed haplotypes) and minor allele count (MAC) thresholds on estimated population genetic metrics. RESULTS: We examined 150 sporophytes from 10 locations within a small area in Mid-Norway. Employing GBS, we detected 20,710 bi-allelic SNPs and 42,264 haplotype alleles at 20,297 high quality GBS loci. We used both marker types as well as two MAC filtering thresholds (3 and 15) in the analyses. Overall, higher genetic diversity, more outbreeding and stronger substructure were estimated using haplotypes compared to SNPs, and with MAC 15 compared to MAC 3. The population displayed high genetic diversity (HE ranging from 0.18 to 0.37) and significant outbreeding (FIS ≤ -0.076). Construction of a genomic relationship matrix, however, revealed a few close relatives within sampling locations. The connectivity between sampling locations was high (FST ≤ 0.09), but subtle, yet significant, genetic substructure was detected, even between sampling locations separated by less than 2 km. Isolation-by-distance was significant and explained 15% of the genetic variation, while incorporation of predicted currents in an "isolation-by-oceanography" model explained a larger proportion (~27%). CONCLUSION: The studied population is diverse, significantly outbred and exhibits high connectivity, partly due to local currents. The use of genome-wide markers combined with permutation testing provides high statistical power to detect subtle population substructure and inbreeding or outbreeding. Short haplotypes extracted from GBS data and removal of rare alleles enhance the resolution. Careful consideration of marker type and filtering thresholds is crucial when comparing independent studies, as they profoundly influence numerical estimates of population genetic metrics.

Subjects

Population Genetics, Haplotypes, Kelp, Single Nucleotide Polymorphism, Kelp/genetics, Genetic Markers, Alleles, Genetic Variation, Edible Seaweeds, Laminaria
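One way to see why a minor allele count (MAC) threshold can shift diversity estimates, as reported in the abstract above, is to compute expected heterozygosity, HE = 1 - Σ p_i², before and after filtering. The sketch below is an illustration under stated assumptions, not the authors' workflow: the helper names and allele counts are hypothetical, loci are treated as biallelic SNPs, and 150 diploid sporophytes are assumed to contribute 300 allele copies per locus.

```python
def expected_heterozygosity(allele_counts):
    """HE = 1 - sum(p_i^2) over the allele frequencies at one locus."""
    total = sum(allele_counts.values())
    return 1.0 - sum((n / total) ** 2 for n in allele_counts.values())


def minor_allele_count(allele_counts):
    """Count of the rarer allele at a biallelic locus (0 if monomorphic)."""
    if len(allele_counts) < 2:
        return 0
    return min(allele_counts.values())


def mean_he(loci, min_mac):
    """Mean HE over loci whose minor allele count is at least min_mac."""
    kept = [c for c in loci if minor_allele_count(c) >= min_mac]
    return sum(expected_heterozygosity(c) for c in kept) / len(kept)


# Hypothetical allele counts at three SNP loci (300 allele copies each).
loci = [
    {"A": 290, "G": 10},
    {"C": 180, "T": 120},
    {"G": 297, "A": 3},
]
print(round(mean_he(loci, min_mac=3), 4))
print(round(mean_he(loci, min_mac=15), 4))
```

Loci with rare minor alleles have low HE, so dropping them with a stricter MAC threshold raises the mean. This matches the direction of the effect described in the abstract, although the paper's actual estimators may differ.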

5.

Haplotype determination of the Bombyx mori nucleopolyhedrovirus by Nanopore sequencing and linkage of single nucleotide variants.

Wennmann, Jörg T; Lim, Fang-Shiang; Senger, Sergei; Gani, Mudasir; Jehle, Johannes A; Keilwagen, Jens.

J Gen Virol ; 105(5)2024 05.

Article in English | MEDLINE | ID: mdl-38767624

ABSTRACT

Naturally occurring isolates of baculoviruses, such as the Bombyx mori nucleopolyhedrovirus (BmNPV), usually consist of numerous genetically different haplotypes. Deciphering the different haplotypes of such isolates is hampered by the large size of the dsDNA genome, as well as the short read length of next generation sequencing (NGS) techniques that are widely applied for baculovirus isolate characterization. In this study, we addressed this challenge by combining the accuracy of NGS to determine single nucleotide variants (SNVs) as genetic markers with the long read length of Nanopore sequencing technique. This hybrid approach allowed the comprehensive analysis of genetically homogeneous and heterogeneous isolates of BmNPV. Specifically, this allowed the identification of two putative major haplotypes in the heterogeneous isolate BmNPV-Ja by SNV position linkage. SNV positions, which were determined based on NGS data, were linked by the long Nanopore reads in a Position Weight Matrix. Using a modified Expectation-Maximization algorithm, the Nanopore reads were assigned according to the occurrence of variable SNV positions by machine learning. The cohorts of reads were de novo assembled, which led to the identification of BmNPV haplotypes. The method demonstrated the strength of the combined approach of short- and long-read sequencing techniques to decipher the genetic diversity of baculovirus isolates.

Subjects

Bombyx, Haplotypes, High-Throughput Nucleotide Sequencing, Nanopore Sequencing, Nucleopolyhedroviruses, Single Nucleotide Polymorphism, Nucleopolyhedroviruses/genetics, Nucleopolyhedroviruses/classification, Nucleopolyhedroviruses/isolation & purification, Animals, Nanopore Sequencing/methods, Bombyx/virology, High-Throughput Nucleotide Sequencing/methods, Viral Genome
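The abstract above describes assigning long reads to haplotypes with a modified Expectation-Maximization algorithm over linked SNV positions. The authors' modification is not described in the abstract, so the sketch below swaps in a plain textbook EM for a two-component Bernoulli mixture purely to illustrate the idea. The function name, the 0/1/None read encoding, the seeding scheme and the example reads are all assumptions.

```python
import math


def em_two_haplotypes(reads, n_iter=25):
    """Assign reads to one of two haplotypes with a plain EM.

    reads: list of equal-length lists, one entry per SNV position:
           1 = variant allele, 0 = reference allele, None = not covered.
    Returns (labels, theta); theta[h][j] is the estimated probability of
    observing the variant allele at position j on haplotype h.
    """
    n_pos = len(reads[0])
    # Deterministic, asymmetric start: haplotype 0 resembles the first read,
    # haplotype 1 its complement (0.5 where the first read has no coverage).
    seed = [0.5 if a is None else (0.8 if a == 1 else 0.2) for a in reads[0]]
    theta = [seed, [1.0 - p for p in seed]]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each haplotype for each read.
        resp = []
        for read in reads:
            log_like = []
            for h in range(2):
                total = math.log(weight[h])
                for j, allele in enumerate(read):
                    if allele is None:
                        continue
                    p = theta[h][j]
                    total += math.log(p if allele == 1 else 1.0 - p)
                log_like.append(total)
            top = max(log_like)
            scaled = [math.exp(v - top) for v in log_like]
            resp.append([s / sum(scaled) for s in scaled])
        # M-step: re-estimate mixture weights and allele probabilities;
        # pseudocounts keep every probability strictly inside (0, 1).
        for h in range(2):
            weight[h] = (sum(r[h] for r in resp) + 1.0) / (len(reads) + 2.0)
            for j in range(n_pos):
                num, den = 0.5, 1.0
                for read, r in zip(reads, resp):
                    if read[j] is not None:
                        num += r[h] * read[j]
                        den += r[h]
                theta[h][j] = num / den
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return labels, theta


# Hypothetical reads over four SNV positions; the last read carries one error.
demo_reads = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, None],
    [1, None, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
]
demo_labels, demo_theta = em_two_haplotypes(demo_reads)
print(demo_labels)
```

In a real analysis the resulting read groups would be assembled separately, as the abstract describes; the number of haplotypes, fixed at two here, would also need to be chosen or estimated.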

6.

The genome of Anoplarchus purpurescens (Stichaeidae) reflects its carnivorous diet.

Le, Ninh; Heras, Joseph; Herrera, Michelle J; German, Donovan P; Crummett, Lisa T.

Mol Genet Genomics ; 298(6): 1419-1434, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37690047

ABSTRACT

Digestion is driven by digestive enzymes and digestive enzyme gene copy number can provide insights on the genomic underpinnings of dietary specialization. The "Adaptive Modulation Hypothesis" (AMH) proposes that digestive enzyme activity, which increases with increased gene copy number, should correlate with substrate quantity in the diet. To test the AMH and reveal some of the genetics of herbivory vs carnivory, we sequenced, assembled, and annotated the genome of Anoplarchus purpurescens, a carnivorous prickleback fish in the family Stichaeidae, and compared the gene copy number for key digestive enzymes to that of Cebidichthys violaceus, a herbivorous fish from the same family. A highly contiguous genome assembly of high quality (N50 = 10.6 Mb) was produced for A. purpurescens, using combined long-read and short-read technology, with an estimated 33,842 protein-coding genes. The digestive enzymes that we examined include pancreatic α-amylase, carboxyl ester lipase, alanyl aminopeptidase, trypsin, and chymotrypsin. Anoplarchus purpurescens had fewer copies of pancreatic α-amylase (carbohydrate digestion) than C. violaceus (1 vs. 3 copies). Moreover, A. purpurescens had one fewer copy of carboxyl ester lipase (plant lipid digestion) than C. violaceus (4 vs. 5). We observed an expansion in copy number for several protein digestion genes in A. purpurescens compared to C. violaceus, including trypsin (5 vs. 3) and total aminopeptidases (6 vs. 5). Collectively, these genomic differences coincide with measured digestive enzyme activities (phenotypes) in the two species and they support the AMH. Moreover, this genomic resource is now available to better understand fish biology and dietary specialization.

Subjects

Carnivory, Perciformes, Animals, Trypsin/metabolism, Phylogeny, Pancreatic alpha-Amylases/metabolism, Fishes, Diet, Lipase/metabolism, Esters/metabolism

7.

Choice of assemblers has a critical impact on de novo assembly of SARS-CoV-2 genome and characterizing variants.

Islam, Rashedul; Raju, Rajan Saha; Tasnim, Nazia; Shihab, Istiak Hossain; Bhuiyan, Maruf Ahmed; Araf, Yusha; Islam, Tofazzal.

Brief Bioinform ; 22(5)2021 09 02.

Article in English | MEDLINE | ID: mdl-33822878

ABSTRACT

BACKGROUND: Coronavirus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global pandemic following its initial emergence in China. SARS-CoV-2 has a positive-sense single-stranded RNA virus genome of around 30 kb. Using next-generation sequencing technologies, a large number of SARS-CoV-2 genomes are being sequenced at an unprecedented rate and being deposited in public repositories. For the de novo assembly of the SARS-CoV-2 genomes, a myriad of assemblers is being used, although their impact on the assembly quality has not been characterized for this virus. In this study, we aim to understand the variability in assembly quality due to the choice of assembler. RESULTS: We performed 6648 de novo assemblies of 416 SARS-CoV-2 samples using eight different assemblers with different k-mer lengths. We used Illumina paired-end sequencing reads and compared the assembly quality of those assemblers. We showed that the choice of assembler plays a significant role in reconstructing the SARS-CoV-2 genome. Two metagenomic assemblers, MEGAHIT and metaSPAdes, performed better than the others in most of the assembly quality metrics, including recovery of a larger fraction of the genome, construction of larger contigs and higher N50 and NA50 values. We showed that at least 9% (259/2873) of the variants present in the assemblies between MEGAHIT and metaSPAdes are unique to one of the assembly methods. CONCLUSION: Our analyses indicate the critical role of assembly methods for assembling SARS-CoV-2 genome using short reads and their impact on variant characterization. This study could help guide future studies to determine the best-suited assembler for the de novo assembly of virus genomes.

Subjects

Viral Genome, Mutation, SARS-CoV-2/genetics, COVID-19/virology, Genetic Databases, Tandem Repeat Sequences
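The comparison of variant calls between two assemblers reported above can be sketched with set operations. The function name, the `(position, ref, alt)` tuple encoding and the example variant sets are hypothetical. Treating the denominator as the union of both call sets is an assumption, since the abstract does not say how 2873 was counted.

```python
def unique_variant_summary(variants_a, variants_b):
    """Summarize variants found by only one of two assemblers.

    Each variant is a (position, ref, alt) tuple.
    Returns (number unique to one method, size of the union, fraction).
    """
    a, b = set(variants_a), set(variants_b)
    union = a | b
    unique = a ^ b  # symmetric difference: present in exactly one set
    return len(unique), len(union), len(unique) / len(union)


# Hypothetical call sets for one sample, for illustration only.
calls_megahit = {(241, "C", "T"), (3037, "C", "T"), (23403, "A", "G")}
calls_metaspades = {
    (241, "C", "T"),
    (23403, "A", "G"),
    (14408, "C", "T"),
    (28881, "G", "A"),
}
print(unique_variant_summary(calls_megahit, calls_metaspades))

# The figure quoted in the abstract: 259 of 2873 variants.
print(round(259 / 2873 * 100, 1))
```

The last line confirms that 259/2873 rounds to 9.0%, consistent with the "at least 9%" stated in the abstract.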

8.

PERHAPS: Paired-End short Reads-based HAPlotyping from next-generation Sequencing data.

Huang, Jie; Pallotti, Stefano; Zhou, Qianling; Kleber, Marcus; Xin, Xiaomeng; King, Daniel A; Napolioni, Valerio.

Brief Bioinform ; 22(4)2021 07 20.

Article in English | MEDLINE | ID: mdl-33285565

ABSTRACT

The identification of rare haplotypes may greatly expand our knowledge of the genetic architecture of both complex and monogenic traits. To this aim, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the APOE classic polymorphism (*1/*2/*3/*4), since it represents one of the best examples of functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the large Whole Exome Sequencing (WES) and SNP-array data obtained from the multi-ethnic UK Biobank (UKBB, N=48,855). By applying PERHAPS, based on piecing together the paired-end reads according to their FASTQ-labels, we extracted the haplotype data, along with their frequencies and the individual diplotype. Concordance rates between WES directly called diplotypes and the ones generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), either when stratifying the sample by SNP-array genotyping batch or self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of obtained haplotype frequencies with the ones available from the 1000 Genomes Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximately 0.5%) in the African Yoruba population. Despite acknowledging some technical shortcomings, PERHAPS represents a novel and simple approach that will partly overcome the limitations in direct haplotype calling from short read-based sequencing.

Subjects

Algorithms, Human Genome, Haplotypes, High-Throughput Nucleotide Sequencing, Single Nucleotide Polymorphism, Apolipoproteins E/genetics, Human Genome Project, Humans
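The idea of reading a haplotype directly from read pairs that span both APOE SNPs can be sketched as follows. This is not the PERHAPS implementation: the function name, the `min_support` threshold and the input format are assumptions. The input is one (allele at rs429358, allele at rs7412) tuple per read pair on the forward strand, and the allele-to-haplotype table is the standard APOE definition.

```python
from collections import Counter

# Standard APOE haplotypes as (rs429358 allele, rs7412 allele).
APOE_HAPLOTYPES = {
    ("C", "T"): "APOE*1",
    ("T", "T"): "APOE*2",
    ("T", "C"): "APOE*3",
    ("C", "C"): "APOE*4",
}


def call_diplotype(pair_alleles, min_support=2):
    """Call a diplotype from read pairs covering both SNPs.

    pair_alleles: iterable of (rs429358 allele, rs7412 allele) tuples.
    A haplotype needs at least min_support read pairs to be reported,
    which is a hypothetical guard against isolated sequencing errors.
    Returns a sorted 2-tuple of haplotype names, or None if unsupported.
    """
    counts = Counter(
        APOE_HAPLOTYPES[p] for p in pair_alleles if p in APOE_HAPLOTYPES
    )
    supported = [h for h, n in counts.most_common() if n >= min_support][:2]
    if not supported:
        return None
    if len(supported) == 1:
        supported = supported * 2  # one supported haplotype: homozygous
    return tuple(sorted(supported))


# Hypothetical observations: six APOE*3 pairs, five APOE*4 pairs and one
# stray pair that looks like APOE*1.
pairs = [("T", "C")] * 6 + [("C", "C")] * 5 + [("C", "T")]
print(call_diplotype(pairs))
```

The single APOE*1-like pair falls below the support threshold and is ignored. A real caller would also need to handle base qualities and pairs that cover only one of the two sites.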

9.

ngsComposer: an automated pipeline for empirically based NGS data quality filtering.

Kuster, Ryan D; Yencho, G Craig; Olukolu, Bode A.

Brief Bioinform ; 22(5)2021 09 02.

Article in English | MEDLINE | ID: mdl-33822850

ABSTRACT

Next-generation sequencing (NGS) enables massively parallel acquisition of large-scale omics data; however, objective data quality filtering parameters are lacking. Although a useful metric, evidence reveals that platform-generated Phred values overestimate per-base quality scores. We have developed novel and empirically based algorithms that streamline NGS data quality filtering. The pipeline leverages known sequence motifs to enable empirical estimation of error rates, detection of erroneous base calls and removal of contaminating adapter sequence. The performance of motif-based error detection and quality filtering was further validated with read compression rates as an unbiased metric. Elevated error rates at read ends, where known motifs lie, tracked with propagation of erroneous base calls. Barcode swapping, an inherent problem with pooled libraries, was also effectively mitigated. The ngsComposer pipeline is suitable for various NGS protocols and platforms due to the universal concepts on which the algorithms are based.

Subjects

Algorithms, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Computer Simulation, Humans, Reproducibility of Results
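The contrast drawn above between platform-reported Phred values and empirically estimated error rates rests on the standard relation P = 10^(-Q/10). The sketch below shows that conversion alongside a naive mismatch count against a known motif. It is not ngsComposer's algorithm; the function names, the motif `TGCAG` and the observed sequences are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


def empirical_error_rate(observed, expected_motif):
    """Fraction of mismatching bases where a known motif is expected.

    observed: sequences read at the motif position, each the same length
    as expected_motif.
    """
    mismatches = 0
    bases = 0
    for seq in observed:
        for got, want in zip(seq, expected_motif):
            bases += 1
            mismatches += got != want
    return mismatches / bases


# Hypothetical motif and the bases actually read at that position.
motif = "TGCAG"
seen = ["TGCAG", "TGCAG", "TGCAT", "TGCAG"]
rate = empirical_error_rate(seen, motif)
print(rate, round(error_prob_to_phred(rate), 2))
print(phred_to_error_prob(30))
```

One mismatch in 20 bases gives an empirical rate of 0.05, equivalent to roughly Q13, whereas a reported Q30 would imply an error probability of 0.001. A gap of that kind is what the abstract means by overestimated quality scores.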

10.

Optical genome mapping and revisiting short-read genome sequencing data reveal previously overlooked structural variants disrupting retinal disease-associated genes.

de Bruijn, Suzanne E; Rodenburg, Kim; Corominas, Jordi; Ben-Yosef, Tamar; Reurink, Janine; Kremer, Hannie; Whelan, Laura; Plomp, Astrid S; Berger, Wolfgang; Farrar, G Jane; Ferenc Kovács, Árpád; Fajardy, Isabelle; Hitti-Malin, Rebekkah J; Weisschuh, Nicole; Weener, Marianna E; Sharon, Dror; Pennings, Ronald J E; Haer-Wigman, Lonneke; Hoyng, Carel B; Nelen, Marcel R; Vissers, Lisenka E L M; van den Born, L Ingeborgh; Gilissen, Christian; Cremers, Frans P M; Hoischen, Alexander; Neveling, Kornelia; Roosing, Susanne.

Genet Med ; 25(3): 100345, 2023 03.

Article in English | MEDLINE | ID: mdl-36524988

ABSTRACT

PURPOSE: Structural variants (SVs) play an important role in inherited retinal diseases (IRD). Although the identification of SVs significantly improved upon the availability of genome sequencing, it is expected that involvement of SVs in IRDs is higher than anticipated. We revisited short-read genome sequencing data to enhance the identification of gene-disruptive SVs. METHODS: Optical genome mapping was performed to improve SV detection in short-read genome sequencing-negative cases. In addition, reanalysis of short-read genome sequencing data was performed to improve the interpretation of SVs and to re-establish SV prioritization criteria. RESULTS: In a monoallelic USH2A case, optical genome mapping identified a pericentric inversion (173 megabases), with 1 breakpoint disrupting USH2A. Retrospectively, the variant could be observed in genome sequencing data but was previously deemed false positive. Reanalysis of short-read genome sequencing data (427 IRD cases) was performed, which yielded 30 pathogenic SVs affecting, among other genes, USH2A (n = 15), PRPF31 (n = 3), and EYS (n = 2). Eight of these (>25%) were overlooked during previous analyses. CONCLUSION: Critical evaluation of our findings allowed us to re-establish and improve our SV prioritization and interpretation guidelines, which will prevent missing pathogenic events in future analyses. Our data suggest that more attention should be paid to SV interpretation and the current contribution of SVs in IRDs is still underestimated.

Subjects

Human Genome, Retinal Diseases, Humans, Retrospective Studies, Human Genome/genetics, Chromosome Mapping, Sequence Analysis, Retinal Diseases/genetics, Genomic Structural Variation, Eye Proteins/genetics

11.

The role of plasmids in carbapenem resistant E. coli in Alameda County, California.

Walas, Nikolina; Slown, Samuel; Amato, Heather K; Lloyd, Tyler; Bender, Monica; Varghese, Vici; Pandori, Mark; Graham, Jay P.

BMC Microbiol ; 23(1): 147, 2023 05 22.

Article in English | MEDLINE | ID: mdl-37217873

ABSTRACT

BACKGROUND: Antimicrobial resistant infections continue to be a leading global public health crisis. Mobile genetic elements, such as plasmids, have been shown to play a major role in the dissemination of antimicrobial resistance (AMR) genes. Despite its ongoing threat to human health, surveillance of AMR in the United States is often limited to phenotypic resistance. Genomic analyses are important to better understand the underlying resistance mechanisms, assess risk, and implement appropriate prevention strategies. This study aimed to investigate the extent of plasmid-mediated antimicrobial resistance that can be inferred from short read sequences of carbapenem-resistant E. coli (CR-Ec) in Alameda County, California. E. coli isolates from healthcare locations in Alameda County were sequenced using an Illumina MiSeq and assembled with Unicycler. Genomes were categorized according to predefined multilocus sequence typing (MLST) and core genome multilocus sequence typing (cgMLST) schemes. Resistance genes were identified and corresponding contigs were predicted to be plasmid-borne or chromosome-borne using two bioinformatic tools (MOB-suite and mlplasmids). RESULTS: Among 82 CR-Ec isolates identified between 2017 and 2019, twenty-five sequence types (STs) were detected. ST131 was the most prominent (n = 17) followed closely by ST405 (n = 12). blaCTX-M were the most common ESBL genes and just over half (18/30) of these genes were predicted to be plasmid-borne by both MOB-suite and mlplasmids. Three genetically related groups of E. coli isolates were identified with cgMLST. One of the groups contained an isolate with a chromosome-borne blaCTX-M-15 gene and an isolate with a plasmid-borne blaCTX-M-15 gene. CONCLUSIONS: This study provides insights into the dominant clonal groups driving carbapenem-resistant E. coli infections in Alameda County, CA, USA clinical sites and highlights the relevance of whole-genome sequencing in routine local genomic surveillance. The finding of multi-drug resistant plasmids harboring high-risk resistance genes is of concern as it indicates a risk of dissemination to previously susceptible clonal groups, potentially complicating clinical and public health intervention.

Subjects

Escherichia coli Infections, Escherichia coli, Humans, Escherichia coli/genetics, Carbapenems/pharmacology, Multilocus Sequence Typing, Anti-Bacterial Agents/pharmacology, Plasmids/genetics, Escherichia coli Infections/epidemiology, beta-Lactamases/genetics, Microbial Sensitivity Tests

12.

Unmapped short reads from whole-genome sequencing indicate potential infectious pathogens in German Black Pied cattle.

Neumann, Guilherme B; Korkuc, Paula; Reißmann, Monika; Wolf, Manuel J; May, Katharina; König, Sven; Brockmann, Gudrun A.

Vet Res ; 54(1): 95, 2023 Oct 18.

Article in English | MEDLINE | ID: mdl-37853447

ABSTRACT

When resequencing animal genomes, some short reads cannot be mapped to the reference genome and are usually discarded. In this study, unmapped reads from 302 German Black Pied cattle were analyzed to identify potential pathogenic DNA. These unmapped reads were assembled and blasted against NCBI's database to identify bacterial and viral sequences. The results provided evidence for the presence of pathogens. We found sequences of Bovine parvovirus 3 and Mycoplasma species. These findings emphasize the information content of unmapped reads for gaining insight into bacterial and viral infections, which is important for veterinarians and epidemiologists.

Subjects

Cattle Diseases, Virus Diseases, Cattle, Animals, DNA Sequence Analysis/veterinary, Whole Genome Sequencing/veterinary, Virus Diseases/veterinary, Bacteria/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/veterinary
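The first step implied by the abstract above, collecting reads that did not map to the reference, can be sketched for SAM-formatted alignments. In the SAM format, bit 0x4 of the FLAG field marks an unmapped segment. The function name and example records below are hypothetical, and the study's actual tooling is not stated in the abstract.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_reads(sam_lines):
    """Yield (read name, sequence) for unmapped records in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG:
            yield fields[0], fields[9]  # QNAME and SEQ columns


# Hypothetical SAM records: one mapped read and two unmapped reads.
sam_text = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTAA\tIIIIIIII",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tCCCCAAAA\tIIIIIIII",
]
print(list(unmapped_reads(sam_text)))
```

The collected sequences would then be assembled and searched against a sequence database, as the abstract describes. For BAM files, a dedicated parser would replace the plain-text handling shown here.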

13.

Investigating the dark-side of the genome: a barrier to human disease variant discovery?

Ryan, Niamh M; Corvin, Aiden.

Biol Res ; 56(1): 42, 2023 Jul 20.

Article in English | MEDLINE | ID: mdl-37468985

ABSTRACT

The human genome contains regions that cannot be adequately assembled or aligned using next generation short-read sequencing technologies. More than 2500 genes are known to contain such 'dark' regions. In this study, we investigate the negative consequences of dark regions on gene discovery across a range of disease and study types, showing that dark regions are likely preventing researchers from identifying genetic variants relevant to human disease.

Subjects

Human Genome, High-Throughput Nucleotide Sequencing, Humans, Human Genome/genetics, DNA Sequence Analysis

14.

Decoding sex: Elucidating sex determination and how high-quality genome assemblies are untangling the evolutionary dynamics of sex chromosomes.

Ramos, Luana; Antunes, Agostinho.

Genomics ; 114(2): 110277, 2022 03.

Article in English | MEDLINE | ID: mdl-35104609

ABSTRACT

Sexual reproduction is a diverse and widespread process. In gonochoristic species, the differentiation of sexes occurs through diverse mechanisms, influenced by environmental and genetic factors. In most vertebrates, a master-switch gene is responsible for triggering a sex determination network. However, only a few genes have acquired master-switch functions, and this process is associated with the evolution of sex-chromosomes, which have a significant influence in evolution. Additionally, their highly repetitive regions impose challenges for high-quality sequencing, even using high-throughput, state-of-the-art techniques. Here, we review the mechanisms involved in sex determination and their role in the evolution of species, particularly vertebrates, focusing on sex chromosomes and the challenges involved in sequencing these genomic elements. We also address the improvements provided by the growth of sequencing projects, by generating a massive number of near-gapless, telomere-to-telomere, chromosome-level, phased assemblies, increasing the number and quality of sex-chromosome sequences available for further studies.

Subjects

Sex Chromosomes, Telomere, Animals, Repetitive Nucleic Acid Sequences, Sex Chromosomes/genetics, Telomere/genetics, Vertebrates/genetics

15.

Evaluating the activity of nonsense-mediated RNA decay via Nanopore direct RNA sequencing.

Li, Ying; Wan, Li; Zhang, Lili; Zhuo, Zhongling; Luo, Xuanmei; Cui, Jingyi; Liu, Ye; Su, Fei; Tang, Min; Xiao, Fei.

Biochem Biophys Res Commun ; 621: 67-73, 2022 09 17.

Article in English | MEDLINE | ID: mdl-35810593

ABSTRACT

Nonsense-mediated mRNA decay (NMD) and its regulation play an important role in eliminating faulty transcripts and controlling gene expression. However, measuring NMD activity and characterizing its targets remain challenging. In this study, we set out to establish Nanopore direct RNA sequencing in combination with quantitative real-time PCR (qPCR) as a method for analyzing NMD activity and its targets in cultured cell lines and clinical tissue samples. Nanopore RNA sequencing could detect more isoforms than short-read sequencing, especially in identifying novel isoforms and predicting isoforms annotated with premature termination codon (PTC). Changes in transcriptional isoforms of five genes (PRS, RPL12, SRSF2, PPIA, and TMEM208) could faithfully reflect NMD activity in the three cell lines and prostate cancer (PCA) samples. NMD activity in PCA samples varied, but some patients showed an increased trend. Together, Nanopore sequencing was superior in identifying NMD targets and evaluating NMD activity compared with short-read sequencing, and the NMD markers we screened may be used for measuring NMD activity in clinical patients.

Subjects

Nanopore Sequencing, Nanopores, Humans, Male, Membrane Proteins/metabolism, Nonsense Mediated mRNA Decay, Protein Isoforms/metabolism, RNA/metabolism, RNA Stability/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, RNA Sequence Analysis

16.

Comprehensive collection of genes and comparative analysis of full-length transcriptome sequences from Japanese larch (Larix kaempferi) and Kuril larch (Larix gmelinii var. japonica).

Mishima, Kentaro; Hirakawa, Hideki; Iki, Taiichi; Fukuda, Yoko; Hirao, Tomonori; Tamura, Akira; Takahashi, Makoto.

BMC Plant Biol ; 22(1): 470, 2022 Oct 04.

Artigo em Inglês | MEDLINE | ID: mdl-36192701

RESUMO

BACKGROUND: Japanese larch (Larix kaempferi) is an economically important deciduous conifer species that grows in cool-temperate forests and is endemic to Japan. Kuril larch (L. gmelinii var. japonica) is a variety of Dahurian larch that is naturally distributed in the Kuril Islands and Sakhalin. The hybrid larch (L. gmelinii var. japonica × L. kaempferi) exhibits heterosis, which manifests as rapid juvenile growth and high resistance to vole grazing. Since these superior characteristics have been valued by forestry managers, the hybrid larch is one of the most important plantation species in Hokkaido. To accelerate molecular breeding in these species, we collected and compared full-length cDNA isoforms (Iso-Seq) and RNA-Seq short-read, and merged them to construct candidate gene as reference for both Larix species. To validate the results, candidate protein-coding genes (ORFs) related to some flowering signal-related genes âwere screened from the reference sequences, and the phylogenetic relationship with closely related species was elucidated. RESULTS: Using the isoform sequencing of PacBio RS ll and the de novo assembly of RNA-Seq short-read sequences, we identified 50,690 and 38,684 ORFs in Japanese larch and Kuril larch, respectively. BUSCO completeness values were 90.5% and 92.1% in the Japanese and Kuril larches, respectively. After comparing the collected ORFs from the two larch species, a total of 19,813 clusters, comprising 22,571 Japanese larch ORFs and 22,667 Kuril larch ORFs, were contained in the intersection of the Venn diagram. In addition, we screened several ORFs related to flowering signals (SUPPRESSER OF OVEREXPRESSION OF CO1: SOC1, LEAFY: LFY, FLOWERING Locus T: FT, CONSTANCE: CO) from both reference sequences, and very similar found in other species. 
CONCLUSIONS: The collected ORFs will be useful as reference sequences for molecular breeding of Japanese and Kuril larches, and also for clarifying the evolution of the conifer genome and investigating functional genomics.

Subjects

Larix , DNA, Complementary , Japan , Larix/genetics , Phylogeny , Transcriptome

17.

RNA sequencing and its applications in cancer and rare diseases.

Ergin, Selvi; Kherad, Nasim; Alagoz, Meryem.

Mol Biol Rep ; 49(3): 2325-2333, 2022 Mar.

Article in English | MEDLINE | ID: mdl-34988891

ABSTRACT

With the invention of RNA sequencing over a decade ago, the diagnosis and identification of gene-related diseases entered a new phase that enabled more accurate analysis of diseases that are difficult to approach and analyze. RNA sequencing has enabled in-depth study of transcriptomes in different species and provided a better understanding of rare diseases and the taxonomic classification of various eukaryotic organisms. The development of single-cell, short-read, long-read and direct RNA sequencing using both blood and biopsy specimens, together with recent advances in computational analysis programs, has become indispensable to medical professionals' ability to identify the origin and cause of genetic disorders. Altogether, these advantages have advanced treatment design, since RNA sequencing can detect genes that confer resistance to existing therapies and help medical professionals take a further step in improving treatment methods towards higher effectiveness and fewer side effects. Therefore, it is essential for all researchers and scientists to have deeper insight into all available methods of RNA sequencing when taking a step in therapy design.

Subjects

Neoplasms , Rare Diseases , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/diagnosis , Neoplasms/genetics , Rare Diseases/diagnosis , Rare Diseases/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Exome Sequencing

18.

Progress in Methods for Copy Number Variation Profiling.

Gordeeva, Veronika; Sharova, Elena; Arapidi, Georgij.

Int J Mol Sci ; 23(4)2022 Feb 15.

Article in English | MEDLINE | ID: mdl-35216262

ABSTRACT

Copy number variations (CNVs) are the predominant class of structural genomic variations involved in the processes of evolutionary adaptation, genomic disorders, and disease progression. Compared with single-nucleotide variants, there have been challenges associated with the detection of CNVs owing to their diverse sizes. However, the field has seen significant progress in the past 20-30 years. This has been made possible due to the rapid development of molecular diagnostic methods which ensure a more detailed view of the genome structure, further complemented by recent advances in computational methods. Here, we review the major approaches that have been used to routinely detect CNVs, ranging from cytogenetics to the latest sequencing technologies, and then cover their specific features.

Subjects

DNA Copy Number Variations/genetics , Genome/genetics , Genomics/methods , Cytogenetics/methods , Disease Progression , Humans , Polymorphism, Single Nucleotide/genetics

19.

Testing assembly strategies of Francisella tularensis genomes to infer an evolutionary conservation analysis of genomic structures.

Neubert, Kerstin; Zuchantke, Eric; Leidenfrost, Robert Maximilian; Wünschiers, Röbbe; Grützke, Josephine; Malorny, Burkhard; Brendebach, Holger; Al Dahouk, Sascha; Homeier, Timo; Hotzel, Helmut; Reinert, Knut; Tomaso, Herbert; Busch, Anne.

BMC Genomics ; 22(1): 822, 2021 Nov 14.

Article in English | MEDLINE | ID: mdl-34773979

ABSTRACT

BACKGROUND: We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis. Benchmarking allowed in-depth analyses of genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation "short-read" and third-generation "long-read" sequencing methods. RESULTS: We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure with particular emphasis on insertion sequences and the Francisella pathogenicity island. Of the eight short-read assembly methods, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon) and short-read-first assembly along with scaffolding based on long reads (Unicycler, SPAdes). Hybrid assembly can resolve large repetitive regions best with a "long-read first" approach. CONCLUSIONS: Genomic structures of the Francisella pathogenicity islands frequently showed misassembly. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. A phylogenetic structure of insertion sequences and the evolution within the clades elucidated the clade structure of the highly conserved F. tularensis.

Subjects

Francisella tularensis , Genome, Bacterial , DNA Transposable Elements , Francisella tularensis/genetics , Genomics , High-Throughput Nucleotide Sequencing , Phylogeny , Sequence Analysis, DNA

20.

Development and evaluation of a rapid and cost-efficient NGS-based MHC class I genotyping method for macaques by using a prevalent short-read sequencer.

Tanimoto, Kousuke; Naruse, Taeko K; Matano, Tetsuro; Kimura, Akinori.

Immunogenetics ; 73(2): 175-186, 2021 Apr.

Article in English | MEDLINE | ID: mdl-33447871

ABSTRACT

The rhesus macaque is one of the most widely used primate model animals for immunological research on infectious diseases, including human immunodeficiency virus (HIV) infection. It is well known that major histocompatibility complex (MHC) class I genotypes affect susceptibility to, and disease progression of, simian immunodeficiency virus (SIV) infection in rhesus macaques, which resembles HIV infection in humans. Immunological investigations require reliable determination of MHC genotypes, which is why several next-generation sequencing (NGS)-based methods have been established. In general, NGS-based genotyping methods using short amplicons are not often applied to MHC because of the increasing number of alleles and inevitable ambiguity in allele detection, although short-read sequencing systems have the advantage of being commonly used today. In this study, we developed a new high-throughput NGS-based genotyping method for MHC class I alleles in rhesus macaques and cynomolgus macaques. By using our method, 95% and 100% of the alleles identified by a PCR cloning-based method were detected in rhesus macaques and cynomolgus macaques, respectively, which were highly correlated with their expression levels. It was noted that, in a simulation of the new-allele detection step, artificial alleles differing by a few nucleotides from a known allele could be identified with high accuracy, and that we could detect a real novel allele from a rhesus macaque sample. These findings support the idea that our method could be adapted for primate animal models such as macaques to reduce the cost and labor of previous NGS-based MHC genotyping.

Subjects

Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Histocompatibility Antigens Class I/genetics , Alleles , Animals , Genes, MHC Class I/genetics , Genotype , Macaca , Reproducibility of Results , Sequence Analysis, DNA
