Results 1 - 20 of 251
1.
Trends Genet ; 39(9): 649-671, 2023 09.
Article in English | MEDLINE | ID: mdl-37230864

ABSTRACT

Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.


Subjects
Genomics , High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Sequence Analysis, DNA/methods , Computational Biology , Gene Expression Profiling/methods
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38851298

ABSTRACT

Deletion is a crucial type of genomic structural variation and is associated with numerous genetic diseases. The advent of third-generation sequencing technology has facilitated the analysis of complex genomic structures and the elucidation of the mechanisms underlying phenotypic changes and disease onset due to genomic variants. Importantly, it has introduced innovative perspectives for deletion variants calling. Here we propose a method named Dual Attention Structural Variation (DASV) to analyze deletion structural variations in sequencing data. DASV converts gene alignment information into images and integrates them with genomic sequencing data through a dual attention mechanism. Subsequently, it employs a multi-scale network to precisely identify deletion regions. Compared with four widely used genome structural variation calling tools: cuteSV, SVIM, Sniffles and PBSV, the results demonstrate that DASV consistently achieves a balance between precision and recall, enhancing the F1 score across various datasets. The source code is available at https://github.com/deconvolution-w/DASV.
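The precision, recall and F1 comparison mentioned in this abstract can be illustrated with a minimal sketch. This is not the DASV evaluation code: the 50% reciprocal-overlap matching rule, the function names and the toy intervals are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) of a deletion on one chromosome, end exclusive


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the length of the longer interval."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])


def score_calls(calls: List[Interval], truth: List[Interval], min_ro: float = 0.5):
    """Greedy one-to-one matching of calls to truth; returns (precision, recall, f1)."""
    unmatched = list(truth)  # copy so the caller's list is not modified
    tp = 0
    for call in calls:
        for t in unmatched:
            if reciprocal_overlap(call, t) >= min_ro:
                unmatched.remove(t)  # each truth deletion can be matched once
                tp += 1
                break
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: two of four calls match two of three true deletions.
truth = [(100, 200), (500, 900), (1500, 1600)]
calls = [(110, 205), (520, 880), (3000, 3100), (4000, 4050)]
precision, recall, f1 = score_calls(calls, truth)
```

Because F1 is the harmonic mean of precision and recall, a caller that gains recall only by emitting many false positives does not improve its F1, which is why the metric suits a "balance" claim.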


Subjects
High-Throughput Nucleotide Sequencing , Software , Humans , High-Throughput Nucleotide Sequencing/methods , Sequence Deletion , Sequence Analysis, DNA/methods , Algorithms , Genomics/methods , Computational Biology/methods
3.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37478372

ABSTRACT

Access to accurate viral genomes is important to downstream data analysis. Third-generation sequencing (TGS) has recently become a popular platform for virus sequencing because of its long read length. However, its per-base error rate, which is higher than that of next-generation sequencing, can lead to genomes with errors. Polishing tools are thus needed to correct errors either before or after sequence assembly. Despite promising results of available polishing tools, there is still room to improve the error correction performance to perform more accurate genome assembly. The errors, particularly those in coding regions, can hamper analysis such as lineage identification and variant monitoring. In this work, we developed a novel pipeline, HMMPolish, for correcting (polishing) errors in protein-coding regions of known RNA viruses. This tool can be applied to either raw TGS reads or the assembled sequences of the target virus. By utilizing profile Hidden Markov Models of protein families/domains in known viruses, HMMPolish can correct errors that are ignored by available polishers. We extensively validated HMMPolish on 34 datasets that covered four clinically important viruses, including HIV-1, influenza-A, norovirus, and severe acute respiratory syndrome coronavirus 2. These datasets contain reads with different properties, such as sequencing depth and platforms (PacBio or Nanopore). The benchmark results against popular/representative polishers show that HMMPolish competes favorably on error correction in coding regions of known RNA viruses.


Subjects
COVID-19 , RNA Viruses , Viruses , Humans , Sequence Analysis, DNA/methods , Genome , High-Throughput Nucleotide Sequencing/methods
4.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36527429

ABSTRACT

Extensive investigation of gene fusions in cancer has led to the discovery of novel biomarkers and therapeutic targets. To date, most studies have neglected chromosomal rearrangement-independent fusion transcripts and complex fusion structures such as double or triple-hop fusions, and fusion-circRNAs. In this review, we untangle fusion-related terminology and propose a classification system involving both gene and transcript fusions. We highlight the importance of RNA-level fusions and how long-read sequencing approaches can improve detection and characterization. Moreover, we discuss novel bioinformatic tools to identify fusions in long-read sequencing data and strategies to experimentally validate and functionally characterize fusion transcripts.


Subjects
Neoplasms , Humans , Neoplasms/genetics , Computational Biology , Gene Fusion , RNA/genetics
5.
Hum Genomics ; 18(1): 110, 2024 Sep 29.
Article in English | MEDLINE | ID: mdl-39343938

ABSTRACT

Spinal muscular atrophy (SMA) is the second most common fatal genetic disease in infancy. It is caused by deletion or intragenic pathogenic variants of the causative gene SMN1, which cause degeneration of anterior horn motor neurons and lead to progressive myasthenia and muscle atrophy. Early treatment improves motor function and prognosis in patients with SMA, but drugs are expensive and do not cure the disease. Therefore, carrier screening seems to be the most effective way to prevent SMA birth defects. In this study, we genetically analyzed 1400 samples using multiplex ligation-dependent probe amplification (MLPA) and quantitative polymerase chain reaction (qPCR), and compared the consistency of the results. We randomly selected 44 samples with consistent MLPA and qPCR results for comprehensive SMA analysis (CASMA) using a long-read sequencing (LRS)-based approach. CASMA results showed 100% consistency and visually and intuitively explained the inconsistency between exon 7 and exon 8 copy numbers detected by MLPA in 13 samples. A total of 16 samples showed inconsistent MLPA and qPCR results for SMN1 exon 7. CASMA was performed on all of these samples, and the results were consistent with those of resampling for MLPA and qPCR detection. CASMA also detected an additional intragenic variant, c.-39A>G, in a sample with two copies of SMN1 (RT02). Finally, we detected 23 SMA carriers, with an estimated carrier rate of 1/61 in this cohort. In addition, CASMA identified the "2 + 0" carrier status of SMN1 and SMN2 in a family by analyzing the genotypes of only three samples (parents and one sibling). CASMA has great advantages over MLPA and qPCR assays, and could become a powerful technical support for large-scale screening of SMA.
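The reported carrier rate follows directly from the two counts given in the abstract (23 carriers among 1400 samples). The short check below is only a worked arithmetic example; the function name is hypothetical.

```python
from fractions import Fraction


def carrier_rate(carriers: int, screened: int) -> Fraction:
    """Exact carrier frequency as a fraction of screened samples."""
    return Fraction(carriers, screened)


rate = carrier_rate(23, 1400)   # 23/1400, roughly 1.64%
one_in_n = round(1 / rate)      # 1400/23 is about 60.87, which rounds to 61
```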


Subjects
Exons , Muscular Atrophy, Spinal , Survival of Motor Neuron 1 Protein , Humans , Muscular Atrophy, Spinal/genetics , Muscular Atrophy, Spinal/diagnosis , Survival of Motor Neuron 1 Protein/genetics , Female , Male , Exons/genetics , Genetic Carrier Screening/methods , Multiplex Polymerase Chain Reaction/methods , Sequence Analysis, DNA/methods
6.
Semin Cell Dev Biol ; 127: 155-165, 2022 07.
Article in English | MEDLINE | ID: mdl-34838434

ABSTRACT

It is well established that DNA base modifications play a key role in gene regulation during development and in response to environmental stress. This type of epigenetic control of development and environmental responses has been intensively studied over the past few decades. Similar to DNA, various RNA species also undergo modifications that play important roles in, for example, RNA splicing, protein translation, and the avoidance of immune surveillance by host. More than 160 different types of RNA modifications have been identified. In addition to base modifications, RNA modification also involves splicing of pre-mRNAs, leading to as many as tens of transcript isoforms from a single pre-RNA, especially in higher organisms. However, the function, prevalence and distribution of RNA modifications are poorly understood. The lack of a suitable method for the reliable identification of RNA modifications constitutes a significant challenge to studying their functions. This review focuses on the technologies that enable de novo identification of RNA base modifications and the alternatively spliced mRNA transcripts.


Subjects
Alternative Splicing , RNA Splicing , Alternative Splicing/genetics , Protein Isoforms/metabolism , RNA/genetics , RNA/metabolism , RNA Precursors/genetics , RNA Precursors/metabolism , RNA Splicing/genetics , RNA, Messenger/genetics
7.
J Virol ; 97(11): e0070523, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37843370

ABSTRACT

IMPORTANCE: The lack of a reliable method to accurately detect when replication-competent HIV has been cleared is a major challenge in developing a cure. This study introduces a new approach called the HIVepsilon-seq (HIVε-seq) assay, which uses long-read sequencing technology and bioinformatics to scrutinize the HIV genome at the nucleotide level, distinguishing between defective and intact HIV. This study included 30 participants on antiretroviral therapy, including 17 women, and was able to discriminate between defective and genetically intact viruses at the single DNA strand level. The HIVε-seq assay is an improvement over previous methods, as it requires minimal sample, less specialized lab equipment, and offers a shorter turnaround time. The HIVε-seq assay offers a promising new tool for researchers to measure the intact HIV reservoir, advancing efforts towards finding a cure for this devastating disease.


Subjects
HIV Infections , HIV , Proviruses , Female , Humans , CD4-Positive T-Lymphocytes , DNA, Viral/genetics , HIV Infections/drug therapy , HIV Infections/epidemiology , HIV Infections/virology , Nucleotides , Proviruses/genetics , Viral Load , Sequence Analysis, DNA , Male , Sex Factors , HIV/genetics
8.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35769001

ABSTRACT

The poly(A) tail is a dynamic addition to the eukaryotic mRNA and the change in its length plays an essential role in regulating gene expression through affecting nuclear export, mRNA stability and translation. Only recently high-throughput sequencing strategies began to emerge for transcriptome-wide profiling of poly(A) tail length in diverse developmental stages and organisms. However, there is currently no easy-to-use and universal tool for measuring poly(A) tails in sequencing data from different sequencing protocols. Here we established PolyAtailor, a unified and efficient framework, for identifying and analyzing poly(A) tails from PacBio-based long reads or next generation short reads. PolyAtailor provides two core functions for measuring poly(A) tails, namely Tail_map and Tail_scan, which can be used for profiling tails with or without using a reference genome. Particularly, PolyAtailor can identify all potential tails in a read, providing users with detailed information such as tail position, tail length, tail sequence and tail type. Moreover, PolyAtailor integrates rich functions for poly(A) tail and poly(A) site analyses, such as differential poly(A) length analysis, poly(A) site identification and annotation, and statistics and visualization of base composition in tails. We compared PolyAtailor with three latest methods, FLAMAnalysis, FLEPSeq and PAIsoSeqAnalysis, using data from three sequencing protocols in HeLa samples and Arabidopsis. Results show that PolyAtailor is effective in measuring poly(A) tail length and detecting significance of differential poly(A) length, which achieves much higher sensitivity and accuracy than competing methods. PolyAtailor is available at https://github.com/BMILAB/PolyAtailor.


Subjects
Poly A , Polyadenylation , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Poly A/genetics , Poly A/metabolism , RNA, Messenger/genetics , Sequence Analysis, RNA/methods
9.
Microb Ecol ; 87(1): 66, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38700528

ABSTRACT

Despite the importance of wood-inhabiting fungi for nutrient cycling and ecosystem functions, their ecology, especially their community assembly, remains largely unexplored. In this study, we analyzed the wood-inhabiting fungal richness, community composition, and phylogenetics using PacBio sequencing. Contrary to the expectation that deterministic processes, especially environmental filtering through wood physicochemical properties, control the community assembly of wood-inhabiting fungi, we showed that both deterministic and stochastic processes can contribute strongly to the community assembly of wood-inhabiting fungi in this tropical forest. We demonstrated that the dynamics of stochastic and deterministic processes varied with wood decomposition stages. The initial stage was mainly governed by a deterministic process (homogeneous selection), whereas the early and later decomposition stages were governed by stochastic processes (ecological drift). Deterministic processes were driven largely by wood physicochemical properties (especially macronutrients and hemicellulose) rather than soil physicochemical factors. We found that fine-scale fungal-fungal interactions, especially the network topology, modularity, and keystone taxa of wood-inhabiting fungal communities, differed strongly between initial and decomposing deadwood. This study contributes to a better understanding of the ecological processes of wood-inhabiting fungi in tropical regions, where knowledge of wood-inhabiting fungi is highly limited.


Subjects
Forests , Fungi , Mycobiome , Wood , Wood/microbiology , Fungi/genetics , Fungi/classification , Fungi/isolation & purification , Tropical Climate , Phylogeny , High-Throughput Nucleotide Sequencing , Biodiversity
10.
BMC Pediatr ; 24(1): 330, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38741052

ABSTRACT

BACKGROUND: Thalassemias represent some of the most common monogenic diseases worldwide and are caused by variations in human hemoglobin genes which disrupt the balance of synthesis between the alpha and beta globin chains. Thalassemia gene detection technology is the gold standard to achieve accurate detection of thalassemia, but in clinical practice, most of the tests cover only common genotypes, which can easily lead to missed diagnosis or misdiagnosis of rare thalassemia genotypes. CASE PRESENTATION: We present the case of an 18-year-old Chinese female with abnormal values of routine hematological indices who was admitted for genetic screening for thalassemia. Genomic DNA was extracted and used for the genetic assays. Gap polymerase chain reaction and agarose gel electrophoresis were performed to detect HBA gene deletions, while PCR-reverse dot blot hybridization was used to detect point mutations in the HBA and HBB genes. Next-generation sequencing and third-generation sequencing (TGS) were used to identify known and potentially novel genotypes of thalassemia. We identified a novel complex variant α^(Hb Westmead)α^(Hb Westmead)α^(anti3.7)/-α^(3.7) in a patient with rare alpha-thalassemia. CONCLUSIONS: Our study identified a novel complex variant that expands the spectrum of thalassemia gene variants. Meanwhile, the study suggests that TGS could effectively improve the specificity of thalassemia gene detection, and has promising potential for the discovery of novel thalassemia genotypes, which could also improve the accuracy of genetic counseling. Couples who are thalassemia carriers have the opportunity to reduce their risk of having a child with thalassemia.


Subjects
alpha-Thalassemia , Humans , alpha-Thalassemia/genetics , alpha-Thalassemia/diagnosis , Female , Adolescent , High-Throughput Nucleotide Sequencing , Genotype , Genetic Testing/methods , Point Mutation , Hemoglobins, Abnormal/genetics
11.
J Invertebr Pathol ; 205: 108121, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38705355

ABSTRACT

The oak processionary moth (OPM) Thaumetopoea processionea is a pest of oak trees and poses health risks to humans due to the urticating setae of later instar larvae. For this reason, it is difficult to rear OPM under laboratory conditions, carry out bioassays or examine larvae for pathogens. Biological control targets the early larval instars and is based primarily on commercial preparations of Bacillus thuringiensis ssp. kurstaki (Btk). To test the entomopathogenic potential of other spore-forming bacteria, a user-friendly bioassay system was developed that (i) applies bacterial spore suspensions by oak bud dipping, (ii) targets first instar larvae through feeding exposure and (iii) takes into account their group-feeding behavior. A negligible mortality in the untreated control proved the functionality of the newly established bioassay system. Whereas the commercial Btk HD-1 strain was used as a bioassay standard and confirmed as being highly efficient, a Bacillus wiedmannii strain was ineffective in killing OPM larvae. Larvae that died during the infection experiment were further subjected to Nanopore sequencing as a metagenomic approach to entomopathogen detection. This further corroborated that B. wiedmannii was not able to infect and establish in OPM, but identified potential insect pathogenic species from the genera Serratia and Pseudomonas.


Subjects
Biological Assay , Larva , Moths , Pest Control, Biological , Animals , Moths/microbiology , Biological Assay/methods , Pest Control, Biological/methods , Larva/microbiology , Metagenome , Quercus/microbiology , Bacillus thuringiensis/genetics
12.
Proc Natl Acad Sci U S A ; 118(5)2021 02 02.
Article in English | MEDLINE | ID: mdl-33495335

ABSTRACT

5-Methylcytosine (5mC) is an important type of epigenetic modification. Bisulfite sequencing (BS-seq) has limitations, such as severe DNA degradation. Using single molecule real-time sequencing, we developed a methodology to directly examine 5mC. This approach holistically examined kinetic signals of a DNA polymerase (including interpulse duration and pulse width) and sequence context for every nucleotide within a measurement window, termed the holistic kinetic (HK) model. The measurement window of each analyzed double-stranded DNA molecule comprised 21 nucleotides with a cytosine in a CpG site in the center. We used amplified DNA (unmethylated) and M.SssI-treated DNA (methylated) (M.SssI being a CpG methyltransferase) to train a convolutional neural network. The area under the curve for differentiating methylation states using such samples was up to 0.97. The sensitivity and specificity for genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The HK model was then tested on human-mouse hybrid fragments in which each member of the hybrid had a different methylation status. The model was also tested on human genomic DNA molecules extracted from various biological samples, such as buffy coat, placental, and tumoral tissues. The overall methylation levels deduced by the HK model were well correlated with those by BS-seq (r = 0.99; P < 0.0001) and allowed the measurement of allele-specific methylation patterns in imprinted genes. Taken together, this methodology has provided a system for simultaneous genome-wide genetic and epigenetic analyses.
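The 21-nucleotide measurement window described in this abstract (a CpG cytosine in the center, 10 bases on each side) can be sketched as follows. This covers only the sequence-context part; the kinetic signals (interpulse duration and pulse width) and the neural network are not modeled, and the function name and `flank` parameter are assumptions for illustration.

```python
from typing import Iterator, Tuple


def cpg_windows(seq: str, flank: int = 10) -> Iterator[Tuple[int, str]]:
    """Yield (position, window) for every CpG cytosine with full flanks.

    Each window has length 2 * flank + 1 (21 by default), with the
    cytosine of the CpG site at the center index `flank`.
    """
    if flank < 1:
        raise ValueError("flank must be at least 1")
    seq = seq.upper()
    for i in range(flank, len(seq) - flank):
        if seq[i:i + 2] == "CG":
            yield i, seq[i - flank:i + flank + 1]


# Hypothetical toy sequence with a single CpG at position 10.
toy = "A" * 10 + "CG" + "T" * 10
windows = list(cpg_windows(toy))
```

CpG sites closer than `flank` bases to either end of the molecule are skipped in this sketch, since a full window cannot be built around them.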


Subjects
Cytosine/metabolism , DNA Methylation/genetics , Sequence Analysis, DNA , Single Molecule Imaging , Animals , Base Sequence , DNA/metabolism , Genomic Imprinting , Humans , Mice , Models, Biological
13.
Proc Natl Acad Sci U S A ; 118(50)2021 12 14.
Article in English | MEDLINE | ID: mdl-34873045

ABSTRACT

In the field of circulating cell-free DNA, most of the studies have focused on short DNA molecules (e.g., <500 bp). The existence of long cell-free DNA molecules has been poorly explored. In this study, we demonstrated that single-molecule real-time sequencing allowed us to detect and analyze a substantial proportion of long DNA molecules from both fetal and maternal sources in maternal plasma. Such molecules were beyond the size detection limits of short-read sequencing technologies. The proportions of long cell-free DNA molecules in maternal plasma over 500 bp were 15.5%, 19.8%, and 32.3% for the first, second, and third trimesters, respectively. The longest fetal-derived plasma DNA molecule observed was 23,635 bp. Long plasma DNA molecules demonstrated predominance of A or G 5' fragment ends. Pregnancies with preeclampsia demonstrated a reduction in long maternal plasma DNA molecules, reduced frequencies for selected 5' 4-mer end motifs ending with G or A, and increased frequencies for selected motifs ending with T or C. Finally, we have developed an approach that employs the analysis of methylation patterns of the series of CpG sites on a long DNA molecule for determining its tissue origin. This approach achieved an area under the curve of 0.88 in differentiating between fetal and maternal plasma DNA molecules, enabling the determination of maternal inheritance and recombination events in the fetal genome. This work opens up potential clinical utilities of long cell-free DNA analysis in maternal plasma including noninvasive prenatal testing of monogenic diseases and detection/monitoring of pregnancy-associated disorders such as preeclampsia.
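Two of the quantities reported in this abstract, the proportion of fragments longer than 500 bp and the frequencies of 5' 4-mer end motifs, are simple to compute once fragment lengths and sequences are available. The sketch below assumes fragments are given as plain strings read from their 5' end; the function names and toy inputs are hypothetical and do not reproduce the study's pipeline.

```python
from collections import Counter
from typing import Dict, Iterable


def long_fraction(lengths: Iterable[int], cutoff: int = 500) -> float:
    """Fraction of fragments strictly longer than `cutoff` bp."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n > cutoff) / len(lengths)


def end_motif_frequencies(fragments: Iterable[str], k: int = 4) -> Dict[str, float]:
    """Relative frequency of the first k bases at the 5' end of each fragment."""
    counts = Counter(f[:k].upper() for f in fragments if len(f) >= k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: c / total for motif, c in counts.items()}


# Hypothetical toy inputs.
frac = long_fraction([100, 200, 600, 1000])
motifs = end_motif_frequencies(["ACGTAA", "ACGTTT", "GGGGA", "TT"])
```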


Subjects
Cell-Free Nucleic Acids/blood , Cell-Free Nucleic Acids/genetics , Adult , Chromosomes/genetics , Computer Simulation , Female , Fetus , Humans , Pregnancy , Single Molecule Imaging
14.
Hemoglobin ; : 1-6, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39007770

ABSTRACT

α-thalassemia major (α-TM) often causes Hb Bart's (γ4) hydrops fetalis and severe obstetric complications in the mother. Step-wise screening of couples at risk of having offspring affected by α-TM is an efficient prevention method, but some rare thalassemia genotypes cannot be detected. A 32-year-old male with low HbA2 (2.4%) and mild anemia underwent real-time PCR-based multicolor melting curve analysis (MMCA) because his wife was a --SEA deletion carrier. The result of multiplex ligation-dependent probe amplification (MLPA) suggested the existence of a --SEA deletion in the proband. A novel deletion of the α-globin gene cluster was found using self-designed MLPA probes combined with longer PCR, and was further characterized by third-generation sequencing as a 16.8 kb deletion (hg38, chr16:165,236-182,113). A fragment ranging from 153,226 to 154,538 (GRCh38/hg38) was identified, which suggested a homologous recombination event. Third-generation sequencing is accurate and efficient in obtaining precise information on complex structural variations.
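As a quick consistency check on the reported deletion, the span can be computed from the breakpoint coordinates, read here as 165,236 and 182,113 on chr16 (hg38). Whether the interval is inclusive is an assumption, and the helper name is hypothetical.

```python
def deletion_span(start: int, end: int, inclusive: bool = True) -> int:
    """Length in bp of a deletion between two breakpoint coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + (1 if inclusive else 0)


span_bp = deletion_span(165_236, 182_113)  # 16,878 bp if both ends are included
span_kb = span_bp / 1000                   # about 16.9 kb
```

The result of roughly 16.88 kb is in line with the 16.8 kb stated in the abstract if that figure was truncated rather than rounded.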

15.
BMC Genomics ; 24(1): 229, 2023 May 02.
Article in English | MEDLINE | ID: mdl-37131128

ABSTRACT

BACKGROUND: Mitochondrial genome sequences have become critical to the study of biodiversity. Genome skimming and other short-read based methods are the most common approaches, but they are not well-suited to scale up to multiplexing hundreds of samples. Here, we report on a new approach to sequence hundreds to thousands of complete mitochondrial genomes in parallel using long-amplicon sequencing. We amplified the mitochondrial genome of 677 specimens in two partially overlapping amplicons and implemented an asymmetric PCR-based indexing approach to multiplex 1,159 long amplicons together on a single PacBio SMRT Sequel II cell. We also tested this method on Oxford Nanopore Technologies (ONT) MinION R9.4 to assess if this method could be applied to other long-read technologies. We implemented several optimizations that make this method significantly more efficient than alternative mitochondrial genome sequencing methods. RESULTS: With the PacBio sequencing data we recovered at least one of the two fragments for 96% of samples (~ 80-90%) with mean coverage ~ 1,500x. The ONT data recovered less than 50% of input fragments likely due to low throughput and the design of the Barcoded Universal Primers which were optimized for PacBio sequencing. We compared a single mitochondrial gene alignment to half and full mitochondrial genomes and found, as expected, increased tree support with longer alignments, though whole mitochondrial genomes were not significantly better than half mitochondrial genomes. CONCLUSIONS: This method can effectively capture thousands of long amplicons in a single run and be used to build more robust phylogenies quickly and effectively. We provide several recommendations for future users depending on the evolutionary scale of their system. A natural extension of this method is to collect multi-locus datasets consisting of mitochondrial genomes and several long nuclear loci at once.


Subjects
Genome, Mitochondrial , Nanopore Sequencing , Nanopores , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Biodiversity
16.
BMC Plant Biol ; 23(1): 399, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37605165

ABSTRACT

The environment in Antarctica is characterized by low temperature, intense UVB and few vegetation types. Pohlia nutans M211 is a bryophyte, a group that forms the primary vegetation in Antarctica and thrives in the harsh Antarctic environment. The transcriptional profile of Pohlia nutans M211 under low temperature and high UVB conditions was analyzed by third-generation and second-generation sequencing to explore its polar adaptation mechanism in the extreme Antarctic environment. In comparison to earlier second-generation sequencing techniques, a total of 43,101 non-redundant transcripts and 10,532 lncRNA transcripts were obtained, which were longer and more accurate. The results of GO, KEGG, AS (alternative splicing), and WGCNA (weighted gene co-expression network analysis) of DEGs (differentially expressed genes), combined with biochemical kit assays, revealed that antioxidant, secondary metabolite and photosynthesis pathways were the key adaptive pathways of Pohlia nutans M211 to the extreme Antarctic environment. Furthermore, low temperature and strong UVB were closely linked for the first time by the gene HY5 (elongated hypocotyl 5), which forms a protein interaction network identified through PPI (protein-protein interaction network) analysis. The UVR8 module, photosynthetic module, secondary metabolite synthesis module, and temperature response module were the key components of the PPI network. In conclusion, this study will help to further explore the polar adaptation mechanism of Antarctic plants represented by bryophytes and to enrich the polar gene resources.


Subjects
Bryophyta , Bryopsida , Antioxidants , Antarctic Regions , Photosynthesis , Bryophyta/genetics
17.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34453168

ABSTRACT

Real-world evaluations of metagenomic reconstructions are challenged by distinguishing reconstruction artifacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only and hybrid assembly approaches on four different metagenomic samples of varying complexity. We demonstrate how different assembly approaches affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to assess the metagenomic data-based protein predictions. Our findings pave the way for critical assessments of metagenomic reconstructions. We propose a reference-independent solution, which exploits the synergistic effects of multi-omic data integration for the in situ study of microbiomes using long-read sequencing data.


Subjects
Computational Biology/methods , Metagenome , Metagenomics/methods , Drug Resistance, Microbial , High-Throughput Nucleotide Sequencing/methods , Humans
18.
Clin Chem ; 69(2): 168-179, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36322427

ABSTRACT

BACKGROUND: Recent studies using single molecule, real-time (SMRT) sequencing revealed a substantial population of analyzable long cell-free DNA (cfDNA) in plasma. Potential clinical utilities of such long cfDNA in pregnancy and cancer have been demonstrated. However, the performance of different long-read sequencing platforms for the analysis of long cfDNA remains unknown. METHODS: Size biases of SMRT sequencing by Pacific Biosciences (PacBio) and nanopore sequencing by Oxford Nanopore Technologies (ONT) were evaluated using artificial mixtures of sonicated human and mouse DNA of different sizes. cfDNA from plasma samples of pregnant women at different trimesters, hepatitis B carriers, and patients with hepatocellular carcinoma was sequenced with the 2 platforms. RESULTS: Both platforms showed biases to sequence longer (1500 bp vs 200 bp) DNA fragments, with PacBio showing a stronger bias (5-fold overrepresentation of long fragments vs 2-fold in ONT). Percentages of cfDNA fragments >500 bp were around 6-fold higher in PacBio compared with ONT. End motif profiles of cfDNA from PacBio and ONT were similar, yet exhibited platform-dependent patterns. Tissue-of-origin analysis based on single-molecule methylation patterns showed comparable performance on both platforms. CONCLUSIONS: SMRT sequencing generated data with higher percentages of long cfDNA compared with nanopore sequencing. Yet, a higher number of long cfDNA fragments eligible for the tissue-of-origin analysis could be obtained from nanopore sequencing due to its much higher throughput. When analyzing the size and end motif of cfDNA, one should be aware of the analytical characteristics and possible biases of the sequencing platforms being used.


Subjects
Cell-Free Nucleic Acids , Liver Neoplasms , Nanopore Sequencing , Humans , Female , Pregnancy , Animals , Mice , Cell-Free Nucleic Acids/genetics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , DNA/genetics
19.
RNA Biol ; 20(1): 281-295, 2023 01.
Article in English | MEDLINE | ID: mdl-37272060

ABSTRACT

Breast Cancer Gene 1 (BRCA1) is a tumour suppressor protein that modulates multiple biological processes including genomic stability and DNA damage repair. Although the main BRCA1 protein is well characterized, further proteomics studies have already identified additional BRCA1 isoforms with lower molecular weights. However, the accurate nucleotide sequence determination of their corresponding mRNAs is still a barrier, mainly due to the increased mRNA length of BRCA1 (~5.5 kb) and the limitations of the already implemented sequencing approaches. In the present study, we designed and employed a multiplexed hybrid sequencing approach (Hybrid-seq), based on nanopore and semi-conductor sequencing, aiming to detect BRCA1 alternative transcripts in a panel of human cancer and non-cancerous cell lines. The implementation of the described Hybrid-seq approach led to the generation of highly accurate long sequencing reads that enabled the identification of a wide spectrum of BRCA1 splice variants (BRCA1 sv.7 - sv.52), thus deciphering the transcriptional landscape of the human BRCA1 gene. In addition, demultiplexing of the sequencing data unveiled the expression profile and abundance of the described BRCA1 mRNAs in breast, ovarian, prostate, colorectal, lung and brain cancer as well as in non-cancerous human cell lines. Finally, in silico analysis supports that multiple detected mRNAs harbour open reading frames, being highly expected to encode putative protein isoforms with conserved domains, thus providing new insights into the complex roles of BRCA1 in genomic stability and DNA damage repair.


Subjects
BRCA1 Protein , Breast Neoplasms , Humans , Female , BRCA1 Protein/genetics , BRCA1 Protein/metabolism , Genes, BRCA1 , DNA Repair/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism , Genomic Instability , Breast Neoplasms/genetics
20.
Mol Biol Rep ; 50(11): 9587-9599, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37787843

ABSTRACT

BACKGROUND: Analytical validity is a prerequisite to use a next generation sequencing (NGS)-based application as an in vitro diagnostic test or a companion diagnostic in clinical practice. Currently, in the United States and the European Union, the intended use of such NGS-based tests does not refer to guided drug therapy on the basis of pharmacogenetic profiling of drug metabolizing enzymes, although the value of pharmacogenetic testing has been reported. However, in research, a large variety of NGS-based tests are used and have been confirmed to be at least comparable to array-based testing. METHODS AND RESULTS: A systematic evaluation was performed screening and assessing published literature on analytical validation of NGS applications for pharmacogenetic profiling of CYP2C9, CYP2C19, CYP2D6, VKORC1 and/or UGT1A1. Although NGS applications are also increasingly used for implementation assessments in clinical practice, we show in the present systematic literature evaluation that published information on the current status of analytical validation of NGS applications targeting drug metabolizing enzymes is scarce. Furthermore, a comprehensive performance evaluation of whole exome and whole genome sequencing with the intended use for pharmacogenetic profiling has not been published so far. CONCLUSIONS: A standard in reporting on analytical validation of NGS-based tests is not in place yet. Therefore, many relevant performance criteria are not addressed in published literature. For an appropriate analytical validation of an NGS-based qualitative test for pharmacogenetic profiling at least accuracy, precision, limit of detection and specificity should be addressed to facilitate the implementation of such tests in clinical use.


Subjects
High-Throughput Nucleotide Sequencing , Pharmacogenetics , Pharmacogenetics/methods , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing , Cytochrome P-450 CYP2D6 , Exome