Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 37
Filtrar
1.
Brief Bioinform ; 25(5)2024 Jul 25.
Artigo em Inglês | MEDLINE | ID: mdl-39177264

RESUMO

Recent nanopore sequencing system (R10.4) has enhanced base calling accuracy and is being increasingly utilized for detecting CpG methylation state. However, the robustness and universality of the methylation calling model in officially supplied Dorado remains poorly tested. In this study, we obtained heterogeneous datasets from human and plant sources to carry out comprehensive evaluations, which showed that Dorado performed significantly different across datasets. We therefore developed deep neural networks and implemented several optimizations in training a new model called DeepBAM. DeepBAM achieved superior and more stable performances compared with Dorado, including higher area under the ROC curves (98.47% on average and up to 7.36% improvement) and F1 scores (94.97% on average and up to 16.24% improvement) across the datasets. DeepBAM-based whole genome methylation frequencies have achieved >0.95 correlations with BS-seq on four of five datasets, outperforming Dorado in all instances. It enables unraveling allele-specific methylation patterns, including regions of transposable elements. The enhanced performance of DeepBAM paves the way for broader applications of nanopore sequencing in CpG methylation studies.


Assuntos
Ilhas de CpG , Metilação de DNA , Sequenciamento por Nanoporos , Sequenciamento por Nanoporos/métodos , Humanos , Software , Análise de Sequência de DNA/métodos , Redes Neurais de Computação
2.
Mol Cell ; 71(2): 306-318.e7, 2018 07 19.
Artigo em Inglês | MEDLINE | ID: mdl-30017583

RESUMO

DNA N6-methyladenine (6mA) modification is the most prevalent DNA modification in prokaryotes, but whether it exists in human cells and whether it plays a role in human diseases remain enigmatic. Here, we showed that 6mA is extensively present in the human genome, and we cataloged 881,240 6mA sites accounting for ∼0.051% of the total adenines. [G/C]AGG[C/T] was the most significantly associated motif with 6mA modification. 6mA sites were enriched in the coding regions and mark actively transcribed genes in human cells. DNA 6mA and N6-demethyladenine modification in the human genome were mediated by methyltransferase N6AMT1 and demethylase ALKBH1, respectively. The abundance of 6mA was significantly lower in cancers, accompanied by decreased N6AMT1 and increased ALKBH1 levels, and downregulation of 6mA modification levels promoted tumorigenesis. Collectively, our results demonstrate that DNA 6mA modification is extensively present in human cells and the decrease of genomic DNA 6mA promotes human tumorigenesis.


Assuntos
Adenina/análogos & derivados , Homólogo AlkB 1 da Histona H2a Dioxigenase/metabolismo , Genoma Humano , DNA Metiltransferases Sítio Específica (Adenina-Específica)/metabolismo , Adenina/metabolismo , Homólogo AlkB 1 da Histona H2a Dioxigenase/genética , Animais , Carcinogênese/genética , DNA/genética , Metilação de DNA , Xenoenxertos , Humanos , Camundongos , Camundongos Nus , DNA Metiltransferases Sítio Específica (Adenina-Específica)/genética
3.
Bioinformatics ; 39(1)2023 01 01.
Artigo em Inglês | MEDLINE | ID: mdl-36548365

RESUMO

MOTIVATION: Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Due to the cost of sequencing, the depth of whole-genome sequencing for per individual sample must be small. However, the existing single nucleotide polymorphism (SNP) callers are aimed at high-coverage Nanopore sequencing reads. Detecting the SNP variants on low-coverage Nanopore sequencing data is still a challenging problem. RESULTS: We developed a novel deep learning-based SNP calling method, NanoSNP, to identify the SNP sites (excluding short indels) based on low-coverage Nanopore sequencing reads. In this method, we design a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, the pileup model in NanoSNP utilizes the naive pileup feature to predict a subset of SNP sites with a Bi-long short-term memory (LSTM) network. These SNP sites are phased and used to divide the low-coverage Nanopore reads into different haplotypes. Finally, the long-range haplotype feature and short-range pileup feature are extracted from each haplotype. The haplotype model combines two features and predicts the genotype for the candidate site using a Bi-LSTM network. To evaluate the performance of NanoSNP, we compared NanoSNP with Clair, Clair3, Pepper-DeepVariant and NanoCaller on the low-coverage (∼16×) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes HG002-HG007, respectively. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs on low-coverage Nanopore sequencing data, including the difficult-to-map regions and major histocompatibility complex regions in the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×. AVAILABILITY AND IMPLEMENTATION: https://github.com/huangnengCSU/NanoSNP.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Sequenciamento por Nanoporos , Nanoporos , Humanos , Haplótipos , Software , Polimorfismo de Nucleotídeo Único , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos
4.
J Clin Microbiol ; 59(8): e0007921, 2021 07 19.
Artigo em Inglês | MEDLINE | ID: mdl-33952598

RESUMO

While China experienced a peak and decline in coronavirus disease 2019 (COVID-19) cases at the start of 2020, regional outbreaks continuously emerged in subsequent months. Resurgences of COVID-19 have also been observed in many other countries. In Guangzhou, China, a small outbreak, involving less than 100 residents, emerged in March and April 2020, and comprehensive and near-real-time genomic surveillance of SARS-CoV-2 was conducted. When the numbers of confirmed cases among overseas travelers increased, public health measures were enhanced by shifting from self-quarantine to central quarantine and SARS-CoV-2 testing for all overseas travelers. In an analysis of 109 imported cases, we found diverse viral variants distributed in the global viral phylogeny, which were frequently shared within households but not among passengers on the same flight. In contrast to the viral diversity of imported cases, local transmission was predominately attributed to two specific variants imported from Africa, including local cases that reported no direct or indirect contact with imported cases. The introduction events of the virus were identified or deduced before the enhanced measures were taken. These results show the interventions were effective in containing the spread of SARS-CoV-2, and they rule out the possibility of cryptic transmission of viral variants from the first wave in January and February 2020. Our study provides evidence and emphasizes the importance of controls for overseas travelers in the context of the pandemic and exemplifies how viral genomic data can facilitate COVID-19 surveillance and inform public health mitigation strategies.


Assuntos
COVID-19 , SARS-CoV-2 , África , Teste para COVID-19 , China/epidemiologia , Genômica , Humanos
5.
Nat Methods ; 14(11): 1072-1074, 2017 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-28945707

RESUMO

We present a tool that combines fast mapping, error correction, and de novo assembly (MECAT; accessible at https://github.com/xiaochuanle/MECAT) for processing single-molecule sequencing (SMS) reads. MECAT's computing efficiency is superior to that of current tools, while the results MECAT produces are comparable or improved. MECAT enables reference mapping or de novo assembly of large genomes using SMS reads on a single computer.


Assuntos
Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos , Algoritmos , Software
6.
Bioinformatics ; 35(22): 4586-4595, 2019 11 01.
Artigo em Inglês | MEDLINE | ID: mdl-30994904

RESUMO

MOTIVATION: The Oxford Nanopore sequencing enables to directly detect methylation states of bases in DNA from reads without extra laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction using Nanopore reads. RESULTS: In this study, we develop DeepSignal, a deep learning method to detect DNA methylation states from Nanopore sequencing reads. Testing on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 shows that DeepSignal can achieve higher performance at both read level and genome level on detecting 6 mA and 5mC methylation states comparing to previous hidden Markov model (HMM) based methods. DeepSignal achieves similar performance cross different DNA methylation bases, different DNA methylation motifs and both singleton and mixed DNA CpG. Moreover, DeepSignal requires much lower coverage than those required by HMM and statistics based methods. DeepSignal can achieve 90% above accuracy for detecting 5mC and 6 mA using only 2× coverage of reads. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20× coverage of reads, which is much better than HMM based methods. Especially, DeepSignal can predict methylation states of 5% more DNA CpGs that previously cannot be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting methylation states of DNA bases. AVAILABILITY AND IMPLEMENTATION: DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal. SUPPLEMENTARY INFORMATION: Supplementary data are available at bioinformatics online.


Assuntos
Metilação de DNA , Sequenciamento de Nucleotídeos em Larga Escala , Aprendizado Profundo , Escherichia coli , Humanos , Sequenciamento por Nanoporos , Análise de Sequência de DNA
7.
BMC Genomics ; 20(1): 508, 2019 Jun 18.
Artigo em Inglês | MEDLINE | ID: mdl-31215402

RESUMO

BACKGROUND: DNA methylation is an important epigenetic modification. Recently the developed single-molecule real-time (SMRT) sequencing technology provided an efficient way to detect DNA N6-methyladenine (6mA) modification that played an important role in epigenetic and positively regulated gene expression. In addition, the gene expression was also regulated by genetic variation. However, the relationship between DNA 6mA modification and variation is still unknown. RESULTS: We collected the SMRT long-reads DNA, Illumina short reads DNA and RNA datasets from the young leaves of Herrania umbratica, and used them to detect 35,654 6mA modification sites, 829,894 DNA variations and 60,672 RNA variations respectively, among which, there are 303 DNA variations and 19 RNA variations with 6mA modification, and 57,468 transmitted genetic variations from DNA to RNA. The results illustrated that the genes with 6mA modification were significant disadvantage to mutate than those genes without modification (p-value< 4.9e-08). And result from the linear regression model showed the 6mA densities of genes were associated with the transmitted variations type 0/1 to 1/1 (p-value < 0.001). CONCLUSIONS: The variations of DNA and RNA in genes with 6mA modification were significant less than those in unmodified genes. Furthermore, the variations in 6mA modified genes were easily transmitted from DNA to RNA, especially the transmitted variation from DNA heterozygote to RNA homozygote.


Assuntos
Adenosina/análogos & derivados , DNA de Plantas/genética , DNA de Plantas/metabolismo , Variação Genética/genética , Genoma de Planta/genética , Magnoliopsida/genética , RNA de Plantas/genética , Adenosina/metabolismo , DNA Intergênico/genética , DNA Intergênico/metabolismo , DNA de Plantas/química , Heterozigoto , Homozigoto , Magnoliopsida/metabolismo
8.
Genome Biol ; 25(1): 107, 2024 04 26.
Artigo em Inglês | MEDLINE | ID: mdl-38671502

RESUMO

Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.


Assuntos
Variações do Número de Cópias de DNA , Genoma Humano , Humanos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Software , Sequenciamento por Nanoporos/métodos , Análise de Sequência de DNA/métodos , Genômica/métodos
9.
J Proteome Res ; 12(1): 328-35, 2013 Jan 04.
Artigo em Inglês | MEDLINE | ID: mdl-23163785

RESUMO

Mass spectrometry has become one of the most important technologies in proteomic analysis. Tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have been already proposed, e.g. Sequest, OMSSA, X!Tandem, Mascot, etc., they are mainly based on statistical models considering only peak-matches between experimental and theoretical spectra, but not peak intensity information. Moreover, different algorithms gave different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and, thus, enhancing the ability of identification. Compared with Mascot, Sequest, and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets than the current algorithms at 1% False Discovery Rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/ .


Assuntos
Algoritmos , Peptídeos , Proteínas , Espectrometria de Massas em Tandem , Bases de Dados de Proteínas , Internet , Modelos Estatísticos , Peptídeos/genética , Peptídeos/isolamento & purificação , Probabilidade , Proteínas/genética , Proteínas/isolamento & purificação , Software
10.
Nat Commun ; 14(1): 1250, 2023 03 06.
Artigo em Inglês | MEDLINE | ID: mdl-36878904

RESUMO

Canonical three-dimensional (3D) genome structures represent the ensemble average of pairwise chromatin interactions but not the single-allele topologies in populations of cells. Recently developed Pore-C can capture multiway chromatin contacts that reflect regional topologies of single chromosomes. By carrying out high-throughput Pore-C, we reveal extensive but regionally restricted clusters of single-allele topologies that aggregate into canonical 3D genome structures in two human cell types. We show that fragments in multi-contact reads generally coexist in the same TAD. In contrast, a concurrent significant proportion of multi-contact reads span multiple compartments of the same chromatin type over megabase distances. Synergistic chromatin looping between multiple sites in multi-contact reads is rare compared to pairwise interactions. Interestingly, the single-allele topology clusters are cell type-specific even inside highly conserved TADs in different types of cells. In summary, HiPore-C enables global characterization of single-allele topologies at an unprecedented depth to reveal elusive genome folding principles.


Assuntos
Cromatina , Humanos , Alelos , Cromatina/genética
SELEÇÃO DE REFERÊNCIAS
Detalhe da pesquisa