1.
Genome Biol ; 25(1): 107, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38671502

ABSTRACT

Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.


Subject(s)
DNA Copy Number Variations , Genome, Human , Humans , High-Throughput Nucleotide Sequencing/methods , Software , Nanopore Sequencing/methods , Sequence Analysis, DNA/methods , Genomics/methods
2.
Nat Commun ; 14(1): 4054, 2023 07 08.
Article in English | MEDLINE | ID: mdl-37422489

ABSTRACT

Long single-molecular sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
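The accuracy and Area Under the Curve (AUC) values quoted above are standard binary-classification metrics. As a minimal sketch (not ccsmeth's own evaluation code), assuming per-read methylation probabilities and 0/1 ground-truth labels such as those implied by the M.SssI-treated (methylated) and PCR-treated (unmethylated) training DNA, both metrics can be computed as follows; the labels and probabilities are made up for illustration.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of calls where (prob >= threshold) agrees with the 0/1 label."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return correct / len(labels)


def auc(labels, probs):
    """ROC AUC as the probability that a randomly chosen methylated (1) call
    scores higher than a randomly chosen unmethylated (0) call; ties count 0.5."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one methylated and one unmethylated label")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = methylated, 0 = unmethylated) and predicted probabilities.
labels = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.2, 0.6, 0.4, 0.1]
print(round(accuracy(labels, probs), 3))  # 0.667
print(round(auc(labels, probs), 3))       # 0.889
```

The pairwise AUC loop is quadratic in the number of calls; a rank-based computation scales better for genome-wide evaluations.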


Subject(s)
5-Methylcytosine , DNA , Humans , Consensus , DNA/genetics , Sequence Analysis, DNA/methods , DNA Methylation , High-Throughput Nucleotide Sequencing/methods
3.
Nat Commun ; 14(1): 2631, 2023 05 06.
Article in English | MEDLINE | ID: mdl-37149708

ABSTRACT

Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from a low read throughput. Here, we introduce HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS) to achieve high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield >10 million high-accuracy long-reads in a single PacBio Sequel II SMRT Cell 8M. We also report the development of scISA-Tools that demultiplex HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. We apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression in them. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method and it can accelerate the burgeoning field of long-read single-cell transcriptomics.
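The demultiplexing step can be pictured with a deliberately simplified sketch. It assumes a hypothetical fixed linker between concatenated cDNAs and a hypothetical fixed-length cell barcode at the start of each cDNA, and it uses exact string matching. These layout details are illustrative assumptions, not the actual HIT-scISOseq read structure or the scISA-Tools algorithm, which has to tolerate sequencing errors.

```python
# Hypothetical read layout: [barcode + cDNA] LINKER [barcode + cDNA] LINKER ...
LINKER = "ACGTACGTAC"   # hypothetical linker sequence
BARCODE_LEN = 4         # hypothetical cell-barcode length


def split_concatenated_read(read, linker=LINKER, barcode_len=BARCODE_LEN):
    """Split a concatenated read at exact linker matches and return
    (cell_barcode, cdna) pairs; segments too short to hold a barcode
    plus at least one cDNA base are discarded."""
    pieces = []
    for segment in read.split(linker):
        if len(segment) > barcode_len:
            pieces.append((segment[:barcode_len], segment[barcode_len:]))
    return pieces


read = "AAAAGGGCCC" + LINKER + "TTTTCCCGGGA" + LINKER + "GG"
print(split_concatenated_read(read))
# [('AAAA', 'GGGCCC'), ('TTTT', 'CCCGGGA')]
```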


Subject(s)
RNA Isoforms , RNA , RNA Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods , Consensus , Protein Isoforms/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, RNA
4.
Nat Commun ; 14(1): 1250, 2023 03 06.
Article in English | MEDLINE | ID: mdl-36878904

ABSTRACT

Canonical three-dimensional (3D) genome structures represent the ensemble average of pairwise chromatin interactions but not the single-allele topologies in populations of cells. The recently developed Pore-C method can capture multiway chromatin contacts that reflect regional topologies of single chromosomes. By carrying out high-throughput Pore-C (HiPore-C), we reveal extensive but regionally restricted clusters of single-allele topologies that aggregate into canonical 3D genome structures in two human cell types. We show that fragments in multi-contact reads generally coexist in the same TAD. In contrast, a significant proportion of multi-contact reads concurrently span multiple compartments of the same chromatin type over megabase distances. Synergistic chromatin looping between multiple sites in multi-contact reads is rare compared to pairwise interactions. Interestingly, the single-allele topology clusters are cell type-specific even inside highly conserved TADs in different types of cells. In summary, HiPore-C enables global characterization of single-allele topologies at an unprecedented depth to reveal elusive genome folding principles.
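Checking whether all fragments of a multi-contact read coexist in the same TAD reduces to an interval lookup. The sketch below is illustrative only: the TAD boundaries, fragment coordinates and helper names are hypothetical, a single chromosome is assumed, and each fragment is represented by one position.

```python
from bisect import bisect_right

# Hypothetical TADs on one chromosome as sorted, non-overlapping [start, end) intervals.
TADS = [(0, 1_000_000), (1_000_000, 2_500_000), (2_500_000, 4_000_000)]
TAD_STARTS = [start for start, _ in TADS]


def tad_index(pos):
    """Index of the TAD containing pos, or None if pos lies outside every TAD."""
    i = bisect_right(TAD_STARTS, pos) - 1
    if i >= 0 and pos < TADS[i][1]:
        return i
    return None


def same_tad(fragment_positions):
    """True if every fragment of a multi-contact read falls in one and the same TAD."""
    indices = {tad_index(p) for p in fragment_positions}
    return len(indices) == 1 and None not in indices


reads = [
    [120_000, 450_000, 900_000],        # all fragments in TAD 0
    [1_100_000, 2_400_000],             # all fragments in TAD 1
    [800_000, 1_200_000, 3_000_000],    # spans TADs 0, 1 and 2
]
fraction = sum(same_tad(r) for r in reads) / len(reads)
print(round(fraction, 3))  # 0.667
```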


Subject(s)
Chromatin , Humans , Alleles , Chromatin/genetics
5.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36548365

ABSTRACT

MOTIVATION: Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Because of sequencing costs, the whole-genome sequencing depth per individual sample must be kept low. However, existing single nucleotide polymorphism (SNP) callers are designed for high-coverage Nanopore sequencing reads, and detecting SNPs in low-coverage Nanopore sequencing data remains a challenging problem. RESULTS: We developed NanoSNP, a novel deep learning-based SNP calling method that identifies SNP sites (excluding short indels) from low-coverage Nanopore sequencing reads. NanoSNP uses a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, the pileup model in NanoSNP uses the naive pileup feature to predict a subset of SNP sites with a bidirectional long short-term memory (Bi-LSTM) network. These SNP sites are phased and used to divide the low-coverage Nanopore reads into different haplotypes. Next, the long-range haplotype feature and the short-range pileup feature are extracted from each haplotype. Finally, the haplotype model combines the two features and predicts the genotype of each candidate site using a Bi-LSTM network. To evaluate the performance of NanoSNP, we compared it with Clair, Clair3, Pepper-DeepVariant and NanoCaller on low-coverage (∼16×) Nanopore sequencing reads. We also performed cross-genome testing on each of six human genomes, HG002-HG007. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs in low-coverage Nanopore sequencing data, including difficult-to-map regions and major histocompatibility complex regions of the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×. AVAILABILITY AND IMPLEMENTATION: https://github.com/huangnengCSU/NanoSNP.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
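To make the idea of a pileup feature concrete, here is a minimal sketch that counts bases at each reference position and flags candidate SNP sites with a simple allele-frequency threshold. It is not NanoSNP's method (NanoSNP feeds pileup features to a Bi-LSTM); the thresholds `min_depth` and `min_alt_frac` are hypothetical, and the alignments are assumed to be gapless (start, sequence) pairs.

```python
from collections import Counter


def pileup_counts(reference, alignments):
    """Count bases at each reference position from gapless alignments,
    given as (start, read_sequence) pairs (a simplifying assumption: no indels)."""
    counts = [Counter() for _ in reference]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1
    return counts


def candidate_snps(reference, alignments, min_depth=3, min_alt_frac=0.3):
    """Positions where the most frequent non-reference base reaches min_alt_frac."""
    candidates = []
    for pos, c in enumerate(pileup_counts(reference, alignments)):
        depth = sum(c.values())
        if depth < min_depth:
            continue
        alts = [(n, b) for b, n in c.items() if b != reference[pos]]
        if alts:
            n, base = max(alts)
            if n / depth >= min_alt_frac:
                candidates.append((pos, reference[pos], base, n / depth))
    return candidates


reference = "ACGTACGTAC"
alignments = [(0, "ACGTACGTAC"), (0, "ACGAACGTAC"), (2, "GAACGT"), (1, "CGTACG")]
print(candidate_snps(reference, alignments))  # [(3, 'T', 'A', 0.5)]
```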


Subject(s)
Nanopore Sequencing , Nanopores , Humans , Haplotypes , Software , Polymorphism, Single Nucleotide , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
7.
Nat Commun ; 12(1): 5976, 2021 10 13.
Article in English | MEDLINE | ID: mdl-34645826

ABSTRACT

In plants, cytosine DNA methylation (5mC) can occur in three sequence contexts, CpG, CHG, and CHH (where H = A, C, or T), which play different roles in the regulation of biological processes. Although long Nanopore reads are advantageous for detecting 5mCs compared with short-read bisulfite sequencing, existing methods can only detect 5mCs in the CpG context, which limits their application in plants. Here, we develop DeepSignal-plant, a deep learning tool to detect genome-wide 5mCs in all three contexts in plants from Nanopore reads. We sequence Arabidopsis thaliana and Oryza sativa using both Nanopore and bisulfite sequencing. We develop a denoising process for training models, which enables DeepSignal-plant to achieve high correlations with bisulfite sequencing for 5mC detection in all three contexts. Furthermore, DeepSignal-plant can profile more 5mC sites, which will help provide a more complete understanding of the epigenetic mechanisms of different biological processes.
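The three cytosine contexts can be classified directly from the sequence. A minimal sketch, assuming an A/C/G/T sequence and considering the forward strand only (cytosines on the reverse strand would be classified on the reverse complement); the example sequence is made up.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at seq[i] (forward strand) as CpG, CHG or CHH,
    where H is A, C or T. Returns None if seq[i] is not C or the context
    runs past the end of the sequence."""
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CpG"
    if i + 2 >= len(seq):
        return None
    return "CHG" if seq[i + 2] == "G" else "CHH"


def context_counts(seq):
    """Count forward-strand cytosines in each methylation context."""
    counts = {"CpG": 0, "CHG": 0, "CHH": 0}
    for i in range(len(seq)):
        ctx = cytosine_context(seq, i)
        if ctx is not None:
            counts[ctx] += 1
    return counts


print(context_counts("CGACAGCTT"))  # {'CpG': 1, 'CHG': 1, 'CHH': 1}
```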


Subject(s)
Arabidopsis/genetics , Cytosine/metabolism , DNA, Plant/genetics , Epigenesis, Genetic , Genome, Plant , Oryza/genetics , Arabidopsis/metabolism , CpG Islands , DNA Methylation , DNA, Plant/metabolism , Deep Learning , High-Throughput Nucleotide Sequencing/methods , Nanopores , Oryza/metabolism , Sequence Analysis, DNA , Sulfites/chemistry
8.
Comput Struct Biotechnol J ; 19: 4574-4580, 2021.
Article in English | MEDLINE | ID: mdl-34471500

ABSTRACT

SPLiT-seq provides a low-cost platform for generating single-cell data by labeling the cellular origin of RNA through four rounds of combinatorial barcoding. However, an automatic and rapid method for preprocessing and classifying single-cell sequencing (SCS) data from SPLiT-seq, one that directly identifies and labels combinatorially barcoded reads and distinguishes cell-specific sequencing data, is currently lacking. Here, we develop SCSit, a high-efficiency preprocessing tool for single-cell sequencing data from SPLiT-seq, which directly identifies combinatorial barcodes and UMIs, obtains more labeled reads, and substantially increases the data retained from SCS through exact alignment that accounts for insertions and deletions. Compared with the original method used in SPLiT-seq, the consistency of the reads identified by SCSit increases to 97%, and the number of mapped reads is twice that of the original method. Furthermore, the runtime of SCSit is less than 10% of that of the original method. SCSit can accurately and rapidly analyze SPLiT-seq raw data, obtain labeled reads, and effectively improve the single-cell data from the SPLiT-seq platform. The data and source code of SCSit are available on GitHub at https://github.com/shang-qian/SCSit.
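Because each combinatorial barcode is a tuple of per-round barcodes, the number of possible labels is the product of the per-round barcode counts, and each observed barcode segment must be matched against a known list while tolerating insertions and deletions. The sketch below illustrates both ideas with a standard Levenshtein edit distance. The whitelist, the 96 barcodes per round and the one-edit tolerance are hypothetical choices, and this is not SCSit's implementation.

```python
from math import prod


def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion from a
                            curr[j - 1] + 1,             # insertion into a
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def assign_barcode(observed, whitelist, max_dist=1):
    """Return the unique whitelist barcode within max_dist edits, else None."""
    hits = [bc for bc in whitelist if edit_distance(observed, bc) <= max_dist]
    return hits[0] if len(hits) == 1 else None


def barcode_space(per_round_counts):
    """Number of distinct combinatorial barcodes across all rounds."""
    return prod(per_round_counts)


WHITELIST = ["AACGTGAT", "AAACATCG", "ATGCCTAA"]  # hypothetical round-1 barcodes
print(assign_barcode("AACGTGA", WHITELIST))       # AACGTGAT (one deletion)
print(barcode_space([96, 96, 96, 96]))            # 84934656
```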

9.
J Clin Microbiol ; 59(8): e0007921, 2021 07 19.
Article in English | MEDLINE | ID: mdl-33952598

ABSTRACT

While China experienced a peak and decline in coronavirus disease 2019 (COVID-19) cases at the start of 2020, regional outbreaks continuously emerged in subsequent months. Resurgences of COVID-19 have also been observed in many other countries. In Guangzhou, China, a small outbreak involving fewer than 100 residents emerged in March and April 2020, and comprehensive, near-real-time genomic surveillance of SARS-CoV-2 was conducted. When the number of confirmed cases among overseas travelers increased, public health measures were enhanced by shifting from self-quarantine to central quarantine and SARS-CoV-2 testing for all overseas travelers. In an analysis of 109 imported cases, we found diverse viral variants distributed across the global viral phylogeny, which were frequently shared within households but not among passengers on the same flight. In contrast to the viral diversity of imported cases, local transmission was predominantly attributed to two specific variants imported from Africa, including local cases that reported no direct or indirect contact with imported cases. The introduction events of the virus were identified or deduced before the enhanced measures were taken. These results show that the interventions were effective in containing the spread of SARS-CoV-2, and they rule out the possibility of cryptic transmission of viral variants from the first wave in January and February 2020. Our study provides evidence for, and emphasizes the importance of, controls on overseas travelers in the context of the pandemic, and exemplifies how viral genomic data can facilitate COVID-19 surveillance and inform public health mitigation strategies.


Subject(s)
COVID-19 , SARS-CoV-2 , Africa , COVID-19 Testing , China/epidemiology , Genomics , Humans
10.
Nat Commun ; 12(1): 60, 2021 01 04.
Article in English | MEDLINE | ID: mdl-33397900

ABSTRACT

Long nanopore reads are advantageous in de novo genome assembly. However, nanopore reads usually have a broad error distribution and high-error-rate subsequences. Existing error correction tools cannot correct nanopore reads efficiently and effectively. Most methods trim high-error-rate subsequences during error correction, which reduces both the length of the reads and the contiguity of the final assembly. Here, we develop an error correction and de novo assembly tool designed to overcome complex errors in nanopore reads. We propose an adaptive read selection and two-step progressive method to quickly correct nanopore reads to high accuracy. We introduce a two-stage assembler to utilize the full length of nanopore reads. Our tool achieves superior performance in both error correction and de novo assembly of nanopore reads. It requires only 8122 hours to assemble a 35X coverage human genome and achieves a 2.47-fold improvement in NG50. Furthermore, our assembly of the human WERI cell line shows an NG50 of 22 Mbp. The high-quality assembly of nanopore reads can significantly reduce false positives in structural variation detection.
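NG50, the contiguity metric used above, is the contig length at which contigs sorted from longest to shortest first cover at least half of the estimated genome size (N50 uses half of the total assembly length instead). A minimal sketch with made-up contig lengths, in arbitrary units:

```python
def ng50(contig_lengths, genome_size):
    """Length L such that contigs of length >= L together cover at least half
    of genome_size. Returns 0 if the assembly never reaches that half."""
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total * 2 >= genome_size:
            return length
    return 0


contigs = [50, 30, 20, 10, 5]   # made-up contig lengths, e.g. in Mbp
print(ng50(contigs, 120))       # 30
print(ng50(contigs, 200))       # 20
```

A fold improvement in NG50, such as the one reported above, is simply the ratio of two NG50 values computed against the same genome size.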


Subject(s)
Nanopores , Sequence Analysis, DNA , Cell Line , Chromosomes, Human/genetics , Genome, Human , Humans , Retinoblastoma/genetics , Software
11.
Hortic Res ; 6: 78, 2019.
Article in English | MEDLINE | ID: mdl-31240103

ABSTRACT

Eukaryotic DNA methylation has been receiving increasing attention for its crucial epigenetic regulatory function. The recently developed single-molecule real-time (SMRT) sequencing technology provides an efficient way to detect DNA N6-methyladenine (6mA) and N4-methylcytosine (4mC) modifications at single-nucleotide resolution. The family Rosaceae contains horticultural plants of wide-ranging economic importance. However, little is currently known about the genome-wide distribution patterns and functions of 6mA and 4mC modifications in the Rosaceae. In this study, we present an integrated DNA 6mA and 4mC modification database for the Rosaceae (MDR, http://mdr.xieslab.org). MDR, the first repository for displaying and storing DNA 6mA and 4mC methylomes from SMRT sequencing data sets for Rosaceae, includes meta and statistical information, methylation densities, Gene Ontology enrichment analyses, and genome search and browsing of methylated sites in NCBI. MDR provides important information on DNA 6mA and 4mC methylation and may help users better understand epigenetic modifications in the family Rosaceae.

12.
BMC Genomics ; 20(1): 508, 2019 Jun 18.
Article in English | MEDLINE | ID: mdl-31215402

ABSTRACT

BACKGROUND: DNA methylation is an important epigenetic modification. The recently developed single-molecule real-time (SMRT) sequencing technology provides an efficient way to detect DNA N6-methyladenine (6mA) modification, which plays an important role in epigenetics and positively regulates gene expression. Gene expression is also regulated by genetic variation. However, the relationship between DNA 6mA modification and variation is still unknown. RESULTS: We collected SMRT long-read DNA, Illumina short-read DNA, and RNA datasets from young leaves of Herrania umbratica, and used them to detect 35,654 6mA modification sites, 829,894 DNA variations and 60,672 RNA variations, respectively. Among these, 303 DNA variations and 19 RNA variations are associated with 6mA modification, and 57,468 genetic variations are transmitted from DNA to RNA. The results showed that genes with 6mA modification were significantly less likely to mutate than genes without modification (p-value < 4.9e-08). Moreover, a linear regression model showed that the 6mA densities of genes were associated with the transmitted variation type 0/1 to 1/1 (p-value < 0.001). CONCLUSIONS: Genes with 6mA modification had significantly fewer DNA and RNA variations than unmodified genes. Furthermore, variations in 6mA-modified genes were readily transmitted from DNA to RNA, especially from DNA heterozygotes to RNA homozygotes.


Subject(s)
Adenosine/analogs & derivatives , DNA, Plant/genetics , DNA, Plant/metabolism , Genetic Variation/genetics , Genome, Plant/genetics , Magnoliopsida/genetics , RNA, Plant/genetics , Adenosine/metabolism , DNA, Intergenic/genetics , DNA, Intergenic/metabolism , DNA, Plant/chemistry , Heterozygote , Homozygote , Magnoliopsida/metabolism
13.
Nat Commun ; 10(1): 2449, 2019 06 04.
Article in English | MEDLINE | ID: mdl-31164644

ABSTRACT

DNA base modifications, such as C5-methylcytosine (5mC) and N6-methyldeoxyadenosine (6mA), are important types of epigenetic regulation. Short-read bisulfite sequencing and long-read PacBio sequencing have inherent limitations in detecting DNA modifications. Here, using raw electrical signals from Oxford Nanopore long-read sequencing data, we design DeepMod, a bidirectional recurrent neural network (RNN) with long short-term memory (LSTM), to detect DNA modifications. We sequence a human genome, HX1, and a Chlamydomonas reinhardtii genome using Nanopore sequencing, and then evaluate DeepMod on three types of genomes (Escherichia coli, Chlamydomonas reinhardtii and human genomes). For 5mC detection, DeepMod achieves average precision up to 0.99 for both synthetically introduced and naturally occurring modifications. For 6mA detection, DeepMod achieves ~0.9 average precision on Escherichia coli data and outperforms existing methods on Chlamydomonas reinhardtii data. In conclusion, DeepMod performs well for genome-scale detection of DNA modifications and will facilitate epigenetic analysis in diverse species.


Subject(s)
Chlamydomonas reinhardtii/genetics , DNA Methylation , Escherichia coli/genetics , Genome, Bacterial/genetics , Genome, Human/genetics , Genome, Plant/genetics , Neural Networks, Computer , Databases, Nucleic Acid , Epigenesis, Genetic , Humans , Nanopores
14.
Bioinformatics ; 35(22): 4586-4595, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30994904

ABSTRACT

MOTIVATION: Oxford Nanopore sequencing enables direct detection of the methylation states of DNA bases from reads without additional laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction using Nanopore reads. RESULTS: In this study, we develop DeepSignal, a deep learning method to detect DNA methylation states from Nanopore sequencing reads. Testing on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 shows that DeepSignal achieves higher performance at both the read level and the genome level in detecting 6mA and 5mC methylation states compared with previous hidden Markov model (HMM)-based methods. DeepSignal achieves similar performance across different DNA methylation bases, different DNA methylation motifs, and both singleton and mixed DNA CpGs. Moreover, DeepSignal requires much lower coverage than HMM-based and statistics-based methods. DeepSignal can achieve above 90% accuracy in detecting 5mC and 6mA using only 2× read coverage. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20× read coverage, which is much better than HMM-based methods. Notably, DeepSignal can predict the methylation states of 5% more DNA CpGs that previously could not be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting methylation states of DNA bases. AVAILABILITY AND IMPLEMENTATION: DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Methylation , High-Throughput Nucleotide Sequencing , Deep Learning , Escherichia coli , Humans , Nanopore Sequencing , Sequence Analysis, DNA
15.
Front Genet ; 10: 1288, 2019.
Article in English | MEDLINE | ID: mdl-31998359

ABSTRACT

N6-methyladenine (6mA) DNA modification has been detected in several eukaryotic organisms, where it plays important roles in gene regulation and epigenetic memory maintenance. However, the genome-wide distribution patterns and potential functions of 6mA DNA modification in woodland strawberry (Fragaria vesca) remain largely unknown. Here, we examined the 6mA landscape in the F. vesca genome by adopting single-molecule real-time sequencing technology and found that 6mA modification sites were broadly distributed across the woodland strawberry genome. The pattern of 6mA distribution in long non-coding RNAs was significantly different from that in protein-coding genes. The 6mA modification influenced gene transcription and was positively associated with gene expression, which was validated by computational and experimental analyses. Our study provides new insights into DNA methylation in F. vesca.

16.
Int J Genomics ; 2018: 9207637, 2018.
Article in English | MEDLINE | ID: mdl-30581839

ABSTRACT

The accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples in two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface in the website for uploading sample information to facilitate the collection and analysis of researchers' datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, which is freely available.

17.
Sci Rep ; 8(1): 16272, 2018 11 02.
Article in English | MEDLINE | ID: mdl-30389999

ABSTRACT

DNA N6-methyladenine (6mA) modifications expand the information capacity of DNA and have long been known to exist in bacterial genomes. Xanthomonas oryzae pv. oryzicola (Xoc) is the causative agent of bacterial leaf streak, an emerging and destructive disease of rice worldwide. However, the genome-wide distribution patterns and potential functions of 6mA in Xoc are largely unknown. In this study, we analyzed the levels and global distribution patterns of 6mA modification in the genomic DNA of seven Xoc strains (BLS256, BLS279, CFBP2286, CFBP7331, CFBP7341, L8 and RS105). The 6mA modification was found to be widely distributed across the seven Xoc genomes, accounting for 3.80%, 3.10%, 3.70%, 4.20%, 3.40%, 2.10%, and 3.10% of the total adenines in BLS256, BLS279, CFBP2286, CFBP7331, CFBP7341, L8, and RS105, respectively. Notably, more than 82% of 6mA sites were located within gene bodies in all seven strains. Two specific 6mA modification motifs, ARGT and AVCG, were prevalent in all seven strains. Comparison of putative DNA methylation motifs from the seven strains reveals that Xoc has a specific DNA methylation system. Furthermore, the dramatic decrease in 6mA modification of rpfC during Xoc infection indicates that this modification plays an important role in Xoc adaptation to the environment.
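Motifs such as ARGT and AVCG use IUPAC degenerate nucleotide codes (R = A or G; V = A, C or G). The sketch below expands a motif into a regular expression, reports overlapping matches on the forward strand, and expresses a number of modified sites as a percentage of adenines. Counting adenines on both strands (forward-strand A plus forward-strand T) is an assumption made here for illustration; the sequence and site count are hypothetical.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def find_motif(seq, motif):
    """0-based forward-strand start positions of an IUPAC motif; the lookahead
    lets overlapping occurrences be reported."""
    pattern = "".join(IUPAC[c] for c in motif.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def percent_of_adenines(n_modified_sites, seq):
    """Modified sites as a percentage of adenines on both strands
    (A on the forward strand plus T, which pairs with A on the reverse strand)."""
    seq = seq.upper()
    return 100.0 * n_modified_sites / (seq.count("A") + seq.count("T"))


seq = "TTAGGTCAGCGAAGTA"  # hypothetical sequence
print(find_motif(seq, "ARGT"))                 # [2, 11]
print(find_motif(seq, "AVCG"))                 # [7]
print(round(percent_of_adenines(2, seq), 2))   # 22.22
```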


Subject(s)
Adenine/analogs & derivatives , DNA Methylation/genetics , DNA, Bacterial/metabolism , Gene Expression Regulation, Bacterial , Xanthomonas/genetics , Adenine/metabolism , Bacterial Proteins/genetics , Genes, Bacterial/genetics , Host-Pathogen Interactions/genetics , Oryza/microbiology , Plant Diseases/microbiology , Plant Leaves/microbiology , Virulence/genetics , Xanthomonas/pathogenicity
18.
Mol Cell ; 71(2): 306-318.e7, 2018 07 19.
Article in English | MEDLINE | ID: mdl-30017583

ABSTRACT

DNA N6-methyladenine (6mA) modification is the most prevalent DNA modification in prokaryotes, but whether it exists in human cells and whether it plays a role in human diseases remain enigmatic. Here, we showed that 6mA is extensively present in the human genome, and we cataloged 881,240 6mA sites accounting for ∼0.051% of the total adenines. [G/C]AGG[C/T] was the most significantly associated motif with 6mA modification. 6mA sites were enriched in the coding regions and mark actively transcribed genes in human cells. DNA 6mA and N6-demethyladenine modification in the human genome were mediated by methyltransferase N6AMT1 and demethylase ALKBH1, respectively. The abundance of 6mA was significantly lower in cancers, accompanied by decreased N6AMT1 and increased ALKBH1 levels, and downregulation of 6mA modification levels promoted tumorigenesis. Collectively, our results demonstrate that DNA 6mA modification is extensively present in human cells and the decrease of genomic DNA 6mA promotes human tumorigenesis.


Subject(s)
Adenine/analogs & derivatives , AlkB Homolog 1, Histone H2a Dioxygenase/metabolism , Genome, Human , Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism , Adenine/metabolism , AlkB Homolog 1, Histone H2a Dioxygenase/genetics , Animals , Carcinogenesis/genetics , DNA/genetics , DNA Methylation , Heterografts , Humans , Mice , Mice, Nude , Site-Specific DNA-Methyltransferase (Adenine-Specific)/genetics
19.
Nat Methods ; 14(11): 1072-1074, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28945707

ABSTRACT

We present a tool that combines fast mapping, error correction, and de novo assembly (MECAT; accessible at https://github.com/xiaochuanle/MECAT) for processing single-molecule sequencing (SMS) reads. MECAT's computing efficiency is superior to that of current tools, while the results MECAT produces are comparable or improved. MECAT enables reference mapping or de novo assembly of large genomes using SMS reads on a single computer.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Software
20.
PLoS One ; 9(4): e94250, 2014.
Article in English | MEDLINE | ID: mdl-24743329

ABSTRACT

Correct and bias-free interpretation of deep sequencing data inevitably depends on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT)-based algorithms are fast but less robust. To combine both advantages, we developed FANSe2, an algorithm with an iterative mapping strategy based on the statistics of real-world sequencing error distributions, which substantially accelerates mapping without compromising accuracy. Its sensitivity and accuracy are higher than those of BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, whereas previous algorithms yield false positives and false negatives. FANSe2 showed remarkably better consistency with microarray data than most other algorithms in terms of gene expression quantification. We implemented a scalable and almost maintenance-free parallelization method that can use the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three ordinary office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours, with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical use of computing resources, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.
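The seed-based approach mentioned above can be illustrated with a generic seed-and-vote mapper: every read k-mer that matches the reference votes for an implied alignment start, and the top-voted candidates are verified by counting mismatches. This is a textbook simplification that assumes substitution-only errors, not FANSe2's iterative, error-statistics-based algorithm; `k` and `max_mismatches` are illustrative parameters, and the sequences are made up.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Seed-and-vote: each read k-mer votes for start = ref_pos - read_offset;
    candidates are verified in vote order by counting mismatches (no indels).
    Returns (start, mismatches) or None."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            start = ref_pos - offset
            if 0 <= start <= len(reference) - len(read):
                votes[start] += 1
    for start, _ in votes.most_common():
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            return start, mismatches
    return None


reference = "GATTACAGATTTACCGGATCCAGT"
index = build_kmer_index(reference, k=4)
read = "ATTTAGCGGATC"   # reference[8:20] with one substitution (C -> G)
print(map_read(read, reference, index, k=4))  # (8, 1)
```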


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Algorithms , Databases, Genetic , Gene Expression Profiling , Genome, Human/genetics , Humans , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA , Sequence Analysis, RNA , Time Factors