1.
Nucleic Acids Res ; 45(14): e132, 2017 Aug 21.
Article in English | MEDLINE | ID: mdl-28586438

ABSTRACT

Third-generation sequencing (TGS) technologies are highly promising, but the long, noisy reads they produce are difficult to align with existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads that contain a high level of errors. COSINE computes the context similarity of two stretches of nucleobases from the similarity between the distributions of their short k-mers (k = 3-4) along the sequences. Results on simulated and real data show that COSINE achieves high sensitivity and specificity across a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods.
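
The idea of comparing two sequence stretches through their short k-mer distributions can be illustrated with a minimal sketch. This is not the COSINE implementation: the function names (`kmer_profile`, `cosine_similarity`, `context_similarity`) are hypothetical, and plain cosine similarity over k-mer counts is an assumed stand-in for whatever similarity measure the method actually uses.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count all overlapping k-mers in a sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(p[key] * q[key] for key in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # at least one sequence is shorter than k
    return dot / (norm_p * norm_q)


def context_similarity(a, b, k=3):
    """Similarity of two stretches based on their k-mer distributions."""
    return cosine_similarity(kmer_profile(a, k), kmer_profile(b, k))


# A read with one substitution still shares most 3-mers with the reference,
# so its score stays well above that of an unrelated sequence.
reference = "ACGTACGTAC"
noisy_read = "ACGTTCGTAC"
unrelated = "GGGGGGGGGG"
score_noisy = context_similarity(reference, noisy_read)
score_unrelated = context_similarity(reference, unrelated)
```

Because a single substitution changes at most k of the overlapping k-mers, the profile degrades gradually as errors accumulate, which is one plausible reason a distribution-based score tolerates noisy reads better than exact seed matching.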


Subjects
Algorithms , Computational Biology/methods , Sequence Alignment/methods , Software , Base Sequence , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Reproducibility of Results
2.
Nucleic Acids Res ; 43(18): e116, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26040699

ABSTRACT

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites, and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating third-generation sequencing long reads and second-generation sequencing short reads. We applied IDP-fusion to PacBio and Illumina data from MCF-7 breast cancer cells. Compared with existing tools, IDP-fusion detects fusion genes with higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.


Subjects
Carcinogenesis/genetics , Gene Expression Profiling , Gene Fusion , High-Throughput Nucleotide Sequencing/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , MCF-7 Cells , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Alignment
3.
Proc Natl Acad Sci U S A ; 110(50): E4821-30, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-24282307

ABSTRACT

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.


Subjects
Alternative Splicing/genetics , Embryonic Stem Cells/metabolism , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Transcriptome/genetics , Embryonic Stem Cells/chemistry , Humans , Male
4.
Genome Biol ; 16: 197, 2015 Sep 17.
Article in English | MEDLINE | ID: mdl-26381235

ABSTRACT

SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.
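
The step of training an adaptively boosted decision tree learner on per-site features can be sketched as follows. This is a minimal pure-Python AdaBoost over one-level decision trees (stumps), not SomaticSeq's actual implementation; the two features (variant allele fraction and number of supporting callers), the toy training data, and all function names are hypothetical, chosen only to show the mechanics of adaptive boosting.

```python
from math import exp, log


def train_stump(X, y, w):
    """Pick the single-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign


def adaboost_fit(X, y, rounds=5):
    """Fit a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified sites, down-weight correct ones, renormalise.
        w = [wi * exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1


# Hypothetical candidate sites: (variant allele fraction, callers in support).
# Label +1 marks a true somatic mutation, -1 a false positive.
X_train = [(0.02, 1), (0.05, 1), (0.04, 2), (0.30, 4), (0.25, 5), (0.40, 3)]
y_train = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X_train, y_train)
```

Calling `adaboost_predict(model, (0.35, 4))` then classifies a new candidate site. A real pipeline would use many more features and deeper trees, and would evaluate on held-out data rather than the training set.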


Subjects
DNA Mutational Analysis/methods , Machine Learning , Neoplasms/genetics , Humans , INDEL Mutation