Results 1 - 9 of 9
1.
bioRxiv; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38464114

ABSTRACT

Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics and prognostics, and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short read sequences. Recent advances in long read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single cell samples. Here we developed a new computational tool, CTAT-LR-fusion, to detect fusion transcripts from long read RNA-seq with or without companion short reads, with applications to bulk or single cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds the fusion detection accuracy of alternative methods as benchmarked with simulated and real long read RNA-seq. Using short and long read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines, and to tumor single cells derived from a melanoma sample and three metastatic high grade serous ovarian carcinoma samples. In both bulk and single cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads, with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.

2.
Nat Biotechnol; 42(4): 582-586, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37291427

ABSTRACT

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.


Subject(s)
High-Throughput Nucleotide Sequencing, RNA Isoforms, Complementary DNA/genetics, RNA Isoforms/genetics, High-Throughput Nucleotide Sequencing/methods, Protein Isoforms/genetics, RNA Sequence Analysis/methods, Transcriptome, Gene Expression Profiling/methods, RNA/genetics
3.
Nat Methods; 20(4): 559-568, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36959322

ABSTRACT

Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.


Subject(s)
Deep Learning, Software, Humans, Genotype, Cues, Genomic Structural Variation, Human Genome
4.
Bioinformatics; 36(4): 1082-1090, 2020 Feb 15.
Article in English | MEDLINE | ID: mdl-31584621

ABSTRACT

MOTIVATION: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as that of SNVs, we show that a tumor phylogeny tree built from high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events.

RESULTS: To assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in each. We also assessed Meltos on two real cancer datasets: multiple samples from a liposarcoma tumor, and a multi-sample breast cancer dataset (Yates et al., 2015) for which the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show that Meltos can place high confidence validated SV calls on a refined tumor phylogeny tree. We also show the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates.

AVAILABILITY AND IMPLEMENTATION: Meltos is available at https://github.com/ih-lab/Meltos.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms, Genome, Genomic Structural Variation, Genomics, High-Throughput Nucleotide Sequencing, Humans, Neoplasms/genetics, Phylogeny, Sequence Analysis, Software
5.
Curr Protoc Bioinformatics; 62(1): e49, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29927069

ABSTRACT

The reconstruction of cancer phylogeny trees and quantifying the evolution of the disease is a challenging task. LICHeE and BAMSE are two computational tools designed and implemented recently for this purpose. They both utilize estimated variant allele fraction of somatic mutations across multiple samples to infer the most likely cancer phylogenies. This unit provides extensive guidelines for installing and running both LICHeE and BAMSE. © 2018 by John Wiley & Sons, Inc.


Subject(s)
Algorithms, Computational Biology/methods, Neoplasms/genetics, Phylogeny, Humans
6.
J Comput Biol; 25(7): 677-688, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29658784

ABSTRACT

We introduce GATTACA, a framework for fast unsupervised binning of metagenomic contigs. Similar to recent approaches, GATTACA clusters contigs based on their coverage profiles across a large cohort of metagenomic samples; however, unlike previous methods that rely on read mapping, GATTACA quickly estimates these profiles from kmer counts stored in a compact index. This approach can result in over an order of magnitude speedup, while matching the accuracy of earlier methods on synthetic and real data benchmarks. It also provides a way to index metagenomic samples (e.g., from public repositories such as the Human Microbiome Project) offline once and reuse them across experiments; furthermore, the small size of the sample indices allows them to be easily transferred and stored. Leveraging the MinHash technique, GATTACA also provides an efficient way to identify publicly available metagenomic data that can be incorporated into the set of reference metagenomes to further improve binning accuracy. Thus, enabling easy indexing and reuse of publicly available metagenomic data sets, GATTACA makes accurate metagenomic analyses accessible to a much wider range of researchers.
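The MinHash idea mentioned in this abstract can be illustrated with a minimal sketch. It is not GATTACA's implementation: the function names, the k-mer size, the number of hash functions, and the use of seeded BLAKE2b as the hash family are all assumptions made for illustration. The sketch estimates the Jaccard similarity between the k-mer sets of two sequences from short signatures instead of comparing the full sets.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(kmer_set, num_hashes=128):
    """Keep, for each of num_hashes hash functions, the minimum hash value."""
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates Jaccard."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    a = kmers("ACGTACGTTGCAAGCTTAGGCTAACGT", 4)  # hypothetical toy sequences
    b = kmers("ACGTACGTTGCAAGCTTAGGCTTACGT", 4)
    print(exact_jaccard(a, b))
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

Signatures have a fixed length regardless of how many k-mers a sample contains, which is what makes this kind of sketch cheap to store and compare across many samples.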


Subject(s)
Bayes Theorem, Computational Biology/statistics & numerical data, Metagenomics/statistics & numerical data, Microbiota/genetics, Cluster Analysis, Humans, Metagenome/genetics
7.
Nat Commun; 8: 15311, 2017 May 16.
Article in English | MEDLINE | ID: mdl-28508884

ABSTRACT

Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise a method named Balaur, a privacy preserving read mapper for hybrid clouds based on locality sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of the computation to the public cloud, while being highly competitive in accuracy and speed with non-private state-of-the-art read aligners on short read data. We also show that the method is significantly faster than the state of the art in long read mapping. Therefore, Balaur can enable institutions handling massive genomic data sets to shift part of their analysis to the cloud without sacrificing accuracy or exposing sensitive information to an untrusted third party.
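The k-mer voting idea named in this abstract can be sketched in plain form as follows. This is only a generic seed-voting read mapper under assumed names and parameters; it contains none of Balaur's privacy-preserving machinery (no locality sensitive hashing, no client/cloud split), and the toy reference sequence is hypothetical. Each k-mer shared between the read and the reference votes for the reference position at which the read would have to start, and the position with the most votes wins.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read_by_voting(read, index, k):
    """Return (best_start, votes) for the read, or None if no k-mer matches.

    A read k-mer at offset i that occurs at reference position p
    votes for the candidate read start p - i.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], ()):
            votes[ref_pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


if __name__ == "__main__":
    reference = "TTGACCGATAGGCTTACGGATCCAATGC"  # hypothetical toy reference
    index = build_kmer_index(reference, k=4)
    # Read taken from position 10 of the reference, with one substitution.
    print(map_read_by_voting("GGCTTGCGGATC", index, k=4))
```

Because a single substitution only destroys the k-mers that overlap it, the remaining k-mers still agree on the true start position, which is why voting tolerates sequencing errors.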


Subject(s)
Algorithms, Cloud Computing, Computational Biology/methods, Computer Security, Privacy, Genomics/statistics & numerical data, High-Throughput Nucleotide Sequencing/statistics & numerical data, Humans, Reproducibility of Results
8.
Genome Biol; 16: 91, 2015 May 06.
Article in English | MEDLINE | ID: mdl-25944252

ABSTRACT

Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee.
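The input quantity this abstract relies on, the variant allele frequency (VAF) of each SNV in each sample, can be illustrated with a small sketch. This is not LICHeE's algorithm: the function names, the 0.05 presence threshold, and the read counts are assumptions chosen for illustration. The sketch computes per-sample VAFs from read counts and groups SNVs that share the same presence/absence pattern across samples, a simple way to see which variants could belong to the same lineage.

```python
from collections import defaultdict


def vaf(alt_reads, ref_reads):
    """Variant allele frequency: fraction of reads supporting the variant."""
    total = alt_reads + ref_reads
    return alt_reads / total if total else 0.0


def presence_profile(vafs, min_vaf=0.05):
    """Binary tuple marking the samples in which the variant is called present."""
    return tuple(int(v >= min_vaf) for v in vafs)


def group_by_profile(snv_counts, min_vaf=0.05):
    """Group SNV ids by their presence profile across samples.

    snv_counts maps an SNV id to a list of (alt_reads, ref_reads) pairs,
    one pair per sample, in a fixed sample order.
    """
    groups = defaultdict(list)
    for snv_id, counts in snv_counts.items():
        vafs = [vaf(alt, ref) for alt, ref in counts]
        groups[presence_profile(vafs, min_vaf)].append(snv_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical read counts for three SNVs in three samples.
    counts = {
        "snv1": [(50, 50), (40, 60), (45, 55)],
        "snv2": [(30, 70), (0, 100), (25, 75)],
        "snv3": [(20, 80), (1, 99), (30, 70)],
    }
    print(group_by_profile(counts))
```

A fixed threshold is the crudest possible presence call; it is used here only to keep the example short.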


Subject(s)
Cell Lineage/genetics, Genetic Variation, Neoplasms/genetics, Algorithms, Renal Cell Carcinoma/genetics, Computational Biology/methods, Computer Simulation, Disease Progression, Female, High-Throughput Nucleotide Sequencing, Humans, Kidney Neoplasms/genetics, Ovarian Neoplasms/genetics, Phylogeny, Software, Xenograft Model Antitumor Assays
9.
Bioinformatics; 29(13): i361-70, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23813006

ABSTRACT

SUMMARY: The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes being sequenced in recent years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date, there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome, which leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool, BWBBLE, is introduced in this article. We (i) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (ii) design a new alignment algorithm based on the Burrows-Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of two or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome.

AVAILABILITY: http://viq854.github.com/bwbble.
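For readers unfamiliar with the Burrows-Wheeler transform that this abstract builds on, the following minimal sketch shows the plain transform of a single string and its inverse. It is the textbook quadratic-time construction, not BWBBLE's multi-genome representation or alignment algorithm; the function names and the `$` sentinel are conventional choices assumed here.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (naive construction).

    The sentinel must not occur in text and must sort before every
    character of text (true for '$' with ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Invert the transform by repeatedly prepending and re-sorting columns."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The row ending with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("ACAACG")
    print(encoded)
    print(inverse_bwt(encoded))
```

Practical read aligners build the transform from a suffix array and search it with an FM-index rather than materializing rotations; the naive version is used here only because it makes the definition easy to follow.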


Subject(s)
Human Genome, Sequence Alignment/methods, DNA Sequence Analysis/methods, Software, Algorithms, Genetic Variation, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans
...