Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 10 de 10
Filtrar
Más filtros












Base de datos
Intervalo de año de publicación
1.
Res Comput Mol Biol ; 14758: 308-313, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-39027313

RESUMEN

Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challenging due to the significant burden of estimating kinship between all pairs of individuals across datasets. We introduce SF-Relate, a practical and secure federated algorithm for identifying genetic relatives across data silos. SF-Relate vastly reduces the number of individual pairs to compare while maintaining accurate detection through a novel locality-sensitive hashing approach. We assign individuals who are likely to be related together into buckets and then test relationships only between individuals in matching buckets across parties. To this end, we construct an effective hash function that captures identity-by-descent (IBD) segments in genetic sequences, which, along with a new bucketing strategy, enable accurate and practical private relative detection. To guarantee privacy, we introduce an efficient algorithm based on multiparty homomorphic encryption (MHE) to allow data holders to cooperatively compute the relatedness coefficients between individuals, and to further classify their degrees of relatedness, all without sharing any private data. We demonstrate the accuracy and practical runtimes of SF-Relate on the UK Biobank and All of Us datasets. On a dataset of 200K individuals split between two parties, SF-Relate detects 94.9% of third-degree relatives, and 99.9% of second-degree or closer relatives, within 15 hours of runtime. Our work enables secure identification of relatives across large-scale genomic datasets.

2.
bioRxiv ; 2024 Feb 28.
Artículo en Inglés | MEDLINE | ID: mdl-38464114

RESUMEN

Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics, prognostics, and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short read sequences. Recent advances in long read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single cell samples. Here we developed a new computational tool CTAT-LR-fusion to detect fusion transcripts from long read RNA-seq with or without companion short reads, with applications to bulk or single cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds fusion detection accuracy of alternative methods as benchmarked with simulated and real long read RNA-seq. Using short and long read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines, and to tumor single cells derived from a melanoma sample and three metastatic high grade serous ovarian carcinoma samples. In both bulk and in single cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.

3.
Nat Biotechnol ; 42(4): 582-586, 2024 Apr.
Artículo en Inglés | MEDLINE | ID: mdl-37291427

RESUMEN

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.


Asunto(s)
Secuenciación de Nucleótidos de Alto Rendimiento , Isoformas de ARN , ADN Complementario/genética , Isoformas de ARN/genética , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Isoformas de Proteínas/genética , Análisis de Secuencia de ARN/métodos , Transcriptoma , Perfilación de la Expresión Génica/métodos , ARN/genética
4.
Nat Methods ; 20(4): 559-568, 2023 04.
Artículo en Inglés | MEDLINE | ID: mdl-36959322

RESUMEN

Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.


Asunto(s)
Aprendizaje Profundo , Programas Informáticos , Humanos , Genotipo , Señales (Psicología) , Variación Estructural del Genoma , Genoma Humano
5.
Bioinformatics ; 36(4): 1082-1090, 2020 02 15.
Artículo en Inglés | MEDLINE | ID: mdl-31584621

RESUMEN

MOTIVATION: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. RESULTS: In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in both. We also assessed Meltos on two real cancer datasets. We tested Meltos on multiple samples from a liposarcoma tumor and on a multi-sample breast cancer data (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also showed the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates. AVAILABILITY AND IMPLEMENTATION: Meltos is available at https://github.com/ih-lab/Meltos. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Neoplasias , Genoma , Variación Estructural del Genoma , Genómica , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Neoplasias/genética , Filogenia , Análisis de Secuencia , Programas Informáticos
6.
Curr Protoc Bioinformatics ; 62(1): e49, 2018 06.
Artículo en Inglés | MEDLINE | ID: mdl-29927069

RESUMEN

The reconstruction of cancer phylogeny trees and quantifying the evolution of the disease is a challenging task. LICHeE and BAMSE are two computational tools designed and implemented recently for this purpose. They both utilize estimated variant allele fraction of somatic mutations across multiple samples to infer the most likely cancer phylogenies. This unit provides extensive guidelines for installing and running both LICHeE and BAMSE. © 2018 by John Wiley & Sons, Inc.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Neoplasias/genética , Filogenia , Humanos
7.
J Comput Biol ; 25(7): 677-688, 2018 07.
Artículo en Inglés | MEDLINE | ID: mdl-29658784

RESUMEN

We introduce GATTACA, a framework for fast unsupervised binning of metagenomic contigs. Similar to recent approaches, GATTACA clusters contigs based on their coverage profiles across a large cohort of metagenomic samples; however, unlike previous methods that rely on read mapping, GATTACA quickly estimates these profiles from kmer counts stored in a compact index. This approach can result in over an order of magnitude speedup, while matching the accuracy of earlier methods on synthetic and real data benchmarks. It also provides a way to index metagenomic samples (e.g., from public repositories such as the Human Microbiome Project) offline once and reuse them across experiments; furthermore, the small size of the sample indices allows them to be easily transferred and stored. Leveraging the MinHash technique, GATTACA also provides an efficient way to identify publicly available metagenomic data that can be incorporated into the set of reference metagenomes to further improve binning accuracy. Thus, enabling easy indexing and reuse of publicly available metagenomic data sets, GATTACA makes accurate metagenomic analyses accessible to a much wider range of researchers.


Asunto(s)
Teorema de Bayes , Biología Computacional/estadística & datos numéricos , Metagenómica/estadística & datos numéricos , Microbiota/genética , Análisis por Conglomerados , Humanos , Metagenoma/genética
8.
Nat Commun ; 8: 15311, 2017 05 16.
Artículo en Inglés | MEDLINE | ID: mdl-28508884

RESUMEN

Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise a method named Balaur, a privacy preserving read mapper for hybrid clouds based on locality sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of the computation to the public cloud, while being highly competitive in accuracy and speed with non-private state-of-the-art read aligners on short read data. We also show that the method is significantly faster than the state of the art in long read mapping. Therefore, Balaur can enable institutions handling massive genomic data sets to shift part of their analysis to the cloud without sacrificing accuracy or exposing sensitive information to an untrusted third party.


Asunto(s)
Algoritmos , Nube Computacional , Biología Computacional/métodos , Seguridad Computacional , Privacidad , Genómica/estadística & datos numéricos , Secuenciación de Nucleótidos de Alto Rendimiento/estadística & datos numéricos , Humanos , Reproducibilidad de los Resultados
9.
Genome Biol ; 16: 91, 2015 May 06.
Artículo en Inglés | MEDLINE | ID: mdl-25944252

RESUMEN

Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee .


Asunto(s)
Linaje de la Célula/genética , Variación Genética , Neoplasias/genética , Algoritmos , Carcinoma de Células Renales/genética , Biología Computacional/métodos , Simulación por Computador , Progresión de la Enfermedad , Femenino , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Neoplasias Renales/genética , Neoplasias Ováricas/genética , Filogenia , Programas Informáticos , Ensayos Antitumor por Modelo de Xenoinjerto
10.
Bioinformatics ; 29(13): i361-70, 2013 Jul 01.
Artículo en Inglés | MEDLINE | ID: mdl-23813006

RESUMEN

SUMMARY: The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes having been sequenced in the past years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date, there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome. This leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool BWBBLE is introduced in this article. We (i) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (ii) design a new alignment algorithm based on the Burrows-Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of two or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome. AVAILABILITY: http://viq854.github.com/bwbble.


Asunto(s)
Genoma Humano , Alineación de Secuencia/métodos , Análisis de Secuencia de ADN/métodos , Programas Informáticos , Algoritmos , Variación Genética , Genómica/métodos , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...