Results 1 - 20 of 95
1.
Nat Commun ; 11(1): 729, 2020 Feb 05.
Article in English | MEDLINE | ID: mdl-32024854

ABSTRACT

The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project. This work was motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes had variable proportions of coding and non-coding mutations, with chromatin remodeling and proliferation pathways altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, are altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit gene expression signatures similar to those of samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.

2.
Genome Biol ; 21(1): 31, 2020 Feb 07.
Article in English | MEDLINE | ID: mdl-32033589

ABSTRACT

The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.

3.
Genome Res ; 30(2): 195-204, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31992614

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be near each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.
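
The network regularization described in this abstract can be written, in generic form, as a graph-regularized NMF objective. The formulation below is a textbook sketch and not necessarily the exact objective used by netNMF-sc. All symbols are assumptions introduced for illustration: X is the genes-by-cells count matrix, W (genes by k) and H (k by cells) are the non-negative factors, A is a symmetric gene-gene adjacency matrix with degree matrix D, w_i is row i of W, and λ weights the network penalty.

```latex
\min_{W \ge 0,\; H \ge 0} \; \lVert X - WH \rVert_F^2 \;+\; \lambda \, \operatorname{Tr}\!\left(W^{\top} L W\right),
\qquad L = D - A,
\qquad
\operatorname{Tr}\!\left(W^{\top} L W\right) = \tfrac{1}{2} \sum_{i,j} A_{ij} \, \lVert w_i - w_j \rVert_2^2 .
```

The last identity shows why the penalty pulls interacting genes together: each connected pair (i, j) contributes the squared distance between their low-dimensional representations.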

4.
Blood ; 135(1): 41-55, 2020 Jan 02.
Article in English | MEDLINE | ID: mdl-31697823

ABSTRACT

To study the mechanisms of relapse in acute lymphoblastic leukemia (ALL), we performed whole-genome sequencing of 103 diagnosis-relapse-germline trios and ultra-deep sequencing of 208 serial samples in 16 patients. Relapse-specific somatic alterations were enriched in 12 genes (NR3C1, NR3C2, TP53, NT5C2, FPGS, CREBBP, MSH2, MSH6, PMS2, WHSC1, PRPS1, and PRPS2) involved in drug response. Their prevalence was 17% in very early relapse (<9 months from diagnosis), 65% in early relapse (9-36 months), and 32% in late relapse (>36 months) groups. Convergent evolution, in which multiple subclones harbor mutations in the same drug resistance gene, was observed in 6 relapses and confirmed by single-cell sequencing in 1 case. Mathematical modeling and mutational signature analysis indicated that early relapse resistance acquisition was frequently a 2-step process in which a persistent clone survived initial therapy and later acquired bona fide resistance mutations during therapy. In contrast, very early relapses arose from preexisting resistant clone(s). Two novel relapse-specific mutational signatures, one of which was caused by thiopurine treatment based on in vitro drug exposure experiments, were identified in early and late relapses but were absent from 2540 pan-cancer diagnosis samples and 129 non-ALL relapses. The novel signatures were detected in 27% of relapsed ALLs and were responsible for 46% of acquired resistance mutations in NT5C2, PRPS1, NR3C1, and TP53. These results suggest that chemotherapy-induced drug resistance mutations facilitate a subset of pediatric ALL relapses.

5.
Cell Syst ; 8(6): 514-522.e5, 2019 Jun 26.
Article in English | MEDLINE | ID: mdl-31229560

ABSTRACT

Longitudinal DNA sequencing of cancer patients yields insight into how tumors evolve over time or in response to treatment. However, sequencing data from bulk tumor samples often have considerable ambiguity in clonal composition, complicating the inference of ancestral relationships between clones. We introduce Cancer Analysis of Longitudinal Data through Evolutionary Reconstruction (CALDER), an algorithm to infer phylogenetic trees from longitudinal bulk DNA sequencing data. CALDER explicitly models a longitudinally observed phylogeny incorporating constraints that longitudinal sampling imposes on phylogeny reconstruction. We show on simulated bulk tumor data that longitudinal constraints substantially reduce ambiguity in phylogeny reconstruction and that CALDER outperforms existing methods that do not leverage this longitudinal information. On real data from two chronic lymphocytic leukemia patients, we find that CALDER reconstructs more plausible and parsimonious phylogenies than existing methods, with CALDER phylogenies containing fewer tumor clones per sample. CALDER's use of longitudinal information will be advantageous in further studies of tumor heterogeneity and evolution.

6.
Mol Cancer Res ; 17(4): 895-906, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30651371

ABSTRACT

To investigate the genomic evolution of metastatic pediatric osteosarcoma, we performed whole-genome and targeted deep sequencing on 14 osteosarcoma metastases and two primary tumors from four patients (two to eight samples per patient). All four patients harbored ancestral (truncal) somatic variants resulting in TP53 inactivation and cell-cycle aberrations, followed by divergence into relapse-specific lineages exhibiting a cisplatin-induced mutation signature. In three of the four patients, the cisplatin signature accounted for >40% of mutations detected in the metastatic samples. Mutations potentially acquired during cisplatin treatment included NF1 missense mutations of uncertain significance in two patients and a KIT G565R activating mutation in one patient. Three of four patients demonstrated widespread ploidy differences between samples from the same patient. Single-cell seeding of metastasis was detected in most metastatic samples. Cross-seeding between metastatic sites was observed in one patient, whereas in another patient a minor clone from the primary tumor seeded both metastases analyzed. These results reveal extensive clonal heterogeneity in metastatic osteosarcoma, much of which is likely cisplatin-induced. IMPLICATIONS: The extent and consequences of chemotherapy-induced damage in pediatric cancers are unknown. We found that cisplatin treatment can potentially double the mutational burden in osteosarcoma, which has implications for optimizing therapy for recurrent, chemotherapy-resistant disease.


Subjects
Bone Neoplasms/drug therapy , Bone Neoplasms/genetics , Cisplatin/therapeutic use , Osteosarcoma/drug therapy , Osteosarcoma/genetics , Antineoplastic Agents/pharmacology , Bone Neoplasms/pathology , Cisplatin/pharmacology , Clonal Evolution/drug effects , DNA Mutational Analysis , Female , Humans , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Lung Neoplasms/secondary , Male , Models, Genetic , Mutagenesis/drug effects , Neoplasm Metastasis , Osteosarcoma/pathology , Whole Genome Sequencing
7.
Bioinformatics ; 34(17): i972-i980, 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-30423088

ABSTRACT

Motivation: The analysis of high-dimensional 'omics data is often informed by the use of biological interaction networks. For example, protein-protein interaction networks have been used to analyze gene expression data, to prioritize germline variants, and to identify somatic driver mutations in cancer. In these and other applications, the underlying computational problem is to identify altered subnetworks containing genes that are both highly altered in an 'omics dataset and topologically close (e.g. connected) on an interaction network. Results: We introduce Hierarchical HotNet, an algorithm that finds a hierarchy of altered subnetworks. Hierarchical HotNet assesses the statistical significance of the resulting subnetworks over a range of biological scales and explicitly controls for ascertainment bias in the network. We evaluate the performance of Hierarchical HotNet and several other algorithms that identify altered subnetworks on the problem of predicting cancer genes and significantly mutated subnetworks. On somatic mutation data from The Cancer Genome Atlas, Hierarchical HotNet outperforms other methods and identifies significantly mutated subnetworks containing both well-known cancer genes and candidate cancer genes that are rarely mutated in the cohort. Hierarchical HotNet is a robust algorithm for identifying altered subnetworks across different 'omics datasets. Availability and implementation: http://github.com/raphael-group/hierarchical-hotnet. Supplementary information: Supplementary material is available at Bioinformatics online.


Subjects
Neoplasms/genetics , Algorithms , Gene Regulatory Networks , Humans , Neoplasms/metabolism , Oncogenes , Protein Interaction Maps
8.
BMC Bioinformatics ; 19(1): 323, 2018 Sep 14.
Article in English | MEDLINE | ID: mdl-30217148

ABSTRACT

BACKGROUND: Procedures for controlling the false discovery rate (FDR) are widely applied as a solution to the multiple comparisons problem of high-dimensional statistics. Current FDR-controlling procedures require accurately calculated p-values and rely on extrapolation into the unknown and unobserved tails of the null distribution. Both of these intermediate steps are challenging and can compromise the reliability of the results. RESULTS: We present a general method for controlling the FDR that capitalizes on the large amount of control data often found in big data studies to avoid these frequently problematic intermediate steps. The method utilizes control data to empirically construct the distribution of the test statistic under the null hypothesis and directly compares this distribution to the empirical distribution of the test data. By not relying on p-values, our control data-based empirical FDR procedure more closely follows the foundational principles of the scientific method: that inference is drawn by comparing test data to control data. The method is demonstrated through application to a problem in structural genomics. CONCLUSIONS: The method described here provides a general statistical framework for controlling the FDR that is specifically tailored for the big data setting. By relying on empirically constructed distributions and control data, it forgoes potentially problematic modeling steps and extrapolation into the unknown tails of the null distribution. This procedure is broadly applicable insofar as controlled experiments or internal negative controls are available, as is increasingly common in the big data setting.


Subjects
Models, Statistical , Bayes Theorem , DNA Repair , Databases, Factual , Genome, Human , Humans
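
The control-data idea in the abstract above can be illustrated with a minimal sketch. This is one simple way to estimate an FDR by comparing the empirical distribution of control scores with that of test scores, and it is not claimed to be the estimator from the paper. The function names, the "larger score means more extreme" convention, and the toy numbers are all assumptions made for illustration.

```python
def empirical_fdr(test_scores, control_scores, threshold):
    """Estimate the FDR at `threshold` from control data, without p-values.

    Assumption: larger scores are more extreme, and control scores behave
    like draws from the null distribution.
    """
    n_test = len(test_scores)
    n_control = len(control_scores)
    test_hits = sum(s >= threshold for s in test_scores)
    control_hits = sum(s >= threshold for s in control_scores)
    if test_hits == 0:
        return 0.0
    # Expected false discoveries: control exceedance rate scaled to the test-set size.
    expected_false = control_hits / n_control * n_test
    return min(1.0, expected_false / test_hits)


def smallest_threshold(test_scores, control_scores, alpha):
    """Return the lowest observed test score whose estimated FDR is <= alpha, or None."""
    for t in sorted(set(test_scores)):
        if empirical_fdr(test_scores, control_scores, t) <= alpha:
            return t
    return None


# Toy data (made up): five test scores stand out, one control score is large.
test = [0.1, 0.2, 0.3, 0.4, 0.5, 5.0, 6.0, 7.0, 8.0, 9.0]
control = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.5]
print(empirical_fdr(test, control, 5.0))
print(smallest_threshold(test, control, alpha=0.1))
```

The sketch never extrapolates into the tail of a fitted null model: every quantity is a count over observed control or test values, which is the property the abstract emphasizes.
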
9.
Cell ; 173(7): 1562-1565, 2018 Jun 14.
Article in English | MEDLINE | ID: mdl-29906441

ABSTRACT

A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology.


Subjects
Computational Biology/methods , Machine Learning , Algorithms , Biomedical Research
10.
Bioinformatics ; 34(13): i211-i217, 2018 Jul 01.
Article in English | MEDLINE | ID: mdl-29950014

ABSTRACT

Motivation: Current technologies for single-cell DNA sequencing require whole-genome amplification (WGA), as a single cell contains too little DNA for direct sequencing. Unfortunately, WGA introduces biases in the resulting sequencing data, including non-uniformity in genome coverage and high rates of allele dropout. These biases complicate many downstream analyses, including the detection of genomic variants. Results: We show that amplification biases have a potential upside: long-range correlations in rates of allele dropout provide a signal for phasing haplotypes at the lengths of amplicons from WGA, lengths which are generally longer than individual sequence reads. We describe a statistical test to measure concurrent allele dropout between single-nucleotide polymorphisms (SNPs) across multiple sequenced single cells. We use results of this test to perform haplotype assembly across a collection of single cells. We demonstrate that the algorithm predicts phasing between pairs of SNPs with higher accuracy than phasing from reads alone. Using whole-genome sequencing data from only seven neural cells, we obtain haplotype blocks that are orders of magnitude longer than with sequence reads alone (median length 10.2 kb versus 312 bp), with error rates <2%. We demonstrate similar advantages on whole-exome data from 16 cells, where we obtain haplotype blocks with median length 9.2 kb (comparable to typical gene lengths), compared with median lengths of 41 bp with sequence reads alone, with error rates <4%. Our algorithm will be useful for haplotyping of rare alleles and studies of allele-specific somatic aberrations. Availability and implementation: Source code is available at https://www.github.com/raphael-group. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Diploidy , Haplotypes , Polymorphism, Single Nucleotide , Single-Cell Analysis/methods , Software , Whole Genome Sequencing/methods , Algorithms , Breast Neoplasms/genetics , Female , Gene Frequency , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Male , Neurons
11.
Nat Genet ; 50(5): 718-726, 2018 May.
Article in English | MEDLINE | ID: mdl-29700472

ABSTRACT

Metastasis is the migration of cancerous cells from a primary tumor to other anatomical sites. Although metastasis was long thought to result from monoclonal seeding, or single cellular migrations, recent phylogenetic analyses of metastatic cancers have reported complex patterns of cellular migrations between sites, including polyclonal migrations and reseeding. However, accurate determination of migration patterns from somatic mutation data is complicated by intratumor heterogeneity and discordance between clonal lineage and cellular migration. We introduce MACHINA, a multi-objective optimization algorithm that jointly infers clonal lineages and parsimonious migration histories of metastatic cancers from DNA sequencing data. MACHINA analysis of data from multiple cancers shows that migration patterns are often not uniquely determined from sequencing data alone and that complicated migration patterns among primary tumors and metastases may be less prevalent than previously reported. MACHINA's rigorous analysis of migration histories will aid in studies of the drivers of metastasis.


Subjects
Cell Movement/genetics , Neoplasms/genetics , Neoplasms/pathology , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Neoplasm Metastasis , Phylogeny
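
The notion of a parsimonious migration history in the abstract above can be made concrete with a toy sketch. It is not the MACHINA algorithm: it assumes a fixed clone tree, counts a migration as any tree edge whose endpoints are labeled with different anatomical sites, and brute-forces the site labels of internal clones. All names and the example tree are hypothetical.

```python
from itertools import product


def migration_count(edges, site_of):
    """Count tree edges whose parent and child clones are at different sites."""
    return sum(1 for parent, child in edges if site_of[parent] != site_of[child])


def most_parsimonious_labelings(edges, leaf_sites, root, primary):
    """Enumerate site labelings of internal clones and keep those with the fewest migrations.

    Brute force, so only suitable for very small trees.
    """
    internal = sorted({p for p, _ in edges} - set(leaf_sites) - {root})
    sites = sorted(set(leaf_sites.values()) | {primary})
    best, best_labelings = None, []
    for choice in product(sites, repeat=len(internal)):
        site_of = dict(leaf_sites)
        site_of[root] = primary  # the root clone is assumed to sit in the primary tumor
        site_of.update(zip(internal, choice))
        m = migration_count(edges, site_of)
        if best is None or m < best:
            best, best_labelings = m, [site_of]
        elif m == best:
            best_labelings.append(site_of)
    return best, best_labelings


# Hypothetical clone tree: root r has children a and l1; a has children l2 and l3.
edges = [("r", "a"), ("r", "l1"), ("a", "l2"), ("a", "l3")]
leaf_sites = {"l1": "P", "l2": "M1", "l3": "M2"}  # P = primary, M1/M2 = metastases
best, labelings = most_parsimonious_labelings(edges, leaf_sites, root="r", primary="P")
print(best, len(labelings))  # prints "2 3"
```

In this example three different labelings of clone `a` achieve the minimum of two migrations, a small instance of the abstract's point that migration patterns are often not uniquely determined by the data.
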
12.
J Comput Biol ; 25(7): 689-708, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29658782

ABSTRACT

Cancer is an evolutionary process driven by somatic mutations. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the many types of mutations in cancer and the fact that nearly all cancer sequencing is of a bulk tumor, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy-number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest number of CNAs that explain the copy-number data from multiple samples of a tumor. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that either perform deconvolution/factorization of mixed tumor samples or build phylogenetic trees assuming homogeneous tumor samples. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher resolution view of copy-number evolution of this cancer than published analyses.


Subjects
Computational Biology , DNA Copy Number Variations/genetics , Neoplasms/genetics , Phylogeny , Algorithms , Humans , Neoplasms/pathology
13.
Bioinformatics ; 34(2): 346-352, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-29186385

ABSTRACT

MOTIVATION: The traditional view of cancer evolution states that a cancer genome accumulates a sequential ordering of mutations over a long period of time. However, in recent years it has been suggested that a cancer genome may instead undergo a one-time catastrophic event, such as chromothripsis, where a large number of mutations instead occur simultaneously. A number of potential signatures of chromothripsis have been proposed. In this work, we provide a rigorous formulation and analysis of the 'ability to walk the derivative chromosome' signature originally proposed by Korbel and Campbell. In particular, we show that this signature, as originally envisioned, may not always be present in a chromothripsis genome and we provide a precise quantification of under what circumstances it would be present. We also propose a variation on this signature, the H/T alternating fraction, which allows us to overcome some of the limitations of the original signature. RESULTS: We apply our measure to both simulated data and a previously analyzed real cancer dataset and find that the H/T alternating fraction may provide useful signal for distinguishing genomes having acquired mutations simultaneously from those acquired in a sequential fashion. AVAILABILITY AND IMPLEMENTATION: An implementation of the H/T alternating fraction is available at https://bitbucket.org/oesperlab/ht-altfrac. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

14.
Bioinformatics ; 34(2): 353-360, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-29112732

ABSTRACT

MOTIVATION: Structural variation, including large deletions, duplications, inversions, translocations and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints in repetitive regions of the genome and thus are difficult to identify with short reads. The recently developed linked-read sequencing technology from 10X Genomics combines a novel barcoding strategy with Illumina sequencing. This technology labels all reads that originate from a small number (∼5 to 10) of DNA molecules ∼50 Kbp in length with the same molecular barcode. These barcoded reads contain long-range sequence information that is advantageous for identification of structural variants. RESULTS: We present Novel Adjacency Identification with Barcoded Reads (NAIBR), an algorithm to identify structural variants in linked-read sequencing data. NAIBR predicts novel adjacencies in an individual genome resulting from structural variants using a probabilistic model that combines multiple signals in barcoded reads. We show that NAIBR outperforms several existing methods for structural variant identification, including two recent methods that also analyze linked reads, on simulated sequencing data and 10X whole-genome sequencing data from the NA12878 human genome and the HCC1954 breast cancer cell line. Several of the novel somatic structural variants identified in HCC1954 overlap known cancer genes. AVAILABILITY AND IMPLEMENTATION: Software is available at compbio.cs.brown.edu/software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

15.
Genome Res ; 27(11): 1885-1894, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29030470

ABSTRACT

Intra-tumor heterogeneity poses substantial challenges for cancer treatment. A tumor's composition can be deduced by reconstructing its mutational history. Central to current approaches is the infinite sites assumption that every genomic position can only mutate once over the lifetime of a tumor. The validity of this assumption has never been quantitatively assessed. We developed a rigorous statistical framework to test the infinite sites assumption with single-cell sequencing data. Our framework accounts for the high noise and contamination present in such data. We found strong evidence for the same genomic position being mutationally affected multiple times in individual tumors for 11 of 12 single-cell sequencing data sets from a variety of human cancers. Seven cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large-scale genomic deletions. Four cases exhibited a parallel mutation, potentially indicating convergent evolution at the base pair level. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity for more effective cancer treatment.


Subjects
Mutation , Neoplasms/genetics , Single-Cell Analysis/methods , Whole Exome Sequencing/methods , Evolution, Molecular , Genetic Heterogeneity , Humans , Models, Statistical
16.
Bioinformatics ; 33(14): i152-i160, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28882002

ABSTRACT

Motivation: A tumor arises from an evolutionary process that can be modeled as a phylogenetic tree. However, reconstructing this tree is challenging as most cancer sequencing uses bulk tumor tissue containing heterogeneous mixtures of cells. Results: We introduce Probabilistic Algorithm for Somatic Tree Inference (PASTRI), a new algorithm for bulk-tumor sequencing data that clusters somatic mutations into clones and infers a phylogenetic tree that describes the evolutionary history of the tumor. PASTRI uses an importance sampling algorithm that combines a probabilistic model of DNA sequencing data with an enumeration algorithm based on the combinatorial constraints defined by the underlying phylogenetic tree. As a result, tree inference is fast, accurate and robust to noise. We demonstrate on simulated data that PASTRI outperforms other cancer phylogeny algorithms in terms of runtime and accuracy. On real data from a chronic lymphocytic leukemia (CLL) patient, we show that a simple linear phylogeny better explains the data than the complex branching phylogeny that was previously reported. PASTRI provides a robust approach for phylogenetic tree inference from mixed samples. Availability and Implementation: Software is available at compbio.cs.brown.edu/software. Contact: braphael@princeton.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Clonal Evolution , Genomics/methods , Neoplasms/genetics , Sequence Analysis, DNA/methods , Software , Algorithms , Humans , Leukemia, Lymphoid/genetics , Leukemia, Lymphoid/physiopathology , Models, Statistical , Neoplasms/physiopathology
17.
Cancer Epidemiol Biomarkers Prev ; 26(10): 1531-1539, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28751478

ABSTRACT

Background: Acute lymphoblastic leukemia (ALL) is the most common childhood cancer, suggesting that germline variants influence ALL risk. Although multiple genome-wide association (GWA) studies have identified variants predisposing children to ALL, it remains unclear whether genetic heterogeneity affects ALL susceptibility and how interactions within and among genes containing ALL-associated variants influence ALL risk. Methods: Here, we jointly analyzed two published datasets of case-control GWA summary statistics along with germline data from ALL case-parent trios. We used the gene-level association method PEGASUS to identify genes with multiple variants associated with ALL. We then used PEGASUS gene scores as input to the network analysis algorithm HotNet2 to characterize the genomic architecture of ALL. Results: Using PEGASUS, we confirmed associations previously observed at genes such as ARID5B, IKZF1, CDKN2A/2B, and PIP4K2A, and we identified novel candidate gene associations. Using HotNet2, we uncovered significant gene subnetworks that may underlie inherited ALL risk: a subnetwork involved in B-cell differentiation containing the ALL-associated gene CEBPE, and a subnetwork of homeobox genes, including MEIS1. Conclusions: Gene and network analysis uncovered loci associated with ALL that are missed by GWA studies, such as MEIS1. Furthermore, ALL-associated loci do not appear to interact directly with each other to influence ALL risk, and instead appear to influence leukemogenesis through multiple, complex pathways. Impact: We present a new pipeline for post hoc analysis of association studies that yields new insight into the etiology of ALL and can be applied in future studies to shed light on the genomic underpinnings of cancer. Cancer Epidemiol Biomarkers Prev; 26(10); 1531-9. ©2017 AACR.


Subjects
Genome-Wide Association Study/methods , Precursor Cell Lymphoblastic Leukemia-Lymphoma/ethnology , Case-Control Studies , Child, Preschool , Genetic Predisposition to Disease , Humans , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics
18.
Nat Rev Genet ; 18(9): 551-562, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28607512

ABSTRACT

Biological networks are powerful resources for the discovery of genes and genetic modules that drive disease. Fundamental to network analysis is the concept that genes underlying the same phenotype tend to interact; this principle can be used to combine and to amplify signals from individual genes. Recently, numerous bioinformatic techniques have been proposed for genetic analysis using networks, based on random walks, information diffusion and electrical resistance. These approaches have been applied successfully to identify disease genes, genetic modules and drug targets. In fact, all these approaches are variations of a unifying mathematical machinery, network propagation, suggesting that it is a powerful data transformation method of broad utility in genetic research.


Subjects
Computational Biology , Disease/genetics , Gene Regulatory Networks , Genetic Association Studies , Software , Algorithms , Humans , Protein Interaction Maps , Proteins/metabolism
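
Network propagation, as described in the abstract above, can be sketched as a random walk with restart, one of the random-walk variants the review mentions. The code below is a minimal illustration under stated assumptions: the graph is undirected with no isolated nodes, at least one seed score is positive, and the function name, the restart value, and the four-gene path graph are all hypothetical.

```python
def propagate(adjacency, scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p <- restart * p0 + (1 - restart) * W p.

    `adjacency` maps each node to a list of its neighbors (undirected graph);
    `scores` maps seed nodes to non-negative initial scores.
    """
    nodes = sorted(adjacency)
    degree = {v: len(adjacency[v]) for v in nodes}
    total = sum(scores.get(v, 0.0) for v in nodes)
    p0 = {v: scores.get(v, 0.0) / total for v in nodes}  # normalized seed distribution
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for v in nodes:
            # Each neighbor u passes an equal share of its current score along each of its edges.
            inflow = sum(p[u] / degree[u] for u in adjacency[v])
            nxt[v] = restart * p0[v] + (1 - restart) * inflow
        delta = max(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path graph A - B - C - D with a single seed gene A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
smoothed = propagate(graph, {"A": 1.0})
for gene in sorted(smoothed, key=smoothed.get, reverse=True):
    print(gene, round(smoothed[gene], 4))
```

The propagated scores stay highest at the seed and decay with network distance, which is how a signal at one gene is shared with its interaction partners.
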
19.
Algorithms Mol Biol ; 12: 13, 2017.
Article in English | MEDLINE | ID: mdl-28515774

ABSTRACT

BACKGROUND: Cancer is an evolutionary process characterized by the accumulation of somatic mutations in a population of cells that form a tumor. One frequent type of mutation is the copy-number aberration, which alters the number of copies of genomic regions. The number of copies of each position along a chromosome constitutes the chromosome's copy-number profile. Understanding how such profiles evolve in cancer can assist in both diagnosis and prognosis. RESULTS: We model the evolution of a tumor by segmental deletions and amplifications, and gauge the distance from one profile to another by the minimum number of events needed to transform the first into the second. Given two profiles, our first problem aims to find a parental profile that minimizes the sum of distances to its children. Given k profiles, the second, more general problem seeks a phylogenetic tree whose k leaves are labeled by the k given profiles and whose internal vertices are labeled by ancestral profiles such that the sum of edge distances is minimum. CONCLUSIONS: For the former problem we give a pseudo-polynomial dynamic programming algorithm that is linear in the profile length, and an integer linear program formulation. For the latter problem we show it is NP-hard and give an integer linear program formulation that scales to practical problem instance sizes. We assess the efficiency and quality of our algorithms on simulated instances. AVAILABILITY: https://github.com/raphael-group/CNT-ILP.

20.
Genome Res ; 27(8): 1450-1459, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28522612

ABSTRACT

Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional "download and analyze" paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets.


Subjects
Cloud Computing , Genetic Variation , Genome, Human , Genomics/methods , Neoplasms/genetics , Software , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Humans