Results 1 - 11 of 11
1.
J Comput Biol ; 26(9): 985-1002, 2019 09.
Article in English | MEDLINE | ID: mdl-31120348

ABSTRACT

Cancer arises from successive rounds of mutations, resulting in tumor cells with different somatic mutations known as clones. Drug responsiveness and therapeutics of cancer depend on the accurate detection of the clones in a tumor sample. Recent research has considered inferring clonal composition of a tumor sample using computational models based on the short read data of the sample generated using the next-generation sequencing (NGS) technology. Short reads (segmented DNA parts of different tumor cells) are noisy; therefore, inferring the clones and their mutations from the data is a difficult and complex problem. Existing methods to infer clones from noisy NGS data do not consider the presence of long-range mutational influences. Therefore, we develop a new model, called extended multiple sample tumor heterogeneity prediction by factorial Hidden Markov model (emHetFHMM), based on factorial hidden Markov models to infer clones and their proportions by capturing the long-range mutational influences. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains results in the observed data. We make use of Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong models from the previous work (PyClone, PhyloSub, and HetFHMM) based on both synthetic data and real cancer data from acute myeloid leukemia. Empirical results confirm that emHetFHMM infers clonal composition of a tumor sample more accurately than previous studies.


Subjects
Algorithms; Genetic Heterogeneity; Genomics/methods; Models, Genetic; Mutation; Neoplasms/genetics; Genomics/standards; Humans; Markov Chains
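
The abstract above mentions exponentiated gradient updates for the mixing proportions. The following is a minimal sketch of that general technique, not the authors' implementation: it assumes a simple squared-error objective in place of the model likelihood, and the names `fit_mixing_proportions`, `clone_signals`, and `observed` are hypothetical.

```python
import math


def exponentiated_gradient_step(weights, gradient, learning_rate):
    """One multiplicative update that keeps the weights on the probability simplex."""
    # Multiply each weight by exp(-learning_rate * gradient), then renormalise.
    unnormalised = [w * math.exp(-learning_rate * g) for w, g in zip(weights, gradient)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def fit_mixing_proportions(observed, clone_signals, steps=500, learning_rate=0.5):
    """Fit simplex weights so the weighted mix of clone signals matches `observed`.

    Assumption: a squared-error loss stands in for the model likelihood.
    """
    k = len(clone_signals)
    weights = [1.0 / k] * k  # start from the uniform mixture
    for _ in range(steps):
        # predicted[j] = sum_i weights[i] * clone_signals[i][j]
        predicted = [
            sum(w * signal[j] for w, signal in zip(weights, clone_signals))
            for j in range(len(observed))
        ]
        residual = [p - o for p, o in zip(predicted, observed)]
        # Gradient of the squared error with respect to each weight.
        gradient = [
            2.0 * sum(r * signal[j] for j, r in enumerate(residual))
            for signal in clone_signals
        ]
        weights = exponentiated_gradient_step(weights, gradient, learning_rate)
    return weights
```

The multiplicative form is the reason this update suits proportions: the weights stay positive and sum to one after every step without a separate projection.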
2.
J Comput Biol ; 25(2): 182-193, 2018 02.
Article in English | MEDLINE | ID: mdl-29035575

ABSTRACT

Cancer arises from successive rounds of mutations, resulting in tumor cells with different somatic mutations known as clones. Drug responsiveness and therapeutics of cancer depend on the accurate detection of clones in a tumor sample. Recent research has considered inferring clonal composition of a tumor sample using computational models based on short read data of the sample generated using next-generation sequencing (NGS) technology. Short reads (segmented DNA parts of different tumor cells) are noisy; therefore, inferring the clones and their mutations from the data is a difficult and complex problem. We develop a new model called HetFHMM, based on factorial hidden Markov models, to infer clones and their proportions from noisy NGS data. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains results in the observed data. We make use of Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong models from previous work (PyClone and PhyloSub) based on both synthetic data and real cancer data on acute myeloid leukemia. Empirical results confirm that HetFHMM infers clonal composition of a tumor sample more accurately than previous work.


Subjects
Computational Biology/methods; Genetic Heterogeneity; Leukemia, Myeloid, Acute/genetics; Sequence Analysis, DNA/methods; Clonal Evolution; Computational Biology/standards; Humans; Markov Chains; Mutation Accumulation; Sequence Analysis, DNA/standards
3.
Oncotarget ; 8(40): 68047-68058, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28978095

ABSTRACT

Melphalan is a cytotoxic chemotherapy used to treat patients with multiple myeloma (MM). Bone resorption by osteoclasts, by remodeling the bone surface, can reactivate dormant MM cells held in the endosteal niche to promote tumor development. Dormant MM cells can be reactivated after melphalan treatment; however, it is unclear whether melphalan treatment increases osteoclast formation to modify the endosteal niche. Melphalan treatment of mice for 14 days decreased bone volume and the endosteal bone surface, and this was associated with increases in osteoclast numbers. Bone marrow cells (BMCs) from melphalan-treated mice formed more osteoclasts than BMCs from vehicle-treated mice, suggesting that osteoclast progenitors were increased. Melphalan also increased osteoclast formation in BMCs and RAW264.7 cells in vitro, which was prevented with the cell stress response (CSR) inhibitor KNK437. Melphalan also increased expression of the osteoclast regulator microphthalmia-associated transcription factor (MITF), but not nuclear factor of activated T cells 1 (NFATc1). Melphalan increased expression of the MITF-dependent cell fusion factors dendritic cell-specific transmembrane protein (Dc-stamp) and osteoclast-stimulatory transmembrane protein (Oc-stamp), and increased cell fusion. Expression of the osteoclast stimulator receptor activator of NFκB ligand (RANKL) was unaffected by melphalan treatment. These data suggest that melphalan stimulates osteoclast formation by increasing osteoclast progenitor recruitment and differentiation in a CSR-dependent manner. Melphalan-induced osteoclast formation is associated with bone loss and reduced endosteal bone surface. As well as affecting bone structure, this may contribute to dormant tumor cell activation, which has implications for how melphalan is used to treat patients with MM.

4.
Genome Res ; 27(9): 1573-1588, 2017 09.
Article in English | MEDLINE | ID: mdl-28768687

ABSTRACT

Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the "random walk facility location" (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: "multihitting time," the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients' survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology.


Subjects
Breast Neoplasms/genetics; DNA Copy Number Variations/genetics; Software; Transcriptome/genetics; Breast Neoplasms/pathology; Computational Biology; Female; Genomics; Humans; Mutation; Protein Interaction Maps/genetics
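
The distance measure in the abstract above builds on random-walk hitting times. As a rough illustration only, the sketch below computes the ordinary expected hitting time from each node to a single target in a small undirected graph. It is not the paper's multi-source "multihitting time", and the function name and toy graph are hypothetical.

```python
def expected_hitting_times(adjacency, target, sweeps=1000):
    """Expected number of steps for a simple random walk to first reach `target`.

    `adjacency` maps each node to a list of its neighbours; at every step the
    walk moves to a uniformly chosen neighbour. The linear system
    h(v) = 1 + mean(h(u) for u in neighbours(v)), h(target) = 0
    is solved by repeated in-place sweeps (Gauss-Seidel iteration).
    Assumption: the graph is connected, so every node can reach the target.
    """
    times = {node: 0.0 for node in adjacency}
    for _ in range(sweeps):
        for node, neighbours in adjacency.items():
            if node == target:
                continue  # the walk has already arrived
            times[node] = 1.0 + sum(times[n] for n in neighbours) / len(neighbours)
    return times


# Toy path graph A - B - C with C as the target node.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
hitting = expected_hitting_times(toy_graph, "C")
```

On this three-node path the fixed point is h(B) = 3 and h(A) = 4, which can be checked by hand from the two equations h(A) = 1 + h(B) and h(B) = 1 + h(A)/2.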
5.
J Biomed Inform ; 64: 158-167, 2016 12.
Article in English | MEDLINE | ID: mdl-27742349

ABSTRACT

OBJECTIVE: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance. METHODS: Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the research question regarding the impact of the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide better understanding of how to best apply text classification in the context of imbalanced data of variable completeness. RESULTS: Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. CONCLUSION: Overall, linking data sources significantly improved classification performance for all the diseases examined. However, there is no single approach that suits all scenarios; the choice of the most effective combination of data sources depends on the specific disease to be classified.


Subjects
Data Mining; Disease/classification; Hospital Records; Natural Language Processing; Hospitalization; Humans; Patient Compliance; Support Vector Machine
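
The evaluation metrics named in the abstract above (Precision, Recall and F-Score) can be sketched as follows. This assumes the balanced F1 variant of the F-Score, which the abstract does not specify, and the function name and label encoding are hypothetical.

```python
def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Return (precision, recall, F1) for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators, e.g. when nothing is predicted positive.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With imbalanced data such as admission records, these per-class metrics are more informative than plain accuracy, because a classifier that always predicts the majority class scores zero recall on the disease of interest.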
6.
BMC Genomics ; 14 Suppl 1: S14, 2013.
Article in English | MEDLINE | ID: mdl-23369194

ABSTRACT

One challenge in applying bioinformatic tools to clinical or biological data is the high number of features that might be provided to the learning algorithm without any prior knowledge of which ones should be used. In such applications, the number of features can drastically exceed the number of training instances, which is often limited by the number of available samples for the study. The Lasso is one of many regularization methods that have been developed to prevent overfitting and improve prediction performance in high-dimensional settings. In this paper, we propose a novel algorithm for feature selection based on the Lasso, and our hypothesis is that defining a scoring scheme that measures the "quality" of each feature can provide a more robust feature selection method. Our approach is to generate several samples from the training data by bootstrapping, determine the best relevance-ordering of the features for each sample, and finally combine these relevance-orderings to select highly relevant features. In addition to the theoretical analysis of our feature scoring scheme, we provided empirical evaluations on six real datasets from different fields to confirm the superiority of our method in exploratory data analysis and prediction performance. For example, we applied FeaLect, our feature scoring algorithm, to a lymphoma dataset, and according to a human expert, our method led to selecting more meaningful features than those commonly used in the clinics. This case study built a basis for discovering interesting new criteria for lymphoma diagnosis. Furthermore, to facilitate the use of our algorithm in other applications, the source code that implements our algorithm was released as FeaLect, a documented R package in CRAN.


Subjects
Algorithms; Lymphoma/diagnosis; Computational Biology; Databases, Factual; Humans; ROC Curve
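
The three steps in the abstract above (bootstrap the training data, order the features by relevance on each sample, combine the orderings) can be sketched in outline. This is not the FeaLect algorithm: as loudly labeled assumptions, absolute correlation with the label stands in for the Lasso-based ordering, and a mean reciprocal rank stands in for the paper's scoring scheme. All names are hypothetical.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def bootstrap_feature_scores(rows, labels, n_bootstraps=50, seed=0):
    """Average reciprocal rank of each feature over bootstrap resamples."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    scores = [0.0] * n_features
    for _ in range(n_bootstraps):
        # Step 1: draw a bootstrap sample (row indices with replacement).
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]
        sample_labels = [labels[i] for i in sample]
        # Step 2: order features by relevance on this sample
        # (stand-in relevance measure, not the Lasso).
        relevance = [
            abs_correlation([rows[i][f] for i in sample], sample_labels)
            for f in range(n_features)
        ]
        ordering = sorted(range(n_features), key=lambda f: relevance[f], reverse=True)
        # Step 3: accumulate a rank-based score for each feature.
        for rank, feature in enumerate(ordering, start=1):
            scores[feature] += 1.0 / rank
    return [s / n_bootstraps for s in scores]
```

Averaging over resamples is what gives the robustness the abstract hypothesizes: a feature that ranks highly only on one particular draw of the data ends up with a low combined score.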
7.
Nature ; 486(7403): 346-52, 2012 Apr 18.
Article in English | MEDLINE | ID: mdl-22522925

ABSTRACT

The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA-RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the 'CNA-devoid' subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.


Subjects
Breast Neoplasms/genetics; Breast Neoplasms/pathology; DNA Copy Number Variations/genetics; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genome, Human/genetics; Breast Neoplasms/classification; Breast Neoplasms/diagnosis; Female; Gene Regulatory Networks/genetics; Genes, Neoplasm/genetics; Genomics; Humans; Kaplan-Meier Estimate; MAP Kinase Kinase 4/genetics; Polymorphism, Single Nucleotide/genetics; Prognosis; Protein Phosphatase 2/genetics; Treatment Outcome
8.
Nature ; 486(7403): 395-9, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495314

ABSTRACT

Primary triple-negative breast cancers (TNBCs), a tumour type defined by lack of oestrogen receptor, progesterone receptor and ERBB2 gene amplification, represent approximately 16% of all breast cancers. Here we show in 104 TNBC cases that at the time of diagnosis these cancers exhibit a wide and continuous spectrum of genomic evolution, with some having only a handful of coding somatic aberrations in a few pathways, whereas others contain hundreds of coding somatic mutations. High-throughput RNA sequencing (RNA-seq) revealed that only approximately 36% of mutations are expressed. Using deep re-sequencing measurements of allelic abundance for 2,414 somatic mutations, we determine for the first time, to our knowledge, in an epithelial tumour subtype, the relative abundance of clonal frequencies among cases representative of the population. We show that TNBCs vary widely in their clonal frequencies at the time of diagnosis, with the basal subtype of TNBC showing more variation than non-basal TNBC. Although p53 (also known as TP53), PIK3CA and PTEN somatic mutations seem to be clonally dominant compared to other genes, in some tumours their clonal frequencies are incompatible with founder status. Mutations in cytoskeletal, cell shape and motility proteins occurred at lower clonal frequencies, suggesting that they occurred later during tumour progression. Taken together, our results show that understanding the biology and therapeutic responses of patients with TNBC will require the determination of individual tumour clonal genotypes.


Subjects
Breast Neoplasms/genetics; Breast Neoplasms/pathology; Evolution, Molecular; Mutation/genetics; Alleles; Breast Neoplasms/diagnosis; Clone Cells/metabolism; Clone Cells/pathology; DNA Copy Number Variations/genetics; DNA Mutational Analysis; Disease Progression; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic/genetics; Genotype; High-Throughput Nucleotide Sequencing; Humans; INDEL Mutation/genetics; Point Mutation/genetics; Precision Medicine; Reproducibility of Results; Sequence Analysis, RNA
9.
Genome Biol ; 13(12): R124, 2012 Dec 22.
Article in English | MEDLINE | ID: mdl-23383675

ABSTRACT

Simultaneous interrogation of tumor genomes and transcriptomes is underway in unprecedented global efforts. Yet, despite the essential need to separate driver mutations modulating gene expression networks from transcriptionally inert passenger mutations, robust computational methods to ascertain the impact of individual mutations on transcriptional networks are underdeveloped. We introduce a novel computational framework, DriverNet, to identify likely driver mutations by virtue of their effect on mRNA expression networks. Application to four cancer datasets reveals the prevalence of rare candidate driver mutations associated with disrupted transcriptional networks and a simultaneous modulation of oncogenic and metabolic networks, induced by copy number co-modification of adjacent oncogenic and metabolic drivers. DriverNet is available on Bioconductor or at http://compbio.bccrc.ca/software/drivernet/.


Subjects
Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Mutation; Neoplasms/genetics; Software; Algorithms; Gene Expression Profiling; Genomics; Humans; Metabolic Networks and Pathways/genetics; Oncogenes; Transcription, Genetic
10.
Bioinformatics ; 28(2): 167-75, 2012 Jan 15.
Article in English | MEDLINE | ID: mdl-22084253

ABSTRACT

MOTIVATION: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge. RESULTS: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth 'false positive' predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study. AVAILABILITY: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca.


Subjects
Algorithms; Artificial Intelligence; Breast Neoplasms/genetics; Mutation; Polymorphism, Single Nucleotide; Bayes Theorem; Cluster Analysis; Exome; Female; Genome; Humans; Models, Genetic; Neoplasms; Software; Support Vector Machine
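
The cross-validation framework mentioned in the abstract above can be illustrated with a minimal fold splitter. The number of folds is an assumption (the abstract does not state it), the function name is hypothetical, and a real evaluation would normally shuffle or stratify the candidates before splitting.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    # The first (n_items % k) folds receive one extra item each.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


# 3369 candidate positions, as in the abstract; 10 folds is an assumed choice.
folds = list(k_fold_indices(3369, 10))
```

Each candidate appears in exactly one test fold, so every labeled position contributes once to the held-out error estimate.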
11.
Am J Clin Pathol ; 137(1): 75-85, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22180480

ABSTRACT

Mantle cell lymphoma (MCL) and small lymphocytic lymphoma (SLL) exhibit similar but distinct immunophenotypic profiles. Many cases can be diagnosed readily by flow cytometry (FCM) alone; however, ambiguous cases are frequently encountered and necessitate additional studies, including immunohistochemical staining for cyclin D1 and fluorescence in situ hybridization for IgH-CCND1 rearrangement. To determine if greater diagnostic accuracy could be achieved from FCM data alone, we developed an unbiased, machine-based algorithm to identify features that best distinguish between the 2 diseases. By applying conventional diagnostic criteria to the flow cytometry data, we were able to assign 28 of 44 (64%) MCL and 48 of 70 (69%) SLL cases correctly. In contrast, we were able to assign all 44 (100%) MCL and 68 of 70 (97%) SLL cases correctly using a novel set of criteria, as identified by our automated approach. The most discriminating feature was the CD20/CD23 mean fluorescence intensity ratio, and we found unexpectedly that inclusion of FMC7 expression in the diagnostic algorithm actually reduced its accuracy. This study demonstrates that computational methods can be used on existing clinical FCM data to improve diagnostic accuracy and suggests similar computational approaches could be used to identify novel prognostic markers and perhaps subdivide existing or define new diagnostic entities.


Subjects
Flow Cytometry/methods; Leukemia, Lymphocytic, Chronic, B-Cell/diagnosis; Lymphoma, Mantle-Cell/diagnosis; Aged; Algorithms; Antigens, CD20/metabolism; Artificial Intelligence; Female; Humans; Immunophenotyping; Leukemia, Lymphocytic, Chronic, B-Cell/blood; Lymphoma, Mantle-Cell/blood; Male; Pattern Recognition, Automated; Receptors, IgE/metabolism; Reproducibility of Results