1.
Artif Intell Med ; 143: 102611, 2023 09.
Article in English | MEDLINE | ID: mdl-37673579

ABSTRACT

Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges. Given a medical image and a clinically relevant question in natural language, a medical VQA system is expected to predict a plausible and convincing answer. Although general-domain VQA has been extensively studied, medical VQA still needs specific investigation and exploration due to its task features. In the first part of this survey, we collect and discuss the medical VQA datasets that are publicly available to date in terms of data source, data quantity, and task features. In the second part, we review the approaches used in medical VQA tasks. We summarize and discuss their techniques, innovations, and potential improvements. In the last part, we analyze some medical-specific challenges for the field and discuss future research directions. Our goal is to provide comprehensive and helpful information for researchers interested in medical visual question answering and to encourage them to conduct further research in this field.


Subject(s)
Artificial Intelligence
2.
J Occup Rehabil ; 30(3): 331-342, 2020 09.
Article in English | MEDLINE | ID: mdl-31620997

ABSTRACT

Purpose: Post-injury health service utilization (HSU) contributes to injury outcomes, but few studies have investigated their relationship. This study aims to group patients injured in transport accidents based on minimal historical information about their HSU so that the groups are meaningfully associated with the outcome of interest. Methods: The data include 20,692 injured patients who had compensation claims over 3 years. We propose a hybrid approach that combines unsupervised and supervised machine learning methods. Based on data from the first week post-injury, we identify a clustering of patients that is best associated with total cost to recovery and discover HSU patterns. This allows models to be developed that accurately predict the outcome of interest using the discovered patterns. Furthermore, we propose using decision tree classifiers to accurately classify future patients into the discovered clusters using their first-week post-injury information. Results: Our hybrid approach identified eight patient groups. The compactness of the resulting clusters, assessed by the Average Silhouette Width metric, is 0.71, indicating well-defined clusters. The resulting patient groups are highly predictive of injury outcomes. They improve cost predictability more than twofold in comparison with predictors such as gender, age and injury type. These groups are also substantially associated with patients' recovery. The transparency and interpretability of decision trees allow the resulting classification rules to be integrated conveniently into operational processes. Conclusions: This study provides a framework for discovering knowledge and useful insights that help health service providers and policy makers control injury outcomes and, consequently, reduce the severity of transport accidents.
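The Average Silhouette Width reported in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it assumes Euclidean distance, scores singleton clusters as 0 by convention, and uses hypothetical toy points and the hypothetical function name `average_silhouette_width`.

```python
from math import dist
from statistics import mean


def average_silhouette_width(points, labels):
    """Mean silhouette over all points; labels[i] is the cluster of points[i]."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_labels = [0, 0, 0, 1, 1, 1]
print(round(average_silhouette_width(toy_points, toy_labels), 3))
```

Values close to 1 indicate compact, well-separated clusters, which is how the reported 0.71 is read in the abstract.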


Subject(s)
Algorithms , Compensation and Redress , Health Services , Wounds and Injuries , Cluster Analysis , Humans , Wounds and Injuries/therapy
3.
Health Inf Sci Syst ; 7(1): 18, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31523422

ABSTRACT

PURPOSE: This study develops a pattern recognition method that identifies patterns based on their similarity and their association with the outcome of interest. The practical purpose of this method is to group patients who are injured in transport accidents in the early stages post-injury. This grouping is based on distinctive patterns in health service use within the first week post-injury. The groups also provide predictive information about the total cost of the medication process. As a result, the groups of patients who have undesirable outcomes are identified as early as possible based on health service use patterns. METHODS: We propose a multi-objective optimization model to group patients. One objective function is the cost function of k-medians clustering, which recognizes similar patterns. The other objective function is the cross-validated root-mean-square error, which examines the association with the total cost. The best grouping is obtained by minimizing both objective functions. As a result, the multi-objective optimization model is a semi-supervised clustering method that learns health service use patterns in both unsupervised and supervised ways. We also introduce an evolutionary computation approach that includes stochastic gradient descent and Pareto optimal solutions to find the optimal solution. In addition, we use the decision tree method to reproduce the optimal groups using an interpretable classification model. RESULTS: The results show that the proposed multi-objective semi-supervised clustering identifies distinct groups of health service use and contributes to predicting the total cost. The performance of the multi-objective model was examined using two metrics: the average silhouette width and the cross-validation error. The examination shows that the multi-objective model outperforms the single-objective ones. In addition, the interpretable classification model shows that imaging and therapeutic services are critical services in the first week post-injury for grouping injured patients. CONCLUSION: The proposed multi-objective semi-supervised clustering finds optimal clusters that are not only well separated from each other but also provide informative insights regarding the outcome of interest. It also overcomes two drawbacks of clustering methods: sensitivity to the initial cluster centers and the need to specify the number of clusters.
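The idea of minimizing two objectives at once and keeping the Pareto optimal solutions can be sketched as a simple non-dominated filter. This is only an illustration under stated assumptions: the candidate groupings, their names and their (clustering cost, cross-validated RMSE) scores are hypothetical, and the evolutionary search described in the abstract is not reproduced here.

```python
def dominates(p, q):
    """True if p is no worse than q on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(scores):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: s
        for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    }


# Hypothetical candidate groupings scored as (k-medians cost, cross-validated RMSE).
candidate_groupings = {
    "A": (120.0, 0.90),
    "B": (100.0, 1.10),
    "C": (130.0, 0.95),  # worse than A on both objectives, so it is dominated
    "D": (90.0, 1.50),
}
print(sorted(pareto_front(candidate_groupings)))
```

A single final grouping still has to be chosen from the front; how the authors make that choice is not stated in the abstract.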

4.
J Clin Med ; 8(9)2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31491944

ABSTRACT

Clinical audit of invasive mold disease (IMD) in hematology patients is inefficient due to the difficulties of case finding. This results in antifungal stewardship (AFS) programs preferentially reporting drug cost and consumption rather than measures that actually reflect quality of care. We used machine learning-based natural language processing (NLP) to non-selectively screen chest tomography (CT) reports for pulmonary IMD, verified by clinical review against international definitions and benchmarked against key AFS measures. NLP screened 3014 reports from 1 September 2008 to 31 December 2017, generating 784 positives that after review, identified 205 IMD episodes (44% probable-proven) in 185 patients from 50,303 admissions. Breakthrough-probable/proven-IMD on antifungal prophylaxis accounted for 60% of episodes with serum monitoring of voriconazole or posaconazole in the 2 weeks prior performed in only 53% and 69% of episodes, respectively. Fiberoptic bronchoscopy within 2 days of CT scan occurred in only 54% of episodes. The average turnaround of send-away bronchoalveolar galactomannan of 12 days (range 7-22) was associated with high empiric liposomal amphotericin consumption. A random audit of 10% negative reports revealed two clinically significant misses (0.9%, 2/223). This is the first successful use of applied machine learning for institutional IMD surveillance across an entire hematology population describing process and outcome measures relevant to AFS. Compared to current methods of clinical audit, semi-automated surveillance using NLP is more efficient and inclusive by avoiding restrictions based on any underlying hematologic condition, and has the added advantage of being potentially scalable.

5.
Stud Health Technol Inform ; 266: 1-6, 2019 Aug 08.
Article in English | MEDLINE | ID: mdl-31397293

ABSTRACT

Identifying patient groups that have unwanted outcomes in the early stages is crucial to providing the most appropriate level of care. In this study, we aim to find distinctive patterns in the health service use (HSU) of patients injured in transport accidents within the first week post-injury, targeting patterns that are associated with the outcome of interest. To recognize these patterns, we propose a multi-objective optimization model that minimizes the k-medians cost function and the regression error simultaneously. Thus, we use a semi-supervised clustering approach to identify patient groups based on HSU patterns and their association with total cost. To solve the optimization problem, we introduce an evolutionary algorithm using stochastic gradient descent and Pareto optimal solutions. As a result, we find the optimal clusters by minimizing both objective functions. The results show that the proposed semi-supervised approach identifies distinct groups of HSUs and contributes to predicting total cost. The experiments also demonstrate the performance of the multi-objective approach in comparison with single-objective approaches.
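For readers unfamiliar with the k-medians cost function mentioned above, a minimal sketch follows. It assumes the common formulation (sum of L1 distances from each point to the coordinate-wise median of its cluster); the abstract does not state which distance the authors use, and the weekly service counts below are hypothetical.

```python
from statistics import median


def manhattan(u, v):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cluster_medians(points, labels):
    """Coordinate-wise median of the points assigned to each cluster."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(median(column) for column in zip(*members))
        for label, members in groups.items()
    }


def k_medians_cost(points, labels):
    """Sum of L1 distances from every point to the median of its cluster."""
    centers = cluster_medians(points, labels)
    return sum(manhattan(p, centers[label]) for p, label in zip(points, labels))


# Hypothetical first-week counts per patient: (imaging services, therapy sessions).
usage = [(1, 0), (2, 0), (3, 1), (10, 8), (12, 9), (11, 7)]
groups = [0, 0, 0, 1, 1, 1]
print(k_medians_cost(usage, groups))
```

In the approach described in the abstract this cost would be one of two objectives, the other being a regression error on total cost.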


Subject(s)
Accidents , Algorithms , Cluster Analysis , Health Services , Humans , Risk Assessment
6.
J Comput Biol ; 26(9): 985-1002, 2019 09.
Article in English | MEDLINE | ID: mdl-31120348

ABSTRACT

Cancer arises from successive rounds of mutations, resulting in tumor cells with different somatic mutations known as clones. Drug responsiveness and therapeutics of cancer depend on the accurate detection of the clones in a tumor sample. Recent research has considered inferring clonal composition of a tumor sample using computational models based on the short read data of the sample generated using the next-generation sequencing (NGS) technology. Short reads (segmented DNA parts of different tumor cells) are noisy; therefore, inferring the clones and their mutations from the data is a difficult and complex problem. Existing methods to infer clones from noisy NGS data do not consider the presence of long-range mutational influences. Therefore, we develop a new model, called extended multiple sample tumor heterogeneity prediction by factorial Hidden Markov model (emHetFHMM), based on factorial hidden Markov models to infer clones and their proportions by capturing the long-range mutational influences. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains results in the observed data. We make use of Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong models from the previous work (PyClone, PhyloSub, and HetFHMM) based on both synthetic data and real cancer data from acute myeloid leukemia. Empirical results confirm that emHetFHMM infers clonal composition of a tumor sample more accurately than previous studies.


Subject(s)
Algorithms , Genetic Heterogeneity , Genomics/methods , Models, Genetic , Mutation , Neoplasms/genetics , Genomics/standards , Humans , Markov Chains
7.
Brief Bioinform ; 20(6): 2150-2166, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30184176

ABSTRACT

The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. 
We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.


Subject(s)
Benchmarking , Computational Biology , Peptide Hydrolases/metabolism , Research , Algorithms , Machine Learning , Substrate Specificity
8.
J Theor Biol ; 443: 125-137, 2018 04 14.
Article in English | MEDLINE | ID: mdl-29408627

ABSTRACT

Determining the catalytic residues in an enzyme is critical to understanding the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.
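The area under the receiver operating characteristic curve quoted above equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way; the labels and scores are hypothetical and are not taken from the PREvaIL benchmarks.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical residues: 1 = catalytic, 0 = non-catalytic, with predicted scores.
residue_labels = [1, 1, 0, 0, 0]
residue_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(residue_labels, residue_scores))
```

This pairwise form is quadratic in the number of examples, which is acceptable for a small illustration; larger evaluations would normally use a sort-based computation.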


Subject(s)
Databases, Protein , Machine Learning , Sequence Analysis, Protein/methods , Software , Catalysis , Protein Conformation
9.
Bioinformatics ; 34(4): 684-687, 2018 02 15.
Article in English | MEDLINE | ID: mdl-29069280

ABSTRACT

Summary: Proteases are enzymes that specifically cleave the peptide backbone of their target proteins. As an important type of irreversible post-translational modification, protein cleavage underlies many key physiological processes. When dysregulated, proteases' actions are associated with numerous diseases. Many proteases are highly specific, cleaving only those target substrates that present certain particular amino acid sequence patterns. Therefore, tools that successfully identify potential target substrates for proteases may also identify previously unknown, physiologically relevant cleavage sites, thus providing insights into biological processes and guiding hypothesis-driven experiments aimed at verifying protease-substrate interaction. In this work, we present PROSPERous, a tool for rapid in silico prediction of protease-specific cleavage sites in substrate sequences. Our tool is based on logistic regression models and uses different scoring functions and their pairwise combinations to subsequently predict potential cleavage sites. PROSPERous represents a state-of-the-art tool that enables fast, accurate and high-throughput prediction of substrate cleavage sites for 90 proteases. Availability and implementation: http://prosperous.erc.monash.edu/. Contact: jiangning.song@monash.edu or geoff.webb@monash.edu or r.pike@latrobe.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Peptide Hydrolases/metabolism , Sequence Analysis, Protein/methods , Software , Computational Biology/methods , Computer Simulation , Data Accuracy , Proteolysis , Substrate Specificity
10.
J Comput Biol ; 25(2): 182-193, 2018 02.
Article in English | MEDLINE | ID: mdl-29035575

ABSTRACT

Cancer arises from successive rounds of mutations, resulting in tumor cells with different somatic mutations known as clones. Drug responsiveness and therapeutics of cancer depend on the accurate detection of clones in a tumor sample. Recent research has considered inferring clonal composition of a tumor sample using computational models based on short read data of the sample generated using next-generation sequencing (NGS) technology. Short reads (segmented DNA parts of different tumor cells) are noisy; therefore, inferring the clones and their mutations from the data is a difficult and complex problem. We develop a new model called HetFHMM, based on factorial hidden Markov models, to infer clones and their proportions from noisy NGS data. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains results in the observed data. We make use of Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong models from previous work (PyClone and PhyloSub) based on both synthetic data and real cancer data on acute myeloid leukemia. Empirical results confirm that HetFHMM infers clonal composition of a tumor sample more accurately than previous work.


Subject(s)
Computational Biology/methods , Genetic Heterogeneity , Leukemia, Myeloid, Acute/genetics , Sequence Analysis, DNA/methods , Clonal Evolution , Computational Biology/standards , Humans , Markov Chains , Mutation Accumulation , Sequence Analysis, DNA/standards
11.
Oncotarget ; 8(40): 68047-68058, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28978095

ABSTRACT

Melphalan is a cytotoxic chemotherapy used to treat patients with multiple myeloma (MM). Bone resorption by osteoclasts, by remodeling the bone surface, can reactivate dormant MM cells held in the endosteal niche to promote tumor development. Dormant MM cells can be reactivated after melphalan treatment; however, it is unclear whether melphalan treatment increases osteoclast formation to modify the endosteal niche. Melphalan treatment of mice for 14 days decreased bone volume and the endosteal bone surface, and this was associated with increases in osteoclast numbers. Bone marrow cells (BMCs) from melphalan-treated mice formed more osteoclasts than BMCs from vehicle-treated mice, suggesting that osteoclast progenitors were increased. Melphalan also increased osteoclast formation in BMCs and RAW264.7 cells in vitro, which was prevented with the cell stress response (CSR) inhibitor KNK437. Melphalan also increased expression of the osteoclast regulator microphthalmia-associated transcription factor (MITF), but not nuclear factor of activated T cells 1 (NFATc1). Melphalan increased expression of the MITF-dependent cell fusion factors dendritic cell-specific transmembrane protein (Dc-stamp) and osteoclast-stimulatory transmembrane protein (Oc-stamp) and increased cell fusion. Expression of the osteoclast stimulator receptor activator of NFκB ligand (RANKL) was unaffected by melphalan treatment. These data suggest that melphalan stimulates osteoclast formation by increasing osteoclast progenitor recruitment and differentiation in a CSR-dependent manner. Melphalan-induced osteoclast formation is associated with bone loss and reduced endosteal bone surface. As well as affecting bone structure, this may contribute to dormant tumor cell activation, which has implications for how melphalan is used to treat patients with MM.

12.
Genome Res ; 27(9): 1573-1588, 2017 09.
Article in English | MEDLINE | ID: mdl-28768687

ABSTRACT

Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the "random walk facility location" (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: "multihitting time," the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients' survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology.


Subject(s)
Breast Neoplasms/genetics , DNA Copy Number Variations/genetics , Software , Transcriptome/genetics , Breast Neoplasms/pathology , Computational Biology , Female , Genomics , Humans , Mutation , Protein Interaction Maps/genetics
13.
J Biomed Inform ; 64: 158-167, 2016 12.
Article in English | MEDLINE | ID: mdl-27742349

ABSTRACT

OBJECTIVE: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance. METHODS: Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the research question regarding the impact of the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide better understanding of how to best apply text classification in the context of imbalanced data of variable completeness. RESULTS: Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. CONCLUSION: Overall, linking data sources significantly improved classification performance for all the diseases examined. 
However, there is no single approach that suits all scenarios; the choice of the most effective combination of data sources depends on the specific disease to be classified.
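The evaluation metrics named in this abstract (Precision, Recall and F-Score) can be sketched for a binary "admission positive for a disease" label as follows. This is an illustration only: the balanced F1 variant is assumed because the abstract does not say which F-Score was used, and the gold and predicted labels are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 for binary labels (1 = positive admission)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical admissions: 3 true positives among 8, classifier flags 3 of them.
gold_labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_recall_f1(gold_labels, predicted_labels))
```

Because true negatives do not enter any of the three values, these metrics stay informative on the imbalanced admission data the abstract describes, where plain accuracy would be dominated by the negative class.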


Subject(s)
Data Mining , Disease/classification , Hospital Records , Natural Language Processing , Hospitalization , Humans , Patient Compliance , Support Vector Machine
14.
J Comput Biol ; 20(7): 486-94, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23829650

ABSTRACT

It has been shown that the minimum free-energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble-based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering those structures by Sfold has proven to be more reliable than minimum free energy structure prediction. The main obstacle for ensemble-based approaches is the computational complexity of the partition function and base-pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is O(n^4) and the time complexity is O(n^6), which is prohibitively large. Our goal in this article is to present a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola (2012). The space complexity of our algorithm is the same as that of sparse folding algorithms, and the time complexity of our algorithm is O(MFE(n)ℓ) for single RNA and O(MFE(m, n)ℓ) for RNA-RNA interaction in practice, in which MFE is the running time of sparse folding and ℓ≤n (ℓ≤n+m) is a sequence-dependent parameter.


Subject(s)
Algorithms , RNA/chemistry , RNA/metabolism , Computational Biology , Humans , RNA/genetics , Thermodynamics
15.
BMC Genomics ; 14 Suppl 1: S14, 2013.
Article in English | MEDLINE | ID: mdl-23369194

ABSTRACT

One challenge in applying bioinformatic tools to clinical or biological data is the high number of features that might be provided to the learning algorithm without any prior knowledge of which ones should be used. In such applications, the number of features can drastically exceed the number of training instances, which is often limited by the number of available samples for the study. The Lasso is one of many regularization methods that have been developed to prevent overfitting and improve prediction performance in high-dimensional settings. In this paper, we propose a novel algorithm for feature selection based on the Lasso. Our hypothesis is that defining a scoring scheme that measures the "quality" of each feature can provide a more robust feature selection method. Our approach is to generate several samples from the training data by bootstrapping, determine the best relevance-ordering of the features for each sample, and finally combine these relevance-orderings to select highly relevant features. In addition to the theoretical analysis of our feature scoring scheme, we provide empirical evaluations on six real datasets from different fields to confirm the superiority of our method in exploratory data analysis and prediction performance. For example, we applied FeaLect, our feature scoring algorithm, to a lymphoma dataset, and according to a human expert, our method led to the selection of more meaningful features than those commonly used in clinics. This case study built a basis for discovering interesting new criteria for lymphoma diagnosis. Furthermore, to facilitate the use of our algorithm in other applications, the source code that implements our algorithm was released as FeaLect, a documented R package in CRAN.


Subject(s)
Algorithms , Lymphoma/diagnosis , Computational Biology , Databases, Factual , Humans , ROC Curve
16.
Nature ; 486(7403): 395-9, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495314

ABSTRACT

Primary triple-negative breast cancers (TNBCs), a tumour type defined by lack of oestrogen receptor, progesterone receptor and ERBB2 gene amplification, represent approximately 16% of all breast cancers. Here we show in 104 TNBC cases that at the time of diagnosis these cancers exhibit a wide and continuous spectrum of genomic evolution, with some having only a handful of coding somatic aberrations in a few pathways, whereas others contain hundreds of coding somatic mutations. High-throughput RNA sequencing (RNA-seq) revealed that only approximately 36% of mutations are expressed. Using deep re-sequencing measurements of allelic abundance for 2,414 somatic mutations, we determine for the first time, to our knowledge, in an epithelial tumour subtype, the relative abundance of clonal frequencies among cases representative of the population. We show that TNBCs vary widely in their clonal frequencies at the time of diagnosis, with the basal subtype of TNBC showing more variation than non-basal TNBC. Although p53 (also known as TP53), PIK3CA and PTEN somatic mutations seem to be clonally dominant compared to other genes, in some tumours their clonal frequencies are incompatible with founder status. Mutations in cytoskeletal, cell shape and motility proteins occurred at lower clonal frequencies, suggesting that they occurred later during tumour progression. Taken together, our results show that understanding the biology and therapeutic responses of patients with TNBC will require the determination of individual tumour clonal genotypes.


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Evolution, Molecular , Mutation/genetics , Alleles , Breast Neoplasms/diagnosis , Clone Cells/metabolism , Clone Cells/pathology , DNA Copy Number Variations/genetics , DNA Mutational Analysis , Disease Progression , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation/genetics , Point Mutation/genetics , Precision Medicine , Reproducibility of Results , Sequence Analysis, RNA
17.
Nature ; 486(7403): 346-52, 2012 Apr 18.
Article in English | MEDLINE | ID: mdl-22522925

ABSTRACT

The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA-RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the 'CNA-devoid' subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.


Subject(s)
Breast Neoplasms/genetics; Breast Neoplasms/pathology; DNA Copy Number Variations/genetics; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genome, Human/genetics; Breast Neoplasms/classification; Breast Neoplasms/diagnosis; Female; Gene Regulatory Networks/genetics; Genes, Neoplasm/genetics; Genomics; Humans; Kaplan-Meier Estimate; MAP Kinase Kinase 4/genetics; Polymorphism, Single Nucleotide/genetics; Prognosis; Protein Phosphatase 2/genetics; Treatment Outcome
18.
Genome Biol ; 13(12): R124, 2012 Dec 22.
Article in English | MEDLINE | ID: mdl-23383675

ABSTRACT

Simultaneous interrogation of tumor genomes and transcriptomes is underway in unprecedented global efforts. Yet, despite the essential need to separate driver mutations modulating gene expression networks from transcriptionally inert passenger mutations, robust computational methods to ascertain the impact of individual mutations on transcriptional networks are underdeveloped. We introduce a novel computational framework, DriverNet, to identify likely driver mutations by virtue of their effect on mRNA expression networks. Application to four cancer datasets reveals the prevalence of rare candidate driver mutations associated with disrupted transcriptional networks and a simultaneous modulation of oncogenic and metabolic networks, induced by copy number co-modification of adjacent oncogenic and metabolic drivers. DriverNet is available on Bioconductor or at http://compbio.bccrc.ca/software/drivernet/.


Subject(s)
Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Mutation; Neoplasms/genetics; Software; Algorithms; Gene Expression Profiling; Genomics; Humans; Metabolic Networks and Pathways/genetics; Oncogenes; Transcription, Genetic
19.
Am J Clin Pathol ; 137(1): 75-85, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22180480

ABSTRACT

Mantle cell lymphoma (MCL) and small lymphocytic lymphoma (SLL) exhibit similar but distinct immunophenotypic profiles. Many cases can be diagnosed readily by flow cytometry (FCM) alone; however, ambiguous cases are frequently encountered and necessitate additional studies, including immunohistochemical staining for cyclin D1 and fluorescence in situ hybridization for IgH-CCND1 rearrangement. To determine if greater diagnostic accuracy could be achieved from FCM data alone, we developed an unbiased, machine-based algorithm to identify features that best distinguish between the 2 diseases. By applying conventional diagnostic criteria to the flow cytometry data, we were able to assign 28 of 44 (64%) MCL and 48 of 70 (69%) SLL cases correctly. In contrast, we were able to assign all 44 (100%) MCL and 68 of 70 (97%) SLL cases correctly using a novel set of criteria, as identified by our automated approach. The most discriminating feature was the CD20/CD23 mean fluorescence intensity ratio, and we found unexpectedly that inclusion of FMC7 expression in the diagnostic algorithm actually reduced its accuracy. This study demonstrates that computational methods can be used on existing clinical FCM data to improve diagnostic accuracy and suggests similar computational approaches could be used to identify novel prognostic markers and perhaps subdivide existing or define new diagnostic entities.
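
The percentages quoted in this abstract can be checked directly from the reported case counts. The following is a minimal sketch in Python, not the authors' code: the names `REPORTED`, `percent` and `summarize` are illustrative, and the pooled MCL+SLL totals it prints are derived here by simple addition rather than stated in the abstract.

```python
from fractions import Fraction

# (correctly assigned, total) case counts as reported in the abstract.
# The dictionary layout and names are illustrative assumptions.
REPORTED = {
    "conventional criteria": {"MCL": (28, 44), "SLL": (48, 70)},
    "automated criteria": {"MCL": (44, 44), "SLL": (68, 70)},
}


def percent(correct, total):
    """Share of correctly assigned cases, rounded to a whole percentage."""
    return round(100 * Fraction(correct, total))


def summarize(counts):
    """Return per-disease percentages and the pooled (correct, total) counts."""
    per_disease = {name: percent(c, t) for name, (c, t) in counts.items()}
    pooled_correct = sum(c for c, _ in counts.values())
    pooled_total = sum(t for _, t in counts.values())
    return per_disease, (pooled_correct, pooled_total)


for label, counts in REPORTED.items():
    per_disease, (correct, total) = summarize(counts)
    # The pooled figure is derived here, not quoted from the abstract.
    print(label, per_disease, f"pooled {correct}/{total} = {percent(correct, total)}%")
```

Pooling both diseases gives 76 of 114 cases under the conventional criteria and 112 of 114 under the automated criteria, which is consistent with the per-disease figures above.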


Subject(s)
Flow Cytometry/methods; Leukemia, Lymphocytic, Chronic, B-Cell/diagnosis; Lymphoma, Mantle-Cell/diagnosis; Aged; Algorithms; Antigens, CD20/metabolism; Artificial Intelligence; Female; Humans; Immunophenotyping; Leukemia, Lymphocytic, Chronic, B-Cell/blood; Lymphoma, Mantle-Cell/blood; Male; Pattern Recognition, Automated; Receptors, IgE/metabolism; Reproducibility of Results
20.
Bioinformatics ; 28(2): 167-75, 2012 Jan 15.
Article in English | MEDLINE | ID: mdl-22084253

ABSTRACT

MOTIVATION: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge. RESULTS: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth 'false positive' predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study. AVAILABILITY: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca.


Subject(s)
Algorithms; Artificial Intelligence; Breast Neoplasms/genetics; Mutation; Polymorphism, Single Nucleotide; Bayes Theorem; Cluster Analysis; Exome; Female; Genome; Humans; Models, Genetic; Neoplasms; Software; Support Vector Machine
...