Results 1 - 20 of 139
1.
J Comput Biol ; 29(1): 56-73, 2022 01.
Article in English | MEDLINE | ID: mdl-34986026

ABSTRACT

Over the past decade, a promising line of cancer research has utilized machine learning to mine statistical patterns of mutations in cancer genomes for information. Recent work shows that these statistical patterns, commonly referred to as "mutational signatures," have diverse therapeutic potential as biomarkers for cancer therapies. However, translating this potential into reality is hindered by limited access to sequencing in the clinic. Almost all methods for mutational signature analysis (MSA) rely on whole genome or whole exome sequencing data, while sequencing in the clinic is typically limited to small gene panels. To improve clinical access to MSA, we considered the question of whether targeted panels could be designed for the purpose of mutational signature detection. Here we present ScalpelSig, to our knowledge the first algorithm that automatically designs genomic panels optimized for detection of a given mutational signature. The algorithm learns from data to identify genome regions that are particularly indicative of signature activity. Using a cohort of breast cancer genomes as training data, we show that ScalpelSig panels substantially improve accuracy of signature detection compared to baselines. We find that some ScalpelSig panels even approach the performance of whole exome sequencing, which observes over 10 × as much genomic material. We test our algorithm under a variety of conditions, showing that its performance generalizes to another dataset of breast cancers, to smaller panel sizes, and to lesser amounts of training data.


Subjects
Algorithms; DNA Mutational Analysis/statistics & numerical data; Genomics/statistics & numerical data; Breast Neoplasms/genetics; Cohort Studies; Computational Biology; Databases, Genetic/statistics & numerical data; Female; Humans; Machine Learning; Mutation; Whole Genome Sequencing/statistics & numerical data
3.
PLoS Comput Biol ; 17(11): e1009161, 2021 11.
Article in English | MEDLINE | ID: mdl-34762640

ABSTRACT

Network propagation refers to a class of algorithms that integrate information from input data across connected nodes in a given network. These algorithms have wide applications in systems biology, protein function prediction, inferring condition-specifically altered sub-networks, and prioritizing disease genes. Despite the popularity of network propagation, there is a lack of comparative analyses of different algorithms on real data and little guidance on how to select and parameterize the various algorithms. Here, we address this problem by analyzing different combinations of network normalization and propagation methods and by demonstrating schemes for the identification of optimal parameter settings on real proteome and transcriptome data. Our work highlights the risk of a 'topology bias' caused by the incorrect use of network normalization approaches. Capitalizing on the fact that network propagation is a regularization approach, we show that minimizing the bias-variance tradeoff can be utilized for selecting optimal parameters. The application to real multi-omics data demonstrated that optimal parameters could also be obtained by either maximizing the agreement between different omics layers (e.g. proteome and transcriptome) or by maximizing the consistency between biological replicates. Furthermore, we exemplified the utility and robustness of network propagation on multi-omics datasets for identifying ageing-associated genes in brain and liver tissues of rats and for elucidating molecular mechanisms underlying prostate cancer progression. Overall, this work compares different network propagation approaches and it presents strategies for how to use network propagation algorithms to optimally address a specific research question at hand.
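
As a rough illustration of what a propagation step looks like, the sketch below implements random walk with restart on a tiny network. The abstract does not say which algorithms or normalizations were compared, so the choice of random walk with restart, the symmetric degree normalization, and the names `propagate`, `adjacency`, `scores` and `alpha` are all assumptions made for this example.

```python
def propagate(adjacency, scores, alpha=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on a small undirected network.

    adjacency: symmetric list-of-lists of non-negative edge weights.
    scores:    initial per-node scores (for example fold changes).
    alpha:     spreading coefficient in (0, 1); larger values smooth more.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # Symmetric degree normalization: W[i][j] = A[i][j] / sqrt(d_i * d_j).
    # This is one of several possible normalizations (an assumption here).
    w = [[adjacency[i][j] / (degree[i] * degree[j]) ** 0.5
          if adjacency[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    current = list(scores)
    for _ in range(max_iter):
        # Each node mixes its neighbours' current scores with its own input.
        updated = [alpha * sum(w[i][j] * current[j] for j in range(n))
                   + (1 - alpha) * scores[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(updated, current)) < tol:
            return updated
        current = updated
    return current


# Path network 0 - 1 - 2 with signal only on node 0.
smoothed = propagate([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [1.0, 0.0, 0.0])
```

In this sketch `alpha` is the parameter that would need tuning: values near 0 keep the input scores almost unchanged, and values near 1 let the network topology dominate.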


Subjects
Algorithms; Computational Biology/methods; Aging/genetics; Aging/metabolism; Animals; Bias; Brain/metabolism; Computational Biology/statistics & numerical data; Data Interpretation, Statistical; Disease Progression; Gene Expression Profiling/statistics & numerical data; Gene Regulatory Networks; Genomics/statistics & numerical data; Humans; Liver/metabolism; Male; Prostatic Neoplasms/etiology; Prostatic Neoplasms/genetics; Prostatic Neoplasms/metabolism; Protein Interaction Maps; Proteomics/statistics & numerical data; RNA, Messenger/genetics; RNA, Messenger/metabolism; Rats; Systems Biology
4.
Clin Epigenetics ; 13(1): 179, 2021 09 25.
Article in English | MEDLINE | ID: mdl-34563241

ABSTRACT

BACKGROUND: Nasal intestinal-type adenocarcinomas (ITAC) are strongly related to chronic wood dust exposure: The intestinal phenotype relies on CDX2 overexpression but underlying molecular mechanisms remain unknown. Our objectives were to investigate transcriptomic and methylation differences between healthy non-exposed and tumor olfactory cleft mucosae and to compare transcriptomic profiles between non-exposed, wood dust-exposed and ITAC mucosa cells. METHODS: We conducted a prospective monocentric study (NCT0281823) including 16 woodworkers with ITAC, 16 healthy exposed woodworkers and 13 healthy, non-exposed, controls. We compared tumor samples with healthy non-exposed samples, both in transcriptome and in methylome analyses. We also investigated wood dust-induced transcriptome modifications of exposed (without tumor) male woodworkers' samples and of contralateral sides of woodworkers with tumors. We conducted in parallel transcriptome and methylome analysis, and then, the transcriptome analysis was focused on the genes highlighted in methylome analysis. We replicated our results on dataset GSE17433. RESULTS: Several clusters of genes enabled the distinction between healthy and ITAC samples. Transcriptomic and IHC analysis confirmed a constant overexpression of CDX2 in ITAC samples, without any specific DNA methylation profile regarding the CDX2 locus. ITAC woodworkers also exhibited a specific transcriptomic profile in their contralateral (non-tumor) olfactory cleft, different from that of other exposed woodworkers, suggesting that they had a different exposure or a different susceptibility. Two top-loci (CACNA1C/CACNA1C-AS1 and SLC26A10) were identified with a hemimethylated profile, but only CACNA1C appeared to be overexpressed both in transcriptomic analysis and in immunohistochemistry. 
CONCLUSIONS: Several clusters of genes enable the distinction between healthy mucosa and ITAC samples, even in the contralateral nasal fossa, thus paving the way for a simple diagnostic tool for ITAC in male woodworkers. CACNA1C might be considered as a master gene of ITAC and should be further investigated. TRIAL REGISTRATION: NIH ClinicalTrials, NCT0281823, registered May 23, 2016, https://www.clinicaltrials.gov/NCT0281823 .


Subjects
Calcium Channels, L-Type/metabolism; Genomics/methods; Intestinal Neoplasms/genetics; Nose Neoplasms/genetics; Adenocarcinoma/epidemiology; Adenocarcinoma/genetics; Aged; Calcium Channels, L-Type/genetics; DNA Methylation/drug effects; Female; Genomics/instrumentation; Genomics/statistics & numerical data; Humans; Intestinal Neoplasms/epidemiology; Male; Middle Aged; Nose Neoplasms/epidemiology; Occupational Exposure/analysis; Wood
5.
PLoS Comput Biol ; 17(8): e1009224, 2021 08.
Article in English | MEDLINE | ID: mdl-34383739

ABSTRACT

Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all the eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy measured by combining both clustering accuracy and clinical significance, robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most cancers under our studies, which may be of particular interest to researchers in omics data analysis.


Subjects
Computational Biology/methods; Neoplasms/classification; Neoplasms/genetics; Algorithms; Biomarkers, Tumor/genetics; Data Interpretation, Statistical; Databases, Genetic/statistics & numerical data; Deep Learning; Female; Genomics/statistics & numerical data; Humans; Male; Unsupervised Machine Learning
6.
Genome Biol ; 22(1): 208, 2021 07 13.
Article in English | MEDLINE | ID: mdl-34256818

ABSTRACT

One challenge facing omics association studies is the loss of statistical power when adjusting for confounders and multiple testing. The traditional statistical procedure involves fitting a confounder-adjusted regression model for each omics feature, followed by multiple testing correction. Here we show that the traditional procedure is not optimal and present a new approach, 2dFDR, a two-dimensional false discovery rate control procedure, for powerful confounder adjustment in multiple testing. Through extensive evaluation, we demonstrate that 2dFDR is more powerful than the traditional procedure, and in the presence of strong confounding and weak signals, the power improvement could be more than 100%.
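
The last stage of the traditional procedure described above, multiple testing correction, can be sketched as follows. The abstract does not name the correction it has in mind, so the Benjamini-Hochberg step-up adjustment is assumed here as a common choice. This is not the 2dFDR procedure itself, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# One p-value per omics feature, for example from a confounder-adjusted regression.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q <= 0.05 for q in q_values]
```

Features whose adjusted value falls below the chosen FDR level are reported as discoveries. The power loss the abstract refers to comes from applying this correction to p-values that have already been weakened by confounder adjustment.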


Subjects
Algorithms; Genome-Wide Association Study; Genomics/statistics & numerical data; Atlases as Topic; Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/metabolism; DNA Methylation; Datasets as Topic; False Positive Reactions; Gastrointestinal Microbiome/genetics; Genomics/methods; Hepatitis B/genetics; Hepatitis B/metabolism; Hepatitis B virus/genetics; Hepatitis B virus/pathogenicity; Humans; Linear Models; Liver Neoplasms/genetics; Liver Neoplasms/metabolism
7.
J Cancer Res Ther ; 17(2): 477-483, 2021.
Article in English | MEDLINE | ID: mdl-34121695

ABSTRACT

PURPOSE: This study systematically reviews the distribution of racial/ancestral features and their inclusion as covariates in genetic-toxicity association studies following radiation therapy. MATERIALS AND METHODS: Original research studies associating genetic features and normal tissue complications following radiation therapy were identified from PubMed. The distribution of radiogenomic studies was determined by mining the statement of country of origin and racial/ancestral distribution and the inclusion in analyses. Descriptive analyses were performed to determine the distribution of studies across races/ancestries, countries, and continents and the inclusion in analyses. RESULTS: Among 174 studies, only 23 included a population of more than one race/ancestry, and these were predominantly conducted in the United States. Across the continents, most studies were performed in Europe (77 studies averaging at 30.6 patients/million population [pt/mil]), followed by North America (46 studies, 20.8 pt/mil), Asia (46 studies, 2.4 pt/mil), South America (3 studies, 0.4 pt/mil), and Oceania (2 studies, 2.1 pt/mil); none were from Africa. All 23 studies with more than one race/ancestry considered race/ancestry as a covariate, and three studies showed race/ancestry to be significantly associated with endpoints. CONCLUSION: Most toxicity-related radiogenomic studies involved a single race/ancestry. Individual Participant Data meta-analyses or multinational studies need to be encouraged.


Subjects
Genetic Predisposition to Disease; Genomics/statistics & numerical data; Neoplasms/radiotherapy; Racial Groups/statistics & numerical data; Radiation Injuries/genetics; Humans; Neoplasms/genetics; Racial Groups/genetics; Radiation Injuries/epidemiology
8.
Ann Med ; 53(1): 596-610, 2021 12.
Article in English | MEDLINE | ID: mdl-33830879

ABSTRACT

PURPOSE: This study aims to identify potential prognostic biomarkers of bladder cancer (BCa) based on large-scale multi-omics data and investigate the role of SRC in improving predictive outcomes for BCa patients and those receiving immune checkpoint therapies (ICTs). METHODS: Large-scale multi-omics data were collected from The Cancer Proteome Atlas, The Cancer Genome Atlas and the Gene Expression Omnibus and analysed using machine-learning methods. Immune infiltration, survival and other statistical analyses were implemented using R software in cancers (n = 12,452). The predictive value of SRC was assessed in 81 BCa patients receiving ICT from a validation cohort (n = 81). RESULTS: A landscape of novel candidate prognostic protein signatures of BCa patients was identified. Differential BECLIN, EGFR, PKCALPHA, ANNEXIN1, AXL and SRC expression significantly correlated with the outcomes for BCa patients from multiple cohorts (n = 906). Notably, the risk score of the integrated prognosis-related proteins (IPRPs) model exhibited high diagnostic accuracy and consistent predictive ability (AUC = 0.714). Besides, we tested the clinical relevance of baseline SRC protein and mRNA expression in two independent confirmatory cohorts (n = 566) and the prognostic value in pan-cancers. Then, we found that elevated SRC expression contributed to an immunosuppressive microenvironment mediated by immune checkpoint molecules of BCa and other cancers. Next, we validated SRC expression as a potential biomarker in predicting response to ICT in 81 BCa patients from the FUSCC cohort, and found that expression of SRC in the baseline tumour tissues correlated with improved survival benefits but predicted worse ICT response.
CONCLUSION: This study first performed a large-scale multi-omics analysis, distinguished the IPRPs (BECLIN, EGFR, PKCALPHA, SRC, ANNEXIN1 and AXL) and revealed a novel prediction model, outperforming the current traditional prognostic indicators for anticipating BCa progression and better clinical strategies. Additionally, this study provided insight into the importance of biomarker SRC for better prognosis, which may inversely improve predictive outcomes for patients receiving ICT and enable patient selection for future clinical treatment.


Subjects
Adaptive Immunity/genetics; Genes, src/genetics; Genomics/statistics & numerical data; Immunotherapy; Urinary Bladder Neoplasms/genetics; Annexin A1/metabolism; Area Under Curve; Beclin-1/metabolism; Biomarkers, Tumor/genetics; Databases, Genetic; ErbB Receptors/metabolism; Gene Expression/genetics; Genomics/methods; Humans; Machine Learning; Patient Selection; Predictive Value of Tests; Prognosis; Proportional Hazards Models; Protein Kinase C-alpha/metabolism; Proto-Oncogene Proteins/metabolism; Receptor Protein-Tyrosine Kinases/metabolism; Risk Factors; Survival Analysis; Urinary Bladder Neoplasms/drug therapy; Axl Receptor Tyrosine Kinase
9.
Sci Rep ; 11(1): 5146, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33664338

ABSTRACT

Multi-modal molecular profiling data in bulk tumors or single cells are accumulating at a fast pace. There is a great need for developing statistical and computational methods to reveal molecular structures in complex data types toward biological discoveries. Here, we introduce Nebula, a novel Bayesian integrative clustering analysis for high dimensional multi-modal molecular data to identify directly interpretable clusters and associated biomarkers in a unified and biologically plausible framework. To facilitate computational efficiency, a variational Bayes approach is developed to approximate the joint posterior distribution to achieve model inference in high-dimensional settings. We describe a pan-cancer data analysis of genomic, epigenomic, and transcriptomic alterations in close to 9000 tumor samples across canonical oncogenic signaling pathways, immune and stemness phenotype, with comparisons to state-of-the-art clustering methods. We demonstrate that Nebula has the unique advantage of revealing patterns on the basis of shared pathway alterations, offering biological and clinical insights beyond tumor type and histology in the pan-cancer analysis setting. We also illustrate the utility of Nebula in single cell data for immune cell decomposition in peripheral blood samples.


Subjects
Carcinogenesis/genetics; Computational Biology/statistics & numerical data; Genomics/statistics & numerical data; Neoplasms/genetics; Bayes Theorem; Cluster Analysis; Epigenomics; Humans; Models, Statistical; Neoplasms/pathology; Transcriptome/genetics
10.
Biol Direct ; 16(1): 7, 2021 02 08.
Article in English | MEDLINE | ID: mdl-33557857

ABSTRACT

Cancer is a polygenic disease, with each cancer type having a different mutation profile. Genomic data can be utilized to detect these profiles and to diagnose and differentiate cancer types. Variant calling provides mutation information. Gene expression data reveal the altered cell behaviour. The combination of the mutation and expression information can lead to accurate discrimination of different cancer types. In this study, we utilized and transferred the information of existing mutations for a novel gene selection method for gene expression data. We tested the proposed method in order to diagnose and differentiate cancer types. It is a disease-specific method, as both the mutations and expressions are filtered according to the selected cancer types. Our experimental results show that the proposed gene selection method leads to similar or improved performance metrics compared to classical feature selection methods and curated gene sets.


Subjects
Gene Expression Profiling/methods; Genomics/statistics & numerical data; Machine Learning; Neoplasms/classification; Algorithms; Neoplasms/genetics
11.
Comput Math Methods Med ; 2021: 9436582, 2021.
Article in English | MEDLINE | ID: mdl-34976114

ABSTRACT

High dimensionality and noise have made it difficult to detect related biomarkers in omics data. Previous studies have shown that penalized maximum trimmed likelihood estimation is effective in identifying mislabeled samples in high-dimensional data with labeling errors. However, the algorithm commonly used in these studies is the concentration step (C-step), and the C-step algorithm applied to robust penalized regression does not ensure that the criterion function improves at every iteration, because the regularized parameters change during the iteration. This makes the C-step algorithm run very slowly, especially when dealing with high-dimensional omics data. The AR-Cstep (C-step combined with an acceptance-rejection scheme) algorithm is proposed to address this. In simulation experiments, the AR-Cstep algorithm converged faster (the average computation time was only 2% of that of the C-step algorithm) and was more accurate in terms of variable selection and outlier identification than the C-step algorithm. The two algorithms were further compared on triple negative breast cancer (TNBC) RNA-seq data. AR-Cstep can solve the problem of the C-step not converging and ensures that the iterative process moves in the direction that improves the criterion function. As an improvement of the C-step algorithm, the AR-Cstep algorithm can be extended to other robust models with regularized parameters.
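
For readers unfamiliar with the concentration step, the sketch below shows the classical, unpenalized version for least trimmed squares with a single predictor. It is a deliberate simplification: the abstract concerns penalized trimmed likelihood for high-dimensional data, and neither the penalty nor the acceptance-rejection scheme of AR-Cstep is reproduced here. The names `fit_line` and `c_steps` are hypothetical.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b * x on (x, y) pairs.

    Assumes the x values in `points` are not all identical.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def c_steps(points, h, start, max_iter=50):
    """Iterate concentration steps from an initial subset of h indices."""
    subset = sorted(start)
    for _ in range(max_iter):
        a, b = fit_line([points[i] for i in subset])
        # Rank every observation by squared residual under the current fit.
        ranked = sorted(range(len(points)),
                        key=lambda i: (points[i][1] - a - b * points[i][0]) ** 2)
        # Keep the h best-fitting observations as the next subset.
        new_subset = sorted(ranked[:h])
        if new_subset == subset:
            break
        subset = new_subset
    return fit_line([points[i] for i in subset]), subset


# Eight points on y = 2x + 1 plus two gross outliers; the start subset
# deliberately includes one of the outliers (index 8).
data = [(x, 2 * x + 1) for x in range(8)] + [(8, 100.0), (9, -50.0)]
(intercept, slope), kept = c_steps(data, h=8, start=[0, 1, 2, 3, 4, 5, 6, 8])
```

In this unpenalized setting each step cannot increase the trimmed sum of squares, which is the guarantee the abstract says is lost once the regularized parameters change between iterations.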


Subjects
Algorithms; Biomarkers/analysis; Biomarkers, Tumor/genetics; Computational Biology; Computer Simulation; Databases, Genetic/statistics & numerical data; Female; Genomics/statistics & numerical data; Humans; Logistic Models; Metabolomics/statistics & numerical data; Triple Negative Breast Neoplasms/genetics
12.
Clin Cancer Res ; 27(1): 320-329, 2021 01 01.
Article in English | MEDLINE | ID: mdl-33037017

ABSTRACT

PURPOSE: The role of immune-oncologic mechanisms of racial disparities in prostate cancer remains understudied. Limited research exists to evaluate the molecular underpinnings of immune differences in African American men (AAM) and European American men (EAM) prostate tumor microenvironment (TME). EXPERIMENTAL DESIGN: A total of 1,173 radiation-naïve radical prostatectomy samples with whole transcriptome data from the Decipher GRID registry were used. Transcriptomic expressions of 1,260 immune-specific genes were selected to assess immune-oncologic differences between AAM and EAM prostate tumors. Race-specific differential expression of genes was assessed using a rank test, and intergene correlational matrix and gene set enrichment were used for pathway analysis. RESULTS: AAM prostate tumors have significant enrichment of major immune-oncologic pathways, including proinflammatory cytokines, IFNα, IFNγ, TNFα signaling, ILs, and epithelial-mesenchymal transition. AAM TME has higher total immune content score (ICSHIGH) compared with EAM (37.8% vs. 21.9%, P = 0.003). AAM tumors also have lower DNA damage repair and are genomically radiosensitive as compared with EAM. IFITM3 (IFN-inducible transmembrane protein 3) was one of the major proinflammatory genes overexpressed in AAM that predicted increased risk of biochemical recurrence selectively for AAM in both discovery [HRAAM = 2.30; 95% confidence interval (CI), 1.21-4.34; P = 0.01] and validation (HRAAM = 2.42; 95% CI, 1.52-3.86; P = 0.0001) but not in EAM. CONCLUSIONS: Prostate tumors of AAM manifest a unique immune repertoire and have significant enrichment of proinflammatory immune pathways that are associated with poorer outcomes. Observed immune-oncologic differences can aid in a genomically adaptive approach to treating prostate cancer in AAM.


Subjects
Black or African American/genetics; Gene Expression Regulation, Neoplastic/immunology; Neoplasm Recurrence, Local/immunology; Prostatic Neoplasms/genetics; Tumor Microenvironment/immunology; Black or African American/statistics & numerical data; Aged; Datasets as Topic; Epithelial-Mesenchymal Transition/genetics; Epithelial-Mesenchymal Transition/immunology; Follow-Up Studies; Genomics/statistics & numerical data; Health Status Disparities; Humans; Male; Middle Aged; Neoplasm Recurrence, Local/genetics; Neoplasm Recurrence, Local/prevention & control; Prostate/immunology; Prostate/pathology; Prostatectomy; Prostatic Neoplasms/immunology; Prostatic Neoplasms/mortality; Prostatic Neoplasms/therapy; Risk Assessment/statistics & numerical data; Tumor Microenvironment/genetics; White People/genetics; White People/statistics & numerical data
13.
PLoS Comput Biol ; 16(11): e1008397, 2020 11.
Article in English | MEDLINE | ID: mdl-33226985

ABSTRACT

Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations including structural variations (SVs) is key to our understanding. Conventional short-reads whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from high false discovery rate (FDR). Linked-reads sequencing (10XWGS) utilizes long-range information by linkage of short-reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts especially in repetitive regions and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both the technologies and validated with an independent PCR based approach. The SVs commonly identified by both the technologies were highly specific, while validation rate dropped for uncommon events. A particularly high FDR was observed for SVs only found by 10XWGS. To improve FDR and sensitivity, statistical models for both the technologies were trained. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the underlying genetics in various diseases.


Subjects
Genomic Structural Variation; High-Throughput Nucleotide Sequencing/methods; Whole Genome Sequencing/methods; Breast Neoplasms/genetics; Computational Biology; DNA Barcoding, Taxonomic/methods; DNA Barcoding, Taxonomic/statistics & numerical data; DNA, Neoplasm/genetics; Female; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/genetics; Genome, Human; Genomics/methods; Genomics/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Logistic Models; MCF-7 Cells; Polymerase Chain Reaction/methods; Polymerase Chain Reaction/statistics & numerical data; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; Whole Genome Sequencing/statistics & numerical data
14.
PLoS Comput Biol ; 16(11): e1008405, 2020 11.
Article in English | MEDLINE | ID: mdl-33166290

ABSTRACT

Given the complexity and diversity of cancer genomics profiles, it is challenging to identify distinct clusters from different cancer types. Numerous analyses have been conducted for this purpose. Still, the methods they used often do not directly support high-dimensional omics data across the whole genome (such as ATAC-seq profiles). In this study, based on deep adversarial learning, we present an end-to-end approach, ClusterATAC, to leverage high-dimensional features and explore the classification results. On the ATAC-seq dataset and RNA-seq dataset, ClusterATAC has achieved excellent performance. Since ATAC-seq data plays a crucial role in the study of the effects of non-coding regions on the molecular classification of cancers, we explore the clustering solution obtained by ClusterATAC on the pan-cancer ATAC dataset. In this solution, more than 70% of the clusters are single-tumor-type-dominant, and the vast majority of the remaining clusters are associated with similar tumor types. We explore the representative non-coding loci and their linked genes of each cluster and verify some results by literature search. These results suggest that a large number of non-coding loci affect the development and progression of cancer through their linked genes, which can potentially advance cancer diagnosis and therapy.


Subjects
Chromatin Immunoprecipitation Sequencing/statistics & numerical data; Deep Learning; Neoplasms/classification; Neoplasms/genetics; Chromatin/genetics; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Genomics/methods; Genomics/statistics & numerical data; Humans; Multigene Family; Normal Distribution; Oncogenes; RNA-Seq/statistics & numerical data
15.
Biomolecules ; 10(10)2020 10 19.
Article in English | MEDLINE | ID: mdl-33086649

ABSTRACT

Mortality attributed to lung cancer accounts for a large fraction of cancer deaths worldwide. With increasing mortality figures, the accurate prediction of prognosis has become essential. In recent years, multi-omics analysis has emerged as a useful survival prediction tool. However, the methodology relevant to multi-omics analysis has not yet been fully established and further improvements are required for clinical applications. In this study, we developed a novel method to accurately predict the survival of patients with lung cancer using multi-omics data. With unsupervised learning techniques, survival-associated subtypes in non-small cell lung cancer were first detected using the multi-omics datasets from six categories in The Cancer Genome Atlas (TCGA). The new subtypes, referred to as integration survival subtypes, clearly divided patients into longer and shorter-surviving groups (log-rank test: p = 0.003) and we confirmed that this is independent of histopathological classification (Chi-square test of independence: p = 0.94). Next, an attempt was made to detect the integration survival subtypes using only one categorical dataset. Our machine learning model that was only trained on the reverse phase protein array (RPPA) could accurately predict the integration survival subtypes (AUC = 0.99). The predicted subtypes could also distinguish between high and low risk patients (log-rank test: p = 0.012). Overall, this study explores novel potentials of multi-omics analysis to accurately predict the prognosis of patients with lung cancer.


Subjects
Carcinoma, Non-Small-Cell Lung/genetics; Deep Learning; Machine Learning; Prognosis; Carcinoma, Non-Small-Cell Lung/pathology; DNA Methylation/genetics; Disease-Free Survival; Female; Genomics/statistics & numerical data; Humans; Male; Models, Theoretical; Protein Array Analysis/methods; Proteomics/statistics & numerical data
16.
PLoS One ; 15(10): e0238996, 2020.
Article in English | MEDLINE | ID: mdl-33095785

ABSTRACT

Recent developments in high-throughput methods have resulted in the collection of high-dimensional data types from multiple sources and technologies that measure distinct yet complementary information. Integrated clustering of such multiple data types or multi-view clustering is critical for revealing pathological insights. However, multi-view clustering is challenging due to the complex dependence structure between multiple data types, including directional dependency. Specifically, genomics data types have pre-specified directional dependencies known as the central dogma that describes the process of information flow from DNA to messenger RNA (mRNA) and then from mRNA to protein. Most of the existing multi-view clustering approaches assume an independent structure or pair-wise (non-directional) dependence between data types, thereby ignoring their directional relationship. Motivated by this, we propose a biology-inspired Bayesian integrated multi-view clustering model that uses an asymmetric copula to accommodate the directional dependencies between the data types. Via extensive simulation experiments, we demonstrate the negative impact of ignoring directional dependency on clustering performance. We also present an application of our model to a real-world dataset of breast cancer tumor samples collected from The Cancer Genome Atlas program and provide comparative results.


Subjects
Genomics/methods; Models, Statistical; Bayes Theorem; Breast Neoplasms/genetics; Cluster Analysis; Computer Simulation; Data Interpretation, Statistical; Databases, Genetic/statistics & numerical data; Female; Genomics/statistics & numerical data; Humans; Markov Chains; Normal Distribution
17.
Clin Cancer Res ; 26(17): 4651-4660, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32651179

ABSTRACT

PURPOSE: African American (AFR) men have the highest mortality rate from prostate cancer (PCa) compared with men of other racial/ancestral groups. Differences in the spectrum of somatic genome alterations in tumors between AFR men and other populations have not been well-characterized due to a lack of inclusion of significant numbers in genomic studies. EXPERIMENTAL DESIGN: To identify genomic alterations associated with race, we compared the frequencies of somatic alterations in PCa obtained from four publicly available datasets comprising 250 AFR and 611 European American (EUR) men and a targeted sequencing dataset from a commercial platform of 436 AFR and 3018 EUR men. RESULTS: Mutations in ZFHX3 as well as focal deletions in ETV3 were more frequent in tumors from AFR men. TP53 mutations were associated with increasing Gleason score. MYC amplifications were more frequent in tumors from AFR men with metastatic PCa, whereas deletions in PTEN and rearrangements in TMPRSS2-ERG were less frequent in tumors from AFR men. KMT2D truncations and CCND1 amplifications were more frequent in primary PCa from AFR men. Genomic features that could impact clinical decision making were not significantly different between the two groups including tumor mutation burden, MSI status, and genomic alterations in select DNA repair genes, CDK12, and in AR. CONCLUSIONS: Although we identified some novel differences in AFR men compared with other populations, the frequencies of genomic alterations in current therapeutic targets for PCa were similar between AFR and EUR men, suggesting that existing precision medicine approaches could be equally beneficial if applied equitably.


Subjects
Biomarkers, Tumor/genetics , Black or African American/genetics , Genomics/statistics & numerical data , Prostatic Neoplasms/genetics , White People/genetics , Adult , Aged , Aged, 80 and over , DNA Copy Number Variations , DNA Mutational Analysis/statistics & numerical data , DNA Repair , Datasets as Topic , Health Status Disparities , Humans , Incidence , Male , Middle Aged , Mutation , Neoplasm Grading , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/mortality , Prostatic Neoplasms/pathology
18.
J Biopharm Stat ; 30(5): 834-853, 2020 09 02.
Article in English | MEDLINE | ID: mdl-32310707

ABSTRACT

Precision medicine is an emerging approach for disease treatment and prevention that accounts for individual variability in genes, environment, and lifestyle. Cancer is a genomic disease; therefore, the dose-efficacy and dose-toxicity relationships for molecularly targeted agents in cancer most likely differ, based on the genomic mutation pattern. The individualized optimal dose - the maximal efficacious dose with a clinically acceptable safety profile - may vary depending on the genomic mutation patterns and should be determined prior to the use of these agents in precision medicine. In addition, genes that influence the individualized optimal doses should be identified in early-phase development. In this study, we propose a novel dose-finding approach to identify the individualized optimal dose for molecularly targeted agents in phase I cancer trials. Individualized optimal dose determination and gene selection were conducted simultaneously based on L1 and L2 penalized regression. Similar to most reported dose-finding approaches, this study considers non-monotonic patterns for dose-efficacy and dose-toxicity relationships, as well as correlations between efficacy and toxicity outcomes based on multinomial distribution. Our dose-finding algorithm is based on the predictive probability calculated with an estimated penalized regression model. We compare the operating characteristics between the proposed and existing methods by simulation studies under various scenarios.


Subjects
Clinical Trials, Phase I as Topic/statistics & numerical data , Genomics/statistics & numerical data , Precision Medicine/statistics & numerical data , Research Design/statistics & numerical data , Algorithms , Antineoplastic Agents/administration & dosage , Biomarkers, Tumor/genetics , Computer Simulation , Data Interpretation, Statistical , Dose-Response Relationship, Drug , Humans , Models, Statistical , Molecular Targeted Therapy/statistics & numerical data , Mutation , Neoplasms/drug therapy , Neoplasms/genetics
19.
Urology ; 142: 166-173, 2020 08.
Article in English | MEDLINE | ID: mdl-32277993

ABSTRACT

OBJECTIVE: To validate the 17-gene Oncotype DX Genomic Prostate Score (GPS) as a predictor of adverse pathology (AP) in African American (AA) men and to assess the distribution of GPS in AA and European American (EA) men with localized prostate cancer. METHODS: The study populations were derived from 2 multi-institutional observational studies. Between February 2009 and September 2014, AA and EA men who elected immediate radical prostatectomy after a ≥10-core transrectal ultrasound biopsy were included in the study. Logistic regressions, area under the receiver operating characteristics curves (AUC), calibration curves, and predictive values were used to compare the accuracy of GPS. AP was defined as primary Gleason grade 4, presence of any Gleason pattern 5, and/or non-organ-confined disease (≥pT3aN0M0) at radical prostatectomy. RESULTS: Overall, 96 AA and 76 EA men were selected and 46 (26.7%) had AP. GPS result was a significant predictor of AP (odds ratio per 20 GPS units [OR/20 units] in AA: 4.58; 95% confidence interval (CI) 1.8-11.5, P = .001; and EA: 4.88; 95% CI 1.8-13.5, P = .002). On multivariate analysis, there was no significant interaction between GPS and race (P >.10). GPS remained significant in models adjusted for either National Comprehensive Cancer Network (NCCN) risk group or Cancer of the Prostate Risk Assessment (CAPRA) score. In race-stratified models, area under the receiver operating characteristics curves for GPS/20 units was 0.69 for AAs vs 0.74 for EAs (P = .79). The GPS distributions were not statistically different by race (all P >.05). CONCLUSION: In this clinical validation study, the Oncotype DX GPS is an independent predictor of AP at prostatectomy in AA and EA men with similar predictive accuracy and distributions.


Subjects
Genetic Testing/statistics & numerical data , Prostate/pathology , Prostatectomy , Prostatic Neoplasms/diagnosis , Race Factors/statistics & numerical data , Black or African American/statistics & numerical data , Aged , Biopsy, Large-Core Needle , Genomics/methods , Genomics/statistics & numerical data , Humans , Male , Middle Aged , Neoplasm Grading , Observational Studies as Topic , Predictive Value of Tests , Prognosis , Prostate/surgery , Prostatic Neoplasms/genetics , Prostatic Neoplasms/surgery , ROC Curve , Risk Assessment/methods , Risk Assessment/statistics & numerical data , United States , White People/statistics & numerical data
20.
Br J Cancer ; 122(10): 1467-1476, 2020 05.
Article in English | MEDLINE | ID: mdl-32203215

ABSTRACT

BACKGROUND: Unsupervised learning methods, such as Hierarchical Cluster Analysis, are commonly used for the analysis of genomic platform data. Unfortunately, such approaches ignore the well-documented heterogeneous composition of prostate cancer samples. Our aim is to use more sophisticated analytical approaches to deconvolute the structure of prostate cancer transcriptome data, providing novel clinically actionable information for this disease. METHODS: We apply an unsupervised model called Latent Process Decomposition (LPD), which can handle heterogeneity within individual cancer samples, to genome-wide expression data from eight prostate cancer clinical series, including 1,785 malignant samples with the clinical endpoints of PSA failure and metastasis. RESULTS: We show that PSA failure is correlated with the level of an expression signature called DESNT (HR = 1.52, 95% CI = [1.36, 1.7], P = 9.0 × 10⁻¹⁴, Cox model), and that patients with a majority DESNT signature have an increased metastatic risk (χ² test, P = 0.0017, and P = 0.0019). In addition, we develop a stratification framework that incorporates DESNT and identifies three novel molecular subtypes of prostate cancer. CONCLUSIONS: These results highlight the importance of using more complex approaches for the analysis of genomic data, may assist drug targeting, and have allowed the construction of a nomogram combining DESNT with other clinical factors for use in clinical management.


Subjects
Biomarkers, Tumor/blood , Gene Expression Profiling/statistics & numerical data , Prostatic Neoplasms/genetics , Transcriptome/genetics , Gene Expression Regulation, Neoplastic/genetics , Genomics/statistics & numerical data , Humans , Kaplan-Meier Estimate , Male , Middle Aged , Prognosis , Progression-Free Survival , Proportional Hazards Models , Prostate-Specific Antigen/blood , Prostatic Neoplasms/blood , Prostatic Neoplasms/pathology , Risk Assessment , Risk Factors