Results 1 - 20 of 107
1.
Nat Cancer ; 5(2): 299-314, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38253803

ABSTRACT

Contemporary analyses focused on a limited number of clinical and molecular biomarkers have been unable to accurately predict clinical outcomes in pancreatic ductal adenocarcinoma. Here we describe a precision medicine platform known as the Molecular Twin, consisting of advanced machine-learning models, and use it to analyze a dataset of 6,363 clinical and multi-omic molecular features from patients with resected pancreatic ductal adenocarcinoma to accurately predict disease survival (DS). We show that a full multi-omic model predicts DS with the highest accuracy and that plasma protein is the top single-omic predictor of DS. A parsimonious model learning only 589 multi-omic features demonstrated predictive performance similar to that of the full multi-omic model. Our platform enables discovery of parsimonious biomarker panels and performance assessment of outcome prediction models learning from resource-intensive panels. This approach has considerable potential to impact clinical care and democratize precision cancer medicine worldwide.


Subjects
Adenocarcinoma , Pancreatic Ductal Carcinoma , Pancreatic Neoplasms , Humans , Adenocarcinoma/genetics , Adenocarcinoma/surgery , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/surgery , Multiomics , Artificial Intelligence , Pancreatic Ductal Carcinoma/genetics , Pancreatic Ductal Carcinoma/surgery , Intelligence
2.
Pac Symp Biocomput ; 29: 359-373, 2024.
Article in English | MEDLINE | ID: mdl-38160292

ABSTRACT

This work demonstrates the use of cluster analysis in detecting fair and unbiased novel discoveries. Given a sample population of elective spinal fusion patients, we identify two overarching subgroups driven by insurance type. The Medicare group, associated with lower socioeconomic status, exhibited an over-representation of negative risk factors. The findings provide a compelling depiction of the interwoven socioeconomic and racial disparities present within the healthcare system, highlighting their consequential effects on health inequalities. The results are intended to guide design of fair and precise machine learning models based on intentional integration of population stratification.


Subjects
Medicare , Socioeconomic Disparities in Health , Aged , Humans , United States , Computational Biology , Racial Groups , Cluster Analysis
3.
Pac Symp Biocomput ; 29: 96-107, 2024.
Article in English | MEDLINE | ID: mdl-38160272

ABSTRACT

The concept of a digital twin came from the engineering, industrial, and manufacturing domains to create virtual objects or machines that could inform the design and development of real objects. This idea is appealing for precision medicine where digital twins of patients could help inform healthcare decisions. We have developed a methodology for generating and using digital twins for clinical outcome prediction. We introduce a new approach that combines synthetic data and network science to create digital twins (i.e. SynTwin) for precision medicine. First, our approach starts by estimating the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and edges defining distance less than the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients are generated using a synthetic data generation algorithm that models the correlation structure of the data to generate new patients. Fifth, digital twins are selected from the synthetic patient population that are within a given distance defining a subject community in the network. Finally, we compare and contrast community-based prediction of clinical endpoints using real subjects, digital twins, or both within and outside of the community. Key to this approach are the digital twins defined using patient similarity that represent hypothetical unobserved patients with patterns similar to nearby real patients as defined by network distance and community structure. We apply our SynTwin approach to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute (USA). 
Our results demonstrate that nearest network neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over just using real data alone (AUROC=0.791, 95% CI=0.781-0.800). These results suggest a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.
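
The steps listed in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SynTwin implementation: Euclidean distance, connected components standing in for community detection, and a toy generator that jitters real records (the abstract describes a generator that models the correlation structure of the data). All function names, thresholds, and data values are hypothetical.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_network(subjects, threshold):
    """Connect two subjects when their distance is below the threshold."""
    edges = {i: set() for i in range(len(subjects))}
    for i in range(len(subjects)):
        for j in range(i + 1, len(subjects)):
            if distance(subjects[i], subjects[j]) < threshold:
                edges[i].add(j)
                edges[j].add(i)
    return edges

def communities(edges):
    """Connected components, a simple stand-in for community detection."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in edges[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups

def make_synthetic(subjects, labels, n_new, noise, rng):
    """Toy generator: jitter a randomly chosen real record, copy its label."""
    out = []
    for _ in range(n_new):
        k = rng.randrange(len(subjects))
        out.append(([x + rng.gauss(0, noise) for x in subjects[k]], labels[k]))
    return out

def select_twins(community, subjects, synthetic, radius):
    """Keep synthetic patients within `radius` of any community member."""
    return [s for s in synthetic
            if any(distance(s[0], subjects[i]) <= radius for i in community)]

def predict(query, pool):
    """Nearest-neighbor prediction: label of the closest (features, label)."""
    return min(pool, key=lambda p: distance(query, p[0]))[1]

# Hypothetical data: two well-separated groups with outcome labels 0 and 1.
rng = random.Random(0)
subjects = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels = [0, 0, 0, 1, 1, 1]
groups = communities(build_network(subjects, threshold=1.0))
synthetic = make_synthetic(subjects, labels, n_new=50, noise=0.1, rng=rng)
twins = select_twins(groups[0], subjects, synthetic, radius=0.5)
pool = [(subjects[i], labels[i]) for i in groups[0]] + twins
print(groups, len(twins), predict([0.1, 0.1], pool))
```

The prediction pool combines real community members with their digital twins, which mirrors the comparison the abstract reports between real data alone and real data plus twins.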


Subjects
Algorithms , Computational Biology , Humans , Cluster Analysis , Precision Medicine
4.
Comput Toxicol ; 25, 2023 Feb.
Article in English | MEDLINE | ID: mdl-37829618

ABSTRACT

Adverse outcome pathways (AOPs) provide a powerful tool for understanding the biological signaling cascades that lead to disease outcomes following toxicity. The framework outlines downstream responses known as key events, culminating in a clinically significant adverse outcome as a final result of the toxic exposure. Here we use the AOP framework combined with artificial intelligence methods to gain novel insights into genetic mechanisms that underlie toxicity-mediated adverse health outcomes. Specifically, we focus on liver cancer as a case study with diverse underlying mechanisms that are clinically significant. Our approach uses two complementary AI techniques: generative modeling via automated machine learning and genetic algorithms, and graph machine learning. We used data from the US Environmental Protection Agency's Adverse Outcome Pathway Database (AOP-DB; aopdb.epa.gov) and the UK Biobank's genetic data repository. We use the AOP-DB to extract disease-specific AOPs and build graph neural networks used in our final analyses. We use the UK Biobank to retrieve real-world genotype and phenotype data, where genotypes are based on single nucleotide polymorphism data extracted from the AOP-DB, and phenotypes are case/control cohorts for the disease of interest (liver cancer) corresponding to those adverse outcome pathways. We also use propensity score matching to appropriately sample based on important covariates (demographics, comorbidities, and social deprivation indices) and to balance the case and control populations in our machine learning training/testing datasets. Finally, we describe a novel putative risk factor for liver cancer that depends on genetic variation in both the aryl-hydrocarbon receptor (AHR) and ATP binding cassette subfamily B member 11 (ABCB11) genes.

5.
medRxiv ; 2023 Aug 04.
Article in English | MEDLINE | ID: mdl-37577697

ABSTRACT

Motivation: Genome-Wide Association Studies (GWAS) commonly assume phenotypic and genetic homogeneity that is not present in complex conditions. We designed Transformative Regression Analysis of Combined Effects (TRACE), a GWAS methodology that better accounts for clinical phenotype heterogeneity and identifies gene-by-environment (GxE) interactions. We demonstrated with UK Biobank (UKB) data that TRACE increased the variance explained in All-Cause Heart Failure (AHF) via the discovery of novel single nucleotide polymorphism (SNP) and SNP-by-environment (i.e. GxE) interaction associations. First, we transformed 312 AHF-related ICD10 codes (including AHF) into continuous low-dimensional features (i.e., latent phenotypes) for a more nuanced disease representation. Then, we ran a standard GWAS on our latent phenotypes to discover main effects and identified GxE interactions with target encoding. Genes near associated SNPs subsequently underwent enrichment analysis to explore potential functional mechanisms underlying associations. Latent phenotypes were regressed against their SNP hits and the estimated latent phenotype values were used to measure the amount of AHF variance explained. Results: Our method identified over 100 main GWAS effects that were consistent with prior studies and hundreds of novel gene-by-smoking interactions, which collectively accounted for approximately 10% of AHF variance. This represents an improvement over traditional GWAS whose results account for a negligible proportion of AHF variance. Enrichment analyses suggested that hundreds of miRNAs mediated the SNP effect on various AHF-related biological pathways. The TRACE framework can be applied to decode the genetics of other complex diseases. Availability: All code is available at https://github.com/EpistasisLab/latent_phenotype_project.

6.
J Mol Med (Berl) ; 100(9): 1341-1353, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35986225

ABSTRACT

Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive, fibrosing interstitial pneumonia of unknown etiology. The role of genetic risk factors has been the focus of numerous studies probing for associations of genetic variants with IPF. We aimed to determine whether single-nucleotide polymorphisms (SNPs) of four candidate genes are associated with IPF susceptibility and survival in a Portuguese population. A retrospective case-control study was performed with 64 IPF patients and 74 healthy controls. Ten single-nucleotide variants residing in the MUC5B, TOLLIP, SERPINB1, and PLAU genes were analyzed. Single- and multi-locus analyses were performed to investigate the predictive potential of specific variants in IPF susceptibility and survival. Multifactor dimensionality reduction (MDR) was employed to uncover predictive multi-locus interactions underlying IPF susceptibility. The MUC5B rs35705950 SNP was significantly associated with IPF: T allele carriers were significantly more frequent among IPF patients (75.0% vs 20.3%, P < 1.0 × 10⁻⁶). Genotypic and allelic distributions of TOLLIP, PLAU, and SERPINB1 SNPs did not differ significantly between groups. However, the MUC5B-TOLLIP T-C-T-C haplotype, defined by the rs35705950-rs111521887-rs5743894-rs5743854 block, emerged as an independent protective factor in IPF survival (HR = 0.37, 95% CI 0.17-0.78, P = 0.009, after adjustment for FVC). No significant multi-locus interactions correlating with disease susceptibility were detected. MUC5B rs35705950 was linked to an increased risk for IPF, as reported for other populations, but not to disease survival. A haplotype incorporating SNPs of the MUC5B-TOLLIP locus at 11p15.5 seems to predict better survival and could prove useful for prognostic purposes and IPF patient stratification. KEY MESSAGES: The MUC5B rs35705950 minor allele is associated with IPF risk in the Portuguese. No predictive multi-locus interactions of IPF susceptibility were identified by MDR. A haplotype defined by MUC5B and TOLLIP SNPs is a protective factor in IPF survival. The haplotype may be used as a prognostic tool for IPF patient stratification.


Subjects
Idiopathic Pulmonary Fibrosis , Serpins , Humans , Case-Control Studies , Genetic Predisposition to Disease , Idiopathic Pulmonary Fibrosis/genetics , Single Nucleotide Polymorphism , Retrospective Studies , Serpins/genetics
7.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1165-1172, 2022.
Article in English | MEDLINE | ID: mdl-32991288

ABSTRACT

Lung cancer is the leading cause of cancer deaths. Low-dose computed tomography (CT) screening has been shown to significantly reduce lung cancer mortality but suffers from a high false positive rate that leads to unnecessary diagnostic procedures. The development of deep learning techniques has the potential to help improve lung cancer screening technology. Here we present the algorithm, DeepScreener, which can predict a patient's cancer status from a volumetric lung CT scan. DeepScreener is based on our model of Spatial Pyramid Pooling, which ranked 16th of 1972 teams (top 1 percent) in the Data Science Bowl 2017 competition (DSB2017), evaluated with the challenge datasets. Here we test the algorithm with an independent set of 1449 low-dose CT scans of the National Lung Screening Trial (NLST) cohort, and we find that DeepScreener has consistent performance of high accuracy. Furthermore, by combining Spatial Pyramid Pooling and 3D Convolution, it achieves an AUC of 0.892, surpassing the previous state-of-the-art algorithms using only 3D convolution. The advancement of deep learning algorithms can potentially help improve lung cancer detection with low-dose CT scans.


Subjects
Early Detection of Cancer , Lung Neoplasms , Algorithms , Early Detection of Cancer/methods , Humans , Lung , Lung Neoplasms/diagnostic imaging , X-Ray Computed Tomography
8.
Hum Genomics ; 15(1): 70, 2021 12 13.
Article in English | MEDLINE | ID: mdl-34903281

ABSTRACT

The genetic basis of phenotypic variation across populations has not been well explained for most traits. Several factors may cause disparities, from variation in environments to divergent population genetic structure. We hypothesized that a population-level polygenic risk score (PRS) can explain phenotypic variation among geographic populations based solely on risk allele frequencies. We applied a population-specific PRS (psPRS) to four phenotypes across 26 populations from the 1000 Genomes: lactase persistence (LP), melanoma, multiple sclerosis (MS) and height. Our models assumed additive genetic architecture among the polymorphisms in the psPRSs, as is conventional. Linear psPRSs explained a significant proportion of trait variance ranging from 0.32 for height in men to 0.88 for melanoma. The best models for LP and height were linear, while those for melanoma and MS were nonlinear. As not all variants in a PRS may confer similar, or even any, risk among diverse populations, we also filtered out SNPs to assess whether variance explained was improved using psPRSs with fewer SNPs. Variance explained usually improved with fewer SNPs in the psPRS and was as high as 0.99 for height in men using only 548 of the initial 4208 SNPs. That reducing SNPs improves psPRS performance may indicate that missing heritability is partially due to complex architecture that does not mandate additivity, undiscovered variants or spurious associations in the databases. We demonstrated that PRS-based analyses can be used across diverse populations and phenotypes for population prediction and that these comparisons can identify the universal risk variants.
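
One way to picture an additive, frequency-based population score is sketched below. The abstract does not give the psPRS formula, so the form used here (effect weight times expected risk-allele count, summed over SNPs) is an assumption, and the function names, weights, and frequencies are hypothetical.

```python
def ps_prs(freqs, weights):
    """Additive population-level score (assumed form): for each SNP,
    weight * expected risk-allele count per person (2 * allele frequency)."""
    return sum(2.0 * f * w for f, w in zip(freqs, weights))

def r_squared(x, y):
    """Proportion of variance in y explained by a simple linear fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical effect weights for three SNPs and risk-allele frequencies
# in three populations (illustrative values only).
weights = [0.2, 0.5, 0.1]
population_freqs = {
    "pop_A": [0.1, 0.2, 0.3],
    "pop_B": [0.4, 0.1, 0.5],
    "pop_C": [0.6, 0.7, 0.2],
}
scores = {name: ps_prs(f, weights) for name, f in population_freqs.items()}
print(scores)
```

Given one score and one trait value per population, `r_squared` would give the kind of variance-explained figure the abstract reports for linear models; the nonlinear models mentioned there are outside this sketch.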


Subjects
Multifactorial Inheritance , Single Nucleotide Polymorphism , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Phenotype , Single Nucleotide Polymorphism/genetics , Prevalence , Risk Factors
9.
Eur J Heart Fail ; 23(12): 2021-2032, 2021 12.
Article in English | MEDLINE | ID: mdl-34632675

ABSTRACT

AIMS: Enhanced risk stratification of patients with aortic stenosis (AS) is necessary to identify patients at high risk for adverse outcomes, and may allow for better management of patient subgroups at high risk of myocardial damage. The objective of this study was to identify plasma biomarkers and multimarker profiles associated with adverse outcomes in AS. METHODS AND RESULTS: We studied 708 patients with calcific AS and measured 49 biomarkers using a Luminex platform. We studied the correlation between biomarkers and the risk of (i) death and (ii) death or heart failure-related hospital admission (DHFA). We also utilized machine-learning methods (a tree-based pipeline optimizer platform) to develop multimarker models associated with the risk of death and DHFA. In this cohort with a median follow-up of 2.8 years, multiple biomarkers were significantly predictive of death in analyses adjusted for clinical confounders, including tumour necrosis factor (TNF)-α [hazard ratio (HR) 1.28, P < 0.0001], TNF receptor 1 (TNFRSF1A; HR 1.38, P < 0.0001), fibroblast growth factor (FGF)-23 (HR 1.22, P < 0.0001), N-terminal pro B-type natriuretic peptide (NT-proBNP) (HR 1.58, P < 0.0001), matrix metalloproteinase-7 (HR 1.24, P = 0.0002), syndecan-1 (HR 1.27, P = 0.0002), suppression of tumorigenicity-2 (ST2) (IL1RL1; HR 1.22, P = 0.0002), interleukin (IL)-8 (CXCL8; HR 1.22, P = 0.0005), pentraxin (PTX)-3 (HR 1.17, P = 0.001), neutrophil gelatinase-associated lipocalin (LCN2; HR 1.18, P < 0.0001), osteoprotegerin (OPG) (TNFRSF11B; HR 1.26, P = 0.0002), and endostatin (COL18A1; HR 1.28, P = 0.0012). 
Several biomarkers were also significantly predictive of DHFA in adjusted analyses including FGF-23 (HR 1.36, P < 0.0001), TNF-α (HR 1.26, P < 0.0001), TNFR1 (HR 1.34, P < 0.0001), angiopoietin-2 (HR 1.26, P < 0.0001), syndecan-1 (HR 1.23, P = 0.0006), ST2 (HR 1.27, P < 0.0001), IL-8 (HR 1.18, P = 0.0009), PTX-3 (HR 1.18, P = 0.0002), OPG (HR 1.20, P = 0.0013), and NT-proBNP (HR 1.63, P < 0.0001). Machine-learning multimarker models were strongly associated with adverse outcomes (mean 1-year probability of death of 0%, 2%, and 60%; mean 1-year probability of DHFA of 0%, 4%, 97%; P < 0.0001). In these models, IL-6 (a biomarker of inflammation) and FGF-23 (a biomarker of calcification) emerged as the biomarkers of highest importance. CONCLUSIONS: Plasma biomarkers are strongly associated with the risk of adverse outcomes in patients with AS. Biomarkers of inflammation and calcification were most strongly related to prognosis.


Subjects
Aortic Valve Stenosis , Calcinosis , Heart Failure , Biomarkers , Humans , Brain Natriuretic Peptide , Peptide Fragments , Prognosis
10.
J Palliat Care ; 36(2): 87-92, 2021 Apr.
Article in English | MEDLINE | ID: mdl-31187695

ABSTRACT

INTRODUCTION: Studies have shown aggressive cancer care at the end of life is associated with decreased quality of life, decreased median survival, and increased cost of care. This study describes the patients most likely to receive systemic anticancer therapy at the end of life in a community cancer institute. MATERIALS AND METHODS: We performed a retrospective cohort study of 201 patients who received systemic anticancer therapy in our institution and died between July 2016 and April 2017. Data collected included primary malignancy, hospice enrollment, healthcare utilization, Oncology Care Model (OCM) enrollment, and clinical assessments at last office visit prior to a treatment decision before death. We defined our outcome variable as the receipt of anticancer treatment in the last 14 days of a patient's life. We evaluated 20 clinical exposure variables with respect to the outcome classes. Risk ratios along with their associated confidence intervals and P values were calculated. Significance was determined using the Benjamini-Hochberg procedure to account for multiple testing. RESULTS: Of the 201 patients who died of cancer, 36 (17%) received anticancer therapy within the last 14 days of life. Several risk factors were significantly positively associated with receiving anticancer therapy at the end of life including hospitalization within 30 days of end of life, number of hospitalizations per patient (≥2), death in hospital, enrollment in OCM, and a diagnosis of hematologic malignancy. CONCLUSION: Our findings demonstrate those enrolled in the OCM and those with hematologic malignancies have a higher risk of receiving anticancer therapy in the last 14 days of life. These observations highlight the need for better identifying the needs of high-risk patients and providing good quality care throughout the disease trajectory to better align end-of-life care with patients' wishes.
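
The two statistical steps named in the methods, a risk ratio with its confidence interval and the Benjamini-Hochberg procedure, can be illustrated as follows. This is a generic sketch, not the authors' analysis code: the Wald interval on the log scale is an assumed choice, and the counts and P values in the example are hypothetical.

```python
import math

def risk_ratio(a, n1, c, n0, z=1.96):
    """Risk ratio for exposed (a events among n1) vs unexposed (c among n0),
    with a Wald confidence interval computed on the log scale."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

def benjamini_hochberg(pvalues, q=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= q * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

# Hypothetical counts: 20 of 100 exposed vs 10 of 100 unexposed patients.
rr, lower, upper = risk_ratio(20, 100, 10, 100)
print(round(rr, 2), round(lower, 2), round(upper, 2))

# Hypothetical P values for four exposure variables.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

With 20 exposure variables, as in this study, the same function would be called on the 20 P values at once, since the correction depends on the total number of tests.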


Subjects
Neoplasms , Terminal Care , Death , Hospitalization , Humans , Neoplasms/therapy , Palliative Care , Quality of Life , Retrospective Studies
11.
Blood Adv ; 4(20): 5174-5183, 2020 10 27.
Article in English | MEDLINE | ID: mdl-33095872

ABSTRACT

Chimeric antigen receptor (CAR) T-cells directed against CD19 have drastically altered outcomes for children with relapsed and refractory acute lymphoblastic leukemia (r/r ALL). Pediatric patients with r/r ALL treated with CAR-T are at increased risk of both cytokine release syndrome (CRS) and sepsis. We sought to investigate the biologic differences between CRS and sepsis and to develop predictive models which could accurately differentiate CRS from sepsis at the time of critical illness. We identified 23 different cytokines that were significantly different between patients with sepsis and CRS. Using elastic net prediction modeling and tree classification, we identified cytokines that were able to classify subjects as having CRS or sepsis accurately. A markedly elevated interferon γ (IFNγ) or a mildly elevated IFNγ in combination with a low IL1β were associated with CRS. A normal to mildly elevated IFNγ in combination with an elevated IL1β was associated with sepsis. This combination of IFNγ and IL1β was able to categorize subjects as having CRS or sepsis with 97% accuracy. As CAR-T therapies become more common, these data provide important novel information to better manage potential associated toxicities.


Subjects
Precursor Cell Lymphoblastic Leukemia-Lymphoma , Sepsis , Child , Critical Illness , Cytokine Release Syndrome , Humans , T-Cell Antigen Receptors , Sepsis/diagnosis
12.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32701164

ABSTRACT

Exposure to hydraulic fracturing fluid in drinking water increases the risk of many adverse health outcomes. Unfortunately, most individuals and researchers are unaware of the health risks posed by a particular well due to the diversity of chemical ingredients used across sites. We constructed WellExplorer (http://WellExplorer.org), an interactive tool for researchers and community members to use for retrieving information regarding the hormonal, testosterone and estrogen modulators located at each well. We found that wells in Alabama use a disproportionately high number of ingredients targeting estrogen pathways, while Illinois, Ohio and Pennsylvania use a disproportionately high number of ingredients targeting testosterone pathways. Researchers can utilize WellExplorer to study health outcomes related to exposure to fracturing chemicals in their population-based cohorts. Community members can use this resource to search their home or work locations (e.g. town or zip code) to determine proximity between where they live or work and specific hormonal exposures.


Subjects
Factual Databases , Environmental Exposure , Hormones/metabolism , Hydraulic Fracking , Chemical Water Pollutants , Medical Geography , Humans , United States
13.
J Am Coll Cardiol ; 75(11): 1281-1295, 2020 03 24.
Article in English | MEDLINE | ID: mdl-32192654

ABSTRACT

BACKGROUND: Better risk stratification strategies are needed to enhance clinical care and trial design in heart failure with preserved ejection fraction (HFpEF). OBJECTIVES: The purpose of this study was to assess the value of a targeted plasma multi-marker approach to enhance our phenotypic characterization and risk prediction in HFpEF. METHODS: In this study, the authors measured 49 plasma biomarkers from TOPCAT (Treatment of Preserved Cardiac Function Heart Failure With an Aldosterone Antagonist) trial participants (n = 379) using a Multiplex assay. The relationship between biomarkers and the risk of all-cause death or heart failure-related hospital admission (DHFA) was assessed. A tree-based pipeline optimizer platform was used to generate a multimarker predictive model for DHFA. We validated the model in an independent cohort of HFpEF patients enrolled in the PHFS (Penn Heart Failure Study) (n = 156). RESULTS: Two large, tightly related dominant biomarker clusters were found, which included biomarkers of fibrosis/tissue remodeling, inflammation, renal injury/dysfunction, and liver fibrosis. Other clusters were composed of neurohormonal regulators of mineral metabolism, intermediary metabolism, and biomarkers of myocardial injury. Multiple biomarkers predicted incident DHFA, including 2 biomarkers related to mineral metabolism/calcification (fibroblast growth factor-23 and OPG [osteoprotegerin]), 3 inflammatory biomarkers (tumor necrosis factor-alpha, sTNFRI [soluble tumor necrosis factor-receptor I], and interleukin-6), YKL-40 (related to liver injury and inflammation), 2 biomarkers related to intermediary metabolism and adipocyte biology (fatty acid binding protein-4 and growth differentiation factor-15), angiopoietin-2 (related to angiogenesis), matrix metalloproteinase-7 (related to extracellular matrix turnover), ST-2, and N-terminal pro-B-type natriuretic peptide. 
A machine-learning-derived model using a combination of biomarkers was strongly predictive of the risk of DHFA (standardized hazard ratio: 2.85; 95% confidence interval: 2.03 to 4.02; p < 0.0001) and markedly improved the risk prediction when added to the MAGGIC (Meta-Analysis Global Group in Chronic Heart Failure Risk Score) risk score. In an independent cohort (PHFS), the model strongly predicted the risk of DHFA (standardized hazard ratio: 2.74; 95% confidence interval: 1.93 to 3.90; p < 0.0001), which was also independent of the MAGGIC risk score. CONCLUSIONS: Various novel circulating biomarkers in key pathophysiological domains are predictive of outcomes in HFpEF, and a multimarker approach coupled with machine-learning represents a promising strategy for enhancing risk stratification in HFpEF.


Subjects
Biomarkers/blood , Heart Failure/blood , Machine Learning , Aged , Female , Heart Failure/mortality , Humans , Male , Middle Aged , Risk Assessment , United States/epidemiology
14.
Artif Life ; 26(1): 23-37, 2020.
Article in English | MEDLINE | ID: mdl-32027528

ABSTRACT

Susceptibility to common human diseases such as cancer is influenced by many genetic and environmental factors that work together in a complex manner. The state of the art is to perform a genome-wide association study (GWAS) that measures millions of single-nucleotide polymorphisms (SNPs) throughout the genome followed by a one-SNP-at-a-time statistical analysis to detect univariate associations. This approach has identified thousands of genetic risk factors for hundreds of diseases. However, the genetic risk factors detected have very small effect sizes and collectively explain very little of the overall heritability of the disease. Nonetheless, it is assumed that the genetic component of risk is due to many independent risk factors that contribute additively. The fact that many genetic risk factors with small effects can be detected is taken as evidence to support this notion. It is our working hypothesis that the genetic architecture of common diseases is partly driven by non-additive interactions. To test this hypothesis, we developed a heuristic simulation-based method for conducting experiments about the complexity of genetic architecture. We show that a genetic architecture driven by complex interactions is highly consistent with the magnitude and distribution of univariate effects seen in real data. We compare our results with measures of univariate and interaction effects from two large-scale GWASs of sporadic breast cancer and find evidence to support our hypothesis that is consistent with the results of our computational experiment.


Subjects
Computational Biology , Disease/genetics , Genome-Wide Association Study , Single Nucleotide Polymorphism , Computer Simulation , Humans
15.
Genet Epidemiol ; 44(1): 52-66, 2020 01.
Article in English | MEDLINE | ID: mdl-31583758

ABSTRACT

Genetic interactions have been recognized as a potentially important contributor to the heritability of complex diseases. Nevertheless, due to small effect sizes and stringent multiple-testing correction, identifying genetic interactions in complex diseases is particularly challenging. To address the above challenges, many genomic research initiatives collaborate to form large-scale consortia and develop open access to enable sharing of genome-wide association study (GWAS) data. Despite the perceived benefits of data sharing from large consortia, a number of practical issues have arisen, such as privacy concerns on individual genomic information and heterogeneous data sources from distributed GWAS databases. In the context of large consortia, we demonstrate that the heterogeneously appearing marginal effects over distributed GWAS databases can offer new insights into genetic interactions for which conventional methods have had limited success. In this paper, we develop a novel two-stage testing procedure, named phylogenY-based effect-size tests for interactions using first 2 moments (YETI2), to detect genetic interactions through both pooled marginal effects, in terms of averaging site-specific marginal effects, and heterogeneity in marginal effects across sites, using a meta-analytic framework. YETI2 can not only be applied to large consortia without shared personal information but also can be used to leverage underlying heterogeneity in marginal effects to prioritize potential genetic interactions. We investigate the performance of YETI2 through simulation studies and apply YETI2 to bladder cancer data from dbGaP.


Subjects
Genetic Epistasis/genetics , Genome-Wide Association Study/methods , Urinary Bladder Neoplasms/genetics , Humans , Information Dissemination , Genetic Models , Single Nucleotide Polymorphism/genetics
16.
J Am Med Inform Assoc ; 27(2): 244-253, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31617899

ABSTRACT

OBJECTIVES: The ability to identify novel risk factors for health outcomes is a key strength of electronic health record (EHR)-based research. However, the validity of such studies is limited by error in EHR-derived phenotypes. The objective of this study was to develop a novel procedure for reducing bias in estimated associations between risk factors and phenotypes in EHR data. MATERIALS AND METHODS: The proposed method combines the strengths of a gold-standard phenotype obtained through manual chart review for a small validation set of patients and an automatically-derived phenotype that is available for all patients but is potentially error-prone (hereafter referred to as the algorithm-derived phenotype). An augmented estimator of associations is obtained by optimally combining these 2 phenotypes. We conducted simulation studies to evaluate the performance of the augmented estimator and conducted an analysis of risk factors for second breast cancer events using data on a cohort from Kaiser Permanente Washington. RESULTS: The proposed method was shown to reduce bias relative to an estimator using only the algorithm-derived phenotype and reduce variance compared to an estimator using only the validation data. DISCUSSION: Our simulation studies and real data application demonstrate that, compared to the estimator using validation data only, the augmented estimator has lower variance (ie, higher statistical efficiency). Compared to the estimator using error-prone EHR-derived phenotypes, the augmented estimator has smaller bias. CONCLUSIONS: The proposed estimator can effectively combine an error-prone phenotype with gold-standard data from a limited chart review in order to improve analyses of risk factors using EHR data.


Subjects
Algorithms , Electronic Health Records/classification , Bias , Data Warehousing , Humans
17.
BMC Med Inform Decis Mak ; 19(Suppl 4): 147, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31391106

ABSTRACT

BACKGROUND: Hepatitis C affects about 3% of the world's population. In the United States, about 3.5 million people have chronic hepatitis C, and it is the leading cause of liver cancer and the most common indication for liver transplantation. In recent decades, advances in antiviral therapy have raised the cure rate of hepatitis C to more than 95%. However, the safety of the new treatments remains a major concern. Data from the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and from Electronic Health Record (EHR) systems provide crucial post-market information for evaluating drug safety. Currently, quantitative evidence on the safety of hepatitis C treatments based on post-market data is still limited, and there is no standard statistical procedure for systematically comparing drug safety across multiple drugs using FAERS and EHR data. METHOD: In this study, we present a statistical procedure to compare adverse events (AEs) across multiple hepatitis C drugs using data from FAERS and EHR, and to assess the consistency of results from the two databases. Through three major steps (descriptive comparison, testing for differences among groups, and quantification of association), the proposed method provides a quantitative comparison of the safety of multiple drugs. Specifically, we compared drugs approved by the FDA to treat hepatitis C before 2011 versus those approved after 2013. We used spontaneous AE reports submitted to the FAERS database between 2004 and 2015 and medical records from the Cerner Health Facts database between 1999 and 2015 to estimate and compare the rate of AEs after drug use. RESULT: We studied the 30 most frequently reported AEs after treatment of hepatitis C, comparing drugs approved before 2011 versus those approved after 2013. Our results showed a difference in AE rates between the two groups of treatments. We report the AEs that differ with statistical significance, and estimate the difference attributable to variation in age and gender between the two groups of drug users. Our findings are consistent with results in the existing literature. Moreover, we compared the results obtained from FAERS data and EHR data, and evaluated the consistency of the evidence. CONCLUSION: The proposed procedure is a general and standardized pipeline that can be used to compare and visualize drug safety among multiple drugs to support regulatory decision-making using post-market data. We showed that there was a statistically significant difference in AE rates between the new and old therapies for hepatitis C. We showed that both FAERS and EHR contain a large amount of information for research on post-market drug safety, but each has its own strengths and limitations. Caution should be taken when combining evidence from the two data resources, and more sophisticated informatics and statistical tools are needed for evidence synthesis.
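The "quantification of association" step in the abstract above can be illustrated with a minimal sketch. The abstract does not say which measure the authors used, so the odds ratio with a Wald confidence interval shown here is an assumption, and the function name `odds_ratio_ci` and all counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: reports with the AE in the newer-drug group
    b: reports without the AE in the newer-drug group
    c: reports with the AE in the older-drug group
    d: reports without the AE in the older-drug group
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only; they are not taken from the study.
estimate, lower, upper = odds_ratio_ci(40, 960, 90, 910)
print(f"OR = {estimate:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```

An interval that excludes 1 would suggest that the AE is reported at different rates in the two drug groups. Adjusting for age and gender, as the abstract describes, would need a regression model rather than a single 2x2 table.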


Subjects
Antiviral Agents/adverse effects ; Drug-Related Side Effects and Adverse Reactions/epidemiology ; Hepatitis C/drug therapy ; Adverse Drug Reaction Reporting Systems ; Databases, Factual ; Humans ; Product Surveillance, Postmarketing ; United States ; United States Food and Drug Administration
18.
BioData Min ; 12: 14, 2019.
Article in English | MEDLINE | ID: mdl-31320928

ABSTRACT

BACKGROUND: The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is, individual Single Nucleotide Polymorphisms (SNPs) that are associated with the trait of interest independently of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in their assumptions and in the type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and used to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then used in the meta-analyses. RESULTS: In this work we use a Type 2 Diabetes (T2D) GWAS and a breast cancer GWAS as case studies to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess the similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs identified by multiple approaches, as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs that have other supporting lines of evidence for their potential relevance to the traits. CONCLUSIONS: Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, in which representative approaches are selected from a pool of suitable options to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed by the most popular analyses but that could provide valuable additional insights for follow-up.
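The idea of deriving a union list of top SNPs from several ranked lists can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the function name `union_top_snps`, the cutoff `k`, the approach names, and the SNP identifiers are all hypothetical.

```python
from collections import Counter


def union_top_snps(ranked_lists, k):
    """Union of the top-k SNPs across approaches, with a support count.

    ranked_lists: dict mapping an approach name to its SNP identifiers,
                  ordered from most to least strongly associated.
    k: number of top-ranked SNPs taken from each approach.
    Returns (snp, support) pairs sorted by descending support, then by name.
    """
    support = Counter()
    for snps in ranked_lists.values():
        # A set ensures each approach counts at most once per SNP.
        support.update(set(snps[:k]))
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical approaches and SNP identifiers, for illustration only.
ranked = {
    "logistic_additive": ["snp_A", "snp_B", "snp_C"],
    "logistic_dominant": ["snp_B", "snp_A", "snp_D"],
    "random_forest": ["snp_E", "snp_A", "snp_B"],
}
for snp, count in union_top_snps(ranked, k=2):
    print(snp, count)
```

SNPs with a support count of one are the ones a single-approach analysis could miss, which is the case the abstract highlights.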

19.
Front Artif Intell ; 2: 12, 2019.
Article in English | MEDLINE | ID: mdl-33733101

ABSTRACT

Artificial intelligence (AI) has emerged as a powerful approach for integrated analysis of the rapidly growing volume of multi-omics data, including many research and clinical tasks such as prediction of disease risk and identification of potential therapeutic targets. However, the potential for AI to facilitate the identification of factors contributing to human exceptional health and life span and their translation into novel interventions for enhancing health and life span has not yet been realized. As researchers on aging acquire large scale data both in human cohorts and model organisms, emerging opportunities exist for the application of AI approaches to untangle the complex physiologic process(es) that modulate health and life span. It is expected that efficient and novel data mining tools that could unravel molecular mechanisms and causal pathways associated with exceptional health and life span could accelerate the discovery of novel therapeutics for healthy aging. Keeping this in mind, the National Institute on Aging (NIA) convened an interdisciplinary workshop titled "Contributions of Artificial Intelligence to Research on Determinants and Modulation of Health Span and Life Span" in August 2018. The workshop involved experts in the fields of aging, comparative biology, cardiology, cancer, and computational science/AI who brainstormed ideas on how AI can be leveraged for the analyses of large-scale data sets from human epidemiological studies and animal/model organisms to close the current knowledge gaps in processes that drive exceptional life and health span. This report summarizes the discussions and recommendations from the workshop on future application of AI approaches to advance our understanding of human health and life span.

20.
Circ Genom Precis Med ; 11(8): e001977, 2018 08.
Article in English | MEDLINE | ID: mdl-30354342

ABSTRACT

BACKGROUND: Genome-wide association studies have identified multiple loci associated with coronary artery disease and myocardial infarction, but only a few of these loci are current targets for on-market medications. To identify drugs suitable for repurposing and their targets, we created 2 unique pipelines integrating public data on 49 coronary artery disease/myocardial infarction genome-wide association studies loci, drug-gene interactions, side effects, and chemical interactions. METHODS: We first used publicly available genome-wide association studies results on all phenotypes to predict relevant side effects, identified drug-gene interactions, and prioritized candidates for repurposing among existing drugs. Second, we prioritized gene product targets by calculating a druggability score to estimate how accessible the pockets of coronary artery disease/myocardial infarction-associated gene products are, then again used the genome-wide association studies results to predict side effects, excluded loci with widespread cross-tissue expression to avoid housekeeping genes and genes involved in vital processes, and ranked the remaining gene products accordingly. RESULTS: These pipelines ultimately led to 3 suggestions for drug repurposing: pentolinium, adenosine triphosphate, and riociguat (to target CHRNB4, ACSS2, and GUCY1A3, respectively); and 3 proteins for drug development: LMOD1 (leiomodin 1), HIP1 (huntingtin-interacting protein 1), and PPP2R3A (protein phosphatase 2, regulatory subunit b-double prime, α). Most current therapies for coronary artery disease/myocardial infarction treatment were also rediscovered. CONCLUSIONS: Integration of genomic and pharmacological data may prove beneficial for drug repurposing and development, as evidence from our pipelines suggests.


Subjects
Cardiovascular Agents ; Coronary Artery Disease/drug therapy ; Coronary Artery Disease/genetics ; Drug Discovery/methods ; Drug Repositioning/methods ; Genetic Loci ; Molecular Targeted Therapy/methods ; Algorithms ; Animals ; Cardiovascular Agents/pharmacokinetics ; Cardiovascular Agents/therapeutic use ; Drug Evaluation, Preclinical/methods ; Drug Interactions/genetics ; Drug-Related Side Effects and Adverse Reactions/epidemiology ; Drug-Related Side Effects and Adverse Reactions/genetics ; Gene-Environment Interaction ; Genetic Predisposition to Disease ; Genome-Wide Association Study ; Humans ; Molecular Docking Simulation ; Pharmacogenomic Testing ; Polymorphism, Single Nucleotide ; Risk Factors