ABSTRACT
BACKGROUND AND OBJECTIVE: The reconstruction of gene regulatory networks (GRNs) is a vital approach to deciphering complex biological processes. Nonlinear ordinary differential equation (ODE) models have demonstrated considerable efficacy in predicting GRNs. Notably, the decay rate and time delay are pivotal in authentic gene regulation, yet their systematic determination in ODE models remains underexplored. A comprehensive optimization framework for the effective estimation of these key parameters is essential for accurate GRN inference. METHOD: This study introduces GRNMOPT, a methodology for inferring GRNs from time-series and steady-state data. GRNMOPT combines decay rate and time delay when constructing ODE models to authentically represent gene regulatory processes. It incorporates a multi-objective optimization approach, optimizing decay rate and time delay concurrently to derive Pareto optimal sets for these factors, thereby maximizing accuracy metrics such as AUROC (Area Under the Receiver Operating Characteristic curve) and AUPR (Area Under the Precision-Recall curve). Additionally, the use of XGBoost for calculating feature importance aids in identifying potential regulatory gene links. RESULTS: Comprehensive experimental evaluations on two simulated datasets from DREAM4 and three real gene expression datasets (Yeast, In vivo Reverse-engineering and Modeling Assessment [IRMA], and Escherichia coli [E. coli]) reveal that GRNMOPT performs commendably across varying network scales. Furthermore, cross-validation experiments substantiate the robustness of GRNMOPT. CONCLUSION: We propose a novel approach called GRNMOPT to infer GRNs based on a multi-objective optimization framework, which effectively improves inference accuracy and provides a powerful tool for GRN inference.
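A minimal sketch of how a decay rate and a time delay can enter an ODE-based regression setup, assuming the commonly used form dx_j/dt = -alpha * x_j(t) + f_j(x(t - tau)) discretized by finite differences. The abstract does not give GRNMOPT's exact equations, so the function name `build_delayed_targets`, the single shared decay rate `alpha`, and the integer `delay_steps` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def build_delayed_targets(X, times, alpha, delay_steps):
    """Turn a time series into (inputs, targets) pairs for regression.

    X           : array of shape (T, G), expression of G genes at T time points
    times       : array of shape (T,), the sampling times
    alpha       : assumed decay rate shared by all genes (hypothetical choice)
    delay_steps : assumed time delay, expressed in whole sampling steps
    """
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    dt = np.diff(times)[:, None]  # shape (T-1, 1)

    # Finite-difference derivative at t_k plus the decay term alpha * x(t_k).
    targets = (X[1:] - X[:-1]) / dt + alpha * X[:-1]  # shape (T-1, G)

    # Pair the target at t_k with the expression at t_(k - delay_steps).
    inputs = X[: X.shape[0] - 1 - delay_steps]
    targets = targets[delay_steps:]
    return inputs, targets


X = np.array([[1.0, 2.0], [2.0, 4.0], [4.0, 8.0], [8.0, 16.0]])
inputs, targets = build_delayed_targets(X, [0, 1, 2, 3], alpha=0.5, delay_steps=1)
print(inputs.shape, targets.shape)
```

Each column of `targets` could then be regressed on `inputs` with a tree-based learner such as XGBoost, whose feature importances rank candidate regulators. In this sketch `alpha` and `delay_steps` are the two quantities a multi-objective search would tune.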
ABSTRACT
This work used headspace solid-phase microextraction with gas chromatography-mass spectrometry (HS-SPME-GC-MS) to analyze the volatile components of hydrosols of Citrus × aurantium 'Daidai' and Citrus × aurantium L. dried buds (CAVAs and CADBs) by immersion and ultrasound-microwave synergistic-assisted steam distillation. The results show that a total of 106 volatiles were detected in hydrosols, mainly alcohols, alkenes, and esters, and the high content components of hydrosols were linalool, α-terpineol, and trans-geraniol. In terms of variety, the total and unique components of CAVA hydrosols were much higher than those of CADB hydrosols; the relative contents of 13 components of CAVA hydrosols were greater than those of CADB hydrosols, with geranyl acetate up to 15-fold; all hydrosols had a citrus, floral, and woody aroma. From the pretreatment, more volatile components were retained in the immersion; the relative contents of linalool and α-terpineol were increased by the ultrasound-microwave procedure; and the ultrasound-microwave procedure was favorable for the stimulation of the aroma of CAVA hydrosols, but it diminished the aroma of the CADB hydrosols. This study provides theoretical support for in-depth exploration based on the medicine food homology properties of CAVA and for improving the utilization rate of waste resources.
Subject(s)
Acyclic Monoterpenes , Citrus , Cyclohexane Monoterpenes , Gas Chromatography-Mass Spectrometry , Solid Phase Microextraction , Volatile Organic Compounds , Gas Chromatography-Mass Spectrometry/methods , Citrus/chemistry , Solid Phase Microextraction/methods , Volatile Organic Compounds/analysis , Volatile Organic Compounds/chemistry , Volatile Organic Compounds/isolation & purification , Acyclic Monoterpenes/analysis , Cyclohexane Monoterpenes/analysis , Terpenes/analysis , Terpenes/chemistry , Monoterpenes/analysis , Monoterpenes/isolation & purification , Odorants/analysis , Distillation/methods , Acetates
ABSTRACT
The morphological variation in Schizothorax oconnori, Schizothorax waltoni, and their natural hybrids was examined using conventional and image-based analysis approaches. In total, 38 specimens of S. oconnori, 35 of S. waltoni, and 37 natural hybrids were collected from the Shigatse to the Lhasa section of the Yarlung Zangbo River during June and July 2021. A total of 21 morphometric, 4 meristic, and 27 truss variables were employed for the classification of S. oconnori, S. waltoni, and natural hybrids. Principal component analysis (PCA) and factor analysis (FA), as well as discriminant function analysis (DFA) and cluster analysis (CA), were conducted to identify differences based on traditional and truss measurements. Four principal components explained 75.92% of the variation among the morphometric characters, while five principal components accounted for 79.69% of the variation among the truss distances. FA results showed that factor 1 was associated with head shape, and factor 2 was associated with fins based on morphometric characters. Among the truss characters, factor 1 was related to head shape, and factor 2 was related to chest shape. In DFA, morphometric measurements achieved higher accuracy (100%) compared to truss distances (94.55%). The head morphology of hybrids exhibited intermediate traits between S. oconnori and S. waltoni. Both morphometry-based and truss-based clustering indicated that the morphology of natural hybrids leaned toward S. oconnori. In conclusion, the combination of morphometric and truss analysis is beneficial for classifying S. oconnori, S. waltoni, and their natural hybrids. The presence of natural hybrids could be considered an evolutionary response to the differentiation of nutritional and spatial niches in the middle Yarlung Zangbo River.
ABSTRACT
Aim: To predict base-resolution DNA methylation in cancerous and paracancerous tissues. Material & methods: We collected six cancer DNA methylation datasets from The Cancer Genome Atlas and five cancer datasets from Gene Expression Omnibus and established machine learning models using paired cancerous and paracancerous tissues. Tenfold cross-validation and independent validation were performed to demonstrate the effectiveness of the proposed method. Results: The developed cross-tissue prediction models can substantially increase the accuracy at more than 68% of CpG sites and contribute to enhancing the statistical power of differential methylation analyses. An XGBoost model leveraging multiple correlating CpGs may elevate the prediction accuracy. Conclusion: This study provides a powerful tool for DNA methylation analysis and has the potential to gain new insights into cancer research from epigenetics.
ABSTRACT
MOTIVATION: Gene regulatory networks (GRNs) are a way of describing the interaction between genes, which contribute to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS: In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. Firstly, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. The extensive experiments on different scale datasets show that our method makes sensible improvement compared with the state-of-the-art methods. Furthermore, we perform cross-validation experiments on the real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in the Python language, and is available at: https://github.com/lab319/iLSGRN.
Subject(s)
Algorithms , Gene Regulatory Networks , Systems Biology , Random Forest , Time Factors , Computational Biology/methods
ABSTRACT
Most kidney cancers are kidney renal clear cell carcinoma (KIRC), a major cause of cancer-related deaths. A polygenic risk score (PRS) is a weighted linear combination of phenotype-related alleles across the genome that can be used to assess KIRC risk. However, standalone SNP data as input to the PRS model may not provide satisfactory results. Therefore, transcriptional risk scores (TRS) based on multi-omics data and machine learning models were proposed to assess the risk of KIRC. First, we collected four types of multi-omics data (DNA methylation, miRNA, mRNA and lncRNA) of KIRC patients from the TCGA database. Subsequently, a novel TRS method utilizing multiple omics data and an XGBoost model was developed. Finally, we performed prevalence analysis and prognosis prediction to evaluate the utility of the TRS generated by our method. Our TRS method exhibited better predictive performance than the linear models and other machine learning models. Furthermore, the prediction accuracy of the combined TRS model was higher than that of the single-omics TRS models. The KM curves showed that TRS was a valid prognostic indicator for cancer staging. Our proposed method extended the current definition of PRS from standalone SNP data to multi-omics data and was superior to the linear models and other machine learning models, which may provide a useful tool for diagnostic and prognostic prediction of KIRC.
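The "weighted linear combination" that defines a conventional PRS can be sketched as a dot product between allele dosages and per-variant weights. This is a generic illustration, not the TRS method of the study; the function name `polygenic_risk_score` and the toy dosages and weights are made up for the example.

```python
import numpy as np


def polygenic_risk_score(dosages, weights):
    """Return one score per individual.

    dosages : array of shape (n_individuals, n_variants), risk-allele counts (0, 1 or 2)
    weights : array of shape (n_variants,), per-variant effect sizes (e.g. log odds ratios)
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over variants for every individual.
    return dosages @ weights


# Toy data (hypothetical): two individuals, three variants.
scores = polygenic_risk_score([[0, 1, 2], [2, 0, 1]], [0.1, -0.2, 0.3])
print(scores)
```

A nonlinear learner such as a gradient-boosted tree model replaces this fixed weighted sum with a fitted function of the input features, which is the kind of extension the abstract describes.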
Subject(s)
Renal Cell Carcinoma , Kidney Neoplasms , MicroRNAs , Humans , Renal Cell Carcinoma/diagnosis , Renal Cell Carcinoma/genetics , Renal Cell Carcinoma/pathology , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , MicroRNAs/genetics , Risk Factors , Kidney/pathology
ABSTRACT
BACKGROUND: Patients with impaired kidney function were found at a high risk of COVID-19 hospitalization and mortality in many observational, cross-sectional, and hospital-based studies, but evidence from large-scale prospective cohorts has been lacking. We aimed to examine the association of kidney function-related biomarkers and their genetic predisposition with the risk of developing severe COVID-19 in population-based data. METHODS: We analyzed data from UK Biobank to examine the prospective association of abnormal kidney function biomarkers with severe COVID-19, defined by laboratory-confirmed COVID-19 hospitalizations. Using genotype data, we constructed polygenic risk scores (PRS) to represent an individual's overall genetic risk for these biomarkers. We also identified tipping points where the risk of severe COVID-19 began to increase significantly for each biomarker. RESULTS: Of the 502,506 adults, 1650 (0.32%) were identified as severe COVID-19, before August 12, 2020. High levels of cystatin C (OR: 1.3; 95% CI: 1.2-1.5; FDR = 1.5 × 10⁻⁵), serum creatinine (OR: 1.7; 95% CI: 1.3-2.1; p = 3.5 × 10⁻⁴; FDR = 3.5 × 10⁻⁴), microalbuminuria (OR: 1.4; 95% CI: 1.2-1.6; FDR = 4 × 10⁻⁴), and UACR (urinary albumin creatinine ratio; OR: 1.4; 95% CI: 1.2-1.6; p = 3.5 × 10⁻⁴; FDR = 3.5 × 10⁻⁴) were found significantly associated with severe COVID-19. Individuals with top 10% of PRS for elevated cystatin C, urate, and microalbuminuria had 28% to 43% higher risks of severe COVID-19 than individuals with bottom 30% PRS (p < 0.05). Tipping-point analyses further supported that severe COVID-19 could occur even when the values of cystatin C, urate (male), and microalbuminuria were within their normal value ranges (OR >1.1, p < 0.05). CONCLUSIONS: Findings from this study might point to new directions for clinicians and policymakers in optimizing risk-stratification among patients based on polygenic risk estimation and tipping points of kidney function markers.
Our results call for further investigation to develop a better strategy to prevent severe COVID-19 outcomes among patients with genetic predisposition to impaired kidney function. These findings could provide a new tool for clinicians and policymakers in the future especially if we need to live with COVID-19 for a long time.
Subject(s)
COVID-19 , Renal Insufficiency , Adult , Humans , Male , Cystatin C/urine , COVID-19/genetics , Genetic Predisposition to Disease , Cross-Sectional Studies , Uric Acid , Albuminuria/genetics , Biomarkers , Kidney
ABSTRACT
The potential role of DNA methylation from paracancerous tissues in cancer diagnosis has not been explored until now. In this study, we built classification models using well-known machine learning models based on DNA methylation profiles of paracancerous tissues. We evaluated our methods on nine cancer datasets collected from The Cancer Genome Atlas (TCGA) and utilized fivefold cross-validation to assess the performance of models. Additionally, we performed gene ontology (GO) enrichment analysis on the basis of the significant CpG sites selected by feature importance scores of XGBoost model, aiming to identify biological pathways involved in cancer progression. We also exploited the XGBoost algorithm to classify cancer types using DNA methylation profiles of paracancerous tissues in external validation datasets. Comparative experiments suggested that XGBoost achieved better predictive performance than the other four machine learning methods in predicting cancer stage. GO enrichment analysis revealed key pathways involved, highlighting the importance of paracancerous tissues in cancer progression. Furthermore, XGBoost model can accurately classify nine different cancers from TCGA, and the feature sets selected by XGBoost can also effectively predict seven cancer types on independent GEO datasets. This study provided new insights into cancer diagnosis from an epigenetic perspective and may facilitate the development of personalized diagnosis and treatment strategies.
Subject(s)
DNA Methylation , Neoplasms , Epigenomics , Humans , Machine Learning , Neoplasm Staging , Neoplasms/diagnosis , Neoplasms/genetics
ABSTRACT
The outbreak of coronavirus disease (COVID-19) and its accompanying pandemic have created an unprecedented challenge worldwide. Parametric modeling and analyses of the COVID-19 play a critical role in providing vital information about the character and relevant guidance for controlling the pandemic. However, the epidemiological utility of the results obtained from the COVID-19 transmission model largely depends on accurately identifying parameters. This paper extends the susceptible-exposed-infectious-recovered (SEIR) model and proposes an improved quantum-behaved particle swarm optimization (QPSO) algorithm to estimate its parameters. A new strategy is developed to update the weighting factor of the mean best position by the reciprocal of multiplying the fitness of each best particle with the average fitness of all best particles, which can enhance the global search capacity. To increase the particle diversity, a probability function is designed to generate new particles in the updating iteration. When compared to the state-of-the-art estimation algorithms on the epidemic datasets of China, Italy and the US, the proposed method achieves good accuracy and convergence at a comparable computational complexity. The developed framework would be beneficial for experts to understand the characteristics of epidemic development and formulate epidemic prevention and control measures.
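For orientation, the baseline SEIR model that the paper extends can be sketched as a forward-Euler simulation. This shows only the standard four-compartment model, not the authors' extended version or their QPSO estimator; the function name `simulate_seir` and the parameter values in the example are illustrative assumptions.

```python
def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days, dt=0.1):
    """Integrate the standard SEIR equations with forward Euler.

    beta  : transmission rate
    sigma : rate at which exposed individuals become infectious
    gamma : recovery rate
    Returns a list of (S, E, I, R) tuples, one per time step, including the start.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # total population, constant in this model
    trajectory = [(s, e, i, r)]
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / n * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


# Hypothetical parameter values, chosen only to exercise the function.
trajectory = simulate_seir(990, 0, 10, 0, beta=0.5, sigma=0.2, gamma=0.1, days=10)
print(trajectory[-1])
```

Parameter estimation then amounts to searching for the `beta`, `sigma`, and `gamma` values whose simulated trajectory best matches reported case counts, which is the role a swarm optimizer such as QPSO plays.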
ABSTRACT
High levels of circulating estradiol (E2) are associated with increased risk of breast cancer, whereas its relationship with breast cancer prognosis is still unclear. We evaluated the effect of E2 concentration on survival endpoints among 8766 breast cancer cases diagnosed between 2005 and 2017 from the Tianjin Breast Cancer Cases Cohort. Levels of serum E2 were measured in pre-menopausal and post-menopausal women. Multivariable-adjusted Cox proportional hazards models were used to estimate hazard ratios (HR) and 95% confidence intervals (95% CI) for the association of E2 quartiles with overall survival (OS) and progression-free survival (PFS) of breast cancer. The penalized spline was then used to test for non-linear relationships between E2 (continuous variable) and survival endpoints. 612 deaths and 982 progressions occurred over follow-up through 2017. Compared to women in quartile 3, the highest quartile of E2 was associated with poorer PFS in pre-menopausal women (HR 1.79, 95% CI 1.17-2.75, P = 0.008) and poorer OS in post-menopausal women (HR 1.35, 95% CI 1.04-1.74, P = 0.023). OS and PFS in pre-menopausal women exhibited a nonlinear relation ("L-shaped" and "U-shaped", respectively) with E2 levels. However, there was a linear relationship in post-menopausal women. Moreover, patients with estrogen receptor-negative (ER-negative) breast cancer showed a "U-shaped" relationship with OS and PFS in pre-menopausal women. Pre-menopausal breast cancer patients have a plateau stage of prognosis at the intermediate concentrations of E2, whereas post-menopausal patients have no apparent threshold, and ER status may have an impact on this relationship.
Subject(s)
Breast Neoplasms , Cohort Studies , Estradiol , Female , Humans , Menopause , Premenopause
ABSTRACT
BACKGROUND: Both genetic and lifestyle factors contribute to the risk of type 2 diabetes, but the extent to which there is a synergistic effect of the 2 factors is unclear. The aim of this study was to examine the joint associations of genetic risk and diet quality with incident type 2 diabetes. METHODS AND FINDINGS: We analyzed data from 35,759 men and women in the United States participating in the Nurses' Health Study (NHS) I (1986 to 2016) and II (1991 to 2017) and the Health Professionals Follow-up Study (HPFS; 1986 to 2016) with available genetic data and who did not have diabetes, cardiovascular disease, or cancer at baseline. Genetic risk was characterized using both a global polygenic score capturing overall genetic risk and pathway-specific polygenic scores denoting distinct pathophysiological mechanisms. Diet quality was assessed using the Alternate Healthy Eating Index (AHEI). Cox models were used to calculate hazard ratios (HRs) for type 2 diabetes after adjusting for potential confounders. With over 902,386 person-years of follow-up, 4,433 participants were diagnosed with type 2 diabetes. The relative risk of type 2 diabetes was 1.29 (95% confidence interval [CI] 1.25, 1.32; P < 0.001) per standard deviation (SD) increase in global polygenic score and 1.13 (1.09, 1.17; P < 0.001) per 10-unit decrease in AHEI. Irrespective of genetic risk, low diet quality, as compared to high diet quality, was associated with approximately 30% increased risk of type 2 diabetes (Pinteraction = 0.69). The joint association of low diet quality and increased genetic risk was similar to the sum of the risk associated with each factor alone (Pinteraction = 0.30). Limitations of this study include the self-report of diet information and possible bias resulting from inclusion of highly educated participants with available genetic data. 
CONCLUSIONS: These data provide evidence for the independent associations of genetic risk and diet quality with incident type 2 diabetes and suggest that a healthy diet is associated with lower diabetes risk across all levels of genetic risk.
Subject(s)
Type 2 Diabetes Mellitus , Adult , Type 2 Diabetes Mellitus/etiology , Type 2 Diabetes Mellitus/genetics , Diet/adverse effects , Female , Follow-Up Studies , Humans , Male , Prospective Studies , Risk Factors , United States/epidemiology
ABSTRACT
BACKGROUND: Polygenic risk score (PRS) can evaluate the individual-level genetic risk of breast cancer. However, standalone single nucleotide polymorphisms (SNP) data used for PRS may not provide satisfactory prediction accuracy. Additionally, current PRS models based on linear regression have insufficient power to leverage non-linear effects from thousands of associated SNPs. Here, we proposed a transcriptional risk score (TRS) based on multiple omics data to estimate the risk of breast cancer. METHODS: The multiple omics data and clinical data of breast invasive carcinoma (BRCA) were collected from the cancer genome atlas (TCGA) and the gene expression omnibus (GEO). First, we developed a novel TRS model for BRCA utilizing single omic data and LightGBM algorithm. Subsequently, we built a combination model of TRS derived from each omic data to further improve the prediction accuracy. Finally, we performed association analysis and prognosis prediction to evaluate the utility of the TRS generated by our method. RESULTS: The proposed TRS model achieved better predictive performance than the linear models and other ML methods in single omic dataset. An independent validation dataset also verified the effectiveness of our model. Moreover, the combination of the TRS can efficiently strengthen prediction accuracy. The analysis of prevalence and the associations of the TRS with phenotypes including case-control and cancer stage indicated that the risk of breast cancer increases with the increases of TRS. The survival analysis also suggested that TRS for the cancer stage is an effective prognostic metric of breast cancer patients. CONCLUSIONS: Our proposed TRS model expanded the current definition of PRS from standalone SNP data to multiple omics data and outperformed the linear models, which may provide a powerful tool for diagnostic and prognostic prediction of breast cancer.
Subject(s)
Algorithms , Neoplasms , Risk Factors , Survival Analysis , Genome-Wide Association Study
ABSTRACT
MOTIVATION: Survival analysis using gene expression profiles plays a crucial role in the interpretation of clinical research and assessment of disease therapy programs. Several prediction models have been developed to explore the relationship between patients' covariates and survival. However, the high-dimensional genomic features limit the prediction performance of the survival model. Thus, an accurate and reliable prediction model is necessary for survival analysis using high-dimensional genomic data. RESULTS: In this study, we proposed an improved survival prediction model based on XGBoost framework called XGBLC, which used Lasso-Cox to enhance the ability to analyze high-dimensional genomic data. The novel first- and second-order gradient statistics of Lasso-Cox were defined to construct the loss function of XGBLC. We extensively tested our XGBLC algorithm on both simulated and real-world datasets, and estimated the performance of models with 5-fold cross-validation. Based on 20 cancer datasets from The Cancer Genome Atlas (TCGA), XGBLC outperforms five state-of-the-art survival methods in terms of C-index, Brier score and AUC. The results show that XGBLC still keeps good accuracy and robustness by comparing the performance on the simulated datasets with different scales. The developed prediction model would be beneficial for physicians to understand the effects of patient's genomic characteristics on survival and make personalized treatment decisions. AVAILABILITY AND IMPLEMENTATION: The implementation of XGBLC algorithm based on R language is available at: https://github.com/lab319/XGBLC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Algorithms , Neoplasms , Humans , Genomics , Neoplasms/genetics , Genome , Survival Analysis
ABSTRACT
Uncovering additional causal clinical traits and exposure variables is important when studying osteoporosis mechanisms and for the prevention of osteoporosis. Until recently, the causal relationship between anthropometric measurements and osteoporosis had not been fully revealed. In the present study, we utilized several state-of-the-art Mendelian randomization (MR) methods to investigate whether height, body mass index (BMI), waist-to-hip ratio (WHR), hip circumference (HC), and waist circumference (WC) are causally associated with two major characteristics of osteoporosis, bone mineral density (BMD) and fractures. Genomewide significant (p ≤ 5 × 10⁻⁸) single-nucleotide polymorphisms (SNPs) associated with the five anthropometric variables were obtained from previous large-scale genomewide association studies (GWAS) and were utilized as instrumental variables. Summary-level data of estimated bone mineral density (eBMD) and fractures were obtained from a large-scale UK Biobank GWAS. Of the MR methods utilized, the inverse-variance weighted method was the primary method used for analysis, and the weighted-median, MR-Egger, mode-based estimate, and MR pleiotropy residual sum and outlier methods were utilized for sensitivity analyses. The results of the present study indicated that each increase in height equal to a single standard deviation (SD) was associated with a 9.9% increase in risk of fracture (odds ratio [OR] = 1.099; 95% confidence interval [CI] 1.067-1.133; p = 8.793 × 10⁻¹⁰) and a 0.080 SD decrease of estimated bone mineral density (95% CI -0.106 to -0.054; p = 2.322 × 10⁻⁹). We also found that BMI was causally associated with eBMD (beta = 0.129, 95% CI 0.065-0.194; p = 8.113 × 10⁻⁵) but not associated with fracture. The WHR adjusted for BMI, HC adjusted for BMI, and WC adjusted for BMI were not found to be related to fracture occurrence or eBMD.
In conclusion, the present study provided genetic evidence for certain causal relationships between anthropometric measurements and bone mineral density or fracture risk. © 2021 American Society for Bone and Mineral Research (ASBMR).
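The inverse-variance weighted method named above as the primary analysis can be sketched in its standard fixed-effect form, which pools per-SNP ratio estimates from summary statistics. The function name `ivw_estimate` and the toy numbers are illustrative assumptions; the study's actual software and settings are not stated in the abstract.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate.

    beta_exposure : per-SNP effects on the exposure (e.g. height)
    beta_outcome  : per-SNP effects on the outcome (e.g. eBMD)
    se_outcome    : standard errors of the per-SNP outcome effects
    Returns (estimate, standard_error).
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        numerator += bx * by / se ** 2
        denominator += bx ** 2 / se ** 2
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Toy summary statistics (hypothetical): both SNPs imply a ratio of 0.5.
estimate, standard_error = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.02])
print(estimate, standard_error)
```

The sensitivity methods listed in the abstract (weighted median, MR-Egger, mode-based estimate, MR-PRESSO) relax the assumption made here that every SNP is a valid instrument.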
Subject(s)
Bone Fractures , Osteoporosis , Bone Density/genetics , Bone Fractures/genetics , Genome-Wide Association Study , Humans , Mendelian Randomization Analysis , Osteoporosis/genetics , Single Nucleotide Polymorphism/genetics
ABSTRACT
BACKGROUND: Lung function is a heritable complex phenotype with obesity being one of its important risk factors. However, knowledge of their shared genetic basis is limited. Most genome-wide association studies (GWASs) for lung function have been based on European populations, limiting the generalisability across populations. Large-scale lung function GWASs in other populations are lacking. METHODS: We included 100,285 subjects from the China Kadoorie Biobank (CKB). To identify novel loci for lung function, single-trait GWAS analyses were performed on forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC in the CKB. We then performed genome-wide cross-trait analysis between lung function and obesity traits (body mass index (BMI), BMI-adjusted waist-to-hip ratio and BMI-adjusted waist circumference) to investigate the shared genetic effects in the CKB. Finally, polygenic risk scores (PRSs) of lung function were developed in the CKB and their interaction with BMI's association on lung function were examined. We also conducted cross-trait analysis in parallel with the CKB using up to 457,756 subjects from the UK Biobank (UKB) for replication and investigation of ancestry-specific effects. RESULTS: We identified nine genome-wide significant novel loci for FEV1, six for FVC and three for FEV1/FVC in the CKB. FEV1 and FVC showed significant negative genetic correlation with obesity traits in both the CKB and UKB. Genetic loci shared between lung function and obesity traits highlighted important biological pathways, including cell proliferation, embryo, skeletal and tissue development, and regulation of gene expression. Mendelian randomisation analysis suggested significant negative causal effects of BMI on FEV1 and on FVC in both the CKB and UKB. Lung function PRSs significantly modified the effect of change in BMI on change in lung function during an average follow-up of 8 years.
CONCLUSION: This large-scale GWAS of lung function identified novel loci and shared genetic aetiology between lung function and obesity. Change in BMI might affect change in lung function differently according to a subject's polygenic background. These findings may open new avenues for the development of molecular-targeted therapies for obesity and lung function improvement.
Subject(s)
Genome-Wide Association Study , Single Nucleotide Polymorphism , Body Mass Index , China , Forced Expiratory Volume , Humans , Lung , Obesity/genetics
ABSTRACT
OBJECTIVE: We aimed to examine the associations of obesity-related traits (body mass index [BMI], central obesity) and their genetic predisposition with the risk of developing severe COVID-19 in population-based data. RESEARCH DESIGN AND METHODS: We analyzed data from 489,769 adults enrolled in the UK Biobank, a population-based cohort study. The exposures of interest are BMI categories and central obesity (e.g., larger waist circumference). Using genome-wide genotyping data, we also computed polygenic risk scores (PRSs) that represent an individual's overall genetic risk for each obesity trait. The outcome was severe COVID-19, defined by hospitalization for laboratory-confirmed COVID-19. RESULTS: Of 489,769 individuals, 33% were normal weight (BMI, 18.5-24.9 kg/m²), 43% overweight (25.0-29.9 kg/m²), and 24% obese (≥30.0 kg/m²). The UK Biobank identified 641 patients with severe COVID-19. Compared to adults with normal weight, those with a higher BMI had a dose-response increase in the risk of severe COVID-19, with the following adjusted ORs: for 25.0-29.9 kg/m², 1.40 (95% CI 1.14-1.73; P = 0.002); for 30.0-34.9 kg/m², 1.73 (95% CI 1.36-2.20; P < 0.001); for 35.0-39.9 kg/m², 2.82 (95% CI 2.08-3.83; P < 0.001); and for ≥40.0 kg/m², 3.30 (95% CI 2.17-5.03; P < 0.001). Likewise, central obesity was associated with significantly higher risk of severe COVID-19 (P < 0.001). Furthermore, larger PRS for BMI was associated with higher risk of outcome (adjusted OR per BMI PRS Z-score 1.14, 95% CI 1.05-1.24; P = 0.004). CONCLUSIONS: In this large population-based cohort, individuals with more severe obesity, central obesity, or genetic predisposition for obesity are at higher risk of developing severe COVID-19.
Subject(s)
COVID-19/genetics , COVID-19/pathology , Genetic Predisposition to Disease/genetics , Abdominal Obesity/complications , Abdominal Obesity/genetics , Body Mass Index , Type 2 Diabetes Mellitus/genetics , Female , Humans , Male , Middle Aged , Overweight/genetics , Risk Factors , SARS-CoV-2/pathogenicity , Severity of Illness Index , Waist Circumference/genetics
Subject(s)
Asthma/epidemiology , Asthma/genetics , Coronavirus Infections/epidemiology , Coronavirus Infections/genetics , Genetic Predisposition to Disease , Pandemics , Viral Pneumonia/epidemiology , Viral Pneumonia/genetics , Adult , Aged , Asthma/classification , Asthma/immunology , Betacoronavirus/immunology , Betacoronavirus/pathogenicity , COVID-19 , COVID-19 Testing , Clinical Laboratory Techniques/methods , Cohort Studies , Comorbidity , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Coronary Artery Disease/immunology , Coronavirus Infections/diagnosis , Coronavirus Infections/virology , Female , Hospitalization , Humans , Hypertension/epidemiology , Hypertension/genetics , Hypertension/immunology , Logistic Models , Male , Middle Aged , Viral Pneumonia/virology , Chronic Obstructive Pulmonary Disease/epidemiology , Chronic Obstructive Pulmonary Disease/genetics , Chronic Obstructive Pulmonary Disease/immunology , Risk Factors , SARS-CoV-2 , Severity of Illness Index , United Kingdom/epidemiology
ABSTRACT
Accurate diagnostic classification of cancers can greatly help physicians choose surveillance and treatment strategies for patients. With the explosive growth of biological data, the shift from traditional biostatistical methods to computer-aided approaches has made machine-learning methods an integral part of today's cancer prognosis prediction. In this work, we propose a classification model that leverages extreme gradient boosting (XGBoost) and increasingly complex multi-omics data to separate early-stage and late-stage cancers. We applied the XGBoost model to four kinds of cancer data downloaded from TCGA and compared its performance with that of other popular machine-learning methods. The experimental results showed that our method obtained statistically significantly better or comparable predictive performance. The results also revealed that DNA methylation outperforms other molecular data (mRNA expression and miRNA expression) in terms of accuracy and stability for discriminating between early-stage and late-stage groups. Furthermore, integrating multi-omics data with an autoencoder can enhance the classification accuracy of cancer stage. Finally, we conducted bioinformatics analyses to assess the medical utility of the significant genes ranked by their importance using the XGBoost algorithm. Extensive comparative experiments demonstrated that the XGBoost method performs remarkably well in predicting the stage of cancer patients from multi-omics data. Moreover, the identification of novel candidate genes associated with cancer stages would help to further elucidate disease pathogenesis and to develop novel therapeutics.
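The idea of ranking features by their importance in a boosted model can be shown with a toy example. The sketch below is not XGBoost and not the authors' pipeline: it is a deliberately simplified gradient boosting on squared error with one-split trees (stumps), written in plain Python, with importance measured as the number of times each feature is chosen for a split. The data, function names, and hyperparameters are all hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single feature/threshold split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]


def boost(X, y, rounds=20, lr=0.5):
    """Fit stumps sequentially to the residuals of the current prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if row[j] <= t else rm) for row, p in zip(X, pred)]
    return base, stumps, lr


def predict(model, row):
    """Score one sample; values above 0.5 are read as the positive class."""
    base, stumps, lr = model
    return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 6.0]]
y = [0, 0, 1, 1]  # hypothetical labels, e.g. 0 = early stage, 1 = late stage

model = boost(X, y)

# Split-count importance: how often each feature was used across all stumps.
importance = {}
for j, _, _, _ in model[1]:
    importance[j] = importance.get(j, 0) + 1
```

In this toy run every split falls on feature 0, so it receives all of the importance. Real XGBoost models use regularized trees and offer several importance definitions (for example gain or cover), which can rank features differently.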
Asunto(s)
MicroARNs , Neoplasias , Algoritmos , Metilación de ADN , Humanos , Aprendizaje Automático , MicroARNs/genética , Neoplasias/diagnóstico , Neoplasias/genética
RESUMEN
BACKGROUND: The burden of breast cancer has grown rapidly in China during recent decades. However, the association between tumor markers (CA15-3, CA125, and CEA) and breast cancer survival among certain molecular subtypes is unclear; we described this association in a large, population-based study. METHODS: We conducted a cohort study including 10,836 women from the Tianjin Breast Cancer Cases Cohort. Demographic and epidemiologic data were collected using a structured face-to-face questionnaire. Clinico-pathological parameters were abstracted from medical records, and follow-up information was obtained once a year by telephone. The primary endpoints were breast cancer-specific survival (BCSS) and disease-free survival (DFS). We used the Cox proportional hazards model to calculate hazard ratios (HRs) and 95% confidence intervals (CIs). RESULTS: Among all patients, elevated CA15-3 and CEA levels were consistently and significantly associated with reduced BCSS compared with normal levels (CA15-3: HR 1.54, 95% CI 1.01-2.34; CEA: HR 2.45, 95% CI 1.40-4.30). Similar patterns of association were observed for DFS (CA15-3: HR 2.09, 95% CI 1.44-3.02; CEA: HR 2.71, 95% CI 1.71-4.27). Moreover, in the luminal A subtype, high CA15-3 and CEA levels were associated with decreased BCSS (CA15-3: HR 4.47, 95% CI 2.04-9.81; CEA: HR 3.79, 95% CI 1.68-8.55) and DFS (CA15-3: HR 4.06, 95% CI 2.29-7.18; CEA: HR 3.41, 95% CI 1.75-6.64). In the basal-like subtype, elevated CEA was associated with reduced BCSS (HR 5.13, 95% CI 1.65-15.9). However, no association was observed between CA125 and breast cancer outcome. CONCLUSIONS: The prognostic value of preoperative CA15-3 and CEA levels differs across breast cancer molecular subtypes, and these markers yield strong prognostic information in Chinese women with breast cancer. Measuring CA15-3 and CEA levels before surgery may help predict breast cancer survival and support personalized treatment strategies for patients with luminal A and basal-like subtypes.
Asunto(s)
Biomarcadores de Tumor/sangre , Neoplasias de la Mama/mortalidad , Mama/patología , Adolescente , Adulto , Anciano , Anciano de 80 o más Años , Mama/cirugía , Neoplasias de la Mama/sangre , Neoplasias de la Mama/patología , Neoplasias de la Mama/terapia , Antígeno Ca-125/sangre , Antígeno Carcinoembrionario/sangre , Quimioterapia Adyuvante , China/epidemiología , Supervivencia sin Enfermedad , Femenino , Estudios de Seguimiento , Proteínas Ligadas a GPI/sangre , Humanos , Mastectomía , Proteínas de la Membrana/sangre , Persona de Mediana Edad , Mucina-1/sangre , Periodo Preoperatorio , Pronóstico , Adulto Joven
RESUMEN
Objectives: Lung adenocarcinoma (LUAD) accounts for a majority of cancer-related deaths worldwide annually. Identifying prognostic biomarkers and predicting prognosis for LUAD patients is therefore necessary. Materials and Methods: In this study, LUAD RNA-Seq data and clinical data from the Cancer Genome Atlas (TCGA) were divided into TCGA cohort I (n = 338) and cohort II (n = 168). Cohort I was used for model construction, and cohort II and data from the Gene Expression Omnibus (GSE72094 cohort, n = 393; GSE11969 cohort, n = 149) were used for validation. First, survival-related seed genes were selected from cohort I using a machine learning model (random survival forest, RSF); then, to improve prediction accuracy, a forward selection model was used to identify the prognosis-related key genes among the seed genes using the clinically-integrated RNA-Seq data. Second, a survival risk score system was constructed from these key genes in cohort II, the GSE72094 cohort, and the GSE11969 cohort, and evaluation metrics such as HR, p value, and C-index were calculated to validate the proposed method. Third, the developed approach was compared with five previously published prediction models. Finally, bioinformatics analyses (pathway, heatmap, protein-gene interaction network) were applied to the identified seed genes and key genes. Results and Conclusion: Based on the RSF model and clinically-integrated RNA-Seq data, we identified sixteen key genes that formed the prognostic gene expression signature. These sixteen key genes showed strong prognostic power for LUAD patients in cohort II (HR = 3.80, p = 1.63e-06, C-index = 0.656) and were further validated in the GSE72094 cohort (HR = 4.12, p = 1.34e-10, C-index = 0.672) and the GSE11969 cohort (HR = 3.87, p = 6.81e-07, C-index = 0.670). The experimental results from the three independent validation cohorts showed that, compared with the traditional Cox model and the use of standalone RNA-Seq data, the machine-learning-based method effectively improved the prediction accuracy of LUAD prognosis, and the derived model was also superior to the five existing prediction models. KEGG pathway analysis found that eleven of the sixteen genes were associated with nicotine addiction. Thirteen of the sixteen genes were reported for the first time as LUAD prognosis-related key genes. In conclusion, we developed a sixteen-gene prognostic marker for LUAD, which may provide a powerful prognostic tool for precision oncology.
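To make the C-index values above more concrete, the following sketch computes a gene-signature risk score as a weighted sum of expression values and evaluates it with Harrell's concordance index. This is a minimal illustration under stated assumptions: the coefficients, expression values, and follow-up data are invented, and the abstract does not specify the exact form of the authors' risk score system or which C-index variant they used.

```python
def risk_score(expression, coefficients):
    """Weighted sum of key-gene expression values for one patient."""
    return sum(e * c for e, c in zip(expression, coefficients))


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had the event (events[i] == 1)
    and failed strictly before patient j's follow-up time. The pair is
    concordant when the earlier failure also has the higher risk score;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical two-gene signature and four patients.
coefficients = [0.5, -0.25]
expression = [[2.0, 0.4], [0.8, 0.8], [1.2, 0.8], [1.6, 0.4]]
risks = [risk_score(e, coefficients) for e in expression]

times = [5, 10, 15, 20]   # follow-up time, arbitrary units
events = [1, 1, 0, 1]     # 1 = death observed, 0 = censored

c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported values of 0.656 to 0.672.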