Results 1 - 20 of 26
1.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37851379

ABSTRACT

MOTIVATION: Gene regulatory networks (GRNs) describe the interactions between genes and help reveal the biological mechanisms at work in the cell. Reconstructing GRNs from gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, inferring them accurately and efficiently is still a challenging task. RESULTS: In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. First, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method improves appreciably on state-of-the-art methods. Furthermore, we perform cross-validation experiments on the real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in Python and is available at: https://github.com/lab319/iLSGRN.


Subject(s)
Algorithms , Gene Regulatory Networks , Systems Biology , Random Forest , Time Factors , Computational Biology/methods
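
The feature fusion step described in the iLSGRN abstract above combines importance scores from two tree-based models. The abstract does not say how the scores are combined, so the following is only a minimal sketch under an assumed rule: each importance vector is normalised to sum to 1 and the two are averaged. The function name `fuse_importances` and the averaging rule are hypothetical and are not taken from the iLSGRN code.

```python
def fuse_importances(importances_a, importances_b):
    """Average two feature-importance vectors after normalising each to sum to 1.

    Assumption: simple averaging stands in for the paper's fusion rule,
    which the abstract does not specify.
    """
    if len(importances_a) != len(importances_b):
        raise ValueError("importance vectors must have the same length")

    def normalise(values):
        total = sum(values)
        if total <= 0:
            raise ValueError("importances must have a positive sum")
        return [v / total for v in values]

    norm_a = normalise(importances_a)
    norm_b = normalise(importances_b)
    # Element-wise mean of the two normalised vectors.
    return [(a + b) / 2 for a, b in zip(norm_a, norm_b)]


# Hypothetical scores for three candidate regulators of one target gene.
fused = fuse_importances([2.0, 1.0, 1.0], [0.1, 0.3, 0.6])
```

Normalising first keeps one model from dominating only because its importance scale is larger.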
2.
Bioinformatics ; 38(2): 410-418, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34586380

ABSTRACT

MOTIVATION: Survival analysis using gene expression profiles plays a crucial role in the interpretation of clinical research and assessment of disease therapy programs. Several prediction models have been developed to explore the relationship between patients' covariates and survival. However, the high-dimensional genomic features limit the prediction performance of the survival model. Thus, an accurate and reliable prediction model is necessary for survival analysis using high-dimensional genomic data. RESULTS: In this study, we proposed an improved survival prediction model based on XGBoost framework called XGBLC, which used Lasso-Cox to enhance the ability to analyze high-dimensional genomic data. The novel first- and second-order gradient statistics of Lasso-Cox were defined to construct the loss function of XGBLC. We extensively tested our XGBLC algorithm on both simulated and real-world datasets, and estimated the performance of models with 5-fold cross-validation. Based on 20 cancer datasets from The Cancer Genome Atlas (TCGA), XGBLC outperforms five state-of-the-art survival methods in terms of C-index, Brier score and AUC. The results show that XGBLC still keeps good accuracy and robustness by comparing the performance on the simulated datasets with different scales. The developed prediction model would be beneficial for physicians to understand the effects of patient's genomic characteristics on survival and make personalized treatment decisions. AVAILABILITY AND IMPLEMENTATION: The implementation of XGBLC algorithm based on R language is available at: https://github.com/lab319/XGBLC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Neoplasms , Humans , Genomics , Neoplasms/genetics , Genome , Survival Analysis
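
The XGBLC abstract above reports performance in terms of the C-index. As an illustration of what that metric measures, here is a minimal sketch of Harrell's concordance index for right-censored data. It is a generic textbook formulation, not code from the XGBLC repository, and the function name is chosen here for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter time is an observed event rather than a censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of patients by risk.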
3.
PLoS Med ; 19(4): e1003972, 2022 04.
Article in English | MEDLINE | ID: mdl-35472203

ABSTRACT

BACKGROUND: Both genetic and lifestyle factors contribute to the risk of type 2 diabetes, but the extent to which there is a synergistic effect of the 2 factors is unclear. The aim of this study was to examine the joint associations of genetic risk and diet quality with incident type 2 diabetes. METHODS AND FINDINGS: We analyzed data from 35,759 men and women in the United States participating in the Nurses' Health Study (NHS) I (1986 to 2016) and II (1991 to 2017) and the Health Professionals Follow-up Study (HPFS; 1986 to 2016) with available genetic data and who did not have diabetes, cardiovascular disease, or cancer at baseline. Genetic risk was characterized using both a global polygenic score capturing overall genetic risk and pathway-specific polygenic scores denoting distinct pathophysiological mechanisms. Diet quality was assessed using the Alternate Healthy Eating Index (AHEI). Cox models were used to calculate hazard ratios (HRs) for type 2 diabetes after adjusting for potential confounders. With over 902,386 person-years of follow-up, 4,433 participants were diagnosed with type 2 diabetes. The relative risk of type 2 diabetes was 1.29 (95% confidence interval [CI] 1.25, 1.32; P < 0.001) per standard deviation (SD) increase in global polygenic score and 1.13 (1.09, 1.17; P < 0.001) per 10-unit decrease in AHEI. Irrespective of genetic risk, low diet quality, as compared to high diet quality, was associated with approximately 30% increased risk of type 2 diabetes (P_interaction = 0.69). The joint association of low diet quality and increased genetic risk was similar to the sum of the risk associated with each factor alone (P_interaction = 0.30). Limitations of this study include the self-report of diet information and possible bias resulting from inclusion of highly educated participants with available genetic data.
CONCLUSIONS: These data provide evidence for the independent associations of genetic risk and diet quality with incident type 2 diabetes and suggest that a healthy diet is associated with lower diabetes risk across all levels of genetic risk.


Subject(s)
Diabetes Mellitus, Type 2 , Adult , Diabetes Mellitus, Type 2/etiology , Diabetes Mellitus, Type 2/genetics , Diet/adverse effects , Female , Follow-Up Studies , Humans , Male , Prospective Studies , Risk Factors , United States/epidemiology
4.
Breast Cancer Res Treat ; 194(1): 103-111, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35467315

ABSTRACT

High levels of circulating estradiol (E2) are associated with increased risk of breast cancer, whereas its relationship with breast cancer prognosis is still unclear. We evaluated the effect of E2 concentration on survival endpoints among 8766 breast cancer cases diagnosed between 2005 and 2017 from the Tianjin Breast Cancer Cases Cohort. Levels of serum E2 were measured in pre-menopausal and post-menopausal women. Multivariable-adjusted Cox proportional hazards models were used to estimate hazard ratios (HR) and 95% confidence intervals (95% CI) for the association between quartiles of E2 levels and overall survival (OS) and progression-free survival (PFS) of breast cancer. The penalized spline was then used to test for non-linear relationships between E2 (continuous variable) and survival endpoints. 612 deaths and 982 progressions occurred over follow-up through 2017. Compared to women in the quartile 3, the highest quartile of E2 was associated with poorer PFS in pre-menopausal women (HR 1.79, 95% CI 1.17-2.75, P = 0.008) and poorer OS in post-menopausal women (HR 1.35, 95% CI 1.04-1.74, P = 0.023). OS and PFS in pre-menopausal women exhibited a nonlinear relation ("L-shaped" and "U-shaped", respectively) with E2 levels. However, there was a linear relationship in post-menopausal women. Moreover, patients with estrogen receptor-negative (ER-negative) breast cancer showed a "U-shaped" relationship with OS and PFS in pre-menopausal women. Pre-menopausal breast cancer patients have a plateau stage of prognosis at the intermediate concentrations of E2, whereas post-menopausal patients have no apparent threshold, and ER status may have an impact on this relationship.


Subject(s)
Breast Neoplasms , Cohort Studies , Estradiol , Female , Humans , Menopause , Premenopause
5.
Eur Respir J ; 58(4)2021 10.
Article in English | MEDLINE | ID: mdl-33766948

ABSTRACT

BACKGROUND: Lung function is a heritable complex phenotype with obesity being one of its important risk factors. However, knowledge of their shared genetic basis is limited. Most genome-wide association studies (GWASs) for lung function have been based on European populations, limiting the generalisability across populations. Large-scale lung function GWASs in other populations are lacking. METHODS: We included 100 285 subjects from the China Kadoorie Biobank (CKB). To identify novel loci for lung function, single-trait GWAS analyses were performed on forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC in the CKB. We then performed genome-wide cross-trait analysis between lung function and obesity traits (body mass index (BMI), BMI-adjusted waist-to-hip ratio and BMI-adjusted waist circumference) to investigate the shared genetic effects in the CKB. Finally, polygenic risk scores (PRSs) of lung function were developed in the CKB and their interaction with the association between BMI and lung function was examined. We also conducted cross-trait analysis in parallel with the CKB using up to 457 756 subjects from the UK Biobank (UKB) for replication and investigation of ancestry-specific effects. RESULTS: We identified nine genome-wide significant novel loci for FEV1, six for FVC and three for FEV1/FVC in the CKB. FEV1 and FVC showed significant negative genetic correlation with obesity traits in both the CKB and UKB. Genetic loci shared between lung function and obesity traits highlighted important biological pathways, including cell proliferation, embryo, skeletal and tissue development, and regulation of gene expression. Mendelian randomisation analysis suggested significant negative causal effects of BMI on FEV1 and on FVC in both the CKB and UKB. Lung function PRSs significantly modified the effect of change in BMI on change in lung function during an average follow-up of 8 years.
CONCLUSION: This large-scale GWAS of lung function identified novel loci and shared genetic aetiology between lung function and obesity. Change in BMI might affect change in lung function differently according to a subject's polygenic background. These findings may open new avenues for the development of molecular-targeted therapies for obesity and lung function improvement.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Body Mass Index , China , Forced Expiratory Volume , Humans , Lung , Obesity/genetics
6.
Bioinformatics ; 36(19): 4885-4893, 2020 12 08.
Article in English | MEDLINE | ID: mdl-31950997

ABSTRACT

MOTIVATION: Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological process of transcription and translation. In some cases, the topology of GRNs is not known, and has to be inferred from gene expression data. Most of the existing GRNs reconstruction algorithms are either applied to time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks. RESULTS: In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. By comparing the performance on the datasets with different scales, the results show that our method still keeps good robustness and accuracy at a low computational complexity. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Gene Regulatory Networks , Escherichia coli/genetics , Gene Expression Regulation , Saccharomyces cerevisiae/genetics
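
The abstract above models gene regulation with non-linear ordinary differential equations but does not state the equation. A common form in this line of work, assumed here purely for illustration, is dx_j/dt = -alpha_j * x_j + f_j(x), where alpha_j is a degradation rate and f_j is an unknown regulatory function. Under that assumption, regression targets for f_j can be built from a time series by finite differences, as sketched below. The function name and the decay parameter are hypothetical.

```python
def finite_difference_targets(expression, times, target_gene, decay):
    """Build (inputs, targets) for learning f_j from one time series.

    expression:  list of expression vectors, one per time point
    times:       matching list of time stamps
    target_gene: index j of the gene whose regulators are sought
    decay:       assumed degradation rate alpha_j

    Assumed model: dx_j/dt = -decay * x_j + f_j(x), so
    f_j(x(t_k)) is approximated by slope_k + decay * x_j(t_k).
    """
    inputs, targets = [], []
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        slope = (expression[k + 1][target_gene] - expression[k][target_gene]) / dt
        inputs.append(list(expression[k]))
        targets.append(slope + decay * expression[k][target_gene])
    return inputs, targets
```

For a steady-state sample the derivative is zero, so under the same assumed model the target reduces to `decay * x_j`, which is one way time-series and steady-state data can feed a single regression problem.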
7.
Am J Hum Genet ; 95(4): 462-71, 2014 Oct 02.
Article in English | MEDLINE | ID: mdl-25279986

ABSTRACT

Genome-wide association studies (GWASs) of follicular lymphoma (FL) have previously identified human leukocyte antigen (HLA) gene variants. To identify additional FL susceptibility loci, we conducted a large-scale two-stage GWAS in 4,523 case subjects and 13,344 control subjects of European ancestry. Five non-HLA loci were associated with FL risk: 11q23.3 (rs4938573, p = 5.79 × 10^-20) near CXCR5; 11q24.3 (rs4937362, p = 6.76 × 10^-11) near ETS1; 3q28 (rs6444305, p = 1.10 × 10^-10) in LPP; 18q21.33 (rs17749561, p = 8.28 × 10^-10) near BCL2; and 8q24.21 (rs13254990, p = 1.06 × 10^-8) near PVT1. In an analysis of the HLA region, we identified four linked HLA-DRβ1 multiallelic amino acids at positions 11, 13, 28, and 30 that were associated with FL risk (p_omnibus = 4.20 × 10^-67 to 2.67 × 10^-70). Additional independent signals included rs17203612 in HLA class II (odds ratio [OR_per-allele] = 1.44; p = 4.59 × 10^-16) and rs3130437 in HLA class I (OR_per-allele = 1.23; p = 8.23 × 10^-9). Our findings further expand the number of loci associated with FL and provide evidence that multiple common variants outside the HLA region make a significant contribution to FL risk.


Subject(s)
Biomarkers, Tumor/genetics , Chromosomes, Human/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , HLA Antigens/genetics , Lymphoma, Follicular/genetics , Polymorphism, Single Nucleotide/genetics , Alleles , Case-Control Studies , Haplotypes/genetics , Humans
8.
Nucleic Acids Res ; 42(6): 3515-28, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24445802

ABSTRACT

Differences in methylation across tissues are critical to cell differentiation and are key to understanding the role of epigenetics in complex diseases. In this investigation, we found that locus-specific methylation differences between tissues are highly consistent across individuals. We developed a novel statistical model to predict locus-specific methylation in target tissue based on methylation in surrogate tissue. The method was evaluated in publicly available data and in two studies using the latest Illumina BeadChips: a childhood asthma study with methylation measured in both peripheral blood leukocytes (PBL) and lymphoblastoid cell lines; and a study of postoperative atrial fibrillation with methylation in PBL, atrium and artery. We found that our method can greatly improve accuracy of cross-tissue prediction at CpG sites that are variable in the target tissue [R^2 increases from 0.38 (original R^2 between tissues) to 0.89 for PBL-to-artery prediction; from 0.39 to 0.95 for PBL-to-atrium; and from 0.81 to 0.98 for lymphoblastoid cell line-to-PBL based on cross-validation, and confirmed using cross-study prediction]. An extended model with multiple CpGs further improved performance. Our results suggest that large-scale epidemiology studies using easy-to-access surrogate tissues (e.g. blood) could be recalibrated to improve understanding of epigenetics in hard-to-access tissues (e.g. atrium) and might enable non-invasive disease screening using epigenetic profiles.


Subject(s)
DNA Methylation , Arteries/metabolism , Atrial Appendage/metabolism , Cell Line, Transformed , Child , CpG Islands , Female , Humans , Leukocytes/metabolism , Male , Models, Statistical
9.
10.
Genet Epidemiol ; 37(4): 402-7, 2013 May.
Article in English | MEDLINE | ID: mdl-23595356

ABSTRACT

The case-only test has been proposed as a more powerful approach to detect gene-environment (G × E) interactions. This approach assumes that the genetic and environmental factors are independent. Although it is well known that Type I error rate will increase if this assumption is violated, it is less widely appreciated that G × E correlation can also lead to power loss. We illustrate this phenomenon by comparing the performance of the case-only test to other approaches to detect G × E interactions in a genome-wide association study (GWAS) of esophageal squamous-cell carcinoma (ESCC) in Chinese populations. Some of these approaches do not use information on the correlation between exposure and genotype (standard logistic regression), whereas others seek to use this information in a robust fashion to boost power without increasing Type I error (two-step, empirical Bayes, and cocktail methods). G × E interactions were identified involving drinking status and two regions containing genes in the alcohol metabolism pathway, 4q23 and 12q24. Although the case-only test yielded the most significant tests of G × E interaction in the 4q23 region, the case-only test failed to identify significant interactions in the 12q24 region which were readily identified using other approaches. The low power of the case-only test in the 12q24 region is likely due to the strong inverse association between the single nucleotide polymorphisms (SNPs) in this region and drinking status. This example underscores the need to consider multiple approaches to detect G × E interactions, as different tests are more or less sensitive to different alternative hypotheses and violations of the G × E independence assumption.


Subject(s)
Carcinoma, Squamous Cell/genetics , Esophageal Neoplasms/genetics , Gene-Environment Interaction , Genome-Wide Association Study , Bayes Theorem , Carcinoma, Squamous Cell/ethnology , China , Chromosomes, Human, Pair 12/genetics , Chromosomes, Human, Pair 4/genetics , Environment , Esophageal Neoplasms/ethnology , Female , Genetic Predisposition to Disease , Humans , Male , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide , Regression Analysis , Surveys and Questionnaires
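
The case-only test discussed in the abstract above can be illustrated with a 2 × 2 table of genotype by exposure among cases only. Under the G × E independence assumption the abstract mentions, the odds ratio from that table estimates the multiplicative interaction. The sketch below uses the standard odds ratio and Wald interval; the counts in the usage line are hypothetical and the function name is chosen here for illustration.

```python
import math


def case_only_interaction(g1_e1, g1_e0, g0_e1, g0_e0, z=1.96):
    """Case-only estimate of multiplicative G x E interaction.

    Arguments are counts among cases only:
      g1_e1: carriers, exposed      g1_e0: carriers, unexposed
      g0_e1: non-carriers, exposed  g0_e0: non-carriers, unexposed
    Returns (odds_ratio, ci_lower, ci_upper) with a Wald interval.
    The estimate is valid only if genotype and exposure are
    independent in the source population.
    """
    odds_ratio = (g1_e1 * g0_e0) / (g1_e0 * g0_e1)
    # Standard error of the log odds ratio for a 2 x 2 table.
    se_log_or = math.sqrt(1 / g1_e1 + 1 / g1_e0 + 1 / g0_e1 + 1 / g0_e0)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


# Hypothetical counts among cases.
estimate, lower, upper = case_only_interaction(40, 20, 30, 30)
```

If genotype and exposure are correlated in the population, as the abstract reports for 12q24 and drinking status, this odds ratio mixes that correlation with the interaction, which is why the test can lose power or validity.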
11.
Epigenomics ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38477028

ABSTRACT

Aim: To predict base-resolution DNA methylation in cancerous and paracancerous tissues. Material & methods: We collected six cancer DNA methylation datasets from The Cancer Genome Atlas and five cancer datasets from Gene Expression Omnibus and established machine learning models using paired cancerous and paracancerous tissues. Tenfold cross-validation and independent validation were performed to demonstrate the effectiveness of the proposed method. Results: The developed cross-tissue prediction models can substantially increase the accuracy at more than 68% of CpG sites and contribute to enhancing the statistical power of differential methylation analyses. An XGBoost model leveraging multiple correlating CpGs may elevate the prediction accuracy. Conclusion: This study provides a powerful tool for DNA methylation analysis and has the potential to gain new insights into cancer research from epigenetics.

12.
Ecol Evol ; 14(5): e11342, 2024 May.
Article in English | MEDLINE | ID: mdl-38799395

ABSTRACT

The morphological variation in Schizothorax oconnori, Schizothorax waltoni, and their natural hybrids was examined using conventional and image-based analysis approaches. In total, 38 specimens of S. oconnori, 35 of S. waltoni, and 37 natural hybrids were collected from the Shigatse to the Lhasa section of the Yarlung Zangbo River during June and July 2021. A total of 21 morphometric, 4 meristic, and 27 truss variables were employed for the classification of S. oconnori, S. waltoni, and natural hybrids. Principal component analysis (PCA) and factor analysis (FA), as well as discriminant function analysis (DFA) and cluster analysis (CA), were conducted to identify differences based on traditional and truss measurements. Four principal components explained 75.92% of the variation among the morphometric characters, while five principal components accounted for 79.69% of the variation among the truss distances. FA results showed that factor 1 was associated with head shape, and factor 2 was associated with fins based on morphometric characters. Among the truss characters, factor 1 was related to head shape, and factor 2 was related to chest shape. In DFA, morphometric measurements achieved higher accuracy (100%) compared to truss distances (94.55%). The head morphology of hybrids exhibited intermediate traits between S. oconnori and S. waltoni. Both morphometry-based and truss-based clustering indicated that the morphology of natural hybrids leaned toward S. oconnori. In conclusion, the combination of morphometric and truss analysis is beneficial for classifying S. oconnori, S. waltoni, and their natural hybrids. The presence of natural hybrids could be considered an evolutionary response to the differentiation of nutritional and spatial niches in the middle Yarlung Zangbo River.

13.
Carcinogenesis ; 34(8): 1782-6, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23536576

ABSTRACT

Genome-wide association studies have identified multiple genetic variants associated with risk of esophageal squamous-cell carcinoma (ESCC) in Chinese populations. We examined whether these genetic factors, along with non-genetic factors, can contribute to ESCC risk prediction. We examined 25 single nucleotide polymorphisms (SNPs) and 4 non-genetic factors (sex, age, smoking and drinking) associated with ESCC risk in 9805 cases and 10 493 controls from Chinese populations. Weighted genetic risk score (wGRS) was calculated and logistic regression was used to analyze the association between wGRS and ESCC risk. We calculated the area under the curve (AUC) using receiver operating characteristic curve analysis to measure the discrimination after adding genetic variants to the model with only non-genetic factors. Net reclassification improvement (NRI) was used to quantify the degree of correct reclassification using different models. wGRS of the combined 17 SNPs with significant marginal effect (G SNPs) was associated with ~4-fold increased ESCC risk (P = 1.49 × 10^-164) and the associations were significant in both drinkers and non-drinkers. However, wGRS of the eight SNPs with significant effect in gene × drinking interaction (GE SNPs) was associated with ~4-fold increased ESCC risk only in drinkers (P_interaction = 8.76 × 10^-41). The AUC for a risk model with 4 non-genetic factors, 17 G SNPs, 8 GE SNPs and their interactions with drinking was 70.1%, with the significant improvement of 7.0% compared with the model with only non-genetic factors (P < 0.0001). Our results indicate that incorporating genetic variants, lifestyle factors and their interactions in ESCC risk models can be useful for identifying patients with ESCC.


Subject(s)
Asian People/genetics , Carcinoma, Squamous Cell/epidemiology , Carcinoma, Squamous Cell/genetics , Esophageal Neoplasms/epidemiology , Esophageal Neoplasms/genetics , Life Style , Alcohol Drinking/epidemiology , Alcohol Drinking/genetics , Area Under Curve , Case-Control Studies , China/epidemiology , Esophageal Squamous Cell Carcinoma , Female , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , ROC Curve , Risk , Risk Factors
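
The abstract above relies on a weighted genetic risk score and on the AUC. The abstract does not give the weights, so the sketch below assumes the common convention of weighting each SNP's risk-allele count by the natural log of its per-allele odds ratio. The AUC is computed by direct pairwise comparison of case and control scores. All numbers in the usage lines are hypothetical.

```python
import math


def weighted_grs(allele_counts, odds_ratios):
    """Weighted genetic risk score: sum of ln(OR_i) * risk-allele count_i.

    Assumption: weights are per-allele log odds ratios; the abstract
    does not state the weighting that was actually used.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(math.log(o) * n for n, o in zip(allele_counts, odds_ratios))


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical individual with 2, 1 and 0 risk alleles at three SNPs.
score = weighted_grs([2, 1, 0], [1.2, 1.5, 1.1])
```

The pairwise AUC is quadratic in sample size, which is fine for a sketch but would be replaced by a rank-based computation for cohorts as large as the one described.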
14.
Math Biosci Eng ; 20(7): 11676-11687, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37501415

ABSTRACT

Most kidney cancers are kidney renal clear cell carcinoma (KIRC), a main cause of cancer-related deaths. A polygenic risk score (PRS) is a weighted linear combination of phenotype-related alleles across the genome that can be used to assess KIRC risk. However, standalone SNP data as input to the PRS model may not provide satisfactory results. Therefore, transcriptional risk scores (TRS) based on multi-omics data and machine learning models were proposed to assess the risk of KIRC. First, we collected four types of multi-omics data (DNA methylation, miRNA, mRNA and lncRNA) of KIRC patients from the TCGA database. Subsequently, a novel TRS method utilizing multiple omics data and an XGBoost model was developed. Finally, we performed prevalence analysis and prognosis prediction to evaluate the utility of the TRS generated by our method. Our TRS method exhibited better predictive performance than the linear models and other machine learning models. Furthermore, the prediction accuracy of the combined TRS model was higher than that of the single-omics TRS models. The KM curves showed that TRS was a valid prognostic indicator for cancer staging. Our proposed method extended the current definition of PRS from standalone SNP data to multi-omics data and was superior to the linear models and other machine learning models, which may provide a useful tool for diagnostic and prognostic prediction of KIRC.


Subject(s)
Carcinoma, Renal Cell , Kidney Neoplasms , MicroRNAs , Humans , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , MicroRNAs/genetics , Risk Factors , Kidney/pathology
15.
Digit Signal Process ; 127: 103577, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35529477

ABSTRACT

The outbreak of coronavirus disease (COVID-19) and its accompanying pandemic have created an unprecedented challenge worldwide. Parametric modeling and analysis of COVID-19 play a critical role in characterizing the pandemic and providing guidance for controlling it. However, the epidemiological utility of the results obtained from a COVID-19 transmission model largely depends on accurately identifying its parameters. This paper extends the susceptible-exposed-infectious-recovered (SEIR) model and proposes an improved quantum-behaved particle swarm optimization (QPSO) algorithm to estimate its parameters. A new strategy is developed to update the weighting factor of the mean best position: each weight is the reciprocal of the product of the fitness of the corresponding best particle and the average fitness of all best particles, which can enhance the global search capacity. To increase the particle diversity, a probability function is designed to generate new particles in the updating iteration. When compared to the state-of-the-art estimation algorithms on the epidemic datasets of China, Italy and the US, the proposed method achieves good accuracy and convergence at a comparable computational complexity. The developed framework would be beneficial for experts to understand the characteristics of epidemic development and formulate epidemic prevention and control measures.
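
The weighting strategy for the mean best position in the abstract above can be sketched as follows. The raw weight follows the abstract's description (the reciprocal of each personal-best fitness multiplied by the average personal-best fitness). Everything else is an assumption made for illustration: fitness values are taken to be positive with lower being better, and the weights are normalised to sum to 1, which the abstract does not state.

```python
def weighted_mean_best(pbest_positions, pbest_fitness):
    """Weighted mean of personal-best positions for a QPSO-style update.

    pbest_positions: list of position vectors, one per particle
    pbest_fitness:   matching positive fitness values (lower is better)

    Raw weight per particle: 1 / (fitness_i * mean_fitness), as described
    in the abstract. Normalising the weights is an assumption made here.
    """
    mean_fitness = sum(pbest_fitness) / len(pbest_fitness)
    raw = [1.0 / (f * mean_fitness) for f in pbest_fitness]
    total = sum(raw)
    weights = [w / total for w in raw]
    dims = len(pbest_positions[0])
    return [
        sum(w * position[d] for w, position in zip(weights, pbest_positions))
        for d in range(dims)
    ]


# Two hypothetical particles; the fitter one (fitness 1.0) gets more weight.
mbest = weighted_mean_best([[0.0, 0.0], [4.0, 8.0]], [1.0, 3.0])
```

Note that under the normalisation assumed here the shared mean-fitness factor cancels, so the weights reduce to normalised inverse fitness; the published algorithm may scale the weights differently.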

16.
Math Biosci Eng ; 19(12): 12353-12370, 2022 08 24.
Article in English | MEDLINE | ID: mdl-36654001

ABSTRACT

BACKGROUND: Polygenic risk score (PRS) can evaluate the individual-level genetic risk of breast cancer. However, standalone single nucleotide polymorphism (SNP) data used for PRS may not provide satisfactory prediction accuracy. Additionally, current PRS models based on linear regression have insufficient power to leverage non-linear effects from thousands of associated SNPs. Here, we proposed a transcriptional risk score (TRS) based on multiple omics data to estimate the risk of breast cancer. METHODS: The multiple omics data and clinical data of breast invasive carcinoma (BRCA) were collected from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). First, we developed a novel TRS model for BRCA utilizing single omic data and the LightGBM algorithm. Subsequently, we built a combination model of TRS derived from each omic data to further improve the prediction accuracy. Finally, we performed association analysis and prognosis prediction to evaluate the utility of the TRS generated by our method. RESULTS: The proposed TRS model achieved better predictive performance than the linear models and other ML methods in single omic datasets. An independent validation dataset also verified the effectiveness of our model. Moreover, the combination of the TRS can efficiently strengthen prediction accuracy. The analysis of prevalence and the associations of the TRS with phenotypes including case-control and cancer stage indicated that the risk of breast cancer increases as TRS increases. The survival analysis also suggested that TRS for the cancer stage is an effective prognostic metric of breast cancer patients. CONCLUSIONS: Our proposed TRS model expanded the current definition of PRS from standalone SNP data to multiple omics data and outperformed the linear models, which may provide a powerful tool for diagnostic and prognostic prediction of breast cancer.


Subject(s)
Algorithms , Neoplasms , Risk Factors , Survival Analysis , Genome-Wide Association Study
17.
Mol Genet Genomic Med ; 10(11): e2047, 2022 11.
Article in English | MEDLINE | ID: mdl-36124564

ABSTRACT

BACKGROUND: Patients with impaired kidney function were found at a high risk of COVID-19 hospitalization and mortality in many observational, cross-sectional, and hospital-based studies, but evidence from large-scale prospective cohorts has been lacking. We aimed to examine the association of kidney function-related biomarkers and their genetic predisposition with the risk of developing severe COVID-19 in population-based data. METHODS: We analyzed data from UK Biobank to examine the prospective association of abnormal kidney function biomarkers with severe COVID-19, defined by laboratory-confirmed COVID-19 hospitalizations. Using genotype data, we constructed polygenic risk scores (PRS) to represent an individual's overall genetic risk for these biomarkers. We also identified tipping points where the risk of severe COVID-19 began to increase significantly for each biomarker. RESULTS: Of the 502,506 adults, 1650 (0.32%) were identified as severe COVID-19, before August 12, 2020. High levels of cystatin C (OR: 1.3; 95% CI: 1.2-1.5; FDR = 1.5 × 10^-5), serum creatinine (OR: 1.7; 95% CI: 1.3-2.1; p = 3.5 × 10^-4; FDR = 3.5 × 10^-4), microalbuminuria (OR: 1.4; 95% CI: 1.2-1.6; FDR = 4 × 10^-4), and UACR (urinary albumin creatinine ratio; OR: 1.4; 95% CI: 1.2-1.6; p = 3.5 × 10^-4; FDR = 3.5 × 10^-4) were found significantly associated with severe COVID-19. Individuals with top 10% of PRS for elevated cystatin C, urate, and microalbuminuria had 28% to 43% higher risks of severe COVID-19 than individuals with bottom 30% PRS (p < 0.05). Tipping-point analyses further supported that severe COVID-19 could occur even when the values of cystatin C, urate (male), and microalbuminuria were within their normal value ranges (OR >1.1, p < 0.05). CONCLUSIONS: Findings from this study might point to new directions for clinicians and policymakers in optimizing risk-stratification among patients based on polygenic risk estimation and tipping points of kidney function markers.
Our results call for further investigation to develop a better strategy to prevent severe COVID-19 outcomes among patients with genetic predisposition to impaired kidney function. These findings could provide a new tool for clinicians and policymakers in the future especially if we need to live with COVID-19 for a long time.


Subject(s)
COVID-19 , Renal Insufficiency , Adult , Humans , Male , Cystatin C/urine , COVID-19/genetics , Genetic Predisposition to Disease , Cross-Sectional Studies , Uric Acid , Albuminuria/genetics , Biomarkers , Kidney
18.
Sci Rep ; 12(1): 10646, 2022 06 23.
Article in English | MEDLINE | ID: mdl-35739223

ABSTRACT

The potential role of DNA methylation from paracancerous tissues in cancer diagnosis has not been explored until now. In this study, we built classification models using well-known machine learning methods based on DNA methylation profiles of paracancerous tissues. We evaluated our methods on nine cancer datasets collected from The Cancer Genome Atlas (TCGA) and used fivefold cross-validation to assess model performance. Additionally, we performed gene ontology (GO) enrichment analysis on the significant CpG sites selected by the feature importance scores of the XGBoost model, aiming to identify biological pathways involved in cancer progression. We also used the XGBoost algorithm to classify cancer types using DNA methylation profiles of paracancerous tissues in external validation datasets. Comparative experiments suggested that XGBoost achieved better predictive performance than the other four machine learning methods in predicting cancer stage. GO enrichment analysis revealed key pathways involved, highlighting the importance of paracancerous tissues in cancer progression. Furthermore, the XGBoost model can accurately classify nine different cancers from TCGA, and the feature sets selected by XGBoost can also effectively predict seven cancer types on independent GEO datasets. This study provides new insights into cancer diagnosis from an epigenetic perspective and may facilitate the development of personalized diagnosis and treatment strategies.
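
The fivefold cross-validation mentioned above can be sketched as follows. This is a minimal illustration of how samples are partitioned into folds, not the authors' pipeline: the helper name is hypothetical, and the sketch assumes contiguous, unshuffled folds (real analyses usually shuffle or stratify first).

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Folds are contiguous and unshuffled (a simplifying assumption). When
    n_samples is not divisible by k, the first folds get one extra sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample appears in exactly one test fold across the five splits.
folds = list(k_fold_indices(10, k=5))
```

A model would be fitted on each training split and scored on the matching test split, with the five scores averaged.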


Subject(s)
DNA Methylation , Neoplasms , Epigenomics , Humans , Machine Learning , Neoplasm Staging , Neoplasms/diagnosis , Neoplasms/genetics
19.
J Bone Miner Res ; 36(7): 1281-1287, 2021 07.
Article in English | MEDLINE | ID: mdl-33784428

ABSTRACT

Uncovering additional causal clinical traits and exposure variables is important when studying osteoporosis mechanisms and for the prevention of osteoporosis. Until recently, the causal relationship between anthropometric measurements and osteoporosis had not been fully revealed. In the present study, we utilized several state-of-the-art Mendelian randomization (MR) methods to investigate whether height, body mass index (BMI), waist-to-hip ratio (WHR), hip circumference (HC), and waist circumference (WC) are causally associated with two major characteristics of osteoporosis, bone mineral density (BMD) and fractures. Genomewide significant (p ≤ 5 × 10^-8) single-nucleotide polymorphisms (SNPs) associated with the five anthropometric variables were obtained from previous large-scale genomewide association studies (GWAS) and were utilized as instrumental variables. Summary-level data of estimated bone mineral density (eBMD) and fractures were obtained from a large-scale UK Biobank GWAS. Of the MR methods utilized, the inverse-variance weighted method was the primary method used for analysis, and the weighted-median, MR-Egger, mode-based estimate, and MR pleiotropy residual sum and outlier methods were utilized for sensitivity analyses. The results of the present study indicated that each increase in height equal to a single standard deviation (SD) was associated with a 9.9% increase in risk of fracture (odds ratio [OR] = 1.099; 95% confidence interval [CI] 1.067-1.133; p = 8.793 × 10^-10) and a 0.080 SD decrease of estimated bone mineral density (95% CI -0.106 to -0.054; p = 2.322 × 10^-9). We also found that BMI was causally associated with eBMD (beta = 0.129, 95% CI 0.065-0.194; p = 8.113 × 10^-5) but not associated with fracture. The WHR adjusted for BMI, HC adjusted for BMI, and WC adjusted for BMI were not found to be related to fracture occurrence or eBMD.
In conclusion, the present study provided genetic evidence for certain causal relationships between anthropometric measurements and bone mineral density or fracture risk. © 2021 American Society for Bone and Mineral Research (ASBMR).
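
For readers unfamiliar with the inverse-variance weighted (IVW) method named above, the sketch below shows the standard fixed-effect IVW estimator computed from summary statistics. It is an illustration under stated assumptions, not the study's code: the function name and the example numbers are hypothetical, and the abstract does not say whether a fixed-effect or random-effects variant was used.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure: SNP-exposure effect sizes.
    beta_outcome:  SNP-outcome effect sizes.
    se_outcome:    standard errors of the SNP-outcome effects.
    Returns (causal_estimate, standard_error).
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Made-up summary statistics for two SNPs whose ratio estimates both equal 0.5.
estimate, std_error = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
```

For a binary outcome such as fracture, the estimate is on the log-odds scale, so exponentiating it gives an odds ratio of the kind reported in the abstract.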


Subject(s)
Bone Fractures , Osteoporosis , Bone Density/genetics , Bone Fractures/genetics , Genome-Wide Association Study , Humans , Mendelian Randomization Analysis , Osteoporosis/genetics , Single Nucleotide Polymorphism/genetics
20.
J Cancer ; 11(5): 1288-1298, 2020.
Article in English | MEDLINE | ID: mdl-31956375

ABSTRACT

Objectives: Lung adenocarcinoma (LUAD) accounts for a majority of cancer-related deaths worldwide annually. The identification of prognostic biomarkers and prediction of prognosis for LUAD patients is necessary. Materials and Methods: In this study, LUAD RNA-Seq data and clinical data from The Cancer Genome Atlas (TCGA) were divided into TCGA cohort I (n = 338) and cohort II (n = 168). Cohort I was used for model construction, and cohort II and data from the Gene Expression Omnibus (GSE72094 cohort, n = 393; GSE11969 cohort, n = 149) were used for validation. First, survival-related seed genes were selected from cohort I using a machine learning model (random survival forest, RSF); then, to improve prediction accuracy, a forward selection model was used to identify the prognosis-related key genes among the seed genes using the clinically integrated RNA-Seq data. Second, a survival risk score system was constructed from these key genes in cohort II, the GSE72094 cohort, and the GSE11969 cohort, and evaluation metrics such as HR, p value, and C-index were calculated to validate the proposed method. Third, the developed approach was compared with five previous prediction models. Finally, bioinformatics analyses (pathway, heatmap, protein-gene interaction network) were applied to the identified seed genes and key genes. Results and Conclusion: Based on the RSF model and clinically integrated RNA-Seq data, we identified sixteen key genes that formed the prognostic gene expression signature. These sixteen key genes achieved strong prognostic power for LUAD patients in cohort II (HR = 3.80, p = 1.63e-06, C-index = 0.656), and were further validated in the GSE72094 cohort (HR = 4.12, p = 1.34e-10, C-index = 0.672) and GSE11969 cohort (HR = 3.87, p = 6.81e-07, C-index = 0.670).
The experimental results of three independent validation cohorts showed that compared with the traditional Cox model and the use of standalone RNA-Seq data, the machine-learning-based method effectively improved the prediction accuracy of LUAD prognosis, and the derived model was also superior to the other five existing prediction models. KEGG pathway analysis found eleven of the sixteen genes were associated with Nicotine addiction. Thirteen of the sixteen genes were reported for the first time as the LUAD prognosis-related key genes. In conclusion, we developed a sixteen-gene prognostic marker for LUAD, which may provide a powerful prognostic tool for precision oncology.
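
The C-index reported for each cohort measures how well risk scores rank patients by survival time. The sketch below is one common formulation (Harrell's concordance index with ties in risk counted as half), offered as an illustration rather than the authors' implementation; the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up times.
    events:      1 if the event was observed, 0 if censored.
    risk_scores: higher score means higher predicted risk.
    A pair (i, j) is comparable when i had an observed event and a shorter
    time than j; it is concordant when i also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: risk scores perfectly ordered against survival time.
perfect = concordance_index([1, 2, 3], [1, 1, 1], [3, 2, 1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of roughly 0.66 to 0.67 in context.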
