ABSTRACT
While often represented as static entities, gene networks are highly context-dependent. Here, we developed a multi-task learning strategy to yield context-specific representations of gene network dynamics. We assembled a corpus comprising ~103 million human single-cell transcriptomes from a broad range of tissues and diseases and performed a two-stage pretraining, first with non-malignant cells to generate a foundational model and then with continual learning on cancer cells to tune the model to the cancer domain. We performed multi-task learning with the foundational model to learn context-specific representations of a broad range of cell types, tissues, developmental stages, and diseases. We then leveraged the cancer-tuned model to jointly learn cell states and predict tumor-restricting factors within the colorectal tumor microenvironment. Model quantization allowed resource-efficient fine-tuning and inference while preserving biological knowledge. Overall, multi-task learning enables context-specific disease modeling that can yield contextual predictions of candidate therapeutic targets for human disease.
ABSTRACT
Deciphering the genetic basis of prostate-specific antigen (PSA) levels may improve their utility for prostate cancer (PCa) screening. Using genome-wide association study (GWAS) summary statistics from 95,768 PCa-free men, we conducted a transcriptome-wide association study (TWAS) to examine impacts of genetically predicted gene expression on PSA. Analyses identified 41 statistically significant (p < 0.05/12,192 = 4.10 × 10⁻⁶) associations in whole blood and 39 statistically significant (p < 0.05/13,844 = 3.61 × 10⁻⁶) associations in prostate tissue, with 18 genes associated in both tissues. Cross-tissue analyses identified 155 statistically significant (p < 0.05/22,249 = 2.25 × 10⁻⁶) genes. Out of 173 unique PSA-associated genes across analyses, we replicated 151 (87.3%) in a TWAS of 209,318 PCa-free individuals from the Million Veteran Program. Based on conditional analyses, we found 20 genes (11 single tissue, nine cross-tissue) that were associated with PSA levels in the discovery TWAS that were not attributable to a lead variant from a GWAS. Ten of these 20 genes replicated, and two of the replicated genes had colocalization probability of >0.5: CCNA2 and HIST1H2BN. Six of the 20 identified genes are not known to impact PCa risk. Fine-mapping based on whole blood and prostate tissue revealed five protein-coding genes with evidence of causal relationships with PSA levels. Of these five genes, four exhibited evidence of colocalization and one was conditionally independent of previous GWAS findings. These results yield hypotheses that should be further explored to improve understanding of genetic factors underlying PSA levels.
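The significance thresholds quoted in this abstract are Bonferroni corrections: a family-wise alpha of 0.05 divided by the number of genes tested in each analysis. The arithmetic can be checked with a minimal sketch; the function name `bonferroni_threshold` and the analysis labels are illustrative and do not come from the study's own code.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Numbers of genes tested, as quoted in the abstract.
analyses = {
    "whole blood": 12_192,
    "prostate tissue": 13_844,
    "cross-tissue": 22_249,
}

for label, n_genes in analyses.items():
    # Format to three significant figures, as in the abstract.
    print(f"{label}: p < {bonferroni_threshold(0.05, n_genes):.2e}")
```

Run as written, this prints thresholds of 4.10e-06, 3.61e-06, and 2.25e-06, matching the values reported for whole blood, prostate tissue, and the cross-tissue analysis.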
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Prostate-Specific Antigen , Prostatic Neoplasms , Transcriptome , Humans , Male , Prostate-Specific Antigen/blood , Prostatic Neoplasms/genetics , Prostatic Neoplasms/blood , Gene Expression Profiling , Polymorphism, Single Nucleotide
ABSTRACT
The expansion of biobanks has significantly propelled genomic discoveries yet the sheer scale of data within these repositories poses formidable computational hurdles, particularly in handling extensive matrix operations required by prevailing statistical frameworks. In this work, we introduce computational optimizations to the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Model) algorithm, notably employing a GPU-based distributed computing approach to tackle these challenges. We applied these optimizations to conduct a large-scale genome-wide association study (GWAS) across 2,068 phenotypes derived from electronic health records of 635,969 diverse participants from the Veterans Affairs (VA) Million Veteran Program (MVP). Our strategies enabled scaling up the analysis to over 6,000 nodes on the Department of Energy (DOE) Oak Ridge Leadership Computing Facility (OLCF) Summit High-Performance Computer (HPC), resulting in a 20-fold acceleration compared to the baseline model. We also provide a Docker container with our optimizations that was successfully used on multiple cloud infrastructures on UK Biobank and All of Us datasets where we showed significant time and cost benefits over the baseline SAIGE model.
ABSTRACT
Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.
ABSTRACT
Large-scale diffusion MRI tractography remains a significant challenge. Users must orchestrate a complex sequence of instructions that requires many software packages with complex dependencies and high computational costs. We developed MaPPeRTrac, an edge-centric tractography pipeline that simplifies and accelerates this process in a wide range of high-performance computing (HPC) environments. It fully automates either probabilistic or deterministic tractography, starting from a subject's magnetic resonance imaging (MRI) data, including structural and diffusion MRI images, to the edge density image (EDI) of their structural connectomes. Dependencies are containerized with Singularity (now called Apptainer) and decoupled from code to enable rapid prototyping and modification. Data derivatives are organized with the Brain Imaging Data Structure (BIDS) to ensure that they are findable, accessible, interoperable, and reusable following FAIR principles. The pipeline takes full advantage of HPC resources using the Parsl parallel programming framework, resulting in the creation of connectome datasets of unprecedented size. MaPPeRTrac is publicly available and tested on commercial and scientific hardware, so it can accelerate brain connectome research for a broader user community. MaPPeRTrac is available at: https://github.com/LLNL/mappertrac .
Subject(s)
Connectome , Magnetic Resonance Imaging , Magnetic Resonance Imaging/methods , Diffusion Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Connectome/methods
ABSTRACT
BACKGROUND: An electronic health record-based tool could improve accuracy and eliminate bias in provider estimation of the risk of death from other causes among men with nonmetastatic cancer. OBJECTIVE: To recalibrate and validate the Veterans Aging Cohort Study Charlson Comorbidity Index (VACS-CCI) to predict non-prostate cancer mortality (non-PCM) and to compare it with a tool predicting prostate cancer mortality (PCM). DESIGN, SETTING, AND PARTICIPANTS: An observational cohort of men with biopsy-confirmed nonmetastatic prostate cancer, enrolled from 2001 to 2018 in the national US Veterans Health Administration (VA), was divided by the year of diagnosis into the development (2001-2006 and 2008-2018) and validation (2007) sets. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Mortality (all cause, non-PCM, and PCM) was evaluated. Accuracy was assessed using calibration curves and C statistic in the development, validation, and combined sets; overall; and by age (<65 and 65+ yr), race (White and Black), Hispanic ethnicity, and treatment groups. RESULTS AND LIMITATIONS: Among 107 370 individuals, we observed 24 977 deaths (86% non-PCM). The median age was 65 yr, 4947 were Black, and 5010 were Hispanic. Compared with CCI and age alone (C statistic 0.67, 95% confidence interval [CI] 0.67-0.68), VACS-CCI demonstrated improved validated discrimination (C statistic 0.75, 95% CI 0.74-0.75 for non-PCM). The prostate cancer mortality tool also discriminated well in validation (C statistic 0.81, 95% CI 0.78-0.83). Both were well calibrated overall and within subgroups. Owing to missing data, 18 009/125 379 (14%) were excluded, and VACS-CCI should be validated outside the VA prior to outside application. CONCLUSIONS: VACS-CCI is ready for implementation within the VA. Electronic health record-assisted calculation is feasible, improves accuracy over age and CCI alone, and could mitigate inaccuracy and bias in provider estimation. 
PATIENT SUMMARY: Veterans Aging Cohort Study Charlson Comorbidity Index is ready for application within the Veterans Health Administration. Electronic health record-assisted calculation is feasible, improves accuracy over age and Charlson Comorbidity Index alone, and might help mitigate inaccuracy and bias in provider estimation of the risk of non-prostate cancer mortality.
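The C statistic used above to compare VACS-CCI with age plus CCI measures discrimination: the probability that a randomly chosen individual who died received a higher predicted risk than a randomly chosen individual who did not. The sketch below computes it for a binary outcome by direct pairwise comparison. It is a simplified illustration that ignores censoring and follow-up time (a survival-specific concordance would handle those), and the function name and toy data are hypothetical.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance (C) statistic for a binary outcome.

    Counts, over all (event, non-event) pairs, how often the event case
    has the higher predicted score; tied scores count as half.
    """
    events = [s for s, y in zip(scores, outcomes) if y]
    non_events = [s for s, y in zip(scores, outcomes) if not y]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy example: predicted risks and observed deaths (1 = died).
risks = [0.9, 0.8, 0.3, 0.1]
died = [1, 0, 1, 0]
print(c_statistic(risks, died))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which is the scale on which the reported values of 0.67, 0.75, and 0.81 should be read.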
Subject(s)
Prostatic Neoplasms , Humans , Male , Prostatic Neoplasms/mortality , Prostatic Neoplasms/pathology , Aged , United States/epidemiology , Middle Aged , Cohort Studies , Cause of Death , Electronic Health Records/statistics & numerical data
ABSTRACT
We conducted a multi-ancestry genome-wide association study of prostate-specific antigen (PSA) levels in 296,754 men (211,342 European ancestry; 58,236 African ancestry; 23,546 Hispanic/Latino; 3,630 Asian ancestry; 96.5% of participants were from the Million Veteran Program). We identified 318 independent genome-wide significant (p≤5e-8) variants, 184 of which were novel. Most demonstrated evidence of replication in an independent cohort (n=95,768). Meta-analyzing discovery and replication (n=392,522) identified 447 variants, of which a further 111 were novel. Out-of-sample variance in PSA explained by our new polygenic risk score reached 16.9% (95% CI=16.1%-17.8%) in European ancestry, 9.5% (95% CI=7.0%-12.2%) in African ancestry, 18.6% (95% CI=15.8%-21.4%) in Hispanic/Latino, and 15.3% (95% CI=12.7%-18.1%) in Asian ancestry, and lower for higher age. Our study highlights how including proportionally more participants from underrepresented populations improves genetic prediction of PSA levels, with potential to personalize prostate cancer screening.
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pone.0213013.].
ABSTRACT
OBJECTIVE: Suicidal behavior is heritable and is a major cause of death worldwide. Two large-scale genome-wide association studies (GWASs) recently discovered and cross-validated genome-wide significant (GWS) loci for suicide attempt (SA). The present study leveraged the genetic cohorts from both studies to conduct the largest GWAS meta-analysis of SA to date. Multi-ancestry and admixture-specific meta-analyses were conducted within groups of significant African, East Asian, and European ancestry admixtures. METHODS: This study comprised 22 cohorts, including 43,871 SA cases and 915,025 ancestry-matched controls. Analytical methods across multi-ancestry and individual ancestry admixtures included inverse variance-weighted fixed-effects meta-analyses, followed by gene, gene-set, tissue-set, and drug-target enrichment, as well as summary-data-based Mendelian randomization with brain expression quantitative trait loci data, phenome-wide genetic correlation, and genetic causal proportion analyses. RESULTS: Multi-ancestry and European ancestry admixture GWAS meta-analyses identified 12 risk loci at p values <5×10⁻⁸. These loci were mostly intergenic and implicated DRD2, SLC6A9, FURIN, NLGN1, SOX5, PDE4B, and CACNG2. The multi-ancestry SNP-based heritability estimate of SA was 5.7% on the liability scale (SE=0.003, p=5.7×10⁻⁸⁰). Significant brain tissue gene expression and drug set enrichment were observed. There was shared genetic variation of SA with attention deficit hyperactivity disorder, smoking, and risk tolerance after conditioning SA on both major depressive disorder and posttraumatic stress disorder. Genetic causal proportion analyses implicated shared genetic risk for specific health factors. CONCLUSIONS: This multi-ancestry analysis of suicide attempt identified several loci contributing to risk and establishes significant shared genetic covariation with clinical phenotypes. 
These findings provide insight into genetic factors associated with suicide attempt across ancestry admixture populations, in veteran and civilian populations, and in attempt versus death.
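The inverse variance-weighted fixed-effects meta-analysis named in the METHODS combines per-cohort effect estimates by weighting each one by the reciprocal of its squared standard error. The following is a minimal single-variant sketch under that textbook definition; it is not the pipeline used in the study, and the function name and example inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fixed_effects_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Inverse variance-weighted fixed-effects meta-analysis for one variant.

    Returns the pooled effect, its standard error, the z score, and the
    two-sided p value under a normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal length")

    weights = [1.0 / se**2 for se in ses]  # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Two hypothetical cohorts reporting log odds ratios for the same variant.
beta, se, z, p = fixed_effects_meta([0.10, 0.20], [0.05, 0.05])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Because the weights depend only on the standard errors, larger cohorts with more precise estimates dominate the pooled effect, which is why the method suits a meta-analysis across cohorts of very different sizes.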
Subject(s)
Depressive Disorder, Major , Genome-Wide Association Study , Humans , Suicide, Attempted , Depressive Disorder, Major/genetics , Risk Factors , Suicidal Ideation , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease/genetics , Genetic Loci/genetics
ABSTRACT
Genome-wide polygenic risk scores (GW-PRS) have been reported to have better predictive ability than PRS based on genome-wide significance thresholds across numerous traits. We compared the predictive ability of several GW-PRS approaches to a recently developed PRS of 269 established prostate cancer risk variants from multi-ancestry GWAS and fine-mapping studies (PRS269). GW-PRS models were trained using a large and diverse prostate cancer GWAS of 107,247 cases and 127,006 controls used to develop the multi-ancestry PRS269. Resulting models were independently tested in 1,586 cases and 1,047 controls of African ancestry from the California/Uganda Study and 8,046 cases and 191,825 controls of European ancestry from the UK Biobank and further validated in 13,643 cases and 210,214 controls of European ancestry and 6,353 cases and 53,362 controls of African ancestry from the Million Veteran Program. In the testing data, the best performing GW-PRS approach had AUCs of 0.656 (95% CI=0.635-0.677) in African and 0.844 (95% CI=0.840-0.848) in European ancestry men and corresponding prostate cancer OR of 1.83 (95% CI=1.67-2.00) and 2.19 (95% CI=2.14-2.25), respectively, for each SD unit increase in the GW-PRS. However, compared to the GW-PRS, in African and European ancestry men, the PRS269 had larger or similar AUCs (AUC=0.679, 95% CI=0.659-0.700 and AUC=0.845, 95% CI=0.841-0.849, respectively) and comparable prostate cancer OR (OR=2.05, 95% CI=1.87-2.26 and OR=2.21, 95% CI=2.16-2.26, respectively). Findings were similar in the validation data. This investigation suggests that current GW-PRS approaches may not improve the ability to predict prostate cancer risk compared to the multi-ancestry PRS269 constructed with fine-mapping.
ABSTRACT
Genome-wide polygenic risk scores (GW-PRSs) have been reported to have better predictive ability than PRSs based on genome-wide significance thresholds across numerous traits. We compared the predictive ability of several GW-PRS approaches to a recently developed PRS of 269 established prostate cancer-risk variants from multi-ancestry GWASs and fine-mapping studies (PRS269). GW-PRS models were trained with a large and diverse prostate cancer GWAS of 107,247 cases and 127,006 controls that we previously used to develop the multi-ancestry PRS269. Resulting models were independently tested in 1,586 cases and 1,047 controls of African ancestry from the California Uganda Study and 8,046 cases and 191,825 controls of European ancestry from the UK Biobank and further validated in 13,643 cases and 210,214 controls of European ancestry and 6,353 cases and 53,362 controls of African ancestry from the Million Veteran Program. In the testing data, the best performing GW-PRS approach had AUCs of 0.656 (95% CI = 0.635-0.677) in African and 0.844 (95% CI = 0.840-0.848) in European ancestry men and corresponding prostate cancer ORs of 1.83 (95% CI = 1.67-2.00) and 2.19 (95% CI = 2.14-2.25), respectively, for each SD unit increase in the GW-PRS. Compared to the GW-PRS, in African and European ancestry men, the PRS269 had larger or similar AUCs (AUC = 0.679, 95% CI = 0.659-0.700 and AUC = 0.845, 95% CI = 0.841-0.849, respectively) and comparable prostate cancer ORs (OR = 2.05, 95% CI = 1.87-2.26 and OR = 2.21, 95% CI = 2.16-2.26, respectively). Findings were similar in the validation studies. This investigation suggests that current GW-PRS approaches may not improve the ability to predict prostate cancer risk compared to the PRS269 developed from multi-ancestry GWASs and fine-mapping.
Subject(s)
Genetic Predisposition to Disease , Prostatic Neoplasms , Humans , Male , Black People/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Prostatic Neoplasms/genetics , Risk Factors , White People/genetics
ABSTRACT
Importance: Primary prevention of atherosclerotic cardiovascular disease (ASCVD) relies on risk stratification. Genome-wide polygenic risk scores (PRSs) are proposed to improve ASCVD risk estimation. Objective: To determine whether genome-wide PRSs for coronary artery disease (CAD) and acute ischemic stroke improve ASCVD risk estimation with traditional clinical risk factors in an ancestrally diverse midlife population. Design, Setting, and Participants: This was a prognostic analysis of incident events in a retrospectively defined longitudinal cohort conducted from January 1, 2011, to December 31, 2018. Included in the study were adults free of ASCVD and statin naive at baseline from the Million Veteran Program (MVP), a mega biobank with genetic, survey, and electronic health record data from a large US health care system. Data were analyzed from March 15, 2021, to January 5, 2023. Exposures: PRSs for CAD and ischemic stroke derived from cohorts of largely European descent and risk factors, including age, sex, systolic blood pressure, total cholesterol, high-density lipoprotein (HDL) cholesterol, smoking, and diabetes status. Main Outcomes and Measures: Incident nonfatal myocardial infarction (MI), ischemic stroke, ASCVD death, and composite ASCVD events. Results: A total of 79 151 participants (mean [SD] age, 57.8 [13.7] years; 68 503 male [86.5%]) were included in the study. The cohort included participants from the following harmonized genetic ancestry and race and ethnicity categories: 18 505 non-Hispanic Black (23.4%), 6785 Hispanic (8.6%), and 53 861 non-Hispanic White (68.0%) with a median (5th-95th percentile) follow-up of 4.3 (0.7-6.9) years. From 2011 to 2018, 3186 MIs (4.0%), 1933 ischemic strokes (2.4%), 867 ASCVD deaths (1.1%), and 5485 composite ASCVD events (6.9%) were observed. 
CAD PRS was associated with incident MI in non-Hispanic Black (hazard ratio [HR], 1.10; 95% CI, 1.02-1.19), Hispanic (HR, 1.26; 95% CI, 1.09-1.46), and non-Hispanic White (HR, 1.23; 95% CI, 1.18-1.29) participants. Stroke PRS was associated with incident stroke in non-Hispanic White participants (HR, 1.15; 95% CI, 1.08-1.21). A combined CAD plus stroke PRS was associated with ASCVD deaths among non-Hispanic Black (HR, 1.19; 95% CI, 1.03-1.17) and non-Hispanic (HR, 1.11; 95% CI, 1.03-1.21) participants. The combined PRS was also associated with composite ASCVD across all ancestry groups, but the association was greater among non-Hispanic White (HR, 1.20; 95% CI, 1.16-1.24) than non-Hispanic Black (HR, 1.11; 95% CI, 1.05-1.17) and Hispanic (HR, 1.12; 95% CI, 1.00-1.25) participants. Net reclassification improvement from adding PRS to a traditional risk model was modest for the intermediate risk group for composite CVD among men (5-year risk >3.75%, 0.38%; 95% CI, 0.07%-0.68%), among women (6.79%; 95% CI, 3.01%-10.58%), for age older than 55 years (0.25%; 95% CI, 0.03%-0.47%), and for ages 40 to 55 years (1.61%; 95% CI, -0.07% to 3.30%). Conclusions and Relevance: Study results suggest that PRSs derived predominantly in European samples were statistically significantly associated with ASCVD in the multiancestry midlife and older-age MVP cohort. Overall, modest improvements in discrimination metrics were observed with the addition of PRSs to traditional risk factors, with greater magnitude in women and younger age groups.
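Net reclassification improvement (NRI), reported above for the addition of PRS to a traditional risk model, summarizes how often a new model moves individuals into a more appropriate risk category. The sketch below implements the standard categorical form: the net proportion of events moved up plus the net proportion of non-events moved down. It is a generic illustration, not the study's analysis; the single 3.75% cutoff is borrowed from the abstract only as an example, and the function names and toy data are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def risk_category(risk: float, cutoffs: Sequence[float]) -> int:
    """Index of the risk category for `risk`, given ascending cutoffs."""
    return bisect_right(cutoffs, risk)


def net_reclassification_improvement(
    old_risk: Sequence[float],
    new_risk: Sequence[float],
    event: Sequence[int],
    cutoffs: Sequence[float],
) -> float:
    """Categorical NRI = (up - down among events) / n_events
    + (down - up among non-events) / n_non_events."""
    up_e = down_e = up_n = down_n = 0
    n_events = sum(1 for e in event if e)
    n_non_events = len(event) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")

    for old, new, e in zip(old_risk, new_risk, event):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if e:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0

    return (up_e - down_e) / n_events + (down_n - up_n) / n_non_events


# Toy data: two events and two non-events, one illustrative cutoff at 3.75%.
old = [0.02, 0.05, 0.02, 0.05]
new = [0.05, 0.05, 0.02, 0.02]
had_event = [1, 1, 0, 0]
print(net_reclassification_improvement(old, new, had_event, [0.0375]))
```

In the toy data one of two events moves up and one of two non-events moves down, so each component contributes 0.5. Real analyses typically report NRI as a percentage with a bootstrap confidence interval, which this sketch omits.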
Subject(s)
Atherosclerosis , Cardiovascular Diseases , Coronary Artery Disease , Ischemic Stroke , Myocardial Infarction , Stroke , Veterans , Adult , Humans , Male , Female , Middle Aged , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics , Retrospective Studies , Risk Assessment/methods , Risk Factors , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Atherosclerosis/epidemiology , Myocardial Infarction/epidemiology , Stroke/epidemiology , Cholesterol
ABSTRACT
Suicidal ideation (SI) often precedes and predicts suicide attempt and death, is the most common suicidal phenotype and is over-represented in veterans. The genetic architecture of SI in the absence of suicide attempt (SA) is unknown, yet believed to have distinct and overlapping risk with other suicidal behaviors. We performed the first GWAS of SI without SA in the Million Veteran Program (MVP), identifying 99,814 SI cases from electronic health records without a history of SA or suicide death (SD) and 512,567 controls without SI, SA or SD. GWAS was performed separately in the four largest ancestry groups, controlling for sex, age and genetic substructure. Ancestry-specific results were combined via meta-analysis to identify pan-ancestry loci. Four genome-wide significant (GWS) loci were identified in the pan-ancestry meta-analysis with loci on chromosomes 6 and 9 associated with suicide attempt in an independent sample. Pan-ancestry gene-based analysis identified GWS associations with DRD2, DCC, FBXL19, BCL7C, CTF1, ANKK1, and EXD3. Gene-set analysis implicated synaptic and startle response pathways (q's<0.05). European ancestry (EA) analysis identified GWS loci on chromosomes 6 and 9, as well as GWS gene associations in EXD3, DRD2, and DCC. No other ancestry-specific GWS results were identified, underscoring the need to increase representation of diverse individuals. The genetic correlation of SI and SA within MVP was high (rG = 0.87; p = 1.09e-50), as well as with post-traumatic stress disorder (PTSD; rG = 0.78; p = 1.98e-95) and major depressive disorder (MDD; rG = 0.78; p = 8.33e-83). Conditional analysis on PTSD and MDD attenuated most pan-ancestry and EA GWS signals for SI without SA to nominal significance, with the exception of EXD3 which remained GWS. Our novel findings support a polygenic and complex architecture for SI without SA which is largely shared with SA and overlaps with psychiatric conditions frequently comorbid with suicidal behaviors.
Subject(s)
Depressive Disorder, Major , Veterans , Humans , Suicidal Ideation , Veterans/psychology , Genome-Wide Association Study , Depressive Disorder, Major/genetics , Suicide, Attempted/psychology , Risk Factors
ABSTRACT
BACKGROUND: Genetic factors play an important role in prostate cancer (PCa) susceptibility. OBJECTIVE: To discover common genetic variants contributing to the risk of PCa in men of African ancestry. DESIGN, SETTING, AND PARTICIPANTS: We conducted a meta-analysis of ten genome-wide association studies consisting of 19 378 cases and 61 620 controls of African ancestry. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Common genotyped and imputed variants were tested for their association with PCa risk. Novel susceptibility loci were identified and incorporated into a multiancestry polygenic risk score (PRS). The PRS was evaluated for associations with PCa risk and disease aggressiveness. RESULTS AND LIMITATIONS: Nine novel susceptibility loci for PCa were identified, of which seven were only found or substantially more common in men of African ancestry, including an African-specific stop-gain variant in the prostate-specific gene anoctamin 7 (ANO7). A multiancestry PRS of 278 risk variants conferred strong associations with PCa risk in African ancestry studies (odds ratios [ORs] >3 and >5 for men in the top PRS decile and percentile, respectively). More importantly, compared with men in the 40-60% PRS category, men in the top PRS decile had a significantly higher risk of aggressive PCa (OR = 1.23, 95% confidence interval = 1.10-1.38, p = 4.4 × 10⁻⁴). CONCLUSIONS: This study demonstrates the importance of large-scale genetic studies in men of African ancestry for a better understanding of PCa susceptibility in this high-risk population and suggests a potential clinical utility of PRS in differentiating between the risks of developing aggressive and nonaggressive disease in men of African ancestry. PATIENT SUMMARY: In this large genetic study in men of African ancestry, we discovered nine novel prostate cancer (PCa) risk variants. 
We also showed that a multiancestry polygenic risk score was effective in stratifying PCa risk, and was able to differentiate risk of aggressive and nonaggressive disease.
Subject(s)
Genetic Predisposition to Disease , Prostatic Neoplasms , Male , Humans , Genome-Wide Association Study , Prostatic Neoplasms/genetics , Prostatic Neoplasms/epidemiology , Risk Factors , Black People/genetics
ABSTRACT
Objective. Developing machine learning models (N Gorre et al 2023) for clinical applications from scratch can be a cumbersome task requiring varying levels of expertise. Even seasoned developers and researchers often face incompatible frameworks and data preparation issues. This is further complicated in the context of diagnostic radiology and oncology applications, given the heterogeneous nature of the input data and the specialized task requirements. Our goal is to provide clinicians, researchers, and early AI developers with a modular, flexible, and user-friendly software tool that can effectively meet their needs to explore, train, and test AI algorithms by allowing users to interpret their model results. This latter step involves the incorporation of interpretability and explainability methods that would allow visualizing performance as well as interpreting predictions across the different neural network layers of a deep learning algorithm. Approach. To demonstrate our proposed tool, we have developed the CRP10 AI Application Interface (CRP10AII) as part of the MIDRC consortium. CRP10AII is based on the Django web framework in Python. CRP10AII/Django/Python, in combination with a data commons platform such as Gen3 for data management, can provide a comprehensive yet easy-to-use machine/deep learning analytics tool. The tool allows users to test, visualize, and interpret how and why the deep learning model performs as it does. The major highlight of CRP10AII is its capability for visualization and interpretability of otherwise black-box AI algorithms. Results. CRP10AII provides many convenient features for model building and evaluation, including: (1) query and acquire data according to the specific application (e.g. classification, segmentation) from the data commons platform (Gen3 here); (2) train the AI models from scratch or use pre-trained models (e.g. VGGNet, AlexNet, BERT) for transfer learning and test the model predictions, performance assessment, receiver operating characteristics curve evaluation; (3) interpret the AI model predictions using methods such as Shapley values and LIME; and (4) visualize the model learning through heatmaps and activation maps of individual layers of the neural network. Significance. Inexperienced users may have more time to swiftly pre-process, build/train their AI models on their own use-cases, and further visualize and explore these AI models as part of this pipeline, all in an end-to-end manner. CRP10AII will be provided as an open-source tool, and we expect to continue developing it based on users' feedback.
Subject(s)
Algorithms , Neural Networks, Computer , Software , Machine Learning , ROC Curve
ABSTRACT
Data scarcity presents a significant obstacle in the field of biomedicine, where acquiring diverse and sufficient datasets can be costly and challenging. Synthetic data generation offers a potential solution to this problem by expanding dataset sizes, thereby enabling the training of more robust and generalizable machine learning models. Although previous studies have explored synthetic data generation for cancer diagnosis, they have predominantly focused on single modality settings, such as whole-slide image tiles or RNA-Seq data. To bridge this gap, we propose a novel approach, RNA-Cascaded-Diffusion-Model or RNA-CDM, for performing RNA-to-image synthesis in a multi-cancer context, drawing inspiration from successful text-to-image synthesis models used in natural images. In our approach, we employ a variational auto-encoder to reduce the dimensionality of a patient's gene expression profile, effectively distinguishing between different types of cancer. Subsequently, we employ a cascaded diffusion model to synthesize realistic whole-slide image tiles using the latent representation derived from the patient's RNA-Seq data. Our results demonstrate that the generated tiles accurately preserve the distribution of cell types observed in real-world data, with state-of-the-art cell identification models successfully detecting important cell types in the synthetic samples. Furthermore, we illustrate that the synthetic tiles maintain the cell fraction observed in bulk RNA-Seq data and that modifications in gene expression affect the composition of cell types in the synthetic tiles. Next, we utilize the synthetic data generated by RNA-CDM to pretrain machine learning models and observe improved performance compared to training from scratch. 
Our study emphasizes the potential usefulness of synthetic data in developing machine learning models in scarce-data settings, while also highlighting the possibility of imputing missing data modalities by leveraging the available information. In conclusion, our proposed RNA-CDM approach for synthetic data generation in biomedicine, particularly in the context of cancer diagnosis, offers a novel and promising solution to address data scarcity. By generating synthetic data that aligns with real-world distributions and leveraging it to pretrain machine learning models, we contribute to the development of robust clinical decision support systems and potential advancements in precision medicine.
ABSTRACT
Importance: Suicide is a leading cause of death; however, the molecular genetic basis of suicidal thoughts and behaviors (SITB) remains unknown. Objective: To identify novel, replicable genomic risk loci for SITB. Design, Setting, and Participants: This genome-wide association study included 633 778 US military veterans with and without SITB, as identified through electronic health records. GWAS was performed separately by ancestry, controlling for sex, age, and genetic substructure. Cross-ancestry risk loci were identified through meta-analysis. Study enrollment began in 2011 and is ongoing. Data were analyzed from November 2021 to August 2022. Main Outcome and Measures: SITB. Results: A total of 633 778 US military veterans were included in the analysis (57 152 [9%] female; 121 118 [19.1%] African ancestry, 8285 [1.3%] Asian ancestry, 452 767 [71.4%] European ancestry, and 51 608 [8.1%] Hispanic ancestry), including 121 211 individuals with SITB (19.1%). Meta-analysis identified more than 200 GWS (P < 5 × 10⁻⁸) cross-ancestry risk single-nucleotide variants for SITB concentrated in 7 regions on chromosomes 2, 6, 9, 11, 14, 16, and 18. Top single-nucleotide variants were largely intronic in nature; 5 were independently replicated in ISGC, including rs6557168 in ESR1, rs12808482 in DRD2, rs77641763 in EXD3, rs10671545 in DCC, and rs36006172 in TRAF3. Associations for FBXL19 and AC018880.2 were not replicated. Gene-based analyses implicated 24 additional GWS cross-ancestry risk genes, including FURIN, TSNARE1, and the NCAM1-TTC12-ANKK1-DRD2 gene cluster. Cross-ancestry enrichment analyses revealed significant enrichment for expression in brain and pituitary tissue, synapse and ubiquitination processes, amphetamine addiction, parathyroid hormone synthesis, axon guidance, and dopaminergic pathways. Seven other unique European ancestry-specific GWS loci were identified, 2 of which (POM121L2 and METTL15/LINC02758) were replicated. 
Two additional GWS ancestry-specific loci were identified within the African ancestry (PET112/GATB) and Hispanic ancestry (intergenic locus on chromosome 4) subsets, both of which were replicated. No GWS loci were identified within the Asian ancestry subset; however, significant enrichment was observed for axon guidance, cyclic adenosine monophosphate signaling, focal adhesion, glutamatergic synapse, and oxytocin signaling pathways across all ancestries. Within the European ancestry subset, genetic correlations (r > 0.75) were observed between the SITB phenotype and a suicide attempt-only phenotype, depression, and posttraumatic stress disorder. Additionally, polygenic risk score analyses revealed that the Million Veteran Program polygenic risk score had nominally significant main effects in 2 independent samples of veterans of European and African ancestry. Conclusions and Relevance: The findings of this analysis may advance understanding of the molecular genetic basis of SITB and provide evidence for ESR1, DRD2, TRAF3, and DCC as cross-ancestry candidate risk genes. More work is needed to replicate these findings and to determine if and how these genes might impact clinical care.
Subject(s)
Veterans , Humans , Female , Male , Suicidal Ideation , Genome-Wide Association Study , TNF Receptor-Associated Factor 3/genetics , Genetic Loci/genetics , Nucleotides , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease/genetics , Proteins , Protein Serine-Threonine Kinases/genetics
ABSTRACT
The following sections are included: Introduction, Background, and Motivation, Workshop Presenters, References.
Subject(s)
Computational Biology , Humans
ABSTRACT
Varicose veins represent a common cause of cardiovascular morbidity, with limited available medical therapies. Although varicose veins are heritable and epidemiologic studies have identified several candidate varicose vein risk factors, the molecular and genetic basis remains uncertain. Here we analyzed the contribution of common genetic variants to varicose veins using data from the Veterans Affairs Million Veteran Program and four other large biobanks. Among 49,765 individuals with varicose veins and 1,334,301 disease-free controls, we identified 139 risk loci. We identified genetic overlap between varicose veins, other vascular diseases and dozens of anthropometric factors. Using Mendelian randomization, we prioritized therapeutic targets via integration of proteomic and transcriptomic data. Finally, topological enrichment analyses confirmed the biologic roles of endothelial shear flow disruption, inflammation, vascular remodeling and angiogenesis. These findings may facilitate future efforts to develop nonsurgical therapies for varicose veins.