1.
bioRxiv ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38826407

ABSTRACT

The expansion of biobanks has significantly propelled genomic discoveries yet the sheer scale of data within these repositories poses formidable computational hurdles, particularly in handling extensive matrix operations required by prevailing statistical frameworks. In this work, we introduce computational optimizations to the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Model) algorithm, notably employing a GPU-based distributed computing approach to tackle these challenges. We applied these optimizations to conduct a large-scale genome-wide association study (GWAS) across 2,068 phenotypes derived from electronic health records of 635,969 diverse participants from the Veterans Affairs (VA) Million Veteran Program (MVP). Our strategies enabled scaling up the analysis to over 6,000 nodes on the Department of Energy (DOE) Oak Ridge Leadership Computing Facility (OLCF) Summit High-Performance Computer (HPC), resulting in a 20-fold acceleration compared to the baseline model. We also provide a Docker container with our optimizations that was successfully used on multiple cloud infrastructures on UK Biobank and All of Us datasets where we showed significant time and cost benefits over the baseline SAIGE model.

2.
HGG Adv ; : 100315, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38845201

ABSTRACT

Deciphering the genetic basis of prostate-specific antigen (PSA) levels may improve their utility for prostate cancer (PCa) screening. Using genome-wide summary statistics from 95,768 PCa-free men, we conducted a transcriptome-wide association study (TWAS) to examine impacts of genetically predicted gene expression on PSA. Analyses identified 41 statistically significant (p < 0.05/12,192 = 4.10×10⁻⁶) associations in whole blood and 39 statistically significant (p < 0.05/13,844 = 3.61×10⁻⁶) associations in prostate tissue, with 18 genes associated in both tissues. Cross-tissue analyses identified 155 statistically significant (p < 0.05/22,249 = 2.25×10⁻⁶) genes. Out of 173 unique PSA-associated genes across analyses, we replicated 151 (87.3%) in TWAS of 209,318 PCa-free individuals from the Million Veteran Program. Based on conditional analyses, we found 20 genes (11 single-tissue, nine cross-tissue) that were associated with PSA levels in the discovery TWAS and were not attributable to a lead variant from a genome-wide association study (GWAS). Ten of these 20 genes replicated, and two of the replicated genes had colocalization probability > 0.5: CCNA2 and HIST1H2BN. Six of the 20 identified genes are not known to impact PCa risk. Fine mapping based on whole blood and prostate tissue revealed five protein-coding genes with evidence of causal relationships with PSA levels. Of these five genes, four exhibited evidence of colocalization and one was conditionally independent of previous GWAS findings. These results yield hypotheses that should be further explored to improve understanding of genetic factors underlying PSA levels.
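
The significance thresholds quoted in this abstract are Bonferroni corrections: a family-wise alpha of 0.05 divided by the number of genes tested. The sketch below only reproduces that arithmetic from the counts given in the abstract; the function and variable names are illustrative assumptions and are not taken from the study's code.

```python
# Bonferroni-corrected thresholds: family-wise alpha divided by the number of tests.
ALPHA = 0.05

# Numbers of genes tested per analysis, as reported in the abstract.
tests = {
    "whole blood": 12_192,
    "prostate tissue": 13_844,
    "cross-tissue": 22_249,
}


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test p-value threshold for a family-wise error rate alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical helper dictionary, used here only to print the three thresholds.
thresholds = {name: bonferroni_threshold(ALPHA, n) for name, n in tests.items()}

for name, value in thresholds.items():
    print(f"{name}: p < {value:.2e}")
```

Rounded to three significant figures, the results match the 4.10×10⁻⁶, 3.61×10⁻⁶, and 2.25×10⁻⁶ thresholds stated in the abstract.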

3.
Nat Biomed Eng ; 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38514775

ABSTRACT

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.

4.
Neuroinformatics ; 22(2): 177-191, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38446357

ABSTRACT

Large-scale diffusion MRI tractography remains a significant challenge. Users must orchestrate a complex sequence of instructions that requires many software packages with complex dependencies and high computational costs. We developed MaPPeRTrac, an edge-centric tractography pipeline that simplifies and accelerates this process in a wide range of high-performance computing (HPC) environments. It fully automates either probabilistic or deterministic tractography, starting from a subject's magnetic resonance imaging (MRI) data, including structural and diffusion MRI images, to the edge density image (EDI) of their structural connectomes. Dependencies are containerized with Singularity (now called Apptainer) and decoupled from code to enable rapid prototyping and modification. Data derivatives are organized with the Brain Imaging Data Structure (BIDS) to ensure that they are findable, accessible, interoperable, and reusable following FAIR principles. The pipeline takes full advantage of HPC resources using the Parsl parallel programming framework, resulting in the creation of connectome datasets of unprecedented size. MaPPeRTrac is publicly available and tested on commercial and scientific hardware, so it can accelerate brain connectome research for a broader user community. MaPPeRTrac is available at: https://github.com/LLNL/mappertrac .


Subject(s)
Connectome , Magnetic Resonance Imaging , Magnetic Resonance Imaging/methods , Diffusion Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Connectome/methods
5.
Eur Urol Oncol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38171965

ABSTRACT

BACKGROUND: An electronic health record-based tool could improve accuracy and eliminate bias in provider estimation of the risk of death from other causes among men with nonmetastatic cancer. OBJECTIVE: To recalibrate and validate the Veterans Aging Cohort Study Charlson Comorbidity Index (VACS-CCI) to predict non-prostate cancer mortality (non-PCM) and to compare it with a tool predicting prostate cancer mortality (PCM). DESIGN, SETTING, AND PARTICIPANTS: An observational cohort of men with biopsy-confirmed nonmetastatic prostate cancer, enrolled from 2001 to 2018 in the national US Veterans Health Administration (VA), was divided by the year of diagnosis into the development (2001-2006 and 2008-2018) and validation (2007) sets. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Mortality (all cause, non-PCM, and PCM) was evaluated. Accuracy was assessed using calibration curves and C statistic in the development, validation, and combined sets; overall; and by age (<65 and 65+ yr), race (White and Black), Hispanic ethnicity, and treatment groups. RESULTS AND LIMITATIONS: Among 107 370 individuals, we observed 24 977 deaths (86% non-PCM). The median age was 65 yr, 4947 were Black, and 5010 were Hispanic. Compared with CCI and age alone (C statistic 0.67, 95% confidence interval [CI] 0.67-0.68), VACS-CCI demonstrated improved validated discrimination (C statistic 0.75, 95% CI 0.74-0.75 for non-PCM). The prostate cancer mortality tool also discriminated well in validation (C statistic 0.81, 95% CI 0.78-0.83). Both were well calibrated overall and within subgroups. Owing to missing data, 18 009/125 379 (14%) were excluded, and VACS-CCI should be validated outside the VA prior to outside application. CONCLUSIONS: VACS-CCI is ready for implementation within the VA. Electronic health record-assisted calculation is feasible, improves accuracy over age and CCI alone, and could mitigate inaccuracy and bias in provider estimation. 
PATIENT SUMMARY: Veterans Aging Cohort Study Charlson Comorbidity Index is ready for application within the Veterans Health Administration. Electronic health record-assisted calculation is feasible, improves accuracy over age and Charlson Comorbidity Index alone, and might help mitigate inaccuracy and bias in provider estimation of the risk of non-prostate cancer mortality.
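
For readers unfamiliar with the C statistic reported above, the following is a minimal sketch of the concordance idea for a binary outcome: the share of (death, survivor) pairs in which the person who died had the higher predicted risk. It is an assumption-laden simplification. The study itself concerns time-to-event mortality data, which this sketch ignores (no censoring or competing risks), and the toy scores and outcomes are invented for illustration.

```python
from itertools import product


def c_statistic(risk_scores, outcomes):
    """Concordance for a binary outcome.

    Counts the fraction of (event, non-event) pairs in which the event
    has the higher risk score; tied scores count as half-concordant.
    """
    events = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_score, non_event_score in product(events, non_events):
        if event_score > non_event_score:
            concordant += 1.0
        elif event_score == non_event_score:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data (made up): predicted risks and observed outcomes (1 = died).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_outcomes = [1, 0, 1, 0, 0]
toy_c = c_statistic(toy_scores, toy_outcomes)
print(f"C statistic on toy data: {toy_c:.3f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which is the scale on which the reported 0.67, 0.75, and 0.81 should be read.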

6.
medRxiv ; 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37961155

ABSTRACT

We conducted a multi-ancestry genome-wide association study of prostate-specific antigen (PSA) levels in 296,754 men (211,342 European ancestry; 58,236 African ancestry; 23,546 Hispanic/Latino; 3,630 Asian ancestry; 96.5% of participants were from the Million Veteran Program). We identified 318 independent genome-wide significant (p≤5e-8) variants, 184 of which were novel. Most demonstrated evidence of replication in an independent cohort (n=95,768). Meta-analyzing discovery and replication (n=392,522) identified 447 variants, of which a further 111 were novel. Out-of-sample variance in PSA explained by our new polygenic risk score reached 16.9% (95% CI=16.1%-17.8%) in European ancestry, 9.5% (95% CI=7.0%-12.2%) in African ancestry, 18.6% (95% CI=15.8%-21.4%) in Hispanic/Latino, and 15.3% (95% CI=12.7%-18.1%) in Asian ancestry, and lower for higher age. Our study highlights how including proportionally more participants from underrepresented populations improves genetic prediction of PSA levels, with potential to personalize prostate cancer screening.

8.
Am J Psychiatry ; 180(10): 723-738, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37777856

ABSTRACT

OBJECTIVE: Suicidal behavior is heritable and is a major cause of death worldwide. Two large-scale genome-wide association studies (GWASs) recently discovered and cross-validated genome-wide significant (GWS) loci for suicide attempt (SA). The present study leveraged the genetic cohorts from both studies to conduct the largest GWAS meta-analysis of SA to date. Multi-ancestry and admixture-specific meta-analyses were conducted within groups of significant African, East Asian, and European ancestry admixtures. METHODS: This study comprised 22 cohorts, including 43,871 SA cases and 915,025 ancestry-matched controls. Analytical methods across multi-ancestry and individual ancestry admixtures included inverse variance-weighted fixed-effects meta-analyses, followed by gene, gene-set, tissue-set, and drug-target enrichment, as well as summary-data-based Mendelian randomization with brain expression quantitative trait loci data, phenome-wide genetic correlation, and genetic causal proportion analyses. RESULTS: Multi-ancestry and European ancestry admixture GWAS meta-analyses identified 12 risk loci at p values <5×10⁻⁸. These loci were mostly intergenic and implicated DRD2, SLC6A9, FURIN, NLGN1, SOX5, PDE4B, and CACNG2. The multi-ancestry SNP-based heritability estimate of SA was 5.7% on the liability scale (SE=0.003, p=5.7×10⁻⁸⁰). Significant brain tissue gene expression and drug set enrichment were observed. There was shared genetic variation of SA with attention deficit hyperactivity disorder, smoking, and risk tolerance after conditioning SA on both major depressive disorder and posttraumatic stress disorder. Genetic causal proportion analyses implicated shared genetic risk for specific health factors. CONCLUSIONS: This multi-ancestry analysis of suicide attempt identified several loci contributing to risk and establishes significant shared genetic covariation with clinical phenotypes. 
These findings provide insight into genetic factors associated with suicide attempt across ancestry admixture populations, in veteran and civilian populations, and in attempt versus death.


Subject(s)
Depressive Disorder, Major , Genome-Wide Association Study , Humans , Suicide, Attempted , Depressive Disorder, Major/genetics , Risk Factors , Suicidal Ideation , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease/genetics , Genetic Loci/genetics
10.
Am J Hum Genet ; 110(7): 1200-1206, 2023 07 06.
Article in English | MEDLINE | ID: mdl-37311464

ABSTRACT

Genome-wide polygenic risk scores (GW-PRSs) have been reported to have better predictive ability than PRSs based on genome-wide significance thresholds across numerous traits. We compared the predictive ability of several GW-PRS approaches to a recently developed PRS of 269 established prostate cancer-risk variants from multi-ancestry GWASs and fine-mapping studies (PRS269). GW-PRS models were trained with a large and diverse prostate cancer GWAS of 107,247 cases and 127,006 controls that we previously used to develop the multi-ancestry PRS269. Resulting models were independently tested in 1,586 cases and 1,047 controls of African ancestry from the California Uganda Study and 8,046 cases and 191,825 controls of European ancestry from the UK Biobank and further validated in 13,643 cases and 210,214 controls of European ancestry and 6,353 cases and 53,362 controls of African ancestry from the Million Veteran Program. In the testing data, the best performing GW-PRS approach had AUCs of 0.656 (95% CI = 0.635-0.677) in African and 0.844 (95% CI = 0.840-0.848) in European ancestry men and corresponding prostate cancer ORs of 1.83 (95% CI = 1.67-2.00) and 2.19 (95% CI = 2.14-2.25), respectively, for each SD unit increase in the GW-PRS. Compared to the GW-PRS, in African and European ancestry men, the PRS269 had larger or similar AUCs (AUC = 0.679, 95% CI = 0.659-0.700 and AUC = 0.845, 95% CI = 0.841-0.849, respectively) and comparable prostate cancer ORs (OR = 2.05, 95% CI = 1.87-2.26 and OR = 2.21, 95% CI = 2.16-2.26, respectively). Findings were similar in the validation studies. This investigation suggests that current GW-PRS approaches may not improve the ability to predict prostate cancer risk compared to the PRS269 developed from multi-ancestry GWASs and fine-mapping.


Subject(s)
Genetic Predisposition to Disease , Prostatic Neoplasms , Humans , Male , Black People/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Prostatic Neoplasms/genetics , Risk Factors , White People/genetics
11.
medRxiv ; 2023 May 15.
Article in English | MEDLINE | ID: mdl-37292833

ABSTRACT

Genome-wide polygenic risk scores (GW-PRS) have been reported to have better predictive ability than PRS based on genome-wide significance thresholds across numerous traits. We compared the predictive ability of several GW-PRS approaches to a recently developed PRS of 269 established prostate cancer risk variants from multi-ancestry GWAS and fine-mapping studies (PRS269). GW-PRS models were trained using a large and diverse prostate cancer GWAS of 107,247 cases and 127,006 controls used to develop the multi-ancestry PRS269. Resulting models were independently tested in 1,586 cases and 1,047 controls of African ancestry from the California/Uganda Study and 8,046 cases and 191,825 controls of European ancestry from the UK Biobank and further validated in 13,643 cases and 210,214 controls of European ancestry and 6,353 cases and 53,362 controls of African ancestry from the Million Veteran Program. In the testing data, the best performing GW-PRS approach had AUCs of 0.656 (95% CI=0.635-0.677) in African and 0.844 (95% CI=0.840-0.848) in European ancestry men and corresponding prostate cancer OR of 1.83 (95% CI=1.67-2.00) and 2.19 (95% CI=2.14-2.25), respectively, for each SD unit increase in the GW-PRS. However, compared to the GW-PRS, in African and European ancestry men, the PRS269 had larger or similar AUCs (AUC=0.679, 95% CI=0.659-0.700 and AUC=0.845, 95% CI=0.841-0.849, respectively) and comparable prostate cancer OR (OR=2.05, 95% CI=1.87-2.26 and OR=2.21, 95% CI=2.16-2.26, respectively). Findings were similar in the validation data. This investigation suggests that current GW-PRS approaches may not improve the ability to predict prostate cancer risk compared to the multi-ancestry PRS269 constructed with fine-mapping.

12.
JAMA Cardiol ; 8(6): 564-574, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37133828

ABSTRACT

Importance: Primary prevention of atherosclerotic cardiovascular disease (ASCVD) relies on risk stratification. Genome-wide polygenic risk scores (PRSs) are proposed to improve ASCVD risk estimation. Objective: To determine whether genome-wide PRSs for coronary artery disease (CAD) and acute ischemic stroke improve ASCVD risk estimation with traditional clinical risk factors in an ancestrally diverse midlife population. Design, Setting, and Participants: This was a prognostic analysis of incident events in a retrospectively defined longitudinal cohort conducted from January 1, 2011, to December 31, 2018. Included in the study were adults free of ASCVD and statin naive at baseline from the Million Veteran Program (MVP), a mega biobank with genetic, survey, and electronic health record data from a large US health care system. Data were analyzed from March 15, 2021, to January 5, 2023. Exposures: PRSs for CAD and ischemic stroke derived from cohorts of largely European descent and risk factors, including age, sex, systolic blood pressure, total cholesterol, high-density lipoprotein (HDL) cholesterol, smoking, and diabetes status. Main Outcomes and Measures: Incident nonfatal myocardial infarction (MI), ischemic stroke, ASCVD death, and composite ASCVD events. Results: A total of 79 151 participants (mean [SD] age, 57.8 [13.7] years; 68 503 male [86.5%]) were included in the study. The cohort included participants from the following harmonized genetic ancestry and race and ethnicity categories: 18 505 non-Hispanic Black (23.4%), 6785 Hispanic (8.6%), and 53 861 non-Hispanic White (68.0%) with a median (5th-95th percentile) follow-up of 4.3 (0.7-6.9) years. From 2011 to 2018, 3186 MIs (4.0%), 1933 ischemic strokes (2.4%), 867 ASCVD deaths (1.1%), and 5485 composite ASCVD events (6.9%) were observed. 
CAD PRS was associated with incident MI in non-Hispanic Black (hazard ratio [HR], 1.10; 95% CI, 1.02-1.19), Hispanic (HR, 1.26; 95% CI, 1.09-1.46), and non-Hispanic White (HR, 1.23; 95% CI, 1.18-1.29) participants. Stroke PRS was associated with incident stroke in non-Hispanic White participants (HR, 1.15; 95% CI, 1.08-1.21). A combined CAD plus stroke PRS was associated with ASCVD deaths among non-Hispanic Black (HR, 1.19; 95% CI, 1.03-1.17) and non-Hispanic White (HR, 1.11; 95% CI, 1.03-1.21) participants. The combined PRS was also associated with composite ASCVD across all ancestry groups, but the association was greater among non-Hispanic White (HR, 1.20; 95% CI, 1.16-1.24) than non-Hispanic Black (HR, 1.11; 95% CI, 1.05-1.17) and Hispanic (HR, 1.12; 95% CI, 1.00-1.25) participants. Net reclassification improvement from adding PRS to a traditional risk model was modest for the intermediate risk group for composite CVD among men (5-year risk >3.75%, 0.38%; 95% CI, 0.07%-0.68%), among women (6.79%; 95% CI, 3.01%-10.58%), for age older than 55 years (0.25%; 95% CI, 0.03%-0.47%), and for ages 40 to 55 years (1.61%; 95% CI, -0.07% to 3.30%). Conclusions and Relevance: Study results suggest that PRSs derived predominantly in European samples were statistically significantly associated with ASCVD in the multiancestry midlife and older-age MVP cohort. Overall, modest improvements in discrimination metrics were observed with the addition of PRSs to traditional risk factors, with greater magnitude in women and younger age groups.


Subject(s)
Atherosclerosis , Cardiovascular Diseases , Coronary Artery Disease , Ischemic Stroke , Myocardial Infarction , Stroke , Veterans , Adult , Humans , Male , Female , Middle Aged , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics , Retrospective Studies , Risk Assessment/methods , Risk Factors , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Atherosclerosis/epidemiology , Myocardial Infarction/epidemiology , Stroke/epidemiology , Cholesterol
13.
Eur Urol ; 84(1): 13-21, 2023 07.
Article in English | MEDLINE | ID: mdl-36872133

ABSTRACT

BACKGROUND: Genetic factors play an important role in prostate cancer (PCa) susceptibility. OBJECTIVE: To discover common genetic variants contributing to the risk of PCa in men of African ancestry. DESIGN, SETTING, AND PARTICIPANTS: We conducted a meta-analysis of ten genome-wide association studies consisting of 19378 cases and 61620 controls of African ancestry. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Common genotyped and imputed variants were tested for their association with PCa risk. Novel susceptibility loci were identified and incorporated into a multiancestry polygenic risk score (PRS). The PRS was evaluated for associations with PCa risk and disease aggressiveness. RESULTS AND LIMITATIONS: Nine novel susceptibility loci for PCa were identified, of which seven were only found or substantially more common in men of African ancestry, including an African-specific stop-gain variant in the prostate-specific gene anoctamin 7 (ANO7). A multiancestry PRS of 278 risk variants conferred strong associations with PCa risk in African ancestry studies (odds ratios [ORs] >3 and >5 for men in the top PRS decile and percentile, respectively). More importantly, compared with men in the 40-60% PRS category, men in the top PRS decile had a significantly higher risk of aggressive PCa (OR = 1.23, 95% confidence interval = 1.10-1.38, p = 4.4 × 10⁻⁴). CONCLUSIONS: This study demonstrates the importance of large-scale genetic studies in men of African ancestry for a better understanding of PCa susceptibility in this high-risk population and suggests a potential clinical utility of PRS in differentiating between the risks of developing aggressive and nonaggressive disease in men of African ancestry. PATIENT SUMMARY: In this large genetic study in men of African ancestry, we discovered nine novel prostate cancer (PCa) risk variants. 
We also showed that a multiancestry polygenic risk score was effective in stratifying PCa risk, and was able to differentiate risk of aggressive and nonaggressive disease.


Subject(s)
Genetic Predisposition to Disease , Prostatic Neoplasms , Male , Humans , Genome-Wide Association Study , Prostatic Neoplasms/genetics , Prostatic Neoplasms/epidemiology , Risk Factors , Black People/genetics
14.
PLoS Genet ; 19(3): e1010623, 2023 03.
Article in English | MEDLINE | ID: mdl-36940203

ABSTRACT

Suicidal ideation (SI) often precedes and predicts suicide attempt and death, is the most common suicidal phenotype and is over-represented in veterans. The genetic architecture of SI in the absence of suicide attempt (SA) is unknown, yet believed to have distinct and overlapping risk with other suicidal behaviors. We performed the first GWAS of SI without SA in the Million Veteran Program (MVP), identifying 99,814 SI cases from electronic health records without a history of SA or suicide death (SD) and 512,567 controls without SI, SA or SD. GWAS was performed separately in the four largest ancestry groups, controlling for sex, age and genetic substructure. Ancestry-specific results were combined via meta-analysis to identify pan-ancestry loci. Four genome-wide significant (GWS) loci were identified in the pan-ancestry meta-analysis with loci on chromosomes 6 and 9 associated with suicide attempt in an independent sample. Pan-ancestry gene-based analysis identified GWS associations with DRD2, DCC, FBXL19, BCL7C, CTF1, ANNK1, and EXD3. Gene-set analysis implicated synaptic and startle response pathways (q's<0.05). European ancestry (EA) analysis identified GWS loci on chromosomes 6 and 9, as well as GWS gene associations in EXD3, DRD2, and DCC. No other ancestry-specific GWS results were identified, underscoring the need to increase representation of diverse individuals. The genetic correlation of SI and SA within MVP was high (rG = 0.87; p = 1.09e-50), as well as with post-traumatic stress disorder (PTSD; rG = 0.78; p = 1.98e-95) and major depressive disorder (MDD; rG = 0.78; p = 8.33e-83). Conditional analysis on PTSD and MDD attenuated most pan-ancestry and EA GWS signals for SI without SA to nominal significance, with the exception of EXD3 which remained GWS. Our novel findings support a polygenic and complex architecture for SI without SA which is largely shared with SA and overlaps with psychiatric conditions frequently comorbid with suicidal behaviors.


Subject(s)
Depressive Disorder, Major , Veterans , Humans , Suicidal Ideation , Veterans/psychology , Genome-Wide Association Study , Depressive Disorder, Major/genetics , Suicide, Attempted/psychology , Risk Factors
15.
bioRxiv ; 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-36711711

ABSTRACT

Data scarcity presents a significant obstacle in the field of biomedicine, where acquiring diverse and sufficient datasets can be costly and challenging. Synthetic data generation offers a potential solution to this problem by expanding dataset sizes, thereby enabling the training of more robust and generalizable machine learning models. Although previous studies have explored synthetic data generation for cancer diagnosis, they have predominantly focused on single modality settings, such as whole-slide image tiles or RNA-Seq data. To bridge this gap, we propose a novel approach, RNA-Cascaded-Diffusion-Model or RNA-CDM, for performing RNA-to-image synthesis in a multi-cancer context, drawing inspiration from successful text-to-image synthesis models used in natural images. In our approach, we employ a variational auto-encoder to reduce the dimensionality of a patient's gene expression profile, effectively distinguishing between different types of cancer. Subsequently, we employ a cascaded diffusion model to synthesize realistic whole-slide image tiles using the latent representation derived from the patient's RNA-Seq data. Our results demonstrate that the generated tiles accurately preserve the distribution of cell types observed in real-world data, with state-of-the-art cell identification models successfully detecting important cell types in the synthetic samples. Furthermore, we illustrate that the synthetic tiles maintain the cell fraction observed in bulk RNA-Seq data and that modifications in gene expression affect the composition of cell types in the synthetic tiles. Next, we utilize the synthetic data generated by RNA-CDM to pretrain machine learning models and observe improved performance compared to training from scratch. 
Our study emphasizes the potential usefulness of synthetic data in developing machine learning models in scarce-data settings, while also highlighting the possibility of imputing missing data modalities by leveraging the available information. In conclusion, our proposed RNA-CDM approach for synthetic data generation in biomedicine, particularly in the context of cancer diagnosis, offers a novel and promising solution to address data scarcity. By generating synthetic data that aligns with real-world distributions and leveraging it to pretrain machine learning models, we contribute to the development of robust clinical decision support systems and potential advancements in precision medicine.

16.
Phys Med Biol ; 68(7), 2023 03 23.
Article in English | MEDLINE | ID: mdl-36716497

ABSTRACT

Objective. Developing Machine Learning models (N Gorre et al 2023) for clinical applications from scratch can be a cumbersome task requiring varying levels of expertise. Seasoned developers and researchers may also often face incompatible frameworks and data preparation issues. This is further complicated in the context of diagnostic radiology and oncology applications, given the heterogeneous nature of the input data and the specialized task requirements. Our goal is to provide clinicians, researchers, and early AI developers with a modular, flexible, and user-friendly software tool that can effectively meet their needs to explore, train, and test AI algorithms by allowing users to interpret their model results. This latter step involves the incorporation of interpretability and explainability methods that would allow visualizing performance as well as interpreting predictions across the different neural network layers of a deep learning algorithm. Approach. To demonstrate our proposed tool, we have developed the CRP10 AI Application Interface (CRP10AII) as part of the MIDRC consortium. CRP10AII is based on the Django web framework in Python. In combination with a data manager or data commons platform such as Gen3, CRP10AII can provide a comprehensive yet easy-to-use machine/deep learning analytics tool. The tool allows users to test, visualize, and interpret how and why the deep learning model is performing. The major highlight of CRP10AII is its capability for visualization and interpretability of otherwise black-box AI algorithms. Results. CRP10AII provides many convenient features for model building and evaluation, including: (1) query and acquire data according to the specific application (e.g. classification, segmentation) from the data commons platform (Gen3 here); (2) train the AI models from scratch or use pre-trained models (e.g. VGGNet, AlexNet, BERT) for transfer learning and test the model predictions, performance assessment, receiver operating characteristics curve evaluation; (3) interpret the AI model predictions using methods like Shapley values and LIME; and (4) visualize the model learning through heatmaps and activation maps of individual layers of the neural network. Significance. Inexperienced users may have more time to swiftly pre-process, build/train their AI models on their own use-cases, and further visualize and explore these AI models as part of this pipeline, all in an end-to-end manner. CRP10AII will be provided as an open-source tool, and we expect to continue developing it based on users' feedback.


Subject(s)
Algorithms , Neural Networks, Computer , Software , Machine Learning , ROC Curve
17.
JAMA Psychiatry ; 80(2): 135-145, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36515925

ABSTRACT

Importance: Suicide is a leading cause of death; however, the molecular genetic basis of suicidal thoughts and behaviors (SITB) remains unknown. Objective: To identify novel, replicable genomic risk loci for SITB. Design, Setting, and Participants: This genome-wide association study included 633 778 US military veterans with and without SITB, as identified through electronic health records. GWAS was performed separately by ancestry, controlling for sex, age, and genetic substructure. Cross-ancestry risk loci were identified through meta-analysis. Study enrollment began in 2011 and is ongoing. Data were analyzed from November 2021 to August 2022. Main Outcome and Measures: SITB. Results: A total of 633 778 US military veterans were included in the analysis (57 152 [9%] female; 121 118 [19.1%] African ancestry, 8285 [1.3%] Asian ancestry, 452 767 [71.4%] European ancestry, and 51 608 [8.1%] Hispanic ancestry), including 121 211 individuals with SITB (19.1%). Meta-analysis identified more than 200 GWS (P < 5 × 10⁻⁸) cross-ancestry risk single-nucleotide variants for SITB concentrated in 7 regions on chromosomes 2, 6, 9, 11, 14, 16, and 18. Top single-nucleotide variants were largely intronic in nature; 5 were independently replicated in ISGC, including rs6557168 in ESR1, rs12808482 in DRD2, rs77641763 in EXD3, rs10671545 in DCC, and rs36006172 in TRAF3. Associations for FBXL19 and AC018880.2 were not replicated. Gene-based analyses implicated 24 additional GWS cross-ancestry risk genes, including FURIN, TSNARE1, and the NCAM1-TTC12-ANKK1-DRD2 gene cluster. Cross-ancestry enrichment analyses revealed significant enrichment for expression in brain and pituitary tissue, synapse and ubiquitination processes, amphetamine addiction, parathyroid hormone synthesis, axon guidance, and dopaminergic pathways. Seven other unique European ancestry-specific GWS loci were identified, 2 of which (POM121L2 and METTL15/LINC02758) were replicated. 
Two additional GWS ancestry-specific loci were identified within the African ancestry (PET112/GATB) and Hispanic ancestry (intergenic locus on chromosome 4) subsets, both of which were replicated. No GWS loci were identified within the Asian ancestry subset; however, significant enrichment was observed for axon guidance, cyclic adenosine monophosphate signaling, focal adhesion, glutamatergic synapse, and oxytocin signaling pathways across all ancestries. Within the European ancestry subset, genetic correlations (r > 0.75) were observed between the SITB phenotype and a suicide attempt-only phenotype, depression, and posttraumatic stress disorder. Additionally, polygenic risk score analyses revealed that the Million Veteran Program polygenic risk score had nominally significant main effects in 2 independent samples of veterans of European and African ancestry. Conclusions and Relevance: The findings of this analysis may advance understanding of the molecular genetic basis of SITB and provide evidence for ESR1, DRD2, TRAF3, and DCC as cross-ancestry candidate risk genes. More work is needed to replicate these findings and to determine if and how these genes might impact clinical care.


Subject(s)
Veterans , Humans , Female , Male , Suicidal Ideation , Genome-Wide Association Study , TNF Receptor-Associated Factor 3/genetics , Genetic Loci/genetics , Nucleotides , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease/genetics , Proteins , Protein Serine-Threonine Kinases/genetics
18.
Pac Symp Biocomput ; 28: 541-545, 2023.
Article in English | MEDLINE | ID: mdl-36541008

ABSTRACT

The following sections are included: Introduction, Background, and Motivation, Workshop Presenters, References.


Subject(s)
Computational Biology , Humans
19.
Elife ; 11: 2022 07 08.
Article in English | MEDLINE | ID: mdl-35801699

ABSTRACT

Background: We recently developed a multi-ancestry polygenic risk score (PRS) that effectively stratifies prostate cancer risk across populations. In this study, we validated the performance of the PRS in the multi-ancestry Million Veteran Program and additional independent studies. Methods: Within each ancestry population, the association of PRS with prostate cancer risk was evaluated separately in each case-control study and then combined in a fixed-effects inverse-variance-weighted meta-analysis. We further assessed the effect modification by age and estimated the age-specific absolute risk of prostate cancer for each ancestry population. Results: The PRS was evaluated in 31,925 cases and 490,507 controls, including men from European (22,049 cases, 414,249 controls), African (8794 cases, 55,657 controls), and Hispanic (1082 cases, 20,601 controls) populations. Comparing men in the top decile (90-100% of the PRS) to the average 40-60% PRS category, the prostate cancer odds ratio (OR) was 3.8-fold in European ancestry men (95% CI = 3.62-3.96), 2.8-fold in African ancestry men (95% CI = 2.59-3.03), and 3.2-fold in Hispanic men (95% CI = 2.64-3.92). The PRS did not discriminate risk of aggressive versus nonaggressive prostate cancer. However, the OR diminished with advancing age (European ancestry men in the top decile: ≤55 years, OR = 7.11; 55-60 years, OR = 4.26; >70 years, OR = 2.79). Men in the top PRS decile reached 5% absolute prostate cancer risk ~10 years younger than men in the 40-60% PRS category. Conclusions: Our findings validate the multi-ancestry PRS as an effective prostate cancer risk stratification tool across populations. A clinical study of PRS is warranted to determine whether the PRS could be used for risk-stratified screening and early detection. 
Funding: This work was supported by the National Cancer Institute at the National Institutes of Health (grant numbers U19 CA214253 to C.A.H., U01 CA257328 to C.A.H., U19 CA148537 to C.A.H., R01 CA165862 to C.A.H., K99 CA246063 to B.F.D, and T32CA229110 to F.C), the Prostate Cancer Foundation (grants 21YOUN11 to B.F.D. and 20CHAS03 to C.A.H.), the Achievement Rewards for College Scientists Foundation Los Angeles Founder Chapter to B.F.D, and the Million Veteran Program-MVP017. This research has been conducted using the UK Biobank Resource under application number 42195. This research is based on data from the Million Veteran Program, Office of Research and Development, and the Veterans Health Administration. This publication does not represent the views of the Department of Veteran Affairs or the United States Government.


Subject(s)
Genome-Wide Association Study , Prostatic Neoplasms , Age Factors , Case-Control Studies , Genetic Predisposition to Disease , Humans , Male , Middle Aged , Multifactorial Inheritance , Prostatic Neoplasms/epidemiology , Prostatic Neoplasms/genetics , Risk Factors , United States/epidemiology
20.
Front Neuroinform ; 16: 752471, 2022.
Article in English | MEDLINE | ID: mdl-35651721

ABSTRACT

The anatomic validity of structural connectomes remains a significant uncertainty in neuroimaging. Edge-centric tractography reconstructs streamlines in bundles between each pair of cortical or subcortical regions. Although edge bundles provide a stronger anatomic embedding than traditional connectomes, calculating them for each region-pair requires exponentially greater computation. We observe that major speedup can be achieved by reducing the number of streamlines used by probabilistic tractography algorithms. To ensure this does not degrade connectome quality, we calculate the identifiability of edge-centric connectomes between test and re-test sessions as a proxy for information content. We find that running PROBTRACKX2 with as few as 1 streamline per voxel per region-pair has no significant impact on identifiability. Variation in identifiability caused by streamline count is overshadowed by variation due to subject demographics. This finding even holds true in an entirely different tractography algorithm using MRTrix. Incidentally, we observe that Jaccard similarity is more effective than Pearson correlation in calculating identifiability for our subject population.
