Results 1 - 20 of 48
1.
Am J Hum Genet ; 110(7): 1200-1206, 2023 07 06.
Article in English | MEDLINE | ID: mdl-37311464

ABSTRACT

Genome-wide polygenic risk scores (GW-PRSs) have been reported to have better predictive ability than PRSs based on genome-wide significance thresholds across numerous traits. We compared the predictive ability of several GW-PRS approaches to a recently developed PRS of 269 established prostate cancer-risk variants from multi-ancestry GWASs and fine-mapping studies (PRS269). GW-PRS models were trained with a large and diverse prostate cancer GWAS of 107,247 cases and 127,006 controls that we previously used to develop the multi-ancestry PRS269. Resulting models were independently tested in 1,586 cases and 1,047 controls of African ancestry from the California Uganda Study and 8,046 cases and 191,825 controls of European ancestry from the UK Biobank and further validated in 13,643 cases and 210,214 controls of European ancestry and 6,353 cases and 53,362 controls of African ancestry from the Million Veteran Program. In the testing data, the best performing GW-PRS approach had AUCs of 0.656 (95% CI = 0.635-0.677) in African and 0.844 (95% CI = 0.840-0.848) in European ancestry men and corresponding prostate cancer ORs of 1.83 (95% CI = 1.67-2.00) and 2.19 (95% CI = 2.14-2.25), respectively, for each SD unit increase in the GW-PRS. Compared to the GW-PRS, in African and European ancestry men, the PRS269 had larger or similar AUCs (AUC = 0.679, 95% CI = 0.659-0.700 and AUC = 0.845, 95% CI = 0.841-0.849, respectively) and comparable prostate cancer ORs (OR = 2.05, 95% CI = 1.87-2.26 and OR = 2.21, 95% CI = 2.16-2.26, respectively). Findings were similar in the validation studies. This investigation suggests that current GW-PRS approaches may not improve the ability to predict prostate cancer risk compared to the PRS269 developed from multi-ancestry GWASs and fine-mapping.


Subjects
Genetic Predisposition to Disease , Prostatic Neoplasms , Humans , Male , Black People/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Prostatic Neoplasms/genetics , Risk Factors , White People/genetics
2.
PLoS Genet ; 19(3): e1010623, 2023 03.
Article in English | MEDLINE | ID: mdl-36940203

ABSTRACT

Suicidal ideation (SI) often precedes and predicts suicide attempt and death, is the most common suicidal phenotype and is over-represented in veterans. The genetic architecture of SI in the absence of suicide attempt (SA) is unknown, yet believed to have distinct and overlapping risk with other suicidal behaviors. We performed the first GWAS of SI without SA in the Million Veteran Program (MVP), identifying 99,814 SI cases from electronic health records without a history of SA or suicide death (SD) and 512,567 controls without SI, SA or SD. GWAS was performed separately in the four largest ancestry groups, controlling for sex, age and genetic substructure. Ancestry-specific results were combined via meta-analysis to identify pan-ancestry loci. Four genome-wide significant (GWS) loci were identified in the pan-ancestry meta-analysis with loci on chromosomes 6 and 9 associated with suicide attempt in an independent sample. Pan-ancestry gene-based analysis identified GWS associations with DRD2, DCC, FBXL19, BCL7C, CTF1, ANKK1, and EXD3. Gene-set analysis implicated synaptic and startle response pathways (q's<0.05). European ancestry (EA) analysis identified GWS loci on chromosomes 6 and 9, as well as GWS gene associations in EXD3, DRD2, and DCC. No other ancestry-specific GWS results were identified, underscoring the need to increase representation of diverse individuals. The genetic correlation of SI and SA within MVP was high (rG = 0.87; p = 1.09e-50), as well as with post-traumatic stress disorder (PTSD; rG = 0.78; p = 1.98e-95) and major depressive disorder (MDD; rG = 0.78; p = 8.33e-83). Conditional analysis on PTSD and MDD attenuated most pan-ancestry and EA GWS signals for SI without SA to nominal significance, with the exception of EXD3 which remained GWS. Our novel findings support a polygenic and complex architecture for SI without SA which is largely shared with SA and overlaps with psychiatric conditions frequently comorbid with suicidal behaviors.
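
The abstract states that ancestry-specific results were combined via meta-analysis but does not name the method. As a minimal sketch, assuming a fixed-effect inverse-variance-weighted model (a common choice for GWAS meta-analysis, not confirmed by this record), per-ancestry effect estimates for a single variant could be pooled as below. The function name and every number are hypothetical and purely illustrative.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-group effect sizes with inverse-variance weights.

    Assumption: fixed-effect model; the abstract does not specify the method.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical log-odds ratios and standard errors for one variant
# in four ancestry groups (illustrative values, not from the study).
betas = [0.050, 0.042, 0.061, 0.038]
ses = [0.010, 0.025, 0.040, 0.055]

beta, se, z, p = fixed_effect_meta(betas, ses)
print(f"pooled beta = {beta:.4f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

Because each weight is the inverse of a squared standard error, the largest group dominates the pooled estimate, and the pooled standard error is smaller than that of any single group.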


Subjects
Major Depressive Disorder , Veterans , Humans , Suicidal Ideation , Veterans/psychology , Genome-Wide Association Study , Major Depressive Disorder/genetics , Attempted Suicide/psychology , Risk Factors
3.
PLoS Genet ; 18(4): e1010113, 2022 04.
Article in English | MEDLINE | ID: mdl-35482673

ABSTRACT

The study aims to determine the shared genetic architecture between COVID-19 severity and existing medical conditions using electronic health record (EHR) data. We conducted a Phenome-Wide Association Study (PheWAS) of genetic variants associated with critical illness (n = 35) or hospitalization (n = 42) due to severe COVID-19 using genome-wide association summary data from the Host Genetics Initiative. PheWAS analysis was performed using genotype-phenotype data from the Veterans Affairs Million Veteran Program (MVP). Phenotypes were defined by International Classification of Diseases (ICD) codes mapped to clinically relevant groups using published PheWAS methods. Among 658,582 Veterans, variants associated with severe COVID-19 were tested for association across 1,559 phenotypes. Variants at the ABO locus (rs495828, rs505922) were associated with the largest number of phenotypes (n = 53 for rs495828 and n = 59 for rs505922); the strongest associations were with venous embolism (rs495828: OR 1.33, p = 1.32 × 10^-199) and thrombosis (rs505922: OR 1.33, p = 2.2 × 10^-265). Among 67 respiratory conditions tested, 11 had significant associations, including the MUC5B locus (rs35705950) with increased risk of idiopathic fibrosing alveolitis (OR 2.83, p = 4.12 × 10^-191) and CRHR1 (rs61667602) with reduced risk of pulmonary fibrosis (OR 0.84, p = 2.26 × 10^-12). The TYK2 locus (rs11085727) was associated with reduced risk for autoimmune conditions, e.g., psoriasis (OR 0.88, p = 6.48 × 10^-23) and lupus (OR 0.84, p = 3.97 × 10^-6). PheWAS stratified by ancestry demonstrated differences in genotype-phenotype associations. LMNA (rs581342) was associated with neutropenia (OR 1.29, p = 4.1 × 10^-13) among Veterans of African and Hispanic ancestry but not European ancestry. Overall, we observed a shared genetic architecture between COVID-19 severity and conditions related to underlying risk factors for severe and poor COVID-19 outcomes. Differing genotype-phenotype associations across ancestries may inform the heterogeneous outcomes observed with COVID-19. Divergent associations between risk for severe COVID-19 and autoimmune inflammatory conditions, both respiratory and non-respiratory, highlight the shared pathways and fine balance of immune host response and autoimmunity, and the caution required when considering treatment targets.


Subjects
COVID-19 , Veterans , COVID-19/epidemiology , COVID-19/genetics , Genetic Association Studies , Genome-Wide Association Study/methods , Humans , Single Nucleotide Polymorphism/genetics
4.
Mol Psychiatry ; 27(4): 2264-2272, 2022 04.
Article in English | MEDLINE | ID: mdl-35347246

ABSTRACT

To identify pan-ancestry and ancestry-specific loci associated with attempting suicide among veterans, we conducted a genome-wide association study (GWAS) of suicide attempts within a large, multi-ancestry cohort of U.S. veterans enrolled in the Million Veteran Program (MVP). Cases were defined as veterans with a documented history of suicide attempts in the electronic health record (EHR; N = 14,089) and controls were defined as veterans with no documented history of suicidal thoughts or behaviors in the EHR (N = 395,064). GWAS was performed separately in each ancestry group, controlling for sex, age and genetic substructure. Pan-ancestry risk loci were identified through meta-analysis and included two genome-wide significant loci on chromosomes 20 (p = 3.64 × 10^-9) and 1 (p = 3.69 × 10^-8). A strong pan-ancestry signal at the Dopamine Receptor D2 locus (p = 1.77 × 10^-7) was also identified and subsequently replicated in a large, independent international civilian cohort (p = 7.97 × 10^-4). Ancestry-specific genome-wide significant loci were also detected in African-Americans, European-Americans, Asian-Americans, and Hispanic-Americans. Pathway analyses suggested over-representation of many biological pathways with high clinical significance, including oxytocin signaling, glutamatergic synapse, cortisol synthesis and secretion, dopaminergic synapse, and circadian rhythm. These findings confirm that the genetic architecture underlying suicide attempt risk is complex and includes both pan-ancestry and ancestry-specific risk loci. Moreover, pathway analyses suggested many commonly impacted biological pathways that could inform development of improved therapeutics for suicide prevention.


Subjects
Genome-Wide Association Study , Veterans , Black or African American/genetics , Genetic Loci , Genetic Predisposition to Disease/genetics , Humans , Single Nucleotide Polymorphism/genetics , Attempted Suicide , White People/genetics
5.
Am J Respir Crit Care Med ; 206(10): 1220-1229, 2022 11 15.
Article in English | MEDLINE | ID: mdl-35771531

ABSTRACT

Rationale: A common MUC5B gene polymorphism, rs35705950-T, is associated with idiopathic pulmonary fibrosis (IPF), but its role in severe acute respiratory syndrome coronavirus 2 infection and disease severity is unclear. Objectives: To assess whether rs35705950-T confers differential risk for clinical outcomes associated with coronavirus disease (COVID-19) infection among participants in the Million Veteran Program (MVP). Methods: The MUC5B rs35705950-T allele was directly genotyped among MVP participants; clinical events and comorbidities were extracted from the electronic health records. Associations between the incidence or severity of COVID-19 and rs35705950-T were analyzed within each ancestry group in the MVP followed by transancestry meta-analysis. Replication and joint meta-analysis were conducted using summary statistics from the COVID-19 Host Genetics Initiative (HGI). Sensitivity analyses with adjustment for additional covariates (body mass index, Charlson comorbidity index, smoking, asbestosis, rheumatoid arthritis with interstitial lung disease, and IPF) and associations with post-COVID-19 pneumonia were performed in MVP subjects. Measurements and Main Results: The rs35705950-T allele was associated with fewer COVID-19 hospitalizations in transancestry meta-analyses within the MVP (Ncases = 4,325; Ncontrols = 507,640; OR = 0.89 [0.82-0.97]; P = 6.86 × 10^-3) and joint meta-analyses with the HGI (Ncases = 13,320; Ncontrols = 1,508,841; OR, 0.90 [0.86-0.95]; P = 8.99 × 10^-5). The rs35705950-T allele was not associated with reduced COVID-19 positivity in transancestry meta-analysis within the MVP (Ncases = 19,168; Ncontrols = 492,854; OR, 0.98 [0.95-1.01]; P = 0.06) but was nominally significant (P < 0.05) in the joint meta-analysis with the HGI (Ncases = 44,820; Ncontrols = 1,775,827; OR, 0.97 [0.95-1.00]; P = 0.03). Associations were not observed with severe outcomes or mortality.
Among individuals of European ancestry in the MVP, rs35705950-T was associated with fewer post-COVID-19 pneumonia events (OR, 0.82 [0.72-0.93]; P = 0.001). Conclusions: The MUC5B variant rs35705950-T may confer protection in COVID-19 hospitalizations.


Subjects
COVID-19 , Idiopathic Pulmonary Fibrosis , Humans , COVID-19/epidemiology , COVID-19/genetics , Mucin-5B/genetics , Genetic Polymorphism , Idiopathic Pulmonary Fibrosis/genetics , Genotype , Hospitalization , Genetic Predisposition to Disease/genetics
6.
BMC Genomics ; 20(1): 984, 2019 Dec 16.
Article in English | MEDLINE | ID: mdl-31842752

ABSTRACT

BACKGROUND: Pseudomonas aeruginosa (PA) is an opportunistic Gram-negative bacterium that causes serious life-threatening and nosocomial infections, including pneumonia. PA has the ability to alter the host genome to facilitate its invasion, thus increasing the virulence of the organism. Sphingosine-1-phosphate (S1P), a bioactive lipid, is known to play a key role in facilitating infection. Sphingosine kinases (SPHK) 1 and 2 phosphorylate sphingosine to generate S1P in mammalian cells. We reported earlier that Sphk2-/- mice offered significant protection against lung inflammation, compared to wild type (WT) animals. Therefore, we profiled the differential expression of genes between the protected group of Sphk2-/- and the wild type controls to better understand the underlying protective mechanisms related to the Sphk2 deletion in lung inflammatory injury. Whole transcriptome shotgun sequencing (RNA-Seq) was performed on mouse lung tissue using the NextSeq 500 sequencing system. RESULTS: Two-way analysis of variance (ANOVA) was performed, and differentially expressed genes following PA infection were identified using the whole transcriptomes of Sphk2-/- mice and their WT counterparts. Pathway (PW) enrichment analyses of the RNA-Seq data identified several signaling pathways that are likely to play a crucial role in pneumonia caused by PA, such as those involved in: 1. Immune response to PA infection and NF-κB signal transduction; 2. PKC signal transduction; 3. Impact on epigenetic regulation; 4. Epithelial sodium channel pathway; 5. Mucin expression; and 6. Bacterial infection related pathways. Our genomic data suggest a potential role for SPHK2 in PA-induced pneumonia through elevated expression of inflammatory genes in lung tissue. Further, validation by RT-PCR on 10 differentially expressed genes showed 100% concordance in terms of vectoral changes as well as significant fold change.
CONCLUSION: Using Sphk2-/- mice and differential gene expression analysis, we have shown here that S1P/SPHK2 signaling could play a key role in promoting PA pneumonia. The identified genes promote inflammation and suppress others that naturally inhibit inflammation and host defense. Thus, targeting SPHK2/S1P signaling in PA-induced lung inflammation could serve as a potential therapy to combat PA-induced pneumonia.


Subjects
Gene Deletion , Gene Expression Profiling/methods , Gene Regulatory Networks , Phosphotransferases (Alcohol Group Acceptor)/genetics , Pseudomonas Infections/prevention & control , Pseudomonas aeruginosa/pathogenicity , Analysis of Variance , Animals , Animal Disease Models , Female , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Lung/immunology , Lung/microbiology , Mice , Pseudomonas Infections/genetics , Pseudomonas Infections/immunology , RNA-Seq , Virulence
7.
J Biomed Inform ; 71: 49-57, 2017 07.
Article in English | MEDLINE | ID: mdl-28501646

ABSTRACT

The volume and diversity of data in biomedical research have been rapidly increasing in recent years. While such data hold significant promise for accelerating discovery, their use entails many challenges including: the need for adequate computational infrastructure, secure processes for data sharing and access, tools that allow researchers to find and integrate diverse datasets, and standardized methods of analysis. These are just some elements of a complex ecosystem that needs to be built to support the rapid accumulation of these data. The NIH Big Data to Knowledge (BD2K) initiative aims to facilitate digitally enabled biomedical research. Within the BD2K framework, the Commons initiative is intended to establish a virtual environment that will facilitate the use, interoperability, and discoverability of shared digital objects used for research. The BD2K Commons Framework Pilots Working Group (CFPWG) was established to clarify goals and work on pilot projects that address existing gaps toward realizing the vision of the BD2K Commons. This report reviews highlights from a two-day meeting involving the BD2K CFPWG to provide insights on trends and considerations in advancing Big Data science for biomedical research in the United States.


Subjects
Datasets as Topic , Information Dissemination , National Institutes of Health (U.S.) , Biomedical Research , Humans , Knowledge , Translational Biomedical Research , United States
8.
Graefes Arch Clin Exp Ophthalmol ; 255(8): 1613-1619, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28462455

ABSTRACT

PURPOSE: Retinitis pigmentosa (RP) is a genetically heterogeneous inherited retinal dystrophy. To date, over 80 genes have been implicated in RP. However, the disease demonstrates significant locus and allelic heterogeneity not entirely captured by current testing platforms. The purpose of the present study was to characterize the underlying mutation in a patient with RP without a molecular diagnosis after initial genetic testing. METHODS: Whole-exome sequencing of the affected proband was performed. Candidate gene mutations were selected based on adherence to expected genetic inheritance pattern and predicted pathogenicity. Sanger sequencing of MERTK was completed on the patient's unaffected mother, affected brother, and unaffected sister to determine genetic phase. RESULTS: Eight sequence variants were identified in the proband in known RP-associated genes. Sequence analysis revealed that the proband was a compound heterozygote with two independent mutations in MERTK, a novel nonsense mutation (c.2179C > T) and a previously reported missense variant (c.2530C > T). The proband's affected brother also had both mutations. Predicted phase was confirmed in unaffected family members. CONCLUSION: Our study identifies a novel nonsense mutation in MERTK in a family with RP and no prior molecular diagnosis. The present study also demonstrates the clinical value of exome sequencing in determining the genetic basis of Mendelian diseases when standard genetic testing is unsuccessful.


Subjects
DNA/genetics , Mutation , Retinitis Pigmentosa/genetics , c-Mer Tyrosine Kinase/genetics , DNA Mutational Analysis , Exome , Female , Humans , Male , Ophthalmoscopy , Pedigree , Retina/pathology , Retinitis Pigmentosa/diagnosis , Retinitis Pigmentosa/metabolism , c-Mer Tyrosine Kinase/metabolism
9.
Biophys J ; 110(5): 1038-43, 2016 Mar 08.
Article in English | MEDLINE | ID: mdl-26958881

ABSTRACT

We describe the ways in which Galaxy, a web-based reproducible research platform, can be used for web-based sharing of complex computational models. Galaxy allows users to seamlessly customize and run simulations on cloud computing resources, a concept we refer to as Models and Simulations as a Service (MaSS). To illustrate this application of Galaxy, we have developed a tool suite for simulating a high spatial-resolution model of the cardiac Ca(2+) spark that requires supercomputing resources for execution. We also present tools for simulating models encoded in the SBML and CellML model description languages, thus demonstrating how Galaxy's reproducible research features can be leveraged by existing technologies. Finally, we demonstrate how the Galaxy workflow editor can be used to compose integrative models from constituent submodules. This work represents an important novel approach, to our knowledge, to making computational simulations more accessible to the broader scientific community.


Subjects
Computer Simulation , Internet , Software , Animals , Axons/physiology , Calcium/metabolism , Calcium Signaling , Decapodiformes
10.
Bioinformatics ; 31(2): 187-93, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25270638

ABSTRACT

MOTIVATION: The development of cost-effective next-generation sequencing methods has spurred the development of high-throughput bioinformatics tools for detection of sequence variation. With many disparate variant-calling algorithms available, investigators must ask, 'Which method is best for my data?' Machine learning research has shown that so-called ensemble methods that combine the output of multiple models can dramatically improve classifier performance. Here we describe a novel variant-calling approach based on an ensemble of variant-calling algorithms, which we term the Consensus Genotyper for Exome Sequencing (CGES). CGES uses a two-stage voting scheme among four algorithm implementations. While our ensemble method can accept variants generated by any variant-calling algorithm, we used GATK2.8, SAMtools, FreeBayes and Atlas-SNP2 in building CGES because of their performance, widespread adoption and diverse but complementary algorithms. RESULTS: We apply CGES to 132 samples sequenced at the Hudson Alpha Institute for Biotechnology (HAIB, Huntsville, AL) using the Nimblegen Exome Capture and Illumina sequencing technology. Our sample set consisted of 40 complete trios, two families of four, one parent-child duo and two unrelated individuals. CGES yielded the fewest total variant calls (N(CGES) = 139,897), the highest Ts/Tv ratio (3.02), the lowest Mendelian error rate across all genotypes (0.028%), the highest rediscovery rate from the Exome Variant Server (EVS; 89.3%) and 1000 Genomes (1KG; 84.1%) and the highest positive predictive value (PPV; 96.1%) for a random sample of previously validated de novo variants.
We describe these and other quality control (QC) metrics from consensus data and explain how the CGES pipeline can be used to generate call sets of varying quality stringency, including consensus calls present across all four algorithms, calls that are consistent across any three out of four algorithms, calls that are consistent across any two out of four algorithms or a more liberal set of all calls made by any algorithm. AVAILABILITY AND IMPLEMENTATION: To enable accessible, efficient and reproducible analysis, we implement CGES both as a stand-alone command line tool available for download in GitHub and as a set of Galaxy tools and workflows configured to execute on parallel computers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
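
The call sets of varying stringency described above (all four callers, any three, any two, or any one) can be illustrated with a simple support count. This is a minimal sketch, not the CGES implementation: it ignores the two-stage voting scheme and genotype-level agreement, and the function name, variant tuples, and caller outputs are hypothetical toy data.

```python
from collections import Counter


def consensus_calls(call_sets, min_callers):
    """Return the variants reported by at least `min_callers` callers.

    `call_sets` maps a caller name to a set of variant keys
    (chromosome, position, reference allele, alternate allele).
    """
    support = Counter(v for calls in call_sets.values() for v in set(calls))
    return {v for v, n in support.items() if n >= min_callers}


# Hypothetical variants (toy data, not from the study).
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 200, "C", "T")
v3 = ("chr2", 50, "G", "A")
v4 = ("chr3", 75, "T", "C")

# Hypothetical output of the four callers named in the abstract.
calls = {
    "GATK": {v1, v2, v3},
    "SAMtools": {v1, v2},
    "FreeBayes": {v1, v2, v4},
    "Atlas-SNP2": {v1, v3},
}

for k in (4, 3, 2, 1):
    print(f"supported by >= {k} callers: {len(consensus_calls(calls, k))} variants")
```

Raising `min_callers` trades sensitivity for precision, which is consistent with the abstract's report that the consensus set had the fewest calls but the strongest quality metrics.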


Subjects
Algorithms , Autistic Disorder/genetics , Exome/genetics , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism/genetics , Software , Consensus Sequence , Statistical Data Interpretation , Genetic Testing , Genotype , Humans
11.
Am J Med Genet B Neuropsychiatr Genet ; 171(4): 534-45, 2016 06.
Article in English | MEDLINE | ID: mdl-26990047

ABSTRACT

Recent studies show that human-specific LINE1s (L1HS) play a key role in the development of the central nervous system (CNS) and its disorders, and that their transpositions within the human genome are more common than previously thought. Many polymorphic L1HS, that is, present or absent across individuals, are not annotated in the current release of the genome and are customarily termed "non-reference L1s." We developed an analytical workflow to identify L1 polymorphic insertions with next-generation sequencing (NGS) using data from a family in which schizophrenia (SZ) segregates. Our workflow exploits two independent algorithms to detect non-reference L1 insertions, performs local de novo alignment of the regions harboring predicted L1 insertions and resolves the L1 subfamily designation from the de novo assembled sequence. We found 110 non-reference L1 polymorphic loci exhibiting Mendelian inheritance, the vast majority of which are already reported in dbRIP and/or euL1db, thus confirming their status as non-reference L1 polymorphic insertions. Four previously undetected L1 polymorphic loci were confirmed by PCR amplification and direct sequencing of the insert. A large fraction of our non-reference L1s is located within the open reading frame of protein-coding genes that belong to pathways already implicated in the pathogenesis of schizophrenia. The finding of these polymorphic variants among SZ offspring is intriguing and suggestive of a putative pathogenic role. Our data show the utility of NGS to uncover L1 polymorphic insertions, a neglected type of genetic variant with the potential to influence the risk of developing schizophrenia, as SNVs and CNVs do. © 2016 Wiley Periodicals, Inc.


Subjects
Long Interspersed Nucleotide Elements , Schizophrenia/genetics , Adult , Female , Genetic Predisposition to Disease , Genomics , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Insertional Mutagenesis , Open Reading Frames , Pedigree , Genetic Polymorphism , Risk Factors , DNA Sequence Analysis
12.
J Biomed Inform ; 49: 119-33, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24462600

ABSTRACT

With the coming deluge of genome data, storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently all present significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach.


Subjects
Computational Biology , Information Storage and Retrieval , Sequence Analysis/instrumentation
13.
Nat Biomed Eng ; 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38514775

ABSTRACT

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.

14.
Eur Urol Oncol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38171965

ABSTRACT

BACKGROUND: An electronic health record-based tool could improve accuracy and eliminate bias in provider estimation of the risk of death from other causes among men with nonmetastatic cancer. OBJECTIVE: To recalibrate and validate the Veterans Aging Cohort Study Charlson Comorbidity Index (VACS-CCI) to predict non-prostate cancer mortality (non-PCM) and to compare it with a tool predicting prostate cancer mortality (PCM). DESIGN, SETTING, AND PARTICIPANTS: An observational cohort of men with biopsy-confirmed nonmetastatic prostate cancer, enrolled from 2001 to 2018 in the national US Veterans Health Administration (VA), was divided by the year of diagnosis into the development (2001-2006 and 2008-2018) and validation (2007) sets. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Mortality (all cause, non-PCM, and PCM) was evaluated. Accuracy was assessed using calibration curves and C statistic in the development, validation, and combined sets; overall; and by age (<65 and 65+ yr), race (White and Black), Hispanic ethnicity, and treatment groups. RESULTS AND LIMITATIONS: Among 107 370 individuals, we observed 24 977 deaths (86% non-PCM). The median age was 65 yr, 4947 were Black, and 5010 were Hispanic. Compared with CCI and age alone (C statistic 0.67, 95% confidence interval [CI] 0.67-0.68), VACS-CCI demonstrated improved validated discrimination (C statistic 0.75, 95% CI 0.74-0.75 for non-PCM). The prostate cancer mortality tool also discriminated well in validation (C statistic 0.81, 95% CI 0.78-0.83). Both were well calibrated overall and within subgroups. Owing to missing data, 18 009/125 379 (14%) were excluded, and VACS-CCI should be validated outside the VA prior to outside application. CONCLUSIONS: VACS-CCI is ready for implementation within the VA. Electronic health record-assisted calculation is feasible, improves accuracy over age and CCI alone, and could mitigate inaccuracy and bias in provider estimation. 
PATIENT SUMMARY: Veterans Aging Cohort Study Charlson Comorbidity Index is ready for application within the Veterans Health Administration. Electronic health record-assisted calculation is feasible, improves accuracy over age and Charlson Comorbidity Index alone, and might help mitigate inaccuracy and bias in provider estimation of the risk of non-prostate cancer mortality.

15.
Neuroinformatics ; 22(2): 177-191, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38446357

ABSTRACT

Large-scale diffusion MRI tractography remains a significant challenge. Users must orchestrate a complex sequence of instructions that requires many software packages with complex dependencies and high computational costs. We developed MaPPeRTrac, an edge-centric tractography pipeline that simplifies and accelerates this process in a wide range of high-performance computing (HPC) environments. It fully automates either probabilistic or deterministic tractography, starting from a subject's magnetic resonance imaging (MRI) data, including structural and diffusion MRI images, to the edge density image (EDI) of their structural connectomes. Dependencies are containerized with Singularity (now called Apptainer) and decoupled from code to enable rapid prototyping and modification. Data derivatives are organized with the Brain Imaging Data Structure (BIDS) to ensure that they are findable, accessible, interoperable, and reusable following FAIR principles. The pipeline takes full advantage of HPC resources using the Parsl parallel programming framework, resulting in the creation of connectome datasets of unprecedented size. MaPPeRTrac is publicly available and tested on commercial and scientific hardware, so it can accelerate brain connectome research for a broader user community. MaPPeRTrac is available at: https://github.com/LLNL/mappertrac .


Subjects
Connectome , Magnetic Resonance Imaging , Magnetic Resonance Imaging/methods , Diffusion Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Connectome/methods
16.
bioRxiv ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38826407

ABSTRACT

The expansion of biobanks has significantly propelled genomic discoveries yet the sheer scale of data within these repositories poses formidable computational hurdles, particularly in handling extensive matrix operations required by prevailing statistical frameworks. In this work, we introduce computational optimizations to the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Model) algorithm, notably employing a GPU-based distributed computing approach to tackle these challenges. We applied these optimizations to conduct a large-scale genome-wide association study (GWAS) across 2,068 phenotypes derived from electronic health records of 635,969 diverse participants from the Veterans Affairs (VA) Million Veteran Program (MVP). Our strategies enabled scaling up the analysis to over 6,000 nodes on the Department of Energy (DOE) Oak Ridge Leadership Computing Facility (OLCF) Summit High-Performance Computer (HPC), resulting in a 20-fold acceleration compared to the baseline model. We also provide a Docker container with our optimizations that was successfully used on multiple cloud infrastructures on UK Biobank and All of Us datasets where we showed significant time and cost benefits over the baseline SAIGE model.

17.
HGG Adv ; : 100315, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38845201

ABSTRACT

Deciphering the genetic basis of prostate-specific antigen (PSA) levels may improve their utility for prostate cancer (PCa) screening. Using genome-wide summary statistics from 95,768 PCa-free men, we conducted a transcriptome-wide association study (TWAS) to examine impacts of genetically predicted gene expression on PSA. Analyses identified 41 statistically significant (p < 0.05/12,192 = 4.10 × 10⁻⁶) associations in whole blood and 39 statistically significant (p < 0.05/13,844 = 3.61 × 10⁻⁶) associations in prostate tissue, with 18 genes associated in both tissues. Cross-tissue analyses identified 155 statistically significant (p < 0.05/22,249 = 2.25 × 10⁻⁶) genes. Out of 173 unique PSA-associated genes across analyses, we replicated 151 (87.3%) in TWAS of 209,318 PCa-free individuals from the Million Veteran Program. Based on conditional analyses, we found 20 genes (11 single-tissue, nine cross-tissue) that were associated with PSA levels in the discovery TWAS that were not attributable to a lead variant from a genome-wide association study (GWAS). Ten of these 20 genes replicated, and two of the replicated genes had colocalization probability > 0.5: CCNA2 and HIST1H2BN. Six of the 20 identified genes are not known to impact PCa risk. Fine mapping based on whole blood and prostate tissue revealed five protein-coding genes with evidence of causal relationships with PSA levels. Of these five genes, four exhibited evidence of colocalization and one was conditionally independent of previous GWAS findings. These results yield hypotheses that should be further explored to improve understanding of genetic factors underlying PSA levels.
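
The significance thresholds quoted in this abstract have the form of a Bonferroni correction: a family-wise alpha of 0.05 divided by a number of tests. The short Python sketch below reproduces that arithmetic and the replication percentage. The helper name `bonferroni_threshold` and the dictionary keys are illustrative, and reading each denominator as the number of genes tested in that analysis is an assumption; the abstract gives only the numbers.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Denominators as quoted in the abstract (assumed to be numbers of genes tested).
n_tests_by_analysis = {
    "whole blood": 12_192,
    "prostate tissue": 13_844,
    "cross-tissue": 22_249,
}

# Family-wise alpha of 0.05, as quoted in the abstract.
thresholds = {
    analysis: bonferroni_threshold(0.05, n)
    for analysis, n in n_tests_by_analysis.items()
}

for analysis, threshold in thresholds.items():
    print(f"{analysis}: p < {threshold:.2e}")

# Replication rate: 151 of 173 unique PSA-associated genes.
replication_rate = 151 / 173
print(f"replicated: {replication_rate:.1%}")
```

Formatted to three significant figures, the computed thresholds match the values quoted in the abstract (4.10 × 10⁻⁶, 3.61 × 10⁻⁶, 2.25 × 10⁻⁶), and the replication rate rounds to 87.3%.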

18.
Pac Symp Biocomput ; 28: 541-545, 2023.
Article in English | MEDLINE | ID: mdl-36541008

ABSTRACT

The following sections are included: Introduction, Background, and Motivation, Workshop Presenters, References.


Subjects
Computational Biology , Humans
19.
bioRxiv ; 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-36711711

ABSTRACT

Data scarcity presents a significant obstacle in the field of biomedicine, where acquiring diverse and sufficient datasets can be costly and challenging. Synthetic data generation offers a potential solution to this problem by expanding dataset sizes, thereby enabling the training of more robust and generalizable machine learning models. Although previous studies have explored synthetic data generation for cancer diagnosis, they have predominantly focused on single modality settings, such as whole-slide image tiles or RNA-Seq data. To bridge this gap, we propose a novel approach, RNA-Cascaded-Diffusion-Model or RNA-CDM, for performing RNA-to-image synthesis in a multi-cancer context, drawing inspiration from successful text-to-image synthesis models used in natural images. In our approach, we employ a variational auto-encoder to reduce the dimensionality of a patient's gene expression profile, effectively distinguishing between different types of cancer. Subsequently, we employ a cascaded diffusion model to synthesize realistic whole-slide image tiles using the latent representation derived from the patient's RNA-Seq data. Our results demonstrate that the generated tiles accurately preserve the distribution of cell types observed in real-world data, with state-of-the-art cell identification models successfully detecting important cell types in the synthetic samples. Furthermore, we illustrate that the synthetic tiles maintain the cell fraction observed in bulk RNA-Seq data and that modifications in gene expression affect the composition of cell types in the synthetic tiles. Next, we utilize the synthetic data generated by RNA-CDM to pretrain machine learning models and observe improved performance compared to training from scratch. 
Our study emphasizes the potential usefulness of synthetic data in developing machine learning models in scarce-data settings, while also highlighting the possibility of imputing missing data modalities by leveraging the available information. In conclusion, our proposed RNA-CDM approach for synthetic data generation in biomedicine, particularly in the context of cancer diagnosis, offers a novel and promising solution to address data scarcity. By generating synthetic data that aligns with real-world distributions and leveraging it to pretrain machine learning models, we contribute to the development of robust clinical decision support systems and potential advancements in precision medicine.

20.
Phys Med Biol ; 68(7)2023 03 23.
Article in English | MEDLINE | ID: mdl-36716497

ABSTRACT

Objective. Developing machine learning models (N Gorre et al 2023) for clinical applications from scratch can be a cumbersome task requiring varying levels of expertise. Seasoned developers and researchers may also often face incompatible frameworks and data preparation issues. This is further complicated in the context of diagnostic radiology and oncology applications, given the heterogeneous nature of the input data and the specialized task requirements. Our goal is to provide clinicians, researchers, and early AI developers with a modular, flexible, and user-friendly software tool that can effectively meet their needs to explore, train, and test AI algorithms by allowing users to interpret their model results. This latter step involves the incorporation of interpretability and explainability methods that would allow visualizing performance as well as interpreting predictions across the different neural network layers of a deep learning algorithm. Approach. To demonstrate our proposed tool, we have developed the CRP10 AI Application Interface (CRP10AII) as part of the MIDRC consortium. CRP10AII is based on the Django web framework in Python. Combined with a data commons platform such as Gen3 for data management, CRP10AII can provide a comprehensive yet easy-to-use machine/deep learning analytics tool. The tool allows users to test, visualize, and interpret how and why the deep learning model is performing. The major highlight of CRP10AII is its capability of visualization and interpretability of otherwise black-box AI algorithms. Results. CRP10AII provides many convenient features for model building and evaluation, including: (1) querying and acquiring data according to the specific application (e.g. classification, segmentation) from the data commons platform (Gen3 here); (2) training AI models from scratch or using pre-trained models (e.g. VGGNet, AlexNet, BERT) for transfer learning, and testing model predictions through performance assessment and receiver operating characteristic curve evaluation; (3) interpreting the AI model predictions using methods such as Shapley values and LIME; and (4) visualizing the model learning through heatmaps and activation maps of individual layers of the neural network. Significance. Inexperienced users can swiftly pre-process, build, and train AI models for their own use cases, and further visualize and explore these models as part of this pipeline, all in an end-to-end manner. CRP10AII will be provided as an open-source tool, and we expect to continue developing it based on users' feedback.
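
The abstract lists receiver operating characteristic curve evaluation among the tool's features. As a rough illustration of what such an evaluation computes, the sketch below derives the area under the ROC curve (AUC) from its pairwise-ranking definition in plain Python. It is not CRP10AII code: the function name `roc_auc` and the toy labels and scores are hypothetical, and a real pipeline would more likely rely on an established library routine.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical data): 2 negatives and 2 positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.2f}")  # 3 of 4 positive/negative pairs are ranked correctly
```

The quadratic pairwise loop is chosen for readability; for large test sets a sort-based rank formulation gives the same value in O(n log n).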


Subjects
Algorithms , Neural Networks, Computer , Software , Machine Learning , ROC Curve