Results 1 - 20 of 65
1.
BMC Bioinformatics ; 25(1): 147, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605284

ABSTRACT

BACKGROUND: Expression quantitative trait locus (eQTL) analysis aims to detect the genetic variants that influence the expression of one or more genes. Gene-level eQTL testing forms a natural grouped-hypothesis testing strategy with clear biological importance. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not be powerful or easily apply to eQTL data, for which certain structured alternatives may be defensible and may enable the researcher to avoid overly conservative approaches. RESULTS: In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypotheses. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. The heterogeneity of effect sizes in different groups is considered by the introduction of a random effects component. Our method, entitled Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding. As a convenient alternate approach, we also propose Z-REG-FDR, an approximate version of REG-FDR, that uses only Z-statistics of association between genotype and expression for each gene-SNP pair. The performance of Z-REG-FDR is evaluated using both simulated and real data. Simulations demonstrate that Z-REG-FDR performs similarly to REG-FDR, but with much improved computational speed. CONCLUSION: Our results demonstrate that the Z-REG-FDR method performs favorably compared to other methods in terms of statistical power and control of FDR. It can be of great practical use for grouped hypothesis testing for eQTL analysis or similar problems in statistical genomics due to its fast computation and ability to be fit using only summary data.
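The idea of controlling group-level FDR by adaptive thresholding in an empirical Bayesian setting can be illustrated with a minimal sketch. This is not REG-FDR or Z-REG-FDR itself: it assumes that a posterior null probability (local FDR) has already been estimated for each gene by some model, and it applies the generic rule of rejecting the largest set of genes whose average local FDR stays at or below the target level. The function name and the example values are hypothetical.

```python
import numpy as np


def adaptive_lfdr_threshold(lfdr, alpha=0.05):
    """Return a boolean mask of rejected groups (e.g. genes).

    lfdr  : per-group posterior probabilities that the null is true
            (assumed to come from a previously fitted model).
    alpha : target false discovery rate.
    """
    lfdr = np.asarray(lfdr, dtype=float)
    order = np.argsort(lfdr)  # most promising groups first
    # Average local FDR of the k smallest values estimates the FDR
    # incurred by rejecting those k groups.
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    rejected = np.zeros(lfdr.size, dtype=bool)
    if passing.size:
        k = passing[-1] + 1  # largest rejection set meeting the target
        rejected[order[:k]] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical local FDR values for five genes.
    print(adaptive_lfdr_threshold([0.01, 0.02, 0.30, 0.90, 0.04]))
```

The threshold is adaptive in the sense that the cut-off on the local FDR is not fixed in advance but depends on how many groups carry strong evidence in the data at hand.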


Subject(s)
Genomics; Quantitative Trait Loci; Computer Simulation; Bayes Theorem; Genotype
2.
Hepatology ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38536042

ABSTRACT

BACKGROUND AND AIMS: It is not known why severe cystic fibrosis (CF) liver disease (CFLD) with portal hypertension occurs in only ~7% of people with CF. We aimed to identify genetic modifiers for severe CFLD to improve understanding of disease mechanisms. APPROACH AND RESULTS: Whole-genome sequencing was available in 4082 people with CF with pancreatic insufficiency (n = 516 with severe CFLD; n = 3566 without CFLD). We tested ~15.9 million single nucleotide polymorphisms (SNPs) for association with severe CFLD versus no-CFLD, using pre-modulator clinical phenotypes including (1) genetic variant (SERPINA1; Z allele) previously associated with severe CFLD; (2) candidate SNPs (n = 205) associated with non-CF liver diseases; (3) genome-wide association study of common/rare SNPs; (4) transcriptome-wide association; and (5) gene-level and pathway analyses. The Z allele was significantly associated with severe CFLD (p = 1.1 × 10⁻⁴). No significant candidate SNPs were identified. A genome-wide association study identified genome-wide significant SNPs in 2 loci and 2 suggestive loci. These 4 loci contained genes [significant, PKD1 (p = 8.05 × 10⁻¹⁰) and FNBP1 (p = 4.74 × 10⁻⁹); suggestive, DUSP6 (p = 1.51 × 10⁻⁷) and ANKUB1 (p = 4.69 × 10⁻⁷)] relevant to severe CFLD pathophysiology. The transcriptome-wide association identified 3 genes [CXCR1 (p = 1.01 × 10⁻⁶), AAMP (p = 1.07 × 10⁻⁶), and TRBV24 (p = 1.23 × 10⁻⁵)] involved in hepatic inflammation and innate immunity. Gene-ranked analyses identified pathways enriched in genes linked to multiple liver pathologies. CONCLUSION: These results identify loci/genes associated with severe CFLD that point to disease mechanisms involving hepatic fibrosis, inflammation, innate immune function, vascular pathology, intracellular signaling, actin cytoskeleton and tight junction integrity and mechanisms of hepatic steatosis and insulin resistance.
These discoveries will facilitate mechanistic studies and the development of therapeutics for severe CFLD.

3.
Toxicology ; 503: 153763, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38423244

ABSTRACT

Per- and poly-fluoroalkyl substances (PFAS) are extensively used in commerce, leading to their prevalence in the environment. Due to their chemical stability, PFAS are considered to be persistent and bioaccumulative; they are frequently detected in both the environment and humans. Because of this, PFAS as a class (composed of hundreds to thousands of chemicals) are contaminants of very high concern. Little information is available for the vast majority of PFAS, and regulatory agencies lack safety data to determine whether exposure limits or restrictions are needed. Cell-based assays are a pragmatic approach to inform decision-makers on potential health hazards; therefore, we hypothesized that a targeted battery of human in vitro assays can be used to determine whether there are structure-bioactivity relationships for PFAS, and to characterize potential risks by comparing bioactivity (points of departure) to exposure estimates. We tested 56 PFAS from 8 structure-based subclasses in concentration response (0.1-100 µM) using six human cell types selected from target organs with suggested adverse effects of PFAS: human induced pluripotent stem cell (iPSC)-derived hepatocytes, neurons, and cardiomyocytes, primary human hepatocytes, endothelial and HepG2 cells. While many compounds were without effect, certain PFAS demonstrated cell-specific activity, highlighting the necessity of using a compendium of in vitro models to identify potential hazards. No class-specific groupings were evident except for some chain length- and structure-related trends. In addition, margins of exposure (MOE) were derived using empirical and predicted exposure data. Conservative MOE calculations showed that most tested PFAS had an MOE in the 1-100 range; ~20% of PFAS had MOE < 1, providing tiered priorities for further studies.
Overall, we show that a compendium of human cell-based models can be used to derive bioactivity estimates for a range of PFAS, enabling comparisons with human biomonitoring data. Furthermore, we emphasize that establishing structure-bioactivity relationships may be challenging for the tested PFAS.
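The margin-of-exposure comparison described above can be illustrated with a minimal sketch. It assumes the common definition MOE = point of departure / exposure estimate, with both quantities expressed in the same units. The function names, the tier labels, and the example values are hypothetical and are not taken from the study.

```python
def margin_of_exposure(point_of_departure, exposure):
    """MOE as the ratio of a bioactivity point of departure to an
    exposure estimate (both must be in the same units)."""
    if exposure <= 0:
        raise ValueError("exposure estimate must be positive")
    return point_of_departure / exposure


def priority_tier(moe):
    """Assign an illustrative priority tier; the cut-offs mirror the
    ranges mentioned in the abstract (MOE < 1 and 1-100)."""
    if moe < 1:
        return "MOE < 1"
    if moe <= 100:
        return "1-100"
    return "MOE > 100"


if __name__ == "__main__":
    # Hypothetical (point of departure, exposure) pairs in matching units.
    for pod, exposure in [(5.0, 10.0), (50.0, 1.0), (1000.0, 1.0)]:
        moe = margin_of_exposure(pod, exposure)
        print(pod, exposure, moe, priority_tier(moe))
```

A smaller MOE means the estimated exposure is closer to (or above) the concentration at which bioactivity was observed, which is why the lowest tier would be examined first.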


Subject(s)
Fluorocarbons; Induced Pluripotent Stem Cells; Humans; Biological Monitoring; Fluorocarbons/chemistry
4.
Toxics ; 11(7)2023 Jul 05.
Article in English | MEDLINE | ID: mdl-37505552

ABSTRACT

Human cell-based test methods can be used to evaluate potential hazards of mixtures and products of petroleum refining ("unknown or variable composition, complex reaction products, or biological materials" substances, UVCBs). Analyses of bioactivity and detailed chemical characterization of petroleum UVCBs were used separately for grouping these substances; a combination of the approaches has not been undertaken. Therefore, we used a case example of representative high production volume categories of petroleum UVCBs, 25 lower olefin substances from low benzene naphtha and resin oils categories, to determine whether existing manufacturing-based category grouping can be supported. We collected two types of data: nontarget ion mobility spectrometry-mass spectrometry of both neat substances and their organic extracts and in vitro bioactivity of the organic extracts in five human cell types: umbilical vein endothelial cells and induced pluripotent stem cell-derived hepatocytes, endothelial cells, neurons, and cardiomyocytes. We found that while similarity in composition and bioactivity can be observed for some substances, existing categories are largely heterogeneous. Strong relationships between composition and bioactivity were observed, and individual constituents that determine these associations were identified. Overall, this study showed a promising approach that combines chemical composition and bioactivity data to better characterize the variability within manufacturing categories of petroleum UVCBs.

5.
Biom J ; 65(6): e2200029, 2023 08.
Article in English | MEDLINE | ID: mdl-37212427

ABSTRACT

Multivariate heterogeneous responses and heteroskedasticity have attracted increasing attention in recent years. In genome-wide association studies, effective simultaneous modeling of multiple phenotypes would improve statistical power and interpretability. However, a flexible common modeling system for heterogeneous data types can pose computational difficulties. Here we build upon a previous method for multivariate probit estimation using a two-stage composite likelihood that exhibits favorable computational time while retaining attractive parameter estimation properties. We extend this approach to incorporate multivariate responses of heterogeneous data types (binary and continuous), and possible heteroskedasticity. Although the approach has wide applications, it would be particularly useful for genomics, precision medicine, or individual biomedical prediction. Using a genomics example, we explore statistical power and confirm that the approach performs well for hypothesis testing and coverage percentages under a wide variety of settings. The approach has the potential to better leverage genomics data and provide interpretable inference for pleiotropy, in which a locus is associated with multiple traits.


Subject(s)
Genome-Wide Association Study; Genomics; Genome-Wide Association Study/methods; Phenotype; Genomics/methods; Probability
6.
J Cyst Fibros ; 22(5): 857-863, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37217389

ABSTRACT

BACKGROUND: Pseudomonas aeruginosa (Pa) infection in cystic fibrosis (CF) is characterized in stages: never (prior to first positive culture) to incident (first positive culture) to chronic. The association of Pa infection stage with lung function trajectory is poorly understood and the impact of age on this association has not been examined. We hypothesized that FEV1 decline would be slowest prior to Pa infection, intermediate after incident infection and greatest after chronic Pa infection. METHODS: Participants in a large US prospective cohort study diagnosed with CF prior to age 3 contributed data through the U.S. CF Patient Registry. Cubic spline linear mixed effects models were used to evaluate the longitudinal association of Pa stage (never, incident, chronic using 4 different definitions) with FEV1 adjusted for relevant covariates. Models contained interaction terms between age and Pa stage. RESULTS: 1,264 subjects born 1992-2006 provided a median 9.5 (IQR 0.25 to 15.75) years of follow up through 2017. 89% developed incident Pa; 39-58% developed chronic Pa depending on the definition. Compared to never Pa, incident Pa infection was associated with greater annual FEV1 decline and chronic Pa infection with the greatest FEV1 decline. The most rapid FEV1 decline and strongest association with Pa infection stage was seen in early adolescence (ages 12-15). CONCLUSIONS: Annual FEV1 decline worsens significantly with each Pa infection stage in children with CF. Our findings suggest that measures to prevent chronic infection, particularly during the high-risk period of early adolescence, could mitigate FEV1 decline and improve survival.


Subject(s)
Cystic Fibrosis; Pseudomonas Infections; Adolescent; Humans; Child; Child, Preschool; Cystic Fibrosis/complications; Cystic Fibrosis/diagnosis; Cystic Fibrosis/epidemiology; Pseudomonas Infections/diagnosis; Pseudomonas Infections/epidemiology; Pseudomonas Infections/complications; Prospective Studies; Respiratory Function Tests; Pseudomonas aeruginosa; Lung
7.
Am J Respir Crit Care Med ; 207(10): 1324-1333, 2023 05 15.
Article in English | MEDLINE | ID: mdl-36921087

ABSTRACT

Rationale: Lung disease is the major cause of morbidity and mortality in persons with cystic fibrosis (pwCF). Variability in CF lung disease has substantial non-CFTR (CF transmembrane conductance regulator) genetic influence. Identification of genetic modifiers has prognostic and therapeutic importance. Objectives: Identify genetic modifier loci and genes/pathways associated with pulmonary disease severity. Methods: Whole-genome sequencing data on 4,248 unique pwCF with pancreatic insufficiency and lung function measures were combined with imputed genotypes from an additional 3,592 patients with pancreatic insufficiency from the United States, Canada, and France. This report describes association of approximately 15.9 million SNPs using the quantitative Kulich normal residual mortality-adjusted (KNoRMA) lung disease phenotype in 7,840 pwCF using premodulator lung function data. Measurements and Main Results: Testing included common and rare SNPs, transcriptome-wide association, gene-level, and pathway analyses. Pathway analyses identified novel associations with genes that have key roles in organ development, and we hypothesize that these genes may relate to dysanapsis and/or variability in lung repair. Results confirmed and extended previous genome-wide association study findings. These whole-genome sequencing data provide finely mapped genetic information to support mechanistic studies. No novel primary associations with common single variants or rare variants were found. Multilocus effects at chr5p13 (SLC9A3/CEP72) and chr11p13 (EHF/APIP) were identified. Variant effect size estimates at associated loci were consistently ordered across the cohorts, indicating possible age or birth cohort effects. Conclusions: This premodulator genomic, transcriptomic, and pathway association study of 7,840 pwCF will facilitate mechanistic and postmodulator genetic studies and the development of novel therapeutics for CF lung disease.


Subject(s)
Cystic Fibrosis; Humans; Cystic Fibrosis/genetics; Genome-Wide Association Study/methods; Cystic Fibrosis Transmembrane Conductance Regulator/genetics; Patient Acuity; Lung; Microtubule-Associated Proteins/genetics
8.
Bioengineering (Basel) ; 10(2)2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36829725

ABSTRACT

The microbiota has proved to be one of the critical factors for many diseases, and researchers have been using microbiome data for disease prediction. However, models trained on one independent microbiome study may not be easily applicable to other independent studies due to the high level of variability in microbiome data. In this study, we developed a method for improving the generalizability and interpretability of machine learning models for predicting three different diseases (colorectal cancer, Crohn's disease, and immunotherapy response) using nine independent microbiome datasets. Our method involves combining a smaller dataset with a larger dataset, and we found that using at least 25% of the target samples in the source data resulted in improved model performance. We determined random forest as our top model and employed feature selection to identify common and important taxa for disease prediction across the different studies. Our results suggest that this leveraging scheme is a promising approach for improving the accuracy and interpretability of machine learning models for predicting diseases based on microbiome data.

9.
Front Digit Health ; 5: 1291132, 2023.
Article in English | MEDLINE | ID: mdl-38173911

ABSTRACT

The landscape of healthcare communication is undergoing a profound transformation in the digital age, and at the heart of this evolution are AI-powered chatbots. This mini-review delves into the role of AI chatbots in digital health, providing a detailed exploration of their applications, benefits, challenges, and future prospects. Our focus is on their versatile applications within healthcare, encompassing health information dissemination, appointment scheduling, medication management, remote patient monitoring, and emotional support services. The review underscores the compelling advantages of AI chatbots. However, it also addresses the significant challenges posed by the integration of AI tools into healthcare communication.

10.
BMC Bioinformatics ; 23(1): 468, 2022 Nov 08.
Article in English | MEDLINE | ID: mdl-36348267

ABSTRACT

BACKGROUND: Studying the co-occurrence network structure of microbial samples is one of the critical approaches to understanding the perplexing and delicate relationship between the microbe, host, and diseases. It is also critical to develop a tool for investigating co-occurrence networks and differential abundance analyses to reveal the disease-related taxa-taxa relationship. In addition, it is necessary to tighten the co-occurrence network into smaller modules to increase the ability for functional annotation and interpretability of these taxa-taxa relationships. Finally, it is critical to retain the phylogenetic relationship among the taxa to identify differential abundance patterns, which can be used to resolve contradicting functions reported by different studies. RESULTS: In this article, we present Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA), a user-friendly R package for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA contains two interactive graphical user interfaces (Shiny applications), one of them dedicated to the comparison between two diagnoses, e.g., disease versus control. We used C3NA to analyze two well-studied diseases, colorectal cancer and Crohn's disease. We discovered clusters of study- and disease-dependent taxa that overlap with known functional taxa studied by other discovery studies and differential abundance analyses. CONCLUSION: C3NA offers a new microbial data analysis pipeline for refined and enriched taxa-taxa co-occurrence network analyses, and the usability was further expanded via the built-in Shiny applications for interactive investigation.


Subject(s)
Phylogeny; Consensus
11.
Am J Hum Genet ; 109(11): 1986-1997, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36198314

ABSTRACT

Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics, to provide a better calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP), and whole-exome sequence data from UK Biobank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R² than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained from 1,992 CFGP sequenced samples to an independent 3,103 samples with no sequencing but TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where the net gain is the additional number of variants correctly distinguished by MagicalRsq relative to standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from those poorly imputed. MagicalRsq is freely available on GitHub.
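The "true R²" against which an imputation quality metric is calibrated can be illustrated with a minimal sketch. It assumes the usual definition: the squared Pearson correlation, for a single variant, between imputed dosages and sequencing-derived genotypes in the same individuals. The function name and the example data are hypothetical; this does not reproduce MagicalRsq or the standard Rsq estimator.

```python
import numpy as np


def true_r2(imputed_dosage, true_genotype):
    """Squared Pearson correlation between imputed dosages (0..2) and
    true genotypes (0, 1, 2) for one variant across individuals."""
    dosage = np.asarray(imputed_dosage, dtype=float)
    genotype = np.asarray(true_genotype, dtype=float)
    if dosage.shape != genotype.shape:
        raise ValueError("inputs must have the same length")
    # The correlation is undefined when either vector has no variation,
    # e.g. a variant that is monomorphic in the sequenced individuals.
    if dosage.std() == 0 or genotype.std() == 0:
        return float("nan")
    r = np.corrcoef(dosage, genotype)[0, 1]
    return float(r ** 2)


if __name__ == "__main__":
    # Hypothetical dosages and genotypes for six individuals.
    dosages = [0.1, 0.9, 1.8, 0.0, 1.1, 0.2]
    genotypes = [0, 1, 2, 0, 1, 0]
    print(true_r2(dosages, genotypes))
```

Because true R² requires sequenced genotypes, it is only available for individuals with both imputed and sequenced data, which is why an estimated quality metric is needed for the remaining samples.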


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Humans; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide/genetics; Calibration; Genotype; Machine Learning
12.
Am J Hum Genet ; 109(10): 1894-1908, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36206743

ABSTRACT

Individuals with cystic fibrosis (CF) develop complications of the gastrointestinal tract influenced by genetic variants outside of CFTR. Cystic fibrosis-related diabetes (CFRD) is a distinct form of diabetes with a variable age of onset that occurs frequently in individuals with CF, while meconium ileus (MI) is a severe neonatal intestinal obstruction affecting ∼20% of newborns with CF. CFRD and MI are slightly correlated traits with previous evidence of overlap in their genetic architectures. To better understand the genetic commonality between CFRD and MI, we used whole-genome-sequencing data from the CF Genome Project to perform genome-wide association. These analyses revealed variants at 11 loci (6 not previously identified) that associated with MI and at 12 loci (5 not previously identified) that associated with CFRD. Of these, variants at SLC26A9, CEBPB, and PRSS1 associated with both traits; variants at SLC26A9 and CEBPB increased risk for both traits, while variants at PRSS1, the higher-risk alleles for CFRD, conferred lower risk for MI. Furthermore, common and rare variants within the SLC26A9 locus associated with MI only or CFRD only. As expected, different loci modify risk of CFRD and MI; however, a subset exhibit pleiotropic effects indicating etiologic and mechanistic overlap between these two otherwise distinct complications of CF.


Subject(s)
Cystic Fibrosis; Diabetes Mellitus; Infant, Newborn, Diseases; Intestinal Obstruction; Cystic Fibrosis/complications; Cystic Fibrosis/genetics; Cystic Fibrosis Transmembrane Conductance Regulator/genetics; Diabetes Mellitus/genetics; Genome-Wide Association Study; Humans; Infant, Newborn; Intestinal Obstruction/complications; Intestinal Obstruction/genetics
13.
Sci Rep ; 12(1): 15151, 2022 09 07.
Article in English | MEDLINE | ID: mdl-36071064

ABSTRACT

In this study, we generated whole-transcriptome RNA-Seq from n = 192 genotyped liver samples and used these data with existing data from the GTEx Project (RNA-Seq) and previous liver eQTL (microarray) studies to create an enhanced transcriptomic sequence resource in the human liver. Analyses of genotype-expression associations show pronounced enrichment of associations with genes of drug response. The associations are primarily consistent across the two RNA-Seq datasets, with some modest variation, indicating the importance of obtaining multiple datasets to produce a robust resource. We further used an empirical Bayesian model to compare eQTL patterns in liver and an additional 20 GTEx tissues, finding that MHC genes, and especially class II genes, are enriched for liver-specific eQTL patterns. To illustrate the utility of the resource to augment GWAS analysis with small sample sizes, we developed a novel meta-analysis technique to combine several liver eQTL data sources. We also illustrate its application using a transcriptome-enhanced re-analysis of a study of neutropenia in pancreatic cancer patients. The associations of genotype with liver expression, including splice variation and its genetic associations, are made available in a searchable genome browser.


Subject(s)
Genome-Wide Association Study; Quantitative Trait Loci; Bayes Theorem; Genome-Wide Association Study/methods; Genomics; Humans; Liver; Polymorphism, Single Nucleotide
14.
Am J Hum Genet ; 109(9): 1638-1652, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055212

ABSTRACT

Hypoxia-inducible factor prolyl hydroxylase inhibitors (HIF-PHIs) are currently under clinical development for treating anemia in chronic kidney disease (CKD), but it is important to monitor their cardiovascular safety. Genetic variants can be used as predictors to help inform the potential risk of adverse effects associated with drug treatments. We therefore aimed to use human genetics to help assess the risk of adverse cardiovascular events associated with therapeutically altered EPO levels to help inform clinical trials studying the safety of HIF-PHIs. By performing a genome-wide association meta-analysis of EPO (n = 6,127), we identified a cis-EPO variant (rs1617640) lying in the EPO promoter region. We validated this variant as most likely causal in controlling EPO levels by using genetic and functional approaches, including single-base gene editing. Using this variant as a partial predictor for therapeutic modulation of EPO and large genome-wide association data in Mendelian randomization tests, we found no evidence (at p < 0.05) that genetically predicted long-term rises in endogenous EPO, equivalent to a 2.2-unit increase, increased risk of coronary artery disease (CAD, OR [95% CI] = 1.01 [0.93, 1.07]), myocardial infarction (MI, OR [95% CI] = 0.99 [0.87, 1.15]), or stroke (OR [95% CI] = 0.97 [0.87, 1.07]). We could exclude increased odds of 1.15 for cardiovascular disease for a 2.2-unit EPO increase. A combination of genetic and functional studies provides a powerful approach to investigate the potential therapeutic profile of EPO-increasing therapies for treating anemia in CKD.
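Using a single variant as a predictor in a Mendelian randomization test can be illustrated with a minimal sketch of the Wald ratio estimator, a standard single-instrument approach. Whether the study used exactly this estimator is not stated in the abstract, so this is an assumption. The function name and the input values are hypothetical, and the standard error uses the first-order approximation that ignores uncertainty in the variant-exposure effect.

```python
import math


def wald_ratio_or(beta_exposure, beta_outcome, se_outcome, z=1.96):
    """Single-variant Mendelian randomization (Wald ratio).

    beta_exposure : variant effect on the exposure (e.g. EPO level)
    beta_outcome  : variant effect on the outcome, as a log odds ratio
    se_outcome    : standard error of beta_outcome
    Returns (odds ratio, lower CI bound, upper CI bound) per unit
    increase in the genetically predicted exposure.
    """
    if beta_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order standard error: treats beta_exposure as known.
    se = se_outcome / abs(beta_exposure)
    return (
        math.exp(estimate),
        math.exp(estimate - z * se),
        math.exp(estimate + z * se),
    )


if __name__ == "__main__":
    # Hypothetical summary statistics, not values from the study.
    odds_ratio, lower, upper = wald_ratio_or(0.5, 0.01, 0.02)
    print(odds_ratio, lower, upper)
```

A confidence interval that contains 1, as in the results reported above, gives no evidence of an effect at the stated significance level, and its upper bound indicates the largest increase in odds that cannot be excluded.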


Subject(s)
Anemia; Coronary Artery Disease; Myocardial Infarction; Renal Insufficiency, Chronic; Anemia/drug therapy; Anemia/genetics; Coronary Artery Disease/genetics; Genome-Wide Association Study; Humans; Mendelian Randomization Analysis; Myocardial Infarction/genetics; Renal Insufficiency, Chronic/genetics
15.
Toxics ; 10(8)2022 Aug 01.
Article in English | MEDLINE | ID: mdl-36006120

ABSTRACT

Human cell-based population-wide in vitro models have been proposed as a strategy to derive chemical-specific estimates of inter-individual variability; however, the utility of this approach has not yet been tested for cumulative exposures in mixtures. This study aimed to test defined mixtures and their individual components and determine whether adverse effects of the mixtures were likely to be more variable in a population than those of the individual chemicals. The in vitro model comprised 146 human lymphoblastoid cell lines from four diverse subpopulations of European and African descent. Cells were exposed, in concentration-response, to 42 chemicals from diverse classes of environmental pollutants; in addition, eight defined mixtures were prepared from these chemicals using several exposure- or hazard-based scenarios. Points of departure for cytotoxicity were derived using Bayesian concentration-response modeling and population variability was quantified in the form of a toxicodynamic variability factor (TDVF). We found that 28 chemicals and all mixtures exhibited concentration-response cytotoxicity, enabling calculation of the TDVF. The median TDVF across test substances, for both individual chemicals and defined mixtures, ranged from a default assumption (10^(1/2)) of toxicodynamic variability in the human population to >10. The data also provide a proof of principle for single-variant genome-wide association mapping for toxicity of the chemicals and mixtures, although replication would be necessary due to statistical power limitations with the current sample size. This study demonstrates the feasibility of using a set of human lymphoblastoid cell lines as an in vitro model to quantify the extent of inter-individual variability in hazardous properties of both individual chemicals and mixtures.
The data show that population variability of the mixtures is unlikely to exceed that of the most variable component, and that similarity in genome-wide associations among components may be used to accrue additional evidence for grouping of constituents in a mixture for cumulative assessments.
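The comparison of a TDVF with the default factor of 10^(1/2) can be illustrated with a minimal sketch. The abstract does not define the TDVF, so the sketch assumes one definition used in the population-variability literature: the ratio of the population median point of departure to the point of departure of a sensitive individual, taken here as the 1st percentile. The function name, the percentile choice, and the example data are assumptions for illustration only.

```python
import numpy as np

# Default toxicodynamic variability factor, 10^(1/2), about 3.16.
DEFAULT_TDVF = 10 ** 0.5


def toxicodynamic_variability_factor(points_of_departure, sensitive_quantile=0.01):
    """Ratio of the median point of departure to that of a sensitive
    individual (assumed definition; lower POD means more sensitive)."""
    pods = np.asarray(points_of_departure, dtype=float)
    if pods.size == 0 or np.any(pods <= 0):
        raise ValueError("points of departure must be positive and non-empty")
    median_pod = np.quantile(pods, 0.5)
    sensitive_pod = np.quantile(pods, sensitive_quantile)
    return float(median_pod / sensitive_pod)


if __name__ == "__main__":
    # Hypothetical per-cell-line points of departure spanning two
    # orders of magnitude (arbitrary concentration units).
    pods = 10 ** np.linspace(0, 2, 101)
    tdvf = toxicodynamic_variability_factor(pods)
    print(tdvf, tdvf > DEFAULT_TDVF)
```

Under this definition, a TDVF above the default would indicate more inter-individual variability than the default assumption covers.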

16.
Front Mol Biosci ; 9: 921945, 2022.
Article in English | MEDLINE | ID: mdl-36032686

ABSTRACT

In the United States, colorectal cancer is the second leading cause of cancer death, and accurate early detection and identification of high-risk patients is a high priority. Although fecal screening tests are available, the close relationship between colorectal cancer and the gut microbiome has generated considerable interest. We describe a machine learning method for gut microbiome data to assist in diagnosing colorectal cancer. Our methodology integrates feature engineering, mediation analysis, statistical modeling, and network analysis into a novel unified pipeline. Simulation results illustrate the value of the method in comparison to existing methods. For predicting colorectal cancer in two real datasets, this pipeline showed an 8.7% higher prediction accuracy and 13% higher area under the receiver operating characteristic curve than other published work. Additionally, the approach highlights important colorectal cancer-related taxa for prioritization, such as high levels of Bacteroides fragilis, which can help elucidate disease pathology. Our algorithms and approach can be widely applied for colorectal cancer prediction using either 16S rRNA or shotgun metagenomics data.

17.
Front Pharmacol ; 13: 883433, 2022.
Article in English | MEDLINE | ID: mdl-35899108

ABSTRACT

The need to test chemicals in a timely and cost-effective manner has driven the development of new alternative methods (NAMs) that utilize in silico and in vitro approaches for toxicity prediction. There is a wealth of existing data from human studies that can aid in understanding the ability of NAMs to support chemical safety assessment. This study aims to streamline the integration of data from existing human cohorts by programmatically identifying related variables within each study. Study variables from the Atherosclerosis Risk in Communities (ARIC) study were clustered based on their correlation within the study. The quality of the clusters was evaluated via a combination of manual review and natural language processing (NLP). We identified 391 clusters including 3,285 variables. Manual review of the clusters containing more than one variable determined that human reviewers considered 95% of the clusters related to some degree. To evaluate potential bias in the human reviewers, clusters were also scored via NLP, which showed a high concordance with the human classification. Clusters were further consolidated into cluster groups using the Louvain community finding algorithm. Manual review of the cluster groups confirmed that clusters within a group were more related than clusters from different groups. Our data-driven approach can facilitate data harmonization and curation efforts by providing human annotators with groups of related variables reflecting the themes present in the data. Reviewing groups of related variables should increase efficiency of the human review, and the number of variables reviewed can be reduced by focusing curator attention on variable groups whose theme is relevant for the topic being studied.

18.
HGG Adv ; 3(3): 100117, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35647563

ABSTRACT

CFTR F508del (c.1521_1523delCTT, p.Phe508delPhe) is the most common pathogenic allele underlying cystic fibrosis (CF), and its frequency varies in a geographic cline across Europe. We hypothesized that genetic variation associated with this cline is overrepresented in a large cohort (N > 5,000) of persons with CF who underwent whole-genome sequencing and that this pattern could result in spurious associations between variants correlated with both the F508del genotype and CF-related outcomes. Using principal-component (PC) analyses, we showed that variation in the CFTR region disproportionately contributes to a PC explaining a relatively high proportion of genetic variance. Variation near CFTR was correlated with population structure among persons with CF, and this correlation was driven by a subset of the sample inferred to have European ancestry. We performed genome-wide association studies comparing persons with CF with one versus two copies of the F508del allele; this allowed us to identify genetic variation associated with the F508del allele and to determine that standard PC-adjustment strategies eliminated the significant association signals. Our results suggest that PC adjustment can adequately prevent spurious associations between genetic variants and CF-related traits and is therefore an effective tool to control for population structure even when population structure is confounded with disease severity and a common pathogenic variant.

19.
HGG Adv ; 3(2): 100090, 2022 Apr 14.
Article in English | MEDLINE | ID: mdl-35128485

ABSTRACT

Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system and beyond. In our previous genome-wide association studies (GWAS), we genotyped approximately 8,000 CF samples using a mixture of different genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (approximately 30×) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among patients with CF. For mixtures of GWAS array and WGS data, genotype imputation has proven effective in increasing effective sample size. Therefore, we first performed imputation for the approximately 8,000 CF samples with GWAS array genotype using the Trans-Omics for Precision Medicine (TOPMed) freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for patients with CF, boosting genomic coverage from approximately 0.3-4.2 million genotyped markers to approximately 11-43 million well-imputed markers, and significantly improving polygenic risk score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data of patients with CF. We demonstrate that despite having approximately 3% the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely owing to allele and haplotype differences between patients with CF and general populations. We anticipate our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.

20.
J Cyst Fibros ; 21(1): 40-44, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34393091

ABSTRACT

Chronic Pseudomonas aeruginosa (Pa) infection is associated with increased morbidity and mortality in people with cystic fibrosis (CF). There is no gold standard definition of chronic Pa infection in CF. We compared chronic Pa definitions using encounter-based versus annualized data in the Early Pseudomonas Infection Control (EPIC) Observational study cohort, and subsequently compared annualized chronic Pa definitions across a range of U.S. cohorts spanning decades of CF care. We found that an annualized chronic Pa definition requiring at least 1 Pa+ culture in 3 of 4 consecutive years ("Green 3/4") resulted in chronic Pa metrics similar to established encounter-based modified Leeds criteria definitions, including a similar age at and proportion who fulfilled chronic Pa criteria, and a similar proportion with sustained Pa infection after meeting the chronic Pa definition. The Green 3/4 chronic Pa definition will be valuable for longitudinal analyses in cohorts with limited culture frequency.
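
To make the annualized rule concrete, here is a minimal sketch of a "3 of 4 consecutive years" check. It assumes each year has already been reduced to a single flag (at least one Pa+ culture or not), that years without any recorded culture count as not positive, and that the criterion is dated to the year in which the third positive year occurs. These handling choices and the function name are assumptions for illustration; the abstract does not spell them out.

```python
from typing import Dict, Optional


def first_chronic_year(pa_positive_by_year: Dict[int, bool],
                       window: int = 4,
                       required: int = 3) -> Optional[int]:
    """Return the first year at which `required` of the last `window`
    consecutive calendar years had at least one Pa+ culture, else None.

    Years missing from the mapping are treated as not positive
    (an assumption of this sketch, not a rule taken from the study).
    """
    if not pa_positive_by_year:
        return None
    first, last = min(pa_positive_by_year), max(pa_positive_by_year)
    for end in range(first, last + 1):
        # Look back over the `window` consecutive years ending at `end`.
        years = range(end - window + 1, end + 1)
        positives = sum(1 for y in years if pa_positive_by_year.get(y, False))
        if positives >= required:
            return end
    return None


# Hypothetical annual flags for two individuals.
meets = first_chronic_year({2000: True, 2001: False, 2002: True, 2003: True})
never = first_chronic_year(
    {2000: True, 2001: False, 2002: False, 2003: True, 2004: True}
)
```

In the first example the rule is met in 2003 (positive years 2000, 2002 and 2003 fall within one four-year span); in the second, no four-year span contains three positive years, so the function returns `None`.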


Subject(s)
Cystic Fibrosis/microbiology , Pseudomonas Infections/diagnosis , Terminology as Topic , Child , Child, Preschool , Chronic Disease , Cohort Studies , Humans , Infant , Pseudomonas aeruginosa , Registries , Time Factors
...