Results 1 - 18 of 18
1.
BMC Med Genomics ; 17(1): 132, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755654

ABSTRACT

BACKGROUND: Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, the R² is a commonly used measure to evaluate prediction accuracy. While the R² is a well-defined goodness-of-fit measure for statistical linear models, there exist different definitions for its application on test data, which complicates interpretation and comparison of results. METHODS: Based on large-scale genotype data from the UK Biobank, we compare three definitions of the R² on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived based on training data with European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries. RESULTS: Our analysis shows that the choice of the R² definition can lead to considerably different results on test data, making the comparison of R² values from the literature problematic. While the definition as the squared correlation between predicted and observed phenotypes solely addresses the discriminative performance and always yields values between 0 and 1, definitions of the R² based on the mean squared prediction error (MSPE) with reference to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values indicating miscalibrated predictions for out-of-target populations. We argue that the choice of the most appropriate definition depends on the aim of PRS analysis: whether it primarily serves for risk stratification or also for individual phenotype prediction. Moreover, both correlation-based and MSPE-based definitions of R² can provide valuable complementary information. CONCLUSIONS: Awareness of the different definitions of the R² on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. It is recommended to explicitly state which definition was used when reporting R² values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
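The contrast between correlation-based and MSPE-based definitions can be illustrated with a minimal Python sketch. The function names are hypothetical, and the choice of reference means for the two intercept-only models (the test-set mean and the training-set mean) is an assumption made here for illustration; the abstract does not spell out the exact formulas.

```python
import numpy as np

def r2_squared_correlation(y_obs, y_pred):
    # Squared Pearson correlation: measures discrimination only, always in [0, 1].
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    return r ** 2

def r2_mspe(y_obs, y_pred, reference_mean):
    # 1 - MSPE(model) / MSPE(intercept-only model predicting reference_mean).
    # Becomes negative when the model predicts worse than the constant reference.
    mspe_model = np.mean((y_obs - y_pred) ** 2)
    mspe_reference = np.mean((y_obs - reference_mean) ** 2)
    return 1.0 - mspe_model / mspe_reference

def r2_report(y_obs, y_pred, y_train_mean):
    # Assumption: the two MSPE-based variants differ only in the reference mean.
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "squared_correlation": r2_squared_correlation(y_obs, y_pred),
        "mspe_vs_test_mean": r2_mspe(y_obs, y_pred, y_obs.mean()),
        "mspe_vs_train_mean": r2_mspe(y_obs, y_pred, y_train_mean),
    }

# Toy example: predictions rank individuals perfectly but are shifted by a constant.
report = r2_report([1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0], y_train_mean=2.5)
print(report)
```

In this toy case the squared correlation is 1 while both MSPE-based values are strongly negative, which is the kind of miscalibration signal the abstract describes for out-of-target populations.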


Subject(s)
Models, Genetic; Multifactorial Inheritance; Humans; Phenotype; Genetic Predisposition to Disease
2.
Am J Med Genet A ; : e63641, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38725242

ABSTRACT

Next-generation phenotyping (NGP) can be used to compute the similarity of dysmorphic patients to known syndromic diseases. So far, the technology has been evaluated in variant prioritization and classification, providing evidence for pathogenicity if the phenotype matched with other patients with a confirmed molecular diagnosis. In a Nigerian cohort of individuals with facial dysmorphism, we used the NGP tool GestaltMatcher to screen portraits prior to genetic testing and subjected individuals with high similarity scores to exome sequencing (ES). Here, we report on two individuals with global developmental delay, pulmonary artery stenosis, and genital and limb malformations for whom GestaltMatcher yielded Cornelia de Lange syndrome (CdLS) as the top hit. ES revealed a known pathogenic nonsense variant, NM_133433.4: c.598C>T; p.(Gln200*), as well as a novel frameshift variant c.7948dup; p.(Ile2650Asnfs*11) in NIPBL. Our results suggest that NGP can be used as a screening tool and thresholds could be defined for achieving high diagnostic yields in ES. Training the artificial intelligence (AI) with additional cases of the same ethnicity might further increase the positive predictive value of GestaltMatcher.

3.
medRxiv ; 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-37503210

ABSTRACT

Dysmorphologists sometimes encounter challenges in recognizing disorders due to phenotypic variability influenced by factors such as age and ethnicity. Moreover, the performance of next-generation phenotyping tools such as GestaltMatcher depends on the diversity of the training set. Therefore, we developed the GestaltMatcher Database (GMDB), a global reference for the phenotypic variability of rare diseases that complies with the FAIR principles. We curated dysmorphic patient images and metadata from 2,224 publications, transforming GMDB into an online dynamic case report journal. To encourage clinicians worldwide to contribute, each case can receive a Digital Object Identifier (DOI), making it a citable micro-publication. This resulted in a collection of 2,312 unpublished images, partly with longitudinal data. We have compiled a collection of 10,189 frontal images from 7,695 patients representing 683 disorders. The web interface enables gene- and phenotype-centered queries for registered users (https://db.gestaltmatcher.org/). Although most patients are of European ancestry (59%), our global collaborations have facilitated the inclusion of data from frequently underrepresented ethnicities, with 17% Asian, 4% African, and 6% with other ethnic backgrounds. The analysis revealed a significant improvement in GestaltMatcher performance across all ethnic groups after incorporating non-European ethnicities, with Top-1 accuracy increasing by 31.56% and Top-5 accuracy by 12.64%. Importantly, this improvement was achieved without altering the performance metrics for European patients. GMDB addresses dysmorphology challenges by representing phenotypic variability and including underrepresented groups, enhancing global diagnostic rates and serving as a vital clinician reference database.

4.
Graefes Arch Clin Exp Ophthalmol ; 262(1): 53-60, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37672102

ABSTRACT

PURPOSE: Subretinal drusenoid deposits (SDDs) are distinct extracellular alterations anterior to the retinal pigment epithelium (RPE). Given their commonly uniform phenotype, a hereditary predisposition seems likely. Hence, we aim to investigate prevalence and determinants in patients' first-degree relatives. METHODS: We recruited SDD outpatients at their visits to our clinic and invited their relatives. We performed a full ophthalmic examination including spectral domain-optical coherence tomography (SD-OCT) and graded presence and disease stage of SDD as well as the percentage of infrared (IR) en face area affected by SDD. Moreover, we performed genetic sequencing and calculated a polygenic risk score (PRS) for AMD. We fitted multivariable regression models to assess potential determinants of SDD and associations of SDD with PRS. RESULTS: We included 195 participants: 123 patients (mean age 81.4 ± 7.2 years) and 72 relatives (mean age 52.2 ± 14.2 years). Of the relatives, 7 presented SDD, resulting in a prevalence of 9.7%. We found older age to be associated with SDD presence and area in the total cohort and a borderline association of higher body mass index (BMI) with SDD presence in the relatives. Individuals with SDD tended to have a higher PRS, which, however, was not statistically significant in the multivariable regression. CONCLUSION: Our study indicates a potential hereditary aspect of SDD and confirms the strong association with age. Based on our results, relatives of SDD patients ought to be closely monitored for retinal alterations, particularly at an older age. Further longitudinal studies with larger sample size and older relatives are needed to confirm or refute our findings.


Subject(s)
Retinal Drusen; Humans; Aged; Aged, 80 and over; Adult; Middle Aged; Retinal Drusen/diagnosis; Retinal Drusen/epidemiology; Retinal Drusen/genetics; Prevalence; Retinal Pigment Epithelium; Genetic Risk Score; Tomography, Optical Coherence/methods; Fluorescein Angiography
5.
Sci Rep ; 13(1): 18783, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37914736

ABSTRACT

Lynch syndrome (LS) is characterised by an increased risk of developing colorectal cancer (CRC) and other extracolonic epithelial cancers. It is caused by pathogenic germline variants in DNA mismatch repair (MMR) genes or the EPCAM gene, leading to a less functional DNA MMR system. Individuals diagnosed with LS (LS individuals) have a 10-80% lifetime risk of developing cancer. However, there is considerable variability in the age of cancer onset, which cannot be attributed to the specific MMR gene or variant alone. It is speculated that multiple genetic and environmental factors contribute to this variability, including two single nucleotide polymorphisms (SNPs) in the methylenetetrahydrofolate reductase (MTHFR) gene: C677T (rs1801133) and A1298C (rs1801131). By decreasing MTHFR activity, these SNPs theoretically reduce the silencing of DNA repair genes and increase the availability of nucleotides for DNA synthesis and repair, thereby protecting against early-onset cancer in LS. We investigated the effect of these SNPs on LS disease expression in 2,723 LS individuals from Australia, Poland, Germany, Norway and Spain. The association between age at cancer onset and SNP genotype (risk of cancer) was estimated using Cox regression adjusted for gender, country and affected MMR gene. For A1298C (rs1801131), both the AC and CC genotypes were significantly associated with a reduced risk of developing CRC compared to the AA genotype, but no association was seen for C677T (rs1801133). However, an aggregated effect of protective alleles was seen when combining the alleles from the two SNPs, especially for LS individuals carrying 1 and 2 alleles. 
For individuals with germline pathogenic variants in MLH1, the CC genotype of A1298C was estimated to reduce the risk of CRC significantly by 39% (HR = 0.61, 95% CI 0.42, 0.89, p = 0.011), while for individuals with pathogenic germline MSH2 variants, the AC genotype (compared to AA) was estimated to reduce the risk of CRC by 26% (HR = 0.66, 95% CI 0.53, 0.83, p = 0.01). In comparison, no association was observed for C677T (rs1801133). In conclusion, our study suggests that combining the MMR gene information with the MTHFR genotype, including the aggregated effect of protective alleles, could be useful in developing an algorithm that estimates the risk of CRC in LS individuals.


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis; Colorectal Neoplasms; Humans; Colorectal Neoplasms, Hereditary Nonpolyposis/genetics; Colorectal Neoplasms/epidemiology; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Methylenetetrahydrofolate Reductase (NADPH2)/genetics; Genotype; Polymorphism, Single Nucleotide; DNA; Genetic Predisposition to Disease; Case-Control Studies
6.
BMC Genom Data ; 24(1): 50, 2023 Sep 04.
Article in English | MEDLINE | ID: mdl-37667186

ABSTRACT

BACKGROUND: A relevant part of the genetic architecture of complex traits is still unknown, despite the discovery of many disease-associated common variants. Polygenic risk score (PRS) models are based on the evaluation of the additive effects attributable to common variants and have been successfully implemented to assess the genetic susceptibility for many phenotypes. In contrast, burden tests are often used to identify an enrichment of rare deleterious variants in specific genes. Both kinds of genetic contributions are typically analyzed independently. Many studies suggest that complex phenotypes are influenced by both low-effect common variants and high-effect rare deleterious variants. The aim of this paper is to integrate the effect of both common and rare functional variants for a more comprehensive genetic risk modeling. METHODS: We developed a framework combining gene-based scores based on the enrichment of rare functionally relevant variants with genome-wide PRS based on common variants for association analysis and prediction models. We applied our framework to the UK Biobank dataset with genotyping and exome data and considered the levels of 28 blood biomarkers as target phenotypes. For each biomarker, an association analysis was performed on the full cohort using gene-based scores (GBS). The cohort was then split into 3 subsets for PRS construction and feature selection, predictive model training, and independent evaluation, respectively. Prediction models were generated including either PRS, GBS or both (combined). RESULTS: Association analyses of the cohort were able to detect significant genes that were previously known to be associated with different biomarkers. Interestingly, the analyses also revealed heterogeneous effect sizes and directionality, highlighting the complexity of blood biomarker regulation. However, the combined models for many biomarkers show little or no improvement in prediction accuracy compared to the PRS models. CONCLUSION: This study shows that rare variants play an important role in the genetic architecture of complex multifactorial traits such as blood biomarkers. However, while rare deleterious variants play a strong role at an individual level, our results indicate that classical common-variant-based PRS might be more informative to predict the genetic susceptibility at the population level.


Subject(s)
Exome; Genetic Predisposition to Disease; Humans; Genetic Predisposition to Disease/genetics; Biomarkers; Phenotype; Multifactorial Inheritance/genetics
7.
Eur J Hum Genet ; 31(11): 1251-1260, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37644171

ABSTRACT

Heterozygous, pathogenic CUX1 variants are associated with global developmental delay or intellectual disability. This study delineates the clinical presentation in an extended cohort and investigates the molecular mechanism underlying the disorder in a Cux1+/- mouse model. Through international collaboration, we assembled the phenotypic and molecular information for 34 individuals (23 unpublished individuals). We analyzed brain CUX1 expression and susceptibility to epilepsy in Cux1+/- mice. We describe 34 individuals, of whom 30 were unrelated, with 26 different null and four missense variants. The leading symptoms were mild to moderate delayed speech and motor development and borderline to moderate intellectual disability. Additional symptoms were muscular hypotonia, seizures, joint laxity, and abnormalities of the forehead. In Cux1+/- mice, we found delayed growth, histologically normal brains, and increased susceptibility to seizures. In Cux1+/- brains, the expression of Cux1 transcripts was half that of wild-type (WT) animals. Expression of CUX1 proteins was reduced, although in early postnatal animals significantly more than in adults. In summary, disease-causing CUX1 variants result in a non-syndromic phenotype of developmental delay and intellectual disability. In some individuals, this phenotype ameliorates with age, resulting in a clinical catch-up and normal IQ in adulthood. The post-transcriptional balance of CUX1 expression in the heterozygous brain at late developmental stages appears important for this favorable clinical course.


Subject(s)
Intellectual Disability; Neurodevelopmental Disorders; Adult; Animals; Humans; Mice; Heterozygote; Homeodomain Proteins/genetics; Intellectual Disability/genetics; Intellectual Disability/diagnosis; Neurodevelopmental Disorders/genetics; Neurodevelopmental Disorders/pathology; Phenotype; Repressor Proteins/genetics; Seizures; Transcription Factors/genetics; Transcription Factors/metabolism
8.
Front Genet ; 14: 1217860, 2023.
Article in English | MEDLINE | ID: mdl-37441549

ABSTRACT

Polygenic risk scores (PRS) estimate the risk for a specific disease as the weighted sum of associated alleles from different genetic loci in the germline, with weights estimated by regression models. Recent advances in genetics made it possible to create polygenic predictors of complex human traits, including risks for many important complex diseases, such as cancer, diabetes, or cardiovascular diseases, typically influenced by many genetic variants, each of which has a negligible effect on overall risk. In the current study, we analyzed whether adding PRS from other diseases to the prediction models and replacing the regressions with machine learning models can improve overall predictive performance. Results showed that multi-PRS models outperform single-PRS models significantly on different diseases. Moreover, replacing regression models with machine learning models, i.e., deep learning, can also improve overall accuracy.
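The weighted-sum definition of a PRS given in this abstract can be written as a short Python sketch. The function name, the dosage coding (0, 1 or 2 copies of the effect allele) and the example weights are assumptions for illustration, not values from the study.

```python
import numpy as np

def polygenic_risk_score(dosages, weights):
    # dosages: individuals x variants matrix of effect-allele counts (assumed 0/1/2).
    # weights: per-variant effect sizes, e.g. regression coefficients.
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be (n_individuals, n_variants) matching weights")
    # PRS_i = sum_j weight_j * dosage_ij
    return dosages @ weights

# Two hypothetical individuals genotyped at three hypothetical variants.
scores = polygenic_risk_score([[0, 1, 2], [2, 0, 1]], [0.1, -0.2, 0.3])
print(scores)
```

A multi-PRS model in the sense of the abstract would then use several such scores, one per disease, as input features of a downstream prediction model.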

9.
BMC Med Genomics ; 16(1): 164, 2023 Jul 12.
Article in English | MEDLINE | ID: mdl-37438803

ABSTRACT

BACKGROUND & AIMS: We aimed to assess the performance of European-derived polygenic risk scores (PRSs) for common metabolic diseases such as coronary artery disease (CAD), obesity, and type 2 diabetes (T2D) in South Asian (SAS) individuals in the UK Biobank. Additionally, we studied the interaction between PRS and family history (FH) in the same population. METHODS: To calculate the PRS, we used a previously published model derived from the EUR population and applied it to the individuals of SAS ancestry from the UK Biobank (UKB) study. Each PRS was adjusted according to an individual's genotype location in the principal components (PC) space to derive an ancestry-adjusted PRS (aPRS). We calculated the percentiles based on aPRS and stratified individuals into three aPRS categories: low, intermediate, and high. Considering the intermediate-aPRS category as a reference, we compared the low and high aPRS categories and generated the odds ratio (OR) estimates. Further, we measured the combined role of aPRS and first-degree family history (FH) in the SAS population. RESULTS: The risk of developing severe obesity for SAS individuals was almost twofold higher for individuals with high aPRS than for those with intermediate aPRS, with an OR of 1.95 (95% CI = 1.71-2.23, P < 0.01). At the same time, the risk of severe obesity was lower in the low-aPRS group (OR = 0.60, CI = 0.53-0.67, P < 0.01). Results in the same direction were found in the EUR data, where the low-PRS group had an OR of 0.53 (95% CI = 0.51-0.56, P < 0.01) and the high-PRS group had an OR of 2.06 (95% CI = 2.00-2.12, P < 0.01). We observed similar results for CAD and T2D. Further, we show that SAS individuals with a family history of CAD and T2D and a high aPRS have a higher risk of these diseases, implying a greater genetic predisposition. CONCLUSION: Our findings suggest that CAD, obesity, and T2D GWAS summary statistics generated predominantly from the EUR population can be potentially used to derive aPRS in SAS individuals for risk stratification. With future GWAS recruiting more SAS participants and tailoring the PRSs towards SAS ancestry, the predictive power of PRS is likely to improve further.
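The stratification step described in the methods (percentile-based categories, with the intermediate category as the reference for odds ratios) could look roughly like the following Python sketch. The 20th and 80th percentile cut-offs and the counts in the example are assumptions chosen for illustration, and the odds ratio shown is a crude, unadjusted one; the abstract does not state the cut-offs or the covariates used.

```python
import numpy as np

def stratify_by_percentile(scores, low_cut=20.0, high_cut=80.0):
    # Assign each individual to "low", "intermediate" or "high" by score percentile.
    scores = np.asarray(scores, dtype=float)
    lo, hi = np.percentile(scores, [low_cut, high_cut])
    return np.where(scores < lo, "low",
                    np.where(scores > hi, "high", "intermediate"))

def crude_odds_ratio(cases_group, controls_group, cases_ref, controls_ref):
    # Unadjusted odds ratio of a category relative to the reference category.
    return (cases_group / controls_group) / (cases_ref / controls_ref)

# Hypothetical scores 1..100 and hypothetical case/control counts.
groups = stratify_by_percentile(np.arange(1, 101))
print({g: int((groups == g).sum()) for g in ("low", "intermediate", "high")})
print(crude_odds_ratio(cases_group=30, controls_group=70, cases_ref=20, controls_ref=80))
```

In practice, odds ratios such as those reported above would typically come from a regression model with covariates rather than from raw counts.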


Subject(s)
Coronary Artery Disease; Diabetes Mellitus, Type 2; Obesity, Morbid; Humans; Coronary Artery Disease/genetics; Diabetes Mellitus, Type 2/genetics; Obesity/genetics; Risk Factors; United Kingdom; Asian People; Multifactorial Inheritance
10.
J Med Genet ; 60(11): 1044-1051, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37321833

ABSTRACT

BACKGROUND: Polygenic risk scores (PRSs) have been used to stratify colorectal cancer (CRC) risk in the general population, whereas the evidence on their role in Lynch syndrome (LS), the most common type of hereditary CRC, is still conflicting. We aimed to assess the ability of PRS to refine CRC risk prediction in individuals of European descent with LS. METHODS: 1465 individuals with LS (557 MLH1, 517 MSH2/EPCAM, 299 MSH6 and 92 PMS2) and 5656 CRC-free population-based controls from two independent cohorts were included. A 91-SNP PRS was applied. A Cox proportional hazard regression model with 'family' as a random effect and a logistic regression analysis, followed by a meta-analysis combining both cohorts, were conducted. RESULTS: Overall, we did not observe a statistically significant association between PRS and CRC risk in the entire cohort. Nevertheless, PRS was significantly associated with a slightly increased risk of CRC or advanced adenoma (AA) in those with CRC diagnosed <50 years and in individuals with multiple CRCs or AAs diagnosed <60 years. CONCLUSION: The PRS may slightly influence CRC risk in individuals with LS, in particular in more extreme phenotypes such as early-onset disease. However, the study design and recruitment strategy strongly influence the results of PRS studies. A separate analysis by genes and its combination with other genetic and non-genetic risk factors will help refine its role as a risk modifier in LS.

11.
Eur J Hum Genet ; 31(7): 824-833, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37130971

ABSTRACT

Amino-terminal (Nt-) acetylation (NTA) is a common protein modification, affecting 80% of cytosolic proteins in humans. The human essential gene, NAA10, encodes for the enzyme NAA10, which is the catalytic subunit in the N-terminal acetyltransferase A (NatA) complex, also including the accessory protein, NAA15. The full spectrum of human genetic variation in this pathway is currently unknown. Here we reveal the genetic landscape of variation in NAA10 and NAA15 in humans. Through a genotype-first approach, one clinician interviewed the parents of 56 individuals with NAA10 variants and 19 individuals with NAA15 variants, which were added to all known cases (N = 106 for NAA10 and N = 66 for NAA15). Although there is clinical overlap between the two syndromes, functional assessment demonstrates that the overall level of functioning for the probands with NAA10 variants is significantly lower than the probands with NAA15 variants. The phenotypic spectrum includes variable levels of intellectual disability, delayed milestones, autism spectrum disorder, craniofacial dysmorphology, cardiac anomalies, seizures, and visual abnormalities (including cortical visual impairment and microphthalmia). One female with the p.Arg83Cys variant and one female with an NAA15 frameshift variant both have microphthalmia. The frameshift variants located toward the C-terminal end of NAA10 have much less impact on overall functioning, whereas the females with the p.Arg83Cys missense in NAA10 have substantial impairment. The overall data are consistent with a phenotypic spectrum for these alleles, involving multiple organ systems, thus revealing the widespread effect of alterations of the NTA pathway in humans.


Subject(s)
Autism Spectrum Disorder; Intellectual Disability; Microphthalmos; Humans; Female; Syndrome; N-Terminal Acetyltransferase E/genetics; N-Terminal Acetyltransferase E/metabolism; Genotype; Intellectual Disability/genetics; N-Terminal Acetyltransferase A/genetics; N-Terminal Acetyltransferase A/metabolism
12.
Bioinformatics ; 39(5), 2023 May 04.
Article in English | MEDLINE | ID: mdl-37084271

ABSTRACT

MOTIVATION: Missense variants are a frequent class of variation within the coding genome, and some of them cause Mendelian diseases. Despite advances in computational prediction, classifying missense variants into pathogenic or benign remains a major challenge in the context of personalized medicine. Recently, the structure of the human proteome was derived with unprecedented accuracy using the artificial intelligence system AlphaFold2. This raises the question of whether AlphaFold2 wild-type structures can improve the accuracy of computational pathogenicity prediction for missense variants. RESULTS: To address this, we first engineered a set of features for each amino acid from these structures. We then trained a random forest to distinguish between relatively common (proxy-benign) and singleton (proxy-pathogenic) missense variants from gnomAD v3.1. This yielded a novel AlphaFold2-based pathogenicity prediction score, termed AlphScore. Important feature classes used by AlphScore are solvent accessibility, amino acid network related features, features describing the physicochemical environment, and AlphaFold2's quality parameter (predicted local distance difference test). AlphScore alone showed lower performance than existing in silico scores used for missense prediction, such as CADD or REVEL. However, when AlphScore was added to those scores, the performance increased, as measured by the approximation of deep mutational scan data, as well as the prediction of expert-curated missense variants from the ClinVar database. Overall, our data indicate that the integration of AlphaFold2-predicted structures can improve pathogenicity prediction of missense variants. AVAILABILITY AND IMPLEMENTATION: AlphScore, combinations of AlphScore with existing scores, as well as variants used for training and testing are publicly available.


Subject(s)
Artificial Intelligence; Computational Biology; Humans; Virulence; Mutation, Missense; Mutation
13.
BMC Med Genomics ; 16(1): 42, 2023 Mar 05.
Article in English | MEDLINE | ID: mdl-36872334

ABSTRACT

BACKGROUND AND AIMS: Summarised in polygenic risk scores (PRS), the effect of common, low-penetrance genetic variants associated with colorectal cancer (CRC) can be used for risk stratification. METHODS: To assess the combined impact of the PRS and other main factors on CRC risk, 163,516 individuals from the UK Biobank were stratified by: (1) carrier status for germline pathogenic variants (PV) in CRC susceptibility genes (APC, MLH1, MSH2, MSH6, PMS2), (2) low (< 20%), intermediate (20-80%), or high PRS (> 80%), and (3) family history (FH) of CRC. Multivariable logistic regression and Cox proportional hazards models were applied to compare odds ratios and to compute the lifetime incidence, respectively. RESULTS: Depending on the PRS, the CRC lifetime incidence for non-carriers ranges between 6% and 22%, compared to between 40% and 74% for carriers. A suspicious FH is associated with a further increase of the cumulative incidence, reaching 26% for non-carriers and 98% for carriers. In non-carriers without FH but with a high PRS, the CRC risk is doubled, whereas a low PRS, even in the context of a FH, results in a decreased risk. The full model including PRS, carrier status, and FH improved the area under the curve in risk prediction (0.704). CONCLUSION: The findings demonstrate that CRC risks are strongly influenced by the PRS for both a sporadic and monogenic background. FH, PV, and common variants contribute complementarily to CRC risk. The implementation of PRS in routine care will likely improve personalized risk stratification, which will in turn guide tailored preventive surveillance strategies in high, intermediate, and low risk groups.


Subject(s)
Colorectal Neoplasms; Germ-Line Mutation; Humans; Incidence; Risk Factors; Germ Cells
14.
Stat Med ; 42(11): 1779-1801, 2023 May 20.
Article in English | MEDLINE | ID: mdl-36932460

ABSTRACT

We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all distribution parameters of an arbitrary parametric distribution of a multivariate response conditional on explanatory variables, while being applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable selection, taking various different types of effects into account. As a special merit of our approach, it allows for modeling the association between multiple continuous or discrete outcomes through the relevant covariates. After a detailed simulation study investigating estimation and prediction performance, we demonstrate the full flexibility of our approach in three diverse biomedical applications. The first is based on high-dimensional genomic cohort data from the UK Biobank, considering a bivariate binary response (chronic ischemic heart disease and high cholesterol). Here, we are able to identify genetic variants that are informative for the association between cholesterol and heart disease. The second application considers the demand for health care in Australia with the number of consultations and the number of prescribed medications as a bivariate count response. The third application analyses two dimensions of childhood undernutrition in Nigeria as a bivariate response and we find that the correlation between the two undernutrition scores is considerably different depending on the child's age and the region the child lives in.


Subject(s)
Algorithms; Models, Statistical; Child; Humans; Computer Simulation; Australia; Nigeria
15.
Genet Epidemiol ; 46(8): 589-603, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35938382

ABSTRACT

Polygenic risk scores quantify the individual genetic predisposition regarding a particular trait. We propose and illustrate the application of existing statistical learning methods to derive sparser models for genome-wide data with a polygenic signal. Our approach is based on three consecutive steps. First, potentially informative loci are identified by a marginal screening approach. Then, fine-mapping is independently applied for blocks of variants in linkage disequilibrium, where informative variants are retrieved by using variable selection methods including boosting with probing and stochastic searches with the Adaptive Subspace method. Finally, joint prediction models with the selected variants are derived using statistical boosting. In contrast to alternative approaches relying on univariate summary statistics from genome-wide association studies, our three-step approach makes it possible to select and fit multivariable regression models on large-scale genotype data. Based on UK Biobank data, we develop prediction models for LDL-cholesterol as a continuous trait. Additionally, we consider a recent scalable algorithm for the Lasso. Results show that statistical learning approaches based on fine-mapping of genetic signals result in a competitive prediction performance compared to classical polygenic risk approaches, while yielding sparser risk models.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Humans; Genome-Wide Association Study/methods; Cholesterol, LDL/genetics; Models, Genetic; Multifactorial Inheritance/genetics
16.
Bioinformatics ; 38(9): 2651-2653, 2022 Apr 28.
Article in English | MEDLINE | ID: mdl-35266528

ABSTRACT

SUMMARY: The genetic architecture of complex traits can be influenced by both many common regulatory variants with small effect sizes and rare deleterious variants in coding regions with larger effect sizes. However, the two kinds of genetic contributions are typically analyzed independently. Here, we present GenRisk, a Python package for the computation and the integration of gene scores based on the burden of rare deleterious variants and common-variant-based polygenic risk scores. The derived scores can be analyzed within GenRisk to perform association tests or to derive phenotype prediction models by testing multiple classification and regression approaches. GenRisk is compatible with VCF input file formats. AVAILABILITY AND IMPLEMENTATION: GenRisk is an open-source, publicly available Python package that can be downloaded or installed from GitHub (https://github.com/AldisiRana/GenRisk). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Multifactorial Inheritance; Software; Phenotype; Open Reading Frames; Risk Factors
17.
Nat Genet ; 54(3): 349-357, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35145301

ABSTRACT

Many monogenic disorders cause a characteristic facial morphology. Artificial intelligence can support physicians in recognizing these patterns by associating facial phenotypes with the underlying syndrome through training on thousands of patient photographs. However, this 'supervised' approach means that diagnoses are only possible if the disorder was part of the training set. To improve recognition of ultra-rare disorders, we developed GestaltMatcher, an encoder for portraits that is based on a deep convolutional neural network. Photographs of 17,560 patients with 1,115 rare disorders were used to define a Clinical Face Phenotype Space, in which distances between cases define syndromic similarity. Here we show that patients can be matched to others with the same molecular diagnosis even when the disorder was not included in the training set. Together with mutation data, GestaltMatcher could not only accelerate the clinical diagnosis of patients with ultra-rare disorders and facial dysmorphism but also enable the delineation of new phenotypes.
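The idea that distances in a feature space define syndromic similarity can be sketched in Python as a nearest-neighbour ranking. This is only an illustration: the function name, the two-dimensional toy vectors, the labels, and the use of cosine distance are all assumptions made here, and the sketch does not reproduce GestaltMatcher's encoder or its actual distance computation.

```python
import numpy as np

def rank_by_cosine_distance(query, gallery, labels):
    # query: feature vector of the new case.
    # gallery: cases x features matrix of known cases; labels: one label per case.
    query = np.asarray(query, dtype=float)
    gallery = np.asarray(gallery, dtype=float)
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    # Cosine distance = 1 - cosine similarity; smaller means more similar.
    distances = 1.0 - g @ q
    order = np.argsort(distances)
    return [(labels[i], float(distances[i])) for i in order]

# Hypothetical two-dimensional embeddings of three known cases.
ranked = rank_by_cosine_distance(
    query=[2.0, 0.1],
    gallery=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    labels=["case_A", "case_B", "case_C"],
)
print(ranked)
```

Because the ranking only needs feature vectors of known cases, a new case can be matched to a disorder that was never a class label during training, which is the property the abstract highlights.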


Subject(s)
Artificial Intelligence; Rare Diseases; Face; Humans; Neural Networks, Computer; Phenotype; Rare Diseases/genetics
18.
Front Genet ; 13: 1076440, 2022.
Article in English | MEDLINE | ID: mdl-36704342

ABSTRACT

Polygenic risk scores (PRS) evaluate the individual genetic liability to a certain trait and are expected to play an increasingly important role in clinical risk stratification. Most often, PRS are estimated based on summary statistics of univariate effects derived from genome-wide association studies. To improve the predictive performance of PRS, it is desirable to fit multivariable models directly on the genetic data. Due to the large and high-dimensional data, a direct application of existing methods is often not feasible and new efficient algorithms are required to overcome the computational burden regarding efficiency and memory demands. We develop an adapted component-wise L2-boosting algorithm to fit genotype data from large cohort studies to continuous outcomes using linear base-learners for the genetic variants. Similar to the snpnet approach implementing lasso regression, the proposed snpboost approach iteratively works on smaller batches of variants. By restricting the set of possible base-learners in each boosting step to variants most correlated with the residuals from previous iterations, the computational efficiency can be substantially increased without losing prediction accuracy. Furthermore, for large-scale data based on various traits from the UK Biobank we show that our method yields competitive prediction accuracy and computational efficiency compared to the snpnet approach and further commonly used methods. Due to the modular structure of boosting, our framework can be further extended to construct PRS for different outcome data and effect types; we illustrate this for the prediction of binary traits.
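Plain component-wise L2-boosting with linear base-learners, the building block this abstract starts from, can be sketched in a few lines of Python. This is a generic textbook version under stated assumptions: the function name, the fixed number of steps and the learning rate are hypothetical, and the batch-wise restriction to the variants most correlated with the residuals that characterises snpboost is deliberately left out.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, learning_rate=0.1):
    # X: individuals x variants matrix, y: continuous outcome.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centre each variant
    col_ss = (Xc ** 2).sum(axis=0)       # sum of squares per variant
    if np.any(col_ss == 0):
        raise ValueError("constant columns cannot serve as base-learners")
    intercept = y.mean()
    coef = np.zeros(X.shape[1])
    resid = y - intercept
    for _ in range(n_steps):
        # Least-squares slope of the current residuals on each single variant.
        slopes = Xc.T @ resid / col_ss
        # Reduction in residual sum of squares achieved by each base-learner.
        gain = slopes ** 2 * col_ss
        j = int(np.argmax(gain))
        # Update only the best-fitting variant, shrunken by the learning rate.
        coef[j] += learning_rate * slopes[j]
        resid = resid - learning_rate * slopes[j] * Xc[:, j]
    # Convert the intercept back to the scale of the uncentred variants.
    intercept = intercept - x_mean @ coef
    return intercept, coef

# Simulated genotypes (0/1/2) where only variants 1 and 3 affect the outcome.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 3, size=(200, 5)).astype(float)
y_demo = 1.0 + 2.0 * X_demo[:, 1] - 1.0 * X_demo[:, 3]
b0, b = componentwise_l2_boost(X_demo, y_demo, n_steps=500)
print(round(b0, 2), np.round(b, 2))
```

Because each step updates a single coefficient, stopping early leaves most coefficients at exactly zero, which is how boosting performs variable selection; in real applications the number of steps would be chosen on validation data rather than fixed in advance.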
