Results 1 - 8 of 8
1.
Stud Health Technol Inform ; 310: 1021-1025, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269969

ABSTRACT

Coronary artery disease (CAD) has the highest disease burden worldwide. To manage this burden, predictive models are required to screen patients for preventative treatment. A range of variables has been explored for their capacity to predict disease, including phenotypic (age, sex, BMI and smoking status), medical imaging (carotid artery thickness) and genotypic variables. We use machine learning models and the UK Biobank cohort to measure the prediction capacity of these three variable categories, both in combination and in isolation. We demonstrate that phenotypic variables from the Framingham risk score have the best prediction capacity, although a combination of phenotypic, medical imaging and genotypic variables delivers the most specific models. Furthermore, we demonstrate that VariantSpark, a random forest based GWAS platform, performs effective feature selection for SNP-based genotype variables, identifying 115 SNPs significantly associated with the CAD phenotype.


Subject(s)
Coronary Artery Disease , Humans , Coronary Artery Disease/diagnostic imaging , Coronary Artery Disease/genetics , Carotid Intima-Media Thickness , Phenotype , Genotype , Machine Learning
2.
Sci Rep ; 13(1): 17662, 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37848535

ABSTRACT

Alzheimer's disease (AD) is a complex genetic disease, and variants identified through genome-wide association studies (GWAS) explain only part of its heritability. Epistasis has been proposed as a major contributor to this 'missing heritability'; however, many current methods are limited to modelling additive effects. We use VariantSpark, a machine learning approach to GWAS, and BitEpi, a tool for epistasis detection, to identify AD-associated variants and interactions across two independent cohorts, ADNI and UK Biobank. By incorporating significant epistatic interactions, we captured 10.41% more phenotypic variance than logistic regression (LR). We validate the well-established AD locus, APOE, and identify two novel genome-wide significant AD-associated loci in both cohorts, SH3BP4 and SASH1, which are also in significant epistatic interactions with APOE. We show that the SH3BP4 SNP has a modulating effect on the known pathogenic APOE SNP, demonstrating a possible protective mechanism against AD. SASH1 is involved in a triplet interaction with the pathogenic APOE SNP and ACOT11, where the SASH1 SNP lowered the pathogenic interaction effect between ACOT11 and APOE. Finally, we demonstrate that VariantSpark detects disease associations with 80% fewer controls than LR, unlocking discoveries in well-annotated but smaller cohorts.
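The limitation of purely additive models mentioned in this abstract can be illustrated with a minimal sketch. It is not BitEpi or VariantSpark; the two SNPs, the helper `case_rate_by` and the toy samples are all hypothetical and constructed only for illustration. In the toy data, disease occurs only when exactly one of the two variants is carried, so each SNP looks uninformative on its own while the pair is fully informative.

```python
from collections import defaultdict

def case_rate_by(samples, key):
    """Return {group: fraction of cases}, where key(a, b) names the group."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for carrier_a, carrier_b, is_case in samples:
        group = key(carrier_a, carrier_b)
        totals[group] += 1
        cases[group] += int(is_case)
    return {group: cases[group] / totals[group] for group in totals}

# Synthetic toy cohort (an assumption, not study data): 25 samples for each
# combination of carrier status; a sample is a case only when exactly one
# of the two hypothetical variants is carried.
samples = [(a, b, a != b) for a in (0, 1) for b in (0, 1) for _ in range(25)]

marginal_a = case_rate_by(samples, lambda a, b: a)   # SNP A considered alone
marginal_b = case_rate_by(samples, lambda a, b: b)   # SNP B considered alone
joint = case_rate_by(samples, lambda a, b: (a, b))   # both SNPs together

print("SNP A alone:", marginal_a)
print("SNP B alone:", marginal_b)
print("A and B jointly:", joint)
```

Each marginal case rate is 0.5 regardless of carrier status, so a test of either SNP alone sees no signal, whereas the joint table separates cases from controls completely.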


Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , Epistasis, Genetic , Machine Learning , Polymorphism, Single Nucleotide , Apolipoproteins E/genetics , Genetic Predisposition to Disease , Adaptor Proteins, Signal Transducing/genetics
3.
Comput Struct Biotechnol J ; 21: 4354-4360, 2023.
Article in English | MEDLINE | ID: mdl-37711185

ABSTRACT

Random forests (RFs) are a widely used modelling tool capable of feature selection via a variable importance measure (VIM); however, a threshold is needed to control for false positives. In the absence of a good understanding of the characteristics of VIMs, many current approaches attempt to select features associated with the response by training multiple RFs to generate statistical power via a permutation null, by employing recursive feature elimination, or through a combination of both. However, for high-dimensional datasets these approaches become computationally infeasible. In this paper, we present RFlocalfdr, a statistical approach, built on the empirical Bayes argument of Efron, for thresholding mean decrease in impurity (MDI) importances. It identifies features significantly associated with the response while controlling the false positive rate. Using synthetic data and real-world data in health, we demonstrate that RFlocalfdr has equivalent accuracy to currently published approaches, while being orders of magnitude faster. We show that RFlocalfdr can successfully threshold a dataset of 10^6 datapoints, establishing its usability for large-scale datasets, such as those in genomics. Furthermore, RFlocalfdr is compatible with any RF implementation that returns a VIM and counts, making it a versatile feature selection tool that reduces false discoveries.
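The general idea of thresholding importances against an empirically estimated null can be sketched as follows. This is not the RFlocalfdr algorithm and does not compute a local false discovery rate; it is a much simpler stand-in that assumes most features are null, estimates the null centre and spread of the log-importances robustly (median and MAD), and flags features far in the upper tail. The function name `flag_by_empirical_null`, the cutoff of 3 and the toy importances are all hypothetical.

```python
import math
import statistics

def flag_by_empirical_null(importances, z_cutoff=3.0):
    """Return indices whose log-importance lies far above a robust null estimate."""
    # Only strictly positive importances have a logarithm; zeros are never flagged.
    logs = {i: math.log(v) for i, v in enumerate(importances) if v > 0}
    if len(logs) < 2:
        return []
    # Assumption: the bulk of features are null, so the median and the
    # median absolute deviation (MAD) describe the null distribution.
    center = statistics.median(logs.values())
    mad = statistics.median(abs(x - center) for x in logs.values())
    scale = 1.4826 * mad  # MAD rescaled to match a normal standard deviation
    if scale == 0:
        return []
    return [i for i, x in logs.items() if (x - center) / scale > z_cutoff]

# Made-up importances: nine near-null features and one clear outlier.
toy_importances = [0.010, 0.011, 0.009, 0.012, 0.010,
                   0.008, 0.011, 0.009, 0.010, 0.5]
flagged = flag_by_empirical_null(toy_importances)
print(flagged)
```

A single pass over one set of importances is what makes this family of approaches cheap compared with retraining many forests for a permutation null.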

4.
Prenat Diagn ; 43(1): 109-116, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36484552

ABSTRACT

OBJECTIVE: European and Australian guidelines for cystic fibrosis (CF) reproductive carrier screening recommend testing a small number of high-frequency CF-causing variants, rather than comprehensive CFTR sequencing. The study objective was to determine variant detection rates of commercially available targeted reproductive carrier screening tests in Australia. METHODS: Next-generation DNA sequencing of the CFTR gene was performed on 2552 individuals from a whole population sample to identify CF-causing variants. The variant detection rates of two commercially available Australian reproductive carrier screening tests, which target 50 or 175 CF-causing variants, were calculated in this population. The ethnicity of individuals was determined using principal component analysis. RESULTS: Variant detection rates of the tests for 50 and 175 CF-causing variants were 88.2% and 90.8%, respectively. No CF-causing variants in individuals of East Asian ethnicity (n = 3) were detected by either test, while >86.6% (n = 69) of CF-causing variants in Europeans would be identified by either test. CONCLUSIONS: Reproductive carrier screening tests for a targeted set of high-frequency CF variants are unable to detect approximately 10% of CF variants in a multiethnic Australian population, and individuals of East Asian ethnicity are disproportionately affected by this test limitation.
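A variant detection rate of the kind reported here can be sketched as the share of CF-causing variant observations in a cohort that fall on a test's target panel. The exact counting rule used in the study is not given in the abstract, so counting one observation per carrier is an assumption, and the variant labels and panel below are placeholders, not real CFTR variants or study data.

```python
def detection_rate(observed_variants, panel):
    """Fraction of observed CF-causing variants that the panel targets.

    observed_variants: one entry per carrier observation (repeats allowed).
    panel: set of variant identifiers targeted by the screening test.
    """
    if not observed_variants:
        raise ValueError("no observed variants to evaluate")
    detected = sum(1 for variant in observed_variants if variant in panel)
    return detected / len(observed_variants)

# Placeholder identifiers (hypothetical): four carrier observations,
# three of which involve variants on the panel.
observed = ["variant_1", "variant_1", "variant_2", "variant_3"]
small_panel = {"variant_1", "variant_2"}

rate = detection_rate(observed, small_panel)
print(f"{rate:.1%}")
```

Computing the same rate separately per ancestry group is what exposes the kind of disparity the abstract describes, since a panel built from variants common in one population can miss those common in another.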


Subject(s)
Cystic Fibrosis , Humans , Cystic Fibrosis/diagnosis , Cystic Fibrosis/epidemiology , Cystic Fibrosis/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Australia/epidemiology , Genetic Testing , Ethnicity , Mutation
5.
Comput Struct Biotechnol J ; 20: 2942-2950, 2022.
Article in English | MEDLINE | ID: mdl-35677774

ABSTRACT

New SARS-CoV-2 variants emerge as part of the virus' adaptation to the human host. Health organizations are monitoring newly emerging variants with suspected impact on disease or vaccination efficacy as Variants Being Monitored (VBM), like Delta and Omicron. Genetic changes (SNVs) compared to the Wuhan variant characterize VBMs, with current emphasis on the spike protein and lineage markers. However, monitoring VBMs in such a way might miss SNVs with a functional effect on disease. Here we introduce a lineage-agnostic genome-wide approach to identify SNVs associated with disease. We curated a case-control dataset of 10,520 samples and identified 117 SNVs significantly associated with adverse patient outcome. While 40% (47) of these SNVs are already monitored and 36% (43) are in the spike protein, we also identified 70 new SNVs that are associated with disease outcome. Of these, 31 are disease-worsening and predominantly located in the 3'-5' exonuclease (NSP14), with structural modelling revealing a concise cluster in the Zn binding domain that has a known host-immune modulating function. Furthermore, we generate clade-independent VBM groupings by identifying interacting SNVs (epistasis). We find 37 sets of higher-order epistatic interactions joining 5 genomic regions (nsp3, nsp14, Spike S1, ORF3a, N). Structural modelling of these regions provides insights into potential mechanistic pathways of increased virulence as well as orthogonal methods of validation. Clade-independent monitoring of functionally interacting (epistasis, co-evolution) SNVs detected emerging VBMs a week before they were flagged by health organizations and, in conjunction with structural modelling, provides faster, mechanistic insight into emerging strains to guide public health interventions.

6.
Sci Rep ; 10(1): 16603, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-32999326

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

7.
Front Neurol Neurosci Res ; 1: 100001, 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-34322689

ABSTRACT

In recent years, the advantages of RNA-sequencing (RNA-Seq) have made it the platform of choice for measuring gene expression over traditional microarrays. However, RNA-Seq comes with bioinformatic challenges and higher computational costs. Therefore, this study set out to assess whether the increased depth of transcriptomic information facilitated by RNA-Seq is worth the increased computation over microarrays, specifically at three levels: absolute expression levels, identification of differentially expressed genes, and expression QTL (eQTL) mapping in regions of the human brain. In the United Kingdom Brain Expression Consortium (UKBEC) dataset, there is high agreement between gene expression levels measured by microarrays and RNA-Seq when quantifying absolute expression levels and when identifying differentially expressed genes. These findings suggest that, depending on the aims of a study, the relative ease of working with microarray data may outweigh the computational time and costs of RNA-Seq pipelines. On the other hand, there was low agreement when mapping eQTLs. However, a number of eQTLs associated with genes that play important roles in the brain were found in both platforms. For example, a trans-eQTL was mapped that is associated with the MPZ gene in the substantia nigra. The eQTLs that we have highlighted are promising candidates that merit further investigation.

8.
Sci Rep ; 9(1): 19201, 2019 Dec 16.
Article in English | MEDLINE | ID: mdl-31844111

ABSTRACT

Understanding the complexity of the human brain transcriptome architecture is one of the most important human genetics study areas. Previous studies have applied expression quantitative trait loci (eQTL) analysis at the genome-wide level of the brain to understand the underlying mechanisms relating to neurodegenerative diseases, primarily at the transcript level. To increase the resolution of our understanding, the current study investigates multi/single-region, transcript/exon-level and cis versus trans-acting eQTL, across 10 regions of the human brain. Some of the key findings of this study are: (i) only a relatively small proportion of eQTLs will be detected, where the sensitivity is under 5%; (ii) when an eQTL is acting in multiple regions (MR-eQTL), it tends to have very similar effects on gene expression in each of these regions, as well as being cis-acting; (iii) trans-acting eQTLs tend to have larger effects on expression compared to cis-acting eQTLs and tend to be specific to a single region (SR-eQTL) of the brain; (iv) the cerebellum has a very large number of eQTLs that function exclusively in this region, compared with other regions of the brain; (v) importantly, an interactive visualisation tool (Shiny app) was developed to visualise the MR/SR-eQTL at transcript and exon levels.
