1 - 20 of 28
1.
Article En | MEDLINE | ID: mdl-38896210

BACKGROUND: The associations between mood disorders (anxiety and depression) and mild cognitive impairment (MCI) or Alzheimer's dementia (AD) remain unclear. METHODS: Data from the Australian Imaging, Biomarker & Lifestyle (AIBL) study were subjected to logistic regression to determine both cross-sectional and longitudinal associations between anxiety/depression and MCI/AD. Effect modification by selected covariates was analysed using the likelihood ratio test. RESULTS: Cross-sectional analysis was performed to explore the association between anxiety/depression and MCI/AD among 2,209 participants with a mean [SD] age of 72.3 [7.4] years, of whom 55.4% were female. After adjusting for confounding variables, we found a significant increase in the odds of AD among participants with either of the two mood disorders (anxiety: OR 1.65 [95% CI 1.04-2.60]; depression: OR 1.73 [1.12-2.69]). Longitudinal analysis was conducted to explore the target associations among 1,379 participants with a mean age of 71.2 [6.6] years, of whom 56.3% were female. During a mean follow-up of 5.0 [4.2] years, 163 participants who developed MCI/AD (referred to as PRO) were identified. Only anxiety was associated with higher odds of PRO after adjusting for covariates (OR 1.56 [1.03-2.39]). However, after additional adjustment for depression, the association was no longer statistically significant. Additionally, age, sex, and marital status were identified as effect modifiers for the target associations. CONCLUSION: Our study provides supportive evidence that anxiety and depression influence the evolution of MCI/AD, offering epidemiological insights that can inform clinical practice and guide clinicians in offering targeted dementia prevention and surveillance programs to at-risk populations.
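
The odds ratios in this abstract are reported with 95% confidence intervals only. As a minimal sketch, and assuming the intervals are Wald intervals that are symmetric on the log-odds scale (the abstract does not say how they were computed), the implied standard error of log(OR) and an approximate z statistic can be recovered as follows. The function names `log_or_se` and `wald_z` are illustrative, not taken from the study.

```python
import math


def log_or_se(ci_lower: float, ci_upper: float, z_crit: float = 1.96) -> float:
    """SE of log(OR) implied by a 95% CI, assuming a Wald interval on the log scale."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)


def wald_z(odds_ratio: float, ci_lower: float, ci_upper: float) -> float:
    """Approximate z statistic for the null hypothesis OR = 1."""
    return math.log(odds_ratio) / log_or_se(ci_lower, ci_upper)


# Estimates quoted in the abstract: (OR, CI lower bound, CI upper bound)
reported = {
    "anxiety, cross-sectional": (1.65, 1.04, 2.60),
    "depression, cross-sectional": (1.73, 1.12, 2.69),
    "anxiety, longitudinal": (1.56, 1.03, 2.39),
}

for label, (or_, lo, hi) in reported.items():
    print(f"{label}: SE(log OR) = {log_or_se(lo, hi):.3f}, z = {wald_z(or_, lo, hi):.2f}")
```

Under this assumption every z statistic exceeds 1.96, which is consistent with each interval excluding 1.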

2.
Hum Immunol ; 85(3): 110790, 2024 May.
Article En | MEDLINE | ID: mdl-38575482

Currently, the genetic variants strongly associated with risk for Multiple Sclerosis (MS) are located in the Major Histocompatibility Complex. This includes DRB1*15:01 and DRB1*15:03 alleles at the HLA-DRB1 locus, the latter restricted to African populations; the DQB1*06:02 allele at the HLA-DQB1 locus which is in high linkage disequilibrium (LD) with DRB1*15:01; and protective allele A*02:01 at the HLA-A locus. HLA allele identification is facilitated by co-inherited ('tag') single nucleotide polymorphisms (SNPs); however, SNP validation is not typically done outside of the discovery population. We examined 19 SNPs reported to be in high LD with these alleles in 2,502 healthy subjects included in the 1000 Genomes panel having typed HLA data. Examination of 3 indices (LD R² values, sensitivity and specificity, minor allele frequency) revealed few SNPs with high tagging performance. All SNPs examined that tag DRB1*15:01 were in perfect LD in the British population; three showed high tagging performance in 4 of the 5 European, and 2 of the 4 American populations. For DQB1*06:02, with no previously validated tag SNPs, we show that rs3135388 has high tagging performance in one South Asian, one American, and one European population. We identify for the first time that rs2844821 has high tagging performance for A*02:01 in 5 of 7 African populations including African Americans, and 4 of the 5 European populations. These results provide a basis for selecting SNPs with high tagging performance to assess HLA alleles across diverse populations, for MS risk as well as for other diseases and conditions.


Alleles , Gene Frequency , Genetic Predisposition to Disease , Linkage Disequilibrium , Multiple Sclerosis , Polymorphism, Single Nucleotide , Humans , Multiple Sclerosis/genetics , HLA-DQ beta-Chains/genetics , HLA-DRB1 Chains/genetics , Genome, Human , Risk
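
The tagging indices named in this abstract (LD R², sensitivity, specificity) can be illustrated with a small sketch. It assumes each haplotype is coded as a pair of 0/1 indicators (carries the HLA allele, carries the SNP tag allele); the function name `tagging_metrics` and the toy data are hypothetical and are not drawn from the 1000 Genomes panel.

```python
def tagging_metrics(has_allele, has_tag):
    """Sensitivity, specificity and r^2 of a SNP tag for an HLA allele.

    Both arguments are equal-length sequences of 0/1 indicators, one per haplotype.
    """
    tp = fp = fn = tn = 0
    for allele, tag in zip(has_allele, has_tag):
        if allele and tag:
            tp += 1  # tag present, allele present
        elif tag:
            fp += 1  # tag present, allele absent
        elif allele:
            fn += 1  # tag absent, allele present
        else:
            tn += 1  # both absent
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Squared correlation of two binary indicators from the 2x2 table.
    denom = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    r2 = (tp * tn - fp * fn) ** 2 / denom
    return sensitivity, specificity, r2


# Toy example: 10 haplotypes, 3 carrying the HLA allele, 3 carrying the tag.
allele = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
tag = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
sens, spec, r2 = tagging_metrics(allele, tag)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} r2={r2:.2f}")
```

A tag in perfect LD with the allele gives a sensitivity, specificity and r² of 1 in this formulation. The sketch does not guard against monomorphic inputs, where the denominators are zero.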
3.
Genet Epidemiol ; 2024 Mar 19.
Article En | MEDLINE | ID: mdl-38504141

Young breast and bowel cancers (e.g., those diagnosed before age 40 or 50 years) have far greater morbidity and mortality in terms of years of life lost, and are increasing in incidence, but have been less studied. For breast and bowel cancers, the familial relative risks, and therefore the familial variances in age-specific log(incidence), are much greater at younger ages, but little of these familial variances has been explained. Studies of families and twins can address questions not easily answered by studies of unrelated individuals alone. We describe existing and emerging family and twin data that can provide special opportunities for discovery. We present designs and statistical analyses, including novel ideas such as the VALID (Variance in Age-specific Log Incidence Decomposition) model for causes of variation in risk, the DEPTH (DEPendency of association on the number of Top Hits) and other approaches to analyse genome-wide association study data, and the within-pair, ICE FALCON (Inference about Causation from Examining FAmiliaL CONfounding) and ICE CRISTAL (Inference about Causation from Examining Changes in Regression coefficients and Innovative STatistical AnaLysis) approaches to causation and familial confounding. Example applications to breast and colorectal cancer are presented. Motivated by the availability of the resources of the Breast and Colon Cancer Family Registries, we also present some ideas for future studies that could be applied to, and compared with, cancers diagnosed at older ages and address the challenges posed by young breast and bowel cancers.

4.
J Alzheimers Dis ; 97(1): 89-100, 2024.
Article En | MEDLINE | ID: mdl-38007665

The accumulation of amyloid-β (Aβ) plaques in the brain is considered a hallmark of Alzheimer's disease (AD). Mathematical modeling, capable of predicting the motion and accumulation of Aβ, has obtained increasing interest as a potential alternative to aid the diagnosis of AD and predict disease prognosis. These mathematical models have provided insights into the pathogenesis and progression of AD that are difficult to obtain through experimental studies alone. Mathematical modeling can also simulate the effects of therapeutics on brain Aβ levels, thereby holding potential for drug efficacy simulation and the optimization of personalized treatment approaches. In this review, we provide an overview of the mathematical models that have been used to simulate brain levels of Aβ (oligomers, protofibrils, and/or plaques). We classify the models into five categories: the general ordinary differential equation models, the general partial differential equation models, the network models, the linear optimal ordinary differential equation models, and the modified partial differential equation models (i.e., Smoluchowski equation models). The assumptions, advantages and limitations of these models are discussed. Given the popularity of using the Smoluchowski equation models to simulate brain levels of Aβ, our review summarizes the history and major advancements in these models (e.g., their application to predict the onset of AD and their combined use with network models). This review is intended to bring mathematical modeling to the attention of more scientists and clinical researchers working on AD to promote cross-disciplinary research.


Alzheimer Disease , Humans , Alzheimer Disease/pathology , Amyloid beta-Peptides/metabolism , Models, Theoretical , Brain/pathology , Computer Simulation , Plaque, Amyloid/pathology
5.
Cancer Epidemiol Biomarkers Prev ; 33(2): 306-313, 2024 02 06.
Article En | MEDLINE | ID: mdl-38059829

BACKGROUND: Cirrus is an automated risk predictor for breast cancer that comprises texture-based mammographic features and is mostly independent of mammographic density. We investigated the genetic and environmental components of variance in Cirrus. METHODS: We measured Cirrus for 3,195 breast cancer-free participants, including 527 pairs of monozygotic (MZ) twins, 271 pairs of dizygotic (DZ) twins, and 1,599 siblings of twins. Multivariate normal models were used to estimate the variance and familial correlations of age-adjusted Cirrus as a function of age. The classic twin model was expanded to allow the shared environment effects to differ by zygosity. The SNP-based heritability was estimated for a subset of 2,356 participants. RESULTS: There was no evidence that the variance or familial correlations depended on age. The familial correlations were 0.52 (SE, 0.03) for MZ pairs and 0.16 (SE, 0.03) for DZ and non-twin sister pairs combined. Shared environmental factors specific to MZ pairs accounted for 20% of the variance. Additive genetic factors accounted for 32% (SE = 5%) of the variance, consistent with the SNP-based heritability of 36% (SE = 16%). CONCLUSION: Cirrus is substantially familial due to genetic factors and an influence of shared environmental factors that was evident for MZ twin pairs only. The latter could be due to nongenetic factors operating in utero or in early life that are shared by MZ twins. IMPACT: Early-life factors, shared more by MZ pairs than DZ/non-twin sister pairs, could play a role in the variation in Cirrus, consistent with early life being recognized as a critical window of vulnerability to breast carcinogens.


Breast Neoplasms , Female , Humans , Breast , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/genetics , Mammography , Risk Factors , Twins, Dizygotic/genetics , Twins, Monozygotic/genetics
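
The variance components quoted in this abstract can be related to the two familial correlations with a back-of-envelope calculation. This is only a sketch: the authors fitted multivariate normal models, whereas the code below assumes that DZ and sister pairs share half of the additive genetic variance and no shared environment, and that MZ pairs share all of it plus an MZ-specific environmental component. The function names are illustrative.

```python
def classic_falconer(r_mz: float, r_dz: float) -> tuple[float, float]:
    """Classic twin-model estimates: heritability and common-environment share."""
    h2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    return h2, c2


def mz_specific_environment(r_mz: float, r_dz: float) -> tuple[float, float]:
    """Simplified expanded model: shared environment acts in MZ pairs only.

    Assumes r_dz = A / 2 and r_mz = A + C_mz.
    """
    additive = 2 * r_dz
    c_mz = r_mz - additive
    return additive, c_mz


r_mz, r_dz = 0.52, 0.16  # familial correlations quoted in the abstract

h2, c2 = classic_falconer(r_mz, r_dz)
print(f"classic model:  h2 = {h2:.2f}, c2 = {c2:.2f}")

additive, c_mz = mz_specific_environment(r_mz, r_dz)
print(f"expanded model: A = {additive:.2f}, C_mz = {c_mz:.2f}")
```

Under these assumptions the classic formulas give a negative common-environment estimate, which is one way to see why a model with zygosity-specific shared environment is useful here, and the simplified expanded model gives values in line with the 32% and 20% reported.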
6.
Neuroimage ; 278: 120279, 2023 09.
Article En | MEDLINE | ID: mdl-37454702

The recent biological redefinition of Alzheimer's Disease (AD) has spurred the development of statistical models that relate changes in biomarkers with neurodegeneration and worsening condition linked to AD. The ability to measure such changes may facilitate earlier diagnoses for affected individuals and help in monitoring the evolution of their condition. Amongst such statistical tools, disease progression models (DPMs) are quantitative, data-driven methods that specifically attempt to describe the temporal dynamics of biomarkers relevant to AD. Due to the heterogeneous nature of this disease, with patients of similar age experiencing different AD-related changes, a challenge facing longitudinal mixed-effects-based DPMs is the estimation of patient-realigning time-shifts. These time-shifts are indispensable for meaningful biomarker modelling, but may impact fitting time or vary with missing data in jointly estimated models. In this work, we estimate an individual's progression through Alzheimer's disease by combining multiple biomarkers into a single value using a probabilistic formulation of principal components analysis. Our results show that this variable, which summarises AD through observable biomarkers, is remarkably similar to jointly estimated time-shifts when we compute our scores for the baseline visit, on cross-sectional data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Reproducing the expected properties of clinical datasets, we confirm that estimated scores are robust to missing data or unavailable biomarkers. In addition to cross-sectional insights, we can model the latent variable as an individual progression score by repeating estimations at follow-up examinations and refining long-term estimates as more data is gathered, which would be ideal in a clinical setting. 
Finally, we verify that our score can be used as a pseudo-temporal scale instead of age to ignore some patient heterogeneity in cohort data and highlight the general trend in expected biomarker evolution in affected individuals.


Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnostic imaging , Cross-Sectional Studies , Neuroimaging/methods , Biomarkers , Disease Progression , Magnetic Resonance Imaging
7.
Brief Bioinform ; 23(6)2022 11 19.
Article En | MEDLINE | ID: mdl-36266246

Nucleotide and protein sequences stored in public databases are the cornerstone of many bioinformatics analyses. The records containing these sequences are prone to a wide range of errors, including incorrect functional annotation, sequence contamination and taxonomic misclassification. One source of information that can help to detect errors is the strong interdependency between records. Novel sequences in one database draw their annotations from existing records, may generate new records in multiple other locations and will have varying degrees of similarity with existing records across a range of attributes. A network perspective of these relationships between sequence records, within and across databases, offers new opportunities to detect, or even correct, erroneous entries and more broadly to make inferences about record quality. Here, we describe this novel perspective of sequence database records as a rich network, which we call the sequence database network, and illustrate the opportunities this perspective offers for quantification of database quality and detection of spurious entries. We provide an overview of the relevant databases and describe how the interdependencies between sequence records across these databases can be exploited by network analyses. We review the process of sequence annotation and provide a classification of sources of error, highlighting propagation as a major source. We illustrate the value of a network perspective through three case studies that use network analysis to detect errors, and explore the quality and quantity of critical relationships that would inform such network analyses. This systematic description of a network perspective of sequence database records provides a novel direction to combat the proliferation of errors within these critical bioinformatics resources.


Computational Biology , Databases, Nucleic Acid , Amino Acid Sequence
8.
J Phys Chem B ; 126(28): 5151-5160, 2022 07 21.
Article En | MEDLINE | ID: mdl-35796490

Free energy perturbation (FEP) calculations can predict relative binding affinities of an antigen and its point mutants to the same human leukocyte antigen (HLA) with high accuracy (e.g., within 1.0 kcal/mol to experiment); however, a more challenging task is to compare binding affinities of wholly different antigens binding to completely different HLAs using FEP. Researchers have used a variety of different FEP schemes to compute and compare absolute binding affinities, with varied success. Here, we propose and assess a unifying scheme to compute the relative binding affinities of different antigens binding to completely different HLAs using absolute binding affinity FEP calculations. We apply our affinity calculation technique to HLA-antigen-T-cell receptor (TCR) systems relevant to celiac disease (CeD) by investigating binding affinity differences between HLA-DQ2.5 (enhanced CeD risk) and HLA-DQ7.5 (CeD protective) in the binary (HLA-gliadin) and ternary (HLA-gliadin-TCR) binding complexes for three gliadin derived epitopes: glia-α1, glia-α2, and glia-ω1. Based on FEP calculations with our carefully designed thermodynamic cycles, we demonstrate that HLA-DQ2.5 has higher binding affinity than HLA-DQ7.5 for gliadin and enhanced binding affinity with a common TCR, agreeing with known results that the HLA-DQ2.5 serotype exhibits increased risk for CeD. Our findings reveal that our proposed absolute binding affinity FEP method is appropriate for predicting HLA binding for disparate antigens with different genotypes. We also discuss atomic-level details of HLA genotypes interacting with gluten peptides and TCRs in regard to the pathogenesis of CeD.


Celiac Disease , Glutens , Celiac Disease/genetics , Celiac Disease/metabolism , Epitopes, T-Lymphocyte/genetics , Epitopes, T-Lymphocyte/metabolism , Gliadin/chemistry , Glutens/chemistry , Humans , Peptides/chemistry , Receptors, Antigen, T-Cell/chemistry , Receptors, Antigen, T-Cell/genetics
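
The accuracy figure quoted in this abstract (within 1.0 kcal/mol of experiment) can be put in terms of binding affinity with the standard relation between binding free energy and the dissociation constant, ΔG = RT ln(Kd). The sketch below assumes a temperature of 298.15 K, which the abstract does not state, and the function name `affinity_ratio` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_ratio(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Fold difference in Kd implied by a binding free-energy difference.

    Uses Kd_a / Kd_b = exp(ddG / RT), where ddG = dG_a - dG_b.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


# A 1.0 kcal/mol error corresponds to roughly a five-fold uncertainty in Kd.
print(f"{affinity_ratio(1.0):.1f}-fold")
```

This is why an error bound of about 1 kcal/mol is usually treated as the practical threshold for ranking binders by computed free energy.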
9.
Bioinformatics ; 38(Suppl 1): i273-i281, 2022 06 24.
Article En | MEDLINE | ID: mdl-35758780

MOTIVATION: Literature-based gene ontology annotations (GOA) are biological database records that use controlled vocabulary to uniformly represent gene function information that is described in the primary literature. Assurance of the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies between the literature used as evidence and the annotated GO terms can be identified; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and is unable to keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection. RESULTS: We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detect inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches that are designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available on GitHub at https://github.com/jiyuc/AutoGOAConsistency.


Publications , Gene Ontology , Molecular Sequence Annotation
10.
Trends Hear ; 25: 23312165211066174, 2021.
Article En | MEDLINE | ID: mdl-34903103

While cochlear implants have helped hundreds of thousands of individuals, it remains difficult to predict the extent to which an individual's hearing will benefit from implantation. Several publications indicate that machine learning may improve predictive accuracy of cochlear implant outcomes compared to classical statistical methods. However, existing studies are limited in terms of model validation and in evaluating the effect of factors like sample size on predictive performance. We conduct a thorough examination of machine learning approaches to predict word recognition scores (WRS) measured approximately 12 months after implantation in adults with post-lingual hearing loss. This is the largest retrospective study of cochlear implant outcomes to date, evaluating 2,489 cochlear implant recipients from three clinics. We demonstrate that while machine learning models significantly outperform linear models in prediction of WRS, their overall accuracy remains limited (mean absolute error: 17.9-21.8). The models are robust across clinical cohorts, with predictive error increasing by at most 16% when evaluated on a clinic excluded from the training set. We show that predictive performance is unlikely to be improved by increasing sample size alone, with a doubling of sample size estimated to increase performance by only 3% on the combined dataset. Finally, we demonstrate how the current models could support clinical decision making, highlighting that subsets of individuals can be identified that have a 94% chance of improving WRS by at least 10 percentage points after implantation, which is likely to be clinically meaningful. We discuss several implications of this analysis, focusing on the need to improve and standardize data collection.


Cochlear Implantation , Cochlear Implants , Deafness , Hearing Aids , Speech Perception , Adult , Cochlear Implantation/methods , Deafness/diagnosis , Humans , Retrospective Studies , Treatment Outcome
11.
Trends Hear ; 25: 23312165211037525, 2021.
Article En | MEDLINE | ID: mdl-34524944

While the majority of cochlear implant recipients benefit from the device, it remains difficult to estimate the degree of benefit for a specific patient prior to implantation. Using data from 2,735 cochlear-implant recipients from across three clinics, the largest retrospective study of cochlear-implant outcomes to date, we investigate the association between 21 preoperative factors and speech recognition approximately one year after implantation and explore the consistency of their effects across the three constituent datasets. We provide evidence of 17 statistically significant associations, in either univariate or multivariate analysis, including confirmation of associations for several predictive factors, which have only been examined in prior smaller studies. Despite the large sample size, a multivariate analysis shows that the variance explained by our models remains modest across the datasets (R² = 0.12-0.21). Finally, we report a novel statistical interaction indicating that the duration of deafness in the implanted ear has a stronger impact on hearing outcome when considered relative to a candidate's age. Our multicenter study highlights several real-world complexities that impact the clinical translation of predictive factors for cochlear implantation outcome. We suggest several directions to overcome these challenges and further improve our ability to model patient outcomes with increased accuracy.


Cochlear Implantation , Cochlear Implants , Deafness , Speech Perception , Adult , Deafness/diagnosis , Deafness/surgery , Hearing , Humans , Retrospective Studies , Treatment Outcome
12.
Brief Bioinform ; 22(5)2021 09 02.
Article En | MEDLINE | ID: mdl-33834181

MOTIVATION: The high accuracy of recent haplotype phasing tools is enabling the integration of haplotype (or phase) information more widely in genetic investigations. One such possibility is phase-aware expression quantitative trait loci (eQTL) analysis, where haplotype-based analysis has the potential to detect associations that may otherwise be missed by standard SNP-based approaches. RESULTS: We present eQTLHap, a novel method to investigate associations between gene expression and genetic variants, considering their haplotypic and genotypic effect. Using multiple simulations based on real data, we demonstrate that phase-aware eQTL analysis significantly outperforms typical SNP-based methods when the causal genetic architecture involves multiple SNPs. We show that phase-aware eQTL analysis is robust to phasing errors, showing only a minor impact (<4%) on sensitivity. Applying eQTLHap to real GEUVADIS and GTEx datasets detects numerous novel eQTLs undetected by a single-SNP approach, with 22 eQTLs replicating across studies or tissue types, highlighting the utility of phase-aware eQTL analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ziadbkh/eQTLHap. CONTACT: ziad.albkhetan@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.


Computational Biology/methods , Genome-Wide Association Study/methods , Haplotypes , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Algorithms , Gene Expression Regulation , Genotype , Humans , Internet , Linkage Disequilibrium
14.
Brief Bioinform ; 22(4)2021 07 20.
Article En | MEDLINE | ID: mdl-33236761

Haplotype phasing is a critical step for many genetic applications but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics and their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools and multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces SE by an average of 10% compared to any constituent tool when applied to European populations and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is available freely at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


Algorithms , Databases, Nucleic Acid , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Haplotypes , Humans
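
The voting strategy described in this abstract can be sketched for a single individual. The code below is an illustration under stated assumptions, not the consHap implementation: each tool's output is assumed to be a string of 0/1 alleles on one haplotype at the heterozygous sites, the vote is taken on the phase relation between adjacent sites (so the arbitrary labelling of the two haplotypes does not matter), and an odd number of tools is required to avoid ties. All function names are hypothetical.

```python
def to_relations(hap: str) -> list[bool]:
    """True where adjacent heterozygous sites carry different alleles on this haplotype."""
    return [a != b for a, b in zip(hap, hap[1:])]


def consensus_haplotype(haps: list[str]) -> str:
    """Majority vote on adjacent-site phase relations across phasing outputs."""
    if len(haps) % 2 == 0:
        raise ValueError("need an odd number of phasing outputs to avoid ties")
    if len({len(h) for h in haps}) != 1:
        raise ValueError("all outputs must cover the same heterozygous sites")
    relations = [to_relations(h) for h in haps]
    consensus = ["0"]  # orientation is arbitrary, so start haplotype 1 with allele 0
    for votes in zip(*relations):
        differs = sum(votes) * 2 > len(votes)
        previous = consensus[-1]
        if differs:
            consensus.append("1" if previous == "0" else "0")
        else:
            consensus.append(previous)
    return "".join(consensus)


# Three tools: the second reports the same phasing as the first with the
# haplotype labels swapped; the third disagrees at the last site.
outputs = ["0011", "1100", "0010"]
print(consensus_haplotype(outputs))
```

Voting on relations rather than on raw alleles is a design choice for this sketch: it avoids having to align haplotype labels between tools before counting votes.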
15.
Clin Infect Dis ; 73(9): e3047-e3052, 2021 11 02.
Article En | MEDLINE | ID: mdl-32687168

BACKGROUND: Coronavirus disease 2019 has highlighted deficiencies in the testing capacity of many developed countries during the early stages of pandemics. Here we describe a strategy using pan-family viral assays to improve early accessibility of large-scale nucleic acid testing. METHODS: Coronaviruses and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were used as a case study for assessing utility of pan-family viral assays during the early stages of a novel pandemic. Specificity of a pan-coronavirus (Pan-CoV) assay for a novel pathogen was assessed using the frequency of common human coronavirus (HCoV) species in key populations. A reported Pan-CoV assay was assessed to determine sensitivity to 60 reference coronaviruses, including SARS-CoV-2. The resilience of the primer target regions of this assay to mutation was assessed in 8893 high-quality SARS-CoV-2 genomes to predict ongoing utility during pandemic progression. RESULTS: Because of common HCoV species, a Pan-CoV assay would return false positives for as few as 1% of asymptomatic adults, but up to 30% of immunocompromised patients with respiratory disease. One-half of reported Pan-CoV assays identify SARS-CoV-2 and with small adjustments can accommodate diverse variation observed in animal coronaviruses. The target region of 1 well-established Pan-CoV assay is highly resistant to mutation compared to species-specific SARS-CoV-2 reverse transcriptase-polymerase chain reaction assays. CONCLUSIONS: Despite cross-reactivity with common pathogens, pan-family assays may greatly assist management of emerging pandemics through prioritization of high-resolution testing or isolation measures. Targeting highly conserved genomic regions makes pan-family assays robust and resilient to mutation. A strategic stockpile of pan-family assays may improve containment of novel diseases before the availability of species-specific assays.


COVID-19 , Pandemics , Animals , Humans , Mass Screening , Public Health , SARS-CoV-2
16.
Eur J Hum Genet ; 28(12): 1743-1752, 2020 12.
Article En | MEDLINE | ID: mdl-32733071

Human Leucocyte Antigen (HLA) testing is useful in the clinical work-up of coeliac disease (CD) with high negative but low positive predictive value. We construct a genomic risk score (GRS) using HLA risk genotypes to improve CD prediction and guide exclusion criteria. Imputed HLA genotypes for five European CD case-control GWAS (n > 15,000) were used to construct and validate an interpretable HLA-based risk model (HDQ15), which shows statistically significant improvements in predictive performance upon all previous HLA-based risk models. Conditioning on this model, we find two novel associations, HLA-DQ6.2 and HLA-DQ7.3, that interact significantly with HLA-DQ2.5 (p = 2.51 × 10⁻⁹ and 1.99 × 10⁻⁷, respectively). Integrating these novel alleles into a new risk model (HDQ17) leads to predictive performance equivalent to or better than the strongest reported GRS (GRS228) using 228 single nucleotide polymorphisms (SNPs). We also demonstrate that our proposed HLA-based models can be implemented using only six HLA tagging SNPs with statistically equivalent predictive performance. Using insights from our model to guide exclusionary criteria, we find the positive predictive value of CD testing in high-risk populations can be increased by 55%, from 17.5 to 27.1%, while maintaining a negative predictive value above 99%. Our results suggest that HLA typing is currently undervalued in CD assessment.


Celiac Disease/genetics , Epistasis, Genetic , Genome-Wide Association Study/methods , HLA Antigens/genetics , Algorithms , Alleles , HLA Antigens/metabolism , Humans , Polymorphism, Single Nucleotide
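
The 55% figure in this abstract is a relative increase in positive predictive value (PPV), not a difference in percentage points. The sketch below checks that arithmetic and shows how PPV follows from sensitivity, specificity and prevalence via Bayes' rule. The inputs to the `ppv` call are hypothetical round numbers chosen for illustration; they are not values from the study.

```python
def relative_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from test characteristics and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# PPV values quoted in the abstract, in percent.
change = relative_increase(17.5, 27.1)
print(f"relative increase in PPV: {change:.0%}")

# Hypothetical test characteristics, for illustration only.
example = ppv(sensitivity=0.9, specificity=0.9, prevalence=0.1)
print(f"example PPV: {example:.2f}")
```

The second function makes the dependence on prevalence explicit: the same test yields a higher PPV when exclusion criteria raise the prevalence among those tested.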
17.
BMC Bioinformatics ; 20(1): 540, 2019 Oct 30.
Article En | MEDLINE | ID: mdl-31666002

BACKGROUND: Knowledge of phase, the specific allele sequence on each copy of homologous chromosomes, is increasingly recognized as critical for detecting certain classes of disease-associated mutations. One approach for detecting such mutations is through phased haplotype association analysis. While the accuracy of methods for phasing genotype data has been widely explored, there has been little attention given to phasing accuracy at haplotype block scale. Understanding the combined impact of the accuracy of phasing tool and the method used to determine haplotype blocks on the error rate within the determined blocks is essential to conduct accurate haplotype analyses. RESULTS: We present a systematic study exploring the relationship between seven widely used phasing methods and two common methods for determining haplotype blocks. The evaluation focuses on the number of haplotype blocks that are incorrectly phased. Insights from these results are used to develop a haplotype estimator based on a consensus of three tools. The consensus estimator achieved the most accurate phasing in all applied tests. Individually, EAGLE2, BEAGLE and SHAPEIT2 alternate in being the best performing tool in different scenarios. Determining haplotype blocks based on linkage disequilibrium leads to more correctly phased blocks compared to a sliding window approach. We find that there is little difference between phasing sections of a genome (e.g. a gene) compared to phasing entire chromosomes. Finally, we show that the location of phasing error varies when the tools are applied to the same data several times, a finding that could be important for downstream analyses. CONCLUSIONS: The choice of phasing and block determination algorithms and their interaction impacts the accuracy of phased haplotype blocks. This work provides guidance and evidence for the different design choices needed for analyses using haplotype blocks. 
The study highlights a number of issues that may have limited the replicability of previous haplotype analysis.


Haplotypes , Algorithms , Linkage Disequilibrium
18.
Sci Rep ; 9(1): 4163, 2019 03 11.
Article En | MEDLINE | ID: mdl-30853713

It is increasingly recognized that Alzheimer's disease (AD) exists before dementia is present and that shifts in amyloid beta occur long before clinical symptoms can be detected. Early detection of these molecular changes is a key aspect for the success of interventions aimed at slowing down rates of cognitive decline. Recent evidence indicates that of the two established methods for measuring amyloid, a decrease in cerebrospinal fluid (CSF) amyloid β1-42 (Aβ1-42) may be an earlier indicator of Alzheimer's disease risk than measures of amyloid obtained from Positron Emission Tomography (PET). However, CSF collection is highly invasive and expensive. In contrast, blood collection is routinely performed, minimally invasive and cheap. In this work, we develop a blood-based signature that can provide a cheap and minimally invasive estimation of an individual's CSF amyloid status using a machine learning approach. We show that a Random Forest model derived from plasma analytes can accurately predict subjects as having abnormal (low) CSF Aβ1-42 levels indicative of AD risk (0.84 AUC, 0.78 sensitivity, and 0.73 specificity). Refinement of the modeling indicates that only APOEε4 carrier status and four plasma analytes (CGA, Aβ1-42, Eotaxin 3, APOE) are required to achieve a high level of accuracy. Furthermore, we show across an independent validation cohort that individuals with predicted abnormal CSF Aβ1-42 levels transitioned to an AD diagnosis over 120 months significantly faster than those with predicted normal CSF Aβ1-42 levels and that the resulting model also validates reasonably across PET Aβ1-42 status (0.78 AUC). This is the first study to show that a machine learning approach, using plasma protein levels, age and APOEε4 carrier status, is able to predict CSF Aβ1-42 status, the earliest risk indicator for AD, with high accuracy.


Alzheimer Disease/blood , Amyloid beta-Peptides/cerebrospinal fluid , Apolipoproteins E/blood , Chemokine CCL26/blood , Chromogranin A/blood , Peptide Fragments/cerebrospinal fluid , Aged , Aged, 80 and over , Alzheimer Disease/cerebrospinal fluid , Amyloid beta-Peptides/blood , Biomarkers/blood , Female , Humans , Male , Peptide Fragments/blood , Predictive Value of Tests
19.
Pharmacogenomics J ; 19(3): 230-239, 2019 06.
Article En | MEDLINE | ID: mdl-30093715

Reduction of adverse drug reaction (ADR) incidence through screening of predisposing human leucocyte antigen (HLA) alleles is a promising approach for many widely used drugs. However, application of these associations has been limited by the cost burden of HLA genotyping. Use of single nucleotide polymorphisms (SNPs) that can approximate ('tag') HLA alleles of interest has been proposed as a cost-effective and simple alternative to conventional genotyping. However, most reported SNP tags have not been validated and there is concern regarding clinical utility of this approach due to tagging inconsistency across different populations. We assess the ability of 67 previously reported and 378 novel tagging SNPs, identified here in 5 HLA reference panels, to tag 15 ADR-associated HLA alleles in a panel of 955 ethnically diverse samples. Tags for 8 HLA alleles of interest were identified with 100% sensitivity and >95% specificity. These SNPs may act as a reliable genotyping approach for the routine screening of patients, without the need to account for patient ethnicity.
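
The sensitivity and specificity criteria mentioned in this abstract can be illustrated with a short sketch. It assumes each sample is reduced to two booleans (carries the HLA allele, carries the tag SNP allele); the function name `tag_performance` and the toy data are hypothetical and are not drawn from the study.

```python
def tag_performance(carries_allele, carries_tag):
    """Return (sensitivity, specificity) of a tag SNP for an HLA allele.

    carries_allele: per-sample booleans, True if the sample has the HLA allele.
    carries_tag:    per-sample booleans, True if the sample has the tag SNP allele.
    """
    if len(carries_allele) != len(carries_tag):
        raise ValueError("inputs must have the same length")
    tp = fn = fp = tn = 0
    for allele, tag in zip(carries_allele, carries_tag):
        if allele and tag:
            tp += 1  # allele carrier correctly tagged
        elif allele:
            fn += 1  # allele carrier missed by the tag
        elif tag:
            fp += 1  # tag present without the allele
        else:
            tn += 1  # neither allele nor tag
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy panel of six samples, not taken from the study.
allele = [True, True, False, False, False, False]
tag = [True, True, True, False, False, False]
sens, spec = tag_performance(allele, tag)
```

In this toy panel every allele carrier also carries the tag (sensitivity 1.0), while one of the four non-carriers carries the tag as well (specificity 0.75).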


Drug-Related Side Effects and Adverse Reactions/genetics , Ethnicity/genetics , HLA Antigens/genetics , Polymorphism, Single Nucleotide/genetics , Alleles , Genotype , Humans
20.
JNCI Cancer Spectr ; 2(4): pky057, 2018 Oct.
Article En | MEDLINE | ID: mdl-31360877

BACKGROUND: We applied machine learning to find a novel breast cancer predictor based on information in a mammogram. METHODS: Using image-processing techniques, we automatically processed 46 158 analog mammograms for 1345 cases and 4235 controls from a cohort and case-control study of Australian women, and a cohort study of Japanese American women, extracting 20 textural features not based on pixel brightness threshold. We used Bayesian lasso regression to create individual- and mammogram-specific measures of breast cancer risk, Cirrus. We trained and tested measures across studies. We fitted Cirrus with conventional mammographic density measures using logistic regression, and computed odds ratios (OR) per standard deviation adjusted for age and body mass index. RESULTS: Combining studies, almost all textural features were associated with case-control status. The ORs for Cirrus measures trained on one study and tested on another study ranged from 1.56 to 1.78 (all P < 10⁻⁶). For the Cirrus measure derived from combining studies, the OR was 1.90 (95% confidence interval [CI] = 1.73 to 2.09), equivalent to a fourfold interquartile risk ratio, and was little attenuated after adjusting for conventional measures. In contrast, the OR for the conventional measure was 1.34 (95% CI = 1.25 to 1.43), and after adjusting for Cirrus it became 1.16 (95% CI = 1.08 to 1.24; P = 4 × 10⁻⁵). CONCLUSIONS: A fully automated personal risk measure created from combining textural image features performs better at predicting breast cancer risk than conventional mammographic density risk measures, capturing half the risk-predicting ability of the latter measures. In terms of differentiating affected and unaffected women on a population basis, Cirrus could be one of the strongest known risk factors for breast cancer.

...