1.
Genet Epidemiol ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38504141

ABSTRACT

Young breast and bowel cancers (e.g., those diagnosed before age 40 or 50 years) have far greater morbidity and mortality in terms of years of life lost, and are increasing in incidence, but have been less studied. For breast and bowel cancers, the familial relative risks, and therefore the familial variances in age-specific log(incidence), are much greater at younger ages, but little of these familial variances has been explained. Studies of families and twins can address questions not easily answered by studies of unrelated individuals alone. We describe existing and emerging family and twin data that can provide special opportunities for discovery. We present designs and statistical analyses, including novel ideas such as the VALID (Variance in Age-specific Log Incidence Decomposition) model for causes of variation in risk, the DEPTH (DEPendency of association on the number of Top Hits) and other approaches to analyse genome-wide association study data, and the within-pair, ICE FALCON (Inference about Causation from Examining FAmiliaL CONfounding) and ICE CRISTAL (Inference about Causation from Examining Changes in Regression coefficients and Innovative STatistical AnaLysis) approaches to causation and familial confounding. Example applications to breast and colorectal cancer are presented. Motivated by the availability of the resources of the Breast and Colon Cancer Family Registries, we also present some ideas for future studies that could be applied to, and compared with, cancers diagnosed at older ages and address the challenges posed by young breast and bowel cancers.

2.
Bioinformatics ; 40(Suppl 1): i390-i400, 2024 06 28.
Article in English | MEDLINE | ID: mdl-38940182

ABSTRACT

MOTIVATION: Biological background knowledge plays an important role in the manual quality assurance (QA) of biological database records. One such QA task is the detection of inconsistencies in literature-based Gene Ontology Annotation (GOA). This manual verification ensures the accuracy of the GO annotations based on a comprehensive review of the literature used as evidence, Gene Ontology (GO) terms, and annotated genes in GOA records. While automatic approaches for the detection of semantic inconsistencies in GOA have been developed, they operate within predetermined contexts, lacking the ability to leverage broader evidence, especially relevant domain-specific background knowledge. This paper investigates various types of background knowledge that could improve the detection of prevalent inconsistencies in GOA. In addition, the paper proposes several approaches to integrate background knowledge into the automatic GOA inconsistency detection process. RESULTS: We have extended a previously developed GOA inconsistency dataset with several kinds of GOA-related background knowledge, including GeneRIF statements, biological concepts mentioned within evidence texts, GO hierarchy and existing GO annotations of the specific gene. We have proposed several effective approaches to integrate background knowledge as part of the automatic GOA inconsistency detection process. The proposed approaches can improve automatic detection of self-consistency and several of the most prevalent types of inconsistencies. This is the first study to explore the advantages of utilizing background knowledge and to propose a practical approach to incorporate knowledge in automatic GOA inconsistency detection. We establish a new benchmark for performance on this task. Our methods may be applicable to various tasks that involve incorporating biological background knowledge. AVAILABILITY AND IMPLEMENTATION: https://github.com/jiyuc/de-inconsistency.


Subjects
Gene Ontology , Molecular Sequence Annotation , Molecular Sequence Annotation/methods , Genetic Databases , Computational Biology/methods , Semantics , Humans
3.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36266246

ABSTRACT

Nucleotide and protein sequences stored in public databases are the cornerstone of many bioinformatics analyses. The records containing these sequences are prone to a wide range of errors, including incorrect functional annotation, sequence contamination and taxonomic misclassification. One source of information that can help to detect errors is the strong interdependency between records. Novel sequences in one database draw their annotations from existing records, may generate new records in multiple other locations and will have varying degrees of similarity with existing records across a range of attributes. A network perspective of these relationships between sequence records, within and across databases, offers new opportunities to detect, or even correct, erroneous entries and more broadly to make inferences about record quality. Here, we describe this novel perspective of sequence database records as a rich network, which we call the sequence database network, and illustrate the opportunities this perspective offers for quantification of database quality and detection of spurious entries. We provide an overview of the relevant databases and describe how the interdependencies between sequence records across these databases can be exploited by network analyses. We review the process of sequence annotation and provide a classification of sources of error, highlighting propagation as a major source. We illustrate the value of a network perspective through three case studies that use network analysis to detect errors, and explore the quality and quantity of critical relationships that would inform such network analyses. This systematic description of a network perspective of sequence database records provides a novel direction to combat the proliferation of errors within these critical bioinformatics resources.


Subjects
Computational Biology , Nucleic Acid Databases , Amino Acid Sequence
4.
Neuroimage ; 278: 120279, 2023 09.
Article in English | MEDLINE | ID: mdl-37454702

ABSTRACT

The recent biological redefinition of Alzheimer's Disease (AD) has spurred the development of statistical models that relate changes in biomarkers with neurodegeneration and worsening condition linked to AD. The ability to measure such changes may facilitate earlier diagnoses for affected individuals and help in monitoring the evolution of their condition. Amongst such statistical tools, disease progression models (DPMs) are quantitative, data-driven methods that specifically attempt to describe the temporal dynamics of biomarkers relevant to AD. Due to the heterogeneous nature of this disease, with patients of similar age experiencing different AD-related changes, a challenge facing longitudinal mixed-effects-based DPMs is the estimation of patient-realigning time-shifts. These time-shifts are indispensable for meaningful biomarker modelling, but may impact fitting time or vary with missing data in jointly estimated models. In this work, we estimate an individual's progression through Alzheimer's disease by combining multiple biomarkers into a single value using a probabilistic formulation of principal components analysis. Our results show that this variable, which summarises AD through observable biomarkers, is remarkably similar to jointly estimated time-shifts when we compute our scores for the baseline visit, on cross-sectional data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Reproducing the expected properties of clinical datasets, we confirm that estimated scores are robust to missing data or unavailable biomarkers. In addition to cross-sectional insights, we can model the latent variable as an individual progression score by repeating estimations at follow-up examinations and refining long-term estimates as more data is gathered, which would be ideal in a clinical setting. Finally, we verify that our score can be used as a pseudo-temporal scale instead of age to ignore some patient heterogeneity in cohort data and highlight the general trend in expected biomarker evolution in affected individuals.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnostic imaging , Cross-Sectional Studies , Neuroimaging/methods , Biomarkers , Disease Progression , Magnetic Resonance Imaging
5.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33834181

ABSTRACT

MOTIVATION: The high accuracy of recent haplotype phasing tools is enabling the integration of haplotype (or phase) information more widely in genetic investigations. One such possibility is phase-aware expression quantitative trait loci (eQTL) analysis, where haplotype-based analysis has the potential to detect associations that may otherwise be missed by standard SNP-based approaches. RESULTS: We present eQTLHap, a novel method to investigate associations between gene expression and genetic variants, considering their haplotypic and genotypic effect. Using multiple simulations based on real data, we demonstrate that phase-aware eQTL analysis significantly outperforms typical SNP-based methods when the causal genetic architecture involves multiple SNPs. We show that phase-aware eQTL analysis is robust to phasing errors, showing only a minor impact (<4%) on sensitivity. Applying eQTLHap to real GEUVADIS and GTEx datasets detects numerous novel eQTLs undetected by a single-SNP approach, with 22 eQTLs replicating across studies or tissue types, highlighting the utility of phase-aware eQTL analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ziadbkh/eQTLHap. CONTACT: ziad.albkhetan@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.


Subjects
Computational Biology/methods , Genome-Wide Association Study/methods , Haplotypes , Single Nucleotide Polymorphism , Quantitative Trait Loci/genetics , Algorithms , Gene Expression Regulation , Genotype , Humans , Internet , Linkage Disequilibrium
6.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33236761

ABSTRACT

Haplotype phasing is a critical step for many genetic applications but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics and their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools and multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces switch error (SE) by an average of 10% compared to any constituent tool when applied to European populations and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is available freely at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
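
The voting idea in this abstract can be illustrated with a minimal sketch. It assumes, as an illustration only, that each tool reports one haplotype per individual at the same heterozygous sites (coded 0/1) and that the vote is taken on the relative phase between adjacent sites. The function names are hypothetical, and consHap's actual implementation may differ.

```python
from collections import Counter


def relative_phase(haplotype):
    """Return 1 where the allele flips between adjacent heterozygous sites, else 0."""
    return [a ^ b for a, b in zip(haplotype, haplotype[1:])]


def consensus_haplotype(tool_haplotypes):
    """Majority-vote the relative phase across tools, then rebuild one haplotype.

    Voting on flips rather than raw alleles makes the vote independent of
    which of the two haplotypes a tool happens to list first.
    """
    flips_per_tool = [relative_phase(h) for h in tool_haplotypes]
    consensus = [0]  # arbitrary start; the second haplotype is the complement
    for votes in zip(*flips_per_tool):
        flip = Counter(votes).most_common(1)[0][0]
        consensus.append(consensus[-1] ^ flip)
    return consensus


# Three hypothetical tool outputs; the third disagrees at one junction.
tools = [[0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1]]
result = consensus_haplotype(tools)
```

With an odd number of tools every vote has a strict majority; with an even number, ties fall to whichever value `Counter` encountered first, so a real pipeline would need an explicit tie-breaking rule.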


Subjects
Algorithms , Nucleic Acid Databases , Single Nucleotide Polymorphism , DNA Sequence Analysis , Haplotypes , Humans
7.
Bioinformatics ; 38(Suppl 1): i273-i281, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758780

ABSTRACT

MOTIVATION: Literature-based gene ontology annotations (GOA) are biological database records that use controlled vocabulary to uniformly represent gene function information that is described in the primary literature. Assurance of the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies between the literature used as evidence and the annotated GO terms can be identified; these have not been systematically studied at record level. The existing manual-curation approach to GOA consistency assurance is inefficient and is unable to keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection. RESULTS: We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detect inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches that are designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available on GitHub at https://github.com/jiyuc/AutoGOAConsistency.


Subjects
Publications , Gene Ontology , Molecular Sequence Annotation
8.
Clin Infect Dis ; 73(9): e3047-e3052, 2021 11 02.
Article in English | MEDLINE | ID: mdl-32687168

ABSTRACT

BACKGROUND: Coronavirus disease 2019 has highlighted deficiencies in the testing capacity of many developed countries during the early stages of pandemics. Here we describe a strategy using pan-family viral assays to improve early accessibility of large-scale nucleic acid testing. METHODS: Coronaviruses and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were used as a case study for assessing utility of pan-family viral assays during the early stages of a novel pandemic. Specificity of a pan-coronavirus (Pan-CoV) assay for a novel pathogen was assessed using the frequency of common human coronavirus (HCoV) species in key populations. A reported Pan-CoV assay was assessed to determine sensitivity to 60 reference coronaviruses, including SARS-CoV-2. The resilience of the primer target regions of this assay to mutation was assessed in 8893 high-quality SARS-CoV-2 genomes to predict ongoing utility during pandemic progression. RESULTS: Because of common HCoV species, a Pan-CoV assay would return false positives for as few as 1% of asymptomatic adults, but up to 30% of immunocompromised patients with respiratory disease. One-half of reported Pan-CoV assays identify SARS-CoV-2 and with small adjustments can accommodate diverse variation observed in animal coronaviruses. The target region of 1 well-established Pan-CoV assay is highly resistant to mutation compared to species-specific SARS-CoV-2 reverse transcriptase-polymerase chain reaction assays. CONCLUSIONS: Despite cross-reactivity with common pathogens, pan-family assays may greatly assist management of emerging pandemics through prioritization of high-resolution testing or isolation measures. Targeting highly conserved genomic regions makes pan-family assays robust and resilient to mutation. A strategic stockpile of pan-family assays may improve containment of novel diseases before the availability of species-specific assays.


Subjects
COVID-19 , Pandemics , Animals , Humans , Mass Screening , Public Health , SARS-CoV-2
9.
BMC Bioinformatics ; 20(1): 540, 2019 Oct 30.
Article in English | MEDLINE | ID: mdl-31666002

ABSTRACT

BACKGROUND: Knowledge of phase, the specific allele sequence on each copy of homologous chromosomes, is increasingly recognized as critical for detecting certain classes of disease-associated mutations. One approach for detecting such mutations is through phased haplotype association analysis. While the accuracy of methods for phasing genotype data has been widely explored, there has been little attention given to phasing accuracy at haplotype block scale. Understanding the combined impact of the accuracy of the phasing tool and the method used to determine haplotype blocks on the error rate within the determined blocks is essential to conduct accurate haplotype analyses. RESULTS: We present a systematic study exploring the relationship between seven widely used phasing methods and two common methods for determining haplotype blocks. The evaluation focuses on the number of haplotype blocks that are incorrectly phased. Insights from these results are used to develop a haplotype estimator based on a consensus of three tools. The consensus estimator achieved the most accurate phasing in all applied tests. Individually, EAGLE2, BEAGLE and SHAPEIT2 alternate in being the best performing tool in different scenarios. Determining haplotype blocks based on linkage disequilibrium leads to more correctly phased blocks compared to a sliding window approach. We find that there is little difference between phasing sections of a genome (e.g. a gene) compared to phasing entire chromosomes. Finally, we show that the location of phasing error varies when the tools are applied to the same data several times, a finding that could be important for downstream analyses. CONCLUSIONS: The choice of phasing and block determination algorithms and their interaction impacts the accuracy of phased haplotype blocks. This work provides guidance and evidence for the different design choices needed for analyses using haplotype blocks. The study highlights a number of issues that may have limited the replicability of previous haplotype analyses.


Subjects
Haplotypes , Algorithms , Linkage Disequilibrium
10.
Pharmacogenomics J ; 19(3): 230-239, 2019 06.
Article in English | MEDLINE | ID: mdl-30093715

ABSTRACT

Reduction of adverse drug reaction (ADR) incidence through screening of predisposing human leucocyte antigen (HLA) alleles is a promising approach for many widely used drugs. However, application of these associations has been limited by the cost burden of HLA genotyping. Use of single nucleotide polymorphisms (SNPs) that can approximate ('tag') HLA alleles of interest has been proposed as a cost-effective and simple alternative to conventional genotyping. However, most reported SNP tags have not been validated and there is concern regarding clinical utility of this approach due to tagging inconsistency across different populations. We assess the ability of 67 previously reported and 378 novel tagging SNPs, identified here in 5 HLA reference panels, to tag 15 ADR-associated HLA alleles in a panel of 955 ethnically diverse samples. Tags for 8 HLA alleles of interest were identified with 100% sensitivity and >95% specificity. These SNPs may act as a reliable genotyping approach for the routine screening of patients, without the need to account for patient ethnicity.
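
The sensitivity and specificity figures quoted for tag SNPs follow the usual definitions, which a short sketch can make concrete. The function name and the sample data below are hypothetical and are not taken from the study.

```python
def tag_performance(samples):
    """Sensitivity and specificity of a tag SNP for one HLA allele.

    samples: iterable of (carries_tag_snp, carries_hla_allele) booleans,
    one pair per genotyped individual.
    """
    tp = fp = tn = fn = 0
    for tag, hla in samples:
        if hla and tag:
            tp += 1   # carrier correctly flagged by the tag
        elif hla and not tag:
            fn += 1   # carrier missed by the tag
        elif tag:
            fp += 1   # non-carrier wrongly flagged
        else:
            tn += 1   # non-carrier correctly cleared
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Illustrative data only: 3 carriers (all tagged), 17 non-carriers (1 tagged).
samples = [(True, True)] * 3 + [(True, False)] + [(False, False)] * 16
sensitivity, specificity = tag_performance(samples)
```

In this made-up panel the tag reaches 100% sensitivity but only about 94% specificity, so it would fall just short of the >95% specificity threshold mentioned above.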


Subjects
Drug-Related Side Effects and Adverse Reactions/genetics , Ethnicity/genetics , HLA Antigens/genetics , Single Nucleotide Polymorphism/genetics , Alleles , Genotype , Humans
11.
J Alzheimers Dis ; 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39269831

ABSTRACT

Background: Integrating scores from multiple cognitive tests into a single cognitive composite has been shown to improve sensitivity to detect AD-related cognitive impairment. However, existing composites have little sensitivity to amyloid-β status (Aβ +/-) in preclinical AD. Objective: Evaluate whether a data-driven approach for deriving cognitive composites can improve the sensitivity to detect Aβ status among cognitively unimpaired (CU) individuals compared to existing cognitive composites. Methods: Based on the data from the Anti-Amyloid Treatment in the Asymptomatic Alzheimer's Disease (A4) study, a novel composite, the Data-driven Preclinical Alzheimer's Cognitive Composite (D-PACC), was developed based on test scores and response durations selected using a machine learning algorithm from the Cogstate Brief Battery (CBB). The D-PACC was then compared with conventional composites in the follow-up A4 visits and in individuals from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Result: The D-PACC showed a comparable or significantly higher ability to discriminate Aβ status [median Cohen's d = 0.172] than existing composites at the A4 baseline visit, with similar results at the second visit. The D-PACC demonstrated the most consistent sensitivity to Aβ status in both A4 and ADNI datasets. Conclusions: The D-PACC showed similar or improved sensitivity when screening for Aβ+ in CU populations compared to existing composites but with higher consistency across studies.
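
For readers unfamiliar with the effect size reported here, Cohen's d is the difference in group means divided by a pooled standard deviation. The sketch below uses the common pooled-variance form; whether the study used exactly this variant is an assumption, and the scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical composite scores for two groups (not study data).
d = cohens_d([2, 4, 6], [1, 3, 5])
```

By the conventional rule of thumb, a d near 0.2 is a small effect, which puts the reported median of 0.172 in context.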

12.
Hum Immunol ; 85(3): 110790, 2024 May.
Article in English | MEDLINE | ID: mdl-38575482

ABSTRACT

Currently, the genetic variants strongly associated with risk for Multiple Sclerosis (MS) are located in the Major Histocompatibility Complex. This includes DRB1*15:01 and DRB1*15:03 alleles at the HLA-DRB1 locus, the latter restricted to African populations; the DQB1*06:02 allele at the HLA-DQB1 locus which is in high linkage disequilibrium (LD) with DRB1*15:01; and protective allele A*02:01 at the HLA-A locus. HLA allele identification is facilitated by co-inherited ('tag') single nucleotide polymorphisms (SNPs); however, SNP validation is not typically done outside of the discovery population. We examined 19 SNPs reported to be in high LD with these alleles in 2,502 healthy subjects included in the 1000 Genomes panel having typed HLA data. Examination of 3 indices (LD R² values, sensitivity and specificity, minor allele frequency) revealed few SNPs with high tagging performance. All SNPs examined that tag DRB1*15:01 were in perfect LD in the British population; three showed high tagging performance in 4 of the 5 European, and 2 of the 4 American populations. For DQB1*06:02, with no previously validated tag SNPs, we show that rs3135388 has high tagging performance in one South Asian, one American, and one European population. We identify for the first time that rs2844821 has high tagging performance for A*02:01 in 5 of 7 African populations including African Americans, and 4 of the 5 European populations. These results provide a basis for selecting SNPs with high tagging performance to assess HLA alleles across diverse populations, for MS risk as well as for other diseases and conditions.
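
The LD index mentioned in this abstract can be computed from phased haplotypes with the standard two-locus formula, r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The sketch below treats the HLA allele as a biallelic marker (present or absent) and uses invented haplotypes; the function name is hypothetical.

```python
def ld_r_squared(haplotypes):
    """Squared correlation between a SNP allele and an HLA allele.

    haplotypes: list of (snp_allele, hla_allele) pairs coded 0/1,
    one pair per phased chromosome. Both loci must be polymorphic,
    otherwise the denominator is zero.
    """
    n = len(haplotypes)
    p_snp = sum(s for s, _ in haplotypes) / n
    p_hla = sum(h for _, h in haplotypes) / n
    p_both = sum(1 for s, h in haplotypes if s and h) / n
    d = p_both - p_snp * p_hla  # linkage disequilibrium coefficient D
    return d * d / (p_snp * (1 - p_snp) * p_hla * (1 - p_hla))


# Invented haplotypes: a perfect tag, and a SNP independent of the allele.
perfect = ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
independent = ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
```

"Perfect LD" in the abstract corresponds to r² = 1, the first case above.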


Subjects
Alleles , Gene Frequency , Genetic Predisposition to Disease , Linkage Disequilibrium , Multiple Sclerosis , Single Nucleotide Polymorphism , Humans , Multiple Sclerosis/genetics , HLA-DQ beta-Chains/genetics , HLA-DRB1 Chains/genetics , Human Genome , Risk
13.
J Alzheimers Dis ; 97(1): 89-100, 2024.
Article in English | MEDLINE | ID: mdl-38007665

ABSTRACT

The accumulation of amyloid-β (Aβ) plaques in the brain is considered a hallmark of Alzheimer's disease (AD). Mathematical modeling, capable of predicting the motion and accumulation of Aβ, has obtained increasing interest as a potential alternative to aid the diagnosis of AD and predict disease prognosis. These mathematical models have provided insights into the pathogenesis and progression of AD that are difficult to obtain through experimental studies alone. Mathematical modeling can also simulate the effects of therapeutics on brain Aβ levels, thereby holding potential for drug efficacy simulation and the optimization of personalized treatment approaches. In this review, we provide an overview of the mathematical models that have been used to simulate brain levels of Aβ (oligomers, protofibrils, and/or plaques). We classify the models into five categories: the general ordinary differential equation models, the general partial differential equation models, the network models, the linear optimal ordinary differential equation models, and the modified partial differential equation models (i.e., Smoluchowski equation models). The assumptions, advantages and limitations of these models are discussed. Given the popularity of using the Smoluchowski equation models to simulate brain levels of Aβ, our review summarizes the history and major advancements in these models (e.g., their application to predict the onset of AD and their combined use with network models). This review is intended to bring mathematical modeling to the attention of more scientists and clinical researchers working on AD to promote cross-disciplinary research.


Subjects
Alzheimer Disease , Humans , Alzheimer Disease/pathology , Amyloid beta-Peptides/metabolism , Theoretical Models , Brain/pathology , Computer Simulation , Amyloid Plaque/pathology
14.
Alzheimers Res Ther ; 16(1): 175, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39085973

ABSTRACT

Several (inter)national longitudinal dementia observational datasets encompassing demographic information, neuroimaging, biomarkers, neuropsychological evaluations, and multi-omics data, have ushered in a new era of potential for integrating machine learning (ML) into dementia research and clinical practice. ML, with its proficiency in handling multi-modal and high-dimensional data, has emerged as an innovative technique to facilitate early diagnosis, differential diagnosis, and to predict onset and progression of mild cognitive impairment and dementia. In this review, we evaluate current and potential applications of ML, including its history in dementia research, how it compares to traditional statistics, the types of datasets it uses and the general workflow. Moreover, we identify the technical barriers and challenges of ML implementations in clinical practice. Overall, this review provides a comprehensive understanding of ML with non-technical explanations for broader accessibility to biomedical scientists and clinicians.


Subjects
Dementia , Machine Learning , Humans , Dementia/diagnosis , Biomedical Research/methods , Neuroimaging/methods
15.
Article in English | MEDLINE | ID: mdl-38896210

ABSTRACT

BACKGROUND: The associations between mood disorders (anxiety and depression) and mild cognitive impairment (MCI) or Alzheimer's dementia (AD) remain unclear. METHODS: Data from the Australian Imaging, Biomarker & Lifestyle (AIBL) study were subjected to logistic regression to determine both cross-sectional and longitudinal associations between anxiety/depression and MCI/AD. Effect modification by selected covariates was analysed using the likelihood ratio test. RESULTS: Cross-sectional analysis was performed to explore the association between anxiety/depression and MCI/AD among 2,209 participants with a mean [SD] age of 72.3 [7.4] years, of whom 55.4% were female. After adjusting for confounding variables, we found a significant increase in the odds of AD among participants with two mood disorders (anxiety: OR 1.65 [95% CI 1.04-2.60]; depression: OR 1.73 [1.12-2.69]). Longitudinal analysis was conducted to explore the target associations among 1,379 participants with a mean age of 71.2 [6.6] years, of whom 56.3% were female. During a mean follow-up of 5.0 [4.2] years, 163 participants who developed MCI/AD (referred to as PRO) were identified. Only anxiety was associated with higher odds of PRO after adjusting for covariates (OR 1.56 [1.03-2.39]). However, after additional adjustment for depression, the association became insignificant. Additionally, age, sex, and marital status were identified as effect modifiers for the target associations. CONCLUSION: Our study provides supportive evidence that anxiety and depression impact on the evolution of MCI/AD, which provides valuable epidemiological insights that can inform clinical practice, guiding clinicians in offering targeted dementia prevention and surveillance programs to the at-risk populations.

16.
Cancer Epidemiol Biomarkers Prev ; 33(2): 306-313, 2024 02 06.
Article in English | MEDLINE | ID: mdl-38059829

ABSTRACT

BACKGROUND: Cirrus is an automated risk predictor for breast cancer that comprises texture-based mammographic features and is mostly independent of mammographic density. We investigated genetic and environmental sources of variation in Cirrus. METHODS: We measured Cirrus for 3,195 breast cancer-free participants, including 527 pairs of monozygotic (MZ) twins, 271 pairs of dizygotic (DZ) twins, and 1,599 siblings of twins. Multivariate normal models were used to estimate the variance and familial correlations of age-adjusted Cirrus as a function of age. The classic twin model was expanded to allow the shared environment effects to differ by zygosity. The SNP-based heritability was estimated for a subset of 2,356 participants. RESULTS: There was no evidence that the variance or familial correlations depended on age. The familial correlations were 0.52 (SE, 0.03) for MZ pairs and 0.16 (SE, 0.03) for DZ and non-twin sister pairs combined. Shared environmental factors specific to MZ pairs accounted for 20% of the variance. Additive genetic factors accounted for 32% (SE = 5%) of the variance, consistent with the SNP-based heritability of 36% (SE = 16%). CONCLUSION: Cirrus is substantially familial due to genetic factors and an influence of shared environmental factors that was evident for MZ twin pairs only. The latter could be due to nongenetic factors operating in utero or in early life that are shared by MZ twins. IMPACT: Early-life factors, shared more by MZ pairs than DZ/non-twin sister pairs, could play a role in the variation in Cirrus, consistent with early life being recognized as a critical window of vulnerability to breast carcinogens.
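
The reported variance components can be cross-checked against the familial correlations with a little arithmetic. The sketch assumes one simple reading of the expanded twin model: MZ pairs share all additive genetic variance plus an MZ-specific shared environment, while DZ and sister pairs share half the additive genetic variance and no shared environment. That reading is an assumption made here for illustration; the fitted model in the paper may be parameterised differently.

```python
def decompose(r_mz, r_dz):
    """Split familial correlation into additive genetic and MZ-specific shared environment.

    Assumes r_mz = A + C_mz and r_dz = A / 2 (hypothetical simplification).
    """
    additive_genetic = 2 * r_dz
    mz_shared_environment = r_mz - additive_genetic
    return additive_genetic, mz_shared_environment


# Correlations quoted in the abstract.
a, c_mz = decompose(r_mz=0.52, r_dz=0.16)
```

Under these assumptions the correlations imply roughly 32% additive genetic variance and 20% MZ-specific shared environment, matching the percentages quoted in the abstract.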


Subjects
Breast Neoplasms , Female , Humans , Breast , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/genetics , Mammography , Risk Factors , Dizygotic Twins/genetics , Monozygotic Twins/genetics
17.
BMC Genomics ; 14 Suppl 3: S10, 2013.
Article in English | MEDLINE | ID: mdl-23819779

ABSTRACT

BACKGROUND: It has been hypothesized that multivariate analysis and systematic detection of epistatic interactions between explanatory genotyping variables may help resolve the problem of "missing heritability" currently observed in genome-wide association studies (GWAS). However, even the simplest bivariate analysis is still held back by significant statistical and computational challenges that are often addressed by reducing the set of analysed markers. Theoretically, it has been shown that combinations of loci may exist that show weak or no effects individually, but show significant (even complete) explanatory power over phenotype when combined. Reducing the set of analysed SNPs before bivariate analysis could easily omit such critical loci. RESULTS: We have developed an exhaustive bivariate GWAS analysis methodology that yields a manageable subset of candidate marker pairs for subsequent analysis using other, often more computationally expensive techniques. Our model-free filtering approach is based on classification using ROC curve analysis, an alternative to much slower regression-based modelling techniques. Exhaustive analysis of studies containing approximately 450,000 SNPs and 5,000 samples requires only 2 hours using a desktop CPU or 13 minutes using a GPU (Graphics Processing Unit). We validate our methodology with analysis of simulated datasets as well as the seven Wellcome Trust Case-Control Consortium datasets that represent a wide range of real-life GWAS challenges. We have identified SNP pairs that have considerably stronger association with disease than their individual component SNPs, which often show negligible effect univariately. When compared against previously reported results in the literature, our methods re-detect most significant SNP pairs and additionally detect many pairs absent from the literature that show strong association with disease. The high overlap suggests that our fast analysis could substitute for some slower alternatives.
CONCLUSIONS: We demonstrate that the proposed methodology is robust, fast and capable of exhaustive search for epistatic interactions using a standard desktop computer. First, our implementation is significantly faster than timings for comparable algorithms reported in the literature, especially as our method allows simultaneous use of multiple statistical filters with low computing time overhead. Second, for some diseases, we have identified hundreds of SNP pairs that pass formal multiple-testing (Bonferroni) correction and could form a rich source of hypotheses for follow-up analysis. AVAILABILITY: A web-based version of the software used for this analysis is available at http://bioinformatics.research.nicta.com.au/gwis.
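
To give a feel for the scale of an exhaustive pairwise search and for the idea behind a ROC-based pair filter, here is a minimal Python sketch. It counts the SNP pairs, derives a Bonferroni threshold, and scores a pair by the area under the ROC curve of a classifier that ranks joint genotype cells by their case fraction. The function names, the family-wise alpha of 0.05, and the cell-ranking classifier are assumptions made for illustration; the abstract does not describe the authors' filter at this level of detail.

```python
from collections import Counter
from math import comb


def pair_count(n_snps):
    """Number of unordered SNP pairs in an exhaustive bivariate search."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold; alpha=0.05 is an assumed default."""
    return alpha / n_tests


def pair_auc(geno_a, geno_b, is_case):
    """AUC of a classifier that ranks joint genotype cells by case fraction.

    Hypothetical illustration of a model-free, ROC-based score for a SNP pair.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(geno_a, geno_b, is_case):
        (cases if y else controls)[(a, b)] += 1
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")

    def case_fraction(cell):
        return cases[cell] / (cases[cell] + controls[cell])

    # Walk the ROC curve cell by cell, most case-enriched cell first,
    # accumulating the area with the trapezoid rule.
    auc, tpr = 0.0, 0.0
    for cell in sorted(set(cases) | set(controls), key=case_fraction, reverse=True):
        d_tpr = cases[cell] / n_case
        d_fpr = controls[cell] / n_ctrl
        auc += d_fpr * (tpr + d_tpr / 2)
        tpr += d_tpr
    return auc


# Scale of the search described in the abstract (about 450,000 SNPs).
n_pairs = pair_count(450_000)  # 101,249,775,000 pairs
print(n_pairs, bonferroni_threshold(n_pairs))

# Toy XOR-like data: neither SNP is informative alone, the pair is.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
case = [0, 1, 1, 0]
print(pair_auc(snp_a, snp_a, case), pair_auc(snp_a, snp_b, case))
```

The toy data mirror the point made in the BACKGROUND: a locus with no marginal effect can still be fully informative in combination, so pre-filtering on single-SNP statistics would discard it.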


Subjects
Algorithms , Computational Biology/methods , Epistasis, Genetic/genetics , Genome-Wide Association Study/methods , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Software , Computer Simulation , Humans , ROC Curve , Sensitivity and Specificity , Time Factors
18.
Arterioscler Thromb Vasc Biol ; 31(11): 2723-32, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21903946

ABSTRACT

OBJECTIVE: Traditional risk factors for coronary artery disease (CAD) fail to adequately distinguish patients who have atherosclerotic plaques susceptible to instability from those who have more benign forms. Using plasma lipid profiling, this study aimed to provide insight into disease pathogenesis and evaluate the potential of lipid profiles to assess risk of future plaque instability. METHODS AND RESULTS: Plasma lipid profiles containing 305 lipids were measured on 220 individuals (matched healthy controls, n=80; stable angina, n=60; unstable coronary syndrome, n=80) using electrospray-ionisation tandem mass spectrometry. ReliefF feature selection coupled with an L2-regularized logistic regression based classifier was used to create multivariate classification models which were verified via 3-fold cross-validation (1000 repeats). Models incorporating both lipids and traditional risk factors provided improved classification of unstable CAD from stable CAD (C-statistic=0.875, 95% CI 0.874-0.877) compared with models containing only traditional risk factors (C-statistic=0.796, 95% CI 0.795-0.798). Many of the lipids identified as discriminatory for unstable CAD displayed an association with disease acuity (severity), suggesting that they are antecedents to the onset of acute coronary syndrome. CONCLUSION: Plasma lipid profiling may contribute to a new approach to risk stratification for unstable CAD.
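
The evaluation scheme named above (repeated 3-fold cross-validation scored by the C-statistic) can be sketched in plain Python as follows. This is only an outline under stated assumptions: the function names are hypothetical, the folds are not stratified by class, and the ReliefF selection and L2-regularized logistic regression steps are omitted because the abstract gives no implementation details for them.

```python
import random


def c_statistic(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as one half; this equals the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_kfold(n_samples, k=3, repeats=1000, seed=0):
    """Yield (train, test) index lists for `repeats` shuffled k-fold splits.

    Unstratified for brevity; the seed is an arbitrary assumption.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


# Illustrative scores only, not data from the study.
print(c_statistic([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))
```

In a full pipeline, feature selection and model fitting would be repeated inside each training fold so that the held-out fold never influences which lipids are selected.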


Subjects
Coronary Artery Disease/blood , Coronary Artery Disease/epidemiology , Lipids/blood , Acute Coronary Syndrome/blood , Acute Coronary Syndrome/diagnosis , Acute Coronary Syndrome/epidemiology , Adult , Aged , Angina, Stable/blood , Angina, Stable/diagnosis , Angina, Stable/epidemiology , Angina, Unstable/blood , Angina, Unstable/diagnosis , Angina, Unstable/epidemiology , Biomarkers/blood , Case-Control Studies , Coronary Artery Disease/diagnosis , Female , Humans , Logistic Models , Male , Middle Aged , Risk Factors , Severity of Illness Index
19.
J Phys Chem B ; 126(28): 5151-5160, 2022 07 21.
Article in English | MEDLINE | ID: mdl-35796490

ABSTRACT

Free energy perturbation (FEP) calculations can predict relative binding affinities of an antigen and its point mutants to the same human leukocyte antigen (HLA) with high accuracy (e.g., within 1.0 kcal/mol to experiment); however, a more challenging task is to compare binding affinities of wholly different antigens binding to completely different HLAs using FEP. Researchers have used a variety of different FEP schemes to compute and compare absolute binding affinities, with varied success. Here, we propose and assess a unifying scheme to compute the relative binding affinities of different antigens binding to completely different HLAs using absolute binding affinity FEP calculations. We apply our affinity calculation technique to HLA-antigen-T-cell receptor (TCR) systems relevant to celiac disease (CeD) by investigating binding affinity differences between HLA-DQ2.5 (enhanced CeD risk) and HLA-DQ7.5 (CeD protective) in the binary (HLA-gliadin) and ternary (HLA-gliadin-TCR) binding complexes for three gliadin derived epitopes: glia-α1, glia-α2, and glia-ω1. Based on FEP calculations with our carefully designed thermodynamic cycles, we demonstrate that HLA-DQ2.5 has higher binding affinity than HLA-DQ7.5 for gliadin and enhanced binding affinity with a common TCR, agreeing with known results that the HLA-DQ2.5 serotype exhibits increased risk for CeD. Our findings reveal that our proposed absolute binding affinity FEP method is appropriate for predicting HLA binding for disparate antigens with different genotypes. We also discuss atomic-level details of HLA genotypes interacting with gluten peptides and TCRs in regard to the pathogenesis of CeD.
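
For readers who want to relate free-energy differences to affinity ratios, the standard relation Kd_a / Kd_b = exp(ΔΔG / RT) can be sketched as below. The function names and the default temperature of 298.15 K are assumptions for illustration, and the input values are made up; they are not results from the paper.

```python
from math import exp

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def relative_binding_free_energy(dg_a, dg_b):
    """ddG = dG_a - dG_b from two absolute binding free energies (kcal/mol)."""
    return dg_a - dg_b


def affinity_ratio(ddg, temperature_k=298.15):
    """Ratio Kd_a / Kd_b implied by ddG; values below 1 mean a binds tighter."""
    return exp(ddg / (R_KCAL_PER_MOL_K * temperature_k))


# Hypothetical absolute binding free energies, for illustration only.
ddg = relative_binding_free_energy(-9.0, -8.0)
print(ddg, 1.0 / affinity_ratio(ddg))
```

At room temperature a difference of 1.0 kcal/mol corresponds to roughly a fivefold difference in dissociation constant, which gives a sense of what the quoted accuracy means in practice.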


Subjects
Celiac Disease , Glutens , Celiac Disease/genetics , Celiac Disease/metabolism , Epitopes, T-Lymphocyte/genetics , Epitopes, T-Lymphocyte/metabolism , Gliadin/chemistry , Glutens/chemistry , Humans , Peptides/chemistry , Receptors, Antigen, T-Cell/chemistry , Receptors, Antigen, T-Cell/genetics
20.
Trends Hear ; 25: 23312165211066174, 2021.
Article in English | MEDLINE | ID: mdl-34903103

ABSTRACT

While cochlear implants have helped hundreds of thousands of individuals, it remains difficult to predict the extent to which an individual's hearing will benefit from implantation. Several publications indicate that machine learning may improve predictive accuracy of cochlear implant outcomes compared to classical statistical methods. However, existing studies are limited in terms of model validation and in evaluating the effect of factors such as sample size on predictive performance. We conduct a thorough examination of machine learning approaches to predict word recognition scores (WRS) measured approximately 12 months after implantation in adults with post-lingual hearing loss. This is the largest retrospective study of cochlear implant outcomes to date, evaluating 2,489 cochlear implant recipients from three clinics. We demonstrate that while machine learning models significantly outperform linear models in prediction of WRS, their overall accuracy remains limited (mean absolute error: 17.9-21.8). The models are robust across clinical cohorts, with predictive error increasing by at most 16% when evaluated on a clinic excluded from the training set. We show that predictive performance is unlikely to be improved by increasing sample size alone, with a doubling of sample size estimated to increase performance by only 3% on the combined dataset. Finally, we demonstrate how the current models could support clinical decision making, highlighting that subsets of individuals can be identified that have a 94% chance of improving WRS by at least 10 percentage points after implantation, which is likely to be clinically meaningful. We discuss several implications of this analysis, focusing on the need to improve and standardize data collection.
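
The cross-clinic robustness check described above amounts to a leave-one-clinic-out evaluation scored by mean absolute error. A minimal Python sketch follows, with a trivial mean predictor standing in for the study's models. The function names, the baseline, and the toy scores are all assumptions for illustration, not the authors' code or data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def leave_one_clinic_out(clinics):
    """Yield (held_out_clinic, train_indices, test_indices) per clinic."""
    for held_out in sorted(set(clinics)):
        train = [i for i, c in enumerate(clinics) if c != held_out]
        test = [i for i, c in enumerate(clinics) if c == held_out]
        yield held_out, train, test


def evaluate_mean_baseline(wrs, clinics):
    """MAE per held-out clinic for a predictor that outputs the training mean."""
    results = {}
    for held_out, train, test in leave_one_clinic_out(clinics):
        prediction = sum(wrs[i] for i in train) / len(train)
        results[held_out] = mean_absolute_error(
            [wrs[i] for i in test], [prediction] * len(test)
        )
    return results


def relative_error_increase(mae_within, mae_held_out):
    """Fractional rise in error when moving to an unseen clinic."""
    return (mae_held_out - mae_within) / mae_within


# Made-up word recognition scores from three hypothetical clinics.
scores = [40, 60, 50, 70, 30, 90]
clinic_ids = ["A", "A", "B", "B", "C", "C"]
print(evaluate_mean_baseline(scores, clinic_ids))
```

Comparing each held-out MAE with the within-cohort MAE, for example via `relative_error_increase`, gives the kind of percentage increase the abstract reports.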


Subjects
Cochlear Implantation , Cochlear Implants , Deafness , Hearing Aids , Speech Perception , Adult , Cochlear Implantation/methods , Deafness/diagnosis , Humans , Retrospective Studies , Treatment Outcome