Results 1 - 20 of 31
1.
Artif Intell Med ; 138: 102510, 2023 04.
Article in English | MEDLINE | ID: mdl-36990588

ABSTRACT

Several recent studies indicate that atypical changes in driving behaviors appear to be early signs of mild cognitive impairment (MCI) and dementia. These studies, however, are limited by small sample sizes and short follow-up duration. This study aims to develop an interaction-based classification method building on a statistic named Influence Score (i.e., I-score) for prediction of MCI and dementia using naturalistic driving data collected from the Longitudinal Research on Aging Drivers (LongROAD) project. Naturalistic driving trajectories were collected through in-vehicle recording devices for up to 44 months from 2977 participants who were cognitively intact at the time of enrollment. These data were further processed and aggregated to generate 31 time-series driving variables. Because of high dimensional time-series features for driving variables, we used I-score for variable selection. I-score is a measure to evaluate variables' ability to predict and is proven to be effective in differentiating between noisy and predictive variables in big data. It is introduced here to select influential variable modules or groups that account for compound interactions among explanatory variables. It is explainable regarding to what extent variables and their interactions contribute to the predictiveness of a classifier. In addition, I-score boosts the performance of classifiers over imbalanced datasets due to its association with the F1 score. Using predictive variables selected by I-score, interaction-based residual blocks are constructed over top I-score modules to generate predictors and ensemble learning aggregates these predictors to boost the prediction of the overall classifier. Experiments using naturalistic driving data show that our proposed classification method achieves the best accuracy (96%) for predicting MCI and dementia, followed by random forest (93%) and logistic regression (88%). 
In terms of F1 score and AUC, our proposed classifier achieves 98% and 87%, respectively, followed by random forest (with an F1 score of 96% and an AUC of 79%) and logistic regression (with an F1 score of 92% and an AUC of 77%). The results indicate that incorporating I-score into machine learning algorithms could considerably improve the model performance for predicting MCI and dementia in older drivers. We also performed the feature importance analysis and found that the right to left turn ratio and the number of hard braking events are the most important driving variables to predict MCI and dementia.


Subject(s)
Alzheimer Disease; Cognitive Dysfunction; Humans; Aged; Alzheimer Disease/diagnosis; Cognitive Dysfunction/diagnosis; Cognitive Dysfunction/psychology; Algorithms; Random Forest; Machine Learning
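
The abstract of record 1 describes the I-score only in words. As a rough illustration, the sketch below computes one commonly cited form of the statistic for discrete variables: partition the samples by the joint values of a candidate variable subset, sum n_j^2 (mean_j - mean)^2 over the partition cells, and divide by n times the outcome variance. The normalization and the function name `influence_score` are assumptions, not details taken from this record, and the interaction-based residual blocks and ensemble step are not shown.

```python
from collections import defaultdict


def influence_score(X, y, subset):
    """Illustrative I-score for the discrete variables indexed by `subset`.

    X is a list of rows of discrete values and y a list of numeric outcomes.
    Assumed form: sum_j n_j^2 * (mean_j - mean)^2 / (n * var(y)),
    where j runs over the cells of the partition induced by `subset`.
    """
    n = len(y)
    y_bar = sum(y) / n
    var_y = sum((v - y_bar) ** 2 for v in y) / n
    if var_y == 0:
        return 0.0  # a constant outcome carries no signal

    # Group outcomes by the joint value of the selected variables.
    cells = defaultdict(list)
    for row, outcome in zip(X, y):
        cells[tuple(row[k] for k in subset)].append(outcome)

    total = 0.0
    for members in cells.values():
        n_j = len(members)
        total += n_j ** 2 * (sum(members) / n_j - y_bar) ** 2
    return total / (n * var_y)


# Toy data: the outcome is the XOR of two binary variables, so neither
# variable is informative alone but the pair is.
X = [(a, b) for a in (0, 1) for b in (0, 1)] * 10
y = [a ^ b for a, b in X]
print(influence_score(X, y, [0]), influence_score(X, y, [0, 1]))
```

On this toy data each variable scores 0 on its own while the pair scores 10, which is the kind of pure interaction effect that a marginal screen would miss.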
2.
BMC Proc ; 12(Suppl 9): 40, 2018.
Article in English | MEDLINE | ID: mdl-30275890

ABSTRACT

In this paper, we consider the use of least absolute shrinkage and selection operator (LASSO)-type regression techniques to detect important genetic or epigenetic loci in genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS). We demonstrate how these techniques can be adapted to provide quantifiable uncertainty using stability selection, including explicit control of the family-wise error rate. We also consider variants of the LASSO, such as the group LASSO, to study genetic and epigenetic interactions. We use these techniques to reproduce some existing results on the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) data set, which comprises, for 991 individuals, blood triglyceride levels, differential methylation at 464,000 cytosine-phosphate-guanine (CpG) sites, and 761,000 single-nucleotide polymorphisms (SNPs), and to identify new research directions. Epigenome-wide and genome-wide models based on the LASSO are considered, as well as an interaction model limited to chromosome 11. The analyses replicate findings concerning 2 CpGs in carnitine palmitoyltransferase 1A (CPT1A). Some suggestions are made regarding potentially interesting directions for the analysis of genetic and epigenetic interactions.

3.
BMC Proc ; 12(Suppl 9): 42, 2018.
Article in English | MEDLINE | ID: mdl-30263047

ABSTRACT

In this paper, we present a fully statistical investigation of controlling for family structure as random effects, using both genome-wide association study (GWAS) data and simulated data. Three modeling strategies are proposed, and the analysis suggests that, in practice, the results from all possible models should be combined.

4.
BMC Proc ; 10(Suppl 7): 131-134, 2016.
Article in English | MEDLINE | ID: mdl-27980624

ABSTRACT

We propose a new method for identifying disease-related regions of single nucleotide variants in recently admixed populations. We use principal component analysis to derive both global and local ancestry information. We then use the summation partition approach to search for disease-related regions based on both rare variants and the local ancestral information of each region. We demonstrate this method using individuals with high systolic blood pressure from a sample of unrelated Mexican American subjects provided in the 19th Genetic Analysis Workshop.

5.
BMC Proc ; 10(Suppl 7): 333-336, 2016.
Article in English | MEDLINE | ID: mdl-27980658

ABSTRACT

Interactions between genes are an important part of the genetic architecture of complex diseases. In this paper, we use individual genes identified from the literature as associated with type 2 diabetes (referred to as "seed genes") to create a larger list of genes that share implied or direct networks with these seed genes. The genes in this larger list are known to interact with each other, but whether they interact in ways that influence hypertension is an open question. Using Genetic Analysis Workshop data on individuals with diabetes, for which only case-control labels of hypertension are known, we offer a foray into the identification of diabetes-related gene interactions that are associated with hypertension. We use the approach of Lo et al. (Proc Natl Acad Sci U S A 105: 12387-12392, 2008), which creates a score to identify pairwise significant gene associations. We find that the genes GCK and PAX4, previously known to lie within similar coexpression and pathway networks but without specific direct interactions, do, in fact, show significant joint interaction effects for hypertension.

6.
Proc Natl Acad Sci U S A ; 113(50): 14277-14282, 2016 12 13.
Article in English | MEDLINE | ID: mdl-27911830

ABSTRACT

We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the I-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the partition retention and I-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.

7.
Proc Natl Acad Sci U S A ; 112(45): 13892-7, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26504198

ABSTRACT

Thus far, genome-wide association studies (GWAS) have been disappointing in the inability of investigators to use the results of identified, statistically significant variants in complex diseases to make predictions useful for personalized medicine. Why are significant variables not leading to good prediction of outcomes? We point out that this problem is prevalent in simple as well as complex data, in the sciences as well as the social sciences. We offer a brief explanation and some statistical insights on why higher significance cannot automatically imply stronger predictivity and illustrate through simulations and a real breast cancer example. We also demonstrate that highly predictive variables do not necessarily appear as highly significant, thus evading the researcher using significance-based methods. We point out that what makes variables good for prediction versus significance depends on different properties of the underlying distributions. If prediction is the goal, we must lay aside significance as the only selection standard. We suggest that progress in prediction requires efforts toward a new research agenda of searching for a novel criterion to retrieve highly predictive variables rather than highly significant variables. We offer an alternative approach that was not designed for significance, the partition retention method, which was very effective predicting on a long-studied breast cancer data set, by reducing the classification error rate from 30% to 8%.


Subject(s)
Genome-Wide Association Study; Humans; Precision Medicine; Predictive Value of Tests
8.
BMC Proc ; 8(Suppl 1): S47, 2014.
Article in English | MEDLINE | ID: mdl-25519328

ABSTRACT

Current sequencing technology enables generation of whole genome sequencing data sets that contain a high density of rare variants, each of which is carried by, at most, 5% of the sampled subjects. Such variants are involved in the etiology of most common diseases in humans. These diseases can be studied by relevant longitudinal phenotype traits. Tests for association between such genotype information and longitudinal traits allow the study of the function of rare variants in complex human disorders. In this paper, we propose an association-screening framework that highlights the genotypic differences observed on rare variants and the longitudinal nature of phenotypes. In particular, both variants within a gene and longitudinal phenotypes are used to create partitions of subjects. Association between the 2 sets of constructed partitions is then evaluated. We apply the proposed strategy to the simulated data from the Genetic Analysis Workshop 18 and compare the obtained results with those from sequence kernel association test using the receiver operating characteristic curves.

10.
BMC Proc ; 8(Suppl 1): S60, 2014.
Article in English | MEDLINE | ID: mdl-25519395

ABSTRACT

It is believed that almost all common diseases are the consequence of complex interactions between genetic markers and environmental factors. However, few such interactions have been documented to date. Conventional statistical methods for detecting gene and environmental interactions are often based on the linear regression model, which assumes a linear interaction effect. In this study, we propose a nonparametric partition-based approach that is able to capture complex interaction patterns. We apply this method to the real data set of hypertension provided by Genetic Analysis Workshop 18. Compared with the linear regression model, the proposed approach is able to identify many additional variants with significant gene-environmental interaction effects. We further investigate one single-nucleotide polymorphism identified by our method and show that its gene-environmental interaction effect is, indeed, nonlinear. To adjust for the family dependence of phenotypes, we apply different permutation strategies and investigate their effects on the outcomes.

11.
BMC Proc ; 8(Suppl 1): S62, 2014.
Article in English | MEDLINE | ID: mdl-25519396

ABSTRACT

Environment has long been known to play an important part in disease etiology. However, not many genome-wide association studies take environmental factors into consideration. There is also a need for new methods to identify gene-environment interactions. In this study, we propose a 2-step approach incorporating an influence measure that captures the pure gene-environment effect. We found that, for systolic blood pressure, the pure gene-age interaction has a stronger association than the genetic effect considered alone, as measured by the number of single-nucleotide polymorphisms (SNPs) reaching a certain significance level. We analyzed the subjects by dividing them into two age groups and found no overlap in the top identified SNPs between them. This suggested that age might have a nonlinear effect on genetic association. Furthermore, the scores of the top SNPs for the two age subgroups were about 3 times those obtained when using all subjects for systolic blood pressure. In addition, the scores of the older age subgroup were much higher than those for the younger group. The results suggest that genetic effects are stronger in older age and that genetic association studies should take environmental effects into consideration, especially age.

12.
BMC Proc ; 8(Suppl 1): S7, 2014.
Article in English | MEDLINE | ID: mdl-25519400

ABSTRACT

In this study, we analyze the Genetic Analysis Workshop 18 (GAW18) data to identify regions of single-nucleotide polymorphisms (SNPs), which significantly influence hypertension status among individuals. We have studied the marginal impact of these regions on disease status in the past, but we extend the method to deal with environmental factors present in data collected over several exam periods. We consider the respective interactions between such traits as smoking status and age with the genetic information and hope to augment those genetic regions deemed influential marginally with those that contribute via an interactive effect. In particular, we focus only on rare variants and apply a procedure to combine signal among rare variants in a number of "fixed bins" along the chromosome. We extend the procedure in Agne et al [1] to incorporate environmental factors by dichotomizing subjects via traits such as smoking status and age, running the marginal procedure among each respective category (i.e., smokers or nonsmokers), and then combining their scores into a score for interaction. To avoid overlap of subjects, we examine each exam period individually. Out of a possible 629 fixed-bin regions in chromosome 3, we observe that 11 show up in multiple exam periods for gene-smoking score. Fifteen regions exhibit significance for multiple exam periods for gene-age score, with 4 regions deemed significant for all 3 exam periods. The procedure pinpoints SNPs in 8 "answer" genes, with 5 of these showing up as significant in multiple testing schemes (Gene-Smoking, Gene-Age for Exams 1, 2, and 3).

13.
PLoS One ; 8(12): e83057, 2013.
Article in English | MEDLINE | ID: mdl-24358248

ABSTRACT

A growing body of evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare-variant association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants but do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method that is designed specifically for detecting both marginal effects and effects due to gene-gene (G×G) and gene-environmental (G×E) interactions for rare-variant association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G×G or G×E interactions. Second, it does not sacrifice the marginal detection power; when rare variants have only marginal effects, it is comparable with the most competitive method in the current literature. Third, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can easily tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G×G and G×E interactions.


Subject(s)
Epistasis, Genetic; Gene-Environment Interaction; Genetic Variation; Genome-Wide Association Study/methods; Models, Genetic; Algorithms; Animals; Computer Simulation; Gene Frequency; Genome-Wide Association Study/statistics & numerical data; Genotype; High-Throughput Nucleotide Sequencing; Humans; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable; Sequence Analysis, DNA
14.
Bioinformatics ; 28(21): 2834-42, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22945786

ABSTRACT

MOTIVATION: Epistasis or gene-gene interaction has gained increasing attention in studies of complex diseases. Its presence as a ubiquitous component of the genetic architecture of common human diseases has been contemplated. However, the detection of gene-gene interaction is difficult due to combinatorial explosion. RESULTS: We present a novel feature selection method incorporating variable interaction. Three gene expression datasets are analyzed to illustrate our method, although it can also be applied to other types of high-dimensional data. The quality of variables selected is evaluated in two ways: first by classification error rates, then by functional relevance assessed using biological knowledge. We show that the classification error rates can be significantly reduced by considering interactions. In addition, a sizable portion of the genes identified by our method for breast cancer metastasis overlaps with those reported in the gene-to-system breast cancer (G2SBC) database as disease associated, and some of them have interesting biological implications. In summary, interaction-based methods may lead to substantial gain in biological insights as well as more accurate prediction.


Subject(s)
Algorithms; Breast Neoplasms/classification; Breast Neoplasms/genetics; Gene Expression Profiling/methods; Genetic Testing/methods; Models, Statistical; Neoplasm Recurrence, Local/genetics; Epistasis, Genetic; Female; Gene Expression Regulation, Neoplastic; Humans; Logistic Models; Multigene Family; Neoplasm Recurrence, Local/classification
15.
BMC Proc ; 5 Suppl 9: S17, 2011.
Article in English | MEDLINE | ID: mdl-22373071

ABSTRACT

Genome-wide association studies have been successful at identifying common disease variants associated with complex diseases, but the common variants identified have small effect sizes and account for only a small fraction of the estimated heritability for common diseases. Theoretical and empirical studies suggest that rare variants, which are much less frequent in populations and are poorly captured by single-nucleotide polymorphism chips, could play a significant role in complex diseases. Several new statistical methods have been developed for the analysis of rare variants, for example, the combined multivariate and collapsing method, the weighted-sum method and a replication-based method. Here, we apply and compare these methods to the simulated data sets of Genetic Analysis Workshop 17 and thereby explore the contribution of rare variants to disease risk. In addition, we investigate the usefulness of extreme phenotypes in identifying rare risk variants when dealing with quantitative traits. Finally, we perform a pathway analysis and show the importance of the vascular endothelial growth factor pathway in explaining different phenotypes.

16.
BMC Proc ; 5 Suppl 9: S3, 2011 Nov 29.
Article in English | MEDLINE | ID: mdl-22373412

ABSTRACT

In this study, we analyze the Genetic Analysis Workshop 17 data to identify regions of single-nucleotide polymorphisms (SNPs) that exhibit a significant influence on response rate (proportion of subjects with an affirmative affected status), called the affected ratio, among rare variants. Under the null hypothesis, the distribution of rare variants is assumed to be uniform over case (affected) and control (unaffected) subjects. We attempt to pinpoint regions where the composition is significantly different between case and control events, specifically where there are unusually high numbers of rare variants among affected subjects. We focus on private variants, which require a degree of "collapsing" to combine information over several SNPs, to obtain meaningful results. Instead of implementing a gene-based approach, where regions would vary in size and sometimes be too small to achieve a strong enough signal, we implement a fixed-bin approach, with a preset number of SNPs per region, relying on the assumption that proximity and similarity go hand in hand. Through application of 100-SNP and 30-SNP fixed bins, we identify several most influential regions, which later are seen to contain some of the causal SNPs. The 100- and 30-SNP approaches detected seven and three causal SNPs among the most significant regions, respectively, with two overlapping SNPs located in the ELAVL4 gene, reported by both procedures.
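
The fixed-bin idea in record 16 can be illustrated with a short sketch: consecutive SNPs are grouped into bins of a preset size, and for each bin the share of rare-variant alleles carried by affected subjects is computed. The data layout, the function names, and the use of a plain proportion are assumptions made for illustration; the abstract does not specify the authors' test statistic or significance procedure.

```python
def fixed_bins(n_snps, bin_size):
    """Split SNP indices 0..n_snps-1 into consecutive bins of `bin_size`."""
    return [range(start, min(start + bin_size, n_snps))
            for start in range(0, n_snps, bin_size)]


def affected_share_by_bin(genotypes, affected, bin_size):
    """Share of rare alleles in each bin that is carried by affected subjects.

    genotypes[i][s] is the rare-allele count of subject i at SNP s and
    affected[i] is 1 for cases and 0 for controls (hypothetical layout).
    Bins without any rare allele yield None.
    """
    shares = []
    for snp_bin in fixed_bins(len(genotypes[0]), bin_size):
        in_cases = 0
        overall = 0
        for row, status in zip(genotypes, affected):
            count = sum(row[s] for s in snp_bin)
            overall += count
            if status == 1:
                in_cases += count
        shares.append(in_cases / overall if overall else None)
    return shares


# Four subjects (two cases, two controls), four SNPs, bins of two SNPs.
genotypes = [
    [1, 0, 0, 0],  # case
    [0, 1, 0, 0],  # case
    [0, 0, 1, 0],  # control
    [0, 0, 0, 0],  # control
]
affected = [1, 1, 0, 0]
print(affected_share_by_bin(genotypes, affected, bin_size=2))
```

Bins whose share departs strongly from the overall proportion of cases would be the candidates for follow-up; judging how strong a departure must be requires a reference distribution, which this sketch leaves out.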

17.
BMC Proc ; 5 Suppl 9: S50, 2011 Nov 29.
Article in English | MEDLINE | ID: mdl-22373518

ABSTRACT

The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants.
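
The first stage described in record 17, ranking markers by their F-statistic, can be sketched as follows. The example uses the standard one-way ANOVA F-statistic of a quantitative trait grouped by genotype; whether the authors used exactly this grouping is an assumption, the function names are hypothetical, and the subsequent sliced inverse regression step is not shown.

```python
def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups of numeric values."""
    groups = [g for g in groups if g]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        return 0.0  # not enough groups or residual degrees of freedom
    grand_mean = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_markers(genotypes, trait):
    """Return marker indices sorted by decreasing F-statistic.

    genotypes[i][m] is the genotype of subject i at marker m and
    trait[i] is that subject's quantitative trait value.
    """
    scored = []
    for m in range(len(genotypes[0])):
        by_genotype = {}
        for row, value in zip(genotypes, trait):
            by_genotype.setdefault(row[m], []).append(value)
        scored.append((f_statistic(list(by_genotype.values())), m))
    # Ties are broken by marker index so the ranking is deterministic.
    return [m for _, m in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


genotypes = [[0, 0], [0, 1], [0, 0], [1, 1], [1, 0], [1, 1]]
trait = [1, 2, 3, 4, 5, 6]
print(rank_markers(genotypes, trait))
```

In this toy example marker 0 separates low from high trait values and is ranked ahead of marker 1.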

18.
BMC Proc ; 5 Suppl 9: S106, 2011 Nov 29.
Article in English | MEDLINE | ID: mdl-22373536

ABSTRACT

Both common variants and rare variants are involved in the etiology of most complex diseases in humans. Developments in sequencing technology have led to the identification of a high density of rare variant single-nucleotide polymorphisms (SNPs) on the genome, each of which affects only at most 1% of the population. Genotypes derived from these SNPs allow one to study the involvement of rare variants in common human disorders. Here, we propose an association screening approach that treats genes as units of analysis. SNPs within a gene are used to create partitions of individuals, and inverse-probability weighting is used to overweight genotypic differences observed on rare variants. Association between a phenotype trait and the constructed partition is then evaluated. We consider three association tests (one-way ANOVA, chi-square test, and the partition retention method) and compare these strategies using the simulated data from the Genetic Analysis Workshop 17. Several genes that contain causal SNPs were identified by the proposed method as top genes.

19.
BMC Proc ; 3 Suppl 7: S132, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20017999

ABSTRACT

The genes PTPN22 and HLA-DRB1 have been found by a number of studies to confer an increased risk for rheumatoid arthritis (RA), which indicates that both genes play an important role in RA etiology. It is believed that they not only have strong association with RA individually, but also interact with other related genes that have not been found to have predisposing RA mutations. In this paper, we conduct genome-wide searches for RA-associated gene-gene interactions that involve PTPN22 or HLA-DRB1 using the Genetic Analysis Workshop 16 Problem 1 data from the North American Rheumatoid Arthritis Consortium. MGC13017, HSPCAL3, MIA, PTPNS1L, and IGLVI-70, which showed association with RA in previous studies, have been confirmed in our analysis.

20.
BMC Proc ; 3 Suppl 7: S75, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20018070

ABSTRACT

Rheumatoid arthritis (RA, MIM 180300) is a chronic and complex autoimmune disease. Using the North American Rheumatoid Arthritis Consortium (NARAC) data set provided in Genetic Analysis Workshop 16 (GAW16), we used the genotype-trait distortion (GTD) scores and proposed analysis procedures to capture the gene-gene interaction effects of multiple susceptibility gene regions on RA. In this paper, we focused on 27 RA candidate gene regions (531 SNPs) based on a literature search. Statistical significance was evaluated using 1000 permutations. HLA-DRB1 was found to have strong marginal association with RA. We identified 14 significant interactions (p < 0.01), which were aggregated into an association network among 12 selected candidate genes: PADI4, FCGR3, TNFRSF1B, ITGAV, BTLA, SLC22A4, IL3, VEGF, TNF, NFKBIL1, TRAF1-C5, and MIF. Based on our and other contributors' findings during the GAW16 conference, we further studied 24 candidate regions with 336 SNPs. We found 23 significant interactions (p < 0.01), nine more than in our initial findings, and the association network was extended to include candidate genes HLA-A, HLA-B, HLA-C, CTLA4, and IL6. As we discuss in this paper, the reported possible interactions between genes may suggest potential biological activities of RA.
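
Record 20 evaluates significance with 1000 permutations. A generic permutation test of that kind can be sketched as below: case-control labels are shuffled repeatedly and the observed statistic is compared with its permutation distribution. The statistic here is a simple difference in means standing in for the GTD score, which the abstract does not define; the function names and the add-one p-value estimate are assumptions.

```python
import random


def mean_difference(labels, values):
    """Absolute difference in mean value between label 1 and label 0."""
    ones = [v for l, v in zip(labels, values) if l == 1]
    zeros = [v for l, v in zip(labels, values) if l == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def permutation_p_value(statistic, labels, values, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling the labels `n_perm` times.

    Uses the add-one estimate (hits + 1) / (n_perm + 1), so the result is
    never exactly zero.
    """
    rng = random.Random(seed)
    observed = statistic(labels, values)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled, values) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data in which the two groups are perfectly separated.
labels = [0] * 10 + [1] * 10
values = [0.0] * 10 + [10.0] * 10
p = permutation_p_value(mean_difference, labels, values)
print(p < 0.01)
```

With 1000 permutations the smallest attainable estimate is 1/1001, so a threshold such as p < 0.01 remains resolvable at this number of permutations.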
