Results 1 - 20 of 29
1.
Nat Commun ; 15(1): 7794, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39242579

ABSTRACT

Imaging-based spatial transcriptomics technologies such as Multiplexed error-robust fluorescence in situ hybridization (MERFISH) can capture cellular processes in unparalleled detail. However, rigorous and robust analytical tools are needed to unlock their full potential for discovering subcellular biological patterns. We present Intracellular Spatial Transcriptomic Analysis Toolkit (InSTAnT), a computational toolkit for extracting molecular relationships from spatial transcriptomics data at single molecule resolution. InSTAnT employs specialized statistical tests and algorithms to detect gene pairs and modules exhibiting intriguing patterns of co-localization, both within individual cells and across the cellular landscape. We showcase the toolkit on five different datasets representing two different cell lines, two brain structures, two species, and three different technologies. We perform rigorous statistical assessment of discovered co-localization patterns, find supporting evidence from databases and RNA interactions, and identify associated subcellular domains. We uncover several cell type and region-specific gene co-localizations within the brain. Intra-cellular spatial patterns discovered by InSTAnT mirror diverse molecular relationships, including RNA interactions and shared sub-cellular localization or function, providing a rich compendium of testable hypotheses regarding molecular functions.


Subject(s)
Algorithms , Brain , Gene Expression Profiling , Fluorescence In Situ Hybridization , Transcriptome , Gene Expression Profiling/methods , Humans , Fluorescence In Situ Hybridization/methods , Animals , Brain/metabolism , Mice , Computational Biology/methods , RNA/genetics , RNA/metabolism , Software , Cell Line
2.
Stat Med ; 43(19): 3742-3758, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-38897921

ABSTRACT

Biomarkers are often measured in bulk to diagnose patients, monitor patient conditions, and research novel drug pathways. The measurement of these biomarkers often suffers from detection limits that result in missing and untrustworthy measurements. Frequently, missing biomarkers are imputed so that downstream analysis can be conducted with modern statistical methods that cannot normally handle data subject to informative censoring. This work develops an empirical Bayes g-modeling method for imputing and denoising biomarker measurements. We establish superior estimation properties compared to popular methods in simulations and with real data, providing useful biomarker measurement estimates for downstream analysis.


Subject(s)
Bayes Theorem , Biomarkers , Computer Simulation , Humans , Biomarkers/analysis , Statistical Models , Nonparametric Statistics , Statistical Data Interpretation
3.
Stat Med ; 43(1): 1-15, 2024 01 15.
Article in English | MEDLINE | ID: mdl-37875428

ABSTRACT

Wide heterogeneity exists in cancer patients' survival, ranging from a few months to several decades. To accurately predict clinical outcomes, it is vital to build an accurate predictive model that relates the patients' molecular profiles to the patients' survival. With complex relationships between survival and high-dimensional molecular predictors, it is challenging to perform nonparametric modeling and remove irrelevant predictors simultaneously. In this article, we build a kernel Cox proportional hazards semi-parametric model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit the model. We use the kernel machine method to describe the complex relationship between survival and predictors, while automatically removing irrelevant parametric and nonparametric predictors through a LASSO penalty. An efficient high-dimensional algorithm is developed for the proposed method. Comparison with other competing methods in simulation shows that the proposed method always has better predictive accuracy. We apply this method to analyze a multiple myeloma dataset and predict the patients' death burden based on their gene expressions. Our results can help classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.


Subject(s)
Algorithms , Neoplasms , Humans , Linear Models , Proportional Hazards Models , Computer Simulation , Neoplasms/genetics
4.
Nat Ecol Evol ; 7(8): 1232-1244, 2023 08.
Article in English | MEDLINE | ID: mdl-37264201

ABSTRACT

Understanding how genotypic variation results in phenotypic variation is especially difficult for collective behaviour because group phenotypes arise from complex interactions among group members. A genome-wide association study identified hundreds of genes associated with colony-level variation in honeybee aggression, many of which also showed strong signals of positive selection, but the influence of these 'colony aggression genes' on brain function was unknown. Here we use single-cell (sc) transcriptomics and gene regulatory network (GRN) analyses to test the hypothesis that genetic variation for colony aggression influences individual differences in brain gene expression and/or gene regulation. We compared soldiers, which respond to territorial intrusion with stinging attacks, and foragers, which do not. Colony environment showed stronger influences on soldier-forager differences in brain gene regulation compared with brain gene expression. GRN plasticity was strongly associated with colony aggression, with larger differences in GRN dynamics detected between soldiers and foragers from more aggressive relative to less aggressive colonies. The regulatory dynamics of subnetworks composed of genes associated with colony aggression genes were more strongly correlated with each other across different cell types and brain regions relative to other genes, especially in brain regions involved with olfaction and vision and multimodal sensory integration, which are known to mediate bee aggression. These results show how group genetics can shape a collective phenotype by modulating individual brain gene regulatory network architecture.


Subject(s)
Aggression , Bees , Animal Behavior , Genome-Wide Association Study , Animals , Aggression/physiology , Bees/genetics , Brain/physiology , Gene Expression Regulation , Gene Regulatory Networks
5.
Res Sq ; 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36747718

ABSTRACT

Imaging-based spatial transcriptomics technologies such as MERFISH offer snapshots of cellular processes in unprecedented detail, but new analytic tools are needed to realize their full potential. We present InSTAnT, a computational toolkit for extracting molecular relationships from spatial transcriptomics data at the intra-cellular resolution. InSTAnT detects gene pairs and modules with interesting patterns of mutual co-localization within and across cells, using specialized statistical tests and graph mining. We showcase the toolkit on datasets profiling a human cancer cell line and hypothalamic preoptic region of mouse brain. We performed rigorous statistical assessment of discovered co-localization patterns, found supporting evidence from databases and RNA interactions, and identified subcellular domains associated with RNA-colocalization. We identified several novel cell type-specific gene co-localizations in the brain. Intra-cellular spatial patterns discovered by InSTAnT mirror diverse molecular relationships, including RNA interactions and shared sub-cellular localization or function, providing a rich compendium of testable hypotheses regarding molecular functions.

6.
Biometrics ; 79(2): 1201-1212, 2023 06.
Article in English | MEDLINE | ID: mdl-35499364

ABSTRACT

Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is suboptimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings are common in modern genomics, where covariance matrix estimation is frequently employed as a method for inferring gene networks. To achieve estimation accuracy in these settings, existing methods typically either assume that the population covariance matrix has some particular structure, for example, sparsity, or apply shrinkage to better estimate the population eigenvalues. In this paper, we study a new approach to estimating high-dimensional covariance matrices. We first frame covariance matrix estimation as a compound decision problem. This motivates defining a class of decision rules and using a nonparametric empirical Bayes g-modeling approach to estimate the optimal rule in the class. Simulation results and gene network inference in an RNA-seq experiment in mouse show that our approach is comparable to or can outperform a number of state-of-the-art proposals.


Subject(s)
Gene Regulatory Networks , Genomics , Animals , Mice , Bayes Theorem , Computer Simulation , Sample Size
7.
Curr Res Food Sci ; 5: 1452-1464, 2022.
Article in English | MEDLINE | ID: mdl-36119372

ABSTRACT

Chocolate is a product of the fermentation of cacao beans. Performed on-farm or at local cooperatives, these are spontaneous cacao fermentations (SCFs). To better understand SCFs, this study sought to identify SCF microbes, their interrelationships, and other key parameters that influence fermentation. This is important because differences in fermentation can have an impact on final product quality. In this study, a systematic data extraction was performed, searching for literature that identified microbes from SCFs. Each unique microbe, whether by location or by fermentation material, was extracted from the articles, along with parameters associated with fermentation. Data were collected and analyzed for three interactions: microbe-to-geography, microbe-to-fermentation method, and microbe-to-microbe. The goal was to attribute microbes to geographical locations, fermentation materials, or to other microbes. Statistically significant relationships will reveal target areas for future research. Over 1700 microbes (440 unique species) were identified across 60 articles. The top three countries represented are Brazil (22 articles, n = 612 microbes), the Ivory Coast (14 articles, n = 237), and Ghana (10 articles, n = 257). Several countries were far less represented, or not represented at all, and should be considered for future research. No specific relationship was identified between microbes and either geographical location or fermentation method. Using a Presence-Absence chart, 127 microbe-to-microbe interactions were identified as statistically significant. Data extraction from SCF research has revealed major gaps in knowledge of the cacao microbiome. By better understanding the cacao microbiome, researchers will be able to identify key microbes and fermentation parameters to better influence the fermentation.

8.
Nat Commun ; 13(1): 1247, 2022 03 10.
Article in English | MEDLINE | ID: mdl-35273186

ABSTRACT

During development, neural progenitors are temporally patterned to sequentially generate a variety of neural types. In Drosophila neural progenitors called neuroblasts, temporal patterning is regulated by cascades of Temporal Transcription Factors (TTFs). However, known TTFs were mostly identified through candidate approaches and may not be complete. In addition, many fundamental questions remain concerning the TTF cascade initiation, progression, and termination. In this work, we use single-cell RNA sequencing of Drosophila medulla neuroblasts of all ages to identify a list of previously unknown TTFs, and experimentally characterize their roles in temporal patterning and neuronal specification. Our study reveals a comprehensive temporal gene network that patterns medulla neuroblasts from start to end. Furthermore, the speed of the cascade progression is regulated by Lola transcription factors expressed in all medulla neuroblasts. Our comprehensive study of the medulla neuroblast temporal cascade illustrates mechanisms that may be conserved in the temporal patterning of neural progenitors.


Subject(s)
Drosophila Proteins , Neural Stem Cells , Animals , Drosophila/metabolism , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Developmental Gene Expression Regulation , Gene Regulatory Networks , Neural Stem Cells/metabolism , RNA Sequence Analysis , Transcription Factors/genetics
9.
Lifetime Data Anal ; 28(2): 282-318, 2022 04.
Article in English | MEDLINE | ID: mdl-35239126

ABSTRACT

For high dimensional gene expression data, one important goal is to identify a small number of genes that are associated with progression of the disease or survival of the patients. In this paper, we consider the problem of variable selection for multivariate survival data. We propose an estimation procedure for high dimensional accelerated failure time (AFT) models with bivariate censored data. The method extends the Buckley-James method by minimizing a penalized [Formula: see text] loss function with a penalty function induced from a bivariate spike-and-slab prior specification. In the proposed algorithm, censored observations are imputed using the Kaplan-Meier estimator, which avoids a parametric assumption on the error terms. Our empirical studies demonstrate that the proposed method provides better performance compared to the alternative procedures designed for univariate survival data regardless of whether the true events are correlated or not, and conceptualizes a formal way of handling bivariate survival data for AFT models. Findings from the analysis of a myeloma clinical trial using the proposed method are also presented.


Subject(s)
Algorithms , Bayes Theorem , Humans , Survival Analysis
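The abstract above relies on the Kaplan-Meier estimator to handle censored observations. As a minimal sketch of that estimator alone (not the authors' imputation or penalized fitting procedure), the product-limit survival curve can be computed as follows; the function name `kaplan_meier` and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times:  observed follow-up times
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the conditional survival at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical toy data: five subjects, two of them censored.
km = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why no parametric assumption on the error distribution is needed.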
10.
Stat Med ; 41(12): 2227-2246, 2022 05 30.
Article in English | MEDLINE | ID: mdl-35189671

ABSTRACT

Clinical studies examining the effectiveness of a treatment with respect to some primary outcome often require long-term follow-up of patients and/or costly or burdensome measurements of the primary outcome of interest. Identifying a surrogate marker for the primary outcome of interest may allow one to evaluate a treatment effect with less follow-up time, less cost, or less burden. While much clinical and statistical work has focused on identifying and validating surrogate markers, available approaches tend to focus on settings in which only a single surrogate marker is of interest. Limited work has been done to accommodate the high-dimensional surrogate marker setting where the number of potential surrogates is greater than the sample size. In this article, we develop methods to estimate the proportion of treatment effect explained by high-dimensional surrogates. We study the asymptotic properties of our proposed estimator, propose inference procedures, and examine finite sample performance via a simulation study. We illustrate our proposed methods using data from a randomized study comparing a novel whey-based oral nutrition supplement with a standard supplement with respect to change in body fat percentage over 12 weeks, where the surrogate markers of interest are gene expression probesets.


Subject(s)
Computer Simulation , Biomarkers , Humans
11.
Clin Transl Sci ; 14(4): 1578-1589, 2021 07.
Article in English | MEDLINE | ID: mdl-33786999

ABSTRACT

Sepsis is a major cause of mortality among hospitalized patients worldwide. Shorter time to administration of broad-spectrum antibiotics is associated with improved outcomes, but early recognition of sepsis remains a major challenge. In a two-center cohort study with prospective sample collection from 1400 adult patients in emergency departments suspected of sepsis, we sought to determine the diagnostic and prognostic capabilities of a machine-learning algorithm based on clinical data and a set of uncommonly measured biomarkers. Specifically, we demonstrate that a machine-learning model developed using this dataset outputs a score with not only diagnostic capability but also prognostic power with respect to hospital length of stay (LOS), 30-day mortality, and 3-day inpatient re-admission both in our entire testing cohort and various subpopulations. The area under the receiver operating curve (AUROC) for diagnosis of sepsis was 0.83. Predicted risk scores for patients with septic shock were higher compared with patients with sepsis but without shock (p < 0.0001). Scores for patients with infection and organ dysfunction were higher compared with those without either condition (p < 0.0001). Stratification based on predicted scores of the patients into low, medium, and high-risk groups showed significant differences in LOS (p < 0.0001), 30-day mortality (p < 0.0001), and 30-day inpatient readmission (p < 0.0001). In conclusion, a machine-learning algorithm based on electronic medical record (EMR) data and three nonroutinely measured biomarkers demonstrated good diagnostic and prognostic capability at the time of initial blood culture.


Subject(s)
Early Diagnosis , Electronic Health Records/statistics & numerical data , Machine Learning , Sepsis/diagnosis , Aged , Area Under Curve , Biomarkers/blood , Hospital Emergency Service/statistics & numerical data , Female , Hospital Mortality , Humans , Length of Stay/statistics & numerical data , Male , Middle Aged , Patient Readmission/statistics & numerical data , Prognosis , Prospective Studies , ROC Curve , Sepsis/blood , Sepsis/microbiology , Sepsis/mortality
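The AUROC reported in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name `auroc` and the example scores are hypothetical and unrelated to the study's data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for two positive and two negative cases.
example = auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The quadratic loop is chosen for clarity; a rank-based formulation gives the same value more efficiently on large cohorts.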
12.
Biometrika ; 107(3): 573-589, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32831353

ABSTRACT

Mediation analysis is difficult when the number of potential mediators is larger than the sample size. In this paper we propose new inference procedures for the indirect effect in the presence of high-dimensional mediators for linear mediation models. We develop methods for both incomplete mediation, where a direct effect may exist, and complete mediation, where the direct effect is known to be absent. We prove consistency and asymptotic normality of our indirect effect estimators. Under complete mediation, where the indirect effect is equivalent to the total effect, we further prove that our approach gives a more powerful test compared to directly testing for the total effect. We confirm our theoretical results in simulations, as well as in an integrative analysis of gene expression and genotype data from a pharmacogenomic study of drug response. We present a novel analysis of gene sets to understand the molecular mechanisms of drug response, and also identify a genome-wide significant noncoding genetic variant that cannot be detected using standard analysis methods.
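
For orientation, a standard linear mediation model can be written as below. The notation is assumed for illustration and is not taken from the paper: $X$ is the exposure, $M \in \mathbb{R}^p$ the vector of mediators, and $Y$ the outcome.

```latex
\begin{aligned}
M &= \alpha X + \varepsilon_M, \qquad \alpha \in \mathbb{R}^p, \\
Y &= \gamma X + \beta^\top M + \varepsilon_Y, \qquad \beta \in \mathbb{R}^p, \\
\text{indirect effect} &= \alpha^\top \beta, \qquad
\text{direct effect} = \gamma, \qquad
\text{total effect} = \gamma + \alpha^\top \beta .
\end{aligned}
```

Under complete mediation $\gamma = 0$, so the total effect reduces to $\alpha^\top \beta$, which is the sense in which the indirect and total effects coincide.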

13.
Proc Natl Acad Sci U S A ; 117(29): 17135-17141, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32631983

ABSTRACT

For social animals, the genotypes of group members affect the social environment, and thus individual behavior, often indirectly. We used genome-wide association studies (GWAS) to determine the influence of individual vs. group genotypes on aggression in honey bees. Aggression in honey bees arises from the coordinated actions of colony members, primarily nonreproductive "soldier" bees, and thus, experiences evolutionary selection at the colony level. Here, we show that individual behavior is influenced by colony environment, which in turn, is shaped by allele frequency within colonies. Using a population with a range of aggression, we sequenced individual whole genomes and looked for genotype-behavior associations within colonies in a common environment. There were no significant correlations between individual aggression and specific alleles. By contrast, we found strong correlations between colony aggression and the frequencies of specific alleles within colonies, despite a small number of colonies. Associations at the colony level were highly significant and were very similar among both soldiers and foragers, but they covaried with one another. One strongly significant association peak, containing an ortholog of the Drosophila sensory gene dpr4 on linkage group (chromosome) 7, showed strong signals of both selection and admixture during the evolution of gentleness in a honey bee population. We thus found links between colony genetics and group behavior and also, molecular evidence for group-level selection, acting at the colony level. We conclude that group genetics dominates individual genetics in determining the fatal decision of honey bees to sting.


Subject(s)
Aggression , Bees/genetics , Gene Frequency/genetics , Insect Genome/genetics , Animals , Genome-Wide Association Study , Single Nucleotide Polymorphism/genetics , Social Behavior
15.
Sci Rep ; 8(1): 6620, 2018 04 26.
Article in English | MEDLINE | ID: mdl-29700343

ABSTRACT

Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.


Subject(s)
Gene Regulatory Networks , Genomics , Biological Models , Algorithms , Computational Biology/methods , Gene Expression Profiling , Genomics/methods , Humans , Neoplasms/genetics , Reproducibility of Results
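The difference between random and clustering-based cross-validation described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names `random_folds` and `clustered_folds` and the toy condition labels are hypothetical, and cluster labels are assumed to be given rather than computed from expression data.

```python
import random
from collections import defaultdict


def random_folds(n_samples, n_folds, seed=0):
    """Random CV: shuffle sample indices and deal them into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]


def clustered_folds(cluster_labels, n_folds):
    """Clustering-based CV: every cluster is held out as a whole.

    Clusters are assigned greedily (largest first) to the currently
    smallest fold, so no cluster is split between train and test.
    """
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[label])
    return [sorted(fold) for fold in folds]


# Hypothetical example: 8 samples drawn from 4 experimental conditions.
labels = ["a", "a", "b", "b", "c", "c", "d", "d"]
ccv = clustered_folds(labels, n_folds=2)
rcv = random_folds(len(labels), n_folds=2)
```

With random folds, samples from the same condition usually land on both sides of the split, so the test set resembles the training set; with clustered folds, each test set contains only conditions the model has never seen.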
16.
Stat Interface ; 11(4): 573-580, 2018.
Article in English | MEDLINE | ID: mdl-30815051

ABSTRACT

A key step in pharmacogenomic studies is the development of accurate prediction models for drug response based on individuals' genomic information. Recent interest has centered on semiparametric models based on kernel machine regression, which can flexibly model the complex relationships between gene expression and drug response. However, performance suffers if irrelevant covariates are unknowingly included when training the model. We propose a new semiparametric regression procedure, based on a novel penalized garrotized kernel machine (PGKM), which can better adapt to the presence of irrelevant covariates while still allowing for a complex nonlinear model and gene-gene interactions. We study the performance of our approach in simulations and in a pharmacogenomic study of the renal carcinoma drug temsirolimus. Our method predicts plasma concentration of temsirolimus as well as standard kernel machine regression when no irrelevant covariates are included in training, but has much higher prediction accuracy when the truly important covariates are not known in advance. Supplemental materials, including R code used in this manuscript, are available online.

17.
J Multivar Anal ; 160: 169-184, 2017 Aug.
Article in English | MEDLINE | ID: mdl-29203948

ABSTRACT

It is frequently of interest to jointly analyze two paired sequences of multiple tests. This paper studies the problem of detecting whether there are more pairs of tests that are significant in both sequences than would be expected by chance. The asymptotic detection boundary is derived in terms of parameters such as the sparsity of non-null cases in each sequence, the effect sizes of the signals, and the magnitude of the dependence between the two sequences. A new test for detecting weak dependence is also proposed, shown to be asymptotically adaptively optimal, studied in simulations, and applied to study genetic pleiotropy in 10 pediatric autoimmune diseases.

18.
PLoS Genet ; 13(7): e1006840, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28704398

ABSTRACT

Animals exhibit dramatic immediate behavioral plasticity in response to social interactions, and brief social interactions can shape the future social landscape. However, the molecular mechanisms contributing to behavioral plasticity are unclear. Here, we show that the genome dynamically responds to social interactions with multiple waves of transcription associated with distinct molecular functions in the brain of male threespined sticklebacks, a species famous for its behavioral repertoire and evolution. Some biological functions (e.g., hormone activity) peaked soon after a brief territorial challenge and then declined, while others (e.g., immune response) peaked hours afterwards. We identify transcription factors that are predicted to coordinate waves of transcription associated with different components of behavioral plasticity. Next, using H3K27Ac as a marker of chromatin accessibility, we show that a brief territorial intrusion was sufficient to cause rapid and dramatic changes in the epigenome. Finally, we integrate the time course brain gene expression data with a transcriptional regulatory network, and link gene expression to changes in chromatin accessibility. This study reveals rapid and dramatic epigenomic plasticity in response to a brief, highly consequential social interaction.


Subject(s)
Animal Behavior/physiology , Neuronal Plasticity/genetics , Smegmamorpha/genetics , Social Behavior , Genetic Transcription , Animals , Biological Evolution , Cerebrum/physiology , Chromatin/genetics , Diencephalon/physiology , Epigenomics , Genome , RNA Sequence Analysis , Smegmamorpha/physiology , Transcription Factors/genetics
19.
Genome Res ; 27(6): 959-972, 2017 06.
Article in English | MEDLINE | ID: mdl-28356321

ABSTRACT

Agonistic encounters are powerful effectors of future behavior, and the ability to learn from this type of social challenge is an essential adaptive trait. We recently identified a conserved transcriptional program defining the response to social challenge across animal species, highly enriched in transcription factor (TF), energy metabolism, and developmental signaling genes. To understand the trajectory of this program and to uncover the most important regulatory influences controlling this response, we integrated gene expression data with the chromatin landscape in the hypothalamus, frontal cortex, and amygdala of socially challenged mice over time. The expression data revealed a complex spatiotemporal patterning of events starting with neural signaling molecules in the frontal cortex and ending in the modulation of developmental factors in the amygdala and hypothalamus, underpinned by a systems-wide shift in expression of energy metabolism-related genes. The transcriptional signals were correlated with significant shifts in chromatin accessibility and a network of challenge-associated TFs. Among these, the conserved metabolic and developmental regulator ESRRA was highlighted for an especially early and important regulatory role. Cell-type deconvolution analysis attributed the differential metabolic and developmental signals in this social context primarily to oligodendrocytes and neurons, respectively, and we show that ESRRA is expressed in both cell types. Localizing ESRRA binding sites in cortical chromatin, we show that this nuclear receptor binds both differentially expressed energy-related and neurodevelopmental TF genes. These data link metabolic and neurodevelopmental signaling to social challenge, and identify key regulatory drivers of this process with unprecedented tissue and temporal resolution.


Subject(s)
Chromatin/metabolism , Developmental Gene Expression Regulation , Neurons/metabolism , Estrogen Receptors/genetics , Psychological Stress/genetics , Transcription Factors/genetics , Agonistic Behavior , Amygdala/metabolism , Amygdala/physiopathology , Animals , Chromatin/ultrastructure , Energy Metabolism/genetics , Frontal Lobe/metabolism , Frontal Lobe/physiopathology , Gene Expression Profiling , Gene Regulatory Networks , Hypothalamus/metabolism , Hypothalamus/physiopathology , Male , Mice , Neurons/cytology , Oligodendroglia/cytology , Oligodendroglia/metabolism , Protein Binding , Estrogen Receptors/metabolism , Signal Transduction , Psychological Stress/metabolism , Psychological Stress/physiopathology , Transcription Factors/metabolism , Genetic Transcription , ERRalpha Estrogen-Related Receptor
20.
Biometrics ; 73(2): 582-592, 2017 06.
Article in English | MEDLINE | ID: mdl-27792843

ABSTRACT

Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data that examine the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from these auxiliary studies are typically difficult to obtain. This article proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning parameter-free non-parametric empirical Bayes procedure, and can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can provide superior predictive accuracy relative to non-integrative as well as integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa.


Subject(s)
Bayes Theorem , Algorithms , Biometry , Humans , Risk Factors , Sample Size