Results 1 - 20 of 28
1.
Stat Med ; 43(19): 3742-3758, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-38897921

ABSTRACT

Biomarkers are often measured in bulk to diagnose patients, monitor patient conditions, and research novel drug pathways. The measurement of these biomarkers often suffers from detection limits that result in missing and untrustworthy measurements. Frequently, missing biomarkers are imputed so that down-stream analysis can be conducted with modern statistical methods that cannot normally handle data subject to informative censoring. This work develops an empirical Bayes $g$-modeling method for imputing and denoising biomarker measurements. We establish superior estimation properties compared to popular methods in simulations and with real data, providing useful biomarker measurement estimates for down-stream analysis.


Subjects
Bayes Theorem, Biomarkers, Computer Simulation, Humans, Biomarkers/analysis, Statistical Models, Nonparametric Statistics, Statistical Data Interpretation
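Note on entry 1: the abstract describes empirical Bayes g-modeling for measurements subject to a detection limit. The sketch below is only a generic illustration of that idea, not the authors' method. It assumes Gaussian measurement noise with known standard deviation, a single lower detection limit, and a prior supported on a fixed grid whose weights are fitted by EM; all function names and the simulated data are hypothetical.

```python
import math
import random

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def likelihood_row(x, grid, sigma, limit):
    """Likelihood of one record at every grid point; x is None when censored."""
    if x is None:
        # A censored record only tells us the noisy value fell below the limit.
        return [normal_cdf((limit - t) / sigma) for t in grid]
    return [normal_pdf((x - t) / sigma) / sigma for t in grid]

def fit_prior(data, grid, sigma, limit, n_iter=200):
    """EM updates for the prior weights on a fixed grid (assumed setup)."""
    rows = [likelihood_row(x, grid, sigma, limit) for x in data]
    w = [1.0 / len(grid)] * len(grid)
    for _ in range(n_iter):
        new_w = [0.0] * len(grid)
        for row in rows:
            post = [wj * lj for wj, lj in zip(w, row)]
            total = sum(post)
            for j, p in enumerate(post):
                new_w[j] += p / total
        w = [v / len(rows) for v in new_w]
    return w, rows

def posterior_means(w, rows, grid):
    """Posterior mean of the latent value for each record (imputes and denoises)."""
    out = []
    for row in rows:
        post = [wj * lj for wj, lj in zip(w, row)]
        total = sum(post)
        out.append(sum(t * p for t, p in zip(grid, post)) / total)
    return out

# Simulated data (hypothetical): latent values plus noise, censored below the limit.
random.seed(0)
sigma, limit = 1.0, 1.0
truth = [random.gauss(2.0, 1.0) for _ in range(300)]
noisy = [t + random.gauss(0.0, sigma) for t in truth]
data = [x if x >= limit else None for x in noisy]
grid = [-4.0 + 0.25 * k for k in range(49)]  # grid from -4 to 8
w, rows = fit_prior(data, grid, sigma, limit)
estimates = posterior_means(w, rows, grid)
```

In this sketch every censored record receives the same imputed value, because the only information available for it is that the measurement fell below the limit.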
2.
Stat Med ; 43(1): 1-15, 2024 01 15.
Article in English | MEDLINE | ID: mdl-37875428

ABSTRACT

Wide heterogeneity exists in cancer patients' survival, ranging from a few months to several decades. To accurately predict clinical outcomes, it is vital to build an accurate predictive model that relates the patients' molecular profiles with the patients' survival. With complex relationships between survival and high-dimensional molecular predictors, it is challenging to conduct nonparametric modeling and remove irrelevant predictors simultaneously. In this article, we build a kernel Cox proportional hazards semi-parametric model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit the model. We use the kernel machine method to describe the complex relationship between survival and predictors, while automatically removing irrelevant parametric and nonparametric predictors through a LASSO penalty. An efficient high-dimensional algorithm is developed for the proposed method. Comparison with other competing methods in simulation shows that the proposed method always has better predictive accuracy. We apply this method to analyze a multiple myeloma dataset and predict the patients' death burden based on their gene expressions. Our results can help classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.


Subjects
Algorithms, Neoplasms, Humans, Linear Models, Proportional Hazards Models, Computer Simulation, Neoplasms/genetics
3.
Nat Ecol Evol ; 7(8): 1232-1244, 2023 08.
Article in English | MEDLINE | ID: mdl-37264201

ABSTRACT

Understanding how genotypic variation results in phenotypic variation is especially difficult for collective behaviour because group phenotypes arise from complex interactions among group members. A genome-wide association study identified hundreds of genes associated with colony-level variation in honeybee aggression, many of which also showed strong signals of positive selection, but the influence of these 'colony aggression genes' on brain function was unknown. Here we use single-cell (sc) transcriptomics and gene regulatory network (GRN) analyses to test the hypothesis that genetic variation for colony aggression influences individual differences in brain gene expression and/or gene regulation. We compared soldiers, which respond to territorial intrusion with stinging attacks, and foragers, which do not. Colony environment showed stronger influences on soldier-forager differences in brain gene regulation compared with brain gene expression. GRN plasticity was strongly associated with colony aggression, with larger differences in GRN dynamics detected between soldiers and foragers from more aggressive relative to less aggressive colonies. The regulatory dynamics of subnetworks composed of genes associated with colony aggression genes were more strongly correlated with each other across different cell types and brain regions relative to other genes, especially in brain regions involved with olfaction and vision and multimodal sensory integration, which are known to mediate bee aggression. These results show how group genetics can shape a collective phenotype by modulating individual brain gene regulatory network architecture.


Subjects
Aggression, Bees, Animal Behavior, Genome-Wide Association Study, Animals, Aggression/physiology, Bees/genetics, Brain/physiology, Gene Expression Regulation, Gene Regulatory Networks
4.
Res Sq ; 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36747718

ABSTRACT

Imaging-based spatial transcriptomics technologies such as MERFISH offer snapshots of cellular processes in unprecedented detail, but new analytic tools are needed to realize their full potential. We present InSTAnT, a computational toolkit for extracting molecular relationships from spatial transcriptomics data at the intra-cellular resolution. InSTAnT detects gene pairs and modules with interesting patterns of mutual co-localization within and across cells, using specialized statistical tests and graph mining. We showcase the toolkit on datasets profiling a human cancer cell line and hypothalamic preoptic region of mouse brain. We performed rigorous statistical assessment of discovered co-localization patterns, found supporting evidence from databases and RNA interactions, and identified subcellular domains associated with RNA-colocalization. We identified several novel cell type-specific gene co-localizations in the brain. Intra-cellular spatial patterns discovered by InSTAnT mirror diverse molecular relationships, including RNA interactions and shared sub-cellular localization or function, providing a rich compendium of testable hypotheses regarding molecular functions.

5.
Biometrics ; 79(2): 1201-1212, 2023 06.
Article in English | MEDLINE | ID: mdl-35499364

ABSTRACT

Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is suboptimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings are common in modern genomics, where covariance matrix estimation is frequently employed as a method for inferring gene networks. To achieve estimation accuracy in these settings, existing methods typically either assume that the population covariance matrix has some particular structure, for example, sparsity, or apply shrinkage to better estimate the population eigenvalues. In this paper, we study a new approach to estimating high-dimensional covariance matrices. We first frame covariance matrix estimation as a compound decision problem. This motivates defining a class of decision rules and using a nonparametric empirical Bayes g-modeling approach to estimate the optimal rule in the class. Simulation results and gene network inference in an RNA-seq experiment in mouse show that our approach is comparable to or can outperform a number of state-of-the-art proposals.


Subjects
Gene Regulatory Networks, Genomics, Animals, Mice, Bayes Theorem, Computer Simulation, Sample Size
6.
Curr Res Food Sci ; 5: 1452-1464, 2022.
Article in English | MEDLINE | ID: mdl-36119372

ABSTRACT

Chocolate is a product of the fermentation of cacao beans. These fermentations, performed on-farm or at local cooperatives, are spontaneous cacao fermentations (SCFs). To better understand SCFs, this study sought to identify SCF microbes, their interrelationships, and other key parameters that influence fermentation. This is important because differences in fermentation can have an impact on final product quality. In this study, a systematic data extraction was performed, searching for literature that identified microbes from SCFs. Each unique microbe, whether by location or by fermentation material, was extracted from the articles, along with parameters associated with fermentation. Data were collected and analyzed for three interactions: microbe-to-geography, microbe-to-fermentation method, and microbe-to-microbe. The goal was to attribute microbes to geographical locations, fermentation materials, or to other microbes. Statistically significant relationships will reveal target areas for future research. Over 1700 microbes (440 unique species) were identified across 60 articles. The top three countries represented are Brazil (22 articles, n = 612 microbes), the Ivory Coast (14 articles, n = 237), and Ghana (10 articles, n = 257). Several countries were far less represented, or never represented, and should be considered for future research. No specific relationship was identified between microbes and either geographical location or fermentation method. Using a Presence-Absence chart, 127 microbe-to-microbe interactions were identified as statistically significant. Data extraction from SCF research has revealed major gaps in knowledge of the cacao microbiome. By better understanding the cacao microbiome, researchers will be able to identify key microbes and fermentation parameters to better influence the fermentation.

7.
Nat Commun ; 13(1): 1247, 2022 03 10.
Article in English | MEDLINE | ID: mdl-35273186

ABSTRACT

During development, neural progenitors are temporally patterned to sequentially generate a variety of neural types. In Drosophila neural progenitors called neuroblasts, temporal patterning is regulated by cascades of Temporal Transcription Factors (TTFs). However, known TTFs were mostly identified through candidate approaches and may not be complete. In addition, many fundamental questions remain concerning the TTF cascade initiation, progression, and termination. In this work, we use single-cell RNA sequencing of Drosophila medulla neuroblasts of all ages to identify a list of previously unknown TTFs, and experimentally characterize their roles in temporal patterning and neuronal specification. Our study reveals a comprehensive temporal gene network that patterns medulla neuroblasts from start to end. Furthermore, the speed of the cascade progression is regulated by Lola transcription factors expressed in all medulla neuroblasts. Our comprehensive study of the medulla neuroblast temporal cascade illustrates mechanisms that may be conserved in the temporal patterning of neural progenitors.


Subjects
Drosophila Proteins, Neural Stem Cells, Animals, Drosophila/metabolism, Drosophila Proteins/genetics, Drosophila Proteins/metabolism, Drosophila melanogaster/genetics, Drosophila melanogaster/metabolism, Developmental Gene Expression Regulation, Gene Regulatory Networks, Neural Stem Cells/metabolism, RNA Sequence Analysis, Transcription Factors/genetics
8.
Lifetime Data Anal ; 28(2): 282-318, 2022 04.
Article in English | MEDLINE | ID: mdl-35239126

ABSTRACT

For high dimensional gene expression data, one important goal is to identify a small number of genes that are associated with progression of the disease or survival of the patients. In this paper, we consider the problem of variable selection for multivariate survival data. We propose an estimation procedure for high dimensional accelerated failure time (AFT) models with bivariate censored data. The method extends the Buckley-James method by minimizing a penalized [Formula: see text] loss function with a penalty function induced from a bivariate spike-and-slab prior specification. In the proposed algorithm, censored observations are imputed using the Kaplan-Meier estimator, which avoids a parametric assumption on the error terms. Our empirical studies demonstrate that the proposed method provides better performance compared to the alternative procedures designed for univariate survival data regardless of whether the true events are correlated or not, and conceptualizes a formal way of handling bivariate survival data for AFT models. Findings from the analysis of a myeloma clinical trial using the proposed method are also presented.


Subjects
Algorithms, Bayes Theorem, Humans, Survival Analysis
9.
Stat Med ; 41(12): 2227-2246, 2022 05 30.
Article in English | MEDLINE | ID: mdl-35189671

ABSTRACT

Clinical studies examining the effectiveness of a treatment with respect to some primary outcome often require long-term follow-up of patients and/or costly or burdensome measurements of the primary outcome of interest. Identifying a surrogate marker for the primary outcome of interest may allow one to evaluate a treatment effect with less follow-up time, less cost, or less burden. While much clinical and statistical work has focused on identifying and validating surrogate markers, available approaches tend to focus on settings in which only a single surrogate marker is of interest. Limited work has been done to accommodate the high-dimensional surrogate marker setting where the number of potential surrogates is greater than the sample size. In this article, we develop methods to estimate the proportion of treatment effect explained by high-dimensional surrogates. We study the asymptotic properties of our proposed estimator, propose inference procedures, and examine finite sample performance via a simulation study. We illustrate our proposed methods using data from a randomized study comparing a novel whey-based oral nutrition supplement with a standard supplement with respect to change in body fat percentage over 12 weeks, where the surrogate markers of interest are gene expression probesets.


Subjects
Computer Simulation, Biomarkers, Humans
10.
Clin Transl Sci ; 14(4): 1578-1589, 2021 07.
Article in English | MEDLINE | ID: mdl-33786999

ABSTRACT

Sepsis is a major cause of mortality among hospitalized patients worldwide. Shorter time to administration of broad-spectrum antibiotics is associated with improved outcomes, but early recognition of sepsis remains a major challenge. In a two-center cohort study with prospective sample collection from 1400 adult patients in emergency departments suspected of sepsis, we sought to determine the diagnostic and prognostic capabilities of a machine-learning algorithm based on clinical data and a set of uncommonly measured biomarkers. Specifically, we demonstrate that a machine-learning model developed using this dataset outputs a score with not only diagnostic capability but also prognostic power with respect to hospital length of stay (LOS), 30-day mortality, and 30-day inpatient readmission both in our entire testing cohort and various subpopulations. The area under the receiver operating characteristic curve (AUROC) for diagnosis of sepsis was 0.83. Predicted risk scores for patients with septic shock were higher compared with patients with sepsis but without shock (p < 0.0001). Scores for patients with infection and organ dysfunction were higher compared with those without either condition (p < 0.0001). Stratification based on predicted scores of the patients into low, medium, and high-risk groups showed significant differences in LOS (p < 0.0001), 30-day mortality (p < 0.0001), and 30-day inpatient readmission (p < 0.0001). In conclusion, a machine-learning algorithm based on electronic medical record (EMR) data and three nonroutinely measured biomarkers demonstrated good diagnostic and prognostic capability at the time of initial blood culture.


Subjects
Early Diagnosis, Electronic Health Records/statistics & numerical data, Machine Learning, Sepsis/diagnosis, Aged, Area Under Curve, Biomarkers/blood, Hospital Emergency Service/statistics & numerical data, Female, Hospital Mortality, Humans, Length of Stay/statistics & numerical data, Male, Middle Aged, Patient Readmission/statistics & numerical data, Prognosis, Prospective Studies, ROC Curve, Sepsis/blood, Sepsis/microbiology, Sepsis/mortality
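Note on entry 10: the abstract reports an AUROC of 0.83 for the diagnostic score. As a reminder of what that number measures, the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are hypothetical and are not taken from the study.

```python
def auroc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical): three septic (1) and three non-septic (0) patients.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auroc(scores, labels)  # 8 of the 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of patients; it is written this way for clarity rather than speed.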
11.
Biometrika ; 107(3): 573-589, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32831353

ABSTRACT

Mediation analysis is difficult when the number of potential mediators is larger than the sample size. In this paper we propose new inference procedures for the indirect effect in the presence of high-dimensional mediators for linear mediation models. We develop methods for both incomplete mediation, where a direct effect may exist, and complete mediation, where the direct effect is known to be absent. We prove consistency and asymptotic normality of our indirect effect estimators. Under complete mediation, where the indirect effect is equivalent to the total effect, we further prove that our approach gives a more powerful test compared to directly testing for the total effect. We confirm our theoretical results in simulations, as well as in an integrative analysis of gene expression and genotype data from a pharmacogenomic study of drug response. We present a novel analysis of gene sets to understand the molecular mechanisms of drug response, and also identify a genome-wide significant noncoding genetic variant that cannot be detected using standard analysis methods.
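
Note on entry 11: the abstract concerns indirect effects in linear mediation models with high-dimensional mediators. The sketch below shows only the classical single-mediator case, as background and not as the paper's procedure: the indirect effect is the product of the exposure-to-mediator slope and the mediator-to-outcome slope, and under ordinary least squares the total effect equals the direct effect plus the indirect effect. Function names and simulated coefficients are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mediation_effects(x, m, y):
    """OLS estimates for one mediator; intercepts are removed by centering."""
    x, m, y = center(x), center(m), center(y)
    sxx, sxm, smm = dot(x, x), dot(x, m), dot(m, m)
    sxy, smy = dot(x, y), dot(m, y)
    a = sxm / sxx        # slope of mediator on exposure
    total = sxy / sxx    # slope of outcome on exposure, mediator ignored
    # Solve the 2x2 normal equations for the regression y ~ x + m.
    det = sxx * smm - sxm * sxm
    direct = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return {"indirect": a * b, "direct": direct, "total": total}

# Simulated data (hypothetical coefficients 0.8, 0.5 and 0.3).
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
m = [0.8 * xi + random.gauss(0.0, 1.0) for xi in x]
y = [0.5 * mi + 0.3 * xi + random.gauss(0.0, 1.0) for xi, mi in zip(x, m)]
effects = mediation_effects(x, m, y)
```

With more mediators than samples, the second regression above is no longer identifiable by ordinary least squares, which is the difficulty the abstract addresses.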

12.
Proc Natl Acad Sci U S A ; 117(29): 17135-17141, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32631983

ABSTRACT

For social animals, the genotypes of group members affect the social environment, and thus individual behavior, often indirectly. We used genome-wide association studies (GWAS) to determine the influence of individual vs. group genotypes on aggression in honey bees. Aggression in honey bees arises from the coordinated actions of colony members, primarily nonreproductive "soldier" bees, and thus, experiences evolutionary selection at the colony level. Here, we show that individual behavior is influenced by colony environment, which in turn, is shaped by allele frequency within colonies. Using a population with a range of aggression, we sequenced individual whole genomes and looked for genotype-behavior associations within colonies in a common environment. There were no significant correlations between individual aggression and specific alleles. By contrast, we found strong correlations between colony aggression and the frequencies of specific alleles within colonies, despite a small number of colonies. Associations at the colony level were highly significant and were very similar among both soldiers and foragers, but they covaried with one another. One strongly significant association peak, containing an ortholog of the Drosophila sensory gene dpr4 on linkage group (chromosome) 7, showed strong signals of both selection and admixture during the evolution of gentleness in a honey bee population. We thus found links between colony genetics and group behavior and also, molecular evidence for group-level selection, acting at the colony level. We conclude that group genetics dominates individual genetics in determining the fatal decision of honey bees to sting.


Subjects
Aggression, Bees/genetics, Gene Frequency/genetics, Insect Genome/genetics, Animals, Genome-Wide Association Study, Single Nucleotide Polymorphism/genetics, Social Behavior
14.
Sci Rep ; 8(1): 6620, 2018 04 26.
Article in English | MEDLINE | ID: mdl-29700343

ABSTRACT

Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.


Subjects
Gene Regulatory Networks, Genomics, Biological Models, Algorithms, Computational Biology/methods, Gene Expression Profiling, Genomics/methods, Humans, Neoplasms/genetics, Reproducibility of Results
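Note on entry 14: the abstract contrasts random cross-validation with a clustering-based procedure in which similar samples stay together. The sketch below illustrates one simple way to build such folds, assuming cluster labels have already been obtained by some clustering method: whole clusters are assigned to folds, so the test fold never shares a cluster with the training folds. The function name and the greedy balancing rule are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict

def clustered_folds(cluster_ids, n_folds):
    """Assign whole clusters to folds so that no cluster is split across folds."""
    members = defaultdict(list)
    for idx, c in enumerate(cluster_ids):
        members[c].append(idx)
    if n_folds > len(members):
        raise ValueError("more folds than clusters")
    folds = [[] for _ in range(n_folds)]
    # Place the largest clusters first, each into the currently smallest fold.
    for c in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[c])
    return folds

# Toy example (hypothetical): ten samples drawn from four experimental conditions.
cluster_ids = ["a", "a", "a", "b", "b", "c", "c", "c", "c", "d"]
folds = clustered_folds(cluster_ids, n_folds=2)
```

Random cross-validation would instead shuffle the ten sample indices directly, which can place samples from the same condition on both sides of the split.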
15.
Stat Interface ; 11(4): 573-580, 2018.
Article in English | MEDLINE | ID: mdl-30815051

ABSTRACT

A key step in pharmacogenomic studies is the development of accurate prediction models for drug response based on individuals' genomic information. Recent interest has centered on semiparametric models based on kernel machine regression, which can flexibly model the complex relationships between gene expression and drug response. However, performance suffers if irrelevant covariates are unknowingly included when training the model. We propose a new semiparametric regression procedure, based on a novel penalized garrotized kernel machine (PGKM), which can better adapt to the presence of irrelevant covariates while still allowing for a complex nonlinear model and gene-gene interactions. We study the performance of our approach in simulations and in a pharmacogenomic study of the renal carcinoma drug temsirolimus. Our method predicts plasma concentration of temsirolimus as well as standard kernel machine regression when no irrelevant covariates are included in training, but has much higher prediction accuracy when the truly important covariates are not known in advance. Supplemental materials, including R code used in this manuscript, are available online.

16.
J Multivar Anal ; 160: 169-184, 2017 Aug.
Article in English | MEDLINE | ID: mdl-29203948

ABSTRACT

It is frequently of interest to jointly analyze two paired sequences of multiple tests. This paper studies the problem of detecting whether there are more pairs of tests that are significant in both sequences than would be expected by chance. The asymptotic detection boundary is derived in terms of parameters such as the sparsity of non-null cases in each sequence, the effect sizes of the signals, and the magnitude of the dependence between the two sequences. A new test for detecting weak dependence is also proposed, shown to be asymptotically adaptively optimal, studied in simulations, and applied to study genetic pleiotropy in 10 pediatric autoimmune diseases.

17.
PLoS Genet ; 13(7): e1006840, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28704398

ABSTRACT

Animals exhibit dramatic immediate behavioral plasticity in response to social interactions, and brief social interactions can shape the future social landscape. However, the molecular mechanisms contributing to behavioral plasticity are unclear. Here, we show that the genome dynamically responds to social interactions with multiple waves of transcription associated with distinct molecular functions in the brain of male threespined sticklebacks, a species famous for its behavioral repertoire and evolution. Some biological functions (e.g., hormone activity) peaked soon after a brief territorial challenge and then declined, while others (e.g., immune response) peaked hours afterwards. We identify transcription factors that are predicted to coordinate waves of transcription associated with different components of behavioral plasticity. Next, using H3K27Ac as a marker of chromatin accessibility, we show that a brief territorial intrusion was sufficient to cause rapid and dramatic changes in the epigenome. Finally, we integrate the time course brain gene expression data with a transcriptional regulatory network, and link gene expression to changes in chromatin accessibility. This study reveals rapid and dramatic epigenomic plasticity in response to a brief, highly consequential social interaction.


Subjects
Animal Behavior/physiology, Neuronal Plasticity/genetics, Smegmamorpha/genetics, Social Behavior, Genetic Transcription, Animals, Biological Evolution, Cerebrum/physiology, Chromatin/genetics, Diencephalon/physiology, Epigenomics, Genome, RNA Sequence Analysis, Smegmamorpha/physiology, Transcription Factors/genetics
18.
Genome Res ; 27(6): 959-972, 2017 06.
Article in English | MEDLINE | ID: mdl-28356321

ABSTRACT

Agonistic encounters are powerful effectors of future behavior, and the ability to learn from this type of social challenge is an essential adaptive trait. We recently identified a conserved transcriptional program defining the response to social challenge across animal species, highly enriched in transcription factor (TF), energy metabolism, and developmental signaling genes. To understand the trajectory of this program and to uncover the most important regulatory influences controlling this response, we integrated gene expression data with the chromatin landscape in the hypothalamus, frontal cortex, and amygdala of socially challenged mice over time. The expression data revealed a complex spatiotemporal patterning of events starting with neural signaling molecules in the frontal cortex and ending in the modulation of developmental factors in the amygdala and hypothalamus, underpinned by a systems-wide shift in expression of energy metabolism-related genes. The transcriptional signals were correlated with significant shifts in chromatin accessibility and a network of challenge-associated TFs. Among these, the conserved metabolic and developmental regulator ESRRA was highlighted for an especially early and important regulatory role. Cell-type deconvolution analysis attributed the differential metabolic and developmental signals in this social context primarily to oligodendrocytes and neurons, respectively, and we show that ESRRA is expressed in both cell types. Localizing ESRRA binding sites in cortical chromatin, we show that this nuclear receptor binds both differentially expressed energy-related and neurodevelopmental TF genes. These data link metabolic and neurodevelopmental signaling to social challenge, and identify key regulatory drivers of this process with unprecedented tissue and temporal resolution.


Subjects
Chromatin/metabolism, Developmental Gene Expression Regulation, Neurons/metabolism, Estrogen Receptors/genetics, Psychological Stress/genetics, Transcription Factors/genetics, Agonistic Behavior, Amygdala/metabolism, Amygdala/physiopathology, Animals, Chromatin/ultrastructure, Energy Metabolism/genetics, Frontal Lobe/metabolism, Frontal Lobe/physiopathology, Gene Expression Profiling, Gene Regulatory Networks, Hypothalamus/metabolism, Hypothalamus/physiopathology, Male, Mice, Neurons/cytology, Oligodendroglia/cytology, Oligodendroglia/metabolism, Protein Binding, Estrogen Receptors/metabolism, Signal Transduction, Psychological Stress/metabolism, Psychological Stress/physiopathology, Transcription Factors/metabolism, Genetic Transcription, ERRalpha Estrogen-Related Receptor
19.
J Am Stat Assoc ; 112(519): 1032-1046, 2017.
Article in English | MEDLINE | ID: mdl-29375169

ABSTRACT

Genome-wide association studies (GWAS) and differential expression analyses have had limited success in finding genes that cause complex diseases such as heart failure (HF), a leading cause of death in the United States. This paper proposes a new statistical approach that integrates GWAS and expression quantitative trait loci (eQTL) data to identify important HF genes. For such genes, genetic variations that perturb their expression are also likely to influence disease risk. The proposed method thus tests for the presence of simultaneous signals: SNPs that are associated with the gene's expression as well as with disease. An analytic expression for the p-value is obtained, and the method is shown to be asymptotically adaptively optimal under certain conditions. It also allows the GWAS and eQTL data to be collected from different groups of subjects, enabling investigators to integrate public resources with their own data. Simulation experiments show that it can be more powerful than standard approaches and also robust to linkage disequilibrium between variants. The method is applied to an extensive analysis of HF genomics and identifies several genes with biological evidence for being functionally relevant in the etiology of HF. It is implemented in the R package ssa.

20.
Biometrics ; 73(2): 582-592, 2017 06.
Article in English | MEDLINE | ID: mdl-27792843

ABSTRACT

Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data that examine the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from these auxiliary studies are typically difficult to obtain. This article proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning parameter-free non-parametric empirical Bayes procedure, and can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can provide superior predictive accuracy relative to non-integrative as well as integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa.


Subjects
Bayes Theorem, Algorithms, Biometry, Humans, Risk Factors, Sample Size