Results 1 - 5 of 5
1.
J Am Stat Assoc ; 119(546): 1274-1285, 2024.
Article in English | MEDLINE | ID: mdl-38948492

ABSTRACT

Transfer learning provides a powerful tool for incorporating data from related studies into a target study of interest. In epidemiology and medical studies, the classification of a target disease could borrow information across other related diseases and populations. In this work, we consider transfer learning for high-dimensional generalized linear models (GLMs). A novel algorithm, TransHDGLM, that integrates data from the target study and the source studies is proposed. Minimax rate of convergence for estimation is established and the proposed estimator is shown to be rate-optimal. Statistical inference for the target regression coefficients is also studied. Asymptotic normality for a debiased estimator is established, which can be used for constructing coordinate-wise confidence intervals of the regression coefficients. Numerical studies show significant improvement in estimation and inference accuracy over GLMs that only use the target data. The proposed methods are applied to a real data study concerning the classification of colorectal cancer using gut microbiomes, and are shown to enhance the classification accuracy in comparison to methods that only use the target data.

2.
Biometrics ; 80(2), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38801257

ABSTRACT

To leverage the advancements in genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping for traits and molecular phenotypes to gain a mechanistic understanding of genetic regulation, biological researchers often investigate the expression QTLs (eQTLs) that colocalize with QTL or GWAS peaks. Our research is inspired by 2 such studies. One aims to identify the causal single nucleotide polymorphisms that are responsible for the phenotypic variation and whose effects can be explained by their impacts at the transcriptomic level in maize. The other study, in mouse, focuses on uncovering the cis-driver genes that induce phenotypic changes by regulating trans-regulated genes. Both studies can be formulated as mediation problems with potentially high-dimensional exposures, confounders, and mediators that seek to estimate the overall indirect effect (IE) for each exposure. In this paper, we propose MedDiC, a novel procedure to estimate the overall IE based on the difference-in-coefficients approach. Our simulation studies find that MedDiC offers valid inference for the IE with higher power, shorter confidence intervals, and faster computing time than competing methods. We apply MedDiC to the 2 aforementioned motivating datasets and find that MedDiC yields reproducible outputs across the analysis of closely related traits, with results supported by external biological evidence. The code and additional information are available on our GitHub page (https://github.com/QiZhangStat/MedDiC).


Subject(s)
Computer Simulation , Genome-Wide Association Study , Mediation Analysis , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Genome-Wide Association Study/statistics & numerical data , Animals , Mice , Zea mays/genetics , Phenotype
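
As a rough illustration of the difference-in-coefficients idea named in the abstract of entry 2, the sketch below uses a low-dimensional toy setting with one exposure, one mediator, and ordinary least squares. It is not the MedDiC procedure, which targets high-dimensional exposures, confounders, and mediators. The simulated data, the coefficient values, and the helper names (`slope_one`, `slopes_two`) are all assumptions made for this example. The indirect effect is taken as the total-effect slope minus the direct-effect slope, which in this simple OLS setting coincides with the product-of-coefficients estimate.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slope_one(x, y):
    """OLS slope of y on x (intercept absorbed by centering)."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)

def slopes_two(x, m, y):
    """OLS slopes of y on (x, m), solving the 2x2 normal equations."""
    xc, mc, yc = center(x), center(m), center(y)
    sxx, smm, sxm = dot(xc, xc), dot(mc, mc), dot(xc, mc)
    sxy, smy = dot(xc, yc), dot(mc, yc)
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

# Hypothetical toy data: exposure x, mediator m, outcome y.
rng = random.Random(0)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * xi + 1.2 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

total = slope_one(x, y)           # total effect: y regressed on x only
direct, b = slopes_two(x, m, y)   # direct effect of x, and slope of the mediator
a = slope_one(x, m)               # effect of x on the mediator

ie_difference = total - direct    # difference-in-coefficients estimate
ie_product = a * b                # product-of-coefficients estimate

print(ie_difference, ie_product)  # the two agree up to rounding error
```

The agreement of the two estimates is an algebraic property of OLS with a single mediator; it should not be assumed to carry over unchanged to penalized or high-dimensional fits.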
3.
J Am Stat Assoc ; 118(543): 2171-2183, 2023.
Article in English | MEDLINE | ID: mdl-38143788

ABSTRACT

Transfer learning for high-dimensional Gaussian graphical models (GGMs) is studied. The target GGM is estimated by incorporating the data from similar and related auxiliary studies, where the similarity between the target graph and each auxiliary graph is characterized by the sparsity of a divergence matrix. An estimation algorithm, Trans-CLIME, is proposed and shown to attain a faster convergence rate than the minimax rate in the single-task setting. Furthermore, we introduce a universal debiasing method that can be coupled with a range of initial graph estimators and can be analytically computed in one step. A debiased Trans-CLIME estimator is then constructed and is shown to be element-wise asymptotically normal. This fact is used to construct a multiple testing procedure for edge detection with false discovery rate control. The proposed estimation and multiple testing procedures demonstrate superior numerical performance in simulations and are applied to infer the gene networks in a target brain tissue by leveraging the gene expressions from multiple other brain tissues. A significant decrease in prediction errors and a significant increase in power for link detection are observed.
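
For readers unfamiliar with Gaussian graphical models, the sketch below shows the standard link between graph edges and the precision (inverse covariance) matrix that edge detection in entry 3 relies on: a zero off-diagonal entry means the two variables are conditionally independent given the rest. It does not implement Trans-CLIME or its debiasing step. The three-variable chain graph, the matrix values, and the helper names (`invert`, `partial_correlations`) are assumptions chosen for illustration.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(omega):
    """Partial correlation of i and j given the rest: -omega_ij / sqrt(omega_ii * omega_jj)."""
    n = len(omega)
    return [[1.0 if i == j else -omega[i][j] / (omega[i][i] * omega[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]

# Hypothetical chain graph 0 - 1 - 2: no direct edge between variables 0 and 2.
omega = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, -1.0],
         [0.0, -1.0, 2.0]]

sigma = invert(omega)        # implied covariance: variables 0 and 2 are marginally correlated
omega_back = invert(sigma)   # precision matrix recovered from the covariance
pcor = partial_correlations(omega_back)

# An edge is declared wherever the partial correlation is not numerically zero.
edges = [(i, j) for i in range(3) for j in range(i + 1, 3) if abs(pcor[i][j]) > 1e-8]
print(sigma[0][2], edges)
```

Here the covariance matrix is known exactly, so a fixed numerical threshold suffices. With estimated matrices, deciding which entries are nonzero is a statistical testing problem, which is where a multiple testing procedure with false discovery rate control comes in.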

4.
J Appl Stat ; 49(16): 4181-4205, 2022.
Article in English | MEDLINE | ID: mdl-36353298

ABSTRACT

This paper introduces a new class of efficient and debiased two-step shrinkage estimators for a linear regression model in the presence of multicollinearity. We derive the proposed estimators' mean square error and define the necessary and sufficient conditions for superiority over the existing estimators. In addition, we develop an algorithm for selecting the shrinkage parameters for the proposed estimators. The new estimators are compared with the traditional ordinary least squares, ridge regression, Liu, and the two-parameter estimators using a matrix mean square error criterion. The Monte Carlo simulation results show the superiority of the proposed estimators under certain conditions. In the presence of high but imperfect multicollinearity, the two-step shrinkage estimators' performance is relatively better. Finally, two real-world chemical data sets are analyzed to demonstrate the advantages and the empirical relevance of our newly proposed estimators. It is shown that the standard errors and the estimated mean square error decrease substantially for the proposed estimator. Hence, the precision of the estimated parameters is increased, which is one of the main objectives of practitioners.
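
To make the multicollinearity setting of entry 4 concrete, the sketch below contrasts ordinary least squares with plain ridge regression on two nearly collinear predictors. Ridge is only one of the baseline estimators named in the abstract; the paper's two-step shrinkage estimators are not reproduced here. The simulated data, the shrinkage parameter `k = 5.0`, and the helper names (`ridge_two`, `norm`) are assumptions made for illustration.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ridge_two(x1, x2, y, k):
    """Solve (X'X + k I) beta = X'y for two centered predictors; k = 0 gives OLS."""
    a, b, c = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(a, a) + k, dot(b, b) + k, dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 ** 2
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)

def norm(beta):
    return (beta[0] ** 2 + beta[1] ** 2) ** 0.5

# Hypothetical data: x2 is x1 plus a little noise, so the predictors are nearly collinear.
rng = random.Random(1)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [v + 0.05 * rng.gauss(0, 1) for v in x1]
y = [u + v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

beta_ols = ridge_two(x1, x2, y, k=0.0)     # unstable when predictors are nearly collinear
beta_ridge = ridge_two(x1, x2, y, k=5.0)   # shrunken toward zero, more stable

print(beta_ols, beta_ridge)
```

For any positive `k`, the ridge coefficient vector has a smaller Euclidean norm than the OLS one. This shrinkage trades some bias for reduced variance, and mean square error criteria such as the one mentioned in the abstract are a way to evaluate that trade-off.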

5.
Proc Natl Acad Sci U S A ; 118(42), 2021 Oct 19.
Article in English | MEDLINE | ID: mdl-34642247

ABSTRACT

This paper empirically examines how the opening of K-12 schools is associated with the spread of COVID-19 using county-level panel data in the United States. As preliminary evidence, our event-study analysis indicates that cases and deaths in counties with in-person or hybrid opening relative to those with remote opening substantially increased after the school opening date, especially for counties without any mask mandate for staff. Our main analysis uses a dynamic panel data model for case and death growth rates, where we control for dynamically evolving mitigation policies, past infection levels, and additive county-level and state-week "fixed" effects. This analysis shows that an increase in visits to both K-12 schools and colleges is associated with a subsequent increase in case and death growth rates. The estimates indicate that fully opening K-12 schools with in-person learning is associated with a 5 (SE = 2) percentage points increase in the growth rate of cases. We also find that the association of K-12 school visits or in-person school openings with case growth is stronger for counties that do not require staff to wear masks at schools. These findings support policies that promote masking and other precautionary measures at schools and giving vaccine priority to education workers.


Subject(s)
COVID-19/epidemiology , COVID-19/transmission , Return to School/statistics & numerical data , COVID-19/mortality , COVID-19/prevention & control , Humans , Masks , Models, Statistical , SARS-CoV-2 , Schools , Travel , United States/epidemiology