Results 1 - 9 of 9
1.
Bioinformatics ; 32(3): 330-7, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26458888

ABSTRACT

MOTIVATION: In searching for genetic variants for complex diseases with deep sequencing data, genomic marker sets of high-dimensional genotypic data and sparse functional variants are quite common. Existing sequence association tests are incapable of identifying such marker sets or individual causal loci, although they appear powerful for identifying small marker sets with dense functional variants. In sequence association studies of admixed individuals, cryptic relatedness and population structure are known to confound the association analyses. METHOD: We propose a unified marker-wise test (uFineMap) to accurately localize causal loci and a unified high-dimensional set-based test (uHDSet) to identify high-dimensional sparse associations in deep sequencing genomic data of multi-ethnic individuals with random relatedness. These two novel tests are based on scaled sparse linear mixed regressions with Lp (0 < p < 1) norm regularization. They jointly adjust for cryptic relatedness, population structure and other confounders to prevent false discoveries and improve statistical power for identifying promising individual markers and marker sets that harbor functional genetic variants of a complex trait. RESULTS: In large-scale simulations and real data analyses, the proposed tests appropriately controlled Type I error rates and appeared to be more powerful than several prominent methods. We illustrated their practical utility by applying them to DNA sequence data from the Framingham Heart Study for osteoporosis. The proposed tests identified 11 novel significant genes that were missed by the prominent famSKAT and GEMMA. In particular, four of the six most significant pathways identified by uHDSet but missed by famSKAT have been reported in the literature to be related to BMD or osteoporosis. 
AVAILABILITY AND IMPLEMENTATION: The computational toolkit is available for academic use: https://sites.google.com/site/shaolongscode/home/uhdset CONTACT: wyp@tulane.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
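
To make the role of the Lp (0 < p < 1) regularization concrete, the following is a minimal sketch of the penalty term alone, not of the uFineMap or uHDSet tests. The function name `lp_penalty` and the toy coefficient vectors are assumptions introduced for illustration, and the penalty is taken here to be the sum of |b|^p, which is one common convention.

```python
def lp_penalty(beta, p):
    """Return sum(|b|^p) over the coefficients (hypothetical helper).

    For 0 < p < 1 the penalty is nonconvex and favors sparse vectors;
    p = 1 recovers the ordinary L1 (lasso) penalty.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return sum(abs(b) ** p for b in beta)


# Two toy coefficient vectors with the same L1 norm.
sparse = [1.0, 0.0]   # all signal on one marker
dense = [0.5, 0.5]    # signal spread over two markers

l1_sparse = lp_penalty(sparse, 1.0)
l1_dense = lp_penalty(dense, 1.0)
half_sparse = lp_penalty(sparse, 0.5)
half_dense = lp_penalty(dense, 0.5)
```

With p = 1 both vectors are penalized equally, whereas with p = 0.5 the dense vector costs more than the sparse one, which is why penalties of this kind push solutions toward few nonzero coefficients.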


Subjects
Genetic Variation , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Chromosome Mapping , Genomics/methods , Genotyping Techniques , Humans , Linear Models , Osteoporosis/genetics , Phenotype
3.
Clin Pharmacol Ther ; 115(4): 745-757, 2024 04.
Article in English | MEDLINE | ID: mdl-37965805

ABSTRACT

In 2020, Novartis Pharmaceuticals Corporation and the U.S. Food and Drug Administration (FDA) started a 4-year scientific collaboration to approach complex new data modalities and advanced analytics. The scientific question was to find novel radio-genomics-based prognostic and predictive factors for HR+/HER- metastatic breast cancer under a Research Collaboration Agreement. This collaboration has been providing valuable insights to help successfully implement future scientific projects, particularly those using artificial intelligence and machine learning. This tutorial aims to provide tangible guidelines for a multi-omics project that includes multidisciplinary expert teams spanning different institutions. We cover key ideas, such as "maintaining effective communication" and "following good data science practices," followed by the four steps of exploratory projects, namely (1) plan, (2) design, (3) develop, and (4) disseminate. We break each step into smaller concepts with strategies for implementation and provide illustrations from our collaboration to give readers further actionable guidance.


Subjects
Artificial Intelligence , Multiomics , Humans , Machine Learning , Genomics
4.
J Am Med Inform Assoc ; 29(5): 841-852, 2022 04 13.
Article in English | MEDLINE | ID: mdl-35022756

ABSTRACT

OBJECTIVE: After deploying a clinical prediction model, subsequently collected data can be used to fine-tune its predictions and adapt to temporal shifts. Because model updating carries risks of over-updating/fitting, we study online methods with performance guarantees. MATERIALS AND METHODS: We introduce 2 procedures for continual recalibration or revision of an underlying prediction model: Bayesian logistic regression (BLR) and a Markov variant that explicitly models distribution shifts (MarBLR). We perform empirical evaluation via simulations and a real-world study predicting Chronic Obstructive Pulmonary Disease (COPD) risk. We derive "Type I and II" regret bounds, which guarantee the procedures are noninferior to a static model and competitive with an oracle logistic reviser in terms of the average loss. RESULTS: Both procedures consistently outperformed the static model and other online logistic revision methods. In simulations, the average estimated calibration index (aECI) of the original model was 0.828 (95%CI, 0.818-0.938). Online recalibration using BLR and MarBLR improved the aECI towards the ideal value of zero, attaining 0.265 (95%CI, 0.230-0.300) and 0.241 (95%CI, 0.216-0.266), respectively. When performing more extensive logistic model revisions, BLR and MarBLR increased the average area under the receiver-operating characteristic curve (aAUC) from 0.767 (95%CI, 0.765-0.769) to 0.800 (95%CI, 0.798-0.802) and 0.799 (95%CI, 0.797-0.801), respectively, in stationary settings and protected against substantial model decay. In the COPD study, BLR and MarBLR dynamically combined the original model with a continually refitted gradient boosted tree to achieve aAUCs of 0.924 (95%CI, 0.913-0.935) and 0.925 (95%CI, 0.914-0.935), compared to the static model's aAUC of 0.904 (95%CI, 0.892-0.916). DISCUSSION: Despite its simplicity, BLR is highly competitive with MarBLR. MarBLR outperforms BLR when its prior better reflects the data. 
CONCLUSIONS: BLR and MarBLR can improve the transportability of clinical prediction models and maintain their performance over time.
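
As a rough illustration of what "online logistic recalibration" operates on, the sketch below applies an intercept and slope to the logit of the original model's predicted risk and nudges them after each new observation. It is not the BLR or MarBLR procedure: the update is a plain stochastic-gradient step on the log loss, swapped in as a simple non-Bayesian stand-in, and the names `recalibrate`, `sgd_update`, and the learning rate `lr` are assumptions.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate(p_orig, intercept, slope):
    """Map the original model's risk to a recalibrated risk."""
    return sigmoid(intercept + slope * logit(p_orig))


def sgd_update(intercept, slope, p_orig, y, lr=0.1):
    """One gradient step on the log loss for a single (p_orig, y) pair.

    Plain SGD stand-in for illustration; the abstract's methods use
    Bayesian updating instead.
    """
    x = logit(p_orig)
    err = recalibrate(p_orig, intercept, slope) - y
    return intercept - lr * err, slope - lr * err * x


# Identity recalibration (intercept 0, slope 1) leaves the risk unchanged.
unchanged = recalibrate(0.3, 0.0, 1.0)

# An observed event at a predicted risk of 0.5 raises the intercept.
new_intercept, new_slope = sgd_update(0.0, 1.0, p_orig=0.5, y=1)
```

An intercept of 0 and a slope of 1 reproduce the static model, so any updating scheme of this form can start from the deployed model and drift away from it only as the data warrant.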


Subjects
Models, Statistical , Pulmonary Disease, Chronic Obstructive , Bayes Theorem , Humans , Logistic Models , Prognosis
5.
IEEE J Biomed Health Inform ; 24(2): 336-344, 2020 02.
Article in English | MEDLINE | ID: mdl-31265424

ABSTRACT

The study of healthy brain development helps to better understand both the brain transformation and the connectivity patterns that occur from childhood to adulthood. This study presents a sparse machine learning solution across whole-brain functional connectivity measures of three datasets, derived from resting state functional magnetic resonance imaging (rs-fMRI) and two task fMRI datasets, including a working memory n-back task (nb-fMRI) and an emotion identification task (em-fMRI). The fMRI data were collected from the Philadelphia Neurodevelopmental Cohort (PNC) for the prediction of brain age in adolescents. Due to the extremely large variable-to-instance ratio of the PNC data, a high-dimensional matrix with several irrelevant and highly correlated features is generated, and hence a sparse learning approach is necessary to extract effective features from fMRI data. We propose a sparse learner based on the residual errors along the estimation of an inverse problem for the extreme learning machine (ELM). Our proposed method is able to overcome the overlearning problem by pruning several redundant features and their corresponding output weights. The proposed multimodal sparse ELM classifier based on residual errors is highly competitive in terms of classification accuracy compared to its counterparts, such as conventional ELM and sparse Bayesian learning ELM.


Subjects
Aging/physiology , Brain/physiology , Adolescent , Algorithms , Brain/diagnostic imaging , Brain Mapping/methods , Cohort Studies , Emotions , Humans , Machine Learning , Magnetic Resonance Imaging , Memory, Short-Term
6.
J Am Stat Assoc ; 114(525): 419-433, 2019.
Article in English | MEDLINE | ID: mdl-31217649

ABSTRACT

Sorted L-One Penalized Estimation (SLOPE, Bogdan et al., 2013, 2015) is a relatively new convex optimization procedure that allows for adaptive selection of regressors under sparse high-dimensional designs. Here we extend the idea of SLOPE to deal with the situation in which one aims to select whole groups of explanatory variables instead of single regressors. Such groups can be formed by clustering strongly correlated predictors or groups of dummy variables corresponding to different levels of the same qualitative predictor. We formulate the respective convex optimization problem, gSLOPE (group SLOPE), and propose an efficient algorithm for its solution. We also define a notion of the group false discovery rate (gFDR) and provide a choice of the sequence of tuning parameters for gSLOPE so that gFDR is provably controlled at a prespecified level if the groups of variables are orthogonal to each other. Moreover, we prove that the resulting procedure adapts to unknown sparsity and is asymptotically minimax with respect to the estimation of the proportions of variance of the response variable explained by regressors from different groups. We also provide a method for the choice of the regularizing sequence when variables in different groups are not orthogonal but statistically independent, and illustrate its good properties with computer simulations. Finally, we illustrate the advantages of gSLOPE in the context of genome-wide association studies. The R package grpSLOPE, with an implementation of our method, is available on CRAN.
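
The sorted L-one penalty at the heart of SLOPE pairs the largest tuning parameter with the largest absolute coefficient, the second largest with the second largest, and so on. The sketch below computes that penalty and a simplified group analogue that applies it to the Euclidean norms of coefficient blocks. It is written in Python rather than taken from the grpSLOPE package; the function names, the toy numbers, and the use of unweighted block norms are assumptions, and the tuning sequences here are arbitrary illustrative values rather than the sequence derived in the paper.

```python
import math


def sorted_l1_norm(values, lambdas):
    """Sum of lambda_i times the i-th largest absolute value."""
    if len(values) != len(lambdas):
        raise ValueError("values and lambdas must have equal length")
    if any(a < b for a, b in zip(lambdas, lambdas[1:])) or min(lambdas) < 0:
        raise ValueError("lambdas must be nonnegative and nonincreasing")
    magnitudes = sorted((abs(v) for v in values), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, magnitudes))


def group_sorted_norm(beta, groups, lambdas):
    """Sorted L-one penalty applied to per-group Euclidean norms.

    Simplified, unweighted variant used only for illustration.
    """
    norms = [math.sqrt(sum(beta[j] ** 2 for j in idx)) for idx in groups]
    return sorted_l1_norm(norms, lambdas)


beta = [3.0, -4.0, 0.0, 1.0]

# Individual-coefficient penalty with an illustrative sequence.
slope_penalty = sorted_l1_norm(beta, [4.0, 3.0, 2.0, 1.0])

# Group penalty: variables 0 and 1 form one group, 2 and 3 are singletons.
groups = [[0, 1], [2], [3]]
group_penalty = group_sorted_norm(beta, groups, [2.0, 1.0, 0.5])
```

Because the penalty depends on each group only through its norm, a group enters or leaves the model as a whole, which is the behavior the abstract describes for selecting groups of explanatory variables.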

7.
IEEE Trans Med Imaging ; 37(8): 1761-1774, 2018 08.
Article in English | MEDLINE | ID: mdl-29993802

ABSTRACT

Reducing the number of false discoveries is presently one of the most pressing issues in the life sciences. It is of especially great importance for many applications in neuroimaging and genomics, where data sets are typically high-dimensional, which means that the number of explanatory variables exceeds the sample size. The false discovery rate (FDR) is a criterion that can be employed to address that issue. Thus it has gained great popularity as a tool for testing multiple hypotheses. Canonical correlation analysis (CCA) is a statistical technique that is used to make sense of the cross-correlation of two sets of measurements collected on the same set of samples (e.g., brain imaging and genomic data for the same mental illness patients), and sparse CCA extends the classical method to high-dimensional settings. Here, we propose a way of applying the FDR concept to sparse CCA, and a method to control the FDR. The proposed FDR correction directly influences the sparsity of the solution, adapting it to the unknown true sparsity level. Theoretical derivation as well as simulation studies show that our procedure indeed keeps the FDR of the canonical vectors below a user-specified target level. We apply the proposed method to an imaging genomics data set from the Philadelphia Neurodevelopmental Cohort. Our results link the brain connectivity profiles derived from brain activity during an emotion identification task, as measured by functional magnetic resonance imaging, to the corresponding subjects' genomic data.
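
For readers unfamiliar with FDR control, the classical Benjamini-Hochberg step-up procedure is the standard example of testing multiple hypotheses at a target FDR level. The sketch below implements that classical procedure only; it is not the sparse CCA correction proposed in the abstract, and the function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, q):
    """Return indices of hypotheses rejected by the BH step-up rule.

    Finds the largest rank k with p_(k) <= k * q / m and rejects the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


# Made-up p-values for eight hypotheses, tested at target FDR level 0.05.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(pvalues, q=0.05)
```

Here only the two smallest p-values fall below their rank-dependent thresholds, so the procedure rejects two hypotheses even though five p-values are below 0.05 individually.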


Subjects
Genomics/methods , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Adolescent , Adult , Brain/diagnostic imaging , Brain/physiology , Child , Databases, Factual , Humans , Machine Learning , Models, Statistical , Young Adult
8.
IEEE/ACM Trans Comput Biol Bioinform ; 15(4): 1066-1078, 2018.
Article in English | MEDLINE | ID: mdl-29990279

ABSTRACT

The method of Sorted L-One Penalized Estimation, or SLOPE, is a sparse regression method recently introduced by Bogdan et al. [1]. It can be used to identify significant predictor variables in a linear model that may have more unknown parameters than observations. When the correlations between predictor variables are small, the SLOPE method is shown to successfully control the false discovery rate (the expected proportion of irrelevant predictors among all selected predictors) at a user-specified level. However, the requirement for nearly uncorrelated predictors is too restrictive for genomic data, as demonstrated in our recent study [2] by an application of SLOPE to realistic simulated DNA sequence data. A possible solution is to divide the predictor variables into nearly uncorrelated groups, and to modify the procedure to select entire groups with an overall significant group effect, rather than individual predictors. Following this motivation, we extend SLOPE in the spirit of Group LASSO to Group SLOPE, a method that can handle group structures between the predictor variables, which are ubiquitous in real genomic data. Our theoretical results show that Group SLOPE controls the group-wise false discovery rate (gFDR) when groups are orthogonal to each other. For use in non-orthogonal settings, we propose two types of Monte Carlo based heuristics, which lead to gFDR control with Group SLOPE in simulations based on real SNP data. As an illustration of the merits of this method, an application of Group SLOPE to a dataset from the Framingham Heart Study results in the identification of some known DNA sequence regions associated with bone health, as well as some new candidate regions. The novel methods are implemented in the R package grpSLOPEMC, which is publicly available at https://github.com/agisga/grpSLOPEMC.
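
The group-wise false discovery rate is the expectation of the proportion of irrelevant groups among the selected groups. For a single simulated data set where the truly relevant groups are known, that proportion can be computed as in the sketch below. The function name and the group labels are assumptions for illustration, the code is Python rather than part of grpSLOPEMC, and averaging the result over many simulation replicates would give an empirical gFDR estimate.

```python
def group_false_discovery_proportion(selected, relevant):
    """Fraction of selected groups that are not truly relevant.

    Defined as 0 when nothing is selected, matching the usual
    V / max(R, 1) convention.
    """
    selected, relevant = set(selected), set(relevant)
    false_discoveries = len(selected - relevant)
    return false_discoveries / max(len(selected), 1)


# Hypothetical simulation replicate: four groups selected, two of them wrong.
selected_groups = {"g1", "g2", "g3", "g4"}
relevant_groups = {"g1", "g2", "g5"}
gfdp = group_false_discovery_proportion(selected_groups, relevant_groups)
```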


Subjects
Computational Biology/methods , Regression Analysis , Algorithms , Databases, Factual , Humans , Machine Learning
9.
PLoS One ; 10(10): e0140156, 2015.
Article in English | MEDLINE | ID: mdl-26452224

ABSTRACT

Oxygen is critical for optimal bone regeneration. While axolotls and salamanders have retained the ability to regenerate whole limbs, mammalian regeneration is restricted to the distal tip of the digit (P3) in mice, primates, and humans. Our previous study revealed that the oxygen microenvironment during regeneration is dynamic and temporally influential in building and degrading bone. Given that regeneration is dependent on a dynamic and changing oxygen environment, a better understanding of the effects of oxygen during wounding, scarring, and regeneration, and better ways to artificially generate both hypoxic and oxygen-replete microenvironments, are essential to promote regeneration beyond wounding or scarring. To explore the influence of increased oxygen on digit regeneration in vivo, daily treatments of hyperbaric oxygen were administered to mice during all phases of the regenerative process. Micro-Computed Tomography (µCT) and histological analysis showed that the daily application of hyperbaric oxygen elicited the same enhanced bone degradation response as two individual pulses of oxygen applied during the blastema phase. We expand on these findings to show histologically that the continuous application of hyperbaric oxygen during digit regeneration results in delayed blastema formation at a much more proximal location after amputation, and the deposition of better organized collagen fibers during bone formation. The application of sustained hyperbaric oxygen also delays wound closure and enhances bone degradation after digit amputation. Thus, hyperbaric oxygen shows the potential for positive influential control on the various phases of an epimorphic regenerative response.


Subjects
Bone Regeneration , Collagen/metabolism , Hindlimb/physiology , Hyperbaric Oxygenation , Animals , Female , Hindlimb/metabolism , Mice , Wound Healing