Results 1 - 20 of 39
1.
BMC Bioinformatics ; 25(1): 40, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38262930

ABSTRACT

BACKGROUND: Clustering is a fundamental problem in statistics and has broad applications in various areas. Traditional clustering methods treat features equally and ignore the potential structure brought by the characteristic difference of features. Especially in cancer diagnosis and treatment, several types of biological features are collected and analyzed together. Treating these features equally fails to identify the heterogeneity of both data structure and cancer itself, which leads to incompleteness and inefficacy of current anti-cancer therapies. OBJECTIVES: In this paper, we propose a clustering framework based on hierarchical heterogeneous data with prior pairwise relationships. The proposed clustering method fully characterizes the difference of features and identifies potential hierarchical structure by rough and refined clusters. RESULTS: The refined clustering further divides the clusters obtained by the rough clustering into different subtypes. Thus it provides a deeper insight into cancer that cannot be detected by existing clustering methods. The proposed method is also flexible with prior information: additional pairwise relationships of samples can be incorporated to help improve clustering performance. Finally, well-grounded statistical consistency properties of our proposed method are rigorously established, including the accurate estimation of parameters and determination of clustering structures. CONCLUSIONS: Our proposed method achieves better clustering performance than other methods in simulation studies, and the clustering accuracy increases with prior information incorporated. Meaningful biological findings are obtained in the analysis of lung adenocarcinoma with clinical imaging data and omics data, showing that hierarchical structure produced by rough and refined clustering is necessary and reasonable.


Subjects
Adenocarcinoma of Lung; Lung Neoplasms; Humans; Cluster Analysis; Computer Simulation
2.
Stat Med ; 43(11): 2280-2297, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38553996

ABSTRACT

Cancer heterogeneity analysis is essential for precision medicine. Most of the existing heterogeneity analyses only consider a single type of data and ignore the possible sparsity of important features. In cancer clinical practice, it has been suggested that two types of data, pathological imaging and omics data, are commonly collected and can produce hierarchical heterogeneous structures, in which the refined sub-subgroup structure determined by omics features can be nested in the rough subgroup structure determined by the imaging features. Moreover, sparsity pursuit has extraordinary significance and is more challenging for heterogeneity analysis, because the important features may not be the same in different subgroups, which is ignored by the existing heterogeneity analyses. Fortunately, rich information from previous literature (for example, those deposited in PubMed) can be used to assist feature selection in the present study. Advancing from the existing analyses, in this study, we propose a novel sparse hierarchical heterogeneity analysis framework, which can integrate two types of features and incorporate prior knowledge to improve feature selection. The proposed approach has satisfactory statistical properties and competitive numerical performance. A TCGA real data analysis demonstrates the practical value of our approach in analyzing data heterogeneity and sparsity.


Subjects
Neoplasms; Humans; Neoplasms/genetics; Precision Medicine; Models, Statistical; Computer Simulation; Genetic Heterogeneity
3.
Entropy (Basel) ; 26(4), 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38667864

ABSTRACT

In the classification task, label noise has a significant impact on models' performance, primarily manifested in the disruption of prediction consistency, thereby reducing the classification accuracy. This work introduces a novel prediction consistency regularization that mitigates the impact of label noise on neural networks by imposing constraints on the prediction consistency of similar samples. However, determining which samples should be similar is a primary challenge. We formalize the similar sample identification as a clustering problem and employ twin contrastive clustering (TCC) to address this issue. To ensure similarity between samples within each cluster, we enhance TCC by adjusting the clustering prior distribution using label information. Based on the adjusted TCC's clustering results, we first construct the prototype for each cluster and then formulate a prototype-based regularization term to enhance prediction consistency for the prototype within each cluster and counteract the adverse effects of label noise. We conducted comprehensive experiments using benchmark datasets to evaluate the effectiveness of our method under various scenarios with different noise rates. The results explicitly demonstrate the enhancement in classification accuracy. Subsequent analytical experiments confirm that the proposed regularization term effectively mitigates noise and that the adjusted TCC enhances the quality of similar sample recognition.

4.
Mol Vis ; 29: 266-273, 2023.
Article in English | MEDLINE | ID: mdl-38222453

ABSTRACT

Clinical relevance: Identification of individuals with a higher risk of developing refractive error under specific gene and environmental backgrounds, especially myopia, could enable more personalized myopic control advice for patients. Background: Refractive error is a common disease that affects visual quality and ocular health worldwide. Its mechanisms have not been elaborated, although both genes and the environment are known to contribute to the process. Interactions between genes and the environment have been shown to exert effects on the onset of refractive error, especially myopia. Axial length elongation is the main characteristic of myopia development and could indicate the severity of myopia. Thus, the purpose of the study was to investigate the interaction between environmental factors and genetic markers of VIPR2 and their impact on spherical equivalence and axial length in a population of Han Chinese children. Methods: A total of 1825 children aged 13-15 years in the Anyang Childhood Eye Study (ACES) were measured for cycloplegic autorefraction, axial length, and height. Saliva DNA was extracted for genotyping three single-nucleotide polymorphisms (SNPs) in the candidate gene (VIPR2). The median outdoor time (2 h/day) was used to categorize children into high and low exposure groups, respectively. Genetic quality control and linear and logistic regressions were performed. Generalized multifactor dimensional reduction (GMDR) was used to investigate gene-environment interactions. Results: There were 1391 children who passed genetic quality control. Rs2071623 of VIPR2 was associated with axial length (T allele, β=-0.11, se=0.04, p=0.006), while the SNP nominally interacted with outdoor time (T allele, β=-0.17, se=0.08, p=0.029). Rs2071623 in children with high outdoor exposure had a significant interaction effect on axial length (p=0.0007, β=-0.19, se=0.056) compared to children with low outdoor exposure. GMDR further suggested the existence of an interaction effect between outdoor time and rs2071623. Conclusions: Rs2071623 within VIPR2 could interact with outdoor time in Han Chinese children. More outdoor exposure could enhance the protective effect of the T allele on axial elongation.
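
Editorial note: the effect estimates quoted in this abstract can be sanity-checked with a simple Wald calculation. The sketch below is not taken from the study. It assumes a two-sided test under a standard normal approximation, and the function name wald_p_value is hypothetical. The inputs are the rounded estimates and standard errors as printed in the abstract.

```python
import math


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value for a Wald z-test (normal approximation, an assumption here)."""
    z = beta / se
    # For Z ~ N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


# (label, beta, se, reported p) exactly as quoted in the abstract
reported = [
    ("rs2071623 main effect", -0.11, 0.04, 0.006),
    ("SNP x outdoor time", -0.17, 0.08, 0.029),
    ("high outdoor exposure", -0.19, 0.056, 0.0007),
]

for label, beta, se, p_reported in reported:
    p = wald_p_value(beta, se)
    print(f"{label}: z = {beta / se:.2f}, approx. p = {p:.4f} (reported {p_reported})")
```

Small differences from the reported p-values are to be expected, since the abstract gives rounded estimates and the authors may have used a t-based or otherwise adjusted test.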


Subjects
Myopia; Receptors, Vasoactive Intestinal Peptide, Type II; Refraction, Ocular; Humans; Axial Length, Eye; China/epidemiology; Eye; Myopia/genetics; Polymorphism, Single Nucleotide; Receptors, Vasoactive Intestinal Peptide, Type II/genetics; Adolescent
5.
Biometrics ; 79(3): 2404-2416, 2023 09.
Article in English | MEDLINE | ID: mdl-36573805

ABSTRACT

Network analysis plays an important role in numerous application domains including biomedicine. Estimation of the number of communities is a fundamental and critical issue in network analysis. Most existing studies assume that the number of communities is known a priori, or lack a rigorous theoretical guarantee on the estimation consistency. In this paper, we propose a regularized network embedding model to simultaneously estimate the community structure and the number of communities in a unified formulation. The proposed model equips network embedding with a novel composite regularization term, which pushes the embedding vector toward its center and pushes similar community centers to collapse into each other. A rigorous theoretical analysis is conducted, establishing asymptotic consistency in terms of community detection and estimation of the number of communities. Extensive numerical experiments have also been conducted on both synthetic networks and a brain functional connectivity network, which demonstrate the superior performance of the proposed method compared with existing alternatives.


Subjects
Algorithms; Brain
6.
J Biomed Inform ; 144: 104434, 2023 08.
Article in English | MEDLINE | ID: mdl-37391115

ABSTRACT

OBJECTIVE: Deep neural network (DNN) techniques have demonstrated significant advantages over regression and some other techniques. In recent studies, DNN-based analysis has been conducted on data with high-dimensional input such as omics measurements. In such analysis, regularization, in particular penalization, has been applied to regularize estimation and distinguish relevant input variables from irrelevant ones. A unique challenge arises from the "lack of information" attributable to high dimensionality of input and limited size of training data. For many data/studies, there exist other data/studies that may be relevant and can potentially provide additional information to boost performance. METHODS: In this study, we conduct integrative analysis of multiple independent datasets/studies, with the goal of borrowing information across each other and improving overall performance. Significantly different from regression-based integrative analysis (where alignment can be easily achieved based on covariates), alignment across multiple DNNs can be nontrivial. We develop ANNI, an Aligned DNN technique for Integrative analysis with high-dimensional input. Penalization is applied for regularized estimation, selection of important input variables, and, equally importantly, information borrowing across multiple DNNs. An effective computational algorithm is developed. RESULTS: Extensive simulations demonstrate competitive performance of the proposed technique. The analysis of cancer omics data further establishes its practical utility.


Subjects
Neoplasms; Neural Networks, Computer; Humans; Algorithms
7.
Bioinformatics ; 37(18): 3073-3074, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33638346

ABSTRACT

SUMMARY: Heterogeneity is a hallmark of many complex human diseases, and unsupervised heterogeneity analysis has been extensively conducted using high-throughput molecular measurements and histopathological imaging features. 'Classic' heterogeneity analysis has been based on simple statistics such as mean, variance and correlation. Network-based analysis takes interconnections as well as individual variable properties into consideration and can be more informative. Several Gaussian graphical model (GGM)-based heterogeneity analysis techniques have been developed, but friendly and portable software is still lacking. To facilitate more extensive usage, we develop the R package HeteroGGM, which conducts GGM-based heterogeneity analysis using advanced penalization techniques, can provide informative summary and graphical presentation, and is efficient and friendly. AVAILABILITY AND IMPLEMENTATION: The package is available at https://CRAN.R-project.org/package=HeteroGGM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software; Humans; Normal Distribution
8.
Biometrics ; 78(2): 524-535, 2022 06.
Article in English | MEDLINE | ID: mdl-33501648

ABSTRACT

Heterogeneity is a hallmark of cancer, diabetes, cardiovascular diseases, and many other complex diseases. This study has been partly motivated by the unsupervised heterogeneity analysis for complex diseases based on molecular and imaging data, for which, network-based analysis, by accommodating the interconnections among variables, can be more informative than that limited to mean, variance, and other simple distributional properties. In the literature, there has been very limited research on network-based heterogeneity analysis, and a common limitation shared by the existing techniques is that the number of subgroups needs to be specified a priori or in an ad hoc manner. In this article, we develop a penalized fusion approach for heterogeneity analysis based on the Gaussian graphical model. It applies penalization to the mean and precision matrix parameters to generate regularized and interpretable estimates. More importantly, a fusion penalty is imposed to "automatedly" determine the number of subgroups and generate more concise, reliable, and interpretable estimation. Consistency properties are rigorously established, and an effective computational algorithm is developed. The heterogeneity analysis of non-small-cell lung cancer based on single-cell gene expression data of the Wnt pathway and that of lung adenocarcinoma based on histopathological imaging data not only demonstrate the practical applicability of the proposed approach but also lead to interesting new findings.


Subjects
Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Algorithms; Humans; Lung Neoplasms/genetics; Normal Distribution
9.
Biometrics ; 78(4): 1579-1591, 2022 12.
Article in English | MEDLINE | ID: mdl-34390584

ABSTRACT

In cancer research, supervised heterogeneity analysis has important implications. Such analysis has been traditionally based on clinical/demographic/molecular variables. Recently, histopathological imaging features, which are generated as a byproduct of biopsy, have been shown to be effective for modeling cancer outcomes, and a handful of supervised heterogeneity analyses have been conducted based on such features. There are two types of histopathological imaging features, which are extracted based on specific biological knowledge and using automated imaging processing software, respectively. Using both types of histopathological imaging features, our goal is to conduct the first supervised cancer heterogeneity analysis that satisfies a hierarchical structure. That is, the first type of imaging features defines a rough structure, and the second type defines a nested and more refined structure. A penalization approach is developed, which has been motivated by but differs significantly from penalized fusion and sparse group penalization. It has satisfactory statistical and numerical properties. In the analysis of lung adenocarcinoma data, it identifies a heterogeneity structure significantly different from the alternatives and has satisfactory prediction and stability performance.


Subjects
Neoplasms; Humans; Neoplasms/diagnostic imaging; Software
10.
Appl Opt ; 61(19): 5567-5574, 2022 Jul 01.
Article in English | MEDLINE | ID: mdl-36255783

ABSTRACT

In this paper, a modified probabilistic deep learning method is proposed to attack the double random phase encryption by modeling the conditional distribution of plaintext. The well-trained probabilistic model gives both predictions of plaintext and uncertainty quantification, the latter of which is first introduced to optical cryptanalysis. Predictions of the model are close to real plaintexts, showing the success of the proposed model. Uncertainty quantification reveals the level of reliability of each pixel in the prediction of plaintext without ground truth. Subsequent simulation experiments demonstrate that uncertainty quantification can effectively identify poor-quality predictions to avoid the risk of unreliability from deep learning models.

11.
Biom J ; 64(3): 461-480, 2022 03.
Article in English | MEDLINE | ID: mdl-34725857

ABSTRACT

In high-throughput cancer studies, gene-environment interactions associated with outcomes have important implications. Some commonly adopted identification methods do not respect the "main effect, interaction" hierarchical structure. In addition, they can be challenged by data contamination and/or long-tailed distributions, which are not uncommon. In this article, robust methods based on γ-divergence and density power divergence are proposed to accommodate contaminated data/long-tailed distributions. A hierarchical sparse group penalty is adopted for regularized estimation and selection and can identify important gene-environment interactions and respect the "main effect, interaction" hierarchical structure. The proposed methods are implemented using an effective group coordinate descent algorithm. Simulation shows that when contamination occurs, the proposed methods can significantly outperform the existing alternatives with more accurate identification. The proposed approach is applied to the analysis of The Cancer Genome Atlas (TCGA) triple-negative breast cancer data and Gene Environment Association Studies (GENEVA) Type 2 Diabetes data.


Subjects
Diabetes Mellitus, Type 2; Neoplasms; Algorithms; Computer Simulation; Diabetes Mellitus, Type 2/genetics; Gene-Environment Interaction; Humans; Neoplasms/genetics
12.
Genet Epidemiol ; 44(2): 159-196, 2020 03.
Article in English | MEDLINE | ID: mdl-31724772

ABSTRACT

Gene-environment (G-E) interaction analysis has been extensively conducted for complex diseases. In marginal analysis, the common practice is to conduct likelihood-based (and other "standard") estimation with each marginal model, and then select significant G-E interactions and main effects based on p values and multiple comparisons adjustment. One limitation of this approach is that the identification results often do not respect the "main effects, interactions" hierarchy, which has been stressed in recent G-E interaction analyses. There is some recent effort tackling this problem, however, with very complex formulations. Another limitation of the common practice is that it may not perform well when regularization is needed, for example, because of "non-normal" distributions. In this article, we propose a marginal penalization approach which adopts a novel penalty to directly tackle the aforementioned problems. The proposed approach has a framework more coherent with that of the recently developed joint analysis methods and an intuitive formulation, and can be effectively realized. In simulation, it outperforms the popular significance-based analysis and simple penalization-based alternatives. Promising findings are made in the analysis of a single-nucleotide polymorphism and a gene expression data.


Subjects
Gene-Environment Interaction; Models, Genetic; Computer Simulation; Diabetes Mellitus/genetics; Genome, Human; Humans; Melanoma/genetics; Polymorphism, Single Nucleotide/genetics; Skin Neoplasms/genetics
13.
Genet Epidemiol ; 44(7): 687-701, 2020 10.
Article in English | MEDLINE | ID: mdl-32583530

ABSTRACT

To date, thousands of genetic variants associated with numerous human traits and diseases have been identified by genome-wide association studies (GWASs). GWASs focus on testing the association between a single trait and genetic variants. However, the analysis of multiple traits and single nucleotide polymorphisms (SNPs) might reflect the physiological processes of complex diseases, and the corresponding study is called pleiotropy association analysis. Modern-day GWASs report only summary statistics instead of individual-level phenotype and genotype data to avoid logistical and privacy issues. Existing methods for combining GWAS summary statistics from multiple phenotypes mainly focus on low-dimensional phenotypes and lose power in high-dimensional cases. To overcome this defect, we propose two kinds of truncated tests to combine summary statistics from multiple phenotypes. Extensive simulations show that the proposed methods are robust and powerful when the dimension of the phenotypes is high and only part of the phenotypes are associated with the SNPs. We apply the proposed methods to blood cytokine data collected from a Finnish population. Results show that the proposed tests can identify additional genetic markers that are missed by single-trait analysis.


Subjects
Cytokines/blood; Cytokines/genetics; Genome-Wide Association Study/statistics & numerical data; Models, Genetic; Polymorphism, Single Nucleotide/genetics; Computer Simulation; Finland; Genetic Markers/genetics; Genotype; Humans; Phenotype
14.
Stat Med ; 40(17): 3915-3936, 2021 07 30.
Article in English | MEDLINE | ID: mdl-33906263

ABSTRACT

Heterogeneity is a hallmark of many complex diseases. There are multiple ways of defining heterogeneity, among which the heterogeneity in genetic regulations, for example, gene expressions (GEs) by copy number variations (CNVs), and methylation, has been suggested but little investigated. Heterogeneity in genetic regulations can be linked with disease severity, progression, and other traits and is biologically important. However, the analysis can be very challenging with the high dimensionality of both sides of regulation as well as sparse and weak signals. In this article, we consider the scenario where subjects form unknown subgroups, and each subgroup has unique genetic regulation relationships. Further, such heterogeneity is "guided" by a known biomarker. We develop a multivariate sparse fusion (MSF) approach, which innovatively applies the penalized fusion technique to simultaneously determine the number and structure of subgroups and regulation relationships within each subgroup. An effective computational algorithm is developed, and extensive simulations are conducted. The analysis of heterogeneity in the GE-CNV regulations in melanoma and GE-methylation regulations in stomach cancer using the TCGA data leads to interesting findings.


Subjects
DNA Copy Number Variations; Melanoma; Algorithms; Gene Expression; Gene Expression Regulation; Humans; Melanoma/genetics
15.
Cereb Cortex ; 30(6): 3717-3730, 2020 05 18.
Article in English | MEDLINE | ID: mdl-31907535

ABSTRACT

Angiogenesis in the developing cerebral cortex accompanies cortical neurogenesis. However, the precise mechanisms underlying cortical angiogenesis at the embryonic stage remain largely unknown. Here, we show that radial glia-derived vascular cell adhesion molecule 1 (VCAM1) coordinates cortical vascularization through different enrichments in the proximal and distal radial glial processes. We found that VCAM1 was highly enriched around the blood vessels in the inner ventricular zone (VZ), preventing the ingrowth of blood vessels into the mitotic cell layer along the ventricular surface. Disrupting the enrichment of VCAM1 surrounding the blood vessels by a tetraspanin-blocking peptide or conditional deletion of Vcam1 gene in neural progenitor cells increased angiogenesis in the inner VZ. Conversely, VCAM1 expressed in the basal endfeet of radial glial processes promoted angiogenic sprouting from the perineural vascular plexus (PNVP). In utero, overexpression of VCAM1 increased the vessel density in the cortical plate, while knockdown of Vcam1 accomplished the opposite. In vitro, we observed that VCAM1 bidirectionally affected endothelial cell proliferation in a concentration-dependent manner. Taken together, our findings identify that distinct concentrations of VCAM1 around VZ blood vessels and the PNVP differently organize cortical angiogenesis during late embryogenesis.


Subjects
Cell Proliferation/genetics; Cerebral Cortex/embryology; Endothelial Cells/metabolism; Ependymoglial Cells/metabolism; Neovascularization, Physiologic/genetics; Vascular Cell Adhesion Molecule-1/genetics; Animals; Cell Proliferation/drug effects; Cerebral Cortex/blood supply; Cerebral Ventricles/blood supply; Cerebral Ventricles/embryology; Endothelial Cells/cytology; Ependymoglial Cells/drug effects; Gene Knockdown Techniques; In Vitro Techniques; Mice; Mice, Knockout; Neovascularization, Physiologic/drug effects; Vascular Cell Adhesion Molecule-1/drug effects; Vascular Cell Adhesion Molecule-1/metabolism
16.
Opt Express ; 28(21): 31832-31843, 2020 Oct 12.
Article in English | MEDLINE | ID: mdl-33115148

ABSTRACT

We propose an optical watermarking method based on a natural speckle pattern. In the watermarking process, the watermark information is embedded into the natural speckle pattern. Then the random-like watermarked image is generated with the proposed grayscale reordering algorithm. During the extraction procedure, the watermarked image is projected to the natural speckle pattern as illumination. Subsequently, they are incoherently superimposed to extract the watermark information directly by human vision. Optical experiments and a hypothesis test are conducted to demonstrate the proposed method with high reliability, imperceptibility and robustness. The proposed method is the first watermarking method utilizing the natural diffuser as the core element in encoding and decoding.

17.
Stat Appl Genet Mol Biol ; 18(2), 2019 01 26.
Article in English | MEDLINE | ID: mdl-30685746

ABSTRACT

Response-selective sampling design is commonly adopted in genetic epidemiologic studies because, compared with a prospective design, it can substantially reduce time cost and increase the power of identifying deleterious genetic variants that predispose to complex human diseases. The proportional odds model (POM) can be used to fit data obtained by this design. Unlike the logistic regression model, the estimated genetic effect based on POM by taking data as being enrolled prospectively is inconsistent. So the power of the resulting Wald test is not satisfactory. The modified POM is suitable to fit this type of data; however, the corresponding Wald test is not optimal when the genetic effect is small. Here, we propose a new association test to handle this issue. Simulation studies show that the proposed test can control the type I error rate correctly and is more powerful than two existing methods. Finally, we applied three tests to Anticyclic Citrullinated Protein Antibody data from Genetic Workshop 16.


Subjects
Computer Simulation/statistics & numerical data; Genetic Association Studies/statistics & numerical data; Genetic Testing/statistics & numerical data; Models, Genetic; Genotype; Humans; Logistic Models; Polymorphism, Single Nucleotide/genetics
18.
Stat Probab Lett ; 163, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32431467

ABSTRACT

We propose a U-statistics test for regression coefficients in high dimensional partially linear models. In addition, the proposed method is extended to test part of the coefficients. Asymptotic distributions of the test statistics are established. Simulation studies demonstrate satisfactory finite-sample performance.

19.
J Biopharm Stat ; 29(4): 606-624, 2019.
Article in English | MEDLINE | ID: mdl-31309858

ABSTRACT

Personalized medicine has received increasing attention among scientific communities in recent years. Because patients often have heterogeneous responses to treatments, discovering individualized treatment rules (ITR) is an important component of precision medicine. To that end, one needs to develop a proper decision rule using patient-specific characteristics to maximize the expected clinical outcome, i.e. the optimal ITR. Recently, outcome weighted learning (OWL) has been proposed to estimate optimal ITR under a weighted classification framework. Since most commonly used loss functions are unbounded, the resulting ITR may suffer similar effects of outliers as the corresponding classifiers. In this paper, we propose robust OWL (ROWL) to build more stable ITRs using a new family of bounded and non-convex loss functions. Moreover, we extend the proposed ROWL method to the multiple treatment setting under the angle-based classification structure. Our theoretical results show that ROWL is Fisher consistent, and can provide the estimation of rewards' ratios for the resulting ITRs. We develop an efficient difference of convex functions algorithm (DCA) to solve the corresponding nonconvex optimization problem. Through analysis of simulated examples and a real medical dataset, we demonstrate that the proposed ROWL method yields more competitive performance in terms of the empirical value function and the misclassification error than several existing methods.


Subjects
Machine Learning; Precision Medicine/methods; Algorithms; Humans; Treatment Outcome
20.
Genet Epidemiol ; 41(6): 523-554, 2017 09.
Article in English | MEDLINE | ID: mdl-28657194

ABSTRACT

For the prognosis of complex diseases, beyond the main effects of genetic (G) and environmental (E) factors, gene-environment (G-E) interactions also play an important role. Many approaches have been developed for detecting important G-E interactions, most of which assume that measurements are complete. In practical data analysis, missingness in E measurements is not uncommon, and failing to properly accommodate such missingness leads to biased estimation and false marker identification. In this study, we conduct G-E interaction analysis with prognosis data under an accelerated failure time (AFT) model. To accommodate missingness in E measurements, we adopt a nonparametric kernel-based data augmentation approach. With a well-designed weighting scheme, a nice "byproduct" is that the proposed approach enjoys a certain robustness property. A penalization approach, which respects the "main effects, interactions" hierarchy, is adopted for selection (of important interactions and main effects) and regularized estimation. The proposed approach has sound interpretations and a solid statistical basis. It outperforms multiple alternatives in simulation. The analysis of TCGA data on lung cancer and melanoma leads to interesting findings and models with superior prediction.


Subjects
Gene-Environment Interaction; Models, Genetic; Adenocarcinoma/genetics; Adenocarcinoma of Lung; Computer Simulation; Databases, Genetic; Humans; Lung Neoplasms/genetics; Melanoma/genetics; Skin Neoplasms/genetics