Results 1 - 20 of 291
1.
Nature ; 595(7866): 283-288, 2021 07.
Article in English | MEDLINE | ID: mdl-34010947

ABSTRACT

COVID-19 manifests with a wide spectrum of clinical phenotypes that are characterized by exaggerated and misdirected host immune responses1-6. Although pathological innate immune activation is well-documented in severe disease1, the effect of autoantibodies on disease progression is less well-defined. Here we use a high-throughput autoantibody discovery technique known as rapid extracellular antigen profiling7 to screen a cohort of 194 individuals infected with SARS-CoV-2, comprising 172 patients with COVID-19 and 22 healthcare workers with mild disease or asymptomatic infection, for autoantibodies against 2,770 extracellular and secreted proteins (members of the exoproteome). We found that patients with COVID-19 exhibit marked increases in autoantibody reactivities as compared to uninfected individuals, and show a high prevalence of autoantibodies against immunomodulatory proteins (including cytokines, chemokines, complement components and cell-surface proteins). We established that these autoantibodies perturb immune function and impair virological control by inhibiting immunoreceptor signalling and by altering peripheral immune cell composition, and found that mouse surrogates of these autoantibodies increase disease severity in a mouse model of SARS-CoV-2 infection. Our analysis of autoantibodies against tissue-associated antigens revealed associations with specific clinical characteristics. Our findings suggest a pathological role for exoproteome-directed autoantibodies in COVID-19, with diverse effects on immune functionality and associations with clinical outcomes.


Subjects
Autoantibodies/analysis , Autoantibodies/immunology , COVID-19/immunology , COVID-19/metabolism , Proteome/immunology , Proteome/metabolism , Animals , Antigens, Surface/immunology , COVID-19/pathology , COVID-19/physiopathology , Case-Control Studies , Complement System Proteins/immunology , Cytokines/immunology , Disease Models, Animal , Disease Progression , Female , Humans , Male , Mice , Organ Specificity/immunology
2.
Biostatistics ; 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39074174

ABSTRACT

Cancer is molecularly heterogeneous, with seemingly similar patients having different molecular landscapes and accordingly different clinical behaviors. In recent studies, gene expression networks have been shown as more effective/informative for cancer heterogeneity analysis than some simpler measures. Gene interconnections can be classified as "direct" and "indirect," where the latter can be caused by shared genomic regulators (such as transcription factors, microRNAs, and other regulatory molecules) and other mechanisms. It has been suggested that incorporating the regulators of gene expressions in network analysis and focusing on the direct interconnections can lead to a deeper understanding of the more essential gene interconnections. Such analysis can be seriously challenged by the large number of parameters (jointly caused by network analysis, incorporation of regulators, and heterogeneity) and often weak signals. To effectively tackle this problem, we propose incorporating prior information contained in the published literature. A key challenge is that such prior information can be partial or even wrong. We develop a two-step procedure that can flexibly accommodate different levels of prior information quality. Simulation demonstrates the effectiveness of the proposed approach and its superiority over relevant competitors. In the analysis of a breast cancer dataset, findings different from the alternatives are made, and the identified sample subgroups have important clinical differences.

3.
PLoS Biol ; 20(5): e3001506, 2022 05.
Article in English | MEDLINE | ID: mdl-35609110

ABSTRACT

The impact of Coronavirus Disease 2019 (COVID-19) mRNA vaccination on pregnancy and fertility has become a major topic of public interest. We investigated 2 of the most widely propagated claims to determine (1) whether COVID-19 mRNA vaccination of mice during early pregnancy is associated with an increased incidence of birth defects or growth abnormalities; and (2) whether COVID-19 mRNA-vaccinated human volunteers exhibit elevated levels of antibodies to the human placental protein syncytin-1. Using a mouse model, we found that intramuscular COVID-19 mRNA vaccination during early pregnancy at gestational age E7.5 did not lead to differences in fetal size by crown-rump length or weight at term, nor did we observe any gross birth defects. In contrast, injection of the TLR3 agonist and double-stranded RNA mimic polyinosinic-polycytidylic acid, or poly(I:C), impacted growth in utero leading to reduced fetal size. No overt maternal illness following either vaccination or poly(I:C) exposure was observed. We also found that term fetuses from these murine pregnancies vaccinated prior to the formation of the definitive placenta exhibit high circulating levels of anti-spike and anti-receptor-binding domain (anti-RBD) antibodies to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) consistent with maternal antibody status, indicating transplacental transfer in the later stages of pregnancy after early immunization. Finally, we did not detect increased levels of circulating anti-syncytin-1 antibodies in a cohort of COVID-19 vaccinated adults compared to unvaccinated adults by ELISA. Our findings contradict popular claims associating COVID-19 mRNA vaccination with infertility and adverse neonatal outcomes.


Subjects
COVID-19 , Animals , Antibodies, Viral , COVID-19/prevention & control , Female , Fetus , Gene Products, env , Humans , Mice , Placenta/metabolism , Pregnancy , Pregnancy Proteins , RNA, Messenger/genetics , RNA, Messenger/metabolism , SARS-CoV-2 , Vaccination
4.
Genet Epidemiol ; 47(3): 261-286, 2023 04.
Article in English | MEDLINE | ID: mdl-36807383

ABSTRACT

Gene-environment (G-E) interaction analysis plays an important role in studying complex diseases. Extensive methodological research has been conducted on G-E interaction analysis, and the existing methods are mostly based on regression techniques. In many fields including biomedicine and omics, it has been increasingly recognized that deep learning may outperform regression with its unique flexibility (e.g., in accommodating unspecified nonlinear effects) and superior prediction performance. However, there has been a lack of development in deep learning for G-E interaction analysis. In this article, we fill this important knowledge gap and develop a new analysis approach based on deep neural network in conjunction with penalization. The proposed approach can simultaneously conduct model estimation and selection (of important main G effects and G-E interactions), while uniquely respecting the "main effects, interactions" variable selection hierarchy. Simulation shows that it has superior prediction and feature selection performance. The analysis of data on lung adenocarcinoma and skin cutaneous melanoma overall survival further establishes its practical utility. Overall, this study can advance G-E interaction analysis by delivering a powerful new analysis approach based on modern deep learning.


Subjects
Deep Learning , Melanoma , Skin Neoplasms , Humans , Gene-Environment Interaction , Models, Genetic , Melanoma, Cutaneous Malignant
5.
Biostatistics ; 24(2): 425-442, 2023 04 14.
Article in English | MEDLINE | ID: mdl-37057611

ABSTRACT

Cancer is a heterogeneous disease. Finite mixture of regression (FMR), an important heterogeneity analysis technique when an outcome variable is present, has been extensively employed in cancer research, revealing important differences in the associations between a cancer outcome/phenotype and covariates. Cancer FMR analysis has been based on clinical, demographic, and omics variables. A relatively recent and alternative source of data comes from histopathological images. Histopathological images have long been used for cancer diagnosis and staging. Recently, it has been shown that high-dimensional histopathological image features, which are extracted using automated digital image processing pipelines, are effective for modeling cancer outcomes/phenotypes. Histopathological imaging-environment interaction analysis has been further developed to expand the scope of cancer modeling and histopathological imaging-based analysis. Motivated by the significance of cancer FMR analysis and a still strong demand for more effective methods, in this article, we take the natural next step and conduct cancer FMR analysis based on models that incorporate low-dimensional clinical/demographic/environmental variables, high-dimensional imaging features, as well as their interactions. Complementary to many of the existing studies, we develop a Bayesian approach for accommodating high dimensionality, screening out noise, identifying signals, and respecting the "main effects, interactions" variable selection hierarchy. An effective computational algorithm is developed, and simulation shows advantageous performance of the proposed approach. The analysis of The Cancer Genome Atlas data on lung squamous cell cancer leads to interesting findings different from the alternative approaches.


Subjects
Gene-Environment Interaction , Neoplasms , Humans , Bayes Theorem , Neoplasms/diagnostic imaging , Computer Simulation , Regression Analysis
6.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35039832

ABSTRACT

Cancer is an omics disease. The development in high-throughput profiling has fundamentally changed cancer research and clinical practice. Compared with clinical, demographic and environmental data, the analysis of omics data, which has higher dimensionality, weaker signals and more complex distributional properties, is much more challenging. Developments in the literature are often 'scattered', with individual studies focused on one or a few closely related methods. The goal of this review is to assist cancer researchers with limited statistical expertise in establishing the 'overall framework' of cancer omics data analysis. To facilitate understanding, we mainly focus on intuition, concepts and key steps, and refer readers to the original publications for mathematical details. This review broadly covers unsupervised and supervised analysis, as well as individual-gene-based, gene-set-based and gene-network-based analysis. We also briefly discuss 'special topics' including interaction analysis, multi-dataset analysis and multi-omics analysis.


Subjects
Genomics , Neoplasms , Data Analysis , Genomics/methods , Humans , Neoplasms/genetics
7.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35876281

ABSTRACT

In biomedical research, the replicability of findings across studies is highly desired. In this study, we focus on cancer omics data, for which the examination of replicability has been mostly focused on important omics variables identified in different studies. In published literature, although there have been extensive attention and ad hoc discussions, there is insufficient quantitative research looking into replicability measures and their properties. The goal of this study is to fill this important knowledge gap. In particular, we consider three sensible replicability measures, for which we examine distributional properties and develop a way of making inference. Applying them to three The Cancer Genome Atlas (TCGA) datasets reveals in general low replicability and significant across-data variations. To further comprehend such findings, we resort to simulation, which confirms the validity of the findings with the TCGA data and further informs the dependence of replicability on signal level (or equivalently sample size). Overall, this study can advance our understanding of replicability for cancer omics and other studies that have identification as a key goal.


Subjects
Biomedical Research , Neoplasms , Humans , Neoplasms/genetics , Sample Size
8.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37490475

ABSTRACT

MOTIVATION: Analyzing genetic data to identify markers and construct predictive models is of great interest in biomedical research. However, limited by cost and sample availability, genetic studies often suffer from the "small sample size, high dimensionality" problem. To tackle this problem, an integrative analysis that collectively analyzes multiple datasets with compatible designs is often conducted. For regularizing estimation and selecting relevant variables, penalization and other regularization techniques are routinely adopted. "Blindly" searching over a vast number of variables may not be efficient. RESULTS: We propose incorporating prior information to assist integrative analysis of multiple genetic datasets. To obtain accurate prior information, we adopt a convolutional neural network with an active learning strategy to label textual information from previous studies. Then the extracted prior information is incorporated using a group LASSO-based technique. We conducted a series of simulation studies that demonstrated the satisfactory performance of the proposed method. Finally, data on skin cutaneous melanoma are analyzed to establish practical utility. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/ldz7/PAIA. The data that support the findings in this article are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/.


Subjects
Melanoma , Skin Neoplasms , Humans , Melanoma/genetics , Computer Simulation , Genome , Melanoma, Cutaneous Malignant
9.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38060266

ABSTRACT

SUMMARY: Densely measured SNP data are routinely analyzed but pose challenges due to their high dimensionality, especially when gene-environment interactions are incorporated. In recent literature, a functional analysis strategy has been developed, which treats dense SNP measurements as a realization of a genetic function and can 'bypass' the dimensionality challenge. However, there is a lack of portable and friendly software, which hinders practical utilization of these functional methods. We fill this gap and develop the R package FunctanSNP. This comprehensive package encompasses estimation, identification, and visualization tools and has undergone extensive testing using both simulated and real data, confirming its reliability. FunctanSNP can serve as a convenient and reliable tool for analyzing SNP and other densely measured data. AVAILABILITY AND IMPLEMENTATION: The package is available at https://CRAN.R-project.org/package=FunctanSNP.


Subjects
Software , Reproducibility of Results
10.
Stat Med ; 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39260448

ABSTRACT

Data irregularity in cancer genomics studies has been widely observed in the form of outliers and heavy-tailed distributions in the complex traits. In the past decade, robust variable selection methods have emerged as powerful alternatives to the nonrobust ones to identify important genes associated with heterogeneous disease traits and build superior predictive models. In this study, to keep the remarkable features of the quantile LASSO and fully Bayesian regularized quantile regression while overcoming their disadvantage in the analysis of high-dimensional genomics data, we propose the spike-and-slab quantile LASSO through a fully Bayesian spike-and-slab formulation under the robust likelihood by adopting the asymmetric Laplace distribution (ALD). The proposed robust method has inherited the prominent properties of selective shrinkage and self-adaptivity to the sparsity pattern from the spike-and-slab LASSO (Ročková and George, J Am Stat Associat, 2018, 113(521): 431-444). Furthermore, the spike-and-slab quantile LASSO has a computational advantage to locate the posterior modes via soft-thresholding rule guided Expectation-Maximization (EM) steps in the coordinate descent framework, a phenomenon rarely observed for robust regularization with nondifferentiable loss functions. We have conducted comprehensive simulation studies with a variety of heavy-tailed errors in both homogeneous and heterogeneous model settings to demonstrate the superiority of the spike-and-slab quantile LASSO over its competing methods. The advantage of the proposed method has been further demonstrated in case studies of the lung adenocarcinomas (LUAD) and skin cutaneous melanoma (SKCM) data from The Cancer Genome Atlas (TCGA).
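
Two building blocks named in this abstract can be written down in a few lines. The sketch below is illustrative only and is not the authors' implementation: it shows the quantile check loss (minimizing it is equivalent to maximizing an asymmetric Laplace likelihood at a fixed scale) and the generic soft-thresholding operator used in LASSO-type coordinate descent. The function names `check_loss` and `soft_threshold` are hypothetical, and the spike-and-slab prior and EM steps are deliberately omitted.

```python
def check_loss(residual: float, tau: float) -> float:
    """Quantile check loss: rho_tau(r) = r * (tau - 1[r < 0]), for 0 < tau < 1."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def soft_threshold(z: float, lam: float) -> float:
    """Soft-thresholding: S(z, lam) = sign(z) * max(|z| - lam, 0), for lam >= 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

With `tau = 0.5` the check loss is half the absolute error, which is why median regression is more robust to heavy-tailed errors than least squares.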

11.
Stat Med ; 43(11): 2280-2297, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38553996

ABSTRACT

Cancer heterogeneity analysis is essential for precision medicine. Most of the existing heterogeneity analyses only consider a single type of data and ignore the possible sparsity of important features. In cancer clinical practice, it has been suggested that two types of data, pathological imaging and omics data, are commonly collected and can produce hierarchical heterogeneous structures, in which the refined sub-subgroup structure determined by omics features can be nested in the rough subgroup structure determined by the imaging features. Moreover, sparsity pursuit has extraordinary significance and is more challenging for heterogeneity analysis, because the important features may not be the same in different subgroups, which is ignored by the existing heterogeneity analyses. Fortunately, rich information from previous literature (for example, those deposited in PubMed) can be used to assist feature selection in the present study. Advancing from the existing analyses, in this study, we propose a novel sparse hierarchical heterogeneity analysis framework, which can integrate two types of features and incorporate prior knowledge to improve feature selection. The proposed approach has satisfactory statistical properties and competitive numerical performance. A TCGA real data analysis demonstrates the practical value of our approach in analyzing data heterogeneity and sparsity.


Subjects
Neoplasms , Humans , Neoplasms/genetics , Precision Medicine , Models, Statistical , Computer Simulation , Genetic Heterogeneity
12.
Environ Health ; 23(1): 28, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38504322

ABSTRACT

BACKGROUND: The effects of organochlorine pesticide (OCP) exposure on the development of human papillary thyroid cancer (PTC) are not well understood. A nested case-control study was conducted with data from the U.S. Department of Defense Serum Repository (DoDSR) cohort between 2000 and 2013 to assess associations of individual OCPs serum concentrations with PTC risk. METHODS: This study included 742 histologically confirmed PTC cases (341 females, 401 males) and 742 individually-matched controls with pre-diagnostic serum samples selected from the DoDSR. Associations between categories of lipid-corrected serum concentrations of seven OCPs and PTC risk were evaluated for classical PTC and follicular PTC using conditional logistic regression, adjusted for body mass index category and military branch to compute odds ratios (OR) and 95% confidence intervals (CIs). Effect modification by sex, birth cohort, and race was examined. RESULTS: There was no evidence of associations between most of the OCPs and PTC, overall or stratified by histological subtype. Overall, there was no evidence of an association between hexachlorobenzene (HCB) and PTC, but stratified by histological subtype HCB was associated with significantly increased risk of classical PTC (third tertile above the limit of detection (LOD) vs.

Subjects
Hexachlorocyclohexane , Hydrocarbons, Chlorinated , Military Personnel , Pesticides , Thyroid Neoplasms , Male , Humans , Female , Thyroid Cancer, Papillary/epidemiology , Hexachlorobenzene , Case-Control Studies , Thyroid Neoplasms/chemically induced , Thyroid Neoplasms/epidemiology
13.
Article in English | MEDLINE | ID: mdl-38098875

ABSTRACT

With the development of data collection techniques, analysis with a survival response and high-dimensional covariates has become routine. Here we consider an interaction model, which includes a set of low-dimensional covariates, a set of high-dimensional covariates, and their interactions. This model has been motivated by gene-environment (G-E) interaction analysis, where the E variables have a low dimension, and the G variables have a high dimension. For such a model, there has been extensive research on estimation and variable selection. Comparatively, inference studies with a valid false discovery rate (FDR) control have been very limited. The existing high-dimensional inference tools cannot be directly applied to interaction models, as interactions and main effects are not "equal". In this article, for high-dimensional survival analysis with interactions, we model survival using the Accelerated Failure Time (AFT) model and adopt a "weighted least squares + debiased Lasso" approach for estimation and selection. A hierarchical FDR control approach is developed for inference and respect of the "main effects, interactions" hierarchy. The asymptotic distribution properties of the debiased Lasso estimators are rigorously established. Simulation demonstrates the satisfactory performance of the proposed approach, and the analysis of a breast cancer dataset further establishes its practical utility.
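
For readers unfamiliar with false discovery rate control, the following sketch shows the standard (flat, non-hierarchical) Benjamini-Hochberg step-up procedure. It is a generic illustration, not the hierarchical FDR procedure developed in the article, and it assumes p-values have already been obtained (for example, from debiased estimators). The function name `benjamini_hochberg` and the default `alpha` are illustrative choices.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

A hierarchical variant would additionally restrict which interaction hypotheses may be tested or rejected depending on the corresponding main effects; that logic is specific to the article and is not reproduced here.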

14.
Entropy (Basel) ; 26(4)2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38667864

ABSTRACT

In the classification task, label noise has a significant impact on models' performance, primarily manifested in the disruption of prediction consistency, thereby reducing the classification accuracy. This work introduces a novel prediction consistency regularization that mitigates the impact of label noise on neural networks by imposing constraints on the prediction consistency of similar samples. However, determining which samples should be similar is a primary challenge. We formalize the similar sample identification as a clustering problem and employ twin contrastive clustering (TCC) to address this issue. To ensure similarity between samples within each cluster, we enhance TCC by adjusting clustering prior to distribution using label information. Based on the adjusted TCC's clustering results, we first construct the prototype for each cluster and then formulate a prototype-based regularization term to enhance prediction consistency for the prototype within each cluster and counteract the adverse effects of label noise. We conducted comprehensive experiments using benchmark datasets to evaluate the effectiveness of our method under various scenarios with different noise rates. The results explicitly demonstrate the enhancement in classification accuracy. Subsequent analytical experiments confirm that the proposed regularization term effectively mitigates noise and that the adjusted TCC enhances the quality of similar sample recognition.

15.
Genet Epidemiol ; 46(5-6): 317-340, 2022 07.
Article in English | MEDLINE | ID: mdl-35766061

ABSTRACT

Penalized variable selection for high-dimensional longitudinal data has received much attention as it can account for the correlation among repeated measurements while providing additional and essential information for improved identification and prediction performance. Despite the success, in longitudinal studies, the potential of penalization methods is far from fully understood for accommodating structured sparsity. In this article, we develop a sparse group penalization method to conduct the bi-level gene-environment (G × E) interaction study under the repeatedly measured phenotype. Within the quadratic inference function framework, the proposed method can achieve simultaneous identification of main and interaction effects on both the group and individual levels. Simulation studies have shown that the proposed method outperforms major competitors. In the case study of asthma data from the Childhood Asthma Management Program, we conduct a G × E study by using high-dimensional single nucleotide polymorphism data as genetic factors and the longitudinal trait, forced expiratory volume in 1 s, as the phenotype. Our method leads to improved prediction and identification of main and interaction effects with important implications.


Subjects
Asthma , Gene-Environment Interaction , Asthma/genetics , Computer Simulation , Humans , Longitudinal Studies , Models, Genetic
16.
Biostatistics ; 23(2): 574-590, 2022 04 13.
Article in English | MEDLINE | ID: mdl-33040145

ABSTRACT

In recent biomedical research, genome-wide association studies (GWAS) have demonstrated great success in investigating the genetic architecture of human diseases. For many complex diseases, multiple correlated traits have been collected. However, most of the existing GWAS are still limited because they analyze each trait separately without considering their correlations and suffer from a lack of sufficient information. Moreover, the high dimensionality of single nucleotide polymorphism (SNP) data still poses tremendous challenges to statistical methods, in both theoretical and practical aspects. In this article, we innovatively propose an integrative functional linear model for GWAS with multiple traits. This study is the first to approximate SNPs as functional objects in a joint model of multiple traits with penalization techniques. It effectively accommodates the high dimensionality of SNPs and correlations among multiple traits to facilitate information borrowing. Our extensive simulation studies demonstrate the satisfactory performance of the proposed method in the identification and estimation of disease-associated genetic variants, compared to four alternatives. The analysis of type 2 diabetes data leads to biologically meaningful findings with good prediction accuracy and selection stability.


Subjects
Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study/methods , Humans , Linear Models , Phenotype , Polymorphism, Single Nucleotide
17.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32793970

ABSTRACT

Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a 'lack of information' problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where vertical integration methods collectively analyze data on gene expressions as well as their regulators (such as mutations, DNA methylation and miRNAs). In this article, we conduct a selective review of vertical data integration methods for gene expression data. The reviewed methods cover both marginal and joint analysis and supervised and unsupervised analysis. The main goal is to provide a sketch of the vertical data integration paradigm without digging into too many technical details. We also briefly discuss potential pitfalls, directions for future developments and application notes.


Subjects
Gene Expression , Cluster Analysis , Data Analysis , Humans , Unsupervised Machine Learning
18.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33313791

ABSTRACT

Structures of genetic regulatory networks are not fixed. These structural perturbations can cause changes to the reachability of systems' state spaces. As system structures are related to genotypes and state spaces are related to phenotypes, it is important to study the relationship between structures and state spaces. However, there is still no method that can quantitatively describe the reachability differences between two state spaces caused by structural perturbations. Therefore, Difference in Reachability between State Spaces (DReSS) is proposed. The DReSS index family can quantitatively describe differences in reachability and attractor sets between two state spaces and can help find the key structures in a system, which may influence the system's state space significantly. First, basic properties of DReSS including non-negativity, symmetry and subadditivity are proved. Then, typical examples are shown to explain the meaning of DReSS and the differences between DReSS and traditional graph distance. Finally, differences in the DReSS distribution between real biological regulatory networks and random networks are compared. Results show that, compared with random networks, most structural perturbations in biological networks tend to affect reachability inside and between attractor basins rather than the attractor set itself, which illustrates that most genotype differences tend to influence the proportion of different phenotypes and only a few can create new phenotypes. DReSS can provide researchers with new insight into the relation between genotypes and phenotypes.


Subjects
Algorithms , Gene Regulatory Networks , Genotype , Models, Genetic
19.
Bioinformatics ; 38(11): 3139-3140, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35485739

ABSTRACT

SUMMARY: Gene-environment (G-E) interactions have important implications for many complex diseases. With higher dimensionality and weaker signals, G-E interaction analysis is more challenged than the analysis of main G (and E) effects. The accumulation of published literature makes it possible to borrow strength from prior information and improve analysis. In a recent study, a 'quasi-likelihood + penalization' approach was developed to effectively incorporate prior information. Here, we first extend it to linear, logistic and Poisson regressions. Such models are much more popular in practice. More importantly, we develop the R package GEInfo, which realizes this approach in a user-friendly manner. To facilitate direct comparison and routine data analysis, the package also includes functions for alternative methods and visualization. AVAILABILITY AND IMPLEMENTATION: The package is available at https://CRAN.R-project.org/package=GEInfo. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.


Subjects
Gene-Environment Interaction , Software
20.
Bioinformatics ; 38(10): 2855-2862, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561185

ABSTRACT

MOTIVATION: Cancer genetic heterogeneity analysis has critical implications for tumour classification, response to therapy and choice of biomarkers to guide personalized cancer medicine. However, existing heterogeneity analysis based solely on molecular profiling data usually suffers from a lack of information and has limited effectiveness. Many biomedical and life sciences databases have accumulated a substantial volume of meaningful biological information. They can provide additional information beyond molecular profiling data, yet pose challenges arising from potential noise and uncertainty. RESULTS: In this study, we aim to develop a more effective heterogeneity analysis method with the help of prior information. A network-based penalization technique is proposed to innovatively incorporate a multi-view of prior information from multiple databases, which accommodates heterogeneity attributed to both differential genes and gene relationships. To account for the fact that the prior information might not be fully credible, we propose a weighted strategy, where the weight is determined dependent on the data and can ensure that the present model is not excessively disturbed by incorrect information. Simulation and analysis of The Cancer Genome Atlas glioblastoma multiforme data demonstrate the practical applicability of the proposed method. AVAILABILITY AND IMPLEMENTATION: R code implementing the proposed method is available at https://github.com/mengyunwu2020/PECM. The data that support the findings in this paper are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Glioblastoma , Software , Computer Simulation , Genome , Glioblastoma/genetics , Humans , Precision Medicine