Results 1 - 20 of 104
1.
Stat Methods Med Res ; : 9622802241254196, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38767219

ABSTRACT

In many cluster-correlated data analyses, informative cluster size poses a challenge that can potentially introduce bias in statistical analyses. Different methodologies have been introduced in statistical literature to address this bias. In this study, we consider a complex form of informativeness where the number of observations corresponding to latent levels of a unit-level continuous covariate within a cluster is associated with the response variable. This type of informativeness has not been explored in prior research. We present a novel test statistic designed to evaluate the effect of the continuous covariate while accounting for the presence of informativeness. The covariate induces a continuum of latent subgroups within the clusters, and our test statistic is formulated by aggregating values from an established statistic that accounts for informative subgroup sizes when comparing group-specific marginal distributions. Through carefully designed simulations, we compare our test with four traditional methods commonly employed in the analysis of cluster-correlated data. Only our test maintains the size across all data-generating scenarios with informativeness. We illustrate the proposed method to test for marginal associations in periodontal data with this distinctive form of informativeness.

2.
Stat Med ; 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38618705

ABSTRACT

Urban environments, characterized by bustling mass transit systems and high population density, host a complex web of microorganisms that impact microbial interactions. These urban microbiomes, influenced by diverse demographics and constant human movement, are vital for understanding microbial dynamics. We explore urban metagenomics, utilizing an extensive dataset from the Metagenomics & Metadesign of Subways & Urban Biomes (MetaSUB) consortium, and investigate antimicrobial resistance (AMR) patterns. In this pioneering research, we delve into the role of bacteriophages, or "phages": viruses that prey on bacteria and can facilitate the exchange of antibiotic resistance genes (ARGs) through mechanisms like horizontal gene transfer (HGT). Despite their potential importance, the existing literature lacks a consensus on their significance in ARG dissemination. We argue that they are an important consideration. We uncover that environmental variables, such as those on climate, demographics, and landscape, can obscure phage-resistome relationships. We adjust for these potential confounders and clarify these relationships across specific and overall antibiotic classes with precision, identifying several key phages. Leveraging machine learning tools and validating findings through clinical literature, we uncover novel associations, adding valuable insights to our comprehension of AMR development.

3.
J Appl Stat ; 51(5): 891-912, 2024.
Article in English | MEDLINE | ID: mdl-38524800

ABSTRACT

We propose a novel personalized concept for optimal treatment selection in situations where the response is a multivariate vector that could contain right-censored variables such as survival time. The proposed method can be applied with any number of treatments and outcome variables, under a broad set of models. Following a working semiparametric Single Index Model that relates covariates and responses, we first define a patient-specific composite score, constructed from individual covariates. We then estimate the conditional means of each response, given the patient score, corresponding to each treatment, using a nonparametric smooth estimator. Next, a rank aggregation technique is applied to estimate an ordering of treatments based on ranked lists of treatment performance measures given by the conditional means. We handle the right-censored data by incorporating inverse probability of censoring weighting into the corresponding estimators. An empirical study illustrates the performance of the proposed method in finite sample problems. To show the applicability of the proposed procedure to real data, we also present an analysis of HIV clinical trial data, which contained a right-censored survival event as one of the endpoints.

4.
BMC Bioinformatics ; 25(1): 117, 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38500042

ABSTRACT

BACKGROUND: A recent breakthrough in differential network (DN) analysis of microbiome data has been realized with the advent of next-generation sequencing technologies. The DN analysis disentangles the microbial co-abundance among taxa by comparing the network properties between two or more graphs under different biological conditions. However, the existing methods for DN analysis of microbiome data do not adjust for other clinical differences between subjects. RESULTS: We propose a Statistical Approach via Pseudo-value Information and Estimation for Differential Network Analysis (SOHPIE-DNA) that incorporates additional covariates such as continuous age and categorical BMI. SOHPIE-DNA is a regression technique adopting jackknife pseudo-values that can be implemented readily for the analysis. We demonstrate through simulations that SOHPIE-DNA consistently reaches higher recall and F1-score, while maintaining similar precision and accuracy to existing methods (NetCoMi and MDiNE). Lastly, we apply SOHPIE-DNA to two real datasets from the American Gut Project and the Diet Exchange Study to showcase its utility. The analysis of the Diet Exchange Study shows that SOHPIE-DNA can also be used to incorporate the temporal change of connectivity of taxa with the inclusion of additional covariates. As a result, our method has found taxa that are related to the prevention of intestinal inflammation and severity of fatigue in advanced metastatic cancer patients. CONCLUSION: SOHPIE-DNA is the first attempt at introducing the regression framework for DN analysis in microbiome data. This enables the prediction of characteristics of the connectivity of a network in the presence of additional covariate information in the regression. The R package with a vignette of our methodology is available through the CRAN repository ( https://CRAN.R-project.org/package=SOHPIE ), named SOHPIE (pronounced as Sofie). The source code and user manual can be found at https://github.com/sjahnn/SOHPIE-DNA .
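
The jackknife pseudo-value idea named in this abstract can be sketched in a few lines. The sketch below is illustrative only and is not the SOHPIE implementation: the network measure (`connectivity`, a sum of absolute Pearson correlations), the ordinary least-squares fit, and all function and variable names are assumptions chosen for brevity. The published method may use a different association estimator and a robust regression.

```python
import numpy as np

def connectivity(data):
    """Stand-in network measure (assumption): per-taxon sum of absolute
    Pearson correlations with every other taxon."""
    corr = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    return np.abs(corr).sum(axis=1)

def jackknife_pseudo_values(data, statistic):
    """Pseudo-value for subject i: n * theta_hat - (n - 1) * theta_hat_without_i."""
    n = data.shape[0]
    full = statistic(data)
    pseudo = np.empty((n, full.shape[0]))
    for i in range(n):
        leave_one_out = statistic(np.delete(data, i, axis=0))
        pseudo[i] = n * full - (n - 1) * leave_one_out
    return pseudo

def regress_pseudo_values(pseudo, covariates):
    """Least-squares fit of each taxon's pseudo-values on the covariates
    (ordinary least squares is used here as a simple stand-in)."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(design, pseudo, rcond=None)
    return coef  # rows: intercept + one per covariate; columns: taxa

# Hypothetical toy data: 30 subjects, 5 taxa, two covariates (group, age).
rng = np.random.default_rng(0)
abundance = rng.normal(size=(30, 5))
covariates = np.column_stack([rng.integers(0, 2, 30), rng.normal(50, 10, 30)])

pseudo = jackknife_pseudo_values(abundance, connectivity)
coef = regress_pseudo_values(pseudo, covariates)
```

Each subject receives one pseudo-value per taxon, so subject-level covariates can enter an ordinary regression even though the network itself is estimated from the whole sample.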


Subjects
Microbiota, Humans, Microbiota/genetics, Software, Regression Analysis, DNA
5.
Obes Surg ; 34(1): 1-14, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38040984

ABSTRACT

INTRODUCTION: Obesity affects millions of Americans. The vagal nerves convey the degree of stomach fullness to the brain via afferent visceral fibers. Studies have found that vagal nerve stimulation (VNS) promotes reduced food intake, causes weight loss, and reduces cravings and appetite. METHODS: Here, we evaluate the efficacy of a novel stimulus waveform applied bilaterally to the subdiaphragmatic vagal nerve stimulation (sVNS) for almost 13 weeks. A stimulating cuff electrode was implanted in obesity-prone Sprague Dawley rats maintained on a high-fat diet. Body weight, food consumption, and daily movement were tracked over time and compared against three control groups: sham rats on a high-fat diet that were implanted with non-operational cuffs, rats on a high-fat diet that were not implanted, and rats on a standard diet that were not implanted. RESULTS: Results showed that rats on a high-fat diet that received sVNS attained a similar weight to rats on a standard diet due primarily to a reduction in daily caloric intake. Rats on a high-fat diet that received sVNS had significantly less body fat than other high-fat controls. Rats receiving sVNS also began moving a similar amount to rats on the standard diet. CONCLUSION: Results from this study suggest that bilateral subdiaphragmatic vagal nerve stimulation can alter the rate of growth of rats maintained on a high-fat diet through a reduction in daily caloric intake, returning their body weight to that which is similar to rats on a standard diet over approximately 13 weeks.


Subjects
Morbid Obesity, Vagus Nerve Stimulation, Humans, Rats, Animals, Body Weight/physiology, Adiposity, Vagus Nerve Stimulation/adverse effects, Sprague-Dawley Rats, Morbid Obesity/surgery, Obesity/therapy, Obesity/etiology, High-Fat Diet, Vagus Nerve/physiology
6.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38134422

ABSTRACT

SUMMARY: The SOHPIE R package implements a novel functionality for "multivariable" differential co-abundance network (DN, hereafter) analyses of microbiome data. It incorporates a regression approach that adjusts for additional covariates in DN analyses. This distinguishes it from previous prominent approaches to DN analysis, such as MDiNE and NetCoMi, which do not feature covariate adjustment when finding taxa that are differentially connected (DC, hereafter) between individuals with different clinical and phenotypic characteristics. AVAILABILITY AND IMPLEMENTATION: SOHPIE with a vignette is available on the CRAN repository https://CRAN.R-project.org/package=SOHPIE and is published under the General Public License (GPL) version 3.


Subjects
Microbiota, Software, Humans
7.
BMC Genomics ; 24(1): 687, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37974076

ABSTRACT

BACKGROUND: Advances in sequencing technology and cost reduction have enabled the emergence of various statistical methods for RNA-sequencing data, including differential co-expression network analysis (or differential network analysis). A key benefit of this method is that it takes into consideration the interactions between or among genes and does not require established knowledge of biological pathways. As of now, no existing software can incorporate covariates that should be adjusted for if they are confounding factors when performing differential network analysis. RESULTS: We develop an R package, PRANA, in which a user can easily include multiple covariates. The main R function in this package leverages a novel pseudo-value regression approach for differential network analysis of RNA-sequencing data. The software also includes complementary R functions for extracting adjusted p-values and coefficient estimates of all or specific variables for each gene, as well as for identifying from the output the names of genes that are differentially connected (DC, hereafter) between subjects under biologically different conditions. CONCLUSION: We demonstrate the application of this package to a real dataset on chronic obstructive pulmonary disease. PRANA is available through the CRAN repositories under the GPL-3 license: https://cran.r-project.org/web/packages/PRANA/index.html .


Subjects
RNA, Software, Humans, Base Sequence, RNA Sequence Analysis
8.
Stat Methods Med Res ; 32(12): 2285-2298, 2023 12.
Article in English | MEDLINE | ID: mdl-37886856

ABSTRACT

We present a nonparametric method for estimating the conditional future state entry probabilities and distributions of state entry time conditional on a past state visit when data are subject to dependent censorings in a progressive multistate model where Markovianity of the system is not assumed. These estimators are constructed using the competing risk techniques with risk sets consisting of fractional observations and inverse probability of censoring weights. The fractional observations correspond to estimates of the number of persons who ultimately enter a state from which the future state in question can be reached in one step. We then address the corresponding regression problem by combining these marginal estimators with the pseudo-value approach. The performance of our regression scheme is studied using a comprehensive simulation study. An analysis of existing data on graft-versus-host disease for bone marrow transplant individuals is presented using our novel methodology. A second analysis of another well-known data set on burn patients is also included.


Subjects
Statistical Models, Humans, Regression Analysis, Probability, Computer Simulation
9.
Front Genet ; 14: 1235927, 2023.
Article in English | MEDLINE | ID: mdl-37662846

ABSTRACT

The COVID-19 pandemic caused by SARS-CoV-2 has resulted in millions of confirmed cases and deaths worldwide. Understanding the biological mechanisms of SARS-CoV-2 infection is crucial for the development of effective therapies. This study conducts differential expression (DE) analysis, pathway analysis, and differential network (DN) analysis on RNA-seq data of four lung cell lines, NHBE, A549, A549.ACE2, and Calu3, to identify their common and unique biological features in response to SARS-CoV-2 infection. DE analysis shows that cell line A549.ACE2 has the highest number of DE genes, while cell line NHBE has the lowest. Among the DE genes identified for the four cell lines, 12 genes overlap and are associated with various health conditions. The most significant signaling pathways vary among the four cell lines. Only one pathway, "cytokine-cytokine receptor interaction", is found to be significant among all four cell lines and is related to inflammation and immune response. The DN analysis reveals considerable variation in the differential connectivity of the most significant pathway shared among the four lung cell lines. These findings help to elucidate the mechanisms of SARS-CoV-2 infection and potential therapeutic targets.

10.
Front Rehabil Sci ; 4: 1189292, 2023.
Article in English | MEDLINE | ID: mdl-37484602

ABSTRACT

Objective: We tested Goal Management Training (GMT), which has been recommended as an executive training protocol that may improve the deficits in the complex tasks inherent in life role participation experienced by those with chronic mild traumatic brain injury and post-traumatic stress disorder (mTBI/PTSD). We assessed not only cognitive function but also life role participation (quality of life). Methods: We enrolled and treated 14 individuals, administered 10 GMT sessions in person, and provided the Veterans Task Manager (VTM), a smartphone app designed to serve as a "practice-buddy" device to ensure translation of in-person learning to independent home and community practice of complex tasks. The primary pre-/post-treatment measure was the NIH EXAMINER Unstructured Task. Secondary measures were Tower of London time to complete (cTOL) and three subdomains of the Community Reintegration of Service Members (CRIS): Extent of Participation, Limitations, and Satisfaction of Life Role Participation (Satisfaction). We analyzed pre-/post-treatment change with t-test models and generated descriptive statistics to inspect individual patterns of change across measures. Results: There was statistically significant improvement for the NIH EXAMINER Unstructured Task (p < .02; effect size = .67) and cTOL (p < .01; effect size = .52). There was a statistically significant improvement for two CRIS subdomains: Extent of Participation (p < .01; effect size = .75) and Limitations (p < .05; effect size = .59). Individuals varied in their treatment response across measures. Conclusions and Clinical Significance: In Veterans with mTBI/PTSD, in response to GMT and the VTM learning support buddy, there was significant improvement in executive cognition processes, sufficiently robust to produce significant improvement in community life role participation. The individual variations support the need for precision neurorehabilitation.
The positive results occurred in response to treatment advantages afforded by the content of the combined GMT and the employment of the VTM learning support buddy, with advantages including the following: manualized content of the GMT; incremental complex task difficulty; GMT structure and flexibility to incorporate individualized functional goals; and the VTM capability of ensuring translation of in-person instruction to home and community practice, solidifying newly learned executive cognitive processes. Study results support future study, including a potential randomized controlled trial, the manualized GMT and availability of the VTM to ensure future clinical deployment of treatment, as warranted.

11.
Stat Methods Med Res ; 32(8): 1494-1510, 2023 08.
Article in English | MEDLINE | ID: mdl-37323013

ABSTRACT

Multistate current status data presents a more severe form of censoring due to the single observation of study participants transitioning through a sequence of well-defined disease states at random inspection times. Moreover, these data may be clustered within specified groups, and informativeness of the cluster sizes may arise due to the existing latent relationship between the transition outcomes and the cluster sizes. Failure to adjust for this informativeness may lead to a biased inference. Motivated by a clinical study of periodontal disease, we propose an extension of the pseudo-value approach to estimate covariate effects on the state occupation probabilities for these clustered multistate current status data with informative cluster or intra-cluster group sizes. In our approach, the proposed pseudo-value technique initially computes marginal estimators of the state occupation probabilities utilizing nonparametric regression. Next, the estimating equations based on the corresponding pseudo-values are reweighted by functions of the cluster sizes to adjust for informativeness. We perform a variety of simulation studies to study the properties of our pseudo-value regression based on the nonparametric marginal estimators under different scenarios of informativeness. For illustration, the method is applied to the motivating periodontal disease dataset, which encapsulates the complex data-generation mechanism.


Subjects
Statistical Models, Periodontal Diseases, Humans, Cluster Analysis, Computer Simulation, Periodontal Diseases/epidemiology, Sample Size
12.
ArXiv ; 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36994149

ABSTRACT

A recent breakthrough in differential network (DN) analysis of microbiome data has been realized with the advent of next-generation sequencing technologies. The DN analysis disentangles the microbial co-abundance among taxa by comparing the network properties between two or more graphs under different biological conditions. However, the existing methods for DN analysis of microbiome data do not adjust for other clinical differences between subjects. We propose a Statistical Approach via Pseudo-value Information and Estimation for Differential Network Analysis (SOHPIE-DNA) that incorporates additional covariates such as continuous age and categorical BMI. SOHPIE-DNA is a regression technique adopting jackknife pseudo-values that can be implemented readily for the analysis. We demonstrate through simulations that SOHPIE-DNA consistently reaches higher recall and F1-score, while maintaining similar precision and accuracy to existing methods (NetCoMi and MDiNE). Lastly, we apply SOHPIE-DNA to two real datasets from the American Gut Project and the Diet Exchange Study to showcase its utility. The analysis of the Diet Exchange Study shows that SOHPIE-DNA can also be used to incorporate the temporal change of connectivity of taxa with the inclusion of additional covariates. As a result, our method has found taxa that are related to the prevention of intestinal inflammation and severity of fatigue in advanced metastatic cancer patients.

13.
Stat Med ; 42(13): 2162-2178, 2023 06 15.
Article in English | MEDLINE | ID: mdl-36973919

ABSTRACT

Informative cluster size (ICS) arises in situations with clustered data where a latent relationship exists between the number of participants in a cluster and the outcome measures. Although this phenomenon has been sporadically reported in the statistical literature for nearly two decades now, further exploration is needed in certain statistical methodologies to avoid potentially misleading inferences. For inference about population quantities without covariates, inverse cluster size reweightings are often employed to adjust for ICS. Further, to study the effect of covariates on disease progression described by a multistate model, the pseudo-value regression technique has gained popularity in time-to-event data analysis. We seek to answer the question: "How to apply pseudo-value regression to clustered time-to-event data when cluster size is informative?" ICS adjustment by the reweighting method can be performed in two steps: estimation of marginal functions of the multistate model and fitting the estimating equations based on pseudo-value responses, leading to four possible strategies. We present theoretical arguments and thorough simulation experiments to ascertain the correct strategy for adjusting for ICS. A further extension of our methodology is implemented to include informativeness induced by the intracluster group size. We demonstrate the methods in two real-world applications: (i) to determine predictors of tooth survival in a periodontal study and (ii) to identify indicators of ambulatory recovery in spinal cord injury patients who participated in locomotor-training rehabilitation.
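
As a rough illustration of the inverse cluster size reweighting mentioned in this abstract, the sketch below contrasts a pooled mean with a mean in which each observation is weighted by the reciprocal of its cluster size. It covers only the simplest case (a population mean without covariates), not the multistate pseudo-value regression studied in the paper; the function names and toy data are hypothetical.

```python
import numpy as np

def pooled_mean(values):
    """Unweighted mean: large clusters dominate the estimate."""
    return float(np.mean(values))

def inverse_cluster_size_mean(values, cluster_ids):
    """Weight each observation by 1 / (size of its cluster), so that every
    cluster contributes equally regardless of how many members it has."""
    values = np.asarray(values, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    _, inverse, counts = np.unique(
        cluster_ids, return_inverse=True, return_counts=True
    )
    weights = 1.0 / counts[inverse]
    return float(np.sum(weights * values) / np.sum(weights))

# Toy example: cluster "a" has four members, cluster "b" has two.
values = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0]
clusters = ["a", "a", "a", "a", "b", "b"]

naive = pooled_mean(values)
adjusted = inverse_cluster_size_mean(values, clusters)
```

With these weights the adjusted estimate equals the average of the cluster means, which is why it is not pulled toward the outcome level of the larger cluster.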


Subjects
Statistical Models, Tooth, Humans, Cluster Analysis, Computer Simulation, Regression Analysis
14.
BMC Bioinformatics ; 24(1): 8, 2023 Jan 09.
Article in English | MEDLINE | ID: mdl-36624383

ABSTRACT

BACKGROUND: The differential network (DN) analysis identifies changes in measures of association among genes under two or more experimental conditions. In this article, we introduce a pseudo-value regression approach for network analysis (PRANA). This is a novel method of differential network analysis that also adjusts for additional clinical covariates. We start from mutual information criteria, followed by pseudo-value calculations, which are then entered into a robust regression model. RESULTS: This article assesses the model performance of PRANA in a multivariable setting, followed by a comparison to dnapath and DINGO in both univariable and multivariable settings through a variety of simulations. Performance in terms of precision, recall, and F1 score of differentially connected (DC) genes is assessed. By and large, PRANA outperformed dnapath and DINGO, neither of which is equipped to adjust for available covariates such as patient age. Lastly, we employ PRANA in a real data application from the Gene Expression Omnibus database to identify DC genes that are associated with chronic obstructive pulmonary disease to demonstrate its utility. CONCLUSION: To the best of our knowledge, this is the first attempt at using regression modeling for DN analysis of collective gene expression levels between two or more groups with the inclusion of additional clinical covariates. By and large, adjusting for available covariates improves the accuracy of a DN analysis.


Subjects
Gene Expression Profiling, Gene Regulatory Networks, Humans, Gene Expression Profiling/methods
15.
Stat Med ; 2022 Dec 27.
Article in English | MEDLINE | ID: mdl-36574753

ABSTRACT

We propose a Bayesian hurdle mixed-effects model to analyze longitudinal ordinal data under a complex multilevel structure. This research was motivated by the dataset gathered from the Iowa Fluoride Study (IFS) in order to establish the relationships between fluorosis status and potential risk/protective factors. Dental fluorosis is characterized by spots on tooth enamel and is due to excessive fluoride intake during enamel formation. Observations are collected from multiple surface zones on each tooth and on all available teeth of children from the studied cohort, who are observed longitudinally at ages 9, 13, and 17. The data not only exhibit a complex hierarchical structure, but also have a large proportion of zero values that are likely to follow different statistical patterns from non-zero categories. Therefore, we develop a hurdle model to consider the zero category separately, while a proportional odds model is used for the positive categories. The estimated parameters are obtained from a Gibbs sampler implemented in the OpenBUGS software. Our model is compared with two popular methods for ordinal data: the proportional odds model and the partial proportional odds model. We perform a comprehensive analysis of the IFS data and evaluate the accuracy and effectiveness of our methodology through simulation studies. Our discoveries provide novel insights to statisticians and dental practitioners about the associations between patient and clinical characteristics and dental fluorosis.
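
The two-part structure described in this abstract (a separate model for the zero category, then a proportional odds model for the positive categories) can be illustrated by how category probabilities are assembled. This is a minimal sketch under stated assumptions: a logistic link for the hurdle part, fixed linear predictors supplied by the caller, and no random effects or Bayesian estimation. The function names, arguments, and numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np

def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def hurdle_ordinal_probs(eta_hurdle, eta_ordinal, cutpoints):
    """Probabilities for categories 0, 1, ..., K.

    eta_hurdle  : linear predictor for crossing the hurdle (Y > 0).
    eta_ordinal : linear predictor of the proportional odds part.
    cutpoints   : increasing thresholds separating the K positive categories
                  (K - 1 values).
    """
    p_positive = expit(eta_hurdle)
    # Cumulative probabilities P(Y <= k | Y > 0) under proportional odds.
    cumulative = expit(np.asarray(cutpoints, dtype=float) - eta_ordinal)
    cumulative = np.concatenate([[0.0], cumulative, [1.0]])
    positive_given_crossed = np.diff(cumulative)
    return np.concatenate([[1.0 - p_positive], p_positive * positive_given_crossed])

# Hypothetical values: three positive categories need two cutpoints.
probs = hurdle_ordinal_probs(eta_hurdle=-1.0, eta_ordinal=0.5, cutpoints=[0.0, 1.5])
```

The zero category gets its own probability, and the remaining mass is split among the positive categories by the ordinal part, so the zero inflation does not distort the proportional odds fit.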

16.
NeuroRehabilitation ; 49(4): 573-584, 2021.
Article in English | MEDLINE | ID: mdl-34806625

ABSTRACT

BACKGROUND: Gait deficits and functional disability are persistent problems for many stroke survivors, even after standard neurorehabilitation. There is little quantified information regarding the trajectories of response to a long-dose, 12-month intervention. OBJECTIVE: We quantified treatment response to an intensive neurorehabilitation mobility and fitness program. METHODS: The 12-month neurorehabilitation program targeted impairments in balance, limb coordination, gait coordination, and functional mobility, for five chronic stroke survivors. We obtained measures of those variables every two months. RESULTS: We found statistically and clinically significant group improvement in measures of impairment and function. There was high variation across individuals in terms of the timing and the gains exhibited. CONCLUSIONS: Long-duration neurorehabilitation (12 months) for mobility/fitness produced clinically and/or statistically significant gains in impairment and function. There was a unique pattern of change for each individual. Gains exhibited late in the treatment support a 12-month intervention. Some measures for some subjects did not reach a plateau at 12 months, justifying further investigation of a longer program (>12 months) of rehabilitation and/or maintenance care for stroke survivors.


Subjects
Stroke Rehabilitation, Stroke, Exercise Therapy, Gait, Humans, Quality of Life, Recovery of Function, Stroke/complications, Survivors
17.
PLoS One ; 16(11): e0259193, 2021.
Article in English | MEDLINE | ID: mdl-34767561

ABSTRACT

MOTIVATION: Gene expression data provide an opportunity for reverse-engineering gene-gene associations using network inference methods. However, it is difficult to assess the performance of these methods because the true underlying network is unknown in real data. Current benchmarks address this problem by subsampling a known regulatory network to conduct simulations. But the topology of regulatory networks can vary greatly across organisms or tissues, and reference-based generators, such as GeneNetWeaver, are not designed to capture this heterogeneity. This means, for example, that benchmark results from the E. coli regulatory network will not carry over to other organisms or tissues. In contrast, probabilistic generators do not require a reference network, and they have the potential to capture a rich distribution of topologies. This makes probabilistic generators an ideal approach for obtaining a robust benchmarking of network inference methods. RESULTS: We propose a novel probabilistic network generator that (1) provides an alternative that addresses the inherent limitation of reference-based generators, (2) is able to create realistic gene association networks, and (3) captures the heterogeneity found across gold-standard networks better than existing generators used in practice. Eight organism-specific and 12 human tissue-specific gold-standard association networks are considered. Several measures of global topology are used to determine the similarity of generated networks to the gold standards. Along with demonstrating the variability of network structure across organisms and tissues, we show that the commonly used "scale-free" model is insufficient for replicating these structures. AVAILABILITY: This generator is implemented in the R package "SeqNet" and is available on CRAN (https://cran.r-project.org/web/packages/SeqNet/index.html).


Subjects
Algorithms, Gene Regulatory Networks/genetics, Animals, Gene Expression, Humans, Markov Chains, Software
18.
Stat Med ; 40(28): 6410-6420, 2021 12 10.
Article in English | MEDLINE | ID: mdl-34496070

ABSTRACT

In studies following selective sampling protocols for secondary outcomes, conventional analyses regarding their appearance could provide misguided information. In the large type 1 diabetes prevention and prediction (DIPP) cohort study monitoring type 1 diabetes-associated autoantibodies, we propose to model their appearance via a multivariate frailty model, which incorporates a correlation component that is important for unbiased estimation of the baseline hazards under the selective sampling mechanism. As further advantages, the frailty model allows for systematic evaluation of the association and the differences in regression parameters among the autoantibodies. We demonstrate the properties of the model by a simulation study and the analysis of the autoantibodies and their association with background factors in the DIPP study, in which we found that high genetic risk is associated with the appearance of all the autoantibodies, whereas the association with sex and urban municipality was evident for IA-2A and IAA autoantibodies.


Subjects
Type 1 Diabetes Mellitus, Frailty, Autoantibodies/analysis, Cohort Studies, Humans, Risk Factors
19.
Front Genet ; 12: 642759, 2021.
Article in English | MEDLINE | ID: mdl-34497631

ABSTRACT

The tumor microenvironment is composed of tumor cells, stroma cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are an average over cells in the microenvironment. However, research questions often seek answers about tumor cells rather than the surrounding non-tumor tissue. Previous studies have suggested that the tumor purity (TP), the proportion of tumor cells in a solid tumor sample, has a confounding effect on differential expression (DE) analysis of high vs. low survival groups. We investigate three ways of incorporating the TP information in two statistical methods used for analyzing gene expression data, namely, differential network (DN) analysis and DE analysis. Analysis 1 ignores the TP information completely, Analysis 2 uses a truncated sample by removing the low TP samples, and Analysis 3 uses TP as a covariate in the underlying statistical models. We use three gene expression data sets related to three different cancers from the Cancer Genome Atlas (TCGA) for our investigation. The networks from Analysis 2 show a greater amount of differential connectivity between the two networks than those from Analysis 1 in all three cancer datasets. Similarly, Analysis 1 identified more differentially expressed genes than Analysis 2. Results of DN and DE analyses using Analysis 3 were mostly consistent with those of Analysis 1 across the three cancers. However, Analysis 3 identified additional cancer-related genes in both DN and DE analyses. Our findings suggest that using TP as a covariate in a linear model is appropriate for DE analysis, but a more robust model is needed for DN analysis. However, because true DN or DE patterns are not known for the empirical datasets, simulated datasets can be used to study the statistical properties of these methods in future studies.
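
The "Analysis 3" strategy in this abstract, using tumor purity as a covariate in a linear model for DE analysis, can be sketched for a single gene as follows. This is an assumption-laden illustration rather than the authors' pipeline: it fits ordinary least squares with NumPy on made-up, noise-free data, and the function name and variables are hypothetical.

```python
import numpy as np

def fit_de_with_purity(expression, group, purity):
    """Least-squares fit of expression ~ intercept + group + tumor purity.

    Returns the coefficients in that order; the group coefficient is the
    purity-adjusted expression difference between the two survival groups.
    """
    design = np.column_stack([np.ones_like(purity), group, purity])
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return coef

# Hypothetical data: six samples, group 0 = low survival, group 1 = high survival.
group = np.array([0, 0, 0, 1, 1, 1], dtype=float)
purity = np.array([0.9, 0.6, 0.8, 0.5, 0.7, 0.4])
expression = 2.0 + 1.5 * group + 3.0 * purity  # noise-free for clarity

coef = fit_de_with_purity(expression, group, purity)
```

Because purity enters the design matrix, the group effect is estimated at a fixed purity level instead of absorbing purity differences between the groups.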

20.
J Stat Softw ; 98(12)2021 Jul.
Article in English | MEDLINE | ID: mdl-34321962

ABSTRACT

Gene expression data provide an abundant resource for inferring connections in gene regulatory networks. While methodologies developed for this task have shown success, a challenge remains in comparing the performance among methods. Gold-standard datasets are scarce and limited in use. And while tools for simulating expression data are available, they are not designed to resemble the data obtained from RNA-seq experiments. SeqNet is an R package that provides tools for generating a rich variety of gene network structures and simulating RNA-seq data from them. This produces in silico RNA-seq data for benchmarking and assessing gene network inference methods. The package is available on CRAN and on GitHub at https://github.com/tgrimes/SeqNet.
