Results 1 - 20 of 1,384
1.
PLoS Comput Biol ; 20(9): e1012448, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39259748

ABSTRACT

Large-scale studies of gene expression are commonly influenced by biological and technical sources of expression variation, including batch effects, sample characteristics, and environmental impacts. Learning the causal relationships between observable variables may be challenging in the presence of unobserved confounders, and many high-dimensional regression techniques may perform worse under such confounding. Controlling for unobserved confounding variables is therefore essential, and many deconfounding methods have been suggested for application in a variety of situations. The main contribution of this article is the development of a two-stage deconfounding procedure based on Bow-free Acyclic Paths (BAP) search, developed within the framework of Structural Equation Models (SEM) and called SEMbap(). In the first stage, an exhaustive search of missing edges with significant covariance is performed via Shipley d-separation tests; then, in the second stage, a Constrained Gaussian Graphical Model (CGGM) is fitted or a low-dimensional representation of the bow-free edge structure is obtained via Graph Laplacian Principal Component Analysis (gLPCA). We compare four popular deconfounding methods to the BAP search approach with applications on simulated and observed expression data. In the former, different structures of the hidden covariance matrix have been replicated. Compared to existing methods, the BAP search algorithm is able to correctly identify hidden confounding whilst controlling the false positive rate and achieving good fitting and perturbation metrics.


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , Humans , Principal Component Analysis , Computer Simulation , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Models, Statistical , Correlation of Data , Normal Distribution
2.
PLoS Comput Biol ; 20(9): e1012301, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39226325

ABSTRACT

Clustering is widely used in bioinformatics and many other fields, with applications from exploratory analysis to prediction. Many types of data have associated uncertainty or measurement error, but this is rarely used to inform the clustering. We present Dirichlet Process Mixtures with Uncertainty (DPMUnc), an extension of a Bayesian nonparametric clustering algorithm which makes use of the uncertainty associated with data points. We show that DPMUnc out-performs existing methods on simulated data. We cluster immune-mediated diseases (IMD) using GWAS summary statistics, which have uncertainty linked with the sample size of the study. DPMUnc separates autoimmune from autoinflammatory diseases and isolates other subgroups such as adult-onset arthritis. We additionally consider how DPMUnc can be used to cluster gene expression datasets that have been summarised using gene signatures. We first introduce a novel procedure for generating a summary of a gene signature on a dataset different to the one where it was discovered, which incorporates a measure of the variability in expression across signature genes within each individual. We summarise three public gene expression datasets containing patients with a range of IMD, using three relevant gene signatures. We find association between disease and the clusters returned by DPMUnc, with clustering structure replicated across the datasets. The significance of this work is two-fold. Firstly, we demonstrate that when data has associated uncertainty, this uncertainty should be used to inform clustering and we present a method which does this, DPMUnc. Secondly, we present a procedure for using gene signatures in datasets other than where they were originally defined. We show the value of this procedure by summarising gene expression data from patients with immune-mediated diseases using relevant gene signatures, and clustering these patients using DPMUnc.


Subject(s)
Algorithms , Bayes Theorem , Computational Biology , Humans , Cluster Analysis , Uncertainty , Computational Biology/methods , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Gene Expression Profiling/methods , Databases, Genetic/statistics & numerical data , Computer Simulation
3.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39248122

ABSTRACT

The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset.


Subject(s)
Algorithms , Breast Neoplasms , Computer Simulation , Humans , Multivariate Analysis , Breast Neoplasms/genetics , Models, Statistical , Female , Data Interpretation, Statistical , Gene Expression Profiling/statistics & numerical data , Sample Size , Biometry/methods
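
The geometric median named in the abstract above is the point that minimizes the sum of Euclidean distances to the observations. The following is a minimal sketch of one common way to compute it, the Weiszfeld iteration; it is not taken from the paper, and the function name `geometric_median`, its tolerances, and the shortcut of skipping an observation that coincides with the current iterate are illustrative assumptions.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median of a list of equal-length tuples.

    Hypothetical helper using the Weiszfeld iteration: each step is a
    weighted mean of the points, with weights 1 / distance to the iterate.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        weight_sum = 0.0
        numerator = [0.0] * dim
        for p in points:
            dist = math.dist(p, current)
            if dist < 1e-12:
                # Simplification (assumption): skip a point that coincides
                # with the iterate to avoid dividing by zero.
                continue
            w = 1.0 / dist
            weight_sum += w
            for k in range(dim):
                numerator[k] += w * p[k]
        if weight_sum == 0.0:
            return current
        updated = [v / weight_sum for v in numerator]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current
```

In one dimension the result coincides with the ordinary median, which is why a single extreme value barely moves it: for the points 0, 1 and 100 the iteration settles near 1 rather than near the mean.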
4.
J Bioinform Comput Biol ; 22(4): 2450014, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39183679

ABSTRACT

Cancer subtyping refers to categorizing a particular cancer type into distinct subtypes or subgroups based on a range of molecular characteristics, clinical manifestations, histological features, and other relevant factors. The identification of cancer subtypes can significantly enhance precision in clinical practice and facilitate personalized diagnosis and treatment strategies. Recent advancements in the field have witnessed the emergence of numerous network fusion methods aimed at identifying cancer subtypes. The majority of these fusion algorithms, however, solely rely on the fusion network of a single core matrix for the identification of cancer subtypes and fail to comprehensively capture similarity. To tackle this issue, in this study, we propose a novel cancer subtype recognition method, referred to as PCA-constrained multi-core matrix fusion network (PCA-MM-FN). The PCA-MM-FN algorithm initially employs three distinct methods to obtain three core matrices. Subsequently, the obtained core matrices are projected into a shared subspace using principal component analysis, followed by a weighted network fusion. Lastly, spectral clustering is conducted on the fused network. The results obtained from conducting experiments on the mRNA expression, DNA methylation, and miRNA expression of five TCGA datasets and three multi-omics benchmark datasets demonstrate that the proposed PCA-MM-FN approach exhibits superior accuracy in identifying cancer subtypes compared to the existing methods.


Subject(s)
Algorithms , Computational Biology , DNA Methylation , MicroRNAs , Neoplasms , Principal Component Analysis , Humans , Neoplasms/genetics , Neoplasms/classification , MicroRNAs/genetics , Computational Biology/methods , Cluster Analysis , RNA, Messenger/genetics , RNA, Messenger/metabolism , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Databases, Genetic
5.
PLoS Comput Biol ; 20(8): e1012339, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39116191

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool in genomics research, enabling the analysis of gene expression at the individual cell level. However, scRNA-seq data often suffer from a high rate of dropouts, where certain genes fail to be detected in specific cells due to technical limitations. This missing data can introduce biases and hinder downstream analysis. To overcome this challenge, the development of effective imputation methods has become crucial in the field of scRNA-seq data analysis. Here, we propose an imputation method based on robust and non-negative matrix factorization (scRNMF). Unlike other matrix factorization algorithms, scRNMF integrates two loss functions: L2 loss and C-loss. The L2 loss function is highly sensitive to outliers, which can introduce substantial errors. We utilize the C-loss function when dealing with zero values in the raw data. The primary advantage of the C-loss function is that it imposes a smaller penalty for larger errors, which results in more robust factorization when handling outliers. Various datasets of different sizes and zero rates are used to evaluate the performance of scRNMF against other state-of-the-art methods. Our method demonstrates its power and stability as a tool for imputation of scRNA-seq data.


Subject(s)
Algorithms , Computational Biology , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Computational Biology/methods , Humans , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Software , Single-Cell Gene Expression Analysis
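
The contrast the abstract above draws between the L2 loss and the C-loss can be illustrated numerically. This is a sketch under assumptions: the abstract does not give the formula, so the code uses the unnormalized correntropy-induced form 1 - exp(-e^2 / (2 sigma^2)), and the names `l2_loss`, `c_loss`, and `sigma` are chosen here for illustration only.

```python
import math


def l2_loss(error):
    """Squared error: grows without bound as the error grows."""
    return error ** 2


def c_loss(error, sigma=1.0):
    """Correntropy-induced loss (assumed form, not taken from the paper).

    Behaves like a scaled squared error near zero and saturates at 1,
    so very large errors add almost no extra penalty.
    """
    return 1.0 - math.exp(-(error ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    for e in (0.1, 1.0, 5.0, 50.0):
        print(f"error={e:>5}: L2={l2_loss(e):10.3f}  C-loss={c_loss(e):.6f}")
```

Because the C-loss is bounded, an outlying entry can shift the fitted factors only by a limited amount, whereas under the L2 loss its influence grows with the square of the error.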
6.
Bull Math Biol ; 86(9): 105, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38995438

ABSTRACT

The growing complexity of biological data has spurred the development of innovative computational techniques to extract meaningful information and uncover hidden patterns within vast datasets. Biological networks, such as gene regulatory networks and protein-protein interaction networks, hold critical insights into biological features' connections and functions. Integrating and analyzing high-dimensional data, particularly in gene expression studies, stands prominent among the challenges in deciphering these networks. Clustering methods play a crucial role in addressing these challenges, with spectral clustering emerging as a potent unsupervised technique considering intrinsic geometric structures. However, spectral clustering's user-defined cluster number can lead to inconsistent and sometimes orthogonal clustering regimes. We propose the Multi-layer Bundling (MLB) method to address this limitation, combining multiple prominent clustering regimes to offer a comprehensive data view. We call the outcome clusters "bundles". This approach refines clustering outcomes, unravels hierarchical organization, and identifies bridge elements mediating communication between network components. By layering clustering results, MLB provides a global-to-local view of biological feature clusters enabling insights into intricate biological systems. Furthermore, the method enhances bundle network predictions by integrating the bundle co-cluster matrix with the affinity matrix. The versatility of MLB extends beyond biological networks, making it applicable to various domains where understanding complex relationships and patterns is needed.


Subject(s)
Algorithms , Computational Biology , Gene Regulatory Networks , Mathematical Concepts , Protein Interaction Maps , Cluster Analysis , Humans , Models, Biological , Gene Expression Profiling/statistics & numerical data , Gene Expression Profiling/methods
7.
PLoS Comput Biol ; 20(7): e1011620, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38976751

ABSTRACT

Boolean networks are largely employed to model the qualitative dynamics of cell fate processes by describing the change of binary activation states of genes and transcription factors with time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone for data-driven model construction and validation. On one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models. On the other hand, the generation of synthetic scRNA-seq data from baseline Boolean models provides an important asset to benchmark inference methods. However, linking characteristics of scRNA-seq datasets, including dropout events, with Boolean states is a challenging task. We present scBoolSeq, a method for the bidirectional linking of scRNA-seq data and Boolean activation state of genes. Given a reference scRNA-seq dataset, scBoolSeq computes statistical criteria to classify the empirical gene pseudocount distributions as either unimodal, bimodal, or zero-inflated, and fits a probabilistic model of dropouts with gene-dependent parameters. From these learnt distributions, scBoolSeq can both binarise scRNA-seq datasets and generate synthetic scRNA-seq datasets from Boolean traces, as issued from Boolean networks, using biased sampling and dropout simulation. We present a case study demonstrating the application of scBoolSeq's binarisation scheme in data-driven model inference. Furthermore, we compare synthetic scRNA-seq data generated by scBoolSeq with BoolODE's data for the same Boolean network model. The comparison shows that our method better reproduces the statistics of real scRNA-seq datasets, such as the mean-variance and mean-dropout relationships, while exhibiting clearly defined trajectories in two-dimensional projections of the data.


Subject(s)
Computational Biology , Single-Cell Analysis , Computational Biology/methods , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Humans , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Algorithms , Gene Regulatory Networks/genetics , Models, Statistical , Software , Single-Cell Gene Expression Analysis
8.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39007596

ABSTRACT

Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering with other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, it provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.


Subject(s)
Algorithms , Computational Biology , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans
9.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39073775

ABSTRACT

Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications.


Subject(s)
Bayes Theorem , Computer Simulation , Gene Expression Profiling , Cluster Analysis , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans , Transcriptome , Markov Chains , Models, Statistical , Data Interpretation, Statistical
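
The zero-inflated negative binomial (ZINB) distribution mentioned in the abstract above mixes a point mass at zero with a negative binomial count distribution. The sketch below shows only the probability mass function of a single ZINB component, not the paper's mixture model, feature selection, or Markov random field prior; the mean/dispersion parameterization and the names `nb_pmf` and `zinb_pmf` are assumptions made for illustration.

```python
import math


def nb_pmf(k, mean, dispersion):
    """Negative binomial pmf with variance mean + mean**2 / dispersion."""
    r = dispersion
    p = r / (r + mean)  # success probability in the (r, p) parameterization
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k, mean, dispersion, zero_prob):
    """ZINB pmf: with probability zero_prob the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    base = (1.0 - zero_prob) * nb_pmf(k, mean, dispersion)
    return zero_prob + base if k == 0 else base
```

Setting `zero_prob` to 0 recovers the plain negative binomial, so the extra parameter only adds probability at zero, which is the excess of zeros the model is meant to absorb.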
10.
Stat Appl Genet Mol Biol ; 23(1)2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38810893

ABSTRACT

This article addresses the limitations of existing statistical models in analyzing and interpreting highly skewed miRNA-seq raw read count data that can range from zero to millions. A heavy-tailed model using discrete stable distributions is proposed as a novel approach to better capture the heterogeneity and extreme values commonly observed in miRNA-seq data. Additionally, the parameters of the discrete stable distribution are proposed as an alternative target for differential expression analysis. An R package for computing and estimating the discrete stable distribution is provided. The proposed model is applied to miRNA-seq raw counts from the Norwegian Women and Cancer Study (NOWAC) and the Cancer Genome Atlas (TCGA) databases. The goodness-of-fit is compared with the popular Poisson and negative binomial distributions, and the discrete stable distributions are found to give a better fit for both datasets. In conclusion, the use of discrete stable distributions is shown to potentially lead to more accurate modeling of the underlying biological processes.


Subject(s)
MicroRNAs , Models, Statistical , MicroRNAs/genetics , Humans , Female , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Neoplasms/genetics , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Software
11.
PLoS Comput Biol ; 20(5): e1012014, 2024 May.
Article in English | MEDLINE | ID: mdl-38809943

ABSTRACT

Recent advances in single-cell technologies have enabled high-resolution characterization of tissue and cancer compositions. Although numerous tools for dimension reduction and clustering are available for single-cell data analyses, these methods often fail to simultaneously preserve local cluster structure and global data geometry. To address these challenges, we developed a novel analysis framework, Single-Cell Path Metrics Profiling (scPMP), using power-weighted path metrics, which measure distances between cells in a data-driven way. Unlike Euclidean distance and other commonly used distance metrics, path metrics are density sensitive and respect the underlying data geometry. By combining path metrics with multidimensional scaling, a low-dimensional embedding of the data is obtained which preserves both the global data geometry and cluster structure. We evaluate the method both for clustering quality and geometric fidelity, and it outperforms current scRNAseq clustering algorithms on a wide range of benchmarking data sets.


Subject(s)
Algorithms , Computational Biology , Single-Cell Analysis , Cluster Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Humans , Computational Biology/methods , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Single-Cell Gene Expression Analysis
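
A power-weighted path metric, as named in the abstract above, can be sketched as a shortest-path problem. The code assumes the usual definition from the literature, namely the minimum over paths through the data points of the p-th root of the summed p-th powers of the hop lengths; the abstract itself does not state the formula, and the function name `path_metric` and the dense all-pairs graph are illustrative choices rather than the scPMP implementation.

```python
import heapq
import math


def path_metric(points, source, target, p=2.0):
    """Power-weighted path distance between points[source] and points[target].

    Runs Dijkstra on the complete graph whose edge weights are the Euclidean
    hop lengths raised to the power p (assumed definition, see lead-in).
    """
    n = len(points)
    best = [math.inf] * n
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        cost, i = heapq.heappop(heap)
        if cost > best[i]:
            continue  # stale heap entry
        if i == target:
            break
        for j in range(n):
            if j == i:
                continue
            candidate = cost + math.dist(points[i], points[j]) ** p
            if candidate < best[j]:
                best[j] = candidate
                heapq.heappush(heap, (candidate, j))
    return best[target] ** (1.0 / p)
```

With p greater than 1, many short hops through a densely sampled region cost less than one long jump, which is what makes the metric density sensitive; with p equal to 1 it reduces to the Euclidean distance.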
12.
Pathol Res Pract ; 231: 153780, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35101714

ABSTRACT

miR-145-5p is a microRNA whose role in diverse disorders has been verified. This miRNA is encoded by the MIR145 gene on chromosome 5. It is mainly considered a tumor suppressor miRNA in diverse types of cancers, including bladder cancer, breast cancer, cervical cancer, cholangiocarcinoma, renal cancer, and gastrointestinal cancers. However, a few studies have reported up-regulation of this miRNA in some cancers. Moreover, it has been shown to affect the pathogenesis of a number of non-malignant conditions such as aplastic anemia, asthma, cerebral ischemia/reperfusion injury, diabetic nephropathy, rheumatoid arthritis and Sjögren syndrome. In the current review, we summarize the available literature about the role of miR-145-5p in these conditions.


Subject(s)
Breast Neoplasms/genetics , MicroRNAs/metabolism , Stomach Neoplasms/genetics , Urinary Bladder Neoplasms/genetics , Breast Neoplasms/etiology , Breast Neoplasms/physiopathology , Down-Regulation/genetics , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans , MicroRNAs/analysis , MicroRNAs/genetics , Stomach Neoplasms/etiology , Stomach Neoplasms/physiopathology , Urinary Bladder Neoplasms/etiology , Urinary Bladder Neoplasms/physiopathology
13.
J Diabetes Res ; 2022: 3511329, 2022.
Article in English | MEDLINE | ID: mdl-35155683

ABSTRACT

Type 1 diabetes (T1D) arises from autoimmune-mediated destruction of insulin-producing β-cells leading to impaired insulin secretion and hyperglycemia. T1D is accompanied by DNA damage, oxidative stress, and inflammation, although there is still scarce information about the oxidative stress response and DNA repair in T1D pathogenesis. We used the microarray method to assess mRNA expression profiles in peripheral blood mononuclear cells (PBMCs) of 19 T1D patients compared to 11 controls and to identify mRNA targets of microRNAs that were previously reported for T1D patients. We found 277 differentially expressed genes (220 upregulated and 57 downregulated) in T1D patients compared to controls. Analysis by gene sets (GSA and GSEA) showed an upregulation of processes linked to ROS generation, oxidative stress, inflammation, cell death, ER stress, and DNA repair in T1D patients. In addition, genes related to oxidative stress responses and DNA repair (PTGS2, ATF3, FOSB, DUSP1, and TNFAIP3) were found to be targets of four microRNAs (hsa-miR-101, hsa-miR-148a, hsa-miR-27b, and hsa-miR-424). The expression levels of these mRNAs and microRNAs were confirmed by qRT-PCR. Therefore, the present study on differential expression profiles indicates relevant biological functions related to oxidative stress response, DNA repair, inflammation, and apoptosis in PBMCs of T1D patients relative to controls. We also report new insights regarding microRNA-mRNA interactions, which may play important roles in T1D pathogenesis.


Subject(s)
Diabetes Mellitus, Type 1/drug therapy , MicroRNAs/pharmacology , Adolescent , Adult , Cell Death/drug effects , Cell Death/genetics , DNA Repair/drug effects , DNA Repair/genetics , Diabetes Mellitus, Type 1/metabolism , Diabetes Mellitus, Type 1/physiopathology , Female , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans , Inflammation/drug therapy , Inflammation/genetics , Male , MicroRNAs/metabolism , MicroRNAs/therapeutic use , Oxidative Stress/drug effects , Oxidative Stress/genetics , Up-Regulation
15.
Genome Med ; 14(1): 18, 2022 02 21.
Article in English | MEDLINE | ID: mdl-35184750

ABSTRACT

BACKGROUND: Measuring host gene expression is a promising diagnostic strategy to discriminate bacterial and viral infections. Multiple signatures of varying size, complexity, and target populations have been described. However, there is little information to indicate how the performance of various published signatures compare to one another. METHODS: This systematic comparison of host gene expression signatures evaluated the performance of 28 signatures, validating them in 4589 subjects from 51 publicly available datasets. Thirteen COVID-specific datasets with 1416 subjects were included in a separate analysis. Individual signature performance was evaluated using the area under the receiver operating characteristic curve (AUC) value. Overall signature performance was evaluated using median AUCs and accuracies. RESULTS: Signature performance varied widely, with median AUCs ranging from 0.55 to 0.96 for bacterial classification and 0.69-0.97 for viral classification. Signature size varied (1-398 genes), with smaller signatures generally performing more poorly (P < 0.04). Viral infection was easier to diagnose than bacterial infection (84% vs. 79% overall accuracy, respectively; P < .001). Host gene expression classifiers performed more poorly in some pediatric populations (3 months-1 year and 2-11 years) compared to the adult population for both bacterial infection (73% and 70% vs. 82%, respectively; P < .001) and viral infection (80% and 79% vs. 88%, respectively; P < .001). We did not observe classification differences based on illness severity as defined by ICU admission for bacterial or viral infections. The median AUC across all signatures for COVID-19 classification was 0.80 compared to 0.83 for viral classification in the same datasets. CONCLUSIONS: In this systematic comparison of 28 host gene expression signatures, we observed differences based on a signature's size and characteristics of the validation population, including age and infection type. However, populations used for signature discovery did not impact performance, underscoring the redundancy among many of these signatures. Furthermore, differential performance in specific populations may only be observable through this type of large-scale validation.


Subject(s)
Bacterial Infections/diagnosis , Datasets as Topic/statistics & numerical data , Host-Pathogen Interactions/genetics , Transcriptome , Virus Diseases/diagnosis , Adult , Bacterial Infections/epidemiology , Bacterial Infections/genetics , Biomarkers/analysis , COVID-19/diagnosis , COVID-19/genetics , Child , Cohort Studies , Diagnosis, Differential , Gene Expression Profiling/statistics & numerical data , Genetic Association Studies/statistics & numerical data , Humans , Publications/statistics & numerical data , SARS-CoV-2/pathogenicity , Validation Studies as Topic , Virus Diseases/epidemiology , Virus Diseases/genetics
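
The AUC used in the abstract above to score each signature has a simple rank interpretation: it is the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc` and the example scores are hypothetical and are not data from the study.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical signature scores for infected cases and controls.
    print(auc([0.8, 0.3], [0.5, 0.1]))
```

An AUC of 0.5 corresponds to a signature that ranks cases and controls no better than chance, and 1.0 to perfect separation, which gives a scale for reading the reported median AUCs.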
16.
Comput Math Methods Med ; 2022: 5777946, 2022.
Article in English | MEDLINE | ID: mdl-35096131

ABSTRACT

BACKGROUND: Smoking is one of the risk factors of coronary heart disease (CHD), but its underlying mechanism is less well defined. PURPOSE: To identify and verify 6 key genes of CHD related to smoking through weighted gene coexpression network analysis (WGCNA), protein-protein interaction (PPI) network analysis, and pathway analysis. METHODS: CHD patients' samples were first downloaded from Gene Expression Omnibus (GEO). Then, genes of interest were obtained by analysis of variance (ANOVA). Thereafter, 23 coexpressed modules were determined after genes with similar expression were grouped via WGCNA. The biological functions of genes in the modules were investigated by enrichment analysis. Pearson correlation analysis and PPI network analysis were used to screen core genes related to smoking in CHD. RESULTS: The violet module was the most significantly associated with smoking (r = -0.28, p = 0.006). Genes in this module mainly participated in biological functions related to the heart. Altogether, 6 smoking-related core genes were identified through bioinformatics analyses. Their expression was measured in animal models. CONCLUSION: This study identified 6 core genes to serve as underlying biomarkers for monitoring and predicting smokers' CHD risk.


Subject(s)
Coronary Disease/etiology , Coronary Disease/genetics , Gene Regulatory Networks , Smoking/adverse effects , Smoking/genetics , Analysis of Variance , Animals , Computational Biology , Databases, Genetic , Disease Models, Animal , Gene Expression Profiling/statistics & numerical data , Heart Disease Risk Factors , Humans , Male , Mice , Mice, Inbred BALB C , Protein Interaction Maps/genetics
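
The module-trait association reported in the abstract above (r = -0.28) is a Pearson correlation between a per-sample module summary and smoking status. A minimal sketch of that coefficient follows; the function name `pearson_r` and the toy inputs are hypothetical, and how the study summarized each module (for example, by an eigengene) is not stated in the abstract.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical module summary per sample vs. smoking status (1 = smoker).
    module_summary = [0.4, 0.1, -0.2, -0.3, 0.2, -0.1]
    smoker = [0, 0, 1, 1, 0, 1]
    print(round(pearson_r(module_summary, smoker), 3))
```

A negative coefficient, as in the abstract, means the module summary tends to be lower in smokers than in non-smokers.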
17.
Comput Math Methods Med ; 2022: 2021613, 2022.
Article in English | MEDLINE | ID: mdl-35069777

ABSTRACT

BACKGROUND: Hepatocellular carcinoma (HCC) is predominant among all types of primary liver cancers characterised by high morbidity and mortality. Genes in the mediator complex (MED) family are engaged in the tumour-immune microenvironment and function as regulatory hubs mediating carcinogenesis and progression across diverse cancer types. Whereas research studies have been conducted to examine the mechanisms in several cancers, studies that systematically focused on the therapeutic and prognostic values of MED in patients with HCC are limited. METHODS: The online databases ONCOMINE, GEPIA, UALCAN, GeneMANIA, cBioPortal, OmicStudio, STRING, Metascape, and TIMER were used in this study. RESULTS: The transcriptional levels of all members of the MED family in HCC presented an aberrant high expression pattern. Significant correlations were found between the MED1, MED6, MED8, MED10, MED12, MED15, MED17, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, and MED27 expression levels and the pathological stage in the patients with HCC. High expression levels of MED6, MED8, MED10, MED17, MED19, MED20, MED21, MED22, MED24, and MED25 were significantly associated with poor prognosis. Functional enrichment analysis revealed that the members of the MED family were mainly enriched in the nucleobase-containing compound catabolic process, regulation of chromosome organisation, and transcriptional regulation by TP53. Significant correlations were found between the MED6, MED8, MED10, MED17, MED19, MED20, MED21, MED22, MED24, and MED25 expression levels and all types of immune cells (B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and dendritic cells). B cells and MED8 were independent predictors of overall survival. We found significant correlations between the somatic copy number alterations of the MED6, MED8, MED10, MED20, MED21, MED22, MED24, and MED25 molecules and the abundance of immune infiltrates. CONCLUSIONS: Our study delineated a thorough landscape to investigate the therapeutic and prognostic potentials of the MED family for HCC cases, which yielded promising results for the development of immunotherapeutic drugs and construction of a prognostic stratification model.


Subject(s)
Biomarkers, Tumor/genetics , Carcinoma, Hepatocellular/genetics , Liver Neoplasms/genetics , Mediator Complex/genetics , Biomarkers, Tumor/immunology , Carcinoma, Hepatocellular/immunology , Computational Biology , Databases, Genetic , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Kaplan-Meier Estimate , Liver Neoplasms/immunology , Mediator Complex/immunology , Multigene Family , Prognosis , Protein Interaction Maps/genetics , Protein Interaction Maps/immunology , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology
18.
Comput Math Methods Med ; 2022: 6609901, 2022.
Article in English | MEDLINE | ID: mdl-35069789

ABSTRACT

Intervertebral disc degeneration (IDD) is a major cause of lower back pain, but its molecular mechanism remains unclear. Gene expression profiles and clinical traits were downloaded from the Gene Expression Omnibus (GEO) database. First, weighted gene coexpression network analysis (WGCNA) was used to screen IDD-related genes. Next, least absolute shrinkage and selection operator (LASSO) logistic regression and support vector machine recursive feature elimination (SVM-RFE) algorithms were used to identify characteristic genes. We then investigated the immune landscape with the Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) algorithm and examined the correlations between key characteristic genes and infiltrating immune cells. Finally, a competing endogenous RNA (ceRNA) network was established to show the regulatory mechanisms of characteristic genes. A total of 2458 genes were identified by WGCNA, and 48 of them were disordered. After overlapping the genes obtained by the LASSO and SVM-RFE algorithms, genes including LINC01347, ASAP1-IT1, lnc-SEPT7L-1, B3GNT8, CHRNB3, CLEC4F, LOC102724000, SERINC2, and LOC102723649 were identified as characteristic genes of IDD. Differential analysis further identified ASAP1-IT1 and SERINC2 as key characteristic genes, and the expression of both was related to the proportions of gamma delta T cells and neutrophils. A ceRNA network was then established to show the regulatory mechanisms of ASAP1-IT1 and SERINC2. In conclusion, the present study identified ASAP1-IT1 and SERINC2 as the key characteristic genes of IDD through integrative bioinformatic analyses, which may contribute to the diagnosis and treatment of IDD.


Subject(s)
Gene Regulatory Networks , Intervertebral Disc Degeneration/genetics , Adaptor Proteins, Signal Transducing/genetics , Algorithms , Computational Biology , Databases, Genetic/statistics & numerical data , Down-Regulation , Gene Expression Profiling/statistics & numerical data , Humans , Intervertebral Disc Degeneration/blood , Intervertebral Disc Degeneration/immunology , Membrane Proteins/genetics , RNA/blood , RNA/genetics , Up-Regulation
19.
J Comput Biol ; 29(1): 23-26, 2022 01.
Article in English | MEDLINE | ID: mdl-35020490

ABSTRACT

scDesign2 is a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. This article shows how to download and install the scDesign2 R package, how to fit probabilistic models (one per cell type) to real data and simulate synthetic data from the fitted models, and how to use scDesign2 to guide experimental design and benchmark computational methods. Finally, a note is given about cell clustering as a preprocessing step before model fitting and data simulation.
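
As a rough illustration of what "count data with gene correlations captured" can mean, the sketch below simulates two correlated count genes with a Gaussian copula and Poisson marginals. It is a toy under stated assumptions and is not the scDesign2 R API: the function names (`poisson_quantile`, `simulate_counts`, `pearson`), the Poisson marginals, and all parameter values are hypothetical choices made for this example.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam, max_k=1000):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam) (capped at max_k)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u and k < max_k:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_counts(n_cells, lam1, lam2, rho, seed=0):
    """Draw (gene1, gene2) counts whose dependence comes from a Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    counts = []
    for _ in range(n_cells):
        # Correlated standard normals with correlation rho (assumed value).
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        # Map each normal to a uniform, then to a Poisson count.
        counts.append((poisson_quantile(phi(z1), lam1),
                       poisson_quantile(phi(z2), lam2)))
    return counts


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


counts = simulate_counts(n_cells=2000, lam1=5.0, lam2=3.0, rho=0.8, seed=1)
gene1 = [a for a, _ in counts]
gene2 = [b for _, b in counts]
print(round(pearson(gene1, gene2), 2))
```

In this toy the marginal distribution of each gene and the dependence between genes are specified separately, so the correlation can be changed without touching the per-gene means.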


Subject(s)
Gene Expression Profiling/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Software , Algorithms , Animals , Cluster Analysis , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Gene Expression , Mice , Models, Statistical , RNA-Seq/statistics & numerical data
20.
J Comput Biol ; 29(2): 121-139, 2022 02.
Article in English | MEDLINE | ID: mdl-35041494

ABSTRACT

Current expression quantification methods suffer from a fundamental but undercharacterized type of error: the most likely estimates for transcript abundances are not unique. This means that multiple estimates of transcript abundances generate the observed RNA-seq reads with equal likelihood, so the underlying true expression cannot be determined. This is called nonidentifiability in probabilistic modeling. It is further exacerbated by incomplete reference transcriptomes, where reads may be sequenced from unannotated transcripts. Graph quantification is a generalization of transcript quantification that accounts for reference incompleteness by allowing exponentially many unannotated transcripts to express reads. We propose methods to calculate a "confidence range of expression" for each transcript, representing its possible abundance across equally optimal estimates for both quantification models. This range indicates both whether a transcript has potential estimation error due to nonidentifiability and the extent of that error. Applying our methods to the Human Body Map data, we observe that 35%-50% of transcripts potentially suffer from inaccurate quantification caused by nonidentifiability. When comparing the expression between isoforms in one sample, we find that for 20%-47% of transcripts the inaccuracy can be so large that the ranking of expression between the transcript and other isoforms of the same gene cannot be determined. When comparing the expression of a transcript between two groups of RNA-seq samples in differential expression analysis, we observe that, after considering the ranges of the optimal expression estimates, the majority of detected differentially expressed transcripts are reliable, with a few exceptions.
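
A toy example may make nonidentifiability concrete. The sketch below assumes a hypothetical gene with four isoforms (`ac`, `ad`, `bc`, `bd`), each built from one of two alternative first segments (`a` or `b`) and one of two alternative second segments (`c` or `d`), and assumes that reads are too short to span both segments, so the data constrain only per-segment coverage. Under these assumptions, different abundance vectors explain the data equally well, and the range of each isoform's abundance across them follows from the Fréchet bounds for a 2x2 table with fixed margins. This is an illustration only, not the method proposed in the article; all names and numbers are invented for the example.

```python
from fractions import Fraction as F


def segment_coverage(abundance):
    """Expected coverage of each segment: sum of abundances of isoforms containing it."""
    cov = {seg: F(0) for seg in "abcd"}
    for isoform, x in abundance.items():
        for seg in isoform:
            cov[seg] += x
    return cov


def abundance_range(isoform, cov):
    """Min and max abundance of `isoform` over all non-negative abundance
    vectors that reproduce the segment coverage `cov` (Frechet bounds)."""
    first, second = cov[isoform[0]], cov[isoform[1]]
    total = cov["a"] + cov["b"]
    return max(F(0), first + second - total), min(first, second)


# Two different abundance estimates (hypothetical values).
estimate_1 = {"ac": F(1, 2), "ad": F(0), "bc": F(0), "bd": F(1, 2)}
estimate_2 = {"ac": F(0), "ad": F(1, 2), "bc": F(1, 2), "bd": F(0)}

# Both produce identical segment coverage, so short reads cannot tell them apart.
assert segment_coverage(estimate_1) == segment_coverage(estimate_2)

low, high = abundance_range("ac", segment_coverage(estimate_1))
print(low, high)  # 0 1/2
```

Here isoform `ac` can carry anywhere from none to half of the gene's expression without changing the expected data, which is the kind of interval the abstract calls a confidence range of expression.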


Subject(s)
Algorithms , Gene Expression Profiling/statistics & numerical data , Transcriptome , Alternative Splicing , Computational Biology , Confidence Intervals , Databases, Nucleic Acid/statistics & numerical data , Humans , Models, Statistical , RNA-Seq/statistics & numerical data