1.
EBioMedicine ; 98: 104870, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37967508

ABSTRACT

BACKGROUND: Nasopharyngeal carcinoma (NPC) is a malignant head and neck cancer with a high incidence in Southern China and Southeast Asia. Patients with distant metastasis and recurrent NPC have poor prognosis. Thus, a better understanding of NPC pathogenesis may identify novel therapies to address the unmet clinical needs. METHODS: H3K27ac ChIP-seq and HiChIP were applied to understand the enhancer landscapes and the chromosome interactions. Whole genome sequencing was conducted to analyze the relationship between genomic variations and epigenetic dysregulation. CRISPRi and JQ1 treatment were used to evaluate the transcriptional regulation of SOX2 super-enhancers (SEs). Colony formation assay, survival analysis and in vivo subcutaneous patient-derived xenograft assays were applied to explore the function and clinical relevance of SOX2 in NPC. FINDINGS: We globally mapped the enhancer landscapes and generated NPC enhancer connectomes, linking NPC-specific enhancers and SEs. We found five overlapping genes, including SOX2, among super-enhancer-regulated genes, survival-related genes and NPC essential genes. The mRNA expression of SOX2 was repressed by CRISPRi targeting different SOX2 SEs or by JQ1 treatment. Next, we identified a genetic variation (Chr3:181422197, G > A) in a SOX2 SE that correlated with higher expression of SOX2 and poor survival. In addition, SOX2 was highly expressed in NPC and correlated with short survival in patients with NPC. Knock-down of SOX2 suppressed tumor growth in vitro and in vivo. INTERPRETATION: Our study mapped the super-enhancer landscape with chromosome interactions and showed that super-enhancer-driven SOX2 promotes tumorigenesis, suggesting that SOX2 is a potential therapeutic target for patients with NPC. FUNDING: A full list of funding bodies that contributed to this study can be found in the Acknowledgements section.


Subject(s)
Nasopharyngeal Neoplasms , Humans , Nasopharyngeal Carcinoma/genetics , Nasopharyngeal Carcinoma/pathology , Nasopharyngeal Neoplasms/genetics , Nasopharyngeal Neoplasms/pathology , Neoplasm Recurrence, Local/genetics , Survival Analysis , Chromatin/genetics , Cell Line, Tumor , Gene Expression Regulation, Neoplastic , Cell Proliferation , SOXB1 Transcription Factors/genetics , SOXB1 Transcription Factors/metabolism
2.
Genome Res ; 33(5): 750-762, 2023 May.
Article in English | MEDLINE | ID: mdl-37308294

ABSTRACT

For most biological and medical applications of single-cell transcriptomics, an integrative study of multiple heterogeneous single-cell RNA sequencing (scRNA-seq) data sets is crucial. However, present approaches are unable to integrate diverse data sets from various biological conditions effectively because of the confounding effects of biological and technical differences. We introduce single-cell integration (scInt), an integration method based on accurate, robust cell-cell similarity construction and unified contrastive biological variation learning from multiple scRNA-seq data sets. scInt provides a flexible and effective approach to transfer knowledge from the already integrated reference to the query. We show that scInt outperforms 10 other cutting-edge approaches using both simulated and real data sets, particularly in the case of complex experimental designs. Application of scInt to mouse developing tracheal epithelial data shows its ability to integrate development trajectories from different developmental stages. Furthermore, scInt successfully identifies functionally distinct condition-specific cell subpopulations in single-cell heterogeneous samples from a variety of biological conditions.


Subject(s)
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Animals , Mice , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Exome Sequencing , Sequence Analysis, RNA/methods
3.
J Virol ; 96(18): e0073922, 2022 09 28.
Article in English | MEDLINE | ID: mdl-36094314

ABSTRACT

Epstein-Barr virus (EBV) persists in human cells as episomes. EBV episomes are chromatinized and their 3D conformation varies greatly in cells expressing different latency genes. We used HiChIP, an assay which combines genome-wide chromatin conformation capture followed by deep sequencing (Hi-C) and chromatin immunoprecipitation (ChIP), to interrogate the EBV episome 3D conformation in different cancer cell lines. In an EBV-transformed lymphoblastoid cell line (LCL) GM12878 expressing type III EBV latency genes, abundant genomic interactions were identified by H3K27ac HiChIP. A strong enhancer was located near the BILF2 gene and looped to multiple genes around BALFs loci. Perturbation of the BILF2 enhancer by CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) altered the expression of BILF2 enhancer-linked genes, including BARF0 and BALF2, suggesting that this enhancer regulates the expression of linked genes. H3K27ac ChIP followed by deep sequencing (ChIP-seq) identified several strong EBV enhancers in T/NK (natural killer) lymphoma cells that express type II EBV latency genes. Extensive intragenomic interactions were also found which linked enhancers to target genes. A strong enhancer at BILF2 also looped to the BALF loci. CRISPRi also validated the functional connection between BILF2 enhancer and BARF1 gene. In contrast, H3K27ac HiChIP found significantly fewer intragenomic interactions in type I EBV latency gene-expressing primary effusion lymphoma (PEL) cell lines. These data provided new insight into the regulation of EBV latency gene expression in different EBV-associated tumors. IMPORTANCE EBV is the first human DNA tumor virus identified, discovered over 50 years ago. EBV causes ~200,000 cases of various cancers each year. EBV-encoded oncogenes, noncoding RNAs, and microRNAs (miRNAs) can promote cell growth and survival and suppress senescence. Regulation of EBV gene expression is very complex. The viral C promoter regulates the expression of all EBV nuclear antigens (EBNAs), some of which are very far away from the C promoter. Another way by which the virus activates remote gene expression is through DNA looping. In this study, we describe the viral genome looping patterns in various EBV-associated cancer cell lines and identify important EBV enhancers in these cells. This study also identified novel opportunities to perturb and eventually control EBV gene expression in these cancer cells.


Subject(s)
Epstein-Barr Virus Infections , Herpesvirus 4, Human , Plasmids , Virus Latency , Cell Line, Tumor , Enhancer Elements, Genetic/genetics , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/virology , Epstein-Barr Virus Nuclear Antigens/genetics , Herpesvirus 4, Human/genetics , Humans , MicroRNAs/metabolism , Neoplasms/virology , Plasmids/chemistry , Plasmids/genetics , Plasmids/metabolism , Viral Proteins/genetics , Virus Latency/genetics
4.
Cell Biosci ; 12(1): 142, 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36056412

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) provides a powerful tool to capture transcriptomes at single-cell resolution. However, dropout events distort the gene expression levels and underlying biological signals, misleading the downstream analysis of scRNA-seq data. RESULTS: We develop a statistical model-based multidimensional imputation algorithm, scMTD, that identifies local cell neighbors and specific gene co-expression networks based on the pseudo-time of cells, leveraging cell-level, gene-level, and transcriptome dynamics information to recover scRNA-seq data. Compared with state-of-the-art imputation methods in several real-data-based analytical experiments, scMTD effectively recovers biological signals of transcriptomes and consistently outperforms the other algorithms in improving FISH validation, trajectory inference, differential expression analysis, clustering analysis, and identification of cell types. CONCLUSIONS: scMTD maintains the gene expression characteristics, enhances the clustering of cell subpopulations, assists the study of gene expression dynamics, contributes to the discovery of rare cell types, and applies to both UMI-based and non-UMI-based data. Overall, scMTD's reliability, applicability, and scalability make it a promising imputation approach for scRNA-seq data.

5.
Methods Mol Biol ; 2432: 101-111, 2022.
Article in English | MEDLINE | ID: mdl-35505210

ABSTRACT

With the rapid development of methylation profiling technology, many datasets are generated to quantify genome-wide methylation patterns. Given the heavy burden of multiple testing of hundreds of thousands of DNA methylation markers, individual studies often have limited sample sizes and power. The EWAS meta-analysis is an approach that combines results from multiple studies on the same scientific question. It helps to improve statistical power by combining information from individual studies and reduce the chances of false positives. This chapter introduces commonly used meta-analysis methods and analytical tools with application to EWAS data.
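The pooling idea described in this abstract can be illustrated with a fixed-effect, inverse-variance weighted meta-analysis of a single methylation marker. This is a minimal sketch, not the chapter's own code: the function name `fixed_effect_meta` and the per-study effect sizes and standard errors are hypothetical values chosen for illustration.

```python
import math

def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates with inverse-variance weights."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p

# Hypothetical effect sizes for one CpG site in three studies.
effects = [0.10, 0.20, 0.15]
std_errors = [0.05, 0.10, 0.05]
pooled, pooled_se, z, p = fixed_effect_meta(effects, std_errors)
print(f"pooled={pooled:.4f} se={pooled_se:.4f} z={z:.2f} p={p:.2e}")
```

The pooled standard error is smaller than any single study's standard error, which is the gain in statistical power the abstract refers to. In an EWAS the same calculation would be repeated for every marker, followed by a multiple-testing correction.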


Subject(s)
Epigenesis, Genetic , Epigenome , DNA Methylation , Genome-Wide Association Study/methods , Sample Size
6.
Mol Cancer ; 21(1): 74, 2022 03 12.
Article in English | MEDLINE | ID: mdl-35279145

ABSTRACT

BACKGROUND: Epithelial-to-mesenchymal transition (EMT) is a process linked to metastasis and drug resistance with non-coding RNAs (ncRNAs) playing pivotal roles. We previously showed that miR-100 and miR-125b, embedded within the third intron of the ncRNA host gene MIR100HG, confer resistance to cetuximab, an anti-epidermal growth factor receptor (EGFR) monoclonal antibody, in colorectal cancer (CRC). However, whether the MIR100HG transcript itself has a role in cetuximab resistance or EMT is unknown. METHODS: The correlation between MIR100HG and EMT was analyzed by curating public CRC data repositories. The biological roles of MIR100HG in EMT, metastasis and cetuximab resistance in CRC were determined both in vitro and in vivo. The expression patterns of MIR100HG, hnRNPA2B1 and TCF7L2 in CRC specimens from patients who progressed on cetuximab and patients with metastatic disease were analyzed by RNAscope and immunohistochemical staining. RESULTS: The expression of MIR100HG was strongly correlated with EMT markers and acted as a positive regulator of EMT. MIR100HG sustained cetuximab resistance and facilitated invasion and metastasis in CRC cells both in vitro and in vivo. hnRNPA2B1 was identified as a binding partner of MIR100HG. Mechanistically, MIR100HG maintained mRNA stability of TCF7L2, a major transcriptional coactivator of the Wnt/β-catenin signaling, by interacting with hnRNPA2B1. hnRNPA2B1 recognized the N6-methyladenosine (m6A) site of TCF7L2 mRNA in the presence of MIR100HG. TCF7L2, in turn, activated MIR100HG transcription, forming a feed-forward regulatory loop. The MIR100HG/hnRNPA2B1/TCF7L2 axis was augmented in specimens from CRC patients who either developed local or distant metastasis or had disease progression that was associated with cetuximab resistance. CONCLUSIONS: MIR100HG and hnRNPA2B1 interact to control the transcriptional activity of Wnt signaling in CRC via regulation of TCF7L2 mRNA stability. Our findings identified MIR100HG as a potent EMT inducer in CRC that may contribute to cetuximab resistance and metastasis by activation of a MIR100HG/hnRNPA2B1/TCF7L2 feedback loop.


Subject(s)
Colorectal Neoplasms , Heterogeneous-Nuclear Ribonucleoprotein Group A-B , MicroRNAs , RNA, Long Noncoding , Cell Line, Tumor , Cell Movement/genetics , Cetuximab/genetics , Cetuximab/metabolism , Colorectal Neoplasms/pathology , Epithelial-Mesenchymal Transition/genetics , Gene Expression Regulation, Neoplastic , Heterogeneous-Nuclear Ribonucleoprotein Group A-B/genetics , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , Transcription Factor 7-Like 2 Protein/genetics , Transcription Factor 7-Like 2 Protein/metabolism , Wnt Signaling Pathway/genetics
7.
PLoS Comput Biol ; 18(1): e1009770, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34986151

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pcbi.1009118.].

8.
PLoS Comput Biol ; 17(6): e1009118, 2021 06.
Article in English | MEDLINE | ID: mdl-34138847

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because of the low number of mRNA copies extracted per cell, scRNA-seq data exhibit a large number of dropouts, which hinders the downstream analysis of the scRNA-seq data. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In experiments, the results on simulated and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analysis, including clustering, visualization, and differential expression analysis.


Subject(s)
RNA-Seq/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Software , Animals , Cluster Analysis , Computational Biology , Computer Simulation , Data Interpretation, Statistical , Data Visualization , Databases, Nucleic Acid/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Genetic Techniques/statistics & numerical data , Humans , RNA, Messenger/genetics , RNA, Messenger/isolation & purification
9.
Article in English | MEDLINE | ID: mdl-33401657

ABSTRACT

COVID-19 patients often develop multiple organ dysfunction syndrome affecting organs other than the lungs, suggesting that the novel virus SARS-CoV-2 also invades other organs. Therefore, studying the viral susceptibility of other organs is important for a deeper understanding of viral pathogenesis. Angiotensin-converting enzyme II (ACE2) is the receptor protein of SARS-CoV-2, and TMPRSS2 promotes virus proliferation and transmission. We investigated the ACE2 and TMPRSS2 expression levels of cell types from 31 organs to evaluate the risk of viral infection using single-cell RNA sequencing (scRNA-seq) data. For the first time, we found that the gall bladder and fallopian tube are vulnerable to SARS-CoV-2 infection. In addition, the nose, heart, small intestine, large intestine, esophagus, brain, testis, and kidney are also identified as high-risk organs with high expression levels of ACE2 and TMPRSS2. Moreover, the susceptible organs are grouped into three risk levels based on ACE2 and TMPRSS2 expression. As a result, the respiratory system, digestive system, and urinary system are at the top risk level for SARS-CoV-2 infection. This study provides evidence for SARS-CoV-2 infection in the human nervous system, digestive system, reproductive system, respiratory system, circulatory system, and urinary system using scRNA-seq data, which helps in the clinical diagnosis and treatment of patients.


Subject(s)
Angiotensin-Converting Enzyme 2/genetics , COVID-19/genetics , Genetic Predisposition to Disease , RNA, Small Cytoplasmic/genetics , Serine Endopeptidases/genetics , Female , Gene Expression Profiling , Humans , Male , Single-Cell Analysis
10.
Genomics ; 113(2): 456-462, 2021 03.
Article in English | MEDLINE | ID: mdl-33383142

ABSTRACT

The T-cell receptor (TCR) is crucial in T cell-mediated virus clearance. To date, TCR bias has been observed in various diseases. However, studies on the TCR repertoire of COVID-19 patients are lacking. Here, we used single-cell V(D)J sequencing to conduct comparative analyses of the TCR repertoire between 12 COVID-19 patients and 6 healthy controls, as well as other virus-infected samples. We observed distinct T cell clonal expansion in COVID-19. Further analysis of VJ gene combinations revealed that 6 VJ pairs were significantly increased, while 139 pairs were significantly decreased, in COVID-19 patients. When considering the VJ combination of α and β chains at the same time, the combination with the highest frequency in COVID-19 was TRAV12-2-J27-TRBV7-9-J2-3. In addition, preferential usage of V and J gene segments was also observed in samples infected by different viruses. Our study provides novel insights into the TCR in COVID-19, which contribute to our understanding of the immune response induced by SARS-CoV-2.


Subject(s)
COVID-19/genetics , High-Throughput Nucleotide Sequencing , Receptors, Antigen, T-Cell/genetics , SARS-CoV-2 , Single-Cell Analysis , COVID-19/immunology , Female , Humans , Male , T-Lymphocytes/immunology
11.
BMC Bioinformatics ; 21(Suppl 16): 540, 2020 Dec 16.
Article in English | MEDLINE | ID: mdl-33323107

ABSTRACT

BACKGROUND: Single-cell RNA sequencing can be used to determine cell types, which benefits medical research, including many recent studies on COVID-19. Generally, single-cell RNA data analysis pipelines include data normalization, dimensionality reduction, and unsupervised clustering. However, different normalization and dimensionality reduction methods significantly affect the results of clustering and cell type enrichment analysis. The choice of preprocessing path is crucial in scRNA-seq data mining, because a proper preprocessing path can extract more important information from complex raw data and lead to more accurate clustering results. RESULTS: We propose a method called NDRindex (Normalization and Dimensionality Reduction index) to evaluate the quality of the data produced by normalization and dimensionality reduction methods. The method includes a function to calculate the degree of data aggregation, which is the key to measuring data quality before clustering. For the five single-cell RNA sequencing datasets we tested, the results supported the efficacy and accuracy of our index. CONCLUSIONS: The method fills a gap in the selection of preprocessing paths, and the results support its effectiveness and accuracy. Our research provides useful indicators for the evaluation of RNA-seq data.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid/classification , Databases, Nucleic Acid/standards , RNA-Seq/methods , COVID-19/virology , Cluster Analysis , Humans , SARS-CoV-2/genetics
12.
J Alzheimers Dis ; 76(2): 713-724, 2020.
Article in English | MEDLINE | ID: mdl-32538835

ABSTRACT

BACKGROUND: Altered calcium homeostasis is hypothesized to underlie Alzheimer's disease (AD). However, it remains unclear whether serum calcium levels are genetically associated with AD risk. OBJECTIVE: To develop effective therapies, we should establish the causal link between serum calcium levels and AD. METHODS: We performed a Mendelian randomization study to investigate the causal association of increased serum calcium levels with AD risk using the genetic variants from a large-scale serum calcium genome-wide association study (GWAS) dataset (61,079 individuals of European descent) and a large-scale AD GWAS dataset (54,162 individuals including 17,008 AD cases and 37,154 controls of European descent). We selected the inverse-variance weighted (IVW) method as the main analysis method and three other sensitivity analysis methods to examine the robustness of the IVW estimate. RESULTS: IVW analysis showed that an increased serum calcium level (per 1 standard deviation (SD) increase, 0.5 mg/dL) was significantly associated with a reduced AD risk (OR = 0.57, 95% CI 0.35-0.95, p = 0.031). All the estimates from the other sensitivity analysis methods were consistent with the IVW estimate in terms of direction and magnitude. CONCLUSION: In summary, we provided evidence that increased serum calcium levels could reduce the risk of AD. Randomized controlled studies should be conducted to clarify whether dietary calcium intake, calcium supplements, or both could reduce the risk of AD.
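The IVW method named in this abstract can be sketched from per-variant summary statistics. The example below uses the standard fixed-effect IVW formula for Mendelian randomization; the function name `ivw_estimate` and the SNP-level numbers are hypothetical and are not taken from the study's datasets.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate.

    beta_exposure: per-variant effects on the exposure (e.g. serum calcium, per SD)
    beta_outcome:  per-variant effects on the outcome (log odds of disease)
    se_outcome:    standard errors of the outcome effects
    """
    numerator = sum(bx * by / s ** 2
                    for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / s ** 2
                      for bx, s in zip(beta_exposure, se_outcome))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se

# Hypothetical summary statistics for two genetic variants.
beta_exposure = [0.1, 0.2]
beta_outcome = [-0.05, -0.10]
se_outcome = [0.02, 0.04]

beta, se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
odds_ratio = math.exp(beta)                      # OR per unit increase in exposure
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)
print(f"OR={odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An odds ratio below 1 with a confidence interval that excludes 1, as reported in the abstract, corresponds to a negative `beta` whose interval does not cross zero on the log-odds scale.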


Subject(s)
Alzheimer Disease/blood , Calcium/blood , Databases, Genetic , Genetic Variation/genetics , Mendelian Randomization Analysis/methods , Aged , Alzheimer Disease/diagnosis , Alzheimer Disease/epidemiology , Biomarkers/blood , Female , Humans , Male , Middle Aged
13.
BMC Genomics ; 21(1): 149, 2020 Feb 11.
Article in English | MEDLINE | ID: mdl-32046631

ABSTRACT

BACKGROUND: With the rapid development of high-throughput sequencing technologies, many datasets on the same biological subject are generated. A meta-analysis is an approach that combines results from different studies on the same topic. The random-effects model in a meta-analysis enables the modeling of differences between studies by incorporating the between-study variance. RESULTS: This paper proposes a moments estimator of the between-study variance that represents the across-study variation. A new random-effects method (DSLD2), which involves two-step estimation starting with the DSL estimate and the [Formula: see text] in the second step, is presented. The DSLD2 method is compared with 6 other meta-analysis methods based on effect sizes across 8 aspects under three hypothesis settings. The results show that DSLD2 is a suitable method for identifying differentially expressed genes under the first hypothesis. The DSLD2 method is also applied to Alzheimer's microarray datasets. The differentially expressed genes detected by the DSLD2 method are significantly enriched in neurological diseases. CONCLUSIONS: The results from both simulations and an application show that DSLD2 is a suitable method for detecting differentially expressed genes under the first hypothesis.
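For context, the classic moment estimator of the between-study variance in a random-effects model can be sketched as follows. This assumes that "DSL" in the abstract refers to the DerSimonian-Laird estimator; the sketch shows only that standard estimator and does not reproduce the paper's second step or the DSLD2 method. The function name and input values are hypothetical.

```python
import math

def dersimonian_laird(effects, variances):
    """Classic DerSimonian-Laird random-effects meta-analysis.

    Returns (tau2, pooled_effect, pooled_se), where tau2 is the
    moment estimate of the between-study variance.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                # truncated at zero
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return tau2, pooled, pooled_se

# Hypothetical effect sizes (e.g. standardized mean differences for one gene).
tau2, pooled, pooled_se = dersimonian_laird([0.0, 1.0], [0.25, 0.25])
print(tau2, pooled, pooled_se)
```

When the studies agree closely, Q falls below its degrees of freedom, the estimate is truncated to zero, and the model reduces to a fixed-effect analysis.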


Subject(s)
Gene Expression Profiling/methods , Alzheimer Disease/genetics , Data Interpretation, Statistical , Humans , Likelihood Functions , Meta-Analysis as Topic , Models, Statistical , Monte Carlo Method , ROC Curve
14.
J Alzheimers Dis ; 73(2): 609-618, 2020.
Article in English | MEDLINE | ID: mdl-31815694

ABSTRACT

Observational studies strongly supported the association of low levels of circulating 25-hydroxyvitamin D (25OHD) with cognitive impairment or dementia in aging populations. However, randomized controlled trials have not shown clear evidence that vitamin D supplementation could improve cognitive outcomes. In fact, some studies reported the association between vitamin D and cognitive impairment based on individuals aged 60 years and over. However, it is still unclear whether vitamin D levels are causally associated with Alzheimer's disease (AD) risk in individuals aged 60 years and over. Here, we performed a Mendelian randomization (MR) study to investigate the causal association between vitamin D levels and AD using a large-scale vitamin D genome-wide association study (GWAS) dataset and two large-scale AD GWAS datasets from the IGAP and UK Biobank with individuals aged 60 years and over. Our results showed that genetically increased 25OHD levels were significantly associated with reduced AD risk in individuals aged 60 years and over. Hence, our findings in combination with previous literature indicate that maintaining adequate vitamin D status in older people, especially those aged 60 years and over, may help to slow cognitive decline and forestall AD. Long-term randomized controlled trials are required to test whether vitamin D supplementation may prevent AD in older people, especially those aged 60 years and over, and whether it may be recommended as a preventive agent.


Subject(s)
Alzheimer Disease/blood , Alzheimer Disease/epidemiology , Mendelian Randomization Analysis , Vitamin D Deficiency/epidemiology , Vitamin D Deficiency/genetics , Vitamin D/genetics , Aged , Aged, 80 and over , Biological Specimen Banks , Cognition Disorders/metabolism , Cognition Disorders/psychology , Databases, Factual , Female , Genome-Wide Association Study , Humans , Hydroxycholecalciferols/blood , Hydroxycholecalciferols/genetics , Male , Middle Aged , Nutritional Status , United Kingdom/epidemiology , Vitamin D/blood
15.
BMC Bioinformatics ; 20(Suppl 25): 691, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874619

ABSTRACT

BACKGROUND: The association between the BIN1 rs744373 variant and Alzheimer's disease (AD) has been identified by genome-wide association studies (GWASs) as well as candidate gene studies in Caucasian populations. In East Asian populations, however, association studies have reported both positive and negative results. Considering the smaller sample sizes of the studies in East Asian populations, we believe that these results lacked sufficient statistical power. RESULTS: We conducted a meta-analysis with 71,168 samples (22,395 AD cases and 48,773 controls, from 37 studies of 19 articles). Based on the additive model, we observed significant genetic heterogeneity in pooled populations as well as in Caucasians and East Asians. We identified a significant association between the rs744373 polymorphism and AD in pooled populations (P = 5 × 10^-7, odds ratio (OR) = 1.12, and 95% confidence interval (CI) 1.07-1.17) and in Caucasian populations (P = 3.38 × 10^-8, OR = 1.16, 95% CI 1.10-1.22). In the East Asian populations, however, the association was not identified (P = 0.393, OR = 1.057, and 95% CI 0.95-1.15). In addition, the regression analysis suggested no significant publication bias. The results of the sensitivity analysis, as well as of the meta-analysis under the dominant and recessive models, remained consistent, which demonstrated the reliability of our finding. CONCLUSIONS: The large-scale meta-analysis highlighted the significant association between the rs744373 polymorphism and AD risk in Caucasian populations but not in East Asian populations.
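The relationship between an odds ratio, its 95% confidence interval, and the P value can be sketched with a normal approximation on the log-odds scale. This is an illustrative back-calculation, not the authors' analysis: the function name `p_from_or_ci` is hypothetical, and because the inputs are rounded and the original pooling model is not specified here, the output only approximates the reported P values.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p-value from an OR and its 95% CI."""
    # Width of the CI on the log scale gives the standard error of log(OR).
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values as reported in the abstract above.
_, z_pooled, p_pooled = p_from_or_ci(1.12, 1.07, 1.17)
_, z_east_asian, p_east_asian = p_from_or_ci(1.057, 0.95, 1.15)
print(f"pooled: z={z_pooled:.2f}, p={p_pooled:.1e}")
print(f"East Asian: z={z_east_asian:.2f}, p={p_east_asian:.2f}")
```

A confidence interval that excludes 1, as in the pooled analysis, yields a small P value, while an interval that contains 1, as in the East Asian subgroup, yields a P value above 0.05.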


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Alzheimer Disease/genetics , Nuclear Proteins/genetics , Tumor Suppressor Proteins/genetics , Asian People/genetics , Genetic Heterogeneity , Genome-Wide Association Study , Humans , Polymorphism, Genetic , Reproducibility of Results , White People/genetics
16.
BMC Bioinformatics ; 20(Suppl 18): 573, 2019 Nov 25.
Article in English | MEDLINE | ID: mdl-31760933

ABSTRACT

BACKGROUND: In multiple sequence alignment (MSA), the substitution score of pairwise alignment is essential. To compute adaptive scores for alignment, researchers usually use hidden Markov models (HMMs) or probabilistic consistency methods such as the partition function. Recent studies show that optimizing the parameters of the HMM, as well as integrating the HMM with the partition function, can raise the accuracy of alignment. However, the combination of the partition function and an optimized HMM, which could further improve alignment accuracy, was ignored by these studies. RESULTS: A novel algorithm for MSA called ProbPFP is presented in this paper. It integrates an HMM optimized by particle swarm optimization (PSO) with the partition function. The PSO algorithm was applied to optimize the HMM's parameters. The posterior probability obtained by the HMM was then combined with the one obtained by the partition function to calculate an integrated substitution score for alignment. To evaluate the effectiveness of ProbPFP, we compared it with 13 outstanding or classic MSA methods. The results demonstrate that the alignments obtained by ProbPFP achieved the highest mean TC scores and mean SP scores on two benchmark datasets, SABmark and OXBench, and the second highest mean TC scores and mean SP scores on the benchmark dataset BAliBASE. ProbPFP was also compared with 4 other outstanding methods by reconstructing the phylogenetic trees for six protein families extracted from the TreeFam database, based on the alignments obtained by these 5 methods. The results indicate that the phylogenetic trees reconstructed from the alignments obtained by ProbPFP are closer to the reference trees than those from the other methods. CONCLUSIONS: We propose a new multiple sequence alignment method combining an optimized HMM and the partition function. The performance shows that this method can substantially improve alignment accuracy.


Subject(s)
Computational Biology/methods , Proteins/genetics , Sequence Alignment/methods , Algorithms , Animals , Humans , Markov Chains , Multigene Family , Phylogeny , Proteins/chemistry , Software
19.
Entropy (Basel) ; 21(3)2019 Mar 04.
Article in English | MEDLINE | ID: mdl-33266957

ABSTRACT

The advancement of high-throughput RNA sequencing has enabled many discoveries in biology, ranging from the study of differentially expressed genes to the identification of different genomic phenotypes across multiple conditions. However, the lack of biological replicates and low-expression data are still obstacles to measuring differentially expressed genes effectively. We present an algorithm based on a differential entropy-like function (DEF) to test for differential expression across time-course data or multi-sample data with few biological replicates. Compared with limma, edgeR, DESeq2, and baySeq, DEF maintains equivalent or better performance on real data from two conditions. Moreover, DEF is well suited for predicting the genes that show the greatest differences across multiple conditions such as time-course data, and it identifies various biologically relevant genes.
