1.
Brief Bioinform ; 23(2), 2022 03 10.
Article in English | MEDLINE | ID: mdl-35136933

ABSTRACT

The advances in single-cell RNA sequencing (scRNA-seq) technologies enable the characterization of transcriptomic profiles at the cellular level and demonstrate great promise in bulk sample analysis, thereby offering opportunities to transfer gene signatures from scRNA-seq to bulk data. However, the gene expression signatures identified from single cells are typically inapplicable to bulk RNA-seq data due to the profiling differences of distinct sequencing technologies. Here, we propose single-cell pair-wise gene expression (scPAGE), a novel method to develop single-cell gene pair signatures (scGPSs) that benefit bulk RNA-seq classification by transferring knowledge across platforms. PAGE was adopted to tackle the challenge of profiling differences. We applied the method to acute myeloid leukemia (AML) and identified the scGPS from mouse scRNA-seq that allowed discriminating between AML and control cells. The scGPS was validated in bulk RNA-seq datasets and demonstrated better performance (average area under the curve [AUC] = 0.96) than the conventional gene expression strategies (average AUC ≤ 0.88), suggesting its potential in disclosing the molecular mechanism of AML. The scGPS also outperformed its bulk counterpart, which highlighted the benefit of gene signature transfer. Furthermore, we confirmed the utility of scPAGE in sepsis as an example of other disease scenarios. scPAGE leveraged the advantages of single-cell profiles to enhance the analysis of bulk samples, revealing great potential of transferring knowledge from single-cell to bulk transcriptome studies.
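
The abstract attributes the cross-platform transfer to comparing genes in pairs rather than using absolute expression values. The following is a minimal sketch of that underlying idea only, not the scPAGE algorithm itself: the gene names, the expression values and the rescaling function are all hypothetical, chosen to show that a within-sample comparison is unchanged by a monotone, platform-like distortion.

```python
import math

def pair_features(expr, pairs):
    """Return 1 if gene a is expressed above gene b in this sample, else 0."""
    return [1 if expr[a] > expr[b] else 0 for a, b in pairs]

# Hypothetical expression values for one sample (gene names are placeholders).
sample = {"geneA": 120.0, "geneB": 35.0, "geneC": 8.0, "geneD": 64.0}
pairs = [("geneA", "geneB"), ("geneC", "geneD"), ("geneD", "geneB")]

# A monotone rescaling, standing in for a platform-specific profiling difference.
rescaled = {g: math.log2(v + 1.0) * 3.0 + 2.0 for g, v in sample.items()}

original_bits = pair_features(sample, pairs)
rescaled_bits = pair_features(rescaled, pairs)
print(original_bits, rescaled_bits)
```

Because the rescaling preserves the order of values within a sample, both calls yield the same binary features, which is the property a pair-based signature relies on.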


Subjects
Acute Myeloid Leukemia, Single-Cell Analysis, Animals, Gene Expression Profiling/methods, Acute Myeloid Leukemia/genetics, Mice, RNA-Seq, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Transcriptome
2.
Bioinformatics ; 39(5), 2023 05 04.
Article in English | MEDLINE | ID: mdl-37084255

ABSTRACT

MOTIVATION: Human gut microbiota plays a vital role in maintaining body health. The dysbiosis of gut microbiota is associated with a variety of diseases. It is critical to uncover the associations between gut microbiota and disease states as well as other intrinsic or environmental factors. However, inferring alterations of individual microbial taxa based on relative abundance data likely leads to false associations and conflicting discoveries in different studies. Moreover, the effects of underlying factors and microbe-microbe interactions could lead to the alteration of larger sets of taxa. It might be more robust to investigate gut microbiota using groups of related taxa instead of the composition of individual taxa. RESULTS: We proposed a novel method to identify underlying microbial modules, i.e. groups of taxa with similar abundance patterns affected by a common latent factor, from longitudinal gut microbiota and applied it to inflammatory bowel disease (IBD). The identified modules demonstrated closer intragroup relationships, indicating potential microbe-microbe interactions and influences of underlying factors. Associations between the modules and several clinical factors were investigated, especially disease states. The IBD-associated modules performed better in stratifying the subjects compared with the relative abundance of individual taxa. The modules were further validated in external cohorts, demonstrating the efficacy of the proposed method in identifying general and robust microbial modules. The study reveals the benefit of considering the ecological effects in gut microbiota analysis and the great promise of linking clinical factors with underlying microbial modules. AVAILABILITY AND IMPLEMENTATION: https://github.com/rwang-z/microbial_module.git.


Subjects
Gastrointestinal Microbiome, Inflammatory Bowel Diseases, Humans, Microbial Interactions
3.
Bioinformatics ; 39(3), 2023 03 01.
Article in English | MEDLINE | ID: mdl-36857587

ABSTRACT

MOTIVATION: The confusion of acute inflammation infected by virus and bacteria or noninfectious inflammation will lead to missing the best therapy occasion resulting in poor prognoses. The diagnostic model based on host gene expression has been widely used to diagnose acute infections, but the clinical usage was hindered by the capability across different samples and cohorts due to the small sample size for signature training and discovery. RESULTS: Here, we construct a large-scale dataset integrating multiple host transcriptomic data and analyze it using a sophisticated strategy which removes batch effect and extracts the common information from different cohorts based on the relative expression alteration of gene pairs. We assemble 2680 samples across 16 cohorts and separately build gene pair signature (GPS) for bacterial, viral, and noninfected patients. The three GPSs are further assembled into an antibiotic decision model (bacterial-viral-noninfected GPS, bvnGPS) using multiclass neural networks, which is able to determine whether a patient is bacterial infected, viral infected, or noninfected. bvnGPS can distinguish bacterial infection with area under the receiver operating characteristic curve (AUC) of 0.953 (95% confidence interval, 0.948-0.958) and viral infection with AUC of 0.956 (0.951-0.961) in the test set (N = 760). In the validation set (N = 147), bvnGPS also shows strong performance by attaining an AUC of 0.988 (0.978-0.998) on bacterial-versus-other and an AUC of 0.994 (0.984-1.000) on viral-versus-other. bvnGPS has the potential to be used in clinical practice and the proposed procedure provides insight into data integration, feature selection and multiclass classification for host transcriptomics data. AVAILABILITY AND IMPLEMENTATION: The codes implementing bvnGPS are available at https://github.com/Ritchiegit/bvnGPS. 
The iPAGE algorithm was constructed and the neural network was trained in Python 3.7 with Scikit-learn 0.24.1 and PyTorch 1.7. The results were visualized with R 4.2, Python 3.7, and Matplotlib 3.3.4.
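
The performance figures above are areas under the ROC curve for one class against the rest (for example bacterial-versus-other). As a rough illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The scores are made up, and this is not code from the bvnGPS repository.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for the class of interest versus all other samples.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]
print(auc(pos, neg))
```

For a multiclass model, the same function would be applied once per class, treating that class as positive and the remaining classes as negative.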


Subjects
Transcriptome, Virus Diseases, Humans, Neural Networks (Computer), Bacteria, Virus Diseases/diagnosis, Virus Diseases/genetics, Inflammation
4.
Bioinformatics ; 38(14): 3513-3522, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35674358

ABSTRACT

MOTIVATION: Hepatocellular carcinoma (HCC) is a primary malignancy with a poor prognosis. Recently, multi-omics molecular-level measurement enables HCC diagnosis and prognosis prediction, which is crucial for early intervention of personalized therapy to diminish mortality. Here, we introduce a novel strategy utilizing DNA methylation and RNA expression data to achieve a multi-omics gene pair signature (GPS) for HCC discrimination. RESULTS: The immune genes with negative correlations between expression and promoter methylation are enriched in the highly connected cancer-related pathway network, which are considered as the candidates for HCC detection. After that, we separately construct a methylation GPS (mGPS) and an expression GPS (eGPS), and then assemble them as a meGPS with five gene pairs, in which the significant methylation and expression changes occur between HCC tumor and non-tumor groups. Reliable performance has been validated by independent tissue (age, gender and etiology) and blood datasets. This study proposes a procedure for multi-omics GPS identification and develops a novel HCC signature using both methylome and transcriptome data, suggesting potential molecular targets for the detection and therapy of HCC. AVAILABILITY AND IMPLEMENTATION: Models are available at https://github.com/bioinformaticStudy/meGPS.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Hepatocellular Carcinoma, Liver Neoplasms, Humans, Hepatocellular Carcinoma/diagnosis, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/metabolism, Transcriptome, Liver Neoplasms/diagnosis, Liver Neoplasms/genetics, Liver Neoplasms/metabolism, Epigenome, DNA Methylation, Tumor Biomarkers/genetics, Neoplastic Gene Expression Regulation
5.
BMC Genomics ; 22(1): 275, 2021 Apr 16.
Article in English | MEDLINE | ID: mdl-33863291

ABSTRACT

BACKGROUND: Sepsis is the major cause of death in the Intensive Care Unit (ICU) globally. Molecular detection enables rapid diagnosis that allows early intervention to minimize the death rate. Recent studies showed that long non-coding RNAs (lncRNAs) regulate proinflammatory genes and are related to the dysfunction of organs in sepsis. Identifying an lncRNA signature with absolute abundance is challenging because of technical variation and systematic experimental bias. RESULTS: Cohorts (n = 768) containing whole blood lncRNA profiling of sepsis patients in the Gene Expression Omnibus (GEO) database were included. We proposed a novel diagnostic strategy that made use of the relative expressions of lncRNA pairs, which are reversed between sepsis patients and normal controls (e.g., lncRNAi > lncRNAj in sepsis patients and lncRNAi < lncRNAj in normal controls), to identify 14 lncRNA pairs as a sepsis diagnostic signature (SepSigLnc). The signature was then applied to independent cohorts (n = 644) to evaluate its predictive performance across different ages and normalization methods. Compared to common machine learning models and existing signatures, SepSigLnc consistently attains better performance on the validation cohorts from the same age group (AUC = 0.990 and 0.995 in two cohorts) and across different groups (AUC = 0.878 on average), as well as on cohorts processed by an alternative normalization method (AUC = 0.953 on average). Functional analysis demonstrates that the lncRNA pairs in SepSigLnc are functionally similar and tend to be implicated in the same biological processes, including cell fate commitment and cellular response to steroid hormone stimulus. CONCLUSION: Our study identified 14 lncRNA pairs as a signature that can facilitate the diagnosis of septic patients at an intervenable point when clinical manifestations are not dramatic. Also, the computational procedure can be generalized to a standard procedure for discovering diagnostic molecular signatures.
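
The reversal criterion in the abstract (lncRNAi > lncRNAj in sepsis patients, lncRNAi < lncRNAj in controls) can be sketched as a search over all pairs. This is only an illustration under stated assumptions: the `min_frac` threshold, the sample values and the lncRNA names are hypothetical, and the published procedure may filter and rank pairs quite differently.

```python
from itertools import combinations

def reversed_pairs(cases, controls, min_frac=0.8):
    """Find pairs (i, j) where i > j in most cases and i > j in few controls.

    Pairs are returned oriented so the first member is the one that tends to
    be higher in cases. Ties count as "not higher", so this is approximate.
    """
    genes = sorted(cases[0])
    found = []
    for i, j in combinations(genes, 2):
        case_up = sum(s[i] > s[j] for s in cases) / len(cases)
        ctrl_up = sum(s[i] > s[j] for s in controls) / len(controls)
        if case_up >= min_frac and ctrl_up <= 1 - min_frac:
            found.append((i, j))
        elif case_up <= 1 - min_frac and ctrl_up >= min_frac:
            found.append((j, i))
    return found

# Hypothetical expression profiles; names and values are placeholders.
cases = [{"lnc1": 5, "lnc2": 2, "lnc3": 3}, {"lnc1": 6, "lnc2": 1, "lnc3": 3}]
controls = [{"lnc1": 1, "lnc2": 4, "lnc3": 3}, {"lnc1": 2, "lnc2": 5, "lnc3": 3}]
print(reversed_pairs(cases, controls))
```

A new sample could then be scored by counting how many of the returned pairs show the case-like ordering.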


Subjects
Brain Neoplasms/mortality, Long Noncoding RNA/genetics, Sepsis/diagnosis, Cohort Studies, Humans, Sepsis/genetics
6.
Mol Med ; 27(1): 15, 2021 02 12.
Article in English | MEDLINE | ID: mdl-33579185

ABSTRACT

BACKGROUND: Cyclin-dependent kinases 2/4/6 (CDK2/4/6) play critical roles in cell cycle progression, and their deregulations are hallmarks of hepatocellular carcinoma (HCC). METHODS: We used a combination of computational and experimental approaches to discover a CDK2/4/6 triple inhibitor from FDA-approved small-molecule drugs for the treatment of HCC. RESULTS: We identified vanoxerine dihydrochloride as a new CDK2/4/6 inhibitor and a strong cytotoxic drug in human HCC QGY7703 and Huh7 cells (IC50: 3.79 µM for QGY7703 and 4.04 µM for Huh7 cells). In QGY7703 and Huh7 cells, vanoxerine dihydrochloride treatment caused G1 arrest, induced apoptosis, and reduced the expression of CDK2/4/6, cyclin D/E, and retinoblastoma protein (Rb), as well as the phosphorylation of CDK2/4/6 and Rb. A drug combination study indicated that vanoxerine dihydrochloride and 5-Fu produced synergistic cytotoxicity in vitro in Huh7 cells. Finally, in an in vivo study in BALB/C nude mice subcutaneously xenografted with Huh7 cells, vanoxerine dihydrochloride (40 mg/kg, i.p.) injection for 21 days produced significant anti-tumor activity (p < 0.05), which was comparable to that achieved by 5-Fu (10 mg/kg, i.p.), with the combination treatment resulting in a synergistic effect. Immunohistochemistry staining of the tumor tissues also revealed significantly reduced expression of Rb and CDK2/4/6 in the vanoxerine dihydrochloride treatment group. CONCLUSIONS: The present study is the first report identifying vanoxerine dihydrochloride as a new CDK2/4/6 triple inhibitor, and demonstrates that this drug represents a novel therapeutic strategy for HCC treatment.


Subjects
Hepatocellular Carcinoma/drug therapy, Cyclin-Dependent Kinase 2/antagonists & inhibitors, Cyclin-Dependent Kinase 4/antagonists & inhibitors, Cyclin-Dependent Kinase 6/antagonists & inhibitors, Fluorouracil/administration & dosage, Liver Neoplasms/drug therapy, Piperazines/administration & dosage, Animals, Hepatocellular Carcinoma/metabolism, Tumor Cell Line, Cell Proliferation/drug effects, Cell Survival/drug effects, Cyclin-Dependent Kinase 2/metabolism, Cyclin-Dependent Kinase 4/metabolism, Cyclin-Dependent Kinase 6/metabolism, Down-Regulation, Drug Synergism, Female, Fluorouracil/pharmacology, Neoplastic Gene Expression Regulation/drug effects, Humans, Subcutaneous Injections, Liver Neoplasms/metabolism, Mice, Inbred BALB C Mice, Nude Mice, Phosphorylation/drug effects, Piperazines/pharmacology, Xenograft Model Antitumor Assays
7.
Bioinformatics ; 35(20): 3989-3995, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30873528

ABSTRACT

MOTIVATION: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes. RESULTS: We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from dissimilar training complexes to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. AVAILABILITY AND IMPLEMENTATION: https://github.com/HongjianLi/MLSF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Machine Learning, Ligands, Protein Binding, Proteins
8.
BMC Bioinformatics ; 20(Suppl 26): 628, 2019 Dec 16.
Article in English | MEDLINE | ID: mdl-31839008

ABSTRACT

BACKGROUND: Development of new drugs is a time-consuming and costly process, and the cost is still increasing in recent years. However, the number of drugs approved by FDA every year per dollar spent on development is declining. Drug repositioning, which aims to find new use of existing drugs, attracts attention of pharmaceutical researchers due to its high efficiency. A variety of computational methods for drug repositioning have been proposed based on machine learning approaches, network-based approaches, matrix decomposition approaches, etc. RESULTS: We propose a novel computational method for drug repositioning. We construct and decompose three-dimensional tensors, which consist of the associations among drugs, targets and diseases, to derive latent factors reflecting the functional patterns of the three kinds of entities. The proposed method outperforms several baseline methods in recovering missing associations. Most of the top predictions are validated by literature search and computational docking. Latent factors are used to cluster the drugs, targets and diseases into functional groups. Topological Data Analysis (TDA) is applied to investigate the properties of the clusters. We find that the latent factors are able to capture the functional patterns and underlying molecular mechanisms of drugs, targets and diseases. In addition, we focus on repurposing drugs for cancer and discover not only new therapeutic use but also adverse effects of the drugs. In the in-depth study of associations among the clusters of drugs, targets and cancer subtypes, we find there exist strong associations between particular clusters. CONCLUSIONS: The proposed method is able to recover missing associations, discover new predictions and uncover functional clusters of drugs, targets and diseases. The clustering of drugs, targets and diseases, as well as the associations among the clusters, provides a new guiding framework for drug repositioning.


Subjects
Computational Biology, Drug Repositioning, Cluster Analysis, Computational Biology/methods, Drug Repositioning/methods, Humans, Machine Learning
9.
J Pharmacol Sci ; 135(3): 114-120, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29132796

ABSTRACT

Hyperuricemia, a long-term purine metabolic disorder, is a well-known risk factor for gout, hypertension and diabetes. In maintaining normal whole-body purine levels, xanthine oxidase (XOD) is a key enzyme in the purine metabolic pathway, as it catalyzes the oxidation of hypoxanthine to xanthine and finally to uric acid. Here we used the protein-ligand docking software idock to virtually screen potential XOD inhibitors from 3167 approved small compounds/drugs. The inhibitory activities of the ten compounds with the highest scores were tested on XOD in vitro. Interestingly, all ten compounds inhibited the activity of XOD to some degree. In particular, the anti-ulcerative-colitis drug olsalazine sodium demonstrated strong inhibitory activity against XOD (IC50 = 3.4 mg/L). Enzymatic kinetic studies revealed that the drug was a hybrid-type inhibitor of xanthine oxidase. Furthermore, the drug strikingly decreased serum urate levels and serum/hepatic activities of XOD in a dose-dependent manner in vivo. Thus, we demonstrated a successful process of hunting for compounds/drugs for hyperuricemia through virtual screening, supporting a potential use of olsalazine sodium in the treatment of hyperuricemia.


Subjects
Aminosalicylic Acids/pharmacology, Anti-Ulcer Agents/pharmacology, Uric Acid/blood, Xanthine Dehydrogenase/antagonists & inhibitors, Xanthine Dehydrogenase/metabolism, Aminosalicylic Acids/therapeutic use, Animals, Dose-Response Relationship (Drug), Preclinical Drug Evaluation, Hyperuricemia/drug therapy, In Vitro Techniques, Male, Mice, Structure-Activity Relationship
10.
BMC Bioinformatics ; 17(Suppl 11): 308, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-28185549

ABSTRACT

BACKGROUND: Pose generation error is usually quantified as the difference between the geometry of the pose generated by the docking software and that of the same molecule co-crystallised with the considered protein. Surprisingly, the impact of this error on binding affinity prediction is yet to be systematically analysed across diverse protein-ligand complexes. RESULTS: Against commonly-held views, we have found that pose generation error generally has a small impact on the accuracy of binding affinity prediction. This is also true for large pose generation errors and it is not only observed with machine-learning scoring functions, but also with classical scoring functions such as AutoDock Vina. Furthermore, we propose a procedure to correct a substantial part of this error which consists of calibrating the scoring functions with re-docked, rather than co-crystallised, poses. In this way, the relationship between Vina-generated protein-ligand poses and their binding affinities is directly learned. As a result, test set performance after this error-correcting procedure is much closer to that of predicting the binding affinity in the absence of pose generation error (i.e. on crystal structures). We evaluated several strategies, obtaining better results for those using a single docked pose per ligand than those using multiple docked poses per ligand. CONCLUSIONS: Binding affinity prediction is often carried out on the docked pose of a known binder rather than its co-crystallised pose. Our results suggest that pose generation error is in general far less damaging for binding affinity prediction than is currently believed. Another contribution of our study is the proposal of a procedure that largely corrects for this error. The resulting machine-learning scoring function is freely available at http://istar.cse.cuhk.edu.hk/rf-score-4.tgz and http://ballester.marseille.inserm.fr/rf-score-4.tgz .
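
The abstract defines pose generation error as a geometric difference between the docked and co-crystallised poses without naming a metric. A common choice for this kind of comparison is the root-mean-square deviation (RMSD) over matched atoms, and the sketch below assumes that choice. The coordinates are invented, and the function assumes both poses list the same atoms in the same order and are already in the same reference frame.

```python
import math

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses with matched atom order."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    sq = sum((a - b) ** 2 for pa, pb in zip(pose_a, pose_b) for a, b in zip(pa, pb))
    return math.sqrt(sq / len(pose_a))

# Hypothetical ligand coordinates in angstroms: (x, y, z) per atom.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0), (1.5, 1.5, 2.0)]
print(rmsd(crystal, docked))  # every atom is displaced by 2 along z
```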


Subjects
Molecular Docking Simulation/standards, Nuclear Proteins/metabolism, Pyrazines/metabolism, Software, Transcription Factors/metabolism, Humans, Ligands, Nuclear Proteins/chemistry, Protein Binding, Protein Conformation, Pyrazines/chemistry, Transcription Factors/chemistry
11.
Molecules ; 20(6): 10947-62, 2015 Jun 12.
Article in English | MEDLINE | ID: mdl-26076113

ABSTRACT

Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to which extent training a scoring function with data containing low-quality structural and binding data is detrimental for predictive performance. We actually found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is less important than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting to a smaller set of higher data quality.


Subjects
Theoretical Models, Structure-Activity Relationship
12.
BMC Bioinformatics ; 15: 56, 2014 Feb 25.
Article in English | MEDLINE | ID: mdl-24564583

ABSTRACT

BACKGROUND: Visualization of protein-ligand complex plays an important role in elaborating protein-ligand interactions and aiding novel drug design. Most existing web visualizers either rely on slow software rendering, or lack virtual reality support. The vital feature of macromolecular surface construction is also unavailable. RESULTS: We have developed iview, an easy-to-use interactive WebGL visualizer of protein-ligand complex. It exploits hardware acceleration rather than software rendering. It features three special effects in virtual reality settings, namely anaglyph, parallax barrier and oculus rift, resulting in visually appealing identification of intermolecular interactions. It supports four surface representations including Van der Waals surface, solvent excluded surface, solvent accessible surface and molecular surface. Moreover, based on the feature-rich version of iview, we have also developed a neat and tailor-made version specifically for our istar web platform for protein-ligand docking purpose. This demonstrates the excellent portability of iview. CONCLUSIONS: Using innovative 3D techniques, we provide a user friendly visualizer that is not intended to compete with professional visualizers, but to enable easy accessibility and platform independence.


Subjects
Computational Biology/methods, Molecular Docking Simulation/methods, Proteins/chemistry, Proteins/metabolism, Software, Computer Graphics, Internet, Ligands, Protein Binding, User-Computer Interface
13.
BMC Bioinformatics ; 15: 291, 2014 Aug 27.
Article in English | MEDLINE | ID: mdl-25159129

ABSTRACT

BACKGROUND: State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. RESULTS: In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study. CONCLUSIONS: Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvents the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. 
This further stresses the importance of substituting RF for MLR in scoring function development.


Subjects
Artificial Intelligence, Computational Biology/methods, Proteins/metabolism, Ligands, Linear Models, Protein Binding
14.
Nucleic Acids Res ; 40(19): 9392-403, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22904079

ABSTRACT

In protein-DNA interactions, particularly transcription factor (TF) and transcription factor binding site (TFBS) bindings, associated residue variations form patterns denoted as subtypes. Subtypes may lead to changed binding preferences, distinguish conserved from flexible binding residues and reveal novel binding mechanisms. However, subtypes must be studied in the context of core bindings. While solving 3D structures would require huge experimental efforts, recent sequence-based associated TF-TFBS pattern discovery has shown to be promising, upon which a large-scale subtype study is possible and desirable. In this article, we investigate residue-varying subtypes based on associated TF-TFBS patterns. By re-categorizing the patterns with respect to varying TF amino acids, statistically significant (P values ≤ 0.005) subtypes leading to varying TFBS patterns are discovered without using TF family or domain annotations. Resultant subtypes have various biological meanings. The subtypes reflect familial and functional properties and exhibit changed binding preferences supported by 3D structures. Conserved residues critical for maintaining TF-TFBS bindings are revealed by analyzing the subtypes. In-depth analysis on the subtype pair PKVVIL-CACGTG versus PKVEIL-CAGCTG shows the V/E variation is indicative for distinguishing Myc from MRF families. Discovered from sequences only, the TF-TFBS subtypes are informative and promising for more biological findings, complementing and extending recent one-sided subtype and familial studies with comprehensive evidence.


Subjects
DNA/chemistry, Transcription Factors/chemistry, Transcription Factors/classification, Binding Sites, Chromatin Immunoprecipitation, DNA/metabolism, Protein Databases, Molecular Models, Nucleotide Motifs, Position-Specific Scoring Matrices, Protein Binding, DNA Sequence Analysis, Transcription Factors/metabolism
15.
Bioinformatics ; 27(4): 471-8, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21193520

ABSTRACT

MOTIVATION: The bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental protein-DNA interactions in transcriptional regulation. Extensive efforts have been made to better understand the protein-DNA interactions. Recent mining on exact TF-TFBS-associated sequence patterns (rules) has shown great potentials and achieved very promising results. However, exact rules cannot handle variations in real data, resulting in limited informative rules. In this article, we generalize the exact rules to approximate ones for both TFs and TFBSs, which are essential for biological variations. RESULTS: A progressive approach is proposed to address the approximation to alleviate the computational requirements. Firstly, similar TFBSs are grouped from the available TF-TFBS data (TRANSFAC database). Secondly, approximate and highly conserved binding cores are discovered from TF sequences corresponding to each TFBS group. A customized algorithm is developed for the specific objective. We discover the approximate TF-TFBS rules by associating the grouped TFBS consensuses and TF cores. The rules discovered are evaluated by matching (verifying with) the actual protein-DNA binding pairs from Protein Data Bank (PDB) 3D structures. The approximate results exhibit many more verified rules and up to 300% better verification ratios than the exact ones. The customized algorithm achieves over 73% better verification ratios than traditional methods. Approximate rules (64-79%) are shown statistically significant. Detailed variation analysis and conservation verification on NCBI records demonstrate that the approximate rules reveal both the flexible and specific protein-DNA interactions accurately. The approximate TF-TFBS rules discovered show great generalized capability of exploring more informative binding rules.
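
The first step described above groups similar TFBSs before TF cores are searched. The abstract does not state the similarity measure or the grouping algorithm, so the sketch below makes two labeled assumptions: similarity is the Hamming distance between equal-length sites, and grouping is a simple greedy pass against each group's first member. The site sequences are placeholders.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def group_similar(sites, max_mismatch=1):
    """Greedy grouping: a site joins the first group whose seed is within max_mismatch."""
    groups = []
    for site in sites:
        for group in groups:
            if hamming(site, group[0]) <= max_mismatch:
                group.append(site)
                break
        else:
            # No existing group is close enough, so this site seeds a new one.
            groups.append([site])
    return groups

# Hypothetical binding-site sequences used only to exercise the functions.
sites = ["CACGTG", "CACGTT", "CAGCTG", "CAGCTA", "TTTAAA"]
print(group_similar(sites))
```

Allowing mismatches is what makes the resulting rules approximate rather than exact: sites that differ at one position end up contributing to the same consensus.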


Subjects
Algorithms, DNA-Binding Proteins/genetics, DNA/genetics, Transcription Factors/genetics, Base Sequence, Binding Sites, Computational Biology/methods, DNA/metabolism, DNA-Binding Proteins/metabolism, Gene Expression Regulation, Protein Binding, Tertiary Protein Structure, Transcription Factors/metabolism
16.
Nucleic Acids Res ; 38(19): 6324-37, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20529874

ABSTRACT

Protein-DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles for protein-DNA bindings. However, it is considered that there are no simple one-to-one rules between amino acids and nucleotides. Many methods impose complicated features beyond sequence patterns. Protein-DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. Therefore, it is desirable to investigate associated sequence patterns between TFs and TFBSs. With increasing computational power, availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF-TFBS binding sequence patterns in the most explicit and interpretable form from TRANSFAC. The framework is based on association rule mining with Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verifications from literatures, Protein Data Bank and homology modeling, there are strong evidences that the patterns discovered reveal real TF-TFBS bindings across different TFs and TFBSs, which can drive for further knowledge to better understand TF-TFBS bindings.
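
The framework above is described as association rule mining with the Apriori algorithm. A minimal, generic sketch of Apriori follows. It is not the authors' pipeline: the transactions, the `TF:` and `DNA:` item labels and the support threshold are hypothetical, with each transaction standing for short sequence patterns observed together in one TF-TFBS record.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: grow itemsets only from frequent smaller ones."""
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    while current:
        scored = {s: support(s) for s in current}
        kept = {s: v for s, v in scored.items() if v >= min_support}
        frequent.update(kept)
        # Candidates: unions of two frequent k-sets that form a (k+1)-set.
        current = {a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1}
        # Apriori pruning: every k-subset of a candidate must itself be frequent.
        current = {c for c in current if all(c - {i} in kept for i in c)}
    return frequent

# Hypothetical records: one TF pattern and one TFBS pattern per transaction.
transactions = [
    {"TF:KRRT", "DNA:TGACG"},
    {"TF:KRRT", "DNA:TGACG"},
    {"TF:KRRT", "DNA:CACGT"},
    {"TF:ERKR", "DNA:CACGT"},
]
freq = frequent_itemsets(transactions, min_support=0.5)
pair = frozenset({"TF:KRRT", "DNA:TGACG"})
# Confidence of the rule TF:KRRT -> DNA:TGACG is support(pair) / support(TF:KRRT).
confidence = freq[pair] / freq[frozenset({"TF:KRRT"})]
print(freq[pair], confidence)
```

Support measures how often a TF pattern and a TFBS pattern occur together, while confidence measures how reliably the TFBS pattern accompanies the TF pattern.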


Subjects
DNA-Binding Proteins/chemistry , DNA/chemistry , Data Mining/methods , Regulatory Elements, Transcriptional , Sequence Analysis, DNA , Transcription Factors/chemistry , Algorithms , Binding Sites , DNA/metabolism , DNA-Binding Proteins/metabolism , Databases, Genetic , Structural Homology, Protein , Transcription Factors/metabolism
17.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3246-3254, 2022.
Article in English | MEDLINE | ID: mdl-34437068

ABSTRACT

High-throughput sequencing can detect tens of thousands of genes in parallel, providing opportunities for improving the diagnostic accuracy of multiple diseases, including sepsis, an aggressive inflammatory response to infection that can cause organ failure and death. Early screening for sepsis is essential in the clinic, but no effective diagnostic biomarkers are available yet. Here, we present a novel method, Recurrent Logistic Regression, to identify diagnostic biomarkers for sepsis from blood transcriptome data. A panel of five immune-related genes, LRRN3, IL2RB, FCER1A, TLR5, and S100A12, is determined as the diagnostic biomarkers (LIFTS) for sepsis. LIFTS discriminates patients with sepsis from normal controls with high accuracy (AUROC = 0.9959 on average; CI = [0.9722-1.0]) on nine validation cohorts across three independent platforms, outperforming existing markers. Our analysis yielded an accurate prediction model and reproducible transcriptome biomarkers that can lay a foundation for clinical diagnostic tests and biological mechanistic studies.


Subjects
Sepsis , Humans , Sepsis/diagnosis , Sepsis/genetics , Transcriptome/genetics , Biomarkers
18.
Comput Biol Med ; 148: 105881, 2022 09.
Article in English | MEDLINE | ID: mdl-35940161

ABSTRACT

Non-coding RNA (ncRNA) regulation appears to be associated with the diagnosis and targeted therapy of complex diseases. Motifs of non-coding RNAs and genes in the competing endogenous RNA (ceRNA) network would probably contribute to the accurate prediction of serous ovarian carcinoma (SOC). We conducted a microarray study profiling the whole transcriptomes of eight human SOCs and eight controls and constructed a ceRNA network including mRNAs, long ncRNAs, and circular RNAs (circRNAs). A novel form of motif (mRNA-ncRNA-mRNA) was identified from the ceRNA network and defined as non-coding RNA's competing endogenous gene pairs (ceGPs), using a proposed method, denoised individualized pair analysis of gene expression (deiPAGE). Eighteen circRNA's ceGPs (cceGPs) were identified from multiple cohorts and fused into an indicator (SOC index) for SOC discrimination, which showed a high predictive capacity in independent cohorts. The SOC index was negatively correlated with the CD8+/CD4+ ratio in tumour infiltration, reflecting the migration and growth of tumour cells in ovarian cancer progression. Moreover, most of the RNAs in the SOC index have been experimentally validated as involved in ovarian cancer development. Our results elucidate the discriminative capability of the SOC index and suggest that the novel competing endogenous motifs play important roles in expression regulation and could be potential targets for investigating the mechanism of ovarian cancer or its therapy.


Subjects
MicroRNAs , Ovarian Neoplasms , RNA, Long Noncoding , Female , Gene Expression Profiling , Gene Regulatory Networks , Humans , RNA, Messenger , RNA, Untranslated , Transcriptome
19.
Front Cell Dev Biol ; 9: 671302, 2021.
Article in English | MEDLINE | ID: mdl-33996828

ABSTRACT

Bisulfite sequencing is considered the gold standard approach for measuring DNA methylation, which plays a pivotal part in regulating a variety of biological processes without changes in DNA sequence. In this study, we introduced the most prevalent methods for processing bisulfite sequencing data and evaluated the consistency of the data acquired from different measurements in liver cancer. First, we introduced three commonly used bisulfite sequencing assays, i.e., reduced-representation bisulfite sequencing (RRBS), whole-genome bisulfite sequencing (WGBS), and targeted bisulfite sequencing (targeted BS). Next, we discussed the principles and compared different methods for alignment, quality assessment, methylation level scoring, and differentially methylated region identification. After that, we screened differentially methylated genes (DMGs) in liver cancer through the three bisulfite sequencing assays and evaluated the consistency of their results. Finally, we compared bisulfite sequencing to the 450 k beadchip and assessed the statistical similarity and functional association of DMGs among the four assays. Our results demonstrated that the DMGs measured by WGBS, RRBS, targeted BS and the 450 k beadchip are consistently hypo-methylated in liver cancer, with high functional similarity.
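
For readers unfamiliar with the "methylation level scoring" step named in this abstract, the sketch below shows one widely used definition: the fraction of reads at a cytosine that support methylation. The abstract does not say which scoring schemes the authors compared, so this is an assumption for illustration only; the function names, the minimum-coverage cutoff and the read counts are all hypothetical.

```python
def methylation_level(methylated_reads, unmethylated_reads, min_coverage=5):
    """Fraction of reads supporting methylation at one cytosine.

    Returns None when the site has fewer than min_coverage reads
    (the cutoff of 5 is an arbitrary choice for this example).
    """
    coverage = methylated_reads + unmethylated_reads
    if coverage < min_coverage:
        return None
    return methylated_reads / coverage


def methylation_difference(tumor_counts, normal_counts, min_coverage=5):
    """Tumor minus normal methylation level; a negative value means hypo-methylation in tumor."""
    tumor = methylation_level(*tumor_counts, min_coverage=min_coverage)
    normal = methylation_level(*normal_counts, min_coverage=min_coverage)
    if tumor is None or normal is None:
        return None
    return tumor - normal


# Made-up (methylated, unmethylated) read counts at a single CpG site.
delta = methylation_difference(tumor_counts=(2, 8), normal_counts=(8, 2))
```

Here `delta` is negative, which in this convention marks the site as less methylated in the tumor sample. Calling differentially methylated regions or genes would additionally require a statistical test across sites and replicates.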

20.
Front Pharmacol ; 12: 691769, 2021.
Article in English | MEDLINE | ID: mdl-34335258

ABSTRACT

Background: Hepatocellular carcinoma (HCC) is a lethal malignancy lacking effective treatment. The cyclin-dependent kinase 4/6 (CDK4/6) and PI3K/AKT signaling pathways play pivotal roles in carcinogenesis and are promising therapeutic targets for HCC. Here we identified a new CDK4/6 and PI3K/AKT multi-kinase inhibitor for the treatment of HCC. Methods: Using a repurposing and ensemble docking methodology, we screened a library of drugs approved worldwide to identify candidate CDK4/6 inhibitors. Using MTT, apoptosis, and flow cytometry analyses, we investigated the effects of the candidate drug in reducing cell viability, inducing apoptosis, and causing cell-cycle arrest. Drug combination screening and thermal proteomic profiling (TPP) were used to investigate whether the candidate drug produced an antagonistic effect. The in vivo anti-cancer effect was evaluated in BALB/C nude mice subcutaneously xenografted with Huh7 cells. Results: We demonstrated for the first time that the anti-plasmodium drug aminoquinol is a new CDK4/6 and PI3K/AKT inhibitor. Aminoquinol significantly decreased cell viability, induced apoptosis, and increased the percentage of cells in G1 phase. Drug combination screening indicated that aminoquinol could produce a synergistic effect with the PI3K inhibitor LY294002. TPP analysis confirmed that aminoquinol significantly stabilized the CDK4, CDK6, PI3K and AKT proteins. Finally, an in vivo study in nude mice xenografted with Huh7 cells demonstrated that aminoquinol exhibited strong anti-tumor activity, comparable to that of the leading cancer drug 5-fluorouracil, with the combination treatment showing the highest therapeutic effect. Conclusion: The present study reports for the first time the discovery of a new CDK4/6 and PI3K/AKT multi-kinase inhibitor, aminoquinol. It could be used alone or as part of a combination therapeutic strategy for the treatment of HCC.
