Results 1 - 20 of 130
1.
Anal Chem ; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38743842

ABSTRACT

Identifying the metabolic signatures of colorectal cancer is critical for early diagnosis and for therapeutic approaches that can significantly block cancer progression and improve patient survival. Here, we combined two components. The first was an untargeted metabolic analysis strategy based on internal extractive electrospray ionization mass spectrometry. The second was a machine learning approach. Together, they were used to analyze metabolites in 173 pairs of cancer samples and matched normal tissue samples and to build robust metabolic signature models for diagnostic purposes. Metabolic signatures of colorectal cancer were screened and independently validated with machine learning methods: L1-regularized logistic regression (Logistic Regression_L1) for feature selection and eXtreme Gradient Boosting for classification. This yielded a panel of seven signatures with good diagnostic performance (accuracy of 87.74%, sensitivity of 85.82%, and specificity of 89.66%). Moreover, each of the seven signatures was evaluated for its ability to distinguish between cancer and normal tissues, and the metabolic molecule PC (30:0) showed good diagnostic performance. In addition, genes associated with PC (30:0) were identified by multiomics analysis combining metabolic and transcriptomic data. Our results showed that PC (30:0) could promote the proliferation of the colorectal cancer cell line SW480, revealing a correlation between genetic changes and metabolic dysregulation in cancer. Overall, our results reveal potential determinants of metabolite dysregulation. They pave the way for a mechanistic understanding of altered tissue metabolites in colorectal cancer and for the design of interventions that manipulate the levels of circulating metabolites.
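
The three metrics reported above follow the standard confusion-matrix definitions, with cancer as the positive class. The minimal sketch below computes them from binary labels. It is plain Python, and the function name `diagnostic_metrics` is hypothetical rather than taken from the paper. If the evaluated set is balanced, as matched cancer/normal pairs would make it, accuracy equals the mean of sensitivity and specificity. That agrees with the reported figures: (85.82% + 89.66%) / 2 = 87.74%.

```python
from typing import Sequence, Tuple


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                       positive: int = 1) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of cancer samples detected
    specificity = tn / (tn + fp)  # fraction of normal samples correctly rejected
    return accuracy, sensitivity, specificity


# Toy example: 4 cancer (1) and 4 matched normal (0) samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1]
print(diagnostic_metrics(y_true, y_pred))  # (0.875, 1.0, 0.75)
```

With equal class sizes, 0.875 is exactly the mean of 1.0 and 0.75. On an unbalanced set, the two would differ.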

2.
PLoS Comput Biol ; 20(4): e1011945, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38578805

ABSTRACT

Early identification of safe and efficacious disease targets is crucial to reducing the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. Computational approaches, by contrast, have shown remarkable potential in drug discovery, especially machine learning-based frameworks. In this work, we propose Progeni, a novel machine learning-based framework for target identification. Progeni fully exploits known heterogeneous biological networks from various sources. It also integrates literature evidence on the relations between biological entities to construct a probabilistic knowledge graph. Progeni then employs graph neural networks to learn feature embeddings of biological entities that facilitate the identification of biologically relevant target candidates. A comprehensive evaluation demonstrated Progeni's superior predictive power over baseline methods on the target identification task. In addition, our extensive tests showed that Progeni was highly robust to exposure bias, a common phenomenon in recommendation systems. It also effectively identified new targets that are strongly supported by the literature. Moreover, our wet-lab experiments validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. Together, these results suggest that Progeni can identify biologically effective targets and thus provides a powerful and useful tool for advancing drug discovery.


Subjects
Computational Biology , Drug Discovery , Machine Learning , Neural Networks, Computer , Humans , Computational Biology/methods , Drug Discovery/methods , Algorithms , Melanoma , Probability , Colorectal Neoplasms
3.
Front Genet ; 15: 1352504, 2024.
Article in English | MEDLINE | ID: mdl-38487252

ABSTRACT

Background: Cancer is a significant global health problem that continues to cause a high number of deaths worldwide. Traditional cancer treatments often carry risks that can compromise the function of vital organs. Anticancer peptides (ACPs) have attracted attention as a potential alternative to these conventional therapies. Their small size, high specificity, and reduced toxicity make them a promising option for cancer treatment. However, identifying effective ACPs through wet-lab screening experiments is time-consuming and labor-intensive. Methods: To overcome this challenge, this study constructs a deep ensemble learning method to predict ACPs. To evaluate the reliability of the framework, four different datasets are used for training and testing. During model training, feature selection, feature dimensionality reduction, and optimization of the deep ensemble model are integrated. Finally, we explored the interpretability of the features that affected the final prediction results. We also built a web server to facilitate ACP prediction, which all researchers can use for further studies. The web server can be accessed at http://lmylab.online:5001/. Results: The proposed method achieves an accuracy of 98.53% and an AUC (area under the curve) of 0.9972 on the ACPfel dataset, and it also shows improvements on the other datasets.
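
The AUC reported above can be read as a probability: the chance that a randomly chosen ACP receives a higher score than a randomly chosen non-ACP, with ties counted as one half. This is the normalized Mann-Whitney U statistic. The sketch below uses the hypothetical function name `roc_auc` and computes the AUC by direct pairwise comparison. That is O(P*N) and is meant only to make the definition concrete. It is not the paper's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: 1 = ACP, 0 = non-ACP.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 pairs ordered correctly, about 0.889
```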

4.
Anal Biochem ; 689: 115495, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38431142

ABSTRACT

The RNA modification N4-acetylcytidine (ac4C) is enzymatically catalyzed by N-acetyltransferase 10 (NAT10) and plays an essential role in tRNA, rRNA, and mRNA. It influences various cellular functions, including mRNA stability and rRNA biosynthesis. Wet-lab detection of ac4C modification sites is highly resource-intensive and costly. Various machine learning and deep learning techniques have therefore been employed for the computational detection of ac4C sites. However, too few ac4C modification sites are known to train an accurate and stable prediction model. This study introduces GANSamples-ac4C, a novel framework that combines transfer learning with a generative adversarial network (GAN). The framework generates synthetic RNA sequences for training a better ac4C site prediction model. Comparative analysis reveals that GANSamples-ac4C outperforms existing state-of-the-art methods in identifying ac4C sites. Moreover, our results underscore the potential of synthetic data to mitigate data scarcity in biological sequence prediction tasks. Another major advantage of GANSamples-ac4C is its interpretable decision logic. Multi-faceted interpretability analyses reveal three findings: key regions in the ac4C sequences that drive the discrimination between positive and negative samples, a pronounced enrichment of G in these regions, and ac4C-associated motifs. These findings may offer novel insights for ac4C research. The GANSamples-ac4C framework and its source code are publicly accessible at http://www.healthinformaticslab.org/supp/.


Subjects
Cytidine/analogs & derivatives , Machine Learning , RNA , RNA Stability
5.
J Vis Exp ; (205)2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38497637

ABSTRACT

A transcriptome represents the expression levels of many genes in a sample and has been widely used in biological research and clinical practice. Researchers usually focus on transcriptomic biomarkers that are differentially represented between a phenotype group and a control group of samples. This study presented a multitask graph attention network (GAT) learning framework that learns the complex inter-genic interactions of the reference samples. A demonstrative reference model, HealthModel, was pre-trained on healthy samples. It can be used directly to generate the model-based quantitative transcriptional regulation (mqTrans) view of independent test transcriptomes. The generated mqTrans view was demonstrated on prediction tasks and on dark biomarker detection. The term "dark biomarker" was coined for a gene that shows differential representation in the mqTrans view but no differential expression at its original expression level. Because dark biomarkers lack differential expression, traditional biomarker detection studies overlook them. The source code and manual of the HealthModelPipe pipeline can be downloaded from http://www.healthinformaticslab.org/supp/resources.php.


Subjects
Gene Expression Profiling , Transcriptome , Gene Expression Regulation , Biomarkers , Phenotype
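
The dark-biomarker definition in the abstract above combines two tests on the same gene. The first is no differential expression on the original values. The second is a differential representation in the mqTrans view. The abstract does not name the statistical test or the threshold. The sketch below therefore makes three assumptions: Welch's t statistic, a large-sample normal approximation for the two-sided p-value, and an illustrative cutoff of `alpha=0.05`. The function names are hypothetical.

```python
from statistics import NormalDist, mean, variance
from typing import Sequence


def welch_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided p-value for a difference in means based on Welch's t statistic.

    Assumption: the t distribution is approximated by a standard normal,
    which is only reasonable for fairly large groups.
    """
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        return 1.0 if mean(a) == mean(b) else 0.0
    t = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(t)))


def is_dark_biomarker(expr_case, expr_ctrl, mq_case, mq_ctrl, alpha=0.05):
    """True if the gene is not differentially expressed but its mqTrans values are."""
    not_differential = welch_p_value(expr_case, expr_ctrl) >= alpha
    mqtrans_differential = welch_p_value(mq_case, mq_ctrl) < alpha
    return not_differential and mqtrans_differential


# Toy data: identical expression distributions, shifted mqTrans values.
expr_case = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8]
expr_ctrl = [5.0, 4.9, 5.1, 5.0, 4.8, 5.2]
mq_case = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]
mq_ctrl = [0.0, 0.1, -0.1, 0.0, 0.2, -0.2]
print(is_dark_biomarker(expr_case, expr_ctrl, mq_case, mq_ctrl))  # True
```

For small cohorts, an exact t distribution or a rank-based test would be more appropriate than the normal approximation used here.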
6.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38426310

ABSTRACT

MOTIVATION: Predicting molecular properties is a pivotal task in various scientific domains, including drug discovery, materials science, and computational chemistry. The problem is often hindered by a lack of annotated data and by imbalanced class distributions. Both pose significant challenges to developing accurate and robust predictive models. RESULTS: This study tackles these issues by employing pretrained molecular models within a few-shot learning framework. A novel dynamic contrastive loss function is used to further improve model performance under class imbalance. The proposed MolFeSCue framework facilitates rapid generalization from minimal samples. It also uses the contrastive loss function to extract meaningful molecular representations from imbalanced datasets. We conducted extensive evaluations and comparisons of MolFeSCue against state-of-the-art algorithms on multiple benchmark datasets. The experimental data demonstrate our algorithm's effectiveness in learning molecular representations and its broad applicability across various pretrained models. Our findings underscore MolFeSCue's potential to accelerate advancements in drug discovery. AVAILABILITY AND IMPLEMENTATION: All the source code used in this study is publicly accessible at http://www.healthinformaticslab.org/supp/ and on GitHub at https://github.com/zhangruochi/MolFeSCue. The code (MolFeSCue-v1-00) is also available as the supplementary file of this paper.


Subjects
Algorithms , Benchmarking , Drug Discovery , Models, Molecular , Software
7.
Genes (Basel) ; 14(12)2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38136991

ABSTRACT

A transcriptome profiles the expression levels of genes in cells, and a huge amount of public transcriptomic data has accumulated. Most existing biomarker studies investigated the differential expression of individual transcriptomic features and assumed that features are independent of one another. Many transcriptomic features without differential expression were therefore left out of biomarker lists. This study proposed a computational analysis protocol, mqTrans, that analyzes transcriptomes from the view of high-dimensional inter-feature correlations. The mqTrans protocol trained a regression model to predict the expression of an mRNA feature from the expression of the transcription factors (TFs). The difference between the predicted and real expression of an mRNA feature in a query sample was defined as the mqTrans feature. The new mqTrans view enabled the detection of thirteen transcriptomic features across three independent lung cancer datasets. These features had differentially expressed mqTrans values but no differential expression in their original transcriptomic values. They were called dark biomarkers because a conventional differential analysis would have ignored them. One dark biomarker, GBP5, was discussed in detail, and additional validation experiments were performed. Both suggested that overlapping long non-coding RNAs might have contributed to this phenomenon. In summary, this study aimed to find non-differentially expressed genes with significantly changed mqTrans values in lung cancer. Biomarker detection studies usually ignore such genes because they are not differentially expressed. However, their differentially expressed mqTrans values in three independent datasets suggest strong associations with lung cancer.


Subjects
Lung Neoplasms , Humans , Lung Neoplasms/genetics , Lung Neoplasms/diagnosis , Gene Expression Profiling , Transcriptome/genetics , Biomarkers , RNA, Messenger/genetics
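
The mqTrans computation described in the abstract above has two steps. A regression is fitted on reference samples, and then a residual-like difference is taken on each query sample. The abstract does not say which regression model is used. The sketch below therefore assumes ordinary least squares with an intercept, solved from the normal equations in plain Python. It also assumes the sign convention "predicted minus real", following the wording "difference between the predicted and real expression". The function names are hypothetical.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y on X with an intercept, via the normal equations.

    X holds one row of TF expression values per reference sample.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0, *x] for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system: TF profiles are collinear")
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def mqtrans_value(coef: Sequence[float], tf_values: Sequence[float],
                  observed: float) -> float:
    """Predicted minus observed expression of one mRNA in one query sample."""
    predicted = coef[0] + sum(c * v for c, v in zip(coef[1:], tf_values))
    return predicted - observed


# Reference samples in which mRNA = 1 + 2*TF1 - TF2 exactly.
X_ref = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_ref = [1, 3, 0, 2, 4]
coef = fit_ols(X_ref, y_ref)              # approximately [1.0, 2.0, -1.0]
print(mqtrans_value(coef, [1, 1], 3.5))   # approximately -1.5
```

In practice, one such model would be fitted per mRNA on the reference cohort, and the resulting mqTrans values would be compared between groups as an additional feature view.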
8.
Comput Biol Med ; 167: 107613, 2023 12.
Article in English | MEDLINE | ID: mdl-37918259

ABSTRACT

Thyroid cancer is the most common type of endocrine system cancer. Its pre-cancerous and early stages are usually benign or slow-growing and do not need invasive treatment. This study investigated the challenging classification of four classes of samples: normal controls (N), thyroid adenomas (TA), papillary thyroid cancers (PTC), and metastasized papillary thyroid cancers (MPTC). We proposed ThyroidBloodTest, a multi-view progression diagnosis framework that integrates two views. The first is RNA-seq platelet transcriptomes (View-T), and the second is routine blood test features (View-B). The platelet transcriptome represents molecular-level information, while routine blood test features are easy to obtain in clinical practice. Eleven feature selection algorithms and seven classifiers were evaluated on both views. The experimental data suggested the importance of choosing appropriate data analysis algorithms and feature engineering techniques such as principal component analysis (PCA). The best ThyroidBloodTest model achieved Acc = 0.8750 for the four-class classification of N/TA/PTC/MPTC samples in the integrated feature space of View-T and View-B. The proteins encoded by the View-T biomarkers were enriched for cytosolic localization and for three post-translational modification types: acetylation, phosphorylation, and ubiquitination. The counts of different immune cells also contributed positively to the progression diagnosis of thyroid cancer. The proposed multi-view prediction model demonstrated the necessity of integrating both platelet transcriptomes and routine blood tests for the progression diagnosis of thyroid cancer.


Subjects
Carcinoma, Papillary , Thyroid Neoplasms , Humans , Thyroid Cancer, Papillary/diagnosis , Thyroid Cancer, Papillary/genetics , Transcriptome/genetics , Carcinoma, Papillary/pathology , Thyroid Neoplasms/diagnosis , Thyroid Neoplasms/genetics
9.
BMC Infect Dis ; 23(1): 622, 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735372

ABSTRACT

BACKGROUND: Coronavirus disease 2019 (COVID-19) is a rapidly developing and sometimes lethal pulmonary disease. Accurately predicting COVID-19 mortality would facilitate optimal patient treatment and medical resource deployment, but clinical practice has yet to address this need. Both complete blood counts and cytokine levels are altered by COVID-19 infection. This study aimed to use inexpensive and easily accessible complete blood counts to build an accurate COVID-19 mortality prediction model. Cytokine fluctuations reflect the inflammatory storm induced by COVID-19, but cytokine levels are not as commonly accessible as complete blood counts. Therefore, this study explored the possibility of predicting cytokine levels from complete blood counts. METHODS: We used complete blood counts to predict cytokine levels. The predictive models include an autoencoder, principal component analysis, and linear regression models. To predict the mortality of COVID-19 patients, we used classifiers such as support vector machines and feature selection methods such as adaptive boosting (AdaBoost). RESULTS: Complete blood counts reached a COVID-19 mortality classification area under the curve (AUC) of 0.9678, and original cytokine levels reached 0.9111. The predicted cytokine levels alone, used as the feature set, reached a classification AUC of 0.9844. The predicted cytokine levels were more significantly associated with COVID-19 mortality than the original values. CONCLUSIONS: Integrating the predicted cytokine levels with complete blood counts improved on a COVID-19 mortality prediction model that used complete blood counts only. Both the cytokine level prediction models and the COVID-19 mortality prediction models are publicly available at http://www.healthinformaticslab.org/supp/resources.php.


Subjects
COVID-19 , Humans , Area Under Curve , Cytokines , Linear Models , Principal Component Analysis
10.
Comput Biol Med ; 163: 107187, 2023 09.
Article in English | MEDLINE | ID: mdl-37393787

ABSTRACT

Artificial intelligence (AI) has made significant progress in drug discovery. AI-based tools have been used in all aspects of drug discovery, including chemical structure recognition. We propose a chemical structure recognition framework, Optical Chemical Molecular Recognition (OCMR). It is designed to extract data in practical scenarios better than rule-based and end-to-end deep learning models do. The proposed OCMR framework enhances recognition performance by integrating local information in the topology of molecular graphs. OCMR handles complex cases such as non-canonical drawings and atomic group abbreviations. It substantially improves on current state-of-the-art results on multiple public benchmark datasets and one internally curated dataset.


Subjects
Artificial Intelligence , Benchmarking , Drug Discovery
11.
Adv Biol (Weinh) ; 7(12): e2300189, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37423953

ABSTRACT

This work hypothesizes that some genes undergo radically changed transcription regulation (TR) in breast cancer (BC) but, for unknown reasons, do not show differential expression. The TR of a gene is quantitatively formulated as a regression model between the expression of this gene and that of multiple transcription factors (TFs). The mqTrans value of a gene is defined as the difference between its predicted and real expression levels in a query sample. This value quantitatively reflects the gene's regulatory changes. This work systematically screens for non-differentially expressed genes with differentially expressed mqTrans values in 1036 samples across five datasets and three ethnic groups. This study calls the 25 genes that satisfy the above hypothesis in at least four datasets "dark biomarkers". The strong dark biomarker gene CXXC5 (CXXC Finger Protein 5) is supported by all five independent BC datasets. Although CXXC5 does not show differential expression in BC, its transcription regulation shows quantitative associations with BC in diverse cohorts. Overlapping long noncoding RNAs (lncRNAs) may have contributed their transcripts to the expression miscalculations of dark biomarkers. The mqTrans analysis provides a complementary view for transcriptome-based biomarker detection, revealing biomarkers that many existing studies ignore.


Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Gene Expression Regulation , Transcriptome , Biomarkers , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism
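
The cross-dataset criterion in the abstract above defines dark biomarkers as genes that satisfy the hypothesis in at least four of the five datasets. Once each dataset's passing genes are known, this criterion is a simple support count. In the sketch below, the dataset labels are made-up placeholders. So are all genes other than CXXC5 (`GENE_A`, `GENE_B`, `GENE_C`). The helper `consistent_genes` is hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping, Set


def consistent_genes(per_dataset_hits: Mapping[str, Set[str]],
                     min_datasets: int) -> Dict[str, int]:
    """Genes flagged in at least `min_datasets` datasets, most supported first."""
    support: Counter = Counter()
    for genes in per_dataset_hits.values():
        support.update(set(genes))  # count each gene at most once per dataset
    return {g: n for g, n in support.most_common() if n >= min_datasets}


hits = {
    "dataset_1": {"CXXC5", "GENE_A", "GENE_B"},
    "dataset_2": {"CXXC5", "GENE_A"},
    "dataset_3": {"CXXC5", "GENE_A", "GENE_C"},
    "dataset_4": {"CXXC5", "GENE_A", "GENE_B"},
    "dataset_5": {"CXXC5", "GENE_C"},
}
print(consistent_genes(hits, min_datasets=4))  # {'CXXC5': 5, 'GENE_A': 4}
```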
12.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37427963

ABSTRACT

Survival analysis is critical to estimating cancer prognosis. High-throughput technologies have increased the dimensionality of genic features. However, cohorts still contain relatively few clinical samples, for reasons that include difficult participant recruitment and high data-generation costs. The transcriptome is one of the most abundantly available OMIC data types; OMIC refers to high-throughput data, including genomic, transcriptomic, proteomic, and epigenomic data. This study introduced DQSurv, a multitask graph attention network (GAT) framework for survival analysis. We first used a large dataset of healthy tissue samples to pretrain the GAT-based HealthModel for the quantitative measurement of gene regulatory relations. Following the idea of transfer learning, DQSurv initialized the GAT model with the pretrained HealthModel. It then fine-tuned the model on two tasks: the main task of survival analysis and the auxiliary task of gene expression prediction. This refined GAT was denoted DiseaseModel. For the final survival analysis task, we fused two inputs: the original transcriptomic features and the difference vector between the latent features encoded by the HealthModel and the DiseaseModel. The proposed DQSurv model consistently outperformed existing models for the survival analysis of 10 benchmark cancer types and an independent dataset. An ablation study also supported the necessity of the main modules. We released the code and the pretrained HealthModel to facilitate feature encoding and survival analysis in future transcriptome-based studies, especially on small datasets. The model and the code are available at http://www.healthinformaticslab.org/supp/.


Subjects
Algorithms , Neoplasms , Humans , Proteomics , Survival Analysis
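
The final DQSurv input described in the abstract above concatenates two parts: the original transcriptomic features and the difference between the HealthModel and DiseaseModel latent features. The sketch below shows only that fusion step. The two encoders are hypothetical linear stand-ins, not the paper's graph attention networks. The subtraction order (HealthModel minus DiseaseModel) is an assumption.

```python
from typing import Callable, List

Vector = List[float]


def fuse_features(x: Vector,
                  health_encoder: Callable[[Vector], Vector],
                  disease_encoder: Callable[[Vector], Vector]) -> Vector:
    """Return x concatenated with (health latent - disease latent)."""
    h = health_encoder(x)
    d = disease_encoder(x)
    if len(h) != len(d):
        raise ValueError("both encoders must produce latents of the same size")
    return list(x) + [a - b for a, b in zip(h, d)]


# Hypothetical stand-ins for the pretrained HealthModel and fine-tuned DiseaseModel.
def toy_health_encoder(x: Vector) -> Vector:
    return [sum(x), x[0]]


def toy_disease_encoder(x: Vector) -> Vector:
    return [0.5 * sum(x), x[0] + 1.0]


print(fuse_features([1.0, 2.0, 3.0], toy_health_encoder, toy_disease_encoder))
# [1.0, 2.0, 3.0, 3.0, -1.0]
```

The difference vector is zero wherever fine-tuning left the latent representation unchanged. It therefore highlights only the disease-specific shift, while the original features remain available to the survival model.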
13.
Genes (Basel) ; 14(6)2023 05 24.
Article in English | MEDLINE | ID: mdl-37372321

ABSTRACT

Background: Colon cancer (CC) is common, and its mortality rate greatly increases as the disease progresses to the metastatic stage. Early detection of metastatic colon cancer (mCC) is crucial for reducing mortality. Most previous studies have focused on the top-ranked differentially expressed transcriptomic biomarkers between mCC and primary CC while ignoring non-differentially expressed genes. Results: This study proposed that complicated inter-feature correlations could be quantitatively formulated as a complementary transcriptomic view. We used a regression model to formulate the correlation between the expression level of a messenger RNA (mRNA) and the expression levels of its regulatory transcription factors (TFs). The mqTrans value of a query mRNA in a given sample was defined as the difference between its predicted and real expression levels. It reflects transcription regulatory changes relative to the model-training samples. A dark biomarker in mCC is defined as an mRNA gene that is not differentially expressed in mCC but has mqTrans values significantly associated with mCC. This study detected seven dark biomarkers using 805 samples from three independent datasets. Evidence from the literature supports the role of some of these dark biomarkers. Conclusions: This study presented a complementary high-dimensional analysis procedure for transcriptome-based biomarker investigations, with a case study on mCC.


Subjects
Colonic Neoplasms , Gene Expression Profiling , Humans , Biomarkers , Gene Expression Profiling/methods , Transcriptome/genetics , Colonic Neoplasms/genetics , Colonic Neoplasms/pathology , RNA, Messenger/genetics
14.
Comput Biol Med ; 160: 107030, 2023 06.
Article in English | MEDLINE | ID: mdl-37196456

ABSTRACT

Methylation is a major epigenetic DNA modification that regulates biological processes without altering the DNA sequence. Multiple types of DNA methylation have been discovered, including 6mA, 5hmC, and 4mC. Multiple computational approaches have been developed to identify DNA methylation residues automatically using machine learning or deep learning algorithms. Machine learning (ML)-based methods are difficult to transfer to other DNA methylation site prediction tasks that use additional knowledge. Deep learning (DL) methods may facilitate the transfer of knowledge from similar tasks, but they are often ineffective on small datasets. This study proposes EpiTEAmDNA, an integrated feature representation framework based on transfer learning and ensemble learning. It is evaluated on multiple DNA methylation types across 15 species. EpiTEAmDNA integrates a convolutional neural network (CNN) with conventional machine learning methods. It outperforms existing DL-based methods on small datasets when no additional knowledge is available. The experimental data suggest that transfer learning based on additional knowledge may further improve the EpiTEAmDNA models. Evaluations on independent test datasets also suggest that EpiTEAmDNA outperforms existing models in most prediction tasks for the 3 DNA methylation types across 15 species. The source code, the pre-trained global model, and the EpiTEAmDNA feature representation framework are freely available at http://www.healthinformaticslab.org/supp/.


Subjects
Machine Learning , Neural Networks, Computer , DNA/genetics , Epigenesis, Genetic , DNA Methylation
15.
J Hazard Mater ; 456: 131717, 2023 Aug 15.
Article in English | MEDLINE | ID: mdl-37245369

ABSTRACT

Herein, zero-valent iron (ZVI) was modified with L-cysteine (Cys) by mechanical ball milling (C-ZVIbm) to improve its surface functionality and Cr(VI) removal efficiency. Characterization results indicated that Cys was bound to the ZVI surface by specific adsorption on the oxide shell, forming a -COO-Fe complex. Within 30 min, C-ZVIbm removed 99.6% of Cr(VI), much more than ZVIbm (7.3%). Attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR) suggested that Cr(VI) was more likely to adsorb on the surface of C-ZVIbm as bidentate binuclear inner-sphere complexes. The adsorption process was well described by the Freundlich isotherm and the pseudo-second-order kinetic model. Electrochemical analysis and electron spin resonance (ESR) spectroscopy revealed two effects of Cys on C-ZVIbm. It lowered the redox potential of Fe(III)/Fe(II), and it favored surface Fe(III)/Fe(II) cycling mediated by electrons from the Fe0 core. These electron transfer processes were beneficial to the surface reduction of Cr(VI) to Cr(III). Our findings provide new insights into modifying the surface of ZVI with a low-molecular-weight amino acid to promote in situ Fe(III)/Fe(II) cycling. They also show great potential for constructing efficient Cr(VI) removal systems.

16.
Comput Biol Chem ; 104: 107858, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37058814

ABSTRACT

Colon cancer is a common cancer type in both sexes, and its mortality rate increases at the metastatic stage. Most studies exclude non-differentially expressed genes from biomarker analyses of metastatic colon cancers. This study aims to find latent associations between non-differentially expressed genes and metastatic colon cancers and to evaluate the gender specificity of such associations. This study formulates the prediction of a gene's expression level as a regression model trained on primary colon cancers. A gene's mqTrans value (model-based quantitative measure of transcription regulation) is defined as the difference between its predicted and original expression levels in a testing sample. It quantitatively measures the change in the gene's transcription regulation in that sample. We use mqTrans analysis to detect messenger RNA (mRNA) genes that meet two conditions between primary and metastatic colon cancers. Their original expression levels are not differentially expressed, but their mqTrans values are. These genes are referred to as dark biomarkers of metastatic colon cancer. All dark biomarker genes were verified by two transcriptome profiling technologies, RNA-seq and microarray. The mqTrans analysis of a mixed cohort of both sexes could not recover gender-specific dark biomarkers. Most dark biomarkers overlap with long non-coding RNAs (lncRNAs), and these lncRNAs might have contributed their transcripts to the calculated expression levels of the dark biomarkers. Therefore, mqTrans analysis serves as a complementary approach to identifying dark biomarkers that conventional studies generally ignore. It is also essential to analyze female and male samples in separate experiments. The dataset and mqTrans analysis code are available at https://figshare.com/articles/dataset/22250536.


Subjects
Adenocarcinoma , Colonic Neoplasms , RNA, Long Noncoding , Humans , Male , Female , RNA, Long Noncoding/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Colonic Neoplasms/genetics , Gene Expression Profiling , Adenocarcinoma/genetics , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks
17.
Technol Health Care ; 31(4): 1171-1187, 2023.
Article in English | MEDLINE | ID: mdl-36617797

ABSTRACT

BACKGROUND: Acne is a common skin lesion in adolescents, and its automatic diagnosis poses computational challenges. Computer vision algorithms are used to detect acne and to determine its subtypes. Most existing acne detection algorithms are based on natural facial images, which carry noise factors such as illumination. OBJECTIVE: To tackle this issue, this study collected ACNEDer, a dataset of annotated dermoscopic acne images. Deep learning methods have demonstrated powerful capabilities in automatic acne diagnosis, and they usually deliver the model from the best-performing training epoch. METHODS: This study proposes AcneTyper, a novel self-ensemble and stacking-based framework for diagnosing acne subtypes. Instead of delivering the best epoch, AcneTyper consolidates the prediction results of all training epochs as latent features. It then stacks the best subset of these latent features to distinguish different acne subtypes. RESULTS: The proposed AcneTyper framework achieves promising detection performance for acne subtypes. Its accuracy even exceeds that of a clinical dermatologist with two years of experience by 6.8%. CONCLUSION: The proposed method determines different acne subtypes, outperforms inexperienced dermatologists, and contributes to reducing the probability of misdiagnosis.


Subjects
Acne Vulgaris , Algorithms , Adolescent , Humans , Acne Vulgaris/diagnostic imaging , Acne Vulgaris/pathology , Image Interpretation, Computer-Assisted/methods , Dermoscopy/methods
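
AcneTyper's self-ensemble idea, as described in the abstract above, keeps every epoch's predictions instead of only the best epoch's. The abstract does not specify how the best subset is chosen or which meta-classifier stacks it. The sketch below therefore makes two assumptions. First, epochs are chosen by greedy forward selection, scored by the validation accuracy of their averaged class probabilities (soft voting, a simple stand-in). Second, the selected epochs' outputs are concatenated into stacked features for a downstream meta-classifier, which is not shown. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

# epoch_probs[e][i] is the class-probability vector that the model saved at
# training epoch e assigns to validation sample i.
EpochProbs = Sequence[Sequence[Sequence[float]]]


def stacked_features(epoch_probs: EpochProbs, epochs: Sequence[int]) -> List[List[float]]:
    """One feature row per sample: the chosen epochs' probability vectors, concatenated."""
    n_samples = len(epoch_probs[0])
    return [[p for e in epochs for p in epoch_probs[e][i]] for i in range(n_samples)]


def averaged_accuracy(epoch_probs: EpochProbs, epochs: Sequence[int],
                      labels: Sequence[int]) -> float:
    """Accuracy of the argmax of class probabilities averaged over `epochs`."""
    correct = 0
    for i, label in enumerate(labels):
        n_classes = len(epoch_probs[epochs[0]][i])
        avg = [sum(epoch_probs[e][i][c] for e in epochs) / len(epochs)
               for c in range(n_classes)]
        if max(range(n_classes), key=avg.__getitem__) == label:
            correct += 1
    return correct / len(labels)


def greedy_epoch_subset(epoch_probs: EpochProbs,
                        labels: Sequence[int]) -> Tuple[List[int], float]:
    """Forward selection: keep adding the epoch that most improves accuracy."""
    chosen: List[int] = []
    best = 0.0
    remaining = list(range(len(epoch_probs)))
    while remaining:
        candidate, candidate_score = None, best
        for e in remaining:  # ties keep the earliest epoch
            score = averaged_accuracy(epoch_probs, chosen + [e], labels)
            if score > candidate_score:
                candidate, candidate_score = e, score
        if candidate is None:  # no epoch strictly improves accuracy
            break
        chosen.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return chosen, best


# Three saved epochs, four validation samples, two classes (toy numbers).
labels = [0, 1, 1, 0]
epoch_probs = [
    [[0.7, 0.3], [0.4, 0.6], [0.6, 0.4], [0.6, 0.4]],  # epoch 0
    [[0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.7, 0.3]],  # epoch 1
    [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],  # epoch 2
]
chosen, score = greedy_epoch_subset(epoch_probs, labels)
print(chosen, score)                             # [0, 1] 1.0
print(stacked_features(epoch_probs, chosen)[0])  # [0.7, 0.3, 0.4, 0.6]
```

Here, neither epoch 0 nor epoch 1 alone exceeds 75% accuracy, but their errors do not overlap, so combining them classifies every toy sample correctly. This complementarity between epochs is what a self-ensemble exploits.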
18.
Per Med ; 20(2): 143-155, 2023 03.
Article in English | MEDLINE | ID: mdl-36705049

ABSTRACT

Aim: Transcriptional regulation is actively involved in the onset and progression of various diseases. This study used model-based quantitative transcription regulation, a feature-engineering approach. It quantitatively measured the correlation between mRNAs and transcription factors in a reference dataset of chronic lymphocytic leukemia (CLL) transcriptomes. Methods: A comprehensive investigation of transcriptional regulation changes in CLL was conducted using 973 samples from six independent datasets. Results & conclusion: Seven mRNAs were found to have significantly differential model-based quantitative transcription regulation values but no differential expression between CLL patients and controls. We called these genes 'dark biomarkers' because their original expression levels did not show differential changes in the CLL patients. Overlapping lncRNAs might have contributed their transcripts to the expression miscalculations of these dark biomarkers.


Subjects
Leukemia, Lymphocytic, Chronic, B-Cell , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/metabolism , Transcription Factors/genetics , Transcriptome/genetics , Biomarkers, Tumor/genetics
19.
Pac Symp Biocomput ; 28: 157-168, 2023.
Article in English | MEDLINE | ID: mdl-36540973

ABSTRACT

Identifying effective target-disease associations (TDAs) can alleviate the tremendous cost incurred by clinical failures of drug development. Although many machine learning models have been proposed to predict potential novel TDAs rapidly, their credibility is not guaranteed, thus requiring extensive experimental validation. In addition, it is generally challenging for current models to predict meaningful associations for entities with less information, hence limiting the application potential of these models in guiding future research. Based on recent advances in utilizing graph neural networks to extract features from heterogeneous biological data, we develop CreaTDA, an end-to-end deep learning-based framework that effectively learns latent feature representations of targets and diseases to facilitate TDA prediction. We also propose a novel way of encoding credibility information obtained from literature to enhance the performance of TDA prediction and predict more novel TDAs with real evidence support from previous studies. Compared with state-of-the-art baseline methods, CreaTDA achieves substantially better prediction performance on the whole TDA network and its sparse sub-networks containing the proteins associated with few known diseases. Our results demonstrate that CreaTDA can provide a powerful and helpful tool for identifying novel target-disease associations, thereby facilitating drug discovery.


Subjects
Computational Biology , Neural Networks, Computer , Humans , Computational Biology/methods , Machine Learning , Drug Discovery , Proteins
20.
IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1030-1040, 2023.
Article in English | MEDLINE | ID: mdl-35503835

ABSTRACT

Identifying interactions between compounds and proteins is an essential task in drug discovery. Computational approaches cost less than wet-lab experiments as a way to recommend compounds as new drug candidates. Machine learning-based methods, especially deep learning-based methods, have advantages in learning complex feature interactions between compounds and proteins. However, deep learning models tend to over-generalize when the compound-protein feature interactions are high-dimensional and sparse, and they then predict less relevant compound-protein pairs. This problem can be overcome by learning both low-order and high-order feature interactions. In this paper, we propose FMGNN, a novel hybrid model that combines Factorization Machines and a Graph Neural Network to extract low-order and high-order features, respectively. We then design a compound-protein interaction (CPI) prediction method that uses pharmacophore features of compounds and physicochemical properties of amino acids. The pharmacophore features help the prediction results better match the expectations of biological experiments. The physicochemical properties of amino acids are loaded into the embedding layer to improve the convergence speed and accuracy of protein feature learning. Experimental results on several datasets, especially an imbalanced large-scale dataset, showed that our method outperforms other existing methods for CPI prediction. Western blot experiments on wogonin and its candidate target proteins also showed that our method is effective and accurate for finding target proteins. The implementation of FMGNN is available at https://github.com/tcygxu2021/FMGNN.


Subjects
Amino Acids , Pharmacophore , Neural Networks, Computer , Proteins/chemistry , Machine Learning
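
FMGNN assigns low-order feature interactions to factorization machines (FMs). As general background, and not FMGNN's own code: a second-order FM scores each pairwise interaction as <v_i, v_j> x_i x_j, with one learned latent vector v_i per feature. The sum over all pairs can be computed in O(kn) instead of O(kn^2). The sketch below checks this fast identity against the direct sum. The weights are made-up numbers, and the function names are hypothetical. The factorized form lets an FM estimate interactions between features that rarely co-occur, which is the high-dimensional sparse setting the abstract describes.

```python
from itertools import combinations
from typing import Sequence


def fm_pairwise_naive(x: Sequence[float], v: Sequence[Sequence[float]]) -> float:
    """Sum over i < j of <v_i, v_j> * x_i * x_j, evaluated directly in O(k n^2)."""
    return sum(sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
               for i, j in combinations(range(len(x)), 2))


def fm_pairwise_fast(x: Sequence[float], v: Sequence[Sequence[float]]) -> float:
    """Same sum in O(k n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    total = 0.0
    for f in range(len(v[0])):
        terms = [v[i][f] * x[i] for i in range(len(x))]
        total += sum(terms) ** 2 - sum(t * t for t in terms)
    return 0.5 * total


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               v: Sequence[Sequence[float]]) -> float:
    """Second-order FM score: bias + linear terms + pairwise interactions."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x)) + fm_pairwise_fast(x, v)


# A sparse input: only features 0 and 2 are active.
x = [1.0, 0.0, 2.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, -0.1]]  # one k=2 latent vector per feature
print(fm_pairwise_naive(x, v), fm_pairwise_fast(x, v))  # both approximately 0.06
```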