Results 1 - 15 of 15
1.
Anal Chem ; 96(21): 8772-8781, 2024 05 28.
Article in English | MEDLINE | ID: mdl-38743842

ABSTRACT

Identifying the metabolic signatures of colorectal cancer is critical for early diagnosis and for therapeutic approaches that could significantly block cancer progression and improve patient survival. Here, we combined an untargeted metabolic analysis strategy based on internal extractive electrospray ionization mass spectrometry with machine learning to analyze metabolites in 173 pairs of cancer samples and matched normal tissue samples. The goal was to build robust metabolic signature models for diagnostic purposes. Metabolic signatures of colorectal cancer were screened and independently validated with machine learning methods: L1-regularized logistic regression for feature selection and eXtreme Gradient Boosting for classification. This yielded a panel of seven signatures with good diagnostic performance (accuracy of 87.74%, sensitivity of 85.82%, and specificity of 89.66%). Each of the seven signatures was also evaluated for its ability to distinguish cancer from normal tissue, and the metabolite PC (30:0) showed good diagnostic performance. In addition, genes associated with PC (30:0) were identified by multiomics analysis, which combined metabolic data with transcriptomic data. Our results showed that PC (30:0) could promote the proliferation of the colorectal cancer cell line SW480, revealing a correlation between genetic changes and metabolic dysregulation in cancer. Overall, our results reveal potential determinants of metabolite dysregulation. They pave the way for a mechanistic understanding of altered tissue metabolites in colorectal cancer and for designing interventions that manipulate the levels of circulating metabolites.
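A small worked example can sanity-check the reported figures. The Python sketch below computes accuracy, sensitivity and specificity from confusion-matrix counts. The counts are hypothetical and are not taken from the study.

When the cancer and normal classes are the same size, accuracy reduces to the mean of sensitivity and specificity. A matched-pair design would give exactly that kind of balanced test set. The reported values fit this identity: (85.82 + 89.66) / 2 = 87.74. Whether the validation set was exactly balanced is an assumption here.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of cancer samples called cancer
    specificity = tn / (tn + fp)   # fraction of normal samples called normal
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical balanced test set: 100 cancer and 100 matched normal samples.
acc, sens, spec = binary_metrics(tp=86, fn=14, tn=90, fp=10)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")

# With equal class sizes, accuracy is the mean of sensitivity and specificity.
assert abs(acc - (sens + spec) / 2) < 1e-12

# The reported percentages satisfy the same identity.
print(round((85.82 + 89.66) / 2, 2))
```

If the classes were unbalanced, accuracy would instead be a class-size-weighted average of sensitivity and specificity.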


Subjects
Colorectal Neoplasms , Machine Learning , Colorectal Neoplasms/metabolism , Colorectal Neoplasms/diagnosis , Humans , Metabolomics , Cell Line, Tumor , Spectrometry, Mass, Electrospray Ionization , Metabolome , Cell Proliferation , Multiomics
2.
BMC Infect Dis ; 23(1): 622, 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735372

ABSTRACT

BACKGROUND: Coronavirus disease 2019 (COVID-19) is a rapidly developing and sometimes lethal pulmonary disease. Accurately predicting COVID-19 mortality would facilitate optimal patient treatment and deployment of medical resources, but this remains an unmet need in clinical practice. Both complete blood counts and cytokine levels are altered by COVID-19 infection. This study aimed to build an accurate COVID-19 mortality prediction model from inexpensive and easily accessible complete blood counts. Cytokine fluctuations reflect the inflammatory storm induced by COVID-19, but cytokine levels are not as commonly available as complete blood counts. Therefore, this study explored whether cytokine levels could be predicted from complete blood counts. METHODS: We used complete blood counts to predict cytokine levels. The predictive model includes an autoencoder, principal component analysis, and linear regression models. We used classifiers such as support vector machines and feature selection models such as adaptive boosting to predict the mortality of COVID-19 patients. RESULTS: Complete blood counts and original cytokine levels reached COVID-19 mortality classification area under the curve (AUC) values of 0.9678 and 0.9111, respectively. The feature set of predicted cytokine levels alone reached a classification AUC of 0.9844. The predicted cytokine levels were more significantly associated with COVID-19 mortality than the original values. CONCLUSIONS: Integrating the predicted cytokine levels with complete blood counts improved on a COVID-19 mortality prediction model that used complete blood counts only. Both the cytokine level prediction models and the COVID-19 mortality prediction models are publicly available at http://www.healthinformaticslab.org/supp/resources.php.
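The following is a minimal NumPy sketch of the principal component analysis plus linear regression part of a cytokine predictor. It makes several assumptions:

* The data are synthetic: 200 hypothetical patients, with 12 blood-count measurements and 3 cytokines generated from 4 shared latent factors.
* The autoencoder and the mortality classifier are omitted.
* The function names are illustrative and are not the authors' released models.

```python
import numpy as np


def fit_pca_regression(X, Y, n_components):
    """Fit PCA on X, then least squares from the PCA scores (plus intercept) to Y."""
    mean = X.mean(axis=0)
    # Principal axes are the right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    coef, *_ = np.linalg.lstsq(_design(X, mean, components), Y, rcond=None)
    return mean, components, coef


def _design(X, mean, components):
    """Project onto the principal axes and append an intercept column."""
    scores = (X - mean) @ components.T
    return np.hstack([scores, np.ones((X.shape[0], 1))])


def predict(model, X):
    mean, components, coef = model
    return _design(X, mean, components) @ coef


rng = np.random.default_rng(0)
# Hypothetical data: 4 latent factors drive both 12 blood counts and 3 cytokines.
latent = rng.normal(size=(200, 4))
cbc = latent @ rng.normal(size=(4, 12)) + 0.05 * rng.normal(size=(200, 12))
cytokines = latent @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(200, 3))

# Train on the first 150 patients, evaluate on the remaining 50.
model = fit_pca_regression(cbc[:150], cytokines[:150], n_components=4)
pred = predict(model, cbc[150:])
rmse = np.sqrt(np.mean((pred - cytokines[150:]) ** 2))
print(f"held-out RMSE: {rmse:.3f}")
```

The PCA mean and axes are estimated on the training split only and then reused for the held-out samples, which avoids leaking test information into the projection.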


Subjects
COVID-19 , Humans , Area Under Curve , Cytokines , Linear Models , Principal Component Analysis
3.
Skin Res Technol ; 28(5): 677-688, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35639819

ABSTRACT

BACKGROUND: Acne is one of the most common skin lesions in adolescents. Severe or inflammatory acne can lead to scars, which may have major impacts on patients' quality of life or even job prospects. Grading acne plays an important role in diagnosis, and grading is done by counting acne lesions. This counting is labor-intensive and error-prone for dermatologists, so developing automatic diagnosis methods is very important. Ensemble learning may improve on the predictions of the base models, but its time complexity is relatively high. An ensemble pruning strategy may address this computational challenge by removing redundant base models. MATERIALS AND METHODS: This study proposed a novel ensemble pruning framework of deep learning models to accurately detect and grade acne from images. First, we train multiple base models and prune redundant models according to their performance and diversity. Then, we construct new features for the training data using the base models selected in the previous step. Next, we further remove redundant models with a feature selection algorithm. Finally, we integrate all the selected base models with classifiers. In this way, the proposed ensemble pruning algorithm prunes the deep learning base models. RESULTS: The experimental data showed that the pruned ensemble framework achieved a prediction accuracy of 85.82% on the acne dataset, better than existing studies. To verify the method's effectiveness, we also tested it on a skin cancer dataset, where it greatly outperformed the state-of-the-art methods. CONCLUSION: The proposed method grades acne and outperforms state-of-the-art methods on two datasets. It also removes redundant models to reduce computational complexity.
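A greedy heuristic can illustrate the first pruning step, which keeps models that are both accurate and diverse. This is an assumed stand-in, not the paper's algorithm. It starts from the most accurate base model and repeatedly adds the model with the highest score, defined as accuracy plus a weighted average disagreement with the models already kept. The predictions, model names and `diversity_weight` are all hypothetical.

```python
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a, b):
    """Fraction of samples on which two models predict different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def prune_ensemble(predictions, labels, k, diversity_weight=0.5):
    """Greedily keep up to k base models, trading accuracy against diversity."""
    remaining = list(predictions)
    # Start from the single most accurate base model.
    first = max(remaining, key=lambda m: accuracy(predictions[m], labels))
    selected = [first]
    remaining.remove(first)

    def score(m):
        mean_div = sum(disagreement(predictions[m], predictions[s])
                       for s in selected) / len(selected)
        return accuracy(predictions[m], labels) + diversity_weight * mean_div

    while len(selected) < k and remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical validation labels (binary) and per-model predictions.
labels = [0, 1, 1, 0, 1, 0, 1, 1]
predictions = {
    "model_a": [0, 1, 1, 0, 1, 0, 1, 0],
    "model_b_duplicate": [0, 1, 1, 0, 1, 0, 1, 0],
    "model_c": [1, 1, 0, 0, 1, 0, 1, 1],
}
print(prune_ensemble(predictions, labels, k=2))
```

Here `model_b_duplicate` exactly duplicates `model_a`, so it is dropped. The less accurate but more diverse `model_c` is kept instead. The later stages are not shown: stacking base-model outputs as features, then a second feature-selection pass.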


Subjects
Acne Vulgaris , Deep Learning , Acne Vulgaris/diagnostic imaging , Adolescent , Algorithms , Humans , Quality of Life
4.
Medicina (Kaunas) ; 57(2)2021 Jan 22.
Article in English | MEDLINE | ID: mdl-33499377

ABSTRACT

BACKGROUND AND OBJECTIVE: Primary lung cancer is a lethal and rapidly developing cancer type and one of the leading causes of cancer death. MATERIALS AND METHODS: Statistical methods such as Cox regression are usually used to detect the prognostic factors of a disease. This study investigated survival prediction using machine learning algorithms. The clinical data of 28,458 patients with primary lung cancer were collected from the Surveillance, Epidemiology, and End Results (SEER) database. RESULTS: The survival rate of women with primary lung cancer was often higher than that of men (p < 0.001). Seven popular machine learning algorithms were used to evaluate one-year, three-year, and five-year survival prediction. Two classifiers, extreme gradient boosting (XGB) and logistic regression (LR), achieved the best prediction accuracies. The variable importance of the trained XGB models suggested that surgical removal (feature "Surgery") made the largest contribution to the one-year survival prediction models. In contrast, the metastatic status of the regional lymph nodes (feature "N" stage) was the most important contributor to three-year and five-year survival prediction. The female patients' three-year prognosis model achieved a prediction accuracy of 0.8297 on independent future samples, while the male model achieved an accuracy of only 0.7329. CONCLUSIONS: These data suggest that male patients may have more complicated lung cancer factors than female patients. It is therefore necessary to develop gender-specific diagnosis and prognosis models.


Subjects
Lung Neoplasms , Machine Learning , Algorithms , Female , Humans , Logistic Models , Lung Neoplasms/diagnosis , Male , Prognosis
5.
J Vis Exp ; (205)2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38497637

ABSTRACT

A transcriptome represents the expression levels of many genes in a sample and has been widely used in biological research and clinical practice. Researchers usually focus on transcriptomic biomarkers with differential representations between a phenotype group and a control group of samples. This study presented a multitask graph-attention network (GAT) learning framework to learn the complex inter-genic interactions of the reference samples. A demonstrative reference model (HealthModel) was pre-trained on healthy samples. It can be used directly to generate the model-based quantitative transcriptional regulation (mqTrans) view of independent test transcriptomes. The generated mqTrans view of transcriptomes was demonstrated on prediction tasks and on dark biomarker detection. The term "dark biomarker" was coined for a gene that shows differential representation in the mqTrans view but no differential expression at its original expression level. Because it lacks differential expression, a dark biomarker is always overlooked in traditional biomarker detection studies. The source code and the manual of the pipeline HealthModelPipe can be downloaded from http://www.healthinformaticslab.org/supp/resources.php.


Subjects
Gene Expression Profiling , Transcriptome , Gene Expression Regulation , Biomarkers , Phenotype
6.
Adv Biol (Weinh) ; 7(12): e2300189, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37423953

ABSTRACT

This work hypothesizes that some genes undergo radically changed transcription regulation (TR) in breast cancer (BC) but, for unknown reasons, do not show differential expression. The TR of a gene is quantitatively formulated as a regression model between the expression of this gene and that of multiple transcription factors (TFs). The difference between the predicted and real expression levels of a gene in a query sample is defined as the mqTrans value of this gene, which quantitatively reflects its regulatory changes. This work systematically screens 1036 samples across five datasets and three ethnic groups for non-differentially expressed genes with differentially expressed mqTrans values. This study calls the 25 genes that satisfy this hypothesis in at least four datasets dark biomarkers. The strong dark biomarker gene CXXC5 (CXXC Finger Protein 5) is supported by all five independent BC datasets. Although CXXC5 does not show differential expression in BC, its transcription regulation shows quantitative associations with BC in diverse cohorts. Overlapping long noncoding RNAs (lncRNAs) may have contributed their transcripts to the miscalculated expression levels of the dark biomarkers. The mqTrans analysis serves as a complementary view to transcriptome-based biomarker detection, capturing biomarkers that many existing studies ignore.
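Ordinary least squares makes the mqTrans definition concrete. This sketch assumes a linear regression of one gene on five TFs, with synthetic data. It computes the mqTrans value as predicted minus observed expression, following the word order of the definition above. The related studies also use other regressors, such as a GRU network. The function names here are hypothetical.

```python
import numpy as np


def _with_intercept(tf_expr):
    return np.column_stack([tf_expr, np.ones(len(tf_expr))])


def fit_regulation_model(tf_expr, gene_expr):
    """Least-squares regression of one gene's expression on several TFs."""
    coef, *_ = np.linalg.lstsq(_with_intercept(tf_expr), gene_expr, rcond=None)
    return coef


def mqtrans_values(coef, tf_expr, gene_expr):
    """Predicted minus observed expression of the gene in each sample."""
    return _with_intercept(tf_expr) @ coef - gene_expr


rng = np.random.default_rng(1)
# Hypothetical reference cohort: 100 samples, 5 TFs regulating one gene.
weights = np.array([0.8, -0.5, 0.3, 0.0, 0.2])
ref_tfs = rng.normal(size=(100, 5))
ref_gene = ref_tfs @ weights + 1.0 + 0.05 * rng.normal(size=100)
coef = fit_regulation_model(ref_tfs, ref_gene)

# Hypothetical query samples: the gene keeps the same mean expression,
# but no longer follows the reference TF relationship.
query_tfs = rng.normal(size=(20, 5))
query_gene = 1.0 + 0.05 * rng.normal(size=20)

ref_mq = mqtrans_values(coef, ref_tfs, ref_gene)
query_mq = mqtrans_values(coef, query_tfs, query_gene)
print(f"reference mqTrans spread: {ref_mq.std():.3f}")
print(f"query mqTrans spread:     {query_mq.std():.3f}")
```

In the query samples the gene's mean expression is unchanged, yet its mqTrans values spread widely. A dark-biomarker screen looks for exactly this kind of regulation change without an expression change.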


Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Gene Expression Regulation , Transcriptome , Biomarkers , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism
7.
Comput Biol Chem ; 104: 107858, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37058814

ABSTRACT

Colon cancer is a common cancer type in both sexes, and its mortality rate increases at the metastatic stage. Most studies exclude non-differentially expressed genes from biomarker analyses of metastatic colon cancer. The motivation of this study is to find latent associations of non-differentially expressed genes with metastatic colon cancer and to evaluate the gender specificity of such associations. This study formulates the expression level prediction of a gene as a regression model trained on primary colon cancers. The difference between a gene's predicted and original expression levels in a testing sample is defined as its mqTrans value (model-based quantitative measure of transcription regulation). This value quantitatively measures the change in the gene's transcription regulation in that testing sample. We use mqTrans analysis to detect messenger RNA (mRNA) genes that are not differentially expressed at their original expression levels but have differentially expressed mqTrans values between primary and metastatic colon cancers. These genes are referred to as dark biomarkers of metastatic colon cancer. All dark biomarker genes were verified by two transcriptome profiling technologies, RNA-seq and microarray. The mqTrans analysis of a mixed cohort of both sexes could not recover gender-specific dark biomarkers. Most dark biomarkers overlap with long non-coding RNAs (lncRNAs), and these lncRNAs might have contributed their transcripts to the calculated expression levels of the dark biomarkers. Therefore, mqTrans analysis serves as a complementary approach to identify dark biomarkers that conventional studies generally ignore. It is also essential to analyze female and male samples in two separate experiments. The dataset and mqTrans analysis code are available at https://figshare.com/articles/dataset/22250536.


Subjects
Adenocarcinoma , Colonic Neoplasms , RNA, Long Noncoding , Humans , Male , Female , RNA, Long Noncoding/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Colonic Neoplasms/genetics , Gene Expression Profiling , Adenocarcinoma/genetics , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks
8.
Comput Biol Med ; 163: 107187, 2023 09.
Article in English | MEDLINE | ID: mdl-37393787

ABSTRACT

Artificial intelligence (AI) has made significant progress in drug discovery. AI-based tools have been used in all aspects of drug discovery, including chemical structure recognition. We propose a chemical structure recognition framework, Optical Chemical Molecular Recognition (OCMR). It improves data extraction in practical scenarios compared with rule-based and end-to-end deep learning models. OCMR enhances recognition performance by integrating local information from the topology of molecular graphs. It handles complex cases such as non-canonical drawings and atomic group abbreviations. OCMR substantially improves on the current state-of-the-art results on multiple public benchmark datasets and one internally curated dataset.


Subjects
Artificial Intelligence , Benchmarking , Drug Discovery
9.
Genes (Basel) ; 14(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38136991

ABSTRACT

A transcriptome profiles the expression levels of genes in cells, and a huge amount of public transcriptome data has accumulated. Most existing biomarker studies investigated the differential expression of individual transcriptomic features under the assumption of inter-feature independence. Many transcriptomic features without differential expression were left out of the biomarker lists. This study proposed a computational analysis protocol (mqTrans) that analyzes transcriptomes from the view of high-dimensional inter-feature correlations. The mqTrans protocol trained a regression model to predict the expression of an mRNA feature from the expression of transcription factors (TFs). The difference between the predicted and real expression of an mRNA feature in a query sample was defined as the mqTrans feature. The mqTrans view was applied to three independent lung cancer datasets. It detected thirteen transcriptomic features whose mqTrans features were differentially expressed even though their original transcriptomic values were not. These features were called dark biomarkers because a conventional differential analysis would have ignored them. One dark biomarker, GBP5, was discussed in detail, and additional validation experiments were run. Both suggested that overlapping long non-coding RNAs might have contributed to this phenomenon. In summary, this study aimed to find non-differentially expressed genes with significantly changed mqTrans values in lung cancer. Such genes are usually ignored in biomarker detection studies because they are not differentially expressed. However, their differentially expressed mqTrans values in three independent datasets suggest strong associations with lung cancer.
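Suppose per-gene p-values are available for both the original expression and the mqTrans values. The dark-biomarker criterion is then a simple filter, optionally followed by a requirement that a gene replicate across datasets. This sketch rests on several assumptions:

* The gene names and p-values are hypothetical.
* The statistical test that produced the p-values is left open.
* The 0.05 threshold is an assumed cut-off.
* No multiple-testing correction is applied.

```python
from collections import Counter


def dark_biomarkers(p_original, p_mqtrans, alpha=0.05):
    """Genes with no differential expression but differential mqTrans values."""
    return sorted(
        gene for gene, p in p_original.items()
        if p >= alpha and p_mqtrans.get(gene, 1.0) < alpha
    )


def replicated(hits_per_dataset, min_datasets):
    """Genes called dark biomarkers in at least `min_datasets` datasets."""
    counts = Counter(gene for hits in hits_per_dataset for gene in hits)
    return sorted(gene for gene, n in counts.items() if n >= min_datasets)


# Hypothetical (p_original, p_mqtrans) pairs for three genes in three datasets.
datasets = [
    ({"GENE_A": 0.40, "GENE_B": 0.01, "GENE_C": 0.70},
     {"GENE_A": 0.001, "GENE_B": 0.002, "GENE_C": 0.30}),
    ({"GENE_A": 0.55, "GENE_B": 0.20, "GENE_C": 0.90},
     {"GENE_A": 0.01, "GENE_B": 0.03, "GENE_C": 0.04}),
    ({"GENE_A": 0.08, "GENE_B": 0.03, "GENE_C": 0.60},
     {"GENE_A": 0.02, "GENE_B": 0.01, "GENE_C": 0.50}),
]
hits = [dark_biomarkers(p_orig, p_mq) for p_orig, p_mq in datasets]
print(hits)
print(replicated(hits, min_datasets=3))
```

In this toy input, GENE_B is excluded in two datasets because it is differentially expressed on its raw values. GENE_C is excluded in two datasets because its mqTrans values do not differ. Only GENE_A passes in all three.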


Subjects
Lung Neoplasms , Humans , Lung Neoplasms/genetics , Lung Neoplasms/diagnosis , Gene Expression Profiling , Transcriptome/genetics , Biomarkers , RNA, Messenger/genetics
10.
J Bioinform Comput Biol ; 20(3): 2250013, 2022 06.
Article in English | MEDLINE | ID: mdl-35818996

ABSTRACT

Modern biotechnologies have generated huge amounts of OMIC data, among which transcriptomes and methylomes are two major OMIC types. Transcriptomes measure the expression levels of all transcripts, while methylomes depict cytosine methylation levels across a genome. Both OMIC data types can be generated by arrays or by sequencing. Some studies deliver many more features per sample than there are samples in a cohort. With the number of features denoted $p$ and the number of samples $n$, this gives rise to the "large $p$, small $n$" paradigm. This study focused on the classification of OMIC data under this paradigm. A Siamese convolutional network was used to transform the OMIC features into a new space that minimizes intra-class distances and maximizes inter-class distances between samples. The proposed feature engineering algorithm, SiaCo, was comprehensively evaluated on both transcriptome and methylome datasets. The experimental data showed that SiaCo features improved classification accuracy for binary classification problems, including on the independent test dataset. However, individual SiaCo features did not show better inter-class discrimination power than the original OMIC features. This may be because the Siamese convolutional network optimized the collective performance of the SiaCo features rather than each feature's individual discrimination power. The inherent transformation performed by the Siamese twin network also makes the SiaCo features lack interpretability. The source code of SiaCo is freely available at http://www.healthinformaticslab.org/supp/resources.php.
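The abstract does not state SiaCo's training objective. One common loss with the stated effect is the margin-based contrastive loss for Siamese networks. It keeps distances small within a class and large between classes. The NumPy sketch below is an illustration only; the margin of 1.0 and the toy two-dimensional embeddings are assumptions.

```python
import numpy as np


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Mean margin-based contrastive loss over a batch of embedding pairs."""
    d = np.linalg.norm(emb_a - emb_b, axis=1)                   # pairwise Euclidean distances
    pull = same_class * d ** 2                                  # same class: shrink the distance
    push = (1 - same_class) * np.maximum(0.0, margin - d) ** 2  # different class: keep >= margin apart
    return float(np.mean(pull + push))


emb_a = np.array([[0.0, 0.0], [0.0, 0.0]])
same_class = np.array([1.0, 0.0])  # pair 1 shares a label, pair 2 does not

# Different-class pair already beyond the margin: only the same-class term contributes.
far = contrastive_loss(emb_a, np.array([[0.1, 0.0], [2.0, 0.0]]), same_class)
# Different-class pair too close: the push term adds (1.0 - 0.5) ** 2.
near = contrastive_loss(emb_a, np.array([[0.1, 0.0], [0.5, 0.0]]), same_class)
print(far, near)
```

Because the loss depends only on distances between pairs, it shapes the geometry of the whole embedding. It does not reward any single output coordinate for being discriminative on its own. This fits the observation that individual SiaCo features did not discriminate better than the originals.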


Subjects
Algorithms , Genome , Humans , Software
11.
Genes (Basel) ; 13(10)2022 10 21.
Article in English | MEDLINE | ID: mdl-36292801

ABSTRACT

Melanoma is a lethal skin cancer that can develop from moles. This study aimed to integrate multimodal data to predict metastatic melanoma, which is highly aggressive and difficult to treat. The proposed EnsembleSKCM method evaluated the prediction performance of long noncoding RNAs (lncRNAs), protein-coding messenger RNAs (mRNAs), and pathology images (images) for metastatic melanoma. Feature selection was used to screen for metastatic biomarkers in the lncRNA and mRNA datasets. The integrated EnsembleSKCM model was built from the weighted results of the lncRNA-, mRNA-, and image-based models. EnsembleSKCM achieved a prediction accuracy of 0.9444 for metastatic melanoma and outperformed the single-modal prediction models based on lncRNA, mRNA, or image data. The experimental data suggest the importance of integrating complementary information from the three data modalities. Weighted gene co-expression network analysis (WGCNA) was used to analyze the relationship between molecular-level features and image features, and the results showed connections between them. An additional cohort was used to validate the predictions.
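A weighted average of per-modality probabilities can sketch the weighted integration step. The abstract does not say how EnsembleSKCM sets its weights. The weights, the probabilities and the 0.5 decision threshold below are therefore all hypothetical.

```python
def weighted_ensemble(probabilities, weights):
    """Weighted average of per-modality metastasis probabilities."""
    total = sum(weights[m] for m in probabilities)
    return sum(weights[m] * p for m, p in probabilities.items()) / total


# Hypothetical outputs of the three single-modality models for one patient.
probabilities = {"lncRNA": 0.70, "mRNA": 0.55, "image": 0.40}
# Hypothetical weights, e.g. proportional to each model's validation accuracy.
weights = {"lncRNA": 0.5, "mRNA": 0.3, "image": 0.2}

score = weighted_ensemble(probabilities, weights)
label = "metastatic" if score >= 0.5 else "non-metastatic"
print(f"{score:.3f} -> {label}")
```

Dividing by the weight total keeps the score a valid probability even if the weights do not sum to one.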


Subjects
Melanoma , Neoplasms, Second Primary , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , Melanoma/diagnostic imaging , Melanoma/genetics , Melanoma/pathology , RNA, Messenger/genetics , Biomarkers
12.
Comput Biol Med ; 148: 105883, 2022 09.
Article in English | MEDLINE | ID: mdl-35878490

ABSTRACT

The transcriptome describes the expression of all genes in a sample. Most studies have investigated the differential patterns or discrimination power of transcript expression levels. In this study, we hypothesized that the quantitative correlations between the expression levels of transcription factors (TFs) and their regulated target genes (mRNAs) serve as a novel view of healthy status. We further hypothesized that a disease sample exhibits a differential landscape of transcription regulation (mqTrans) compared with healthy status. We formulated the quantitative transcription regulation relationships of metabolism-related genes as a multi-input multi-output regression model using a gated recurrent unit (GRU) network. The GRU model was trained on healthy blood transcriptomes, and the expression levels of mRNAs were predicted from those of the TFs. The mqTrans feature of a gene was defined as the difference between its predicted and actual expression levels. A pan-cancer investigation of differentially expressed mqTrans features compared early- and late-stage cancers in 26 cancer types from The Cancer Genome Atlas database. This study focused on differentially expressed mqTrans features whose genes did not show differential expression at their actual expression levels. These genes could not be detected by conventional differential analysis. Such dark biomarkers are worthy of further wet-lab investigation. The experimental data also showed that the proposed mqTrans investigation improved classification between early- and late-stage samples for some cancer types. Thus, mqTrans features serve as a complementary view of transcriptomes, an OMIC type with mature high-throughput production technologies and abundant public resources.


Subjects
Gene Expression Regulation, Neoplastic , Neoplasms , Gene Expression Profiling , Gene Regulatory Networks , Humans , RNA, Messenger , Transcription Factors , Transcriptome
13.
Comput Biol Med ; 133: 104405, 2021 06.
Article in English | MEDLINE | ID: mdl-33930763

ABSTRACT

The era of big data introduces both opportunities and challenges for biomedical researchers. One inherent difficulty in biomedical research is recruiting large cohorts of samples, while high-throughput biotechnologies may produce thousands or even millions of features for each sample. Researchers tend to evaluate each feature's individual correlation with the class label. They then use the incremental feature selection (IFS) strategy to select the top-ranked features with the best prediction performance. Recent experimental data showed that a subset of consecutively ranked features, started at a randomly chosen low-ranked feature (an RIFS block), may outperform the subset of top-ranked features. This study proposed a feature selection algorithm, RIFS2D, that integrates multiple RIFS blocks. A comprehensive comparison with IFS, RIFS, and existing feature selection algorithms demonstrated that a subset of low-ranked features may also achieve promising prediction performance. This study suggests that a model trained on low-ranked features may perform well even when top-ranked features do not achieve satisfactory prediction performance. Further comparative experiments were conducted between RIFS2D and t-tests for the detection of early-stage breast cancer. The data showed that the RIFS2D-recommended features achieved better prediction accuracy and were targeted by more drugs than the top-ranked t-test features.
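A toy problem can contrast IFS with a single RIFS block. This sketch is an illustration built on assumptions, not the published RIFS or RIFS2D code:

* `evaluate` is a hypothetical scoring function. It rewards three jointly useful but low-ranked features and penalizes subset size.
* A block grows from its random start rank for as long as the score keeps improving; this stopping rule is itself an assumption.

RIFS2D's integration of multiple blocks is not shown.

```python
import random


def ifs(ranked, evaluate):
    """Incremental feature selection: the best-scoring prefix of the ranking."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best_subset, best_score = ranked[:k], score
    return best_subset, best_score


def rifs_block(ranked, evaluate, start):
    """Grow consecutively ranked features from `start` while the score improves."""
    block = [ranked[start]]
    score = evaluate(block)
    for feature in ranked[start + 1:]:
        candidate_score = evaluate(block + [feature])
        if candidate_score <= score:
            break
        block, score = block + [feature], candidate_score
    return block, score


def rifs(ranked, evaluate, n_restarts, seed=0):
    """Best block over several randomly chosen restart positions."""
    rng = random.Random(seed)
    blocks = [rifs_block(ranked, evaluate, rng.randrange(len(ranked)))
              for _ in range(n_restarts)]
    return max(blocks, key=lambda b: b[1])


# Hypothetical ranking (by univariate score) and toy evaluation function:
# f5, f6 and f7 are jointly useful; every selected feature costs 0.1.
ranked = [f"f{i}" for i in range(10)]
USEFUL = {"f5", "f6", "f7"}


def evaluate(subset):
    return len(USEFUL & set(subset)) - 0.1 * len(subset)


print(ifs(ranked, evaluate))
print(rifs_block(ranked, evaluate, start=5))
print(rifs(ranked, evaluate, n_restarts=20))
```

IFS has to take all eight top-ranked features, scoring about 2.2, to reach the useful ones. The block started at rank 5 keeps only f5, f6 and f7 and scores about 2.7.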


Subjects
Algorithms , Biomarkers
14.
Comput Biol Med ; 135: 104571, 2021 08.
Article in English | MEDLINE | ID: mdl-34166881

ABSTRACT

Cancer is one of the major causes of mortality worldwide. Regional lymph node metastasis is an important mechanism during the spread of human cancers, in which transcription regulation plays an essential role. This study formulated a regression-model-based quantitative transcription regulation (mqTrans) between one mRNA gene and multiple transcription factors (TFs). Computational pan-cancer screening was carried out to detect the quantitative dysregulation of transcription regulation in the regional lymph node metastasis of 18 cancer types. Only a few metastasis-dysregulated mqTrans models were shared among the cancer types. The mRNA genes of the metastasis-dysregulated mqTrans models were not differentially expressed in regional lymph node metastasis. The experimental data suggested that mqTrans technology provided a complementary approach to the evaluation of transcription regulation mechanisms and may facilitate its quantitative investigation in other phenotypes.


Subjects
Lymph Nodes , Humans , Lymphatic Metastasis , RNA, Messenger
15.
Diagnostics (Basel) ; 11(2)2021 Feb 19.
Article in English | MEDLINE | ID: mdl-33669819

ABSTRACT

The incidence and mortality rates of lung cancer differ between females and males. Therefore, sex information should be an important consideration when training and optimizing a diagnostic model. However, most existing studies do not fully use this information. This study compared sex-specific models with sex-independent models. Three feature selection algorithms and five classifiers were used to evaluate the contribution of sex information to the detection of early-stage lung cancer. In both lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), the sex-specific models outperformed sex-independent detection of early-stage lung cancer. Venn diagrams suggested that females and males shared only a few transcriptomic biomarkers of early-stage lung cancer. Our experimental data suggest that sex information should be included when optimizing disease diagnosis models.
