Search | BVS Research Portal

ifDEEPre: large protein language-based deep learning enables interpretable and fast predictions of enzyme commission numbers.

Tan, Qingxiong; Xiao, Jin; Chen, Jiayang; Wang, Yixuan; Zhang, Zeliang; Zhao, Tiancheng; Li, Yu.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-38942594

ABSTRACT

Accurate understanding of the biological functions of enzymes is vital for various tasks in both pathology and industrial biotechnology. However, existing methods are usually not fast enough and lack explanations of their prediction results, which severely limits their real-world applications. Following our previous work, DEEPre, we propose a new interpretable and fast version (ifDEEPre) by designing novel self-guided attention and incorporating biological knowledge learned via large protein language models to accurately predict the commission numbers of enzymes and confirm their functions. The novel self-guided attention is designed to optimize the unique contributions of representations, automatically detecting key protein motifs to provide meaningful interpretations. Representations learned from raw protein sequences are strictly screened to improve the running speed of the framework, making it 50 times faster than DEEPre while requiring 12.89 times less storage space. Large language modules are incorporated to learn physical properties from hundreds of millions of proteins, extending the biological knowledge of the whole network. Extensive experiments indicate that ifDEEPre outperforms all current methods, achieving an F1-score more than 14.22% higher on the NEW dataset. Furthermore, the trained ifDEEPre models accurately capture multi-level protein biological patterns and infer evolutionary trends of enzymes by taking only raw sequences without label information. ifDEEPre also predicts the evolutionary relationships between different yeast sub-species, and these predictions are highly consistent with the ground truth. Case studies indicate that ifDEEPre can detect key amino acid motifs, which have important implications for designing novel enzymes. A web server running ifDEEPre is available at https://proj.cse.cuhk.edu.hk/aihlab/ifdeepre/ to provide convenient services to the public. ifDEEPre is also freely available on GitHub at https://github.com/ml4bio/ifDEEPre/.
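
The abstract does not specify how ifDEEPre's self-guided attention is computed. As a rough illustration of how attention weights can double as per-residue importance scores, here is a minimal sketch of generic softmax attention pooling. It assumes each residue already has a feature vector (for example from a protein language model) and a hypothetical learned query vector; `attention_pool` and `top_positions` are illustrative names, not the authors' code or API.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    residue_feats: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool per-residue vectors into one protein vector (generic attention, not ifDEEPre's)."""
    # Score each residue by its dot product with the hypothetical learned query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in residue_feats]
    weights = softmax(scores)
    # Weighted sum over residues, one output dimension at a time.
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, residue_feats))
        for d in range(len(query))
    ]
    return pooled, weights


def top_positions(weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k residues with the largest attention weights."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]


# Toy example: three residues with 2-dimensional features.
feats = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
pooled, weights = attention_pool(feats, query=[1.0, 1.0])
print(top_positions(weights, 1))  # [1]
```

Reading off the highest-weight positions is one simple way to turn attention into motif-level interpretations; ifDEEPre's actual mechanism may differ.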

Subjects

Deep Learning, Enzymes, Enzymes/chemistry, Enzymes/metabolism, Computational Biology/methods, Software, Proteins/chemistry, Proteins/metabolism, Databases, Protein, Algorithms

AcrNET: predicting anti-CRISPR with deep learning.

Li, Yunxiang; Wei, Yumeng; Xu, Sheng; Tan, Qingxiong; Zong, Licheng; Wang, Jiuming; Wang, Yixuan; Chen, Jiayang; Hong, Liang; Li, Yu.

Bioinformatics ; 39(5)2023 05 04.

Article in English | MEDLINE | ID: mdl-37084259

ABSTRACT

MOTIVATION: As an important group of proteins discovered in phages, anti-CRISPR proteins inhibit the activity of the bacterial immune system (i.e. CRISPR-Cas), offering promise for gene editing and phage therapy. However, the prediction and discovery of anti-CRISPR proteins are challenging due to their high variability and fast evolution. Existing biological studies rely on known CRISPR and anti-CRISPR pairs, which may not be practical given their huge number. Computational methods struggle with prediction performance. To address these issues, we propose a novel deep neural network for anti-CRISPR analysis (AcrNET), which achieves significant performance gains. RESULTS: On both cross-fold and cross-dataset validation, our method outperforms the state-of-the-art methods. Notably, AcrNET improves the prediction performance by at least 15% in terms of F1 score on the cross-dataset test compared with the state-of-the-art deep learning method. Moreover, AcrNET is the first computational method to predict detailed anti-CRISPR classes, which may help elucidate the anti-CRISPR mechanism. Taking advantage of the Transformer protein language model ESM-1b, which was pre-trained on 250 million protein sequences, AcrNET overcomes the data scarcity problem. Extensive experiments and analysis suggest that the Transformer model feature, evolutionary feature, and local structure feature complement each other, which indicates the critical properties of anti-CRISPR proteins. AlphaFold prediction, motif analysis, and docking experiments further demonstrate that AcrNET implicitly captures the evolutionarily conserved pattern and the interaction between anti-CRISPR proteins and their targets. AVAILABILITY AND IMPLEMENTATION: Web server: https://proj.cse.cuhk.edu.hk/aihlab/AcrNET/. Training code and pre-trained model are available at.
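
For reference, the F1 score used in the AcrNET comparison is the harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN). A minimal sketch follows; the counts in the example are hypothetical, and the abstract does not say whether the 15% improvement is absolute or relative, so both readings should be kept in mind.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, e.g. 0.15 for a 15% gain."""
    return (new - old) / old


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
print(round(f1_score(8, 2, 2), 3))  # 0.8
```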

Subjects

Bacteriophages, Deep Learning, Neural Networks, Computer, Gene Editing, Proteins

Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model.

Shen, Junbo; Yu, Qinze; Chen, Shenyang; Tan, Qingxiong; Li, Jingchen; Li, Yu.

Nat Comput Sci ; 4(1): 29-42, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38177492

ABSTRACT

Signal peptides (SPs) are essential for targeting and transferring transmembrane and secreted proteins to their correct locations. Many existing computational tools for predicting SPs disregard the extreme data imbalance problem and rely on additional group information about proteins. Here we introduce the Unbiased Organism-agnostic Signal Peptide Network (USPNet), a deep learning method for SP classification and cleavage-site prediction. Extensive experimental results show that USPNet substantially outperforms previous methods, by 10% in classification performance. An SP-discovery pipeline built on USPNet is designed to explore previously unseen SPs in metagenomic data. It reveals 347 SP candidates; the lowest sequence identity between a candidate and the closest SP in the training dataset is only 13%. In addition, the template modeling scores between the candidates and SPs in the training set are mostly above 0.8. The results show that USPNet has learned SP structure from raw amino acid sequences and the large protein language model, thereby enabling the discovery of unknown SPs.
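
The abstract names extreme class imbalance as a weakness of earlier SP predictors but does not describe how USPNet handles it. One common generic remedy, shown here purely as an illustration and not as USPNet's method, is to weight the training loss by inverse class frequency. The label names and the 4:1 imbalance below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """weight_c = n / (k * count_c); rarer classes get larger weights.

    With this choice the per-sample weights average to exactly 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_nll(
    probs: List[Dict[Hashable, float]],
    labels: Sequence[Hashable],
    weights: Dict[Hashable, float],
) -> float:
    """Weighted mean negative log-likelihood of the true labels."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


labels = ["NO_SP"] * 8 + ["SP"] * 2  # hypothetical 4:1 imbalance
w = inverse_frequency_weights(labels)
print(w)  # {'NO_SP': 0.625, 'SP': 2.5}
```

Each rare-class example then contributes four times as much to the loss as a common-class example, which keeps the majority class from dominating training.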

Subjects

Protein Sorting Signals, Proteins, Protein Sorting Signals/genetics, Proteins/chemistry, Amino Acid Sequence

Improving power ramp rate of a coal-fired power plant by a bypass steam accumulator.

Ding, Hongyu; Ding, Sibian; Tan, Qingxiong; Zhang, Cheng; Fang, Qingyan; Yang, Tao.

Heliyon ; 10(11): e32412, 2024 Jun 15.

Article in English | MEDLINE | ID: mdl-38912492

ABSTRACT

The increasing penetration of high-volatility renewable energy sources in the power system places higher demands on the flexibility of coal-fired power plants (CFPPs). To enhance the flexibility of CFPPs, researchers have conducted a significant amount of thermal-system-level research in recent years on increasing system peak-shaving depth. However, the load ramp rate of CFPPs under deep peak shaving is rarely discussed, despite its significance to the overall flexibility of CFPPs. This paper proposes a steam accumulator storage system integrated into the turbine's bypass system. The steam accumulator charges directly with working fluid from the live steam or reheat systems and discharges to the turbine, responding quickly to power ramp commands. A steady-state model and a dynamic model of the proposed system were built and validated, and the calculations show that the proposed scheme provides load changes of +2.13% Pe and -8.3% Pe during a round trip, with a power efficiency of 63.6% at a unit load of 40% THA. Using the proposed steam accumulator without revising the original controls, the unit's load increase rate under coordinated control was enhanced by 1.5% Pe/min, reaching 3% Pe/min, and the load decrease rate reached at least 5% Pe/min. The results indicate that the proposed system provides a straightforward, easy-to-implement, and efficient solution for enhancing the load ramp rate of CFPPs at low loads.
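
The reported ramp-rate figures imply a baseline load-increase rate of about 1.5% Pe/min (3% Pe/min after an enhancement of 1.5% Pe/min). The short calculation below makes that explicit and shows what it means for the time needed to complete a load step. The 6% Pe step is a hypothetical example value, and round-trip efficiency is taken in its usual sense of energy recovered on discharge divided by energy stored on charge.

```python
def ramp_time_min(load_step_pct: float, rate_pct_per_min: float) -> float:
    """Minutes needed to move the unit load by `load_step_pct` (% Pe) at a constant ramp rate."""
    return load_step_pct / rate_pct_per_min


def round_trip_efficiency(energy_out: float, energy_in: float) -> float:
    """Energy recovered on discharge divided by energy stored on charge."""
    return energy_out / energy_in


enhanced_rate = 3.0  # % Pe/min, reported with the steam accumulator
gain = 1.5  # % Pe/min, reported enhancement
baseline_rate = enhanced_rate - gain  # implied original rate: 1.5 % Pe/min

step = 6.0  # % Pe, hypothetical load step
print(ramp_time_min(step, baseline_rate))  # 4.0
print(ramp_time_min(step, enhanced_rate))  # 2.0
```

Because the rate doubles, the time to complete any given load step is halved.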

Cross-Domain Missingness-Aware Time-Series Adaptation With Similarity Distillation in Medical Applications.

Yang, Baoyao; Ye, Mang; Tan, Qingxiong; Yuen, Pong C.

IEEE Trans Cybern ; 52(5): 3394-3407, 2022 May.

Article in English | MEDLINE | ID: mdl-32795976

ABSTRACT

Medical time series of laboratory tests have been collected in electronic health records (EHRs) in many countries. Machine-learning algorithms have been proposed to analyze the condition of patients using these medical records. However, medical time series may be recorded using different laboratory parameters in different datasets. As a result, a pretrained model can fail when applied to a test dataset containing time series of different laboratory parameters. This article proposes to solve this problem with an unsupervised time-series adaptation method that generates time series across laboratory parameters. Specifically, a medical time-series generation network with similarity distillation is developed to reduce the domain gap caused by the difference in laboratory parameters. The relations between different laboratory parameters are analyzed, and the similarity information is distilled to guide the generation of target-domain-specific laboratory parameters. To further improve performance in cross-domain medical applications, a missingness-aware feature extraction network is proposed, in which the missingness patterns reflect health conditions and thus serve as auxiliary features for medical analysis. In addition, we introduce domain-adversarial networks at both the feature level and the time-series level to enhance adaptation across domains. Experimental results show that the proposed method achieves good performance on both private and publicly available medical datasets. Ablation studies and distribution visualization are provided to further analyze the properties of the proposed method.
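
The abstract treats missingness patterns as auxiliary features but does not give their exact form. A minimal sketch of two commonly used generic encodings follows, assuming a single laboratory parameter sampled at known timestamps with `None` marking a missing value: an observation mask and the time elapsed since the most recent earlier observation, plus last-observation-carried-forward imputation. The function name `missingness_features` is hypothetical, and this is not the paper's network.

```python
from typing import List, Optional, Sequence, Tuple


def missingness_features(
    values: Sequence[Optional[float]], times: Sequence[float]
) -> Tuple[List[int], List[float], List[Optional[float]]]:
    """Return (mask, time since last observation, forward-filled values)."""
    mask: List[int] = []
    deltas: List[float] = []
    filled: List[Optional[float]] = []
    last_time: Optional[float] = None
    last_value: Optional[float] = None
    for v, t in zip(values, times):
        # Elapsed time since the latest earlier observation (0 if none yet).
        deltas.append(0.0 if last_time is None else t - last_time)
        if v is None:
            mask.append(0)
            filled.append(last_value)  # carry the last observed value forward
        else:
            mask.append(1)
            filled.append(v)
            last_time, last_value = t, v
    return mask, deltas, filled


mask, deltas, filled = missingness_features([1.0, None, None, 4.0], [0.0, 1.0, 3.0, 4.0])
print(mask)    # [1, 0, 0, 1]
print(deltas)  # [0.0, 1.0, 3.0, 4.0]
print(filled)  # [1.0, 1.0, 1.0, 4.0]
```

The mask and elapsed-time channels let a downstream model distinguish a truly measured value from an imputed one, which is the sense in which missingness itself can carry clinical signal.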

Subjects

Algorithms, Distillation, Electronic Health Records, Humans, Machine Learning, Time Factors

Novel machine learning models outperform risk scores in predicting hepatocellular carcinoma in patients with chronic viral hepatitis.

Wong, Grace Lai-Hung; Hui, Vicki Wing-Ki; Tan, Qingxiong; Xu, Jingwen; Lee, Hye Won; Yip, Terry Cheuk-Fung; Yang, Baoyao; Tse, Yee-Kit; Yin, Chong; Lyu, Fei; Lai, Jimmy Che-To; Lui, Grace Chung-Yan; Chan, Henry Lik-Yuen; Yuen, Pong-Chi; Wong, Vincent Wai-Sun.

JHEP Rep ; 4(3): 100441, 2022 Mar.

Article in English | MEDLINE | ID: mdl-35198928

ABSTRACT

BACKGROUND & AIMS: Accurate hepatocellular carcinoma (HCC) risk prediction facilitates appropriate surveillance strategies and reduces cancer mortality. We aimed to derive and validate novel machine learning models to predict HCC in a territory-wide cohort of patients with chronic viral hepatitis (CVH) using data from the Hospital Authority Data Collaboration Lab (HADCL). METHODS: This was a territory-wide, retrospective, observational cohort study of patients with CVH in Hong Kong in 2000-2018, identified from HADCL based on viral markers, diagnosis codes, and antiviral treatment for chronic hepatitis B and/or C. The cohort was randomly split into training and validation cohorts in a 7:3 ratio. Five popular machine learning methods, namely logistic regression, ridge regression, AdaBoost, decision tree, and random forest, were applied and compared to find the best prediction model. RESULTS: A total of 124,006 patients with CVH with complete data were included to build the models. In the training cohort (n = 86,804; 6,821 HCC), ridge regression (area under the receiver operating characteristic curve [AUROC] 0.842), decision tree (0.952), and random forest (0.992) performed the best. In the validation cohort (n = 37,202; 2,875 HCC), ridge regression (AUROC 0.844) and random forest (0.837) maintained their accuracy, which was significantly higher than that of the HCC risk scores: CU-HCC (0.672), GAG-HCC (0.745), REACH-B (0.671), PAGE-B (0.748), and REAL-B (0.712). The low cut-off (0.07) of the HCC ridge score (HCC-RS) achieved 90.0% sensitivity and 98.6% negative predictive value (NPV) in the validation cohort. The high cut-off (0.15) of HCC-RS achieved high specificity (90.0%) and NPV (95.6%); 31.1% of patients remained indeterminate. CONCLUSIONS: HCC-RS from the ridge regression machine learning model accurately predicted HCC in patients with CVH. These machine learning models may be developed as built-in functional keys or calculators in electronic health systems to reduce cancer mortality. LAY SUMMARY: Novel machine learning models generated accurate risk scores for hepatocellular carcinoma (HCC) in patients with chronic viral hepatitis. The HCC ridge score was consistently more accurate than existing HCC risk scores. These models may be incorporated into electronic health systems to develop appropriate cancer surveillance strategies and reduce cancer deaths.
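
Read together, the two HCC-RS cut-offs suggest a three-way triage rule: scores below the low cut-off (0.07) are treated as low risk, scores at or above the high cut-off (0.15) as high risk, and the rest as indeterminate. The sketch below encodes that rule and the standard screening metrics computed from confusion-matrix counts. The boundary handling (< versus ≤) and the example counts are assumptions, not values from the study.

```python
from typing import Dict


def hcc_rs_category(score: float, low: float = 0.07, high: float = 0.15) -> str:
    """Three-way triage by HCC-RS; boundary handling is an assumption."""
    if score < low:
        return "low risk"
    if score >= high:
        return "high risk"
    return "indeterminate"


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics for a binary test against observed HCC outcomes."""
    return {
        "sensitivity": tp / (tp + fn),  # share of HCC cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases not flagged
        "ppv": tp / (tp + fp),  # flagged patients who develop HCC
        "npv": tn / (tn + fn),  # unflagged patients who stay HCC-free
    }


print(hcc_rs_category(0.05), hcc_rs_category(0.10), hcc_rs_category(0.20))
# low risk indeterminate high risk
m = screening_metrics(tp=9, fp=10, tn=90, fn=1)  # hypothetical counts
print(round(m["sensitivity"], 3), round(m["npv"], 3))  # 0.9 0.989
```

A low rule-out cut-off favors sensitivity and NPV, while a high rule-in cut-off favors specificity, which is why the two thresholds are reported with different metrics.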

Importance-aware personalized learning for early risk prediction using static and dynamic health data.

Tan, Qingxiong; Ye, Mang; Ma, Andy Jinhua; Yip, Terry Cheuk-Fung; Wong, Grace Lai-Hung; Yuen, Pong C.

J Am Med Inform Assoc ; 28(4): 713-726, 2021 03 18.

Article in English | MEDLINE | ID: mdl-33496786

ABSTRACT

OBJECTIVE: Accurate risk prediction is important for evaluating early medical treatment effects and improving health care quality. Existing methods are usually designed for dynamic medical data, which require long-term observations. Meanwhile, important personalized static information is ignored due to its underlying uncertainty and unquantifiable ambiguity. There is an urgent need for an early risk prediction method that can adaptively integrate both static and dynamic health data. MATERIALS AND METHODS: Data were from 6367 patients with peptic ulcer bleeding between 2007 and 2016. This article develops a novel End-to-end Importance-Aware Personalized Deep Learning Approach (eiPDLA) to achieve accurate early clinical risk prediction. Specifically, eiPDLA introduces a long short-term memory (LSTM) network with temporal attention to learn sequential dependencies from time-stamped records, and simultaneously incorporates a residual network with correlation attention to capture their influencing relationships with static medical data. Furthermore, a new multi-residual multi-scale network with an importance-aware mechanism is designed to adaptively fuse the learned multisource features, automatically assigning larger weights to important features while weakening the influence of less important ones. RESULTS: Extensive experimental results on a real-world dataset show that our method significantly outperforms state-of-the-art methods for early risk prediction under various settings (e.g., achieving an AUC of 0.944 when predicting risk 1 year ahead). Case studies indicate that the prediction results are highly interpretable. CONCLUSION: These results reflect the importance of combining static and dynamic health data, mining their influencing relationships, and incorporating the importance-aware mechanism to automatically identify important features. Accurate early risk prediction saves doctors precious time to design effective treatments promptly and improve clinical outcomes.
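
The abstract describes fusing static and dynamic features while up-weighting important ones, without giving the formula. A minimal sketch of one generic way to do this follows, assuming an already-computed static feature vector, a dynamic (time-series) feature vector, and per-feature gate logits from some hypothetical learned scorer: concatenate the two vectors and scale each feature by a sigmoid gate in (0, 1). `gated_fusion` is an illustrative name, and this is not eiPDLA's multi-residual multi-scale network.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    static_feat: Sequence[float],
    dynamic_feat: Sequence[float],
    gate_logits: Sequence[float],
) -> List[float]:
    """Concatenate static and dynamic features, then scale each by a sigmoid gate."""
    fused = list(static_feat) + list(dynamic_feat)
    if len(gate_logits) != len(fused):
        raise ValueError("need one gate logit per fused feature")
    # Large positive logits keep a feature (gate near 1); large negative ones suppress it.
    return [sigmoid(g) * x for g, x in zip(gate_logits, fused)]


out = gated_fusion([1.0, 1.0], [2.0], gate_logits=[10.0, -10.0, 0.0])
print([round(v, 3) for v in out])  # [1.0, 0.0, 1.0]
```

Unlike a softmax, independent sigmoid gates do not force features to compete for a fixed total weight, so several features can be kept at once.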

Subjects

Deep Learning, Peptic Ulcer Hemorrhage, Precision Medicine, Risk Assessment/methods, Data Mining, Datasets as Topic, Humans, Models, Theoretical, Neural Networks, Computer, Prognosis

Explainable Uncertainty-Aware Convolutional Recurrent Neural Network for Irregular Medical Time Series.

Tan, Qingxiong; Ye, Mang; Ma, Andy Jinhua; Yang, Baoyao; Yip, Terry Cheuk-Fung; Wong, Grace Lai-Hung; Yuen, Pong C.

IEEE Trans Neural Netw Learn Syst ; 32(10): 4665-4679, 2021 10.

Article in English | MEDLINE | ID: mdl-33055037

ABSTRACT

Influenced by dynamic changes in the severity of illness, patients usually undergo examinations in hospitals irregularly, producing a large volume of irregular medical time-series data. Performing diagnosis prediction from irregular medical time series is challenging because the intervals between consecutive records vary significantly over time. Existing methods often handle this problem by generating regular time series from the irregular medical records without considering the uncertainty in the generated data induced by the varying intervals. Thus, a novel Uncertainty-Aware Convolutional Recurrent Neural Network (UA-CRNN) is proposed in this article, which introduces the uncertainty information in the generated data to boost risk prediction. To tackle complex medical time series with subseries of different frequencies, the uncertainty information is further incorporated at the subseries level rather than the whole-sequence level to seamlessly adjust to different time intervals. Specifically, a hierarchical uncertainty-aware decomposition layer (UADL) is designed to adaptively decompose time series into different subseries and assign them proper weights according to their reliabilities. Meanwhile, an Explainable UA-CRNN (eUA-CRNN) is proposed to exploit filters with different passbands to ensure the unity of components within each subseries and the diversity of components across different subseries. Furthermore, eUA-CRNN incorporates an uncertainty-aware attention module that learns attention weights from the uncertainty information, providing explainable prediction results. Extensive experimental results on three real-world medical datasets illustrate the superiority of the proposed method over state-of-the-art methods.
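
To make the idea of uncertainty induced by varying intervals concrete, here is a minimal sketch, not UA-CRNN itself, that resamples an irregular series onto a regular grid by linear interpolation and attaches a reliability weight that decays with the distance to the nearest real observation. The exponential decay, the scale `tau`, and the function name `resample_with_reliability` are assumptions chosen for illustration.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def resample_with_reliability(
    times: Sequence[float],
    values: Sequence[float],
    grid: Sequence[float],
    tau: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return (value, reliability) per grid point; times must be sorted ascending."""
    out: List[Tuple[float, float]] = []
    for g in grid:
        i = bisect.bisect_left(times, g)
        if i < len(times) and times[i] == g:
            value, gap = values[i], 0.0  # exact observation
        elif i == 0:
            value, gap = values[0], times[0] - g  # before first record: hold first value
        elif i == len(times):
            value, gap = values[-1], g - times[-1]  # after last record: hold last value
        else:
            t0, t1 = times[i - 1], times[i]
            frac = (g - t0) / (t1 - t0)
            value = values[i - 1] + frac * (values[i] - values[i - 1])
            gap = min(g - t0, t1 - g)  # distance to the nearest real observation
        out.append((value, math.exp(-gap / tau)))
    return out


# Records at t = 0 and t = 2 only; grid every 1 time unit.
for v, r in resample_with_reliability([0.0, 2.0], [0.0, 4.0], [0.0, 1.0, 2.0, 3.0]):
    print(round(v, 2), round(r, 3))
```

Downstream, such weights could scale inputs or attention scores so that points far from any real record count for less, which mirrors the intuition the abstract describes.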

Subjects

Deep Learning/trends, Electronic Health Records/trends, Neural Networks, Computer, Uncertainty, Humans, Time Factors
