Search | BVS - Ministério da Saúde

1.

Inspection of defects in composite structures using long pulse thermography and shearography.

Wei, Yanjie; Xiao, Yao; Gu, Xiaohui; Ren, Jianying; Zhang, Yu; Zhang, Dongsheng; Chen, Yanhong; Li, Haiyan; Li, Shaohua.

Heliyon ; 10(12): e33184, 2024 Jun 30.

Article in English | MEDLINE | ID: mdl-39005912

ABSTRACT

Long pulse thermography (LPT) and shearography have been developed as primary methods for detecting debonding or delamination defects in composites due to their full-field imaging, non-contact operation, and high detection efficiency. Both methods utilize halogen lamps as the excitation source for thermal loading. However, the defects detected by the two techniques differ due to their distinct inspection mechanisms. In this study, LPT and shearography are employed to evaluate internal damage in various composite structures. The experimental results demonstrate that LPT, when combined with thermal signal processing algorithms, can clearly detect debonding defects in rubber-to-metal bonded plates, whereas excessive adhesive defects can only be identified by shearography. Flat-bottom holes in the CFRP panel can only be detected by LPT, and shearography is particularly effective for inspecting composite materials with a metal skin. For the quantitative measurement of defect sizes, the average errors for the rubber-to-metal bonded plate and CFRP panel using LPT are 4.9% and 2.2%, respectively, whereas the average errors for the rubber-to-metal bonded plate and aluminum honeycomb panel using shearography are 15.12% and 95.4%, respectively. This indicates that LPT is superior to shearography in quantitatively measuring defect sizes. These two nondestructive testing methods, based on different principles, each have their own advantages and disadvantages. Employing a multi-modal inspection method can leverage their complementary advantages, preventing false detections and missed detections of internal defects in composites.

2.

RevGraphVAMP: A protein molecular simulation analysis model combining graph convolutional neural networks and physical constraints.

Huang, Ying; Zhang, Huiling; Lin, Zhenli; Wei, Yanjie; Xi, Wenhui.

Methods ; 229: 163-174, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38972499

ABSTRACT

Molecular dynamics (MD) simulation is a crucial research domain within the life sciences, focusing on comprehending the mechanisms of biomolecular interactions at atomic scales. Protein simulation, as a critical subfield, often utilizes MD for implementation, with trajectory data playing a pivotal role in drug discovery. Advances in high-performance computing and deep learning have made it popular, and critical, to predict protein properties from vast trajectory data, which poses challenges for feature extraction from complicated simulation data and for dimensionality reduction. Simultaneously, it is essential to provide a meaningful explanation of the biological mechanism behind the reduced dimensions. To tackle this challenge, we propose a new unsupervised model named RevGraphVAMP to intelligently analyze the simulation trajectory. This model is based on the variational approach for Markov processes (VAMP) and integrates graph convolutional neural networks and physical constraint optimization to enhance the learning performance. Additionally, we introduce an attention mechanism to assess the importance of key interaction regions, facilitating the interpretation of molecular mechanisms. In comparison to other VAMPNets models, our model shows competitive performance and improved accuracy in state transition prediction, as demonstrated through its application to two public datasets and the Shank3-Rap1 complex, which is associated with autism spectrum disorder. Moreover, it improves the discrimination of different substates after dimensionality reduction and provides interpretable results for protein structural characterization.

Subjects

Markov Chains; Molecular Dynamics Simulation; Neural Networks, Computer; Proteins; Proteins/chemistry; Humans; Deep Learning

3.

Fully Flexible Molecular Alignment Enables Accurate Ligand Structure Modeling.

Wang, Zhihao; Zhou, Fan; Wang, Zechen; Hu, Qiuyue; Li, Yong-Qiang; Wang, Sheng; Wei, Yanjie; Zheng, Liangzhen; Li, Weifeng; Peng, Xiangda.

J Chem Inf Model ; 2024 Jul 29.

Article in English | MEDLINE | ID: mdl-39074901

ABSTRACT

Accurate protein-ligand binding poses are the prerequisites of structure-based binding affinity prediction and provide the structural basis for in-depth lead optimization in small molecule drug design. However, it is challenging to provide reasonable predictions of binding poses for different molecules due to the complexity and diversity of the chemical space of small molecules. Similarity-based molecular alignment techniques can effectively narrow the search range, as structurally similar molecules are likely to have similar binding modes, with higher similarity usually correlated to higher success rates. However, molecular similarity is not consistently high because molecules often require changes to achieve specific purposes, leading to reduced alignment precision. To address this issue, we propose a new alignment method, Z-align. This method uses topological structural information as a criterion for evaluating similarity, reducing the reliance on molecular fingerprint similarity. Our method has achieved success rates significantly higher than those of other methods at moderate levels of similarity. Additionally, our approach can comprehensively and flexibly optimize bond lengths and angles of molecules, maintaining a high accuracy even when dealing with larger molecules. Consequently, our proposed solution helps in achieving more accurate binding poses in protein-ligand docking problems, facilitating the development of small molecule drugs. Z-align is freely available as a web server at https://cloud.zelixir.com/zalign/home.

4.

SeedHit: A GPU Friendly Pre-Align Filtering Algorithm.

Ju, Zhen; Zhang, Jingjing; Li, Xuelei; Meng, Jintao; Wei, Yanjie.

IEEE/ACM Trans Comput Biol Bioinform ; PP, 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-38905083

ABSTRACT

The amount of genetic data generated by Next Generation Sequencing (NGS) technologies grows faster than Moore's law. This necessitates the development of efficient NGS data processing and analysis algorithms. A filter before the computationally costly analysis step can significantly reduce the run time of the NGS data analysis. As GPUs are orders of magnitude more powerful than CPUs, this paper proposes a GPU-friendly pre-align filtering algorithm named SeedHit for the fast processing of NGS data. Inspired by BLAST, SeedHit counts seed hits between two sequences to determine their similarity. In SeedHit, a nucleic acid in a gene sequence is represented in binary format. By packaging data and generating a lookup table that fits into the L1 cache, SeedHit is GPU-friendly and high-throughput. Using three 16S rRNA datasets from Greengenes as input, SeedHit can reject 84%-89% of dissimilar sequence pairs on average when the similarity is 0.9-0.99. SeedHit achieved a throughput of 1 T/s (tera bases per second) on a 3080 Ti GPU. Compared with the other two GPU-based filtering algorithms, GateKeeper and SneakySnake, SeedHit has the highest rejection rate and throughput. By incorporating SeedHit into our in-house clustering algorithm nGIA, the modified nGIA achieved a 1.6-2.1 times speedup compared to the original version.
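
The seed-hit idea described in this abstract can be illustrated with a small CPU-only sketch. This is not the SeedHit implementation: the 2-bit code per base, the seed length `k`, the `min_hits` threshold, and all function names are assumptions chosen for illustration, and the GPU lookup table is not reproduced.

```python
# Illustrative sketch only; not the SeedHit code. The 2-bit encoding,
# seed length k, and min_hits threshold are assumptions for this example.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base


def pack_seeds(seq, k=8):
    """Return the set of k-mers in seq, each packed into one integer."""
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases (2k bits)
    seeds = set()
    value = 0
    valid = 0  # consecutive unambiguous bases seen so far
    for base in seq.upper():
        code = ENCODE.get(base)
        if code is None:  # ambiguous base such as N: restart the window
            value, valid = 0, 0
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            seeds.add(value)
    return seeds


def seed_hits(a, b, k=8):
    """Count distinct k-mer seeds shared by sequences a and b."""
    return len(pack_seeds(a, k) & pack_seeds(b, k))


def passes_filter(a, b, k=8, min_hits=1):
    """Keep a pair for alignment only if it shares at least min_hits seeds."""
    return seed_hits(a, b, k) >= min_hits
```

Pairs that fail `passes_filter` would be rejected before the costly alignment step; how the real thresholds relate to the 0.9-0.99 similarity levels is not stated in the abstract.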

5.

Phase transition-driven encapsulation of biomolecules using liquid metal with on-demand release for biomedical applications.

Gao, Yakun; Chen, Gangsheng; Ma, Biao; Wang, Yaru; Wei, Yanjie; Qian, Yunzhi; Kong, Ziyan; Hu, Yian; Ding, Xiong; Ping, Zhi; Zhao, Chao; Liu, Hong.

Biosens Bioelectron ; 259: 116403, 2024 Sep 01.

Article in English | MEDLINE | ID: mdl-38776802

ABSTRACT

Robust encapsulation and controllable release of biomolecules have wide biomedical applications ranging from biosensing and drug delivery to information storage. However, conventional biomolecule encapsulation strategies are limited by complicated operations, optical instability, and difficulty in decapsulation. Here, we report a simple, robust, and solvent-free biomolecule encapsulation strategy based on gallium liquid metal featuring low-temperature phase transition, self-healing, high hermetic sealing, and intrinsic resistance to optical damage. We sandwiched the biomolecules with the solid gallium films followed by low-temperature welding of the films for direct sealing. The gallium can not only protect DNA and enzymes from various physical and chemical damages but also allow the on-demand release of biomolecules by applying vibration to break the liquid gallium. We demonstrated that a DNA-coded image file can be recovered with up to 99.9% sequence retention after an accelerated aging test. We also showed the practical applications of the controllable release of bioreagents in a one-pot RPA-CRISPR/Cas12a reaction for SARS-CoV-2 screening with a low detection limit of 10 copies within 40 min. This work may facilitate the development of robust and stimuli-responsive biomolecule capsules by using low-melting metals for biotechnology.

Subjects

Biosensing Techniques; Phase Transition; SARS-CoV-2; Biosensing Techniques/methods; SARS-CoV-2/isolation & purification; COVID-19/virology; Gallium/chemistry; Humans; DNA/chemistry; CRISPR-Cas Systems; Capsules/chemistry

6.

A new paradigm for applying deep learning to protein-ligand interaction prediction.

Wang, Zechen; Wang, Sheng; Li, Yangyang; Guo, Jingjing; Wei, Yanjie; Mu, Yuguang; Zheng, Liangzhen; Li, Weifeng.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38581420

ABSTRACT

Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.
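
For readers unfamiliar with the two quantities this abstract says are predicted, the following sketch computes them from known data. It does not reproduce IGModel. It assumes poses are given as matched lists of (x, y, z) coordinates and that pKd uses the common convention of a base-10 logarithm of the dissociation constant in molar units; the function names are hypothetical.

```python
import math


def rmsd(pose, reference):
    """Root mean square deviation between two matched coordinate lists.

    Assumes both lists hold (x, y, z) tuples for the same atoms in the
    same order, already in a common frame (no superposition is done).
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))


def pkd(kd_molar):
    """pKd as the negative base-10 logarithm of Kd (assumed to be in mol/L)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)
```

A lower RMSD means a docking pose lies closer to the reference pose, while a higher pKd corresponds to tighter binding, which is why scores expressed in these terms are easy to interpret.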

Subjects

Deep Learning; Proteins; Proteins/chemistry; Protein Binding; Ligands; Drug Design

7.

Mitigated N₂O emissions from submerged-plant-covered aquatic ecosystems on the Changjiang River Delta.

Li, Qingqian; Yu, Huibin; Yuan, Peng; Liu, Ruixia; Jing, Zhangmu; Wei, Yanjie; Tu, Shengqiang; Gao, Hongjie; Song, Yonghui.

Sci Total Environ ; 928: 172592, 2024 Jun 10.

Article in English | MEDLINE | ID: mdl-38642768

ABSTRACT

Submerged plants affect nitrogen cycling in aquatic ecosystems. However, whether and how submerged plants change the nitrous oxide (N2O) production mechanism and emission flux remains controversial. Current research primarily focuses on the feedback from N2O release to variation of substrate level and microbial communities. It falls short in connecting the relative contributions of individual N2O production processes (i.e., the N2O partition). Here, we attempted to offer a comprehensive understanding of the N2O mitigation mechanism in aquatic ecosystems on the Changjiang River Delta using stable isotope techniques, metagenome-assembled genome analysis, and statistical analysis. We found that submerged plants reduced N2O emissions by 45% by slowing down the dissolved inorganic nitrogen conversion velocity to N2O in sediment (Vf-[DIN]sed). This was attributed to changing the N2O partition and suppressing the potential capacity of net N2O production (i.e., nor/nosZ). The dominant production processes shifted with increasing excess N2O. Meanwhile, distinct shift thresholds of planted and unplanted habitats reflected different mechanisms of stimulated N2O production. The hotspot zone of N2O production corresponded to high nor/nosZ and unsaturated oxygen (O2) in unplanted habitat. In contrast, the hotspot in planted habitat had lower nor/nosZ and supersaturated O2. O2 from photosynthesis critically impacted the activities of N2O producers and consumers. In summary, the presence of submerged plants helps mitigate N2O emissions from aquatic ecosystems.

Subjects

Ecosystem; Nitrous Oxide; Rivers; China; Rivers/chemistry; Nitrous Oxide/analysis; Plants; Environmental Monitoring; Air Pollutants/analysis

8.

Design, in silico evaluation, and in vitro verification of new bivalent Smac mimetics with pro-apoptotic activity.

Huang, Qingsheng; Peng, Yin; Peng, Yuefeng; Lin, Huijuan; Deng, Shiqi; Feng, Shengzhong; Wei, Yanjie.

Methods ; 224: 35-46, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38373678

ABSTRACT

Bivalent Smac mimetics have been shown to possess binding affinity and pro-apoptotic activity similar to or more potent than that of native Smac, a protein dimer able to neutralize the anti-apoptotic activity of an inhibitor of caspase enzymes, XIAP, which endows cancer cells with resistance to anticancer drugs. We designed five new bivalent Smac mimetics, which are formed by various linkers tethering two diazabicyclic cores that serve as the IAP binding motifs. We built in silico models of the five mimetics by the TwistDock workflow and evaluated their conformational tendency, which suggests that compound 3, whose linker is n-hexylene, possesses the highest binding potency among the five. After synthesis of these compounds, their ability to inhibit tumour cell growth and induce apoptosis, displayed in experiments with SK-OV-3 and MDA-MB-231 cancer cell lines, confirmed our prediction. Among the five mimetics, compound 3 displays promising pro-apoptotic activity and deserves further optimization.

Subjects

Antineoplastic Agents; Neoplasms; Humans; Inhibitor of Apoptosis Proteins/metabolism; Inhibitor of Apoptosis Proteins/pharmacology; X-Linked Inhibitor of Apoptosis Protein/metabolism; X-Linked Inhibitor of Apoptosis Protein/pharmacology; Antineoplastic Agents/pharmacology; Antineoplastic Agents/chemistry; Molecular Conformation; Apoptosis; Cell Line, Tumor

9.

A Weakly Supervised Learning Method for Cell Detection and Tracking Using Incomplete Initial Annotations.

Wu, Hao; Niyogisubizo, Jovial; Zhao, Keliang; Meng, Jintao; Xi, Wenhui; Li, Hongchang; Pan, Yi; Wei, Yanjie.

Int J Mol Sci ; 24(22), 2023 Nov 07.

Article in English | MEDLINE | ID: mdl-38003217

ABSTRACT

The automatic detection of cells in microscopy image sequences is a significant task in biomedical research. However, cells in routine microscopy images, which are taken while constant division and differentiation occur, are notoriously difficult to detect due to changes in their appearance and number. Recently, convolutional neural network (CNN)-based methods have made significant progress in cell detection and tracking. However, these approaches require large amounts of manually annotated data for fully supervised training, which is time-consuming and often requires professional researchers. To alleviate such tiresome and labor-intensive costs, we propose a novel weakly supervised learning cell detection and tracking framework that trains the deep neural network using incomplete initial labels. Our approach uses incomplete cell markers obtained from fluorescent images for initial training on the Induced Pluripotent Stem (iPS) cell dataset, which is rarely studied for cell detection and tracking. During training, the incomplete initial labels were updated iteratively by combining detection and tracking results to obtain a model with better robustness. Our method was evaluated using two fields of the iPS cell dataset, along with the cell detection accuracy (DET) evaluation metric from the Cell Tracking Challenge (CTC) initiative, and it achieved 0.862 and 0.924 DET, respectively. The transferability of the developed model was tested using the public dataset Fluo-N2DH-GOWT1, which was taken from CTC; this contains two datasets with reference annotations. We randomly removed parts of the annotations in each labeled dataset to simulate the initial annotations on the public dataset. After training the model on the two datasets, with labels that comprise 10% cell markers, the DET improved from 0.130 to 0.903 and 0.116 to 0.877. When trained with labels that comprise 60% cell markers, the performance was better than that of the model trained using the supervised learning method. This outcome indicates that the model's performance improved as the quality of the labels used for training increased.

Subjects

Neural Networks, Computer; Supervised Machine Learning; Image Processing, Computer-Assisted/methods

10.

JCcirc: circRNA full-length sequence assembly through integrated junction contigs.

Zhang, Jingjing; Zhang, Huiling; Ju, Zhen; Peng, Yin; Pan, Yi; Xi, Wenhui; Wei, Yanjie.

Brief Bioinform ; 24(6), 2023 Sep 22.

Article in English | MEDLINE | ID: mdl-37833842

ABSTRACT

Recent studies have shed light on the potential of circular RNA (circRNA) as a biomarker for disease diagnosis and as a nucleic acid vaccine. The exploration of these functionalities requires correct circRNA full-length sequences; however, existing assembly tools can only correctly assemble some circRNAs, and their performance can be further improved. Here, we introduce a novel feature known as the junction contig (JC), which is an extension of the back-splice junction (BSJ). Leveraging the strengths of both BSJ and JC, we present a novel method called JCcirc (https://github.com/cbbzhang/JCcirc). It enables efficient reconstruction of all types of circRNA full-length sequences and their alternative isoforms using splice graphs and fragment coverage. Our findings demonstrate the superiority of JCcirc over existing methods on human simulation datasets: its average F1 score surpasses that of CircAST by 0.40 and those of both CIRI-full and circRNAfull by 0.13. For circRNAs below 400 bp, 400-800 bp, 800-1200 bp and above 1200 bp, the correct assembly rates are 0.13, 0.09, 0.04 and 0.03 higher, respectively, than those achieved by existing methods. Moreover, JCcirc also outperforms existing assembly tools on datasets from five other model species and on real sequencing datasets. These results show that JCcirc is a robust tool for accurately assembling circRNA full-length sequences, laying the foundation for the functional analysis of circRNAs.

Subjects

RNA, Circular; RNA; Humans; RNA, Circular/genetics; Sequence Analysis, RNA/methods; Protein Isoforms/genetics; RNA/genetics

11.

zPoseScore model for accurate and robust protein-ligand docking pose scoring in CASP15.

Shen, Tao; Liu, Fuxu; Wang, Zechen; Sun, Jinyuan; Bu, Yifan; Meng, Jintao; Chen, Weihua; Yao, Keyi; Mu, Yuguang; Li, Weifeng; Zhao, Guoping; Wang, Sheng; Wei, Yanjie; Zheng, Liangzhen.

Proteins ; 91(12): 1837-1849, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37606194

ABSTRACT

We introduce a deep learning-based ligand pose scoring model called zPoseScore for predicting protein-ligand complexes in the 15th Critical Assessment of Protein Structure Prediction (CASP15). Our contributions are threefold: first, we generate six training and evaluation data sets by employing advanced data augmentation and sampling methods. Second, we redesign the "zFormer" module, inspired by AlphaFold2's Evoformer, to efficiently describe protein-ligand interactions. This module enables the extraction of protein-ligand paired features that lead to accurate predictions. Finally, we develop the zPoseScore framework with zFormer for scoring and ranking ligand poses, allowing for atomic-level protein-ligand feature encoding and fusion to output refined ligand poses and ligand per-atom deviations. Our results demonstrate excellent performance on various testing data sets, achieving Pearson's correlation R = 0.783 and 0.659 for ranking docking decoys generated based on experimental and predicted protein structures of CASF-2016 protein-ligand complexes. Additionally, we obtain an averaged local distance difference test (lDDT pli = 0.558) of AIchemy LIG2 in CASP15 for de novo protein-ligand complex structure predictions. Detailed analysis shows that accurate ligand binding site prediction and side-chain orientation are crucial for achieving better prediction performance. Our proposed model is one of the most accurate protein-ligand pose prediction models and could serve as a valuable tool in small molecule drug discovery.

Subjects

Proteins; Ligands; Protein Binding; Proteins/chemistry; Binding Sites; Molecular Docking Simulation

12.

SWsnn: A Novel Simulator for Spiking Neural Networks.

Wang, Zhichao; Li, Xuelei; Fan, Jianping; Meng, Jintao; Lin, Zhenli; Pan, Yi; Wei, Yanjie.

J Comput Biol ; 30(9): 951-960, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37585615

ABSTRACT

Spiking neural network (SNN) simulators play an important role in neural system modeling and brain function research. They help scientists reproduce and explore neuronal activities in brain regions for neuroscience, brain-like computing, and other fields, and can also be applied to artificial intelligence, machine learning, and other fields. At present, many simulators using a central processing unit (CPU) or graphics processing unit (GPU) have been developed. However, the randomness of connections between neurons and of spiking events in SNN simulation leads to substantial memory access time. To alleviate this problem, we developed an SNN simulator, SWsnn, based on the new Sunway SW26010pro processor. The SW26010pro processor consists of six core groups, each with 16 MB of local data memory (LDM). LDM offers high-speed reads and writes, which makes it suitable for simulation tasks such as SNNs. Experimental results show that SWsnn runs faster than other mainstream GPU-based simulators when simulating a certain scale of neural network, showing a strong performance advantage. To conduct larger scale simulations, SWsnn designed a simulation computation based on a large shared model of Sunway processor and developed a multiprocessor version of SWsnn based on this mode, achieving larger scale SNN simulations.

Subjects

Artificial Intelligence; Neural Networks, Computer; Computer Simulation; Neurons/physiology; Brain

13.

Editorial: Computational solutions for microbiome and metagenomics sequencing analyses, Volume II.

Niyogisubizo, Jovial; Cai, Yunpeng; Zhang, Lu; Zhang, Xingyu; Wei, Yanjie.

Front Mol Biosci ; 10: 1253303, 2023.

Article in English | MEDLINE | ID: mdl-37529378

14.

Corrigendum: Identification of circRNA biomarker for gastric cancer through integrated analysis.

Hossain, Md Tofazzal; Li, Song; Reza, Md Selim; Feng, Shengzhong; Zhang, Xiaojing; Jin, Zhe; Wei, Yanjie; Peng, Yin.

Front Mol Biosci ; 10: 1249019, 2023.

Article in English | MEDLINE | ID: mdl-37469706

ABSTRACT

[This corrects the article DOI: 10.3389/fmolb.2022.857320.].

15.

Corrigendum: Evaluation of CircRNA sequence assembly methods using long reads.

Zhang, Jingjing; Hossain, Md Tofazzal; Liu, Weiguo; Peng, Yin; Pan, Yi; Wei, Yanjie.

Front Genet ; 14: 1248519, 2023.

Article in English | MEDLINE | ID: mdl-37485341

ABSTRACT

[This corrects the article DOI: 10.3389/fgene.2022.816825.].

16.

RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data.

Yan, Lifeng; Yin, Zekun; Zhang, Hao; Zhao, Zhan; Wang, Mingkai; Müller, André; Kallenborn, Felix; Wichmann, Alexander; Wei, Yanjie; Niu, Beifang; Schmidt, Bertil; Liu, Weiguo.

Methods ; 216: 39-50, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37330158

ABSTRACT

Assessing the quality of sequencing data plays a crucial role in downstream data analysis. However, existing tools often achieve sub-optimal efficiency, especially when dealing with compressed files or performing complicated quality control operations such as over-representation analysis and error correction. We present RabbitQCPlus, an ultra-efficient quality control tool for modern multi-core systems. RabbitQCPlus uses vectorization, memory copy reduction, parallel (de)compression, and optimized data structures to achieve substantial performance gains. It is 1.1 to 5.4 times faster when performing basic quality control operations compared to state-of-the-art applications yet requires fewer compute resources. Moreover, RabbitQCPlus is at least 4 times faster than other applications when processing gzip-compressed FASTQ files and 1.3 times faster with the error correction module turned on. Furthermore, it takes less than 4 minutes to process 280 GB of plain FASTQ sequencing data, while other applications take at least 22 minutes on a 48-core server when enabling the per-read over-representation analysis. C++ sources are available at https://github.com/RabbitBio/RabbitQCPlus.

Subjects

Data Compression; Software; High-Throughput Nucleotide Sequencing; Quality Control; Algorithms; Sequence Analysis, DNA

17.

RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches.

Xu, Xiaoming; Yin, Zekun; Yan, Lifeng; Zhang, Hao; Xu, Borui; Wei, Yanjie; Niu, Beifang; Schmidt, Bertil; Liu, Weiguo.

Genome Biol ; 24(1): 121, 2023 May 17.

Article in English | MEDLINE | ID: mdl-37198663

ABSTRACT

We present RabbitTClust, a fast and memory-efficient genome clustering tool based on sketch-based distance estimation. Our approach enables efficient processing of large-scale datasets by combining dimensionality reduction techniques with streaming and parallelization on modern multi-core platforms. 113,674 complete bacterial genome sequences from RefSeq, 455 GB in FASTA format, can be clustered within less than 6 min and 1,009,738 GenBank assembled bacterial genomes, 4.0 TB in FASTA format, within only 34 min on a 128-core workstation. Our results further identify 1269 redundant genomes, with identical nucleotide content, in the RefSeq bacterial genomes database.
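
As background on sketch-based distance estimation, the following is a minimal bottom-s MinHash sketch of k-mer sets. It is not RabbitTClust's code: the hash function (BLAKE2b from the Python standard library), the k-mer length, the sketch size, and the function names are assumptions made for illustration.

```python
import hashlib


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer (hash choice is an assumption)."""
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k=5, size=64):
    """Bottom-s sketch: the `size` smallest k-mer hash values, sorted."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

Because each genome is reduced to a fixed-size list of integers, pairwise comparison cost no longer depends on genome length, which is the kind of dimensionality reduction the abstract refers to.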

Subjects

Genome; Software; Databases, Nucleic Acid; Cluster Analysis; Bacteria; Algorithms; Genome, Bacterial

18.

Mechanistic insights into dissolved organic matter-driven protistan and bacterial community dynamics influenced by vegetation restoration.

Jing, Zhang-Mu; Li, Qing-Qian; Wei, Yan-Jie; Dong, Bin; Yuan, Peng; Liu, Rui-Xia; Gao, Hong-Jie.

Environ Res ; 227: 115710, 2023 Jun 15.

Article in English | MEDLINE | ID: mdl-36933634

ABSTRACT

Vegetation restoration projects can not only improve water quality by absorbing and transferring pollutants and nutrients from non-vegetation sources, but also protect biodiversity by providing habitat for biological growth. However, the mechanism of the protistan and bacterial assembly processes in vegetation restoration projects has rarely been explored. To address this, based on 18S rRNA and 16S rRNA high-throughput sequencing, we investigated the mechanism of protistan and bacterial community assembly processes, environmental conditions, and microbial interactions in rivers with and without vegetation restoration. The results indicated that the deterministic process dominated the protistan and bacterial community assembly (94.29% and 92.38%), influenced by biotic and abiotic factors. For biotic factors, microbial network connectivity was higher in the vegetation zone (average degree = 20.34) than in the bare zone (average degree = 11.00). For abiotic factors, the concentration of dissolved organic carbon ([DOC]) was the most important environmental factor affecting the microbial community composition. [DOC] was significantly lower in the vegetation zone (18.65 ± 6.34 mg/L) than in the bare zone (28.22 ± 4.82 mg/L). In overlying water, vegetation restoration upregulated the protein-like fluorescence components (C1 and C2) by 1.26- and 1.01-fold and downregulated the terrestrial humic-like fluorescence components (C3 and C4) by 0.54- and 0.55-fold, respectively. The different DOM components guided bacteria and protists to select different interactive relationships. The protein-like DOM components led to bacterial competition, whereas the humus-like DOM components resulted in protistan competition. Finally, the structural equation model was established to explain that DOM components can affect protistan and bacterial diversity by providing substrates, facilitating microbial interactions, and promoting nutrient input. In general, our study provides insights into the responses of vegetation-restored ecosystems to the dynamics and interactions in the anthropogenically influenced river and evaluates the ecological restoration performance of vegetation restoration from a molecular biology perspective.

Subjects

Dissolved Organic Matter; Microbiota; Rivers/chemistry; Water Quality; Bacteria/genetics; Spectrometry, Fluorescence

19.

Deep Learning-Based Bioactive Therapeutic Peptide Generation and Screening.

Zhang, Haiping; Saravanan, Konda Mani; Wei, Yanjie; Jiao, Yang; Yang, Yang; Pan, Yi; Wu, Xuli; Zhang, John Z H.

J Chem Inf Model ; 63(3): 835-845, 2023 Feb 13.

Article in English | MEDLINE | ID: mdl-36724090

ABSTRACT

Many bioactive peptides have demonstrated therapeutic effects, such as antiviral, antibacterial, and anticancer activity, against complicated diseases. It is possible to generate a large number of potentially bioactive peptides using deep learning in a manner analogous to the generation of de novo chemical compounds using the acquired bioactive peptides as a training set. Such generative techniques would be significant for drug development since peptides are much easier and cheaper to synthesize than compounds. Despite the limited availability of deep learning-based peptide-generating models, we have built an LSTM model (called LSTM_Pep) to generate de novo peptides and fine-tuned the model to generate de novo peptides with specific prospective therapeutic benefits. Remarkably, the Antimicrobial Peptide Database has been effectively utilized to generate various kinds of potential active de novo peptides. We proposed a pipeline for screening those generated peptides for a given target and used the main protease of SARS-CoV-2 as a proof-of-concept. Moreover, we have developed a deep learning-based protein-peptide prediction model (DeepPep) for rapid screening of the generated peptides for the given targets. Together with the generating model, we have demonstrated that iterative fine-tuning, generation, and screening can yield peptides with higher predicted binding affinity. Our work sheds light on developing deep learning-based methods and pipelines to effectively generate and obtain bioactive peptides with a specific therapeutic effect and showcases how artificial intelligence can help discover de novo bioactive peptides that can bind to a particular target.

Subjects

COVID-19; Deep Learning; Humans; Artificial Intelligence; Drug Design; SARS-CoV-2; Peptides/pharmacology

20.

A fully differentiable ligand pose optimization framework guided by deep learning and a traditional scoring function.

Wang, Zechen; Zheng, Liangzhen; Wang, Sheng; Lin, Mingzhi; Wang, Zhihao; Kong, Adams Wai-Kin; Mu, Yuguang; Wei, Yanjie; Li, Weifeng.

Brief Bioinform ; 24(1), 2023 Jan 19.

Article in English | MEDLINE | ID: mdl-36502369

ABSTRACT

The recently reported machine learning- or deep learning-based scoring functions (SFs) have shown exciting performance in predicting protein-ligand binding affinities with fruitful application prospects. However, the differentiation between highly similar ligand conformations, including the native binding pose (the global energy minimum state), remains challenging, although it could greatly enhance docking. In this work, we propose a fully differentiable, end-to-end framework for ligand pose optimization based on a hybrid SF called DeepRMSD+Vina, which combines a multi-layer perceptron (DeepRMSD) with the traditional AutoDock Vina SF. DeepRMSD+Vina, which combines (1) the root mean square deviation (RMSD) of the docking pose with respect to the native pose and (2) the AutoDock Vina score, is fully differentiable; thus, it is capable of optimizing the ligand binding pose to the lowest-energy conformation. Evaluated on the CASF-2016 docking power dataset, DeepRMSD+Vina reaches a success rate of 94.4%, which outperforms most reported SFs to date. We evaluated the ligand conformation optimization framework in practical molecular docking scenarios (redocking and cross-docking tasks), revealing the high potential of this framework in drug design and discovery. Structural analysis shows that this framework has the ability to identify key physical interactions in protein-ligand binding, such as hydrogen bonding. Our work provides a paradigm for optimizing ligand conformations based on deep learning algorithms. The DeepRMSD+Vina model and the optimization framework are available at GitHub repository https://github.com/zchwang/DeepRMSD-Vina_Optimization.

Subjects

Deep Learning; Ligands; Molecular Docking Simulation; Proteins/chemistry; Drug Design; Protein Binding
