Results 1 - 20 of 132
1.
Bioinformatics ; 2024 May 02.
Article in English | MEDLINE | ID: mdl-38696758

ABSTRACT

MOTIVATION: Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS: We provide here a novel computational approach, CAPTP, which leverages the power of convolution and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 both in cross-validation settings and on an independent test dataset. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs. AVAILABILITY: The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
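
The Matthews correlation coefficient quoted in this abstract can be computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula, not code from CAPTP; the function name and the zero-denominator convention are assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike plain accuracy, the coefficient uses all four cells, which is why it is often preferred for the imbalanced data the abstract mentions.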

2.
Phys Chem Chem Phys ; 26(15): 11657-11666, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38563149

ABSTRACT

Silica exhibits a rich phase diagram with numerous stable structures existing at different temperature and pressure conditions, including its glassy form. In large-scale atomistic simulations, due to the small energy difference, several phases may coexist. While, in terms of long-range order, there are clear differences between these phases, their short- or medium-range structural properties are similar for many phases, thus making it difficult to detect the structural differences. In this study, a methodology based on unsupervised learning is proposed to detect the differences in local structures between eight phases of silica, using atomic models prepared by molecular dynamics (MD) simulations. A combination of two-step locality preserving projections (TS-LPP) and locally averaged atomic fingerprints (LAAF) descriptor was employed to find a low-dimensional space in which the differences among all the phases can be detected. From the distance between each structure in the found low-dimensional space, the similarity between the structures can be discussed and subtle local changes in the structures can be detected. Using the obtained low-dimensional space, the β-α transition in quartz at a low temperature was analyzed, as well as the structural evolution during the melt-quench process starting from α-quartz. The proper differentiation and ease of visualization make the present methodology promising for improving the analysis of the structure and properties of glasses, where subtle differences in structure appear due to differences in the temperature and pressure conditions at which they were synthesized.

3.
Adv Sci (Weinh) ; : e2400009, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38602457

ABSTRACT

Recent studies have revealed that numerous lncRNAs can translate proteins under specific conditions, performing diverse biological functions, thus termed coding lncRNAs. Their comprehensive landscape, however, remains elusive due to this field's preliminary and dispersed nature. This study introduces codLncScape, a framework for coding lncRNA exploration consisting of codLncDB, codLncFlow, codLncWeb, and codLncNLP. Specifically, it contains a manually compiled knowledge base, codLncDB, encompassing 353 coding lncRNA entries validated by experiments. Building upon codLncDB, codLncFlow investigates the expression characteristics of these lncRNAs and their diagnostic potential in the pan-cancer context, alongside their association with spermatogenesis. Furthermore, codLncWeb emerges as a platform for storing, browsing, and accessing knowledge concerning coding lncRNAs within various programming environments. Finally, codLncNLP serves as a knowledge-mining tool to enhance the timely content inclusion and updates within codLncDB. In summary, this study offers a well-functioning, content-rich ecosystem for coding lncRNA research, aiming to accelerate systematic studies in this field.

4.
Comput Biol Med ; 171: 108181, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38428094

ABSTRACT

In the field of drug discovery and pharmacology research, precise and rapid prediction of drug-target binding affinity (DTA) and drug-drug interaction (DDI) is essential for drug efficacy and safety. However, pharmacological data are often distributed across different institutions. Moreover, due to concerns regarding data privacy and intellectual property, the sharing of pharmacological data is often restricted. It is difficult for institutions to achieve the desired performance by solely utilizing their own data. This urgent challenge calls for a solution that not only enhances collaboration between multiple institutions to improve prediction accuracy but also safeguards data privacy. In this study, we propose a novel federated learning (FL) framework to advance the prediction of DTA and DDI, namely FL-DTA and FL-DDI. The proposed framework enables multiple institutions to collaboratively train a predictive model without the need to share their local data. Moreover, to ensure data privacy, we employ secure multi-party computation (MPC) during the federated learning model aggregation phase. We evaluated the proposed method on two DTA and one DDI benchmark datasets and compared it with centralized learning and local learning. The experimental results indicate that the proposed method performs comparably to centralized learning and significantly outperforms local learning. Moreover, the proposed framework ensures data security while promoting collaboration among institutions, thereby accelerating the drug discovery process.


Subjects
Benchmarking , Learning , Drug Delivery Systems , Drug Discovery
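
The aggregation phase mentioned in this abstract can be illustrated with sample-weighted parameter averaging (the common FedAvg rule). This is only a sketch under assumptions: the abstract does not state which aggregation rule FL-DTA and FL-DDI use, the secure multi-party computation layer is omitted entirely, and the names `federated_average` and `Params` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    first_params = updates[0][0]
    averaged = {name: [0.0] * len(values) for name, values in first_params.items()}
    for params, n in updates:
        weight = n / total  # share of all training samples held by this client
        for name, values in params.items():
            for i, value in enumerate(values):
                averaged[name][i] += weight * value
    return averaged


if __name__ == "__main__":
    # Two hypothetical institutions holding 1 and 3 samples respectively.
    clients = [({"w": [1.0, 2.0]}, 1), ({"w": [3.0, 6.0]}, 3)]
    print(federated_average(clients))
```

Only model parameters and sample counts reach the aggregator in this sketch; the raw training data never leave each institution.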
5.
Int J Biol Macromol ; 265(Pt 1): 130659, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38462114

ABSTRACT

Understanding the subcellular localization of lncRNAs is crucial for comprehending their regulatory activities. The conventional detection of lncRNA subcellular location usually uses in situ detection techniques, which are resource intensive. Some machine learning-based algorithms have been proposed for lncRNA subcellular location prediction in mammals. However, due to the low level of conservation of lncRNA sequences, the performance of cross-species models remains unsatisfactory. In this study, we curated a novel dataset containing subcellular location information of lncRNAs in Homo sapiens. Subsequently, based on the BERT pre-trained language algorithm, we developed a model for lncRNA subcellular location prediction. Our model achieved a micro-average area under the receiver operating characteristic curve (AUROC) of 0.791 on the training set and an AUROC of 0.700 on the testing nucleus set. Additionally, we conducted cross-species validation and motif discovery to further investigate underlying patterns. In summary, our study provides valuable guidance and computational analysis tools for exploring the mechanisms of lncRNA subcellular localization and the dynamic spatial changes of RNA in abnormal physiological states.


Subjects
Long Noncoding RNA , Animals , Humans , Long Noncoding RNA/genetics , Algorithms , Machine Learning , Computational Biology/methods , Mammals/genetics
6.
Comput Biol Med ; 168: 107762, 2024 01.
Article in English | MEDLINE | ID: mdl-38056212

ABSTRACT

Antibiotic resistance continues to be a growing concern for global health, accentuating the need for novel antibiotic discoveries. Traditional methodologies in this field have relied heavily on extensive experimental screening, which is often time-consuming and costly. In contrast, computer-assisted drug screening offers rapid, cost-effective solutions. In this work, we propose FIAMol-AB, a deep learning model that combines graph neural networks, text convolutional networks and molecular fingerprint techniques. This method also uses an attention mechanism to fuse multiple forms of information within the model. The experiments show that FIAMol-AB may offer potential advantages in antibiotic discovery tasks over some existing methods. We conducted some analysis based on our model's results, which helps highlight the potential significance of certain features in the model's predictive performance. Compared to different models, ours demonstrates promising results, indicating potential robustness and versatility. This suggests that by integrating multi-view information and attention mechanisms, FIAMol-AB might better learn complex molecular structures, potentially improving the precision and efficiency of antibiotic discovery. We hope our FIAMol-AB can be used as a useful method in the ongoing fight against antibiotic resistance.


Subjects
Deep Learning , Anti-Bacterial Agents/pharmacology , Preclinical Drug Evaluation , Computer Neural Networks
7.
Genome Biol Evol ; 15(11)2023 Nov 01.
Article in English | MEDLINE | ID: mdl-38014863

ABSTRACT

Semisulcospira habei is a freshwater snail species endemic to the Lake Biwa drainage and belongs to a species group radiated within the lake system. We report the chromosome-scale genome assembly of S. habei, including eight megascaffolds larger than 150 Mb. The genome assembly size is about 2.0 Gb with an N50 of 237 Mb. There are 41,547 protein-coding genes modeled by ab initio gene prediction based on the transcriptome data set, and the BUSCO completeness of the annotated genes was 92.2%. The repeat elements comprise approximately 76% of the genome assembly. The Hi-C contact map showed seven well-resolved scaffolds that correspond to the basic haploid chromosome number of S. habei inferred from the preceding karyotypic study, while it also exhibited one scaffold with a complicated mosaic pattern that is likely to represent the complex of multiple supernumerary chromosomes. The genome assembly reported here represents a high-quality genome resource in disentangling the genomic background of the adaptive radiation of Semisulcospira and also facilitates evolutionary studies in the superfamily Cerithioidea.


Subjects
Lakes , Snails , Animals , Snails/genetics , Chromosomes/genetics , Genomics , Genome Size
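
The N50 statistic reported in this abstract has a simple operational definition: sort the scaffold lengths from longest to shortest and return the length at which the running total first reaches half of the assembly size. A minimal sketch follows; the function name and the toy lengths are illustrative and are not taken from the S. habei assembly.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the assembly size
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy scaffold lengths: total 54, and 10 + 9 + 8 = 27 reaches half.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```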
8.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37847658

ABSTRACT

MOTIVATION: The rapid and extensive transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to an unprecedented global health emergency, affecting millions of people and causing an immense socioeconomic impact. The identification of SARS-CoV-2 phosphorylation sites plays an important role in unraveling the complex molecular mechanisms behind infection and the resulting alterations in host cell pathways. However, currently available prediction tools for identifying these sites lack accuracy and efficiency. RESULTS: In this study, we presented a comprehensive biological function analysis of SARS-CoV-2 infection in a clonal human lung epithelial A549 cell, revealing dramatic changes in protein phosphorylation pathways in host cells. Moreover, a novel deep learning predictor called PSPred-ALE is specifically designed to identify phosphorylation sites in human host cells that are infected with SARS-CoV-2. The key idea of PSPred-ALE lies in the use of a self-adaptive learning embedding algorithm, which enables the automatic extraction of context sequential features from protein sequences. In addition, the tool uses a multihead attention module that enables the capturing of global information, further improving the accuracy of predictions. Comparative analysis of features demonstrated that the self-adaptive learning embedding features are superior to hand-crafted statistical features in capturing discriminative sequence information. Benchmarking comparison shows that PSPred-ALE outperforms the state-of-the-art prediction tools and achieves robust performance. Therefore, the proposed model can effectively identify phosphorylation sites, assisting biomedical scientists in understanding the mechanism of phosphorylation in SARS-CoV-2 infection. AVAILABILITY AND IMPLEMENTATION: PSPred-ALE is available at https://github.com/jiaoshihu/PSPred-ALE and Zenodo (https://doi.org/10.5281/zenodo.8330277).


Subjects
COVID-19 , Computer Neural Networks , Humans , SARS-CoV-2 , Phosphorylation , Algorithms
9.
Front Plant Sci ; 14: 1269964, 2023.
Article in English | MEDLINE | ID: mdl-37868310

ABSTRACT

Heat stress is a severe challenge for plant production, and the use of thermotolerant cultivars is critical to ensure stable production in high-temperature-prone environments. However, the selection of thermotolerant cultivars is difficult due to the complex nature of heat stress and the time and space needed for evaluation. In this study, we characterized genome-wide differences in gene expression between thermotolerant and thermosensitive tomato cultivars and examined the possibility of selecting gene expression markers to estimate thermotolerance among different tomato cultivars. We selected one thermotolerant and one thermosensitive cultivar based on physiological evaluations and compared heat-responsive gene expression in these cultivars under stepwise heat stress and acute heat shock conditions. Transcriptomic analyses revealed that two heat-inducible gene expression pathways, controlled by the heat shock element (HSE) and the evening element (EE), respectively, presented different responses depending on heat stress conditions. HSE-regulated gene expression was induced under both conditions, while EE-regulated gene expression was only induced under gradual heat stress conditions in both cultivars. Furthermore, HSE-regulated genes showed higher expression in the thermotolerant cultivar than the sensitive cultivar under acute heat shock conditions. Then, candidate expression biomarker genes were selected based on the transcriptome data, and the usefulness of these candidate genes was validated in five cultivars. This study shows that the thermotolerance of tomato is correlated with its ability to maintain the heat shock response (HSR) under acute severe heat shock conditions. Furthermore, it raises the possibility that the robustness of the HSR under severe heat stress can be used as an indicator to evaluate the thermotolerance of crop cultivars.

10.
Methods ; 219: 1-7, 2023 11.
Article in English | MEDLINE | ID: mdl-37689121

ABSTRACT

With the increasing availability of large-scale QSAR (Quantitative Structure-Activity Relationship) datasets, collaborative analysis has become a promising approach for drug discovery. Traditional centralized analysis, which typically concentrates data on a central server for training, faces challenges such as data privacy and security. Distributed analysis such as federated learning offers a solution by enabling collaborative model training without sharing raw data. However, it may fail when the training data in the local devices are non-independent and identically distributed (non-IID). In this paper, we propose a novel framework for collaborative drug discovery using federated learning on non-IID datasets. We address the difficulty of training on non-IID data by globally sharing a small subset of data among all institutions. Our framework allows multiple institutions to jointly train a robust predictive model while preserving the privacy of their individual data. We leverage the federated learning paradigm to distribute the model training process across local devices, eliminating the need for data exchange. The experimental results on 15 benchmark datasets demonstrate that the proposed method achieves predictive accuracy competitive with centralized analysis while respecting data privacy. Moreover, our framework offers benefits such as reduced data transmission and enhanced scalability, making it suitable for large-scale collaborative drug discovery efforts.


Subjects
Benchmarking , Drug Discovery
11.
Sci Rep ; 13(1): 11820, 2023 07 21.
Article in English | MEDLINE | ID: mdl-37479701

ABSTRACT

Recent studies showed that machine learning models such as gradient-boosting decision tree (GBDT) can predict diabetes with high accuracy from big data. In this study, we asked whether highly accurate prediction of diabetes is possible even from small data by expanding the amount of data through data collaboration (DC) analysis, a modern framework for integrating and analyzing data accumulated at multiple institutions while ensuring confidentiality. To this end, we focused on data from two institutions: health checkup data of 1502 citizens accumulated in Tsukuba City and health history data of 1399 patients collected at the University of Tsukuba Hospital. When using only the health checkup data, the ROC-AUC and Recall for logistic regression (LR) were 0.858 ± 0.014 and 0.970 ± 0.019, respectively, while those for GBDT were 0.856 ± 0.014 and 0.983 ± 0.016, respectively. When using also the health history data through DC analysis, these values for LR improved to 0.875 ± 0.013 and 0.993 ± 0.009, respectively, while those for GBDT deteriorated because of the low compatibility with a method used for confidential data sharing (although DC analysis brought improvements). Even in a situation where health checkup data of only 324 citizens are available, the ROC-AUC and Recall for LR were 0.767 ± 0.025 and 0.867 ± 0.04, respectively, thanks to DC analysis, indicating an 11% and 12% improvement. Thus, we concluded that the answer to the above question was "Yes" for LR but "No" for GBDT for the data set tested in this study.


Subjects
Diabetes Mellitus , Humans , Diabetes Mellitus/diagnosis , Diabetes Mellitus/epidemiology , Machine Learning , Logistic Models
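
The two metrics reported in this abstract, ROC-AUC and Recall, can be computed directly from labels and model outputs. The sketch below uses the pairwise (Mann-Whitney) form of ROC-AUC, which is quadratic in the number of samples and is meant for illustration only; the function names and toy inputs are assumptions, not the study's evaluation code.

```python
from typing import Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives (label 1) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, chosen only to exercise the functions.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(recall([1, 1, 0, 1], [1, 0, 0, 1]))
```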
12.
Comput Biol Med ; 164: 107223, 2023 09.
Article in English | MEDLINE | ID: mdl-37490833

ABSTRACT

The increased availability of high-throughput technologies has enabled biomedical researchers to learn about disease etiology across multiple omics layers, which shows promise for improving cancer subtype identification. Many computational methods have been developed to perform clustering on multi-omics data; however, only a few of them are applicable to partial multi-omics data, in which some samples lack data in some types of omics. In this study, we propose a novel multi-omics clustering method based on latent subspace learning (MCLS), which can deal with missing multi-omics data for clustering. We utilize the data with complete omics to construct a latent subspace using PCA-based feature extraction and singular value decomposition (SVD). The data with incomplete multi-omics are then projected to the latent subspace, and spectral clustering is performed to find the clusters. The proposed MCLS method is evaluated on seven different cancer datasets on three levels of omics in both full and partial cases and compared to several state-of-the-art methods. The experimental results show that the proposed MCLS method is more efficient and effective than the compared methods for cancer subtype identification in multi-omics data analysis, which provides important references for a comprehensive understanding of cancer and biological mechanisms. AVAILABILITY: The proposed method is freely accessible at https://github.com/ShangCS/MCLS.


Subjects
Algorithms , Neoplasms , Humans , Multiomics , Cluster Analysis , Neoplasms/genetics , Data Analysis
13.
J Mol Biol ; 435(14): 168116, 2023 07 15.
Article in English | MEDLINE | ID: mdl-37356901

ABSTRACT

Dimensionality reduction is a hot topic in machine learning that can help researchers find key features from complex medical or biological data, which is crucial for biological sequence research, drug development, etc. However, when applied to specific datasets, different dimensionality reduction methods generate different results, which produces instability and makes tuning the parameters a time-consuming task. Exploring high quality features, genes, or attributes from complex data is an important task and challenge. To ensure the efficiency, robustness, and accuracy of experiments, in this work, we developed a dimensionality reduction tool MRMD3.0 based on the ensemble strategy of link analysis. It is mainly divided into two steps: first, the ensemble method is used to integrate different feature ranking algorithms to calculate feature importance, and then the forward feature search strategy combined with cross-validation is used to explore the proper feature combination. Compared with the previously developed version, MRMD3.0 has added more link-based ensemble algorithms, including PageRank, HITS, LeaderRank, and TrustRank. At the same time, more feature ranking algorithms have been added, and their effect and calculation speed have been greatly improved. In addition, the newest version provides an interface used by each feature ranking method and five kinds of charts to help users analyze features. Finally, we also provide an online webserver to help researchers analyze the data. AVAILABILITY AND IMPLEMENTATION: Webserver: http://lab.malab.cn/soft/MRMDv3/home.html. GitHub: https://github.com/heshida01/MRMD3.0.


Subjects
Data Visualization , Software , Algorithms , Machine Learning
14.
Phys Chem Chem Phys ; 25(27): 17978-17986, 2023 Jul 12.
Article in English | MEDLINE | ID: mdl-37377109

ABSTRACT

The atomic descriptors used in machine learning to predict forces are often high dimensional. In general, by retrieving a significant amount of structural information from these descriptors, accurate force predictions can be achieved. On the other hand, to acquire higher robustness for transferability without overfitting, a sufficient reduction of descriptors is necessary. In this study, we propose a method to automatically determine hyperparameters in the atomic descriptors, aiming to obtain accurate machine learning forces while using a small number of descriptors. Our method focuses on identifying an appropriate threshold cut-off for the variance value of the descriptor components. To demonstrate the effectiveness of our method, we apply it to crystalline, liquid, and amorphous structures in SiO2, SiGe, and Si systems. By using both conventional two-body descriptors and our introduced split-type three-body descriptors, we demonstrate that our method can provide machine learning forces that enable efficient and robust molecular dynamics simulations.
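
The variance cut-off described in this abstract can be pictured as a filter that keeps only descriptor components whose variance across atomic environments exceeds a threshold. The sketch below assumes the threshold is supplied by the caller; the paper's procedure for choosing it automatically is not given in the abstract and is not reproduced here. The function name and the toy data are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def select_by_variance(descriptors: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices of descriptor components whose population variance exceeds threshold."""
    if not descriptors:
        raise ValueError("at least one descriptor vector is required")
    n_components = len(descriptors[0])
    # Transpose: one list of values per descriptor component.
    columns = [[row[j] for row in descriptors] for j in range(n_components)]
    return [j for j, column in enumerate(columns) if pvariance(column) > threshold]


if __name__ == "__main__":
    # Three toy environments with three descriptor components each.
    toy = [[1.0, 0.0, 5.0], [1.0, 1.0, 5.1], [1.0, 2.0, 4.9]]
    print(select_by_variance(toy, threshold=0.01))
```

Components that barely vary across environments carry little information for distinguishing forces, which is the intuition behind discarding them.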

15.
Brief Funct Genomics ; 22(4): 392-400, 2023 07 17.
Article in English | MEDLINE | ID: mdl-37078726

ABSTRACT

Language models have shown the capacity to learn complex molecular distributions. In the field of molecular generation, they are designed to explore the distribution of molecules, and previous studies have demonstrated their ability to learn molecule sequences. Early on, recurrent neural networks (RNNs) were widely used for feature extraction from sequence data and have been used for various molecule generation tasks. In recent years, the attention mechanism for sequence data has become popular. It captures the underlying relationships between words and is widely applied to language models. The Transformer-Layer, a model based on a self-attention mechanism, performs as well as the RNN-based model. In this research, we investigated the difference between RNNs and the Transformer-Layer in learning more complex distributions of molecules. For this purpose, we experimented with three different generative tasks: the distributions of molecules with elevated scores of penalized LogP, multimodal distributions of molecules and the largest molecules in PubChem. We evaluated the models on molecular properties, basic metrics, Tanimoto similarity, etc. In addition, we applied two different representations of the molecule, SMILES and SELFIES. The results show that the two language models can learn complex molecular distributions and that the SMILES-based representation performs better than SELFIES. The choice between RNNs and the Transformer-Layer needs to be based on the characteristics of the dataset. RNNs work better on data that focus on local features, and their performance decreases with multidistribution data, while the Transformer-Layer is more suitable for molecules with larger weights and for focusing on global features.


Subjects
Language , Computer Neural Networks
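
Tanimoto similarity, one of the evaluation metrics named in this abstract, is the size of the intersection of two fingerprints divided by the size of their union. The sketch below represents each fingerprint as a set of "on" bit indices; that representation, the function name, and the value returned for two empty fingerprints are assumptions made here, and no cheminformatics library is involved.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


if __name__ == "__main__":
    # Two toy fingerprints sharing bits 2 and 3 out of four distinct bits.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```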
16.
BMC Biol ; 21(1): 93, 2023 04 24.
Article in English | MEDLINE | ID: mdl-37095510

ABSTRACT

BACKGROUND: RNA 5-methyluridine (m5U) modifications are obtained by methylation at the C5 position of uridine catalyzed by pyrimidine methylation transferase, which is related to the development of human diseases. Accurate identification of m5U modification sites from RNA sequences can contribute to the understanding of their biological functions and the pathogenesis of related diseases. Compared to traditional experimental methods, easy-to-use computational methods based on machine learning can identify modification sites from RNA sequences in an efficient and time-saving manner. Despite the good performance of these computational methods, there are some drawbacks and limitations. RESULTS: In this study, we have developed a novel predictor, m5U-SVM, based on multi-view features and machine learning algorithms to construct predictive models for identifying m5U modification sites from RNA sequences. In this method, we used four traditional physicochemical features and distributed representation features. The optimized multi-view features were obtained from the four fused traditional physicochemical features by using the two-step LightGBM and IFS methods, and then the distributed representation features were fused with the optimized physicochemical features to obtain the new multi-view features. The best performing classifier, support vector machine, was identified by screening different machine learning algorithms. The comparison results show that the proposed model outperforms the existing state-of-the-art tool. CONCLUSIONS: m5U-SVM provides an effective tool that successfully captures sequence-related attributes of modifications and can accurately predict m5U modification sites from RNA sequences. The identification of m5U modification sites helps to understand and delve into the related biological processes and functions.


Subjects
RNA , Support Vector Machine , Humans , Algorithms , Methylation , Computational Biology/methods
17.
Plant Mol Biol ; 112(1-2): 33-45, 2023 May.
Article in English | MEDLINE | ID: mdl-37014509

ABSTRACT

The primary transcript structure provides critical insights into protein diversity, transcriptional modification, and functions. Cassava transcript structures are highly diverse because of alternative splicing (AS) events and high heterozygosity. To precisely determine and characterize transcript structures, fully sequencing cloned transcripts is the most reliable method. However, cassava annotations were mainly determined according to fragmentation-based sequencing analyses (e.g., EST and short-read RNA-seq). In this study, we sequenced the cassava full-length cDNA library, which included rare transcripts. We obtained 8,628 non-redundant fully sequenced transcripts and detected 615 unannotated AS events and 421 unannotated loci. The different protein sequences resulting from the unannotated AS events tended to have diverse functional domains, implying that unannotated AS contributes to the truncation of functional domains. The unannotated loci tended to be derived from orphan genes, implying that the loci may be associated with cassava-specific traits. Unexpectedly, individual cassava transcripts were more likely to have multiple AS events than Arabidopsis transcripts, suggestive of the regulated interactions between cassava splicing-related complexes. We also observed that the unannotated loci and/or AS events were commonly in regions with abundant single nucleotide variations, insertions-deletions, and heterozygous sequences. These findings reflect the utility of completely sequenced FLcDNA clones for overcoming cassava-specific annotation-related problems to elucidate transcript structures. Our work provides researchers with transcript structural details that are useful for annotating highly diverse and unique transcripts and alternative splicing events.


Subjects
Alternative Splicing , Manihot , Alternative Splicing/genetics , Manihot/genetics , Manihot/metabolism , Nucleotides , Gene Library , Base Sequence
18.
J Cheminform ; 15(1): 38, 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36978179

ABSTRACT

Drug discovery for a protein target is a laborious and costly process. Deep learning (DL) methods have been applied to drug discovery and successfully generated novel molecular structures, and they can substantially reduce development time and costs. However, most of them rely on prior knowledge, either by drawing on the structure and properties of known molecules to generate similar candidate molecules or extracting information on the binding sites of protein pockets to obtain molecules that can bind to them. In this paper, DeepTarget, an end-to-end DL model, was proposed to generate novel molecules solely relying on the amino acid sequence of the target protein to reduce the heavy reliance on prior knowledge. DeepTarget includes three modules: Amino Acid Sequence Embedding (AASE), Structural Feature Inference (SFI), and Molecule Generation (MG). AASE generates embeddings from the amino acid sequence of the target protein. SFI infers the potential structural features of the synthesized molecule, and MG seeks to construct the eventual molecule. The validity of the generated molecules was demonstrated by a benchmark platform of molecular generation models. The interaction between the generated molecules and the target proteins was also verified on the basis of two metrics, drug-target affinity and molecular docking. The results of the experiments indicated the efficacy of the model for direct molecule generation solely conditioned on amino acid sequence.

19.
Methods ; 211: 61-67, 2023 03.
Article in English | MEDLINE | ID: mdl-36804215

ABSTRACT

Recent advances in multi-omics databases offer the opportunity to explore complex systems of cancers across hierarchical biological levels. Some methods have been proposed to identify the genes that play a vital role in disease development by integrating multi-omics. However, the existing methods identify the related genes separately, neglecting the gene interactions that are related to the multigenic disease. In this study, we develop a learning framework to identify the interactive genes based on multi-omics data including gene expression. Firstly, we integrate different omics based on their similarities and apply spectral clustering for cancer subtype identification. Then, a gene co-expression network is constructed for each cancer subtype. Finally, we detect the interactive genes in the co-expression network by learning the dense subgraphs based on the L1 properties of eigenvectors in the modularity matrix. We apply the proposed learning framework to a multi-omics cancer dataset to identify the interactive genes for each cancer subtype. The detected genes are examined by DAVID and KEGG tools for systematic gene ontology enrichment analysis. The analysis results show that the detected genes have relationships to cancer development and the genes in different cancer subtypes are related to different biological processes and pathways, which are expected to yield important references for understanding tumor heterogeneity and improving patient survival.


Subjects
Multiomics , Neoplasms , Humans , Neoplasms/genetics , Cluster Analysis , Factual Databases
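
The modularity matrix referred to in this abstract is commonly defined, for an undirected network with adjacency matrix A and node degrees k, as B[i][j] = A[i][j] - k[i] * k[j] / (2m), where 2m is the sum of all degrees. The sketch below builds that matrix only; it assumes this standard definition applies to the co-expression network, and it does not attempt the eigenvector-based dense-subgraph step, whose details the abstract does not give. The function name and toy graph are illustrative.

```python
from typing import List, Sequence


def modularity_matrix(adj: Sequence[Sequence[float]]) -> List[List[float]]:
    """Modularity matrix B = A - k k^T / (2m) for a symmetric adjacency matrix."""
    n = len(adj)
    degrees = [sum(row) for row in adj]  # (weighted) degree of each node
    two_m = sum(degrees)  # twice the total edge weight
    if two_m == 0:
        raise ValueError("the network has no edges")
    return [
        [adj[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy path graph on three genes: 0 - 1 - 2.
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    for row in modularity_matrix(path):
        print(row)
```

Each row of the resulting matrix sums to zero, a quick sanity check when adapting the sketch to weighted networks.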
20.
Microbiol Resour Announc ; 12(1): e0105422, 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36515545

ABSTRACT

Actinomycetes isolated from the marine environment often require the presence of seawater for their growth and/or morphological development. Here, we report the isolation and genome sequencing of marine sponge-derived Streptomyces sp. strain G-5 with such a seawater requirement.
