Results 1 - 20 of 154
1.
Genome Res ; 34(7): 1036-1051, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39134412

ABSTRACT

Cell identity annotation for single-cell transcriptome data is a crucial process for constructing cell atlases, unraveling pathogenesis, and inspiring therapeutic approaches. Currently, the efficacy of existing methodologies is contingent upon specific data sets. However, such data are often sourced from various batches, sequencing technologies, tissues, and even species. Notably, gene regulatory relationships remain unaffected by these factors, highlighting the extensive gene interactions within organisms. We therefore propose scHGR, an automated annotation tool that leverages gene regulatory relationships to construct gene-mediated cell communication graphs for single-cell transcriptome data. This strategy helps reduce noise from diverse data sources while establishing distant cellular connections, yielding valuable biological insights. Experiments in 22 scenarios, benchmarked against state-of-the-art methods, demonstrate that scHGR annotates cell identities precisely and consistently. Crucially, scHGR uncovers novel subtypes within peripheral blood mononuclear cells, specifically among CD4+ T cells and cytotoxic T cells. Furthermore, by characterizing a cell atlas comprising 56 cell types for COVID-19 patients, scHGR identifies vital factors such as IL1 and calcium ions, offering insights for targeted therapeutic interventions.


Subjects
COVID-19 , Gene Regulatory Networks , RNA-Seq , Single-Cell Gene Expression Analysis , Humans , CD4-Positive T-Lymphocytes/metabolism , COVID-19/genetics , COVID-19/virology , Leukocytes, Mononuclear/metabolism , Molecular Sequence Annotation , RNA-Seq/methods , SARS-CoV-2/genetics , Transcriptome
2.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37321965

ABSTRACT

In recent years, protein structure problems have become a hotspot for understanding protein folding and function mechanisms. Most protein structure methods rely on and benefit from co-evolutionary information obtained from multiple sequence alignments (MSAs). AlphaFold2 (AF2), for example, is a typical MSA-based protein structure tool renowned for its high accuracy. Consequently, these MSA-based methods are limited by the quality of the MSAs. Especially for orphan proteins that have no homologous sequences, AlphaFold2 performs unsatisfactorily as MSA depth decreases, which may hinder its wider application to protein mutation and design problems in which homologous sequences are scarce and rapid prediction is needed. In this paper, we constructed two standard datasets for orphan and de novo proteins with little or no homology information, called Orphan62 and Design204, respectively, to fairly evaluate the performance of various methods in this setting. Then, depending on whether the scarce MSA information is used, we summarized two approaches for coping with insufficient MSAs: MSA-enhanced and MSA-free methods. MSA-enhanced methods aim to improve poor MSA quality at the data source through knowledge distillation and generative models. MSA-free methods learn the relationships between residues directly from enormous numbers of protein sequences via pre-trained models, bypassing the extraction of residue-pair representations from the MSA. Next, we compared four MSA-free methods (trRosettaX-Single, TRFold, ESMFold and ProtT5) and one MSA-enhanced method (Bagging MSA) with the traditional MSA-based method AlphaFold2 on two protein structure-related prediction tasks. The comparison shows that the MSA-free methods trRosettaX-Single and ESMFold achieve fast prediction (about 40 s) and performance comparable to AF2 in tertiary structure prediction, especially for short peptides, α-helical segments and targets with few homologous sequences. In secondary structure prediction, Bagging MSA, which uses MSA enhancement, improves the accuracy of our trained MSA-based base model when homology information is poor. Our study offers biologists insight into how to select rapid and appropriate prediction tools for enzyme engineering and peptide drug development. CONTACT: guofei@csu.edu.cn, jj.tang@siat.ac.cn.


Subjects
Algorithms , Furylfuramide , Sequence Alignment , Proteins/chemistry , Amino Acid Sequence
3.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36932655

ABSTRACT

Determining drug-drug interactions (DDIs) is an important part of pharmacovigilance and has a vital impact on public health. Compared with drug trials, obtaining DDI information from scientific articles is faster and cheaper while remaining highly credible. However, current DDI text extraction methods treat the instances generated from articles as independent and ignore potential connections between different instances in the same article or sentence. Effective use of external text data could improve prediction accuracy, but existing methods cannot extract key information from external data accurately and reasonably, so external data are poorly utilized. In this study, we propose a DDI extraction framework, instance position embedding and key external text for DDI (IK-DDI), which adopts instance position embedding and key external text to extract DDI information. The proposed framework integrates the article-level and sentence-level position information of the instances into the model to strengthen the connections between instances generated from the same article or sentence. Moreover, we introduce a comprehensive similarity-matching method that uses string and word sense similarity to improve the matching accuracy between the target drug and external text. Furthermore, a key sentence search method is used to obtain key information from external data. IK-DDI can therefore make full use of the connections between instances and the information contained in external text data to improve the efficiency of DDI extraction. Experimental results show that IK-DDI outperforms existing methods on both macro-averaged and micro-averaged metrics, which suggests that our method provides a complete framework for extracting relationships between biomedical entities and processing external text data.
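A minimal Python sketch of a combined string-plus-sense similarity score, of the general kind described above, follows. It is an illustration under stated assumptions, not the IK-DDI matching method: `difflib.SequenceMatcher` supplies the string similarity, a hypothetical synonym table stands in for word-sense similarity, and the blending weight `alpha` is arbitrary.

```python
from difflib import SequenceMatcher
from typing import Dict, Set


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def sense_similarity(a: str, b: str, synonyms: Dict[str, Set[str]]) -> float:
    """Crude stand-in for word-sense similarity (assumption): 1.0 if the two
    names are identical or listed as synonyms in a user-supplied table."""
    a, b = a.lower(), b.lower()
    if a == b or b in synonyms.get(a, set()) or a in synonyms.get(b, set()):
        return 1.0
    return 0.0


def match_score(drug: str, mention: str, synonyms: Dict[str, Set[str]],
                alpha: float = 0.5) -> float:
    """Hypothetical weighted blend of string and sense similarity."""
    return (alpha * string_similarity(drug, mention)
            + (1 - alpha) * sense_similarity(drug, mention, synonyms))


# Toy synonym table (hypothetical data, lowercase keys and values).
SYNONYMS = {"aspirin": {"acetylsalicylic acid"}}
print(match_score("Aspirin", "aspirin", SYNONYMS))
print(match_score("aspirin", "acetylsalicylic acid", SYNONYMS))
```

The second call shows why a sense component helps: the two names share few characters, yet the synonym lookup still lifts the score to at least `1 - alpha`.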


Subjects
Data Mining , Pharmacovigilance , Data Mining/methods , Drug Interactions , Benchmarking , Drug Delivery Systems
4.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37930024

ABSTRACT

Development of robust and effective strategies for synthesizing new compounds, drug targeting and constructing GEnome-scale Metabolic models (GEMs) requires a deep understanding of the underlying biological processes. A critical step toward this goal is accurately identifying the categories of pathways in which a compound participates. However, current machine learning-based methods often overlook the multifaceted nature of compounds, resulting in inaccurate pathway predictions. We therefore present a novel framework for Multi-View Multi-Label Learning for Metabolic Pathway Inference, named MVML-MPI. First, MVML-MPI learns distinct compound representations in parallel with corresponding compound encoders to fully extract features. Next, an attention-based fusion module integrates these complementary multi-view representations. As a result, MVML-MPI accurately represents and effectively captures the complex relationship between compounds and metabolic pathways, distinguishing itself from current machine learning-based methods. In experiments on the Kyoto Encyclopedia of Genes and Genomes pathways dataset, MVML-MPI outperformed state-of-the-art methods, demonstrating its superiority and its potential for metabolic pathway design, which can aid in optimizing drug-like compounds and facilitating the development of GEMs. The code and data underlying this article are freely available at https://github.com/guofei-tju/MVML-MPI. Contact: jtang@cse.sc.edu, guofei@csu.edu.com or wuxi_dyj@csj.uestc.edu.cn.


Subjects
Machine Learning , Metabolic Networks and Pathways
5.
Bioinformatics ; 40(9)2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39342389

ABSTRACT

MOTIVATION: Retrosynthesis identifies available precursor molecules for diverse and novel compounds. With the advancement and practicality of language models, Transformer-based models have increasingly been used to automate this process. However, many existing methods struggle to efficiently capture reaction transformation information, limiting the accuracy and applicability of their predictions. RESULTS: We introduce RetroCaptioner, an end-to-end, Transformer-based framework featuring a Contrastive Reaction Center Captioner. This captioner guides the training of dual-view attention models using a contrastive learning approach. It leverages learned molecular graph representations to capture chemically plausible constraints within a single-step learning process. We integrate the single-encoder, dual-encoder, and encoder-decoder paradigms to effectively fuse information from the sequence and graph representations of molecules. This involves modifying the Transformer encoder into a uni-view sequence encoder and a dual-view module. Furthermore, we enhance the captioning of atomic correspondence between SMILES and graphs. RetroCaptioner achieved 67.2% top-1 and 93.4% top-10 exact-match accuracy on the USPTO-50k dataset, alongside a SMILES validity score of 99.4%. In addition, RetroCaptioner demonstrated its reliability in generating synthetic routes for the drug protokylol. AVAILABILITY AND IMPLEMENTATION: The code and data are available at https://github.com/guofei-tju/RetroCaptioner.
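To make the top-k exact-match metric concrete, here is a minimal Python sketch of how such an accuracy is commonly computed. It is not the RetroCaptioner evaluation code; the function name and toy SMILES are hypothetical, and it assumes that candidates and ground truth were canonicalized beforehand, so plain string equality stands in for "same precursor set".

```python
from typing import Sequence


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   targets: Sequence[str], k: int) -> float:
    """Share of reactions whose ground-truth precursor string appears among
    the first k ranked candidates (SMILES assumed already canonicalized)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not targets:
        return 0.0
    # True counts as 1 when summed.
    hits = sum(truth in ranked[:k] for ranked, truth in zip(predictions, targets))
    return hits / len(targets)


# Toy ranked outputs for three reactions (hypothetical data).
PREDS = [["CCO.O", "CC=O"], ["Brc1ccccc1", "c1ccccc1"], ["CC(=O)Cl.N", "CC(N)=O"]]
TRUTH = ["CC=O", "Brc1ccccc1", "CN"]
print(top_k_accuracy(PREDS, TRUTH, 1), top_k_accuracy(PREDS, TRUTH, 2))
```

Top-k accuracy can only grow with k, which is why a top-10 figure is always at least the top-1 figure.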


Subjects
Software , Algorithms , Machine Learning
6.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35134117

ABSTRACT

Targeted drugs have been applied to cancer treatment on a large scale and are therapeutically effective in some patients. Detecting drug-target interactions (DTIs) through biochemical experiments is time-consuming. At present, machine learning (ML) is widely applied in large-scale drug screening. However, few methods fuse multiple types of information. We propose a multiple kernel-based triple collaborative matrix factorization (MK-TCMF) method to predict DTIs. The multiple kernel matrices (containing chemical, biological and clinical information) are integrated via a multi-kernel learning (MKL) algorithm. The original DTI adjacency matrix is decomposed into three matrices: the latent feature matrix of the drug space, the latent feature matrix of the target space and a bi-projection matrix that joins the two feature spaces. To obtain better prediction performance, the MKL algorithm adjusts the weight of each kernel matrix according to the prediction error. The drug side-effect and target sequence kernels receive the highest weights. Compared with other computational methods, our model performs better on four test data sets.
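The two ingredients described above, a weighted combination of kernel matrices and a three-factor reconstruction of the DTI matrix, can be sketched roughly as follows. This is an illustrative NumPy sketch, not the MK-TCMF implementation: the function names, the fixed weights and the toy matrices are assumptions, and the steps that learn the weights from the prediction error and fit the factors are omitted.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Weighted sum of equally shaped kernel (similarity) matrices.

    Weights are rescaled to sum to 1. Learning them from the prediction
    error, as MK-TCMF does, is not reproduced here.
    """
    w = np.asarray(weights, dtype=float)
    if len(kernels) != len(w):
        raise ValueError("one weight per kernel is required")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    return sum(wi * np.asarray(K, dtype=float) for wi, K in zip(w, kernels))


def tri_factor_scores(U, B, V):
    """Drug x target score matrix U @ B @ V.T, where U is drugs x r,
    B is an r x r bi-projection matrix and V is targets x r."""
    return np.asarray(U) @ np.asarray(B) @ np.asarray(V).T


# Toy inputs (hypothetical): two 2 x 2 drug kernels, 3 drugs, 1 target, r = 2.
K_drug = combine_kernels([np.eye(2), np.ones((2, 2))], [1, 3])
scores = tri_factor_scores(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
                           np.eye(2), np.array([[1.0, 0.0]]))
print(K_drug)
print(scores)
```

Keeping the weights non-negative and normalized means the fused matrix stays a convex combination of the input kernels, so it remains a valid (positive semidefinite) kernel whenever the inputs are.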


Subjects
Algorithms , Drug-Related Side Effects and Adverse Reactions , Drug Interactions , Humans , Machine Learning
7.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35062026

ABSTRACT

Inferring gene regulatory networks (GRNs) from gene expression profiles can provide insight into many cellular phenotypes at the genomic level and reveal the essential laws underlying various life phenomena. Unlike bulk expression data, single-cell transcriptomic data capture cell-to-cell variance and diverse biological information, such as tissue characteristics and transformations of cell types. Inferring GRNs from such data offers unprecedented advantages for studying cell phenotypes in depth, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop DGRNS, a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, which encodes the raw data and combines a recurrent neural network with a convolutional neural network (CNN) to train a model that distinguishes related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is a deep learning model containing a gated recurrent unit network for exploring time-dependent information and a CNN for learning spatially related information. Our comprehensive comparative analysis on the mouse hematopoietic stem cell dataset shows that DGRNS outperforms state-of-the-art methods. The area under the receiver operating characteristic curve of the networks inferred by DGRNS is about 16% higher than that of other unsupervised methods, and the area under the precision-recall curve is about 10% higher than that of other supervised methods. Experiments on human datasets show the strong robustness and excellent generalization of DGRNS. By comparing the predictions with the standard network, we discover a series of novel interactions that have been shown to be true in specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency that have not yet been experimentally confirmed and merit further research.


Subjects
Deep Learning , Gene Regulatory Networks , Algorithms , Animals , Mice , Neural Networks, Computer , Transcriptome
8.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34791034

ABSTRACT

Precisely identifying driver genes among the massive number of mutated genes supports accurate diagnosis and treatment of cancer. In recent years, methods that uncover driver genes by integrating mutation data with gene interaction networks have been gaining attention. However, it remains unclear whether integrating various types of mutation information (frequency and functional impact) with gene networks is more effective for prioritizing driver genes. Hence, we build a two-stage-vote ensemble framework based on somatic mutations and mutual interactions. Specifically, we first represent and combine various kinds of mutation information, which are propagated through networks by an improved iterative framework. The first vote is conducted on the iteration results by voting methods, and the second vote combines the results of the first vote to produce the final driver gene list. Compared with four leading previous approaches, our method performs better in identifying driver genes across 33 cancer types from The Cancer Genome Atlas. We also conduct a comparative analysis of two kinds of mutation information, five gene interaction networks and four voting strategies. Our framework offers a new view of data integration and helps bring more latent cancer genes to light.
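As a rough illustration of a two-stage vote, the Python sketch below aggregates ranked gene lists with a Borda count, first within each group of rankings and then across the per-group results. Borda counting is a stand-in chosen here for illustration; the abstract does not name the four voting strategies it compares, and the gene lists are toy data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Aggregate ranked gene lists with a Borda count.

    A gene at position i (0-based) in a list of length n earns n - i points;
    genes absent from a list earn nothing from it. Ties break alphabetically.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, gene in enumerate(ranking):
            scores[gene] += n - i
    return sorted(scores, key=lambda g: (-scores[g], g))


def two_stage_vote(groups: Sequence[Sequence[Sequence[str]]]) -> List[str]:
    """First vote within each group of rankings, second vote across the
    per-group results."""
    first_round = [borda(group) for group in groups]
    return borda(first_round)


# Toy rankings (hypothetical), e.g. one group per type of mutation information.
GROUPS = [
    [["TP53", "KRAS", "PIK3CA"], ["TP53", "PIK3CA", "KRAS"]],
    [["KRAS", "TP53", "EGFR"]],
]
print(two_stage_vote(GROUPS))
```

Note that the second vote treats each group's consensus as one ballot, so a group built from many rankings carries no more weight than a group built from one.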


Subjects
Gene Regulatory Networks , Neoplasms , Epistasis, Genetic , Humans , Mutation , Neoplasms/genetics , Oncogenes
9.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36355452

ABSTRACT

MOTIVATION: Somatic mutation co-occurrence has been proven to have a profound effect on tumorigenesis. While some studies have been conducted on co-mutations, a centralized resource dedicated to co-mutations in cancer is still lacking. RESULTS: Using multi-omics data from over 30 000 subjects and 1747 cancer cell lines, we present the Cancer co-mutation database (CoMutDB), the most comprehensive resource devoted to describing cancer co-mutations and their characteristics. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in the online database CoMutDB: http://www.innovebioinfo.com/Database/CoMutDB/Home.php.


Subjects
Neoplasms , Humans , Mutation , Databases, Factual , Neoplasms/genetics , Carcinogenesis , Cell Transformation, Neoplastic
10.
Nucleic Acids Res ; 50(1): e4, 2022 01 11.
Article in English | MEDLINE | ID: mdl-34606615

ABSTRACT

Efficient annotation of alterations in the binding sequences of molecular regulators can help identify novel candidates for mechanistic studies and offer original therapeutic hypotheses. In this work, we developed Somatic Binding Sequence Annotator (SBSA) as a full-featured online tool to annotate altered binding motifs/sequences, addressing diverse types of genomic variants and molecular regulators. The genomic variants can be somatic mutations, single nucleotide polymorphisms, RNA editing, etc. The binding motifs/sequences involve transcription factors (TFs), RNA-binding proteins, miRNA seeds and miRNA-mRNA 3'-UTR binding targets, or can be any custom motifs/sequences. Compared with similar tools, SBSA is the first to support miRNA seeds and miRNA-mRNA 3'-UTR binding targets, and it uniquely implements a personalized genome approach that accommodates joint adjacent variants. SBSA can support any species, and includes preloaded reference genomes for SARS-CoV-2 and 25 other common organisms. We demonstrated SBSA by annotating multi-omics data from over 30,890 human subjects. Of the millions of somatic binding sequences identified, many have known severe biological repercussions, such as the somatic mutation in the TERT promoter region that causes a gained binding sequence for E26 transformation-specific factor (ETS1). We further validated the function of this TERT mutation using experimental data in cancer cells. Availability: http://innovebioinfo.com/Annotation/SBSA/SBSA.php.


Subjects
COVID-19/virology , Computational Biology/instrumentation , Genomics/instrumentation , Mutation , Proteomics/instrumentation , SARS-CoV-2 , 3' Untranslated Regions , Algorithms , Amino Acid Motifs , COVID-19/metabolism , Computational Biology/methods , Computers , Genetic Techniques , Genome, Human , Genomics/methods , Humans , Internet , MicroRNAs/metabolism , Phenotype , Promoter Regions, Genetic , Protein Binding , Proteomics/methods , Proto-Oncogene Protein c-ets-1/genetics , Proto-Oncogene Protein c-ets-1/metabolism , RNA-Binding Proteins/metabolism , Telomerase/metabolism
11.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33959764

ABSTRACT

Diseases caused by bacterial infections have become a critical public health problem. Antibiotics, the traditional treatment, are gradually losing their effectiveness because of resistance. Meanwhile, antibacterial proteins are attracting more attention because of their broad spectrum and little harm to host cells. Exploring new effective antibacterial proteins is therefore urgent and necessary. In this paper, we evaluate the effectiveness of ab-initio docking methods for antibacterial protein-protein docking. For this purpose, we constructed a three-dimensional (3D) structure dataset of antibacterial protein complexes, called APCset, which contains 19 protein complexes whose receptors or ligands are homologous to antibacterial peptides from the Antimicrobial Peptide Database. We then selected five representative ab-initio protein-protein docking tools (ZDOCK3.0.2, FRODOCK3.0, ATTRACT, PatchDock and Rosetta) to predict the structures of these complexes, and compared their performance from five aspects: top/best pose, first hit, success rate, average hit count and running time. Finally, we assessed the tools and recommended relatively efficient protein-protein docking tools for different requirements. In terms of computational efficiency and performance, ZDOCK was better suited as the preferred computational tool, with an average running time of 6.144 minutes, an average Fnat of the best pose of 0.953 and an average rank of the best pose of 4.158. ZDOCK also performed better on Benchmark 5.0, showing that it is effective for docking on large-scale datasets. Our survey can offer insights into research on treating bacterial infections by using appropriate docking methods.
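Fnat, one of the metrics cited above, is conventionally the fraction of native receptor-ligand residue contacts that a docked pose reproduces. The Python sketch below assumes a CAPRI-style contact definition (a residue pair counts as a contact if any of their atoms are closer than 5 Å); the data layout and function names are hypothetical, and this is not the evaluation code used in the study.

```python
import math
from itertools import product
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]
Residues = Dict[str, List[Point]]  # residue id -> atom coordinates in Å (assumed layout)


def interface_contacts(receptor: Residues, ligand: Residues,
                       cutoff: float = 5.0) -> Set[Tuple[str, str]]:
    """Receptor-ligand residue pairs with any two atoms closer than `cutoff` Å."""
    contacts = set()
    for (r_id, r_atoms), (l_id, l_atoms) in product(receptor.items(), ligand.items()):
        if any(math.dist(a, b) < cutoff for a in r_atoms for b in l_atoms):
            contacts.add((r_id, l_id))
    return contacts


def fnat(native: Set[Tuple[str, str]], predicted: Set[Tuple[str, str]]) -> float:
    """Fraction of native interface contacts reproduced by a docked pose."""
    if not native:
        raise ValueError("the native complex has no interface contacts")
    return len(native & predicted) / len(native)


# Toy coordinates (hypothetical): ligand residue B1 sits near receptor residue A1.
receptor = {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)]}
ligand = {"B1": [(3.0, 0.0, 0.0)]}
print(interface_contacts(receptor, ligand))
print(fnat({("A1", "B1"), ("A2", "B2")}, {("A1", "B1"), ("A3", "B3")}))
```

In practice, `native` would come from `interface_contacts` on the crystal structure and `predicted` from the same function on each docked pose; the brute-force atom loop is fine for small interfaces but a spatial index would be preferable for full complexes.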


Subjects
Algorithms , Antimicrobial Peptides/chemistry , Computational Biology , Databases, Protein , Molecular Docking Simulation , Software
12.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33939795

ABSTRACT

Many biological processes, such as cell growth and differentiation and the occurrence and development of diseases, are controlled by gene regulatory networks (GRNs). Sustained research on GRNs is therefore important. Determining gene-gene relationships from gene expression data is a complex problem. Because the regularities behind gene-gene relationships are difficult to obtain efficiently through biochemical experiments alone, various computational methods have been used to construct GRNs, with some success. In this paper, we propose a novel method, MMFGRN (for "Multi-source Multi-model Fusion for Gene Regulatory Network reconstruction"), to reconstruct GRNs. To make full use of the limited datasets and explore the potential regulatory relationships contained in different data types, we construct the MMFGRN model from three perspectives: a single time-series data model, a single steady-state data model and a joint time-series and steady-state data model. We then use a weighted fusion strategy to obtain the final global ranking of regulatory links. MMFGRN yields the best performance on the DREAM4 InSilico_Size10 data, outperforming other popular inference algorithms, with an overall area under the receiver operating characteristic curve of 0.909 and an area under the precision-recall curve (AUPR) of 0.770 on the 10-gene network. As the network scale increases, our method retains certain advantages, with an overall AUPR of 0.335 on the DREAM4 InSilico_Size100 data. These results demonstrate that MMFGRN is robust across network scales. The integration strategy proposed in this paper also provides a new idea for reconstructing biological network models without prior knowledge, which can help researchers decipher the elusive mechanisms of life.
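The weighted fusion step can be pictured with a small Python sketch: each sub-model's edge scores are min-max normalized, combined with per-model weights, and sorted into one global ranking. The normalization, the weights and the toy scores are assumptions for illustration; the abstract does not specify how MMFGRN sets its weights or scales its scores.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def min_max(scores: Dict[Edge, float]) -> Dict[Edge, float]:
    """Rescale one model's edge scores to [0, 1] (all 0.0 if constant)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {e: (s - lo) / span if span else 0.0 for e, s in scores.items()}


def fuse_rankings(model_scores: Sequence[Dict[Edge, float]],
                  weights: Sequence[float]) -> List[Edge]:
    """Weighted sum of normalized scores, sorted into one global ranking.
    An edge missing from a model contributes 0 for that model."""
    if len(model_scores) != len(weights):
        raise ValueError("one weight per model is required")
    fused: Dict[Edge, float] = {}
    for scores, w in zip(model_scores, weights):
        for edge, s in min_max(scores).items():
            fused[edge] = fused.get(edge, 0.0) + w * s
    return sorted(fused, key=lambda e: (-fused[e], e))


# Hypothetical scores from the three sub-models and hypothetical weights.
time_series = {("G1", "G2"): 0.9, ("G2", "G3"): 0.1, ("G1", "G3"): 0.5}
steady_state = {("G1", "G2"): 0.2, ("G2", "G3"): 0.8, ("G1", "G3"): 0.6}
joint = {("G1", "G2"): 0.7, ("G2", "G3"): 0.3, ("G1", "G3"): 0.4}
print(fuse_rankings([time_series, steady_state, joint], [0.4, 0.3, 0.3]))
```

Normalizing before weighting keeps a sub-model whose raw scores happen to span a wider range from dominating the fused ranking.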


Subjects
Computational Biology/methods , Gene Expression Regulation , Gene Regulatory Networks , Software , Algorithms , Reproducibility of Results , Workflow
13.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33443536

ABSTRACT

Accurately identifying associations between non-coding RNAs (ncRNAs) and diseases could be of great help in human biomedical research. However, traditional techniques apply to only one type of ncRNA or a specific disease, and experimental methods are time-consuming and expensive. A growing number of computational tools have been proposed to detect new associations based on known ncRNA and disease information. Because ncRNAs (circRNAs, miRNAs and lncRNAs) are closely related to the progression of various human diseases, developing effective computational predictors for ncRNA-disease association prediction is critical. In this paper, we propose a new computational method of three-matrix factorization with hypergraph regularization terms (HGRTMF) based on central kernel alignment (CKA), for identifying general ncRNA-disease associations. When constructing the similarity matrices, various types of similarity matrices can be applied to circRNAs, miRNAs and lncRNAs. Our method achieves excellent performance on five datasets involving three types of ncRNAs. We obtain best area under the curve scores of 0.9832, 0.9775, 0.9023, 0.8809 and 0.9185 via 5-fold cross-validation and 0.9832, 0.9836, 0.9198, 0.9459 and 0.9275 via leave-one-out cross-validation on the five datasets. Furthermore, our novel method (CKA-HGRTMF) is also able to discover new associations between ncRNAs and diseases accurately. Availability: Code and data are available at https://github.com/hzwh6910/ncRNA2Disease.git. Contact: fguo@tju.edu.cn.
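In the kernel-learning literature, CKA most commonly denotes centered kernel alignment, a similarity score between two kernel matrices. A minimal NumPy sketch of that standard score follows, assuming the usual double-centering definition; it is not the CKA-HGRTMF code, and how the method turns alignment scores into kernel weights is not shown.

```python
import numpy as np


def center_kernel(K: np.ndarray) -> np.ndarray:
    """Double-center a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def cka(K1: np.ndarray, K2: np.ndarray) -> float:
    """Centered kernel alignment: cosine of the angle between the centered
    kernels under the Frobenius inner product."""
    Kc1, Kc2 = center_kernel(K1), center_kernel(K2)
    num = np.sum(Kc1 * Kc2)  # Frobenius inner product <Kc1, Kc2>_F
    den = np.linalg.norm(Kc1, "fro") * np.linalg.norm(Kc2, "fro")
    if den == 0:
        raise ValueError("a centered kernel is all zeros")
    return float(num / den)


# Toy linear kernel from three hypothetical 2-dimensional feature vectors.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K_linear = X @ X.T
print(cka(K_linear, K_linear), cka(K_linear, 3.0 * K_linear))
```

Because the score is a cosine, it is invariant to rescaling either kernel and never exceeds 1, which makes it convenient for comparing similarity matrices built from different data sources.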


Subjects
Algorithms , Computational Biology , Disease/genetics , Models, Genetic , RNA, Untranslated , Humans , RNA, Untranslated/genetics , RNA, Untranslated/metabolism
14.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34131696

ABSTRACT

The major histocompatibility complex (MHC) has important research value for the treatment of complex human diseases. A plethora of computational tools has been developed to predict MHC class I binders. Here, we comprehensively reviewed 27 up-to-date MHC I binding prediction tools developed over the last decade, thoroughly evaluating feature representation methods, prediction algorithms and model training strategies on a benchmark dataset from the Immune Epitope Database. The review identified a common limitation: all existing tools can handle only a fixed peptide sequence length. To overcome this limitation, we developed a bilateral and variable long short-term memory (BVLSTM)-based approach, named BVLSTM-MHC. It is the first variable-length MHC class I binding predictor. Compared with 10 mainstream prediction tools on an independent validation dataset, BVLSTM-MHC achieved the best performance in six out of eight evaluated metrics. A web server based on the BVLSTM-MHC model was developed to enable accurate and efficient MHC class I binder prediction in human, mouse, macaque and chimpanzee.


Subjects
Binding Sites , Carrier Proteins/chemistry , Computational Biology/methods , Histocompatibility Antigens Class I/chemistry , Neural Networks, Computer , Software , Amino Acid Sequence , Carrier Proteins/metabolism , Databases, Factual , Deep Learning , Epitopes/chemistry , Epitopes/immunology , Epitopes/metabolism , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class I/metabolism , Machine Learning , Protein Binding , ROC Curve , Reproducibility of Results , Web Browser
15.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33539514

ABSTRACT

The gene regulatory network (GRN) is an important mechanism for maintaining life processes, controlling biochemical reactions and regulating compound levels, and it plays an important role in various organisms and systems. Reconstructing GRNs can help us understand the molecular mechanisms of organisms and reveal the essential rules of many biological processes and reactions. To deal with the uncertainty of processing, various outstanding network reconstruction algorithms adopt specific assumptions that affect prediction accuracy. To study why a certain method is more suitable for a specific research problem or type of experimental data, we organize our study around model-based, information-based and machine learning-based categories of methods. Each category contains clearly different types of computational tools for inferring GRNs. Furthermore, we discuss several classical, representative and recent methods in each category, analyzing their core ideas, general steps and characteristics. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated and real networks under different scaling conditions. Using standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of various methods and the sensitivity of each algorithm when applied to networks of different scales. The aim of this study is to identify the most appropriate method for a specific GRN, which helps biologists and medical scientists discover potential drug targets and identify cancer biomarkers.


Subjects
Computational Biology/methods , Gene Expression Regulation , Gene Regulatory Networks , Machine Learning , Transcriptome , Bayes Theorem , Biomarkers, Tumor/genetics , Databases, Genetic , Escherichia coli/genetics , Models, Genetic , Neoplasms/genetics , RNA-Seq/methods
16.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32778871

ABSTRACT

Quantifying DNA properties is a challenging task in the broad field of human genomics. Because the function of the vast majority of non-coding DNA is still poorly understood, this task is particularly important and could greatly benefit biological research. Different DNA sequences call for a great variety of representations, and specific functions may depend on corresponding features in the front part of the learning model. Currently, however, for multi-class prediction of non-coding DNA regulatory functions, most powerful predictive models lack appropriate feature extraction and selection approaches for specific functional effects, making it difficult to gain better insight into their internal correlations. Hence, we design a category attention layer and a category dense layer to select efficient features and distinguish different DNA functions. In this study, we propose a hybrid deep neural network method, called DeepATT, for identifying 919 regulatory functions on nearly 5 million DNA sequences. Our model has four built-in neural network components: a convolution layer that captures regulatory motifs, a recurrent layer that captures regulatory grammar, a category attention layer that selects the valid features corresponding to different functions, and a category dense layer that classifies predictive labels using the selected features of regulatory functions. Importantly, we compare DeepATT with the existing leading prediction tools DeepSEA and DanQ. DeepATT performs significantly better than these tools at identifying DNA functions, increasing the area under the precision-recall curve by at least 1.6%. Furthermore, the category attention module allows us to mine important correlations among different DNA functions. Moreover, the attention mechanism and locally connected layers greatly reduce the number of parameters while maintaining accuracy.


Subjects
DNA/genetics , Databases, Nucleic Acid , Neural Networks, Computer , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA
17.
Bioinformatics ; 38(6): 1716-1723, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34999771

ABSTRACT

MOTIVATION: Recently, with the development of high-throughput experimental technology, the reconstruction of gene regulatory networks (GRNs) has encountered new opportunities and challenges. Some previous methods mainly extract gene expression information from RNA-seq data, but the associated information is very limited. With the establishment of gene expression image databases, it has become possible to infer GRNs from image data with rich spatial information. RESULTS: First, we propose a new convolutional neural network (called SDINet), which can extract gene expression information from images and identify the interactions between genes. SDINet captures both detailed information and high-level semantic information from the images, and it achieves satisfactory performance on image data (Acc: 0.7196, F1: 0.7374). Second, we apply the idea of SDINet to build an RNA-model, which also achieves good results on RNA-seq data (Acc: 0.8962, F1: 0.8950). Finally, we combine image data and RNA-seq data and design a new fusion network to explore the potential relationship between them. Experiments show that the proposed network fusing the two modalities performs better (Acc: 0.9116, F1: 0.9118) than either single data type. AVAILABILITY AND IMPLEMENTATION: Data and code are available from https://github.com/guofei-tju/Combine-Gene-Expression-images-and-RNA-seq-data-For-infering-GRN.


Subjects
Gene Regulatory Networks , Gene Expression , RNA-Seq , Sequence Analysis, RNA/methods
18.
J Transl Med ; 21(1): 783, 2023 11 04.
Article in English | MEDLINE | ID: mdl-37925448

ABSTRACT

Prior research has shown that deconvolution of cell-free RNA can reveal its tissue of origin. Conventional deconvolution approaches rely on constructing a reference panel of tissue-specific genes, which cannot capture the inherent variation present in real data. To address this, we developed a method that uses a neural network framework to leverage the entire training dataset. We trained a model covering 15 distinct tissue types. Through one semi-independent and two completely independent validations (deconvolution of a semi-in silico dataset, of custom RNA-seq data from normal tissue mixtures, and of longitudinal circulating tumor cell RNA-seq (ctcRNA) data from a cancer patient with metastatic tumors), we demonstrate the efficacy of the deep-learning approach, whose advantage comes from effectively capturing the inherent variability in the dataset and which thus achieves higher accuracy. Sensitivity analyses reveal that the neural network models are less susceptible to missing data, making them more suitable for real-world applications. Moreover, leveraging the concept of organotropism, we applied our approach to trace the migration of ctcRNA in a cancer patient with metastatic tumors, highlighting its potential clinical value for the early detection of cancer metastasis.
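For contrast with the neural-network method, conventional reference-based deconvolution models a bulk expression profile as a non-negative mixture of tissue signature profiles, m ≈ S p, and solves for the tissue proportions p. The sketch below illustrates only that baseline, using projected gradient descent for non-negative least squares; it is not the authors' model, and the function name and toy signature matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = genes, columns = tissues


def deconvolve(signature: Matrix, mixture: List[float], iters: int = 2000) -> List[float]:
    """Estimate tissue proportions p >= 0 minimising ||S p - m||^2, then normalise to sum to 1."""
    n_genes = len(signature)
    n_tissues = len(signature[0])
    # Step 1 / (2 ||S||_F^2) never exceeds 1 / L, where L = 2 ||S||_2^2 is the
    # Lipschitz constant of the gradient, so the iteration converges.
    frob_sq = sum(v * v for row in signature for v in row)
    step = 1.0 / (2.0 * frob_sq)
    p = [1.0 / n_tissues] * n_tissues  # start from equal proportions
    for _ in range(iters):
        residual = [sum(signature[g][t] * p[t] for t in range(n_tissues)) - mixture[g]
                    for g in range(n_genes)]
        grad = [2.0 * sum(signature[g][t] * residual[g] for g in range(n_genes))
                for t in range(n_tissues)]
        # Gradient step, then project back onto the non-negative orthant.
        p = [max(0.0, p[t] - step * grad[t]) for t in range(n_tissues)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p
```

With a toy two-tissue signature `[[10, 0], [0, 10], [5, 5]]` and a mixture `[3, 7, 5]` built from proportions 0.3 and 0.7, the function recovers approximately `[0.3, 0.7]`. A fixed reference panel like this is exactly what the abstract argues cannot capture the variation present in real data.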


Subjects
Neoplastic Cells, Circulating , RNA , Humans , Neural Networks, Computer , RNA-Seq , Sequence Analysis, RNA
19.
Brief Bioinform ; 21(5): 1628-1640, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31697319

ABSTRACT

Human protein subcellular localization is of great research value for understanding biological processes, elucidating protein functions, and identifying drug targets. Over the past decade, a number of protein subcellular localization prediction tools have been designed and made freely available online. This paper summarizes recent progress in research on the subcellular localization of human proteins, including commonly used data sets from previous studies and the performance of all selected prediction tools on the same benchmark data set. We systematically evaluate several publicly available subcellular localization prediction methods on various benchmark data sets and find that mLASSO-Hum and pLoc-mHum provide a statistically significant improvement in accuracy over the other methods. We also build a new data set from the latest version of the UniProt database and construct a new GO-based prediction method, HumLoc-LBCI. We then test all selected prediction tools on the new data set. Finally, we discuss possible future directions for human protein subcellular localization research. Availability: The code and data are available from http://www.lbci.cn/syn/.
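HumLoc-LBCI is described here only as GO-based. As a hedged illustration of what GO-based prediction commonly involves, the sketch below assumes each protein is represented by its set of GO term IDs and transfers location labels from the most similar training protein(s) by Jaccard similarity, a nearest-neighbour scheme that naturally supports proteins with multiple locations. This is not the paper's algorithm; the function names and toy annotations are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical training data: protein ID -> (GO term IDs, known locations)
Training = Dict[str, Tuple[Set[str], Set[str]]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_locations(query_terms: Set[str], training: Training,
                      k: int = 1, min_sim: float = 0.0) -> Set[str]:
    """Union of location labels from the k most GO-similar training proteins."""
    ranked = sorted(training.values(),
                    key=lambda item: jaccard(query_terms, item[0]),
                    reverse=True)
    labels: Set[str] = set()
    for terms, locations in ranked[:k]:
        if jaccard(query_terms, terms) > min_sim:  # ignore neighbours sharing no terms
            labels |= locations
    return labels
```

Taking the union of neighbour labels lets one query receive several locations (for example, nucleus and cytoplasm), which matters for multi-location human protein benchmarks.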


Subjects
Internet , Proteins/metabolism , Subcellular Fractions/metabolism , Benchmarking , Datasets as Topic , Humans
20.
Nanotechnology ; 33(20)2022 Feb 21.
Article in English | MEDLINE | ID: mdl-34983034

ABSTRACT

BiFeO₃ is a photocatalyst with excellent performance. However, its applications are limited by its wide bandgap. In this paper, a MIL-101(Fe)@BiOI composite is synthesized by a hydrothermal method and then calcined at high temperature to obtain a BiFeO₃@Bi₅O₇I composite with high degradation capacity. In this composite, an n-n heterojunction forms, which improves charge-transfer efficiency and suppresses the recombination of photogenerated electrons and holes, thereby improving photocatalytic efficiency and stability. In the photocatalytic degradation of tetracycline under visible-light irradiation, BiFeO₃@Bi₅O₇I (1:2) showed the best photodegradation performance, with a degradation rate of 86.4%, demonstrating its potential as a photocatalyst.
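The degradation rate quoted above is conventionally the percentage of pollutant removed, (C0 - Ct) / C0 × 100%, where C0 is the initial tetracycline concentration and Ct the concentration after irradiation. A minimal sketch, assuming that standard definition (the abstract does not state its formula); because absorbance is proportional to concentration under the Beer-Lambert law, UV-vis absorbance readings can be passed in place of concentrations. The function names are hypothetical.

```python
def degradation_rate(c0: float, ct: float) -> float:
    """Percentage of pollutant removed: (C0 - Ct) / C0 * 100."""
    if c0 <= 0:
        raise ValueError("initial concentration (or absorbance) must be positive")
    return (c0 - ct) / c0 * 100.0


def remaining_fraction(rate_percent: float) -> float:
    """Ct / C0 implied by a given degradation rate."""
    return 1.0 - rate_percent / 100.0
```

Under this definition, the reported 86.4% means roughly 13.6% of the initial tetracycline remained at the end of the irradiation period.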
