Results 1 - 20 of 2,786
1.
Cell ; 178(3): 640-652.e14, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31280961

ABSTRACT

Knowledge abstracted from previous experiences can be transferred to aid new learning. Here, we asked whether such abstract knowledge immediately guides the replay of new experiences. We first trained participants on a rule defining an ordering of objects and then presented a novel set of objects in a scrambled order. Across two studies, we observed that representations of these novel objects were reactivated during a subsequent rest. As in rodents, human "replay" events occurred in sequences accelerated in time, compared to actual experience, and reversed their direction after a reward. Notably, replay did not simply recapitulate visual experience, but followed instead a sequence implied by learned abstract knowledge. Furthermore, each replay contained more than sensory representations of the relevant objects. A sensory code of object representations was preceded 50 ms by a code factorized into sequence position and sequence identity. We argue that this factorized representation facilitates the generalization of a previously learned structure to new objects.


Subjects
Learning; Memory; Action Potentials; Adult; Female; Hippocampus/physiology; Humans; Magnetoencephalography; Male; Photic Stimulation; Reward; Young Adult
2.
Cell ; 172(5): 1122-1131.e9, 2018 02 22.
Article in English | MEDLINE | ID: mdl-29474911

ABSTRACT

The implementation of clinical-decision support algorithms for medical imaging faces challenges with reliability and interpretability. Here, we establish a diagnostic tool based on a deep-learning framework for the screening of patients with common treatable blinding retinal diseases. Our framework utilizes transfer learning, which trains a neural network with a fraction of the data of conventional approaches. Applying this approach to a dataset of optical coherence tomography images, we demonstrate performance comparable to that of human experts in classifying age-related macular degeneration and diabetic macular edema. We also provide a more transparent and interpretable diagnosis by highlighting the regions recognized by the neural network. We further demonstrate the general applicability of our AI system for diagnosis of pediatric pneumonia using chest X-ray images. This tool may ultimately aid in expediting the diagnosis and referral of these treatable conditions, thereby facilitating earlier treatment, resulting in improved clinical outcomes. VIDEO ABSTRACT.


Subjects
Deep Learning; Diagnostic Imaging; Pneumonia/diagnosis; Child; Humans; Neural Networks, Computer; Pneumonia/diagnostic imaging; ROC Curve; Reproducibility of Results; Tomography, Optical Coherence
3.
Proc Natl Acad Sci U S A ; 121(2): e2312159120, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38175862

ABSTRACT

We address the challenge of acoustic simulations in three-dimensional (3D) virtual rooms with parametric source positions, which have applications in virtual/augmented reality, game audio, and spatial computing. The wave equation can fully describe wave phenomena such as diffraction and interference. However, conventional numerical discretization methods are computationally expensive when simulating hundreds of source and receiver positions, making simulations with parametric source positions impractical. To overcome this limitation, we propose using deep operator networks to approximate linear wave-equation operators. This enables the rapid prediction of sound propagation in realistic 3D acoustic scenes with parametric source positions, achieving millisecond-scale computations. By learning a compact surrogate model, we avoid the offline calculation and storage of impulse responses for all relevant source/listener pairs. Our experiments, including various complex scene geometries, show good agreement with reference solutions, with root mean squared errors ranging from 0.02 to 0.10 Pa. Notably, our method signifies a paradigm shift as, to our knowledge, no prior machine learning approach has achieved precise predictions of complete wave fields within realistic domains.

4.
Proc Natl Acad Sci U S A ; 121(6): e2314853121, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38285937

ABSTRACT

Amino acid mutations that lower a protein's thermodynamic stability are implicated in numerous diseases, and engineered proteins with enhanced stability can be important in research and medicine. Computational methods for predicting how mutations perturb protein stability are, therefore, of great interest. Despite recent advancements in protein design using deep learning, in silico prediction of stability changes has remained challenging, in part due to a lack of large, high-quality training datasets for model development. Here, we describe ThermoMPNN, a deep neural network trained to predict stability changes for protein point mutations given an initial structure. In doing so, we demonstrate the utility of a recently released megascale stability dataset for training a robust stability model. We also employ transfer learning to leverage a second, larger dataset by using learned features extracted from ProteinMPNN, a deep neural network trained to predict a protein's amino acid sequence given its three-dimensional structure. We show that our method achieves state-of-the-art performance on established benchmark datasets using a lightweight model architecture that allows for rapid, scalable predictions. Finally, we make ThermoMPNN readily available as a tool for stability prediction and design.


Subjects
Neural Networks, Computer; Proteins; Proteins/genetics; Proteins/chemistry; Amino Acid Sequence; Protein Stability; Machine Learning
5.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38706315

ABSTRACT

UniProtKB currently holds more than 251 million deposited proteins. However, only 0.25% have been annotated with one of the more than 15000 possible Pfam family domains. The current annotation protocol integrates knowledge from manually curated family domains, obtained using sequence alignments and hidden Markov models. This approach has been successful for automatically growing the Pfam annotations, but at a low rate in comparison to protein discovery. Just a few years ago, deep learning models were proposed for automatic Pfam annotation. However, these models demand a considerable amount of training data, which can be a challenge with poorly populated families. To address this issue, we propose and evaluate here a novel protocol based on transfer learning. This requires the use of protein large language models (LLMs), trained with self-supervision on big unannotated datasets in order to obtain sequence embeddings. Then, the embeddings can be used with supervised learning on a small and annotated dataset for a specialized task. In this protocol we have evaluated several cutting-edge protein LLMs together with machine learning architectures to improve the actual prediction of protein domain annotations. Results are significantly better than state-of-the-art for protein family classification, reducing the prediction error by an impressive 60% compared to standard methods. We explain how LLM embeddings can be used for protein annotation in a concrete and easy way, and provide the pipeline in a GitHub repo. Full source code and data are available at https://github.com/sinc-lab/llm4pfam.
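
The second stage described in this abstract (supervised learning on top of precomputed embeddings) can be illustrated with a minimal sketch. It assumes the embeddings are already available as plain vectors and uses a nearest-centroid classifier only because it is easy to follow; the abstract does not say which classifiers the authors used, and the names `fit_centroids`, `predict_family`, `family_A` and `family_B` are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(embeddings, labels):
    """Average the embedding vectors of each family into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_family(centroids, vec):
    """Return the label whose centroid is closest (Euclidean distance) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy 2-D vectors standing in for embeddings from a frozen protein language model.
train_vectors = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.8], [0.0, 1.0]]
train_labels = ["family_A", "family_A", "family_B", "family_B"]
centroids = fit_centroids(train_vectors, train_labels)
print(predict_family(centroids, [0.8, 0.2]))
```

The point of the sketch is the division of labour: the large model is trained once without labels, and only the small classifier on top needs annotated examples.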


Subjects
Databases, Protein; Proteins; Proteins/chemistry; Molecular Sequence Annotation/methods; Computational Biology/methods; Machine Learning
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38485768

ABSTRACT

Enhancers, noncoding DNA fragments, play a pivotal role in gene regulation, facilitating gene transcription. Identifying enhancers is crucial for understanding genomic regulatory mechanisms, pinpointing key elements and investigating networks governing gene expression and disease-related mechanisms. Existing enhancer identification methods exhibit limitations, prompting the development of our novel multi-input deep learning framework, termed Enhancer-MDLF. Experimental results illustrate that Enhancer-MDLF outperforms the previous method, Enhancer-IF, across eight distinct human cell lines and exhibits superior performance on generic enhancer datasets and enhancer-promoter datasets, affirming the robustness of Enhancer-MDLF. Additionally, we introduce transfer learning to provide an effective and potential solution to address the prediction challenges posed by enhancer specificity. Furthermore, we utilize model interpretation to identify transcription factor binding site motifs that may be associated with enhancer regions, with important implications for facilitating the study of enhancer regulatory mechanisms. The source code is openly accessible at https://github.com/HaoWuLab-Bioinformatics/Enhancer-MDLF.


Subjects
Deep Learning; Enhancer Elements, Genetic; Humans; Genomics/methods; Gene Expression Regulation; Promoter Regions, Genetic
7.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38990514

ABSTRACT

Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites becomes a critical task. The traditional experimental methods for researching these binding sites are labor-intensive and time-consuming, and some computational tools have been invented to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To deal with the drawbacks of these computational algorithms, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy is implemented to leverage protein-protein binding site information and enhance protein-peptide binding site recognition, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments including predicting the apo protein-peptide, protein-cyclic peptide and the AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.


Subjects
Peptides; Binding Sites; Peptides/chemistry; Peptides/metabolism; Protein Binding; Computational Biology/methods; Algorithms; Proteins/chemistry; Proteins/metabolism; Machine Learning
8.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38920345

ABSTRACT

Bioactive peptide therapeutics have been a long-standing research topic. Notably, antimicrobial peptides (AMPs) have been extensively studied for their therapeutic potential. Meanwhile, the demand for annotating other therapeutic peptides, such as antiviral peptides (AVPs) and anticancer peptides (ACPs), has also increased in recent years. However, we believe that the structure of peptide chains and the intrinsic information between the amino acids have not been fully investigated in the existing protocols. Therefore, we develop a new graph deep learning model, namely TP-LMMSG, which offers lightweight and easy-to-deploy advantages while improving the annotation performance in a generalizable manner. The results indicate that our model can accurately predict the properties of different peptides. The model surpasses the other state-of-the-art models on AMP, AVP and ACP prediction across multiple experimentally validated datasets. Moreover, TP-LMMSG also addresses the challenges of time-consuming pre-processing in graph neural network frameworks. With its flexibility in integrating heterogeneous peptide features, our model can have a substantial impact on the screening and discovery of therapeutic peptides. The source code is available at https://github.com/NanjunChen37/TP_LMMSG.


Subjects
Amino Acids; Neural Networks, Computer; Peptides; Amino Acids/chemistry; Peptides/chemistry; Computational Biology/methods; Deep Learning; Antimicrobial Peptides/chemistry; Algorithms
9.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38261342

ABSTRACT

Accurate identification of cell cycle phases in single-cell RNA-sequencing (scRNA-seq) data is crucial for biomedical research. Many methods have been developed to tackle this challenge, employing diverse approaches to predict cell cycle phases. In this review article, we delve into the standard processes in identifying cell cycle phases within scRNA-seq data and present several representative methods for comparison. To rigorously assess the accuracy of these methods, we propose an error function and employ multiple benchmarking datasets encompassing human and mouse data. Our evaluation results reveal a key finding: the fit between the reference data and the dataset being analyzed profoundly impacts the effectiveness of cell cycle phase identification methods. Therefore, researchers must carefully consider the compatibility between the reference data and their dataset to achieve optimal results. Furthermore, we explore the potential benefits of incorporating benchmarking data with multiple known cell cycle phases into the analysis. Merging such data with the target dataset shows promise in enhancing prediction accuracy. By shedding light on the accuracy and performance of cell cycle phase prediction methods across diverse datasets, this review aims to motivate and guide future methodological advancements. Our findings offer valuable insights for researchers seeking to improve their understanding of cellular dynamics through scRNA-seq analysis, ultimately fostering the development of more robust and widely applicable cell cycle identification methods.


Subjects
Benchmarking; Biomedical Research; Humans; Animals; Mice; Cell Cycle; Research Personnel
10.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385876

ABSTRACT

Enhancers play an important role in the process of gene expression regulation. In DNA sequences, the abundance or absence of enhancers and irregularities in enhancer strength affect the gene expression process, which leads to the initiation and propagation of diverse types of genetic diseases such as hemophilia, bladder cancer, diabetes and congenital disorders. Enhancer identification and strength prediction through experimental approaches are expensive, time-consuming and error-prone. To accelerate research on enhancer identification and strength prediction, around 19 computational frameworks have been proposed. These frameworks use machine and deep learning methods that take raw DNA sequences and predict an enhancer's presence and strength. However, these frameworks still lack in performance and are not useful in real-time analysis. This paper presents a novel deep learning framework that uses language modeling strategies for transforming DNA sequences into statistical feature space. It applies transfer learning by training a language model in an unsupervised fashion by predicting a group of nucleotides, also known as k-mers, based on the context of existing k-mers in a sequence. At the classification stage, it presents a novel classifier that reaps the benefits of two different architectures: convolutional neural network and attention mechanism. The proposed framework is evaluated over the enhancer identification benchmark dataset, where it outperforms the existing best-performing framework by 5% and 9% in terms of accuracy and MCC, respectively. Similarly, when evaluated over the enhancer strength prediction benchmark dataset, it outperforms the existing best-performing framework by 4% and 7% in terms of accuracy and MCC, respectively.
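
As an illustration of the k-mer idea mentioned in this abstract, the sketch below splits a DNA sequence into k-mers and builds (context, target) pairs of the kind a language model could be trained on. The abstract does not state the value of k, the stride, or the exact training objective, so `k`, `stride`, `context` and both function names are assumptions chosen for the example.

```python
def kmers(sequence, k=3, stride=1):
    """Split a DNA sequence into k-mers; stride=1 overlaps, stride=k does not."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def context_target_pairs(tokens, context=1):
    """Pair each k-mer (the target) with its neighbouring k-mers (the context)."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - context):i]
        right = tokens[i + 1:i + 1 + context]
        pairs.append((left + right, target))
    return pairs


tokens = kmers("ACGTAC", k=3)
print(tokens)
print(context_target_pairs(tokens))
```

Because the pairs are derived from the sequence itself, no enhancer labels are needed at this stage; labels only enter when a classifier is trained on top of the learned representation.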


Subjects
Benchmarking; Medicine; Neural Networks, Computer; Nucleotides; Regulatory Sequences, Nucleic Acid
11.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39316944

ABSTRACT

As a main subtype of post-translational modification (PTM), protein lysine acylations (PLAs) play crucial roles in regulating diverse functions of proteins. With recent advancements in proteomics technology, the identification of PTM is becoming a data-rich field. A large amount of experimentally verified data is urgently required to be translated into valuable biological insights. With computational approaches, PLA can be accurately detected across the whole proteome, even for organisms with small-scale datasets. Herein, a comprehensive summary of 166 in silico PLA prediction methods is presented, including a single type of PLA site and multiple types of PLA sites. This recapitulation covers important aspects that are critical for the development of a robust predictor, including data collection and preparation, sample selection, feature representation, classification algorithm design, model evaluation, and method availability. Notably, we discuss the application of protein language models and transfer learning to solve the small-sample learning issue. We also highlight the prediction methods developed for functionally relevant PLA sites and species/substrate/cell-type-specific PLA sites. In conclusion, this systematic review could potentially facilitate the development of novel PLA predictors and offer useful insights to researchers from various disciplines.


Subjects
Computational Biology; Lysine; Protein Processing, Post-Translational; Proteins; Lysine/metabolism; Lysine/chemistry; Computational Biology/methods; Proteins/metabolism; Proteins/chemistry; Acylation; Algorithms; Humans; Software; Databases, Protein
12.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581422

ABSTRACT

Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.


Subjects
Chromatin; Neural Networks, Computer; Reproducibility of Results
13.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39082648

ABSTRACT

Metabolic processes can transform a drug into metabolites with different properties that may affect its efficacy and safety. Therefore, investigation of the metabolic fate of a drug candidate is of great significance for drug discovery. Computational methods have been developed to predict drug metabolites, but most of them suffer from two main obstacles: the lack of model generalization due to restrictions on metabolic transformation rules or specific enzyme families, and high rate of false-positive predictions. Here, we presented MetaPredictor, a rule-free, end-to-end and prompt-based method to predict possible human metabolites of small molecules including drugs as a sequence translation problem. We innovatively introduced prompt engineering into deep language models to enrich domain knowledge and guide decision-making. The results showed that using prompts that specify the sites of metabolism (SoMs) can steer the model to propose more accurate metabolite predictions, achieving a 30.4% increase in recall and a 16.8% reduction in false positives over the baseline model. The transfer learning strategy was also utilized to tackle the limited availability of metabolic data. For the adaptation to automatic or non-expert prediction, MetaPredictor was designed as a two-stage schema consisting of automatic identification of SoMs followed by metabolite prediction. Compared to four available drug metabolite prediction tools, our method showed comparable performance on the major enzyme families and better generalization that could additionally identify metabolites catalyzed by less common enzymes. The results indicated that MetaPredictor could provide a more comprehensive and accurate prediction of drug metabolism through the effective combination of transfer learning and prompt-based learning strategies.


Subjects
Computer Simulation; Deep Learning; Humans; Pharmaceutical Preparations/metabolism; Pharmaceutical Preparations/chemistry; Computational Biology/methods; Drug Discovery/methods; Software; Algorithms
14.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38305455

ABSTRACT

Model organisms such as mice and zebrafish play a crucial role in biomedical research, as novel hypotheses are often developed or validated in them. However, due to biological differences between species, translating these findings into human applications remains challenging. Moreover, commonly used orthologous gene information is often incomplete and entails a significant information loss during gene-id conversion. To address these issues, we present a novel methodology for species-agnostic transfer learning with heterogeneous domain adaptation. We extended the cross-domain structure-preserving projection toward out-of-sample prediction. Our approach not only allows knowledge integration and translation across various species without relying on gene orthology but also identifies similar GO among the most influential genes composing the latent space for integration. Subsequently, during the alignment of latent spaces, each composed of species-specific genes, it is possible to identify functional annotations of genes missing from public orthology databases. We evaluated our approach with four different single-cell sequencing datasets focusing on cell-type prediction and compared it against related machine-learning approaches. In summary, the developed model outperforms related methods working without prior knowledge when predicting unseen cell types based on other species' data. The results demonstrate that our novel approach allows knowledge transfer beyond species barriers without depending on known gene orthology, instead utilizing the entire gene sets.


Subjects
Algorithms; Zebrafish; Mice; Humans; Animals; Zebrafish/genetics; Gene Expression Profiling; Species Specificity; Machine Learning
15.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39038937

ABSTRACT

Peptide drugs are becoming star drug agents with high efficiency and selectivity which open up new therapeutic avenues for various diseases. However, the sensitivity to hydrolase and the relatively short half-life have severely hindered their development. In this study, a new generation artificial intelligence-based system for accurate prediction of peptide half-life was proposed, which realized the half-life prediction of both natural and modified peptides and successfully bridged the evaluation possibility between two important species (human, mouse) and two organs (blood, intestine). To achieve this, enzymatic cleavage descriptors were integrated with traditional peptide descriptors to construct a better representation. Then, robust models with accurate performance were established by comparing traditional machine learning and transfer learning, systematically. Results indicated that enzymatic cleavage features could certainly enhance model performance. The deep learning model integrating transfer learning significantly improved predictive accuracy, achieving remarkable R2 values: 0.84 for natural peptides and 0.90 for modified peptides in human blood, 0.984 for natural peptides and 0.93 for modified peptides in mouse blood, and 0.94 for modified peptides in mouse intestine on the test set, respectively. These models not only successfully composed the above-mentioned system but also improved by approximately 15% in terms of correlation compared to related works. This study is expected to provide powerful solutions for peptide half-life evaluation and boost peptide drug development.


Subjects
Peptides; Animals; Half-Life; Humans; Mice; Peptides/metabolism; Peptides/chemistry; Deep Learning; Machine Learning
16.
Proc Natl Acad Sci U S A ; 120(10): e2216894120, 2023 03 07.
Article in English | MEDLINE | ID: mdl-36848555

ABSTRACT

Drought tolerance is a highly complex trait controlled by numerous interconnected pathways with substantial variation within and across plant species. This complexity makes it difficult to distill individual genetic loci underlying tolerance, and to identify core or conserved drought-responsive pathways. Here, we collected drought physiology and gene expression datasets across diverse genotypes of the C4 cereals sorghum and maize and searched for signatures defining water-deficit responses. Differential gene expression identified few overlapping drought-associated genes across sorghum genotypes, but using a predictive modeling approach, we found a shared core drought response across development, genotype, and stress severity. Our model had similar robustness when applied to datasets in maize, reflecting a conserved drought response between sorghum and maize. The top predictors are enriched in functions associated with various abiotic stress-responsive pathways as well as core cellular functions. These conserved drought response genes were less likely to contain deleterious mutations than other gene sets, suggesting that core drought-responsive genes are under evolutionary and functional constraints. Our findings support a broad evolutionary conservation of drought responses in C4 grasses regardless of innate stress tolerance, which could have important implications for developing climate resilient cereals.


Subjects
Sorghum; Zea mays; Zea mays/genetics; Sorghum/genetics; Droughts; Edible Grain/genetics; Poaceae
17.
Am J Hum Genet ; 109(11): 1998-2008, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36240765

ABSTRACT

As most existing genome-wide association studies (GWASs) were conducted in European-ancestry cohorts, and as the existing polygenic risk score (PRS) models have limited transferability across ancestry groups, PRS research on non-European-ancestry groups needs to make efficient use of available data until we attain large sample sizes across all ancestry groups. Here we propose a PRS method using transfer learning techniques. Our approach, TL-PRS, uses gradient descent to fine-tune the baseline PRS model from an ancestry group with large sample GWASs to the dataset of target ancestry. In our application of constructing PRS for seven quantitative and two dichotomous traits for 10,285 individuals of South Asian ancestry and 8,168 individuals of African ancestry in UK Biobank, TL-PRS using PRS-CS as a baseline method obtained 25% average relative improvement for South Asian samples and 29% for African samples compared to the standard PRS-CS method in terms of predicted R2. Our approach increases the transferability of PRSs across ancestries and thereby helps reduce existing inequities in genetics research.
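
The fine-tuning idea in this abstract (start from baseline weights, then adjust them by gradient descent on target data) can be sketched on a toy problem. This is not the TL-PRS implementation: it assumes individual-level genotype vectors, a squared-error loss and a fixed number of steps, none of which the abstract specifies, and `score`, `mean_squared_error` and `fine_tune` are hypothetical names.

```python
def score(weights, genotypes):
    """Linear risk score: weighted sum of allele counts."""
    return sum(w * g for w, g in zip(weights, genotypes))


def mean_squared_error(weights, X, y):
    """Average squared difference between predicted scores and observed traits."""
    return sum((score(weights, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def fine_tune(baseline_weights, X, y, learning_rate=0.05, steps=200):
    """Run plain gradient descent on the target data, starting from the baseline."""
    weights = list(baseline_weights)
    n = len(y)
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for x, t in zip(X, y):
            residual = score(weights, x) - t
            for j, g in enumerate(x):
                grads[j] += 2.0 * residual * g / n
        weights = [w - learning_rate * gr for w, gr in zip(weights, grads)]
    return weights


# Toy target-ancestry data: two variants, four individuals (illustrative values only).
X = [[0, 1], [1, 0], [2, 1], [1, 2]]
y = [score([0.5, -0.2], x) for x in X]
baseline = [0.3, 0.1]  # weights assumed to come from a large source-ancestry study
tuned = fine_tune(baseline, X, y)
print(mean_squared_error(baseline, X, y), mean_squared_error(tuned, X, y))
```

Starting from the baseline rather than from zero is what makes this transfer learning: with few target samples, the number of steps controls how far the weights move away from the source model.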


Subjects
Genome-Wide Association Study; Multifactorial Inheritance; Humans; Multifactorial Inheritance/genetics; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide/genetics; Risk Factors; Machine Learning
18.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37870286

ABSTRACT

The advanced language models have enabled us to recognize protein-protein interactions (PPIs) and interaction sites using protein sequences or structures. Here, we trained the MindSpore ProteinBERT (MP-BERT) model, a Bidirectional Encoder Representation from Transformers, using protein pairs as inputs, making it suitable for identifying PPIs and their respective interaction sites. The pretrained model (MP-BERT) was fine-tuned as MPB-PPI (MP-BERT on PPI) and demonstrated its superiority over the state-of-the-art models on diverse benchmark datasets for predicting PPIs. Moreover, the model's capability to recognize PPIs among various organisms was evaluated on multiple organisms. An amalgamated organism model was designed, exhibiting a high level of generalization across the majority of organisms and attaining an accuracy of 92.65%. The model was also customized to predict interaction site propensity by fine-tuning it with PPI site data as MPB-PPISP. Our method facilitates the prediction of both PPIs and their interaction sites, thereby illustrating the potency of transfer learning in dealing with the protein pair task.


Subjects
Machine Learning; Proteins; Proteins/chemistry; Amino Acid Sequence
19.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631408

ABSTRACT

The gut microbial communities are highly plastic throughout life, and the human gut microbial communities show spatial-temporal dynamic patterns at different life stages. However, the underlying association between gut microbial communities and time-related factors remains unclear. The lack of context-awareness, insufficient data, and the existence of batch effects are the three major issues that make it problematic to trace the life trajectory of the host based on gut microbial communities. Here, we used a novel computational approach (microDELTA, microbial-based deep life trajectory) to track longitudinal alterations in human gut microbial communities, which employs transfer learning for context-aware mining of gut microbial community dynamics at different life stages. Using an infant cohort, we demonstrated that microDELTA outperformed Neural Network in accurately predicting the age of infants with different delivery modes, especially for newborn infants of vaginal delivery, with the area under the receiver operating characteristic curve of microDELTA and Neural Network at 0.811 and 0.436, respectively. In this context, we have discovered the influence of delivery mode on infant gut microbial communities. Along the human lifespan, we also applied microDELTA to a Chinese traveler cohort, a Hadza hunter-gatherer cohort and an elderly cohort. Results revealed the association between long-term dietary shifts during travel and adult gut microbial communities, the seasonal cycling of gut microbial communities for the Hadza hunter-gatherers, and the distinctive microbial pattern of elderly gut microbial communities. In summary, microDELTA can largely solve the issues in tracing the life trajectory of the human microbial communities and generate accurate and flexible models for a broad spectrum of microbial-based longitudinal research.


Subjects
Deep Learning; Gastrointestinal Microbiome; Microbiota; Infant, Newborn; Infant; Female; Humans; Aged; Diet
20.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36653906

ABSTRACT

Spatially resolved transcriptomics technologies enable comprehensive measurement of gene expression patterns in the context of intact tissues. However, existing technologies suffer from either low resolution or shallow sequencing depth. Here, we present DIST, a deep learning-based method that imputes the gene expression profiles on unmeasured locations and enhances the gene expression for both original measured spots and imputed spots by self-supervised learning and transfer learning. We evaluate the performance of DIST for imputation, clustering, differential expression analysis and functional enrichment analysis. The results show that DIST can impute the gene expression accurately, enhance the gene expression for low-quality data, and help detect more biologically meaningful differentially expressed genes and pathways, thereby allowing deeper insights into the biological processes.


Subjects
Deep Learning; Transcriptome; Gene Expression Profiling/methods; Cluster Analysis