Results 1 - 20 of 42
1.
Nature ; 620(7972): 47-60, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37532811

ABSTRACT

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.


Subjects
Artificial Intelligence, Research Design, Artificial Intelligence/standards, Artificial Intelligence/trends, Datasets as Topic, Deep Learning, Research Design/standards, Research Design/trends, Unsupervised Machine Learning
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36592061

ABSTRACT

Drug-drug interaction (DDI) prediction identifies interactions of drug combinations in which the adverse side effects caused by the physicochemical incompatibility have attracted much attention. Previous studies usually model drug information from single or dual views of the whole drug molecules but ignore the detailed interactions among atoms, which leads to incomplete and noisy information and limits the accuracy of DDI prediction. In this work, we propose a novel dual-view drug representation learning network for DDI prediction ('DSN-DDI'), which employs local and global representation learning modules iteratively and learns drug substructures from the single drug ('intra-view') and the drug pair ('inter-view') simultaneously. Comprehensive evaluations demonstrate that DSN-DDI significantly improved performance on DDI prediction for the existing drugs by achieving a relatively improved accuracy of 13.01% and an over 99% accuracy under the transductive setting. More importantly, DSN-DDI achieves a relatively improved accuracy of 7.07% to unseen drugs and shows the usefulness for real-world DDI applications. Finally, DSN-DDI exhibits good transferability on synergistic drug combination prediction and thus can serve as a generalized framework in the drug discovery field.


Subjects
Drug-Related Side Effects and Adverse Reactions, Humans, Drug Interactions, Drug Discovery, Computational Biology
3.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37903413

ABSTRACT

Accurate prediction of drug-target affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability that poses challenges for deep learning approaches. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. To overcome this challenge, we present the Semi-Supervised Multi-task training (SSM) framework for DTA prediction, which incorporates three simple yet highly effective strategies: (1) A multi-task training approach that combines DTA prediction with masked language modeling using paired drug-target data. (2) A semi-supervised training method that leverages large-scale unpaired molecules and proteins to enhance drug and target representations. This approach differs from previous methods that only employed molecules or proteins in pre-training. (3) The integration of a lightweight cross-attention module to improve the interaction between drugs and targets, further enhancing prediction accuracy. Through extensive experiments on benchmark datasets such as BindingDB, DAVIS and KIBA, we demonstrate the superior performance of our framework. Additionally, we conduct case studies on specific drug-target binding activities, virtual screening experiments, drug feature visualizations and real-world applications, all of which showcase the significant potential of our work. In conclusion, our proposed SSM-DTA framework addresses the data limitation challenge in DTA prediction and yields promising results, paving the way for more efficient and accurate drug discovery processes.


Subjects
Benchmarking, Drug Discovery, Drug Delivery Systems
4.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36573491

ABSTRACT

Precisely predicting the drug-drug interaction (DDI) is an important application and hot research topic in drug discovery, especially for avoiding adverse effects when using drug combination treatment for patients. Nowadays, machine learning and deep learning methods have achieved great success in DDI prediction. However, we notice that most of the works ignore the importance of the relation type when building the DDI prediction models. In this work, we propose a novel R$^2$-DDI framework, which introduces a relation-aware feature refinement module for drug representation learning. The relation feature is integrated into drug representation and refined in the framework. With the refinement features, we also incorporate the consistency training method to regularize the multi-branch predictions for better generalization. Through extensive experiments and studies, we demonstrate our R$^2$-DDI approach can significantly improve the DDI prediction performance over multiple real-world datasets and settings, and our method shows better generalization ability with the help of the feature refinement design.


Subjects
Drug-Related Side Effects and Adverse Reactions, Humans, Drug Interactions, Machine Learning, Drug Discovery
5.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36156661

ABSTRACT

Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.


Subjects
Data Mining, Natural Language Processing
6.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35514186

ABSTRACT

The identification of active binding drugs for target proteins (referred to as drug-target interaction prediction) is the key challenge in virtual screening, which plays an essential role in drug discovery. Although recent deep learning-based approaches achieve better performance than molecular docking, existing models often neglect topological or spatial intermolecular information, hindering prediction performance. We recognize this problem and propose a novel approach called the Intermolecular Graph Transformer (IGT) that employs a dedicated attention mechanism to model intermolecular information with a three-way Transformer-based architecture. IGT outperforms state-of-the-art (SoTA) approaches by 9.1% and 20.5% over the second best option for binding activity and binding pose prediction, respectively, and exhibits superior generalization ability to unseen receptor proteins than SoTA approaches. Furthermore, IGT exhibits promising drug screening ability against severe acute respiratory syndrome coronavirus 2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses. Source code and datasets are available at https://github.com/microsoft/IGT-Intermolecular-Graph-Transformer.


Subjects
Algorithms, COVID-19, Humans, Molecular Docking Simulation, Proteins/chemistry, Software
7.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36136367

ABSTRACT

A good understanding of protein function and structure in computational biology helps in the understanding of human beings. To face the limited proteins that are annotated structurally and functionally, the scientific community embraces the self-supervised pre-training methods from large amounts of unlabeled protein sequences for protein embedding learning. However, the protein is usually represented by individual amino acids with limited vocabulary size (e.g. 20 amino acid types), without considering the strong local semantics existing in protein sequences. In this work, we propose a novel pre-training modeling approach SPRoBERTa. We first present an unsupervised protein tokenizer to learn protein representations with local fragment patterns. Then, a novel framework for deep pre-training model is introduced to learn protein embeddings. After pre-training, our method can be easily fine-tuned for different protein tasks, including amino acid-level prediction task (e.g. secondary structure prediction), amino acid pair-level prediction task (e.g. contact prediction) and also protein-level prediction task (remote homology prediction, protein function prediction). Experiments show that our approach achieves significant improvements in all tasks and outperforms the previous methods. We also provide detailed ablation studies and analysis for our protein tokenizer and training framework.


Subjects
Computational Biology, Proteins, Humans, Proteins/chemistry, Computational Biology/methods, Amino Acid Sequence, Secondary Protein Structure, Amino Acids
9.
Bioinformatics ; 38(22): 5100-5107, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36205562

ABSTRACT

MOTIVATION: The interaction between drugs and targets (DTI) in the human body plays a crucial role in biomedical science and applications. As millions of papers come out every year in the biomedical domain, automatically discovering DTI knowledge from biomedical literature, which are usually triplets about drugs, targets and their interaction, becomes an urgent demand in the industry. Existing methods of discovering biological knowledge are mainly extractive approaches that often require detailed annotations (e.g. all mentions of biological entities, relations between every two entity mentions, etc.). However, it is difficult and costly to obtain sufficient annotations due to the requirement of expert knowledge from biomedical domains. RESULTS: To overcome these difficulties, we explore an end-to-end solution for this task by using generative approaches. We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations. Further, we propose a semi-supervised method, which leverages the aforementioned end-to-end model to filter unlabeled literature and label it. Experimental results show that our method significantly outperforms extractive baselines on DTI discovery. We also create a dataset, KD-DTI, to advance this task and release it to the community. AVAILABILITY AND IMPLEMENTATION: Our code and data are available at https://github.com/bert-nmt/BERT-DTI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
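
To make the idea of treating DTI triplets as a sequence more concrete, here is a minimal sketch of how a set of (drug, target, interaction) triplets could be flattened into one target string for a generative model and parsed back afterwards. The marker tokens (`<drug>`, `<target>`, `<interaction>`), the `" ; "` separator, the function names and the placeholder triplets are all assumptions for illustration; the abstract does not specify the serialization format used by the authors.

```python
import re

# Hypothetical marker tokens; the real serialization format is not given in the abstract.
_TRIPLET = re.compile(r"<drug> (.+?) <target> (.+?) <interaction> (.+)")


def linearize_triplets(triplets):
    """Flatten (drug, target, interaction) triplets into one output string."""
    return " ; ".join(
        f"<drug> {drug} <target> {target} <interaction> {interaction}"
        for drug, target, interaction in triplets
    )


def parse_triplets(sequence):
    """Recover triplets from a generated string, skipping malformed chunks."""
    triplets = []
    for chunk in sequence.split(" ; "):
        match = _TRIPLET.fullmatch(chunk.strip())
        if match:
            triplets.append(match.groups())
    return triplets


# Placeholder triplets, not real pharmacological data.
example = [("drug_A", "target_X", "inhibitor"), ("drug_B", "target_Y", "agonist")]
encoded = linearize_triplets(example)
decoded = parse_triplets(encoded)
```

A round trip through `linearize_triplets` and `parse_triplets` returns the original triplets, and chunks that do not match the pattern are dropped, which is one simple way to tolerate malformed model output.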


Subjects
Publications, Software, Humans, Drug Interactions
10.
J Chem Phys ; 159(3)2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37458355

ABSTRACT

Machine learning force fields (MLFFs) have gained popularity in recent years as they provide a cost-effective alternative to ab initio molecular dynamics (MD) simulations. Despite a small error on the test set, MLFFs inherently suffer from generalization and robustness issues during MD simulations. To alleviate these issues, we propose global force metrics and fine-grained metrics from element and conformation aspects to systematically measure MLFFs for every atom and every conformation of molecules. We selected three state-of-the-art MLFFs (ET, NequIP, and ViSNet) and comprehensively evaluated on aspirin, Ac-Ala3-NHMe, and Chignolin MD datasets with the number of atoms ranging from 21 to 166. Driven by the trained MLFFs on these molecules, we performed MD simulations from different initial conformations, analyzed the relationship between the force metrics and the stability of simulation trajectories, and investigated the reason for collapsed simulations. Finally, the performance of MLFFs and the stability of MD simulations can be further improved guided by the proposed force metrics for model training, specifically training MLFF models with these force metrics as loss functions, fine-tuning by reweighting samples in the original dataset, and continued training by recruiting additional unexplored data.

11.
Bioinformatics ; 37(22): 4075-4082, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34042965

ABSTRACT

MOTIVATION: Gradient descent-based protein modeling is a popular protein structure prediction approach that takes as input the predicted inter-residue distances and other necessary constraints and folds protein structures by minimizing protein-specific energy potentials. The constraints from multiple predicted protein properties provide redundant and sometimes conflicting information that can trap the optimization process in local minima and impair the modeling efficiency. RESULTS: To address these issues, we developed a self-adaptive protein modeling framework, SAMF. It eliminates redundancy of constraints and resolves conflicts, folds protein structures in an iterative way, and picks up the best structures by a deep quality analysis system. Without a large amount of complicated domain knowledge and numerous patches as barriers, SAMF achieves the state-of-the-art performance by exploiting the power of cutting-edge techniques of deep learning. SAMF has a modular design and can be easily customized and extended. As the quality of input constraints is ever growing, the superiority of SAMF will be amplified over time. AVAILABILITY AND IMPLEMENTATION: The source code and data for reproducing the results are available at https://msracb.blob.core.windows.net/pub/psp/SAMF.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins, Software, Proteins/metabolism
12.
Environ Sci Technol ; 56(14): 9903-9914, 2022 07 19.
Article in English | MEDLINE | ID: mdl-35793558

ABSTRACT

Accurate timely estimation of emissions of nitrogen oxides (NOx) is a prerequisite for designing an effective strategy for reducing O3 and PM2.5 pollution. The satellite-based top-down method can provide near-real-time constraints on emissions; however, its efficiency is largely limited by efforts in dealing with the complex emission-concentration response. Here, we propose a novel machine-learning-based method using a physically informed variational autoencoder (VAE) emission predictor to infer NOx emissions from satellite-retrieved surface NO2 concentrations. The computational burden can be significantly reduced with the help of a neural network trained with a chemical transport model, allowing the VAE emission predictor to provide a timely estimation of posterior emissions based on the satellite-retrieved surface NO2 concentration. The VAE emission predictor successfully corrected the underestimation of NOx emissions in rural areas and the overestimation in urban areas, resulting in smaller normalized mean biases (reduced from -0.8 to -0.4) and larger R2 values (increased from 0.4 to 0.7). The interpretability of the VAE emission predictor was investigated using sensitivity analysis by modulating each feature, indicating that NO2 concentration and planetary boundary layer (PBL) height are important for estimating NOx emissions, which is consistent with our common knowledge. The advantages of the VAE emission predictor in efficiency, flexibility, and accuracy demonstrate its great potential in estimating the latest emissions and evaluating the control effectiveness from observations.
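
The two evaluation numbers quoted in this abstract, normalized mean bias and R2, can be illustrated with a short calculation. The sketch below assumes the common definitions NMB = sum(pred - obs) / sum(obs) and R2 as the squared Pearson correlation; the abstract does not state which definitions the authors used, and the function names and toy values are hypothetical.

```python
def normalized_mean_bias(pred, obs):
    """NMB = sum(pred - obs) / sum(obs); negative values mean underestimation."""
    return sum(p - o for p, o in zip(pred, obs)) / sum(obs)


def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)


# Toy values (not from the paper): predictions are exactly half the observations.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [5.0, 10.0, 15.0, 20.0]
nmb = normalized_mean_bias(predicted, observed)
r2 = r_squared(predicted, observed)
```

In this toy case the predictions track the observations perfectly in shape but are biased low by half, which is why a bias metric and a correlation metric are usually reported together.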


Subjects
Air Pollutants, Air Pollution, Air Pollutants/analysis, Air Pollution/analysis, Neural Networks (Computer), Nitric Oxide/analysis, Nitrogen Dioxide/analysis, Nitrogen Oxides/analysis, Vehicle Emissions/analysis
13.
Atmos Res ; 265: 1-11, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34857979

ABSTRACT

Fast and accurate prediction of ambient ozone (O3) formed from atmospheric photochemical processes is crucial for designing effective O3 pollution control strategies in the context of climate change. The chemical transport model (CTM) is the fundamental tool for O3 prediction and policy design; however, existing CTM-based approaches are computationally expensive, and resource burdens limit their usage and effectiveness in air quality management. Here we proposed a novel method (noted as DeepCTM) that uses deep learning to mimic CTM simulations to improve the computational efficiency of photochemical modeling. The well-trained DeepCTM successfully reproduces CTM-simulated O3 concentration using input features of precursor emissions, meteorological factors, and initial conditions. The advantage of the DeepCTM is its high efficiency in identifying the dominant contributors to O3 formation and quantifying the O3 response to variations in emissions and meteorology. The emission-meteorology-concentration linkages implied by the DeepCTM are consistent with known mechanisms of atmospheric chemistry, indicating that the DeepCTM is also scientifically reasonable. The DeepCTM application in China suggests that O3 concentrations are strongly influenced by the initialized O3 concentration, as well as emission and meteorological factors during daytime when O3 is formed photochemically. The variation of meteorological factors such as short-wave radiation can also significantly modulate the O3 chemistry. The DeepCTM developed in this study exhibits great potential for efficiently representing the complex atmospheric system and can provide policymakers with urgently needed information for designing effective control strategies to mitigate O3 pollution.

14.
BMC Bioinformatics ; 22(1): 351, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34182922

ABSTRACT

BACKGROUND: Fragment libraries play a key role in fragment-assembly based protein structure prediction, where protein fragments are assembled to form a complete three-dimensional structure. Rich and accurate structural information embedded in fragment libraries has not been systematically extracted and used beyond fragment assembly. METHODS: To better leverage the valuable structural information for protein structure prediction, we extracted seven types of structural information from fragment libraries. We broadened the usage of such structural information by transforming fragment libraries into protein-specific potentials for gradient-descent based protein folding and encoding fragment libraries as structural features for protein property prediction. RESULTS: Fragment libraries improved the accuracy of protein folding and outperformed state-of-the-art algorithms with respect to predicted properties, such as torsion angles and inter-residue distances. CONCLUSION: Our work implies that the rich structural information extracted from fragment libraries can complement sequence-derived features to help protein structure prediction.


Subjects
Algorithms, Proteins, Protein Folding, Proteins/genetics
15.
New Phytol ; 229(5): 2970-2983, 2021 03.
Article in English | MEDLINE | ID: mdl-33111313

ABSTRACT

In grasses, two types of phased, small interfering RNAs (phasiRNAs) are expressed largely in young, developing anthers. They are 21 or 24 nucleotides (nt) in length and are triggered by miR2118 or miR2275, respectively. However, most of their functions and activities are not fully understood. We performed comparative genomic analysis of their source loci (PHAS) in five Oryza genomes and combined this with analysis of high-throughput sRNA and degradome datasets. In total, we identified 8216 21-PHAS and 626 24-PHAS loci. Local tandem and segmental duplications mainly contributed to the expansion and supercluster distribution of the 21-PHAS loci. Despite their relatively conserved genomic positions, PHAS sequences diverged rapidly, except for the miR2118/2275 target sites, which were under strong selection for conservation. We found that 21-nt phasiRNAs with a 5'-terminal uridine (U) demonstrated cis-cleavage at PHAS precursors, and these cis-acting sites were also variable among close species. miR2118 could trigger phasiRNA production from its own antisense transcript and the derived phasiRNAs might reversibly regulate miR2118 precursors. We hypothesised that successful initiation of phasiRNA biogenesis is conservatively maintained, while phasiRNA products diverged quickly and are not individually conserved. In particular, phasiRNA production is under the control of multiple reciprocal regulation mechanisms.


Subjects
MicroRNAs, Oryza, Plant Gene Expression Regulation, MicroRNAs/genetics, Oryza/genetics, Poaceae/genetics, Plant RNA/genetics, Small Interfering RNA/genetics
16.
Plant Cell ; 30(8): 1729-1744, 2018 08.
Article in English | MEDLINE | ID: mdl-29967288

ABSTRACT

Centromeres are dynamic chromosomal regions, and the genetic and epigenetic environment of the centromere is often regarded as oppressive to protein-coding genes. Here, we used comparative genomic and phylogenomic approaches to study the evolution of centromeres and centromere-linked genes in the genus Oryza. We report a 12.4-Mb high-quality BAC-based pericentromeric assembly for Oryza brachyantha, which diverged from cultivated rice (Oryza sativa) ∼15 million years ago. The synteny analyses reveal seven medium (>50 kb) pericentric inversions in O. sativa and 10 in O. brachyantha. Of these inversions, three resulted in centromere movement (Chr1, Chr7, and Chr9). Additionally, we identified a potential centromere-repositioning event, in which the ancestral centromere on chromosome 12 in O. brachyantha jumped ∼400 kb away, possibly mediated by a duplicated transposition event (>28 kb). More strikingly, we observed an excess of syntenic gene loss at and near the centromeric regions (P < 2.2 × 10$^{-16}$). Most (33/47) of the missing genes moved to other genomic regions; therefore such excess could be explained by the selective loss of the copy in or near centromeric regions after gene duplication. The pattern of gene loss immediately adjacent to centromeric regions suggests centromere chromatin dynamics (e.g., spreading or microrepositioning) may drive such gene loss.


Subjects
Centromere/genetics, Oryza/genetics, Chromatin/genetics, Plant Chromosomes/genetics, Gene Duplication/genetics, Plant Genome/genetics
17.
Environ Sci Technol ; 54(14): 8589-8600, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32551547

ABSTRACT

Efficient prediction of the air quality response to emission changes is a prerequisite for an integrated assessment system in developing effective control policies. Yet, representing the nonlinear response of air quality to emission controls with accuracy remains a major barrier in air quality-related decision making. Here, we demonstrate a novel method that combines deep learning approaches with chemical indicators of pollutant formation to quickly estimate the coefficients of air quality response functions using ambient concentrations of 18 chemical indicators simulated with a comprehensive atmospheric chemical transport model (CTM). By requiring only two CTM simulations for model application, the new method significantly enhances the computational efficiency compared to existing methods that achieve lower accuracy despite requiring 20+ CTM simulations (the benchmark statistical model). Our results demonstrate the utility of deep learning approaches for capturing the nonlinearity of atmospheric chemistry and physics and the prospects of the new method to support effective policymaking in other environment systems.


Subjects
Air Pollutants, Air Pollution, Deep Learning, Air Pollutants/analysis, Air Pollution/analysis, Environmental Monitoring, Statistical Models
18.
Stroke ; 46(10): 2822-9, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26286544

ABSTRACT

BACKGROUND AND PURPOSE: Although recent trials have suggested that stenting is worse than medical therapy for patients with severe symptomatic intracranial atherosclerotic stenosis, it is not clear whether this conclusion applies to a subset of patients with hypoperfusion symptoms. To justify a new trial in China, we performed a multicenter prospective registry study to evaluate the safety and efficacy of endovascular stenting within 30 days for patients with severe symptomatic intracranial atherosclerotic stenosis. METHODS: Patients with symptomatic intracranial atherosclerotic stenosis caused by 70% to 99% stenosis combined with poor collaterals were enrolled. The patients were treated either with balloon-mounted stent or with balloon predilation plus self-expanding stent as determined by the operators following a guideline. The primary outcome within 30 days is stroke, transient ischemic attack, and death after stenting. The secondary outcome is successful revascularization. The baseline characteristics and outcomes of the 2 treatment groups were compared. RESULTS: From September 2013 to January 2015, among 354 consecutive patients, 300 patients (aged 58.3±9.78 years) were recruited, including 159 patients treated with balloon-mounted stent and 141 patients with balloon plus self-expanding stent. The 30-day rate of stroke, transient ischemic attack, and death was 4.3%. Successful revascularization was 97.3%. Patients treated with balloon-mounted stent were older, less likely to have middle cerebral artery lesions, more likely to have vertebral artery lesions, more likely to have Mori A lesions, less likely to have Mori C lesions, and likely to have a lower degree of residual stenosis than patients treated with balloon plus self-expanding stent. CONCLUSIONS: The short-term safety and efficacy of endovascular stenting for patients with severe symptomatic intracranial atherosclerotic stenosis in China is acceptable. Balloon-mounted stent may have a lower degree of residual stenosis than self-expanding stent. CLINICAL TRIAL REGISTRATION: URL: http://www.clinicaltrials.gov. Unique identifier: NCT01968122.


Subjects
Endovascular Procedures/instrumentation, Endovascular Procedures/methods, Intracranial Arteriosclerosis/surgery, China, Female, Humans, Male, Middle Aged, Registries, Stents
19.
Neural Netw ; 176: 106354, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38723308

ABSTRACT

Neural operators, as a powerful approximation to the non-linear operators between infinite-dimensional function spaces, have proved to be promising in accelerating the solution of partial differential equations (PDE). However, they require a large amount of simulated data, which can be costly to collect. This can be avoided by learning physics from the physics-constrained loss, which we refer to as the mean squared residual (MSR) loss constructed by the discretized PDE. We investigate the physical information in the MSR loss, which we call long-range entanglements, and identify the challenge that the neural network requires the capacity to model the long-range entanglements in the spatial domain of the PDE, whose patterns vary in different PDEs. To tackle the challenge, we propose LordNet, a tunable and efficient neural network for modeling various entanglements. Inspired by the traditional solvers, LordNet models the long-range entanglements with a series of matrix multiplications, which can be seen as the low-rank approximation to the general fully-connected layers and extracts the dominant pattern with reduced computational cost. The experiments on solving Poisson's equation and (2D and 3D) Navier-Stokes equation demonstrate that the long-range entanglements from the MSR loss can be well modeled by the LordNet, yielding better accuracy and generalization ability than other neural networks. The results show that the LordNet can be 40× faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in accuracy and efficiency with the smallest parameter size.
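
As a rough illustration of a mean squared residual loss built from a discretized PDE, the sketch below evaluates such a loss for the 1D Poisson problem -u''(x) = f(x) using a standard second-order finite-difference stencil. The choice of a 1D problem, the stencil, the function name and the test function u(x) = x(1 - x) are assumptions made for this example; the abstract does not give the discretization used by the authors.

```python
def msr_loss_poisson_1d(u, f, h):
    """Mean squared residual of -u'' = f on a uniform grid with spacing h.

    Each interior residual uses the three-point stencil
    (-u[i-1] + 2*u[i] - u[i+1]) / h**2 - f[i], so no simulated
    reference solution is needed to evaluate the loss.
    """
    residuals = [
        (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / (h * h) - f[i]
        for i in range(1, len(u) - 1)
    ]
    return sum(r * r for r in residuals) / len(residuals)


# Toy check: u(x) = x * (1 - x) satisfies -u'' = 2 exactly.
n = 11
h = 1.0 / (n - 1)
xs = [i * h for i in range(n)]
exact = [x * (1.0 - x) for x in xs]
source = [2.0] * n

# A candidate solution with one perturbed grid value.
perturbed = list(exact)
perturbed[5] += 0.1

loss_exact = msr_loss_poisson_1d(exact, source, h)
loss_perturbed = msr_loss_poisson_1d(perturbed, source, h)
```

The loss is essentially zero for the exact solution and grows once a single grid value is perturbed. Because every residual ties a grid value to its neighbours, changing one value affects several residual terms at once, which gives a small-scale picture of how the loss couples outputs at different positions.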


Subjects
Neural Networks (Computer), Computer Simulation, Algorithms, Nonlinear Dynamics
20.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5763-5778, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38421846

ABSTRACT

Randomness is widely introduced in neural network training to simplify model optimization or avoid the over-fitting problem. Among them, dropout and its variations in different aspects (e.g., data, model structure) are prevalent in regularizing the training of deep neural networks. Though effective and performing well, the randomness introduced by these dropout-based methods causes nonnegligible inconsistency between training and inference. In this paper, we introduce a simple consistency training strategy to regularize such randomness, namely R-Drop, which forces two output distributions sampled by each type of randomness to be consistent. Specifically, R-Drop minimizes the bidirectional KL-divergence between two output distributions produced by dropout-based randomness for each training sample. Theoretical analysis reveals that R-Drop can reduce the above inconsistency by reducing the inconsistency among the sampled sub structures and bridging the gap between the loss calculated by the full model and sub structures. Experiments on 7 widely-used deep learning tasks (23 datasets in total) demonstrate that R-Drop is universally effective for different types of neural networks (i.e., feed-forward, recurrent, and graph neural networks) and different learning paradigms (supervised, parameter-efficient, and semi-supervised). In particular, it achieves state-of-the-art performances with the vanilla Transformer model on WMT14 English → German translation (30.91 BLEU) and WMT14 English → French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models.
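
The consistency objective described in this abstract can be sketched for a single training sample as follows. The two logit vectors stand in for two forward passes of the same input under different dropout masks. The function names, the weight `alpha`, the 1/2 averaging of the two KL terms and the toy logits are assumptions for illustration; the abstract only states that the bidirectional KL-divergence between the two output distributions is minimized.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def r_drop_style_loss(logits_a, logits_b, target, alpha=1.0):
    """Cross-entropy of both passes plus a symmetric KL consistency term.

    `alpha` (hypothetical hyperparameter) weights the consistency term.
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    nll = -(math.log(p[target]) + math.log(q[target]))
    sym_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return nll + alpha * sym_kl


# Toy logits standing in for two dropout-perturbed passes on one sample.
pass_one = [2.0, 0.5, -1.0]
pass_two = [1.5, 0.8, -0.7]
loss_with_consistency = r_drop_style_loss(pass_one, pass_two, target=0, alpha=1.0)
loss_without = r_drop_style_loss(pass_one, pass_two, target=0, alpha=0.0)
```

When the two passes produce identical logits the KL term vanishes and the loss reduces to twice the ordinary cross-entropy; any disagreement between the passes adds a positive penalty.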
