Results 1 - 7 of 7
1.
Comput Struct Biotechnol J ; 21: 5839-5850, 2023.
Article in English | MEDLINE | ID: mdl-38074472

ABSTRACT

Generative adversarial networks (GANs) have successfully generated functional protein sequences. However, traditional GANs often suffer from inherent randomness, resulting in a lower probability of obtaining desirable sequences. Due to the high cost of wet-lab experiments, the main goal of computer-aided antibody optimization is to identify high-quality candidate antibodies from a large range of possibilities, yet improving the ability of GANs to generate these desired antibodies is a challenge. In this study, we propose and evaluate a new GAN called the Language Model Guided Antibody Generative Adversarial Network (AbGAN-LMG). This GAN uses a language model as an input, harnessing such models' powerful representational capabilities to improve the GAN's generation of high-quality antibodies. We conducted a comprehensive evaluation of the antibody libraries and sequences generated by AbGAN-LMG for COVID-19 (SARS-CoV-2) and Middle East Respiratory Syndrome (MERS-CoV). Results indicate that AbGAN-LMG has learned the fundamental characteristics of antibodies and that it improved the diversity of the generated libraries. Additionally, when generating sequences using AZD-8895 as the target antibody for optimization, over 50% of the generated sequences exhibited better developability than AZD-8895 itself. Through molecular docking, we identified 70 antibodies that demonstrated higher affinity for the wild-type receptor-binding domain (RBD) of SARS-CoV-2 compared to AZD-8895. In conclusion, AbGAN-LMG demonstrates that language models used in conjunction with GANs can enable the generation of higher-quality libraries and candidate sequences, thereby improving the efficiency of antibody optimization. AbGAN-LMG is available at http://39.102.71.224:88/.

2.
BMC Bioinformatics ; 24(1): 486, 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38114906

ABSTRACT

BACKGROUND: Automatic and accurate extraction of diverse biomedical relations from the literature is a crucial component of biomedical text mining. A common framework for solving the biomedical relation extraction (BioRE) problem end to end is to stack classification networks on pre-trained language models and fine-tune them. However, sequence-based pre-trained language models underutilize the graphical topology of language to some extent, and sequence-oriented deep neural networks have limitations in processing graphical features. RESULTS: In this paper, we propose BioEGRE (BioELECTRA and Graph pointer neural network for Relation Extraction), a novel method for the sentence-level BioRE task that aims to leverage linguistic topological features. First, the biomedical literature is preprocessed to retain sentences involving pre-defined entity pairs. Second, SciSpaCy is used for dependency parsing, and sentences are modeled as graphs based on the parsing results; BioELECTRA generates token-level representations, which serve as node attributes in the sentence graphs; a graph pointer neural network layer selects the most relevant multi-hop neighbors to optimize the representations; and a fully connected neural network layer generates the sentence-level representation. Finally, the Softmax function is used to calculate the class probabilities. The method is evaluated on three BioRE tasks: one multi-class task (CHEMPROT) and two binary tasks (GAD and EU-ADR). It achieves F1-scores of 79.97% (CHEMPROT), 83.31% (GAD), and 83.51% (EU-ADR), surpassing existing state-of-the-art models. CONCLUSION: The experimental results on three biomedical benchmark datasets demonstrate the effectiveness and generalization ability of BioEGRE and indicate that linguistic topology and a graph pointer neural network layer explicitly improve performance on BioRE tasks.
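The graph-construction step described in this abstract can be illustrated with a minimal sketch. It assumes a dependency parse is already available as (head, dependent) index pairs; the sentence, the parse, and the function names `build_sentence_graph` and `multi_hop_neighbors` are hypothetical and are not taken from the paper. The sketch only enumerates candidate multi-hop neighbors by breadth-first search; the learned selection performed by the graph pointer layer is not reproduced here.

```python
from collections import deque


def build_sentence_graph(num_tokens, dependency_edges):
    """Build an undirected adjacency map from (head, dependent) index pairs."""
    adjacency = {i: set() for i in range(num_tokens)}
    for head, dependent in dependency_edges:
        adjacency[head].add(dependent)
        adjacency[dependent].add(head)
    return adjacency


def multi_hop_neighbors(adjacency, start, max_hops):
    """Breadth-first search: map every node within max_hops of start to its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start node is not its own neighbor
    return distances


# Hypothetical sentence with a hand-written parse (not parser output).
tokens = ["Aspirin", "inhibits", "COX-1", "activity"]
edges = [(1, 0), (1, 3), (3, 2)]  # (head index, dependent index)
graph = build_sentence_graph(len(tokens), edges)
neighbors = multi_hop_neighbors(graph, start=0, max_hops=2)
```

In a full pipeline, each node would also carry a token-level embedding as its attribute, and a learned layer would choose among the candidates returned by `multi_hop_neighbors`.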


Subject(s)
Language; Neural Networks, Computer; Data Mining; Linguistics; Natural Language Processing
3.
MAbs ; 15(1): 2285904, 2023.
Article in English | MEDLINE | ID: mdl-38010801

ABSTRACT

Prior research has generated a vast number of antibody sequences, which has allowed language models to be pre-trained on amino acid sequences to improve the efficiency of antibody screening and optimization. However, fewer pre-trained language models are available for antibody sequences than for proteins in general. Additionally, existing pre-trained models rely solely on embedding representations of amino acids or k-mers, which do not explicitly take into account the role of secondary structure features. Here, we present a new pre-trained model called BERT2DAb. This model incorporates secondary structure information based on self-attention to learn representations of antibody sequences. Our model achieves state-of-the-art performance on three downstream tasks, including two antigen-antibody binding classification tasks (precision: 85.15%/94.86%; recall: 87.41%/86.15%) and one antigen-antibody complex mutation binding free energy prediction task (Pearson correlation coefficient: 0.77). Moreover, we propose a novel method to analyze the relationship between attention weights and contact states of pairs of subsequences in tertiary structures, which enhances the interpretability of BERT2DAb. Overall, our model demonstrates strong potential for improving antibody screening and design through downstream applications.
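For readers unfamiliar with the k-mer representation that this abstract contrasts with its own approach, the following minimal sketch splits a sequence into fixed-length k-mers. The function name `kmer_tokens`, the `stride` parameter, and the short fragment are illustrative assumptions; this shows the baseline tokenization only, not the secondary-structure-based scheme of BERT2DAb.

```python
def kmer_tokens(sequence, k, stride=1):
    """Split a sequence into k-mers; stride=1 gives overlapping tokens, stride=k gives disjoint ones."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


# Illustrative amino acid fragment (an assumption, not data from the paper).
fragment = "EVQLVESGG"
overlapping = kmer_tokens(fragment, k=3)
disjoint = kmer_tokens(fragment, k=3, stride=3)
```

Overlapping k-mers preserve local context at every position, while disjoint k-mers shorten the token sequence; neither encodes secondary structure explicitly, which is the gap the abstract points to.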


Subject(s)
Amino Acids; Proteins; Amino Acid Sequence; Proteins/chemistry; Amino Acids/chemistry; Protein Structure, Secondary; Antibodies
4.
BMC Bioinformatics ; 23(1): 501, 2022 Nov 22.
Article in English | MEDLINE | ID: mdl-36418937

ABSTRACT

BACKGROUND: Automatic and accurate recognition of biomedical named entities in the literature is an important task in biomedical text mining and the foundation for extracting biomedical knowledge from unstructured text into structured formats. A common current approach to biomedical named entity recognition (BioNER) combines the sequence labeling framework with deep neural networks. However, this approach often underutilizes syntactic features such as the dependencies and topology of sentences. Integrating semantic and syntactic features into the BioNER model is therefore an urgent problem. RESULTS: In this paper, we propose a novel biomedical named entity recognition model, BioByGANS (BioBERT/SpaCy-Graph Attention Network-Softmax), which uses a graph to model the dependencies and topology of a sentence and formulates the BioNER task as a node classification problem. This formulation introduces more topological features of language and is no longer concerned only with the distance between words in the sequence. First, we use periods to segment sentences, and spaces and symbols to segment words. Second, contextual features are encoded by BioBERT, and syntactic features such as parts of speech, dependencies and topology are extracted by SpaCy. A graph attention network is then used to generate a fused representation that considers both the contextual and the syntactic features. Last, a softmax function is used to calculate the probabilities and obtain the results. We conduct experiments on 8 benchmark datasets, and our proposed model outperforms existing state-of-the-art BioNER methods on the BC2GM, JNLPBA, BC4CHEMD, BC5CDR-chem, BC5CDR-disease, NCBI-disease, Species-800, and LINNAEUS datasets, achieving F1-scores of 85.15%, 78.16%, 92.97%, 94.74%, 87.74%, 91.57%, 75.01%, and 90.99%, respectively. CONCLUSION: The experimental results on 8 biomedical benchmark datasets demonstrate the effectiveness of our model and indicate that formulating the BioNER task as a node classification problem and incorporating syntactic features into the graph attention network can significantly improve model performance.
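As a rough illustration of how a graph attention step fuses a node's features with those of its neighbors, the sketch below implements one single-head attention pass in plain Python, following the general graph attention network formulation (a LeakyReLU-scored, softmax-normalized weighting over each node's neighborhood). It is not the authors' implementation: the learned projection matrix and multi-head attention are omitted, and the function name `graph_attention`, the feature values, the adjacency map, and the attention vector are all hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU non-linearity used to score node pairs."""
    return x if x > 0 else slope * x


def graph_attention(features, adjacency, attn_vector):
    """One single-head attention pass over already-projected node features.

    features: list of equal-length float lists, one per node
    adjacency: dict mapping each node index to its neighbor indices
    attn_vector: list of floats of length 2 * feature dimension
    Returns (updated_features, attention_weights_per_node).
    """
    updated, weights = [], []
    for i, h_i in enumerate(features):
        neighborhood = sorted(set(adjacency[i]) | {i})  # include a self-loop
        # Score each (i, j) pair from the concatenated feature vectors.
        scores = [
            leaky_relu(sum(a * x for a, x in zip(attn_vector, h_i + features[j])))
            for j in neighborhood
        ]
        # Numerically stable softmax over the neighborhood.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        weights.append(dict(zip(neighborhood, alphas)))
        # Attention-weighted sum of neighbor features.
        updated.append([
            sum(alpha * features[j][d] for alpha, j in zip(alphas, neighborhood))
            for d in range(len(h_i))
        ])
    return updated, weights


# Hypothetical three-token sentence graph with 2-dimensional node features.
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dependency_graph = {0: [1], 1: [0, 2], 2: [1]}
fused, attention = graph_attention(node_features, dependency_graph, [0.5, -0.5, 1.0, 0.25])
```

For node classification, each fused node vector would then pass through a classification layer and a softmax to yield per-token label probabilities.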


Subject(s)
Language; Semantics; Speech; Knowledge; Benchmarking
5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2119-2122, 2021 11.
Article in English | MEDLINE | ID: mdl-34891707

ABSTRACT

To integrate, organize and reuse knowledge related to COVID-19, an ontology for COVID-19 (CIDO-COVID-19) was constructed. It extends the Coronavirus Infectious Disease Ontology (CIDO) by adding COVID-19 terms related to symptoms, prevention, drugs and clinical domains. First, terms from existing ontologies, literature, clinical guidelines and other resources about COVID-19 were merged. Then, the Stanford seven-step approach was used to define and organize the acquired terms. Finally, CIDO-COVID-19 was built on the basis of these terms using Protégé. CIDO-COVID-19 is a more comprehensive ontology for COVID-19, covering multiple areas in the domain, including disease, diagnosis, etiology, virus, transmission, symptom, treatment, drug and prevention. Clinical Relevance: CIDO-COVID-19 covers multiple areas related to COVID-19, including diseases, diagnosis, etiology, virus, transmission, symptoms, treatment, drugs and prevention. Compared with the CIDO, it is expanded to cover drugs, prevention, and the clinical domain. The definitions of terms in CIDO-COVID-19 draw on biomedical ontologies, clinical glossaries and clinical guidelines for COVID-19, and can provide clinicians with standard terminology in the clinical domain.


Subject(s)
COVID-19; Communicable Diseases; Humans; SARS-CoV-2
6.
J Biomed Inform ; 119: 103836, 2021 07.
Article in English | MEDLINE | ID: mdl-34116253

ABSTRACT

Information retrieval techniques have been widely used in electronic medical record (EMR) systems. However, most existing methods do not consider the structure and language features of Chinese EMRs, which degrades retrieval performance. To improve accuracy and comprehensiveness, we propose an improved algorithm for Chinese EMR retrieval. First, the fields of Chinese EMRs are assigned weights based on their importance in clinical applications. Second, negative relations in EMRs are detected, and the retrieval scores of negated terms are adjusted accordingly. Third, the retrieval results are re-ranked using expansion terms and time information to enhance recall without decreasing precision. Experimental results show that the improved algorithm increases precision and recall significantly, which indicates that the algorithm takes full account of the characteristics of Chinese EMRs and fits the needs of clinical applications.
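The first two steps of the algorithm (field weighting and negation-aware score adjustment) can be sketched as follows. Every concrete value here is an assumption made for illustration: the field names, the weights in `FIELD_WEIGHTS`, the English negation cues standing in for Chinese ones, and the `NEGATION_FACTOR` are hypothetical and do not come from the paper. The third step, re-ranking by expansion terms and time information, is not shown.

```python
# Hypothetical field weights; the paper assigns weights by clinical importance.
FIELD_WEIGHTS = {"diagnosis": 3.0, "chief_complaint": 2.0, "history": 1.0}
# Hypothetical English stand-ins for negation cues found in Chinese EMRs.
NEGATION_CUES = ("no ", "denies ", "without ")
# Hypothetical multiplier applied when the query term is negated.
NEGATION_FACTOR = -0.5


def score_record(record, term):
    """Sum field-weighted matches for a term, penalizing negated mentions."""
    term = term.lower()
    score = 0.0
    for field, text in record.items():
        lowered = text.lower()
        if term not in lowered:
            continue
        weight = FIELD_WEIGHTS.get(field, 0.0)
        negated = any(cue + term in lowered for cue in NEGATION_CUES)
        score += weight * (NEGATION_FACTOR if negated else 1.0)
    return score


def rank_records(records, term):
    """Return record identifiers ordered from highest to lowest score."""
    return sorted(records, key=lambda rid: score_record(records[rid], term), reverse=True)


# Hypothetical toy records.
records = {
    "A": {"diagnosis": "pneumonia", "history": "cough for three days"},
    "B": {"chief_complaint": "denies fever", "history": "no pneumonia"},
}
ranking = rank_records(records, "pneumonia")
```

With this scoring, a record that affirms the term in a heavily weighted field outranks one that mentions it only in negated form, which is the behavior the abstract describes for negative relations.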


Subject(s)
Electronic Health Records; Language; Algorithms; China; Information Storage and Retrieval
7.
RSC Adv ; 9(27): 15238-15245, 2019 May 14.
Article in English | MEDLINE | ID: mdl-35514847

ABSTRACT

Nanocomposites composed of a polymeric matrix with micro/nano fillers have drawn much attention because their properties go beyond those of pristine polymers. The spatial distribution of the micro/nano fillers in the polymeric matrix determines the final properties of the nanocomposite and thus deserves investigation. Here, we propose an effective method for assembling micro/nano fillers into pre-designed patterns within the polymeric matrix by AC-electric-field-assisted alignment. Using pre-designed, dynamically controllable AC electric fields, the distribution of microparticles (acting as fillers) in the matrix was tuned to various patterns determined by the fields, such as linear and circular alignment. The field-oriented particle chains could act as endoskeletal structures, showing unique properties (i.e., mechanical, optical, and anisotropic properties) beyond those of conventional composites with randomly distributed particles.
