Results 1 - 20 of 7,836
1.
Cell ; 187(3): 526-544, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38306980

ABSTRACT

Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges are unsolved.


Subject(s)
Artificial Intelligence , Proteins , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Protein Engineering , Deep Learning
2.
Trends Genet ; 40(5): 383-386, 2024 May.
Article in English | MEDLINE | ID: mdl-38637270

ABSTRACT

Artificial intelligence (AI) in omics analysis raises privacy threats to patients. Here, we briefly discuss risk factors to patient privacy in data sharing, model training, and release, as well as methods to safeguard and evaluate patient privacy in AI-driven omics methods.


Subject(s)
Artificial Intelligence , Genomics , Humans , Genomics/methods , Privacy , Information Dissemination
3.
Trends Genet ; 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39117482

ABSTRACT

Harnessing cutting-edge technologies to enhance crop productivity is a pivotal goal in modern plant breeding. Artificial intelligence (AI) is renowned for its prowess in big data analysis and pattern recognition, and is revolutionizing numerous scientific domains including plant breeding. We explore the wider potential of AI tools in various facets of breeding, including data collection, unlocking genetic diversity within genebanks, and bridging the genotype-phenotype gap to facilitate crop breeding. This will enable the development of crop cultivars tailored to the projected future environments. Moreover, AI tools also hold promise for refining crop traits by improving the precision of gene-editing systems and predicting the potential effects of gene variants on plant phenotypes. Leveraging AI-enabled precision breeding can augment the efficiency of breeding programs and holds promise for optimizing cropping systems at the grassroots level. This entails identifying optimal inter-cropping and crop-rotation models to enhance agricultural sustainability and productivity in the field.

4.
Am J Hum Genet ; 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39146935

ABSTRACT

Large language models (LLMs) are generating interest in medical settings. For example, LLMs can respond coherently to medical queries by providing plausible differential diagnoses based on clinical notes. However, there are many questions to explore, such as evaluating differences between open- and closed-source LLMs as well as LLM performance on queries from both medical and non-medical users. In this study, we assessed multiple LLMs, including Llama-2-chat, Vicuna, Medllama2, Bard/Gemini, Claude, ChatGPT3.5, and ChatGPT-4, as well as non-LLM approaches (Google search and Phenomizer) regarding their ability to identify genetic conditions from textbook-like clinician questions and their corresponding layperson translations related to 63 genetic conditions. For open-source LLMs, larger models were more accurate than smaller LLMs: 7b, 13b, and larger than 33b parameter models obtained accuracy ranges from 21%-49%, 41%-51%, and 54%-68%, respectively. Closed-source LLMs outperformed open-source LLMs, with ChatGPT-4 performing best (89%-90%). Three of 11 LLMs and Google search had significant performance gaps between clinician and layperson prompts. We also evaluated how in-context prompting and keyword removal affected open-source LLM performance. Models were provided with 2 types of in-context prompts: list-type prompts, which improved LLM performance, and definition-type prompts, which did not. We further analyzed removal of rare terms from descriptions, which decreased accuracy for 5 of 7 evaluated LLMs. Finally, we observed much lower performance with real individuals' descriptions; LLMs answered these questions with a maximum 21% accuracy.

5.
Proc Natl Acad Sci U S A ; 121(16): e2303165121, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38607932

ABSTRACT

Antimicrobial resistance was estimated to be associated with 4.95 million deaths worldwide in 2019. It is possible to frame the antimicrobial resistance problem as a feedback-control problem. If we could optimize this feedback-control problem and translate our findings to the clinic, we could slow, prevent, or reverse the development of high-level drug resistance. Prior work on this topic has relied on systems where the exact dynamics and parameters were known a priori. In this study, we extend this work using a reinforcement learning (RL) approach capable of learning effective drug cycling policies in a system defined by empirically measured fitness landscapes. Crucially, we show that it is possible to learn effective drug cycling policies despite the problems of noisy, limited, or delayed measurement. Given access to a panel of 15 β-lactam antibiotics with which to treat the simulated Escherichia coli population, we demonstrate that RL agents outperform two naive treatment paradigms at minimizing the population fitness over time. We also show that RL agents approach the performance of the optimal drug cycling policy. Even when stochastic noise is introduced to the measurements of population fitness, we show that RL agents are capable of maintaining evolving populations at lower growth rates compared to controls. We further tested our approach in arbitrary fitness landscapes of up to 1,024 genotypes. We show that minimization of population fitness using drug cycles is not limited by increasing genome size. Our work represents a proof-of-concept for using AI to control complex evolutionary processes.
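
The feedback-control framing in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not name the RL algorithm, so tabular Q-learning is swapped in here, and the random landscape values, the 3-drug panel, the 4-locus genotype space, the greedy one-mutation transition rule, and all function names are hypothetical rather than taken from the study.

```python
import random

N_LOCI = 4                      # toy genome: 2**4 = 16 genotypes (assumption)
N_DRUGS = 3                     # hypothetical panel size, not the 15 drugs in the study
N_GENOTYPES = 2 ** N_LOCI


def make_landscapes(seed=0):
    """Random fitness per (drug, genotype); a stand-in for measured landscapes."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(N_GENOTYPES)] for _ in range(N_DRUGS)]


def step(genotype, drug, landscapes):
    """Assumed dynamics: move to the fittest of the current genotype and its
    one-mutation neighbours under the applied drug. Reward is negative fitness."""
    candidates = [genotype] + [genotype ^ (1 << i) for i in range(N_LOCI)]
    nxt = max(candidates, key=lambda g: landscapes[drug][g])
    return nxt, -landscapes[drug][nxt]


def train(landscapes, episodes=500, horizon=20, alpha=0.1, gamma=0.9,
          epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * N_DRUGS for _ in range(N_GENOTYPES)]
    for _ in range(episodes):
        state = 0                                   # start from the wild type
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(N_DRUGS)     # explore
            else:
                action = max(range(N_DRUGS), key=lambda a: q[state][a])  # exploit
            nxt, reward = step(state, action, landscapes)
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def mean_fitness(policy, landscapes, horizon=20):
    """Average population fitness when `policy(state)` picks the drug."""
    state, total = 0, 0.0
    for _ in range(horizon):
        state, reward = step(state, policy(state), landscapes)
        total -= reward
    return total / horizon


landscapes = make_landscapes()
q_table = train(landscapes)
learned = lambda s: max(range(N_DRUGS), key=lambda a: q_table[s][a])
single_drug = lambda s: 0                           # naive baseline: always drug 0
print("learned policy :", round(mean_fitness(learned, landscapes), 3))
print("single drug    :", round(mean_fitness(single_drug, landscapes), 3))
```

Lower mean fitness is better in this toy setting. Whether the learned policy beats the baseline depends on the random landscapes, so the printed comparison is a demonstration of the setup, not a reproduction of the reported result.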


Subject(s)
Anti-Infective Agents , Learning , Reinforcement, Psychology , Drug Resistance, Microbial , Bicycling , Escherichia coli/genetics
6.
Proc Natl Acad Sci U S A ; 121(18): e2307304121, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38640257

ABSTRACT

Over the past few years, machine learning models have significantly increased in size and complexity, especially in the area of generative AI such as large language models. These models require massive amounts of data and compute capacity to train, to the extent that concerns over the training data (such as protected or private content) cannot be practically addressed by retraining the model "from scratch" with the questionable data removed or altered. Furthermore, despite significant efforts and controls dedicated to ensuring that training corpora are properly curated and composed, the sheer volume required makes it infeasible to manually inspect each datum comprising a training corpus. One potential approach to training corpus data defects is model disgorgement, by which we broadly mean the elimination or reduction of not only any improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible use of intellectual property. In this paper, we survey the landscape of model disgorgement methods and introduce a taxonomy of disgorgement techniques that are applicable to modern ML systems. In particular, we investigate the various meanings of "removing the effects" of data on the trained model in a way that does not require retraining from scratch.


Subject(s)
Language , Machine Learning
7.
Hum Mol Genet ; 33(15): 1367-1377, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38704739

ABSTRACT

Spinal Muscular Atrophy is caused by partial loss of survival of motoneuron (SMN) protein expression. The numerous interaction partners and mechanisms influenced by SMN loss result in a complex disease. Current treatments restore SMN protein levels to a certain extent, but do not cure all symptoms. The prolonged survival of patients creates an increasing need for a better understanding of SMA. Although many SMN-protein interactions, dysregulated pathways, and organ phenotypes are known, the connections among them remain largely unexplored. Monogenic diseases are ideal examples for the exploration of cause-and-effect relationships to create a network describing the disease context. Machine learning tools can utilize such knowledge to analyze similarities between disease-relevant molecules and molecules not described in the disease so far. We used an artificial intelligence-based algorithm to predict new genes of interest. The transcriptional regulation of 8 out of 13 molecules selected from the predicted set was successfully validated in an SMA mouse model. This bioinformatic approach, using the given experimental knowledge for relevance predictions, enhances efficient targeted research in SMA and potentially in other disease settings.


Subject(s)
Artificial Intelligence , Computational Biology , Disease Models, Animal , Muscular Atrophy, Spinal , Muscular Atrophy, Spinal/genetics , Muscular Atrophy, Spinal/metabolism , Animals , Mice , Humans , Computational Biology/methods , Survival of Motor Neuron 1 Protein/genetics , Survival of Motor Neuron 1 Protein/metabolism , Machine Learning , Algorithms , Gene Expression Regulation/genetics
8.
J Cell Sci ; 137(3), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38324353

ABSTRACT

Fluorescence microscopy is essential for studying living cells, tissues and organisms. However, the fluorescent light that switches on fluorescent molecules also harms the samples, jeopardizing the validity of results - particularly in techniques such as super-resolution microscopy, which demands extended illumination. Artificial intelligence (AI)-enabled software capable of denoising, image restoration, temporal interpolation or cross-modal style transfer has great potential to rescue live imaging data and limit photodamage. Yet we believe the focus should be on maintaining light-induced damage at levels that preserve natural cell behaviour. In this Opinion piece, we argue that a shift in role for AIs is needed - AI should be used to extract rich insights from gentle imaging rather than recover compromised data from harsh illumination. Although AI can enhance imaging, our ultimate goal should be to uncover biological truths, not just retrieve data. It is essential to prioritize minimizing photodamage over merely pushing technical limits. Our approach is aimed towards gentle acquisition and observation of undisturbed living systems, aligning with the essence of live-cell fluorescence microscopy.


Subject(s)
Artificial Intelligence , Software , Microscopy, Fluorescence
9.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38886164

ABSTRACT

Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capture of a wide range of morphological features of cells or organisms in response to perturbations at single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have led to substantial improvements in analyzing large-scale high-content images at high throughput. These efforts have facilitated understanding of compound mechanism of action, drug repurposing, and characterization of cell morphodynamics under perturbation, and have ultimately contributed to the development of novel therapeutics. In this review, we provide a comprehensive overview of the recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place a particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we illuminate the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.


Subject(s)
Deep Learning , Drug Discovery , Drug Discovery/methods , Humans , Image Processing, Computer-Assisted/methods , Machine Learning
10.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701411

ABSTRACT

Cancer stem cells (CSCs) are a subpopulation of cancer cells within tumors that exhibit stem-like properties and represent a potentially effective therapeutic target toward long-term remission by means of differentiation induction. By leveraging an artificial intelligence approach solely based on transcriptomics data, this study scored a large library of small molecules based on their predicted ability to induce differentiation in stem-like cells. In particular, a deep neural network model was trained using publicly available single-cell RNA-Seq data obtained from untreated human-induced pluripotent stem cells at various differentiation stages and subsequently utilized to screen drug-induced gene expression profiles from the Library of Integrated Network-based Cellular Signatures (LINCS) database. The challenge of adapting such different data domains was tackled by devising an adversarial learning approach that was able to effectively identify and remove domain-specific bias during the training phase. Experimental validation in MDA-MB-231 and MCF7 cells demonstrated the efficacy of five out of six tested molecules among those scored highest by the model. In particular, the efficacy of triptolide, OTS-167, quinacrine, granisetron and A-443654 offers a potential avenue for targeted therapies against breast CSCs.


Subject(s)
Breast Neoplasms , Cell Differentiation , Neoplastic Stem Cells , Humans , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/drug effects , Neoplastic Stem Cells/pathology , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Breast Neoplasms/drug therapy , Cell Differentiation/drug effects , Female , Artificial Intelligence , Gene Expression Regulation, Neoplastic/drug effects , MCF-7 Cells , Cell Line, Tumor , Neural Networks, Computer , Gene Expression Profiling
11.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279651

ABSTRACT

Rare antinuclear antibody (ANA) pattern recognition has been a widely applied technology for routine ANA screening in clinical laboratories. In recent years, the application of deep learning methods in recognizing ANA patterns has witnessed remarkable advancements. However, the majority of studies in this field have primarily focused on the classification of the most common ANA patterns, while another subset has concentrated on the detection of mitotic metaphase cells. To date, no prior research has been specifically dedicated to the identification of rare ANA patterns. In the present paper, we introduce a novel attention-based enhancement framework, which was designed for the recognition of rare ANA patterns in ANA-indirect immunofluorescence images. More specifically, we selected the algorithm with the best performance as our target detection network by conducting comparative experiments. We then further developed and enhanced the chosen algorithm through a series of optimizations. An attention mechanism was then introduced to help the neural network learn faster and extract more essential and distinctive features for targets that belong to the specific patterns. The proposed approach achieved 86.40% precision, 82.75% recall, an 84.24% F1 score and 84.64% mean average precision on a 9-category rare ANA pattern detection task on our dataset. Finally, we evaluated the potential of the model as a medical technologist's assistant and observed that the technologist's performance improved after referring to the model's predictions. These promising results highlight its potential as an efficient and reliable tool to assist medical technologists in their clinical practice.
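
As a reading aid for the metrics quoted in this abstract, the sketch below computes the F1 score as the harmonic mean of precision and recall. The function name is illustrative. Applied to the aggregate precision and recall, the formula gives a value slightly above the reported 84.24%; one plausible explanation, which the abstract does not confirm, is that the reported F1 was averaged over the nine categories rather than derived from the aggregate figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Aggregate figures quoted in the abstract, in percent.
reported_precision = 86.40
reported_recall = 82.75
reported_f1 = 84.24

f1_from_aggregates = f1_score(reported_precision, reported_recall)
gap = f1_from_aggregates - reported_f1
print(f"F1 from aggregate precision/recall: {f1_from_aggregates:.2f}")
print(f"Reported F1: {reported_f1:.2f} (gap {gap:.2f})")
```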


Subject(s)
Algorithms , Antibodies, Antinuclear , Fluorescent Antibody Technique, Indirect/methods
12.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38960405

ABSTRACT

Plasmids are extrachromosomal DNA found in microorganisms. They often carry beneficial genes that help bacteria adapt to harsh conditions. Plasmids are also important tools in genetic engineering, gene therapy, and drug production. However, it can be difficult to identify plasmid sequences from chromosomal sequences in genomic and metagenomic data. Here, we have developed a new tool called PlasmidHunter, which uses machine learning to predict plasmid sequences based on gene content profile. PlasmidHunter can achieve high accuracies (up to 97.6%) and high speeds in benchmark tests including both simulated contigs and real metagenomic plasmidome data, outperforming other existing tools.


Subject(s)
Machine Learning , Plasmids , Plasmids/genetics , Sequence Analysis, DNA/methods , Software , Computational Biology/methods , Algorithms
13.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38960407

ABSTRACT

The optimization of therapeutic antibodies through traditional techniques, such as candidate screening via hybridoma or phage display, is resource-intensive and time-consuming. In recent years, computational and artificial intelligence-based methods have been actively developed to accelerate and improve the development of therapeutic antibodies. In this study, we developed an end-to-end sequence-based deep learning model, termed AttABseq, to predict changes in antigen-antibody binding affinity caused by antibody mutations. AttABseq is a highly efficient and generic attention-based model that takes diverse antigen-antibody complex sequences as input to predict the binding affinity changes of residue mutations. Assessment on three benchmark datasets shows that AttABseq is 120% more accurate than other sequence-based models in terms of the Pearson correlation coefficient between the predicted and experimental binding affinity changes. Moreover, AttABseq also either outperforms or competes favorably with the structure-based approaches. Furthermore, AttABseq consistently demonstrates robust predictive capabilities across a diverse array of conditions, underscoring its remarkable capacity for generalization across a wide spectrum of antigen-antibody complexes. It imposes no constraints on the quantity of altered residues, rendering it particularly applicable in scenarios where crystallographic structures remain unavailable. The attention-based interpretability analysis indicates that the causal effects of point mutations on antibody-antigen binding affinity changes can be visualized at the residue level, which might assist automated antibody sequence optimization. We believe that AttABseq provides a fiercely competitive answer to therapeutic antibody optimization.
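
The evaluation metric named in this abstract, the Pearson correlation coefficient between predicted and experimental binding affinity changes, can be sketched as follows. The affinity values are made up for illustration and are not data from the paper. The helper `relative_gain_percent` shows one way a figure such as "120% more accurate" could be read, as a relative gain in the correlation coefficient; the abstract does not state how the figure was computed, so that reading is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def relative_gain_percent(new_r, baseline_r):
    """Relative improvement of one correlation over another, in percent."""
    return 100.0 * (new_r - baseline_r) / baseline_r


# Hypothetical affinity changes (kcal/mol), for illustration only.
experimental_ddg = [0.5, -1.2, 2.0, 0.1, 1.4]
predicted_ddg = [0.7, -0.9, 1.6, 0.3, 1.0]

r = pearson_r(predicted_ddg, experimental_ddg)
print(f"Pearson r = {r:.3f}")
# Hypothetical baseline r = 0.25 versus new r = 0.55.
print(f"relative gain = {relative_gain_percent(0.55, 0.25):.0f}%")
```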


Subject(s)
Antigen-Antibody Complex , Deep Learning , Antigen-Antibody Complex/chemistry , Antigens/chemistry , Antigens/genetics , Antigens/metabolism , Antigens/immunology , Antibody Affinity , Amino Acid Sequence , Computational Biology/methods , Humans , Mutation , Antibodies/chemistry , Antibodies/immunology , Antibodies/genetics , Antibodies/metabolism
14.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39129360

ABSTRACT

The genetic blueprint for the essential functions of life is encoded in DNA, which is translated into proteins, the engines driving most of our metabolic processes. Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared with the massive search space of all possible amino acid sequences, the set of known functional families is minimal. One could say nature has a limited protein "vocabulary." A major question for computational biologists, therefore, is whether this vocabulary can be expanded to include useful proteins that went extinct long ago or have never evolved (yet). By merging evolutionary algorithms, machine learning, and bioinformatics, we can develop highly customized "designer proteins." We dub the new subfield of computational evolution, which employs evolutionary algorithms with DNA string representations, biologically accurate molecular evolution, and bioinformatics-informed fitness functions, Evolutionary Algorithms Simulating Molecular Evolution.
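
To make the idea of an evolutionary algorithm over DNA string representations concrete, here is a generic toy sketch in Python. It is not the authors' framework. The uniform point-mutation model, the truncation selection, the fitness function (fraction of residues matching a hypothetical target peptide) and all names are simplifying assumptions standing in for the biologically accurate evolution models and bioinformatics-informed fitness functions the abstract refers to. Only the codon table is standard (the standard genetic code in TCAG order).

```python
import random

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def fitness(dna, target):
    """Toy fitness: fraction of positions where the protein matches the target."""
    protein = translate(dna)
    return sum(p == t for p, t in zip(protein, target)) / len(target)


def mutate(dna, rate, rng):
    """Uniform point mutations; a drawn base may equal the original one."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in dna)


def evolve(target, pop_size=60, generations=200, rate=0.02, seed=0):
    """Truncation selection with elitism over a population of DNA strings."""
    rng = random.Random(seed)
    length = 3 * len(target)
    population = ["".join(rng.choice(BASES) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda d: fitness(d, target), reverse=True)
        if fitness(population[0], target) == 1.0:
            break
        parents = population[:pop_size // 4]          # keep the top quarter
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda d: fitness(d, target))


target_peptide = "MKVLA"                               # hypothetical target
best = evolve(target_peptide)
print(best, translate(best), fitness(best, target_peptide))
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.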


Subject(s)
Algorithms , Computational Biology , Evolution, Molecular , Computational Biology/methods , Proteins/genetics , Proteins/chemistry , Proteins/metabolism , Computer Simulation
15.
Brief Bioinform ; 25(Supplement_1), 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041915

ABSTRACT

This manuscript describes the development of a resources module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on implementing deep learning algorithms for biomedical image data in an interactive format that uses appropriate cloud resources for data access and analyses. Biomedical-related datasets are widely used in both research and clinical settings, but the ability of professionally trained clinicians and researchers to interpret datasets diminishes as the size and breadth of these datasets increase. Artificial intelligence, and specifically deep learning neural networks, have recently become an important tool in novel biomedical research. However, use is limited due to their computational requirements and confusion regarding different neural network architectures. The goal of this learning module is to introduce types of deep learning neural networks and cover practices that are commonly used in biomedical research. This module is subdivided into four submodules that cover classification, augmentation, segmentation and regression. Each complementary submodule was written on the Google Cloud Platform and contains detailed code and explanations, as well as quizzes and challenges to facilitate user training. Overall, the goal of this learning module is to enable users to identify and integrate the correct type of neural network with their data while highlighting the ease-of-use of cloud computing for implementing neural networks.


Subject(s)
Deep Learning , Neural Networks, Computer , Humans , Biomedical Research , Algorithms , Cloud Computing
16.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38856172

ABSTRACT

With their diverse biological activities, peptides are promising candidates for therapeutic applications, showing antimicrobial, antitumour and hormonal signalling capabilities. Despite their advantages, therapeutic peptides face challenges such as short half-life, limited oral bioavailability and susceptibility to plasma degradation. The rise of computational tools and artificial intelligence (AI) in peptide research has spurred the development of advanced methodologies and databases that are pivotal in the exploration of these complex macromolecules. This perspective delves into integrating AI in peptide development, encompassing classifier methods, predictive systems and the avant-garde design facilitated by deep-generative models like generative adversarial networks and variational autoencoders. There are still challenges, such as the need for processing optimization and careful validation of predictive models. This work outlines traditional strategies for machine learning model construction and training techniques and proposes a comprehensive AI-assisted peptide design and validation pipeline. The evolving landscape of peptide design using AI is emphasized, showcasing the practicality of these methods in expediting the development and discovery of novel peptides within the context of peptide-based drug discovery.


Subject(s)
Artificial Intelligence , Drug Discovery , Peptides , Peptides/chemistry , Peptides/therapeutic use , Peptides/pharmacology , Drug Discovery/methods , Humans , Drug Design , Machine Learning , Computational Biology/methods
17.
Mol Cell Proteomics ; 23(7): 100798, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38871251

ABSTRACT

Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.


Subject(s)
Peptides , Proteomics , Proteomics/methods , Peptides/metabolism , Peptides/chemistry , Humans , Databases, Protein , Mass Spectrometry/methods , Search Engine
18.
Mol Cell Proteomics ; 23(3): 100737, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38354979

ABSTRACT

Personalized medicine can reduce adverse effects, enhance drug efficacy, and optimize treatment outcomes, which represents the essence of personalized medicine in the pharmacy field. Protein drugs are currently the mainstay of personalized drug therapy: they possess higher target specificity and biological activity than small-molecule chemical drugs, which makes them efficient in regulating disease-related biological processes and gives them significant potential in the development of personalized drugs. Currently, protein drugs are designed and developed for specific protein targets based on patient-specific protein data. However, due to the rapid development of two-dimensional gel electrophoresis and mass spectrometry, it is now widely recognized that a canonical protein actually includes multiple proteoforms, and the differences between these proteoforms will result in varying responses to drugs. The variation in the effects of different proteoforms can be significant and the impact can even alter the intended benefit of a drug, potentially making it harmful instead of lifesaving. As a result, we propose that protein drugs should shift from being targeted through the lens of protein (proteomics) to being targeted through the lens of proteoform (proteoformics). This will enable the development of personalized protein drugs that are better equipped to meet patients' specific needs and disease characteristics. With further development in the field of proteoformics, individualized drug therapy, especially personalized protein drugs aimed at proteoforms as drug targets, will improve the understanding of disease mechanisms, support the discovery of new drug targets and signaling pathways, provide a theoretical basis for the development of new drugs, aid doctors in conducting health risk assessments and devising more cost-effective targeted prevention strategies with the help of artificial intelligence/machine learning, promote technological innovation, and provide more convenient treatment tailored to individual patient profiles, which will benefit the affected individuals and society at large.


Subject(s)
Artificial Intelligence , Proteomics , Humans , Proteomics/methods , Precision Medicine , Mass Spectrometry
19.
Plant J ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38976238

ABSTRACT

Plants produce a staggering array of chemicals that are the basis for organismal function and important human nutrients and medicines. However, it is poorly defined how these compounds evolved and are distributed across the plant kingdom, hindering a systematic view and understanding of plant chemical diversity. Recent advances in plant genome/transcriptome sequencing have provided a well-defined molecular phylogeny of plants, on which the presence of diverse natural products can be mapped to systematically determine their phylogenetic distribution. Here, we built a proof-of-concept workflow where previously reported diverse tyrosine-derived plant natural products were mapped onto the plant tree of life. Plant chemical-species associations were mined from literature, filtered, evaluated through manual inspection of over 2500 scientific articles, and mapped onto the plant phylogeny. The resulting "phylochemical" map confirmed several highly lineage-specific compound class distributions, such as betalain pigments and Amaryllidaceae alkaloids. The map also highlighted several lineages enriched in dopamine-derived compounds, including the orders Caryophyllales, Liliales, and Fabales. Additionally, the application of large language models, using our manually curated data as a ground truth set, showed that post-mining processing can largely be automated with a low false-positive rate, critical for generating a reliable phylochemical map. Although a high false-negative rate remains a challenge, our study demonstrates that combining text mining with language model-based processing can generate broader phylochemical maps, which will serve as a valuable community resource to uncover key evolutionary events that underlie plant chemical diversity and enable system-level views of nature's millions of years of chemical experimentation.

20.
Plant J ; 2024 Aug 17.
Article in English | MEDLINE | ID: mdl-39152709

ABSTRACT

Structural prediction by artificial intelligence can be a powerful new instrument for discovering novel protein-protein interactions, but the community still grapples with its implementation, opportunities and limitations. Here, we discuss and re-analyse our in silico screen for novel pathogen-secreted inhibitors of immune hydrolases to illustrate the power and limitations of structural predictions. We discuss strategies of curating sequences, including controls, and reusing sequence alignments and highlight important limitations caused by different platforms, sequence depth and computing times. We hope these experiences will support similar interactomic screens by the research community.
