Search | Biblioteca Virtual en Salud Odontología. Uruguay

1.

Mining human microbiomes reveals an untapped source of peptide antibiotics.

Torres, Marcelo D T; Brooks, Erin F; Cesaro, Angela; Sberro, Hila; Gill, Matthew O; Nicolaou, Cosmos; Bhatt, Ami S; de la Fuente-Nunez, Cesar.

Cell ; 187(19): 5453-5467.e15, 2024 Sep 19.

Article in English | MEDLINE | ID: mdl-39163860

ABSTRACT

Drug-resistant bacteria are outpacing traditional antibiotic discovery efforts. Here, we computationally screened 444,054 previously reported putative small protein families from 1,773 human metagenomes for antimicrobial properties, identifying 323 candidates encoded in small open reading frames (smORFs). To test our computational predictions, 78 peptides were synthesized and screened for antimicrobial activity in vitro, with 70.5% displaying antimicrobial activity. As these compounds were different compared with previously reported antimicrobial peptides, we termed them smORF-encoded peptides (SEPs). SEPs killed bacteria by targeting their membrane, synergizing with each other, and modulating gut commensals, indicating a potential role in reconfiguring microbiome communities in addition to counteracting pathogens. The lead candidates were anti-infective in both murine skin abscess and deep thigh infection models. Notably, prevotellin-2 from Prevotella copri presented activity comparable to the commonly used antibiotic polymyxin B. Our report supports the existence of hundreds of antimicrobials in the human microbiome amenable to clinical translation.

Subject(s)

Anti-Bacterial Agents , Antimicrobial Peptides , Microbiota , Humans , Animals , Mice , Anti-Bacterial Agents/pharmacology , Microbiota/drug effects , Antimicrobial Peptides/pharmacology , Antimicrobial Peptides/chemistry , Metagenome , Female , Open Reading Frames , Bacteria/drug effects , Bacteria/genetics , Bacteria/classification , Prevotella/drug effects

2.

An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies.

Wang, Yiquan; Lv, Huibin; Teo, Qi Wen; Lei, Ruipeng; Gopal, Akshita B; Ouyang, Wenhao O; Yeung, Yuen-Hei; Tan, Timothy J C; Choi, Danbi; Shen, Ivana R; Chen, Xin; Graham, Claire S; Wu, Nicholas C.

Immunity ; 2024 Aug 15.

Article in English | MEDLINE | ID: mdl-39163866

ABSTRACT

Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and the inaccessibility of datasets for model training. In this study, we curated >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM could identify key sequence features of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of the antibody response to the influenza virus but also provides a valuable resource for applying deep learning to antibody research.

3.

Disease Heritability Inferred from Familial Relationships Reported in Medical Records.

Polubriaginof, Fernanda C G; Vanguri, Rami; Quinnies, Kayla; Belbin, Gillian M; Yahi, Alexandre; Salmasian, Hojjat; Lorberbaum, Tal; Nwankwo, Victor; Li, Li; Shervey, Mark M; Glowe, Patricia; Ionita-Laza, Iuliana; Simmerling, Mary; Hripcsak, George; Bakken, Suzanne; Goldstein, David; Kiryluk, Krzysztof; Kenny, Eimear E; Dudley, Joel; Vawdrey, David K; Tatonetti, Nicholas P.

Cell ; 173(7): 1692-1704.e11, 2018 06 14.

Article in English | MEDLINE | ID: mdl-29779949

ABSTRACT

Heritability is essential for understanding the biological causes of disease but requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHRs) passively capture a wide range of clinically relevant data and provide a resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now, these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified 7.4 million familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with the literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a validation of the use of EHRs for genetics and disease research.

Subject(s)

Electronic Health Records , Genetic Diseases, Inborn/genetics , Algorithms , Databases, Factual , Family Relations , Genetic Diseases, Inborn/pathology , Genotype , Humans , Pedigree , Phenotype , Quantitative Trait, Heritable

4.

A large-scale systematic survey reveals recurring molecular features of public antibody responses to SARS-CoV-2.

Wang, Yiquan; Yuan, Meng; Lv, Huibin; Peng, Jian; Wilson, Ian A; Wu, Nicholas C.

Immunity ; 55(6): 1105-1117.e4, 2022 06 14.

Article in English | MEDLINE | ID: mdl-35397794

ABSTRACT

Global research to combat the COVID-19 pandemic has led to the isolation and characterization of thousands of human antibodies to the SARS-CoV-2 spike protein, providing an unprecedented opportunity to study the antibody response to a single antigen. Using the information derived from 88 research publications and 13 patents, we assembled a dataset of ~8,000 human antibodies to the SARS-CoV-2 spike protein from >200 donors. By analyzing immunoglobulin V and D gene usages, complementarity-determining region H3 sequences, and somatic hypermutations, we demonstrated that the common (public) responses to different domains of the spike protein were quite different. We further used these sequences to train a deep-learning model to accurately distinguish between the human antibodies to SARS-CoV-2 spike protein and those to influenza hemagglutinin protein. Overall, this study provides an informative resource for antibody research and enhances our molecular understanding of public antibody responses.

Subject(s)

COVID-19 , SARS-CoV-2 , Antibodies, Neutralizing , Antibodies, Viral , Antibody Formation , Humans , Pandemics , Spike Glycoprotein, Coronavirus

5.

A basal-level activity of ATR links replication fork surveillance and stress response.

Yin, Yandong; Lee, Wei Ting Chelsea; Gupta, Dipika; Xue, Huijun; Tonzi, Peter; Borowiec, James A; Huang, Tony T; Modesti, Mauro; Rothenberg, Eli.

Mol Cell ; 81(20): 4243-4257.e6, 2021 10 21.

Article in English | MEDLINE | ID: mdl-34473946

ABSTRACT

Mammalian cells use diverse pathways to prevent deleterious consequences during DNA replication, yet the mechanism by which cells survey individual replisomes to detect spontaneous replication impediments at the basal level, and their accumulation during replication stress, remain undefined. Here, we used single-molecule localization microscopy coupled with high-order-correlation image-mining algorithms to quantify the composition of individual replisomes in single cells during unperturbed replication and under replicative stress. We identified a basal-level activity of ATR that monitors and regulates the amounts of RPA at forks during normal replication. Replication-stress amplifies the basal activity through the increased volume of ATR-RPA interaction and diffusion-driven enrichment of ATR at forks. This localized crowding of ATR enhances its collision probability, stimulating the activation of its replication-stress response. Finally, we provide a computational model describing how the basal activity of ATR is amplified to produce its canonical replication stress response.

Subject(s)

Ataxia Telangiectasia Mutated Proteins/metabolism , DNA Replication , DNA, Neoplasm/biosynthesis , Algorithms , Ataxia Telangiectasia Mutated Proteins/genetics , Cell Line, Tumor , Checkpoint Kinase 1/genetics , Checkpoint Kinase 1/metabolism , DNA, Neoplasm/genetics , Humans , Image Processing, Computer-Assisted , Kinetics , Mutation , Phosphorylation , Replication Protein A/genetics , Replication Protein A/metabolism , Single Molecule Imaging

6.

Literature-based predictions of Mendelian disease therapies.

Deisseroth, Cole A; Lee, Won-Seok; Kim, Jiyoen; Jeong, Hyun-Hwan; Dhindsa, Ryan S; Wang, Julia; Zoghbi, Huda Y; Liu, Zhandong.

Am J Hum Genet ; 110(10): 1661-1672, 2023 10 05.

Article in English | MEDLINE | ID: mdl-37741276

ABSTRACT

In the effort to treat Mendelian disorders, correcting the underlying molecular imbalance may be more effective than symptomatic treatment. Identifying treatments that might accomplish this goal requires extensive and up-to-date knowledge of molecular pathways, including drug-gene and gene-gene relationships. To address this challenge, we present "parsing modifiers via article annotations" (PARMESAN), a computational tool that searches PubMed and PubMed Central for information to assemble these relationships into a central knowledge base. PARMESAN then predicts putatively novel drug-gene relationships, assigning an evidence-based score to each prediction. We compare PARMESAN's drug-gene predictions to all of the drug-gene relationships displayed by the Drug-Gene Interaction Database (DGIdb) and show that higher-scoring relationship predictions are more likely to match the directionality (up- versus down-regulation) indicated by this database. PARMESAN had more than 200,000 drug predictions scoring above 8 (as one example cutoff), for more than 3,700 genes. Among these predicted relationships, 210 were registered in DGIdb and 201 (96%) had matching directionality. This publicly available tool provides an automated way to prioritize drug screens to target the most-promising drugs to test, thereby saving time and resources in the development of therapeutics for genetic disorders.

Subject(s)

PubMed , Humans , Databases, Factual

7.

The rise of taxon-specific epitope predictors.

Campelo, Felipe; Lobo, Francisco P.

Brief Bioinform ; 25(2)2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38493292

ABSTRACT

Computational predictors of immunogenic peptides, or epitopes, are traditionally built based on data from a broad range of pathogens without consideration for taxonomic information. While this approach may be reasonable if one aims to develop one-size-fits-all models, it may be counterproductive if the proteins for which the model is expected to generalize are known to come from a specific subset of phylogenetically related pathogens. There is mounting evidence that, for these cases, taxon-specific models can outperform generalist ones, even when trained with substantially smaller amounts of data. In this comment, we provide some perspective on the current state of taxon-specific modelling for the prediction of linear B-cell epitopes, and the challenges faced when building and deploying these predictors.

Subject(s)

Peptides , Proteins , Amino Acid Sequence , Epitopes, B-Lymphocyte

8.

Multi-omic analysis tools for microbial metabolites prediction.

Wu, Shengbo; Zhou, Haonan; Chen, Danlei; Lu, Yutong; Li, Yanni; Qiao, Jianjun.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-38859767

ABSTRACT

How to resolve the metabolic dark matter of microorganisms has long been a challenging problem in discovering active molecules. Diverse omics tools have been developed to guide the discovery and characterization of various microbial metabolites, which is gradually making it possible to predict the overall metabolites of individual strains. The combination of multi-omic analysis tools effectively compensates for the shortcomings of current studies that focus only on single omics or a broad class of metabolites. In this review, we systematically update, categorize and sort the analysis tools for microbial metabolite prediction from the last five years, and call for multi-omic combination in understanding the metabolic nature of microbes. First, we provide a general survey of updated prediction databases, webservers, and software based on genomics, transcriptomics, proteomics, and metabolomics, respectively. Then, we discuss the need to integrate multi-omics data to predict the metabolites of different microbial strains and communities, and stress the combination with other techniques, such as systems biology methods and data-driven algorithms. Finally, we identify key challenges and trends in developing multi-omic analysis tools for more comprehensive prediction of diverse microbial metabolites that contribute to human health and disease treatment.

Subject(s)

Metabolomics , Software , Metabolomics/methods , Genomics/methods , Proteomics/methods , Humans , Computational Biology/methods , Bacteria/metabolism , Bacteria/genetics , Bacteria/classification , Metabolome , Algorithms , Multiomics

9.

ChIP-GPT: a managed large language model for robust data extraction from biomedical database records.

Cinquin, Olivier.

Brief Bioinform ; 25(2)2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38314912

ABSTRACT

Increasing volumes of biomedical data are amassing in databases. Large-scale analyses of these data have wide-ranging applications in biology and medicine. Such analyses require tools to characterize and process entries at scale. However, existing tools, mainly centered on extracting predefined fields, often fail to comprehensively process database entries or correct evident errors, a task humans can easily perform. These tools also lack the ability to reason like domain experts, hindering their robustness and analytical depth. Recent advances with large language models (LLMs) provide a fundamentally new way to query databases. But while a tool such as ChatGPT is adept at answering questions about manually input records, challenges arise when scaling up this process. First, interactions with the LLM need to be automated. Second, limitations on input length may require a record pruning or summarization pre-processing step. Third, to behave reliably as desired, the LLM needs either well-designed, short, 'few-shot' examples, or fine-tuning based on a larger set of well-curated examples. Here, we report ChIP-GPT, based on fine-tuning of the generative pre-trained transformer (GPT) model Llama and on a program prompting the model iteratively and handling its generation of answer text. This model is designed to extract metadata from the Sequence Read Archive, emphasizing the identification of chromatin immunoprecipitation (ChIP) targets and cell lines. When trained with 100 examples, ChIP-GPT demonstrates 90-94% accuracy. Notably, it can seamlessly extract data from records with typos or absent field labels. Our proposed method is easily adaptable to customized questions and different databases.

Subject(s)

Medicine , Humans , Cell Line , Chromatin Immunoprecipitation , Databases, Factual , Language

10.

IDPpub: Illuminating the Dark Phosphoproteome Through PubMed Mining.

Savage, Sara R; Zhang, Yaoyun; Jaehnig, Eric J; Liao, Yuxing; Shi, Zhiao; Pham, Huy Anh; Xu, Hua; Zhang, Bing.

Mol Cell Proteomics ; 23(1): 100682, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37993103

ABSTRACT

Global phosphoproteomics experiments quantify tens of thousands of phosphorylation sites. However, data interpretation is hampered by our limited knowledge on functions, biological contexts, or precipitating enzymes of the phosphosites. This study establishes a repository of phosphosites with associated evidence in biomedical abstracts, using deep learning-based natural language processing techniques. Our model for illuminating the dark phosphoproteome through PubMed mining (IDPpub) was generated by fine-tuning BioBERT, a deep learning tool for biomedical text mining. Trained using sentences containing protein substrates and phosphorylation site positions from 3000 abstracts, the IDPpub model was then used to extract phosphorylation sites from all MEDLINE abstracts. The extracted proteins were normalized to gene symbols using the National Center for Biotechnology Information gene query, and sites were mapped to human UniProt sequences using ProtMapper and mouse UniProt sequences by direct match. Precision and recall were calculated using 150 curated abstracts, and utility was assessed by analyzing the CPTAC (Clinical Proteomics Tumor Analysis Consortium) pan-cancer phosphoproteomics datasets and the PhosphoSitePlus database. Using 10-fold cross validation, pairs of correct substrates and phosphosite positions were extracted with an average precision of 0.93 and recall of 0.94. After entity normalization and site mapping to human reference sequences, an independent validation achieved a precision of 0.91 and recall of 0.77. The IDPpub repository contains 18,458 unique human phosphorylation sites with evidence sentences from 58,227 abstracts and 5918 mouse sites in 14,610 abstracts. This included evidence sentences for 1803 sites identified in CPTAC studies that are not covered by manually curated functional information in PhosphoSitePlus. Evaluation results demonstrate the potential of IDPpub as an effective biomedical text mining tool for collecting phosphosites. Moreover, the repository (http://idppub.ptmax.org), which can be automatically updated, can serve as a powerful complement to existing resources.

Subject(s)

Data Mining , Natural Language Processing , Humans , Data Mining/methods , Databases, Factual , PubMed

11.

Mining in space could spur sustainable growth.

Fleming, Maxwell; Lange, Ian; Shojaeinia, Sayeh; Stuermer, Martin.

Proc Natl Acad Sci U S A ; 120(43): e2221345120, 2023 Oct 24.

Article in English | MEDLINE | ID: mdl-37844231

ABSTRACT

Growth models with resources and environmental externalities typically assume that planet Earth is a closed economy. However, private firms like Blue Origin and SpaceX have reduced the cost of rocket launches by a factor of 20 over the last decade. What if these costs continue to decline, making mining from asteroids or the moon feasible? What would be the implications for economic growth and the environment? This paper provides stylized facts about cost trends, geology, and the environmental impact of mining on Earth and potentially in Space. We extend a neoclassical growth model to investigate the transition from mining on Earth to Space. We find that such a transition could potentially allow for continued growth of metal use, while limiting environmental and social costs on Earth. Acknowledging the high uncertainty around the topic, our paper provides a starting point for research on how Space mining could contribute to sustainable growth on Earth.

12.

Resistance gene-guided genome mining reveals the roseopurpurins as inhibitors of cyclin-dependent kinases.

Dunbar, Kyle L; Perlatti, Bruno; Liu, Nicholas; Cornelius, Amber; Mummau, Daniel; Chiang, Yi-Ming; Hon, Lawrence; Nimavat, Monika; Pallas, Jason; Kordes, Sina; Ng, Ho Leung; Harvey, Colin J B.

Proc Natl Acad Sci U S A ; 120(48): e2310522120, 2023 Nov 28.

Article in English | MEDLINE | ID: mdl-37983497

ABSTRACT

With the significant increase in the availability of microbial genome sequences in recent years, resistance gene-guided genome mining has emerged as a powerful approach for identifying natural products with specific bioactivities. Here, we present the use of this approach to reveal the roseopurpurins as potent inhibitors of cyclin-dependent kinases (CDKs), a class of cell cycle regulators implicated in multiple cancers. We identified a biosynthetic gene cluster (BGC) with a putative resistance gene with homology to human CDK2. Using targeted gene disruption and transcription factor overexpression in Aspergillus uvarum, and heterologous expression of the BGC in Aspergillus nidulans, we demonstrated that roseopurpurin C (1) is produced by this cluster and characterized its biosynthesis. We determined the potency, specificity, and mechanism of action of 1 as well as multiple intermediates and shunt products produced from the BGC. We show that 1 inhibits human CDK2 with a Kiapp of 44 nM, demonstrates selectivity for clinically relevant members of the CDK family, and induces G1 cell cycle arrest in HCT116 cells. Structural analysis of 1 complexed with CDK2 revealed the molecular basis of ATP-competitive inhibition.

Subject(s)

Cyclin-Dependent Kinases , Neoplasms , Humans , Cyclin-Dependent Kinases/metabolism , Cyclin-Dependent Kinase 2/genetics , Cyclins/metabolism , Cell Cycle/genetics , Enzyme Inhibitors

13.

China, the Democratic Republic of the Congo, and artisanal cobalt mining from 2000 through 2020.

Gulley, Andrew L.

Proc Natl Acad Sci U S A ; 120(26): e2212037120, 2023 06 27.

Article in English | MEDLINE | ID: mdl-37339197

ABSTRACT

From 2000 through 2020, demand for cobalt to manufacture batteries grew 26-fold. Eighty-two percent of this growth occurred in China and China's cobalt refinery production increased 78-fold. Diminished industrial cobalt mine production in the early-to-mid 2000s led many Chinese companies to purchase ores from artisanal cobalt miners in the Democratic Republic of the Congo (DRC), many of whom have been found to be children. Despite extensive research on artisanal cobalt mining, fundamental questions about its production remain unanswered. This gap is addressed here by estimating artisanal cobalt production, processing, and trade. The results show that, while total DRC cobalt mine production grew from 11,000 metric tons (t) in 2000 to 98,000 t in 2020, artisanal production only grew from 1,000 to 2,000 t in 2000 to 9,000 to 11,000 t in 2020 (with a peak of 17,000 to 21,000 t in 2018). Artisanal production's share of world and DRC cobalt mine production peaked around 2008 at 18 to 23% and 40 to 53%, respectively, before trending down to 6 to 8% and 9 to 11% in 2020, respectively. Artisanal production was chiefly exported to China or processed within the DRC by Chinese firms. An average of 72 to 79% of artisanal production was processed at facilities within the DRC from 2016 through 2020. As such, these facilities may be potential monitoring points for artisanal production and its downstream consumers. This finding may help to support responsible sourcing initiatives and better address abuses related to artisanal cobalt mining by focusing local efforts at the artisanal processing facilities through which most artisanal cobalt production flows.

Subject(s)

Cobalt , Mining , Humans , Child , Democratic Republic of the Congo , Industry , China

14.

The pseudotorsional space of RNA.

Grille, Leandro; Gallego, Diego; Darré, Leonardo; da Rosa, Gabriela; Battistini, Federica; Orozco, Modesto; Dans, Pablo D.

RNA ; 29(12): 1896-1909, 2023 12.

Article in English | MEDLINE | ID: mdl-37793790

ABSTRACT

The characterization of the conformational landscape of the RNA backbone is rather complex due to the ability of RNA to assume a large variety of conformations. These backbone conformations can be depicted by pseudotorsional angles linking RNA backbone atoms, from which Ramachandran-like plots can be built. We explore here different definitions of these pseudotorsional angles, finding that the most accurate ones are the traditional η (eta) and θ (theta) angles, which represent the relative position of RNA backbone atoms P and C4'. We explore the distribution of η - θ in known experimental structures, comparing the pseudotorsional space generated with structures determined exclusively by one experimental technique. We found that the complete picture only appears when combining data from different sources. The maps provide a quite comprehensive representation of the RNA accessible space, which can be used in RNA-structural predictions. Finally, our results highlight that protein interactions lead to significant changes in the population of the η - θ space, pointing toward the role of induced-fit mechanisms in protein-RNA recognition.

Subject(s)

Proteins , RNA , RNA/genetics , RNA/chemistry , Proteins/chemistry , Nucleic Acid Conformation
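
The η and θ pseudotorsions named in record 14 are ordinary dihedral angles taken over four backbone atoms. The following is a minimal Python sketch, assuming the commonly used definitions η = C4'(i-1), P(i), C4'(i), P(i+1) and θ = P(i), C4'(i), P(i+1), C4'(i+1); the abstract itself only states that the angles relate the P and C4' positions. The function names and the plain coordinate-tuple inputs are illustrative and are not taken from the paper.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)   # central bond
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

def eta(c4_prev, p, c4, p_next):
    """Assumed definition: C4'(i-1), P(i), C4'(i), P(i+1)."""
    return dihedral(c4_prev, p, c4, p_next)

def theta(p, c4, p_next, c4_next):
    """Assumed definition: P(i), C4'(i), P(i+1), C4'(i+1)."""
    return dihedral(p, c4, p_next, c4_next)
```

Computing both angles for every internal nucleotide of a structure and plotting η against θ would give the kind of Ramachandran-like map the abstract refers to.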

15.

OTTM: an automated classification tool for translational drug discovery from omics data.

Yang, Xiaobo; Zhang, Bei; Wang, Siqi; Lu, Ye; Chen, Kaixian; Luo, Cheng; Sun, Aihua; Zhang, Hao.

Brief Bioinform ; 24(5)2023 09 20.

Article in English | MEDLINE | ID: mdl-37594310

ABSTRACT

Omics data from clinical samples are the predominant source of target discovery and drug development. Typically, hundreds or thousands of differentially expressed genes or proteins can be identified from omics data. This scale of possibilities is overwhelming for target discovery and validation using biochemical or cellular experiments. Most of these proteins and genes have no corresponding drugs or even active compounds. Moreover, a proportion of them may have been previously reported as being relevant to the disease of interest. To facilitate translational drug discovery from omics data, we have developed a new classification tool named Omics and Text driven Translational Medicine (OTTM). This tool can markedly narrow the range of proteins or genes that merit further validation via drug availability assessment and literature mining. For the 4489 candidate proteins identified in our previous proteomics study, OTTM recommended 40 FDA-approved or clinical trial drugs. Of these, 15 are available commercially and were tested on hepatocellular carcinoma Hep-G2 cells. Two drugs, tafenoquine succinate (an FDA-approved antimalarial drug targeting CYC1) and branaplam (a Phase 3 clinical drug targeting SMN1 for the treatment of spinal muscular atrophy), showed potent inhibitory activity against Hep-G2 cell viability, suggesting that CYC1 and SMN1 may be potential therapeutic target proteins for hepatocellular carcinoma. In summary, OTTM is an efficient classification tool that can accelerate the discovery of effective drugs and targets using thousands of candidate proteins identified from omics data. The online and local versions of OTTM are available at http://otter-simm.com/ottm.html.

Subject(s)

Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Translational Science, Biomedical , Proteomics , Drug Discovery

16.

Large-scale predicting protein functions through heterogeneous feature fusion.

Zheng, Rongtao; Huang, Zhijian; Deng, Lei.

Brief Bioinform ; 24(4)2023 07 20.

Article in English | MEDLINE | ID: mdl-37401369

ABSTRACT

As the volume of protein sequence and structure data grows rapidly, the functions of the overwhelming majority of proteins cannot be experimentally determined. Automated annotation of protein function at a large scale is becoming increasingly important. Existing computational prediction methods are typically based on expanding the relatively small number of experimentally determined functions to large collections of proteins with various clues, including sequence homology, protein-protein interaction, gene co-expression, etc. Although there has been some progress in protein function prediction in recent years, the development of accurate and reliable solutions still has a long way to go. Here we exploit AlphaFold predicted three-dimensional structural information, together with other non-structural clues, to develop a large-scale approach termed PredGO to annotate Gene Ontology (GO) functions for proteins. We use a pre-trained language model, geometric vector perceptrons and attention mechanisms to extract heterogeneous features of proteins and fuse these features for function prediction. The computational results demonstrate that the proposed method outperforms other state-of-the-art approaches for predicting GO functions of proteins in terms of both coverage and accuracy. The improvement of coverage is because the number of structures predicted by AlphaFold is greatly increased, and on the other hand, PredGO can extensively use non-structural information for functional prediction. Moreover, we show that over 205 000 (~100%) entries in UniProt for human are annotated by PredGO, over 186 000 (~90%) of which are based on predicted structure. The webserver and database are available at http://predgo.denglab.org/.

Subject(s)

Computational Biology , Proteins , Humans , Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence , Neural Networks, Computer , Databases, Factual , Databases, Protein

17.

Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4).

Truhn, Daniel; Loeffler, Chiara Ml; Müller-Franzes, Gustav; Nebelung, Sven; Hewitt, Katherine J; Brandner, Sebastian; Bressem, Keno K; Foersch, Sebastian; Kather, Jakob Nikolas.

J Pathol ; 262(3): 310-319, 2024 03.

Article in English | MEDLINE | ID: mdl-38098169

ABSTRACT

Deep learning applied to whole-slide histopathology images (WSIs) has the potential to enhance precision oncology and alleviate the workload of experts. However, developing these models necessitates large amounts of data with ground truth labels, which can be both time-consuming and expensive to obtain. Pathology reports are typically unstructured or poorly structured texts, and efforts to implement structured reporting templates have been unsuccessful, as these efforts lead to perceived extra workload. In this study, we hypothesised that large language models (LLMs), such as the generative pre-trained transformer 4 (GPT-4), can extract structured data from unstructured plain language reports using a zero-shot approach without requiring any re-training. We tested this hypothesis by utilising GPT-4 to extract information from histopathological reports, focusing on two extensive sets of pathology reports for colorectal cancer and glioblastoma. We found a high concordance between LLM-generated structured data and human-generated structured data. Consequently, LLMs could potentially be employed routinely to extract ground truth data for machine learning from unstructured pathology reports in the future. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.

Subject(s)

Glioblastoma , Precision Medicine , Humans , Machine Learning , United Kingdom

18.

RDscan: Extracting RNA-disease relationship from the literature based on pre-training model.

Zhang, Yang; Yang, Yu; Ren, Liping; Ning, Lin; Zou, Quan; Luo, Nanchao; Zhang, Yinghui; Liu, Ruijun.

Methods ; 228: 48-54, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38789016

ABSTRACT

With the rapid advancements in molecular biology and genomics, a multitude of connections between RNA and diseases has been unveiled, making the efficient and accurate extraction of RNA-disease (RD) relationships from extensive biomedical literature crucial for advancing research in this field. This study introduces RDscan, a novel text mining method developed based on the pre-training and fine-tuning strategy, aimed at automatically extracting RD-related information from a vast corpus of literature using pre-trained biomedical large language models (LLMs). Initially, we constructed a dedicated RD corpus by manual curation of the literature, comprising 2,082 positive and 2,000 negative sentences, alongside an independent test dataset (comprising 500 positive and 500 negative sentences) for training and evaluating RDscan. Subsequently, by fine-tuning the Bioformer and BioBERT pre-trained models, RDscan demonstrated exceptional performance in text classification and named entity recognition (NER) tasks. In 5-fold cross-validation, RDscan significantly outperformed traditional machine learning methods (Support Vector Machine, Logistic Regression and Random Forest). In addition, we have developed an accessible webserver that assists users in extracting RD relationships from text. In summary, RDscan represents the first text mining tool specifically designed for RD relationship extraction, and is poised to emerge as an invaluable tool for researchers dedicated to exploring the intricate interactions between RNA and diseases. The RDscan webserver is freely available at https://cellknowledge.com.cn/RDscan/.

Subject(s)

Data Mining , RNA , Data Mining/methods , RNA/genetics , Humans , Machine Learning , Disease/genetics , Support Vector Machine , Software

19.

A pantropical assessment of deforestation caused by industrial mining.

Giljum, Stefan; Maus, Victor; Kuschnig, Nikolas; Luckeneder, Sebastian; Tost, Michael; Sonter, Laura J; Bebbington, Anthony J.

Proc Natl Acad Sci U S A ; 119(38): e2118273119, 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-36095187

ABSTRACT

Growing demand for minerals continues to drive deforestation worldwide. Tropical forests are particularly vulnerable to the environmental impacts of mining and mineral processing. Many local- to regional-scale studies document extensive, long-lasting impacts of mining on biodiversity and ecosystem services. However, the full scope of deforestation induced by industrial mining across the tropics is yet unknown. Here, we present a biome-wide assessment to show where industrial mine expansion has caused the most deforestation from 2000 to 2019. We find that 3,264 km² of forest was directly lost due to industrial mining, with 80% occurring in only four countries: Indonesia, Brazil, Ghana, and Suriname. Additionally, controlling for other nonmining determinants of deforestation, we find that mining caused indirect forest loss in two-thirds of the investigated countries. Our results illustrate significant yet unevenly distributed and often unmanaged impacts on these biodiverse ecosystems. Impact assessments and mitigation plans of industrial mining activities must address direct and indirect impacts to support conservation of the world's tropical forests.

Subject(s)

Biodiversity , Conservation of Natural Resources , Forests , Mining , Conservation of Natural Resources/methods

20.

A systematic comparison of natural product potential, with an emphasis on RiPPs, by mining of bacteria of three large ecosystems.

Yi, Yunhai; Liang, Lifeng; de Jong, Anne; Kuipers, Oscar P.

Genomics ; 116(4): 110880, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38857812

ABSTRACT

The implementation of several global microbiome studies has yielded extensive insights into the biosynthetic potential of natural microbial communities. However, studies on the distribution of several classes of ribosomally synthesized and post-translationally modified peptides (RiPPs), non-ribosomal peptides (NRPs) and polyketides (PKs) in different large microbial ecosystems have been very limited. Here, we collected a large set of metagenome-assembled bacterial genomes from marine, freshwater and terrestrial ecosystems to investigate the biosynthetic potential of these bacteria. We demonstrate the utility of public dataset collections for revealing the different secondary metabolite biosynthetic potentials among these different living environments. We show that there is a higher occurrence of RiPPs in terrestrial systems, while in marine systems, we found relatively more terpene-, NRP-, and PK encoding gene clusters. Among the many new biosynthetic gene clusters (BGCs) identified, we analyzed various Nif-11-like and nitrile hydratase leader peptide (NHLP) containing gene clusters that would merit further study, including promising products, such as mersacidin-, LAP- and proteusin analogs. This research highlights the significance of public datasets in elucidating the biosynthetic potential of microbes in different living environments and underscores the wide bioengineering opportunities within the RiPP family.

Subject(s)

Bacteria , Biological Products , Multigene Family , Bacteria/metabolism , Bacteria/genetics , Bacteria/classification , Biological Products/metabolism , Peptides/metabolism , Peptides/genetics , Protein Processing, Post-Translational , Metagenome , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Ecosystem , Genome, Bacterial , Microbiota , Polyketides/metabolism
