1.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38383067

ABSTRACT

MOTIVATION: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
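
The recursive interrogation described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the OntoGPT implementation: the schema format, the `extract` and `ground` functions, the `fake_llm` stand-in, and the `EX:` identifiers are all hypothetical. A real system would call an actual LLM and look labels up in an actual ontology.

```python
from typing import Callable, Dict, Optional, Union

# Hypothetical schema format: a field maps to "str" (leaf) or to a nested schema.
Schema = Dict[str, Union[str, "Schema"]]


def extract(text: str, schema: Schema, ask: Callable[[str], str]) -> dict:
    """Fill `schema` from `text`, one prompt per field, recursing into nested fields."""
    result = {}
    for field, spec in schema.items():
        prompt = f"Give the value of '{field}' in the text below.\nText: {text}"
        answer = ask(prompt).strip()
        if isinstance(spec, dict):
            # Nested field: interrogate the answer itself against the sub-schema.
            result[field] = extract(answer, spec, ask)
        else:
            result[field] = answer
    return result


def ground(label: str, vocabulary: Dict[str, str]) -> Optional[str]:
    """Map an extracted label to an identifier from an external vocabulary."""
    return vocabulary.get(label.lower())


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): canned answers keyed by field name."""
    canned = {
        "'disease'": "headache",
        "'treatment'": "drug: aspirin; dose: 100 mg",
        "'drug'": "aspirin",
        "'dose'": "100 mg",
    }
    question = prompt.splitlines()[0]
    return next((v for k, v in canned.items() if k in question), "")


schema = {"disease": "str", "treatment": {"drug": "str", "dose": "str"}}
record = extract("Aspirin 100 mg was given for headache.", schema, fake_llm)
vocabulary = {"aspirin": "EX:0001", "headache": "EX:0002"}  # made-up identifiers
drug_id = ground(record["treatment"]["drug"], vocabulary)
```

Keeping the identifier lookup outside the model call mirrors the point made in the abstract: grounding relies on vocabularies external to the LLM rather than on identifiers the model produces itself.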


Subject(s)
Knowledge Bases , Semantics , Databases, Factual
2.
Bioinformatics ; 39(7), 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37389415

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.


Subject(s)
Biological Ontologies , COVID-19 , Humans , Pattern Recognition, Automated , Rare Diseases , Machine Learning
3.
J Biomed Inform ; 140: 104341, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36933632

ABSTRACT

BACKGROUND: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research. METHODS: We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG. RESULTS: The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. 
Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. CONCLUSION: NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.
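
The KG path searches mentioned in this abstract can be illustrated with a breadth-first search over subject-relation-object triples. This is only a sketch under stated assumptions: the `find_path` function is hypothetical, and the toy triples use placeholder node names rather than content from NP-KG.

```python
from collections import deque


def find_path(triples, start, goal):
    """Return the shortest chain of triples linking `start` to `goal`, or None."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, relation, neighbour)]))
    return None


# Placeholder triples (illustrative only, not taken from NP-KG).
triples = [
    ("natural_product_A", "inhibits", "enzyme_X"),
    ("enzyme_X", "metabolizes", "drug_B"),
    ("natural_product_A", "contains", "compound_C"),
]
path = find_path(triples, "natural_product_A", "drug_B")
```

A path such as product, enzyme, drug is the kind of chain that could then be compared against ground truth data, as the evaluation in the abstract does.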


Subject(s)
Biological Ontologies , Biological Products , Pattern Recognition, Automated , Drug Interactions , Semantics , Pharmaceutical Preparations
4.
mSystems ; 9(1): e0002623, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38078749

ABSTRACT

Microbial communities have evolved to colonize all ecosystems of the planet, from the deep sea to the human gut. Microbes survive by sensing, responding, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use their sensing domains to bind or otherwise detect environmental cues and "transduce" signals to adjust internal processes. We hypothesized that an ecosystem's unique stimuli leave a sensor "fingerprint," able to identify and shed insight on ecosystem conditions. To test this, we collected 20,712 publicly available metagenomes from Host-associated, Environmental, and Engineered ecosystems across the globe. We extracted and clustered the collection's nearly 18M unique sensory domains into 113,712 similar groupings with MMseqs2. We built gradient-boosted decision tree machine learning models and found we could classify the ecosystem type (accuracy: 87%) and predict the levels of different physical parameters (R² score: 83%) using the sensor cluster abundance as features. Feature importance enables identification of the most predictive sensors to differentiate between ecosystems which can lead to mechanistic interpretations if the sensor domains are well annotated. To demonstrate this, a machine learning model was trained to predict patient's disease state and used to identify domains related to oxygen sensing present in a healthy gut but missing in patients with abnormal conditions. Moreover, since 98.7% of identified sensor domains are uncharacterized, importance ranking can be used to prioritize sensors to determine what ecosystem function they may be sensing. Furthermore, these new predictive sensors can function as targets for novel sensor engineering with applications in biotechnology, ecosystem maintenance, and medicine.
IMPORTANCE: Microbes infect, colonize, and proliferate due to their ability to sense and respond quickly to their surroundings. In this research, we extract the sensory proteins from a diverse range of environmental, engineered, and host-associated metagenomes. We trained machine learning classifiers using sensors as features such that it is possible to predict the ecosystem for a metagenome from its sensor profile. We use the optimized model's feature importance to identify the most impactful and predictive sensors in different environments. We next use the sensor profile from human gut metagenomes to classify their disease states and explore which sensors can explain differences between diseases. The sensors most predictive of environmental labels here, most of which correspond to uncharacterized proteins, are a useful starting point for the discovery of important environment signals and the development of possible diagnostic interventions.


Subject(s)
Metagenomics , Microbiota , Humans , Metagenome , Machine Learning , Earth, Planet
5.
Front Microbiol ; 15: 1351678, 2024.
Article in English | MEDLINE | ID: mdl-38638909

ABSTRACT

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.

6.
Infect Dis Model ; 9(2): 634-643, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38572058

ABSTRACT

Objectives: We aim to estimate geographic variability in total numbers of infections and infection fatality ratios (IFR; the number of deaths caused by an infection per 1,000 infected people) when the availability and quality of data on disease burden are limited during an epidemic. Methods: We develop a noncentral hypergeometric framework that accounts for differential probabilities of positive tests and reflects the fact that symptomatic people are more likely to seek testing. We demonstrate the robustness, accuracy, and precision of this framework, and apply it to the United States (U.S.) COVID-19 pandemic to estimate county-level SARS-CoV-2 IFRs. Results: The estimators for the numbers of infections and IFRs showed high accuracy and precision; for instance, when applied to simulated validation data sets, across counties, Pearson correlation coefficients between estimator means and true values were 0.996 and 0.928, respectively, and they showed strong robustness to model misspecification. Applying the county-level estimators to the real, unsimulated COVID-19 data spanning April 1, 2020 to September 30, 2020 from across the U.S., we found that IFRs varied from 0 to 44.69, with a standard deviation of 3.55 and a median of 2.14. Conclusions: The proposed estimation framework can be used to identify geographic variation in IFRs across settings.
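
To make the idea of differential testing probabilities concrete, the sketch below evaluates a noncentral hypergeometric probability and an IFR expressed per 1,000 infections. It rests on assumptions: the abstract does not say which noncentral variant is used, so Fisher's form is taken here purely for illustration, and the function names, parameters, and numbers are hypothetical rather than the authors' estimator.

```python
from math import comb


def fisher_nchg_pmf(positives, infected, uninfected, tested, odds):
    """P(positives among `tested`) under Fisher's noncentral hypergeometric law.

    `odds` is the assumed odds ratio of an infected person being tested
    relative to an uninfected one; odds == 1 gives the ordinary
    hypergeometric distribution.
    """
    lowest = max(0, tested - uninfected)
    highest = min(tested, infected)
    if not lowest <= positives <= highest:
        return 0.0

    def weight(y):
        return comb(infected, y) * comb(uninfected, tested - y) * odds ** y

    total = sum(weight(y) for y in range(lowest, highest + 1))
    return weight(positives) / total


def ifr_per_1000(deaths, infections):
    """Infection fatality ratio: deaths per 1,000 infected people."""
    return 1000.0 * deaths / infections


# Toy numbers: 2 infected and 2 uninfected people, 2 of whom are tested.
unbiased = fisher_nchg_pmf(2, infected=2, uninfected=2, tested=2, odds=1)
biased = fisher_nchg_pmf(2, infected=2, uninfected=2, tested=2, odds=3)
example_ifr = ifr_per_1000(deaths=3, infections=1500)
```

With `odds` above 1 the probability of observing only positive tests rises, which is the effect of symptomatic people being more likely to seek testing.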

7.
Sci Data ; 11(1): 363, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.


Subject(s)
Biological Science Disciplines , Knowledge Bases , Pattern Recognition, Automated , Algorithms , Translational Research, Biomedical
8.
J Biol Chem ; 287(11): 7945-55, 2012 Mar 09.
Article in English | MEDLINE | ID: mdl-22253435

ABSTRACT

Viral genomes are continually subjected to mutations, and functionally deleterious ones can be rescued by reversion or additional mutations that restore fitness. The error prone nature of HIV-1 replication has resulted in highly diverse viral sequences, and it is not clear how viral proteins such as Tat, which plays a critical role in viral gene expression and replication, retain their complex functions. Although several important amino acid positions in Tat are conserved, we hypothesized that it may also harbor functionally important residues that may not be individually conserved yet appear as correlated pairs, whose analysis could yield new mechanistic insights into Tat function and evolution. To identify such sites, we combined mutual information analysis and experimentation to identify coevolving positions and found that residues 35 and 39 are strongly correlated. Mutation of either residue of this pair into amino acids that appear in numerous viral isolates yields a defective virus; however, simultaneous introduction of both mutations into the heterologous Tat sequence restores gene expression close to wild-type Tat. Furthermore, in contrast to most coevolving protein residues that contribute to the same function, structural modeling and biochemical studies showed that these two residues contribute to two mechanistically distinct steps in gene expression: binding P-TEFb and promoting P-TEFb phosphorylation of the C-terminal domain in RNAPII. Moreover, Tat variants that mimic HIV-1 subtypes B or C at sites 35 and 39 have evolved orthogonal strengths of P-TEFb binding versus RNAPII phosphorylation, suggesting that subtypes have evolved alternate transcriptional strategies to achieve similar gene expression levels.
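
The mutual information analysis used here to find coevolving positions can be sketched as a calculation over two columns of a sequence alignment. The function name and the four-sequence toy alignment are hypothetical. Real analyses typically add corrections for small samples and phylogenetic background, which this sketch omits.

```python
from collections import Counter
from math import log2


def mutual_information(column_a, column_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(column_a)
    counts_a = Counter(column_a)
    counts_b = Counter(column_b)
    counts_ab = Counter(zip(column_a, column_b))
    return sum(
        (c / n) * log2((c / n) / ((counts_a[a] / n) * (counts_b[b] / n)))
        for (a, b), c in counts_ab.items()
    )


# Toy alignment: each string is one sequence, each index one position.
coupled = ["AK", "AK", "GR", "GR"]      # residue 1 always predicts residue 2
independent = ["AK", "AR", "GK", "GR"]  # every combination equally frequent

mi_coupled = mutual_information(*zip(*coupled))
mi_independent = mutual_information(*zip(*independent))
```

A pair of positions that varies together scores high even when neither position is individually conserved, which is the signal the abstract describes for residues 35 and 39.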


Subject(s)
Evolution, Molecular , Gene Expression Regulation, Viral/physiology , HIV-1/physiology , Mutation/physiology , Virus Replication/physiology , tat Gene Products, Human Immunodeficiency Virus/metabolism , Genome, Viral/physiology , HEK293 Cells , HeLa Cells , Humans , Phosphorylation/physiology , Positive Transcriptional Elongation Factor B/genetics , Positive Transcriptional Elongation Factor B/metabolism , Protein Structure, Tertiary , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Species Specificity , tat Gene Products, Human Immunodeficiency Virus/genetics
9.
Nucleic Acids Res ; 39(6): 2344-56, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21071404

ABSTRACT

Splicing factor 1 (SF1) binds to the branch point sequence (BPS) of mammalian introns and is believed to be important for the splicing of some, but not all, introns. To help identify BPSs, particularly those that depend on SF1, we generated a BPS profile model in which SF1 binding affinity data, validated by branch point mapping, were iteratively incorporated into computational models. We searched a data set of 117,499 human introns for best matches to the SF1 Affinity Model above a threshold, and counted the number of matches at each intronic position. After subtracting a background value, we found that 87.9% of remaining high-scoring matches identified were located in a region upstream of 3'-splice sites where BPSs are typically found. Since U2AF65 recognizes the polypyrimidine tract (PPT) and forms a cooperative RNA complex with SF1, we combined the SF1 model with a PPT model computed from high affinity binding sequences for U2AF65. The combined model, together with binding site location constraints, accurately identified introns bound by SF1 that are candidates for SF1-dependent splicing.
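
The search step in this abstract (score each intron against a profile, keep matches above a threshold, and count matches at each intronic position) can be sketched as below. The three-column weight matrix, the threshold, and the sequences are invented for illustration and are not the SF1 Affinity Model. Positions are reported as offsets from the 3' end of each intron, which is an assumed convention.

```python
from collections import Counter


def score(window, matrix):
    """Sum the per-position weights of `window` under a position weight matrix."""
    return sum(matrix[i][base] for i, base in enumerate(window))


def scan(sequence, matrix, threshold):
    """Return start indices of windows scoring at or above `threshold`."""
    width = len(matrix)
    return [
        i
        for i in range(len(sequence) - width + 1)
        if score(sequence[i:i + width], matrix) >= threshold
    ]


def position_counts(introns, matrix, threshold):
    """Count matches by offset of the match start from the intron's 3' end."""
    counts = Counter()
    for sequence in introns:
        for start in scan(sequence, matrix, threshold):
            counts[start - len(sequence)] += 1  # negative offset = upstream of 3' end
    return counts


# Toy matrix favouring the motif "TAA" (hypothetical weights).
toy_matrix = [
    {"A": -1, "C": -1, "G": -1, "T": 1},
    {"A": 1, "C": -1, "G": -1, "T": -1},
    {"A": 1, "C": -1, "G": -1, "T": -1},
]
toy_introns = ["GGTAACC", "TAAGG", "CCCCCC"]
counts = position_counts(toy_introns, toy_matrix, threshold=3)
```

Subtracting a background count from each position, as the abstract describes, would then show whether matches concentrate in the region upstream of the 3'-splice site.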


Subject(s)
DNA-Binding Proteins/metabolism , Introns , Models, Genetic , Transcription Factors/metabolism , Base Sequence , Binding Sites , Humans , RNA Splicing Factors , RNA, Messenger/chemistry , Sequence Analysis, RNA
10.
ArXiv ; 2023 May 25.
Article in English | MEDLINE | ID: mdl-37292480

ABSTRACT

Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
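
For context on the standard enrichment analysis that the abstract uses as its baseline, over-representation of a term is commonly scored with a hypergeometric tail probability. The sketch below shows that calculation only. The function name and the numbers are hypothetical, and real tools also apply multiple-testing correction, which is left out here.

```python
from math import comb


def enrichment_p_value(hits, list_size, annotated, population):
    """P(X >= hits) when `list_size` genes are drawn from `population` genes,
    of which `annotated` carry the term (one-sided hypergeometric test)."""
    upper = min(list_size, annotated)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, list_size)


# Toy case: 10 genes in total, 3 annotated with a term, gene list of size 3.
p_all_three = enrichment_p_value(hits=3, list_size=3, annotated=3, population=10)
p_at_least_zero = enrichment_p_value(hits=0, list_size=3, annotated=3, population=10)
```

A score of this kind is what the GPT-based approaches in the abstract are reported as unable to deliver reliably.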

11.
Nat Comput Sci ; 3(6): 552-568, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38177435

ABSTRACT

Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.
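
The random-walk-based methods mentioned in this abstract start from sequences of nodes sampled by walking the graph. A minimal pure-Python sketch of uniform random walks follows. It does not use the GRAPE API, the function name and toy graph are hypothetical, and it makes no attempt at the parallelism or compact data structures the abstract credits for GRAPE's scalability.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks from every node of an adjacency-list graph."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy undirected graph stored as adjacency lists.
toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
walks = random_walks(toy_graph, walk_length=4, walks_per_node=2, seed=1)
```

In embedding methods of this family, the sampled walks are then treated like sentences and passed to a model that learns one vector per node.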


Subject(s)
Libraries , Vitis , Algorithms , Software , Learning
12.
J Bacteriol ; 194(21): 5783-93, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22904289

ABSTRACT

The carbon monoxide-sensing transcriptional factor CooA has been studied only in hydrogenogenic organisms that can grow using CO as the sole source of energy. Homologs for the canonical CO oxidation system, including CooA, CO dehydrogenase (CODH), and a CO-dependent Coo hydrogenase, are present in the sulfate-reducing bacterium Desulfovibrio vulgaris, although it grows only poorly on CO. We show that D. vulgaris Hildenborough has an active CO dehydrogenase capable of consuming exogenous CO and that the expression of the CO dehydrogenase, but not that of a gene annotated as encoding a Coo hydrogenase, is dependent on both CO and CooA. Carbon monoxide did not act as a general metabolic inhibitor, since growth of a strain deleted for cooA was inhibited by CO on lactate-sulfate but not pyruvate-sulfate. While the deletion strain did not accumulate CO in excess, as would have been expected if CooA were important in the cycling of CO as a metabolic intermediate, global transcriptional analyses suggested that CooA and CODH are used during normal metabolism.


Subject(s)
Bacterial Proteins/genetics , Carbon Monoxide/metabolism , Desulfovibrio vulgaris/genetics , Gene Deletion , Gene Expression Profiling , Gene Expression Regulation, Bacterial , Transcription Factors/genetics , Aldehyde Oxidoreductases/metabolism , Desulfovibrio vulgaris/growth & development , Desulfovibrio vulgaris/metabolism , Lactates/metabolism , Multienzyme Complexes/metabolism , Pyruvic Acid/metabolism , Sulfates/metabolism
13.
BMC Genomics ; 13: 138, 2012 Apr 16.
Article in English | MEDLINE | ID: mdl-22507456

ABSTRACT

BACKGROUND: Desulfovibrio vulgaris Hildenborough is a sulfate-reducing bacterium (SRB) that is intensively studied in the context of metal corrosion and heavy-metal bioremediation, and SRB populations are commonly observed in pipe and subsurface environments as surface-associated populations. In order to elucidate physiological changes associated with biofilm growth at both the transcript and protein level, transcriptomic and proteomic analyses were done on mature biofilm cells and compared to both batch and reactor planktonic populations. The biofilms were cultivated with lactate and sulfate in a continuously fed biofilm reactor, and compared to both batch and reactor planktonic populations. RESULTS: The functional genomic analysis demonstrated that biofilm cells were different compared to planktonic cells, and the majority of altered abundances for genes and proteins were annotated as hypothetical (unknown function), energy conservation, amino acid metabolism, and signal transduction. Genes and proteins that showed similar trends in detected levels were particularly involved in energy conservation such as increases in an annotated ech hydrogenase, formate dehydrogenase, pyruvate:ferredoxin oxidoreductase, and rnf oxidoreductase, and the biofilm cells had elevated formate dehydrogenase activity. Several other hydrogenases and formate dehydrogenases also showed an increased protein level, while decreased transcript and protein levels were observed for putative coo hydrogenase as well as a lactate permease and hyp hydrogenases for biofilm cells. Genes annotated for amino acid synthesis and nitrogen utilization were also predominant changers within the biofilm state. Ribosomal transcripts and proteins were notably decreased within the biofilm cells compared to exponential-phase cells but were not as low as levels observed in planktonic, stationary-phase cells. 
Several putative, extracellular proteins (DVU1012, 1545) were also detected in the extracellular fraction from biofilm cells. CONCLUSIONS: Even though both the planktonic and biofilm cells were oxidizing lactate and reducing sulfate, the biofilm cells were physiologically distinct compared to planktonic growth states due to altered abundances of genes/proteins involved in carbon/energy flow and extracellular structures. In addition, average expression values for multiple rRNA transcripts and respiratory activity measurements indicated that biofilm cells were metabolically more similar to exponential-phase cells although biofilm cells are structured differently. The characterization of physiological advantages and constraints of the biofilm growth state for sulfate-reducing bacteria will provide insight into bioremediation applications as well as microbially-induced metal corrosion.


Subject(s)
Biofilms/growth & development , Carbon/metabolism , Desulfovibrio vulgaris/growth & development , Desulfovibrio vulgaris/genetics , Energy Metabolism/genetics , Gene Expression Profiling/methods , Proteomics/methods , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Biofilms/drug effects , Bioreactors/microbiology , Carbohydrate Metabolism/drug effects , Carbohydrate Metabolism/genetics , Cluster Analysis , Desulfovibrio vulgaris/drug effects , Desulfovibrio vulgaris/physiology , Energy Metabolism/drug effects , Gene Expression Regulation, Bacterial/drug effects , Lactic Acid/pharmacology , Microscopy, Confocal , Models, Biological , Plankton/cytology , Plankton/drug effects , Plankton/microbiology , Principal Component Analysis , RNA, Messenger/drug effects , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ribosomal Proteins/genetics , Ribosomal Proteins/metabolism , Sulfates/pharmacology
14.
Appl Environ Microbiol ; 78(4): 1168-77, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22156435

ABSTRACT

Crp/Fnr-type global transcriptional regulators regulate various metabolic pathways in bacteria and typically function in response to environmental changes. However, little is known about the function of four annotated Crp/Fnr homologs (DVU0379, DVU2097, DVU2547, and DVU3111) in Desulfovibrio vulgaris Hildenborough. A systematic study using bioinformatic, transcriptomic, genetic, and physiological approaches was conducted to characterize their roles in stress responses. Similar growth phenotypes were observed for the crp/fnr deletion mutants under multiple stress conditions. Nevertheless, the idea of distinct functions of Crp/Fnr-type regulators in stress responses was supported by phylogeny, gene transcription changes, fitness changes, and physiological differences. The four D. vulgaris Crp/Fnr homologs are localized in three subfamilies (HcpR, CooA, and cc). The crp/fnr knockout mutants were well separated by transcriptional profiling using detrended correspondence analysis (DCA), and more genes significantly changed in expression in a ΔDVU3111 mutant (JW9013) than in the other three paralogs. In fitness studies, strain JW9013 showed the lowest fitness under standard growth conditions (i.e., sulfate reduction) and the highest fitness under NaCl or chromate stress conditions; better fitness was observed for a ΔDVU2547 mutant (JW9011) under nitrite stress conditions and a ΔDVU2097 mutant (JW9009) under air stress conditions. A higher Cr(VI) reduction rate was observed for strain JW9013 in experiments with washed cells. These results suggested that the four Crp/Fnr-type global regulators play distinct roles in stress responses of D. vulgaris. DVU3111 is implicated in responses to NaCl and chromate stresses, DVU2547 in nitrite stress responses, and DVU2097 in air stress responses.


Subject(s)
Cyclic AMP Receptor Protein/metabolism , Desulfovibrio vulgaris/physiology , Gene Expression Regulation, Bacterial , Stress, Physiological , Transcription Factors/metabolism , Transcription, Genetic , Air , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Chromates/metabolism , Chromates/toxicity , Computational Biology , Cyclic AMP Receptor Protein/genetics , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Desulfovibrio vulgaris/genetics , Desulfovibrio vulgaris/growth & development , Desulfovibrio vulgaris/metabolism , Gene Deletion , Molecular Sequence Data , Nitrites/metabolism , Nitrites/toxicity , Sequence Analysis, DNA , Sodium Chloride/metabolism , Sodium Chloride/toxicity , Transcription Factors/genetics , Transcriptome
15.
Nucleic Acids Res ; 38(Database issue): D396-400, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19906701

ABSTRACT

Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.


Subject(s)
Bacteria/genetics , Computational Biology/methods , Genetic Databases , Nucleic Acid Databases , Algorithms , Computational Biology/trends , Protein Databases , Gene Expression Profiling , Bacterial Genome , Information Storage and Retrieval/methods , Internet , Oligonucleotide Array Sequence Analysis , Tertiary Protein Structure , Software
16.
Scientometrics ; 127(5): 2313-2349, 2022.
Article in English | MEDLINE | ID: mdl-35431364

ABSTRACT

Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles, mostly from biology and medicine, applicable to the COVID-19 crisis. Our study employs a combination of state-of-the-art machine learning techniques for text understanding, including the embeddings-based language model BERT and several systems for detection and semantic expansion of entities: ConceptNet, PubTator and ScispaCy. To interpret the resulting models, we use several explanation algorithms: random forest feature importance, LIME, and Shapley values. We compare the performance and comprehensibility of models obtained by "black-box" machine learning algorithms (neural networks and random forests) with models built with rule learning (CORELS, CBA), which are intrinsically explainable. Multiple rules were discovered, which referred to biomedical entities of potential interest. Of the rules with the highest lift measure, several pointed to dipeptidyl peptidase 4 (DPP4), a known MERS-CoV receptor and a critical determinant of camel to human transmission of the camel coronavirus (MERS-CoV). Some other interesting patterns related to the type of animal investigated were found. Articles referring to bats and camels tend to draw citations, while articles referring to most other animal species related to coronavirus are rarely cited. Bat coronavirus is the only other virus from a non-human species in the betaB clade along with the SARS-CoV and SARS-CoV-2 viruses. MERS-CoV is in a sister betaC clade, also close to human SARS coronaviruses. Thus both species linked to high citation counts harbor coronaviruses which are more phylogenetically similar to human SARS viruses.
On the other hand, feline (FIPV, FCOV) and canine coronaviruses (CCOV) are in the alpha coronavirus clade and more distant from the betaB clade with human SARS viruses. Other results include detection of apparent citation bias favouring authors with Western-sounding names. Equal performance of TF-IDF weights and binary word incidence matrix was observed, with the latter resulting in better interpretability. The best predictive performance was obtained with a "black-box" method, a neural network. The rule-based models led to the most insights, especially when coupled with text representation using semantic entity detection methods. Follow-up work should focus on the analysis of citation patterns in the context of phylogenetic trees, as well as on patterns referring to DPP4, which is currently considered a SARS-CoV-2 therapeutic target.

17.
Nucleic Acids Res ; 37(9): 2926-39, 2009 May.
Article in English | MEDLINE | ID: mdl-19293273

ABSTRACT

Hypothetical (HyP) and conserved HyP genes account for >30% of sequenced bacterial genomes. For the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough, 347 of the 3634 genes were annotated as conserved HyP (9.5%) along with 887 HyP genes (24.4%). Given the large fraction of the genome, it is plausible that some of these genes serve critical cellular roles. The study goals were to determine which genes were expressed and provide a more functionally based annotation. To accomplish this, expression profiles of the 1234 HyP and conserved HyP genes were drawn from transcriptomic datasets of 11 environmental stresses, complemented with shotgun LC-MS/MS and AMT tag proteomic data. Genes were divided into putatively polycistronic operons and those predicted to be monocistronic, then classified by basal expression levels and grouped according to changes in expression for one or multiple stresses. One thousand two hundred and twelve of these genes were transcribed, with 786 producing detectable proteins. There was no evidence for expression of 17 predicted genes. Except for the latter, monocistronic gene annotation was expanded using the above criteria along with matching Clusters of Orthologous Groups. Polycistronic genes were annotated in the same manner with inferences from their proximity to more confidently annotated genes. Two targeted deletion mutants were used as test cases to determine the relevance of the inferred functional annotations.


Subject(s)
Desulfovibrio vulgaris/genetics , Gene Expression Profiling , Bacterial Genes , Bacterial Proteins/metabolism , Desulfovibrio vulgaris/metabolism , Bacterial Gene Expression Regulation , Repressor Proteins/metabolism , Sequence Deletion , Physiological Stress
18.
PLoS Negl Trop Dis ; 15(1): e0008895, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33395417

ABSTRACT

A wide variety of symptoms is associated with Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection, and these symptoms can overlap with other conditions and diseases. Knowing the distribution of symptoms across diseases and individuals can support clinical actions on timelines shorter than those for drug and vaccine development. Here, we focus on zinc deficiency symptoms, symptom overlap with other conditions, as well as zinc effects on immune health and mechanistic zinc deficiency risk groups. There are well-studied beneficial effects of zinc on the immune system including a decreased susceptibility to and improved clinical outcomes for infectious pathogens including multiple viruses. Zinc is also an anti-inflammatory and anti-oxidative stress agent, relevant to some severe Coronavirus Disease 2019 (COVID-19) symptoms. Unfortunately, zinc deficiency is common worldwide and not exclusive to the developing world. Lifestyle choices and preexisting conditions alone can result in zinc deficiency, and we compile zinc risk groups based on a review of the literature. It is also important to distinguish chronic zinc deficiency from deficiency acquired upon viral infection and immune response and their different supplementation strategies. Zinc is being considered as prophylactic or adjunct therapy for COVID-19, with 12 clinical trials underway, highlighting the relevance of this trace element for global pandemics. Using the example of zinc, we show that there is a critical need for a deeper understanding of essential trace elements in human health, and the resulting deficiency symptoms and their overlap with other conditions. This knowledge will directly support human immune health for decreasing susceptibility, shortening illness duration, and preventing progression to severe cases in the current and future pandemics.


Subject(s)
COVID-19 Drug Treatment , COVID-19/prevention & control , Zinc/administration & dosage , Zinc/deficiency , Anti-Inflammatory Agents/pharmacology , COVID-19/immunology , COVID-19/virology , Humans , Immune System/drug effects , Oxidative Stress/drug effects , Oxidative Stress/immunology , Pandemics , Risk Factors , SARS-CoV-2/isolation & purification
19.
Patterns (N Y) ; 2(1): 100155, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33196056

ABSTRACT

Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework also can be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.

20.
mSystems ; 6(1)2021 Feb 23.
Article in English | MEDLINE | ID: mdl-33622857

ABSTRACT

Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. Important contributions have been made in the development of community-driven metadata standards; however, these standards have not been uniformly embraced by the microbiome research community. To understand how these standards are being adopted, or the barriers to adoption, across research domains, institutions, and funding agencies, the National Microbiome Data Collaborative (NMDC) hosted a workshop in October 2019. This report provides a summary of discussions that took place throughout the workshop, as well as outcomes of the working groups initiated at the workshop.
