Results 1 - 10 of 10
1.
Chem Res Toxicol ; 36(8): 1267-1277, 2023 08 21.
Article in English | MEDLINE | ID: mdl-37471124

ABSTRACT

Humans and animals are regularly exposed to compounds that may have adverse effects on health. The Toxicity Forecaster (ToxCast) program was developed to use high throughput screening assays to quickly screen chemicals by measuring their effects on many biological end points. Many of these assays test for effects on cellular receptors and transcription factors (TFs), under the assumption that a toxicant may perturb normal signaling pathways in the cell. We hypothesized that we could reconstruct the intermediate proteins in these pathways that may be directly or indirectly affected by the toxicant, potentially revealing important physiological processes not yet tested for many chemicals. We integrate data from ToxCast with a human protein interactome to build toxicant signaling networks that contain physical and signaling protein interactions that may be affected as a result of toxicant exposure. To build these networks, we developed the EdgeLinker algorithm, which efficiently finds short paths in the interactome that connect the receptors to TFs for each toxicant. We performed multiple evaluations and found evidence suggesting that these signaling networks capture biologically relevant effects of toxicants. To aid in dissemination and interpretation, interactive visualizations of these networks are available at http://graphspace.org.


Subjects
Drug-Related Side Effects and Adverse Reactions , High-Throughput Screening Assays , Animals , Humans , Algorithms , Signal Transduction
2.
JACS Au ; 3(1): 113-123, 2023 Jan 23.
Article in English | MEDLINE | ID: mdl-36711088

ABSTRACT

The discovery of new materials in unexplored chemical spaces necessitates quick and accurate prediction of thermodynamic stability, often assessed using density functional theory (DFT), and efficient search strategies. Here, we develop a new approach to finding stable inorganic functional materials. We start by defining an upper bound to the fully relaxed energy obtained via DFT as the energy resulting from a constrained optimization over only cell volume. Because the fractional atomic coordinates for these calculations are known a priori, this upper bound energy can be quickly and accurately predicted with a scale-invariant graph neural network (GNN). We generate new structures via ionic substitution of known prototypes, and train our GNN on a new database of 128 000 DFT calculations comprising both fully relaxed and volume-only relaxed structures. By minimizing the predicted upper-bound energy, we discover new stable structures with over 99% accuracy (versus DFT). We demonstrate the method by finding promising new candidates for solid-state battery (SSB) electrolytes that not only possess the required stability, but also additional functional properties such as large electrochemical stability windows and high conduction ion fraction. We expect this proposed framework to be directly applicable to a wide range of design challenges in materials science.
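
The upper bound described above can be written out as a short inequality. This is only a sketch: the symbols are assumptions for illustration, not the paper's notation. Let $E(V, \mathbf{h}, \mathbf{x})$ denote the DFT energy as a function of cell volume $V$, cell shape $\mathbf{h}$, and fractional atomic coordinates $\mathbf{x}$, and let $(\mathbf{h}_0, \mathbf{x}_0)$ be the shape and coordinates inherited from the prototype structure.

```latex
E_{\text{relaxed}}
  = \min_{V,\,\mathbf{h},\,\mathbf{x}} E(V, \mathbf{h}, \mathbf{x})
  \;\le\;
  \min_{V} E(V, \mathbf{h}_0, \mathbf{x}_0)
  = E_{\text{vol}}
```

In the idealized case where both minimizations reach their respective minima, the inequality holds because the volume-only search explores a subset of the configurations available to the full relaxation. A structure whose predicted $E_{\text{vol}}$ is already below the stability threshold would therefore also be stable after full relaxation.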

3.
Bioinform Adv ; 2(1): vbac065, 2022.
Article in English | MEDLINE | ID: mdl-36158455

ABSTRACT

Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms and uses heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data and mortality due to coronavirus disease 2019 (COVID-19) from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability and implementation: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration. 
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
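
The late integration idea in this abstract can be illustrated with a small sketch. It is not the EI implementation: the nearest-centroid local model, the mean aggregator, and all names and data below are hypothetical stand-ins chosen to keep the example dependency-free, whereas the abstract describes heterogeneous ensemble algorithms for the integration step.

```python
def fit_centroids(X, y):
    """Local model for one modality: the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_score(centroids, x):
    """Score in [0, 1]: relative closeness to the positive-class centroid."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, centroids[0])) ** 0.5
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroids[1])) ** 0.5
    return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


def late_integration(modalities, y, new_sample):
    """Fit one local model per modality, then average the local scores."""
    scores = []
    for name, X in modalities.items():
        centroids = fit_centroids(X, y)
        scores.append(predict_score(centroids, new_sample[name]))
    return sum(scores) / len(scores)


# Hypothetical toy data: two modalities with different feature dimensions.
modalities = {
    "labs": [[1.0], [2.0], [8.0], [9.0]],
    "vitals": [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [6.0, 6.0]],
}
labels = [0, 0, 1, 1]
sample = {"labs": [8.5], "vitals": [5.5, 5.5]}
integrated_score = late_integration(modalities, labels, sample)
```

Each modality keeps its own feature space and its own model, so no uniform integrated representation is needed; only the per-modality predictions are combined.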

4.
bioRxiv ; 2022 Jul 25.
Article in English | MEDLINE | ID: mdl-35923321

ABSTRACT

Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities, but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms, and uses effective heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data, and mortality due to COVID-19 from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen (BUN) and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration . Contact: gaurav.pandey@mssm.edu.

5.
Gigascience ; 10(12), 2021 12 29.
Article in English | MEDLINE | ID: mdl-34966926

ABSTRACT

BACKGROUND: Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. RESULTS: We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. CONCLUSIONS: We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.
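
Provenance tracing of the kind described above relies on propagation being linear in its input. The sketch below illustrates this with random walk with restart, which is an assumption: the abstract does not say which propagation method is used, and the graph, node names, and parameters are hypothetical.

```python
def rwr(adj, seeds, alpha=0.5, iters=200):
    """Random walk with restart on an unweighted graph.

    adj maps each node to a list of its neighbors; seeds share the restart mass.
    """
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    score = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in adj}
        for u, nbrs in adj.items():
            if not nbrs:
                continue
            share = alpha * score[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        score = nxt
    return score


# Hypothetical toy network: s1 and s2 are experimentally observed sources.
adj = {
    "s1": ["a"],
    "s2": ["a", "b"],
    "a": ["s1", "s2", "b"],
    "b": ["s2", "a"],
}
seeds = ["s1", "s2"]
total = rwr(adj, seeds)

# Contribution of each source to the score of node "a": propagate from that
# source alone and scale by its share of the restart mass.
contrib = {s: rwr(adj, [s])["a"] / len(seeds) for s in seeds}
```

Because every update is linear in the restart vector, the per-source contributions add up to the full score of the predicted node, which is what makes it possible to attribute a prediction to individual input observations.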


Subjects
COVID-19 , SARS-CoV-2 , Algorithms , Humans , Protein Interaction Maps , Proteins/metabolism
6.
Pac Symp Biocomput ; 26: 154-165, 2021.
Article in English | MEDLINE | ID: mdl-33691013

ABSTRACT

Viruses such as the novel coronavirus SARS-CoV-2, which is wreaking havoc on the world, depend on interactions of their own proteins with those of human host cells. Relatively small changes in sequence, such as those between SARS-CoV and SARS-CoV-2, can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV have overlap in functions. In this work, we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 proteins to human proteins and compare it to that of the other viruses named above. We build supervised machine learning models using Generalized Additive Models to predict interactions based on sequence features and find that our models perform well, with an AUC-PR of 0.65 at a class skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize which combinations of sequences from the virus and the host are correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together, we find some functionally similar proteins from different viruses. For example, the vif protein from HIV-1, vp24 from Ebola, and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses; for example, orf3a's interactions are more diverged than orf7b's interactions when comparing SARS-CoV and SARS-CoV-2.


Subjects
COVID-19 , SARS-CoV-2 , Amino Acid Sequence , Computational Biology , Humans , Proteins
7.
Bioinformatics ; 37(6): 800-806, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33063084

ABSTRACT

MOTIVATION: Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer them to newly sequenced organisms on a genome-wide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods. RESULTS: We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under 3 min for 200 bacterial species. AVAILABILITY AND IMPLEMENTATION: An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
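
As an illustration of label propagation on a network, the sketch below uses a generic scheme in which annotated nodes are held fixed and each unannotated node is repeatedly set to the weighted mean of its neighbors. This is an assumption for illustration, not the FastSinkSource implementation, and it does not include the convergence bounds the abstract refers to; the graph and names are hypothetical.

```python
def propagate_labels(neighbors, positives, negatives, max_iters=1000, tol=1e-6):
    """Score unlabeled nodes as the weighted mean of their neighbors' scores.

    neighbors maps node -> {neighbor: positive edge weight}; every neighbor
    must also appear as a key. Positives stay at 1.0 and negatives at 0.0.
    """
    scores = {n: 0.0 for n in neighbors}
    for p in positives:
        scores[p] = 1.0
    fixed = set(positives) | set(negatives)
    for _ in range(max_iters):
        max_change = 0.0
        new_scores = dict(scores)
        for node, nbrs in neighbors.items():
            if node in fixed or not nbrs:
                continue
            total_weight = sum(nbrs.values())
            new = sum(w * scores[v] for v, w in nbrs.items()) / total_weight
            max_change = max(max_change, abs(new - scores[node]))
            new_scores[node] = new
        scores = new_scores
        if max_change < tol:  # stop once no score moves appreciably
            break
    return scores


# Hypothetical chain: positive P, two unannotated genes, negative N.
neighbors = {
    "P": {"u1": 1.0},
    "u1": {"P": 1.0, "u2": 1.0},
    "u2": {"u1": 1.0, "N": 1.0},
    "N": {"u2": 1.0},
}
scores = propagate_labels(neighbors, positives=["P"], negatives=["N"])
```

On this chain the scores settle near 2/3 for `u1` and 1/3 for `u2`, so the gene closer to the positive example receives the higher score.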


Subjects
Bacteria , Genome , Algorithms , Bacteria/genetics , Base Sequence , Phenotype
8.
Nat Methods ; 17(2): 147-154, 2020 02.
Article in English | MEDLINE | ID: mdl-31907445

ABSTRACT

We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better in recovering interactions in synthetic networks than Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.
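
The two accuracy measures named above can be sketched for a ranked list of predicted regulatory edges. The definitions used here are assumptions: early precision is taken as the fraction of true edges among the top k predictions, with k equal to the number of edges in the reference network, and average precision is used as a common approximation of the area under the precision-recall curve. The gene names are hypothetical.

```python
def early_precision(ranked_edges, true_edges):
    """Fraction of true edges among the top-k predictions, k = len(true_edges)."""
    k = len(true_edges)
    return sum(1 for edge in ranked_edges[:k] if edge in true_edges) / k


def average_precision(ranked_edges, true_edges):
    """Mean of the precision values at the rank of each recovered true edge."""
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank
    return total / len(true_edges)


# Hypothetical predictions, most confident first, as (regulator, target) pairs.
ranked = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g1")]
truth = {("g1", "g2"), ("g1", "g3")}

ep = early_precision(ranked, truth)
ap = average_precision(ranked, truth)
```

Here one of the top two predictions is correct, so early precision is 0.5, while average precision is (1/1 + 2/3) / 2 because the second true edge appears at rank 3.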


Subjects
Algorithms , Gene Regulatory Networks , Single-Cell Analysis/methods , Transcriptome , Datasets as Topic , Sequence Analysis, RNA/methods
9.
F1000Res ; 7: 727, 2018.
Article in English | MEDLINE | ID: mdl-30057757

ABSTRACT

PathLinker is a graph-theoretic algorithm originally developed to reconstruct the interactions in a signaling pathway of interest. It efficiently computes multiple short paths within a background protein interaction network from the receptors to transcription factors (TFs) in a pathway. Since December 2015, PathLinker has been available as an app for Cytoscape. This paper describes how we automated the app to use the CyRest infrastructure and how users can incorporate PathLinker into their software pipelines.

10.
F1000Res ; 6: 58, 2017.
Article in English | MEDLINE | ID: mdl-28413614

ABSTRACT

PathLinker is a graph-theoretic algorithm for reconstructing the interactions in a signaling pathway of interest. It efficiently computes multiple short paths within a background protein interaction network from the receptors to transcription factors (TFs) in a pathway. We originally developed PathLinker to complement manual curation of signaling pathways, which is slow and painstaking. The method can be used in general to connect any set of sources to any set of targets in an interaction network. The app presented here makes the PathLinker functionality available to Cytoscape users. We present an example where we used PathLinker to compute and analyze the network of interactions connecting proteins that are perturbed by the drug lovastatin.
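
The task of computing multiple short paths from a set of sources to a set of targets can be sketched as follows. This is a simple best-first enumeration of loopless paths, offered only as an illustration and not as PathLinker's own algorithm or its Cytoscape API; it can be slow on large networks, and the node names and edge costs are hypothetical.

```python
import heapq


def k_shortest_paths(edges, sources, targets, k):
    """Return up to k loopless source-to-target paths in order of increasing cost.

    edges maps node -> list of (neighbor, non-negative cost) pairs.
    """
    targets = set(targets)
    # Heap entries are (cost so far, path as a tuple); one start per source.
    heap = [(0.0, (s,)) for s in sorted(sources)]
    heapq.heapify(heap)
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node in targets:
            found.append((cost, list(path)))
            continue  # do not extend a path beyond a target
        for nbr, weight in edges.get(node, []):
            if nbr not in path:  # keep paths loopless
                heapq.heappush(heap, (cost + weight, path + (nbr,)))
    return found


# Hypothetical network: receptors R1, R2; transcription factors TF1, TF2.
edges = {
    "R1": [("A", 1.0), ("B", 2.0)],
    "R2": [("B", 1.0)],
    "A": [("TF1", 1.0)],
    "B": [("TF1", 1.0), ("TF2", 3.0)],
}
paths = k_shortest_paths(edges, ["R1", "R2"], ["TF1", "TF2"], k=3)
```

Because edge costs are non-negative, partial paths leave the heap in order of non-decreasing cost, so the first k paths that end at a target are the k cheapest loopless ones. Any set of sources and targets can be supplied, which mirrors the general use described in the abstract.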
