Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

Advancing Transcription Factor Binding Site Prediction Using DNA Breathing Dynamics and Sequence Transformers via Cross Attention.

Kabir, Anowarul; Bhattarai, Manish; Rasmussen, Kim Ø; Shehu, Amarda; Bishop, Alan R; Alexandrov, Boian; Usheva, Anny.

bioRxiv ; 2024 Feb 15.

Article in English | MEDLINE | ID: mdl-38293094

ABSTRACT

Understanding the impact of genomic variants on transcription factor binding and gene regulation remains a key area of research, with implications for unraveling the complex mechanisms underlying various functional effects. Our study delves into the role of DNA's biophysical properties, including thermodynamic stability, shape, and flexibility, in transcription factor (TF) binding. We developed a multi-modal deep learning model integrating these properties with DNA sequence data. Trained on in vivo ChIP-Seq (chromatin immunoprecipitation sequencing) data involving 690 TF-DNA binding events in the human genome, our model significantly improves prediction performance in over 660 binding events, with up to a 9.6% increase in the AUROC metric compared to a baseline model that uses no DNA biophysical properties explicitly. Further, we expanded our analysis to in vitro high-throughput Systematic Evolution of Ligands by Exponential enrichment (SELEX) and Protein Binding Microarray (PBM) datasets, comparing our model with established frameworks. The inclusion of DNA breathing features consistently improved TF binding predictions across different cell lines in these datasets. Notably, for complex ChIP-Seq datasets, integrating DNABERT2 with a cross-attention mechanism provided greater predictive capabilities and insights into the mechanisms of disease-related non-coding variants found in genome-wide association studies. This work highlights the importance of DNA biophysical characteristics in TF binding and the effectiveness of multi-modal deep learning models in gene regulation studies.
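As an illustration of the cross-attention mechanism mentioned in this abstract, the following is a minimal sketch of scaled dot-product cross attention in plain Python. It is not the authors' model: the learned projection matrices of a real transformer layer are omitted, and the assignment of roles (sequence embeddings as queries, DNA breathing features as keys and values) is an assumption made only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    and keys/values from another (no learned projections in this sketch)."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a convex combination of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy inputs: one sequence-token embedding attends over
# two breathing-feature vectors.
sequence_tokens = [[1.0, 0.0]]
breathing_keys = [[1.0, 1.0], [1.0, 1.0]]
breathing_values = [[0.0, 2.0], [4.0, 6.0]]
fused = cross_attention(sequence_tokens, breathing_keys, breathing_values)
print(fused)
```

Because the two keys in the toy input are identical, the attention weights are uniform and the output is the mean of the two value vectors.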

2.

Conotoxin Prediction: New Features to Increase Prediction Accuracy.

Monroe, Lyman K; Truong, Duc P; Miner, Jacob C; Adikari, Samantha H; Sasiene, Zachary J; Fenimore, Paul W; Alexandrov, Boian; Williams, Robert F; Nguyen, Hau B.

Toxins (Basel) ; 15(11), 2023 11 03.

Article in English | MEDLINE | ID: mdl-37999504

ABSTRACT

Conotoxins are toxic, disulfide-bond-rich peptides from cone snail venom that target a wide range of receptors and ion channels with multiple pathophysiological effects. Conotoxins have extraordinary potential for medical therapeutics that include cancer, microbial infections, epilepsy, autoimmune diseases, neurological conditions, and cardiovascular disorders. Despite the potential for these compounds in novel therapeutic treatment development, the process of identifying and characterizing the toxicities of conotoxins is difficult, costly, and time-consuming. This challenge requires a series of diverse, complex, and labor-intensive biological, toxicological, and analytical techniques for effective characterization. While recent attempts using machine learning based solely on primary amino acid sequences to predict biological toxins (e.g., conotoxins and animal venoms) have improved toxin identification, these methods are limited due to peptide conformational flexibility and the high frequency of cysteines present in toxin sequences. This results in an enumerable set of disulfide-bridged foldamers with different conformations of the same primary amino acid sequence that affect function and toxicity levels. Consequently, a given peptide may be toxic when its cysteine residues form a particular disulfide-bond pattern, while alternative bonding patterns (isoforms) or its reduced form (free cysteines with no disulfide bridges) may have little or no toxicological effects. Similarly, the same disulfide-bond pattern may be possible for other peptide sequences and result in different conformations that all exhibit varying toxicities to the same receptor or to different receptors. Here we present new features that, when combined with primary sequence features to train machine learning algorithms to predict conotoxins, significantly increase prediction accuracy.

Subjects

Conotoxins, Conus Snail, Animals, Conotoxins/chemistry, Conus Snail/chemistry, Amino Acid Sequence, Peptides/chemistry, Cysteine/metabolism, Disulfides

3.

Examining DNA breathing with pyDNA-EPBD.

Kabir, Anowarul; Bhattarai, Manish; Rasmussen, Kim Ø; Shehu, Amarda; Usheva, Anny; Bishop, Alan R; Alexandrov, Boian.

Bioinformatics ; 39(11), 2023 11 01.

Article in English | MEDLINE | ID: mdl-37991847

ABSTRACT

MOTIVATION: The two strands of the DNA double helix locally and spontaneously separate and recombine in living cells due to the inherent thermal DNA motion. These dynamics result in transient openings in the double helix and are referred to as "DNA breathing" or "DNA bubbles." The propensity to form local transient openings is important in a wide range of biological processes, such as transcription, replication, and transcription factor binding. However, the modeling and computer simulation of these phenomena have remained a challenge due to the complex interplay of numerous factors, such as temperature, salt content, DNA sequence, hydrogen bonding, base stacking, and others. RESULTS: We present pyDNA-EPBD, a parallel software implementation of the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA model that allows us to describe some features of DNA dynamics in detail. pyDNA-EPBD generates genomic-scale profiles of average base-pair openings, base flipping probability, DNA bubble probability, and calculations of the characteristically dynamic length indicating the number of base pairs statistically significantly affected by a single point mutation using the Markov Chain Monte Carlo algorithm. AVAILABILITY AND IMPLEMENTATION: pyDNA-EPBD is supported across most operating systems and is freely available at https://github.com/lanl/pyDNA_EPBD. Extensive documentation can be found at https://lanl.github.io/pyDNA_EPBD/.

Subjects

DNA, Chemical Models, Computer Simulation, DNA/chemistry, Software, Base Pairing, Nucleic Acid Conformation

4.

Examining DNA Breathing with pyDNA-EPBD.

Kabir, Anowarul; Bhattarai, Manish; Rasmussen, Kim Ø; Shehu, Amarda; Usheva, Anny; Bishop, Alan R; Alexandrov, Boian S.

bioRxiv ; 2023 Sep 12.

Article in English | MEDLINE | ID: mdl-37745370

ABSTRACT

Motivation: The two strands of the DNA double helix locally and spontaneously separate and recombine in living cells due to the inherent thermal DNA motion. These dynamics result in transient openings in the double helix and are referred to as "DNA breathing" or "DNA bubbles." The propensity to form local transient openings is important in a wide range of biological processes, such as transcription, replication, and transcription factor binding. However, the modeling and computer simulation of these phenomena have remained a challenge due to the complex interplay of numerous factors, such as temperature, salt content, DNA sequence, hydrogen bonding, base stacking, and others. Results: We present pyDNA-EPBD, a parallel software implementation of the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA model that allows us to describe some features of DNA dynamics in detail. pyDNA-EPBD generates genomic-scale profiles of average base-pair openings, base flipping probability, DNA bubble probability, and calculations of the characteristically dynamic length indicating the number of base pairs statistically significantly affected by a single point mutation using the Markov Chain Monte Carlo (MCMC) algorithm.

5.

Impaired cardiac glycolysis and glycogen depletion are linked to poor myocardial outcomes in juvenile male swine with metabolic syndrome and ischemia.

Broadwin, Mark; Harris, Dwight D; Sabe, Sharif A; Sengun, Elif; Sylvestre, Amber J; Alexandrov, Boian S; Sellke, Frank W; Usheva, Anny.

Physiol Rep ; 11(15): e15742, 2023 08.

Article in English | MEDLINE | ID: mdl-37537137

ABSTRACT

Obesity continues to rise in juveniles, and obese children are more likely to develop metabolic syndrome (MetS) and related cardiovascular disease. Unfortunately, effective prevention and long-term treatment options remain limited. We determined the juvenile cardiac response to MetS in a swine model. Juvenile male swine were fed either an obesogenic diet, to induce MetS, or a lean diet, as a control (LD). Myocardial ischemia was induced with a surgically placed ameroid constrictor on the left circumflex artery. Physiological data were recorded, and at 22 weeks of age the animals underwent a terminal harvest procedure. Myocardial tissue was extracted for total metabolic and proteomic LC/MS-MS and RNA-seq analysis, and the data underwent nonnegative matrix factorization for metabolic signatures. The glycolysis-related metabolites and enzymes were significantly altered in MetS versus LD. In MetS compared with LD, an imbalance in the expression of glycogen synthase 1 (GYS1) and glycogen phosphorylases (PYGM/PYGL) resulted in a loss of myocardial glycogen. Our findings are consistent with the concept that transcriptionally driven myocardial changes in glycogen and glucose metabolism-related enzymes lead to a deficiency of their metabolite products in MetS. This abnormal energy metabolism provides insight into the pathogenesis of the juvenile heart in MetS. This study reveals that MetS and ischemia diminish ATP availability in the myocardium by altering the glucose-G6P-pyruvate axis at the level of metabolites and gene expression of related enzymes. The observed severe glycogen depletion in MetS coincides with an imbalance in the expression of GYS1 and both PYGM and PYGL. This altered energy substrate metabolism is a potential target of pharmacological agents for improving juvenile myocardial function in MetS and ischemia.

Subjects

Metabolic Syndrome, Pediatric Obesity, Swine, Male, Animals, Metabolic Syndrome/metabolism, Proteomics/methods, Myocardium/metabolism, Glycolysis, Ischemia/metabolism, Animal Disease Models

6.

Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor.

Islam, S M Ashiqul; Díaz-Gay, Marcos; Wu, Yang; Barnes, Mark; Vangara, Raviteja; Bergstrom, Erik N; He, Yudou; Vella, Mike; Wang, Jingwei; Teague, Jon W; Clapham, Peter; Moody, Sarah; Senkin, Sergey; Li, Yun Rose; Riva, Laura; Zhang, Tongwu; Gruber, Andreas J; Steele, Christopher D; Otlu, Burçak; Khandekar, Azhar; Abbasi, Ammal; Humphreys, Laura; Syulyukina, Natalia; Brady, Samuel W; Alexandrov, Boian S; Pillay, Nischalan; Zhang, Jinghui; Adams, David J; Martincorena, Iñigo; Wedge, David C; Landi, Maria Teresa; Brennan, Paul; Stratton, Michael R; Rozen, Steven G; Alexandrov, Ludmil B.

Cell Genom ; 2(11), 2022 Nov 09.

Article in English | MEDLINE | ID: mdl-36388765

ABSTRACT

Mutational signature analysis is commonly performed in cancer genomic studies. Here, we present SigProfilerExtractor, an automated tool for de novo extraction of mutational signatures, and benchmark it against 13 other bioinformatics tools by using 34 scenarios encompassing 2,500 simulated signatures found in 60,000 synthetic genomes and 20,000 synthetic exomes. For simulations with 5% noise, reflecting high-quality datasets, SigProfilerExtractor outperforms other approaches by elucidating between 20% and 50% more true-positive signatures while yielding 5-fold fewer false-positive signatures. Applying SigProfilerExtractor to 4,643 whole-genome- and 19,184 whole-exome-sequenced cancers reveals four novel signatures. Two of the signatures are confirmed in independent cohorts, and one of these signatures is associated with tobacco smoking. In summary, this report provides a reference tool for analysis of mutational signatures, a comprehensive benchmarking of bioinformatics tools for extracting signatures, and several novel mutational signatures, including one putatively attributed to direct tobacco smoking mutagenesis in bladder tissues.

7.

Quantum annealing algorithms for Boolean tensor networks.

Pelofske, Elijah; Hahn, Georg; O'Malley, Daniel; Djidjev, Hristo N; Alexandrov, Boian S.

Sci Rep ; 12(1): 8539, 2022 May 20.

Article in English | MEDLINE | ID: mdl-35595786

ABSTRACT

Quantum annealers manufactured by D-Wave Systems, Inc., are computational devices capable of finding high-quality heuristic solutions of NP-hard problems. In this contribution, we explore the potential and effectiveness of such quantum annealers for computing Boolean tensor networks. Tensors offer a natural way to model high-dimensional data commonplace in many scientific fields, and representing a binary tensor as a Boolean tensor network is the task of expressing a tensor containing categorical (i.e., $\{0, 1\}$) values as a product of low-dimensional binary tensors. A Boolean tensor network is computed by Boolean tensor decomposition, and it is usually not exact. The aim of such decomposition is to minimize the given distance measure between the high-dimensional input tensor and the product of lower-dimensional (usually three-dimensional) tensors and matrices representing the tensor network. In this paper, we introduce and analyze three general algorithms for Boolean tensor networks: Tucker, Tensor Train, and Hierarchical Tucker networks. The computation of a Boolean tensor network is reduced to a sequence of Boolean matrix factorizations, which we show can be expressed as a quadratic unconstrained binary optimization problem suitable for solving on a quantum annealer. By using a novel method we introduce, called parallel quantum annealing, we demonstrate that Boolean tensors with up to millions of elements can be decomposed efficiently using a D-Wave 2000Q quantum annealer.
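To make the Boolean matrix factorization step in this abstract concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the quantum annealer is replaced by an exhaustive search over binary vectors, only one factor is fitted while the other is held fixed, and the Hamming distance is assumed as the distance measure. All function names are hypothetical.

```python
from itertools import product


def boolean_product(A, B):
    """Boolean matrix product: C[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    inner, cols = len(B), len(B[0])
    return [
        [int(any(row[k] and B[k][j] for k in range(inner))) for j in range(cols)]
        for row in A
    ]


def hamming_distance(X, Y):
    """Number of entries in which two equally shaped binary matrices differ."""
    return sum(x != y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def fit_left_factor(X, B):
    """For a fixed right factor B, pick each row of A by brute force so that
    the Boolean product A o B is as close to X as possible.
    (Exhaustive search stands in for the annealer; feasible only for small rank.)"""
    rank = len(B)
    A = []
    for x_row in X:
        best = min(
            product((0, 1), repeat=rank),
            key=lambda a: hamming_distance([x_row], boolean_product([list(a)], B)),
        )
        A.append(list(best))
    return A


# Toy example: build X from known factors, then recover the left factor.
B = [[1, 1, 0], [0, 1, 1]]
X = boolean_product([[1, 0], [0, 1], [1, 1]], B)
A = fit_left_factor(X, B)
print(A, hamming_distance(X, boolean_product(A, B)))
```

The search space per row grows as 2^rank, which is why a heuristic solver such as an annealer becomes attractive for larger problems.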

8.

Improved Protein Decoy Selection via Non-Negative Matrix Factorization.

Akhter, Nasrin; Kabir, Kazi Lutful; Chennupati, Gopinath; Vangara, Raviteja; Alexandrov, Boian S; Djidjev, Hristo; Shehu, Amarda.

IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1670-1682, 2022.

Article in English | MEDLINE | ID: mdl-33400654

ABSTRACT

A central challenge in protein modeling research and protein structure prediction in particular is known as decoy selection. The problem refers to selecting biologically-active/native tertiary structures among a multitude of physically-realistic structures generated by template-free protein structure prediction methods. Research on decoy selection is active. Clustering-based methods are popular, but they fail to identify good/near-native decoys on datasets where near-native decoys are severely under-sampled by a protein structure prediction method. Reasonable progress is reported by methods that additionally take into account the internal energy of a structure and employ it to identify basins in the energy landscape organizing the multitude of decoys. These methods, however, incur significant time costs for extracting basins from the landscape. In this paper, we propose a novel decoy selection method based on non-negative matrix factorization. We demonstrate that our method outperforms energy landscape-based methods. In particular, the proposed method addresses both the time cost issue and the challenge of identifying good decoys in a sparse dataset, successfully recognizing near-native decoys for both easy and hard protein targets.

Subjects

Algorithms, Proteins, Cluster Analysis, Protein Conformation, Protein Folding, Proteins/chemistry, Proteins/genetics

9.

The cardiac molecular setting of metabolic syndrome in pigs reveals disease susceptibility and suggests mechanisms that exacerbate COVID-19 outcomes in patients.

Ziegler, Olivia; Sriram, Nivedita; Gelev, Vladimir; Radeva, Denitsa; Todorov, Kostadin; Feng, Jun; Sellke, Frank W; Robson, Simon C; Hiromura, Makoto; Alexandrov, Boian S; Usheva, Anny.

Sci Rep ; 11(1): 19752, 2021 10 05.

Article in English | MEDLINE | ID: mdl-34611227

ABSTRACT

Although metabolic syndrome (MetS) is linked to an elevated risk of cardiovascular disease (CVD), the cardiac-specific risk mechanism is unknown. Obesity, hypertension, and diabetes (all MetS components) are the most common form of CVD and represent risk factors for worse COVID-19 outcomes compared to their non-MetS peers. Here, we use obese Yorkshire pigs as a highly relevant animal model of human MetS, where pigs develop the hallmarks of human MetS and reproducibly mimic the myocardial pathophysiology in patients. Myocardium-specific mass spectrometry-derived metabolomics, proteomics, and transcriptomics enabled the identity and quality of proteins and metabolites to be investigated in the myocardium to greater depth. Myocardium-specific deregulation of pro-inflammatory markers, propensity for arterial thrombosis, and platelet aggregation was revealed by computational analysis of differentially enriched pathways between MetS and control animals. While key components of the complement pathway and the immune response to viruses are underexpressed, key N6-methyladenosine RNA methylation enzymes are largely overexpressed in MetS. Blood tests do not capture the entirety of metabolic changes that the myocardium undergoes, making this analysis of greater value than blood component analysis alone. Our findings create data associations to further characterize the MetS myocardium and disease vulnerability, emphasize the need for a multimodal therapeutic approach, and suggest a mechanism for observed worse outcomes in MetS patients with COVID-19 comorbidity.

Subjects

COVID-19/pathology, Disease Susceptibility, Metabolic Syndrome/pathology, Animals, Blood Coagulation Factors/genetics, Blood Coagulation Factors/metabolism, COVID-19/complications, COVID-19/virology, Cyclooxygenase 2/genetics, Cyclooxygenase 2/metabolism, High-Fat Diet/veterinary, Animal Disease Models, Humans, Innate Immunity/genetics, Metabolic Syndrome/complications, Metabolic Syndrome/metabolism, Methyltransferases/genetics, Methyltransferases/metabolism, Myocardium/metabolism, Oxidative Stress/genetics, Platelet Aggregation, Purinergic P2Y1 Receptors/genetics, Purinergic P2Y1 Receptors/metabolism, Renin-Angiotensin System, Risk Factors, SARS-CoV-2/isolation & purification, Swine, Urokinase-Type Plasminogen Activator/genetics, Urokinase-Type Plasminogen Activator/metabolism

10.

Author Correction: The cardiac molecular setting of metabolic syndrome in pigs reveals disease susceptibility and suggests mechanisms that exacerbate COVID-19 outcomes in patients.

Ziegler, Olivia; Sriram, Nivedita; Gelev, Vladimir; Radeva, Denitsa; Todorov, Kostadin; Feng, Jun; Sellke, Frank W; Robson, Simon C; Hiromura, Makoto; Alexandrov, Boian S; Usheva, Anny.

Sci Rep ; 11(1): 21613, 2021 Oct 28.

Article in English | MEDLINE | ID: mdl-34711865

11.

The genomic and epigenomic evolutionary history of papillary renal cell carcinomas.

Zhu, Bin; Poeta, Maria Luana; Costantini, Manuela; Zhang, Tongwu; Shi, Jianxin; Sentinelli, Steno; Zhao, Wei; Pompeo, Vincenzo; Cardelli, Maurizio; Alexandrov, Boian S; Otlu, Burcak; Hua, Xing; Jones, Kristine; Brodie, Seth; Dabrowska, Malgorzata Ewa; Toro, Jorge R; Yeager, Meredith; Wang, Mingyi; Hicks, Belynda; Alexandrov, Ludmil B; Brown, Kevin M; Wedge, David C; Chanock, Stephen; Fazio, Vito Michele; Gallucci, Michele; Landi, Maria Teresa.

Nat Commun ; 11(1): 3096, 2020 06 18.

Article in English | MEDLINE | ID: mdl-32555180

ABSTRACT

Intratumor heterogeneity (ITH) and tumor evolution have been well described for clear cell renal cell carcinomas (ccRCC), but they are less studied for other kidney cancer subtypes. Here we investigate ITH and clonal evolution of papillary renal cell carcinoma (pRCC) and rarer kidney cancer subtypes, integrating whole-genome sequencing and DNA methylation data. In 29 tumors, up to 10 samples from the center to the periphery of each tumor, and metastatic samples in 2 cases, enable phylogenetic analysis of spatial features of clonal expansion, which shows congruent patterns of genomic and epigenomic evolution. In contrast to previous studies of ccRCC, in pRCC, driver gene mutations and most arm-level somatic copy number alterations (SCNAs) are clonal. These findings suggest that a single biopsy would be sufficient to identify the important genetic drivers and that targeting large-scale SCNAs may improve pRCC treatment, which is currently poor. While type 1 pRCC displays near absence of structural variants (SVs), the more aggressive type 2 pRCC and the rarer subtypes have numerous SVs, which should be pursued for prognostic significance.

Subjects

Renal Cell Carcinoma/genetics, Kidney Neoplasms/genetics, DNA Copy Number Variations/genetics, Epigenomics, Germ-Line Mutation/genetics, Humans, Phylogeny

12.

Metabolomics and the pig model reveal aberrant cardiac energy metabolism in metabolic syndrome.

Karimi, Maryam; Petkova, Victoria; Asara, John M; Griffin, Michael J; Sellke, Frank W; Bishop, Alan R; Alexandrov, Boian S; Usheva, Anny.

Sci Rep ; 10(1): 3483, 2020 02 26.

Article in English | MEDLINE | ID: mdl-32103083

ABSTRACT

Although metabolic syndrome (MS) is a significant risk factor for cardiovascular disease (CVD), the cardiac response (MR) to MS remains unclear due to traditional MS models' narrow scope around a limited number of cell-cycle regulation biomarkers and the drawbacks of limited human tissue samples. We have developed what is, to date, the most comprehensive platform for studying MR to MS in a pig model closely related to human MS criteria. By incorporating comparative metabolomic, transcriptomic, and functional analyses and unsupervised machine learning (UML), we can discover unknown metabolic pathway connections and links among numerous biomarkers across the MS-associated issues in the heart. For the first time, we show severely diminished availability of glycolytic and citric acid cycle (CAC) pathway metabolites, as well as altered expression, GlcNAcylation, and activity of the enzymes involved. A notable exception, however, is the excessive succinate accumulation despite reduced succinate dehydrogenase complex iron-sulfur subunit b (SDHB) expression and decreased content of precursor metabolites. Finally, the expression of metabolites and enzymes from the GABA-glutamate, GABA-putrescine, and glyoxylate pathways significantly increases, suggesting an alternative cardiac means to replenish succinate and malate in MS. Our platform discovers potential therapeutic targets for MS-associated CVD within pathways that were previously unknown to correlate with the disease.

Subjects

Energy Metabolism, Metabolic Syndrome/pathology, Metabolome, Metabolomics/methods, Myocardium/metabolism, Animals, Biomarkers/metabolism, Citric Acid Cycle/genetics, High-Fat Diet, Animal Disease Models, Glycolysis/genetics, Male, Metabolic Syndrome/metabolism, Risk Factors, Succinate Dehydrogenase/metabolism, Succinic Acid/metabolism, Swine, Unsupervised Machine Learning

13.

Robust effect of metabolic syndrome on major metabolic pathways in the myocardium.

Karimi, Maryam; Pavlov, Vasile I; Ziegler, Olivia; Sriram, Nivedita; Yoon, Se-Young; Agbortoko, Vahid; Alexandrova, Stoiana; Asara, John; Sellke, Frank W; Sturek, Michael; Feng, Jun; Alexandrov, Boian S; Usheva, Anny.

PLoS One ; 14(12): e0225857, 2019.

Article in English | MEDLINE | ID: mdl-31790488

ABSTRACT

Although the high-fat-diet-induced metabolic syndrome (MetS) is a precursor of human cardiac pathology, the myocardial metabolic state in MetS is far from clear. The discrepancies in metabolite handling between human and small animal models and the difficulties inherent in obtaining human tissue complicate the identification of the myocardium-specific metabolic response in patients. Here we use the large animal model of swine that develops the hallmark criteria of human MetS. Our comparative metabolomics, together with transcriptomics and computational nonnegative matrix factorization (NMF) interpretation of the data, expose a significant decline in metabolites related to fatty acid oxidation, glycolysis, and the pentose phosphate pathway. Behind the reversal lies decreased expression of enzymes that operate in the pathways. We show that diminished glycogen deposition is a metabolic signature of MetS in the pig myocardium. The depletion of glycogen arises from an imbalance in the expression of genes that break down and synthesize glycogen. We show robust acetoacetate accumulation and activated expression of key enzymes in ketone body formation, catabolism and transporters, suggesting a shift in fuel utilization in MetS. A contrasting enrichment in O-GlcNAcylated proteins uncovers hexosamine pathway and O-GlcNAcase (OGA) expression involvement in the myocardial response to MetS. Although the hexosamine biosynthetic pathway (HBP) activity and the availability of the UDP-GlcNAc substrate in the MetS myocardium are low, the level of O-GlcNAcylated proteins is high as the O-GlcNAcase is significantly diminished. Our data support the perception of transcriptionally driven myocardial alterations in expression of standard fatty acids, glucose metabolism, glycogen, and ketone body related enzymes and subsequent paucity of their metabolite products in MetS. This aberrant energy metabolism in the MetS myocardium provides insight into the pathogenesis of CVD in MetS.

Subjects

Metabolic Networks and Pathways, Metabolic Syndrome/metabolism, Myocardium/metabolism, Animals, Dietary Cholesterol/adverse effects, Diet, Glycosylation, Male, Metabolome, Metabolomics, N-Acetylglucosaminyltransferases/metabolism, Messenger RNA/genetics, Messenger RNA/metabolism, Risk Factors, Swine, Unsupervised Machine Learning, beta-N-Acetylhexosaminidases/metabolism

14.

Unsupervised Machine Learning for Analysis of Phase Separation in Ternary Lipid Mixture.

López, Cesar A; Vesselinov, Velimir V; Gnanakaran, S; Alexandrov, Boian S.

J Chem Theory Comput ; 15(11): 6343-6357, 2019 Nov 12.

Article in English | MEDLINE | ID: mdl-31476122

ABSTRACT

Phase separation in mixed lipid systems has been extensively studied both experimentally and theoretically because of its biological importance. A detailed description of such complex systems undoubtedly requires novel mathematical frameworks that are capable of decomposing and categorizing the evolution of thousands if not millions of lipids involved in the phenomenon. The interpretation and analysis of molecular dynamics (MD) simulations representing temporal and spatial changes in such systems are still a challenging task. Here, we present an unsupervised machine learning approach based on nonnegative matrix factorization called NMFk that successfully extracts latent (i.e., not directly observable) features from the second layer neighborhood profiles derived from coarse-grained MD simulations of a ternary lipid mixture. Our results demonstrate that NMFk extracts physically meaningful features that uniquely describe the phase separation such as locations and roles of different lipid types, formation of nanodomains, and timescales of lipid segregation.

Subjects

Lipids/chemistry, Unsupervised Machine Learning, 1,2-Dipalmitoylphosphatidylcholine/chemistry, Cholesterol/chemistry, Lipid Bilayers/chemistry, Molecular Dynamics Simulation, Phosphatidylcholines/chemistry
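The record above describes NMFk as an approach based on nonnegative matrix factorization. As background only, the following is a minimal sketch of plain NMF using the standard Lee-Seung multiplicative updates for the Frobenius norm, written in pure Python. It does not reproduce NMFk itself (in particular, nothing here estimates the number of latent features), and the toy matrix, rank, and function names are assumptions for illustration.

```python
import random


def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate nonnegative X (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Toy nonnegative matrix built from two latent features.
X = matmul([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]],
           [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
W, H = nmf(X, rank=2)
print(round(frobenius_error(X, W, H), 4))
```

The multiplicative form keeps every entry of W and H nonnegative as long as the initial values are positive, which is why no explicit projection step is needed.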

15.

Nonnegative tensor factorization for contaminant source identification.

Vesselinov, Velimir V; Alexandrov, Boian S; O'Malley, Daniel.

J Contam Hydrol ; 220: 66-97, 2019 Jan.

Article in English | MEDLINE | ID: mdl-30528243

ABSTRACT

Unsupervised Machine Learning (ML) is becoming increasingly popular for solving various types of data analytics problems including feature extraction, blind source separation, exploratory analyses, model diagnostics, etc. Here, we have developed a new unsupervised ML method based on Nonnegative Tensor Factorization (NTF) for identification of the original groundwater types (including contaminant sources) present in geochemical mixtures observed in an aquifer. Frequently, groundwater types with different geochemical signatures are related to different background and/or contamination sources. The characterization of groundwater mixing processes is a challenging but very important task critical for any environmental management project aiming to characterize the fate and transport of contaminants in the subsurface and perform contaminant remediation. This task typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous different geochemical constituents and processes may need to be simulated in these models which further complicates the analyses. Additionally, the application of inverse methods may introduce biases in the analyses through the assumptions made in the model development process. Here, we substitute the model inversion with unsupervised ML analysis. The ML analysis does not make any assumptions about underlying physical and geochemical processes occurring in the aquifer. Our ML methodology, called NTFk, is capable of identifying (1) the unknown number of groundwater types (contaminant sources) present in the aquifer, (2) the original geochemical concentrations (signatures) of these groundwater types and (3) spatial and temporal dynamics in the mixing of these groundwater types. 
These results are obtained only from the measured geochemical data without any additional site information. In general, the NTFk methodology allows for interpretation of large high-dimensional datasets representing diverse spatial and temporal components such as state variables and velocities. NTFk has been tested on synthetic and real-world site three-dimensional datasets. The NTFk algorithm is designed to work with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).

Subjects

Groundwater, Chemical Water Pollutants, Environmental Monitoring, Isotopes

16.

Nonnegative/Binary matrix factorization with a D-Wave quantum annealer.

O'Malley, Daniel; Vesselinov, Velimir V; Alexandrov, Boian S; Alexandrov, Ludmil B.

PLoS One ; 13(12): e0206653, 2018.

Article in English | MEDLINE | ID: mdl-30532243

ABSTRACT

D-Wave quantum annealers represent a novel computational architecture and have attracted significant interest. Much of this interest has focused on the quantum behavior of D-Wave machines, and there have been few practical algorithms that use the D-Wave. Machine learning has been identified as an area where quantum annealing may be useful. Here, we show that the D-Wave 2X can be effectively used as part of an unsupervised machine learning method. This method takes a matrix as input and produces two low-rank matrices as output: one containing latent features in the data and another matrix describing how the features can be combined to approximately reproduce the input matrix. Despite the limited number of bits in the D-Wave hardware, this method is capable of handling a large input matrix. The D-Wave only limits the rank of the two output matrices. We apply this method to learn the features from a set of facial images and compare the performance of the D-Wave to two classical tools. This method is able to learn facial features and accurately reproduce the set of facial images. The performance of the D-Wave shows some promise, but has some limitations. It outperforms the two classical codes in a benchmark when only a short amount of computational time is allowed (200-20,000 microseconds), but these results suggest heuristics that would likely outperform the D-Wave in this benchmark.

Subjects

Machine Learning , Theoretical Models , Quantum Theory
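
The abstract of entry 16 describes approximating an input matrix by the product of two low-rank matrices. As a rough illustration of that idea only, the following is a minimal sketch of classical nonnegative matrix factorization with multiplicative updates in NumPy. It is not the nonnegative/binary D-Wave method from the paper; the function name `nmf`, the rank, the iteration count, and the synthetic test matrix are all assumptions chosen for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X (m x n) as W @ H.

    W (m x rank) holds the latent features; H (rank x n) holds the
    weights that combine them. Uses classical multiplicative updates
    for the Frobenius-norm objective (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a nonnegative ratio,
        # so W and H stay nonnegative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical synthetic data: a 20 x 15 matrix of exact rank 3.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 15))

W, H = nmf(X, rank=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch the rank is fixed by the caller, which loosely mirrors the abstract's remark that the hardware limits only the rank of the two output matrices rather than the size of the input.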

17.

Nonnegative Matrix Factorization for identification of unknown number of sources emitting delayed signals.

Iliev, Filip L; Stanev, Valentin G; Vesselinov, Velimir V; Alexandrov, Boian S.

PLoS One ; 13(3): e0193974, 2018.

Article in English | MEDLINE | ID: mdl-29518126

Abstract

Factor analysis is broadly used as a powerful unsupervised machine learning tool for reconstruction of hidden features in recorded mixtures of signals. In the case of a linear approximation, the mixtures can be decomposed by a variety of model-free Blind Source Separation (BSS) algorithms. Most of the available BSS algorithms consider an instantaneous mixing of signals, while the case when the mixtures are linear combinations of signals with delays is less explored. Especially difficult is the case when the number of sources of the signals with delays is unknown and has to be determined from the data as well. To address this problem, in this paper, we present a new method based on Nonnegative Matrix Factorization (NMF) that is capable of identifying: (a) the unknown number of the sources, (b) the delays and speed of propagation of the signals, and (c) the locations of the sources. Our method can be used to decompose records of mixtures of signals with delays emitted by an unknown number of sources in a nondispersive medium, based only on recorded data. This is the case, for example, when electromagnetic signals from multiple antennas are received asynchronously; when mixtures of acoustic or seismic signals are recorded by sensors located at different positions; or when a shift in frequency is induced by the Doppler effect. By applying our method to synthetic datasets, we demonstrate its ability to identify the unknown number of sources as well as the waveforms, the delays, and the strengths of the signals. Using Bayesian analysis, we also evaluate estimation uncertainties and identify the region of likelihood where the positions of the sources can be found.

Subjects

Factor Analysis , Computer-Assisted Signal Processing , Unsupervised Machine Learning , Algorithms , Bayes Theorem , Datasets as Topic , Fourier Analysis , Markov Chains , Monte Carlo Method

18.

Contaminant source identification using semi-supervised machine learning.

Vesselinov, Velimir V; Alexandrov, Boian S; O'Malley, Daniel.

J Contam Hydrol ; 212: 134-142, 2018 05.

Article in English | MEDLINE | ID: mdl-29174719

Abstract

Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous different geochemical constituents and processes may need to be simulated in these models, which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that performs decomposition of the observation mixtures based on the Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentration of the contaminant sources from measured geochemical mixtures with unknown mixing ratios without any additional site information. NMFk is tested on synthetic and real-world site data. The NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).

Subjects

Groundwater/chemistry , Supervised Machine Learning , Chemical Water Pollutants/chemistry , Environmental Monitoring/methods , Isotopes/analysis

19.

Evaluating the role of coherent delocalized phonon-like modes in DNA cyclization.

Alexandrov, Ludmil B; Rasmussen, Kim Ø; Bishop, Alan R; Alexandrov, Boian S.

Sci Rep ; 7(1): 9731, 2017 08 29.

Article in English | MEDLINE | ID: mdl-28851939

Abstract

The innate flexibility of a DNA sequence is quantified by the Jacobson-Stockmayer J-factor, which measures the propensity for DNA loop formation. Recent studies of ultra-short DNA sequences revealed a discrepancy of up to six orders of magnitude between experimentally measured and theoretically predicted J-factors. These large differences suggest that, in addition to the elastic moduli of the double helix, other factors contribute to loop formation. Here, we develop a new theoretical model that explores how coherent delocalized phonon-like modes in DNA provide single-stranded "flexible hinges" to assist in loop formation. We combine the Czapla-Swigon-Olson structural model of DNA with our extended Peyrard-Bishop-Dauxois model and, without changing any of the parameters of the two models, apply this new computational framework to 86 experimentally characterized DNA sequences. Our results demonstrate that the new computational framework can predict J-factors within an order of magnitude of experimental measurements for most ultra-short DNA sequences, while continuing to accurately describe the J-factors of longer sequences. Further, we demonstrate that our computational framework can be used to describe the cyclization of DNA sequences that contain a base pair mismatch. Overall, our results support the conclusion that coherent delocalized phonon-like modes play an important role in DNA cyclization.

Subjects

DNA/chemistry , Chemical Models , Nucleic Acid Conformation , Phonons , Algorithms , Base Pairing , Base Sequence , Cyclization

20.

Dynamical Model of Drug Accumulation in Bacteria: Sensitivity Analysis and Experimentally Testable Predictions.

Vesselinova, Neda; Alexandrov, Boian S; Wall, Michael E.

PLoS One ; 11(11): e0165899, 2016.

Article in English | MEDLINE | ID: mdl-27824914

Abstract

We present a dynamical model of drug accumulation in bacteria. The model captures key features in experimental time courses on ofloxacin accumulation: initial uptake; two-phase response; and long-term acclimation. In combination with experimental data, the model provides estimates of import and export rates in each phase, the time of entry into the second phase, and the decrease of internal drug during acclimation. Global sensitivity analysis, local sensitivity analysis, and Bayesian sensitivity analysis of the model provide information about the robustness of these estimates, and about the relative importance of different parameters in determining the features of the accumulation time courses in three different bacterial species: Escherichia coli, Staphylococcus aureus, and Pseudomonas aeruginosa. The results lead to experimentally testable predictions of the effects of membrane permeability, drug efflux and trapping (e.g., by DNA binding) on drug accumulation. A key prediction is that a sudden increase in ofloxacin accumulation in both E. coli and S. aureus is accompanied by a decrease in membrane permeability.

Subjects

Anti-Bacterial Agents/pharmacokinetics , Bacteria/metabolism , Biological Models , Bacteria/drug effects , Bayes Theorem , Cell Membrane Permeability , Escherichia coli/metabolism , Pseudomonas aeruginosa/metabolism , Staphylococcus aureus/metabolism
