ABSTRACT
BACKGROUND: We studied the evolution over time of diffusion-weighted imaging (DWI) lesion volume and the factors involved in early and late infarct growth (EIG and LIG) in stroke patients undergoing endovascular treatment (EVT) according to the final revascularization grade. METHODS: This is a prospective cohort of patients with anterior large artery occlusion undergoing EVT arriving at 1 comprehensive stroke center. Magnetic resonance imaging was performed on arrival (pre-EVT), <2 hours after EVT (post-EVT), and on day 5. DWI lesions and perfusion maps were evaluated. Arterial revascularization was assessed according to the modified Thrombolysis in Cerebral Infarction (mTICI) grades. We recorded National Institutes of Health Stroke Scale at arrival and at day 7. EIG was defined as (DWI volume post-EVT − DWI volume pre-EVT), and LIG was defined as (DWI volume at day 5 − DWI volume post-EVT). Factors involved in EIG and LIG were tested via multivariable linear models. RESULTS: We included 98 patients (mean age 70, median National Institutes of Health Stroke Scale score 17, final mTICI≥2b 86%). Median EIG and LIG were 48 and 63.3 mL in patients with final mTICI<2b, and 3.6 and 3.9 mL in patients with final mTICI≥2b. Both EIG and LIG were associated with higher National Institutes of Health Stroke Scale at day 7 (ρ=0.667; P<0.01 and ρ=0.614; P<0.01, respectively). In patients with final mTICI≥2b, each 10% increase in the volume of DWI pre-EVT and each extra pass led to growths of 9% (95% CI, 7%-10%) and 14% (95% CI, 2%-28%) in the DWI volume post-EVT, respectively. Furthermore, each 10% increase in the volume of DWI post-EVT, each extra pass, and each 10 mL increase in TMax6s post-EVT were associated with growths of 8% (95% CI, 6%-9%), 9% (95% CI, 0%-19%), and 12% (95% CI, 5%-20%) in the volume of DWI at day 5, respectively. CONCLUSIONS: Infarct grows during and after EVT, especially in nonrecanalizers but also to a lesser extent in recanalizers. In recanalizers, number of passes and DWI volume influence EIG, while number of passes, DWI, and hypoperfused volume after the procedure determine LIG.
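The two growth measures defined in the METHODS are simple differences between consecutive DWI volumes. A minimal sketch of that arithmetic follows; the function name and the example volumes are hypothetical illustrations, not data from the study.

```python
def infarct_growth(dwi_pre_ml, dwi_post_ml, dwi_day5_ml):
    """Return (EIG, LIG) in mL from three DWI lesion volumes.

    EIG = DWI volume post-EVT - DWI volume pre-EVT
    LIG = DWI volume at day 5 - DWI volume post-EVT
    """
    eig = dwi_post_ml - dwi_pre_ml
    lig = dwi_day5_ml - dwi_post_ml
    return eig, lig


# Hypothetical volumes for one patient (not study data).
eig, lig = infarct_growth(dwi_pre_ml=12.0, dwi_post_ml=15.5, dwi_day5_ml=20.0)
print(eig, lig)  # 3.5 4.5
```

A negative value would simply indicate that the measured lesion volume shrank between the two scans.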
Subject(s)
Brain Ischemia , Endovascular Procedures , Stroke , Humans , Aged , Prospective Studies , Treatment Outcome , Stroke/therapy , Cerebral Infarction/complications , Magnetic Resonance Imaging , Thrombectomy/methods , Endovascular Procedures/methods , Brain Ischemia/complications , Retrospective Studies
ABSTRACT
MOTIVATION: An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems biology-based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge from experimental studies of host-pathogen interactions in several diseases to build stronger predictive models. Our approach is based on a formalism from machine learning called 'multitask learning', which considers the problem of building models across tasks that are related to each other. A 'task' in our scenario is the set of host-pathogen protein interactions involved in one disease. To integrate interactions from several tasks (i.e. diseases), our method exploits the similarity in the infection process across the diseases. In particular, we use the biological hypothesis that similar pathogens target the same critical biological processes in the host, in defining a common structure across the tasks. RESULTS: Our current work on host-pathogen protein interaction prediction focuses on human as the host, and four bacterial species as pathogens. The multitask learning technique we develop uses a task-based regularization approach. We find that the resulting optimization problem is a difference of convex (DC) functions. To optimize, we implement a Convex-Concave procedure-based algorithm. We compare our integrative approach to baseline methods that build models on a single host-pathogen protein interaction dataset. Our results show that our approach outperforms the baselines on the training data. We further analyze the protein interaction predictions generated by the models, and find some interesting insights. AVAILABILITY: The predictions and code are available at: http://www.cs.cmu.edu/~mkshirsa/ismb2013_paper320.html . SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Artificial Intelligence , Bacterial Proteins/metabolism , Host-Pathogen Interactions , Protein Interaction Mapping/methods , Algorithms , Humans
ABSTRACT
OBJECTIVE: The risk factors for seizure recurrence after acute symptomatic seizure due to a structural brain lesion are not well established. The aim of this study was to analyze possible associations between demographic, clinical, and electroencephalographic variables and epilepsy development in patients with acute symptomatic seizure due to an acute structural brain lesion. METHODS: We designed an observational prospective study of patients with acute symptomatic seizure due to an acute structural brain lesion (hemorrhagic stroke, ischemic stroke, traumatic brain injury, or meningoencephalitis) who underwent EEG during their initial admission between January 2015 and January 2020. We analyzed prospectively recorded demographic, clinical, electroencephalographic (EEG), and treatment-related variables. All variables were compared between patients with and without seizure recurrence during 2 years of follow-up. RESULTS: We included 194 patients (41.2% women; mean [SD] age, 57.3 [15.8] years) with acute symptomatic seizure due to an acute structural brain lesion. They all underwent EEG during admission and were followed for at least 2 years. The identifiable causes were hemorrhagic stroke (44.8%), ischemic stroke (19.5%), traumatic brain injury (18.5%), and meningoencephalitis (17%). Fifty-six patients (29%) experienced a second seizure during follow-up. Seizure recurrence was associated with epileptiform discharges on EEG (52% vs 32%; OR 2.3 [95% CI, 1.2-4.3], p = 0.008) and onset with status epilepticus (17% vs 0.05%; OR 4.03 [95% CI, 1.45-11.2], p = 0.009). CONCLUSIONS: Epileptiform discharges on EEG and status epilepticus in patients with acute symptomatic seizure due to an acute structural brain lesion are associated with a higher risk of epilepsy development.
Subject(s)
Electroencephalography , Recurrence , Seizures , Humans , Female , Male , Middle Aged , Seizures/physiopathology , Seizures/etiology , Adult , Aged , Prospective Studies , Risk Factors , Meningoencephalitis/physiopathology , Meningoencephalitis/complications , Follow-Up Studies
ABSTRACT
MOTIVATION: Approaches that use supervised machine learning techniques for protein-protein interaction (PPI) prediction typically use features obtained by integrating several sources of data. Often certain attributes of the data are not available, resulting in missing values. In particular, our host-pathogen PPI datasets have a large fraction of missing values, in the range of 58-85%, which makes it challenging to apply machine learning algorithms. RESULTS: We show that specialized techniques for missing value imputation can improve the performance of the models significantly. We use cross-species information in combination with machine learning techniques like Group lasso with ℓ1/ℓ2 regularization. We demonstrate the benefits of our approach on two PPI prediction problems. In our first example of Salmonella-human PPI prediction, we are able to obtain high prediction accuracies with 77.6% precision and 84% recall. Comparison with various other techniques shows an improvement of 9 in F1 score over the next best technique. We also apply our method to Yersinia-human PPI prediction successfully, demonstrating the generality of our approach. AVAILABILITY: Predicted interactions, datasets, features are available at: http://www.cs.cmu.edu/~mkshirsa/eccb2012_paper46.html. CONTACT: judithks@cs.cmu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
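For readers who want to relate the reported precision and recall to the F1 score mentioned in the RESULTS, the sketch below applies the standard harmonic-mean formula. The function name is an illustrative choice, and the value it prints is only what the formula gives for the two reported figures; the abstract itself does not state the F1 score, and a score averaged over folds could differ.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for Salmonella-human PPI prediction.
f1 = f1_score(0.776, 0.84)
print(round(100 * f1, 1))  # 80.7
```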
Subject(s)
Artificial Intelligence , Bacterial Proteins/metabolism , Host-Pathogen Interactions , Protein Interaction Mapping/methods , Algorithms , Bacterial Proteins/chemistry , Gene Expression , Host-Pathogen Interactions/genetics , Humans , Salmonella/metabolism , Protein Sequence Analysis , Yersinia pestis/metabolism
ABSTRACT
The emerging field of vascularized composite allotransplantation (VCA) has become a clinical reality. Building upon cutting-edge understandings of transplant surgery and immunology, complex grafts such as hands and faces can now be transplanted with success. Many of the challenges that have historically been limiting factors in transplantation, such as rejection and the morbidity of immunosuppression, remain challenges in VCA. Because of the accessibility of most VCA grafts, and the highly immunogenic nature of the skin in particular, VCA has become the focal point for cross-disciplinary efforts to develop novel approaches to some of the most challenging immunological problems in transplantation, particularly the early diagnosis and assessment of rejection. This paper provides a historically oriented introduction to the field of organ transplantation and the evolution of VCA.
Subject(s)
Graft Rejection/immunology , Plastic Surgery Procedures/methods , Postoperative Complications/immunology , Skin/immunology , Animals , Early Diagnosis , Graft Rejection/diagnosis , Graft Rejection/etiology , Humans , Physiologic Neovascularization/immunology , Postoperative Complications/diagnosis , Plastic Surgery Procedures/trends , Skin/blood supply
ABSTRACT
We introduce a new approach to learning statistical models from multiple sequence alignments (MSA) of proteins. Our method, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA. The resulting model encodes both the position-specific conservation statistics and the correlated mutation statistics between sequential and long-range pairs of residues. Existing techniques for learning graphical models from MSA either make strong, and often inappropriate assumptions about the conditional independencies within the MSA (e.g., Hidden Markov Models), or else use suboptimal algorithms to learn the parameters of the model. In contrast, GREMLIN makes no a priori assumptions about the conditional independencies within the MSA. We formulate and solve a convex optimization problem, thus guaranteeing that we find a globally optimal model at convergence. The resulting model is also generative, allowing for the design of new protein sequences that have the same statistical properties as those in the MSA. We perform a detailed analysis of covariation statistics on the extensively studied WW and PDZ domains and show that our method out-performs an existing algorithm for learning undirected probabilistic graphical models from MSA. We then apply our approach to 71 additional families from the PFAM database and demonstrate that the resulting models significantly out-perform Hidden Markov Models in terms of predictive accuracy.
Subject(s)
Chemical Models , Protein Folding , Proteins/chemistry , Sequence Alignment/methods , Amino Acid Sequence , Area Under Curve , Computational Biology , Computer Graphics , Computer Simulation , Markov Chains , Molecular Models , Statistical Models , PDZ Domains , Protein Sequence Analysis , Structure-Activity Relationship
ABSTRACT
MOTIVATION: Protein-protein interactions (PPIs) are critical for virtually every biological function. Recently, researchers have suggested using supervised learning for the task of classifying pairs of proteins as interacting or not. However, its performance is largely restricted by the availability of truly interacting proteins (labeled). Meanwhile, there exists a considerable number of protein pairs where an association appears between two partners, but there is not enough experimental evidence to support it as a direct interaction (partially labeled). RESULTS: We propose a semi-supervised multi-task framework for predicting PPIs from not only labeled, but also partially labeled reference sets. The basic idea is to perform multi-task learning on a supervised classification task and a semi-supervised auxiliary task. The supervised classifier trains a multi-layer perceptron network for PPI predictions from labeled examples. The semi-supervised auxiliary task shares network layers of the supervised classifier and trains with partially labeled examples. Semi-supervision could be utilized in multiple ways. We tried three approaches in this article: (i) classification (to distinguish partial positives from negatives); (ii) ranking (to rate partial positives as more likely than negatives); (iii) embedding (to make data clusters get similar labels). We applied this framework to improve the identification of interacting pairs between HIV-1 and human proteins. Our method improved upon the state-of-the-art method for this task, indicating the benefits of semi-supervised multi-task learning using auxiliary information. AVAILABILITY: http://www.cs.cmu.edu/~qyj/HIVsemi.
Subject(s)
Artificial Intelligence , Computational Biology/methods , HIV-1/physiology , Human Immunodeficiency Virus Proteins/metabolism , Protein Interaction Mapping/methods , Proteins/metabolism , Algorithms , Statistical Data Interpretation , Humans , Statistical Models
ABSTRACT
BACKGROUND: Biological processes in cells are carried out by means of protein-protein interactions. Determining whether a pair of proteins interacts by wet-lab experiments is resource-intensive; only about 38,000 interactions, out of a few hundred thousand expected interactions, are known today. Active machine learning can guide the selection of pairs of proteins for future experimental characterization in order to accelerate accurate prediction of the human protein interactome. RESULTS: Random forest (RF) has previously been shown to be effective for predicting protein-protein interactions. Here, four different active learning algorithms have been devised for selection of protein pairs to be used to train the RF. With labels of as few as 500 protein pairs selected using any of the four active learning methods described here, the classifier achieved a higher F-score (harmonic mean of precision and recall) than with 3000 randomly chosen protein pairs. The F-score of predicted interactions is shown to increase by about 15% with active learning in comparison to that with random selection of data. CONCLUSION: Active learning algorithms enable learning more accurate classifiers with much less labelled data and prove to be useful in applications where manual annotation of data is formidable. Active learning techniques demonstrated here can also be applied to other proteomics applications such as protein structure prediction and classification.
Subject(s)
Algorithms , Proteins/chemistry , Proteomics/methods , Protein Databases , Humans , Protein Conformation , Proteins/classification
ABSTRACT
BACKGROUND: About 30% of genes code for membrane proteins, which are involved in a wide variety of crucial biological functions. Despite their importance, experimentally determined structures correspond to only about 1.7% of protein structures deposited in the Protein Data Bank due to the difficulty in crystallizing membrane proteins. Algorithms that can identify proteins whose high-resolution structure can aid in predicting the structure of many previously unresolved proteins are therefore of potentially high value. Active machine learning is a supervised machine learning approach which is suitable for this domain, where there are a large number of sequences but only very few have known corresponding structures. In essence, active learning seeks to identify proteins whose structure, if revealed experimentally, is maximally predictive of others. RESULTS: An active learning approach is presented for selection of a minimal set of proteins whose structures can aid in the determination of transmembrane helices for the remaining proteins. TMpro, an algorithm for high-accuracy TM helix prediction we previously developed, is coupled with active learning. We show that with a well-designed selection procedure, high accuracy can be achieved with only a few proteins. TMpro, trained with a single protein, achieved an F-score of 94% on benchmark evaluation and 91% on the MPtopo dataset, which correspond to the state-of-the-art accuracies on TM helix prediction that are usually achieved by training with over 100 training proteins. CONCLUSION: Active learning is suitable for bioinformatics applications, where manually characterized data are not a comprehensive representation of all possible data, and in fact can be a very sparse subset thereof. It aids in selection of data instances which, when characterized experimentally, can improve the accuracy of computational characterization of remaining raw data. The results presented here also demonstrate that the feature extraction method of TMpro is well designed, achieving a very good separation between TM and non-TM segments.
Subject(s)
Algorithms , Artificial Intelligence , Membrane Proteins/chemistry , Protein Secondary Structure , Protein Databases , Protein Folding , Protein Sequence Analysis
ABSTRACT
BACKGROUND: Human immunodeficiency virus-1 (HIV-1) has a minimal genome of only 9 genes, which encode 15 proteins. HIV-1 thus depends on the human host for virtually every aspect of its life cycle. The universal language of communication in biological systems, including between pathogen and host, is via signal transduction pathways. The fundamental units of these pathways are protein-protein interactions. Understanding the functional significance of HIV-1-human interactions requires viewing them in the context of human signal transduction pathways. RESULTS: Integration of HIV-1-human interactions with known signal transduction pathways indicates that the majority of known human pathways have the potential to be affected through at least one interaction with an HIV-1 protein at some point during the HIV-1 life cycle. For each pathway, we define simple paths between start points (i.e. no edges going into a node) and end points (i.e. no edges leaving a node). We then identify the paths that pass through human proteins that interact with HIV-1 proteins. We supplement the combined map with functional information, including which proteins are known drug targets and which proteins contribute significantly to HIV-1 function as revealed by recent siRNA screens. We find that there are often alternative paths starting and ending at the same proteins but circumventing the intermediate steps disrupted by HIV-1. CONCLUSION: A mapping of HIV-1-human interactions to human signal transduction pathways is presented here to link interactions with functions. We propose a new way of analyzing virus-host interactions by identifying HIV-1 targets as well as alternative paths bypassing the HIV-1-targeted steps. This approach yields numerous experimentally testable hypotheses on how HIV-1 function may be compromised and human cellular function restored by pharmacological approaches. We are making the full set of pathway analysis results available to the community.
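The path analysis described in the RESULTS can be illustrated on a toy directed graph. The sketch below enumerates simple paths from start points (no incoming edges) to end points (no outgoing edges) and separates the paths that pass through a targeted protein from those that bypass it. The graph, the node names, and the function names are all hypothetical; this is an assumption-laden illustration of the idea, not the authors' implementation.

```python
def simple_paths(graph):
    """Yield every simple path from a start node (no incoming edges)
    to an end node (no outgoing edges) of a directed graph given as
    {node: [successor, ...]}."""
    has_incoming = {v for targets in graph.values() for v in targets}
    nodes = set(graph) | has_incoming
    starts = sorted(nodes - has_incoming)

    def walk(node, path):
        successors = graph.get(node, [])
        if not successors:          # end point reached
            yield list(path)
            return
        for nxt in successors:
            if nxt not in path:     # never revisit a node: path stays simple
                path.append(nxt)
                yield from walk(nxt, path)
                path.pop()

    for start in starts:
        yield from walk(start, [start])


def split_by_target(paths, targeted):
    """Separate paths that contain a targeted node from those that do not."""
    hit = [p for p in paths if targeted & set(p)]
    bypass = [p for p in paths if not targeted & set(p)]
    return hit, bypass


# Hypothetical pathway: receptor R signals to T via A or via B.
toy_pathway = {"R": ["A", "B"], "A": ["T"], "B": ["T"], "T": []}
paths = list(simple_paths(toy_pathway))
hit, bypass = split_by_target(paths, targeted={"A"})  # "A" is the assumed viral target
print(hit)     # [['R', 'A', 'T']]
print(bypass)  # [['R', 'B', 'T']]
```

In this toy case the bypass path starts and ends at the same proteins as the targeted path, which is the kind of alternative route the abstract describes.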
Subject(s)
HIV Infections/metabolism , HIV-1/physiology , Signal Transduction , Genetic Databases , Drug Delivery Systems , Gene Expression Regulation , HIV Infections/drug therapy , HIV Infections/genetics , HIV Infections/virology , Humans , Small Interfering RNA/genetics
ABSTRACT
Covariate shift is a prevalent setting for supervised learning in the wild when the training and test data are drawn from different time periods, different but related domains, or via different sampling strategies. This paper addresses a transfer learning setting with covariate shift between source and target domains. Most existing methods for correcting covariate shift exploit density ratios of the features to reweight the source-domain data, and when the features are high-dimensional, the estimated density ratios may suffer from large estimation variances, leading to poor prediction performance. In this work, we investigate the dependence of covariate shift correction performance on the dimensionality of the features, and propose a correction method that finds a low-dimensional representation of the features, which takes into account the features relevant to the target Y, and exploits the density ratio of this representation for importance reweighting. We discuss the factors affecting the performance of our method and demonstrate its capabilities on both pseudo-real and real-world data.
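Importance reweighting by a density ratio, the building block this abstract refers to, can be sketched in one dimension. The example below assumes the source and target feature densities are known unit-variance Gaussians, so the ratio is computed exactly; this is a deliberate simplification for illustration, whereas the paper is concerned with estimating the ratio on a learned low-dimensional representation. All names and parameter values are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def importance_weights(xs, source_mean, target_mean):
    """Density ratio p_target(x) / p_source(x) for each source sample."""
    return [gaussian_pdf(x, target_mean) / gaussian_pdf(x, source_mean) for x in xs]


def weighted_mean(values, weights):
    """Self-normalized importance-weighted average."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(0)
# Source-domain samples drawn from N(0, 1); the assumed target domain is N(1, 1).
source_x = [rng.gauss(0.0, 1.0) for _ in range(20000)]
weights = importance_weights(source_x, source_mean=0.0, target_mean=1.0)

# Reweighting source samples should recover the target-domain mean (about 1).
estimate = weighted_mean(source_x, weights)
print(round(estimate, 2))
```

In high dimensions the weights become far more variable, which is the estimation-variance problem the abstract describes.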
ABSTRACT
A key problem in domain adaptation is determining what to transfer across different domains. We propose a data-driven method to represent these changes across multiple source domains and perform unsupervised domain adaptation. We assume that the joint distributions follow a specific generating process and have a small number of identifiable changing parameters, and develop a data-driven method to identify the changing parameters by learning low-dimensional representations of the changing class-conditional distributions across multiple source domains. The learned low-dimensional representations enable us to reconstruct the target-domain joint distribution from unlabeled target-domain data, and further enable predicting the labels in the target domain. We demonstrate the efficacy of this method by conducting experiments on synthetic and real datasets.
ABSTRACT
Disease-causing pathogens such as viruses introduce their proteins into the host cells in which they interact with the host's proteins, enabling the virus to replicate inside the host. These interactions between pathogen and host proteins are key to understanding infectious diseases. Often multiple diseases involve phylogenetically related or biologically similar pathogens. Here we present a multitask learning method to jointly model interactions between human proteins and three different but related viruses: Hepatitis C, Ebola virus, and Influenza A. Our multitask matrix completion-based model uses a shared low-rank structure in addition to a task-specific sparse structure to incorporate the various interactions. We obtain between 7 and 39 percentage points improvement in predictive performance over prior state-of-the-art models. We show how our model's parameters can be interpreted to reveal both general and specific interaction-relevant characteristics of the viruses. Our code is available online.
Subject(s)
Algorithms , Computational Biology/methods , Host-Pathogen Interactions , Protein Interaction Maps , Proteins/metabolism , Protein Databases , Ebolavirus/metabolism , Hepacivirus/metabolism , Humans , Influenza A virus/metabolism , Molecular Models , Protein Conformation
ABSTRACT
Protein fold recognition is an important step towards understanding protein three-dimensional structures and their functions. A conditional graphical model, i.e., segmentation conditional random fields (SCRFs), is proposed as an effective solution to this problem. In contrast to traditional graphical models, such as the hidden Markov model (HMM), SCRFs follow a discriminative approach. Therefore, the model can flexibly include any features, such as overlapping or long-range interaction features over the whole sequence. The model also employs a convex optimization function, which results in globally optimal solutions to the model parameters. On the other hand, the segmentation setting in SCRFs makes their graphical structures intuitively similar to the protein 3-D structures and, more importantly, provides a framework to model the long-range interactions between secondary structures directly. Our model is applied to predict the parallel beta-helix fold, an important fold in bacterial pathogenesis and carbohydrate binding/cleavage. The cross-family validation shows that SCRFs not only can score all known beta-helices higher than non-beta-helices in the Protein Data Bank (PDB), but also accurately locate rungs in known beta-helix proteins. Our method outperforms BetaWrap, a state-of-the-art algorithm for predicting beta-helix folds, and HMMER, a general motif detection algorithm based on HMM, and has the additional advantage of general application to other protein folds. Applying our prediction model to the Uniprot Database, we identify previously unknown potential beta-helices.
Subject(s)
Protein Databases , Protein Folding , Proteins/chemistry , Proteins/genetics , Sequence Alignment/statistics & numerical data , Algorithms , Animals , Chickens , Computational Biology , Humans , Mice , Molecular Models , Protein Secondary Structure , Rats , Software
ABSTRACT
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families, as shown by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively.
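The residual-error figures quoted above follow from the reported accuracies: the reduction is the drop in error rate divided by the baseline error rate. A short sketch of that arithmetic, with a hypothetical function name, is:

```python
def residual_error_reduction(baseline_acc, new_acc):
    """Percent reduction in error when accuracy improves from
    baseline_acc to new_acc (both given in percent)."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# SVM (baseline) vs Naive Bayes accuracies reported in the abstract.
print(round(residual_error_reduction(88.4, 93.0), 1))  # 39.7 (level I)
print(round(residual_error_reduction(86.3, 92.4), 1))  # 44.5 (level II)
```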
Subject(s)
Biotechnology/methods , Computational Biology/methods , Proteins/chemistry , Proteomics/methods , G-Protein-Coupled Receptors/chemistry , Algorithms , Animals , Bayes Theorem , Cell Membrane/metabolism , Cell Nucleus/metabolism , Protein Databases , Decision Trees , Human Genome , Humans , Markov Chains , Biological Models , Statistical Models , Molecular Sequence Data , Protein Tertiary Structure , Reproducibility of Results , Sequence Alignment , Protein Sequence Analysis , Software , Terminology as Topic
ABSTRACT
We consider the problem of building a model to predict protein-protein interactions (PPIs) between the bacterial species Salmonella Typhimurium and the plant host Arabidopsis thaliana, a host-pathogen pair for which no known PPIs are available. To achieve this, we present approaches that use homology and statistical learning methods called "transfer learning." In the transfer learning setting, the task of predicting PPIs between Arabidopsis and its pathogen S. Typhimurium is called the "target task." The presented approaches utilize labeled data, i.e., known PPIs of other host-pathogen pairs (we call these PPIs the "source tasks"). The homology-based approaches use heuristics based on biological intuition to predict PPIs. The transfer learning methods use the similarity of the PPIs from the source tasks to the target task to build a model. For a quantitative evaluation we consider Salmonella-mouse PPI prediction and some other host-pathogen tasks where known PPIs exist. We use metrics such as precision and recall, and our results show that our methods perform well on the target task in various transfer settings. We present a brief qualitative analysis of the Arabidopsis-Salmonella predicted interactions. We filter the predictions from all approaches using Gene Ontology term enrichment and retain only those interactions involving Salmonella effectors. We thereby observe that Arabidopsis proteins involved in, e.g., transcriptional regulation, hormone-mediated signaling and defense response may be affected by Salmonella.
ABSTRACT
The availability of high-quality physical interaction datasets is a prerequisite for system-level analysis of interactomes and supervised models to predict protein-protein interactions (PPIs). One source is literature-curated PPI databases in which pairwise associations of proteins published in the scientific literature are deposited. However, PPIs may not be clearly labelled as physical interactions, which affects the quality of the entire dataset. In order to obtain a high-quality gold standard dataset for PPIs between human immunodeficiency virus (HIV-1) and its human host, we adopted a crowd-sourcing approach. We collected expert opinions and utilized an expectation-maximization based approach to estimate expert labeling quality. These estimates are used to infer the probability of a reported PPI actually being a direct physical interaction given the set of expert opinions. The effectiveness of our approach is demonstrated through synthetic data experiments, and a high-quality physical interaction network between HIV and human proteins is obtained. Since many literature-curated databases suffer from similar challenges, the framework described herein could be utilized in refining other databases. The curated data is available at http://www.cs.bilkent.edu.tr/~oznur.tastan/supp/psb2015/.
Subject(s)
Protein Databases/statistics & numerical data , Protein Interaction Maps , Computational Biology , Crowdsourcing , Expert Testimony , HIV-1/pathogenicity , HIV-1/physiology , Host-Pathogen Interactions , Human Immunodeficiency Virus Proteins/physiology , Humans , Knowledge Discovery , Likelihood Functions , Statistical Models , Systems Analysis
ABSTRACT
BACKGROUND: Trauma often co-occurs with cardiac arrest and hemorrhagic shock. Skin and muscle injuries often lead to significant inflammation in the affected tissue. The primary mechanism by which inflammation is initiated, sustained, and terminated is cytokine-mediated immune signaling, but this signaling can be altered by cardiac arrest. The complexity and context sensitivity of immune signaling in general have stymied a clear understanding of these signaling dynamics. METHODOLOGY/PRINCIPAL FINDINGS: We hypothesized that advanced numerical and biological function analysis methods would help elucidate the inflammatory response to skin and muscle wounds in rats, both with and without concomitant shock. Based on the multiplexed analysis of inflammatory mediators, we discerned a differential interleukin (IL)-1α and IL-18 signature in skin vs. muscle, which was suggestive of inflammasome activation in the skin. Immunoblotting revealed caspase-1 activation in skin but not muscle. Notably, IL-1α and IL-18, along with caspase-1, were greatly elevated in the skin following cardiac arrest, consistent with differential inflammasome activation. CONCLUSION/SIGNIFICANCE: Tissue-specific activation of caspase-1 and the NLRP3 inflammasome appears to be a key factor in determining the type and severity of the inflammatory response to tissue injury, especially in the presence of severe shock, as suggested via data-driven modeling.
ABSTRACT
Classifying biological data is a common task in the biomedical context. Predicting the class of new, unknown information allows researchers to gain insight and make decisions based on the available data. Using classification methods also often implies choosing the best parameters to obtain optimal class separation, and the number of parameters might be large in biological datasets. Support Vector Machines (SVMs) provide a well-established and powerful classification method to analyse data and find the minimal-risk separation between different classes. Finding that separation strongly depends on the available feature set and the tuning of hyper-parameters. Techniques for feature selection and SVM parameter optimization are known to improve classification accuracy, and the literature on them is extensive. In this paper we review the strategies that are used to improve the classification performance of SVMs and perform our own experimentation to study the influence of features and hyper-parameters in the optimization process, using several known kernels.
Subject(s)
Databases as Topic/classification , Support Vector Machine , Adult , Humans
ABSTRACT
Viruses depend on their hosts at every stage of their life cycles and must therefore communicate with them via Protein-Protein Interactions (PPIs). To investigate the mechanisms of communication by different viruses, we overlay reported pairwise human-virus PPIs on human signalling pathways. Of 671 pathways obtained from NCI and Reactome databases, 355 are potentially targeted by at least one virus. The majority of pathways are linked to more than one virus. We find evidence supporting the hypothesis that viruses often interact with different proteins depending on the targeted pathway. Pathway analysis indicates overrepresentation of some pathways targeted by viruses. The merged network of the most statistically significant pathways shows several centrally located proteins, which are also hub proteins. Generally, hub proteins are targeted more frequently by viruses. Numerous proteins in virus-targeted pathways are known drug targets, suggesting that these might be exploited as potential new approaches to treatments against multiple viruses.