Results 1 - 12 of 12
1.
Big Data ; 4(3): 148-59, 2016 09.
Article in English | MEDLINE | ID: mdl-27541627

ABSTRACT

The availability of electronic health records creates fertile ground for developing computational models of various medical conditions. We present a new approach for detecting and analyzing patients with unexpected responses to treatment, building on machine learning and statistical methodology. Given a specific patient, we compute a statistical score for the deviation of the patient's response from responses observed in other patients having similar characteristics and medication regimens. These scores are used to define cohorts of patients showing deviant responses. Statistical tests are then applied to identify clinical features that correlate with these cohorts. We implement this methodology in a tool that is designed to assist researchers in the pharmaceutical field to uncover new features associated with reduced response to a treatment. It can also aid physicians by flagging patients who are not responding to treatment as expected and hence deserve more attention. The tool provides comprehensive visualizations of the analysis results and the supporting data, both at the cohort level and at the level of individual patients. We demonstrate the utility of our methodology and tool in a population of type II diabetic patients, treated with antidiabetic drugs, and monitored by the HbA1C test.


Subject(s)
Diabetes Mellitus, Type 2/drug therapy , Hypoglycemic Agents/therapeutic use , Electronic Health Records , Humans , Machine Learning
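A minimal sketch of the deviation-score idea described in the abstract above, assuming a pandas DataFrame of patient records. The column names, matching criteria, and z-score formulation are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
import pandas as pd

def deviation_score(df: pd.DataFrame, patient_id, age_tol=5.0, hba1c_tol=0.5):
    """Z-score of one patient's treatment response against similar patients.

    Assumes columns: patient_id, regimen, age, baseline_hba1c, delta_hba1c
    (change in HbA1c after treatment). All names are illustrative.
    """
    p = df.loc[df["patient_id"] == patient_id].iloc[0]
    peers = df[
        (df["patient_id"] != patient_id)
        & (df["regimen"] == p["regimen"])
        & (df["age"].sub(p["age"]).abs() <= age_tol)
        & (df["baseline_hba1c"].sub(p["baseline_hba1c"]).abs() <= hba1c_tol)
    ]
    if len(peers) < 20:          # too few comparable patients -> no score
        return np.nan
    mu, sd = peers["delta_hba1c"].mean(), peers["delta_hba1c"].std(ddof=1)
    return (p["delta_hba1c"] - mu) / sd if sd > 0 else np.nan

# Patients whose score is far from zero respond very differently from their
# peers and would be candidates for the "deviant response" cohorts.
```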
2.
AMIA Jt Summits Transl Sci Proc ; 2015: 137-41, 2015.
Article in English | MEDLINE | ID: mdl-26306256

ABSTRACT

The availability of electronic health records creates fertile ground for developing computational models for various medical conditions. Using machine learning, we can detect patients with unexpected responses to treatment and provide statistical testing and visualization tools to support further analysis. The new system was developed to help researchers uncover new features associated with reduced response to treatment, and to aid physicians in identifying patients who are not responding to treatment as expected and hence deserve more attention. The solution computes a statistical score for the deviation of a given patient's response from the responses observed in individuals with similar characteristics and medication regimens. Statistical tests are then applied to identify clinical features that correlate with cohorts of patients showing deviant responses. The results are presented in comprehensive visualizations, both at the cohort and the individual patient levels. We demonstrate the utility of this system in a population of diabetic patients.
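The abstract's second step, testing which clinical features correlate with the deviant-response cohort, could look like the following sketch for a binary feature; the column names and the choice of Fisher's exact test are assumptions, not the paper's specified procedure.

```python
import pandas as pd
from scipy.stats import fisher_exact

def feature_enrichment(df: pd.DataFrame, feature: str, deviant_col: str = "is_deviant"):
    """Fisher's exact test for association between a binary clinical feature
    and membership in the deviant-response cohort (column names are assumed)."""
    table = pd.crosstab(df[feature], df[deviant_col])   # 2x2 contingency table
    odds_ratio, p_value = fisher_exact(table)
    return odds_ratio, p_value

# Example: is a hypothetical comorbidity flag over-represented among patients
# whose response deviates from expectation?
# or_, p = feature_enrichment(cohort_df, "has_comorbidity")
```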

3.
Genet Epidemiol ; 38(5): 477-81, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24706571

ABSTRACT

Issues of publication bias, lack of replicability, and false discovery have long plagued the genetics community. Proper utilization of public and shared data resources presents an opportunity to ameliorate these problems. We present an approach to public database management that we term Quality Preserving Database (QPD). It enables perpetual use of the database for testing statistical hypotheses while controlling false discovery and avoiding publication bias on the one hand, and maintaining testing power on the other hand. We demonstrate it on a use case of a replication server for GWAS findings, underlining its practical utility. We argue that a shift to using QPD in managing current and future biological databases will significantly enhance the community's ability to make efficient and statistically sound use of the available data resources.


Subject(s)
Databases, Factual/standards , Information Management/methods , Public Sector , Databases, Factual/economics , Information Management/economics , Information Management/standards , Publication Bias , Quality Control , Reproducibility of Results
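The abstract does not spell out the QPD mechanism, so the following is only a toy illustration of the general flavor of a replication server that spends a bounded error budget across sequential queries; the alpha-spending rule shown here is an assumption, not the paper's procedure.

```python
class ReplicationServer:
    """Toy alpha-spending replication server (illustrative only; the actual
    Quality Preserving Database uses its own, more refined procedure)."""

    def __init__(self, total_alpha: float = 0.05):
        self.remaining_alpha = total_alpha

    def test(self, p_value: float, alpha_spend: float) -> bool:
        """Each query declares how much of the error budget it spends."""
        if alpha_spend > self.remaining_alpha:
            raise ValueError("error budget exhausted; add data or reject query")
        self.remaining_alpha -= alpha_spend
        return p_value <= alpha_spend   # True = finding replicates at this level

# server = ReplicationServer(total_alpha=0.05)
# replicated = server.test(p_value=1e-4, alpha_spend=0.001)
```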
4.
Stud Health Technol Inform ; 169: 689-93, 2011.
Article in English | MEDLINE | ID: mdl-21893835

ABSTRACT

The new generation of health information standards, in which the syntax and semantics of the content are explicitly formalized, allows for interoperability in healthcare scenarios and for analysis in clinical research settings. Studies involving clinical and genomic data accumulate knowledge in the form of relationships between genotypic and phenotypic information, as well as associations within the genomic and clinical domains. Some of this knowledge consists of analysis results targeted at a specific disease; other parts are predictive, specific to a patient, and may be used by decision-support applications. Representing knowledge is as important as representing data, since data is more useful when coupled with relevant knowledge. Further analysis and cross-research collaboration would benefit from persisting knowledge and data in a unified way. This paper describes a methodology used in Hypergenes, an EC FP7 project targeting Essential Hypertension, which captures data and knowledge using standards such as HL7 CDA and Clinical Genomics, aligned with the CEN EHR 13606 specification. We demonstrate the benefits of this approach for clinical research as well as in healthcare-oriented scenarios.


Subject(s)
Computer Communication Networks/standards , Decision Support Systems, Clinical/standards , Medical Informatics/standards , Algorithms , Computer Systems , Computers , Genomics , Genotype , Humans , Hypertension/therapy , Medical Records Systems, Computerized , Phenotype , Programming Languages , Software , Systems Integration
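As a rough illustration of persisting data and knowledge side by side, the sketch below pairs clinical observations with provenance-carrying knowledge assertions; the structure and field names are purely illustrative and are not the HL7 CDA, Clinical Genomics, or EHR 13606 schemas.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Observation:
    code: str           # e.g. a LOINC or SNOMED code
    value: str

@dataclass
class KnowledgeAssertion:
    subject: str        # e.g. a variant identifier such as "rs699"
    predicate: str      # e.g. "associated_with"
    obj: str            # e.g. "essential hypertension"
    evidence: str       # provenance: study, analysis run, decision-support rule

@dataclass
class PatientRecord:
    patient_id: str
    observations: List[Observation] = field(default_factory=list)
    knowledge: List[KnowledgeAssertion] = field(default_factory=list)
```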
5.
Article in English | MEDLINE | ID: mdl-21778529

ABSTRACT

The common scenario in computational biology, in which a community of researchers conducts multiple statistical tests on one shared database, gives rise to the multiple hypothesis testing problem. Conventional procedures for solving this problem control the probability of false discovery by sacrificing some of the power of the tests. We suggest a scheme that controls false discovery without any loss of power by adding new samples for each use of the database and charging the user for the expense. The crux of the scheme is a carefully crafted pricing system that fairly prices different user requests based on their demands while keeping the probability of false discovery bounded. We demonstrate this idea in the context of HIV treatment research, where multiple researchers conduct tests on a repository of HIV samples.


Subject(s)
Computational Biology/standards , Database Management Systems/standards , Biomedical Research , Data Interpretation, Statistical
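A sketch of the sample-cost intuition: as more tests hit the shared database, keeping the family-wise error bounded shrinks the per-test alpha, so preserving power requires more samples, whose marginal cost the new user could be charged for. The Bonferroni correction, the two-sample z-approximation, and the default effect size are illustrative assumptions, not the paper's pricing rule.

```python
from scipy.stats import norm

def samples_per_group(effect_size: float, alpha: float, power: float = 0.8) -> int:
    """Two-sample z-approximation: n per group for a two-sided test."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return int(round(2 * ((z_a + z_b) / effect_size) ** 2))

def marginal_sample_cost(k_tests: int, effect_size: float = 0.3,
                         family_alpha: float = 0.05, power: float = 0.8) -> int:
    """Extra samples the (k+1)-th user must fund so that power is preserved
    under a Bonferroni-controlled family-wise error rate (illustrative rule)."""
    n_before = samples_per_group(effect_size, family_alpha / k_tests, power)
    n_after = samples_per_group(effect_size, family_alpha / (k_tests + 1), power)
    return max(0, n_after - n_before)

# marginal_sample_cost(k_tests=100) -> additional samples to price into the query
```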
6.
PLoS One ; 3(10): e3470, 2008.
Article in English | MEDLINE | ID: mdl-18941628

ABSTRACT

BACKGROUND: Analysis of the viral genome for drug resistance mutations is the state of the art for guiding treatment selection for human immunodeficiency virus type 1 (HIV-1)-infected patients. These mutations alter the structure of viral target proteins and reduce or, in the worst case, completely inhibit the effect of antiretroviral compounds while maintaining the virus's ability to replicate effectively. Modern anti-HIV-1 regimens comprise multiple drugs in order to prevent, or at least delay, the development of resistance mutations. However, commonly used HIV-1 genotype interpretation systems provide classifications only for single drugs. The EuResist initiative has collected data from about 18,500 patients to train three classifiers for predicting response to combination antiretroviral therapy, given the viral genotype and further information. In this work we compare different classifier fusion methods for combining the individual classifiers. PRINCIPAL FINDINGS: The individual classifiers yielded similar performance, and all the combination approaches considered performed equally well. The gain in performance due to combining methods did not reach statistical significance compared to the single best individual classifier on the complete training set. However, on smaller training sets (200 to 1,600 instances compared to 2,700) the combination significantly outperformed the individual classifiers (p<0.01; paired one-sided Wilcoxon test). Together with a consistent reduction of the standard deviation relative to the individual prediction engines, this indicates more robust behavior of the combined system. Moreover, using the combined system we were able to identify a class of therapy courses that led to a consistent underestimation (about 0.05 AUC) of the system's performance. The discovery of these therapy courses is a further indication of the robustness of the combined system. CONCLUSION: The combined EuResist prediction engine is freely available at http://engine.euresist.org.


Subject(s)
Anti-HIV Agents/pharmacology , Artificial Intelligence , Computational Biology/methods , Drug Resistance/genetics , Genome, Viral , Mutation , Diagnosis, Computer-Assisted , Genotype , Internet , Methods , Models, Statistical
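The simplest classifier fusion is averaging the engines' predicted success probabilities; the sketch below shows that, plus the paired one-sided Wilcoxon comparison over cross-validation folds that the abstract mentions. The arrays and fold structure are assumptions, and this is not claimed to be EuResist's exact fusion rule.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import roc_auc_score

def fuse_mean(prob_a, prob_b, prob_c):
    """Average the success probabilities of three prediction engines."""
    return np.mean([prob_a, prob_b, prob_c], axis=0)

def fused_vs_single(fold_aucs_single, fold_aucs_fused):
    """Paired one-sided Wilcoxon test: is the fused engine better per fold?"""
    _, p = wilcoxon(fold_aucs_fused, fold_aucs_single, alternative="greater")
    return p

# auc_fused = roc_auc_score(y_true, fuse_mean(p1, p2, p3))
# p = fused_vs_single(aucs_best_single, aucs_fused)
```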
7.
Bioinformatics ; 24(13): i399-406, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586740

ABSTRACT

MOTIVATION: Optimizing HIV therapies is crucial since the virus rapidly develops mutations to evade drug pressure. Recent studies have shown that genotypic information might not be sufficient for the design of therapies and that other clinical and demographic factors may play a role in therapy failure. This study is designed to assess the improvement in prediction achieved when such information is taken into account. We use these factors to generate a prediction engine using a variety of machine learning methods and to determine which clinical conditions are most misleading in terms of predicting the outcome of a therapy. RESULTS: Three different machine learning techniques were used: a generative-discriminative method, regression with derived evolutionary features, and regression with a mixture of effects. All three methods had similar performance, with an area under the receiver operating characteristic curve (AUC) of 0.77. A set of three similar engines limited to genotypic information only achieved an AUC of 0.75. A straightforward combination of the three engines consistently improves the prediction, with significantly better prediction when the full set of features is employed. The combined engine improves on predictions obtained from an online state-of-the-art resistance interpretation system. Moreover, the engines tend to disagree more on the outcome of failed therapies than on successful ones. Careful analysis of the differences between the engines revealed the mutations and drugs most closely associated with uncertainty about the therapy outcome. AVAILABILITY: The combined prediction engine will be available from July 2008; see http://engine.euresist.org.


Subject(s)
Anti-HIV Agents/therapeutic use , Chromosome Mapping/methods , Decision Support Systems, Clinical , Genetic Predisposition to Disease/genetics , HIV Infections/drug therapy , HIV Infections/genetics , Outcome Assessment, Health Care/methods , Pharmacogenetics/methods , Humans
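One way to quantify the observation that engines disagree more on failed therapies is to compute pairwise disagreement of binarized predictions stratified by outcome, as in this sketch; the probability arrays, labels, and 0.5 threshold are assumptions rather than the paper's analysis.

```python
import numpy as np
from itertools import combinations

def disagreement_by_outcome(prob_engines, y_true, threshold=0.5):
    """Mean pairwise disagreement of binarized engine predictions,
    reported separately for failed (0) and successful (1) therapies."""
    calls = [np.asarray(p) >= threshold for p in prob_engines]
    pairwise = np.mean([a != b for a, b in combinations(calls, 2)], axis=0)
    y = np.asarray(y_true)
    return {outcome: float(pairwise[y == outcome].mean()) for outcome in (0, 1)}

# disagreement_by_outcome([p1, p2, p3], y)  # expect a higher rate for outcome 0
```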
8.
Proteins ; 72(2): 741-53, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18260101

ABSTRACT

Proteins fold into a well-defined structure as a result of the collapse of the polypeptide chain, whereas transient protein-complex formation is mainly the result of the binding of two individually folded monomers. A protein-protein interface therefore does not resemble the core of monomeric proteins, but has a more polar nature. Here, we address the question of whether the physico-chemical characteristics of intraprotein versus interprotein bonds differ, or whether interfaces differ from folded monomers only in their preference for certain types of interactions. To address this question we assembled a high-resolution, non-redundant protein-protein interaction database consisting of 1374 homodimer and 572 heterodimer complexes, and compared the physico-chemical properties of these interactions between protein interfaces and monomers. We performed an extensive statistical analysis of the geometrical properties of interatomic interactions of different types: hydrogen bonds, electrostatic interactions, and aromatic interactions. Our study clearly shows that there is no significant difference in the chemistry, geometry, or packing density of individual interactions between interfaces and monomeric structures. However, the distribution of the different bond types differs. For example, side-chain-side-chain interactions constitute over 62% of all interprotein interactions, while they make up only 36% of the bonds stabilizing a protein structure. Because the properties of backbone interactions differ, on average, from those of side chains, a quantitative difference is observed. Our findings show that the same knowledge-based potential can be used for protein-binding sites as for protein structures. However, one has to keep in mind the different architecture of interfaces and their distinct bond preferences.


Subject(s)
Proteins/chemistry , Dimerization , Hydrogen Bonding , Protein Folding , Proteins/metabolism
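A sketch of the kind of bookkeeping behind the side-chain versus backbone statistics, using Biopython to count contacts across a two-chain interface. The 4.5 Å cutoff, the backbone atom set, and the definition of "interaction" as any heavy-atom contact are illustrative choices, not the paper's exact definitions.

```python
from Bio.PDB import PDBParser, NeighborSearch

BACKBONE = {"N", "CA", "C", "O"}

def interface_contact_types(pdb_path, chain_a="A", chain_b="B", cutoff=4.5):
    """Count side-chain/side-chain vs. backbone-involving heavy-atom contacts
    between two chains of a complex (parameters are illustrative)."""
    model = PDBParser(QUIET=True).get_structure("complex", pdb_path)[0]
    atoms_b = [a for a in model[chain_b].get_atoms() if a.element != "H"]
    search = NeighborSearch(atoms_b)
    counts = {"sidechain-sidechain": 0, "backbone-involving": 0}
    for atom_a in model[chain_a].get_atoms():
        if atom_a.element == "H":
            continue
        for atom_b in search.search(atom_a.coord, cutoff):
            if atom_a.get_name() not in BACKBONE and atom_b.get_name() not in BACKBONE:
                counts["sidechain-sidechain"] += 1
            else:
                counts["backbone-involving"] += 1
    return counts
```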
9.
Nucleic Acids Res ; 35(Web Server issue): W543-8, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17488838

ABSTRACT

The development of bioinformatic tools by individual labs results in an abundance of parallel programs for the same task. For example, identification of binding-site regions between interacting proteins is done using ProMate, WHISCY, PPI-Pred, PINUP and others. All of these servers first identify distinctive properties of binding sites and then incorporate them into a predictor. The resulting prediction would obviously improve if the most suitable parameters from each of these predictors were incorporated into one server; however, because of the variation in methods and databases, this is currently not feasible. Here, the protein-binding-site prediction server is extended into a general protein-binding-site research tool, ProMateus. This web tool, based on ProMate's infrastructure, enables the user to easily explore and incorporate new features and databases, providing an evaluation of the benefit of individual features and their combinations within a set framework. This transforms individual research into a community exercise, bringing out the best from all users for optimized predictions. The analysis is demonstrated on a database of protein-protein and protein-DNA interactions. This approach is fundamentally different from that used in generating meta-servers. The implications of the open-research approach are discussed. ProMateus is available at http://bip.weizmann.ac.il/promate.


Subject(s)
Algorithms , Computational Biology/methods , DNA/chemistry , Proteins/chemistry , Software , Binding Sites , Databases, Factual , Internet , Models, Molecular , Protein Binding , Protein Conformation , Protein Structure, Secondary , Proteins/metabolism , Surface Properties
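A generic stand-in for the evaluation loop described above, estimating the benefit of each candidate feature alone and of all features combined via cross-validated AUC. The logistic-regression model, feature matrix, and scoring are assumptions and do not reproduce ProMateus's internal framework.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def feature_benefit(X, y, feature_names, cv=5):
    """Cross-validated AUC for each feature alone and for all features together.

    X : (n_samples, n_features) array of residue-level features
    y : binary labels (1 = interface residue), both assumed inputs.
    """
    scores = {}
    for i, name in enumerate(feature_names):
        scores[name] = cross_val_score(
            LogisticRegression(max_iter=1000), X[:, [i]], y,
            scoring="roc_auc", cv=cv).mean()
    scores["all features"] = cross_val_score(
        LogisticRegression(max_iter=1000), X, y,
        scoring="roc_auc", cv=cv).mean()
    return scores
```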
10.
Curr Opin Struct Biol ; 17(1): 67-76, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17239579

ABSTRACT

The formation of specific protein interactions plays a crucial role in most, if not all, biological processes, including signal transduction, cell regulation, the immune response and others. Recent advances in our understanding of the molecular architecture of protein-protein binding sites, which facilitates such diversity in binding affinity and specificity, are enabling us to address key questions. What is the amino acid composition of binding sites? What are interface hotspots? How are binding sites organized? What are the differences between tight and weak interacting complexes? How does water contribute to binding? Can the knowledge gained be translated into protein design? And does a universal code for binding exist, or is it the architecture and chemistry of the interface that enable diverse but specific binding solutions?


Subject(s)
Protein Binding , Proteins/chemistry , Proteins/metabolism , Binding Sites , Multiprotein Complexes , Protein Engineering/methods , Water/chemistry
11.
J Mol Biol ; 338(1): 181-99, 2004 Apr 16.
Article in English | MEDLINE | ID: mdl-15050833

ABSTRACT

Is the whole protein surface available for interaction with other proteins, or are specific sites pre-assigned according to their biophysical and structural character? And if so, is it possible to predict the location of the binding site from the surface properties? These questions are addressed quantitatively by probing protein surfaces with spheres of radius 10 Å over a database (DB) of 57 unique, non-homologous proteins involved in heteromeric, transient protein-protein interactions for which the structures of both the unbound and bound states have been determined. In structural terms, we found the binding site to have a preference for beta-sheets and for relatively long non-structured chains, but not for alpha-helices. Chemically, aromatic side-chains show a clear preference for binding sites. While the hydrophobic and polar content of the interface is similar to that of the rest of the surface, hydrophobic and polar residues tend to cluster in interfaces. In the crystal, the binding site is surrounded by more bound water molecules and has a lower B-factor already in the unbound protein. The same biophysical properties were found to hold for the unbound and bound DBs. All the significant interface properties were combined into ProMate, an interface prediction program. This was followed by an optimization step to choose the best combination of properties, as many of them are correlated. During optimization and prediction, the tested proteins were not used for data collection, to avoid over-fitting. The prediction algorithm is fully automated and is used to predict the location of potential binding sites on unbound proteins with known structures. The algorithm successfully predicts the location of the interface for about 70% of the proteins, with an equal success rate whether applied to the unbound DB or to the disjoint bound DB. A prediction is counted as correct if over half of the predicted continuous interface patch is indeed interface. The ability to predict the location of protein-protein interfaces has far-reaching implications both for our understanding of the specificity and kinetics of binding and for assisting the analysis of the proteome.


Subject(s)
Algorithms , Computational Biology/methods , Proteins/chemistry , Software , Binding Sites , Databases, Factual , Models, Molecular , Protein Binding , Protein Conformation , Proteins/metabolism , Surface Properties
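The success criterion quoted in the abstract, a prediction counts as correct when more than half of the predicted continuous patch is true interface, is simple enough to sketch directly; representing patches and interfaces as residue-identifier sets is an assumption about the data layout.

```python
def prediction_correct(predicted_patch: set, true_interface: set) -> bool:
    """True when >50% of the predicted patch residues are interface residues."""
    if not predicted_patch:
        return False
    overlap = len(predicted_patch & true_interface)
    return overlap / len(predicted_patch) > 0.5

def success_rate(predictions: dict, interfaces: dict) -> float:
    """Fraction of proteins whose predicted patch passes the criterion."""
    hits = sum(prediction_correct(predictions[p], interfaces[p]) for p in predictions)
    return hits / len(predictions)
```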
12.
Protein Eng Des Sel ; 17(2): 183-9, 2004 Feb.
Article in English | MEDLINE | ID: mdl-15007163

ABSTRACT

Docking algorithms produce many possible structures of a protein-protein complex, and in most cases some of them resemble the correct structure to within an r.m.s.d. of <3 Å. A major challenge in the field of docking is to extract the correct structure from this pool, the so-called 'scoring' problem. Here, we introduce a new scoring function that discriminates between the many wrong and the few true conformations. The scoring function is based on measuring the tightness of fit of the two docked proteins at a predicted binding interface, whose location is identified using the recently developed computer algorithm ProMate. The new scoring function does not rely on energy considerations and is therefore tolerant of low-resolution descriptions of the interface. A linear relation between the score and the r.m.s.d. relative to the 'true' structure is found in most of the cases evaluated. The function was tested on the docking results of 21 complexes in their unbound form and was successful in 77% of the examined cases, defining success as scoring a 'true' result with a p-value better than 0.1.


Subject(s)
Algorithms , Models, Molecular , Proteins/chemistry , Proteins/metabolism , Binding Sites , Databases, Protein , Predictive Value of Tests , Protein Binding
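One way to realize a "tightness of fit at the predicted interface" score is to count close heavy-atom contacts between the two docked partners, restricted to the receptor atoms that the interface predictor flags. The 5 Å cutoff, the coordinate arrays, and the contact-counting rule below are assumptions for illustration, not the published scoring function.

```python
import numpy as np

def tightness_score(coords_receptor, coords_ligand, interface_mask, cutoff=5.0):
    """Count receptor-ligand atom pairs within `cutoff` angstroms, keeping only
    receptor atoms inside the predicted binding site (illustrative scoring).

    coords_*       : (N, 3) arrays of atom coordinates for one docking pose
    interface_mask : boolean array marking receptor atoms in the predicted site
    """
    site = np.asarray(coords_receptor)[np.asarray(interface_mask)]
    diffs = site[:, None, :] - np.asarray(coords_ligand)[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return int((dists < cutoff).sum())

# Higher scores indicate a tighter fit at the predicted interface; candidate
# docking poses can be ranked by this value.
```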