1.
Int J Mol Sci ; 23(19)2022 Sep 29.
Article in English | MEDLINE | ID: mdl-36232786

ABSTRACT

ApoB-100 is a member of a large lipid transfer protein superfamily and is one of the main apolipoproteins found on low-density lipoprotein (LDL) and very low-density lipoprotein (VLDL) particles. Despite its clinical significance for the development of cardiovascular disease, there is limited information on apoB-100 structure. We have developed a novel method based on the "divide and conquer" algorithm, using PSIPRED software, by dividing apoB-100 into five subunits and 11 domains. Models of each domain were prepared using I-TASSER, DEMO, RoseTTAFold, Phyre2, and MODELLER. Subsequently, we used disuccinimidyl sulfoxide (DSSO), a new mass spectrometry cleavable cross-linker, and the known position of disulfide bonds to experimentally validate each model. We obtained 65 unique DSSO cross-links, of which 87.5% were within a 26 Å threshold in the final model. We also evaluated the positions of cysteine residues involved in the eight known disulfide bonds in apoB-100, and each pair was measured within the expected 5.6 Å constraint. Finally, multiple domains were combined by applying constraints based on detected long-range DSSO cross-links to generate five subunits, which were subsequently merged to achieve an uninterrupted architecture for apoB-100 around a lipoprotein particle. Moreover, the dynamics of apoB-100 during particle size transitions was examined by comparing VLDL and LDL computational models and using experimental cross-linking data. In addition, the proposed model of receptor ligand binding of apoB-100 provides new insights into some of its functions.


Subject(s)
Apolipoproteins B , Cysteine , Apolipoprotein B-100 , Apolipoproteins B/metabolism , Computer Simulation , Disulfides , Ligands , Lipoproteins, LDL/chemistry , Lipoproteins, VLDL , Models, Structural , Sulfoxides
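
The cross-link validation step in the abstract above (checking how many DSSO cross-links fall within the 26 Å threshold) can be sketched as follows. This is a minimal illustration only: the coordinates and residue pairs are hypothetical, and measuring the distance between C-alpha atoms is an assumption, since the abstract does not state which atoms were used.

```python
import math

# Hypothetical C-alpha coordinates (in angstroms) keyed by residue number.
# Illustrative values only; they are not taken from the apoB-100 model.
ca_coords = {
    10: (0.0, 0.0, 0.0),
    25: (10.0, 0.0, 0.0),
    40: (0.0, 20.0, 0.0),
    90: (30.0, 30.0, 0.0),
}

# Hypothetical cross-linked residue pairs.
cross_links = [(10, 25), (10, 40), (25, 40), (10, 90)]

MAX_DISTANCE = 26.0  # threshold quoted in the abstract, in angstroms


def satisfied_fraction(coords, links, threshold=MAX_DISTANCE):
    """Return the fraction of cross-links whose residue distance is within threshold."""
    within = sum(
        1 for a, b in links if math.dist(coords[a], coords[b]) <= threshold
    )
    return within / len(links)


fraction = satisfied_fraction(ca_coords, cross_links)
print(f"{fraction:.1%} of cross-links satisfy the distance threshold")
```

In this toy input, three of the four pairs lie within 26 Å and one (residues 10 and 90) does not.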
2.
J Chem Inf Model ; 61(6): 2675-2685, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34047186

ABSTRACT

Opioid receptors (OPRs) are the main targets for the treatment of pain and related disorders. The opiate compounds that activate these receptors are effective analgesics, but their use leads to adverse effects, and they are often highly addictive drugs of abuse. There is an urgent need for alternative analgesic compounds that reduce or avoid these unwanted effects, in order to relieve the public health crisis of opioid addiction. Here, we aim to develop computational models to predict the OPR activity of small molecule compounds based on chemical structures and apply these models to identify novel OPR active compounds. We used four different machine learning algorithms to build models based on quantitative high throughput screening (qHTS) data sets of three OPRs in both agonist and antagonist modes. The best performing models were applied to virtually screen a large collection of compounds. The model-predicted active compounds were experimentally validated using the same qHTS assays that generated the training data. Random forest was the best classifier with the highest performance metrics, and the mu OPR (OPRM)-agonist model achieved the best performance measured by AUC-ROC (0.88) and MCC (0.7) values. The model-predicted actives yielded hit rates ranging from 2.3% (delta OPR-agonist) to 15.8% (OPRM-agonist) after experimental confirmation. Compared to the original assay hit rate, all models enriched the hit rate by ≥2-fold. Our approach produced robust OPR prediction models that can be applied to prioritize compounds from large libraries for further experimental validation. The models identified several novel potent compounds as activators/inhibitors of OPRs that were confirmed experimentally. The potent hits were further investigated using molecular docking to find the interactions of the novel ligands in the active site of the corresponding OPR.


Subject(s)
Analgesics, Opioid , Receptors, Opioid , Analgesics , Analgesics, Opioid/toxicity , Humans , Molecular Docking Simulation , Pain
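
The abstract above reports classifier quality as a Matthews correlation coefficient (MCC). As a reminder of how that metric is computed from a binary confusion matrix, here is a small sketch. The counts are invented for illustration and are not the study's data.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
mcc = matthews_corrcoef(tp=40, fp=10, tn=45, fn=5)
print(round(mcc, 2))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred for screening data sets where inactive compounds greatly outnumber actives.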
3.
Proteins ; 88(11): 1472-1481, 2020 11.
Article in English | MEDLINE | ID: mdl-32535960

ABSTRACT

Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and a low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio, sequence-only prediction method, based on reduced amino acid alphabets and convolutional neural networks (CNNs), which tries to overcome the challenge of accurate prediction posed by IDRs. We experiment with six different 3-letter reduced alphabets. We argue that the dimensional reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs at the same level as, or outperforms, other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Therefore, our method is suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.


Subject(s)
Computational Biology/methods , Data Mining/statistics & numerical data , Intrinsically Disordered Proteins/chemistry , Machine Learning , Neural Networks, Computer , Amino Acid Sequence , Area Under Curve , Benchmarking , Datasets as Topic , Humans , Multifactor Dimensionality Reduction , ROC Curve , Sequence Analysis, Protein
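
To make the idea of a 3-letter reduced amino acid alphabet concrete, the sketch below maps a protein sequence onto three symbols and one-hot encodes the result, which is the kind of input a convolutional layer could consume. The grouping used here (hydrophobic, polar, charged) is an assumption chosen for illustration; the abstract does not list the six alphabets the authors tested.

```python
# Hypothetical 3-letter reduced alphabet: h = hydrophobic, p = polar, c = charged.
# This grouping is illustrative and is not one of the alphabets from the study.
REDUCED_GROUPS = {
    "h": "AVLIMFWC",
    "p": "STNQYGP",
    "c": "DEKRH",
}

# Invert the grouping: amino acid letter -> reduced symbol.
AA_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}
# Column index of each reduced symbol in the one-hot encoding.
GROUP_INDEX = {g: i for i, g in enumerate(REDUCED_GROUPS)}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper())


def one_hot(reduced):
    """Encode a reduced sequence as a list of length-3 indicator rows."""
    rows = []
    for symbol in reduced:
        row = [0] * len(GROUP_INDEX)
        row[GROUP_INDEX[symbol]] = 1
        rows.append(row)
    return rows


reduced = reduce_sequence("MKTAYD")
encoded = one_hot(reduced)
print(reduced, encoded[0])
```

With 3 symbols instead of 20, each position needs only 3 input channels, which is the dimensional reduction the abstract refers to.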
4.
Viruses ; 16(6)2024 May 22.
Article in English | MEDLINE | ID: mdl-38932114

ABSTRACT

When designing live-attenuated respiratory syncytial virus (RSV) vaccine candidates, attenuating mutations can be developed through biologic selection or reverse-genetic manipulation and may include point mutations, codon and gene deletions, and genome rearrangements. Attenuation typically involves the reduction in virus replication, due to direct effects on viral structural and replicative machinery or viral factors that antagonize host defense or cause disease. However, attenuation must balance reduced replication and immunogenic antigen expression. In the present study, we explored a new approach in order to discover attenuating mutations. Specifically, we used protein structure modeling and computational methods to identify amino acid substitutions in the RSV nonstructural protein 1 (NS1) predicted to cause various levels of structural perturbation. Twelve different mutations predicted to alter the NS1 protein structure were introduced into infectious virus and analyzed in cell culture for effects on viral mRNA and protein expression, interferon and cytokine expression, and caspase activation. We found the use of structure-based machine learning to predict amino acid substitutions that reduce the thermodynamic stability of NS1 resulted in various levels of loss of NS1 function, exemplified by effects including reduced multi-cycle viral replication in cells competent for type I interferon, reduced expression of viral mRNAs and proteins, and increased interferon and apoptosis responses.


Subject(s)
Machine Learning , Respiratory Syncytial Virus Vaccines , Respiratory Syncytial Virus, Human , Viral Nonstructural Proteins , Virus Replication , Humans , Viral Nonstructural Proteins/genetics , Viral Nonstructural Proteins/immunology , Viral Nonstructural Proteins/chemistry , Viral Nonstructural Proteins/metabolism , Respiratory Syncytial Virus Vaccines/immunology , Respiratory Syncytial Virus Vaccines/genetics , Respiratory Syncytial Virus, Human/genetics , Respiratory Syncytial Virus, Human/immunology , Vaccines, Attenuated/immunology , Vaccines, Attenuated/genetics , Respiratory Syncytial Virus Infections/prevention & control , Respiratory Syncytial Virus Infections/virology , Respiratory Syncytial Virus Infections/immunology , Amino Acid Substitution , Mutation , Cell Line
5.
Front Microbiol ; 15: 1304044, 2024.
Article in English | MEDLINE | ID: mdl-38516021

ABSTRACT

Introduction: Antimicrobial peptides (AMPs) are promising alternatives to traditional antibiotics for combating plant pathogenic bacteria in agriculture and the environment. However, identifying potent AMPs through laborious experimental assays is resource-intensive and time-consuming. To address these limitations, this study presents a bioinformatics approach utilizing machine learning models for predicting and selecting AMPs active against plant pathogenic bacteria. Methods: N-gram representations of peptide sequences with 3-letter and 9-letter reduced amino acid alphabets were used to capture the sequence patterns and motifs that contribute to the antimicrobial activity of AMPs. A 5-fold cross-validation technique was used to train the machine learning models and to evaluate their predictive accuracy and robustness. Results: The models were applied to predict putative AMPs encoded by intergenic regions and small open reading frames (ORFs) of the citrus genome. Approximately 7% of the 10,000-peptide dataset from the intergenic region and 7% of the 685,924-peptide dataset from the whole genome were predicted as probable AMPs. The prediction accuracy of the reported models ranges from 0.72 to 0.91. A subset of the predicted AMPs was selected for experimental testing against Spiroplasma citri, the causative agent of citrus stubborn disease. The experimental results confirm the antimicrobial activity of the selected AMPs against the target bacterium, demonstrating the predictive capability of the machine learning models. Discussion: Hydrophobic amino acid residues and positively charged amino acid residues are among the key features in predicting AMPs by the Random Forest Algorithm. Aggregation propensity appears to be correlated with the effectiveness of the AMPs. The described models would contribute to the development of effective AMP-based strategies for plant disease management in agricultural and environmental settings.
To facilitate broader accessibility, our model is publicly available on the AGRAMP (Agricultural Ngrams Antimicrobial Peptides) server.
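
The n-gram representation mentioned in the Methods of the abstract above can be illustrated with a short sketch. It assumes the peptide has already been rewritten in a reduced alphabet; the three symbols `h`, `p`, `c` and the example string are hypothetical and do not come from the AGRAMP server.

```python
from collections import Counter
from itertools import product


def ngram_features(sequence, alphabet, n):
    """Return (vocabulary, counts, relative frequencies) for all n-grams over alphabet.

    The vocabulary is fixed in advance, so every sequence maps to a vector
    of the same length regardless of which n-grams it actually contains.
    """
    vocab = ["".join(p) for p in product(alphabet, repeat=n)]
    observed = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    counts = [observed[gram] for gram in vocab]
    total = sum(counts)
    freqs = [c / total if total else 0.0 for c in counts]
    return vocab, counts, freqs


# Hypothetical peptide already expressed in a 3-letter reduced alphabet.
vocab, counts, freqs = ngram_features("hcphpc", alphabet="hpc", n=2)
print(dict(zip(vocab, counts)))
```

A 3-letter alphabet with n = 2 gives only 9 features, whereas the full 20-letter alphabet would give 400, which is one practical motivation for combining reduced alphabets with n-grams.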

6.
Viruses ; 16(4)2024 04 20.
Article in English | MEDLINE | ID: mdl-38675983

ABSTRACT

Human immunodeficiency virus type 1 (HIV-1) infection can result in HIV-associated neurocognitive disorder (HAND), a spectrum of disorders characterized by neurological impairment and chronic inflammation. Combined antiretroviral therapy (cART) has elicited a marked reduction in the number of individuals diagnosed with HAND. However, there is continual, low-level viral transcription due to the lack of a transcription inhibitor in cART regimens, which results in the accumulation of viral products within infected cells. To alleviate stress, infected cells can release accumulated products, such as TAR RNA, in extracellular vesicles (EVs), which can contribute to pathogenesis in neighboring cells. Here, we demonstrate that cART can contribute to autophagy deregulation in infected cells and increased EV release. The impact of EVs released from HIV-1 infected myeloid cells was found to contribute to CNS pathogenesis, potentially through EV-mediated TLR3 (Toll-like receptor 3) activation, suggesting the need for therapeutics to target this mechanism. Three HIV-1 TAR-binding compounds, 103FA, 111FA, and Ral HCl, were identified that recognize TAR RNA and reduce TLR activation. These data indicate that packaging of viral products into EVs, potentially exacerbated by antiretroviral therapeutics, may induce chronic inflammation of the CNS observed in cART-treated patients, and novel therapeutic strategies may be exploited to mitigate morbidity.


Subject(s)
Autophagy , Extracellular Vesicles , HIV Infections , HIV-1 , Toll-Like Receptor 3 , Extracellular Vesicles/metabolism , Humans , Toll-Like Receptor 3/metabolism , Toll-Like Receptor 3/genetics , HIV-1/physiology , HIV Infections/virology , HIV Infections/metabolism , HIV Infections/drug therapy , Autophagy/drug effects , RNA, Viral/metabolism , RNA, Viral/genetics
7.
BMC Genomics ; 14 Suppl 4: S3, 2013.
Article in English | MEDLINE | ID: mdl-24268064

ABSTRACT

BACKGROUND: Successful management of chronic human immunodeficiency virus type 1 (HIV-1) infection with a cocktail of antiretroviral medications can be negatively affected by the presence of drug resistant mutations in the viral targets. These targets include the HIV-1 protease (PR) and reverse transcriptase (RT) proteins, for which a number of inhibitors are available on the market and routinely prescribed. Protein mutational patterns are associated with varying degrees of resistance to their respective inhibitors, with extremes that can range from continued susceptibility to cross-resistance across all drugs. RESULTS: Here we implement statistical learning algorithms to develop structure- and sequence-based models for systematically predicting the effects of mutations in the PR and RT proteins on resistance to each of eight and eleven inhibitors, respectively. Employing a four-body statistical potential, mutant proteins are represented as feature vectors whose components quantify relative environmental perturbations at amino acid residue positions in the respective target structures upon mutation. Two approaches are implemented in developing sequence-based models, based on use of either relative frequencies or counts of n-grams, to generate vectors for representing mutant proteins. To the best of our knowledge, this is the first reported study on structure- and sequence-based predictive models of HIV-1 PR and RT drug resistance developed by implementing a four-body statistical potential and n-grams, respectively, to generate mutant attribute vectors. Performance of the learning methods is evaluated on the basis of tenfold cross-validation, using previously assayed and publicly available in vitro data relating mutational patterns in the targets to quantified inhibitor susceptibility changes. 
CONCLUSION: Overall performance results are competitive with those of a previously published study utilizing a sequence-based strategy, while our structure- and sequence-based models provide orthogonal and complementary prediction methodologies, respectively. In a novel application, we describe a technique for identifying every possible pair of RT inhibitors as either potentially effective together as part of a cocktail, or a combination that is to be avoided.


Subject(s)
Drug Resistance, Viral , HIV Protease Inhibitors/pharmacology , HIV Protease/genetics , HIV Reverse Transcriptase/genetics , HIV-1/drug effects , HIV-1/enzymology , Reverse Transcriptase Inhibitors/pharmacology , Algorithms , Catalytic Domain/genetics , Computational Biology , HIV Infections/drug therapy , HIV Infections/genetics , HIV Protease/chemistry , HIV Protease/metabolism , HIV Protease Inhibitors/metabolism , HIV Reverse Transcriptase/antagonists & inhibitors , HIV Reverse Transcriptase/chemistry , HIV-1/genetics , HIV-1/metabolism , Humans , Models, Molecular , Mutant Proteins/chemistry , Mutant Proteins/metabolism , Mutation , Phenotype , Protein Conformation , Reverse Transcriptase Inhibitors/metabolism
8.
Front Genet ; 14: 1200770, 2023.
Article in English | MEDLINE | ID: mdl-37745840

ABSTRACT

Introduction: The African Goat Improvement Network Image Collection Protocol (AGIN-ICP) is an accessible, easy to use, low-cost procedure to collect phenotypic data via digital images. The AGIN-ICP collects images to extract several phenotype measures including health status indicators (anemia status, age, and weight), body measurements, shapes, and coat color and pattern, from digital images taken with standard digital cameras or mobile devices. This strategy is to quickly survey, record, assess, analyze, and store these data for use in a wide variety of production and sampling conditions. Methods: The work was accomplished as part of the multinational African Goat Improvement Network (AGIN) collaborative and is presented here as a case study in the AGIN collaboration model and working directly with community-based breeding programs (CBBP). It was iteratively developed and tested over 3 years, in 12 countries with over 12,000 images taken. Results and discussion: The AGIN-ICP development is described, and field implementation and the quality of the resulting images for use in image analysis and phenotypic data extraction are iteratively assessed. Digital body measures were validated using the PreciseEdge Image Segmentation Algorithm (PE-ISA) and software showing strong manual to digital body measure Pearson correlation coefficients of height, length, and girth measures (0.931, 0.943, 0.893) respectively. It is critical to note that while none of the very detailed tasks in the AGIN-ICP described here is difficult, every single one of them is even easier to accidentally omit, and the impact of such a mistake could render a sample image, a sampling day's images, or even an entire sampling trip's images difficult or unusable for extracting digital phenotypes. 
Coupled with tissue sampling and genomic testing, it may be useful in the effort to identify and conserve important animal genetic resources and in CBBP genetic improvement programs by providing reliably measured phenotypes with modest cost. Potential users include farmers, animal husbandry officials, veterinarians, regional government or other public health officials, researchers, and others. Based on these results, a final AGIN-ICP is presented, optimizing the costs, ease, and speed of field implementation of the collection method without compromising the quality of the image data collection.

9.
Methods Mol Biol ; 2405: 1-37, 2022.
Article in English | MEDLINE | ID: mdl-35298806

ABSTRACT

Antibiotic resistance constitutes a global threat and could lead to a future pandemic. One strategy is to develop a new generation of antimicrobials. Naturally occurring antimicrobial peptides (AMPs) are recognized templates and some are already in clinical use. To accelerate the discovery of new antibiotics, it is useful to predict novel AMPs from the sequenced genomes of various organisms. The antimicrobial peptide database (APD) provided the first empirical peptide prediction program. It also facilitated the testing of the first machine-learning algorithms. This chapter provides an overview of machine-learning predictions of AMPs. Most of the predictors, such as AntiBP, CAMP, and iAMPpred, involve a single-label prediction of antimicrobial activity. This type of prediction has been expanded to antifungal, antiviral, antibiofilm, anti-TB, hemolytic, and anti-inflammatory peptides. The multiple functional roles of AMPs annotated in the APD also enabled multi-label predictions (iAMP-2L, MLAMP, and AMAP), which include antibacterial, antiviral, antifungal, antiparasitic, antibiofilm, anticancer, anti-HIV, antimalarial, insecticidal, antioxidant, chemotactic, spermicidal activities, and protease inhibiting activities. Also considered in predictions are peptide posttranslational modification, 3D structure, and microbial species-specific information. We compare important amino acids of AMPs implied from machine learning with the frequently occurring residues of the major classes of natural peptides. Finally, we discuss advances, limitations, and future directions of machine-learning predictions of antimicrobial peptides. Ultimately, we may assemble a pipeline of such predictions beyond antimicrobial activity to accelerate the discovery of novel AMP-based antimicrobials.


Subject(s)
Anti-Infective Agents , Antimicrobial Peptides , Machine Learning , Amino Acids/chemistry , Anti-Infective Agents/chemistry , Anti-Infective Agents/pharmacology , Antimicrobial Peptides/chemistry , Antimicrobial Peptides/pharmacology , Peptides/chemistry
10.
PLoS One ; 17(10): e0275821, 2022.
Article in English | MEDLINE | ID: mdl-36227957

ABSTRACT

Computer vision is a tool that could provide livestock producers with digital body measures and records that are important for animal health and production, namely body height and length, and chest girth. However, to build these tools, the scarcity of labeled training data sets with uniform images (pose, lighting) that also represent real-world livestock can be a challenge. Collecting images in a standard way, with manual image labeling is the gold standard to create such training data, but the time and cost can be prohibitive. We introduce the PreciseEdge image segmentation algorithm to address these issues by employing a standard image collection protocol with a semi-automated image labeling method, and a highly precise image segmentation for automated body measurement extraction directly from each image. These elements, from image collection to extraction are designed to work together to yield values highly correlated to real-world body measurements. PreciseEdge adds a brief preprocessing step inspired by chromakey to a modified GrabCut procedure to generate image masks for data extraction (body measurements) directly from the images. Three hundred RGB (red, green, blue) image samples were collected uniformly per the African Goat Improvement Network Image Collection Protocol (AGIN-ICP), which prescribes camera distance, poses, a blue backdrop, and a custom AGIN-ICP calibration sign. Images were taken in natural settings outdoors and in barns under high and low light, using a Ricoh digital camera producing JPG images (converted to PNG prior to processing). The rear and side AGIN-ICP poses were used for this study. PreciseEdge and GrabCut image segmentation methods were compared for differences in user input required to segment the images. The initial bounding box image output was captured for visual comparison. Automated digital body measurements extracted were compared to manual measures for each method. 
Both methods allow additional optional refinement (mouse strokes) to aid the segmentation algorithm. These optional mouse strokes were captured automatically and compared. Stroke count distributions for both methods were not normally distributed per Kolmogorov-Smirnov tests. Non-parametric Wilcoxon tests showed the distributions were different (p < 0.001) and the GrabCut stroke count was significantly higher (p = 5.115e-49), with a mean of 577.08 (std 248.45) versus 221.57 (std 149.45) with PreciseEdge. Digital body measures were highly correlated to manual height, length, and girth measures (0.931, 0.943, 0.893) for PreciseEdge and (0.936, 0.944, 0.869) for GrabCut (Pearson correlation coefficient). PreciseEdge image segmentation allowed for masks yielding accurate digital body measurements highly correlated to manual, real-world measurements with over 38% less user input for an efficient, reliable, non-invasive alternative to livestock hand-held direct measuring tools.


Subject(s)
Livestock , Sexually Transmitted Diseases , Algorithms , Animals , Image Processing, Computer-Assisted/methods , Mice
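
The abstract above compares digital and manual body measurements using the Pearson correlation coefficient. A plain-Python sketch of that statistic is given below; the paired height values are made up for illustration and are not measurements from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements in centimetres.
manual_height = [60, 65, 70, 75]
digital_height = [61, 64, 71, 76]

r = pearson_r(manual_height, digital_height)
print(f"r = {r:.3f}")
```

Note that a high correlation shows the two measures rise and fall together; it does not by itself show that they agree in absolute value.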
11.
BMC Bioinformatics ; 11: 494, 2010 Oct 05.
Article in English | MEDLINE | ID: mdl-20923564

ABSTRACT

BACKGROUND: HIV-1 targets human cells expressing both the CD4 receptor, which binds the viral envelope glycoprotein gp120, as well as either the CCR5 (R5) or CXCR4 (X4) co-receptors, which interact primarily with the third hypervariable loop (V3 loop) of gp120. Determination of HIV-1 affinity for either the R5 or X4 co-receptor on host cells facilitates the inclusion of co-receptor antagonists as a part of patient treatment strategies. A dataset of 1193 distinct gp120 V3 loop peptide sequences (989 R5-utilizing, 204 X4-capable) is utilized to train predictive classifiers based on implementations of random forest, support vector machine, boosted decision tree, and neural network machine learning algorithms. An in silico mutagenesis procedure employing multibody statistical potentials, computational geometry, and threading of variant V3 sequences onto an experimental structure, is used to generate a feature vector representation for each variant whose components measure environmental perturbations at corresponding structural positions. RESULTS: Classifier performance is evaluated based on stratified 10-fold cross-validation, stratified dataset splits (2/3 training, 1/3 validation), and leave-one-out cross-validation. Best reported values of sensitivity (85%), specificity (100%), and precision (98%) for predicting X4-capable HIV-1 virus, overall accuracy (97%), Matthews correlation coefficient (89%), balanced error rate (0.08), and ROC area (0.97) all reach critical thresholds, suggesting that the models outperform six other state-of-the-art methods and come closer to competing with phenotype assays. CONCLUSIONS: The trained classifiers provide instantaneous and reliable predictions regarding HIV-1 co-receptor usage, requiring only translated V3 loop genotypes as input.
Furthermore, the novelty of these computational mutagenesis based predictor attributes distinguishes the models as orthogonal and complementary to previous methods that utilize sequence, structure, and/or evolutionary information. The classifiers are available online at http://proteins.gmu.edu/automute.


Subject(s)
HIV Envelope Protein gp120/chemistry , HIV-1/metabolism , Models, Molecular , Algorithms , Computer Simulation , Databases, Genetic , HIV Envelope Protein gp120/metabolism , HIV-1/chemistry , HIV-1/genetics , Receptors, CCR5/genetics , Receptors, CCR5/metabolism , Receptors, CXCR4/genetics , Receptors, CXCR4/metabolism
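
The stratified 10-fold cross-validation named in the abstract above keeps the class ratio roughly constant across folds, which matters for an imbalanced set such as 989 R5 versus 204 X4 sequences. The sketch below shows one simple way to build such folds with a per-class round-robin assignment. It is an illustrative assumption, not the procedure the authors implemented, and only the class sizes are taken from the abstract.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize membership within each class
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal out round-robin
    return folds


# Class sizes from the abstract: 989 R5-utilizing, 204 X4-capable.
labels = ["R5"] * 989 + ["X4"] * 204
folds = stratified_folds(labels, k=10)

for number, fold in enumerate(folds):
    x4 = sum(1 for i in fold if labels[i] == "X4")
    print(f"fold {number}: {len(fold)} samples, {x4} X4-capable")
```

Each fold then serves once as the held-out validation set while the other nine are used for training.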
12.
BMC Struct Biol ; 10 Suppl 1: S5, 2010 May 17.
Article in English | MEDLINE | ID: mdl-20487512

ABSTRACT

BACKGROUND: There is a considerable literature on the source of the thermostability of proteins from thermophilic organisms. Understanding the mechanisms for this thermostability would provide insights into proteins generally and permit the design of synthetic hyperstable biocatalysts. RESULTS: We have systematically tested a large number of sequence and structure derived quantities for their ability to discriminate thermostable proteins from their non-thermostable orthologs using sets of mesophile-thermophile ortholog pairs. Most of the quantities tested correspond to properties previously reported to be associated with thermostability. Many of the structure related properties were derived from the Delaunay tessellation of protein structures. CONCLUSIONS: Carefully selected sequence based indices discriminate better than purely structure based indices. Combined sequence and structure based indices improve performance somewhat further. Based on our analysis, the strongest contributors to thermostability are an increase in ion pairs on the protein surface and a more strongly hydrophobic interior.


Subject(s)
Proteins/chemistry , Amino Acid Sequence , Bacterial Proteins/chemistry , Models, Molecular , Phosphoglycerate Kinase/chemistry , Protein Conformation , Protein Stability , Pyrococcus/chemistry , TATA-Box Binding Protein/chemistry , Temperature , Trypanosoma brucei brucei/chemistry
13.
J Theor Biol ; 266(4): 560-8, 2010 Oct 21.
Article in English | MEDLINE | ID: mdl-20655929

ABSTRACT

Certain genetic variations in the human population are associated with heritable diseases, and single nucleotide polymorphisms (SNPs) represent the most common form of such differences in DNA sequence. In particular, substantial interest exists in determining whether a non-synonymous SNP (nsSNP), leading to a single residue replacement in the translated protein product, is neutral or disease-related. The nature of protein structure-function relationships suggests that nsSNP effects, either benign or leading to aberrant protein function possibly associated with disease, are dependent on relative structural changes introduced upon mutation. In this study, we characterize a representative sampling of 1790 documented neutral and disease-related human nsSNPs mapped to 243 diverse human protein structures, by quantifying environmental perturbations in the associated proteins with the use of a computational mutagenesis methodology that relies on a four-body, knowledge-based, statistical contact potential. These structural change data are used as attributes to generate a vector representation for each nsSNP, in combination with additional features reflecting sequence and structure of the corresponding protein. A trained model based on the random forest supervised classification algorithm achieves 76% cross-validation accuracy. Our classifier performs at least as well as other methods that use significantly larger datasets of nsSNPs for model training, and the novelty of our attributes differentiates the model as an orthogonal approach that can be utilized in conjunction with other techniques. A dedicated server for obtaining predictions, as well as supporting datasets and documentation, is available at http://proteins.gmu.edu/automute.


Subject(s)
Computational Biology/methods , Disease/genetics , Knowledge Bases , Mutagenesis/genetics , Polymorphism, Single Nucleotide/genetics , Algorithms , Aspartylglucosylaminase/chemistry , Databases, Genetic , Humans , Learning , Models, Molecular , Protein Structure, Secondary , ROC Curve , Structure-Activity Relationship
14.
Sci Rep ; 10(1): 19340, 2020 11 09.
Article in English | MEDLINE | ID: mdl-33168903

ABSTRACT

Mass spectrometry enhanced by nanotechnology can achieve previously unattainable sensitivity for characterizing urinary pathogen-derived peptides. We utilized mass spectrometry enhanced by affinity hydrogel particles (analytical sensitivity = 2.5 pg/mL) to study tick pathogen-specific proteins shed in the urine of patients with (1) erythema migrans rash and acute symptoms, (2) post-treatment Lyme disease syndrome (PTLDS), and (3) clinical suspicion of tick-borne illnesses (TBI). Targeted pathogens were Borrelia, Babesia, Anaplasma, Rickettsia, Ehrlichia, Bartonella, Francisella, Powassan virus, tick-borne encephalitis virus, and Colorado tick fever virus. Specificity was defined by 100% amino acid sequence identity with tick-borne pathogen proteins, evolutionary taxonomic verification for related pathogens, and no identity with human or other organisms. Using a cutoff of two pathogen peptides, 9/10 acute Lyme borreliosis patients tested positive, while we identified zero false positives in 250 controls. Two or more pathogen peptides were identified in 40% of samples from PTLDS and TBI patients (categories 2 and 3 above, n = 59/148). Collectively, 279 unique tick-borne pathogen-derived peptides were identified. The number of pathogen-specific peptides was directly correlated with presence or absence of symptoms reported by patients (ordinal regression pseudo-R2 = 0.392, p = 0.010). Enhanced mass spectrometry is a new tool for studying tick-borne pathogen infections.


Subject(s)
Lyme Disease/microbiology , Lyme Disease/urine , Peptides/urine , Ticks , Adult , Aged , Algorithms , Animals , Babesia microti/metabolism , Biomarkers/metabolism , Borrelia , Erythema Chronicum Migrans/microbiology , Erythema Chronicum Migrans/urine , Exanthema , Female , Humans , Hydrogels/chemistry , Infectious Disease Medicine , Male , Mass Spectrometry , Mesocricetus , Middle Aged , Peptides/chemistry , Regression Analysis , Urinalysis
15.
Bioinformatics ; 24(18): 2002-9, 2008 Sep 15.
Article in English | MEDLINE | ID: mdl-18632749

ABSTRACT

MOTIVATION: Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have used properties of protein sequence or structure to predict the free energy change of mutants due to thermal (ΔΔG) and denaturant (ΔΔG(H2O)) denaturations, as well as mutant thermal stability (ΔTm), through the application of either computational energy-based approaches or machine learning techniques. However, the accuracy achieved by applying these methods separately is frequently far from optimal. RESULTS: We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and its six ordered nearest neighbors in the three-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. AVAILABILITY: A web server with supporting documentation is available at http://proteins.gmu.edu/automute.
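
The feature-vector construction described in RESULTS can be sketched as follows. This is a minimal illustration under stated assumptions, not the AUTO-MUTE implementation: the per-residue perturbation values are taken as given, neighbors are ordered by Euclidean distance between one representative point per residue, and the function and variable names are hypothetical.

```python
import math


def feature_vector(perturbations, coords, mutated_index, n_neighbors=6):
    """Collect the perturbation at the mutated position followed by the
    perturbations at its nearest neighbors, ordered nearest first.

    perturbations: one value per residue (assumed precomputed).
    coords: one (x, y, z) point per residue (assumed representative atom).
    """
    center = coords[mutated_index]
    others = [i for i in range(len(coords)) if i != mutated_index]
    # Assumption: "ordered" means sorted by increasing 3D distance.
    others.sort(key=lambda i: math.dist(center, coords[i]))
    neighbors = others[:n_neighbors]
    return [perturbations[mutated_index]] + [perturbations[i] for i in neighbors]


# Toy data (hypothetical): ten residues laid out along a straight line.
toy_coords = [(float(i), 0.0, 0.0) for i in range(10)]
toy_perturbations = list(range(10))

vector = feature_vector(toy_perturbations, toy_coords, mutated_index=0)
print(vector)
```

The resulting seven-component vector (mutated position plus six neighbors) is the kind of input that could then be passed to a machine learning tool.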


Subject(s)
Artificial Intelligence , Computational Biology , Mutagenesis , Proteins/chemistry , Proteins/genetics , Algorithms , Computer Simulation , Databases, Protein , Models, Molecular , Protein Folding , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein , Structure-Activity Relationship , Thermodynamics
16.
Heliyon ; 5(6): e01884, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31211262

ABSTRACT

Ras proteins play a pivotal role as oncogenes by participating in diverse signaling events, including those linked to cell growth, differentiation, and proliferation. Using experimental fitness data and implementing artificial intelligence and a computational mutagenesis technique, we developed models that reliably predict fitness for all single residue mutants of H-ras proto-oncogene protein p21. The computational mutagenesis generated a feature vector of protein structural changes for each variant, and these data correlated well with fitness. Random forest classification and tree regression machine learning algorithms were implemented for training predictive models. Cross-validation was used to evaluate model performance, and control experiments were performed to assess statistical significance. Classification models achieved a balanced accuracy rate as high as 82%, with a Matthews correlation coefficient of 0.63 and an area under the ROC curve of 0.90. Similarly, regression models displayed a Pearson's correlation reaching 0.79. In contrast, control data sets led to performance values consistent with random guessing. Comparisons with several related state-of-the-art methods reflected favorably on our trained models. This H-Ras proof-of-principle study suggests a complementary approach for understanding the mechanisms by which other proteins are involved in oncogenesis, including related Ras isoforms, and for providing useful insights into designing future diagnostic and treatment modalities.
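
For readers unfamiliar with the classification metrics quoted above, the sketch below computes balanced accuracy and the Matthews correlation coefficient from a binary confusion matrix using their standard definitions. The confusion-matrix counts are made up for illustration and are not data from the study.

```python
import math


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def matthews_corrcoef(tp: int, fn: int, tn: int, fp: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, chosen only to exercise the formulas.
tp, fn, tn, fp = 40, 10, 45, 5

ba = balanced_accuracy(tp, fn, tn, fp)
mcc = matthews_corrcoef(tp, fn, tn, fp)
print(f"balanced accuracy = {ba:.2f}, MCC = {mcc:.2f}")
```

Both measures are less sensitive to class imbalance than raw accuracy, which is one reason they are commonly reported together with the area under the ROC curve.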

17.
Proteins ; 71(4): 1930-9, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18186470

ABSTRACT

There is substantial interest in methods designed to predict the effect of nonsynonymous single nucleotide polymorphisms (nsSNPs) on protein function, given their potential relationship to heritable diseases. Current state-of-the-art supervised machine learning algorithms, such as random forest (RF), train models that classify single amino acid mutations in proteins as either neutral or deleterious to function. However, it is frequently the case that the functional effect of a polymorphism on a protein lies between these two extremes. Classifiers that incorporate fuzzy logic provide a natural extension that accounts for the spectrum of possible functional consequences. We generated a dataset of single amino acid substitutions in human proteins having known three-dimensional structures. Each variant was uniquely represented as a feature vector that included computational geometry and knowledge-based statistical potential predictors obtained through application of Delaunay tessellation of protein structures. Additional attributes consisted of physicochemical properties of the native and replacement amino acids as well as the topological location of the mutated residue position in the solved structure. Classification performance of the RF algorithm was evaluated on a training set consisting of the disease-associated and neutral nsSNPs taken from our dataset, and attributes were ranked according to their relative importance. Similarly, we evaluated the performance of an adaptive neuro-fuzzy inference system (ANFIS). The utility of statistical geometry predictors was compared with that of traditional structural and evolutionary attributes employed by other researchers, revealing an equally effective yet complementary methodology. Among all attributes in our feature set, the statistical geometry predictors were found to be the most highly ranked. On the basis of the AUC (area under the ROC curve) measure of performance, the ANFIS and RF models were equally effective when only statistical geometry features were used. Tenfold cross-validation studies evaluating AUC, balanced error rate (BER), and Matthews correlation coefficient (MCC) showed that our RF model was at least comparable with the well-established methods of SIFT and PolyPhen. The trained RF and ANFIS models were each subsequently used to predict the disease potential of human nsSNPs in our dataset that are currently unclassified (http://rna.gmu.edu/FuzzySnps/).


Subject(s)
Decision Trees , Fuzzy Logic , Polymorphism, Single Nucleotide/genetics , Proteins/chemistry , Proteins/genetics , Algorithms , Amino Acid Sequence , Amino Acid Substitution , Area Under Curve , Artificial Intelligence , Chi-Square Distribution , Computational Biology/methods , Databases, Factual , Humans , Hydrophobic and Hydrophilic Interactions , Models, Statistical , Molecular Sequence Data , Neural Networks, Computer , Phylogeny , Predictive Value of Tests , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary , ROC Curve , Reproducibility of Results , Sequence Homology, Amino Acid
18.
Bioinformatics ; 23(23): 3155-61, 2007 Dec 01.
Article in English | MEDLINE | ID: mdl-17977887

ABSTRACT

MOTIVATION: An important area of research in biochemistry and molecular biology focuses on characterization of enzyme mutants. However, synthesis and analysis of experimental mutants is time-consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity. RESULTS: Based on a Delaunay tessellation-derived four-body statistical potential function, a perturbation vector measuring environmental changes relative to wild type (wt) at every residue position uniquely characterizes each enzyme mutant for model development and prediction. First, a measure of model performance using the area under the receiver operating characteristic (ROC) curve (AUC) surpasses 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Second, a novel method is introduced for evaluating statistical significance associated with the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve 77.0% and 80.8% mean accuracy, respectively. Next, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants; a subsequent search for publications reporting on dozens of these test mutants reveals that experimental results are matched by 79% and 86% of predictions, respectively. Finally, learning curves for each mutant enzyme system indicate the influence of training set size on model performance. AVAILABILITY: Prediction databases are available at http://proteins.gmu.edu/automute/
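
The "100 stratified random splits" evaluation mentioned in RESULTS can be illustrated with a small sketch. Everything specific here is an assumption for illustration: the abstract does not state the test-set fraction, the class labels, or the data set sizes, so the 20% fraction, the "active"/"inactive" labels, and the 60/40 toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, rng):
    """Split indices into train and test sets while preserving class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


# Hypothetical toy labels: 60 "active" and 40 "inactive" mutants.
labels = ["active"] * 60 + ["inactive"] * 40
rng = random.Random(0)  # fixed seed so the splits are reproducible

# 100 independent stratified splits, as in the evaluation protocol described.
splits = [stratified_split(labels, 0.2, rng) for _ in range(100)]

train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))
```

In a full evaluation, a model would be trained on each training set and scored on the matching test set, and the 100 accuracies would be averaged to give a mean accuracy of the kind reported.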


Subject(s)
Artificial Intelligence , Enzymes/chemistry , Enzymes/genetics , Models, Chemical , Mutagenesis, Site-Directed/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Amino Acid Substitution/genetics , Computer Simulation , Data Interpretation, Statistical , Enzyme Activation , Models, Molecular , Molecular Sequence Data , Mutation , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Structure-Activity Relationship
19.
Hum Mutat ; 27(2): 163-72, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16395672

ABSTRACT

We describe a novel statistical scoring method based on a computational geometry approach to predict the functional impact (transactivation activity) of missense mutations in the DNA-binding domain (DBD) of the tumor suppressor TP53, which is the most frequently mutated gene in human cancer. Residual scores (RS) for each residue were calculated to reflect differences in the compositional preferences of four nearest-neighbor residues between mutant and wild-type proteins. The RS were then combined into a residual score profile (RSP) representing the RS values for all 194 residues in the DBD. Mutants were grouped into functional categories based on their transactivation activities, measured experimentally in yeast functional assays using p53-response elements from eight different promoters. While these functional categories showed significant differences in average RS, the average RS lacked the resolving power to predict the transactivation activities of individual mutants. In contrast, using decision tree models, we found that the RSP predicted transactivation with an accuracy varying between 64.2% and 78.5% depending on the promoter. Lastly, we used the best model to predict the functional outcome of all missense mutants in the DBD of p53 and compared the predictions with their frequency of occurrence in human cancers. We found that mutants predicted as functional (F) accounted for approximately 14% of all missense mutants found in cancers, while mutants predicted as nonfunctional (NF) represented approximately 86% of the mutants. These results show that this computational approach provides a fast and reliable method for predicting the functional impact of p53 mutants associated with cancer.


Subject(s)
Data Interpretation, Statistical , Genes, p53 , Mutation , Transcriptional Activation , Tumor Suppressor Protein p53/genetics , Algorithms , Artificial Intelligence , Humans , Models, Statistical , Mutation, Missense , Promoter Regions, Genetic , ROC Curve
20.
Proteins ; 64(1): 234-45, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16617425

ABSTRACT

Topological scores, measures of sequence-structure compatibility, are calculated for all 1,881 single point mutants of the human immunodeficiency virus (HIV)-1 protease using a four-body statistical potential function based on Delaunay tessellation of protein structure. Comparison of the mutant topological score data with experimental data from alanine scan studies specifically on the dimer interface residues supports previous findings that 1) L97 and F99 contribute greatly to the Gibbs energy of HIV-1 protease dimerization, 2) Q2 and T4 contribute the least toward the Gibbs energy, and 3) C-terminal residues are more sensitive to mutations than those at the N-terminus. For a more comprehensive treatment of the relationship between protease structure and function, mutant topological scores are compared with the activity levels for a set of 536 experimentally synthesized protease mutants, and a significant correlation is observed. Finally, this structure-function correlation is similarly identified by examining model systems consisting of 2,015 single point mutants of bacteriophage T4 lysozyme as well as 366 single point mutants of HIV-1 reverse transcriptase and is hypothesized to be a property generally applicable to all proteins.
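
The figure of 1,881 single point mutants follows from simple counting: each of the 99 residues of the HIV-1 protease monomer can be replaced by any of the 19 other standard amino acids, and 99 × 19 = 1,881. The sketch below enumerates such mutants in the residue-position notation used in the abstract (for example L97 and F99). The helper name is hypothetical, and the input is a placeholder string of length 99, not the real protease sequence.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_point_mutants(sequence: str) -> list:
    """List every single amino acid substitution as <wild type><position><replacement>."""
    mutants = []
    for position, wild_type in enumerate(sequence, start=1):
        for replacement in AMINO_ACIDS:
            if replacement != wild_type:
                mutants.append(f"{wild_type}{position}{replacement}")
    return mutants


# Placeholder sequence (NOT the real HIV-1 protease sequence); only its length matters here.
placeholder_sequence = "A" * 99

mutants = single_point_mutants(placeholder_sequence)
print(len(mutants))  # 99 positions x 19 substitutions
```

The same counting shows why the experimental sets cited for comparison (536 protease mutants, 2,015 T4 lysozyme mutants) cover only part of each protein's full single-substitution space.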


Subject(s)
Mutagenesis , Proteins/chemistry , Proteins/metabolism , Recombinant Proteins/chemistry , Recombinant Proteins/metabolism , Computational Biology/methods , HIV Protease/chemistry , HIV Protease/metabolism , Models, Molecular , Protein Conformation , Proteins/genetics , Structure-Activity Relationship