1.
Int J Mol Sci ; 23(19)2022 Sep 29.
Article in English | MEDLINE | ID: mdl-36232786

ABSTRACT

ApoB-100 is a member of a large lipid transfer protein superfamily and is one of the main apolipoproteins found on low-density lipoprotein (LDL) and very low-density lipoprotein (VLDL) particles. Despite its clinical significance for the development of cardiovascular disease, there is limited information on apoB-100 structure. We have developed a novel method based on the "divide and conquer" algorithm, using PSIPRED software, by dividing apoB-100 into five subunits and 11 domains. Models of each domain were prepared using I-TASSER, DEMO, RoseTTAFold, Phyre2, and MODELLER. Subsequently, we used disuccinimidyl sulfoxide (DSSO), a new mass spectrometry cleavable cross-linker, and the known position of disulfide bonds to experimentally validate each model. We obtained 65 unique DSSO cross-links, of which 87.5% were within a 26 Å threshold in the final model. We also evaluated the positions of cysteine residues involved in the eight known disulfide bonds in apoB-100, and each pair was measured within the expected 5.6 Å constraint. Finally, multiple domains were combined by applying constraints based on detected long-range DSSO cross-links to generate five subunits, which were subsequently merged to achieve an uninterrupted architecture for apoB-100 around a lipoprotein particle. Moreover, the dynamics of apoB-100 during particle size transitions was examined by comparing VLDL and LDL computational models and using experimental cross-linking data. In addition, the proposed model of receptor ligand binding of apoB-100 provides new insights into some of its functions.
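The cross-link validation described above amounts to measuring, for each detected cross-link, the distance between the two linked residues in the model and checking it against the 26 Å threshold. The following is a minimal sketch of that check, not the authors' pipeline: it assumes distances are taken between C-alpha atoms, and the coordinates, residue numbers, and the helper name `satisfied_fraction` are hypothetical.

```python
import math

# Hypothetical C-alpha coordinates (in angstroms), keyed by residue number.
ca_coords = {
    10: (0.0, 0.0, 0.0),
    25: (12.0, 5.0, 3.0),
    80: (30.0, 10.0, 0.0),
    95: (14.0, 20.0, 8.0),
}

# Hypothetical cross-linked residue pairs.
crosslinks = [(10, 25), (10, 80), (25, 95)]

# Distance threshold quoted in the abstract for DSSO cross-links.
MAX_DSSO_DISTANCE = 26.0


def satisfied_fraction(coords, links, threshold):
    """Return the fraction of cross-links whose residue pair lies within threshold."""
    within = sum(
        1 for a, b in links if math.dist(coords[a], coords[b]) <= threshold
    )
    return within / len(links)


fraction = satisfied_fraction(ca_coords, crosslinks, MAX_DSSO_DISTANCE)
print(f"{fraction:.1%} of cross-links satisfy the {MAX_DSSO_DISTANCE} Å threshold")
```

The same function could be reused with a 5.6 Å threshold for the disulfide-bonded cysteine pairs, again assuming the relevant atom coordinates are available.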


Subjects
Apolipoproteins B , Cysteine , Apolipoprotein B-100 , Apolipoproteins B/metabolism , Computer Simulation , Disulfides , Ligands , Lipoproteins, LDL/chemistry , Lipoproteins, VLDL , Models, Structural , Sulfoxides
2.
J Chem Inf Model ; 61(6): 2675-2685, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34047186

ABSTRACT

Opioid receptors (OPRs) are the main targets for the treatment of pain and related disorders. The opiate compounds that activate these receptors are effective analgesics, but their use leads to adverse effects, and they are often highly addictive drugs of abuse. There is an urgent need for alternative analgesic chemicals that reduce or avoid the unwanted effects in order to relieve the public health crisis of opioid addiction. Here, we aim to develop computational models to predict the OPR activity of small-molecule compounds based on chemical structures and apply these models to identify novel OPR-active compounds. We used four different machine learning algorithms to build models based on quantitative high-throughput screening (qHTS) data sets of three OPRs in both agonist and antagonist modes. The best-performing models were applied to virtually screen a large collection of compounds. The model-predicted active compounds were experimentally validated using the same qHTS assays that generated the training data. Random forest was the best classifier with the highest performance metrics, and the mu OPR (OPRM)-agonist model achieved the best performance measured by AUC-ROC (0.88) and MCC (0.7) values. The model-predicted actives resulted in hit rates ranging from 2.3% (delta OPR-agonist) to 15.8% (OPRM-agonist) after experimental confirmation. Compared to the original assay hit rate, all models enriched the hit rate by ≥2-fold. Our approach produced robust OPR prediction models that can be applied to prioritize compounds from large libraries for further experimental validation. The models identified several novel potent compounds as activators/inhibitors of OPRs that were confirmed experimentally. The potent hits were further investigated using molecular docking to find the interactions of the novel ligands in the active site of the corresponding OPR.


Subjects
Analgesics, Opioid , Receptors, Opioid , Analgesics , Analgesics, Opioid/toxicity , Humans , Molecular Docking Simulation , Pain
3.
Proteins ; 88(11): 1472-1481, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32535960

ABSTRACT

Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and a low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio sequence-only prediction method, based on reduced amino acid alphabets and convolutional neural networks (CNNs), that tries to overcome the challenge of accurate prediction posed by IDRs. We experiment with six different 3-letter reduced alphabets. We argue that the dimensional reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs at the same level as or outperforms other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Therefore, our method is suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.
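To illustrate what a 3-letter reduced alphabet does to a sequence before it reaches a convolutional layer, here is a minimal sketch. The grouping used (hydrophobic, polar, charged) is an assumption chosen for illustration; the abstract does not specify which six alphabets were tested, and the function names `reduce_sequence` and `one_hot` are hypothetical.

```python
# Hypothetical 3-letter reduced alphabet: H = hydrophobic, P = polar, C = charged.
# This grouping is an illustrative assumption, not one of the paper's alphabets.
GROUPS = {
    "H": "AVLIMFWC",
    "P": "GSTYNQP",
    "C": "DEKRH",
}

# Invert the grouping into a residue -> reduced-letter lookup table.
RESIDUE_TO_LETTER = {
    residue: letter for letter, residues in GROUPS.items() for residue in residues
}

LETTERS = "HPC"


def reduce_sequence(sequence):
    """Map a 20-letter amino acid sequence onto the 3-letter reduced alphabet."""
    return "".join(RESIDUE_TO_LETTER[residue] for residue in sequence.upper())


def one_hot(reduced):
    """Encode a reduced sequence as a list of 3-element one-hot rows."""
    return [[1 if letter == symbol else 0 for symbol in LETTERS] for letter in reduced]


reduced = reduce_sequence("MKDS")
print(reduced)           # reduced-alphabet string
print(one_hot(reduced))  # one row of length 3 per residue
```

With this encoding each residue becomes a 3-channel input instead of a 20-channel one, which is the dimensional reduction the abstract refers to.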


Subjects
Computational Biology/methods , Data Mining/statistics & numerical data , Intrinsically Disordered Proteins/chemistry , Machine Learning , Neural Networks, Computer , Amino Acid Sequence , Area Under Curve , Benchmarking , Datasets as Topic , Humans , Multifactor Dimensionality Reduction , ROC Curve , Sequence Analysis, Protein
4.
Viruses ; 16(6)2024 May 22.
Article in English | MEDLINE | ID: mdl-38932114

ABSTRACT

When designing live-attenuated respiratory syncytial virus (RSV) vaccine candidates, attenuating mutations can be developed through biologic selection or reverse-genetic manipulation and may include point mutations, codon and gene deletions, and genome rearrangements. Attenuation typically involves the reduction in virus replication, due to direct effects on viral structural and replicative machinery or viral factors that antagonize host defense or cause disease. However, attenuation must balance reduced replication and immunogenic antigen expression. In the present study, we explored a new approach in order to discover attenuating mutations. Specifically, we used protein structure modeling and computational methods to identify amino acid substitutions in the RSV nonstructural protein 1 (NS1) predicted to cause various levels of structural perturbation. Twelve different mutations predicted to alter the NS1 protein structure were introduced into infectious virus and analyzed in cell culture for effects on viral mRNA and protein expression, interferon and cytokine expression, and caspase activation. We found the use of structure-based machine learning to predict amino acid substitutions that reduce the thermodynamic stability of NS1 resulted in various levels of loss of NS1 function, exemplified by effects including reduced multi-cycle viral replication in cells competent for type I interferon, reduced expression of viral mRNAs and proteins, and increased interferon and apoptosis responses.


Subjects
Machine Learning , Respiratory Syncytial Virus Vaccines , Respiratory Syncytial Virus, Human , Viral Nonstructural Proteins , Virus Replication , Humans , Viral Nonstructural Proteins/genetics , Viral Nonstructural Proteins/immunology , Viral Nonstructural Proteins/chemistry , Viral Nonstructural Proteins/metabolism , Respiratory Syncytial Virus Vaccines/immunology , Respiratory Syncytial Virus Vaccines/genetics , Respiratory Syncytial Virus, Human/genetics , Respiratory Syncytial Virus, Human/immunology , Vaccines, Attenuated/immunology , Vaccines, Attenuated/genetics , Respiratory Syncytial Virus Infections/prevention & control , Respiratory Syncytial Virus Infections/virology , Respiratory Syncytial Virus Infections/immunology , Amino Acid Substitution , Mutation , Cell Line
5.
Front Microbiol ; 15: 1304044, 2024.
Article in English | MEDLINE | ID: mdl-38516021

ABSTRACT

Introduction: Antimicrobial peptides (AMPs) are promising alternatives to traditional antibiotics for combating plant pathogenic bacteria in agriculture and the environment. However, identifying potent AMPs through laborious experimental assays is resource-intensive and time-consuming. To address these limitations, this study presents a bioinformatics approach utilizing machine learning models for predicting and selecting AMPs active against plant pathogenic bacteria. Methods: N-gram representations of peptide sequences with 3-letter and 9-letter reduced amino acid alphabets were used to capture the sequence patterns and motifs that contribute to the antimicrobial activity of AMPs. A 5-fold cross-validation technique was used to train the machine learning models and to evaluate their predictive accuracy and robustness. Results: The models were applied to predict putative AMPs encoded by intergenic regions and small open reading frames (ORFs) of the citrus genome. Approximately 7% of the 10,000-peptide dataset from the intergenic region and 7% of the 685,924-peptide dataset from the whole genome were predicted as probable AMPs. The prediction accuracy of the reported models ranges from 0.72 to 0.91. A subset of the predicted AMPs was selected for experimental testing against Spiroplasma citri, the causative agent of citrus stubborn disease. The experimental results confirm the antimicrobial activity of the selected AMPs against the target bacterium, demonstrating the predictive capability of the machine learning models. Discussion: Hydrophobic amino acid residues and positively charged amino acid residues are among the key features in predicting AMPs by the Random Forest Algorithm. Aggregation propensity appears to be correlated with the effectiveness of the AMPs. The described models would contribute to the development of effective AMP-based strategies for plant disease management in agricultural and environmental settings. To facilitate broader accessibility, our model is publicly available on the AGRAMP (Agricultural Ngrams Antimicrobial Peptides) server.
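An n-gram representation of a peptide can be sketched as a table of overlapping substring counts, optionally normalized to relative frequencies. The example below is illustrative only: the input string stands in for a peptide already written in some reduced alphabet, and the function names are hypothetical rather than taken from the AGRAMP server.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every overlapping n-gram in the sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_frequencies(sequence, n):
    """Convert n-gram counts to relative frequencies that sum to 1."""
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical peptide already written in a 3-letter reduced alphabet.
peptide = "HCCPHHC"
print(ngram_counts(peptide, 2))
print(ngram_frequencies(peptide, 2))
```

Each distinct n-gram then becomes one feature column for a classifier such as a random forest; a smaller alphabet keeps the number of possible n-grams, and hence the feature vector, manageable.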

6.
Viruses ; 16(4)2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38675983

ABSTRACT

Human immunodeficiency virus type 1 (HIV-1) infection can result in HIV-associated neurocognitive disorder (HAND), a spectrum of disorders characterized by neurological impairment and chronic inflammation. Combined antiretroviral therapy (cART) has elicited a marked reduction in the number of individuals diagnosed with HAND. However, there is continual, low-level viral transcription due to the lack of a transcription inhibitor in cART regimens, which results in the accumulation of viral products within infected cells. To alleviate stress, infected cells can release accumulated products, such as TAR RNA, in extracellular vesicles (EVs), which can contribute to pathogenesis in neighboring cells. Here, we demonstrate that cART can contribute to autophagy deregulation in infected cells and increased EV release. The impact of EVs released from HIV-1 infected myeloid cells was found to contribute to CNS pathogenesis, potentially through EV-mediated TLR3 (Toll-like receptor 3) activation, suggesting the need for therapeutics to target this mechanism. Three HIV-1 TAR-binding compounds, 103FA, 111FA, and Ral HCl, were identified that recognize TAR RNA and reduce TLR activation. These data indicate that packaging of viral products into EVs, potentially exacerbated by antiretroviral therapeutics, may induce chronic inflammation of the CNS observed in cART-treated patients, and novel therapeutic strategies may be exploited to mitigate morbidity.


Subjects
Autophagy , Extracellular Vesicles , HIV Infections , HIV-1 , Toll-Like Receptor 3 , Extracellular Vesicles/metabolism , Humans , Toll-Like Receptor 3/metabolism , Toll-Like Receptor 3/genetics , HIV-1/physiology , HIV Infections/virology , HIV Infections/metabolism , HIV Infections/drug therapy , Autophagy/drug effects , RNA, Viral/metabolism , RNA, Viral/genetics
7.
BMC Genomics ; 14 Suppl 4: S3, 2013.
Article in English | MEDLINE | ID: mdl-24268064

ABSTRACT

BACKGROUND: Successful management of chronic human immunodeficiency virus type 1 (HIV-1) infection with a cocktail of antiretroviral medications can be negatively affected by the presence of drug resistant mutations in the viral targets. These targets include the HIV-1 protease (PR) and reverse transcriptase (RT) proteins, for which a number of inhibitors are available on the market and routinely prescribed. Protein mutational patterns are associated with varying degrees of resistance to their respective inhibitors, with extremes that can range from continued susceptibility to cross-resistance across all drugs. RESULTS: Here we implement statistical learning algorithms to develop structure- and sequence-based models for systematically predicting the effects of mutations in the PR and RT proteins on resistance to each of eight and eleven inhibitors, respectively. Employing a four-body statistical potential, mutant proteins are represented as feature vectors whose components quantify relative environmental perturbations at amino acid residue positions in the respective target structures upon mutation. Two approaches are implemented in developing sequence-based models, based on use of either relative frequencies or counts of n-grams, to generate vectors for representing mutant proteins. To the best of our knowledge, this is the first reported study on structure- and sequence-based predictive models of HIV-1 PR and RT drug resistance developed by implementing a four-body statistical potential and n-grams, respectively, to generate mutant attribute vectors. Performance of the learning methods is evaluated on the basis of tenfold cross-validation, using previously assayed and publicly available in vitro data relating mutational patterns in the targets to quantified inhibitor susceptibility changes. 
CONCLUSION: Overall performance results are competitive with those of a previously published study utilizing a sequence-based strategy, while our structure- and sequence-based models provide orthogonal and complementary prediction methodologies, respectively. In a novel application, we describe a technique for identifying every possible pair of RT inhibitors as either potentially effective together as part of a cocktail, or a combination that is to be avoided.


Subjects
Drug Resistance, Viral , HIV Protease Inhibitors/pharmacology , HIV Protease/genetics , HIV Reverse Transcriptase/genetics , HIV-1/drug effects , HIV-1/enzymology , Reverse Transcriptase Inhibitors/pharmacology , Algorithms , Catalytic Domain/genetics , Computational Biology , HIV Infections/drug therapy , HIV Infections/genetics , HIV Protease/chemistry , HIV Protease/metabolism , HIV Protease Inhibitors/metabolism , HIV Reverse Transcriptase/antagonists & inhibitors , HIV Reverse Transcriptase/chemistry , HIV-1/genetics , HIV-1/metabolism , Humans , Models, Molecular , Mutant Proteins/chemistry , Mutant Proteins/metabolism , Mutation , Phenotype , Protein Conformation , Reverse Transcriptase Inhibitors/metabolism
8.
Front Genet ; 14: 1200770, 2023.
Article in English | MEDLINE | ID: mdl-37745840

ABSTRACT

Introduction: The African Goat Improvement Network Image Collection Protocol (AGIN-ICP) is an accessible, easy to use, low-cost procedure to collect phenotypic data via digital images. The AGIN-ICP collects images to extract several phenotype measures including health status indicators (anemia status, age, and weight), body measurements, shapes, and coat color and pattern, from digital images taken with standard digital cameras or mobile devices. This strategy is to quickly survey, record, assess, analyze, and store these data for use in a wide variety of production and sampling conditions. Methods: The work was accomplished as part of the multinational African Goat Improvement Network (AGIN) collaborative and is presented here as a case study in the AGIN collaboration model and working directly with community-based breeding programs (CBBP). It was iteratively developed and tested over 3 years, in 12 countries with over 12,000 images taken. Results and discussion: The AGIN-ICP development is described, and field implementation and the quality of the resulting images for use in image analysis and phenotypic data extraction are iteratively assessed. Digital body measures were validated using the PreciseEdge Image Segmentation Algorithm (PE-ISA) and software showing strong manual to digital body measure Pearson correlation coefficients of height, length, and girth measures (0.931, 0.943, 0.893) respectively. It is critical to note that while none of the very detailed tasks in the AGIN-ICP described here is difficult, every single one of them is even easier to accidentally omit, and the impact of such a mistake could render a sample image, a sampling day's images, or even an entire sampling trip's images difficult or unusable for extracting digital phenotypes. 
Coupled with tissue sampling and genomic testing, it may be useful in the effort to identify and conserve important animal genetic resources and in CBBP genetic improvement programs by providing reliably measured phenotypes with modest cost. Potential users include farmers, animal husbandry officials, veterinarians, regional government or other public health officials, researchers, and others. Based on these results, a final AGIN-ICP is presented, optimizing the costs, ease, and speed of field implementation of the collection method without compromising the quality of the image data collection.

9.
Methods Mol Biol ; 2405: 1-37, 2022.
Article in English | MEDLINE | ID: mdl-35298806

ABSTRACT

Antibiotic resistance constitutes a global threat and could lead to a future pandemic. One strategy is to develop a new generation of antimicrobials. Naturally occurring antimicrobial peptides (AMPs) are recognized templates and some are already in clinical use. To accelerate the discovery of new antibiotics, it is useful to predict novel AMPs from the sequenced genomes of various organisms. The antimicrobial peptide database (APD) provided the first empirical peptide prediction program. It also facilitated the testing of the first machine-learning algorithms. This chapter provides an overview of machine-learning predictions of AMPs. Most of the predictors, such as AntiBP, CAMP, and iAMPpred, involve a single-label prediction of antimicrobial activity. This type of prediction has been expanded to antifungal, antiviral, antibiofilm, anti-TB, hemolytic, and anti-inflammatory peptides. The multiple functional roles of AMPs annotated in the APD also enabled multi-label predictions (iAMP-2L, MLAMP, and AMAP), which include antibacterial, antiviral, antifungal, antiparasitic, antibiofilm, anticancer, anti-HIV, antimalarial, insecticidal, antioxidant, chemotactic, spermicidal activities, and protease inhibiting activities. Also considered in predictions are peptide posttranslational modification, 3D structure, and microbial species-specific information. We compare important amino acids of AMPs implied from machine learning with the frequently occurring residues of the major classes of natural peptides. Finally, we discuss advances, limitations, and future directions of machine-learning predictions of antimicrobial peptides. Ultimately, we may assemble a pipeline of such predictions beyond antimicrobial activity to accelerate the discovery of novel AMP-based antimicrobials.


Subjects
Anti-Infective Agents , Antimicrobial Peptides , Machine Learning , Amino Acids/chemistry , Anti-Infective Agents/chemistry , Anti-Infective Agents/pharmacology , Antimicrobial Peptides/chemistry , Antimicrobial Peptides/pharmacology , Peptides/chemistry
10.
PLoS One ; 17(10): e0275821, 2022.
Article in English | MEDLINE | ID: mdl-36227957

ABSTRACT

Computer vision is a tool that could provide livestock producers with digital body measures and records that are important for animal health and production, namely body height and length, and chest girth. However, to build these tools, the scarcity of labeled training data sets with uniform images (pose, lighting) that also represent real-world livestock can be a challenge. Collecting images in a standard way, with manual image labeling is the gold standard to create such training data, but the time and cost can be prohibitive. We introduce the PreciseEdge image segmentation algorithm to address these issues by employing a standard image collection protocol with a semi-automated image labeling method, and a highly precise image segmentation for automated body measurement extraction directly from each image. These elements, from image collection to extraction are designed to work together to yield values highly correlated to real-world body measurements. PreciseEdge adds a brief preprocessing step inspired by chromakey to a modified GrabCut procedure to generate image masks for data extraction (body measurements) directly from the images. Three hundred RGB (red, green, blue) image samples were collected uniformly per the African Goat Improvement Network Image Collection Protocol (AGIN-ICP), which prescribes camera distance, poses, a blue backdrop, and a custom AGIN-ICP calibration sign. Images were taken in natural settings outdoors and in barns under high and low light, using a Ricoh digital camera producing JPG images (converted to PNG prior to processing). The rear and side AGIN-ICP poses were used for this study. PreciseEdge and GrabCut image segmentation methods were compared for differences in user input required to segment the images. The initial bounding box image output was captured for visual comparison. Automated digital body measurements extracted were compared to manual measures for each method. 
Both methods allow additional optional refinement (mouse strokes) to aid the segmentation algorithm. These optional mouse strokes were captured automatically and compared. Stroke count distributions for both methods were not normally distributed per Kolmogorov-Smirnov tests. Non-parametric Wilcoxon tests showed the distributions were different (p < 0.001) and the GrabCut stroke count was significantly higher (p = 5.115e-49), with a mean of 577.08 (std 248.45) versus 221.57 (std 149.45) with PreciseEdge. Digital body measures were highly correlated to manual height, length, and girth measures (0.931, 0.943, 0.893) for PreciseEdge and (0.936, 0.944, 0.869) for GrabCut (Pearson correlation coefficient). PreciseEdge image segmentation allowed for masks yielding accurate digital body measurements highly correlated to manual, real-world measurements with over 38% less user input for an efficient, reliable, non-invasive alternative to livestock hand-held direct measuring tools.
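The agreement between manual and digital measurements is reported as a Pearson correlation coefficient. As a rough sketch of how such a value is computed, the example below implements the standard formula directly; the measurement values and the function name `pearson_r` are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired height measurements in centimeters.
manual_height_cm = [61.0, 58.5, 66.0, 70.2, 63.4]
digital_height_cm = [60.2, 59.1, 65.0, 71.0, 62.8]

r = pearson_r(manual_height_cm, digital_height_cm)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates that the digital measure tracks the manual one linearly; it does not by itself show that the two agree in absolute terms, since a constant offset or scale factor leaves r unchanged.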


Subjects
Livestock , Sexually Transmitted Diseases , Algorithms , Animals , Image Processing, Computer-Assisted/methods , Mice
11.
BMC Bioinformatics ; 11: 494, 2010 Oct 05.
Article in English | MEDLINE | ID: mdl-20923564

ABSTRACT

BACKGROUND: HIV-1 targets human cells expressing both the CD4 receptor, which binds the viral envelope glycoprotein gp120, as well as either the CCR5 (R5) or CXCR4 (X4) co-receptors, which interact primarily with the third hypervariable loop (V3 loop) of gp120. Determination of HIV-1 affinity for either the R5 or X4 co-receptor on host cells facilitates the inclusion of co-receptor antagonists as a part of patient treatment strategies. A dataset of 1193 distinct gp120 V3 loop peptide sequences (989 R5-utilizing, 204 X4-capable) is utilized to train predictive classifiers based on implementations of random forest, support vector machine, boosted decision tree, and neural network machine learning algorithms. An in silico mutagenesis procedure employing multibody statistical potentials, computational geometry, and threading of variant V3 sequences onto an experimental structure, is used to generate a feature vector representation for each variant whose components measure environmental perturbations at corresponding structural positions. RESULTS: Classifier performance is evaluated based on stratified 10-fold cross-validation, stratified dataset splits (2/3 training, 1/3 validation), and leave-one-out cross-validation. Best reported values of sensitivity (85%), specificity (100%), and precision (98%) for predicting X4-capable HIV-1 virus, overall accuracy (97%), Matthews correlation coefficient (89%), balanced error rate (0.08), and ROC area (0.97) all reach critical thresholds, suggesting that the models outperform six other state-of-the-art methods and come closer to competing with phenotype assays. CONCLUSIONS: The trained classifiers provide instantaneous and reliable predictions regarding HIV-1 co-receptor usage, requiring only translated V3 loop genotypes as input. 
Furthermore, the novelty of these computational mutagenesis based predictor attributes distinguishes the models as orthogonal and complementary to previous methods that utilize sequence, structure, and/or evolutionary information. The classifiers are available online at http://proteins.gmu.edu/automute.
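The performance measures listed in the abstract (sensitivity, specificity, precision, accuracy, Matthews correlation coefficient, balanced error rate) can all be derived from a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and do not reproduce the reported values, which are best results across several evaluation schemes rather than a single confusion matrix.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Balanced error rate: mean of the per-class error rates.
    ber = 0.5 * (fn / (tp + fn) + fp / (tn + fp))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
        "ber": ber,
    }


# Hypothetical counts, with the positive class treated as "X4-capable".
metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With an imbalanced dataset such as 989 versus 204 sequences, MCC and balanced error rate are more informative than overall accuracy, because a classifier that always predicts the majority class can still score a high accuracy.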


Subjects
HIV Envelope Protein gp120/chemistry , HIV-1/metabolism , Models, Molecular , Algorithms , Computer Simulation , Databases, Genetic , HIV Envelope Protein gp120/metabolism , HIV-1/chemistry , HIV-1/genetics , Receptors, CCR5/genetics , Receptors, CCR5/metabolism , Receptors, CXCR4/genetics , Receptors, CXCR4/metabolism
12.
BMC Struct Biol ; 10 Suppl 1: S5, 2010 May 17.
Article in English | MEDLINE | ID: mdl-20487512

ABSTRACT

BACKGROUND: There is a considerable literature on the source of the thermostability of proteins from thermophilic organisms. Understanding the mechanisms for this thermostability would provide insights into proteins generally and permit the design of synthetic hyperstable biocatalysts. RESULTS: We have systematically tested a large number of sequence and structure derived quantities for their ability to discriminate thermostable proteins from their non-thermostable orthologs using sets of mesophile-thermophile ortholog pairs. Most of the quantities tested correspond to properties previously reported to be associated with thermostability. Many of the structure related properties were derived from the Delaunay tessellation of protein structures. CONCLUSIONS: Carefully selected sequence based indices discriminate better than purely structure based indices. Combined sequence and structure based indices improve performance somewhat further. Based on our analysis, the strongest contributors to thermostability are an increase in ion pairs on the protein surface and a more strongly hydrophobic interior.


Subjects
Proteins/chemistry , Amino Acid Sequence , Bacterial Proteins/chemistry , Models, Molecular , Phosphoglycerate Kinase/chemistry , Protein Conformation , Protein Stability , Pyrococcus/chemistry , TATA-Box Binding Protein/chemistry , Temperature , Trypanosoma brucei brucei/chemistry
13.
J Theor Biol ; 266(4): 560-8, 2010 Oct 21.
Article in English | MEDLINE | ID: mdl-20655929

ABSTRACT

Certain genetic variations in the human population are associated with heritable diseases, and single nucleotide polymorphisms (SNPs) represent the most common form of such differences in DNA sequence. In particular, substantial interest exists in determining whether a non-synonymous SNP (nsSNP), leading to a single residue replacement in the translated protein product, is neutral or disease-related. The nature of protein structure-function relationships suggests that nsSNP effects, either benign or leading to aberrant protein function possibly associated with disease, are dependent on relative structural changes introduced upon mutation. In this study, we characterize a representative sampling of 1790 documented neutral and disease-related human nsSNPs mapped to 243 diverse human protein structures, by quantifying environmental perturbations in the associated proteins with the use of a computational mutagenesis methodology that relies on a four-body, knowledge-based, statistical contact potential. These structural change data are used as attributes to generate a vector representation for each nsSNP, in combination with additional features reflecting sequence and structure of the corresponding protein. A trained model based on the random forest supervised classification algorithm achieves 76% cross-validation accuracy. Our classifier performs at least as well as other methods that use significantly larger datasets of nsSNPs for model training, and the novelty of our attributes differentiates the model as an orthogonal approach that can be utilized in conjunction with other techniques. A dedicated server for obtaining predictions, as well as supporting datasets and documentation, is available at http://proteins.gmu.edu/automute.


Subjects
Computational Biology/methods , Disease/genetics , Knowledge Bases , Mutagenesis/genetics , Polymorphism, Single Nucleotide/genetics , Algorithms , Aspartylglucosylaminase/chemistry , Databases, Genetic , Humans , Learning , Models, Molecular , Protein Structure, Secondary , ROC Curve , Structure-Activity Relationship
14.
Sci Rep ; 10(1): 19340, 2020 Nov 09.
Article in English | MEDLINE | ID: mdl-33168903

ABSTRACT

Mass spectrometry enhanced by nanotechnology can achieve previously unattainable sensitivity for characterizing urinary pathogen-derived peptides. We utilized mass spectrometry enhanced by affinity hydrogel particles (analytical sensitivity = 2.5 pg/mL) to study tick pathogen-specific proteins shed in the urine of patients with (1) erythema migrans rash and acute symptoms, (2) post-treatment Lyme disease syndrome (PTLDS), and (3) clinical suspicion of tick-borne illnesses (TBI). Targeted pathogens were Borrelia, Babesia, Anaplasma, Rickettsia, Ehrlichia, Bartonella, Francisella, Powassan virus, tick-borne encephalitis virus, and Colorado tick fever virus. Specificity was defined by 100% amino acid sequence identity with tick-borne pathogen proteins, evolutionary taxonomic verification for related pathogens, and no identity with human or other organisms. Using a cutoff of two pathogen peptides, 9/10 acute Lyme borreliosis patients tested positive, while we identified zero false positives in 250 controls. Two or more pathogen peptides were identified in 40% of samples from PTLDS and TBI patients (categories 2 and 3 above, n = 59/148). Collectively, 279 distinct tick-borne pathogen-derived peptides were identified. The number of pathogen-specific peptides was directly correlated with presence or absence of symptoms reported by patients (ordinal regression pseudo-R2 = 0.392, p = 0.010). Enhanced mass spectrometry is a new tool for studying tick-borne pathogen infections.


Subjects
Lyme Disease/microbiology , Lyme Disease/urine , Peptides/urine , Ticks , Adult , Aged , Algorithms , Animals , Babesia microti/metabolism , Biomarkers/metabolism , Borrelia , Erythema Chronicum Migrans/microbiology , Erythema Chronicum Migrans/urine , Exanthema , Female , Humans , Hydrogels/chemistry , Infectious Disease Medicine , Male , Mass Spectrometry , Mesocricetus , Middle Aged , Peptides/chemistry , Regression Analysis , Urinalysis
15.
Bioinformatics ; 24(18): 2002-9, 2008 Sep 15.
Article in English | MEDLINE | ID: mdl-18632749

ABSTRACT

MOTIVATION: Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have used properties of protein sequence or structure to predict the free energy change of mutants due to thermal (ΔΔG) and denaturant (ΔΔG(H2O)) denaturation, as well as the change in mutant thermal stability (ΔTm), by applying either computational energy-based approaches or machine learning techniques. However, the accuracy obtained by applying these methods separately is frequently far from optimal. RESULTS: We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and its six ordered nearest neighbors in the three-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. AVAILABILITY: A web server with supporting documentation is available at http://proteins.gmu.edu/automute.


Subjects
Artificial Intelligence , Computational Biology , Mutagenesis , Proteins/chemistry , Proteins/genetics , Algorithms , Computer Simulation , Databases, Protein , Models, Molecular , Protein Folding , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein , Structure-Activity Relationship , Thermodynamics
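
The feature-vector construction mentioned in this abstract (the perturbation at the mutated position followed by those of its six ordered nearest neighbors) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: neighbors are ranked here by Euclidean distance between one representative point per residue, the per-residue perturbation scores are taken as already computed, and `feature_vector` is a hypothetical name.

```python
import math


def feature_vector(coords, perturbation, mutated_idx, n_neighbors=6):
    """Return the perturbation at the mutated residue followed by the
    perturbations of its nearest neighbors, ordered by increasing distance.

    coords: one (x, y, z) point per residue (assumed representation).
    perturbation: one precomputed score per residue (assumed input).
    """
    origin = coords[mutated_idx]
    others = [i for i in range(len(coords)) if i != mutated_idx]
    # Stable sort: residues at equal distance keep their sequence order.
    others.sort(key=lambda i: math.dist(origin, coords[i]))
    chosen = [mutated_idx] + others[:n_neighbors]
    return [perturbation[i] for i in chosen]


# Toy structure: ten residues spaced one unit apart on a line.
coords = [(float(i), 0.0, 0.0) for i in range(10)]
perturbation = list(range(10, 20))  # placeholder scores, not real data
vec = feature_vector(coords, perturbation, mutated_idx=0)
print(vec)
```

Each mutant is then represented by a fixed-length vector of seven values, which is the kind of input that standard machine learning tools expect.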
16.
Heliyon ; 5(6): e01884, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31211262

ABSTRACT

Ras proteins play a pivotal role as oncogenes by participating in diverse signaling events, including those linked to cell growth, differentiation, and proliferation. Using experimental fitness data and implementing artificial intelligence and a computational mutagenesis technique, we developed models that reliably predict fitness for all single residue mutants of H-ras proto-oncogene protein p21. The computational mutagenesis generated a feature vector of protein structural changes for each variant, and these data correlated well with fitness. Random forest classification and tree regression machine learning algorithms were implemented for training predictive models. Cross-validation was used to evaluate model performance, and control experiments were performed to assess statistical significance. Classification models achieved a balanced accuracy rate as high as 82%, with a Matthews correlation coefficient of 0.63 and an area under the ROC curve of 0.90. Similarly, regression models yielded Pearson correlation coefficients reaching 0.79. In contrast, control data sets led to performance values consistent with random guessing. Comparisons with several related state-of-the-art methods reflected favorably on our trained models. This H-Ras proof-of-principle study suggests a complementary approach for understanding the mechanisms by which other proteins are involved in oncogenesis, including related Ras isoforms, and for providing useful insights into designing future diagnostic and treatment modalities.
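
Two of the classification metrics reported in this abstract, balanced accuracy and the Matthews correlation coefficient, follow standard definitions that can be computed from a confusion matrix. The sketch below uses made-up counts for illustration only; they are not the study's results.

```python
import math


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative confusion-matrix counts (hypothetical, not from the paper).
tp, fp, tn, fn = 40, 10, 40, 10
bacc = balanced_accuracy(tp, fp, tn, fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
print(f"balanced accuracy={bacc:.2f} MCC={mcc:.2f}")
```

Both measures stay informative when classes are imbalanced, which is one reason they are often reported alongside the area under the ROC curve.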

17.
Proteins ; 71(4): 1930-9, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18186470

ABSTRACT

There is substantial interest in methods designed to predict the effect of nonsynonymous single nucleotide polymorphisms (nsSNPs) on protein function, given their potential relationship to heritable diseases. Current state-of-the-art supervised machine learning algorithms, such as random forest (RF), train models that classify single amino acid mutations in proteins as either neutral or deleterious to function. However, the functional effect of a polymorphism on a protein frequently lies between these two extremes. Classifiers that incorporate fuzzy logic provide a natural extension that accounts for the spectrum of possible functional consequences. We generated a dataset of single amino acid substitutions in human proteins having known three-dimensional structures. Each variant was uniquely represented as a feature vector that included computational geometry and knowledge-based statistical potential predictors obtained through application of Delaunay tessellation of protein structures. Additional attributes consisted of physicochemical properties of the native and replacement amino acids as well as the topological location of the mutated residue position in the solved structure. Classification performance of the RF algorithm was evaluated on a training set consisting of the disease-associated and neutral nsSNPs taken from our dataset, and attributes were ranked according to their relative importance. Similarly, we evaluated the performance of an adaptive neuro-fuzzy inference system (ANFIS). The utility of statistical geometry predictors was compared with that of traditional structural and evolutionary attributes employed by other researchers, revealing an equally effective yet complementary methodology. Among all attributes in our feature set, the statistical geometry predictors were found to be the most highly ranked. On the basis of the AUC (area under the ROC curve) measure of performance, the ANFIS and RF models were equally effective when only statistical geometry features were utilized. Tenfold cross-validation studies evaluating AUC, balanced error rate (BER), and Matthews correlation coefficient (MCC) showed that our RF model was at least comparable with the well-established methods of SIFT and PolyPhen. The trained RF and ANFIS models were each subsequently used to predict the disease potential of human nsSNPs in our dataset that are currently unclassified (http://rna.gmu.edu/FuzzySnps/).


Subjects
Decision Trees , Fuzzy Logic , Polymorphism, Single Nucleotide/genetics , Proteins/chemistry , Proteins/genetics , Algorithms , Amino Acid Sequence , Amino Acid Substitution , Area Under Curve , Artificial Intelligence , Chi-Square Distribution , Computational Biology/methods , Databases, Factual , Humans , Hydrophobic and Hydrophilic Interactions , Models, Statistical , Molecular Sequence Data , Neural Networks, Computer , Phylogeny , Predictive Value of Tests , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary , ROC Curve , Reproducibility of Results , Sequence Homology, Amino Acid
18.
Bioinformatics ; 23(23): 3155-61, 2007 Dec 01.
Article in English | MEDLINE | ID: mdl-17977887

ABSTRACT

MOTIVATION: An important area of research in biochemistry and molecular biology focuses on characterization of enzyme mutants. However, synthesis and analysis of experimental mutants is time-consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity. RESULTS: Based on a Delaunay tessellation-derived four-body statistical potential function, a perturbation vector measuring environmental changes relative to wild type (wt) at every residue position uniquely characterizes each enzyme mutant for model development and prediction. First, a measure of model performance utilizing the area under the receiver operating characteristic (ROC) curve (AUC) surpasses 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Second, a novel method is introduced for evaluating the statistical significance associated with the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve 77.0% and 80.8% mean accuracy, respectively. Next, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants; a subsequent search for publications reporting on dozens of these test mutants reveals that experimental results are matched by 79% and 86% of predictions, respectively. Finally, learning curves for each mutant enzyme system indicate the influence of training set size on model performance. AVAILABILITY: Prediction databases are available at http://proteins.gmu.edu/automute/


Subjects
Artificial Intelligence , Enzymes/chemistry , Enzymes/genetics , Models, Chemical , Mutagenesis, Site-Directed/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Amino Acid Substitution/genetics , Computer Simulation , Data Interpretation, Statistical , Enzyme Activation , Models, Molecular , Molecular Sequence Data , Mutation , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Structure-Activity Relationship
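
The "stratified random splits" evaluation described in this abstract can be sketched as below. This is a generic illustration, not the authors' procedure: the abstract does not state the test-set fraction, so the value of 0.2 is an assumption, as are the class labels and the function name `stratified_split`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test sets so that each class
    keeps roughly the same proportion in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Hypothetical labels: 60 "active" and 40 "inactive" mutants.
labels = ["active"] * 60 + ["inactive"] * 40

# One split per seed gives 100 independent stratified splits.
splits = [stratified_split(labels, seed=s) for s in range(100)]
train, test = splits[0]
print(len(train), len(test))
```

A model would be trained and scored once per split, and the mean accuracy over the 100 splits reported, which is the form of the figures quoted in the abstract.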
19.
Hum Mutat ; 27(2): 163-72, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16395672

ABSTRACT

We describe a novel statistical scoring method based on a computational geometry approach to predict the functional impact (transactivation activity) of missense mutations in the DNA-binding domain (DBD) of the tumor suppressor TP53, which is the most frequently mutated gene in human cancer. Residual scores (RS) for each residue were calculated to reflect differences in the compositional preferences of four nearest-neighbor residues between mutant and wild-type proteins. The RS were then combined into a residual score profile (RSP) representing the RS values for all 194 residues in the DBD. Mutants were grouped into functional categories based on their transactivation activities, measured experimentally in yeast functional assays using p53-response elements from eight different promoters. While these functional categories showed significant differences in average RS, the average RS lacked the resolving power to predict the transactivation activities of individual mutants. In contrast, using decision tree models, we found that the RSP predicted transactivation with an accuracy varying between 64.2% and 78.5% depending on the promoter. Lastly, we used the best model to predict the functional outcome of all missense mutants in the DBD of p53 and compared the predictions with their frequency of occurrence in human cancers. We found that mutants predicted as functional (F) accounted for approximately 14% of all missense mutants found in cancers, while mutants predicted as nonfunctional (NF) represented approximately 86% of the mutants. These results show that this computational approach provides a fast and reliable method for predicting the functional impact of p53 mutants associated with cancer.


Subjects
Data Interpretation, Statistical , Genes, p53 , Mutation , Transcriptional Activation , Tumor Suppressor Protein p53/genetics , Algorithms , Artificial Intelligence , Humans , Models, Statistical , Mutation, Missense , Promoter Regions, Genetic , ROC Curve
20.
Proteins ; 64(1): 234-45, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16617425

ABSTRACT

Topological scores, measures of sequence-structure compatibility, are calculated for all 1,881 single point mutants of the human immunodeficiency virus (HIV)-1 protease using a four-body statistical potential function based on Delaunay tessellation of protein structure. Comparison of the mutant topological score data with experimental data from alanine scan studies specifically on the dimer interface residues supports previous findings that 1) L97 and F99 contribute greatly to the Gibbs energy of HIV-1 protease dimerization, 2) Q2 and T4 contribute the least toward the Gibbs energy, and 3) C-terminal residues are more sensitive to mutations than those at the N-terminus. For a more comprehensive treatment of the relationship between protease structure and function, mutant topological scores are compared with the activity levels for a set of 536 experimentally synthesized protease mutants, and a significant correlation is observed. Finally, this structure-function correlation is similarly identified by examining model systems consisting of 2,015 single point mutants of bacteriophage T4 lysozyme as well as 366 single point mutants of HIV-1 reverse transcriptase and is hypothesized to be a property generally applicable to all proteins.


Subjects
Mutagenesis , Proteins/chemistry , Proteins/metabolism , Recombinant Proteins/chemistry , Recombinant Proteins/metabolism , Computational Biology/methods , HIV Protease/chemistry , HIV Protease/metabolism , Models, Molecular , Protein Conformation , Proteins/genetics , Structure-Activity Relationship
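
The figure of 1,881 single point mutants quoted in this abstract is consistent with the 99-residue HIV-1 protease monomer, since each position can be replaced by any of the 19 other standard amino acids (99 × 19 = 1,881). A short sketch of that enumeration follows; the sequence used is a placeholder of the right length, not the real protease sequence, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_point_mutants(sequence):
    """Yield (wild-type residue, 1-based position, replacement) for every
    possible single amino acid substitution."""
    for position, wild_type in enumerate(sequence, start=1):
        for replacement in AMINO_ACIDS:
            if replacement != wild_type:
                yield wild_type, position, replacement


# Placeholder sequence of length 99; NOT the actual HIV-1 protease sequence.
placeholder = "A" * 99
n_mutants = sum(1 for _ in single_point_mutants(placeholder))
print(n_mutants)
```

The same enumeration applied to a longer protein gives the full mutant space, of which the experimentally characterized sets mentioned in the abstract are subsets.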