ABSTRACT
The distinctive nature of cancer as a disease prompts an exploration of the special characteristics of the genes implicated in it. Identifying cancer-associated genes and their characteristics is crucial to furthering our understanding of this disease and to improving the likelihood that therapeutic drug targets succeed. However, the rate at which cancer genes are being identified experimentally is slow. Predictive analysis, through the building of accurate machine learning models, is a potentially useful approach to increasing the rate at which these genes and their characteristics are identified. Here, we investigated gene essentiality scores and found that they tend to be higher for cancer-associated genes than for other protein-coding human genes. We built a dataset of extended gene properties linked to essentiality and used it to train a machine-learning model; this model reached 89% accuracy and an Area Under the Curve (AUC) of >0.85. The model showed that essentiality, evolutionary-related properties, and properties arising from protein-protein interaction networks are particularly effective in predicting cancer-associated genes. We were able to use the model to identify potential candidate genes that have not previously been linked to cancer. Prioritising genes that score highly by our methods could aid scientists in their cancer gene research.
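As a rough illustration of the two metrics quoted in the abstract above, the following Python sketch computes accuracy and AUC for a toy set of gene scores. The labels, the scores and the 0.5 decision threshold are hypothetical and are not taken from the study. AUC is computed here as the fraction of (cancer-associated, other) gene pairs in which the cancer-associated gene receives the higher score, which is one standard definition.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of genes whose thresholded score matches the known label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = cancer-associated gene, 0 = other gene.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is presumably why abstracts of this kind tend to report both.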
Subjects
Essential Genes, Machine Learning, Neoplasms, Humans, Neoplasms/genetics, Protein Interaction Maps/genetics, Molecular Evolution, Computational Biology/methods
ABSTRACT
Congenital renal tract malformations (RTMs) are the major cause of severe kidney failure in children. Studies to date have identified defined genetic causes for only a minority of human RTMs. While some RTMs may be caused by poorly defined environmental perturbations affecting organogenesis, it is likely that numerous causative genetic variants have yet to be identified. Unfortunately, the speed of discovering further genetic causes for RTMs is limited by challenges in prioritising candidate genes harbouring sequence variants. Here, we exploited the computer-based artificial intelligence methodology of supervised machine learning to identify genes with a high probability of being involved in renal development. These genes, when mutated, are promising candidates for causing RTMs. With this methodology, the machine learning classifier determines which attributes are common to renal development genes and identifies genes possessing these attributes. Here we report the validation of an RTM gene classifier and provide predictions of the RTM association status for all protein-coding genes in the mouse genome. Overall, our predictions, whilst not definitive, can inform the prioritisation of genes when evaluating patient sequence data for genetic diagnosis. This knowledge of renal developmental genes will accelerate the processes of reaching a genetic diagnosis for patients born with RTMs.
Subjects
Artificial Intelligence, Urinary Tract, Child, Humans, Mice, Animals, Kidney/abnormalities, Urinary Tract/abnormalities, Machine Learning
ABSTRACT
Alzheimer's disease (AD) is characterized by the presence of β-amyloid (Aβ) plaques and neurofibrillary tangles (NFTs) in the brain. The prevalence of the disease is increasing and is expected to reach 141 million cases by 2050. Despite the risk factors associated with the disease, there is no known causative agent for AD. Clinical trials with many drugs have failed over the years, and no therapeutic has been approved for AD. There is increasing evidence that pathogens, such as human herpes simplex virus-1 (HSV-1), are found in the brains of AD patients and controls. Given the lack of a human model, the route for pathogen entry into the brain remains open for scrutiny and may include entry via a disturbed blood-brain barrier or the olfactory nasal route. Many factors can contribute to the pathogenicity of HSV-1, such as its ability to remain latent, tau protein phosphorylation, increased accumulation of Aβ in vivo and in vitro, and repeated cycles of reactivation in immunocompromised hosts. Intriguingly, valacyclovir, a widely used drug for the treatment of HSV-1 and HSV-2 infection, has been associated with improved cognition in patients compared to controls in AD clinical studies. We discuss the potential role of HSV-1 in AD pathogenesis and argue for further studies to investigate this relationship.
Subjects
Alzheimer Disease, Herpes Simplex, Human Herpesvirus 1, Alzheimer Disease/drug therapy, Amyloid beta-Peptides, Humans, Neurofibrillary Tangles, Amyloid Plaque
ABSTRACT
Protein misfolding and aggregation are observed in many amyloidogenic diseases affecting either the central nervous system or a variety of peripheral tissues. Structural and dynamic characterization of all species along the pathways from monomers to fibrils is challenging by experimental and computational means because, in most diseases, they involve intrinsically disordered proteins. Yet understanding how amyloid species become toxic is the challenge in developing a treatment for these diseases. Here we review what computer, in vitro, in vivo, and pharmacological experiments tell us about the accumulation and deposition of the oligomers of the (Aβ, tau), α-synuclein, IAPP, and superoxide dismutase 1 proteins, which have been the mainstream concept underlying Alzheimer's disease (AD), Parkinson's disease (PD), type II diabetes (T2D), and amyotrophic lateral sclerosis (ALS) research, respectively, for many years.
Subjects
Amyloid/chemistry, Amyloid/metabolism, Neurodegenerative Diseases/metabolism, Alzheimer Disease/metabolism, Alzheimer Disease/pathology, Amyloid beta-Peptides/chemistry, Amyloid beta-Peptides/metabolism, Amyotrophic Lateral Sclerosis/genetics, Amyotrophic Lateral Sclerosis/metabolism, Amyotrophic Lateral Sclerosis/pathology, Animals, Type 2 Diabetes Mellitus/metabolism, Type 2 Diabetes Mellitus/pathology, Humans, Islet Amyloid Polypeptide/chemistry, Islet Amyloid Polypeptide/metabolism, Molecular Models, Neurodegenerative Diseases/pathology, Parkinson Disease/metabolism, Parkinson Disease/pathology, Pathological Protein Aggregation, Proteostasis Deficiencies/metabolism, Superoxide Dismutase-1/chemistry, Superoxide Dismutase-1/metabolism, alpha-Synuclein/chemistry, alpha-Synuclein/metabolism, tau Proteins/chemistry, tau Proteins/metabolism
ABSTRACT
Monoclonal antibodies (mAbs) represent a rapidly expanding market for biotherapeutics. Structural changes in the mAb can lead to unwanted immunogenicity, reduced efficacy, and loss of material during production. The pharmaceutical sector requires new protein characterization tools that are fast, applicable in situ and to the manufacturing process. Raman has been highlighted as a technique to suit this application as it is information-rich, minimally invasive, insensitive to water background and requires little to no sample preparation. This study investigates the applicability of Raman to detect Post-Translational Modifications (PTMs) and degradation seen in mAbs. IgG4 molecules have been incubated under a range of conditions known to result in degradation of the therapeutic including varied pH, temperature, agitation, photo, and chemical stresses. Aggregation was measured using size-exclusion chromatography, and PTM levels were calculated using peptide mapping. By combining principal component analysis (PCA) with Raman spectroscopy and circular dichroism (CD) spectroscopy structural analysis we were able to separate proteins based on PTMs and degradation. Furthermore, by identifying key bands that lead to the PCA separation we could correlate spectral peaks to specific PTMs. In particular, we have identified a peak which exhibits a shift in samples with higher levels of Trp oxidation. Through separation of IgG4 aggregates, by size, we have shown a linear correlation between peak wavenumbers of specific functional groups and the amount of aggregate present. We therefore demonstrate the capability for Raman spectroscopy to be used as an analytical tool to measure degradation and PTMs in-line with therapeutic production.
Subjects
Monoclonal Antibodies/metabolism, Immunoglobulin G/metabolism, Post-Translational Protein Processing, Raman Spectrum Analysis/methods, Monoclonal Antibodies/genetics, Circular Dichroism, Humans, Immunoglobulin G/genetics, Peptide Mapping, Protein Conformation
ABSTRACT
Glycation is a protein modification prevalent in the progression of diseases such as diabetes and Alzheimer's, as well as a byproduct of therapeutic protein expression, notably for monoclonal antibodies (mAbs). Quantification of glycated protein is thus advantageous both for assessing disease progression in diagnosis and for quality control of protein therapeutics. Vibrational spectroscopy has been highlighted as a technique that can easily be adapted for rapid analysis of the glycation state of proteins, and requires minimal sample preparation. Glycated samples of lysozyme and albumin were synthesised by incubation with 0.5 M glucose for 30 days. Here we show that both FTIR-ATR and Raman spectroscopy are able to distinguish between glycated and non-glycated proteins. Principal component analysis (PCA) was used to show separation between control and glycated samples. Loadings plots identified specific peaks that accounted for the variation, notably a peak at 1027 cm⁻¹ for FTIR-ATR. In Raman spectroscopy, PCA emphasised peaks at 1040 cm⁻¹ and 1121 cm⁻¹. Therefore, both FTIR-ATR and Raman spectroscopy found changes in peak intensities and wavenumbers within the sugar C-O/C-C/C-N region (1200-800 cm⁻¹). For quantification of the level of glycation of lysozyme, partial least squares regression (PLSR), with statistical validation, was employed to analyse Raman spectra from solution samples containing 0-100% glycated lysozyme, generating a robust model with an R² of 0.99. We therefore show the scope and potential of Raman spectroscopy as a high-throughput quantification method for glycated proteins in solution that could be applied in disease diagnostics, as well as therapeutic protein quality control.
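The coefficient of determination quoted for the calibration model above can be illustrated with a small Python sketch. The study used PLSR on full Raman spectra; the example below deliberately swaps in ordinary least squares on a single, hypothetical peak intensity so that the arithmetic stays visible. The intensity values and the function names `fit_line` and `r_squared` are assumptions made for illustration only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical calibration set: known % glycated lysozyme in each mixture
# and a made-up normalised intensity of one glycation-sensitive peak.
glycated_percent = [0, 25, 50, 75, 100]
peak_intensity = [0.10, 0.21, 0.29, 0.41, 0.50]

# Inverse calibration: predict % glycation from the measured intensity.
slope, intercept = fit_line(peak_intensity, glycated_percent)
predicted_percent = [slope * i + intercept for i in peak_intensity]
r2 = r_squared(glycated_percent, predicted_percent)
print(f"R^2 = {r2:.3f}")
```

A multivariate method such as PLSR plays the same role as `fit_line` here but draws on many wavenumbers at once, which matters when no single peak tracks glycation cleanly.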
Subjects
Albumins/metabolism, Muramidase/metabolism, Fourier Transform Infrared Spectroscopy, Raman Spectrum Analysis, Vibration, Glycosylation, Humans
ABSTRACT
Ribonucleic acids (RNAs) are key to the central dogma of molecular biology. While Raman spectroscopy holds great potential for studying RNA conformational dynamics, current computational Raman prediction and assignment methods are limited in terms of system size and inclusion of conformational exchange. Here, a framework is presented that predicts Raman spectra using mixtures of sub-spectra corresponding to major conformers calculated using classical and ab initio molecular dynamics. Experimental optimization allowed purines and pyrimidines to be characterized as predominantly syn and anti, respectively, and ribose into exchange between equivalent south and north populations. These measurements are in excellent agreement with Raman spectroscopy of ribonucleosides, and previous experimental and computational results. This framework provides a measure of ribonucleoside solution populations and conformational exchange in RNA subunits. It complements other experimental techniques and could be extended to other molecules, such as proteins and carbohydrates, enabling biological insights and providing a new analytical tool.
ABSTRACT
During the evolution of multicellular eukaryotes, gene duplication occurs frequently to generate new genes and/or functions. A duplicated gene may have a similar function to its ancestral gene. Therefore, it may be expected that duplicated genes are less likely to be critical for the survival of an organism, since there are multiple copies of the gene rendering each individual copy redundant. In this study, we explored the developmental expression patterns of duplicate gene pairs and the relationship between development co-expression and phenotypes resulting from the knockout of duplicate genes in the mouse. We define genes that generate lethal phenotypes in single gene knockout experiments as essential genes. We found that duplicate gene pairs comprised of two essential genes tend to be expressed at different stages of development, compared to duplicate gene pairs with at least one non-essential member, showing that the timing of developmental expression affects the ability of one paralogue to compensate for the loss of the other. Gene essentiality, developmental expression and gene duplication are thus closely linked.
Subjects
Embryonic Development/genetics, Gene Duplication, Gene Expression Profiling/methods, Developmental Gene Expression Regulation, Duplicate Genes/genetics, Essential Genes/genetics, Algorithms, Animals, Newborn Animals, Molecular Evolution, Humans, Mice, Genetic Models, Organogenesis/genetics
ABSTRACT
The genes that are required for organismal survival are annotated as 'essential genes'. Identifying all the essential genes of an animal species can reveal critical functions that are needed during the development of the organism. To inform studies on mouse development, we developed a supervised machine learning classifier based on phenotype data from mouse knockout experiments. We used this classifier to predict the essentiality of mouse genes lacking experimental data. Validation of our predictions against a blind test set of recent mouse knockout experimental data indicated a high level of accuracy (>80%). We also validated our predictions for other mouse mutagenesis methodologies, demonstrating that the predictions are accurate for lethal phenotypes isolated in random chemical mutagenesis screens and embryonic stem cell screens. The biological functions that are enriched in essential and non-essential genes have been identified, showing that essential genes tend to encode intracellular proteins that interact with nucleic acids. The genome distribution of predicted essential and non-essential genes was analysed, demonstrating that the density of essential genes varies throughout the genome. A comparison with human essential and non-essential genes was performed, revealing conservation between human and mouse gene essentiality status. Our genome-wide predictions of mouse essential genes will be of value for the planning of mouse knockout experiments and phenotyping assays, for understanding the functional processes required during mouse development, and for the prioritisation of disease candidate genes identified in human genome and exome sequence datasets.
Subjects
Developmental Genes, Essential Genes, Machine Learning, Algorithms, Animals, Mammalian Chromosomes/genetics, Conserved Sequence, Genetic Databases, Gene Ontology, Humans, Mice, Molecular Sequence Annotation, Mutagenesis/genetics, Phenotype, Point Mutation/genetics, Protein Interaction Maps/genetics, Reproducibility of Results, Software
ABSTRACT
BACKGROUND: Islet amyloid polypeptide (IAPP) or amylin deposits can be found in the islets of type 2 diabetes patients. The peptide is suggested to be involved in the etiology of the disease through formation of amyloid deposits and destruction of β islet cells, though the underlying molecular events leading from IAPP deposition to β cell death are still largely unknown. RESULTS: We used OFFGEL™ proteomics to study how IAPP exposure affects the proteome of rat pancreatic insulinoma Rin-5F cells. The OFFGEL™ methodology is highly effective at generating quantitative data on hundreds of proteins affected by IAPP, with its accuracy confirmed by In Cell Western and Quantitative Real Time PCR results. Combining data on individual proteins identifies pathways and protein complexes affected by IAPP. IAPP disrupts protein synthesis and degradation, and induces oxidative stress. It disrupts the regulation of ubiquitin-dependent protein degradation and increases catabolic processes. It also decreases protein transport and localization, and affects the cytoskeleton and DNA repair. CONCLUSIONS: Results are consistent with a model where IAPP aggregates overwhelm the ability of a cell to degrade proteins via the ubiquitin system. Ultimately this leads to apoptosis. IAPP aggregates may also be toxic to the cell by causing oxidative stress, leading to DNA damage, or by decreasing protein transport. The reversal of any of these effects, perhaps by targeting proteins which alter in response to IAPP, may be beneficial for type II diabetes.
Subjects
Islet Amyloid Polypeptide/pharmacology, Proteome/drug effects, Animals, Tumor Cell Line, Cell Survival/drug effects, High Pressure Liquid Chromatography, DNA Repair/drug effects, Gene Expression Regulation/drug effects, Humans, Mass Spectrometry, Oxidative Stress/drug effects, Proteome/genetics, Proteome/metabolism, Rats
ABSTRACT
The dominant model for Alzheimer's disease (AD) is the amyloid cascade hypothesis, in which the accumulation of excess amyloid-β (Aβ) leads to inflammation, excess glutamate and intracellular calcium, oxidative stress, tau hyperphosphorylation and tangle formation, neuronal loss, and ultimately dementia. In a cascade, AD proceeds in a unidirectional fashion, with events only affecting downstream processes. Compelling evidence now exists for the presence of positive feedback loops in AD, however, involving oxidative stress, inflammation, glutamate, calcium, and tau. The pathological state of AD is thus a system of positive feedback loops, leading to amplification of the initial perturbation, rather than a linear cascade. Drugs may therefore be effective by targeting numerous points within the loops, rather than concentrating on upstream processes. Anti-inflammatories and anti-oxidants may be especially valuable, since these processes are involved in many loops and hence would affect numerous processes in AD.
Subjects
Alzheimer Disease/metabolism, Alzheimer Disease/pathology, Physiological Feedback/physiology, Amyloid beta-Peptides/metabolism, Amyloid beta-Protein Precursor/metabolism, Glutamic Acid/metabolism, Humans, Oxidative Stress/physiology, tau Proteins/metabolism
ABSTRACT
Targeting the early oligomers formed by the amyloid-β (Aβ) peptide of 40 and 42 amino acids is considered one promising therapeutic approach for Alzheimer's disease (AD). In vitro experiments and computer simulations are often used in synergy to reveal the modes of interactions of drugs. In this account, we present our contribution to understanding how small molecules bind to Aβ40/Aβ42 peptides, based either on extensive coarse-grained and all-atom simulations, or a variety of experimental techniques. We conclude by offering several perspectives on the future of this field to design more efficient drugs.
Subjects
Alzheimer Disease/drug therapy, Amyloid beta-Peptides/metabolism, Computer Simulation, Molecular Models, Neuroprotective Agents/pharmacology, Peptide Fragments/metabolism, Alzheimer Disease/metabolism, Tumor Cell Line, Drug Design, Humans, Neuroprotective Agents/chemistry, Oxidative Stress/drug effects, Oxidative Stress/physiology, Protein Binding, Protein Conformation, Reactive Oxygen Species/metabolism
ABSTRACT
Essential genes are those that are critical for life. In the specific case of the mouse, they are the set of genes whose deletion means that a mouse is unable to survive after birth. As such, they are the key minimal set of genes needed for all the steps of development to produce an organism capable of life ex utero. We explored a wide range of sequence and functional features to characterise essential (lethal) and non-essential (viable) genes in mice. Manually curated experimental data identified 1301 essential genes and 3451 viable genes. Many sequence features show highly significant differences between essential and viable mouse genes. Essential genes generally encode complex proteins, with multiple domains and many introns. These genes tend to be long, highly expressed, old and evolutionarily conserved. They tend to encode ligases, transferases, phosphorylated proteins, intracellular proteins, nuclear proteins, and hubs in protein-protein interaction networks. They are involved with regulating protein-protein interactions, gene expression and metabolic processes, cell morphogenesis, cell division, cell proliferation, DNA replication, cell differentiation, DNA repair and transcription, and embryonic development. Viable genes tend to encode membrane proteins or secreted proteins, and are associated with functions such as cellular communication, apoptosis, behaviour and immune response, as well as housekeeping and tissue-specific functions. Viable genes are linked to transport, ion channels, signal transduction, calcium binding and lipid binding, consistent with their location in membranes and involvement with cell-cell communication. From the analysis of the composite features of essential and viable genes, we conclude that essential genes tend to be required for intracellular functions, and viable genes tend to be involved with extracellular functions and cell-cell communication. Knowledge of the features that are over-represented in essential genes allows for a deeper understanding of the functions and processes implemented during mammalian development.
Subjects
Embryonic Development/genetics, Essential Genes, Animals, Gene Expression, Mice
ABSTRACT
The two hallmarks of Alzheimer's disease (AD) are the presence of neurofibrillary tangles (NFT) made of aggregates of the hyperphosphorylated tau protein and of amyloid plaques composed of amyloid-β (Aβ) peptides, primarily Aβ1-40 and Aβ1-42. Targeting the production, aggregation, and toxicity of Aβ with small molecule drugs or antibodies is an active area of AD research due to the general acceptance of the amyloid cascade hypothesis, but thus far all drugs targeting Aβ have failed. From a review of the recent literature and our own experience based on in vitro, in silico, and in vivo studies, we present some reasons to explain this repetitive failure.
Subjects
Alzheimer Disease/drug therapy, Alzheimer Disease/metabolism, Amyloid beta-Peptides/metabolism, Neuroprotective Agents/pharmacology, Neuroprotective Agents/therapeutic use, Animals, Drug Discovery, Humans
ABSTRACT
The 20 standard amino acids encoded by the Genetic Code were adopted during the RNA World, around 4 billion years ago. This amino acid set could be regarded as a frozen accident, implying that other possible structures could equally well have been chosen to use in proteins. Amino acids were not primarily selected for their ability to support catalysis, as the RNA World already had highly effective cofactors to perform reactions, such as oxidation, reduction and transfer of small molecules. Rather, they were selected to enable the formation of soluble structures with close-packed cores, allowing the presence of ordered binding pockets. Factors to take into account when assessing why a particular amino acid might be used include its component atoms, functional groups, biosynthetic cost, use in a protein core or on the surface, solubility and stability. Applying these criteria to the 20 standard amino acids, and considering some other simple alternatives that are not used, we find that there are excellent reasons for the selection of every amino acid. Rather than being a frozen accident, the set of amino acids selected appears to be near ideal.
Subjects
Amino Acids/standards, Biological Evolution, Genetic Code, Biological Models, Protein Biosynthesis, Amino Acids/chemistry, Amino Acids/metabolism, Animals, Energy Metabolism, Humans, Protein Conformation, Protein Folding, Protein Stability, Solubility
ABSTRACT
The major structural components of protective mucus hydrogels on mucosal surfaces are the secreted polymeric gel-forming mucins. The very high molecular weight and extensive O-glycosylation of gel-forming mucins, which are key to their viscoelastic properties, create problems when studying mucins using conventional biochemical/structural techniques. Thus, key structural information, such as the secondary structure of the various mucin subdomains, and glycosylation patterns along individual molecules, remains to be elucidated. Here, we utilized Raman spectroscopy, Raman optical activity (ROA), circular dichroism (CD), and tip-enhanced Raman spectroscopy (TERS) to study the structure of the secreted polymeric gel-forming mucin MUC5B. ROA indicated that the protein backbone of MUC5B is dominated by unordered conformation, which was found to originate from the heavily glycosylated central mucin domain by isolation of MUC5B O-glycan-rich regions. In sharp contrast, recombinant proteins of the N-terminal region of MUC5B (D1-D2-D'-D3 domains, NT5B), C-terminal region of MUC5B (D4-B-C-CK domains, CT5B) and the Cys-domain (within the central mucin domain of MUC5B) were found to be dominated by the β-sheet. Using these findings, we employed TERS, which combines the chemical specificity of Raman spectroscopy with the spatial resolution of atomic force microscopy, to study the secondary structure along 90 nm of an individual MUC5B molecule. Interestingly, the molecule was found to contain a large amount of α-helix/unordered structures and many signatures of glycosylation, pointing to a highly O-glycosylated region on the mucin.
Subjects
Mucin-5B/chemistry, Glycosylation, Healthy Volunteers, Humans, Atomic Force Microscopy, Mucin-5B/isolation & purification, Protein Secondary Structure, Raman Spectrum Analysis
ABSTRACT
Infrared (IR) spectra contain substantial information about protein structure. This has previously most often been exploited by using known band assignments. Here, we convert spectral intensities in bins within Amide I and II regions to vectors and apply machine learning methods to determine protein secondary structure. Partial least squares was performed on spectra of 90 proteins in H₂O. After preprocessing and removal of outliers, 84 proteins were used for this work. Standard normal variate and second-derivative preprocessing methods on the combined Amide I and II data generally gave the best performance, with root-mean-square values for prediction of ∼12% for α-helix, ∼7% for β-sheet, 7% for antiparallel β-sheet, and ∼8% for other conformations. Analysis of Fourier transform infrared (FTIR) spectra of 16 proteins in D₂O showed that secondary structure determination was slightly poorer than in H₂O. Interval partial least squares was used to identify the critical regions within spectra for secondary structure prediction and showed that the sides of bands were most valuable, rather than their peak maxima. In conclusion, we have shown that multivariate analysis of protein FTIR spectra can give α-helix, β-sheet, other, and antiparallel β-sheet contents with good accuracy, comparable to that of circular dichroism, which is widely used for this purpose.
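Two of the quantities named in the abstract above, standard normal variate (SNV) preprocessing and the root-mean-square error of prediction, can be sketched in a few lines of Python. The spectrum bins and the helix percentages below are hypothetical, and the use of the sample standard deviation (n - 1 denominator) in SNV is an assumption, since the abstract does not state which convention was used.

```python
import math


def standard_normal_variate(spectrum):
    """Centre one spectrum on its own mean and scale by its own std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1); an assumed convention.
    variance = sum((x - mean) ** 2 for x in spectrum) / (n - 1)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("flat spectrum cannot be SNV-scaled")
    return [(x - mean) / std for x in spectrum]


def rmse(predicted, observed):
    """Root-mean-square error between predicted and reference values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical binned absorbances for one protein spectrum.
raw_bins = [0.12, 0.35, 0.80, 0.55, 0.20]
print([round(v, 3) for v in standard_normal_variate(raw_bins)])

# Hypothetical alpha-helix content (%): model prediction vs reference.
predicted_helix = [30.0, 45.0, 10.0, 62.0]
reference_helix = [25.0, 50.0, 22.0, 60.0]
print(f"RMSE = {rmse(predicted_helix, reference_helix):.1f}%")
```

SNV is applied to each spectrum independently, so it removes per-sample offset and scaling differences before any regression model sees the data.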
Subjects
Least-Squares Analysis, Protein Secondary Structure, Proteins/chemistry, Fourier Transform Infrared Spectroscopy/methods, Circular Dichroism
ABSTRACT
Inhibition of the aggregation of the monomeric peptide β-amyloid (Aβ) into oligomers is a widely studied therapeutic approach in Alzheimer's disease (AD). Many small molecules have been reported to work in this way, including 1,4-naphthoquinon-2-yl-L-tryptophan (NQ-Trp). NQ-Trp has been reported to inhibit aggregation, to rescue cells from Aβ toxicity, and to produce complete phenotypic recovery in an in vivo AD model. In this work we investigated its molecular mechanism by using a combined approach of experimental and theoretical studies, and obtained converging results. NQ-Trp is a relatively weak inhibitor, and the fluorescence data obtained by employing the fluorophore widely used to monitor aggregation into fibrils can be misinterpreted due to the inner filter effect. Simulations and NMR experiments showed that NQ-Trp has no specific "binding site"-type interaction with mono- and dimeric Aβ, which could explain its low inhibitory efficiency. This suggests that the reported anti-AD activity of NQ-Trp-type molecules in in vivo models has to involve another mechanism. This study has revealed the potential pitfalls in the development of aggregation inhibitors for amyloidogenic peptides, which are of general interest for all the molecules studied in the context of inhibiting the formation of toxic aggregates.
Subjects
Alzheimer Disease/drug therapy, Amyloid beta-Peptides/antagonists & inhibitors, Amyloid beta-Peptides/chemistry, Naphthoquinones/chemistry, Naphthoquinones/pharmacology, Peptide Fragments/antagonists & inhibitors, Peptide Fragments/chemistry, Tryptophan/analogs & derivatives, Humans, Magnetic Resonance Spectroscopy, Molecular Dynamics Simulation, Tryptophan/chemistry, Tryptophan/pharmacology
ABSTRACT
Accurate identification of drug targets is a crucial part of any drug development program. We mined the human proteome to discover properties of proteins that may be important in determining their suitability for pharmaceutical modulation. Data was gathered concerning each protein's sequence, post-translational modifications, secondary structure, germline variants, expression profile and drug target status. The data was then analysed to determine features for which the target and non-target proteins had significantly different values. This analysis was repeated for subsets of the proteome consisting of all G-protein coupled receptors, ion channels, kinases and proteases, as well as proteins that are implicated in cancer. Machine learning was used to quantify the proteins in each dataset in terms of their potential to serve as a drug target. This was accomplished by first inducing a random forest that could distinguish between its targets and non-targets, and then using the random forest to quantify the drug target likeness of the non-targets. The properties that can best differentiate targets from non-targets were primarily those that are directly related to a protein's sequence (e.g. secondary structure). Germline variants, expression levels and interactions between proteins had minimal discriminative power. Overall, the best indicators of drug target likeness were found to be the proteins' hydrophobicities, in vivo half-lives, propensity for being membrane bound and the fraction of non-polar amino acids in their sequences. In terms of predicting potential targets, datasets of proteases, ion channels and cancer proteins were able to induce random forests that were highly capable of distinguishing between targets and non-targets. The non-target proteins predicted to be targets by these random forests comprise the set of the most suitable potential future drug targets, and should therefore be prioritised when building a drug development programme.