1.
Proc Natl Acad Sci U S A ; 119(12): e2111405119, 2022 03 22.
Article in English | MEDLINE | ID: mdl-35294277

ABSTRACT

Significance: Our results demonstrate the existence of early cellular pathways and network alterations in oligodendrocytes in the alpha-synucleinopathies Parkinson's disease and multiple system atrophy. They further reveal the involvement of an immune component triggered by alpha-synuclein protein, as well as a connection between (epi)genetic changes and immune reactivity in multiple system atrophy. The knowledge generated in this study could be used to devise novel therapeutic approaches to treat synucleinopathies.


Subjects
Induced Pluripotent Stem Cells; Multiple System Atrophy; Parkinson Disease; Synucleinopathies; Humans; Induced Pluripotent Stem Cells/metabolism; Multiple System Atrophy/metabolism; Oligodendroglia/metabolism; Parkinson Disease/genetics; Parkinson Disease/metabolism; alpha-Synuclein/genetics; alpha-Synuclein/metabolism
2.
Curr Genomics ; 24(1): 18-23, 2023 Jun 23.
Article in English | MEDLINE | ID: mdl-37920730

ABSTRACT

Synonymous (also known as silent) variations are by definition not considered to change the coded protein. Still, many variations in this category affect either protein abundance or properties. As this situation is confusing, we have recently introduced systematics for synonymous variations and for those that may on the surface look synonymous but may affect the coded protein in various ways. A new category, unsense variation, was introduced to describe variants that do not introduce a stop codon into the variation site, but which lead to different types of changes in the coded protein. Many of these variations lead to mRNA degradation and missing protein. Here, consequences of the systematics are discussed from the perspectives of variation annotation and interpretation, evolutionary calculations, nonsynonymous-to-synonymous substitution rates, phylogenetics and other evolutionary inferences that are based on the principle of (nearly) neutral synonymous variations. It may be necessary to reassess published results. Further, databases for synonymous variations and prediction methods for such variations should consider unsense variations. Thus, there is a need to evaluate and reflect on the principles of numerous aspects of genetics, ranging from variation naming and classification to evolutionary calculations.

3.
Int J Mol Sci ; 24(16)2023 Aug 21.
Article in English | MEDLINE | ID: mdl-37629203

ABSTRACT

Most proteins fold into characteristic three-dimensional structures. The rate of folding and unfolding varies widely and can be affected by variations in proteins. We developed a novel machine-learning-based method for the prediction of the folding rate effects of amino acid substitutions in two-state folding proteins. We collected a data set of experimentally defined folding rates for variants and used them to train a gradient boosting algorithm starting with 1161 features. Two predictors were designed. The three-class classifier had, in blind tests, specificity and sensitivity ranging from 0.324 to 0.419 and from 0.256 to 0.451, respectively. The other tool was a regression predictor that showed a Pearson correlation coefficient of 0.525. The error measures, mean absolute error and mean squared error, were 0.581 and 0.603, respectively. One of the previously presented tools could be used for comparison with the blind test data set; our method, called PON-Fold, showed superior performance on all used measures. The applicability of the tool was tested by predicting all possible substitutions in a protein domain. Predictions for different conformations of proteins, open and closed forms of a protein kinase, and apo and holo forms of an enzyme indicated that the choice of the structure had a large impact on the outcome. PON-Fold is freely available.


Subjects
Algorithms; Protein Folding; Amino Acid Substitution; Correlation of Data
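The regression measures quoted in the abstract above (Pearson correlation coefficient, mean absolute error, mean squared error) can be illustrated with a minimal sketch. The function names and the toy values are hypothetical and are not taken from PON-Fold; the code only shows the standard definitions of the three measures.

```python
import math

def pearson(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (sd_o * sd_p)

def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# Toy values for illustration only (not real folding-rate data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(pearson(observed, predicted))
print(mean_absolute_error(observed, predicted))
print(mean_squared_error(observed, predicted))
```

A predictor can have a moderate correlation and still carry a sizeable absolute error, which is why the abstract reports both kinds of measure.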
4.
Int J Mol Sci ; 23(18)2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36142711

ABSTRACT

The stability of proteins is an essential property that has several biological implications. Knowledge about protein stability is important in many ways, ranging from protein purification and structure determination to stability in cells and biotechnological applications. Experimental determination of thermal stabilities has been tedious and available data have been limited. The introduction of limited proteolysis and mass spectrometry approaches has facilitated more extensive cellular protein stability data production. We collected melting temperature information for 34,913 proteins and developed a machine learning predictor, ProTstab2, by utilizing a gradient boosting algorithm after testing seven algorithms. The method performance was assessed on a blind test data set and showed a Pearson correlation coefficient of 0.753 and root mean square error of 7.005. Comparison to previous methods indicated that ProTstab2 had superior performance. The method is fast, so it was applied to predict and compare the stabilities of all proteins in human, mouse, and zebrafish proteomes for which experimental data were not determined. The tool is freely available.


Subjects
Proteome; Zebrafish; Algorithms; Animals; Humans; Machine Learning; Mice; Protein Stability
5.
Blood Cells Mol Dis ; 92: 102596, 2021 12.
Article in English | MEDLINE | ID: mdl-34547651

ABSTRACT

Chronic granulomatous disease (CGD) is an immunodeficiency disorder affecting about 1 in 250,000 individuals. CGD patients suffer from severe, recurrent bacterial and fungal infections. The disease is caused by mutations in the genes encoding the components of the leukocyte NADPH oxidase. This enzyme produces superoxide, which is subsequently metabolized to hydrogen peroxide and other reactive oxygen species (ROS). These products are essential for intracellular killing of pathogens by phagocytic leukocytes (neutrophils, eosinophils, monocytes and macrophages). The leukocyte NADPH oxidase is composed of five subunits, four of which are encoded by autosomal genes. These are CYBA, encoding p22phox, NCF1, encoding p47phox, NCF2, encoding p67phox and NCF4, encoding p40phox. This article lists all mutations identified in these genes in CGD patients. In addition, cytochrome b558 chaperone-1 (CYBC1), recently recognized as an essential chaperone protein for the expression of the X-linked NADPH oxidase component gp91phox (also called Nox2), is encoded by the autosomal gene CYBC1. Mutations in this gene also lead to CGD. Finally, RAC2, a small GTPase of the Rho family, is needed for activation of the NADPH oxidase, and mutations in the RAC2 gene therefore also induce CGD-like symptoms. Mutations in these last two genes are also listed in this article.


Subjects
Granulomatous Disease, Chronic/genetics; Mutation; Humans; NADPH Oxidases/genetics
6.
Rheumatology (Oxford) ; 61(1): 309-318, 2021 12 24.
Article in English | MEDLINE | ID: mdl-33784391

ABSTRACT

OBJECTIVES: SSc-associated pulmonary arterial hypertension (SSc-APAH) is a late but devastating complication of SSc. Early identification of SSc-APAH may improve survival. We examined the role of circulating miRNAs in SSc-APAH. METHODS: Using quantitative RT-PCR, the abundance of mature miRNAs in plasma was determined in 85 female patients with ACA-positive lcSSc. Twenty-two of the patients had SSc-APAH. Sixty-three SSc controls without PAH were matched for disease duration. Forty-six selected miRNA plasma levels were correlated with clinical data. Longitudinal samples were analysed from 14 SSc-APAH and 27 SSc patients. RESULTS: The disease duration was 12 years for the SSc-APAH patients and 12.7 years for the SSc controls. Plasma expression levels of 11 miRNAs were lower in patients with SSc-APAH. Four miRNAs displayed higher plasma levels in SSc-APAH patients compared with SSc controls. There was a significant difference between groups for miR-20a-5p and miR-203a-3p when correcting for multiple comparisons (P = 0.002 for both). Receiver operating characteristic curve analysis showed AUC = 0.69-0.83 for miR-21-5p and miR-20a-5p or their combination. miR-20a-5p and miR-203a-3p correlated inversely with N-terminal pro-brain natriuretic peptide (NT-proBNP) levels (r = -0.42 and -0.47). Mixed effect model analysis could not identify any miRNA as a predictor of PAH development. However, miR-20a-5p plasma levels were lower in the longitudinal samples of SSc-APAH patients than in the SSc controls. CONCLUSIONS: Our study links expression levels of the circulating plasma miRNAs, especially miR-20a-5p and miR-203a-3p, to the occurrence of SSc-APAH in female patients with ACA-positive lcSSc.


Subjects
Circulating MicroRNA/blood; Pulmonary Arterial Hypertension/metabolism; Scleroderma, Systemic/metabolism; Aged; Female; Humans; Middle Aged
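The AUC values reported in the abstract above summarize how well a single marker separates two groups. As a rough sketch, AUC can be computed as the fraction of (case, control) pairs in which the case receives the higher score, with ties counted as one half. The function name and the plasma levels below are invented for illustration; because the abstract reports lower miRNA levels in SSc-APAH, the sketch assumes the negated level is used as the score.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the (case, control) pairs in which the case scores higher;
    ties contribute one half.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical plasma levels (arbitrary units), not data from the study.
case_levels = [1.0, 2.0, 3.0]
control_levels = [2.5, 4.0, 5.0]

# Lower level suggests disease, so negate the levels to obtain scores.
case_scores = [-x for x in case_levels]
control_scores = [-x for x in control_levels]

print(auc(case_scores, control_scores))
```

An AUC of 0.5 corresponds to no separation between the groups and 1.0 to perfect separation.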
7.
RNA Biol ; 18(4): 481-498, 2021 04.
Article in English | MEDLINE | ID: mdl-32951567

ABSTRACT

Systematics is described for annotation of variations in RNA molecules. The conceptual framework is part of Variation Ontology (VariO) and facilitates depiction of types of variations, their functional and structural effects and other consequences in any RNA molecule in any organism. There are more than 150 RNA related VariO terms in seven levels, which can be further combined to generate even more complicated and detailed annotations. The terms are described together with examples, usually for variations and effects in humans and in diseases. RNA variation type has two subcategories: variation classification and origin with subterms. Altogether six terms are available for function description. Several terms are available for affected RNA properties. The ontology also contains terms for structural description for affected RNA type, post-transcriptional RNA modifications, secondary and tertiary structure effects and RNA sugar variations. Together with the DNA and protein concepts and annotations, RNA terms allow comprehensive description of variations of genetic and non-genetic origin at all possible levels. The VariO annotations are readable both for humans and computer programs for advanced data integration and mining.


Subjects
Genetic Variation; Genomics/methods; RNA/genetics; Animals; Computational Biology; Databases, Genetic; Gene Ontology; Humans; Software
8.
Int J Mol Sci ; 22(15)2021 Jul 27.
Article in English | MEDLINE | ID: mdl-34360790

ABSTRACT

Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein-solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made soluble again by dilution. Solubility is a central protein property and when reduced can lead to diseases. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on the protein solubility. The method is a machine learning tool utilizing a gradient boosting algorithm and was trained on a large dataset of variants with different outcomes after the selection of features among a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior in comparison to previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.


Subjects
Amino Acid Substitution; Sequence Analysis, Protein; Software; Predictive Value of Tests; Recombinant Proteins/chemistry; Recombinant Proteins/genetics; Solubility
9.
PLoS Comput Biol ; 15(2): e1006481, 2019 02.
Article in English | MEDLINE | ID: mdl-30742610

ABSTRACT

Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of these tools is critical for reliable variant impact interpretation for precision medicine and should be based on systematic performance assessment. The performance of the methods varies widely in different performance assessments, for example due to the contents and sizes of test datasets. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, the capability to detect benign variants, for 10 variant interpretation tools. In addition to overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the best performance (95.5%) followed by FATHMM (86.4%) and VEST (83.5%). While these tools had excellent performance, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow choosing reliable methods for benign variant interpretation, for both research and clinical purposes, as well as provide a benchmark for method developers.


Subjects
Computational Biology/methods; Forecasting/methods; Sequence Analysis, DNA/methods; Amino Acid Substitution/genetics; Databases, Genetic; Exome; Gene Frequency/genetics; Genetic Variation; Humans; Sensitivity and Specificity; Virulence
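Specificity, as used in the abstract above, is the share of benign variants that a tool correctly calls benign: TN / (TN + FP). When the test set consists only of variants assumed to be benign (here, common variants), it reduces to the fraction of benign calls. The sketch below uses a hypothetical function name and invented labels, and it assumes every variant receives a call; how unclassified variants were handled in the study is not stated in the abstract.

```python
def specificity(predictions, benign_label="benign"):
    """Specificity on a test set in which every variant is truly benign.

    True negatives are benign calls; any other call counts as a false positive.
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    true_negatives = sum(1 for p in predictions if p == benign_label)
    return true_negatives / len(predictions)

# Invented calls from a hypothetical tool on ten benign variants.
calls = ["benign"] * 8 + ["pathogenic"] * 2

print(specificity(calls))  # 0.8
```

A tool that labels more than one third of such variants as disease-causing would, by this definition, have a specificity below about 0.67.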
10.
Nucleic Acids Res ; 46(9): 4649-4661, 2018 05 18.
Article in English | MEDLINE | ID: mdl-29294068

ABSTRACT

The phage Mu DNA transposition system provides a versatile species non-specific tool for molecular biology, genetic engineering and genome modification applications. Mu transposition is catalyzed by MuA transposase, with DNA cleavage and integration reactions ultimately attaching the transposon DNA to target DNA. To improve the activity of the Mu DNA transposition machinery, we mutagenized MuA protein and screened for hyperactivity-causing substitutions using an in vivo assay. The individual activity-enhancing substitutions were mapped onto the MuA-DNA complex structure, containing a tetramer of MuA transposase, two Mu end segments and a target DNA. This analysis, combined with the varying effect of the mutations in different assays, implied that the mutations exert their effects in several ways, including optimizing protein-protein and protein-DNA contacts. Based on these insights, we engineered highly hyperactive versions of MuA, by combining several synergistically acting substitutions located in different subdomains of the protein. Purified hyperactive MuA variants are now ready for use as second-generation tools in a variety of Mu-based DNA transposition applications. These variants will also widen the scope of Mu-based gene transfer technologies toward medical applications such as human gene therapy. Moreover, the work provides a platform for further design of custom transposases.


Subjects
DNA Transposable Elements; Transposases/genetics; Transposases/metabolism; Amino Acid Substitution; Animals; Cells, Cultured; Genetic Engineering; Genome; Mice; Models, Molecular; Mutation; Transposases/chemistry; Transposases/isolation & purification
11.
Hum Mutat ; 40(10): 1634-1640, 2019 10.
Article in English | MEDLINE | ID: mdl-31347738

ABSTRACT

Databases with variant and phenotype information are essential for advancing research and improving the health and welfare of individuals. These resources require data to be collected, curated, and shared among relevant specialties to maximize impact. The increasing generation of data which must be shared both nationally and globally for maximal effect presents important ethical and privacy concerns. Database curators need to ensure that their work conforms to acceptable ethical standards. A Working Group of the Human Variome Project had the task of updating and streamlining ethical guidelines for locus-specific/gene variant database curators. In this article, we present practical and achievable steps which should assist database curators in carrying out their responsibilities within acceptable ethical norms.


Subjects
Checklist; Computational Biology; Data Management; Databases, Genetic; Genetic Markers; Genetic Predisposition to Disease; Genetic Variation; Computational Biology/methods; Data Management/ethics; Ethics, Medical; Genetic Association Studies; Humans
12.
Hum Mutat ; 40(9): 1530-1545, 2019 09.
Article in English | MEDLINE | ID: mdl-31301157

ABSTRACT

Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.


Subjects
Amino Acid Substitution; Computational Biology/methods; Cystathionine beta-Synthase/genetics; Cystathionine/metabolism; Cystathionine beta-Synthase/metabolism; Homocysteine/metabolism; Humans; Phenotype; Precision Medicine
13.
BMC Genomics ; 20(Suppl 8): 547, 2019 Jul 16.
Article in English | MEDLINE | ID: mdl-31307390

ABSTRACT

BACKGROUND: Membrane proteins constitute up to 30% of the human proteome. These proteins have special properties because the transmembrane segments are embedded in the lipid bilayer while extramembranous parts are in different environments. Membrane proteins have several functions and are involved in numerous diseases. A large number of prediction methods have been introduced to predict protein subcellular localization as well as the tolerance or pathogenicity of amino acid substitutions. RESULTS: We tested the performance of 22 tolerance predictors by collecting information on membrane proteins and variants in them. The analysis indicated that the best tools had similar prediction performance on transmembrane, inside and outside regions of transmembrane proteins, comparable to their overall prediction performance for all types of proteins. PON-P2 had the highest performance followed by REVEL, MetaSVM and VEST3. Further, we also tested, with the high-quality dataset, the performance of seven subcellular localization predictors on membrane proteins. We assessed separately the performance for single pass and multi pass membrane proteins. Predictions for multi pass proteins were more reliable than those for single pass proteins. CONCLUSIONS: The predictors for variant effects had better performance than subcellular localization tools. The best tolerance predictors are highly reliable. As there are large differences in the performances of tools, end-users have to be cautious in method selection.


Subjects
Computational Biology/methods; Genetic Variation; Intracellular Space/metabolism; Membrane Proteins/genetics; Membrane Proteins/metabolism; Amino Acid Substitution; Benchmarking; Protein Transport
14.
BMC Genomics ; 20(1): 804, 2019 Nov 04.
Article in English | MEDLINE | ID: mdl-31684883

ABSTRACT

BACKGROUND: Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties does not keep pace with the increase in new sequence data and therefore even basic properties are not known for the vast majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples. RESULTS: We took advantage of results from a recently developed cellular stability method, which is based on limited proteolysis and mass spectrometry, and developed a machine learning method using gradient boosting of regression trees. The ProTstab method has high performance and is well suited for large scale prediction of protein stabilities. CONCLUSIONS: The Pearson's correlation coefficient was 0.793 in 10-fold cross validation and 0.763 in independent blind test. The corresponding values for mean absolute error are 0.024 and 0.036, respectively. Comparison with a previously published method indicated ProTstab to have superior performance. We used the method to predict stabilities of all the remaining proteins in the entire human proteome and then correlated the predicted stabilities to protein chain lengths of isoforms and to localizations of proteins.


Subjects
Cells/metabolism; Computational Biology/methods; Proteome/chemistry; Proteome/metabolism; Humans; Protein Isoforms/chemistry; Protein Isoforms/metabolism; Protein Stability
15.
Nucleic Acids Res ; 45(D1): D846-D853, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27924022

ABSTRACT

FINDbase (http://www.findbase.org) is a comprehensive data repository that records the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants leading mostly to monogenic disorders and pharmacogenomics biomarkers. The database also records the incidence of rare genetic diseases in various populations, all in clearly distinct data modules. Here, we report extensive data content updates in all data modules, with direct implications for clinical pharmacogenomics. Also, we report significant new developments in FINDbase, namely (i) the release of a new version of the ETHNOS software that catalyzes development and curation of national/ethnic genetic databases, (ii) the migration of all FINDbase data content into 90 distinct national/ethnic mutation databases, all built around Microsoft's PivotViewer (http://www.getpivot.com) software, (iii) new data visualization tools and (iv) the interrelation of FINDbase with DruGeVar database with direct implications for clinical pharmacogenomics. The abovementioned updates further enhance the impact of FINDbase as a key resource for Genomic Medicine applications.


Subjects
Alleles; Databases, Genetic; Gene Frequency; Genetic Variation; Genomics/methods; Genetic Predisposition to Disease; Humans; Pharmacogenetics; Software; Web Browser
16.
BMC Bioinformatics ; 19(1): 461, 2018 Nov 29.
Article in English | MEDLINE | ID: mdl-30497376

ABSTRACT

BACKGROUND: Benchmark datasets are essential for both method development and performance assessment. These datasets have numerous requirements, representativeness being one. In the case of variant tolerance/pathogenicity prediction, representativeness means that the dataset covers the space of variations and their effects. RESULTS: We performed the first analysis of the representativeness of variation benchmark datasets. We used statistical approaches to investigate how representative the proteins in the benchmark datasets were of the entire human protein universe. We investigated the distributions of variants in chromosomes, protein structures, CATH domains and classes, Pfam protein families, Enzyme Commission (EC) classifications and Gene Ontology annotations in 24 datasets that have been used for training and testing variant tolerance prediction methods. All the datasets were available in VariBench or VariSNP databases. We also tested whether the pathogenic variant datasets contained neutral variants defined as those that have high minor allele frequency in the ExAC database. The distributions of variants over the chromosomes and proteins varied greatly between the datasets. CONCLUSIONS: None of the datasets was found to be well representative. Many of the tested datasets had quite good coverage of the different protein characteristics. Dataset size correlates with representativeness but only weakly with the performance of methods trained on them. The results imply that dataset representativeness is an important factor and should be taken into account in predictor development and testing.


Subjects
Benchmarking; Databases as Topic; Chromosomes/genetics; Databases, Protein; Gene Frequency; Gene Ontology; Genetic Variation; Humans; Molecular Sequence Annotation; Protein Domains; Proteins/chemistry
17.
BMC Genomics ; 19(1): 974, 2018 Dec 28.
Article in English | MEDLINE | ID: mdl-30591019

ABSTRACT

BACKGROUND: Numerous different types of variations can occur in DNA and have diverse effects and consequences. The Variation Ontology (VariO) was developed for systematic descriptions of variations and their effects at DNA, RNA and protein levels. RESULTS: VariO use and terms for DNA variations are described in depth. VariO provides systematic names for variation types and detailed descriptions for changes in DNA function, structure and properties. The principles of VariO are presented along with examples from published articles or databases, most often in relation to human diseases. VariO terms describe local DNA changes, chromosome number and structure variants, chromatin alterations, as well as genomic changes, whether of genetic or non-genetic origin. CONCLUSIONS: DNA variation systematics facilitates unambiguous descriptions of variations and their effects and further reuse and integration of data from different sources by both humans and computers.


Subjects
DNA/genetics; Databases, Genetic; Genetic Variation; Computational Biology; Genomics; Humans; Software
18.
Trends Genet ; 31(8): 423-5, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26091961

ABSTRACT

A critical aspect of science is the clear communication of complicated matters. However, language is often ambiguous, and the message can get lost in the telling. In particular, genetic terms can have different meanings for different people. Here, I discuss this problem and suggest remedies to clarify the message.


Subjects
Genetics; Terminology as Topic; Animals; Base Sequence; Genes; Humans; Mutation; Proteins/genetics
19.
Nucleic Acids Res ; 44(5): 2020-7, 2016 Mar 18.
Article in English | MEDLINE | ID: mdl-26843426

ABSTRACT

Transfer RNAs (tRNAs) are essential for encoding the transcribed genetic information from DNA into proteins. Variations in the human tRNAs are involved in diverse clinical phenotypes. Interestingly, all pathogenic variations in tRNAs are located in mitochondrial tRNAs (mt-tRNAs). Therefore, it is crucial to identify pathogenic variations in mt-tRNAs for disease diagnosis and proper treatment. We collected mt-tRNA variations using a classification based on evidence from several sources and used the data to develop a multifactorial probability-based prediction method, PON-mt-tRNA, for classification of mt-tRNA single nucleotide substitutions. We integrated a machine learning-based predictor and an evidence-based likelihood ratio for pathogenicity using evidence of segregation, biochemistry and histochemistry to predict the posterior probability of pathogenicity of variants. The accuracy and Matthews correlation coefficient (MCC) of PON-mt-tRNA are 1.00 and 0.99, respectively. In the absence of evidence from segregation, biochemistry and histochemistry, PON-mt-tRNA classifies variations based on the machine learning method with an accuracy and MCC of 0.69 and 0.39, respectively. We classified all possible single nucleotide substitutions in all human mt-tRNAs using PON-mt-tRNA. The variations in the loops are more often tolerated compared to the variations in stems. The anticodon loop contains comparatively more predicted pathogenic variations than the other loops. PON-mt-tRNA is available at http://structure.bmc.lu.se/PON-mt-tRNA/.


Subjects
Anticodon/chemistry; Mitochondria/genetics; Models, Statistical; RNA, Transfer/chemistry; RNA/chemistry; Anticodon/metabolism; Humans; Machine Learning; Mitochondria/metabolism; Mitochondria/pathology; Models, Genetic; Models, Molecular; Nucleic Acid Conformation; Polymorphism, Single Nucleotide; RNA/metabolism; RNA, Mitochondrial; RNA, Transfer/metabolism
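The abstract above describes combining a machine learning prediction with an evidence-based likelihood ratio to obtain a posterior probability of pathogenicity, and it reports performance as accuracy and Matthews correlation coefficient (MCC). The abstract does not give the formula used by PON-mt-tRNA, so the sketch below is only an assumption: it applies the odds form of Bayes' rule, treating the machine learning probability as the prior. All function names and numbers are hypothetical. The MCC function follows the standard definition.

```python
import math

def posterior_probability(prior, likelihood_ratio):
    """Odds-form Bayes update: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0.0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator

# Hypothetical inputs: an ML probability of 0.5 and supporting evidence with LR = 9.
print(posterior_probability(0.5, 9.0))

# Hypothetical confusion matrix: 40 TP, 45 TN, 5 FP, 10 FN.
print(matthews_corrcoef(40, 45, 5, 10))
```

With a likelihood ratio of 1 (no informative evidence) the posterior equals the prior, which mirrors the abstract's statement that classification falls back on the machine learning method when segregation, biochemistry and histochemistry evidence is absent.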
20.
Int J Mol Sci ; 19(4)2018 Mar 28.
Article in English | MEDLINE | ID: mdl-29597263

ABSTRACT

Several methods have been developed to predict effects of amino acid substitutions on protein stability. Benchmark datasets are essential for method training and testing and have numerous requirements, including that the data are representative of the investigated phenomenon. Available machine learning algorithms for variant stability have all been trained with ProTherm data. We noticed a number of issues with the contents, quality and relevance of the database. There were errors, but also features that had not been clearly communicated. Consequently, all machine learning variant stability predictors have been trained on biased and incorrect data. We obtained a corrected dataset and trained a random forests-based tool, PON-tstab, applicable to variants in any organism. Our results highlight the importance of the benchmark quality, suitability and appropriateness. Predictions are provided for three categories: stability decreasing, increasing and those not affecting stability.


Subjects
Databases, Protein; Machine Learning; Models, Molecular; Proteins/chemistry; Protein Stability; Proteins/genetics