Results 1 - 11 of 11
1.
Nutrients ; 15(4), 2023 Feb 04.
Article in English | MEDLINE | ID: mdl-36839168

ABSTRACT

Circulating fatty acid composition is assumed to play an important role in metabolic dysfunction-associated fatty liver disease (MAFLD) pathogenesis. This study aimed to investigate the association between the overall balance of serum fatty acid composition and MAFLD prevalence. This cross-sectional study involved 400 Japanese individuals recruited from a health-screening program. We measured fatty acids in serum lipids using gas chromatography-mass spectrometry. The serum fatty acid composition balance was evaluated using fuzzy c-means clustering, which assigns individual data points to multiple clusters and calculates the percentage of data points belonging to multiple clusters, and serum fatty acid mass%. The participants were classified into four characteristic subclasses (i.e., Clusters 1, 2, 3, and 4), and the specific serum fatty acid composition balance (i.e., Cluster 4) was associated with a higher MAFLD prevalence. We suggest that the fuzzy c-means method can be used to determine the circulating fatty acid composition balance and highlight the importance of focusing on this balance when examining the relationship between MAFLD and serum fatty acids.


Subject(s)
Fatty Acids , Non-alcoholic Fatty Liver Disease , Humans , Cross-Sectional Studies , Cluster Analysis , Gas Chromatography-Mass Spectrometry
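
The abstract above relies on fuzzy c-means clustering, in which every sample receives a graded membership in each cluster instead of a single hard label. The following is a minimal sketch of the standard algorithm, not the authors' pipeline: the fuzzifier `m = 2.0`, the iteration count, the initial centers, and the two-dimensional toy points are all assumptions chosen for illustration and are not fatty acid measurements from the study.

```python
import math

def memberships(points, centers, m=2.0):
    """Return u, where u[i][k] is the degree to which point i belongs to cluster k."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            # Standard update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
            row = [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]
        u.append(row)
    return u

def update_centers(points, u, m=2.0):
    """Recompute each center as the mean of all points weighted by u_ik^m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers

def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

# Hypothetical toy data: two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_centers, toy_u = fuzzy_c_means(toy_points, [(0.0, 0.0), (10.0, 10.0)])
for point, row in zip(toy_points, toy_u):
    print(point, [round(value, 3) for value in row])
```

Each row of the membership matrix sums to 1, so a row can be read directly as the percentage of a sample assigned to each cluster.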
2.
Front Artif Intell ; 5: 924688, 2022.
Article in English | MEDLINE | ID: mdl-36304959

ABSTRACT

Attention mechanisms are among the most frequently used architectures in the development of artificial intelligence because they can process contextual information efficiently. Various artificial intelligence architectures, such as the Transformer for processing natural language and image data, include Attention. Because Attention is a powerful component for realizing artificial intelligence, various improvements have been made to enhance its performance. The time complexity of Attention depends on the square of the input sequence length, and developing methods to improve this time complexity is one of the most popular research topics. Attention is a mechanism that conveys contextual information of input sequences to downstream networks. Thus, if one wants to improve the performance of processing contextual information, the focus should not be confined to improving Attention but should also extend to devising other similar mechanisms as possible alternatives. In this study, we devised an alternative mechanism called "Relation" that can understand the context information of sequential data. Relation is easy to implement, and its time complexity depends only on the length of the sequences; a comparison of the performance of Relation and Attention on several benchmark datasets showed that the context processing capability of Relation is comparable to that of Attention but with less computation time. Processing contextual information at high speeds would be useful because natural language processing and biological sequence processing sometimes deal with very long sequences. Hence, Relation is an ideal option for processing context information.
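
The quadratic cost mentioned in this abstract can be made concrete with a small sketch of scaled dot-product self-attention. It is an illustration under simplifying assumptions (identity projections instead of learned query, key, and value matrices, and a single head) and it does not implement the "Relation" mechanism, whose details are not given in the abstract. The function name and the score counter are hypothetical.

```python
import math

def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Returns the attended vectors and the number of pairwise scores computed,
    which grows with the square of the sequence length.
    """
    d = len(x[0])
    outputs = []
    n_scores = 0
    for query in x:
        # One score per (query, key) pair: n scores for each of the n queries.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in x]
        n_scores += len(scores)
        # Numerically stable softmax over the scores for this query.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        # Output is the softmax-weighted average of the value vectors (here, x itself).
        outputs.append([
            sum(w / norm * value[j] for w, value in zip(weights, x))
            for j in range(d)
        ])
    return outputs, n_scores

for length in (4, 8, 16):
    tokens = [[float(i % 3), 1.0] for i in range(length)]
    _, count = self_attention(tokens)
    print(length, count)
```

Doubling the sequence length quadruples the number of scores, which is the scaling behavior that linear-time alternatives aim to avoid.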

3.
Brief Bioinform ; 20(4): 1160-1166, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968734

ABSTRACT

This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophistication of algorithms is necessary but not sufficient. Intuitive and interactive tools for experimental biologists to semiautomatically handle large data are becoming important. We are working on development of MAFFT toward these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.


Subject(s)
Sequence Alignment/methods , Software , Algorithms , Computational Biology/methods , Databases, Genetic , Internet , Sequence Alignment/statistics & numerical data , Sequence Analysis , User-Computer Interface
4.
BMC Bioinformatics ; 19(1): 272, 2018 07 18.
Article in English | MEDLINE | ID: mdl-30021530

ABSTRACT

BACKGROUND: Long short-term memory (LSTM) is one of the most attractive deep learning methods to learn time series or contexts of input data. Increasing studies, including biological sequence analyses in bioinformatics, utilize this architecture. Amino acid sequence profiles are widely used for bioinformatics studies, such as sequence similarity searches, multiple alignments, and evolutionary analyses. Currently, many biological sequences are becoming available, and the rapidly increasing amount of sequence data emphasizes the importance of scalable generators of amino acid sequence profiles. RESULTS: We employed the LSTM network and developed a novel profile generator to construct profiles without any assumptions, except for input sequence context. Our method could generate better profiles than existing de novo profile generators, including CSBuild and RPS-BLAST, on the basis of profile-sequence similarity search performance with linear calculation costs against input sequence size. In addition, we analyzed the effects of the memory power of LSTM and found that LSTM had high potential power to detect long-range interactions between amino acids, as in the case of beta-strand formation, which has been a difficult problem in protein bioinformatics using sequence information. CONCLUSION: We demonstrated the importance of sequence context and the feasibility of LSTM on biological sequence analyses. Our results demonstrated the effectiveness of memories in LSTM and showed that our de novo profile generator, SPBuild, achieved higher performance than that of existing methods for profile prediction of beta-strands, where long-range interactions of amino acids are important and are known to be difficult for the existing window-based prediction methods. Our findings will be useful for the development of other prediction methods related to biological sequences by machine learning methods.


Subject(s)
Computational Biology/methods , Deep Learning , Neural Networks, Computer , Amino Acid Sequence , Databases as Topic , Proteins/chemistry , ROC Curve , Sequence Homology, Amino Acid
5.
Bioinformatics ; 34(14): 2490-2492, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29506019

ABSTRACT

Summary: We report an update for the MAFFT multiple sequence alignment program to enable parallel calculation of large numbers of sequences. The G-INS-1 option of MAFFT was recently reported to have higher accuracy than other methods for large data, but this method has been impractical for most large-scale analyses, due to the requirement of large computational resources. We introduce a scalable variant, G-large-INS-1, which has equivalent accuracy to G-INS-1 and is applicable to 50 000 or more sequences. Availability and implementation: This feature is available in MAFFT versions 7.355 or later at https://mafft.cbrc.jp/alignment/software/mpi.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Protein Structure, Secondary , Sequence Analysis, Protein/methods , Sequence Analysis, RNA/methods
6.
Algorithms Mol Biol ; 13: 5, 2018.
Article in English | MEDLINE | ID: mdl-29467815

ABSTRACT

BACKGROUND: A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. By definition, these functions cannot capture nonlinear relationships between profiles. Therefore, we attempted to discover a novel scoring function, which was more suitable for the profile-comparison method than existing functions, using neural networks. RESULTS: Although neural networks required derivative-of-cost functions, the problem being addressed in this study lacked them. Therefore, we implemented a novel derivative-free neural network by combining a conventional neural network with an evolutionary strategy optimization method used as a solver. Using this novel neural network system, we optimized the scoring function to align remote sequence pairs. Our results showed that the pairwise-profile aligner using the novel scoring function significantly improved both alignment sensitivity and precision relative to aligners using existing functions. CONCLUSIONS: We developed and implemented a novel derivative-free neural network and aligner (Nepal) for optimizing sequence alignments. Nepal improved alignment quality by adapting to remote sequence alignments and increasing the expressiveness of similarity scores. Additionally, this novel scoring function can be realized using a simple matrix operation and easily incorporated into other aligners. Moreover, our scoring function could potentially improve the performance of homology detection and/or multiple-sequence alignment of remote homologous sequences. The goal of the study was to provide a novel scoring function for profile alignment methods and to develop a novel learning system capable of addressing derivative-free problems. Our system is capable of optimizing the performance of other sophisticated methods and solving problems without derivative-of-cost functions, which do not always exist in practical problems. Our results demonstrated the usefulness of this optimization method for derivative-free problems.
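
The two conventional scoring functions named in this abstract, cosine similarity and the correlation coefficient, can be sketched for a pair of profile columns as follows. This is only an illustration of the baseline functions, not of Nepal's learned scoring function. The function names are hypothetical, the correlation is assumed to be the Pearson coefficient, and the short vectors stand in for the 20-dimensional PSSM columns a real aligner would compare.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two profile columns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumes neither column is all zeros; a real aligner would guard this case.
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of mean-centered columns."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    # Assumes neither column is constant, which would make the denominator zero.
    return cosine_similarity(centered_a, centered_b)

# Hypothetical columns: the second is the first shifted by a constant offset.
column_a = [1.0, 2.0, 3.0, 4.0]
column_b = [11.0, 12.0, 13.0, 14.0]
print(cosine_similarity(column_a, column_b))
print(pearson_correlation(column_a, column_b))
```

Both functions are built from dot products, so they respond only to linear agreement between columns, which is the limitation the abstract points to when it says they cannot capture nonlinear relationships between profiles.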

7.
Acta Crystallogr D Struct Biol ; 73(Pt 9): 757-766, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28876239

ABSTRACT

An alternative rational approach to improve protein crystals by using single-site mutation of surface residues is proposed based on the results of a statistical analysis using a compiled data set of 918 independent crystal structures, thereby reflecting not only the entropic effect but also other effects upon protein crystallization. This analysis reveals a clear difference in the crystal-packing propensity of amino acids depending on the secondary-structural class. To verify this result, a systematic crystallization experiment was performed with the biotin carboxyl carrier protein from Pyrococcus horikoshii OT3 (PhBCCP). Six single-site mutations were examined: Ala138 on the surface of a β-sheet was mutated to Ile, Tyr, Arg, Gln, Val and Lys. In agreement with prediction, it was observed that the two mutants (A138I and A138Y) harbouring the residues with the highest crystal-packing propensities for β-sheet at position 138 provided better crystallization scores relative to those of other constructs, including the wild type, and that the crystal-packing propensity for β-sheet provided the best correlation with the ratio of obtaining crystals. Two new crystal forms of these mutants were obtained that diffracted to high resolution, generating novel packing interfaces with the mutated residues (Ile/Tyr). The mutations introduced did not affect the overall structures, indicating that a β-sheet can accommodate a successful mutation if it is carefully selected so as to avoid intramolecular steric hindrance. A significant negative correlation between the ratio of obtaining amorphous precipitate and the crystal-packing propensity was also found.


Subject(s)
Acetyl-CoA Carboxylase/chemistry , Archaeal Proteins/chemistry , Pyrococcus horikoshii/chemistry , Acetyl-CoA Carboxylase/genetics , Amino Acids/chemistry , Amino Acids/genetics , Archaeal Proteins/genetics , Crystallography, X-Ray , Fatty Acid Synthase, Type II/chemistry , Fatty Acid Synthase, Type II/genetics , Models, Molecular , Mutagenesis, Site-Directed , Protein Conformation , Protein Structure, Secondary , Pyrococcus horikoshii/genetics
8.
BMC Bioinformatics ; 18(1): 289, 2017 Jun 02.
Article in English | MEDLINE | ID: mdl-28578658

ABSTRACT

BACKGROUND: N-terminal acetylation is one of the most common protein modifications in eukaryotes and occurs co-translationally when the N-terminus of the nascent polypeptide is still attached to the ribosome. This modification has been shown to be involved in a wide range of biological phenomena such as protein half-life regulation, protein-protein and protein-membrane interactions, and protein subcellular localization. Thus, accurately predicting which proteins receive an acetyl group based on their protein sequence is expected to facilitate the functional study of this modification. As the occurrence of N-terminal acetylation strongly depends on the context of protein sequences, attempts to understand the sequence determinants of N-terminal acetylation were conducted initially by simply examining the N-terminal sequences of many acetylated and unacetylated proteins and more recently by machine learning approaches. However, a complete understanding of the sequence determinants of this modification remains to be elucidated. RESULTS: We obtained curated N-terminally acetylated and unacetylated sequences from the UniProt database and employed a decision tree algorithm to identify the sequence determinants of N-terminal acetylation for proteins whose initiator methionine (iMet) residues have been removed. The results suggested that the main determinants of N-terminal acetylation are contained within the first five residues following iMet and that the first and second positions are the most important discriminator for the occurrence of this phenomenon. The results also indicated the existence of position-specific preferred and inhibitory residues that determine the occurrence of N-terminal acetylation. The developed predictor software, termed NT-AcPredictor, accurately predicted the N-terminal acetylation, with an overall performance comparable or superior to those of preceding predictors incorporating machine learning algorithms. 
CONCLUSION: Our machine learning approach based on a decision tree algorithm successfully provided several sequence determinants of N-terminal acetylation for proteins lacking iMet, some of which have not previously been described. Although these sequence determinants remain insufficient to comprehensively predict the occurrence of this modification, indicating that further work on this topic is still required, the developed predictor, NT-AcPredictor, can be used to predict N-terminal acetylation with an accuracy of more than 80%.


Subject(s)
Algorithms , Proteins/metabolism , Acetylation , Amino Acid Sequence , Databases, Factual , Proteins/chemistry , Static Electricity
9.
Biophys Physicobiol ; 13: 157-163, 2016.
Article in English | MEDLINE | ID: mdl-27924270

ABSTRACT

Functional sites on proteins play an important role in various molecular interactions and reactions between proteins and other molecules. Thus, mutations in functional sites can severely affect the overall phenotype. Progress of genome sequencing projects has yielded a wealth of information on single nucleotide variants (SNVs), especially those with less than 1% minor allele frequency (rare variants). To understand the functional influence of genetic variants at a protein level, we investigated the relationship between SNVs and protein functional sites in terms of minor allele frequency and the structural position of variants. As a result, we observed that SNVs were less abundant at ligand binding sites, which is consistent with a previous study on SNVs and protein interaction sites. Additionally, we found that non-rare variants tended to be located slightly apart from enzyme active sites. Examination of non-rare variants revealed that most of the mutations resulted in moderate changes of the physico-chemical properties of amino acids, suggesting the existence of functional constraints. In conclusion, this study shows that the mapping of genetic variants on protein structures could be a powerful approach to evaluate the functional impact of rare genetic variations.

10.
Bioinformatics ; 32(21): 3246-3251, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27378296

ABSTRACT

MOTIVATION: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. RESULTS: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. AVAILABILITY AND IMPLEMENTATION: http://mafft.cbrc.jp/alignment/software/ CONTACT: katoh@ifrec.osaka-u.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Sequence Alignment , Software , Proteins
11.
J Struct Funct Genomics ; 17(4): 147-154, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28083762

ABSTRACT

Protein database search for public databases is a fundamental step in the target selection of proteins in structural and functional genomics and also for inferring protein structure, function, and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pairs. The choice of substitution matrix strongly affects homology detection performance. We earlier proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Herein we further evaluate MIQS in combination with LAST, a heuristic and fast database search tool with a tunable sensitivity parameter m, where larger m denotes higher sensitivity. Results show that MIQS substantially improves the homology detection and alignment quality performance of LAST across diverse m parameters. Against a protein database consisting of approximately 15 million sequences, LAST with m = 10^5 achieves better homology detection performance than BLASTP, and completes the search 20 times faster. Compared to the most sensitive existing methods being used today, CS-BLAST and SSEARCH, LAST with MIQS and m = 10^6 shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. Results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search.


Subject(s)
Computer Heuristics , Databases, Protein , Sequence Analysis, Protein , Algorithms , Computational Biology , Models, Molecular , Proteins/chemistry , Sequence Alignment
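
As a rough illustration of how a substitution matrix scores amino acid pairs, the sketch below sums per-column scores over an already aligned pair of sequences. Everything in it is an assumption made for the example: the three-residue matrix holds illustrative values and is not MIQS, the flat per-position gap penalty is a simplification of the affine penalties that search tools typically use, and the function name is hypothetical.

```python
# Illustrative symmetric scores for three residues only (not MIQS values).
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "G"): 0, ("A", "W"): -3,
    ("G", "G"): 6, ("G", "W"): -2,
    ("W", "W"): 11,
}

def alignment_score(seq_a, seq_b, matrix, gap_penalty=-4):
    """Score two aligned sequences of equal length; '-' marks a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            # Flat penalty per gapped column (a simplifying assumption).
            score += gap_penalty
        else:
            # The matrix is symmetric, so look up the pair in either order.
            key = (a, b) if (a, b) in matrix else (b, a)
            score += matrix[key]
    return score

print(alignment_score("AGW", "AGW", TOY_MATRIX))
print(alignment_score("AG-", "AWW", TOY_MATRIX))
```

Replacing the matrix changes every pairwise score and therefore the ranking of candidate hits, which is why the choice of matrix affects homology detection performance.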