Results 1 - 11 of 11
1.
Nutrients ; 15(4), 2023 Feb 04.
Article in English | MEDLINE | ID: mdl-36839168

ABSTRACT

Circulating fatty acid composition is thought to play an important role in the pathogenesis of metabolic dysfunction-associated fatty liver disease (MAFLD). This study aimed to investigate the association between the overall balance of serum fatty acid composition and MAFLD prevalence. This cross-sectional study involved 400 Japanese individuals recruited from a health-screening program. We measured fatty acids in serum lipids using gas chromatography-mass spectrometry. The serum fatty acid composition balance was evaluated by applying fuzzy c-means clustering to serum fatty acid mass% values; this method assigns each data point to multiple clusters and calculates its degree of membership in each cluster. The participants were classified into four characteristic subclasses (Clusters 1, 2, 3, and 4), and one specific serum fatty acid composition balance (Cluster 4) was associated with a higher MAFLD prevalence. We suggest that the fuzzy c-means method can be used to determine the circulating fatty acid composition balance, and we highlight the importance of focusing on this balance when examining the relationship between MAFLD and serum fatty acids.
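The abstract mentions fuzzy c-means only in passing. Below is a minimal pure-Python sketch of the textbook fuzzy c-means iteration, which alternates membership-weighted centre updates with membership updates. It rests on stated assumptions: the fuzzifier m = 2 is a common default, not a value taken from the study, and the two-dimensional vectors are hypothetical stand-ins for the serum fatty acid mass% profiles, which have many more dimensions. The function name `fuzzy_c_means` is illustrative.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, u), where u[i][k] is the membership of point i in
    cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])
        # Membership update: larger when a point is relatively close to a centre.
        max_change = 0.0
        for i in range(n):
            dists = [math.dist(points[i], c) for c in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a centre: give it full membership there.
                j0 = dists.index(0.0)
                new_row = [1.0 if k == j0 else 0.0 for k in range(n_clusters)]
            else:
                exponent = 2.0 / (m - 1.0)
                new_row = [1.0 / sum((dists[k] / dists[j]) ** exponent
                                     for j in range(n_clusters))
                           for k in range(n_clusters)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


# Hypothetical two-feature profiles forming two obvious groups.
profiles = [[10.0, 1.0], [11.0, 1.2], [10.5, 0.8],
            [1.0, 10.0], [1.2, 11.0], [0.8, 10.5]]
centers, memberships = fuzzy_c_means(profiles, n_clusters=2)
for p, row in zip(profiles, memberships):
    print(p, [round(x, 3) for x in row])
```

Each row of `memberships` gives one individual's graded membership across clusters. This lets a profile sit partly in two subclasses instead of being forced into exactly one.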


Subject(s)
Fatty Acids , Non-alcoholic Fatty Liver Disease , Humans , Cross-Sectional Studies , Cluster Analysis , Gas Chromatography-Mass Spectrometry
2.
Front Artif Intell ; 5: 924688, 2022.
Article in English | MEDLINE | ID: mdl-36304959

ABSTRACT

Attention mechanisms are among the most frequently used architectures in the development of artificial intelligence because they can process contextual information efficiently. Various artificial intelligence architectures, such as the Transformer for processing natural language, image data, and other inputs, include Attention. Because Attention is such a powerful component, various improvements have been made to enhance its performance. The time complexity of Attention grows with the square of the input sequence length, and developing methods to reduce this complexity is one of the most popular research topics. Attention is a mechanism that conveys contextual information of input sequences to downstream networks. Thus, to improve the processing of contextual information, one should not focus only on improving Attention but should also devise other similar mechanisms as possible alternatives. In this study, we devised an alternative mechanism called "Relation" that can understand the context information of sequential data. Relation is easy to implement, and its time complexity depends only on the length of the sequences. A comparison of Relation and Attention on several benchmark datasets showed that the context processing capability of Relation is comparable to that of Attention but requires less computation time. Processing contextual information at high speed would be useful because natural language processing and biological sequence processing sometimes deal with very long sequences. Hence, Relation is an ideal option for processing context information.
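The quadratic cost of Attention comes from scoring every query against every key. The pure-Python sketch below implements standard scaled dot-product attention and makes that n-by-n loop explicit. It illustrates only the baseline that Relation is compared with. It is not the Relation mechanism, whose details are not given here, and the function names are illustrative.

```python
import math


def softmax(xs):
    # Subtract the maximum for numerical stability.
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Standard attention; returns (outputs, number_of_pairwise_scores)."""
    d_k = len(keys[0])
    outputs, n_scores = [], 0
    for q in queries:
        # One score per (query, key) pair: n queries x n keys = n**2 scores.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        n_scores += len(scores)
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs, n_scores


# Self-attention over a toy sequence of 4 two-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, n_scores = scaled_dot_product_attention(tokens, tokens, tokens)
print(n_scores)  # 16, i.e. len(tokens) ** 2
```

Doubling the sequence length quadruples `n_scores`. This growth is the cost that linear-time alternatives aim to avoid.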

3.
Brief Bioinform ; 20(4): 1160-1166, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968734

ABSTRACT

This article describes several features of the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available, and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophisticated algorithms are necessary but not sufficient; intuitive and interactive tools that allow experimental biologists to semiautomatically handle large data are becoming important. We are developing MAFFT in these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.


Subject(s)
Sequence Alignment/methods , Software , Algorithms , Computational Biology/methods , Databases, Genetic , Internet , Sequence Alignment/statistics & numerical data , Sequence Analysis , User-Computer Interface
4.
BMC Bioinformatics ; 19(1): 272, 2018 07 18.
Article in English | MEDLINE | ID: mdl-30021530

ABSTRACT

BACKGROUND: Long short-term memory (LSTM) is one of the most attractive deep learning methods for learning time series or the context of input data. A growing number of studies, including biological sequence analyses in bioinformatics, use this architecture. Amino acid sequence profiles are widely used in bioinformatics studies, such as sequence similarity searches, multiple alignments, and evolutionary analyses. Many biological sequences are becoming available, and the rapidly increasing amount of sequence data emphasizes the importance of scalable generators of amino acid sequence profiles. RESULTS: We employed the LSTM network and developed a novel profile generator that constructs profiles without any assumptions other than the input sequence context. Our method generated better profiles than existing de novo profile generators, including CSBuild and RPS-BLAST, as measured by profile-sequence similarity search performance, with a computational cost linear in the input sequence size. In addition, we analyzed the effects of the memory capacity of LSTM and found that LSTM has a high potential to detect long-range interactions between amino acids, as in beta-strand formation, which has been a difficult problem in protein bioinformatics using sequence information. CONCLUSION: We demonstrated the importance of sequence context and the feasibility of LSTM for biological sequence analyses. Our results demonstrated the effectiveness of memory in LSTM and showed that our de novo profile generator, SPBuild, outperformed existing methods in profile prediction for beta-strands, where long-range interactions of amino acids are important and are known to be difficult for existing window-based prediction methods. Our findings will be useful for developing other machine learning methods for predictions related to biological sequences.
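An LSTM can carry information across long distances because of its gated cell state. The pure-Python sketch below shows one step of a standard textbook LSTM cell, with input, forget, and output gates plus a candidate state. It is not the SPBuild architecture. The parameter layout, with one (`W`, `U`, `b`) triple per gate, is an assumption made for this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps each gate name ('i', 'f', 'o', 'g') to (W, U, b), where
    W is hidden x input, U is hidden x hidden and b has length hidden.
    """
    def affine(gate):
        W, U, b = params[gate]
        return [sum(w * xj for w, xj in zip(W[r], x))
                + sum(u * hj for u, hj in zip(U[r], h_prev)) + b[r]
                for r in range(len(b))]

    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    o = [sigmoid(v) for v in affine("o")]    # output gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell state
    # New cell state: keep part of the old state, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Zero weights (hidden size 1, input size 2), purely for illustration.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zero for gate in ("i", "f", "o", "g")}
h, c = [0.0], [0.0]
for x in ([1.0, 0.0], [0.0, 1.0]):  # e.g. two one-hot encoded residues
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

When the forget gate saturates near 1 and the input gate near 0, the cell state passes through almost unchanged. This mechanism lets an LSTM relate residues that are far apart in a sequence.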


Subject(s)
Computational Biology/methods , Deep Learning , Neural Networks, Computer , Amino Acid Sequence , Databases as Topic , Proteins/chemistry , ROC Curve , Sequence Homology, Amino Acid
5.
Bioinformatics ; 34(14): 2490-2492, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29506019

ABSTRACT

Summary: We report an update to the MAFFT multiple sequence alignment program that enables parallel calculation for large numbers of sequences. The G-INS-1 option of MAFFT was recently reported to have higher accuracy than other methods for large data, but it has been impractical for most large-scale analyses because it requires large computational resources. We introduce a scalable variant, G-large-INS-1, which has accuracy equivalent to that of G-INS-1 and is applicable to 50 000 or more sequences. Availability and implementation: This feature is available in MAFFT versions 7.355 or later at https://mafft.cbrc.jp/alignment/software/mpi.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Protein Structure, Secondary , Sequence Analysis, Protein/methods , Sequence Analysis, RNA/methods
6.
Algorithms Mol Biol ; 13: 5, 2018.
Article in English | MEDLINE | ID: mdl-29467815

ABSTRACT

BACKGROUND: A profile-comparison method with a position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions in dynamic programming to calculate the similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. By definition, these functions cannot capture nonlinear relationships between profiles. Therefore, using neural networks, we attempted to discover a novel scoring function that is more suitable for the profile-comparison method than existing functions. RESULTS: Although conventional neural network training requires the derivative of a cost function, the problem addressed in this study lacks one. Therefore, we implemented a novel derivative-free neural network by combining a conventional neural network with an evolutionary strategy optimization method used as a solver. Using this neural network system, we optimized the scoring function to align remote sequence pairs. Our results showed that the pairwise-profile aligner using the novel scoring function significantly improved both alignment sensitivity and precision relative to aligners using existing functions. CONCLUSIONS: We developed and implemented a novel derivative-free neural network and aligner (Nepal) for optimizing sequence alignments. Nepal improved alignment quality by adapting to remote sequence alignments and increasing the expressiveness of similarity scores. Additionally, this novel scoring function can be realized using a simple matrix operation and easily incorporated into other aligners. Moreover, our scoring function could potentially improve the performance of homology detection and/or multiple sequence alignment of remote homologous sequences. The goal of the study was to provide a novel scoring function for the profile alignment method and to develop a novel learning system capable of addressing derivative-free problems.
Our system is capable of optimizing the performance of other sophisticated methods and of solving problems without derivative-of-cost functions, which do not always exist in practical problems. Our results demonstrated the usefulness of this optimization method for derivative-free problems.
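The abstract names an evolutionary strategy as the derivative-free solver but does not say which one. The sketch below illustrates the general idea with a simple (1+1) evolution strategy that uses the 1/5 success rule and needs only function values, never gradients. The objective is a hypothetical stand-in for an alignment-quality loss, and this is not Nepal's actual optimizer.

```python
import random


def one_plus_one_es(f, x0, sigma=1.0, iterations=2000, seed=0):
    """Minimise f without derivatives using a (1+1) evolution strategy.

    A Gaussian mutation of the current point is kept only if it does not
    increase f; the step size follows the 1/5 success rule.
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iterations):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= 1.5            # success: enlarge the step
        else:
            sigma *= 1.5 ** -0.25   # failure: shrink the step
    return x, fx


# Hypothetical black-box loss with its minimum at (1, -2).
def loss(w):
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2


best_w, best_loss = one_plus_one_es(loss, [5.0, 5.0])
print(best_w, best_loss)
```

The step-size factors balance when about one mutation in five succeeds, since 1.5^0.2 * 1.5^(-0.25 * 0.8) = 1. This balance lets the search widen on easy terrain and narrow near an optimum.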

7.
Acta Crystallogr D Struct Biol ; 73(Pt 9): 757-766, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28876239

ABSTRACT

Based on a statistical analysis of a compiled data set of 918 independent crystal structures, an alternative rational approach is proposed for improving protein crystals by single-site mutation of surface residues; the approach reflects not only the entropic effect but also other effects on protein crystallization. This analysis reveals a clear difference in the crystal-packing propensity of amino acids depending on the secondary-structural class. To verify this result, a systematic crystallization experiment was performed with the biotin carboxyl carrier protein from Pyrococcus horikoshii OT3 (PhBCCP). Six single-site mutations were examined: Ala138 on the surface of a β-sheet was mutated to Ile, Tyr, Arg, Gln, Val and Lys. In agreement with the prediction, the two mutants (A138I and A138Y) harbouring the residues with the highest crystal-packing propensities for β-sheet at position 138 gave better crystallization scores than the other constructs, including the wild type, and the crystal-packing propensity for β-sheet correlated best with the ratio of obtaining crystals. Two new crystal forms of these mutants that diffracted to high resolution were obtained, generating novel packing interfaces with the mutated residues (Ile/Tyr). The mutations did not affect the overall structures, indicating that a β-sheet can accommodate a successful mutation if it is carefully selected to avoid intramolecular steric hindrance. A significant negative correlation between the ratio of obtaining amorphous precipitate and the crystal-packing propensity was also found.


Subject(s)
Acetyl-CoA Carboxylase/chemistry , Archaeal Proteins/chemistry , Pyrococcus horikoshii/chemistry , Acetyl-CoA Carboxylase/genetics , Amino Acids/chemistry , Amino Acids/genetics , Archaeal Proteins/genetics , Crystallography, X-Ray , Fatty Acid Synthase, Type II/chemistry , Fatty Acid Synthase, Type II/genetics , Models, Molecular , Mutagenesis, Site-Directed , Protein Conformation , Protein Structure, Secondary , Pyrococcus horikoshii/genetics
8.
BMC Bioinformatics ; 18(1): 289, 2017 Jun 02.
Article in English | MEDLINE | ID: mdl-28578658

ABSTRACT

BACKGROUND: N-terminal acetylation is one of the most common protein modifications in eukaryotes and occurs co-translationally while the nascent polypeptide is still attached to the ribosome. This modification has been shown to be involved in a wide range of biological phenomena, such as protein half-life regulation, protein-protein and protein-membrane interactions, and protein subcellular localization. Thus, accurately predicting from protein sequence which proteins receive an acetyl group is expected to facilitate functional studies of this modification. Because the occurrence of N-terminal acetylation strongly depends on the context of protein sequences, its sequence determinants were studied initially by simply examining the N-terminal sequences of many acetylated and unacetylated proteins and, more recently, by machine learning approaches. However, a complete understanding of the sequence determinants of this modification has not yet been achieved. RESULTS: We obtained curated N-terminally acetylated and unacetylated sequences from the UniProt database and employed a decision tree algorithm to identify the sequence determinants of N-terminal acetylation for proteins whose initiator methionine (iMet) residues have been removed. The results suggested that the main determinants of N-terminal acetylation are contained within the first five residues following iMet and that the first and second positions are the most important discriminators for the occurrence of this modification. The results also indicated the existence of position-specific preferred and inhibitory residues that determine the occurrence of N-terminal acetylation. The developed predictor software, termed NT-AcPredictor, accurately predicted N-terminal acetylation, with an overall performance comparable or superior to that of preceding predictors incorporating machine learning algorithms. 
CONCLUSION: Our machine learning approach based on a decision tree algorithm successfully provided several sequence determinants of N-terminal acetylation for proteins lacking iMet, some of which have not previously been described. Although these sequence determinants remain insufficient to comprehensively predict the occurrence of this modification, indicating that further work on this topic is still required, the developed predictor, NT-AcPredictor, can be used to predict N-terminal acetylation with an accuracy of more than 80%.
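The sketch below makes the decision-tree idea concrete. It scores each of the first five positions after iMet by information gain (entropy reduction), the criterion that ID3-style trees use to pick their first split. The toy sequences and labels are invented for illustration. The abstract does not describe NT-AcPredictor's actual split criterion or features, so using information gain here is an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(seqs, labels, pos):
    """Reduction in label entropy from splitting on the residue at `pos`."""
    groups = {}
    for s, y in zip(seqs, labels):
        groups.setdefault(s[pos], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_position(seqs, labels, n_positions=5):
    """Position (0-based, counted after the removed iMet) with the largest gain."""
    gains = [information_gain(seqs, labels, p) for p in range(n_positions)]
    return max(range(n_positions), key=gains.__getitem__), gains


# Hypothetical N-termini after iMet removal; 1 = acetylated, 0 = not.
seqs = ["ASGKL", "SAGKL", "AAPKL", "SPGKL", "PSGKL", "GAGKL", "PAPKL", "GSPKL"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
pos, gains = best_position(seqs, labels)
print(pos, [round(g, 3) for g in gains])
```

A full tree would recurse on each residue group at the chosen position. The positions whose residues carry the most information about the label then end up closest to the root.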


Subject(s)
Algorithms , Proteins/metabolism , Acetylation , Amino Acid Sequence , Databases, Factual , Proteins/chemistry , Static Electricity
9.
Biophys Physicobiol ; 13: 157-163, 2016.
Article in English | MEDLINE | ID: mdl-27924270

ABSTRACT

Functional sites on proteins play an important role in various molecular interactions and reactions between proteins and other molecules. Thus, mutations in functional sites can severely affect the overall phenotype. Progress in genome sequencing projects has yielded a wealth of information on single nucleotide variants (SNVs), especially those with a minor allele frequency below 1% (rare variants). To understand the functional influence of genetic variants at the protein level, we investigated the relationship between SNVs and protein functional sites in terms of minor allele frequency and the structural position of variants. We observed that SNVs were less abundant at ligand binding sites, which is consistent with a previous study on SNVs and protein interaction sites. Additionally, we found that non-rare variants tended to be located slightly apart from enzyme active sites. Examination of non-rare variants revealed that most of the mutations resulted in moderate changes in the physico-chemical properties of amino acids, suggesting the existence of functional constraints. In conclusion, this study shows that mapping genetic variants onto protein structures could be a powerful approach to evaluating the functional impact of rare genetic variations.
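Below is a minimal sketch of the two quantities this analysis relies on. The first is the rare/non-rare split at 1% minor allele frequency. The second is the distance from a variant residue to a functional site. Representing each residue by a single 3D coordinate (for example, its Cα atom) and averaging the distances per group are assumptions of this sketch, not necessarily the study's procedure, and the coordinates are made up.

```python
import math

RARE_MAF_THRESHOLD = 0.01  # rare variants: minor allele frequency below 1%


def is_rare(maf):
    return maf < RARE_MAF_THRESHOLD


def min_distance_to_site(residue_xyz, site_xyz_list):
    """Smallest Euclidean distance to any site residue (coordinate units, e.g. Å)."""
    return min(math.dist(residue_xyz, s) for s in site_xyz_list)


def summarize(variants, site_xyz_list):
    """Mean distance to the functional site for rare vs non-rare variants.

    variants is a list of (maf, (x, y, z)) pairs, one per variant residue.
    """
    groups = {"rare": [], "non-rare": []}
    for maf, xyz in variants:
        key = "rare" if is_rare(maf) else "non-rare"
        groups[key].append(min_distance_to_site(xyz, site_xyz_list))
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Hypothetical active-site and variant coordinates.
site = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
variants = [(0.002, (3.0, 4.0, 0.0)), (0.15, (9.0, 2.0, 1.0)),
            (0.3, (12.0, 0.0, 0.0))]
print(summarize(variants, site))
```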

10.
Bioinformatics ; 32(21): 3246-3251, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27378296

ABSTRACT

MOTIVATION: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performance has not yet been sufficiently assessed because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible by the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by comparison with other known methods, including computationally intensive ones. RESULTS: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs available for large MSAs, Clustal Omega and UPP, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 across the three datasets, suggesting that they are safer to use. AVAILABILITY AND IMPLEMENTATION: http://mafft.cbrc.jp/alignment/software/ CONTACT: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
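The difference between a chained guide tree and a conventional balanced one is easiest to see in the tree shape. The sketch below builds both topologies as Newick strings from a fixed sequence order and reports their nesting depth, which is the number of progressive merge levels. In a real aligner, similarity between sequences determines the leaf order and grouping. MAFFT's own guide-tree construction is not shown, and the helper names are hypothetical.

```python
def chained_tree(names):
    """Chained (caterpillar) guide tree: each sequence joins the growing group in turn."""
    tree = names[0]
    for name in names[1:]:
        tree = f"({tree},{name})"
    return tree + ";"


def balanced_tree(names):
    """Balanced guide tree built by recursively halving the input order."""
    def build(sub):
        if len(sub) == 1:
            return sub[0]
        mid = len(sub) // 2
        return f"({build(sub[:mid])},{build(sub[mid:])})"
    return build(names) + ";"


def depth(newick):
    """Maximum parenthesis nesting, i.e. the number of progressive merge levels."""
    level = best = 0
    for ch in newick:
        if ch == "(":
            level += 1
            best = max(best, level)
        elif ch == ")":
            level -= 1
    return best


names = [f"s{i}" for i in range(1, 9)]
print(chained_tree(names), depth(chained_tree(names)))    # depth 7
print(balanced_tree(names), depth(balanced_tree(names)))  # depth 3
```

With n sequences, the chained tree has n - 1 merge levels and adds one sequence per step. The balanced tree has about log2(n) levels, with each merge combining two existing sub-alignments.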


Subject(s)
Algorithms , Sequence Alignment , Software , Proteins
11.
J Struct Funct Genomics ; 17(4): 147-154, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28083762

ABSTRACT

Searching public protein databases is a fundamental step in target selection in structural and functional genomics and in inferring protein structure, function, and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pairs, and the choice of substitution matrix strongly affects homology detection performance. We earlier proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Herein, we further evaluate MIQS in combination with LAST, a fast heuristic database search tool with a tunable sensitivity parameter m, where larger m denotes higher sensitivity. Results show that MIQS substantially improves the homology detection and alignment quality of LAST across diverse m parameters. Against a protein database of approximately 15 million sequences, LAST with m = 10^5 achieves better homology detection performance than BLASTP and completes the search 20 times faster. Compared with the most sensitive methods currently in use, CS-BLAST and SSEARCH, LAST with MIQS and m = 10^6 shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. Results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search.
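Scoring with a substitution matrix reduces to summing per-pair scores over aligned positions. The sketch below shows this for an ungapped segment pair. The four-letter matrix is a toy with invented values and is neither MIQS nor any published matrix. LAST's seeding and the role of its m parameter are not modelled.

```python
# Toy symmetric scores for a 4-letter subset; NOT MIQS or any real matrix.
TOY_MATRIX = {
    ("A", "A"): 5, ("A", "S"): 1, ("A", "W"): -3, ("A", "L"): -1,
    ("S", "S"): 5, ("S", "W"): -3, ("S", "L"): -2,
    ("W", "W"): 8, ("W", "L"): -2,
    ("L", "L"): 5,
}


def pair_score(a, b, matrix=TOY_MATRIX):
    """Look up a symmetric pair score, whichever order the pair is stored in."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def ungapped_score(s1, s2, matrix=TOY_MATRIX):
    """Sum of substitution scores over aligned positions of equal-length segments."""
    if len(s1) != len(s2):
        raise ValueError("segments must have equal length")
    return sum(pair_score(a, b, matrix) for a, b in zip(s1, s2))


print(ungapped_score("ASWL", "ASWL"))
print(ungapped_score("ASWL", "SAWA"))
```

Swapping in a different matrix changes which segment pairs score highly. This is why the choice of matrix affects which distant homologs a search reports.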


Subject(s)
Computer Heuristics , Databases, Protein , Sequence Analysis, Protein , Algorithms , Computational Biology , Models, Molecular , Proteins/chemistry , Sequence Alignment