Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 6 de 6
Filter
Add more filters










Database
Language
Publication year range
1.
IEEE Trans Nanobioscience ; 19(3): 562-570, 2020 07.
Article in English | MEDLINE | ID: mdl-32340957

ABSTRACT

The three-dimensional structures populated by a protein molecule determine to a great extent its biological activities. The rich information encoded by protein structure on protein function continues to motivate the development of computational approaches for determining functionally-relevant structures. The majority of structures generated in silico are not relevant. Discriminating relevant/native protein structures from non-native ones is an outstanding challenge in computational structural biology. Inherently, this is a recognition problem that can be addressed under the umbrella of machine learning. In this paper, based on the premise that near-native structures are effectively anomalies, we build on the concept of anomaly detection in machine learning. We propose methods that automatically select relevant subsets, as well as methods that select a single structure to offer as prediction. Evaluations are carried out on benchmark datasets and demonstrate that the proposed methods advance the state of the art. The presented results motivate further building on and adapting concepts and techniques from machine learning to improve recognition of near-native structures in protein structure prediction.


Subject(s)
Computational Biology/methods , Machine Learning , Protein Conformation , Proteins/chemistry , Algorithms , Databases, Protein , Proteins/physiology
2.
Methods Mol Biol ; 1958: 147-171, 2019.
Article in English | MEDLINE | ID: mdl-30945218

ABSTRACT

The protein energy landscape, which lifts the protein structure space by associating energies with structures, has been useful in improving our understanding of the relationship between structure, dynamics, and function. Currently, however, it is challenging to automatically extract and utilize the underlying organization of an energy landscape to the link structural states it houses to biological activity. In this chapter, we first report on two computational approaches that extract such an organization, one that ignores energies and operates directly in the structure space and another that operates on the energy landscape associated with the structure space. We then describe two complementary approaches, one based on unsupervised learning and another based on supervised learning. Both approaches utilize the extracted organization to address the problem of decoy selection in template-free protein structure prediction. The presented results make the case that learning organizations of protein energy landscapes advances our ability to link structures to biological activity.


Subject(s)
Computational Biology/methods , Protein Conformation , Proteins/chemistry , Algorithms , Protein Folding , Thermodynamics
3.
J Bioinform Comput Biol ; 15(6): 1740006, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29113561

ABSTRACT

Metagenomics is the collective sequencing of co-existing microbial communities which are ubiquitous across various clinical and ecological environments. Due to the large volume and random short sequences (reads) obtained from community sequences, analysis of diversity, abundance and functions of different organisms within these communities are challenging tasks. We present a fast and scalable clustering algorithm for analyzing large-scale metagenome sequence data. Our approach achieves efficiency by partitioning the large number of sequence reads into groups (called canopies) using hashing. These canopies are then refined by using state-of-the-art sequence clustering algorithms. This canopy-clustering (CC) algorithm can be used as a pre-processing phase for computationally expensive clustering algorithms. We use and compare three hashing schemes for canopy construction with five popular and state-of-the-art sequence clustering methods. We evaluate our clustering algorithm on synthetic and real-world 16S and whole metagenome benchmarks. We demonstrate the ability of our proposed approach to determine meaningful Operational Taxonomic Units (OTU) and observe significant speedup with regards to run time when compared to different clustering algorithms. We also make our source code publicly available on Github. a.


Subject(s)
Algorithms , Biodiversity , Metagenome , Metagenomics/methods , Cluster Analysis , Databases, Factual , Gastrointestinal Microbiome/genetics , Humans , Liver Cirrhosis/microbiology , Microbiota , Phylogeny , RNA, Ribosomal, 16S , RNA, Ribosomal, 18S , Sequence Analysis, RNA/methods , Soil Microbiology
4.
BMC Med Inform Decis Mak ; 16 Suppl 2: 78, 2016 07 21.
Article in English | MEDLINE | ID: mdl-27461467

ABSTRACT

BACKGROUND: People want to live independently, but too often disabilities or advanced age robs them of the ability to do the necessary activities of daily living (ADLs). Finding relationships between electromyograms measured in the arm and movements of the hand and wrist needed to perform ADLs can help address performance deficits and be exploited in designing myoelectrical control systems for prosthetics and computer interfaces. METHODS: This paper reports on several machine learning techniques employed to discover the electromyogram patterns present when performing 24 typical fine motor functional activities of the hand and the rest position used to accomplish ADLs. Accelerometer data is collected from the hand as an aid in identifying the start and end of movements and to help in labeling the signal data. Techniques employed include classification of 100 ms individual signal instances, using a symbolic representation to approximate signal streams, and the use of nearest neighbor in two specific situations: creation of an affinity matrix to model learning instances and classify based on multiple adjacent signal values, and using Dynamic Time Warping (DTW) as a distance measure to classify entire activity segments. RESULTS: Results show the patterns can be learned to an accuracy of 76.64 % for a 25 class problem when classifying 100 ms instances, 83.63 % with the affinity matrix approach with symbolic representation, and 85.22 % with Dynamic Time Warping. Classification errors are, with a few exceptions, concentrated within particular grip action groups. CONCLUSION: The findings reported here support the view that grips and movements of the hand can be distinguished by combining electrical and mechanical properties of the task to an accuracy of 85.22 % for a 25 class problem. Converting the signals to a symbolic representation and classifying based on larger portions of the signal stream improve classification accuracy. This is both clinically useful and opens the way for an approach to help simulate hand functional activities. With improvements it may also prove useful in real time control applications.


Subject(s)
Electromyography/methods , Hand/physiology , Machine Learning , Movement/physiology , Pattern Recognition, Automated , Signal Processing, Computer-Assisted , Hand/physiopathology , Humans
5.
BMC Bioinformatics ; 15 Suppl 8: S4, 2014.
Article in English | MEDLINE | ID: mdl-25080993

ABSTRACT

BACKGROUND: Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. METHODS: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. RESULTS: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. CONCLUSIONS: This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Algorithms , Amino Acid Sequence , Automation , Computational Biology/instrumentation , Natural Language Processing
6.
BMC Syst Biol ; 7 Suppl 4: S11, 2013.
Article in English | MEDLINE | ID: mdl-24565031

ABSTRACT

BACKGROUND: Advances in biotechnology have changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments."Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from ocean, soil and human body. Several researchers are interested in metagenomics because it provides an insight into the complex biodiversity across several environments. Clinicians are using metagenomics to determine the role played by collection of microbial organisms within human body with respect to human health wellness and disease. RESULTS: We have developed an efficient and scalable, species richness estimation algorithm that uses locality sensitive hashing (LSH). Our algorithm achieves efficiency by approximating the pairwise sequence comparison operations using hashing and also incorporates matching of fixed-length, gapless subsequences criterion to improve the quality of sequence comparisons. We use LSH-based similarity function to cluster similar sequences and make individual groups, called operational taxonomic units (OTUs). We also compute different species diversity/richness metrics by utilizing OTU assignment results to further extend our analysis. CONCLUSION: The algorithm is evaluated on synthetic samples and eight targeted 16S rRNA metagenome samples taken from seawater. We compare the performance of our algorithm with several competing diversity estimation algorithms. We show the benefits of our approach with respect to computational runtime and meaningful OTU assignments. We also demonstrate practical significance of the developed algorithm by comparing bacterial diversity and structure across different skin locations. WEBSITE: http://www.cs.gmu.edu/~mlbio/LSH-DIV.


Subject(s)
Biodiversity , Metagenomics/methods , RNA, Ribosomal, 16S/genetics , Algorithms , Cluster Analysis , Environment , Humans , Time Factors
SELECTION OF CITATIONS
SEARCH DETAIL
...