Results 1 - 20 of 34
1.
ACS Omega ; 9(7): 7471-7479, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38405499

ABSTRACT

Computational prediction of molecule-protein interactions has been key for developing new molecules to interact with a target protein for therapeutics development. Previous work includes two independent streams of approaches: (1) predicting protein-protein interactions (PPIs) between naturally occurring proteins and (2) predicting binding affinities between proteins and small-molecule ligands [also known as drug-target interaction (DTI)]. Studying the two problems in isolation has limited the ability of these computational models to generalize across the PPI and DTI tasks, both of which ultimately involve noncovalent interactions with a protein target. In this work, we developed Equivariant Graph of Graphs neural Network (EGGNet), a geometric deep learning (GDL) framework, for molecule-protein binding predictions that can handle three types of molecules for interacting with a target protein: (1) small molecules, (2) synthetic peptides, and (3) natural proteins. EGGNet leverages a graph of graphs (GoG) representation constructed from the molecular structures at atomic resolution and utilizes a multiresolution equivariant graph neural network to learn from such representations. In addition, EGGNet leverages the underlying biophysics and makes use of both atom- and residue-level interactions, which improve EGGNet's ability to rank candidate poses from blind docking. EGGNet achieves competitive performance on both a public protein-small-molecule binding affinity prediction task (80.2% top 1 success rate on CASF-2016) and a synthetic protein interface prediction task (88.4% area under the precision-recall curve). We envision that the proposed GDL framework can generalize to many other protein interaction prediction problems, such as binding site prediction and molecular docking, helping accelerate protein engineering and structure-based drug development.

2.
PLoS One ; 17(12): e0269509, 2022.
Article in English | MEDLINE | ID: mdl-36584000

ABSTRACT

Opioid overdoses within the United States continue to rise and have been negatively impacting the social and economic status of the country. In order to effectively allocate resources and identify policy solutions to reduce the number of overdoses, it is important to understand the geographical differences in opioid overdose rates and their causes. In this study, we utilized data on emergency department opioid overdose (EDOOD) visits to explore the county-level spatio-temporal distribution of opioid overdose rates within the state of Virginia and their association with aggregate socio-ecological factors. The analyses were performed using a combination of techniques including Moran's I and multilevel modeling. Using data from 2016-2021, we found that Virginia counties had notable differences in their EDOOD visit rates with significant neighborhood-level associations: many counties in the southwestern region were consistently identified as the hotspots (areas with a higher concentration of EDOOD visits) whereas many counties in the northern region were consistently identified as the coldspots (areas with a lower concentration of EDOOD visits). In most Virginia counties, EDOOD visit rates declined from 2017 to 2018. In more recent years (since 2019), the visit rates showed an increasing trend. The multilevel modeling revealed that the change in clinical care factors (i.e., access to care and quality of care) and socio-economic factors (i.e., levels of education, employment, income, family and social support, and community safety) were significantly associated with the change in the EDOOD visit rates. The findings from this study have the potential to assist policymakers in proper resource planning thereby improving health outcomes.


Subject(s)
Drug Overdose , Opiate Overdose , Humans , United States , Analgesics, Opioid , Emergency Service, Hospital , Drug Overdose/epidemiology , Virginia/epidemiology
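
The abstract above names Moran's I as one of its spatial techniques. As a rough illustration only, the sketch below computes the global Moran's I statistic in plain Python. The visit rates and the adjacency matrix are hypothetical toy values, not data from the study, and the study's actual weighting scheme is not stated in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighbouring pairs.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical toy data: four counties in a row, each adjacent to its neighbours.
rates = [10.0, 9.0, 2.0, 1.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran = morans_i(rates, adjacency)
```

A positive value indicates that similar rates cluster together (as with hotspots and coldspots); a negative value indicates that neighbouring areas tend to differ.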
3.
Bioinformatics ; 36(Suppl_1): i39-i47, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657370

ABSTRACT

MOTIVATION: The human body hosts more microbial organisms than human cells. Analysis of this microbial diversity provides key insight into the role played by these microorganisms on human health. Metagenomics is the collective DNA sequencing of coexisting microbial organisms in an environmental sample or a host. This has several applications in precision medicine, agriculture, environmental science and forensics. State-of-the-art predictive models for phenotype predictions from metagenomic data rely on alignments, assembly, extensive pruning, taxonomic profiling and reference sequence databases. These processes are time consuming and they do not consider novel microbial sequences when aligned with the reference genome, limiting the potential of whole metagenomics. We formulate the problem of predicting human disease from whole-metagenomic data using Multiple Instance Learning (MIL), a popular supervised learning paradigm. Our proposed alignment-free approach provides higher accuracy in prediction by harnessing the capability of deep convolutional neural network (CNN) within a MIL framework and provides interpretability via neural attention mechanism. RESULTS: The MIL formulation combined with the hierarchical feature extraction capability of deep-CNN provides significantly better predictive performance compared to popular existing approaches. The attention mechanism allows for the identification of groups of sequences that are likely to be correlated to diseases providing the much-needed interpretation. Our proposed approach does not rely on alignment, assembly and reference sequence databases; making it fast and scalable for large-scale metagenomic data. We evaluate our method on well-known large-scale metagenomic studies and show that our proposed approach outperforms comparative state-of-the-art methods for disease prediction. AVAILABILITY AND IMPLEMENTATION: https://github.com/mrahma23/IDMIL.


Subject(s)
Metagenome , Metagenomics , Algorithms , Databases, Nucleic Acid , Humans , Neural Networks, Computer , Sequence Analysis, DNA
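
The abstract above describes a Multiple Instance Learning formulation with a neural attention mechanism. The sketch below shows only the generic idea of attention pooling over a bag of instances in plain Python. It is not the authors' CNN architecture; the function name, the linear scoring vector, and the toy embeddings are all assumptions made for illustration.

```python
import math


def attention_pool(instances, scorer):
    """Pool a bag of instance vectors into one bag vector with softmax attention.

    `scorer` is a hypothetical linear scoring vector; in a trained model it
    would be learned together with the rest of the network.
    """
    scores = [sum(w * x for w, x in zip(scorer, inst)) for inst in instances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(w * inst[d] for w, inst in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Hypothetical bag: three sequence-read embeddings belonging to one sample.
bag_of_reads = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.5]]
scoring_vector = [1.0, 1.0]
bag_vector, attention = attention_pool(bag_of_reads, scoring_vector)
```

The attention weights sum to one, so the instances with the largest weights are the ones that contribute most to the bag-level prediction, which is what makes this kind of pooling interpretable.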
4.
Article in English | MEDLINE | ID: mdl-28981422

ABSTRACT

The recent advent of Metagenome Wide Association Studies (MGWAS) provides insight into the role of microbes on human health and disease. However, the studies present several computational challenges. In this paper, we demonstrate a novel, efficient, and effective Multiple Instance Learning (MIL) based computational pipeline to predict patient phenotype from metagenomic data. MIL methods have the advantage that besides predicting the clinical phenotype, we can infer the instance level label or role of microbial sequence reads in the specific disease. Specifically, we use a Bag of Words method, which has been shown to be one of the most effective and efficient MIL methods. This involves assembly of the metagenomic sequence data, clustering of the assembled contigs, extracting features from the contigs, and using an SVM classifier to predict patient labels and identify the most relevant sequence clusters. With the exception of the given labels for the patients, this entire process is de novo (unsupervised). We call our pipeline "CAMIL", which stands for Clustering and Assembly with Multiple Instance Learning. We use multiple state-of-the-art clustering methods for feature extraction, evaluation, and comparison of the performance of our proposed approach for each of these clustering methods. We also present a fast and scalable pre-clustering algorithm as a preprocessing step for our proposed pipeline. Our approach achieves efficiency by partitioning the large number of sequence reads into groups (called canopies) using locality sensitive hashing (LSH). These canopies are then refined by using state-of-the-art sequence clustering algorithms. We use data from a well-known MGWAS study of patients with Type-2 Diabetes and show that our pipeline significantly outperforms the classifier used in that paper, as well as other common MIL methods.


Subject(s)
Machine Learning , Metagenome/genetics , Metagenomics/methods , Phenotype , Cluster Analysis , Humans
5.
J Bioinform Comput Biol ; 15(6): 1740006, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29113561

ABSTRACT

Metagenomics is the collective sequencing of co-existing microbial communities which are ubiquitous across various clinical and ecological environments. Due to the large volume and random short sequences (reads) obtained from community sequences, analysis of the diversity, abundance and functions of different organisms within these communities is a challenging task. We present a fast and scalable clustering algorithm for analyzing large-scale metagenome sequence data. Our approach achieves efficiency by partitioning the large number of sequence reads into groups (called canopies) using hashing. These canopies are then refined by using state-of-the-art sequence clustering algorithms. This canopy-clustering (CC) algorithm can be used as a pre-processing phase for computationally expensive clustering algorithms. We use and compare three hashing schemes for canopy construction with five popular and state-of-the-art sequence clustering methods. We evaluate our clustering algorithm on synthetic and real-world 16S and whole metagenome benchmarks. We demonstrate the ability of our proposed approach to determine meaningful Operational Taxonomic Units (OTU) and observe significant speedup with regard to run time when compared to different clustering algorithms. We also make our source code publicly available on GitHub.


Subject(s)
Algorithms , Biodiversity , Metagenome , Metagenomics/methods , Cluster Analysis , Databases, Factual , Gastrointestinal Microbiome/genetics , Humans , Liver Cirrhosis/microbiology , Microbiota , Phylogeny , RNA, Ribosomal, 16S , RNA, Ribosomal, 18S , Sequence Analysis, RNA/methods , Soil Microbiology
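
The abstract above describes grouping reads into canopies by hashing before running a more expensive clustering step. The sketch below illustrates one common way to do this, MinHash signatures with banding, in plain Python. The abstract does not say which three hashing schemes were compared, so the choice of MinHash, the parameter values, and the function names are assumptions; reads are assumed to be at least `k` bases long.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=4):
    """Set of overlapping k-mers in a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, num_hashes=8, k=4):
    """One minimum hash value per seeded hash function (MD5 used for determinism)."""
    return tuple(
        min(int(hashlib.md5(f"{seed}:{km}".encode()).hexdigest(), 16)
            for km in kmers(seq, k))
        for seed in range(num_hashes)
    )


def canopies(reads, bands=4, rows=2):
    """Group reads that agree on at least one band of their MinHash signature."""
    buckets = defaultdict(set)
    for name, seq in reads.items():
        sig = minhash_signature(seq, num_hashes=bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(name)
    return {frozenset(members) for members in buckets.values() if len(members) > 1}


# Hypothetical reads: r1 and r2 are identical, r3 shares no 4-mers with them.
toy_reads = {
    "r1": "ACGTACGTAGCTAGCT",
    "r2": "ACGTACGTAGCTAGCT",
    "r3": "TTTTGGGGCCCCAAAA",
}
groups = canopies(toy_reads)
```

Only reads that land in the same canopy need to be compared by the downstream clustering algorithm, which is where the speedup comes from.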
6.
IEEE Trans Biomed Eng ; 63(8): 1687-98, 2016 08.
Article in English | MEDLINE | ID: mdl-26560865

ABSTRACT

Surface electromyography (sEMG) has been the predominant method for sensing electrical activity for a number of applications involving muscle-computer interfaces, including myoelectric control of prostheses and rehabilitation robots. Ultrasound imaging for sensing mechanical deformation of functional muscle compartments can overcome several limitations of sEMG, including the inability to differentiate between deep contiguous muscle compartments, low signal-to-noise ratio, and lack of a robust graded signal. The objective of this study was to evaluate the feasibility of real-time graded control using a computationally efficient method to differentiate between complex hand motions based on ultrasound imaging of forearm muscles. Dynamic ultrasound images of the forearm muscles were obtained from six able-bodied volunteers and analyzed to map muscle activity based on the deformation of the contracting muscles during different hand motions. Each participant performed 15 different hand motions, including digit flexion, different grips (i.e., power grasp and pinch grip), and grips in combination with wrist pronation. During the training phase, we generated a database of activity patterns corresponding to different hand motions for each participant. During the testing phase, novel activity patterns were classified using a nearest neighbor classification algorithm based on that database. The average classification accuracy was 91%. Real-time image-based control of a virtual hand showed an average classification accuracy of 92%. Our results demonstrate the feasibility of using ultrasound imaging as a robust muscle-computer interface. Potential clinical applications include control of multiarticulated prosthetic hands, stroke rehabilitation, and fundamental investigations of motor control and biomechanics.


Subject(s)
Forearm/physiology , Hand/physiology , Image Processing, Computer-Assisted/methods , Muscle, Skeletal/physiology , Ultrasonography/methods , Algorithms , Female , Hand Strength/physiology , Humans , Male , Movement/physiology
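
The abstract above describes classifying new activity patterns against a per-participant database with a nearest neighbor algorithm. A minimal sketch of that idea follows, assuming Euclidean distance and flat feature vectors; the distance measure, the feature representation, and the toy numbers are assumptions, since the abstract does not specify them.

```python
import math


def nearest_neighbor(query, database):
    """Return the label of the stored pattern closest to `query`."""
    best_label, _ = min(
        ((label, math.dist(query, pattern)) for label, pattern in database),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical training database: one flattened activity pattern per hand motion.
training_db = [
    ("power grasp", [0.9, 0.1, 0.2]),
    ("pinch grip", [0.1, 0.8, 0.3]),
    ("index flexion", [0.2, 0.2, 0.9]),
]
predicted = nearest_neighbor([0.15, 0.75, 0.35], training_db)
```

Because classification is a single pass over the stored patterns, this kind of classifier is cheap enough for real-time use when the database is small.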
7.
Annu Int Conf IEEE Eng Med Biol Soc ; 2016: 3219-3222, 2016 Aug.
Article in English | MEDLINE | ID: mdl-28268993

ABSTRACT

Advancements in multiarticulate upper-limb prosthetics have outpaced the development of intuitive, non-invasive control mechanisms for implementing them. Surface electromyography is currently the most popular non-invasive control method, but presents a number of drawbacks including poor deep-muscle specificity. Previous research established the viability of ultrasound imaging as an alternative means of decoding movement intent, and demonstrated the ability to distinguish between complex grasps in able-bodied subjects via imaging of the anterior forearm musculature. In order to translate this work to clinical viability, able-bodied testing is insufficient. Amputation-induced changes in muscular geometry, dynamics, and imaging characteristics are all likely to influence the effectiveness of our existing techniques. In this work, we conducted preliminary trials with a transradial amputee participant to assess these effects, and potentially elucidate necessary refinements to our approach. Two trials were performed, the first using a set of three motion types, and the second using four. After a brief training period in each trial, the participant was able to control a virtual prosthetic hand in real-time; attempted grasps were successfully classified with a rate of 77% in trial 1, and 71% in trial 2. While the results are sub-optimal compared to our previous able-bodied testing, they are a promising step forward. More importantly, the data collected during these trials can provide valuable information for refining our image processing methods, especially via comparison to previously acquired data from able-bodied individuals. Ultimately, further work with amputees is a necessity for translation towards clinical application.


Subject(s)
Amputees , Artificial Limbs , Computer Systems , Ultrasonography/methods , Electromyography , Humans , Image Processing, Computer-Assisted , Movement
8.
Article in English | MEDLINE | ID: mdl-26357091

ABSTRACT

High-throughput experimental techniques provide a wide variety of heterogeneous proteomic data sources. To exploit the information spread across multiple sources for protein function prediction, these data sources are transformed into kernels and then integrated into a composite kernel. Several methods first optimize the weights on these kernels to produce a composite kernel, and then train a classifier on the composite kernel. As such, these approaches result in an optimal composite kernel, but not necessarily in an optimal classifier. On the other hand, some approaches optimize the loss of binary classifiers and learn weights for the different kernels iteratively. For multi-class or multi-label data, these methods have to solve the problem of optimizing weights on these kernels for each of the labels, which is computationally expensive and ignores the correlation among labels. In this paper, we propose a method called Predicting Protein Function using Multiple Kernels (ProMK). ProMK iteratively optimizes the phases of learning optimal weights and reduces the empirical loss of the multi-label classifier for each of the labels simultaneously. ProMK can integrate kernels selectively and downgrade the weights on noisy kernels. We investigate the performance of ProMK on several publicly available protein function prediction benchmarks and synthetic datasets. We show that the proposed approach performs better than previously proposed protein function prediction approaches that integrate multiple data sources and multi-label multiple kernel learning methods. The code for our proposed method is available at https://sites.google.com/site/guoxian85/promk.


Subject(s)
Algorithms , Proteins/chemistry , Proteins/classification , Proteomics/methods , Animals , Humans , Mice , Protein Interaction Maps , Sequence Analysis, Protein , Yeasts/genetics
9.
Int J Bioinform Res Appl ; 11(2): 111-29, 2015.
Article in English | MEDLINE | ID: mdl-25786791

ABSTRACT

The human gut is one of the most densely populated microbial communities in the world. The interaction of microbes with human host cells is responsible for several disease conditions and is critical to human health. It is imperative to understand the relationships between these microbial communities within the human gut and their roles in disease. In this study we analyse the microbial communities within the human gut and their role in Inflammatory Bowel Disease (IBD). The bacterial communities were interrogated using Length Heterogeneity PCR (LH-PCR) fingerprinting of mucosal and luminal associated microbial communities for a class of healthy and diseased patients.


Subject(s)
Bacteria/genetics , Bacteria/isolation & purification , Inflammatory Bowel Diseases/microbiology , Intestinal Mucosa/microbiology , Pattern Recognition, Automated/methods , Polymerase Chain Reaction/methods , Algorithms , Humans , Microbiota/genetics , Reproducibility of Results , Sensitivity and Specificity , Support Vector Machine
11.
IEEE Trans Neural Syst Rehabil Eng ; 22(1): 69-76, 2014 Jan.
Article in English | MEDLINE | ID: mdl-23996580

ABSTRACT

Recently there have been major advances in the electro-mechanical design of upper extremity prosthetics. However, the development of control strategies for such prosthetics has lagged significantly behind. Conventional noninvasive myoelectric control strategies rely on the amplitude of electromyography (EMG) signals from flexor and extensor muscles in the forearm. Surface EMG has limited specificity for deep contiguous muscles because of cross talk and cannot reliably differentiate between individual digit and joint motions. We present a novel ultrasound imaging based control strategy for upper arm prosthetics that can overcome many of the limitations of myoelectric control. Real time ultrasound images of the forearm muscles were obtained using a wearable mechanically scanned single element ultrasound system, and analyzed to create maps of muscle activity based on changes in the ultrasound echogenicity of the muscle during contraction. Individual digit movements were associated with unique maps of activity. These maps were correlated with previously acquired training data to classify individual digit movements. Preliminary results using ten healthy volunteers demonstrated this approach could provide robust classification of individual finger movements with 98% accuracy (precision 96%-100% and recall 97%-100% for individual finger flexions). The change in ultrasound echogenicity was found to be proportional to the digit flexion speed (R^2 = 0.9), and thus our proposed strategy provided a proportional signal that can be used for fine control. We anticipate that ultrasound imaging based control strategies could be a significant improvement over conventional myoelectric control of prosthetics.


Subject(s)
Fingers/physiology , Monitoring, Ambulatory/instrumentation , Movement/physiology , Muscle, Skeletal/diagnostic imaging , Muscle, Skeletal/physiology , Ultrasonography/instrumentation , Equipment Design , Equipment Failure Analysis , Female , Fingers/diagnostic imaging , Humans , Image Interpretation, Computer-Assisted , Male , Muscle Contraction/physiology , Psychomotor Performance/physiology , Reproducibility of Results , Sensitivity and Specificity , Young Adult
13.
Article in English | MEDLINE | ID: mdl-26356025

ABSTRACT

Automated protein function prediction is one of the grand challenges in computational biology. Multi-label learning is widely used to predict functions of proteins. Most multi-label learning methods make predictions for unlabeled proteins under the assumption that the labeled proteins are completely annotated, i.e., without any missing functions. However, in practice, we may have only a subset of the ground-truth functions for a protein, and whether the protein has other functions is unknown. To predict protein functions with incomplete annotations, we propose a Protein Function Prediction method with Weak-label Learning (ProWL) and its variant ProWL-IF. Both ProWL and ProWL-IF can replenish the missing functions of proteins. In addition, ProWL-IF makes use of the knowledge that a protein cannot have certain functions, which can further boost the performance of protein function prediction. Our experimental results on protein-protein interaction networks and gene expression benchmarks validate the effectiveness of both ProWL and ProWL-IF.


Subject(s)
Computational Biology/methods , Models, Statistical , Proteins/classification , Proteins/metabolism , Molecular Sequence Annotation/methods , Protein Interaction Maps/genetics , Proteins/genetics , Transcriptome/genetics
14.
Article in English | MEDLINE | ID: mdl-26357046

ABSTRACT

Classification problems in which several learning tasks are organized hierarchically pose a special challenge because the hierarchical structure of the problems needs to be considered. Multi-task learning (MTL) provides a framework for dealing with such interrelated learning tasks. When two different hierarchical sources organize similar information, in principle, this combined knowledge can be exploited to further improve classification performance. We have studied this problem in the context of protein structure classification by integrating the learning process for two hierarchical protein structure classification databases, SCOP and CATH. Our goal is to accurately predict whether a given protein belongs to a particular class in these hierarchies using only the amino acid sequences. We have utilized the recent developments in multi-task learning to solve the interrelated classification problems. We have also evaluated how the various relationships between tasks affect the classification performance. Our evaluations show that learning schemes in which both the classification databases are used outperform the schemes which utilize only one of them.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein/methods , Amino Acid Sequence , Artificial Intelligence , Databases, Protein
15.
Article in English | MEDLINE | ID: mdl-24334396

ABSTRACT

High-throughput experimental techniques produce several kinds of heterogeneous proteomic and genomic data sets. To computationally annotate proteins, it is necessary and promising to integrate these heterogeneous data sources. Some methods transform these data sources into different kernels or feature representations. Next, these kernels are linearly (or nonlinearly) combined into a composite kernel. The composite kernel is utilized to develop a predictive model to infer the function of proteins. A protein can have multiple roles and functions (or labels). Therefore, multilabel learning methods are also adapted for protein function prediction. We develop a transductive multilabel classifier (TMC) to predict multiple functions of proteins using several unlabeled proteins. We also propose a method called transductive multilabel ensemble classifier (TMEC) for integrating the different data sources using an ensemble approach. The TMEC trains a graph-based multilabel classifier on each single data source, and then combines the predictions of the individual classifiers. We use a directed birelational graph to capture the relationships between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate the effectiveness of the TMC and TMEC to predict the functions of proteins on three benchmarks. We show that our approaches perform better than recently proposed protein function prediction methods on composite and multiple kernels. The code, data sets used in this paper and supplemental material are available at https://sites.google.com/site/guoxian85/tmec.


Subject(s)
Models, Statistical , Proteins/classification , Proteomics/methods , Algorithms , Animals , Databases, Protein , Diptera , Humans , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Yeasts
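
The abstract above mentions combining per-source kernels linearly into a composite kernel. The sketch below shows that single step in plain Python, a weighted sum of kernel matrices. The matrices and weights are hypothetical toy values; how the weights are chosen (fixed, learned, or replaced by an ensemble as in TMEC) is outside this sketch.

```python
def composite_kernel(kernels, weights):
    """Weighted linear combination of equally sized kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical kernels for two proteins from two data sources.
k_sequence = [[1.0, 0.5], [0.5, 1.0]]
k_interaction = [[1.0, 0.1], [0.1, 1.0]]
combined = composite_kernel([k_sequence, k_interaction], [0.75, 0.25])
```

With non-negative weights, a sum of valid (positive semidefinite) kernels is itself a valid kernel, which is why this combination can be passed directly to a kernel classifier.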
16.
PLoS One ; 8(4): e60042, 2013.
Article in English | MEDLINE | ID: mdl-23565181

ABSTRACT

UNLABELLED: Hepatic encephalopathy (HE) represents a dysfunctional gut-liver-brain axis in cirrhosis which can negatively impact outcomes. This altered gut-brain relationship has been treated using gut-selective antibiotics such as rifaximin, that improve cognitive function in HE, especially its subclinical form, minimal HE (MHE). However, the precise mechanism of the action of rifaximin in MHE is unclear. We hypothesized that modulation of gut microbiota and their end-products by rifaximin would affect the gut-brain axis and improve cognitive performance in cirrhosis. AIM: To perform a systems biology analysis of the microbiome, metabolome and cognitive change after rifaximin in MHE. METHODS: Twenty cirrhotics with MHE underwent cognitive testing, endotoxin analysis, urine/serum metabolomics (GC and LC-MS) and fecal microbiome assessment (multi-tagged pyrosequencing) at baseline and 8 weeks post-rifaximin 550 mg BID. Changes in cognition, endotoxin, serum/urine metabolites and microbiome were analyzed using recommended systems biology techniques. Specifically, correlation networks between microbiota and metabolome were analyzed before and after rifaximin. RESULTS: There was a significant improvement in cognition (six of seven tests improved, p<0.01) and endotoxemia (0.55 to 0.48 Eu/ml, p = 0.02) after rifaximin. There was a significant increase in serum saturated (myristic, caprylic, palmitic, palmitoleic, oleic and eicosanoic) and unsaturated (linoleic, linolenic, gamma-linolenic and arachidonic) fatty acids post-rifaximin. No significant microbial change apart from a modest decrease in Veillonellaceae and increase in Eubacteriaceae was observed. Rifaximin resulted in a significant reduction in network connectivity and clustering on the correlation networks. The networks centered on Enterobacteriaceae, Porphyromonadaceae and Bacteroidaceae indicated a shift from pathogenic to beneficial metabolite linkages and better cognition while those centered on autochthonous taxa remained similar. CONCLUSIONS: Rifaximin is associated with improved cognitive function and endotoxemia in MHE, which is accompanied by alteration of gut bacterial linkages with metabolites without significant change in microbial abundance. TRIAL REGISTRATION: ClinicalTrials.gov NCT01069133.


Subject(s)
Hepatic Encephalopathy/etiology , Hepatic Encephalopathy/metabolism , Liver Cirrhosis/complications , Liver Cirrhosis/metabolism , Metabolome , Metabolomics , Rifamycins/pharmacology , Cognition/drug effects , Female , Gastrointestinal Tract/drug effects , Gastrointestinal Tract/microbiology , Hepatic Encephalopathy/drug therapy , Humans , Male , Metabolomics/methods , Metagenome/drug effects , Middle Aged , Rifamycins/administration & dosage , Rifamycins/therapeutic use , Rifaximin
17.
BMC Syst Biol ; 7 Suppl 4: S11, 2013.
Article in English | MEDLINE | ID: mdl-24565031

ABSTRACT

BACKGROUND: Advances in biotechnology have changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from the ocean and soil to the human body. Several researchers are interested in metagenomics because it provides an insight into the complex biodiversity across several environments. Clinicians are using metagenomics to determine the role played by collections of microbial organisms within the human body with respect to human health, wellness and disease. RESULTS: We have developed an efficient and scalable species richness estimation algorithm that uses locality sensitive hashing (LSH). Our algorithm achieves efficiency by approximating the pairwise sequence comparison operations using hashing and also incorporates a criterion for matching fixed-length, gapless subsequences to improve the quality of sequence comparisons. We use an LSH-based similarity function to cluster similar sequences into individual groups, called operational taxonomic units (OTUs). We also compute different species diversity/richness metrics by utilizing OTU assignment results to further extend our analysis. CONCLUSION: The algorithm is evaluated on synthetic samples and eight targeted 16S rRNA metagenome samples taken from seawater. We compare the performance of our algorithm with several competing diversity estimation algorithms. We show the benefits of our approach with respect to computational runtime and meaningful OTU assignments. We also demonstrate the practical significance of the developed algorithm by comparing bacterial diversity and structure across different skin locations. WEBSITE: http://www.cs.gmu.edu/~mlbio/LSH-DIV.


Subject(s)
Biodiversity , Metagenomics/methods , RNA, Ribosomal, 16S/genetics , Algorithms , Cluster Analysis , Environment , Humans , Time Factors
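
The abstract above mentions computing species diversity and richness metrics from OTU assignments without naming the metrics. As an illustration, the sketch below computes two widely used ones, the Shannon index and the bias-corrected Chao1 richness estimate, from a list of per-read OTU labels. The choice of these two metrics and the toy labels are assumptions, not details from the paper.

```python
import math
from collections import Counter


def shannon_index(otu_labels):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTU proportions."""
    counts = Counter(otu_labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def chao1(otu_labels):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    counts = Counter(otu_labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    doubletons = sum(1 for c in counts.values() if c == 2)
    return len(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical OTU assignment for eight reads.
assignments = ["otu1"] * 4 + ["otu2"] * 2 + ["otu3", "otu4"]
diversity = shannon_index(assignments)
richness = chao1(assignments)
```

Chao1 adds an estimate of unseen OTUs based on how many OTUs were observed only once or twice, so it is never smaller than the observed OTU count.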
18.
J Bioinform Comput Biol ; 10(5): 1250015, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22849369

ABSTRACT

Next-generation sequencing technologies have allowed researchers to determine the collective genomes of microbial communities co-existing within diverse ecological environments. Varying species abundance, length and complexities within different communities, coupled with the discovery of new species, make the problem of taxonomic assignment to short DNA sequence reads extremely challenging. We have developed a new sequence composition-based taxonomic classifier using extreme learning machines referred to as TAC-ELM for metagenomic analysis. TAC-ELM uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides. TAC-ELM is evaluated on two metagenomic benchmarks with sequence read lengths reflecting the traditional and current sequencing technologies. Our empirical results indicate the strength of the developed approach, which outperforms state-of-the-art taxonomic classifiers in terms of accuracy and implementation complexity. We also perform experiments that evaluate the pervasive case within metagenome analysis, where a species may not have been previously sequenced or discovered and will not exist in the reference genome databases. TAC-ELM was also combined with BLAST to show improved classification results. Code and Supplementary Results: http://www.cs.gmu.edu/~mlbio/TAC-ELM (BSD License).


Subject(s)
Artificial Intelligence , Metagenome , Metagenomics/methods , Algorithms , Base Sequence , Neural Networks, Computer , Phylogeny , Sequence Analysis, DNA
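
The abstract above states that the classifier's input features consist of GC content and oligonucleotides. The sketch below shows one plausible way to turn a read into such a composition vector in plain Python. The oligonucleotide length `k=2`, the normalisation by the number of windows, and the function name are assumptions; the abstract does not give these details.

```python
from collections import Counter
from itertools import product


def composition_features(read, k=2):
    """GC fraction followed by the relative frequency of every k-mer over ACGT."""
    read = read.upper()
    gc = (read.count("G") + read.count("C")) / len(read)
    windows = max(len(read) - k + 1, 1)
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    oligos = ["".join(p) for p in product("ACGT", repeat=k)]
    return [gc] + [counts[o] / windows for o in oligos]


# Hypothetical short read.
features = composition_features("ACGTGC")
```

The vector has a fixed length of 1 + 4^k regardless of read length, which is what allows reads of different lengths to be fed to the same neural network.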
19.
Am J Physiol Gastrointest Liver Physiol ; 302(9): G966-78, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22241860

ABSTRACT

Several studies indicate the importance of colonic microbiota in metabolic and inflammatory disorders and the importance of diet on microbiota composition. The effects of alcohol, one of the prominent components of diet, on colonic bacterial composition are largely unknown. Mounting evidence suggests that gut-derived bacterial endotoxins are cofactors for alcohol-induced tissue injury and organ failure like alcoholic liver disease (ALD) that only occur in a subset of alcoholics. We hypothesized that chronic alcohol consumption results in alterations of the gut microbiome in a subgroup of alcoholics, and this may be responsible for the observed inflammatory state and endotoxemia in alcoholics. Thus we interrogated the mucosa-associated colonic microbiome in 48 alcoholics with and without ALD as well as 18 healthy subjects. Colonic biopsy samples from subjects were analyzed for microbiota composition using length heterogeneity PCR fingerprinting and multitag pyrosequencing. A subgroup of alcoholics have an altered colonic microbiome (dysbiosis). The alcoholics with dysbiosis had lower median abundances of Bacteroidetes and higher ones of Proteobacteria. The observed alterations appear to correlate with high levels of serum endotoxin in a subset of the samples. Network topology analysis indicated that alcohol use is correlated with decreased connectivity of the microbial network, and this alteration is seen even after an extended period of sobriety. We show that the colonic mucosa-associated bacterial microbiome is altered in a subset of alcoholics. The altered microbiota composition is persistent and correlates with endotoxemia in a subgroup of alcoholics.


Subject(s)
Alcoholism/microbiology , Colon/microbiology , Liver Diseases, Alcoholic/microbiology , Metagenome , Adult , Aged , Female , Humans , Middle Aged
20.
BMC Genomics ; 12 Suppl 2: S8, 2011.
Article in English | MEDLINE | ID: mdl-21989307

ABSTRACT

BACKGROUND: Metagenomic assembly is a challenging problem due to the presence of genetic material from multiple organisms. The problem becomes even more difficult when short reads produced by next generation sequencing technologies are used. Although whole genome assemblers are not designed to assemble metagenomic samples, they are being used for metagenomics due to the lack of assemblers capable of dealing with metagenomic samples. We present an evaluation of assembly of simulated short-read metagenomic samples using a state-of-the-art de Bruijn graph based assembler. RESULTS: We assembled simulated metagenomic reads from datasets of various complexities using a state-of-the-art de Bruijn graph based parallel assembler. We have also studied the effect of the k-mer size used in the de Bruijn graph on metagenomic assembly and developed a clustering solution to pool the contigs obtained from different assembly runs, which allowed us to obtain longer contigs. We have also assessed the degree of chimericity of the assembled contigs using an entropy/impurity metric and compared the metagenomic assemblies to assemblies of isolated individual source genomes. CONCLUSIONS: Our results show that the accuracy of the assembled contigs was better than expected for the metagenomic samples with a few dominant organisms and was especially poor in samples containing many closely related strains. Clustering contigs from different k-mer parameters of the de Bruijn graph allowed us to obtain longer contigs; however, the clustering resulted in accumulation of erroneous contigs, thus increasing the error rate in clustered contigs.


Subject(s)
Contig Mapping , Genome, Bacterial , Metagenome , Sequence Analysis, DNA/methods , Software , Algorithms , Computational Biology , Computer Graphics/instrumentation , Computer Simulation , Databases, Nucleic Acid , Entropy , Escherichia coli/genetics , Phylogeny , Sequence Alignment
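
The abstract above studies how the k-mer size affects de Bruijn graph assembly. The sketch below builds a minimal de Bruijn graph from reads in plain Python so that the role of `k` is concrete. It is a teaching sketch, not the parallel assembler evaluated in the paper: it ignores reverse complements, sequencing errors, and k-mer coverage, and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Hypothetical overlapping reads from the sequence ACGTCA.
toy_reads = ["ACGTC", "CGTCA"]
graph_k3 = de_bruijn(toy_reads, k=3)
graph_k4 = de_bruijn(toy_reads, k=4)
```

A smaller `k` yields more overlaps between reads but also more ambiguous branches; a larger `k` yields fewer, more specific nodes, which is the trade-off behind pooling contigs from several assembly runs.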