Results 1 - 11 of 11
1.
Genome Res ; 34(9): 1468-1476, 2024 Oct 11.
Article in English | MEDLINE | ID: mdl-39029947

ABSTRACT

Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment. This is particularly true for metagenomic data, in which genome dynamics such as horizontal gene transfer, gene duplication, and gene loss/gain complicate accurate genome assembly from metagenomic communities. Detecting repeats is a crucial first step in overcoming these challenges. To address this issue, we propose GraSSRep, a novel approach that leverages the assembly graph's structure through graph neural networks (GNNs) within a self-supervised learning framework to classify DNA sequences into repetitive and nonrepetitive categories. Specifically, we frame this problem as a node classification task within a metagenomic assembly graph. In a self-supervised fashion, we rely on a high-precision (but low-recall) heuristic to generate pseudolabels for a small proportion of the nodes. We then use those pseudolabels to train a GNN embedding and a random forest classifier to propagate the labels to the remaining nodes. In this way, GraSSRep combines sequencing features with predefined and learned graph features to achieve state-of-the-art performance in repeat detection. We evaluate our method using simulated and synthetic metagenomic data sets. The results on the simulated data highlight GraSSRep's robustness to repeat attributes, demonstrating its effectiveness in handling the complexity of repeated sequences. Additionally, experiments with synthetic metagenomic data sets reveal that incorporating the graph structure and the GNN enhances the detection performance. Finally, in comparative analyses, GraSSRep outperforms existing repeat detection tools with respect to precision and recall.


Subject(s)
Metagenomics , Supervised Machine Learning , Metagenomics/methods , Repetitive Sequences, Nucleic Acid , Neural Networks, Computer , Sequence Analysis, DNA/methods , Algorithms , Metagenome
2.
Nucleic Acids Res ; 48(10): 5217-5234, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32338745

ABSTRACT

As computational biologists continue to be inundated by ever-increasing amounts of metagenomic data, developing data analysis approaches that keep pace with the growth of sequence archives remains a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
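As an illustration of the sketching idea mentioned in this abstract, the following is a minimal MinHash sketch for estimating the Jaccard similarity between the k-mer sets of two sequences. It is not taken from the review: the function names, the choice of BLAKE2b as the hash family, and the default values of `k` and `num_hashes` are all assumptions made for the example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; the seed selects one hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=4, num_hashes=64):
    """Keep, for each of num_hashes hash functions, the minimum k-mer hash."""
    items = kmers(seq, k)
    if not items:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in items) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of matching positions approximates the Jaccard similarity."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k=4):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)
```

Each sketch has a fixed size regardless of sequence length, which is what lets this family of methods scale to large archives; increasing `num_hashes` trades memory for a tighter estimate.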


Subject(s)
Algorithms , Metagenomics/methods , Probability , Signal Processing, Computer-Assisted , Humans , Metagenome/genetics
3.
Cytokine ; 125: 154815, 2020 01.
Article in English | MEDLINE | ID: mdl-31476685

ABSTRACT

BACKGROUND: TNF-α, a pro-inflammatory cytokine, is one of the major contributors to metabolic syndromes, including insulin resistance, obesity, and type II diabetes. The role of alternative splicing, a post-transcriptional mechanism of gene expression regulation, in the onset of these syndromes is poorly understood. However, the role of alternative splicing, which more than 95% of all exons in eukaryotic cells undergo, has been elucidated in several other diseases, including cancer and muscular dystrophy. In this study, we aim to investigate the role of alternative splicing in pathways leading to metabolic syndromes mediated by TNF-α. METHODS: A genome-wide transcriptome analysis was carried out using the Illumina platform. Results were validated using RT-PCR analysis. Various bioinformatics tools and databases (for example, IPA, KEGG, and STRING) were used for the pathway and interactome analysis. CURRENT FINDINGS: Transcriptome-wide analysis revealed that TNF-α treatment in vitro causes a significant change in the expression of 228 genes at the level of alternative splicing. Regulation of some of these genes was validated in different cell lines. Pathway analysis showed that at least 15% of the alternatively spliced genes fall under the contributory pathways leading to different metabolic syndromes, among which the maximally interconnected genes were transcription regulators. CONCLUSION: These findings suggest that TNF-α-mediated alternative splicing plays a crucial role in regulating various genes involved in pathways connected to metabolic syndromes.


Subject(s)
Alternative Splicing/genetics , Gene Expression Regulation/drug effects , Gene Expression Regulation/genetics , Metabolic Syndrome/metabolism , Transcriptome/genetics , Tumor Necrosis Factor-alpha/pharmacology , Animals , Cell Line , Computational Biology , Databases, Genetic , Exons , Gene Expression Profiling , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Humans , Metabolic Networks and Pathways/drug effects , Metabolic Networks and Pathways/genetics , Metabolic Syndrome/genetics , Mice , Muscle Cells/drug effects , Muscle Cells/metabolism , Muscle, Skeletal/cytology , Muscle, Skeletal/drug effects , Signal Transduction/drug effects , Signal Transduction/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
4.
Genome Biol ; 23(1): 182, 2022 08 29.
Article in English | MEDLINE | ID: mdl-36038949

ABSTRACT

With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes that share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically relevant insights.


Subject(s)
Genome, Human , Genomics , Genomics/methods , Humans , Nucleotides , Telomere/genetics
5.
Comput Struct Biotechnol J ; 20: 3208-3222, 2022.
Article in English | MEDLINE | ID: mdl-35832621

ABSTRACT

Characterizing metagenomes via k-mer-based, database-dependent taxonomic classification has yielded key insights into underlying microbiome dynamics. However, novel approaches are needed to track community dynamics and genomic flux within metagenomes, particularly in response to perturbations. We describe KOMB, a novel method for tracking genome-level dynamics within microbiomes. KOMB utilizes K-core decomposition to identify structural variations (SVs), specifically population-level copy number variation (CNV), within microbiomes. K-core decomposition partitions the graph into shells containing nodes of induced degree at least K, yielding reduced computational complexity compared to prior approaches. Through validation on a synthetic community, we show that KOMB recovers and profiles repetitive genomic regions in the sample. KOMB is shown to identify functionally important regions in Human Microbiome Project datasets, and was used to analyze longitudinal data and identify keystone taxa in Fecal Microbiota Transplantation (FMT) samples. In summary, KOMB represents a novel graph-based, taxonomy-oblivious, and reference-free approach for tracking CNV within microbiomes. KOMB is open source and available for download at https://gitlab.com/treangenlab/komb.
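The K-core decomposition named in this abstract can be sketched with the standard peeling procedure: repeatedly remove a node of minimum remaining degree and record the largest minimum degree seen so far as its core number. This is a generic textbook illustration, not KOMB's implementation; the function names and the adjacency-dictionary input format are assumptions, and the simple minimum search here favors readability over the efficiency of bucket-based variants.

```python
from collections import defaultdict


def core_numbers(adj):
    """Compute the core number of every node in an undirected graph.

    adj maps each node to the set of its neighbors (assumed symmetric,
    with no self-loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a node with the smallest remaining degree.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        # Removing v lowers the degree of its surviving neighbors.
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def shells(adj):
    """Group nodes into shells: shell k holds the nodes with core number k."""
    grouped = defaultdict(set)
    for node, k in core_numbers(adj).items():
        grouped[k].add(node)
    return dict(grouped)
```

For example, a triangle with one extra node hanging off a corner splits into a 2-shell (the triangle) and a 1-shell (the pendant node).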

6.
Genome Biol ; 23(1): 133, 2022 06 20.
Article in English | MEDLINE | ID: mdl-35725628

ABSTRACT

The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, and is available for download at www.gitlab.com/treangenlab/seqscreen.


Subject(s)
Machine Learning , Bacteria/genetics , Bacteria/pathogenicity , COVID-19 , Humans , Leukocytes, Mononuclear/virology , Open Reading Frames
7.
Nat Commun ; 13(1): 1728, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35365602

ABSTRACT

Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.


Subject(s)
Deep Learning , Computational Biology , Phylogeny , Proteins , Systems Biology
8.
F1000Res ; 10: 246, 2021.
Article in English | MEDLINE | ID: mdl-34621504

ABSTRACT

In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.


Subject(s)
COVID-19 , SARS-CoV-2 , Animals , Genome, Viral , Humans , Vertebrates
9.
Virusdisease ; 31(3): 299-307, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32904896

ABSTRACT

Dengue virus (DENV), the causative agent of dengue fever and severe dengue, exists as four antigenically different serotypes. These serotypes are further classified into genotypes and have varying degrees of pathogenicity. The 5' and 3' ends of the genomic RNA play a critical role in the viral life cycle. A global-scale study of the RNA structural variation among the serotypes and genotypes was carried out to correlate RNA structure with pathogenicity. We found that the GC-rich stem and rigid loop structure of the 5' end of the genomic RNA of DENV 2 differs significantly from the others. The observed variation in base composition and base pairing may confer a structural and functional advantage in highly virulent strains. This variation in the structure may influence the ease of cyclization and recruitment of the viral RNA polymerase, NS5 RdRp, thereby affecting the pathogenicity of these strains.

10.
J Biosci ; 44(6)2019 Dec.
Article in English | MEDLINE | ID: mdl-31894131

ABSTRACT

Type II diabetes mellitus (T2DM) and obesity are two common pathophysiological conditions of metabolic syndrome (MetS), a collection of similar metabolic dysfunctions due to sedentary lifestyle and overnutrition. Obesity arises from improper adipogenesis, which otherwise has a crucial role in maintaining proper metabolic functions. Downstream events arising from obesity have been linked to T2DM. The nuclear receptor peroxisome proliferator-activated receptor gamma (PPAR-γ), responsible for maintaining lipid and glucose homeostasis, is down-regulated under obesity, leading to weakened insulin sensitivity of the human body. In the course of our review, we outline details of the down-regulation mechanism and provide an overview of the current clinical therapeutics and their shortcomings. Toxicity studies on the seminal drug troglitazone, belonging to the most effective glitazone anti-diabetic category, are also discussed. This leads to an overview of structural adaptations of the existing glitazones to alleviate their side effects and toxicity. Finally, we put forward a concept of novel therapeutics mimicking the glitazone framework, based on some design concepts and preliminary in silico studies. These could later be developed into dual-acting drugs that alleviate the deleterious effects of obesity on normal glucose metabolism and address obesity itself.


Subject(s)
Diabetes Mellitus, Type 2/drug therapy , Glucose/metabolism , Metabolic Syndrome/drug therapy , Obesity/drug therapy , Adipogenesis/drug effects , Diabetes Mellitus, Type 2/complications , Humans , Hypoglycemic Agents/therapeutic use , Insulin Resistance/genetics , Metabolic Syndrome/complications , Metabolic Syndrome/genetics , Obesity/complications , Obesity/genetics , PPAR gamma/genetics , Thiazolidinediones/therapeutic use , Troglitazone/therapeutic use
11.
Annu Int Conf IEEE Eng Med Biol Soc ; 2017: 1022-1025, 2017 Jul.
Article in English | MEDLINE | ID: mdl-29060048

ABSTRACT

The ability to interpret unspoken or imagined speech through electroencephalography (EEG) is of therapeutic interest for people suffering from speech disorders and 'locked-in' syndrome. It is also useful for brain-computer interface (BCI) techniques not involving articulatory actions. Previous work has involved using particular words in one chosen language and training classifiers to distinguish between them. Such studies have reported accuracies of 40-60% and are not ideal for practical implementation. Furthermore, in today's multilingual society, classifiers trained in one language alone might not always have the desired effect. To address this, we present a novel approach to improve the accuracy of the current model by combining bilingual interpretation and decision making. We collect data from 5 subjects with Hindi and English as primary and secondary languages respectively and ask them 20 'Yes'/'No' questions ('Haan'/'Na' in Hindi) in each language. We choose sensors present in regions important to both language processing and decision making. Data are preprocessed, and Principal Component Analysis (PCA) is carried out to reduce dimensionality. The result is input to Support Vector Machine (SVM), Random Forest (RF), AdaBoost (AB), and Artificial Neural Network (ANN) classifiers for prediction. Experimental results reveal best accuracies of 85.20% and 92.18% for decision and language classification respectively using ANN. Overall accuracy of bilingual speech classification is 75.38%.


Subject(s)
Electroencephalography , Speech , Brain-Computer Interfaces , Humans , Principal Component Analysis , Support Vector Machine