Results 1 - 20 of 3,010
1.
Front Mol Biosci ; 11: 1467366, 2024.
Article in English | MEDLINE | ID: mdl-39351155

ABSTRACT

3D cell culture models replicate tissue complexity and aim to study cellular interactions and responses in a more physiologically relevant environment compared to traditional 2D cultures. However, the spherical structure of these models makes it difficult to extract meaningful data, necessitating advanced techniques for proper analysis. In silico simulations enhance research by predicting cellular behaviors and therapeutic responses, providing a powerful tool to complement experimental approaches. Despite their potential, these simulations often require advanced computational skills and significant resources, which creates a barrier for many researchers. To address these challenges, we developed an accessible pipeline using open-source software to facilitate virtual tissue simulations. Our approach employs the Cellular Potts Model, a versatile framework for simulating cellular behaviors in tissues. The simulations are constructed from real world 3D image stacks of cancer spheroids, ensuring that the virtual models are rooted in experimental data. By introducing a new metric for parameter optimization, we enable the creation of realistic simulations without requiring extensive computational expertise. This pipeline benefits researchers wanting to incorporate computational biology into their methods, even if they do not possess extensive expertise in this area. By reducing the technical barriers associated with advanced computational modeling, our pipeline enables more researchers to utilize these powerful tools. Our approach aims to foster a broader use of in silico methods in disease research, contributing to a deeper understanding of disease biology and the refinement of therapeutic interventions.

2.
Elife ; 13, 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39383060

ABSTRACT

Assay for Transposase-Accessible Chromatin sequencing (ATAC-Seq) is a widely used technique to explore gene regulatory mechanisms. For most ATAC-Seq data from healthy and diseased tissues such as tumors, chromatin accessibility measurement represents a mixed signal from multiple cell types. In this work, we derive reliable chromatin accessibility marker peaks and reference profiles for most non-malignant cell types frequently observed in the microenvironment of human tumors. We then integrate these data into the EPIC deconvolution framework (Racle et al., 2017) to quantify cell-type heterogeneity in bulk ATAC-Seq data. Our EPIC-ATAC tool accurately predicts non-malignant and malignant cell fractions in tumor samples. When applied to a human breast cancer cohort, EPIC-ATAC accurately infers the immune contexture of the main breast cancer subtypes.


Subject(s)
Breast Neoplasms , Chromatin Immunoprecipitation Sequencing , Humans , Breast Neoplasms/genetics , Breast Neoplasms/immunology , Chromatin Immunoprecipitation Sequencing/methods , Tumor Microenvironment , Female , Chromatin/metabolism , Chromatin/genetics , Neoplasms/genetics , Neoplasms/immunology
3.
Elife ; 13, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39388235

ABSTRACT

Variant calling is fundamental in bacterial genomics, underpinning the identification of disease transmission clusters, the construction of phylogenetic trees, and antimicrobial resistance detection. This study presents a comprehensive benchmarking of variant calling accuracy in bacterial genomes using Oxford Nanopore Technologies (ONT) sequencing data. We evaluated three ONT basecalling models and both simplex (single-strand) and duplex (dual-strand) read types across 14 diverse bacterial species. Our findings reveal that deep learning-based variant callers, particularly Clair3 and DeepVariant, significantly outperform traditional methods and even exceed the accuracy of Illumina sequencing, especially when applied to ONT's super-high accuracy model. ONT's superior performance is attributed to its ability to overcome Illumina's errors, which often arise from difficulties in aligning reads in repetitive and variant-dense genomic regions. Moreover, the use of high-performing variant callers with ONT's super-high accuracy data mitigates ONT's traditional errors in homopolymers. We also investigated the impact of read depth on variant calling, demonstrating that 10× depth of ONT super-accuracy data can achieve precision and recall comparable to, or better than, full-depth Illumina sequencing. These results underscore the potential of ONT sequencing, combined with advanced variant calling algorithms, to replace traditional short-read sequencing methods in bacterial genomics, particularly in resource-limited settings.


Imagine being part of a public health institution when, suddenly, cases of Salmonella surge across your country. You are facing an outbreak of this foodborne disease, and the clock is ticking. People are consuming a contaminated product that is making them sick; how do you identify related cases, track the source of the infection, and shut down its production? In situations like these, scientists need to tell apart even closely related strains of the same bacterial species. This process, known as variant calling, relies on first analysing (or 'sequencing') the genetic information obtained from the bacteria of interest, then comparing it to a reference genome. Currently, two main approaches are available for genome sequencing. Traditional 'short-read' technologies tend to be more accurate but less reliable when covering certain types of genomic regions. New 'long-read' approaches can bypass these limitations though they have historically been less accurate. Comparison with a reference genome can be performed using a tool known as a variant caller. Many of the most effective ones are now based on artificial intelligence approaches such as deep learning. However, these have primarily been applied to human genomic data so far; it therefore remains unclear whether they are equally useful for bacterial genomes. In response, Hall et al. set out to investigate the accuracy of four deep learning-based and three traditional variant callers on datasets from 14 bacterial species obtained via long-read approaches. Their respective performance was also benchmarked against a more conventional approach representing a standard of accuracy (that is, a popular, non-deep learning variant caller used on short-read datasets). These analyses were performed on a 'truthset' established by Hall et al., a collection of validated data that allowed them to assess the performance of the various tools tested. 
The results show that, in this context, the deep learning variant callers more accurately detected genetic variations compared to the traditional approach. These tools, which could be run on standard laptops, were effective even with low amounts of sequencing data, making them useful even in settings where resources are limited. Variant calling is an essential step in tracking the emergence and spread of disease, identifying new strains of bacteria, and examining their evolution. The findings by Hall et al. should therefore benefit various sectors, particularly clinical and public health laboratories.
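
The precision and recall figures mentioned in this abstract come from comparing each caller's output against a validated truthset. The following is a minimal sketch of that comparison, assuming variants can be matched exactly on contig, position, reference allele, and alternate allele; the `Variant` tuple, the function name, and the toy data are hypothetical and are not taken from the study, which may normalise and match variants differently.

```python
from typing import Iterable, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this illustration."""
    contig: str
    pos: int
    ref: str
    alt: str


def precision_recall_f1(calls: Iterable[Variant],
                        truth: Iterable[Variant]) -> Tuple[float, float, float]:
    """Score a call set against a truthset using exact-match comparison."""
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)   # called and present in the truthset
    fp = len(call_set - truth_set)   # called but absent from the truthset
    fn = len(truth_set - call_set)   # in the truthset but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumed for illustration): two correct calls, one false call, one miss.
truth = [Variant("chr", 10, "A", "G"),
         Variant("chr", 25, "C", "T"),
         Variant("chr", 40, "G", "GA")]
calls = [Variant("chr", 10, "A", "G"),
         Variant("chr", 25, "C", "T"),
         Variant("chr", 99, "T", "C")]

precision, recall, f1 = precision_recall_f1(calls, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Exact matching is the simplest possible choice; it undercounts true positives when the same indel is represented differently in the two sets, which is why real benchmarks usually normalise variants first.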


Subject(s)
Bacteria , Benchmarking , Deep Learning , Genome, Bacterial , Nanopore Sequencing , Nanopore Sequencing/methods , Bacteria/genetics , Bacteria/classification , Nanopores , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Genetic Variation
4.
Elife ; 12, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39374133

ABSTRACT

Diffusional kurtosis imaging (DKI) is a methodology for measuring the extent of non-Gaussian diffusion in biological tissue, which has shown great promise in clinical diagnosis, treatment planning, and monitoring of many neurological diseases and disorders. However, robust, fast, and accurate estimation of kurtosis from clinically feasible data acquisitions remains a challenge. In this study, we first outline a new accurate approach of estimating mean kurtosis via the sub-diffusion mathematical framework. Crucially, this extension of the conventional DKI overcomes the limitation on the maximum b-value of the latter. Kurtosis and diffusivity can now be simply computed as functions of the sub-diffusion model parameters. Second, we propose a new fast and robust fitting procedure to estimate the sub-diffusion model parameters using two diffusion times without increasing acquisition time as for the conventional DKI. Third, our sub-diffusion-based kurtosis mapping method is evaluated using both simulations and the Connectome 1.0 human brain data. Exquisite tissue contrast is achieved even when the diffusion encoded data is collected in only minutes. In summary, our findings suggest robust, fast, and accurate estimation of mean kurtosis can be realised within a clinically feasible diffusion-weighted magnetic resonance imaging data acquisition time.


Subject(s)
Brain , Diffusion Magnetic Resonance Imaging , Humans , Brain/diagnostic imaging , Diffusion Magnetic Resonance Imaging/methods , Connectome/methods , Image Processing, Computer-Assisted/methods
5.
Front Plant Sci ; 15: 1437118, 2024.
Article in English | MEDLINE | ID: mdl-39372861

ABSTRACT

Introduction: Single-cell RNA-seq (scRNA-seq) technologies have been widely used to reveal the diversity and complexity of cells, and pioneering scRNA-seq studies in plants have been emerging since 2019. However, existing plant studies using scRNA-seq have focused only on the regulation of gene expression. As an essential post-transcriptional mechanism for regulating gene expression, alternative polyadenylation (APA) generates diverse mRNA isoforms with distinct 3' ends through the selective use of different polyadenylation sites in a gene. APA plays important roles in regulating multiple developmental processes in plants, such as flowering time and stress response. Methods: In this study, we developed a pipeline to identify and integrate APA sites from different scRNA-seq data and analyze APA dynamics in single cells. First, high-confidence poly(A) sites in single root cells were identified and quantified. Second, three kinds of APA markers were identified for exploring APA dynamics in single cells: differentially expressed poly(A) sites based on APA site expression, APA markers based on APA usage, and APA switching genes based on 3' UTR (untranslated region) length change. Moreover, cell type annotations of single root cells were refined by integrating both the APA information and the gene expression profile. Results: We comprehensively compiled a single-cell APA atlas from five scRNA-seq studies, covering over 150,000 cells spanning four major tissue branches, twelve cell types, and three developmental stages. Moreover, we quantified the dynamic APA usage in single cells and identified APA markers across tissues and cell types. Further, we integrated complementary information from gene expression and APA profiles to annotate cell types and reveal subtle differences between cell types. 
Discussion: This study reveals that APA provides an additional layer of information for determining cell identity and provides a landscape of APA dynamics during Arabidopsis root development.

6.
Bioinformation ; 20(7): 700-704, 2024.
Article in English | MEDLINE | ID: mdl-39309552

ABSTRACT

Omics studies use large-scale high-throughput data to explain changes underlying different traits or conditions. However, omics analysis often results in long lists of pathways that are difficult to interpret. Therefore, it is of interest to describe a tool named PAVER (Pathway Analysis Visualization with Embedding Representations) for large scale genomic analysis. PAVER curates similar pathways into groups, identifies the pathway most representative of each group, and provides publication-ready intuitive visualizations. PAVER clusters pathways defined by their vector embedding representations and then identifies the term most cosine similar to its respective cluster's average embedding. PAVER can integrate multiple pathway analyses, highlight relevant biological insights, and work with any pathway database.
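
The representative-term step described in this abstract (pick the term most cosine-similar to its cluster's average embedding) can be sketched as follows. This is an illustration only, not PAVER's own code: it assumes cluster labels have already been assigned by some clustering step, and the function names, the two-dimensional toy embeddings, and the pathway names are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def representative_terms(embeddings: Dict[str, List[float]],
                         labels: Dict[str, int]) -> Dict[int, str]:
    """For each cluster, return the term closest (by cosine) to the cluster mean."""
    clusters = defaultdict(list)
    for term, label in labels.items():
        clusters[label].append(term)

    representatives = {}
    for label, terms in clusters.items():
        dim = len(embeddings[terms[0]])
        # Average embedding of the cluster, computed component by component.
        centroid = [sum(embeddings[t][i] for t in terms) / len(terms)
                    for i in range(dim)]
        representatives[label] = max(
            terms, key=lambda t: cosine(embeddings[t], centroid))
    return representatives


# Toy embeddings and pre-assigned cluster labels (assumed for illustration).
embeddings = {
    "glycolysis": [1.0, 0.0],
    "gluconeogenesis": [0.9, 0.1],
    "pentose phosphate pathway": [0.6, 0.4],
    "T cell activation": [0.0, 1.0],
    "B cell activation": [0.2, 0.8],
    "leukocyte activation": [0.1, 0.9],
}
labels = {
    "glycolysis": 0, "gluconeogenesis": 0, "pentose phosphate pathway": 0,
    "T cell activation": 1, "B cell activation": 1, "leukocyte activation": 1,
}

representatives = representative_terms(embeddings, labels)
print(representatives)
```

Real pathway embeddings would have hundreds of dimensions, but the selection rule is the same: the chosen term is an actual member of the cluster rather than a synthetic average, which keeps the label interpretable.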

7.
Methods ; 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39276958

ABSTRACT

The metabolic pathway known as gluconeogenesis, which produces glucose from non-carbohydrate substrates, is essential for maintaining balanced blood sugar levels while fasting. Accurately anticipating gluconeogenesis rates is important for recognizing metabolic disorders and creating efficient treatment strategies. The use of deep learning and machine learning methods to forecast complex biological processes has been gaining popularity in recent years. Understanding both the regulation of the pathway and the possible therapeutic applications of proteins depends on accurately identifying their gluconeogenesis patterns. This article analyzes the use of machine learning and deep learning models to predict gluconeogenesis efficiency. The study also discusses the challenges that come with restricted data availability and model interpretability, as well as possible applications in personalized healthcare, metabolic disease treatment, and drug discovery. The predictor uses statistical moments of the structures of gluconeogenesis and their enzymes, while Random Forest is used as a classifier to ensure the accuracy of this model in identifying the best outcomes. The method was validated using the independent test, self-consistency, 10-fold cross-validation, and jackknife test, which achieved 92.33%, 91.87%, 87.88%, and 87.02%, respectively. An accurate prediction of gluconeogenesis has significant implications for understanding metabolic disorders and developing targeted therapies. This study contributes to the rising field of predictive biology by combining deep learning and machine learning algorithms with metabolic pathways.

8.
Front Immunol ; 15: 1438962, 2024.
Article in English | MEDLINE | ID: mdl-39281674

ABSTRACT

γδ T-cells are a rare population of T-cells with both adaptive and innate-like properties. Despite their low prevalence, they have been implicated in various human diseases. γδ T-cell infiltration has been associated with improved clinical outcomes in solid cancers, prompting renewed interest in understanding their biology. To date, their biology remains elusive due to their low prevalence. The introduction of high-resolution single-cell sequencing has allowed various groups to characterize key effector subsets in various contexts, as well as begin to elucidate key regulatory mechanisms directing the differentiation and activity of these cells. In this review, we discuss some of the insights obtained from single-cell studies of γδ T-cells across various malignancies and highlight some important questions that remain unaddressed.


Subject(s)
Neoplasms , Receptors, Antigen, T-Cell, gamma-delta , Single-Cell Analysis , Humans , Neoplasms/immunology , Single-Cell Analysis/methods , Receptors, Antigen, T-Cell, gamma-delta/metabolism , Receptors, Antigen, T-Cell, gamma-delta/immunology , T-Lymphocyte Subsets/immunology , T-Lymphocyte Subsets/metabolism , Animals , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/metabolism , Tumor Microenvironment/immunology , T-Lymphocytes/immunology
9.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39297879

ABSTRACT

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Since identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect structural variants directly from sequencing data. With advances in whole-genome sequencing (WGS) technologies, a plethora of SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge, with the majority of SV detection methods prone to a high false-positive rate, and no existing method able to precisely detect a full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools which ignore the length and coverage, VISTA overcomes this limitation by executing various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, along with an in-house polymerase chain reaction (PCR)-validated mouse gold standard set. 
VISTA maintained the highest F1 score among top consensus-based tools measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, where the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among other variant callers. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.


Subject(s)
Genome, Human , Genomic Structural Variation , Software , Humans , Whole Genome Sequencing/methods , Algorithms , Genomics/methods , Computational Biology/methods , Genetic Variation
10.
BMC Oral Health ; 24(1): 1154, 2024 Sep 29.
Article in English | MEDLINE | ID: mdl-39343890

ABSTRACT

BACKGROUND: The exact cause of recurrent aphthous stomatitis is still unknown, making it a challenge to develop effective treatments. This study employs computational biology to investigate the molecular basis of recurrent aphthous stomatitis, aiming to identify the nature of the stimuli triggering these ulcers and the type of cell death involved. METHODS: To understand the molecular underpinnings of recurrent aphthous stomatitis, we used the Génie tool for gene identification, targeting those associated with cell death in recurrent aphthous stomatitis. The ToppGene Suite was employed for functional enrichment analysis. We also used Reactome and InteractiVenn for protein integration and prioritization against a PANoptosis gene list, enabling the construction of a protein-protein interaction network to pinpoint key proteins in recurrent aphthous stomatitis pathogenesis. RESULTS: The study's computational approach identified 1,375 protein-coding genes linked to recurrent aphthous stomatitis. Critical among these were proteins responsive to bacterial stimuli, especially high mobility group protein B1 (HMGB1), toll-like receptor 2 (TLR2), and toll-like receptor 4 (TLR4). The enrichment analysis suggested an external biotic factor, likely bacterial, as a triggering agent in recurrent aphthous stomatitis. The protein interaction network highlighted the roles of tumor necrosis factor (TNF), NF-kappa-B essential modulator (IKBKG), and tumor necrosis factor receptor superfamily member 1A (TNFRSF1A), indicating an immunogenic cell death mechanism, potentially PANoptosis, in recurrent aphthous stomatitis. CONCLUSION: The findings propose that bacterial stimuli could trigger recurrent aphthous stomatitis through a PANoptosis-related cell death pathway. This new understanding of recurrent aphthous stomatitis pathogenesis underscores the significance of oral microbiota in the condition. 
Future experimental validation and therapeutic strategy development based on these findings are necessary.


Subject(s)
Computational Biology , Stomatitis, Aphthous , Stomatitis, Aphthous/immunology , Stomatitis, Aphthous/genetics , Humans , HMGB1 Protein/metabolism , HMGB1 Protein/genetics , Toll-Like Receptor 2 , Immunogenic Cell Death , Protein Interaction Maps/genetics , Toll-Like Receptor 4/metabolism
11.
Elife ; 13, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39221782

ABSTRACT

The initially homogeneous epithelium of the early Drosophila embryo differentiates into regional subpopulations with different behaviours and physical properties that are needed for morphogenesis. The factors at the top of the genetic hierarchy that control these behaviours are known, but many of their targets are not. To understand how proteins work together to mediate differential cellular activities, we studied in an unbiased manner the proteomes and phosphoproteomes of the three main cell populations along the dorso-ventral axis during gastrulation using mutant embryos that represent the different populations. We detected 6111 protein groups and 6259 phosphosites, of which 3398 and 3433 were differentially regulated, respectively. The changes in phosphosite abundance did not correlate with changes in host protein abundance, showing phosphorylation to be a regulatory step during gastrulation. Hierarchical clustering of protein groups and phosphosites identified clusters that contain known fate determinants such as Doc1, Sog, Snail, and Twist. The recovery of the appropriate known marker proteins in each of the different mutants we used validated the approach, but also revealed that two mutations that both interfere with the dorsal fate pathway, Toll10B and serpin27aex, do this in very different manners. Diffused network analyses within each cluster point to microtubule components as one of the main groups of regulated proteins. Functional studies on the role of microtubules provide the proof of principle that microtubules have different functions in different domains along the DV axis of the embryo.


Subject(s)
Drosophila Proteins , Phosphoproteins , Proteome , Animals , Proteome/metabolism , Phosphoproteins/metabolism , Phosphoproteins/genetics , Drosophila Proteins/metabolism , Drosophila Proteins/genetics , Gene Expression Regulation, Developmental , Embryo, Nonmammalian/metabolism , Drosophila/embryology , Drosophila/metabolism , Drosophila/genetics , Drosophila melanogaster/metabolism , Drosophila melanogaster/embryology , Drosophila melanogaster/genetics , Phosphorylation , Gastrulation , Body Patterning/genetics
12.
ACS Chem Neurosci ; 15(19): 3543-3562, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39302203

ABSTRACT

Seven treatments are approved for Alzheimer's disease, but five of them only relieve symptoms and do not alter the course of the disease. Aducanumab (Adu) and lecanemab are novel disease-modifying antiamyloid-β (Aβ) human monoclonal antibodies that specifically target the pathophysiology of Alzheimer's disease (AD) and were recently approved for its treatment. However, their administration is associated with serious side effects, and their use is limited to early stages of the disease. Therefore, drug discovery remains of great importance in AD research. To gain new insights into the development of novel drugs for Alzheimer's disease, a combination of techniques was employed, including mutation screening, molecular dynamics, and quantum biochemistry. These were used to outline the interfacial interactions of the Aducanumab::Aβ2-7 complex. Our analysis identified critical stabilizing contacts, revealing up to 40% variation in the affinity of the Adu chains for Aβ2-7 depending on the conformation outlined. Remarkably, two complementarity determining regions (CDRs) of the Adu heavy chain (HCDR3 and HCDR2) and one CDR of the Adu light chain (LCDR3) accounted for approximately 77% of the affinity of Adu for Aβ2-7, confirming their critical role in epitope recognition. A single mutation, originally reported to have the potential to increase the affinity of Adu for Aβ2-7, was shown to decrease its structural stability without increasing the overall binding affinity. Mimetic peptides that have the potential to inhibit Aβ aggregation were designed by using computational outcomes. Our results support the use of these peptides as promising drugs with great potential as inhibitors of Aβ aggregation.


Subject(s)
Alzheimer Disease , Amyloid beta-Peptides , Antibodies, Monoclonal, Humanized , Immunotherapy , Molecular Dynamics Simulation , Mutation , Alzheimer Disease/drug therapy , Alzheimer Disease/metabolism , Alzheimer Disease/genetics , Humans , Antibodies, Monoclonal, Humanized/pharmacology , Amyloid beta-Peptides/metabolism , Immunotherapy/methods , Peptide Fragments/metabolism , Drug Design , Drug Development/methods
13.
Protein Sci ; 33(10): e5180, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39324697

ABSTRACT

Aggrescan4D (A4D) is an advanced computational tool designed for predicting protein aggregation, leveraging structural information and the influence of pH. Building upon its predecessor, Aggrescan3D (A3D), A4D has undergone numerous enhancements aimed at assisting the improvement of protein solubility. This manuscript reviews A4D's updated functionalities and explains the fundamental principles behind its pH-dependent calculations. Additionally, it presents an antibody case study to evaluate its performance in comparison with other structure-based predictors. Notably, A4D integrates advanced protein engineering protocols with pH-dependent calculations, enhancing its utility in advising solubility-enhancing mutations. A4D considers the impact of structural flexibility on aggregation propensities, and includes a large set of precalculated predictions. These capabilities should help to open new avenues for both understanding and managing protein aggregation. A4D is accessible through a dedicated web server at https://biocomp.chem.uw.edu.pl/a4d/.


Subject(s)
Protein Aggregates , Protein Engineering , Hydrogen-Ion Concentration , Protein Engineering/methods , Software , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Solubility
14.
Elife ; 13, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39240985

ABSTRACT

Mass cytometry is a cutting-edge high-dimensional technology for profiling marker expression at the single-cell level, advancing clinical research in immune monitoring. Nevertheless, the vast data generated by cytometry by time-of-flight (CyTOF) poses a significant analytical challenge. To address this, we describe ImmCellTyper (https://github.com/JingAnyaSun/ImmCellTyper), a novel toolkit for CyTOF data analysis. This framework incorporates BinaryClust, an in-house developed semi-supervised clustering tool that automatically identifies main cell types. BinaryClust outperforms existing clustering tools in accuracy and speed, as shown in benchmarks with two datasets of approximately 4 million cells, matching the precision of manual gating by human experts. Furthermore, ImmCellTyper offers various visualisation and analytical tools, spanning from quality control to differential analysis, tailored to users' specific needs for a comprehensive CyTOF data analysis solution. The workflow includes five key steps: (1) batch effect evaluation and correction, (2) data quality control and pre-processing, (3) main cell lineage characterisation and quantification, (4) in-depth investigation of specific cell types; and (5) differential analysis of cell abundance and functional marker expression across study groups. Overall, ImmCellTyper combines expert biological knowledge in a semi-supervised approach to accurately deconvolute well-defined main cell lineages, while maintaining the potential of unsupervised methods to discover novel cell subsets, thus facilitating high-dimensional immune profiling.


Subject(s)
Data Analysis , Flow Cytometry , Single-Cell Analysis , Humans , Flow Cytometry/methods , Single-Cell Analysis/methods , Software , Cluster Analysis
15.
Proc (IEEE Int Conf Healthc Inform) ; 2024: 93-102, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39308639

ABSTRACT

An extension of the longest common substring (LCS) problem between two texts is the enumeration of all LCSs given a minimum length k (ALCS-k), along with their positions in each text. In bioinformatics, an efficient solution to ALCS-k for very long texts (genomes or metagenomes) can provide useful insights to discover genetic signatures responsible for biological mechanisms. The ALCS-k problem has two additional requirements compared to the LCS problem: one is the minimum length k, and the other is that all common strings longer than k must be reported. We present an efficient, two-stage ALCS-k algorithm exploiting the spectrum of text substrings of length k (k-mers). Our approach yields a worst-case time complexity loglinear in the number of k-mers for the first stage, and an average-case complexity loglinear in the number of common k-mers for the second stage (several orders of magnitude smaller than the total k-mer spectrum). The space complexity is linear in the first phase (disk-based), and on average linear in the second phase (disk- and memory-based). Tests performed on genomes for different organisms (including viruses, bacteria and animal chromosomes) show that run times are consistent with our theoretical estimates; further, comparisons with MUMmer4 show an asymptotic advantage with divergent genomes.
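
To make the problem statement concrete, here is a small in-memory sketch of seed-and-extend over shared k-mers. It is not the authors' two-stage, disk-based algorithm and makes no claim to its complexity bounds; it assumes both texts fit in memory and that matches of length at least k should be reported. The function name and the toy sequences are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple


def common_substrings(a: str, b: str, k: int) -> List[Tuple[int, int, int]]:
    """Return (pos_in_a, pos_in_b, length) for every maximal common
    substring of length >= k, found by extending shared k-mer seeds."""
    # Index every k-mer of the first text by its start positions.
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)

    results = set()
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            # If the preceding characters also match, this seed lies inside a
            # longer match that is reported from its leftmost seed instead.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right as far as the texts agree.
            length = k
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            results.add((i, j, length))
    return sorted(results)


# Toy sequences (assumed for illustration).
text_a = "ACGTACGTTT"
text_b = "TTACGTAC"
matches = common_substrings(text_a, text_b, 4)
for pos_a, pos_b, length in matches:
    print(pos_a, pos_b, text_a[pos_a:pos_a + length])
```

Because every k-mer occurrence pair is visited, this sketch degrades badly on repetitive texts; restricting work to the k-mers actually shared between the two texts, as the abstract describes for its second stage, is what keeps the real problem tractable at genome scale.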

16.
Plant Mol Biol ; 114(5): 106, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39316155

ABSTRACT

Photosynthetic proteins play a crucial role in agricultural productivity by harnessing light energy for plant growth. Understanding these proteins, especially within C3 and C4 pathways, holds promise for improving crops in challenging environments. Despite existing models, a comprehensive computational framework specifically targeting plant photosynthetic proteins is lacking. The underutilization of plant datasets in computational algorithms accentuates the gap this study aims to fill by introducing a novel sequence-based computational method for identifying these proteins. The scope of this study encompassed diverse plant species, ensuring comprehensive representation across C3 and C4 pathways. Utilizing six deep learning models and seven shallow learning algorithms, paired with six sequence-derived feature sets followed by a feature selection strategy, this study developed a comprehensive model for prediction of plant-specific photosynthetic proteins. Following 5-fold cross-validation analysis, LightGBM with 65 and 90 LGBM-VIM selected features respectively emerged as the best models for C3 (auROC: 91.78%, auPRC: 92.55%) and C4 (auROC: 99.05%, auPRC: 99.18%) plants. Validation using an independent dataset confirmed the robustness of the proposed model for both C3 (auROC: 87.23%, auPRC: 88.40%) and C4 (auROC: 92.83%, auPRC: 92.29%) categories. Comparison with existing methods demonstrated the superiority of the proposed model in predicting plant-specific photosynthetic proteins. This study further established a free online prediction server, PredPSP (https://iasri-sg.icar.gov.in/predpsp/), to facilitate ongoing efforts for identifying photosynthetic proteins in C3 and C4 plants. Being the first of its kind, this study offers valuable insights into predicting plant-specific photosynthetic proteins, which holds significant implications for plant biology.


Subject(s)
Computational Biology , Photosynthesis , Plant Proteins , Plant Proteins/metabolism , Plant Proteins/genetics , Computational Biology/methods , Plants/metabolism , Algorithms
17.
J Cardiovasc Dev Dis ; 11(9)2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39330331

ABSTRACT

Heart disease continues to be one of the most fatal conditions worldwide. This is in part due to the maladaptive remodeling process by which ischemic cardiac tissue is replaced with a fibrotic scar. Direct cardiac reprogramming presents a unique solution for restoring injured cardiac tissue through the direct conversion of fibroblasts into induced cardiomyocytes, bypassing the transition through a pluripotent state. Since its inception in 2010, direct cardiac reprogramming using the transcription factors Gata4, Mef2c, and Tbx5 has revolutionized the field of cardiac regenerative medicine. Just over a decade later, the field has rapidly evolved through the expansion of identified molecular and genetic factors that can be used to optimize reprogramming efficiency. The integration of computational tools into the study of direct cardiac reprogramming has been critical to this progress. Advancements in transcriptomics, epigenetics, proteomics, genome editing, and machine learning have not only enhanced our understanding of the underlying mechanisms driving this cell fate transition, but have also driven innovations that push direct cardiac reprogramming closer to clinical application. This review article explores how these computational advancements have impacted and continue to shape the field of direct cardiac reprogramming.

18.
World Allergy Organ J ; 17(10): 100964, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39328210

ABSTRACT

Background: Chronic rhinosinusitis with nasal polyps (CRSwNP) is a prevalent inflammatory disorder affecting the upper respiratory tract. Recent studies have indicated an association between CRSwNP and mitochondrial metabolic disorder characterized by impaired metabolic pathways; however, the precise mechanisms remain unclear. This study aims to investigate the mitochondrial-related signature in individuals diagnosed with CRSwNP. Methods: Through the integration of differentially expressed genes (DEGs) with the mitochondrial gene set, differentially expressed mitochondrial-related genes (DEMRGs) were identified. Subsequently, the hub DEMRGs were selected using 4 integrated machine learning algorithms. Immune and mitochondrial characteristics were estimated based on CIBERSORT and ssGSEA algorithms. Bioinformatic findings were confirmed through RT-qPCR, immunohistochemistry, and ELISA for nasal tissues, as well as Western blotting analysis for human nasal epithelial cells (hNECs). The relationship between hub DEMRGs and disease severity was assessed using Spearman correlation analysis. Results: A total of 24 DEMRGs were screened, most of which exhibited lower expression levels in CRSwNP samples. Five hub DEMRGs (ALDH1L1, BCKDHB, CBR3, HMGCS2, and OXR1) were consistently downregulated in both the discovery and validation cohorts. The hub genes showed a high diagnostic performance and were positively correlated with the infiltration of M2 macrophages and resting mast cells. Experimental results confirmed that the 5 genes were downregulated at both the mRNA and protein levels within nasal polyp tissues. Finally, a significant and inverse relationship was identified between the expression levels of these genes and both the Lund-Mackay and Lund-Kennedy scores. 
Conclusion: Our findings systematically unraveled 5 hub markers correlated with mitochondrial metabolism and immune cell infiltration in CRSwNP, suggesting their potential as a basis for designing diagnostic and therapeutic strategies for the disease.
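
The Methods mention Spearman correlation analysis for relating hub gene expression to disease severity scores. A minimal standard-library sketch of that statistic follows, assuming it is computed as the Pearson correlation of average ranks (the usual definition when ties are present). The function names and the expression and score values are hypothetical illustrations, not data from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: relative expression of one gene and a severity score
# for five patients; lower expression accompanies higher severity.
expression = [5.1, 4.2, 3.3, 2.0, 1.5]
severity = [2, 4, 6, 9, 12]

print(f"Spearman rho: {spearman_rho(expression, severity):.3f}")
```

A negative coefficient on such data corresponds to the inverse relationship the Results describe; the sketch does not compute a p-value.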

19.
Best Pract Res Clin Rheumatol ; : 102006, 2024 Sep 26.
Article in English | MEDLINE | ID: mdl-39332994

ABSTRACT

Technological advances and high-throughput biochemical assays are rapidly changing how we formulate and test biological hypotheses, and how we treat patients. Most complex diseases arise on a background of genetic, lifestyle and environmental factors, and manifest themselves as a spectrum of symptoms. To fathom intricate biological processes and their changes from healthy to disease states, we need to systematically integrate and analyze multi-omics datasets, ontologies, and diverse annotations. Without proper management of such complex biological and clinical data, artificial intelligence (AI) algorithms alone cannot be effectively trained, validated, and successfully applied to provide trustworthy and patient-centric diagnosis, prognosis and treatment. Precision medicine requires the effective use of multi-omics approaches, and offers many opportunities for using AI, "big data" analytics, and integrative computational biology workflows. Advances in optical and biochemical assay technologies, including sequencing, mass spectrometry and imaging modalities, have transformed research by empowering us to simultaneously view all genes expressed, identify proteome-wide changes, and assess interacting partners of each individual protein within a dynamically changing biological system, at an individual cell level. While such views are already having an impact on our understanding of healthy and disease conditions, it remains challenging to extract useful information comprehensively and systematically from individual studies, ensure that signal is separated from noise, develop models, and provide hypotheses for further research. Data remain incomplete and are often poorly connected using fragmented biological networks. In addition, statistical and machine learning models are developed at a cohort level and often not validated at the individual patient level.
Combining integrative computational biology and AI has the potential to improve understanding and treatment of diseases by identifying biomarkers and building explainable models characterizing individual patients. From systematic data analysis to more specific diagnostic, prognostic and predictive biomarkers, drug mechanism of action, and patient selection, such analyses influence multiple steps from prevention to disease characterization, and from prognosis to drug discovery. Data mining, machine learning, graph theory and advanced visualization may help identify diagnostic, prognostic and predictive biomarkers, and create causal models of disease. Intertwining computational prediction and modeling with biological experiments leads to faster, more biologically and clinically relevant discoveries. However, computational analysis results and models will be only as accurate and useful as the networks, ontologies and datasets used to build them are correct and comprehensive. High-quality, curated data portals provide the necessary foundation for translational research. They help to identify better biomarkers, new drugs and precision treatments, and should lead to improved patient outcomes and quality of life. Intertwining computational prediction and modeling with biological experiments efficiently and effectively leads to more useful findings faster.

20.
Viruses ; 16(9)2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39339901

ABSTRACT

Computer-aided analysis of proteins or nucleic acids seems like a matter of course nowadays; however, the history of Bioinformatics and Computational Biology is quite recent. The advent of high-throughput sequencing has led to the production of "big data", which has also affected the field of virology. The collaboration between the communities of bioinformaticians and virologists started a few decades ago and was strongly enhanced by the recent SARS-CoV-2 pandemic. In this article, which is the first in a series on how bioinformatics can enhance virus research, we show that highly useful information is retrievable from selected general and dedicated databases. Indeed, an enormous amount of information (both in terms of nucleotide/protein sequences and their annotation) is deposited in the general databases of international organisations participating in the International Nucleotide Sequence Database Collaboration (INSDC). However, more and more virus-specific databases have been established and are progressively enriched with the contents and features reported in this article. Since viruses are obligate intracellular parasites, a special focus is given to host-pathogen protein-protein interaction databases. Finally, we illustrate several phylogenetic and phylodynamic tools, combining information on algorithms and features with practical information on how to use them and case studies that validate their usefulness. Databases and tools for functional inference will be covered in the next article of this series: Bioinformatics goes viral: II. Sequence-based and structure-based functional analyses for boosting virus research.


Subject(s)
Computational Biology , Phylogeny , Computational Biology/methods , Humans , Viruses/genetics , Viruses/classification , SARS-CoV-2/genetics , SARS-CoV-2/classification , Genome, Viral , COVID-19/virology , COVID-19/epidemiology , Databases, Genetic , High-Throughput Nucleotide Sequencing