1.
Comput Struct Biotechnol J ; 23: 1886-1896, 2024 Dec.
Article En | MEDLINE | ID: mdl-38721585

Recent advances in single-cell omics technology have transformed the landscape of cellular and molecular research, enriching the scope and intricacy of cellular characterisation. Perturbation modelling seeks to comprehensively grasp the effects of external influences such as disease onset, molecular knock-outs, or external stimulants on cellular physiology, specifically on transcription factors, signal transducers, biological pathways, and dynamic cell states. Machine and deep learning tools transform complex perturbational phenomena into algorithmically tractable tasks to formulate predictions based on various types of single-cell datasets. However, the recent surge in tools and datasets makes it challenging for experimental biologists and computational scientists to keep track of the recent advances in this rapidly expanding field of single-cell modelling. Here, we recapitulate the main objectives of perturbation modelling and summarise novel single-cell perturbation technologies based on genetic manipulation like CRISPR or compounds, spanning across omic modalities. We then concisely review a burgeoning group of computational methods extending from classical statistical inference methodologies to various machine and deep learning architectures like shallow models or autoencoders, to biologically informed approaches based on gene regulatory networks, and to combinatorial efforts reminiscent of ensemble learning. We also discuss the rising trend of large foundational models in single-cell perturbation modelling inspired by large language models. Lastly, we critically assess the challenges that underlie single-cell perturbation modelling while pointing towards relevant future perspectives like perturbation atlases, multi-omics and spatial datasets, causal machine learning for interpretability, multi-task learning for performance and explainability as well as prospects for solving interoperability and benchmarking pitfalls.

2.
Microb Genom ; 10(5)2024 May.
Article En | MEDLINE | ID: mdl-38785221

Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic 'novel' lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1 % frequency, results were more reliable above a 5 % threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increases the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and to appreciate the commonalities and differences across methods.


COVID-19 , Genome, Viral , SARS-CoV-2 , Wastewater , Wastewater/virology , SARS-CoV-2/genetics , SARS-CoV-2/classification , COVID-19/virology , COVID-19/epidemiology , Humans , Computational Biology/methods , Genomics/methods , Wastewater-Based Epidemiological Monitoring , Phylogeny
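
The deconvolution task described in this abstract can be illustrated with a minimal sketch. It assumes each lineage is defined by a set of signature mutations and that the observed allele frequency of a mutation is roughly the summed abundance of the lineages carrying it. Relative abundances are then estimated by non-negative least squares, solved here with plain projected gradient descent. The mutation rows, the 0/1 signature matrix and the observed frequencies are hypothetical placeholders, and this is not the method of any of the nine tools assessed in the study.

```python
# Hypothetical signature matrix: one row per mutation, one column per lineage
# (column order: BA.1, BA.2, Delta). 1 means the lineage carries the mutation.
SIGNATURES = [
    [1, 1, 0],  # placeholder mutation shared by BA.1 and BA.2
    [1, 0, 0],  # placeholder mutation specific to BA.1
    [0, 1, 0],  # placeholder mutation specific to BA.2
    [0, 0, 1],  # placeholder mutation specific to Delta
    [0, 0, 1],  # second placeholder mutation specific to Delta
]

# Hypothetical allele frequencies observed in one mixed sample.
OBSERVED = [0.8, 0.5, 0.3, 0.2, 0.2]


def estimate_abundances(signatures, observed, iterations=2000):
    """Estimate lineage abundances by non-negative least squares.

    Minimises ||A x - b||^2 subject to x >= 0 using projected gradient
    descent, then rescales x so the abundances sum to 1.
    """
    n_lineages = len(signatures[0])
    # Conservative step size: the sum of squared entries bounds the
    # largest eigenvalue of A^T A from above.
    step = 1.0 / sum(v * v for row in signatures for v in row)
    x = [0.0] * n_lineages
    for _ in range(iterations):
        # residual = A x - b
        residual = [
            sum(a * xi for a, xi in zip(row, x)) - b
            for row, b in zip(signatures, observed)
        ]
        # gradient (up to a factor of 2) = A^T residual
        grad = [
            sum(row[j] * r for row, r in zip(signatures, residual))
            for j in range(n_lineages)
        ]
        # Gradient step followed by projection onto x >= 0
        x = [max(0.0, xi - step * g) for xi, g in zip(x, grad)]
    total = sum(x)
    return [xi / total for xi in x] if total > 0 else x


if __name__ == "__main__":
    for name, value in zip(["BA.1", "BA.2", "Delta"], estimate_abundances(SIGNATURES, OBSERVED)):
        print(f"{name}: {value:.3f}")
```

In real samples the observed frequencies are noisy and a lineage absent from the signature matrix (such as the synthetic 'novel' lineage in the study) cannot be represented, which is one intuitive reason its presence can distort the estimates for the known lineages.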
3.
Comput Biol Med ; 177: 108632, 2024 May 21.
Article En | MEDLINE | ID: mdl-38788373

Machine Learning (ML) and Artificial Intelligence (AI) have become an integral part of the drug discovery and development value chain. Many teams in the pharmaceutical industry nevertheless report the challenges associated with the timely, cost effective and meaningful delivery of ML and AI powered solutions for their scientists. We sought to better understand what these challenges were and how to overcome them by performing an industry wide assessment of the practices in AI and Machine Learning. Here we report results of the systematic business analysis of the personas in the modern pharmaceutical discovery enterprise in relation to their work with the AI and ML technologies. We identify 23 common business problems that individuals in these roles face when they encounter AI and ML technologies at work, and describe best practices (Good Machine Learning Practices) that address these issues.

4.
Leukemia ; 2024 Apr 04.
Article En | MEDLINE | ID: mdl-38575671

The NFKBIE gene, which encodes the NF-κB inhibitor IκBε, is mutated in 3-7% of patients with chronic lymphocytic leukemia (CLL). The most recurrent alteration is a 4-bp frameshift deletion associated with NF-κB activation in leukemic B cells and poor clinical outcome. To study the functional consequences of NFKBIE gene inactivation, both in vitro and in vivo, we engineered CLL B cells and CLL-prone mice to stably down-regulate NFKBIE expression and investigated its role in controlling NF-κB activity and disease expansion. We found that IκBε loss leads to NF-κB pathway activation and promotes both migration and proliferation of CLL cells in a dose-dependent manner. Importantly, NFKBIE inactivation was sufficient to induce a more rapid expansion of the CLL clone in lymphoid organs and contributed to the development of an aggressive disease with a shortened survival in both xenografts and genetically modified mice. IκBε deficiency was associated with an alteration of the MAPK pathway, also confirmed by RNA-sequencing in NFKBIE-mutated patient samples, and resistance to the BTK inhibitor ibrutinib. In summary, our work underscores the multimodal relevance of the NF-κB pathway in CLL and paves the way to translate these findings into novel therapeutic options.

5.
Front Mol Neurosci ; 16: 1280546, 2023.
Article En | MEDLINE | ID: mdl-38125008

Spinocerebellar ataxia type 1 (SCA1) is an autosomal dominant neurodegenerative disease caused by a trinucleotide (CAG) repeat expansion in the ATXN1 gene. It is characterized by the presence of polyglutamine (polyQ) intranuclear inclusion bodies (IIBs) within affected neurons. In order to investigate the impact of polyQ IIBs in SCA1 pathogenesis, we generated a novel protein aggregation model by inducible overexpression of the mutant ATXN1(Q82) isoform in human neuroblastoma SH-SY5Y cells. Moreover, we developed a simple and reproducible protocol for the efficient isolation of insoluble IIBs. Biophysical characterization showed that polyQ IIBs are enriched in RNA molecules which were further identified by next-generation sequencing. Finally, a protein interaction network analysis indicated that sequestration of essential RNA transcripts within ATXN1(Q82) IIBs may affect the ribosome resulting in error-prone protein synthesis and global proteome instability. These findings provide novel insights into the molecular pathogenesis of SCA1, highlighting the role of polyQ IIBs and their impact on critical cellular processes.

6.
Front Microbiol ; 14: 1292230, 2023.
Article En | MEDLINE | ID: mdl-38098662

Increasing evidence supports a role for the vaginal microbiome (VM) in the severity of HPV infection and its potential link to cervical intraepithelial neoplasia. However, much remains unclear regarding the precise role of certain bacteria in the context of HPV positivity and persistence of infection. Here, using next generation sequencing (NGS), we comprehensively profiled the VM in a series of 877 women who tested positive for at least one high risk HPV (hrHPV) type with the COBAS® 4800 assay, after self-collection of a cervico-vaginal sample. Starting from gDNA, we PCR amplified the V3-V4 region of the bacterial 16S rRNA gene and applied a paired-end NGS protocol (Illumina). We report significant differences in the abundance of certain bacteria compared among different HPV-types, more particularly concerning species assigned to Lacticaseibacillus, Megasphaera and Sneathia genera. Especially for Lacticaseibacillus, we observed significant depletion in the case of HPV16, HPV18 versus hrHPVother. Overall, our results suggest that the presence or absence of specific cervicovaginal microbial genera may be linked to the observed severity in hrHPV infection, particularly in the case of HPV16, 18 types.

7.
Front Bioinform ; 3: 1275593, 2023.
Article En | MEDLINE | ID: mdl-38025398

Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of High-throughput Sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and Germline variant calling experiments. We implemented these workflows in Common Workflow Language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on Chronic Lymphocytic Leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective gold-standard truth sets. Findings: We demonstrated that CWL-implemented workflows clearly achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. Conclusion: CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they provide the ability to overcome issues of software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses.

9.
J Immunol ; 211(5): 743-754, 2023 09 01.
Article En | MEDLINE | ID: mdl-37466373

Subset #201 is a clinically indolent subgroup of patients with chronic lymphocytic leukemia defined by the expression of stereotyped, mutated IGHV4-34/IGLV1-44 BCR Ig. Subset #201 is characterized by recurrent somatic hypermutations (SHMs) that frequently lead to the creation and/or disruption of N-glycosylation sites within the Ig H and L chain variable domains. To understand the relevance of this observation, using next-generation sequencing, we studied how SHM shapes the subclonal architecture of the BCR Ig repertoire in subset #201, particularly focusing on changes in N-glycosylation sites. Moreover, we profiled the Ag reactivity of the clonotypic BCR Ig expressed as rmAbs. We found that almost all analyzed cases from subset #201 carry SHMs potentially affecting N-glycosylation at the clonal and/or subclonal level and obtained evidence for N-glycan occupancy in SHM-induced novel N-glycosylation sites. These particular SHMs impact (auto)antigen recognition, as indicated by differences in Ag reactivity between the authentic rmAbs and germline revertants of SHMs introducing novel N-glycosylation sites in experiments entailing 1) flow cytometry for binding to viable cells, 2) immunohistochemistry against various human tissues, 3) ELISA against microbial Ags, and 4) protein microarrays testing reactivity against multiple autoantigens. On these grounds, N-glycosylation appears as relevant for the natural history of at least a fraction of Ig-mutated chronic lymphocytic leukemia. Moreover, subset #201 emerges as a paradigmatic case for the role of affinity maturation in the evolution of Ag reactivity of the clonotypic BCR Ig.


Leukemia, Lymphocytic, Chronic, B-Cell , Humans , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, B-Cell/metabolism , Glycosylation , Antigens/metabolism
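
The creation and disruption of N-glycosylation sites by SHM, as described in this abstract, can be illustrated with a short sketch. It relies on the canonical N-glycosylation sequon N-X-S/T, where X is any amino acid except proline, and compares a germline sequence with its mutated counterpart. The two sequences below are hypothetical placeholders rather than real IGHV4-34 or IGLV1-44 sequences, and the comparison assumes equal-length sequences that differ only by substitutions (no insertions or deletions).

```python
import re

# Canonical N-glycosylation sequon: N, any residue except P, then S or T.
# The lookahead lets overlapping sequons be reported individually.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return the 0-based positions of the asparagine in each sequon."""
    return [match.start() for match in SEQUON.finditer(protein.upper())]


def compare_sequons(germline, mutated):
    """Return (created, disrupted) sequon positions, both as sorted lists.

    Assumes both sequences have the same length and differ only by
    substitutions, so positions are directly comparable.
    """
    if len(germline) != len(mutated):
        raise ValueError("sequences must be the same length")
    before = set(find_sequons(germline))
    after = set(find_sequons(mutated))
    return sorted(after - before), sorted(before - after)


# Hypothetical example sequences (not real immunoglobulin sequences).
GERMLINE = "AKNGSLQVTS"
MUTATED = "AKNGALNVTS"

if __name__ == "__main__":
    created, disrupted = compare_sequons(GERMLINE, MUTATED)
    print("created:", created)
    print("disrupted:", disrupted)
```

A sequon only marks a potential site; whether a glycan is actually attached has to be shown experimentally, which is what the evidence for N-glycan occupancy reported in the abstract addresses.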
11.
Mediterr J Rheumatol ; 34(1): 117-120, 2023 Mar.
Article En | MEDLINE | ID: mdl-37223601

Background: Age-associated B cells (ABCs) constitute a B cell subset, defined as CD19+CD21-CD11c+, that expands continuously with age and accumulates strongly in individuals with autoimmune and/or infectious diseases. In humans, ABCs are principally IgD-CD27- double-negative (DN) B cells. Data from murine models of autoimmunity implicate ABCs/DN in the development of autoimmune disorders. T-bet, a transcription factor which is highly expressed in these cells, is considered to play a major role in various aspects of autoimmunity, such as the production of autoantibodies and the formation of spontaneous germinal centres. Aims of the study: Despite the available data, the functional features of ABCs/DN and their exact role in the pathogenesis of autoimmunity remain elusive. This project focuses on the investigation of the role of ABCs/DN in the pathogenesis of systemic lupus erythematosus (SLE) in humans, as well as the effects that various pharmacological agents may have on these cells. Methods: Samples from patients with active SLE will be used to enumerate and immunophenotype, via flow cytometry, the ABCs/DN found in the peripheral blood of the patients. Transcriptomic analysis and functional assays for the cells, both before and after in vitro pharmacological treatments, will also be performed. Anticipated benefits: The results of the study are expected to allow characterization of the pathogenetic role of ABCs/DN in SLE and could contribute, following careful association with the clinical state of the patients, towards the discovery and validation of novel prognostic and diagnostic markers of disease.

12.
Blood ; 141(24): 2955-2960, 2023 06 15.
Article En | MEDLINE | ID: mdl-36989492

The chromatin activation landscape of chronic lymphocytic leukemia (CLL) with stereotyped B-cell receptor immunoglobulin is currently unknown. In this study, we report the results of a whole-genome chromatin profiling of histone 3 lysine 27 acetylation of 22 CLLs from major subsets, which were compared against nonstereotyped CLLs and normal B-cell subpopulations. Although subsets 1, 2, and 4 did not differ much from their nonstereotyped CLL counterparts, subset 8 displayed a remarkably distinct chromatin activation profile. In particular, we identified 209 de novo active regulatory elements in this subset, which showed similar patterns with U-CLLs undergoing Richter transformation. These regions were enriched for binding sites of 9 overexpressed transcription factors. In 78 of 209 regions, we identified 113 candidate overexpressed target genes, 11 regions being associated with more than 2 adjacent genes. These included blocks of up to 7 genes, suggesting local co-upregulation within the same genome compartment. Our findings further underscore the uniqueness of subset 8 CLL, notable for the highest risk of Richter's transformation among all CLLs, and provide additional clues to decipher the molecular basis of its clinical behavior.


Leukemia, Lymphocytic, Chronic, B-Cell , Lymphoma, Large B-Cell, Diffuse , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Chromatin/genetics , B-Lymphocytes , Receptors, Antigen, B-Cell/genetics
13.
Front Oncol ; 13: 1097942, 2023.
Article En | MEDLINE | ID: mdl-36816924

Background: Microenvironmental interactions of the malignant clone with T cells are critical throughout the natural history of chronic lymphocytic leukemia (CLL). Indeed, clonal expansions of T cells and shared clonotypes exist between different CLL patients, strongly implying clonal selection by antigens. Moreover, immunogenic neoepitopes have been isolated from the clonotypic B cell receptor immunoglobulin sequences, offering a rationale for immunotherapeutic approaches. Here, we interrogated the T cell receptor (TR) gene repertoire of CLL patients with different genomic aberration profiles aiming to identify unique signatures that would point towards an additional source of immunogenic neoepitopes for T cells. Experimental design: TR gene repertoire profiling using next generation sequencing in groups of patients with CLL carrying one of the following copy-number aberrations (CNAs): del(11q), del(17p), del(13q), trisomy 12, or gene mutations in TP53 or NOTCH1. Results: Oligoclonal expansions were found in all patients with distinct recurrent genomic aberrations; these were more pronounced in cases bearing CNAs, particularly trisomy 12, rather than gene mutations. Shared clonotypes were found both within and across groups, which appeared to be CLL-biased based on extensive comparisons against TR databases from various entities. Moreover, in silico analysis identified TR clonotypes with high binding affinity to neoepitopes predicted to arise from TP53 and NOTCH1 mutations. Conclusions: Distinct TR repertoire profiles were identified in groups of patients with CLL bearing different genomic aberrations, alluding to distinct selection processes. Abnormal protein expression and gene dosage effects associated with recurrent genomic aberrations likely represent a relevant source of CLL-specific selecting antigens.

14.
PLoS Comput Biol ; 19(1): e1010752, 2023 01.
Article En | MEDLINE | ID: mdl-36622853

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. New features have been added to facilitate the use of the materials in a classroom setting, simplifying the contribution flow for new materials, and have added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.


Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel
15.
Sci Data ; 9(1): 622, 2022 10 14.
Article En | MEDLINE | ID: mdl-36241754

Research software is a fundamental and vital part of research, yet significant challenges to discoverability, productivity, quality, reproducibility, and sustainability exist. Improving the practice of scholarship is a common goal of the open science, open source, and FAIR (Findable, Accessible, Interoperable and Reusable) communities and research software is now being understood as a type of digital object to which FAIR should be applied. This emergence reflects a maturation of the research community to better understand the crucial role of FAIR research software in maximising research value. The FAIR for Research Software (FAIR4RS) Working Group has adapted the FAIR Guiding Principles to create the FAIR Principles for Research Software (FAIR4RS Principles). The contents and context of the FAIR4RS Principles are summarised here to provide the basis for discussion of their adoption. Examples of implementation by organisations are provided to share information on how to maximise the value of research outputs, and to encourage others to amplify the importance and impact of this work.

16.
Brief Bioinform ; 23(5)2022 09 20.
Article En | MEDLINE | ID: mdl-36044248

Intraclonal diversification (ID) within the immunoglobulin (IG) genes expressed by B cell clones arises due to ongoing somatic hypermutation (SHM) in a context of continuous interactions with antigen(s). Defining the nature and order of appearance of SHMs in the IG genes can assist in improved understanding of the ID process, shedding light into the ontogeny and evolution of B cell clones in health and disease. Such endeavor is empowered thanks to the introduction of high-throughput sequencing in the study of IG gene repertoires. However, few existing tools allow the identification, quantification and characterization of SHMs related to ID, all of which have limitations in their analysis, highlighting the need for developing a purpose-built tool for the comprehensive analysis of the ID process. In this work, we present the immunoglobulin intraclonal diversification analysis (IgIDivA) tool, a novel methodology for the in-depth qualitative and quantitative analysis of the ID process from high-throughput sequencing data. IgIDivA identifies and characterizes SHMs that occur within the variable domain of the rearranged IG genes and studies in detail the connections between identified SHMs, establishing mutational pathways. Moreover, it combines established and new graph-based metrics for the objective determination of ID level, combined with statistical analysis for the comparison of ID level features for different groups of samples. Of importance, IgIDivA also provides detailed visualizations of ID through the generation of purpose-built graph networks. Beyond the method design, IgIDivA has been also implemented as an R Shiny web application. IgIDivA is freely available at https://bio.tools/igidiva.


Genes, Immunoglobulin , Immunoglobulins , B-Lymphocytes , Clone Cells , High-Throughput Nucleotide Sequencing , Immunoglobulins/genetics
17.
Methods Mol Biol ; 2453: 585-603, 2022.
Article En | MEDLINE | ID: mdl-35622343

The study of antigen receptor gene repertoires using next-generation sequencing (NGS) technologies has disclosed an unprecedented depth of complexity, requiring novel computational and analytical solutions. Several bioinformatics workflows have been developed to this end, including the T-cell receptor/immunoglobulin profiler (TRIP), a web application implemented in R Shiny, specifically designed for the purposes of comprehensive repertoire analysis, which is the focus of this chapter. TRIP has the potential to perform robust immunoprofiling analysis through the extraction and processing of the IMGT/HighV-Quest output, via a series of functions, ensuring the analysis of high-quality, biologically relevant data through a multilevel process of data filtering. Subsequently, it provides in-depth analysis of antigen receptor gene rearrangements, including (a) clonality assessment; (b) extraction of variable (V), diversity (D), and joining (J) gene repertoires; (c) CDR3 characterization at both the nucleotide and amino acid level; and (d) somatic hypermutation analysis, in the case of immunoglobulin gene rearrangements. Notably, TRIP enables a high level of customization through the integration of various options in key aspects of the analysis, such as clonotype definition and computation, hence allowing for flexibility without compromising on accuracy.


Data Analysis , Immunoglobulins , Computational Biology , High-Throughput Nucleotide Sequencing , Immunoglobulins/genetics , Receptors, Antigen, T-Cell/genetics , Software
18.
Blood Adv ; 6(8): 2646-2656, 2022 04 26.
Article En | MEDLINE | ID: mdl-35235952

The TA-isoform of the p63 transcription factor (TAp63) has been reported to contribute to clinical aggressiveness in chronic lymphocytic leukemia (CLL) in a hitherto elusive way. Here, we sought to further understand and define the role of TAp63 in the pathophysiology of CLL. First, we found that elevated TAp63 expression levels are linked with adverse clinical outcomes, including disease relapse and shorter time-to-first treatment and overall survival. Next, prompted by the fact that TAp63 participates in an NF-κB/TAp63/BCL2 antiapoptotic axis in activated mature, normal B cells, we explored molecular links between TAp63 and BCL2 also in CLL. We documented a strong correlation at both the protein and the messenger RNA (mRNA) levels, alluding to the potential prosurvival role of TAp63. This claim was supported by inducible downregulation of TAp63 expression in the MEC1 CLL cell line using clustered regularly interspaced short palindromic repeats (CRISPR) system, which resulted in downregulation of BCL2 expression. Next, using chromatin immunoprecipitation (ChIP) sequencing, we examined whether BCL2 might constitute a transcriptional target of TAp63 and identified a significant binding profile of TAp63 in the BCL2 gene locus, across a genomic region previously characterized as a super enhancer in CLL. Moreover, we identified high-confidence TAp63 binding regions in genes mainly implicated in immune response and DNA-damage procedures. Finally, we found that upregulated TAp63 expression levels render CLL cells less responsive to apoptosis induction with the BCL2 inhibitor venetoclax. On these grounds, TAp63 appears to act as a positive modulator of BCL2, hence contributing to the antiapoptotic phenotype that underlies clinical aggressiveness and treatment resistance in CLL.


Leukemia, Lymphocytic, Chronic, B-Cell , Apoptosis/genetics , Gene Expression Regulation , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Proto-Oncogene Proteins c-bcl-2/genetics , Proto-Oncogene Proteins c-bcl-2/metabolism , Transcription Factors , Tumor Suppressor Proteins/metabolism
19.
Sci Rep ; 12(1): 2659, 2022 02 17.
Article En | MEDLINE | ID: mdl-35177697

The COVID-19 pandemic represents an unprecedented global crisis necessitating novel approaches for, amongst others, early detection of emerging variants relating to the evolution and spread of the virus. Recently, the detection of SARS-CoV-2 RNA in wastewater has emerged as a useful tool to monitor the prevalence of the virus in the community. Here, we propose a novel methodology, called lineagespot, for the monitoring of mutations and the detection of SARS-CoV-2 lineages in wastewater samples using next-generation sequencing (NGS). Our proposed method was tested and evaluated using NGS data produced by the sequencing of 14 wastewater samples from the municipality of Thessaloniki, Greece, covering a 6-month period. The results showed the presence of SARS-CoV-2 variants in wastewater data. lineagespot was able to record the evolution and rapid domination of the Alpha variant (B.1.1.7) in the community, and allowed the correlation between the mutations evident through our approach and the mutations observed in patients from the same area and time periods. lineagespot is an open-source tool, implemented in R, and is freely available on GitHub and registered on bio.tools.


Mutation , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Software , Wastewater/virology , Humans
20.
NAR Genom Bioinform ; 4(1): lqab121, 2022 Mar.
Article En | MEDLINE | ID: mdl-35047813

The integration of multi-omics data can greatly facilitate the advancement of research in Life Sciences by highlighting new interactions. However, there is currently no widespread procedure for meaningful multi-omics data integration. Here, we present a robust framework, called InterTADs, for integrating multi-omics data derived from the same sample, and considering the chromatin configuration of the genome, i.e. the topologically associating domains (TADs). Following the integration process, statistical analysis highlights the differences between the groups of interest (normal versus cancer cells) relating to (i) independent and (ii) integrated events through TADs. Finally, enrichment analysis using KEGG database, Gene Ontology and transcription factor binding sites and visualization approaches are available. We applied InterTADs to multi-omics datasets from 135 patients with chronic lymphocytic leukemia (CLL) and found that the integration through TADs resulted in a dramatic reduction of heterogeneity compared to individual events. Significant differences for individual events and on TADs level were identified between patients differing in the somatic hypermutation status of the clonotypic immunoglobulin genes, the core biological stratifier in CLL, attesting to the biomedical relevance of InterTADs. In conclusion, our approach suggests a new perspective towards analyzing multi-omics data, by offering reasonable execution time, biological benchmarking and potentially contributing to pattern discovery through TADs.
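
The idea of integrating events through TADs can be illustrated with a minimal sketch that groups genomic events by the TAD containing them. It assumes TADs are given as sorted, non-overlapping, half-open intervals per chromosome and that each event is reduced to a single position and value. All coordinates, values and function names below are hypothetical, and this is not the InterTADs implementation, whose actual integration and statistics are described in the article.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_to_tads(tads, events):
    """Group event values by the TAD that contains each event.

    tads:   {chrom: sorted list of (start, end)}, half-open and non-overlapping.
    events: iterable of (chrom, position, value) from any omics layer.
    Returns {(chrom, start, end): [values]}; events outside any TAD are dropped.
    """
    starts = {chrom: [s for s, _ in intervals] for chrom, intervals in tads.items()}
    grouped = defaultdict(list)
    for chrom, position, value in events:
        if chrom not in tads:
            continue
        # Index of the last TAD whose start is <= position
        i = bisect_right(starts[chrom], position) - 1
        if i < 0:
            continue
        start, end = tads[chrom][i]
        if position < end:
            grouped[(chrom, start, end)].append(value)
    return dict(grouped)


# Hypothetical TAD boundaries and events (for example expression or methylation values).
TADS = {"chr1": [(0, 1000), (1000, 2500)], "chr2": [(500, 900)]}
EVENTS = [
    ("chr1", 10, 0.2),
    ("chr1", 999, 0.4),
    ("chr1", 1000, 1.0),
    ("chr2", 100, 5.0),   # upstream of the only chr2 TAD, dropped
    ("chr2", 899, 3.0),
    ("chrX", 5, 9.9),     # chromosome without TAD annotation, dropped
]

if __name__ == "__main__":
    for tad, values in assign_to_tads(TADS, EVENTS).items():
        print(tad, sum(values) / len(values))
```

Summarising per TAD, for instance by the mean as in the usage lines above, gives one value per domain and sample that can then be compared between groups.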

...