Results 1 - 5 of 5
1.
Bioinform Adv ; 3(1): vbad016, 2023.
Article in English | MEDLINE | ID: mdl-37143924

ABSTRACT

Motivation: Being able to interpret and explain the predictions made by a machine learning model is of fundamental importance. Unfortunately, a trade-off between accuracy and interpretability is often observed. As a result, the interest in developing more transparent yet powerful models has grown considerably over the past few years. Interpretable models are especially needed in high-stake scenarios, such as computational biology and medical informatics, where erroneous or biased models' predictions can have deleterious consequences for a patient. Furthermore, understanding the inner workings of a model can help increase the trust in the model. Results: We introduce a novel structurally constrained neural network, MonoNet, which is more transparent, while still retaining the same learning capabilities of traditional neural models. MonoNet contains monotonically connected layers that ensure monotonic relationships between (high-level) features and outputs. We show how, by leveraging the monotonic constraint in conjunction with other post hoc strategies, we can interpret our model. To demonstrate our model's capabilities, we train MonoNet to classify cellular populations in a single-cell proteomic dataset. We also demonstrate MonoNet's performance in other benchmark datasets in different domains, including non-biological applications (in the Supplementary Material). Our experiments show how our model can achieve good performance, while providing at the same time useful biological insights about the most important biomarkers. We finally carry out an information-theoretical analysis to show how the monotonic constraint actively contributes to the learning process of the model. Availability and implementation: Code and sample data are available at https://github.com/phineasng/mononet. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
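The monotonic constraint described in the abstract above can be illustrated with a minimal sketch. It assumes one common way of enforcing monotonicity (strictly positive weights, obtained here by exponentiating unconstrained parameters, followed by a non-decreasing activation). This is an assumption for illustration and not necessarily how MonoNet implements its layers; the function name `monotonic_layer` and all values are hypothetical.

```python
import math
import random


def monotonic_layer(h, raw_weights, biases):
    """One layer whose outputs never decrease when any input increases.

    Assumption (not taken from the abstract): positivity is enforced by
    applying exp() to unconstrained parameters, and tanh is used as the
    non-decreasing activation.
    """
    out = []
    for row, b in zip(raw_weights, biases):
        # exp(r) > 0, so the pre-activation is non-decreasing in every input.
        z = b + sum(math.exp(r) * hi for r, hi in zip(row, h))
        out.append(math.tanh(z))
    return out


rng = random.Random(0)
# Hypothetical sizes: 3 high-level features mapped to 2 outputs.
raw_weights = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
biases = [rng.uniform(-1.0, 1.0) for _ in range(2)]

low = monotonic_layer([0.2, -0.5, 1.0], raw_weights, biases)
high = monotonic_layer([0.9, -0.5, 1.0], raw_weights, biases)  # first feature raised
```

In this sketch every entry of `high` is at least as large as the matching entry of `low`, which is the property that lets each high-level feature be read as pushing an output in a single direction.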

2.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-37031956

ABSTRACT

MOTIVATION: Interpretability has become a necessary feature for machine learning models deployed in critical scenarios, e.g. legal system, healthcare. In these situations, algorithmic decisions may have (potentially negative) long-lasting effects on the end-user affected by the decision. While deep learning models achieve impressive results, they often function as a black-box. Inspired by linear models, we propose a novel class of structurally constrained deep neural networks, which we call FLAN (Feature-wise Latent Additive Networks). Crucially, FLANs process each input feature separately, computing for each of them a representation in a common latent space. These feature-wise latent representations are then simply summed, and the aggregated representation is used for the prediction. These feature-wise representations allow a user to estimate the effect of each individual feature independently from the others, similarly to the way linear models are interpreted. RESULTS: We demonstrate FLAN on a series of benchmark datasets in different biological domains. Our experiments show that FLAN achieves good performances even in complex datasets (e.g. TCR-epitope binding prediction), despite the structural constraint we imposed. On the other hand, this constraint enables us to interpret FLAN by deciphering its decision process, as well as obtaining biological insights (e.g. by identifying the marker genes of different cell populations). In supplementary experiments, we show similar performances also on non-biological datasets. CODE AND DATA AVAILABILITY: Code and example data are available at https://github.com/phineasng/flan_bio.
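The feature-wise processing described in the abstract above (encode each feature separately into a shared latent space, sum the latent vectors, predict from the sum) can be sketched as follows. This is a toy illustration under stated assumptions and not the authors' implementation: the one-layer tanh encoders, the linear prediction head, the random weights, and the names `make_feature_encoder` and `flan_forward` are all hypothetical. The linear head is a simplification that makes the per-feature decomposition exact; with a nonlinear head the per-feature effects would only be estimates.

```python
import math
import random


def make_feature_encoder(latent_dim, rng):
    """Hypothetical per-feature encoder: scalar -> latent vector (one tanh layer)."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]

    def encode(x):
        return [math.tanh(wi * x + bi) for wi, bi in zip(w, b)]

    return encode


def flan_forward(x, encoders, head_w, head_b):
    """Encode each feature on its own, sum the latents, apply a linear head."""
    latents = [enc(xi) for enc, xi in zip(encoders, x)]
    # Element-wise sum of the feature-wise latent vectors.
    summed = [sum(dim_vals) for dim_vals in zip(*latents)]
    score = sum(w * s for w, s in zip(head_w, summed)) + head_b
    # With a linear head, each feature's share of the score is separable.
    contributions = [sum(w * z for w, z in zip(head_w, lat)) for lat in latents]
    return score, contributions


rng = random.Random(0)
n_features, latent_dim = 3, 4
encoders = [make_feature_encoder(latent_dim, rng) for _ in range(n_features)]
head_w = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
head_b = 0.1

score, contributions = flan_forward([0.5, -1.2, 2.0], encoders, head_w, head_b)
```

Here `contributions[i]` depends only on feature `i`, and the contributions plus the bias add up to `score`, mirroring how the coefficients of a linear model are read.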


Subject(s)
Machine Learning , Neural Networks, Computer , Protein Binding
3.
Bioinformatics ; 38(Suppl 1): i246-i254, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758821

ABSTRACT

MOTIVATION: Understanding the mechanisms underlying T cell receptor (TCR) binding is of fundamental importance to understanding adaptive immune responses. A better understanding of the biochemical rules governing TCR binding can be used, e.g. to guide the design of more powerful and safer T cell-based therapies. Advances in repertoire sequencing technologies have made available millions of TCR sequences. Data abundance has, in turn, fueled the development of many computational models to predict the binding properties of TCRs from their sequences. Unfortunately, while many of these works have made great strides toward predicting TCR specificity using machine learning, the black-box nature of these models has resulted in a limited understanding of the rules that govern the binding of a TCR and an epitope. RESULTS: We present an easy-to-use and customizable computational pipeline, DECODE, to extract the binding rules from any black-box model designed to predict the TCR-epitope binding. DECODE offers a range of analytical and visualization tools to guide the user in the extraction of such rules. We demonstrate our pipeline on a recently published TCR-binding prediction model, TITAN, and show how to use the provided metrics to assess the quality of the computed rules. In conclusion, DECODE can lead to a better understanding of the sequence motifs that underlie TCR binding. Our pipeline can facilitate the investigation of current immunotherapeutic challenges, such as cross-reactive events due to off-target TCR binding. AVAILABILITY AND IMPLEMENTATION: Code is available publicly at https://github.com/phineasng/DECODE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Receptors, Antigen, T-Cell , Epitopes , Protein Binding , Receptors, Antigen, T-Cell/chemistry
4.
Article in English | MEDLINE | ID: mdl-32850701

ABSTRACT

In the last decade, a large number of genome-wide association studies have uncovered many single-nucleotide polymorphisms (SNPs) that are associated with complex traits and confer susceptibility to diseases, such as cancer. However, so far only a few heritable traits with medium-to-high penetrance have been identified. The vast majority of the discovered variants only leads to disease in combination with other still unknown factors. Furthermore, while many studies aimed to link the effect of SNPs to changes in molecular phenotypes, the analysis has been often focused on testing associations between a single SNP and a transcript, hence disregarding the dysregulation of gene regulatory networks that has been shown to play an essential role in disease onset, notably in cancer. Here we take a systems biology approach and develop GVITamIN (Genetic VarIaTIoN functional analysis tool), a new statistical and computational approach to characterize the effect of a SNP on both genes and transcriptional regulatory programs. GVITamIN exploits a novel statistical approach to combine the usually small effect of disease-susceptibility SNPs, and reveals important potential oncogenic mechanisms, hence taking one step further in the direction of understanding the SNP mechanism of action. We apply GVITamIN on a breast cancer cohort and identify well-known cancer-related transcription factors, such as CTCF, LEF1, and FOXA1, as TFs dysregulated by breast cancer-associated SNPs. Furthermore, our results reveal that SNPs located on the RAD51B gene are significantly associated with an abnormal regulatory activity, suggesting a pivotal role for homologous recombination repair mechanisms in breast cancer.

5.
PLoS One ; 15(1): e0227180, 2020.
Article in English | MEDLINE | ID: mdl-31945090

ABSTRACT

Recent evidence shows that the disruption of constitutive insulated neighbourhoods might lead to oncogene dysregulation. We present here a systematic pan-cancer characterisation of the associations between constitutive boundaries and genome alterations in cancer. Specifically, we investigate the enrichment of somatic mutation, abnormal methylation, and copy number alteration events in the proximity of CTCF bindings overlapping with topological boundaries (junctions) in 26 cancer types. Focusing on CTCF motifs that are both in-boundary (overlapping with junctions) and active (overlapping with peaks of CTCF expression), we find a significant enrichment of somatic mutations in several cancer types. Furthermore, mutated junctions are significantly conserved across cancer types, and we also observe a positive selection of transversions rather than transitions in many cancer types. We also analyzed the mutational signature found on the different classes of CTCF motifs, finding some signatures (such as SBS26) to have a higher weight within in-boundary than off-boundary motifs. Regarding methylation, we find a significant number of over-methylated active in-boundary CTCF motifs in several cancer types; similarly to somatic-mutated junctions, they also have a significant conservation across cancer types. Finally, in several cancer types we observe that copy number alterations tend to overlap with active junctions more often than in matched normal samples. While several articles have recently reported a mutational enrichment at CTCF binding sites for specific cancer types, our analysis is pan-cancer and investigates abnormal methylation and copy number alterations in addition to somatic mutations. Our method is fully replicable and suggests several follow-up tumour-specific analyses.


Subject(s)
CCCTC-Binding Factor/genetics , CCCTC-Binding Factor/metabolism , DNA Mutational Analysis/methods , Epigenesis, Genetic/genetics , Insulator Elements/genetics , Neoplasms/genetics , Point Mutation , Amino Acid Motifs/genetics , Binding Sites/genetics , Chromosomes, Human, Pair 11/genetics , DNA Copy Number Variations/genetics , DNA Methylation , Exons/genetics , Female , Gene Expression Regulation, Neoplastic/genetics , Genome, Human/genetics , Humans , Mutation Rate , Promoter Regions, Genetic/genetics