PDBench: evaluating computational methods for protein-sequence design.

Castorina, Leonardo V; Petrenas, Rokas; Subr, Kartic; Wood, Christopher W.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36637198

ABSTRACT

SUMMARY: Ever increasing amounts of protein structure data, combined with advances in machine learning, have led to the rapid proliferation of methods available for protein-sequence design. In order to utilize a design method effectively, it is important to understand the nuances of its performance and how it varies by design target. Here, we present PDBench, a set of proteins and a number of standard tests for assessing the performance of sequence-design methods. PDBench aims to maximize the structural diversity of the benchmark, compared with previous benchmarking sets, in order to provide useful biological insight into the behaviour of sequence-design methods, which is essential for evaluating their performance and practical utility. We believe that these tools are useful for guiding the development of novel sequence design algorithms and will enable users to choose a method that best suits their design target. AVAILABILITY AND IMPLEMENTATION: https://github.com/wells-wood-research/PDBench. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms; Software; Proteins/chemistry; Amino Acid Sequence; Benchmarking; Computational Biology

Attentive Variational Information Bottleneck for TCR-peptide interaction prediction.

Grazioli, Filippo; Machart, Pierre; Mösch, Anja; Li, Kai; Castorina, Leonardo V; Pfeifer, Nico; Min, Martin Renqiang.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36571499

ABSTRACT

MOTIVATION: We present a multi-sequence generalization of Variational Information Bottleneck and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB model leverages multi-head self-attention to implicitly approximate a posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to a fundamental immuno-oncology problem: predicting the interactions between T-cell receptors (TCRs) and peptides. RESULTS: Experimental results on various datasets show that AVIB significantly outperforms state-of-the-art methods for TCR-peptide interaction prediction. Additionally, we show that the latent posterior distribution learned by AVIB is particularly effective for the unsupervised detection of out-of-distribution amino acid sequences. AVAILABILITY AND IMPLEMENTATION: The code and the data used for this study are publicly available at: https://github.com/nec-research/vibtcr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Peptides; Software; Amino Acid Sequence; Receptors, Antigen, T-Cell/genetics

TIMED-Design: flexible and accessible protein sequence design with convolutional neural networks.

Castorina, Leonardo V; Ünal, Suleyman Mert; Subr, Kartic; Wood, Christopher W.

Protein Eng Des Sel ; 37, 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38288671

ABSTRACT

Sequence design is a crucial step in the process of designing or engineering proteins. Traditionally, physics-based methods have been used to solve for optimal sequences, with the main disadvantage being that they are computationally intensive for the end user. Deep learning-based methods offer an attractive alternative, outperforming physics-based methods at a significantly lower computational cost. In this paper, we explore the application of Convolutional Neural Networks (CNNs) for sequence design. We describe the development and benchmarking of a range of networks, as well as reimplementations of previously described CNNs. We demonstrate the flexibility of representing proteins in a three-dimensional voxel grid by encoding additional design constraints into the input data. Finally, we describe TIMED-Design, a web application and command line tool for exploring and applying the models described in this paper. The user interface will be available at the URL: https://pragmaticproteindesign.bio.ed.ac.uk/timed. The source code for TIMED-Design is available at https://github.com/wells-wood-research/timed-design.

Subject(s)

Neural Networks, Computer; Proteins; Amino Acid Sequence; Software

Deep attention super-resolution of brain magnetic resonance images acquired under clinical protocols.

Li, Bryan M; Castorina, Leonardo V; Valdés Hernández, Maria Del C; Clancy, Una; Wiseman, Stewart J; Sakka, Eleni; Storkey, Amos J; Jaime Garcia, Daniela; Cheng, Yajun; Doubal, Fergus; Thrippleton, Michael T; Stringer, Michael; Wardlaw, Joanna M.

Front Comput Neurosci ; 16: 887633, 2022.

Article in English | MEDLINE | ID: mdl-36093418

ABSTRACT

Vast quantities of Magnetic Resonance Images (MRI) are routinely acquired in clinical practice but, to speed up acquisition, these scans are typically of a quality that is sufficient for clinical diagnosis but sub-optimal for large-scale precision medicine, computational diagnostics, and large-scale neuroimaging collaborative research. Here, we present a critic-guided framework to upsample low-resolution (often 2D) MRI full scans to help overcome these limitations. We incorporate feature-importance and self-attention methods into our model to improve the interpretability of this study. We evaluate our framework on paired low- and high-resolution brain MRI structural full scans (i.e., T1-, T2-weighted, and FLAIR sequences are simultaneously input) obtained in clinical and research settings from scanners manufactured by Siemens, Philips, and GE. We show that the upsampled MRIs are qualitatively faithful to the ground-truth high-quality scans (PSNR = 35.39; MAE = 3.78E-3; NMSE = 4.32E-10; SSIM = 0.9852; mean normal-appearing gray/white matter ratio intensity differences ranging from 0.0363 to 0.0784 for FLAIR, from 0.0010 to 0.0138 for T1-weighted and from 0.0156 to 0.074 for T2-weighted sequences). The automatic raw segmentation of tissues and lesions using the super-resolved images has fewer false positives and higher accuracy than those obtained from interpolated images in protocols represented with more than three sets in the training sample, making our approach a strong candidate for practical application in clinical and collaborative research.
