Results 1 - 6 of 6
1.
Comput Struct Biotechnol J ; 21: 2732-2743, 2023.
Article in English | MEDLINE | ID: mdl-37168871

ABSTRACT

Proteins are essential components of all living cells and so the study of their in situ expression, proteomics, has wide-reaching applications. Peptide identification in proteomics typically relies on matching high-resolution tandem mass spectra to a protein database but can also be performed de novo. While artificial spectra have been successfully incorporated into database search pipelines to increase peptide identification rates, little work has been done to investigate the utility of artificial spectra in the context of de novo peptide identification. Here, we perform a critical analysis of the use of artificial data for the training and evaluation of de novo peptide identification algorithms. First, we classify the different fragment ion types present in real spectra and then estimate the number of spurious matches using random peptides. We then categorise the different types of noise present in real spectra. Finally, we transfer this knowledge to artificial data and test the performance of a state-of-the-art de novo peptide identification algorithm trained using artificial spectra with and without relevant noise addition. Noise supplementation increased artificial training data performance from 30% to 77% of real training data peptide recall. While real data performance was not fully replicated, this work provides the first steps towards an artificial spectrum framework for the training and evaluation of de novo peptide identification algorithms. Further enhanced artificial spectra may allow for more in-depth analysis of de novo algorithms and alleviate the reliance on database searches for training data.
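As a rough illustration of what an artificial spectrum with added noise might look like, the following minimal sketch computes singly charged b- and y-ion m/z values for a short peptide and then appends random noise peaks. The function names, the uniform noise model, the unit intensities, and the 50 m/z lower bound are all assumptions made for this example; the abstract does not describe how the authors generate spectra or categorise noise, so this should not be read as their method.

```python
import random

# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def fragment_mz(peptide):
    """Return (b_ions, y_ions): singly charged fragment m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    # One cleavage site between each pair of adjacent residues.
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def artificial_spectrum(peptide, n_noise=0, seed=0):
    """Build a sorted list of (m/z, intensity) peaks.

    Signal peaks get intensity 1.0. Noise peaks are drawn uniformly in
    m/z and intensity, which is a hypothetical noise model chosen only
    to keep the example short.
    """
    rng = random.Random(seed)
    b_ions, y_ions = fragment_mz(peptide)
    peaks = [(mz, 1.0) for mz in b_ions + y_ions]
    max_mz = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    for _ in range(n_noise):
        peaks.append((rng.uniform(50.0, max_mz), rng.uniform(0.05, 0.5)))
    return sorted(peaks)


clean = artificial_spectrum("GASP")
noisy = artificial_spectrum("GASP", n_noise=10, seed=42)
print(len(clean), len(noisy))
```

Training on `noisy`-style spectra rather than `clean`-style ones is the kind of comparison the abstract reports, although the real noise categories are presumably far richer than uniform random peaks.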

2.
J Proteome Res ; 22(2): 323-333, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36534699

ABSTRACT

Almost all state-of-the-art de novo peptide sequencing algorithms now use machine learning models to encode fragment peaks and hence identify amino acids in mass spectrometry (MS) spectra. Previous work has highlighted how the inherent MS challenges of noise and missing peptide peaks detrimentally affect the performance of these models. In the present research, we extracted and evaluated the encoding modules from three state-of-the-art de novo peptide sequencing algorithms. We also propose a convolutional neural network-graph neural network machine learning model for encoding peptide ions in tandem MS spectra. We compared the proposed encoding module to those used in the state-of-the-art de novo peptide sequencing algorithms by assessing their ability to identify b-ions and y-ions in MS spectra. This included a comprehensive evaluation in both real and artificial data across various levels of noise and missing peptide peaks. The proposed model performed best across all data sets using two different metrics (area under the receiver operating characteristic curve (AUC) and average precision). The work also highlighted the effect of including additional features such as intensity rank in these encoding modules as well as issues with using the AUC as a metric. This work is of significance to those designing future de novo peptide identification algorithms as it is the first step toward a new approach.
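The two metrics named above can behave quite differently when true fragment peaks are rare among many noise peaks. The sketch below computes both from first principles on a small hypothetical example. The data, the function names, and the suggestion that class imbalance is the relevant AUC issue are assumptions for illustration; the abstract does not say which issues the authors identified.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical spectrum: 10 peaks, only 2 are true b/y ions (label 1).
labels = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(roc_auc(labels, scores))            # 0.875
print(average_precision(labels, scores))  # about 0.583
```

Here a single high-scoring noise peak barely dents the AUC, because the many low-scoring negatives dominate the pair count, while average precision drops noticeably. That sensitivity is one common reason to report both metrics.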


Subjects
Algorithms; Peptides; Peptides/analysis; Amino Acid Sequence; Tandem Mass Spectrometry/methods; Ions; Sequence Analysis, Protein/methods
3.
Exp Cell Res ; 419(2): 113317, 2022 10 15.
Article in English | MEDLINE | ID: mdl-36028058

ABSTRACT

Computational models can shape our understanding of cell and tissue remodelling, from cell spreading to active force generation, adhesion, and growth. In this mini-review, we discuss recent progress in modelling of chemo-mechanical cell behaviour and the evolution of multicellular systems. In particular, we highlight recent advances in (i) free-energy based single cell models that can provide new fundamental insight into cell spreading, cancer cell invasion, stem cell differentiation, and remodelling in disease, and (ii) mechanical agent-based models to simulate large numbers of discrete interacting cells in proliferative tumours. We describe how new biological understanding has emerged from such theoretical models, and the trade-offs and constraints associated with current approaches. Ultimately, we aim to make a case for why theory should be integrated with an experimental workflow to optimise new in vitro studies, to predict feedback between cells and their microenvironment, and to deepen understanding of active cell behaviour.


Subjects
Models, Biological; Neoplasms; Computer Simulation; Humans; Tumor Microenvironment
4.
Cluster Comput ; 25(4): 2661-2682, 2022.
Article in English | MEDLINE | ID: mdl-35855775

ABSTRACT

High-performance computing (HPC), or supercomputing, clusters are designed for executing computationally intensive operations that typically involve large-scale I/O operations. This most commonly involves using a standard MPI library implemented in C/C++. The MPI-I/O performance in HPC clusters tends to vary significantly over a range of configuration parameters that are generally not taken into account by the algorithm. It is commonly left to individual practitioners to optimise I/O on a case-by-case basis at code level. This can often lead to a range of unforeseen outcomes. The ExSeisDat utility is built on top of the native MPI-I/O library, comprising Parallel I/O and Workflow Libraries to process seismic data encapsulated in the SEG-Y file format. The SEG-Y file data structure is complex in nature, due to the alternating arrangement of trace headers and trace data. Its size scales to petabytes and the chances of I/O performance degradation are further increased by ExSeisDat. This research paper presents a novel study of the changing I/O performance in terms of bandwidth, with the use of parallel plots against various MPI-I/O, Lustre (Parallel) File System and SEG-Y file parameters. Another novel aspect of this research is the predictive modelling of MPI-I/O behaviour over SEG-Y file benchmarks using Artificial Neural Networks (ANNs). The accuracy ranges from 62.5% to 96.5% over the set of trained ANN models. The computed Mean Square Error (MSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) values further support the generalisation of the prediction models. This paper demonstrates that by using our ANN prediction technique, the configurations can be tuned beforehand to avoid poor I/O performance.
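For readers unfamiliar with the three error measures named above, a minimal sketch of how they are conventionally computed follows. The function name and the bandwidth figures are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def regression_errors(actual, predicted):
    """Return (MSE, MAE, MAPE in percent) for paired observations.

    MAPE is undefined when any actual value is zero, so this sketch
    assumes strictly non-zero actual values.
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return mse, mae, mape


# Hypothetical measured vs. predicted I/O bandwidth (MB/s).
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

mse, mae, mape = regression_errors(measured, predicted)
print(f"MSE={mse:.2f}  MAE={mae:.2f}  MAPE={mape:.2f}%")
```

MSE penalises large misses more heavily than MAE, while MAPE expresses error relative to the measured value, which makes it easier to compare across configurations with very different bandwidths.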

5.
Comput Struct Biotechnol J ; 20: 1402-1412, 2022.
Article in English | MEDLINE | ID: mdl-35386104

ABSTRACT

Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications from clinical to environmental settings and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically its performance lagged behind database search methods but with the integration of machine learning, this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm's correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations regarding further algorithm improvements and offer potential avenues to overcome current inherent data limitations.
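To make the idea of a missing cleavage site concrete, the sketch below counts the sites of a known peptide for which neither the singly charged b-ion nor the complementary y-ion appears among the observed peaks. This working definition, the 0.02 m/z tolerance, the function name, and the example peaks are assumptions for illustration; the abstract does not state the exact criterion the authors used.

```python
# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def missing_cleavage_sites(peptide, observed_mz, tol=0.02):
    """Return the 1-based cleavage positions with no supporting peak.

    Position i is the bond between residue i and residue i + 1. It is
    treated as covered if either its b-ion or its y-ion m/z lies within
    `tol` of some observed peak (an assumed, simplified criterion).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    missing = []
    for i in range(1, len(masses)):
        b_mz = sum(masses[:i]) + PROTON
        y_mz = sum(masses[i:]) + WATER + PROTON
        covered = any(
            abs(mz - b_mz) <= tol or abs(mz - y_mz) <= tol
            for mz in observed_mz
        )
        if not covered:
            missing.append(i)
    return missing


# Hypothetical observed peaks: b1 and y1 of GASP are present,
# nothing supports the bond between A and S.
print(missing_cleavage_sites("GASP", [58.03, 116.07]))  # [2]
```

A de novo algorithm facing the spectrum above cannot tell "AS" from "SA" using fragment peaks alone, which is why runs of missing cleavage sites make correct identification so much harder.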

6.
Bioinformatics ; 36(4): 1309-1310, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31539022

ABSTRACT

SUMMARY: The overarching aim of microbiome analysis is to uncover the links between microbial phylogeny and function in order to access ecosystem functioning. This can be done using several experimental strategies targeting different biomolecules, including DNA (metagenomics), RNA (metatranscriptomics) and proteins (metaproteomics). Despite the importance of linking microbial function to phylogeny there are currently no visualization tools that effectively integrate this information. Chordomics is a Shiny-based application for linked -omics data analysis, allowing users to visualize microbial function and phylogeny on a single plot and compare datasets across time and environments. AVAILABILITY AND IMPLEMENTATION: Chordomics is available on GitHub: https://github.com/kevinmcdonnell6/chordomics; software is coded in R and JavaScript and a demonstration version is available at https://kmcd.shinyapps.io/ChordomicsDemo/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Microbiota; Software; Metagenomics; Phylogeny