ABSTRACT
With the ever-increasing amount of data produced by mass spectrometry (MS)-based proteomics and metabolomics, and the sheer number of samples now analyzed, a common open format that offers both small file sizes and fast read/write speeds has become essential for the next generation of data analysis pipelines. The Proteomics Standards Initiative (PSI) has established mzML, a clear and precise Extensible Markup Language (XML) representation for data interchange, which has seen substantial uptake; however, storage and file-access efficiency were not its main focus. We propose mzMLb, an HDF5-based file format optimized for both read/write speed and storage of raw mass spectrometry data. We extensively validate write speed, random-access read speed and storage size. The results show a flexible format that, with or without compression, is faster than all existing approaches in virtually all cases, and that, with compression, is comparable in size to proprietary vendor file formats. Because our approach uniquely preserves the XML encoding of the metadata, the format implicitly supports future versions of mzML and is straightforward to implement. mzMLb's design adheres to both the HDF5 and NetCDF4 standard implementations, so third parties can adopt it easily thanks to the wide programming-language support of those libraries. A reference implementation is provided within the established ProteoWizard toolkit.
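mzML embeds each binary data array (for example, m/z or intensity values) in the XML as base64 text, optionally after zlib compression. The minimal sketch below uses only the Python standard library, 64-bit precision and 1,000 made-up m/z values. It shows that base64 adds roughly a third to the size of the raw little-endian doubles. Storing the arrays as typed binary datasets in HDF5, the idea behind mzMLb, avoids this overhead and the decoding step. The helper `encode_like_mzml` is hypothetical and is not part of the mzMLb reference implementation.

```python
import base64
import struct
import zlib


def encode_like_mzml(values, compress=False):
    """Hypothetical helper: encode floats the way mzML stores binary arrays,
    as little-endian 64-bit doubles, optionally zlib-compressed, then base64."""
    raw = struct.pack(f"<{len(values)}d", *values)
    payload = zlib.compress(raw) if compress else raw
    return raw, base64.b64encode(payload)


# 1,000 made-up m/z values (illustrative data only).
mz = [100.0 + 0.01 * i for i in range(1000)]
raw, text = encode_like_mzml(mz)
print(len(raw), len(text))  # prints: 8000 10668 (base64 adds about a third)
```

A binary container can store `raw` directly, so a reader can access it without first decoding text.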
Subjects
Programming Languages, Proteomics, Protein Databases, Mass Spectrometry, Metabolomics, Software
ABSTRACT
We present a pipeline in which machine learning techniques automatically identify and evaluate subtypes of hospital patients admitted between 2017 and 2021 to a large UK teaching hospital. Patient clusters are determined from routinely collected hospital data, such as the observations used in the UK's National Early Warning Score 2 (NEWS2). An iterative, hierarchical clustering process was used to identify the minimum set of features relevant to cluster separation. Using state-of-the-art explainability techniques, we interpret the identified subtypes and assign them clinical meaning, illustrating their robustness. In parallel, clinicians assessed the intracluster similarities and intercluster differences of the identified subtypes in the context of their clinical knowledge. For each cluster, outcome prediction models were trained, and their forecasting ability was compared with that of NEWS2 applied to the unclustered patient cohort. These preliminary results suggest that subtype-specific models can outperform the established NEWS2 method and better predict patient deterioration. By considering both computational outputs and clinician-based explanations in patient subtyping, we aim to highlight the mutual benefit of combining machine learning techniques with clinical expertise.
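The abstract does not specify the linkage, distance metric, separation score or stopping rule of the iterative hierarchical clustering step. The Python sketch below shows one plausible reading under explicit assumptions:

- average-linkage agglomerative clustering on Euclidean distance;
- the mean silhouette coefficient as the separation score;
- greedy backward elimination that drops a feature as long as separation stays within a tolerance of the all-features score.

All function names, the `tolerance` parameter and the toy data are hypothetical; this is not the authors' pipeline.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, k):
    """Agglomerative clustering with average linkage; returns one label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # a < b, so index a stays valid
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (higher means better-separated clusters)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        own = [euclidean(points[i], points[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton clusters contribute 0 by convention
        a = sum(own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def score(data, features, k):
    pts = [[row[f] for f in features] for row in data]
    return silhouette(pts, average_linkage(pts, k))


def select_features(data, k, tolerance=0.05):
    """Greedy backward elimination: drop features while separation holds up."""
    features = list(range(len(data[0])))
    reference = score(data, features, k)
    while len(features) > 1:
        candidates = [(score(data, [g for g in features if g != f], k), f)
                      for f in features]
        best_score, drop = max(candidates)
        if best_score < reference - tolerance:
            break  # every remaining removal hurts separation too much
        features.remove(drop)
    return features


# Toy data: only column 0 separates the two groups (hypothetical values).
toy = [[0.0, 0.3, 0.1], [0.1, -0.2, 0.4], [0.2, 0.1, -0.3],
       [10.0, -0.1, 0.2], [10.1, 0.4, -0.1], [9.9, -0.3, 0.0]]
print(select_features(toy, k=2))  # [0]
```

Pairwise distances in pure Python scale quadratically with the number of patients, so a real cohort would call for an optimized library implementation.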
Subjects
Cluster Analysis, Inpatients, Machine Learning, Humans, Inpatients/classification, Forecasting
ABSTRACT
The cardiovascular health of the human population is a major concern for medical clinicians, with cardiovascular diseases responsible for 48% of all deaths worldwide, according to the World Health Organization. Developing new diagnostic tools that are practical and economical for assessing human cardiovascular health is a major goal for clinicians. We present a new technique that acquires seismocardiographic signals up to 54 Hz, covering both ballistocardiography (below 20 Hz) and audible heart sounds (20 Hz upward). It uses a system of curvature sensors formed from fiber optic long-period gratings. By combining data from the sensing array with a bespoke three-dimensional (3-D) shape reconstruction algorithm, the system can visualize the 3-D mechanical motion of the heart in real time. Visualization is demonstrated by attaching three to four sensors to the outside of the thorax, close to the apex of the heart; the sensing scheme revealed a complex motion of the heart wall near the apex region. The detection scheme is low-cost, portable and easy to operate, and it has potential for ambulatory applications.
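The reconstruction algorithm mentioned above is bespoke and three-dimensional, and the abstract gives no details of it. The generic principle behind curvature-based shape sensing can still be shown in two dimensions. Integrating curvature along arc length gives the tangent angle, and integrating the angle gives position. The Python sketch below assumes that each sensor reports a planar curvature sample over a known segment length. The function name and parameters are hypothetical. A 3-D version would also need to track each sensor's bending direction, for example with a moving frame along the curve.

```python
import math


def reconstruct_planar_curve(curvatures, ds, heading=0.0):
    """Rebuild a 2-D curve from curvature samples by integrating along arc length.

    curvatures: curvature (1/m) of each consecutive segment (assumed planar)
    ds: arc length of each segment (m)
    heading: initial tangent angle (rad)
    Returns the (x, y) points, starting at the origin.
    """
    x, y = 0.0, 0.0
    points = [(x, y)]
    for kappa in curvatures:
        # Advance along the tangent at the segment midpoint (midpoint rule).
        mid = heading + 0.5 * kappa * ds
        x += ds * math.cos(mid)
        y += ds * math.sin(mid)
        heading += kappa * ds  # d(theta)/ds = kappa
        points.append((x, y))
    return points


# Constant curvature of 1/m over a quarter of a unit circle (hypothetical values).
quarter = reconstruct_planar_curve([1.0] * 100, ds=(math.pi / 2) / 100)
print(quarter[-1])  # approximately (1.0, 1.0)
```

A circular arc of radius 1 m that starts at the origin heading along x ends at (1, 1). The discretized result is close to this, with an error that shrinks as the segments get shorter.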