Results 1 - 9 of 9
1.
Sci Data ; 11(1): 112, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263211

ABSTRACT

Here we provide a curated, large-scale, label-free mass spectrometry-based proteomics data set derived from HeLa cell lines for general-purpose machine learning and analysis. Data access and filtering are tedious tasks that take up a considerable amount of researchers' time. We therefore provide machine-based metadata for easy selection and overview alongside the 7,444 raw files and MaxQuant search output. For convenience, we provide three filtered and aggregated development datasets at the protein group, peptide and precursor levels. In addition to providing easy-to-access training data, we provide an SDRF file annotating each raw file with instrument settings, allowing automated reprocessing. To encourage others to enlarge this data set with instrument runs of further HeLa samples from different machine types, we provide our workflows and analysis scripts.


Subject(s)
HeLa Cells , Machine Learning , Proteomics , Humans , Mass Spectrometry , Metadata
2.
Sci Rep ; 13(1): 20039, 2023 11 16.
Article in English | MEDLINE | ID: mdl-37973887

ABSTRACT

The inflammatory activity in cirrhosis is often pronounced and related to episodes of decompensation. Systemic markers of inflammation may contain prognostic information, and we investigated their possible correlation with admissions and mortality among patients with newly diagnosed liver cirrhosis. We collected plasma samples from 149 patients with newly diagnosed (within the past 6 months) cirrhosis, and registered deaths and hospital admissions within 180 days. Ninety-two inflammatory markers were quantified and correlated with clinical variables, mortality, and admissions. Prediction models were calculated by logistic regression. We compared the disease courses of our cohort with a validation cohort of 86 patients with cirrhosis. Twenty of 92 markers of inflammation correlated significantly with mortality within 180 days (q-values of 0.00-0.044), whereas we found no significant correlations with liver-related admissions. The logistic regression models yielded AUROCs of 0.73 to 0.79 for mortality and 0.61 to 0.73 for liver-related admissions, based on a variety of modalities (clinical variables, inflammatory markers, clinical scores, or combinations thereof). The models performed moderately well in the validation cohort and were better able to predict mortality than liver-related admissions. In conclusion, markers of inflammation can be used to predict 180-day mortality in patients with newly diagnosed cirrhosis. Prediction models for newly diagnosed cirrhotic patients need further validation before implementation in clinical practice. Trial registration: NCT04422223 (and NCT03443934 for the validation cohort), and Scientific Ethics Committee No.: H-19024348.


Subject(s)
Hospitalization , Liver Cirrhosis , Humans , Liver Cirrhosis/diagnosis , Prospective Studies , Prognosis , Inflammation , Severity of Illness Index
3.
Commun Biol ; 6(1): 700, 2023 07 08.
Article in English | MEDLINE | ID: mdl-37422584

ABSTRACT

Most investigations of geographical within-species differences are limited to focusing on a single species. Here, we investigate global differences for multiple bacterial species using a dataset of 757 metagenomics sewage samples from 101 countries worldwide. The within-species variations were determined by performing genome reconstructions, and the analyses were expanded by gene-focused approaches. Applying these methods, we recovered 3353 near complete (NC) metagenome assembled genomes (MAGs) encompassing 1439 different MAG species and found that within-species genomic variation was coherent with regional separation in 36% of the investigated species (12/33). Additionally, we found that variation of organelle genes correlated less with geography compared to metabolic and membrane genes, suggesting that the global differences of these species are caused by regional environmental selection rather than dissemination limitations. From the combination of the large and globally distributed dataset and in-depth analysis, we present a wide investigation of global within-species phylogeny of sewage bacteria. The global differences found here emphasize the need for worldwide data sets when making global conclusions.


Subject(s)
Bacteria , Sewage , Phylogeny , Sewage/microbiology , Bacteria/genetics , Cluster Analysis , Geography
5.
Nat Biotechnol ; 41(3): 399-408, 2023 03.
Article in English | MEDLINE | ID: mdl-36593394

ABSTRACT

The application of multiple omics technologies in biomedical cohorts has the potential to reveal patient-level disease characteristics and individualized response to treatment. However, the scale and heterogeneous nature of multi-modal data make integration and inference a non-trivial task. We developed a deep-learning-based framework, multi-omics variational autoencoders (MOVE), to integrate such data and applied it to a cohort of 789 people with newly diagnosed type 2 diabetes with deep multi-omics phenotyping from the DIRECT consortium. Using in silico perturbations, we identified drug-omics associations across the multi-modal datasets for the 20 most prevalent drugs given to people with type 2 diabetes with substantially higher sensitivity than univariate statistical tests. From these, we identified, among others, novel associations between metformin and the gut microbiota as well as opposite molecular responses for the two statins, simvastatin and atorvastatin. We used the associations to quantify drug-drug similarities, assess the degree of polypharmacy and conclude that drug effects are distributed across the multi-omics modalities.


Subject(s)
Deep Learning , Diabetes Mellitus, Type 2 , Humans , Algorithms , Diabetes Mellitus, Type 2/drug therapy , Diabetes Mellitus, Type 2/genetics
6.
Nat Med ; 28(6): 1277-1287, 2022 06.
Article in English | MEDLINE | ID: mdl-35654907

ABSTRACT

Alcohol-related liver disease (ALD) is a major cause of liver-related death worldwide, yet understanding of the three key pathological features of the disease (fibrosis, inflammation and steatosis) remains incomplete. Here, we present a paired liver-plasma proteomics approach to infer molecular pathophysiology and to explore the diagnostic and prognostic capability of plasma proteomics in 596 individuals (137 controls and 459 individuals with ALD), 360 of whom had biopsy-based histological assessment. We analyzed all plasma samples and 79 liver biopsies using a mass spectrometry (MS)-based proteomics workflow with short gradient times and an enhanced, data-independent acquisition scheme in only 3 weeks of measurement time. In plasma and liver biopsy tissues, metabolic functions were downregulated whereas fibrosis-associated signaling and immune responses were upregulated. Machine learning models identified proteomics biomarker panels that detected significant fibrosis (receiver operating characteristic-area under the curve (ROC-AUC), 0.92, accuracy, 0.82) and mild inflammation (ROC-AUC, 0.87, accuracy, 0.79) more accurately than existing clinical assays (DeLong's test, P < 0.05). These biomarker panels were found to be accurate in prediction of future liver-related events and all-cause mortality, with a Harrell's C-index of 0.90 and 0.79, respectively. An independent validation cohort reproduced the diagnostic model performance, laying the foundation for routine MS-based liver disease testing.


Subject(s)
Liver Diseases , Proteomics , Biomarkers/metabolism , Biopsy , Humans , Inflammation/pathology , Liver/metabolism , Liver Cirrhosis/diagnosis , Liver Cirrhosis/pathology , Liver Diseases/metabolism
7.
J Proteome Res ; 21(6): 1566-1574, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35549218

ABSTRACT

Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Cluster Analysis , Consensus , Databases, Protein , Proteomics/methods , Software , Tandem Mass Spectrometry/methods
8.
Nat Commun ; 12(1): 5854, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.


Subject(s)
Data Analysis , Databases, Protein , Metadata , Proteomics , Big Data , Humans , Reproducibility of Results , Software , Transcriptome
9.
J Comput Aided Mol Des ; 34(7): 731-746, 2020 07.
Article in English | MEDLINE | ID: mdl-32297073

ABSTRACT

In drug development, late stage toxicity issues of a compound are the main cause of failure in clinical trials. In silico methods are therefore of high importance to guide the early design process to reduce time, costs and animal testing. Technical advances and the ever growing amount of available toxicity data enabled machine learning, especially neural networks, to impact the field of predictive toxicology. In this study, cytotoxicity prediction, one of the earliest handles in drug discovery, is investigated using a deep learning approach trained on a highly consistent in-house data set of over 34,000 compounds with a share of less than 5% of cytotoxic molecules. The model reached a balanced accuracy of over 70%, similar to previously reported studies using Random Forest. Albeit yielding good results, neural networks are often described as a black box lacking deeper mechanistic understanding of the underlying model. To overcome this absence of interpretability, a Deep Taylor Decomposition method is investigated to identify substructures that may be responsible for the cytotoxic effects, the so-called toxicophores. Furthermore, this study introduces cytotoxicity maps which provide a visual structural interpretation of the relevance of these substructures. Using this approach could be helpful in drug development to predict the potential toxicity of a compound as well as to generate new insights into the toxic mechanism. Moreover, it could also help to de-risk and optimize compounds.


Subject(s)
Cytotoxins/chemistry , Cytotoxins/toxicity , Deep Learning , Drug Discovery/methods , Cell Survival/drug effects , Computer-Aided Design , Drug Design , Drug Discovery/statistics & numerical data , HEK293 Cells , Hep G2 Cells , Humans , Models, Biological , Neural Networks, Computer , Small Molecule Libraries , Software , Toxicology/statistics & numerical data