Results 1 - 20 of 69
1.
Int J Mol Sci ; 25(8)2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38673911

ABSTRACT

One of the most significant challenges in human health risk assessment is to evaluate hazards from exposure to environmental chemical mixtures. Polycyclic aromatic hydrocarbons (PAHs) are a class of ubiquitous contaminants typically found as mixtures in gaseous and particulate phases in ambient air pollution associated with petrochemicals from Superfund sites and the burning of fossil fuels. However, little is understood about how PAHs in mixtures contribute to toxicity in lung cells. To investigate mixture interactions and component additivity from environmentally relevant PAHs, two synthetic mixtures were created from PAHs identified in passive air samplers at a legacy creosote site impacted by wildfires. The primary human bronchial epithelial cells differentiated at the air-liquid interface were treated with PAH mixtures at environmentally relevant proportions and evaluated for the differential expression of transcriptional biomarkers related to xenobiotic metabolism, oxidative stress response, barrier integrity, and DNA damage response. Component additivity was evaluated across all endpoints using two independent action (IA) models with and without the scaling of components by toxic equivalence factors. Both IA models exhibited trends that were unlike the observed mixture response and generally underestimated the toxicity across dose suggesting the potential for non-additive interactions of components. Overall, this study provides an example of the usefulness of mixture toxicity assessment with the currently available methods while demonstrating the need for more complex yet interpretable mixture response evaluation methods for environmental samples.
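
The independent action (IA) model mentioned in this abstract can be illustrated with a short sketch. It assumes the common textbook form of IA (also called Bliss independence), in which each component effect is a fractional response between 0 and 1; the study's own formulation, its transcriptional endpoints, and its scaling by toxic equivalence factors are not reproduced here. The function name and the input values are hypothetical.

```python
from math import prod


def independent_action(effects):
    """Predicted mixture effect under independent action.

    Assumes each entry is a fractional response in [0, 1]:
    E_mix = 1 - product(1 - E_i).
    """
    for e in effects:
        if not 0.0 <= e <= 1.0:
            raise ValueError("each effect must be a fraction in [0, 1]")
    return 1.0 - prod(1.0 - e for e in effects)


# Hypothetical fractional responses for three components at one dose.
component_effects = [0.10, 0.20, 0.05]
predicted = independent_action(component_effects)
print(round(predicted, 3))
```

Comparing a prediction of this kind with the observed mixture response at each dose is what lets an analysis flag possible non-additive interactions.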


Subject(s)
Epithelial Cells , Polycyclic Aromatic Hydrocarbons , Humans , Polycyclic Aromatic Hydrocarbons/toxicity , Polycyclic Aromatic Hydrocarbons/metabolism , Epithelial Cells/metabolism , Epithelial Cells/drug effects , Oxidative Stress/drug effects , DNA Damage/drug effects , Biological Models , Air Pollutants/toxicity , Cultured Cells , Bronchi/metabolism , Bronchi/cytology , Bronchi/drug effects , Biomarkers
2.
PLoS One ; 19(3): e0293856, 2024.
Article in English | MEDLINE | ID: mdl-38551935

ABSTRACT

Light-sheet microscopy has made possible the 3D imaging of both fixed and live biological tissue, with samples as large as the entire mouse brain. However, segmentation and quantification of that data remains a time-consuming manual undertaking. Machine learning methods promise the possibility of automating this process. This study seeks to advance the performance of prior models through optimizing transfer learning. We fine-tuned the existing TrailMap model using expert-labeled data from noradrenergic axonal structures in the mouse brain. By changing the cross-entropy weights and using augmentation, we demonstrate a generally improved adjusted F1-score over using the originally trained TrailMap model within our test datasets.
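
The abstract credits part of the improvement to changed cross-entropy weights. A minimal sketch of a weighted cross-entropy follows, assuming a binary axon/background simplification; the actual TrailMap label classes, weight values, and training code are not given in the abstract, so `weighted_bce`, `w_pos`, and `w_neg` are hypothetical names.

```python
from math import log


def weighted_bce(y_true, y_pred, w_pos=1.0, w_neg=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights per class.

    y_true holds 0/1 labels, y_pred holds predicted probabilities.
    Raising w_pos penalises missed positives (e.g. axon voxels) more.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w_pos * t * log(p) + w_neg * (1 - t) * log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]
probs = [0.5, 0.5]
unweighted = weighted_bce(labels, probs)
upweighted = weighted_bce(labels, probs, w_pos=2.0)
print(unweighted, upweighted)
```

With sparse structures such as axons, a larger positive-class weight is one common way to trade precision for recall.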


Subject(s)
Deep Learning , Animals , Mice , Microscopy , Axons , Machine Learning , Brain/diagnostic imaging
3.
J Proteome Res ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38421884

ABSTRACT

Proteoforms, the different forms of a protein with sequence variations including post-translational modifications (PTMs), execute vital functions in biological systems, such as cell signaling and epigenetic regulation. Advances in top-down mass spectrometry (MS) technology have permitted the direct characterization of intact proteoforms and their exact number of modification sites, allowing for the relative quantification of positional isomers (PI). Protein positional isomers refer to a set of proteoforms with identical total mass and set of modifications, but varying PTM site combinations. The relative abundance of PI can be estimated by matching proteoform-specific fragment ions to top-down tandem MS (MS2) data to localize and quantify modifications. However, the current approaches heavily rely on manual annotation. Here, we present IsoForma, an open-source R package for the relative quantification of PI within a single tool. Benchmarking IsoForma's performance against two existing workflows produced comparable results and improvements in speed. Overall, IsoForma provides a streamlined process for quantifying PI, reduces the analysis time, and offers an essential framework for developing customized proteoform analysis workflows. The software is open source and available at https://github.com/EMSL-Computing/isoforma-lib.

4.
Geohealth ; 8(2): e2023GH000937, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38344245

ABSTRACT

To understand how chemical exposure can impact health, researchers need tools that capture the complexities of personal chemical exposure. In practice, fine particulate matter (PM2.5) air quality index (AQI) data from outdoor stationary monitors and Hazard Mapping System (HMS) smoke density data from satellites are often used as proxies for personal chemical exposure, but do not capture total chemical exposure. Silicone wristbands can quantify more individualized exposure data than stationary air monitors or smoke satellites. However, it is not understood how these proxy measurements compare to chemical data measured from wristbands. In this study, participants wore daily wristbands, carried a phone that recorded locations, and answered daily questionnaires for a 7-day period in multiple seasons. We gathered publicly available daily PM2.5 AQI data and HMS data. We analyzed wristbands for 94 organic chemicals, including 53 polycyclic aromatic hydrocarbons. Wristband chemical detections and concentrations, behavioral variables (e.g., time spent indoors), and environmental conditions (e.g., PM2.5 AQI) significantly differed between seasons. Machine learning models were fit to predict personal chemical exposure using PM2.5 AQI only, HMS only, and a multivariate feature set including PM2.5 AQI, HMS, and other environmental and behavioral information. On average, the multivariate models increased predictive accuracy by approximately 70% compared to either the AQI model or the HMS model for all chemicals modeled. This study provides evidence that PM2.5 AQI data alone or HMS data alone is insufficient to explain personal chemical exposures. Our results identify additional key predictors of personal chemical exposure.

5.
iScience ; 27(2): 108769, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38303689

ABSTRACT

Type 1 diabetes (T1D) is a chronic condition caused by autoimmune destruction of the insulin-producing pancreatic β cells. While it is known that gene-environment interactions play a key role in triggering the autoimmune process leading to T1D, the pathogenic mechanism leading to the appearance of islet autoantibodies (biomarkers of autoimmunity) is poorly understood. Here we show that disruption of the complement system precedes the detection of islet autoantibodies and persists through disease onset. Our results suggest that children who exhibit islet autoimmunity and progress to clinical T1D have lower complement protein levels relative to those who do not progress within a similar time frame. Thus, the complement pathway, an understudied mechanistic and therapeutic target in T1D, merits increased attention for use as protein biomarkers of prediction and potentially prevention of T1D.

6.
Nat Chem Biol ; 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38302607

ABSTRACT

The leaf-cutter ant fungal garden ecosystem is a naturally evolved model system for efficient plant biomass degradation. Degradation processes mediated by the symbiotic fungus Leucoagaricus gongylophorus are difficult to characterize due to dynamic metabolisms and spatial complexity of the system. Herein, we performed microscale imaging across 12-µm-thick adjacent sections of Atta cephalotes fungal gardens and applied a metabolome-informed proteome imaging approach to map lignin degradation. This approach combines two spatial multiomics mass spectrometry modalities that enabled us to visualize colocalized metabolites and proteins across and through the fungal garden. Spatially profiled metabolites revealed an accumulation of lignin-related products, outlining morphologically unique lignin microhabitats. Metaproteomic analyses of these microhabitats revealed carbohydrate-degrading enzymes, indicating a prominent fungal role in lignocellulose decomposition. Integration of metabolome-informed proteome imaging data provides a comprehensive view of underlying biological pathways to inform our understanding of metabolic fungal pathways in plant matter degradation within the micrometer-scale environment.

7.
bioRxiv ; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38405958

ABSTRACT

Background: The Human Proteome Project has credibly detected nearly 93% of the roughly 20,000 proteins predicted by the human genome. However, the proteome remains enigmatic: alterations in amino acid sequences from polymorphisms and alternative splicing, errors in translation, and post-translational modifications result in a proteome depth estimated at several million unique proteoforms. Recently, mass spectrometry has been demonstrated in several landmark efforts mapping the human proteoform landscape in bulk analyses. Herein, we developed an integrated workflow for characterizing proteoforms from human tissue in a spatially resolved manner by coupling laser capture microdissection, nanoliter-scale sample preparation, and mass spectrometry imaging. Results: Using healthy human kidney sections as the case study, we focused our analyses on the major functional tissue units including glomeruli, tubules, and medullary rays. After laser capture microdissection, these isolated functional tissue units were processed with microPOTS (microdroplet processing in one-pot for trace samples) for sensitive top-down proteomics measurement. This provided a quantitative database of 616 proteoforms that was further leveraged as a library for mass spectrometry imaging with near-cellular spatial resolution over the entire section. Notably, several mitochondrial proteoforms were found to be differentially abundant between glomeruli and convoluted tubules. Mass spectrometry imaging provided further spatial contextualization, confirming the differences identified by microPOTS and expanding the field of view to reveal unique distributions such as enhanced abundance of a truncated form (1-74) of ubiquitin within cortical regions. Conclusions: We developed an integrated workflow to directly identify proteoforms and reveal their spatial distributions. Of the 20 differentially abundant proteoforms that microPOTS identified as discriminating between tubules and glomeruli, the vast majority of tubular proteoforms were of mitochondrial origin (8 of 10), whereas the discriminating proteoforms in glomeruli were primarily hemoglobin subunits (9 of 10). These trends were also identified within ion images, demonstrating spatially resolved characterization of proteoforms that has the potential to reshape discovery-based proteomics, because proteoforms are the ultimate effectors of cellular functions. Applications of this technology have the potential to unravel etiology and pathophysiology of disease states, informing on biologically active proteoforms, which remodel the proteomic landscape in chronic and acute disorders.

8.
Article in English | MEDLINE | ID: mdl-38177333

ABSTRACT

BACKGROUND: Polycyclic aromatic hydrocarbons (PAHs) are a class of pervasive environmental pollutants with a variety of known health effects. While significant work has been completed to estimate personal exposure to PAHs, less has been done to identify sources of these exposures. Comprehensive characterization of reported sources of personal PAH exposure is a critical step to more easily identify individuals at risk of high levels of exposure and for developing targeted interventions based on source of exposure. OBJECTIVE: In this study, we leverage data from a New York (NY)-based birth cohort to identify personal characteristics or behaviors associated with personal PAH exposure and develop models for the prediction of PAH exposure. METHODS: We quantified 61 PAHs measured using silicone wristband samplers in association with 75 questionnaire variables from 177 pregnant individuals. We evaluated univariate associations between each compound and questionnaire variable, conducted regression tree analysis for each PAH compound, and completed a principal component analysis of each participant's entire PAH exposure profile to determine the predictors of PAH levels. RESULTS: Regression tree analyses of individual compounds and exposure mixture identified income, time spent outdoors, maternal age, country of birth, transportation type, and season as the variables most frequently predictive of exposure.

9.
Pac Symp Biocomput ; 29: 170-186, 2024.
Article in English | MEDLINE | ID: mdl-38160278

ABSTRACT

Wearable silicone wristbands are a rapidly growing exposure assessment technology that offers researchers the ability to study previously inaccessible cohorts and has the potential to provide a more comprehensive picture of chemical exposure within diverse communities. However, there are no established best practices for analyzing the data within a study or across multiple studies, thereby limiting impact and access of these data for larger meta-analyses. We utilize data from three studies, from over 600 wristbands worn by participants in New York City and Eugene, Oregon, to present a first-of-its-kind manuscript detailing wristband data properties. We further discuss and provide concrete examples of key areas and considerations in common statistical modeling methods where best practices must be established to enable meta-analyses and integration of data from multiple studies. Finally, we detail important and challenging aspects of machine learning, meta-analysis, and data integration that researchers will face in order to extend beyond the limited scope of individual studies focused on specific populations.


Subject(s)
Environmental Monitoring , Wearable Electronic Devices , Humans , Computational Biology , Data Analysis , Environmental Monitoring/methods , Silicones/chemistry
10.
J Proteome Res ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38085827

ABSTRACT

PMart is a web-based tool for reproducible quality control, exploratory data analysis, statistical analysis, and interactive visualization of 'omics data, based on the functionality of the pmartR R package. The newly improved user interface supports more 'omics data types, additional statistical capabilities, and enhanced options for creating downloadable graphics. PMart supports the analysis of label-free and isobaric-labeled (e.g., TMT, iTRAQ) proteomics, nuclear magnetic resonance (NMR) and mass-spectrometry (MS)-based metabolomics, MS-based lipidomics, and ribonucleic acid sequencing (RNA-seq) transcriptomics data. At the end of a PMart session, a report is available that summarizes the processing steps performed and includes the pmartR R package functions used to execute the data processing. In addition, built-in safeguards in the backend code prevent users from utilizing methods that are inappropriate based on omics data type. PMart is a user-friendly interface for conducting exploratory data analysis and statistical comparisons of omics data without programming.

11.
bioRxiv ; 2023 Oct 23.
Article in English | MEDLINE | ID: mdl-37961439

ABSTRACT

Light-sheet microscopy has made possible the 3D imaging of both fixed and live biological tissue, with samples as large as the entire mouse brain. However, segmentation and quantification of that data remains a time-consuming manual undertaking. Machine learning methods promise the possibility of automating this process. This study seeks to advance the performance of prior models through optimizing transfer learning. We fine-tuned the existing TrailMap model using expert-labeled data from noradrenergic axonal structures in the mouse brain. By fine-tuning the final two layers of the TrailMap neural network at a lower learning rate, we demonstrate an improved recall and an occasionally improved adjusted F1-score within our test dataset compared with the originally trained TrailMap model.

12.
Metabolites ; 13(10)2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37887426

ABSTRACT

Metabolomics provides a unique snapshot into the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this "molecular snapshot" is only as informative as the number of metabolites confidently identified within it. The spectral similarity (SS) score is traditionally used to identify compound(s) in mass spectrometry approaches to metabolomics, where spectra are matched to reference libraries of candidate spectra. Unfortunately, there is little consensus on which of the dozens of available SS metrics should be used. This lack of standard SS score creates analytic uncertainty and potentially leads to issues in reproducibility, especially as these data are integrated across other domains. In this work, we use metabolomic spectral similarity as a case study to showcase the challenges in consistency within just one piece of the One Health framework that must be addressed to enable data science approaches for One Health problems. Here, using a large cohort of datasets comprising both standard and complex datasets with expert-verified truth annotations, we evaluated the effectiveness of 66 similarity metrics to delineate between correct matches (true positives) and incorrect matches (true negatives). We additionally characterize the families of these metrics to make informed recommendations for their use. Our results indicate that specific families of metrics (the Inner Product, Correlative, and Intersection families of scores) tend to perform better than others, with no single similarity metric performing optimally for all queried spectra. This work and its findings provide an empirically-based resource for researchers to use in their selection of similarity metrics for GC-MS identification, increasing scientific reproducibility through taking steps towards standardizing identification workflows.
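
As an illustration of a spectral similarity score from the Inner Product family named in this abstract, here is a minimal cosine similarity sketch. It assumes spectra are already binned to matching m/z keys and stored as dictionaries of intensities; the study's 66 metrics, preprocessing, and weighting schemes are not reproduced, and the example spectra are hypothetical.

```python
from math import sqrt


def cosine_similarity(query, reference):
    """Cosine similarity between two spectra given as {m/z: intensity}."""
    shared = set(query) & set(reference)
    dot = sum(query[mz] * reference[mz] for mz in shared)
    norm_q = sqrt(sum(v * v for v in query.values()))
    norm_r = sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_q * norm_r)


# Hypothetical binned spectra.
query = {73: 100.0, 147: 50.0}
library_hit = {73: 100.0, 147: 50.0}
library_miss = {91: 80.0}
same_score = cosine_similarity(query, library_hit)
disjoint_score = cosine_similarity(query, library_miss)
print(same_score, disjoint_score)
```

Scores near 1 indicate closely matching peak patterns, while spectra with no shared peaks score 0.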

13.
Mil Med Res ; 10(1): 48, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37853489

ABSTRACT

BACKGROUND: Physiological and biochemical processes across tissues of the body are regulated in response to the high demands of intense physical activity in several occupations, such as firefighting, law enforcement, military, and sports. A better understanding of such processes can ultimately help improve human performance and prevent illnesses in the work environment. METHODS: To study regulatory processes in intense physical activity simulating real-life conditions, we performed a multi-omics analysis of three biofluids (blood plasma, urine, and saliva) collected from 11 wildland firefighters before and after a 45 min, intense exercise regimen. Omics profiles post- versus pre-exercise were compared by Student's t-test followed by pathway analysis and comparison between the different omics modalities. RESULTS: Our multi-omics analysis identified and quantified 3835 proteins, 730 lipids and 182 metabolites combining the 3 different types of samples. The blood plasma analysis revealed signatures of tissue damage and acute repair response accompanied by enhanced carbon metabolism to meet energy demands. The urine analysis showed a strong, concomitant regulation of 6 out of 8 identified proteins from the renin-angiotensin system supporting increased excretion of catabolites, reabsorption of nutrients and maintenance of fluid balance. In saliva, we observed a decrease in 3 pro-inflammatory cytokines and an increase in 8 antimicrobial peptides. A systematic literature review identified 6 papers that support an altered susceptibility to respiratory infection. CONCLUSION: This study shows simultaneous regulatory signatures in biofluids indicative of homeostatic maintenance during intense physical activity with possible effects on increased infection susceptibility, suggesting that caution against respiratory diseases could benefit workers in highly physically demanding jobs.


Subject(s)
Exercise , Multiomics , Humans , Exercise/physiology , Cytokines
14.
bioRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873084

ABSTRACT

Wearable silicone wristbands are a rapidly growing exposure assessment technology that offers researchers the ability to study previously inaccessible cohorts and has the potential to provide a more comprehensive picture of chemical exposure within diverse communities. However, there are no established best practices for analyzing the data within a study or across multiple studies, thereby limiting impact and access of these data for larger meta-analyses. We utilize data from three studies, from over 600 wristbands worn by participants in New York City and Eugene, Oregon, to present a first-of-its-kind manuscript detailing wristband data properties. We further discuss and provide concrete examples of key areas and considerations in common statistical modeling methods where best practices must be established to enable meta-analyses and integration of data from multiple studies. Finally, we detail important and challenging aspects of machine learning, meta-analysis, and data integration that researchers will face in order to extend beyond the limited scope of individual studies focused on specific populations.

15.
bioRxiv ; 2023 Jul 25.
Article in English | MEDLINE | ID: mdl-37546869

ABSTRACT

Sphingomyelin (SM) is a major component of mammalian cell membranes and particularly abundant in the myelin sheath that surrounds nerve fibers. Its production is catalyzed by SM synthases SMS1 and SMS2, which interconvert phosphatidylcholine and ceramide to diacylglycerol and SM in the Golgi and at the plasma membrane, respectively. As the lipids participating in this reaction fulfill both structural and signaling functions, SMS enzymes have considerable potential to influence diverse important cellular processes. The nematode Caenorhabditis elegans is an attractive model for studying both animal development and human disease. The organism contains five SMS homologues but none of these have been characterized in any detail. Here, we carried out the first systematic analysis of SMS family members in C. elegans. Using heterologous expression systems, genetic ablation, metabolic labeling and lipidome analyses, we show that C. elegans harbors at least three distinct SM synthases and one ceramide phosphoethanolamine (CPE) synthase. Moreover, C. elegans SMS family members have partially overlapping but also unique subcellular distributions and together occupy all principal compartments of the secretory pathway. Our findings shed light on crucial aspects of sphingolipid metabolism in a valuable animal model and open avenues for exploring the role of SM and its metabolic intermediates in organismal development.

16.
Anal Chem ; 95(33): 12195-12199, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37551970

ABSTRACT

Mass spectrometry is a powerful tool for identifying and analyzing biomolecules such as metabolites and lipids in complex biological samples. Liquid chromatography and gas chromatography mass spectrometry studies quite commonly involve large numbers of samples, which can require significant time for sample preparation and analyses. To accommodate such studies, the samples are commonly split into batches. Inevitably, variations in sample handling, temperature fluctuation, imprecise timing, column degradation, and other factors result in systematic errors or biases of the measured abundances between the batches. Numerous methods are available via R packages to assist with batch correction for omics data; however, since these methods were developed by different research teams, the algorithms are available in separate R packages, each with different data input and output formats. We introduce the malbacR package, which consolidates 11 common batch effect correction methods for omics data into one place so users can easily implement and compare the following: pareto scaling, power scaling, range scaling, ComBat, EigenMS, NOMIS, RUV-random, QC-RLSC, WaveICA2.0, TIGER, and SERRF. The malbacR package standardizes data input and output formats across these batch correction methods. The package works in conjunction with the pmartR package, allowing users to seamlessly include the batch effect correction in a pmartR workflow without needing any additional data manipulation.
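
Two of the simpler methods listed in this abstract, pareto scaling and range scaling, can be sketched for a single feature as follows. This is an illustration of the standard definitions only, not the malbacR API: the function names are hypothetical, and details such as missing-value handling or per-batch application are left out.

```python
from math import sqrt
from statistics import mean, stdev


def pareto_scale(values):
    """Mean-center, then divide by the square root of the sample SD."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("feature has zero variance")
    return [(v - m) / sqrt(s) for v in values]


def range_scale(values):
    """Mean-center, then divide by the observed range (max - min)."""
    m, r = mean(values), max(values) - min(values)
    if r == 0:
        raise ValueError("feature has zero range")
    return [(v - m) / r for v in values]


# Hypothetical abundances of one metabolite across three samples.
abundances = [1.0, 2.0, 3.0]
pareto = pareto_scale(abundances)
ranged = range_scale(abundances)
print(pareto, ranged)
```

Pareto scaling shrinks large-variance features less aggressively than unit-variance scaling, which is one reason it is often tried first on metabolomics data.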


Subject(s)
Algorithms , Research Design , Liquid Chromatography/methods , Mass Spectrometry/methods , Gas Chromatography-Mass Spectrometry
17.
medRxiv ; 2023 Jul 16.
Article in English | MEDLINE | ID: mdl-37502972

ABSTRACT

Type 1 diabetes (T1D) is a chronic condition caused by autoimmune destruction of the insulin-producing pancreatic β-cells. While it is known that gene-environment interactions play a key role in triggering the autoimmune process leading to T1D, the pathogenic mechanism leading to the appearance of islet autoantibodies (biomarkers of autoimmunity) is poorly understood. Here we show that disruption of the complement system precedes the detection of islet autoantibodies and persists through disease onset. Our results suggest that children who exhibit islet autoimmunity and progress to clinical T1D have lower complement protein levels relative to those who do not progress within a similar timeframe. Thus, the complement pathway, an understudied mechanistic and therapeutic target in T1D, merits increased attention for use as protein biomarkers of prediction and potentially prevention of T1D.

18.
J Am Soc Mass Spectrom ; 34(9): 2061-2064, 2023 Sep 06.
Article in English | MEDLINE | ID: mdl-37523489

ABSTRACT

Due to its speed, accuracy, and adaptability to various sample types, matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has become a popular method to identify molecular isotope profiles from biological samples. Often MALDI-MS data do not include tandem MS fragmentation data, and thus the identification of compounds in samples requires external databases so that the accurate mass of detected signals can be matched to known molecular compounds. Most relevant MALDI-MS software tools developed to confirm compound identifications are focused on small molecules (e.g., metabolites, lipids) and cannot be easily adapted to protein data due to their more complex isotopic distributions. Here, we present an R package called IsoMatchMS for the automated annotation of MALDI-MS data for multiple datatypes: intact proteins, peptides, and glycans. This tool accepts already derived molecular formulas or, for proteomics applications, can derive molecular formulas from a list of input peptides or proteins including proteins with post-translational modifications. Visualization of all matched isotopic profiles is provided in a highly accessible HTML format called a trelliscope display, which allows users to filter and sort by several parameters such as match scores and the number of peaks matched. IsoMatchMS simplifies the annotation and visualization of MALDI-MS data for downstream analyses.


Subject(s)
Proteins , Software , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods , Proteins/chemistry , Peptides , Proteomics/methods
19.
Cell Rep Med ; 4(7): 101093, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37390828

ABSTRACT

Type 1 diabetes (T1D) results from autoimmune destruction of β cells. Insufficient availability of biomarkers represents a significant gap in understanding the disease cause and progression. We conduct blinded, two-phase case-control plasma proteomics on the TEDDY study to identify biomarkers predictive of T1D development. Untargeted proteomics of 2,252 samples from 184 individuals identify 376 regulated proteins, showing alteration of complement, inflammatory signaling, and metabolic proteins even prior to autoimmunity onset. Extracellular matrix and antigen presentation proteins are differentially regulated in individuals who progress to T1D vs. those that remain in autoimmunity. Targeted proteomics measurements of 167 proteins in 6,426 samples from 990 individuals validate 83 biomarkers. A machine learning analysis predicts if individuals would remain in autoimmunity or develop T1D 6 months before autoantibody appearance, with areas under receiver operating characteristic curves of 0.871 and 0.918, respectively. Our study identifies and validates biomarkers, highlighting pathways affected during T1D development.
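
The area under the receiver operating characteristic curve reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that rank-based computation follows; the labels and scores are invented for illustration and have no connection to the TEDDY data, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = progressed) and classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported values of 0.871 and 0.918.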


Subject(s)
Type 1 Diabetes Mellitus , Insulin-Secreting Cells , Humans , Type 1 Diabetes Mellitus/diagnosis , Autoimmunity , Autoantibodies , Biomarkers
20.
Anal Chem ; 95(19): 7536-7544, 2023 May 16.
Article in English | MEDLINE | ID: mdl-37129113

ABSTRACT

As metabolomics grows into a high-throughput and high demand research field, current metrics for the identification of small molecules in gas chromatography-mass spectrometry (GC-MS) still require manual verification. Though steps have been taken to improve scoring metrics by combining spectral similarity (SS) and retention index (RI), the problem persists. A large body of literature has analyzed and refined SS scores, but few studies have explicitly studied improvements to RI scores. Here, we examined whether uninvestigated assumptions of the RI score are valid and propose ways to improve them. Query RIs were matched to library RI with a generous window of ±35 to avoid unintentional removal of valid compound identifications. Each match was manually verified as a true positive (TP), true negative, or unknown. Metabolites with at least 30 TP identifications were included in downstream analyses, resulting in a total of 87 metabolites from samples of varying complexity and type (e.g., amino acid mixtures, human urine, fungal species, and so on). Our results showed that the RI score assumptions of normality, consistent variance across metabolites, and a mean error centered at 0 are often violated. We demonstrated through a cross-validation analysis that modifying these underlying assumptions according to empirical metabolite-specific distributions improved the TP and true negative rankings. Further, we statistically determined the minimum number of samples required to estimate distributional parameters for scoring metrics. Overall, this work proposes a robust statistical pipeline to reduce the time bottleneck of metabolite identification by improving RI scores and thus minimize the effort to complete manual verification.
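
The assumptions this abstract questions (normality, a shared variance, and a mean error of 0) can be made concrete with a small sketch. It assumes a Gaussian density on the query-minus-library RI difference as the score; the exact scoring function used in the paper is not stated in the abstract, and the default `sigma`, the metabolite-specific values, and the RI numbers below are all hypothetical.

```python
from math import exp, pi, sqrt


def ri_score(query_ri, library_ri, mu=0.0, sigma=3.0):
    """Gaussian density of the RI error (query - library).

    mu=0 and a single shared sigma encode the conventional assumptions;
    passing metabolite-specific mu and sigma relaxes them.
    """
    z = (query_ri - library_ri - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))


# Hypothetical metabolite whose RI error is typically about +5 units.
default_score = ri_score(1505.0, 1500.0)
specific_score = ri_score(1505.0, 1500.0, mu=5.0, sigma=3.0)
print(default_score, specific_score)
```

For a metabolite with a systematic offset, the metabolite-specific parameters score a correct match higher than the zero-mean default does, which is the kind of ranking improvement the abstract describes.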


Subject(s)
Metabolomics , Humans , Gas Chromatography-Mass Spectrometry/methods , Metabolomics/methods