ABSTRACT
Many of the health-associated impacts of the microbiome are mediated by its chemical activity, producing and modifying small molecules (metabolites). Thus, microbiome metabolite quantification has a central role in efforts to elucidate and measure microbiome function. In this review, we cover general considerations when designing experiments to quantify microbiome metabolites, including sample preparation, data acquisition and data processing, since these are critical to downstream data quality. We then discuss data analysis and experimental steps to demonstrate that a given metabolite feature is of microbial origin. We further discuss techniques used to quantify common microbial metabolites, including short-chain fatty acids (SCFAs), secondary bile acids (BAs), tryptophan derivatives, N-acyl amides and trimethylamine N-oxide (TMAO). We conclude with challenges and future directions for the field.
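Absolute quantification of such metabolites is typically anchored to a stable-isotope-labeled internal standard and a calibration curve. As a minimal sketch of that calculation (all compound names, areas and concentrations below are hypothetical, not values from the review):

```python
# Illustrative stable-isotope-dilution quantification of an SCFA (e.g., butyrate).
# All numbers are invented; a real assay needs matrix-matched calibrants.
import numpy as np

# Calibration: known analyte concentrations (uM) spiked with a fixed amount of a
# labeled internal standard (e.g., d7-butyrate), measured as peak-area ratios.
cal_conc = np.array([1.0, 5.0, 10.0, 50.0, 100.0])        # uM
cal_ratio = np.array([0.021, 0.098, 0.205, 1.01, 2.02])   # area(analyte)/area(IS)

slope, intercept = np.polyfit(cal_conc, cal_ratio, 1)      # linear response model

def quantify(area_analyte, area_is):
    """Map an observed area ratio back to concentration via the calibration line."""
    ratio = area_analyte / area_is
    return (ratio - intercept) / slope

print(f"Estimated concentration: {quantify(3.4e5, 8.1e5):.1f} uM")
```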
Subject(s)
Gastrointestinal Microbiome; Microbiota; Humans; Microbiota/genetics; Fatty Acids, Volatile/metabolism; Methylamines/metabolism
ABSTRACT
State-of-the-art mass spectrometers combined with modern bioinformatics algorithms for peptide-to-spectrum matching (PSM) with robust statistical scoring allow more variable features (e.g., post-translational modifications) to be reliably identified from (tandem) mass spectrometry data, often without the need for biochemical enrichment. Semi-specific proteome searches, which enforce a theoretical enzymatic digestion at only the N- or C-terminal end of a peptide, allow the identification of native protein termini or those arising from endogenous proteolytic activity (also referred to as "neo-N-termini" analysis or "N-terminomics"). Nevertheless, deriving biological meaning from these search outputs can be challenging in terms of data mining and analysis. Thus, we introduce TermineR, a data analysis approach for the (1) annotation of peptides according to their enzymatic cleavage specificity and known protein processing features, (2) differential abundance and enrichment analysis of N-terminal sequence patterns, and (3) visualization of neo-N-termini location. We illustrate the use of TermineR by applying it to tandem mass tag (TMT)-based proteomics data from a mouse model of polycystic kidney disease, and we assess the semi-specific searches for biological interpretation of cleavage events and the variable contribution of proteolytic products to general protein abundance. The TermineR approach and example data are available as an R package at https://github.com/MiguelCos/TermineR.
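TermineR itself is an R package; purely to illustrate the annotation step it describes (classifying a peptide's N-terminus by cleavage specificity), here is a minimal Python sketch assuming a tryptic digest. The protein and peptide sequences are invented:

```python
# Minimal sketch (not the TermineR API): classify a peptide's N-terminus by
# checking the residue preceding it in its parent protein sequence.
def classify_n_terminus(protein_seq: str, peptide: str) -> str:
    pos = protein_seq.find(peptide)
    if pos == -1:
        return "not_found"
    if pos == 0:
        return "protein_N_terminus"          # native protein start
    prev = protein_seq[pos - 1]
    # Trypsin cleaves C-terminal to K/R: a preceding K/R means a specific terminus;
    # anything else suggests a neo-N-terminus from endogenous proteolysis.
    return "tryptic_specific" if prev in "KR" else "neo_N_terminus"

protein = "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFK"
for pep in ("MKWVTFISLL", "GVFRR", "EVAHR"):
    print(pep, "->", classify_n_terminus(protein, pep))
```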
Subject(s)
Proteolysis; Proteomics; Tandem Mass Spectrometry; Proteomics/methods; Animals; Mice; Tandem Mass Spectrometry/methods; Protein Processing, Post-Translational; Algorithms; Polycystic Kidney Diseases/metabolism; Proteome/metabolism; Proteome/analysis; Software; Databases, Protein; Peptides/metabolism; Peptides/analysis; Peptides/chemistry
ABSTRACT
Machine learning (ML) and deep learning (DL) models for peptide property prediction, such as Prosit, have enabled the creation of high-quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open-source Python package of our spectral library generation and rescoring pipeline, originally available only online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally as a cross-platform Python package from PyPI.
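A central ingredient of such data-driven rescoring is a similarity feature between predicted and observed fragment spectra; the normalized spectral contrast angle is a standard choice for this. A minimal sketch of that feature (intensities are illustrative; this is not Oktoberfest's actual API):

```python
# Sketch of a core rescoring feature: normalized spectral contrast angle
# between predicted and observed fragment ion intensities.
import numpy as np

def spectral_angle(pred: np.ndarray, obs: np.ndarray) -> float:
    """1 = identical spectra, 0 = orthogonal; fed to rescoring as a feature."""
    pred = pred / np.linalg.norm(pred)
    obs = obs / np.linalg.norm(obs)
    cos_sim = np.clip(np.dot(pred, obs), -1.0, 1.0)
    return 1.0 - 2.0 * np.arccos(cos_sim) / np.pi

predicted = np.array([0.9, 0.4, 0.1, 0.0, 0.3])   # model-predicted intensities
observed  = np.array([0.8, 0.5, 0.0, 0.1, 0.2])   # matched peaks from the spectrum
print(f"spectral angle: {spectral_angle(predicted, observed):.3f}")
```

Features like this, computed per PSM, are then combined with search engine scores in a Percolator-style semi-supervised rescoring step.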
Subject(s)
Proteomics; Software; Proteomics/methods; Peptides; Algorithms
ABSTRACT
The human gut microbiome plays a vital role in preserving individual health and is intricately involved in essential functions. Imbalances or dysbiosis within the microbiome can significantly impact human health and are associated with many diseases. Several metaproteomics platforms are currently available to study microbial proteins within complex microbial communities. In this study, we sought to develop an integrated pipeline to provide deeper insights into both the taxonomic and functional aspects of cultivated human gut microbiomes derived from clinical colon biopsies. We combined a rapid peptide search by MSFragger against the Unified Human Gastrointestinal Protein database with taxonomic and functional analyses by Unipept Desktop and MetaLab-MAG. Across seven samples, we identified and matched nearly 36,000 unique peptides to approximately 300 species and 11 phyla. Unipept Desktop provided gene ontology, InterPro entries, and enzyme commission number annotations, facilitating the identification of relevant metabolic pathways. MetaLab-MAG contributed functional annotations through Clusters of Orthologous Genes and Non-supervised Orthologous Groups categories. These results unveiled functional similarities and differences among the samples. This integrated pipeline holds the potential to provide deeper insights into the taxonomy and functions of the human gut microbiome for interrogating the intricate connections between microbiome balance and diseases.
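The taxonomic readout in tools such as Unipept rests on lowest-common-ancestor (LCA) aggregation over the taxa matching each peptide. A toy Python sketch of that idea (the lineages and peptides below are invented for illustration):

```python
# Toy lowest-common-ancestor (LCA) assignment, the idea behind Unipept's
# peptide-to-taxon mapping.
from collections import Counter

lineages = {
    "Bacteroides fragilis": ["Bacteria", "Bacteroidota", "Bacteroides"],
    "Bacteroides ovatus":   ["Bacteria", "Bacteroidota", "Bacteroides"],
    "Escherichia coli":     ["Bacteria", "Pseudomonadota", "Escherichia"],
}

def lca(taxa):
    """Return the deepest rank shared by the lineages of all matched taxa."""
    shared = []
    for ranks in zip(*(lineages[t] for t in taxa)):
        if len(set(ranks)) == 1:
            shared.append(ranks[0])
        else:
            break
    return shared[-1] if shared else "root"

peptide_hits = {"LSDELK": ["Bacteroides fragilis", "Bacteroides ovatus"],
                "AGVNTYK": ["Bacteroides fragilis", "Escherichia coli"]}
for pep, taxa in peptide_hits.items():
    print(pep, "->", lca(taxa))                  # genus- vs domain-level resolution
print(Counter(lca(t) for t in peptide_hits.values()))  # crude community profile
```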
ABSTRACT
Ion mobility spectrometry-mass spectrometry (IMS-MS or IM-MS) is a powerful analytical technique that combines the gas-phase separation capabilities of IM with the identification and quantification capabilities of MS. IM-MS can differentiate molecules with indistinguishable masses but different structures (e.g., isomers, isobars, molecular classes, and contaminant ions). The importance of this analytical technique is reflected by a steady increase in the number of applications for molecular characterization across a variety of fields, from different MS-based omics (proteomics, metabolomics, lipidomics, etc.) to the structural characterization of glycans, organic matter, proteins, and macromolecular complexes. With the increasing application of IM-MS, there is a pressing need for effective and accessible computational tools. This article presents an overview of the most recent free and open-source software tools specifically tailored for the analysis and interpretation of data derived from IM-MS instrumentation. This review enumerates these tools and outlines their main algorithmic approaches, while highlighting representative applications across different fields. Finally, current limitations and anticipated improvements are discussed.
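As one concrete example of what such tools compute, the Mason-Schamp equation converts a measured reduced mobility K0 into a collision cross section (CCS). A minimal sketch, assuming N2 drift gas and a singly charged ion (the input values are illustrative):

```python
# CCS (A^2) from reduced ion mobility via the Mason-Schamp equation.
import numpy as np

KB = 1.380649e-23     # Boltzmann constant, J/K
E  = 1.602176634e-19  # elementary charge, C
N0 = 2.6867811e25     # Loschmidt number, m^-3 (273.15 K, 101.325 kPa)
DA = 1.66053907e-27   # atomic mass unit, kg

def ccs_from_k0(k0_cm2_vs, ion_mass_da, charge=1, gas_mass_da=28.0134, temp_k=305.0):
    """Collision cross section from reduced mobility (default drift gas: N2)."""
    mu = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * DA  # reduced mass, kg
    k0_si = k0_cm2_vs * 1e-4                                           # cm^2/V/s -> m^2/V/s
    omega = (3 * charge * E) / (16 * N0) * np.sqrt(2 * np.pi / (mu * KB * temp_k)) / k0_si
    return omega * 1e20                                                # m^2 -> A^2

# A 622 Da singly charged ion with K0 = 1.0 cm^2/V/s in N2 gives ~205 A^2
print(f"CCS ~ {ccs_from_k0(1.0, 622.0):.0f} A^2")
```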
Subject(s)
Algorithms; Ion Mobility Spectrometry; Mass Spectrometry; Software; Ion Mobility Spectrometry/methods; Mass Spectrometry/methods; Proteomics/methods; Metabolomics/methods; Humans
ABSTRACT
Direct-to-Mass Spectrometry (MS) and ambient ionization techniques enable rapid biochemical fingerprinting. Data processing is typically accomplished with vendor-provided software tools. Here, a novel, open-source functionality, entitled Tidy-Direct-to-MS, was developed for processing direct-to-MS data sets. It allows for fast and user-friendly processing using different modules for optional sample position detection and separation, mass-to-charge ratio drift detection and correction, consensus spectra calculation, bracketing across sample positions, and feature abundance calculation. The tool also provides functionality for the automated comparison of different sets of parameters, thereby assisting the user in the complex task of finding an optimal combination that maximizes the total number of detected features while also checking for the detection of user-provided reference features. In addition, Tidy-Direct-to-MS supports data quality review and subsequent data analysis, thereby simplifying the workflow of untargeted ambient MS-based metabolomics studies. Tidy-Direct-to-MS is implemented in the Python programming language as part of the TidyMS library and can thus be easily extended. Its capabilities are showcased on a data set acquired in a marine metabolomics study reported in MetaboLights (MTBLS1198) using a transmission mode Direct Analysis in Real Time-Mass Spectrometry (TM-DART-MS)-based method.
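To illustrate the drift-correction module conceptually (this is not the TidyMS API): if a reference ion of known m/z is present throughout the run, its apparent m/z can be tracked, the relative drift fitted as a smooth function of scan index, and a multiplicative correction applied to every scan. A minimal sketch with synthetic drift:

```python
# Lock-mass-style m/z drift correction; reference ion and drift are synthetic.
import numpy as np

rng = np.random.default_rng(0)
REF_MZ = 554.2615                  # hypothetical background/lock-mass ion

# Apparent m/z of the reference ion over 100 scans (slow multiplicative drift)
scan_idx = np.arange(100)
observed_ref = REF_MZ * (1 + 3e-6 * scan_idx / 100) + rng.normal(0, 1e-4, 100)

# Fit the relative drift as a smooth (here: linear) function of scan index
rel_error = observed_ref / REF_MZ - 1.0
drift = np.poly1d(np.polyfit(scan_idx, rel_error, 1))

def correct(mz_values, scan):
    """Apply the fitted multiplicative correction to all peaks of one scan."""
    return np.asarray(mz_values) / (1.0 + drift(scan))

print(correct([180.0634, 554.2631, 760.5850], scan=80))
```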
Subject(s)
Mass Spectrometry; Metabolomics; Software; Metabolomics/methods; Mass Spectrometry/methods; Programming Languages
ABSTRACT
Effective connectivity (EC) refers to directional or causal influences between interacting neuronal populations or brain regions and can be estimated from functional magnetic resonance imaging (fMRI) data via dynamic causal modeling (DCM). In contrast to functional connectivity, the impact of data processing choices on DCM estimates of task-evoked EC has hardly been addressed. We therefore investigated how task-evoked EC is affected by choices made during data processing. In particular, we considered the impact of global signal regression (GSR), block/event-related design of the general linear model (GLM) used for the first-level task-evoked fMRI analysis, type of activation contrast, and significance thresholding approach. Using DCM, we estimated individual and group-averaged task-evoked EC within a brain network related to spatial conflict processing for all the parameters considered and compared the differences in task-evoked EC between any two data processing conditions via between-group parametric empirical Bayes (PEB) analysis and Bayesian data comparison (BDC). We observed strongly varying patterns of group-averaged EC depending on the data processing choices. In particular, task-evoked EC and parameter certainty were strongly impacted by GLM design and type of activation contrast, as revealed by PEB and BDC, respectively, whereas they were little affected by GSR and the type of significance thresholding. The event-related GLM design appears to be more sensitive to task-evoked modulations of EC but provides model parameters with lower certainty than the block-based design, while the latter is more sensitive to the type of activation contrast than is the event-related design. Our results demonstrate that applying different reasonable data processing choices can substantially alter task-evoked EC as estimated by DCM. Such choices should be made with care and, whenever possible, varied across parallel analyses to evaluate their impact and identify potential convergence for robust outcomes.
Subject(s)
Bayes Theorem; Brain Mapping; Brain; Magnetic Resonance Imaging; Humans; Brain/physiology; Brain/diagnostic imaging; Male; Female; Brain Mapping/methods; Adult; Young Adult; Models, Neurological; Image Processing, Computer-Assisted/methods; Neural Pathways/physiology; Neural Pathways/diagnostic imaging
ABSTRACT
Compact yet precise feature-extraction ability is core to processing complex computational tasks in neuromorphic hardware. Physical reservoir computing (RC) offers a robust framework to map temporal data into a high-dimensional space using the time dynamics of a material system, such as a volatile memristor. However, conventional physical RC systems have limited dynamics for given material properties, restricting the methods available to increase their dimensionality. This study proposes an integrated temporal kernel composed of a 2-memristor and 1-capacitor (2M1C) configuration, using a W/HfO2/TiN memristor and a TiN/ZrO2/Al2O3/ZrO2/TiN capacitor, to achieve higher dimensionality and tunable dynamics. The kernel elements are carefully designed and fabricated into an integrated array whose performance is evaluated under diverse conditions. By optimizing the time dynamics of the 2M1C kernel, each memristor simultaneously extracts complementary information from input signals. The MNIST benchmark digit classification task achieves a high accuracy of 94.3% with a (196×10) single-layer network. Analog input mapping ability is tested with a Mackey-Glass time series prediction, and the system records a normalized root mean square error of 0.04 with a 20×1 readout network, the smallest readout network ever used for Mackey-Glass prediction in RC. These performances demonstrate the system's high potential for efficient temporal data analysis.
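The readout training in such RC systems is typically plain ridge regression on the reservoir states. A software analogue of the pipeline above, with a small random leaky reservoir standing in for the physical 2M1C kernel (all parameters illustrative, not the paper's device model):

```python
# Mackey-Glass one-step-ahead prediction with a 20-node software reservoir
# and a ridge-regression readout.
import numpy as np

rng = np.random.default_rng(0)

# Mackey-Glass series (beta=0.2, gamma=0.1, n=10, tau=17), coarse Euler steps
tau, n_steps = 17, 3000
x = np.ones(n_steps + tau) * 1.2
for t in range(tau, n_steps + tau - 1):
    x[t + 1] = x[t] + 0.2 * x[t - tau] / (1 + x[t - tau] ** 10) - 0.1 * x[t]
u = x[tau:]

# Leaky random reservoir (echo-state style), spectral radius scaled below 1
N, leak = 20, 0.3
W_in = rng.uniform(-0.5, 0.5, N)
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
states, s = np.zeros((len(u), N)), np.zeros(N)
for t in range(len(u) - 1):
    s = (1 - leak) * s + leak * np.tanh(W @ s + W_in * u[t])
    states[t + 1] = s

# Ridge-regression readout: predict u[t+1] from the reservoir state at t
train = slice(500, 2500)
X, y = states[train], u[train.start + 1:train.stop + 1]
w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
pred = states[2500:-1] @ w
nrmse = np.sqrt(np.mean((pred - u[2501:]) ** 2)) / np.std(u[2501:])
print(f"NRMSE on held-out data: {nrmse:.3f}")
```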
ABSTRACT
With the development of synchrotron radiation sources and high-frame-rate detectors, the amount of experimental data collected at synchrotron radiation beamlines has increased exponentially. As a result, data processing for synchrotron radiation experiments has entered the era of big data. It is becoming increasingly important for beamlines to be able to process large-scale data in parallel to keep up with the rapid growth of data. Currently, however, no data processing solution based on a big data technology framework exists for beamlines. Apache Hadoop is a widely used distributed system architecture for solving the problem of massive data storage and computation. This paper presents a distributed data processing scheme for beamline experimental data built on Hadoop. The Hadoop Distributed File System is utilized as the distributed file storage system, and Hadoop YARN serves as the resource scheduler for the distributed computing cluster. A distributed data processing pipeline that can carry out massively parallel computation is designed and developed using Apache Spark. The entire data processing platform adopts a distributed microservice architecture, which makes the system easy to expand, reduces module coupling and improves reliability.
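The parallelization pattern described maps naturally onto Spark's RDD API: a list of detector frames is partitioned across executors and a per-frame reduction runs in parallel. A minimal PySpark sketch (paths, the HDFS mount point and the reduction step are placeholders, not the paper's pipeline):

```python
# Distribute per-frame reduction of many detector frames over a Spark cluster.
from pyspark.sql import SparkSession
import numpy as np

spark = SparkSession.builder.appName("beamline-reduction").getOrCreate()
sc = spark.sparkContext

frame_paths = [f"hdfs:///beamline/run42/frame_{i:05d}.npy" for i in range(10000)]

def reduce_frame(path: str) -> tuple:
    """Placeholder per-frame step, e.g., dark subtraction + integration."""
    frame = np.load(path.replace("hdfs://", "/mnt/hdfs"))  # assumes an HDFS mount
    return (path, float(frame.sum()))

# Each partition of file paths is processed on a different executor in parallel.
results = sc.parallelize(frame_paths, numSlices=256).map(reduce_frame).collect()
print(len(results))
spark.stop()
```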
ABSTRACT
During X-ray diffraction experiments on single crystals, the diffracted beam intensities may be affected by multiple-beam X-ray diffraction (MBD). This effect is particularly frequent at higher X-ray energies and for larger unit cells. The appearance of this so-called Renninger effect often impairs the interpretation of diffracted intensities. This applies in particular to energy spectra analysed in resonant experiments, since during scans of the incident photon energy these conditions are necessarily met at specific X-ray energies. The effect can be addressed by carefully avoiding multiple-beam reflection conditions at a given X-ray energy and a given position in reciprocal space. However, regions that are (nearly) free of MBD are not always available. This article presents a universal concept of data acquisition and post-processing for resonant X-ray diffraction experiments. Our concept facilitates the reliable determination of kinematic (MBD-free) resonant diffraction intensities even at relatively high energies, which, in turn, enables the study of higher absorption edges. In this way, the applicability of resonant diffraction, e.g. to reveal the local atomic and electronic structure or chemical environment, is extended to the vast majority of crystalline materials. The potential of this approach compared with conventional data reduction is demonstrated by measurements at the Ta L3 edge of the well studied lithium tantalate LiTaO3.
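While the paper's acquisition and post-processing concept is more involved, the underlying redundancy idea can be caricatured as follows: MBD perturbs a reflection's intensity only at specific azimuthal angles, so a robust statistic over an azimuthal scan approximates the kinematic value. A toy sketch on synthetic numbers (not the authors' algorithm):

```python
# Robust recovery of a kinematic intensity from an azimuth-redundant scan.
import numpy as np

rng = np.random.default_rng(1)
psi = np.linspace(0, 180, 19)                    # azimuthal scan positions (deg)
i_kin = 1000.0                                   # hypothetical kinematic intensity
intensity = i_kin + rng.normal(0, 10, psi.size)  # counting noise
intensity[[4, 11]] *= [1.8, 0.55]                # Umweg (gain) / Aufhellung (loss)

med = np.median(intensity)
mad = 1.4826 * np.median(np.abs(intensity - med))    # robust spread estimate
clean = intensity[np.abs(intensity - med) < 3 * mad] # reject MBD-affected psi
print(f"robust estimate: {clean.mean():.0f} (true {i_kin:.0f})")
```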
ABSTRACT
Deflectometric profilometers are used to precisely measure the form of beam-shaping optics of synchrotrons and X-ray free-electron lasers. They often utilize autocollimators, which measure slope by evaluating the displacement of a reticle image on a detector. Based on our privileged access to the raw image data of an autocollimator, novel strategies to reduce the systematic measurement errors by using a set of overlapping images of the reticle obtained at different positions on the detector are discussed. It is demonstrated that imaging properties such as geometrical distortions and vignetting can be extracted from this redundant set of images without recourse to external calibration facilities. This approach is based on the fact that the properties of the reticle itself do not change - all changes in the reticle image are due to the imaging process. Firstly, by combining interpolation and correlation, it is possible to determine the shift of a reticle image relative to a reference image with minimal error propagation. Secondly, the intensity of the reticle image is analysed as a function of its position on the CCD and a vignetting correction is calculated. Thirdly, the size of the reticle image is analysed as a function of its position and an imaging distortion correction is derived. It is demonstrated that, for different measurement ranges and aperture diameters of the autocollimator, reductions in the systematic errors of up to a factor of four to five can be achieved without recourse to external measurements.
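The first step, determining a subpixel image shift by combining interpolation and correlation, is commonly implemented as upsampled phase cross-correlation. A minimal sketch on synthetic data, using skimage's implementation as a stand-in for the authors' code:

```python
# Recover a known subpixel shift of a synthetic "reticle" image.
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

# Synthetic reticle: a smooth Gaussian spot on a 256x256 detector
y, x = np.mgrid[:256, :256]
reference = np.exp(-((x - 128) ** 2 + (y - 128) ** 2) / (2 * 12.0 ** 2))
moved = nd_shift(reference, (0.37, -1.24))       # known subpixel displacement

# upsample_factor=100 resolves the shift to 1/100 of a pixel
shift, error, _ = phase_cross_correlation(reference, moved, upsample_factor=100)
print("recovered shift (row, col):", shift)      # magnitude ~ (0.37, 1.24);
                                                 # sign follows skimage's convention
```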
ABSTRACT
Large-scale metabolomics is a powerful technique that has attracted widespread attention in biomedical studies focused on identifying biomarkers and interpreting the mechanisms of complex diseases. Despite a rapid increase in the number of large-scale metabolomic studies, the analysis of metabolomic data remains a key challenge. Specifically, diverse unwanted variations and batch effects in processing many samples have a substantial impact on identifying true biological markers, and it is a daunting challenge to annotate a plethora of peaks as metabolites in untargeted mass spectrometry-based metabolomics. Therefore, the development of an out-of-the-box tool is urgently needed to realize data integration and to accurately annotate metabolites with enhanced functions. In this study, the R-based LargeMetabo package was developed for processing and analyzing large-scale metabolomic data. This package is unique because it is capable of (1) integrating multiple analytical experiments to effectively boost the power of statistical analysis; (2) selecting the appropriate biomarker identification method by intelligent assessment for large-scale metabolomic data; and (3) providing metabolite annotation and enrichment analysis based on an enhanced metabolite database. The LargeMetabo package facilitates flexibility and reproducibility in large-scale metabolomics. The package is freely available from https://github.com/LargeMetabo/LargeMetabo.
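As a generic illustration of the batch-effect problem the package addresses (this is not LargeMetabo's own algorithm), per-feature median centering on a log scale is one of the simplest ways to align two experiments before joint statistical analysis:

```python
# Naive batch alignment: per-batch, per-feature median centering of log intensities.
import numpy as np

rng = np.random.default_rng(7)
batch1 = rng.lognormal(mean=10.0, sigma=0.4, size=(20, 5))  # 20 samples x 5 features
batch2 = rng.lognormal(mean=10.6, sigma=0.4, size=(20, 5))  # same features, shifted batch

def median_center(batch):
    logged = np.log2(batch)
    return logged - np.median(logged, axis=0)   # remove per-feature batch medians

combined = np.vstack([median_center(batch1), median_center(batch2)])
print("per-feature medians after integration:",
      np.round(np.median(combined, axis=0), 3))
```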
Subject(s)
Metabolomics; Software; Reproducibility of Results; Metabolomics/methods; Mass Spectrometry; Biomarkers
ABSTRACT
A comprehensive analysis of omics data can require vast computational resources and access to varied data sources that must be integrated into complex, multi-step analysis pipelines. Execution of many such analyses can be accelerated by applying the cloud computing paradigm, which provides scalable resources for storing data of different types and for parallelizing data analysis computations. Moreover, these resources can be reused across different multi-omics analysis scenarios. Traditionally, developers are required to manage a cloud platform's underlying infrastructure, configuration, maintenance and capacity planning. The serverless computing paradigm simplifies these operations by automatically allocating and maintaining both servers and virtual machines as required for analysis tasks. This paradigm offers highly parallel execution and high scalability without manual management of the underlying infrastructure, freeing developers to focus on operational logic. This paper reviews serverless solutions in bioinformatics and evaluates their usage in omics data analysis and integration. We start by reviewing the application of the cloud computing model to multi-omics data analysis and exposing some shortcomings of the early approaches. We then introduce the serverless computing paradigm and show its applicability for performing an integrative analysis of multiple omics data sources in the context of the COVID-19 pandemic.
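In the serverless pattern reviewed here, a short-lived function is triggered per data object and the platform handles provisioning and scaling. A hypothetical AWS Lambda handler in Python (the bucket layout, trigger configuration and the placeholder "analysis" are invented for illustration):

```python
# Hypothetical Lambda: one invocation processes one omics file landing in S3.
import json
import gzip
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Triggered by an S3 event; each record points at one newly uploaded file.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        lines = gzip.decompress(body).decode().splitlines()  # assumes gzipped text
        # Placeholder "analysis": count data records (e.g., variants in a VCF)
        n = sum(1 for line in lines if not line.startswith("#"))
        s3.put_object(Bucket=bucket, Key=f"results/{key}.json",
                      Body=json.dumps({"input": key, "records": n}))
    return {"statusCode": 200}
```

Thousands of such invocations run in parallel with no cluster to manage, which is the main draw of the paradigm for embarrassingly parallel omics workloads.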
Subject(s)
COVID-19/genetics; COVID-19/metabolism; Cloud Computing; Computational Biology; Genomics; Pandemics; SARS-CoV-2; Software; COVID-19/epidemiology; Humans; SARS-CoV-2/genetics; SARS-CoV-2/metabolism
ABSTRACT
The isotope distribution, which reflects the number and probabilities of occurrence of different isotopologues of a molecule, can be calculated theoretically. With the current generation of (ultra-)high-resolution mass spectrometers, the isotope distribution of molecules can be measured with high sensitivity, resolution, and mass accuracy. However, the observed isotope distribution can differ substantially from the expected isotope distribution. Although differences between the observed and expected isotope distributions can complicate the analysis and interpretation of mass spectral data, they can be helpful in a number of specific applications. These applications include, but are not limited to, the identification of peptides in proteomics, elucidation of the elemental composition of small organic molecules and metabolites, and wading through peaks in mass spectra of complex bioorganic mixtures such as petroleum and humus. In this review, we give a nonexhaustive overview of factors that have an impact on the observed isotope distribution, such as elemental isotope deviations, ion sampling, ion interactions, electronic noise and dephasing, centroiding, and apodization. These factors occur at different stages of obtaining the isotope distribution: during collection of the sample, during ionization and intake of a molecule in a mass spectrometer, during mass separation and detection of ionized molecules, and during signal processing.
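The expected (aggregated) isotope distribution referred to throughout can be computed by convolving per-atom isotope patterns, one convolution per atom in the formula. A minimal sketch for glucose (C6H12O6), using rounded IUPAC abundances:

```python
# Theoretical aggregated isotope distribution by repeated convolution.
import numpy as np

# P(0, 1, 2, ... extra neutrons) per atom
PATTERNS = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "O": [0.99757, 0.00038, 0.00205],
}

def isotope_distribution(formula: dict, n_peaks: int = 5) -> np.ndarray:
    dist = np.array([1.0])
    for element, count in formula.items():
        for _ in range(count):
            dist = np.convolve(dist, PATTERNS[element])[:n_peaks + 1]
    return dist / dist.sum()

for i, p in enumerate(isotope_distribution({"C": 6, "H": 12, "O": 6})):
    print(f"M+{i}: {p:.5f}")     # M+0 ~ 0.92, M+1 ~ 0.07 for glucose
```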
ABSTRACT
Severe fever with thrombocytopenia syndrome (SFTS) is a widespread infectious disease with high mortality. Hence, identifying valuable biomarkers for detecting early changes in SFTS is crucial. In this study, we investigated the relationship between the difference between hematocrit (HCT) and serum albumin (ALB) levels (HCT-ALB) and the prognosis of patients with SFTS virus infection. After excluding patients who did not meet the SFTS diagnostic criteria, patients with SFTS from the First Affiliated Hospital of Wannan Medical College were divided into fatal and nonfatal groups based on disease prognosis. A dynamic analysis of daily laboratory data was conducted for 14 days following SFTS onset. A receiver operating characteristic (ROC) curve was used to evaluate the predictive value of HCT-ALB. A separate sample of patients with SFTS admitted to the First Affiliated Hospital of Nanjing Medical University was used to verify the study conclusions. A total of 158 patients with SFTS were included: 126 in the nonfatal group and 32 in the fatal group, corresponding to a mortality rate of 20.25% (32/158). Univariate analysis of the laboratory test findings and ROC curve analysis showed that alanine aminotransferase (ALT), aspartate aminotransferase (AST), HCT-ALB, and lactate dehydrogenase (LDH) discriminated the disease condition of patients with SFTS relatively well. Moreover, HCT-ALB served as a predictor of SFTS prognosis. An area under the ROC curve (AUC) of 0.777 and a critical HCT-ALB value of 4.75 on day 7 were associated with a sensitivity of 83.3% and a specificity of 73.9%. On day 8 (AUC = 0.882), the critical HCT-ALB value was 9.25, with a sensitivity of 100% and a specificity of 76.5%. Further verification based on data from 91 patients with SFTS admitted to the First Affiliated Hospital of Nanjing Medical University demonstrated a mortality rate of 51% (24/47) among those with HCT-ALB values >4.75 on day 7 of the disease course, highlighting the potential of an HCT-ALB value >4.75 for predicting SFTS prognosis. High HCT-ALB values are closely related to mortality in patients with SFTS. HCT-ALB is a sensitive, independent predictor of early disease in patients with SFTS.
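The cutoff analysis reported above follows the standard ROC recipe: compute the curve, then pick the threshold maximizing Youden's J (sensitivity + specificity - 1). A sketch on synthetic, roughly study-sized data (not the patient data):

```python
# ROC curve and Youden-optimal cutoff on synthetic HCT-ALB-like values.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(3)
# Synthetic day-7 HCT-ALB: higher in fatal cases, as in the study
nonfatal = rng.normal(loc=1.0, scale=4.0, size=126)
fatal = rng.normal(loc=8.0, scale=4.0, size=32)
values = np.concatenate([nonfatal, fatal])
labels = np.concatenate([np.zeros(126), np.ones(32)])

fpr, tpr, thresholds = roc_curve(labels, values)
best = np.argmax(tpr - fpr)                      # Youden's J
print(f"AUC = {auc(fpr, tpr):.3f}")
print(f"cutoff = {thresholds[best]:.2f}, "
      f"sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")
```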
Subject(s)
Biomarkers; ROC Curve; Serum Albumin; Severe Fever with Thrombocytopenia Syndrome; Humans; Male; Female; Prognosis; Middle Aged; Biomarkers/blood; Hematocrit; Aged; Severe Fever with Thrombocytopenia Syndrome/diagnosis; Severe Fever with Thrombocytopenia Syndrome/blood; Severe Fever with Thrombocytopenia Syndrome/mortality; Serum Albumin/analysis; Adult; Phlebovirus; Severity of Illness Index; Aged, 80 and over; Aspartate Aminotransferases/blood
ABSTRACT
A large-scale outbreak of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) occurred in Shanghai, China, in early December 2022. To study the incidence and characteristics of otitis media with effusion (OME) complicating SARS-CoV-2 infection, we collected 267 middle ear effusion (MEE) samples and 172 nasopharyngeal (NP) swabs from patients. SARS-CoV-2 was detected by RT-PCR. Expression of SARS-CoV-2, angiotensin-converting enzyme 2 (ACE2), and transmembrane serine protease 2 (TMPRSS2) in human samples was examined via immunofluorescence. During the COVID-19 epidemic in 2022, the incidence of OME (3%) significantly increased compared with the same period from 2020 to 2022. Ear symptoms in patients with SARS-CoV-2 infection complicated by OME generally appeared late, even after a negative NP swab, an average of 9.33 ± 6.272 days after COVID-19 infection. SARS-CoV-2 was detected in MEE, which had a higher viral load than NP swabs. The insertion rate of tympanostomy tubes was not significantly higher than in OME patients in 2019-2022. Virus migration led to high viral loads in MEE despite negative NP swabs, indicating that OME lagged behind respiratory infection but had a favorable prognosis. Furthermore, middle ear tissue from adult humans coexpressed the SARS-CoV-2 receptor ACE2 and the cofactor TMPRSS2 required for viral entry.
Subject(s)
COVID-19; Otitis Media with Effusion; Adult; Humans; SARS-CoV-2; COVID-19/complications; Angiotensin-Converting Enzyme 2; China/epidemiology
ABSTRACT
INTRODUCTION: Untargeted direct mass spectrometric analysis of volatile organic compounds has many potential applications across fields such as healthcare and food safety. However, robust data processing protocols must be employed to ensure that research is replicable and practical applications can be realised. User-friendly data processing and statistical tools are becoming increasingly available; however, the use of these tools has not been analysed, nor are they necessarily suited to every data type. OBJECTIVES: This review aims to analyse the data processing and analytic workflows currently in use and to examine whether methodological reporting is sufficient to enable replication. METHODS: Studies identified from the Web of Science and Scopus databases were systematically examined against the inclusion criteria. The experimental, data processing, and data analysis workflows were reviewed for the relevant studies. RESULTS: Of 459 studies identified from the databases, 110 met the inclusion criteria. Very few papers provided enough detail to allow all aspects of the methodology to be replicated accurately, with only three meeting previous guidelines for reporting experimental methods. A wide range of data processing methods were used, with only eight papers (7.3%) employing largely similar workflows in which direct comparability was achievable. CONCLUSIONS: Standardised workflows and reporting systems need to be developed to ensure research in this area is replicable, comparable, and held to a high standard, thus allowing the wide-ranging potential applications to be realised.
Subject(s)
Mass Spectrometry; Volatile Organic Compounds; Volatile Organic Compounds/analysis; Mass Spectrometry/methods; Mass Spectrometry/standards; Humans; Metabolomics/methods; Metabolomics/standards
ABSTRACT
Light-sheet fluorescence microscopy (LSFM), a prominent fluorescence microscopy technique, offers enhanced temporal resolution for imaging biological samples in four dimensions (4D; x, y, z, time). Some of the most recent implementations, including inverted selective plane illumination microscopy (iSPIM) and lattice light-sheet microscopy (LLSM), move the sample substrate at an oblique angle relative to the detection objective's optical axis. Data from such tilted-sample-scan LSFMs require subsequent deskewing and rotation for proper visualisation and analysis. These preprocessing operations currently demand substantial memory allocation and pose significant computational challenges for large 4D datasets. The consequence is data preprocessing times that far exceed acquisition times, which prevents live viewing of the data as it is captured by the microscope. To enable fast preprocessing of large light-sheet microscopy datasets without significant hardware demands, we have developed WH-Transform, a memory-efficient transformation algorithm for deskewing and rotating the raw dataset that reduces memory usage and runtime by more than 10-fold for large image stacks. Benchmarked against the conventional method and existing software, our approach demonstrates linear runtime, compared with the cubic and quadratic runtimes of the other approaches. Preprocessing a raw 3D volume of 2 GB (512 × 1536 × 600 pixels) can be accomplished in 3 s using a GPU with 24 GB of memory on a single workstation. Applied to 4D LLSM datasets of human hepatocytes, lung organoid tissue and brain organoid tissue, our method provided rapid and accurate preprocessing within seconds. Importantly, such preprocessing speeds allow the raw microscope data stream to be visualised in real time, significantly improving the usability of LLSM in biology. In summary, this advancement holds transformative potential for light-sheet microscopy, enabling real-time, on-the-fly data preprocessing, visualisation and analysis on standard workstations, thereby revolutionising biological imaging applications of LLSM and similar microscopes.
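For orientation, the conventional deskew that such methods accelerate is a shear resampling in which each z-plane is displaced along x in proportion to its index (rotation is a separate step, omitted here). A minimal scipy sketch with illustrative geometry parameters, not the paper's implementation:

```python
# Conventional (memory-hungry) deskew of a tilted-sample-scan stack.
import numpy as np
from scipy.ndimage import affine_transform

def deskew(stack: np.ndarray, angle_deg: float = 31.8,
           z_step_um: float = 0.4, xy_px_um: float = 0.104) -> np.ndarray:
    """stack is (z, y, x) as acquired with an oblique sample scan."""
    shear = z_step_um * np.cos(np.deg2rad(angle_deg)) / xy_px_um  # x px per z-plane
    nz, ny, nx = stack.shape
    out_nx = nx + int(np.ceil(shear * (nz - 1)))
    # affine_transform maps output coords to input coords: x_in = x_out - shear*z_out
    matrix = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [-shear, 0.0, 1.0]])
    return affine_transform(stack, matrix, output_shape=(nz, ny, out_nx), order=1)

deskewed = deskew(np.random.rand(60, 256, 256).astype(np.float32))
print(deskewed.shape)   # x axis grows by shear * (nz - 1) pixels
```

The memory cost is visible here: the output array is materialized at its full sheared extent, which is exactly what memory-efficient reformulations avoid.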
ABSTRACT
Structured reporting (SR) has long been a goal in radiology to standardize and improve the quality of radiology reports. Despite evidence that SR reduces errors, enhances comprehensiveness, and increases adherence to guidelines, its widespread adoption has been limited. Recently, large language models (LLMs) have emerged as a promising solution to automate and facilitate SR. Therefore, this narrative review aims to provide an overview of LLMs for SR in radiology and beyond. We found that the current literature on LLMs for SR is limited, comprising ten studies on the generative pre-trained transformer (GPT)-3.5 (n = 5) and/or GPT-4 (n = 8), while two studies additionally examined the performance of Perplexity and Bing Chat or IT5. All studies reported promising results and acknowledged the potential of LLMs for SR, with six out of ten studies demonstrating the feasibility of multilingual applications. Building upon these findings, we discuss limitations, regulatory challenges, and further applications of LLMs in radiology report processing, encompassing four main areas: documentation, translation and summarization, clinical evaluation, and data mining. In conclusion, this review underscores the transformative potential of LLMs to improve efficiency and accuracy in SR and radiology report processing. KEY POINTS: Question: How can LLMs help make SR in radiology more ubiquitous? Findings: The current literature leveraging LLMs for SR is sparse but shows promising results, including the feasibility of multilingual applications. Clinical relevance: LLMs have the potential to transform radiology report processing and enable the widespread adoption of SR. However, their future role in clinical practice depends on overcoming current limitations and regulatory challenges, including opaque algorithms and training data.
ABSTRACT
The increasing recognition of the health impacts of human exposure to per- and polyfluorinated alkyl substances (PFAS) has heightened the need for sophisticated analytical techniques and advanced data analyses, especially for assessing exposure via food of animal origin. Despite the existence of nearly 15,000 PFAS listed in the CompTox Chemicals Dashboard of the US Environmental Protection Agency, conventional monitoring and suspect screening methods often fall short, covering only a fraction of these substances. This study introduces an innovative automated data processing workflow, named PFlow, for identifying PFAS in environmental samples using direct infusion Fourier transform ion cyclotron resonance mass spectrometry (DI-FT-ICR MS). PFlow's validation on a bream liver sample, representative of low-concentration biota, involves data pre-processing, annotation of PFAS based on their precursor masses, and verification through isotopologues. Notably, PFlow annotated 17 PFAS absent from the comprehensive targeted approach and tentatively identified an additional 53 compounds, thereby demonstrating its efficiency in enhancing PFAS detection coverage. From an initial dataset of 30,332 distinct m/z values, PFlow narrowed the candidates down to 84 potential PFAS compounds using precise mass measurements and chemical logic criteria, underscoring its potential to advance our understanding of PFAS prevalence and human exposure.
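The core annotation logic, matching measured m/z against candidate PFAS formulas within a ppm tolerance and then verifying isotopologues, can be sketched in a few lines. The suspect list, the [M-H]- adduct assumption and the peak list below are invented for illustration; PFlow's actual workflow is considerably more extensive:

```python
# PFAS annotation sketch: ppm precursor matching + 13C isotopologue check.
import numpy as np

MASS = {"C": 12.0, "H": 1.00782503, "O": 15.9949146, "S": 31.9720707, "F": 18.9984032}
PROTON, C13 = 1.00727646, 1.0033548

def mono_mass(formula: dict) -> float:
    return sum(MASS[el] * n for el, n in formula.items())

candidates = {  # a tiny stand-in for a PFAS suspect list
    "PFOA (C8HF15O2)":  {"C": 8, "H": 1, "F": 15, "O": 2},
    "PFOS (C8HF17O3S)": {"C": 8, "H": 1, "F": 17, "O": 3, "S": 1},
}

peaks = np.array([412.9664, 413.9695, 498.9302, 499.9333])  # measured m/z, [M-H]-

for name, formula in candidates.items():
    mz = mono_mass(formula) - PROTON                 # deprotonated precursor
    if np.any(np.abs(peaks - mz) / mz * 1e6 < 2.0):  # precursor match at 2 ppm
        has_iso = np.any(np.abs(peaks - (mz + C13)) / mz * 1e6 < 2.0)
        print(f"{name}: precursor matched, 13C isotopologue found: {has_iso}")
```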