Results 1 - 20 of 66,947
1.
Protein Sci ; 33(6): e4985, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38717278

ABSTRACT

Inteins are proteins that excise themselves out of host proteins and ligate the flanking polypeptides in an auto-catalytic process called protein splicing. In nature, inteins are either contiguous or split. In the case of split inteins, the two fragments must first form a complex for splicing to occur. Contiguous inteins have previously been artificially split into two fragments because split inteins enable applications that contiguous ones do not. Even naturally split inteins have been split at unnatural split sites to obtain fragments with reduced affinity for one another, which are useful for creating conditional inteins or for studying protein-protein interactions. So far, split sites in inteins have been identified heuristically. We developed Int&in, a web server freely available for academic research (https://intein.biologie.uni-freiburg.de) that runs a machine learning model using logistic regression to predict active and inactive split sites in inteins with high accuracy. The model was trained on a dataset of 126 split sites generated using the gp41-1, Npu DnaE and CL inteins and validated using 97 split sites extracted from the literature. Despite the limited data size, the model, which uses various protein structural features as well as sequence conservation information, achieves an accuracy of 0.79 and 0.78 for the training and testing sets, respectively. We envision that Int&in will facilitate the engineering of novel split inteins for applications in synthetic and cell biology.
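The core prediction step the abstract describes, scoring a candidate split site with a logistic-regression model, can be sketched as follows. The feature names, weights, and values below are invented for illustration and are not the actual Int&in model:

```python
import math

def predict_split_site(features, weights, bias):
    """Score a candidate split site with logistic regression:
    P(active) = sigmoid(w . x + b)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature values for one candidate split site
site = {"solvent_accessibility": 0.62, "loop_propensity": 0.80, "conservation": 0.15}
weights = {"solvent_accessibility": 1.4, "loop_propensity": 2.1, "conservation": -1.8}

p_active = predict_split_site(site, weights, bias=-1.0)
label = "active" if p_active >= 0.5 else "inactive"
```

The output is a probability, so sites can be ranked rather than only classified.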


Subject(s)
Inteins, Internet, Machine Learning, Protein Splicing, Software, Catalytic Domain
2.
PLoS Comput Biol ; 20(5): e1012024, 2024 May.
Article in English | MEDLINE | ID: mdl-38717988

ABSTRACT

The activation levels of biologically significant gene sets are emerging tumor molecular markers and play an irreplaceable role in tumor research; however, web-based tools for prognostic analyses that use them as tumor molecular markers remain scarce. We developed PESSA, a web-based tool for survival analysis using gene set activation levels. All data analyses were implemented in R. Activation levels of Molecular Signatures Database (MSigDB) gene sets were assessed using the single-sample gene set enrichment analysis (ssGSEA) method based on data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), the European Genome-phenome Archive (EGA) and supplementary tables of articles. PESSA was used to perform median and optimal cut-off dichotomous grouping of ssGSEA scores for each dataset, relying on the survival and survminer packages for survival analysis and visualization. PESSA is an open-access web tool for visualizing the results of tumor prognostic analyses using gene set activation levels. A total of 238 datasets from GEO, TCGA, EGA, and supplementary tables of articles, covering 51 cancer types and 13 survival outcome types, and 13,434 tumor-related gene sets from MSigDB were obtained for pre-grouping. Users can obtain results, including Kaplan-Meier analyses based on the median and optimal cut-off values with accompanying visualization plots, and Cox regression analyses of dichotomous and continuous variables, by selecting the gene set markers of interest. PESSA (https://smuonco.shinyapps.io/PESSA/ or http://robinl-lab.com/PESSA) is a large-scale web-based tumor survival analysis tool covering a large amount of data that creatively uses predefined gene set activation levels as molecular markers of tumors.
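The grouping step PESSA performs, dichotomizing samples at the median ssGSEA score and estimating survival per group, can be sketched in plain Python. The sample IDs, scores, and survival times below are toy values; the real tool relies on the survival/survminer R packages as the abstract states:

```python
def median_split(scores):
    """Dichotomize samples into high/low groups at the median score
    (one of the two cut-off strategies the abstract describes)."""
    ordered = sorted(scores.values())
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return {s: ("high" if v > median else "low") for s, v in scores.items()}

def kaplan_meier(times, events):
    """Minimal Kaplan-Meier estimator: returns (time, S(t)) steps.
    events[i] is 1 for an observed event, 0 for censoring."""
    at_risk = len(times)
    surv, curve = 1.0, []
    for t, e in sorted(zip(times, events)):
        if e:
            surv *= (at_risk - 1) / at_risk
            curve.append((t, surv))
        at_risk -= 1
    return curve

groups = median_split({"s1": 0.12, "s2": 0.55, "s3": 0.31, "s4": 0.78})
curve = kaplan_meier([5, 8, 12, 20], [1, 0, 1, 1])
```

A log-rank test between the two groups would then assess whether the split is prognostic.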


Subject(s)
Tumor Biomarkers, Computational Biology, Genetic Databases, Internet, Neoplasms, Software, Humans, Neoplasms/genetics, Neoplasms/mortality, Survival Analysis, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Computational Biology/methods, Prognosis, Gene Expression Profiling/methods, Neoplastic Gene Expression Regulation/genetics
3.
PLoS One ; 19(5): e0298192, 2024.
Article in English | MEDLINE | ID: mdl-38717996

ABSTRACT

Area cartograms are map-based data visualizations in which the area of each map region is proportional to the data value it represents. Long utilized in print media, area cartograms have also become increasingly popular online, often accompanying news articles and blog posts. Despite their popularity, there is a dearth of cartogram generation tools accessible to non-technical users unfamiliar with Geographic Information Systems software. Few tools support the generation of contiguous cartograms (i.e., area cartograms that faithfully represent the spatial adjacency of neighboring regions). We thus reviewed existing contiguous cartogram software and compared two web-based cartogram tools: fBlog and go-cart.io. We experimentally evaluated their usability through a user study comprising cartogram generation and analysis tasks. The System Usability Scale was adopted to quantify how participants perceived the usability of both tools. We also collected written feedback from participants to determine the main challenges faced while using the software. Participants generally rated go-cart.io as being more usable than fBlog. Compared to fBlog, go-cart.io offers a greater variety of built-in maps and allows importing data values by file upload. Still, our results suggest that even go-cart.io suffers from poor usability because the graphical user interface is complex and data can only be imported as a comma-separated-values file. We also propose changes to go-cart.io and make general recommendations for web-based cartogram tools to address these concerns.
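The System Usability Scale scoring used in the study follows a fixed formula: odd-numbered items are positively worded and contribute (score - 1), even-numbered items are negatively worded and contribute (5 - score), and the sum is scaled to 0-100. The responses below are hypothetical:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))  # i=0 is item 1 (odd-numbered)
    return total * 2.5

score = sus_score([4, 2, 4, 2, 5, 1, 4, 2, 5, 2])
```

Scores above roughly 68 are conventionally read as above-average usability.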


Subject(s)
Internet, Software, Humans, Female, Male, Adult, Geographic Information Systems, User-Computer Interface, Young Adult
4.
PLoS One ; 19(5): e0302787, 2024.
Article in English | MEDLINE | ID: mdl-38718077

ABSTRACT

Monitoring the sharing of research data through repositories is of increasing interest to institutions and funders, as well as from a meta-research perspective. Automated screening tools exist, but they are based on either narrow or vague definitions of open data. Where manual validation has been performed, it was based on a small article sample. At our biomedical research institution, we developed detailed criteria for such a screening, as well as a workflow that combines an automated and a manual step and considers both fully open and restricted-access data. We use the results for an internal incentivization scheme, as well as for monitoring in a dashboard. Here, we describe our screening procedure and its validation in detail, based on automated screening of 11,035 biomedical research articles, of which 1,381 articles with potential data sharing were subsequently screened manually. The screening results were highly reliable, as evidenced by inter-rater reliability values of ≥0.8 (Krippendorff's alpha) in two different validation samples. We also report the results of the screening, both for our institution and for an independent sample from a meta-research study. In the largest of the three samples, the 2021 institutional sample, underlying data had been openly shared for 7.8% of research articles. For an additional 1.0% of articles, restricted-access data had been shared, resulting in 8.3% of articles overall having open and/or restricted-access data. The extraction workflow is then discussed with regard to its applicability in different contexts, limitations, possible variations, and future developments. In summary, we present a comprehensive, validated, semi-automated workflow for the detection of shared research data underlying biomedical article publications.
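For two raters on binary labels with no missing data, the Krippendorff's alpha reliability the abstract reports reduces to a short computation over coincidence counts. The ten ratings below are invented:

```python
from collections import Counter

def krippendorff_alpha_nominal(r1, r2):
    """Krippendorff's alpha for two raters, nominal data, no missing values:
    alpha = 1 - D_obs / D_exp, computed from the coincidence matrix."""
    assert len(r1) == len(r2)
    n_values = Counter(r1) + Counter(r2)   # value totals across both raters
    n = 2 * len(r1)                        # total pairable values
    disagree = sum(a != b for a, b in zip(r1, r2))
    d_obs = 2 * disagree / n               # both orderings of each pair
    d_exp = sum(n_values[c] * n_values[k]
                for c in n_values for k in n_values if c != k) / (n * (n - 1))
    return 1.0 - d_obs / d_exp

# Two screeners labelling ten articles for data sharing (1 = shared, 0 = not)
alpha = krippendorff_alpha_nominal([1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
                                   [1, 0, 1, 1, 0, 1, 1, 0, 1, 0])
```

With one disagreement in ten units, alpha lands just above the 0.8 threshold the abstract cites.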


Subject(s)
Biomedical Research, Workflow, Biomedical Research/methods, Humans, Information Dissemination/methods, Access to Information, Reproducibility of Results
5.
Nat Commun ; 15(1): 3840, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714698

ABSTRACT

As the circadian clock regulates fundamental biological processes, disrupted clocks are often observed in patients and diseased tissues. Determining the circadian time of the patient or the tissue of focus is essential in circadian medicine and research. Here we present tauFisher, a computational pipeline that accurately predicts circadian time from a single transcriptomic sample by finding correlations between rhythmic genes within the sample. We demonstrate tauFisher's performance in adding timestamps to both bulk and single-cell transcriptomic samples collected from multiple tissue types and experimental settings. Application of tauFisher at a cell-type level in a single-cell RNAseq dataset collected from mouse dermal skin implies that greater circadian phase heterogeneity may explain the dampened rhythm of collective core clock gene expression in dermal immune cells compared to dermal fibroblasts. Given its robustness and generalizability across assay platforms, experimental setups, and tissue types, as well as its potential application in single-cell RNAseq data analysis, tauFisher is a promising tool that facilitates circadian medicine and research.
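A much-simplified illustration of predicting circadian time from a single transcriptomic sample: grid-search the time whose cosine reference profile best correlates with the sample's rhythmic-gene expression. The gene names and peak times below are invented, and tauFisher's actual method (correlations between rhythmic genes within the sample) is more involved:

```python
import math

# Reference peak times (hours) for a few illustrative rhythmic genes;
# these values are made up for the sketch, not measured acrophases.
REF_PEAKS = {"Arntl": 22.0, "Dbp": 10.0, "Nr1d1": 6.0, "Per2": 14.0}

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def predict_circadian_time(sample, step=0.1):
    """Grid-search the circadian time whose cosine reference profile
    best correlates with the sample's expression of rhythmic genes."""
    genes = sorted(sample)
    x = [sample[g] for g in genes]
    best_t, best_r, t = None, -2.0, 0.0
    while t < 24.0:
        y = [math.cos(2 * math.pi * (t - REF_PEAKS[g]) / 24.0) for g in genes]
        r = pearson(x, y)
        if r > best_r:
            best_t, best_r = t, r
        t += step
    return best_t

# A sample whose expression matches the reference profile at t = 6 h
sample = {g: math.cos(2 * math.pi * (6.0 - peak) / 24.0)
          for g, peak in REF_PEAKS.items()}
t_hat = predict_circadian_time(sample)
```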


Subject(s)
Circadian Clocks, Circadian Rhythm, Single-Cell Analysis, Transcriptome, Single-Cell Analysis/methods, Animals, Mice, Circadian Rhythm/genetics, Circadian Clocks/genetics, Humans, Gene Expression Profiling/methods, Computational Biology/methods, Skin/metabolism, Software, Fibroblasts/metabolism, RNA Sequence Analysis/methods
6.
BMC Bioinformatics ; 25(1): 179, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714913

ABSTRACT

BACKGROUND: As genomic studies continue to implicate non-coding sequences in disease, testing the roles of these variants requires insights into the cell type(s) in which they are likely to be mediating their effects. Prior methods for associating non-coding variants with cell types have involved approaches using linkage disequilibrium or ontological associations, incurring significant processing requirements. GaiaAssociation is freely available, open-source software that enables thousands of genomic loci implicated in a phenotype to be tested for enrichment at regulatory loci of multiple cell types in minutes, permitting insights into the cell type(s) mediating the studied phenotype. RESULTS: In this work, we present Regulatory Landscape Enrichment Analysis (RLEA) by GaiaAssociation and demonstrate its capability to test the enrichment of 12,133 variants across the cis-regulatory regions of 44 cell types. This analysis was completed in 134.0 ± 2.3 s, highlighting the efficient processing provided by GaiaAssociation. The intuitive interface requires only four inputs, offers a collection of customizable functions, and visualizes variant enrichment in cell-type regulatory regions through a heatmap matrix. GaiaAssociation is available on PyPI for download as a command-line tool or Python package, and the source code can also be installed from GitHub at https://github.com/GreallyLab/gaiaAssociation. CONCLUSIONS: GaiaAssociation is a novel package that provides an intuitive and efficient resource for understanding the enrichment of non-coding variants across the cis-regulatory regions of different cells, empowering studies seeking to identify disease-mediating cell types.
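The enrichment idea, comparing how often variants fall in a cell type's regulatory regions against the fraction of the genome those regions cover, can be sketched as follows. Coordinates are toy values and this is not GaiaAssociation's actual statistic:

```python
from bisect import bisect_right

def fold_enrichment(variants, regions, genome_size):
    """Fold enrichment of variants in a cell type's regulatory regions:
    observed overlap fraction / fraction of the genome the regions cover.
    Regions are non-overlapping, sorted (start, end) half-open intervals."""
    starts = [s for s, _ in regions]
    hits = 0
    for pos in variants:
        i = bisect_right(starts, pos) - 1  # rightmost region starting at or before pos
        if i >= 0 and regions[i][0] <= pos < regions[i][1]:
            hits += 1
    observed = hits / len(variants)
    expected = sum(e - s for s, e in regions) / genome_size
    return observed / expected

# Toy example: 2 of 4 variants fall in regions covering 30% of a 1 kb genome
enrich = fold_enrichment(variants=[50, 150, 420, 900],
                         regions=[(100, 200), (400, 600)],
                         genome_size=1000)
```

A permutation or hypergeometric test would then attach a p-value to the enrichment.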


Subject(s)
Software, Genetic Variation, Humans, Genomics/methods, Computational Biology/methods, Phenotype, Nucleic Acid Regulatory Sequences/genetics, Linkage Disequilibrium
7.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38725155

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainty in selecting the most appropriate methods and parameters. Even for universal data processing steps used by trajectory inference methods, such as feature selection and dimension reduction, the methods' performances are highly dataset-specific. To address these challenges, we developed Escort, a novel framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort evaluates the suitability of trajectory analysis and the combined effects of processing choices using trajectory-specific metrics. Escort navigates single-cell trajectory analysis through these data-driven assessments, reducing uncertainty and much of the decision burden inherent to trajectory inference analyses. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.


Subject(s)
RNA-Seq, Single-Cell Analysis, Single-Cell Analysis/methods, RNA-Seq/methods, Humans, Computational Biology/methods, RNA Sequence Analysis/methods, Software, Algorithms, Gene Expression Profiling/methods, Single-Cell Gene Expression Analysis
8.
PLoS One ; 19(5): e0291183, 2024.
Article in English | MEDLINE | ID: mdl-38713711

ABSTRACT

BACKGROUND: Mendelian randomisation (MR) is the use of genetic variants as instrumental variables. Mode-based estimators (MBE) are one of the most popular types of estimators used in univariable-MR studies and are often used as a sensitivity analysis for pleiotropy. However, because there are no plurality valid regression estimators, modal estimators for multivariable-MR have been under-explored. METHODS: We use the residual framework for multivariable-MR to introduce two multivariable modal estimators: multivariable-MBE, which uses IVW to create residuals fed into a traditional plurality valid estimator, and multivariable-CM, which instead feeds the residuals into the contamination mixture method (CM). We then use Monte-Carlo simulations to explore the performance of these estimators compared to existing ones, and re-analyse the data used by Grant and Burgess (2021), examining the causal effect of intelligence, education, and household income on Alzheimer's disease, as an applied example. RESULTS: In our simulation, we found that multivariable-MBE was generally too variable to be of much use. Multivariable-CM, on the other hand, produced more precise estimates. Multivariable-CM performed better than MR-Egger in almost all settings, and better than Weighted Median under balanced pleiotropy. However, it underperformed Weighted Median when there was a moderate amount of directional pleiotropy. Our re-analysis supported the conclusion of Grant and Burgess (2021) that intelligence had a protective effect on Alzheimer's disease, while education and household income do not have a causal effect. CONCLUSIONS: Here we introduced two non-regression-based, plurality valid estimators for multivariable MR. Of these, multivariable-CM, which uses IVW to create residuals fed into a contamination-mixture model, performed the best. This estimator relies on the assumption that a plurality of variants is valid, and appears to provide precise and unbiased estimates in the presence of balanced pleiotropy and small amounts of directional pleiotropy.
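The plurality-valid ("mode-based") idea behind these estimators can be illustrated with a kernel-smoothed mode of variant-specific Wald ratios. The ratios below are simulated, and the published multivariable-MBE/CM estimators involve IVW residuals and precision weighting not shown here:

```python
import math

def mode_based_estimate(ratios, bandwidth=0.2, grid=200):
    """Mode-based (plurality valid) estimate: take the causal effect to be
    the mode of the variant-specific Wald ratios, located with a Gaussian
    kernel density evaluated over a grid."""
    lo, hi = min(ratios) - 1.0, max(ratios) + 1.0
    best_x, best_d = lo, -1.0
    for i in range(grid + 1):
        x = lo + (hi - lo) * i / grid
        d = sum(math.exp(-((x - r) / bandwidth) ** 2 / 2) for r in ratios)
        if d > best_d:
            best_x, best_d = x, d
    return best_x

# Six valid instruments cluster near 0.5; two pleiotropic outliers do not
ratios = [0.48, 0.50, 0.52, 0.49, 0.51, 0.50, 1.6, -0.9]
effect = mode_based_estimate(ratios)
```

The outliers shift a mean or IVW estimate but barely move the mode, which is the estimator's appeal under pleiotropy.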


Subject(s)
Mendelian Randomization Analysis, Mendelian Randomization Analysis/methods, Humans, Alzheimer Disease/genetics, Monte Carlo Method, Multivariate Analysis, Computer Simulation, Genetic Variation, Software
9.
PLoS Comput Biol ; 20(5): e1012045, 2024 May.
Article in English | MEDLINE | ID: mdl-38722873

ABSTRACT

This paper extends the FAIR (Findable, Accessible, Interoperable, Reusable) guidelines to provide criteria for assessing whether software conforms to best practices in open source. By adding "USE" (User-Centered, Sustainable, Equitable), software development can adhere to open-source best practice by incorporating user input early on, ensuring front-end designs are accessible to all possible stakeholders, and planning long-term sustainability alongside software design. The FAIR-USE4OS guidelines will allow funders and researchers to more effectively evaluate and plan open-source software projects. There is good evidence that funders increasingly mandate that all funded research software be open source; however, even under the FAIR guidelines, this could simply mean software released on public repositories with a Zenodo DOI. By creating FAIR-USE software, best practice can be demonstrated from the very beginning of the design process, giving the software the greatest chance of being impactful.


Subject(s)
Guidelines as Topic, Software, Computational Biology/methods, Software Design, Humans
10.
Sci Rep ; 14(1): 10633, 2024 05 09.
Article in English | MEDLINE | ID: mdl-38724550

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology has been widely used to study differences in gene expression at the single-cell level, providing insights into cell development, differentiation, and functional heterogeneity. Various pipelines and workflows for scRNA-seq analysis have been developed, but few specifically consider multi-timepoint data. In this study, we develop CASi, a comprehensive framework for analyzing multi-timepoint scRNA-seq data, which provides users with: (1) cross-timepoint cell annotation; (2) detection of potentially novel cell types that emerge over time; (3) visualization of cell population evolution; and (4) identification of temporal differentially expressed genes (tDEGs). Through comprehensive simulation studies and application to a real multi-timepoint single-cell dataset, we demonstrate the robust and favorable performance of CASi versus existing methods serving similar purposes.
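A minimal stand-in for the tDEG step: flag genes whose mean expression shows a strong linear trend across timepoints. The threshold and expression values below are invented, and CASi's actual test is more sophisticated:

```python
def least_squares_slope(times, values):
    """Ordinary least-squares slope of expression vs. time."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

def temporal_degs(expression, times, min_slope=0.5):
    """Flag genes whose expression trend across timepoints exceeds a
    slope threshold -- a much-simplified stand-in for a tDEG test."""
    return sorted(g for g, vals in expression.items()
                  if abs(least_squares_slope(times, vals)) >= min_slope)

# Mean expression per timepoint (days 0, 3, 7) for three hypothetical genes
expr = {"GeneA": [1.0, 2.6, 4.9],   # rises over time
        "GeneB": [3.0, 3.1, 2.9],   # flat
        "GeneC": [5.0, 3.2, 1.1]}   # falls over time
hits = temporal_degs(expr, times=[0, 3, 7])
```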


Subject(s)
RNA Sequence Analysis, Single-Cell Analysis, Single-Cell Analysis/methods, RNA Sequence Analysis/methods, Humans, Gene Expression Profiling/methods, Software, Computational Biology/methods
11.
Sci Rep ; 14(1): 10694, 2024 05 10.
Article in English | MEDLINE | ID: mdl-38724620

ABSTRACT

This study investigated the potential associations between allergic diseases (asthma, allergic rhinitis, and atopic dermatitis) and the development of primary open-angle glaucoma. We utilized authorized data from the Korean National Health Information Database (KNHID), which provides comprehensive medical claims data and information from the National Health Screening Program. We compared the baseline characteristics of subjects with and without allergic diseases and calculated the incidence and risk of glaucoma development. Cox proportional hazards regression analysis was used to determine the risk of glaucoma development in subjects with allergic diseases. A total of 171,129 subjects aged 20-39 with or without allergic diseases who underwent a general health examination between 2009 and 2015 were included. Subjects with allergic diseases exhibited a higher incidence of glaucoma than the control group. The hazard ratio (HR) for glaucoma onset in subjects with at least one allergic disease was 1.49 before and 1.39 after adjusting for potential confounding factors. Among allergic diseases, atopic dermatitis showed the highest risk of glaucoma development (aHR 1.73) after adjusting for confounders. Allergic rhinitis showed an increased risk of incident glaucoma after adjustment (aHR 1.38). Asthma showed the lowest, but still increased, risk of glaucoma (aHR 1.22). The associations were consistent in all subgroup analyses stratified by sex, smoking, drinking, exercise, diabetes, hypertension, dyslipidemia, or history of steroid use. In conclusion, allergic diseases are associated with an increased risk of glaucoma development. Among allergic diseases, atopic dermatitis showed the highest risk of glaucoma development, followed by allergic rhinitis and asthma.


Subject(s)
Open-Angle Glaucoma, Humans, Open-Angle Glaucoma/epidemiology, Male, Female, Adult, Republic of Korea/epidemiology, Young Adult, Risk Factors, Incidence, Cohort Studies, Allergic Rhinitis/epidemiology, Atopic Dermatitis/epidemiology, Asthma/epidemiology, Asthma/complications, Hypersensitivity/epidemiology, Hypersensitivity/complications, Proportional Hazards Models
12.
Commun Biol ; 7(1): 553, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724695

ABSTRACT

For the last two decades, the amount of genomic data produced by scientific and medical applications has been growing at a rapid pace. To enable software solutions that analyze, process, and transmit these data in an efficient and interoperable way, ISO and IEC released the first version of the compression standard MPEG-G in 2019. However, non-proprietary implementations of the standard are not openly available so far, limiting fair scientific assessment of the standard and, therefore, hindering its broad adoption. In this paper, we present Genie, to the best of our knowledge the first open-source encoder that compresses genomic data according to the MPEG-G standard. We demonstrate that Genie reaches state-of-the-art compression ratios while offering interoperability with any other standard-compliant decoder independent from its manufacturer. Finally, the ISO/IEC ecosystem ensures the long-term sustainability and decodability of the compressed data through the ISO/IEC-supported reference decoder.
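Compression performance is typically reported as the ratio of uncompressed to compressed size. The sketch below measures that ratio with Python's stdlib zlib as a stand-in codec; Genie itself implements the MPEG-G format, which is not shown here:

```python
import zlib

def compression_ratio(data: bytes, level: int = 9) -> float:
    """Uncompressed size divided by compressed size; higher is better.
    zlib stands in here for a real genomic codec such as MPEG-G."""
    return len(data) / len(zlib.compress(data, level))

# Read bases are highly repetitive, so even a general-purpose codec shrinks
# them noticeably; specialized codecs exploit far more domain structure.
reads = ("ACGTACGGTTCAGGCA" * 64).encode("ascii")
ratio = compression_ratio(reads)
```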


Subject(s)
Data Compression, Genomics, Software, Genomics/methods, Data Compression/methods, Humans
13.
BMC Bioinformatics ; 25(1): 184, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724907

ABSTRACT

BACKGROUND: Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with and especially curating public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these efforts present limitations that often lead to the need for further in-house curation and processing. RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide the researcher during the curation process of metadata and fastq files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment, and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from different sources. CONCLUSIONS: The toolkit thus offers valuable tools for the in-house curation previously needed to re-use public omics data. Due to its workflow structure and capabilities, it can be easily used by, and benefit, investigators developing novel omics meta-analyses based on sequencing data.
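A control-check step of the kind the toolkit describes can be sketched as validating required metadata fields per run record. The field names and accessions below are illustrative, not the toolkit's actual schema:

```python
REQUIRED = ("run_accession", "sample_accession", "fastq_ftp", "library_strategy")

def control_check(records):
    """Minimal metadata control check: report records missing or blank on
    required ENA-style fields. Field names are illustrative."""
    problems = {}
    for rec in records:
        missing = [f for f in REQUIRED if not rec.get(f)]
        if missing:
            problems[rec.get("run_accession", "<no accession>")] = missing
    return problems

records = [
    {"run_accession": "ERR000001", "sample_accession": "SAMEA1",
     "fastq_ftp": "ftp://example/ERR000001.fastq.gz", "library_strategy": "WGS"},
    {"run_accession": "ERR000002", "sample_accession": "",
     "fastq_ftp": "ftp://example/ERR000002.fastq.gz"},
]
issues = control_check(records)
```

Records flagged here would be sent to the "treatment" stage rather than silently dropped.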


Subject(s)
Data Curation, Software, Workflow, Data Curation/methods, Metadata, Genetic Databases, Genomics/methods, Computational Biology/methods
14.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38725156

ABSTRACT

Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) due to its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, there is a lack of benchmark datasets and bespoke predictors for non-histone acetylation site prediction. To address these problems, we have contributed to both dataset creation and predictor benchmarking in this study. First, we construct a non-histone acetylation site benchmark dataset, namely NHAC, which includes 11 subsets according to sequence length, ranging from 11 to 61 amino acids. In total, there are 886 positive samples and 4,707 negative samples for each sequence length. Second, we propose TransPTM, a transformer-based neural network model for non-histone acetylation site prediction. During the data representation phase, per-residue contextualized embeddings are extracted using ProtT5 (an existing pre-trained protein language model). This is followed by a graph neural network framework, which consists of three TransformerConv layers for feature extraction and a multilayer perceptron module for classification. The benchmark results show that TransPTM achieves competitive performance for non-histone acetylation site prediction compared with three state-of-the-art tools. It improves our comprehension of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM dataset fills the gap in non-histone acetylation site datasets and will benefit the related communities. The source code and data utilized by TransPTM are accessible at https://www.github.com/TransPTM/TransPTM.
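Building the fixed-length, lysine-centred peptide windows that datasets like NHAC consist of can be sketched as follows (an 11-residue window corresponds to flank=5); the example sequence is arbitrary:

```python
def lysine_windows(sequence, flank=5, pad="X"):
    """Extract fixed-length peptide windows centred on each lysine (K),
    the residue acetylation targets; pads at sequence edges.
    Window length = 2 * flank + 1."""
    padded = pad * flank + sequence + pad * flank
    return [(i, padded[i:i + 2 * flank + 1])
            for i, aa in enumerate(sequence) if aa == "K"]

windows = lysine_windows("MKTAYIAKQR", flank=5)
```

Each window (labelled acetylated or not) would then be embedded, e.g. with a protein language model, before classification.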


Subject(s)
Neural Networks (Computer), Post-Translational Protein Processing, Acetylation, Computational Biology/methods, Protein Databases, Software, Algorithms, Humans, Proteins/chemistry, Proteins/metabolism
15.
PLoS One ; 19(5): e0302333, 2024.
Article in English | MEDLINE | ID: mdl-38728285

ABSTRACT

In software development, it is common to reuse existing source code by copying and pasting, resulting in the proliferation of numerous code clones (similar or identical code fragments) that detrimentally affect software quality and maintainability. Although several techniques for code clone detection exist, many struggle to identify semantic clones effectively because they cannot extract both syntax and semantics information. Fewer techniques leverage low-level source code representations such as bytecode or assembly for clone detection. This work introduces a novel code representation for identifying syntactic and semantic clones in Java source code. It integrates high-level features extracted from the Abstract Syntax Tree with low-level features derived from intermediate representations generated by static analysis tools such as the Soot framework. Leveraging this combined representation, fifteen machine-learning models are trained to detect code clones effectively. Evaluation on a large dataset demonstrates the models' efficacy in accurately identifying semantic clones. Among these classifiers, ensemble classifiers such as the LightGBM classifier exhibit exceptional accuracy. Linearly combining features enhances the effectiveness of the models compared to multiplication and distance-based combination techniques. The experimental findings indicate that the proposed method can outperform current clone detection techniques in detecting semantic clones.
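The representation-and-similarity idea can be illustrated with crude token-frequency vectors and cosine similarity; renamed-but-identical functions share only their keywords, which is exactly why richer AST and intermediate-representation features help. The paper's actual method combines AST features with Soot-derived IR features and trains fifteen ML classifiers, none of which is shown here:

```python
import math
from collections import Counter

def token_vector(code):
    """Crude lexical features: frequencies of identifier/keyword tokens."""
    token, tokens = "", []
    for ch in code:
        if ch.isalnum() or ch == "_":
            token += ch
        elif token:
            tokens.append(token)
            token = ""
    if token:
        tokens.append(token)
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing keys
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())))

# Type-2-style clone: identical structure, renamed identifiers
f1 = "int sum(int a, int b) { return a + b; }"
f2 = "int add(int x, int y) { return x + y; }"
similarity = cosine(token_vector(f1), token_vector(f2))
```

Purely lexical similarity stays modest here despite identical semantics, motivating the combined representation.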


Subject(s)
Semantics, Software, Programming Languages, Machine Learning, Algorithms
16.
PLoS One ; 19(5): e0301720, 2024.
Article in English | MEDLINE | ID: mdl-38739583

ABSTRACT

A key benefit of the Open Computing Language (OpenCL) software framework is its capability to operate across diverse architectures. Field programmable gate arrays (FPGAs) are a high-speed computing architecture used for computation acceleration. This study investigates the impact of memory access time on overall performance in general FPGA computing environments through the creation of eight benchmarks within the OpenCL framework. The developed benchmarks capture a range of memory access behaviors, and they play a crucial role in assessing the performance of spinning and sleeping on FPGA-based architectures. The results obtained guide the formulation of new implementations and contribute to defining an abstraction of FPGAs. This abstraction is then utilized to create tailored implementations of primitives that are well-suited for this platform. While other research endeavors concentrate on creating benchmarks with the Compute Unified Device Architecture (CUDA) to scrutinize the memory systems across diverse GPU architectures and propose recommendations for future generations of GPU computation platforms, this study delves into the memory system analysis for the broader FPGA computing platform. It achieves this by employing the highly abstracted OpenCL framework, exploring various data workload characteristics, and experimentally delineating the appropriate implementation of primitives that can seamlessly integrate into a design tailored for the FPGA computing platform. Additionally, the results underscore the efficacy of employing a task-parallel model to mitigate the need for high-cost synchronization mechanisms in designs constructed on general FPGA computing platforms.


Subject(s)
Benchmarking, Software, Humans, Programming Languages
17.
Nat Commun ; 15(1): 3675, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38693118

ABSTRACT

The wide applications of liquid chromatography - mass spectrometry (LC-MS) in untargeted metabolomics demand an easy-to-use, comprehensive computational workflow to support efficient and reproducible data analysis. However, current tools were primarily developed to perform specific tasks in LC-MS based metabolomics data analysis. Here we introduce MetaboAnalystR 4.0 as a streamlined pipeline covering raw spectra processing, compound identification, statistical analysis, and functional interpretation. The key features of MetaboAnalystR 4.0 includes an auto-optimized feature detection and quantification algorithm for LC-MS1 spectra processing, efficient MS2 spectra deconvolution and compound identification for data-dependent or data-independent acquisition, and more accurate functional interpretation through integrated spectral annotation. Comprehensive validation studies using LC-MS1 and MS2 spectra obtained from standards mixtures, dilution series and clinical metabolomics samples have shown its excellent performance across a wide range of common tasks such as peak picking, spectral deconvolution, and compound identification with good computing efficiency. Together with its existing statistical analysis utilities, MetaboAnalystR 4.0 represents a significant step toward a unified, end-to-end workflow for LC-MS based global metabolomics in the open-source R environment.
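The peak-picking step at the start of such a pipeline can be illustrated with naive local-maxima detection on an extracted-ion chromatogram. The intensities and threshold are toy values; MetaboAnalystR's auto-optimized detection is far more robust:

```python
def pick_peaks(intensities, min_height=100.0):
    """Naive peak picking on an extracted-ion chromatogram: indices that
    are strict local maxima above a noise threshold. Real LC-MS tools use
    far more robust detection; this only illustrates the step."""
    return [i for i in range(1, len(intensities) - 1)
            if intensities[i] >= min_height
            and intensities[i] > intensities[i - 1]
            and intensities[i] > intensities[i + 1]]

# Toy intensity trace with two peaks rising above the noise floor
eic = [5, 8, 120, 380, 150, 20, 9, 220, 90, 6]
peaks = pick_peaks(eic)
```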


Subject(s)
Mass Spectrometry, Metabolomics, Workflow, Algorithms, Liquid Chromatography/methods, Liquid Chromatography-Mass Spectrometry, Mass Spectrometry/methods, Metabolomics/methods, Software
18.
Nat Commun ; 15(1): 3992, 2024 May 11.
Article in English | MEDLINE | ID: mdl-38734767

ABSTRACT

Visual proteomics attempts to build atlases of the molecular content of cells, but the automated annotation of cryo-electron tomograms remains challenging. Template matching (TM) and machine learning-based methods detect structural signatures of macromolecules. However, their applicability remains limited in terms of both the abundance and the size of the molecular targets. Here we show that the performance of TM is greatly improved by using template-specific search parameter optimization and by including higher-resolution information. We establish a TM pipeline with systematically tuned parameters for the automated, objective and comprehensive identification of structures with confidence 10- to 100-fold above the noise level. We demonstrate high-fidelity and high-confidence localizations of nuclear pore complexes, vaults, ribosomes, proteasomes, fatty acid synthases, lipid membranes and microtubules, as well as individual subunits, inside crowded eukaryotic cells. We provide software tools for the generic implementation of our method, which is broadly applicable towards realizing visual proteomics.
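The core of template matching is normalized cross-correlation; the 1D sketch below finds the offset where a template best matches a signal. Real cryo-ET TM does this in 3D over rotations with the tuned, template-specific parameters the abstract describes:

```python
import math

def ncc(signal, template):
    """Normalized cross-correlation of a template against a signal:
    returns the offset with the highest correlation score."""
    m = len(template)
    mt = sum(template) / m
    t0 = [t - mt for t in template]
    nt = math.sqrt(sum(v * v for v in t0))
    best, best_score = 0, -2.0
    for off in range(len(signal) - m + 1):
        win = signal[off:off + m]
        mw = sum(win) / m
        w0 = [w - mw for w in win]
        nw = math.sqrt(sum(v * v for v in w0))
        if nw == 0:
            continue  # flat window: correlation undefined, skip
        score = sum(a * b for a, b in zip(w0, t0)) / (nw * nt)
        if score > best_score:
            best, best_score = off, score
    return best, best_score

# The template appears (scaled) at offset 4; NCC is invariant to that scaling
signal = [0, 0, 1, 0, 2, 6, 10, 6, 2, 0, 1, 0]
offset, score = ncc(signal, [1, 3, 5, 3, 1])
```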


Subject(s)
Cryoelectron Microscopy , Electron Microscope Tomography , Proteasome Endopeptidase Complex , Proteomics , Ribosomes , Software , Electron Microscope Tomography/methods , Cryoelectron Microscopy/methods , Ribosomes/ultrastructure , Ribosomes/metabolism , Proteasome Endopeptidase Complex/ultrastructure , Proteasome Endopeptidase Complex/metabolism , Proteasome Endopeptidase Complex/chemistry , Humans , Proteomics/methods , Nuclear Pore/ultrastructure , Nuclear Pore/metabolism , Microtubules/ultrastructure , Microtubules/metabolism , Fatty Acid Synthases/metabolism , Machine Learning , Imaging, Three-Dimensional/methods , Algorithms , Image Processing, Computer-Assisted/methods
19.
J Vis Exp ; (206)2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38738870

ABSTRACT

The interplay between the brain and the cardiovascular system is garnering increased attention for its potential to advance our understanding of human physiology and improve health outcomes. However, the multimodal analysis of these signals is challenging due to the lack of guidelines, standardized signal-processing and statistical tools, graphical user interfaces (GUIs), and automation for processing large datasets or increasing reproducibility. A further void exists in standardized EEG and heart-rate variability (HRV) feature extraction methods, undermining clinical diagnostics and the robustness of machine learning (ML) models. In response to these limitations, we introduce the BrainBeats toolbox. Implemented as an open-source EEGLAB plugin, BrainBeats integrates three main protocols: 1) heartbeat-evoked potentials (HEP) and oscillations (HEO) for assessing time-locked brain-heart interplay with millisecond accuracy; 2) EEG and HRV feature extraction for examining associations/differences between various brain and heart metrics or for building robust feature-based ML models; 3) automated removal of heart artifacts from EEG signals to eliminate potential cardiovascular contamination during EEG analysis. We provide a step-by-step tutorial for applying these three methods to an open-source dataset containing simultaneous 64-channel EEG, ECG, and PPG signals. Users can easily fine-tune parameters to tailor their unique research needs using the GUI or the command line. BrainBeats should make brain-heart interplay research more accessible and reproducible.
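The HRV feature extraction mentioned in protocol 2 boils down to well-defined statistics on RR intervals. A minimal sketch of two standard time-domain features, SDNN and RMSSD (using the population standard deviation; BrainBeats itself extracts many more EEG and HRV features than these):

```python
# Minimal sketch of two standard time-domain HRV features computed
# from RR intervals (ms): SDNN (population standard deviation of the
# intervals) and RMSSD (root mean square of successive differences).
# BrainBeats extracts many more EEG/HRV features than these two.
import math

def sdnn(rr):
    """Population standard deviation of the RR intervals."""
    mean = sum(rr) / len(rr)
    return math.sqrt(sum((x - mean) ** 2 for x in rr) / len(rr))

def rmssd(rr):
    """Root mean square of successive RR-interval differences."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

rr = [800, 810, 790, 805, 795]  # RR intervals in milliseconds
print(round(sdnn(rr), 2), round(rmssd(rr), 2))  # → 7.07 14.36
```

SDNN reflects overall variability while RMSSD emphasizes beat-to-beat (parasympathetically driven) changes, which is why toolboxes report both.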


Subject(s)
Electroencephalography , Heart Rate , Humans , Electroencephalography/methods , Heart Rate/physiology , Signal Processing, Computer-Assisted , Software , Brain/physiology , Machine Learning
20.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701421

ABSTRACT

Cancer is a complex cellular ecosystem where malignant cells coexist and interact with immune, stromal and other cells within the tumor microenvironment (TME). Recent technological advancements in spatially resolved multiplexed imaging at single-cell resolution have led to the generation of large-scale and high-dimensional datasets from biological specimens. This underscores the necessity for automated methodologies that can effectively characterize molecular, cellular and spatial properties of TMEs for various malignancies. This study introduces SpatialCells, an open-source software package designed for region-based exploratory analysis and comprehensive characterization of TMEs using multiplexed single-cell data. The source code and tutorials are available at https://semenovlab.github.io/SpatialCells. SpatialCells efficiently streamlines the automated extraction of features from multiplexed single-cell data and can process samples containing millions of cells. Thus, SpatialCells facilitates subsequent association analyses and machine learning predictions, making it an essential tool in advancing our understanding of tumor growth, invasion and metastasis.
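Region-based feature extraction of the kind described above can be pictured as grouping cells by annotated region and summarizing each group. A toy sketch under assumed, hypothetical field names ("region", "cell_type"); SpatialCells' actual API differs, see its tutorials:

```python
# Toy sketch of region-based feature extraction from multiplexed
# single-cell data: per annotated tumor region, count cells and
# compute the immune-cell fraction. The field names ("region",
# "cell_type") are hypothetical; SpatialCells' actual API differs.

def region_features(cells):
    """cells: list of dicts with 'region' and 'cell_type' keys."""
    features = {}
    for cell in cells:
        stats = features.setdefault(cell["region"],
                                    {"n_cells": 0, "n_immune": 0})
        stats["n_cells"] += 1
        if cell["cell_type"] == "immune":
            stats["n_immune"] += 1
    for stats in features.values():
        stats["immune_fraction"] = stats["n_immune"] / stats["n_cells"]
    return features

cells = [
    {"region": "tumor_core", "cell_type": "malignant"},
    {"region": "tumor_core", "cell_type": "immune"},
    {"region": "margin", "cell_type": "immune"},
    {"region": "margin", "cell_type": "immune"},
    {"region": "margin", "cell_type": "stromal"},
]
feats = region_features(cells)
print(feats["tumor_core"]["immune_fraction"])  # → 0.5
```

Per-region summaries like these are the features that downstream association analyses and ML models consume.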


Subject(s)
Single-Cell Analysis , Software , Tumor Microenvironment , Single-Cell Analysis/methods , Humans , Neoplasms/pathology , Machine Learning , Computational Biology/methods