Results 1 - 20 of 4,515
1.
Brief Bioinform ; 25(Supplement_1), 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041911

ABSTRACT

This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning', https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial authored by the National Institute of General Medical Sciences, NIGMS Sandbox: A Learning Platform toward Democratizing Cloud Computing for Biomedical Research, at the beginning of this supplement. This module delivers learning materials introducing the utility of the BASH (Bourne Again Shell) programming language for genomic data analysis in an interactive format that uses appropriate cloud resources for data access and analyses. The next-generation sequencing revolution has generated massive amounts of novel biological data from a multitude of platforms that survey an ever-growing list of genomic modalities. These data require significant downstream computational and statistical analyses to glean meaningful biological insights. However, the skill sets required to generate these data are vastly different from the skills required to analyze them. Bench scientists who generate next-generation data often lack the training required to analyze these datasets and require support from bioinformatics specialists. Dedicated computational training is required to empower biologists in the area of genomic data analysis; however, learning to use a command line interface efficiently is a significant barrier to learning common analytical tools. Cloud platforms have the potential to democratize access to the technical tools and computational resources necessary to work with modern sequencing data, providing an effective framework for bioinformatics education. This module aims to provide an interactive platform that gradually builds the technical skills and knowledge needed to interact with genomics data on the command line in the Cloud.
The sandbox format of this module enables users to move through the material at their own pace and test their grasp of the material with knowledge self-checks before building on that material in the next sub-module. This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning', https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.


Subjects
Cloud Computing , Computational Biology , Software , Computational Biology/methods , Programming Languages , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Humans
2.
Bioinformatics ; 40(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38950177

ABSTRACT

SUMMARY: Effective collaboration between developers of Bayesian inference methods and users is key to advance our quantitative understanding of biosystems. We here present hopsy, a versatile open-source platform designed to provide convenient access to powerful Markov chain Monte Carlo sampling algorithms tailored to models defined on convex polytopes (CP). Based on the high-performance C++ sampling library HOPS, hopsy inherits its strengths and extends its functionalities with the accessibility of the Python programming language. A versatile plugin-mechanism enables seamless integration with domain-specific models, providing method developers with a framework for testing, benchmarking, and distributing CP samplers to approach real-world inference tasks. We showcase hopsy by solving common and newly composed domain-specific sampling problems, highlighting important design choices. By likening hopsy to a marketplace, we emphasize its role in bringing together users and developers, where users get access to state-of-the-art methods, and developers contribute their own innovative solutions for challenging domain-specific inference problems. AVAILABILITY AND IMPLEMENTATION: Sources, documentation and a continuously updated list of sampling algorithms are available at https://jugit.fz-juelich.de/IBG-1/ModSim/hopsy, with Linux, Windows and MacOS binaries at https://pypi.org/project/hopsy/.
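To make the idea of sampling on a convex polytope concrete, here is a minimal hit-and-run sketch in pure Python. It is not hopsy's or HOPS's API; the function name `hit_and_run`, the unit-square constraints, and the starting point are all assumptions chosen for illustration, and the polytope is assumed to be bounded with a strictly interior starting point.

```python
import random

def hit_and_run(A, b, x0, n_samples, seed=0):
    """Sample approximately uniformly from {x : A x <= b} with hit-and-run.

    Assumes the polytope is bounded and x0 lies strictly inside it.
    """
    rng = random.Random(seed)
    x = list(x0)
    dim = len(x)
    samples = []
    for _ in range(n_samples):
        # Gaussian components give a direction that is uniform on the sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Find the interval [lo, hi] of step sizes t with A (x + t d) <= b.
        lo, hi = float("-inf"), float("inf")
        for row, bound in zip(A, b):
            ad = sum(a * di for a, di in zip(row, d))
            slack = bound - sum(a * xi for a, xi in zip(row, x))
            if ad > 1e-12:
                hi = min(hi, slack / ad)
            elif ad < -1e-12:
                lo = max(lo, slack / ad)
        # Move to a uniformly chosen point on the feasible chord.
        t = rng.uniform(lo, hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples

# Hypothetical example: the unit square 0 <= x, y <= 1 written as A x <= b.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
samples = hit_and_run(A, b, x0=[0.5, 0.5], n_samples=1000)
```

Each step only needs the chord through the current point, which is why polytope constraints are convenient for this family of samplers.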


Subjects
Algorithms , Programming Languages , Software , Bayes Theorem , Monte Carlo Method , Markov Chains , Computational Biology/methods
3.
Sci Rep ; 14(1): 16572, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39019939

ABSTRACT

Bioinformatics tools are essential for performing analyses in the omics sciences. Given the numerous experimental opportunities arising from advances in the field of omics and easier access to high-throughput sequencing platforms, these tools play a fundamental role in research projects. Despite the considerable progress made possible by the development of bioinformatics tools, some tools are tailored to specific analytical goals, leading to challenges for non-bioinformaticians who need to integrate the results of these specific tools into a customized pipeline. To solve this problem, we have developed the BioPipeline Creator, a user-friendly Java-based GUI that allows different software tools to be integrated into the repertoire while ensuring easy user interaction via an accessible graphical interface. Consisting of client and server software components, BioPipeline Creator provides an intuitive graphical interface that simplifies the use of various bioinformatics tools for users without advanced computer skills. It can run on less sophisticated devices or workstations, allowing users to keep their operating system without having to switch to another compatible system. The server is responsible for the processing tasks and can perform the analysis in the user's local or remote network structure. BioPipeline Creator is compatible with the most important operating systems and is available at https://github.com/allanverasce/bpc.git.


Subjects
Computational Biology , Software , User-Computer Interface , Computational Biology/methods , Programming Languages , High-Throughput Nucleotide Sequencing/methods , Humans
4.
PLoS Comput Biol ; 20(6): e1011912, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38843301

ABSTRACT

To standardize metabolomics data analysis and facilitate future computational developments, it is essential to have a set of well-defined templates for common data structures. Here we describe a collection of data structures involved in metabolomics data processing and illustrate how they are utilized in a full-featured Python-centric pipeline. We demonstrate the performance of the pipeline, and the details in annotation and quality control using large-scale LC-MS metabolomics and lipidomics data and LC-MS/MS data. Multiple previously published datasets are also reanalyzed to showcase its utility in biological data analysis. This pipeline allows users to streamline data processing, quality control, annotation, and standardization in an efficient and transparent manner. This work fills a major gap in the Python ecosystem for computational metabolomics.


Subjects
Metabolomics , Software , Metabolomics/methods , Metabolomics/statistics & numerical data , Computational Biology/methods , Lipidomics/methods , Chromatography, Liquid/methods , Tandem Mass Spectrometry/methods , Programming Languages , Humans
5.
BMC Bioinformatics ; 25(1): 219, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898394

ABSTRACT

BACKGROUND: With the surge in genomic data driven by advancements in sequencing technologies, the demand for efficient bioinformatics tools for sequence analysis has become paramount. BLAST-like alignment tool (BLAT), a sequence alignment tool, faces limitations in performance efficiency and integration with modern programming environments, particularly Python. This study introduces PxBLAT, a Python-based framework designed to enhance the capabilities of BLAT, focusing on usability, computational efficiency, and seamless integration within the Python ecosystem. RESULTS: PxBLAT demonstrates significant improvements over BLAT in execution speed and data handling, as evidenced by comprehensive benchmarks conducted across various sample groups ranging from 50 to 600 samples. These experiments highlight a notable speedup, reducing execution time compared to BLAT. The framework also introduces user-friendly features such as improved server management, data conversion utilities, and shell completion, enhancing the overall user experience. Additionally, the provision of extensive documentation and comprehensive testing supports community engagement and facilitates the adoption of PxBLAT. CONCLUSIONS: PxBLAT stands out as a robust alternative to BLAT, offering performance and user interaction enhancements. Its development underscores the potential for modern programming languages to improve bioinformatics tools, aligning with the needs of contemporary genomic research. By providing a more efficient, user-friendly tool, PxBLAT has the potential to impact genomic data analysis workflows, supporting faster and more accurate sequence analysis in a Python environment.


Subjects
Computational Biology , Sequence Alignment , Software , Computational Biology/methods , Sequence Alignment/methods , Programming Languages , Genomics/methods
6.
Bioinformatics ; 40(Supplement_1): i266-i276, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940140

ABSTRACT

SUMMARY: Pretrained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate LLMs in generating bioinformatics-specific code. BioCoder spans much of the field, covering cross-file dependencies, class declarations, and global variables. It incorporates 1026 Python functions and 1243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate various models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we fine-tuned one model (StarCoder), demonstrating that our training dataset can enhance the performance on our testing benchmark (by >15% in terms of Pass@K under certain prompt configurations and always >3%). The results highlight two key aspects of successful models: (i) Successful models accommodate a long prompt (>2600 tokens) with full context, including functional dependencies. (ii) They contain domain-specific knowledge of bioinformatics, beyond just general coding capability. This is evident from the performance gain of GPT-3.5/4 compared to the smaller models on our benchmark (50% versus up to 25%). AVAILABILITY AND IMPLEMENTATION: All datasets, benchmark, Docker images, and scripts required for testing are available at: https://github.com/gersteinlab/biocoder and https://biocoder-benchmark.github.io/.
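The abstract reports results in terms of Pass@K without stating how it is computed. One estimator that is widely used for code-generation benchmarks, shown here only as an assumption about what such a metric can look like, draws n samples per problem, counts the c samples that pass the tests, and estimates the chance that at least one of k randomly chosen samples passes. The function name `pass_at_k` is hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: n samples generated, c of them correct."""
    if k > n:
        raise ValueError("k must not exceed n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 3 correct, evaluated at k = 1.
estimate = pass_at_k(10, 3, 1)
```

For k = 1 the estimate reduces to the plain fraction of correct samples, c / n.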


Subjects
Algorithms , Benchmarking , Computational Biology , Programming Languages , Software , Computational Biology/methods , Benchmarking/methods
7.
PLoS Comput Biol ; 20(6): e1012173, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38900779

ABSTRACT

Interactive Jupyter Notebooks in combination with Conda environments can be used to generate FAIR (Findable, Accessible, Interoperable and Reusable/Reproducible) biomolecular simulation workflows. The interactive programming code accompanied by documentation and the possibility to inspect intermediate results with versatile graphical charts and data visualization is very helpful, especially in iterative processes, where parameters might be adjusted to a particular system of interest. This work presents a collection of FAIR notebooks covering various areas of the biomolecular simulation field, such as molecular dynamics (MD), protein-ligand docking, molecular checking/modeling, molecular interactions, and free energy perturbations. Workflows can be launched with myBinder or easily installed in a local system. The collection of notebooks aims to provide a compilation of demonstration workflows, and it is continuously updated and expanded with examples using new methodologies and tools.


Subjects
Computational Biology , Molecular Dynamics Simulation , Software , Workflow , Computational Biology/methods , Programming Languages , User-Computer Interface , Proteins/chemistry , Molecular Docking Simulation , Reproducibility of Results , Ligands
8.
Bioinformatics ; 40(6), 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38837370

ABSTRACT

MOTIVATION: The BigWig and BigBed file formats were originally designed for the visualization of next-generation sequencing data through a genome browser. Due to their versatility, these formats have long since become ubiquitous for the storage of processed sequencing data and regularly serve as the basis for downstream data analysis. As the number and size of sequencing experiments continues to accelerate, there is an increasing demand to efficiently generate and query BigWig and BigBed files in a scalable and robust manner, and to efficiently integrate these functionalities into data analysis environments and third-party applications. RESULTS: Here, we present Bigtools, a feature-complete, high-performance, and integrable software library for generating and querying both BigWig and BigBed files. Bigtools is written in the Rust programming language and includes a flexible suite of command line tools as well as bindings to Python. AVAILABILITY AND IMPLEMENTATION: Bigtools is cross-platform and released under the MIT license. It is distributed on Crates.io, Bioconda, and the Python Package Index, and the source code is available at https://github.com/jackh726/bigtools.


Subjects
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Programming Languages
9.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38896539

ABSTRACT

BACKGROUND: Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages. RESULTS: To address these challenges, we investigate the efficiency of large language models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed 3 user studies in 2 scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs efficiently interpret workflows but achieve lower performance for exchanging components or purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions. CONCLUSIONS: Our results show a high accuracy for comprehending and explaining scientific workflows while achieving a reduced performance for modifying and extending workflow descriptions. These findings clearly illustrate the need for further research in this area.


Subjects
Workflow , Programming Languages , Software , Computational Biology/methods , Humans
10.
PLoS One ; 19(5): e0301720, 2024.
Article in English | MEDLINE | ID: mdl-38739583

ABSTRACT

A key benefit of the Open Computing Language (OpenCL) software framework is its capability to operate across diverse architectures. Field programmable gate arrays (FPGAs) are a high-speed computing architecture used for computation acceleration. This study investigates the impact of memory access time on overall performance in general FPGA computing environments through the creation of eight benchmarks within the OpenCL framework. The developed benchmarks capture a range of memory access behaviors, and they play a crucial role in assessing the performance of spinning and sleeping on FPGA-based architectures. The results obtained guide the formulation of new implementations and contribute to defining an abstraction of FPGAs. This abstraction is then utilized to create tailored implementations of primitives that are well-suited for this platform. While other research endeavors concentrate on creating benchmarks with the Compute Unified Device Architecture (CUDA) to scrutinize the memory systems across diverse GPU architectures and propose recommendations for future generations of GPU computation platforms, this study delves into the memory system analysis for the broader FPGA computing platform. It achieves this by employing the highly abstracted OpenCL framework, exploring various data workload characteristics, and experimentally delineating the appropriate implementation of primitives that can seamlessly integrate into a design tailored for the FPGA computing platform. Additionally, the results underscore the efficacy of employing a task-parallel model to mitigate the need for high-cost synchronization mechanisms in designs constructed on general FPGA computing platforms.


Subjects
Benchmarking , Software , Humans , Programming Languages
11.
J Am Soc Cytopathol ; 13(4): 309-318, 2024.
Article in English | MEDLINE | ID: mdl-38702208

ABSTRACT

INTRODUCTION: Effective feedback on cytology performance relies on navigating complex laboratory information system data, which is prone to errors and lacks flexibility. As a comprehensive solution, we used the Python programming language to create a dashboard application for screening and diagnostic quality metrics. MATERIALS AND METHODS: Data from the 5-year period (2018-2022) were accessed. Versatile open-source Python libraries (user developed program code packages) were used from the first step of LIS data cleaning through the creation of the application. To evaluate performance, we selected 3 gynecologic metrics: the ASC/LSIL ratio, the ASC-US/ASC-H ratio, and the proportion of cytologic abnormalities in comparison to the total number of cases (abnormal rate). We also evaluated the referral rate of cytologists/cytotechnologists (CTs) and the ratio of thyroid AUS interpretations by cytopathologists (CPs). These were formed into colored graphs that showcase individual results in established, color-coded laboratory "goal," "borderline," and "attention" zones based on published reference benchmarks. A representation of the results distribution for the entire laboratory was also developed. RESULTS: We successfully created a web-based test application that presents interactive dashboards with different interfaces for the CT, CP, and laboratory management (https://drkvcsstvn-dashboards.hf.space/app). The user can choose to view the desired quality metric, year, and the anonymized CT or CP, with an additional automatically generated written report of results. CONCLUSIONS: Python programming proved to be an effective toolkit to ensure high-level data processing in a modular and reproducible way to create a personalized, laboratory specific cytology dashboard.


Subjects
Programming Languages , Quality Assurance, Health Care , Humans , Female , Cytodiagnosis/methods , Cytodiagnosis/standards , Software , Cytology
12.
J Bioinform Comput Biol ; 22(2): 2471001, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38779779

ABSTRACT

ChatGPT, a recently developed product by OpenAI, is successfully leaving its mark as a multi-purpose natural-language-based chatbot. In this paper, we analyze its potential in the field of computational biology. A major share of the work done by computational biologists these days involves coding up bioinformatics algorithms, analyzing data, creating pipelining scripts, and even machine learning modeling and feature extraction. This paper focuses on the potential influence (both positive and negative) of ChatGPT on these aspects, with illustrative examples from different perspectives. Compared to other fields of computer science, computational biology has (1) fewer coding resources, (2) more sensitivity and bias issues (it deals with medical data), and (3) a greater need for coding assistance (people from diverse backgrounds come to this field). Keeping such issues in mind, we cover use cases such as code writing, reviewing, debugging, converting, refactoring, and pipelining using ChatGPT from the perspective of computational biologists.


Subjects
Algorithms , Computational Biology , Computational Biology/methods , Software , Programming Languages , Humans , Natural Language Processing , Machine Learning
13.
PLoS One ; 19(5): e0302333, 2024.
Article in English | MEDLINE | ID: mdl-38728285

ABSTRACT

In software development, it is common to reuse existing source code by copying and pasting, resulting in the proliferation of numerous code clones (similar or identical code fragments) that detrimentally affect software quality and maintainability. Although several techniques for code clone detection exist, many encounter challenges in effectively identifying semantic clones due to their inability to extract syntax and semantics information. Fewer techniques leverage low-level source code representations like bytecode or assembly for clone detection. This work introduces a novel code representation for identifying syntactic and semantic clones in Java source code. It integrates high-level features extracted from the Abstract Syntax Tree with low-level features derived from intermediate representations generated by static analysis tools, like the Soot framework. Leveraging this combined representation, fifteen machine-learning models are trained to effectively detect code clones. Evaluation on a large dataset demonstrates the models' efficacy in accurately identifying semantic clones. Among these classifiers, ensemble classifiers, such as the LightGBM classifier, exhibit exceptional accuracy. Linearly combining features enhances the effectiveness of the models compared to multiplication and distance combination techniques. The experimental findings indicate that the proposed method can outperform the current clone detection techniques in detecting semantic clones.
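As a rough illustration of what "high-level features extracted from the Abstract Syntax Tree" can mean, the sketch below counts AST node types and compares two fragments by cosine similarity. It is an analogy only: the paper works on Java and also uses low-level features from Soot, whereas this sketch uses Python's standard `ast` module, and the helper names and toy fragments are assumptions made for the example.

```python
import ast
from collections import Counter
from math import sqrt

def node_type_histogram(source):
    """Count how often each AST node type occurs in a piece of Python source."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def cosine_similarity(h1, h2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(h1[key] * h2[key] for key in h1.keys() & h2.keys())
    norm1 = sqrt(sum(v * v for v in h1.values()))
    norm2 = sqrt(sum(v * v for v in h2.values()))
    return dot / (norm1 * norm2)

# Hypothetical fragments: a function, a renamed copy, and unrelated code.
original = "def add(a, b):\n    return a + b\n"
renamed = "def plus(x, y):\n    return x + y\n"
different = (
    "def count(items):\n"
    "    n = 0\n"
    "    for _ in items:\n"
    "        n += 1\n"
    "    return n\n"
)

sim_clone = cosine_similarity(node_type_histogram(original),
                              node_type_histogram(renamed))
sim_other = cosine_similarity(node_type_histogram(original),
                              node_type_histogram(different))
```

Node-type counts ignore identifier names, so a renamed copy scores as identical; they capture syntax only, which is one reason purely syntactic features struggle with semantic clones.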


Subjects
Semantics , Software , Programming Languages , Machine Learning , Algorithms
14.
PLoS Comput Biol ; 20(5): e1012067, 2024 May.
Article in English | MEDLINE | ID: mdl-38709825

ABSTRACT

Chromosome conformation capture (3C) technologies reveal the incredible complexity of genome organization. Maps of increasing size, depth, and resolution are now used to probe genome architecture across cell states, types, and organisms. Larger datasets add challenges at each step of computational analysis, from storage and memory constraints to researchers' time; however, analysis tools that meet these increased resource demands have not kept pace. Furthermore, existing tools offer limited support for customizing analysis for specific use cases or new biology. Here we introduce cooltools (https://github.com/open2c/cooltools), a suite of computational tools that enables flexible, scalable, and reproducible analysis of high-resolution contact frequency data. Cooltools leverages the widely-adopted cooler format which handles storage and access for high-resolution datasets. Cooltools provides a paired command line interface (CLI) and Python application programming interface (API), which respectively facilitate workflows on high-performance computing clusters and in interactive analysis environments. In short, cooltools enables the effective use of the latest and largest genome folding datasets.


Subjects
Computational Biology , Software , Computational Biology/methods , Programming Languages , Genomics/methods , Genome/genetics , Chromosome Mapping/methods , Humans
15.
PLoS One ; 19(4): e0289141, 2024.
Article in English | MEDLINE | ID: mdl-38598521

ABSTRACT

Diagnostic tests play a crucial role in establishing the presence of a specific disease in an individual. Receiver Operating Characteristic (ROC) curve analyses are essential tools that provide performance metrics for diagnostic tests. Accurate determination of the cutoff point in ROC curve analyses is the most critical aspect of the process. A variety of methods have been developed to find the optimal cutoffs. Although the R programming language provides a variety of packages for conducting ROC curve analysis and determining the appropriate cutoffs, using them typically requires coding skills and a substantial investment of time. Specifically, the necessity for data preprocessing and analysis can present a significant challenge, especially for individuals without coding experience. To solve this problem, we have developed the CERA (ChatGPT-Enhanced ROC Analysis) tool, a user-friendly ROC curve analysis web tool that uses the shiny interface for faster and more effective analyses. CERA is not only user-friendly, but it also interacts with ChatGPT, which interprets the outputs. This allows for an interpreted report generated by R-Markdown to be presented to the user, enhancing the accessibility and understanding of the analysis results.
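The abstract does not name the cutoff methods CERA offers. As one common example of choosing an ROC cutoff, the sketch below picks the threshold that maximizes Youden's J (sensitivity + specificity - 1). The function name `youden_cutoff` and the toy scores are hypothetical, and the code is written in Python rather than R to stay self-contained.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximising Youden's J = sensitivity + specificity - 1.

    A case is called positive when its score is >= cutoff; labels are 1
    (diseased) or 0 (healthy). Ties keep the lowest cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical biomarker values and disease labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(scores, labels)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can select a different cutoff on the same data, which is why tools in this area usually offer several methods.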


Subjects
Programming Languages , Software , Humans , ROC Curve , Biomarkers
16.
Stud Health Technol Inform ; 313: 167-172, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38682525

ABSTRACT

Healthcare-associated infections (HAIs) may have grave consequences for patients. In the case of sepsis, the 30-day mortality rate is about 25%. HAIs cost EU member states an estimated 7 billion Euros annually. Clinical decision support tools may be useful for infection monitoring, early warning, and alerts. MONI, a tool for monitoring nosocomial infections, is used at University Hospital Vienna, but needs to be clinically and technically revised and updated. A new, completely configurable pipeline-based system for defining and processing HAI definitions was developed and validated. A network of data access points, clinical rules, and explanatory output is arranged as an inference network, a clinical pipeline as it is called, and processed in a stepwise manner. Arden-Syntax-based medical logic modules were used to implement the respective rules. The system was validated by creating a pipeline for the ECDC PN5 pneumonia rule. It was tested on a set of patient data from intensive care medicine. The results were compared with previously obtained MONI output as a suitable reference, yielding a sensitivity of 93.8% and a specificity of 99.8%. Clinical pipelines show promise as an open and configurable approach to graphically-based, human-readable, machine-executable HAI definitions.
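The validation figures above are sensitivity and specificity of the new pipeline relative to the earlier MONI output. A minimal sketch of that comparison follows; the function name and the ten toy cases are hypothetical and are not the study's patient data.

```python
def sensitivity_specificity(predicted, reference):
    """Compare boolean rule output against a boolean reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # flagged, truly positive
    fn = sum(1 for p, r in pairs if not p and r)      # missed positive
    tn = sum(1 for p, r in pairs if not p and not r)  # correctly not flagged
    fp = sum(1 for p, r in pairs if p and not r)      # false alarm
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: reference output vs. new pipeline output.
reference = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(predicted, reference)
```

Because the reference here is another system's output, the two numbers measure agreement with that system, not accuracy against a clinical gold standard.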


Subjects
Cross Infection , Decision Support Systems, Clinical , Humans , Cross Infection/prevention & control , Infection Control , Austria , Programming Languages , Software
17.
BMC Bioinformatics ; 25(1): 166, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664639

ABSTRACT

BACKGROUND: The Biology System Description Language (BiSDL) is an accessible, easy-to-use computational language for multicellular synthetic biology. It allows synthetic biologists to represent spatiality and multi-level cellular dynamics inherent to multicellular designs, filling a gap in the state of the art. Developed for designing and simulating spatial, multicellular synthetic biological systems, BiSDL integrates high-level conceptual design with detailed low-level modeling, fostering collaboration in the Design-Build-Test-Learn cycle. BiSDL descriptions directly compile into Nets-Within-Nets (NWNs) models, offering a unique approach to spatial and hierarchical modeling in biological systems. RESULTS: BiSDL's effectiveness is showcased through three case studies on complex multicellular systems: a bacterial consortium, a synthetic morphogen system and a conjugative plasmid transfer process. These studies highlight the BiSDL proficiency in representing spatial interactions and multi-level cellular dynamics. The language facilitates the compilation of conceptual designs into detailed, simulatable models, leveraging the NWNs formalism. This enables intuitive modeling of complex biological systems, making advanced computational tools more accessible to a broader range of researchers. CONCLUSIONS: BiSDL represents a significant step forward in computational languages for synthetic biology, providing a sophisticated yet user-friendly tool for designing and simulating complex biological systems with an emphasis on spatiality and cellular dynamics. Its introduction has the potential to transform research and development in synthetic biology, allowing for deeper insights and novel applications in understanding and manipulating multicellular systems.


Subjects
Synthetic Biology , Synthetic Biology/methods , Models, Biological , Programming Languages , Systems Biology/methods , Software
18.
PLoS Comput Biol ; 20(4): e1011800, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38656994

ABSTRACT

Biochemical signaling pathways in living cells are often highly organized into spatially segregated volumes, membranes, scaffolds, subcellular compartments, and organelles comprising small numbers of interacting molecules. At this level of granularity, stochastic behavior dominates, well-mixed continuum approximations based on concentrations break down, and a particle-based approach is more accurate and more efficient. We describe and validate a new version of the open-source MCell simulation program (MCell4), which supports generalized 3D Monte Carlo modeling of diffusion and chemical reaction of discrete molecules and macromolecular complexes in solution, on surfaces representing membranes, and combinations thereof. The main improvements in MCell4 compared to the previous versions, MCell3 and MCell3-R, include a Python interface and native BioNetGen reaction language (BNGL) support. MCell4's Python interface opens up completely new possibilities for interfacing with external simulators to allow creation of sophisticated event-driven multiscale/multiphysics simulations. The native BNGL support, implemented through a new open-source library libBNG (also introduced in this paper), provides the capability to run a given BNGL model spatially resolved in MCell4 and, with appropriate simplifying assumptions, also in the BioNetGen simulation environment, greatly accelerating and simplifying model validation and comparison.


Subjects
Monte Carlo Method , Software , Diffusion , Computer Simulation , Models, Biological , Programming Languages , Computational Biology/methods , Signal Transduction/physiology
19.
Anal Bioanal Chem ; 416(14): 3349-3360, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38607384

ABSTRACT

The analysis of almost holistic food profiles has developed considerably over recent years. This has also led to larger amounts of data and the ability to obtain more information about health-beneficial and adverse constituents in food than ever before. Especially in the field of proteomics, software packages are used for evaluation, but these do not provide specific approaches for unique monitoring questions. An additional and more comprehensive evaluation can be done with the programming language Python. It offers broad possibilities through a large ecosystem for mass spectrometric data analysis, but needs to be tailored to specific sets of features and the research questions behind them. It also allows various machine-learning approaches to be applied. The aim of the present study was to develop an algorithm for selecting and identifying potential marker peptides from mass spectrometric data. The workflow is divided into three steps: (I) feature engineering, (II) chemometric data analysis, and (III) feature identification. The first step is the transformation of the mass spectrometric data into a structure that enables the application of existing data analysis packages in Python. The second step is the data analysis for selecting single features. These features are further processed in the third step, which is the feature identification. The data used as an example in this proof-of-principle approach came from a study on the influence of heat treatment on the milk proteome/peptidome.


Subjects
Hot Temperature , Milk , Peptides , Workflow , Milk/chemistry , Animals , Peptides/analysis , Peptides/chemistry , Biomarkers/analysis , Software , Proteomics/methods , Mass Spectrometry/methods , Programming Languages , Algorithms
20.
J Chem Inf Model ; 64(8): 2948-2954, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38488634

ABSTRACT

SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism in which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.


Subjects
Algorithms , Software , Programming Languages , Cheminformatics/methods , Databases, Chemical