Pesquisa | BVS Doenças Infecciosas e Parasitárias

1.

epidecodeR: a functional exploration tool for epigenetic and epitranscriptomic regulation.

Joshi, Kandarp; Wang, Dan O.

Brief Bioinform ; 25(2)2024 Jan 22.

Artigo em Inglês | MEDLINE | ID: mdl-38271482

RESUMO

Recent technological advances in sequencing DNA and RNA modifications using high-throughput platforms have generated vast epigenomic and epitranscriptomic datasets whose power in transforming life science is yet fully unleashed. Currently available in silico methods have facilitated the identification, positioning and quantitative comparisons of individual modification sites. However, the essential challenge to link specific 'epi-marks' to gene expression in the particular context of cellular and biological processes is unmet. To fast-track exploration, we generated epidecodeR implemented in R, which allows biologists to quickly survey whether an epigenomic or epitranscriptomic status of their interest potentially influences gene expression responses. The evaluation is based on the cumulative distribution function and the statistical significance in differential expression of genes grouped by the number of 'epi-marks'. This tool proves useful in predicting the role of H3K9ac and H3K27ac in associated gene expression after knocking down deacetylases FAM60A and SDS3 and N6-methyl-adenosine-associated gene expression after knocking out the reader proteins. We further used epidecodeR to explore the effectiveness of demethylase FTO inhibitors and histone-associated modifications in drug abuse in animals. epidecodeR is available for downloading as an R package at https://bioconductor.riken.jp/packages/3.13/bioc/html/epidecodeR.html.

Assuntos

Epigenômica , Software , Animais , Epigenômica/métodos , Metilação de DNA , DNA/metabolismo , Epigênese Genética

2.

Label-free proteome quantification and evaluation.

Fu, Jianbo; Yang, Qingxia; Luo, Yongchao; Zhang, Song; Tang, Jing; Zhang, Ying; Zhang, Hongning; Xu, Hanxiang; Zhu, Feng.

Brief Bioinform ; 24(1)2023 01 19.

Artigo em Inglês | MEDLINE | ID: mdl-36403090

RESUMO

The label-free quantification (LFQ) has emerged as an exceptional technique in proteomics owing to its broad proteome coverage, great dynamic ranges and enhanced analytical reproducibility. Due to the extreme difficulty lying in an in-depth quantification, the LFQ chains incorporating a variety of transformation, pretreatment and imputation methods are required and constructed. However, it remains challenging to determine the well-performing chain, owing to its strong dependence on the studied data and the diverse possibility of integrated chains. In this study, an R package EVALFQ was therefore constructed to enable a performance evaluation on >3000 LFQ chains. This package is unique in (a) automatically evaluating the performance using multiple criteria, (b) exploring the quantification accuracy based on spiking proteins and (c) discovering the well-performing chains by comprehensive assessment. All in all, because of its superiority in assessing from multiple perspectives and scanning among over 3000 chains, this package is expected to attract broad interests from the fields of proteomic quantification. The package is available at https://github.com/idrblab/EVALFQ.

Assuntos

Proteoma , Proteômica , Proteoma/metabolismo , Proteômica/métodos , Reprodutibilidade dos Testes

3.

kalis: a modern implementation of the Li & Stephens model for local ancestry inference in R.

Aslett, Louis J M; Christ, Ryan R.

BMC Bioinformatics ; 25(1): 86, 2024 Feb 28.

Artigo em Inglês | MEDLINE | ID: mdl-38418970

RESUMO

BACKGROUND: Approximating the recent phylogeny of N phased haplotypes at a set of variants along the genome is a core problem in modern population genomics and central to performing genome-wide screens for association, selection, introgression, and other signals. The Li & Stephens (LS) model provides a simple yet powerful hidden Markov model for inferring the recent ancestry at a given variant, represented as an N × N distance matrix based on posterior decodings. RESULTS: We provide a high-performance engine to make these posterior decodings readily accessible with minimal pre-processing via an easy to use package kalis, in the statistical programming language R. kalis enables investigators to rapidly resolve the ancestry at loci of interest and developers to build a range of variant-specific ancestral inference pipelines on top. kalis exploits both multi-core parallelism and modern CPU vector instruction sets to enable scaling to hundreds of thousands of genomes. CONCLUSIONS: The resulting distance matrices accessible via kalis enable local ancestry, selection, and association studies in modern large scale genomic datasets.

Assuntos

Genoma , Genômica , Humanos , Cadeias de Markov , Haplótipos , Etnicidade , Genética Populacional

4.

Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed).

Vogl, Claus; Karapetiants, Mariia; Yildirim, Burçin; Kjartansdóttir, Hrönn; Kosiol, Carolin; Bergman, Juraj; Majka, Michal; Mikula, Lynette Caitlin.

BMC Bioinformatics ; 25(1): 151, 2024 Apr 16.

Artigo em Inglês | MEDLINE | ID: mdl-38627634

RESUMO

BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities, the corresponding R package of the same name is available on CRAN): They identify the number of comparably homogeneous regions within autocorrelated observed sequences. These are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred. RESULTS: We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. In regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based in isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of variation of chromatin accessibility (ATAC-seq) and epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of poisson-gamma distributions) along the human chromosome 1 and their correlations. CONCLUSIONS: Our algorithms provide a biologically assumption free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Despite this, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.

Assuntos

Genoma , Genômica , Animais , Humanos , Camundongos , Cadeias de Markov , Composição de Bases , Probabilidade , Algoritmos

5.

SpaceANOVA: Spatial Co-occurrence Analysis of Cell Types in Multiplex Imaging Data Using Point Process and Functional ANOVA.

Seal, Souvik; Neelon, Brian; Angel, Peggi M; O'Quinn, Elizabeth C; Hill, Elizabeth; Vu, Thao; Ghosh, Debashis; Mehta, Anand S; Wallace, Kristin; Alekseyenko, Alexander V.

J Proteome Res ; 23(4): 1131-1143, 2024 04 05.

Artigo em Inglês | MEDLINE | ID: mdl-38417823

RESUMO

Multiplex imaging platforms have enabled the identification of the spatial organization of different types of cells in complex tissue or the tumor microenvironment. Exploring the potential variations in the spatial co-occurrence or colocalization of different cell types across distinct tissue or disease classes can provide significant pathological insights, paving the way for intervention strategies. However, the existing methods in this context either rely on stringent statistical assumptions or suffer from a lack of generalizability. We present a highly powerful method to study differential spatial co-occurrence of cell types across multiple tissue or disease groups, based on the theories of the Poisson point process and functional analysis of variance. Notably, the method accommodates multiple images per subject and addresses the problem of missing tissue regions, commonly encountered due to data-collection complexities. We demonstrate the superior statistical power and robustness of the method in comparison with existing approaches through realistic simulation studies. Furthermore, we apply the method to three real data sets on different diseases collected using different imaging platforms. In particular, one of these data sets reveals novel insights into the spatial characteristics of various types of colorectal adenoma.

Assuntos

Simulação por Computador , Análise de Variância

6.

IsoForma: An R Package for Quantifying and Visualizing Positional Isomers in Top-Down LC-MS/MS Data.

Degnan, David J; Lewis, Logan A; Bramer, Lisa M; McCue, Lee Ann; Pesavento, James J; Zhou, Mowei; Bilbao, Aivett.

J Proteome Res ; 23(8): 3318-3321, 2024 Aug 02.

Artigo em Inglês | MEDLINE | ID: mdl-38421884

RESUMO

Proteoforms, the different forms of a protein with sequence variations including post-translational modifications (PTMs), execute vital functions in biological systems, such as cell signaling and epigenetic regulation. Advances in top-down mass spectrometry (MS) technology have permitted the direct characterization of intact proteoforms and their exact number of modification sites, allowing for the relative quantification of positional isomers (PI). Protein positional isomers refer to a set of proteoforms with identical total mass and set of modifications, but varying PTM site combinations. The relative abundance of PI can be estimated by matching proteoform-specific fragment ions to top-down tandem MS (MS2) data to localize and quantify modifications. However, the current approaches heavily rely on manual annotation. Here, we present IsoForma, an open-source R package for the relative quantification of PI within a single tool. Benchmarking IsoForma's performance against two existing workflows produced comparable results and improvements in speed. Overall, IsoForma provides a streamlined process for quantifying PI, reduces the analysis time, and offers an essential framework for developing customized proteoform analysis workflows. The software is open source and available at https://github.com/EMSL-Computing/isoforma-lib.

Assuntos

Espectrometria de Massa com Cromatografia Líquida , Isoformas de Proteínas , Processamento de Proteína Pós-Traducional , Software , Espectrometria de Massas em Tandem , Humanos , Isomerismo , Espectrometria de Massa com Cromatografia Líquida/métodos , Isoformas de Proteínas/análise , Proteômica/métodos , Espectrometria de Massas em Tandem/métodos

7.

Multicenter Collaborative Study to Optimize Mass Spectrometry Workflows of Clinical Specimens.

Kardell, Oliver; von Toerne, Christine; Merl-Pham, Juliane; König, Ann-Christine; Blindert, Marcel; Barth, Teresa K; Mergner, Julia; Ludwig, Christina; Tüshaus, Johanna; Eckert, Stephan; Müller, Stephan A; Breimann, Stephan; Giesbertz, Pieter; Bernhardt, Alexander M; Schweizer, Lisa; Albrecht, Vincent; Teupser, Daniel; Imhof, Axel; Kuster, Bernhard; Lichtenthaler, Stefan F; Mann, Matthias; Cox, Jürgen; Hauck, Stefanie M.

J Proteome Res ; 23(1): 117-129, 2024 01 05.

Artigo em Inglês | MEDLINE | ID: mdl-38015820

RESUMO

The foundation for integrating mass spectrometry (MS)-based proteomics into systems medicine is the development of standardized start-to-finish and fit-for-purpose workflows for clinical specimens. An essential step in this pursuit is to highlight the common ground in a diverse landscape of different sample preparation techniques and liquid chromatography-mass spectrometry (LC-MS) setups. With the aim to benchmark and improve the current best practices among the proteomics MS laboratories of the CLINSPECT-M consortium, we performed two consecutive round-robin studies with full freedom to operate in terms of sample preparation and MS measurements. The six study partners were provided with two clinically relevant sample matrices: plasma and cerebrospinal fluid (CSF). In the first round, each laboratory applied their current best practice protocol for the respective matrix. Based on the achieved results and following a transparent exchange of all lab-specific protocols within the consortium, each laboratory could advance their methods before measuring the same samples in the second acquisition round. Both time points are compared with respect to identifications (IDs), data completeness, and precision, as well as reproducibility. As a result, the individual performances of participating study centers were improved in the second measurement, emphasizing the effect and importance of the expert-driven exchange of best practices for direct practical improvements.

Assuntos

Plasma , Espectrometria de Massas em Tandem , Espectrometria de Massas em Tandem/métodos , Cromatografia Líquida/métodos , Fluxo de Trabalho , Reprodutibilidade dos Testes , Plasma/química

8.

Ravages: An R package for the simulation and analysis of rare variants in multicategory phenotypes.

Bocher, Ozvan; Marenne, Gaëlle; Génin, Emmanuelle; Perdry, Hervé.

Genet Epidemiol ; 47(6): 450-460, 2023 09.

Artigo em Inglês | MEDLINE | ID: mdl-37158367

RESUMO

Current software packages for the analysis and the simulations of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run in the whole genome thanks to C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases who can be stratified into several subgroups and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on the CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on Github at https://github.com/genostats/Ravages.

Assuntos

Variação Genética , Modelos Genéticos , Humanos , Simulação por Computador , Fenótipo , Software

9.

simona: a comprehensive R package for semantic similarity analysis on bio-ontologies.

Gu, Zuguang.

BMC Genomics ; 25(1): 869, 2024 Sep 16.

Artigo em Inglês | MEDLINE | ID: mdl-39285315

RESUMO

BACKGROUND: Bio-ontologies are keys in structuring complex biological information for effective data integration and knowledge representation. Semantic similarity analysis on bio-ontologies quantitatively assesses the degree of similarity between biological concepts based on the semantics encoded in ontologies. It plays an important role in structured and meaningful interpretations and integration of complex data from multiple biological domains. RESULTS: We present simona, a novel R package for semantic similarity analysis on general bio-ontologies. Simona implements infrastructures for ontology analysis by offering efficient data structures, fast ontology traversal methods, and elegant visualizations. Moreover, it provides a robust toolbox supporting over 70 methods for semantic similarity analysis. With simona, we conducted a benchmark against current semantic similarity methods. The results demonstrate methods are clustered based on their mathematical methodologies, thus guiding researchers in the selection of appropriate methods. Additionally, we explored annotation-based versus topology-based methods, revealing that semantic similarities solely based on ontology topology can efficiently reveal semantic similarity structures, facilitating analysis on less-studied organisms and other ontologies. CONCLUSIONS: Simona offers a versatile interface and efficient implementation for processing, visualization, and semantic similarity analysis on bio-ontologies. We believe that simona will serve as a robust tool for uncovering relationships and enhancing the interoperability of biological knowledge systems.

Assuntos

Ontologias Biológicas , Semântica , Software , Biologia Computacional/métodos

10.

Neutrality in plant-herbivore interactions.

Pan, Vincent S; Wetzel, William C.

Proc Biol Sci ; 291(2017): 20232687, 2024 Feb 28.

Artigo em Inglês | MEDLINE | ID: mdl-38378151

RESUMO

Understanding the distribution of herbivore damage among leaves and individual plants is a central goal of plant-herbivore biology. Commonly observed unequal patterns of herbivore damage have conventionally been attributed to the heterogeneity in plant quality or herbivore behaviour or distribution. Meanwhile, the potential role of stochastic processes in structuring plant-herbivore interactions has been overlooked. Here, we show that based on simple first principle expectations from metabolic theory, random sampling of different sizes of herbivores from a regional pool is sufficient to explain patterns of variation in herbivore damage. This is despite making the neutral assumption that herbivory is caused by randomly feeding herbivores on identical and passive plants. We then compared its predictions against 765 datasets of herbivory on 496 species across 116° of latitude from the Herbivory Variability Network. Using only one free parameter, the estimated attack rate, our neutral model approximates the observed frequency distribution of herbivore damage among plants and especially among leaves very well. Our results suggest that neutral stochastic processes play a large and underappreciated role in natural variation in herbivory and may explain the low predictability of herbivory patterns. We argue that such prominence warrants its consideration as a powerful force in plant-herbivore interactions.

Assuntos

Herbivoria , Folhas de Planta , Plantas

11.

LION: an integrated R package for effective prediction of ncRNA-protein interaction.

Han, Siyu; Yang, Xiao; Sun, Hang; Yang, Hu; Zhang, Qi; Peng, Cheng; Fang, Wensi; Li, Ying.

Brief Bioinform ; 23(6)2022 11 19.

Artigo em Inglês | MEDLINE | ID: mdl-36155620

RESUMO

Understanding ncRNA-protein interaction is of critical importance to unveil ncRNAs' functions. Here, we propose an integrated package LION which comprises a new method for predicting ncRNA/lncRNA-protein interaction as well as a comprehensive strategy to meet the requirement of customisable prediction. Experimental results demonstrate that our method outperforms its competitors on multiple benchmark datasets. LION can also improve the performance of some widely used tools and build adaptable models for species- and tissue-specific prediction. We expect that LION will be a powerful and efficient tool for the prediction and analysis of ncRNA/lncRNA-protein interaction. The R Package LION is available on GitHub at https://github.com/HAN-Siyu/LION/.

Assuntos

RNA Longo não Codificante , RNA não Traduzido/genética

12.

Improve consensus partitioning via a hierarchical procedure.

Gu, Zuguang; Hübschmann, Daniel.

Brief Bioinform ; 23(3)2022 05 13.

Artigo em Inglês | MEDLINE | ID: mdl-35289356

RESUMO

Consensus partitioning is an unsupervised method widely used in high-throughput data analysis for revealing subgroups and assigning stability for the classification. However, standard consensus partitioning procedures are weak for identifying large numbers of stable subgroups. There are two major issues. First, subgroups with small differences are difficult to be separated if they are simultaneously detected with subgroups with large differences. Second, stability of classification generally decreases as the number of subgroups increases. In this work, we proposed a new strategy to solve these two issues by applying consensus partitioning in a hierarchical procedure. We demonstrated hierarchical consensus partitioning can be efficient to reveal more meaningful subgroups. We also tested the performance of hierarchical consensus partitioning on revealing a great number of subgroups with a large deoxyribonucleic acid methylation dataset. The hierarchical consensus partitioning is implemented in the R package cola with comprehensive functionalities for analysis and visualization. It can also automate the analysis only with a minimum of two lines of code, which generates a detailed HTML report containing the complete analysis. The cola package is available at https://bioconductor.org/packages/cola/.

Assuntos

Software , Consenso

13.

A tool for feature extraction from biological sequences.

Amerifar, Sare; Norouzi, Mahammad; Ghandi, Mahmoud.

Brief Bioinform ; 23(3)2022 05 13.

Artigo em Inglês | MEDLINE | ID: mdl-35383372

RESUMO

With the advances in sequencing technologies, a huge amount of biological data is extracted nowadays. Analyzing this amount of data is beyond the ability of human beings, creating a splendid opportunity for machine learning methods to grow. The methods, however, are practical only when the sequences are converted into feature vectors. Many tools target this task including iLearnPlus, a Python-based tool which supports a rich set of features. In this paper, we propose a holistic tool that extracts features from biological sequences (i.e. DNA, RNA and Protein). These features are the inputs to machine learning models that predict properties, structures or functions of the input sequences. Our tool not only supports all features in iLearnPlus but also 30 additional features which exist in the literature. Moreover, our tool is based on R language which makes an alternative for bioinformaticians to transform sequences into feature vectors. We have compared the conversion time of our tool with that of iLearnPlus: we transform the sequences much faster. We convert small nucleotides by a median of 2.8X faster, while we outperform iLearnPlus by a median of 6.3X for large sequences. Finally, in amino acids, our tool achieves a median speedup of 23.9X.

Assuntos

Aprendizado de Máquina , Proteínas , DNA/genética , Humanos , Proteínas/química , RNA/genética , Análise de Sequência/métodos

14.

ggPlantmap: an open-source R package for the creation of informative and quantitative ggplot maps derived from plant images.

Jo, Leonardo; Kajala, Kaisa.

J Exp Bot ; 75(17): 5366-5376, 2024 Sep 11.

Artigo em Inglês | MEDLINE | ID: mdl-38329371

RESUMO

As plant research generates an ever-growing volume of spatial quantitative data, the need for decentralized and user-friendly visualization tools to explore large and complex datasets becomes crucial. Existing resources, such as the Plant eFP (electronic Fluorescent Pictograph) viewer, have played a pivotal role on the communication of gene expression data across many plant species. However, although widely used by the plant research community, the Plant eFP viewer lacks open and user-friendly tools for the creation of customized expression maps independently. Plant biologists with less coding experience can often encounter challenges when attempting to explore ways to communicate their own spatial quantitative data. We present 'ggPlantmap' an open-source R package designed to address this challenge by providing an easy and user-friendly method for the creation of ggplot representative maps from plant images. ggPlantmap is built in R, one of the most used languages in biology, to empower plant scientists to create and customize eFP-like viewers tailored to their experimental data. Here, we provide an overview of the package and tutorials that are accessible even to users with minimal R programming experience. We hope that ggPlantmap can assist the plant science community, fostering innovation, and improving our understanding of plant development and function.

Assuntos

Plantas , Software , Plantas/metabolismo , Processamento de Imagem Assistida por Computador/métodos

15.

unmconf : an R package for Bayesian regression with unmeasured confounders.

Hebdon, Ryan; Stamey, James; Kahle, David; Zhang, Xiang.

BMC Med Res Methodol ; 24(1): 195, 2024 Sep 07.

Artigo em Inglês | MEDLINE | ID: mdl-39244581

RESUMO

The inability to correctly account for unmeasured confounding can lead to bias in parameter estimates, invalid uncertainty assessments, and erroneous conclusions. Sensitivity analysis is an approach to investigate the impact of unmeasured confounding in observational studies. However, the adoption of this approach has been slow given the lack of accessible software. An extensive review of available R packages to account for unmeasured confounding list deterministic sensitivity analysis methods, but no R packages were listed for probabilistic sensitivity analysis. The R package unmconf implements the first available package for probabilistic sensitivity analysis through a Bayesian unmeasured confounding model. The package allows for normal, binary, Poisson, or gamma responses, accounting for one or two unmeasured confounders from the normal or binomial distribution. The goal of unmconf is to implement a user friendly package that performs Bayesian modeling in the presence of unmeasured confounders, with simple commands on the front end while performing more intensive computation on the back end. We investigate the applicability of this package through novel simulation studies. The results indicate that credible intervals will have near nominal coverage probability and smaller bias when modeling the unmeasured confounder(s) for varying levels of internal/external validation data across various combinations of response-unmeasured confounder distributional families.

Assuntos

Teorema de Bayes , Fatores de Confusão Epidemiológicos , Software , Humanos , Simulação por Computador , Modelos Estatísticos , Algoritmos , Viés , Análise de Regressão

16.

SurvdigitizeR: an algorithm for automated survival curve digitization.

Zhang, Jasper Zhongyuan; Rios, Juan David; Pechlivanoglou, Tilemanchos; Yang, Alan; Zhang, Qiyue; Deris, Dimitrios; Cromwell, Ian; Pechlivanoglou, Petros.

BMC Med Res Methodol ; 24(1): 147, 2024 Jul 13.

Artigo em Inglês | MEDLINE | ID: mdl-39003440

RESUMO

BACKGROUND: Decision analytic models and meta-analyses often rely on survival probabilities that are digitized from published Kaplan-Meier (KM) curves. However, manually extracting these probabilities from KM curves is time-consuming, expensive, and error-prone. We developed an efficient and accurate algorithm that automates extraction of survival probabilities from KM curves. METHODS: The automated digitization algorithm processes images from a JPG or PNG format, converts them in their hue, saturation, and lightness scale and uses optical character recognition to detect axis location and labels. It also uses a k-medoids clustering algorithm to separate multiple overlapping curves on the same figure. To validate performance, we generated survival plots form random time-to-event data from a sample size of 25, 50, 150, and 250, 1000 individuals split into 1,2, or 3 treatment arms. We assumed an exponential distribution and applied random censoring. We compared automated digitization and manual digitization performed by well-trained researchers. We calculated the root mean squared error (RMSE) at 100-time points for both methods. The algorithm's performance was also evaluated by Bland-Altman analysis for the agreement between automated and manual digitization on a real-world set of published KM curves. RESULTS: The automated digitizer accurately identified survival probabilities over time in the simulated KM curves. The average RMSE for automated digitization was 0.012, while manual digitization had an average RMSE of 0.014. Its performance was negatively correlated with the number of curves in a figure and the presence of censoring markers. In real-world scenarios, automated digitization and manual digitization showed very close agreement. CONCLUSIONS: The algorithm streamlines the digitization process and requires minimal user input. It effectively digitized KM curves in simulated and real-world scenarios, demonstrating accuracy comparable to conventional manual digitization. The algorithm has been developed as an open-source R package and as a Shiny application and is available on GitHub: https://github.com/Pechli-Lab/SurvdigitizeR and https://pechlilab.shinyapps.io/SurvdigitizeR/ .

Assuntos

Algoritmos , Humanos , Estimativa de Kaplan-Meier , Análise de Sobrevida , Probabilidade

17.

crossnma: An R package to synthesize cross-design evidence and cross-format data using network meta-analysis and network meta-regression.

Hamza, Tasnim; Schwarzer, Guido; Salanti, Georgia.

BMC Med Res Methodol ; 24(1): 169, 2024 Aug 05.

Artigo em Inglês | MEDLINE | ID: mdl-39103781

RESUMO

BACKGROUND: Although aggregate data (AD) from randomised clinical trials (RCTs) are used in the majority of network meta-analyses (NMAs), other study designs (e.g., cohort studies and other non-randomised studies, NRS) can be informative about relative treatment effects. The individual participant data (IPD) of the study, when available, are preferred to AD for adjusting for important participant characteristics and to better handle heterogeneity and inconsistency in the network. RESULTS: We developed the R package crossnma to perform cross-format (IPD and AD) and cross-design (RCT and NRS) NMA and network meta-regression (NMR). The models are implemented as Bayesian three-level hierarchical models using Just Another Gibbs Sampler (JAGS) software within the R environment. The R package crossnma includes functions to automatically create the JAGS model, reformat the data (based on user input), assess convergence and summarize the results. We demonstrate the workflow within crossnma by using a network of six trials comparing four treatments. CONCLUSIONS: The R package crossnma enables the user to perform NMA and NMR with different data types in a Bayesian framework and facilitates the inclusion of all types of evidence recognising differences in risk of bias.

Assuntos

Teorema de Bayes , Metanálise em Rede , Software , Humanos , Ensaios Clínicos Controlados Aleatórios como Assunto/métodos , Ensaios Clínicos Controlados Aleatórios como Assunto/estatística & dados numéricos , Projetos de Pesquisa , Algoritmos , Metanálise como Assunto

18.

IncidencePrevalence: An R package to calculate population-level incidence rates and prevalence using the OMOP common data model.

Raventós, Berta; Català, Martí; Du, Mike; Guo, Yuchen; Black, Adam; Inberg, Ger; Li, Xintong; López-Güell, Kim; Newby, Danielle; de Ridder, Maria; Barboza, Cesar; Duarte-Salles, Talita; Verhamme, Katia; Rijnbeek, Peter; Prieto Alhambra, Daniel; Burn, Edward.

Pharmacoepidemiol Drug Saf ; 33(1): e5717, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-37876360

RESUMO

PURPOSE: Real-world data (RWD) offers a valuable resource for generating population-level disease epidemiology metrics. We aimed to develop a well-tested and user-friendly R package to compute incidence rates and prevalence in data mapped to the observational medical outcomes partnership (OMOP) common data model (CDM). MATERIALS AND METHODS: We created IncidencePrevalence, an R package to support the analysis of population-level incidence rates and point- and period-prevalence in OMOP-formatted data. On top of unit testing, we assessed the face validity of the package. To do so, we calculated incidence rates of COVID-19 using RWD from Spain (SIDIAP) and the United Kingdom (CPRD Aurum), and replicated two previously published studies using data from the Netherlands (IPCI) and the United Kingdom (CPRD Gold). We compared the obtained results to those previously published, and measured execution times by running a benchmark analysis across databases. RESULTS: IncidencePrevalence achieved high agreement to previously published data in CPRD Gold and IPCI, and showed good performance across databases. For COVID-19, incidence calculated by the package was similar to public data after the first-wave of the pandemic. CONCLUSION: For data mapped to the OMOP CDM, the IncidencePrevalence R package can support descriptive epidemiological research. It enables reliable estimation of incidence and prevalence from large real-world data sets. It represents a simple, but extendable, analytical framework to generate estimates in a reproducible and timely manner.

Assuntos

COVID-19 , Gerenciamento de Dados , Humanos , Incidência , Prevalência , Bases de Dados Factuais , COVID-19/epidemiologia

19.

Statistical rules for safety monitoring in clinical trials.

Martens, Michael J; Logan, Brent R.

Clin Trials ; 21(2): 152-161, 2024 04.

Artigo em Inglês | MEDLINE | ID: mdl-37877375

RESUMO

BACKGROUND/AIMS: Protecting patient safety is an essential component of the conduct of clinical trials. Rigorous safety monitoring schemes are implemented for these studies to guard against excess toxicity risk from study therapies. They often include protocol-specified stopping rules dictating that an excessive number of safety events will trigger a halt of the study. Statistical methods are useful for constructing rules that protect patients from exposure to excessive toxicity while also maintaining the chance of a false safety signal at a low level. Several statistical techniques have been proposed for this purpose, but the current literature lacks a rigorous comparison to determine which method may be best suitable for a given trial design. The aims of this article are (1) to describe a general framework for repeated monitoring of safety events in clinical trials; (2) to survey common statistical techniques for creating safety stopping criteria; and (3) to provide investigators with a software tool for constructing and assessing these stopping rules. METHODS: The properties and operating characteristics of stopping rules produced by Pocock and O'Brien-Fleming tests, Bayesian Beta-Binomial models, and sequential probability ratio tests (SPRTs) are studied and compared for common scenarios that may arise in phase II and III trials. We developed the R package "stoppingrule" for constructing and evaluating stopping rules from these methods. Its usage is demonstrated through a redesign of a stopping rule for BMT CTN 0601 (registered at Clinicaltrials.gov as NCT00745420), a phase II, single-arm clinical trial that evaluated outcomes in pediatric sickle cell disease patients treated by bone marrow transplant. RESULTS: Methods with aggressive stopping criteria early in the trial, such as the Pocock test and Bayesian Beta-Binomial models with weak priors, have permissive stopping criteria at late stages. This results in a trade-off where rules with aggressive early monitoring generally will have a smaller number of expected toxicities but also lower power than rules with more conservative early stopping, such as the O-Brien-Fleming test and Beta-Binomial models with strong priors. The modified SPRT method is sensitive to the choice of alternative toxicity rate. The maximized SPRT generally has a higher number of expected toxicities and/or worse power than other methods. CONCLUSIONS: Because the goal is to minimize the number of patients exposed to and experiencing toxicities from an unsafe therapy, we recommend using the Pocock or Beta-Binomial, weak prior methods for constructing safety stopping rules. At the design stage, the operating characteristics of candidate rules should be evaluated under various possible toxicity rates in order to guide the choice of rule(s) for a given trial; our R package facilitates this evaluation.

Assuntos

Modelos Estatísticos , Projetos de Pesquisa , Humanos , Criança , Teorema de Bayes , Probabilidade , Avaliação de Resultados em Cuidados de Saúde

20.

Advanced Machine Learning Models for Estimating the Distribution of Sea-Surface Particulate Organic Carbon (POC) Concentrations Using Satellite Remote Sensing Data: The Mediterranean as an Example.

Li, Chao; Wu, Huisheng; Yang, Chaojun; Cui, Long; Ma, Ziyue; Wang, Lejie.

Sensors (Basel) ; 24(17)2024 Aug 31.

Artigo em Inglês | MEDLINE | ID: mdl-39275580

RESUMO

Accurate estimation of the distribution of POC in the sea surface is an important issue in understanding the carbon cycle at the basin scale in the ocean. This study explores the best machine learning approach to determine the distribution of POC in the ocean surface layer based on data obtained using satellite remote sensing. In order to estimate and verify the accuracy of this method, it is necessary to obtain a large amount of POC data from field observations, so this study was conducted in the Mediterranean Sea, where such data have been obtained and published. The research initially utilizes the Geographic Detector (GD) method to identify spatial correlations between POC and 47 environmental factors in the region. Four machine learning models of a Bayesian optimized random forest (BRF), a backpropagation neural network, adaptive boosting, and extreme gradient boosting were utilized to construct POC assessment models. Model validation yielded that the BRF exhibited superior performance in estimating sea-surface POC. To build a more accurate tuneRanger random forest (TRRF) model, we introduced the tuneRanger R package for further optimization, resulting in an R2 of 0.868, a mean squared error of 1.119 (mg/m3)2, and a mean absolute error of 1.041 mg/m3. It was employed to estimate the surface POC concentrations in the Mediterranean for May and June 2017. Spatial analysis revealed higher concentrations in the west and north and lower concentrations in the east and south, with higher levels near the coast and lower levels far from the coast. Additionally, we deliberated on the impact of human activities on the surface POC in the Mediterranean. This research contributes a high-precision method for satellite retrieval of surface POC concentrations in the Mediterranean, thereby enriching the understanding of POC dynamics in this area.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA